Rules | |
---|---|
R1 | $\sum_{i} \text{length}(s_i) < L_{min} \rightarrow \text{discard}(s)$ |
IF the sum of the lengths (i.e. number of characters) of all tokens $s_i$ belonging to s is smaller than $L_{min}$, THEN discard s. | |
R2 | $\exists i \mid i = \text{affix\_in\_sequence\_tails}(s) \rightarrow \text{discard}(s) \wedge \text{add}(s'), \quad s' = \{s_1, \ldots, s_{i-1}\}$ |
IF an affix from the list of problem affixes is matched at the tail of s, THEN discard s AND add to the facts base a modified copy of s that does not include the tokens corresponding to the matched affix. | |
R3 | $\exists i \mid i = \text{affix\_in\_sequence\_head}(s) \rightarrow \text{discard}(s) \wedge \text{add}(s'), \quad s' = \{s_{i+1}, \ldots, s_n\}$ |
IF an affix from the list of problem affixes is matched at the head of s, THEN discard s AND add to the facts base a modified copy of s that does not include the tokens corresponding to the matched affix. | |
R4 | $\forall i,\ \text{in\_dictionary}(s_i) \rightarrow \text{discard}(s)$ |
IF all tokens $s_i$ from s belong to the custom dictionary of English words, THEN discard s. | |
R5 | $\exists (i, j) \mid (i, j) = \text{affix\_within\_sequence}(s) \rightarrow \text{discard}(s) \wedge \text{add}(s') \wedge \text{add}(s''), \quad s' = \{s_1, \ldots, s_{i-1}\},\ s'' = \{s_{i+j}, \ldots, s_n\}$ |
IF an affix from the list of problem affixes is matched within s, THEN discard s AND add to the facts base the sub-lists of tokens s' and s'' that include all the tokens in s that occur before and after the matched affix respectively. | |
R6 | $\text{in\_dictionary}(s_1) \wedge \text{length}(s_1) \geq 3 \rightarrow \text{discard}(s) \wedge \text{add}(s'), \quad s' = \{s_2, \ldots, s_n\}$ |
IF the first token of s belongs to the custom dictionary of English words AND its length (i.e. number of characters) is greater than or equal to 3, THEN discard s AND add to the facts base the sub-list of tokens s' that includes all but the first token of s. | |
R7 | $\text{in\_dictionary}(s_n) \wedge \text{length}(s_n) \geq 3 \rightarrow \text{discard}(s) \wedge \text{add}(s'), \quad s' = \{s_1, \ldots, s_{n-1}\}$ |
IF the last token of s belongs to the custom dictionary of English words AND its length (i.e. number of characters) is greater than or equal to 3, THEN discard s AND add to the facts base the sub-list of tokens s' that includes all but the last token of s. | |
R8 | $\text{size}(s) \geq 2 \rightarrow \text{merge}(s)$ |
IF s contains 2 or more tokens, THEN convert s into a singleton by concatenating all its tokens. |
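
The rules above can be read as a small rewriting system over token sequences. The following Python sketch is one possible reading, not the original implementation. It assumes that a sequence is a tuple of string tokens, that a problem affix is itself a tuple of tokens, and that rules are tried in the order R1 to R8 with the first match firing. The table does not state a conflict-resolution order, so that ordering is an assumption. The values of `L_MIN`, `PROBLEM_AFFIXES` and `DICTIONARY` are hypothetical placeholders.

```python
from collections import deque

# Hypothetical resources: the table names them but does not list their contents.
L_MIN = 3
PROBLEM_AFFIXES = [("like",), ("type", "of")]
DICTIONARY = {"the", "of", "with"}


def step(s):
    """Apply the first matching rule to the token tuple s.

    Returns the sequences that replace s in the facts base (an empty list
    means s is simply discarded), or None when no rule fires and s is kept.
    """
    n = len(s)
    if sum(len(tok) for tok in s) < L_MIN:            # R1: too short overall
        return []
    for affix in PROBLEM_AFFIXES:
        k = len(affix)
        if k <= n and s[n - k:] == affix:             # R2: affix at the tail
            return [s[:n - k]]
        if k <= n and s[:k] == affix:                 # R3: affix at the head
            return [s[k:]]
    if all(tok in DICTIONARY for tok in s):           # R4: only dictionary words
        return []
    for affix in PROBLEM_AFFIXES:                     # R5: affix strictly inside s
        k = len(affix)
        for i in range(1, n - k):
            if s[i:i + k] == affix:
                return [s[:i], s[i + k:]]
    if s[0] in DICTIONARY and len(s[0]) >= 3:         # R6: drop leading word
        return [s[1:]]
    if s[-1] in DICTIONARY and len(s[-1]) >= 3:       # R7: drop trailing word
        return [s[:-1]]
    if n >= 2:                                        # R8: merge into a singleton
        return [("".join(s),)]
    return None


def run(facts):
    """Rewrite the facts base until no rule fires on any remaining sequence."""
    queue = deque(tuple(s) for s in facts)
    kept = []
    while queue:
        s = queue.popleft()
        result = step(s)
        if result is None:
            kept.append(s)
        else:
            queue.extend(result)
    return kept


facts = [("ab",), ("the", "xk", "12"), ("zq", "7", "like")]
print(run(facts))  # [('xk12',), ('zq7',)]
```

In this example, `("ab",)` is discarded by R1, `("the", "xk", "12")` loses its leading dictionary word through R6 and is then merged by R8, and `("zq", "7", "like")` loses its tail affix through R2 before being merged. Every rewrite either discards a sequence or reduces its token count, so the loop terminates.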