Features | Definitions/Remarks | Values | Examples |
---|---|---|---|
Distance_KP1 | The number of words appearing between keyword K and protein name P1 in the sentence. | Integer value | In sentence LLL.d33.s1 of the LLL corpus, “GerE binds to a site on one of these promoters, cotX, that overlaps its -35 region,” the keyword is ‘bind’ and Distance_KP1 is 0. |
Distance_KP2 | The number of words appearing between keyword K and protein name P2 in the sentence. | Integer value | In sentence LLL.d33.s1 above, Distance_KP2 is 8. |
Distance_P1P2 | The number of words appearing between the two protein names P1 and P2 in the sentence. | Integer value | In sentence LLL.d33.s1 above, Distance_P1P2 is 9. |
Position_P1 | One plus the number of words between the beginning of the sentence and protein name P1. | Integer value | In sentence LLL.d33.s1 above, Position_P1 is 1. |
Position_P2 | One plus the number of words between the beginning of the sentence and protein name P2. | Integer value | In sentence LLL.d33.s1 above, Position_P2 is 11. |
Position of keyword | The word order of keyword K and the protein pair P1 and P2: ‘infix’ (word order is [P1-K-P2]), ‘prefix’ (word order is [K-P1-P2]), or ‘postfix’ (word order is [P1-P2-K]). | ‘infix’, ‘prefix’, or ‘postfix’ | In sentence LLL.d33.s1 above, the feature value is ‘infix’. |
Comma between keyword and protein pair | Because the topic of a sentence frequently changes before and after a comma, we record whether a comma appears between the protein pair and the keyword. The value is ‘x1x2’: x1 is ‘t’ if a comma exists between A and B, and x2 is ‘t’ if a comma exists between B and C; otherwise the corresponding character is ‘f’. Here A, B, and C denote the keyword and the two protein names in order of their appearance in the sentence. | ‘tt’, ‘ff’, ‘tf’, or ‘ft’ | In sentence LLL.d33.s1 above, the feature value is ‘ft’. |
Multiple occurrences of keywords | Whether the sentence contains more than one keyword. | ‘true’ or ‘false’ | In sentence LLL.d33.s1 above, the feature value is ‘false’. |
Parallel expression of a protein pair | Whether the two protein names of the pair are contiguous in the word order of the sentence containing them (they are still considered contiguous if only ‘-’, ‘/’, ‘and’, ‘or’, or ‘(’ appears between them). If two protein names are listed in parallel in a sentence, an interaction between them is unlikely. | ‘true’ or ‘false’ | In sentence LLL.d30.s0, “In vitro, both sigma(A) and sigma(X) holoenzymes recognize promoter elements within the sigX-ypuN control region,” the feature values of protein pairs (sigma(A), sigma(X)) and (sigX, ypuN) are ‘true’ (interactions occur only among the remaining protein pairs). |
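
The distance, position, word-order, and comma features in the table can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes whitespace tokenization, assumes P1 precedes P2, and takes the keyword as its surface form in the sentence (‘binds’ rather than the lemma ‘bind’). The function name `pair_features` and its helpers are hypothetical.

```python
def _index_of(tokens, word):
    """Return the 0-based index of the first token equal to `word`,
    ignoring punctuation attached to the token (assumption: whitespace tokens)."""
    for i, tok in enumerate(tokens):
        if tok.strip(",.;:") == word:
            return i
    raise ValueError(f"{word!r} not found in sentence")


def _comma_between(tokens, i, j):
    """True if a comma occurs after token i and before token j (i < j).
    With whitespace tokenization a comma is attached to the preceding word."""
    return any("," in tokens[k] for k in range(i, j))


def pair_features(sentence, keyword, p1, p2):
    """Compute the table's distance, position, word-order, and comma features.
    Assumes p1 appears before p2 in the sentence."""
    tokens = sentence.split()
    k = _index_of(tokens, keyword)
    i1 = _index_of(tokens, p1)
    i2 = _index_of(tokens, p2)

    if k < i1:
        order = "prefix"    # [K-P1-P2]
    elif k > i2:
        order = "postfix"   # [P1-P2-K]
    else:
        order = "infix"     # [P1-K-P2]

    # A, B, C: keyword and protein names in order of appearance
    a, b, c = sorted((k, i1, i2))
    comma = ("t" if _comma_between(tokens, a, b) else "f") + \
            ("t" if _comma_between(tokens, b, c) else "f")

    return {
        "Distance_KP1": abs(k - i1) - 1,   # words strictly between K and P1
        "Distance_KP2": abs(k - i2) - 1,
        "Distance_P1P2": abs(i1 - i2) - 1,
        "Position_P1": i1 + 1,             # 1-based word position
        "Position_P2": i2 + 1,
        "Position of keyword": order,
        "Comma": comma,
    }


sentence = ("GerE binds to a site on one of these promoters, cotX, "
            "that overlaps its -35 region,")
features = pair_features(sentence, keyword="binds", p1="GerE", p2="cotX")
print(features)
```

On sentence LLL.d33.s1 this reproduces the example values in the table (0, 8, 9, 1, 11, ‘infix’, ‘ft’). The keyword-count and parallel-expression features are omitted because they depend on a keyword list and protein-name annotations that are not given here.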