From: Gene/protein name recognition based on support vector machine using dictionary as features
Feature | Value |
---|---|
word | all words in the training data |
orthography | capital, symbol, etc. (see Table 2) |
prefix | character 1-, 2-, and 3-grams taken from the start of a word |
suffix | character 1-, 2-, and 3-grams taken from the end of a word |
part of speech | tags assigned by the Brill tagger |
preceding class | classes of the tokens at positions -2 and -1 |
gene/protein name dictionary | protein names collected from SWISS-PROT and TrEMBL |
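
As an illustration of how the rows above might translate into a per-token feature map, here is a minimal Python sketch. It is not the authors' implementation: the function names `affixes` and `token_features`, the feature keys, the `"BOS"` padding value, and the single-token lowercase dictionary lookup are all assumptions made for this example. Orthographic features are left out because they depend on Table 2, which is not reproduced here.

```python
def affixes(word, max_n=3):
    """Return prefix and suffix character n-grams for n = 1..max_n."""
    feats = {}
    for n in range(1, max_n + 1):
        if len(word) >= n:  # skip n-grams longer than the word itself
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    return feats


def token_features(words, pos_tags, prev_classes, i, dictionary):
    """Build a feature map for the token at index i (hypothetical layout).

    words        -- tokens of the sentence
    pos_tags     -- one part-of-speech tag per token (e.g. from a tagger)
    prev_classes -- classes already assigned to tokens 0..i-1
    dictionary   -- set of lowercase gene/protein names (assumed lookup)
    """
    word = words[i]
    feats = {"word": word, "pos": pos_tags[i]}
    feats.update(affixes(word))
    # Preceding class: classes of the tokens at positions -2 and -1,
    # padded with "BOS" (an assumed marker) at the sentence start.
    feats["class-2"] = prev_classes[i - 2] if i >= 2 else "BOS"
    feats["class-1"] = prev_classes[i - 1] if i >= 1 else "BOS"
    # Dictionary feature: simple exact match on the lowercased token.
    feats["in_dict"] = word.lower() in dictionary
    return feats


if __name__ == "__main__":
    sentence = ["The", "p53", "protein"]
    tags = ["DT", "NN", "NN"]
    print(token_features(sentence, tags, ["O"], 1, {"p53"}))
```

A feature map of this kind would still need to be converted into a sparse binary vector before being passed to a support vector machine; that step, and any multi-word dictionary matching, is outside the scope of this sketch.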