Table 1 Features extracted.

From: Gene/protein name recognition based on support vector machine using dictionary as features

Feature                        Value
word                           all words in the training data
orthography                    capital, symbol, etc. (see Table 2)
prefix                         character 1-, 2-, or 3-gram at the start of a word
suffix                         character 1-, 2-, or 3-gram at the end of a word
part of speech                 tag assigned by the Brill tagger
preceding class                classes of the two preceding tokens (positions -2, -1)
gene/protein name dictionary   protein names collected from SWISS-PROT and TrEMBL
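
As a rough illustration of how some of the features in Table 1 might be computed for a single token, the following is a minimal sketch and not the authors' implementation. The function names, the feature keys, the "BOS" padding value, the "O" class label, and the exact single-token dictionary lookup are all assumptions made for this example. The orthography and part-of-speech features are left out because they depend on Table 2 and on an external tagger.

```python
def affix_ngrams(word, max_n=3):
    """Return prefix and suffix character n-grams for n = 1..max_n."""
    feats = {}
    for n in range(1, max_n + 1):
        # Skip n-grams longer than the word itself.
        if len(word) >= n:
            feats[f"prefix{n}"] = word[:n]
            feats[f"suffix{n}"] = word[-n:]
    return feats


def token_features(tokens, i, prev_classes, dictionary):
    """Build a feature dict for tokens[i].

    prev_classes holds the classes already assigned to tokens[0..i-1].
    dictionary is a set of known names (hypothetical stand-in for a
    list compiled from SWISS-PROT and TrEMBL).
    """
    word = tokens[i]
    feats = {"word": word}
    feats.update(affix_ngrams(word))
    # Classes of the two preceding tokens; "BOS" pads the sentence start
    # (the padding value is an assumption of this sketch).
    feats["class-2"] = prev_classes[i - 2] if i >= 2 else "BOS"
    feats["class-1"] = prev_classes[i - 1] if i >= 1 else "BOS"
    # Simplified dictionary feature: exact match of the single token.
    feats["in_dictionary"] = word in dictionary
    return feats


example = token_features(["the", "p53", "protein"], 1, ["O"], {"p53"})
```

In practice each key/value pair would be converted to a binary indicator before being passed to a support vector machine, and a dictionary of real protein names would need multi-word matching rather than the single-token lookup used here.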