Figure 1
From: Building a protein name dictionary from full text: a machine learning term extraction approach

Filtering process and exclusion lists. Oval boxes on the left of the figure show the exclusion lists used as input to the filtering process. Each exclusion list is connected to the step of the filtering process that uses it. The steps run in sequence, and each one filters out the terms that match its exclusion list according to the rule described in the corresponding box of the second column. Some steps match on the entire term (e.g., the first step at the top); other steps use only specific words in a term. The lists were built, and are maintained, manually. The rectangular boxes on the right show the number of terms excluded at each step when processing the most frequent terms from JBC2000 (numbers in parentheses indicate counts of unique terms).
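
The sequential filtering described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the step names, the example exclusion lists, and the helper functions `whole_term`, `last_word`, and `run_filters` are all hypothetical, chosen only to show how each step could remove matching terms and report both the total and the unique number of exclusions.

```python
from typing import Callable, Iterable, List, Set, Tuple

# A matcher returns the parts of a term that a step compares to its list.
Matcher = Callable[[str], List[str]]
Step = Tuple[str, Set[str], Matcher]


def whole_term(term: str) -> List[str]:
    """Match on the entire term (hypothetical rule)."""
    return [term.lower()]


def last_word(term: str) -> List[str]:
    """Match only on the final word of the term (hypothetical rule)."""
    return term.lower().split()[-1:]  # empty list for an empty term


def run_filters(terms: Iterable[str], steps: List[Step]):
    """Apply the steps in order; each step sees only what earlier steps kept.

    Returns the surviving terms and, per step, a tuple of
    (step name, number of excluded terms, number of unique excluded terms).
    """
    remaining = list(terms)
    report = []
    for name, exclusions, matcher in steps:
        kept, dropped = [], []
        for term in remaining:
            if any(part in exclusions for part in matcher(term)):
                dropped.append(term)
            else:
                kept.append(term)
        report.append((name, len(dropped), len(set(dropped))))
        remaining = kept
    return remaining, report


# Illustrative exclusion lists; the real lists are curated manually.
steps = [
    ("whole term", {"fig", "data not shown"}, whole_term),
    ("last word", {"buffer", "gel"}, last_word),
]
terms = ["fig", "lysis buffer", "lysis buffer", "protein kinase C", "fig"]

survivors, report = run_filters(terms, steps)
for name, total, unique in report:
    print(f"{name}: {total} excluded ({unique} unique)")
print(survivors)
```

Keeping each step as a (name, list, matcher) triple mirrors the figure's layout, where every exclusion list is attached to exactly one step and the per-step counts are reported separately.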