Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Funk, Christopher; Baumgartner, William; Garcia, Benjamin; Roeder, Christophe; Bada, Michael; Cohen, K Bretonnel; Hunter, Lawrence E; Verspoor, Karin

doi:10.1186/1471-2105-15-59

BMC Bioinformatics

Table 2 System parameter description and values

NCBO Annotator parameters
Parameter	Description and possible values
wholeWordOnly	Term recognition must match whole words - (YES, NO)
filterNumber	Specifies whether the entity recognition step should filter numbers - (YES, NO)
stopWords	List of stop words to exclude from matching - (PubMed - commonly found terms from PubMed (included as Additional file 1), NONE)
stopWordsCaseSensitive	Whether stop words are case sensitive - (YES, NO)
minTermSize	Specifies minimum length of terms to be returned - (ONE, THREE, FIVE)
withSynonyms	Whether to include synonyms in matching - (YES, NO)
MetaMap parameters
Parameter	Description and possible values
model	Determines which data model is used - (STRICT - lexical, manual, and syntactic filtering are applied, RELAXED - lexical and manual filtering are used)
gaps	Specifies how to handle gaps in terms when matching - (ALLOW, NONE)
wordOrder	Specifies how to handle word order when matching - (ORDER MATTERS, IGNORE)
acronymAbb	Determines which generated acronym or abbreviations are used - (NONE, DEFAULT, UNIQUE - restricts variants to only those with unique expansions)
derivationalVars	Specifies which type of derivational variants will be used - (NONE, ALL, ONLY ADJ NOUN)
scoreFilter	MetaMap reports a score from 0–1000 for every match, with 1000 being the highest; only matches with scores ≥ the chosen threshold are returned - (0, 600, 800, 1000)
minTermSize	Specifies minimum length of terms to be returned - (ONE, THREE, FIVE)
ConceptMapper parameters
Parameter	Description and possible values
searchStrategy	Specifies the dictionary lookup strategy - (CONTIGUOUS - returns the longest match of contiguous tokens, SKIP ANY - returns the longest match of not-necessarily contiguous tokens, and the next lookup begins in the next span, SKIP ANY ALLOW OVERLAP - returns the longest match of not-necessarily contiguous tokens in the span, and the next lookup begins after the next token)
caseMatch	Specifies the case folding mode to use - (IGNORE - fold everything to lower case, INSENSITIVE - fold only tokens with initial caps to lower case, SENSITIVE - no folding, FOLD DIGIT - fold only tokens with digits to lower case)
stemmer	Name of the stemmer to use before matching - (Porter - classic stemmer that removes common morphological and inflectional endings from English words, BioLemmatizer - domain-specific lemmatization tool for the morphological analysis of biomedical literature presented in Liu et al. [48], NONE)
orderIndependentLookup	Specifies if ordering of tokens within a span can be ignored - (TRUE, FALSE)
findAllMatches	Specifies if all matches will be returned - (TRUE, FALSE - only the longest match will be returned)
stopWords	List of stop words to exclude from matching - (PubMed - commonly found terms from PubMed (included as Additional file 1), NONE)
synonyms	Specifies which synonyms will be included when creating the dictionary - (EXACT ONLY, ALL)

Parameters that were evaluated for each system are listed with a description and their possible values; the values are shown in all capital letters. For the most part, the parameters are self-explanatory; for more information, see the documentation for each system: ConceptMapper (CM) [29], NCBO Annotator [44], and MetaMap (MM) [18].
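
To make the size of the search space concrete, the value lists in Table 2 can be expanded into explicit configurations. The following Python sketch is illustrative only: the dictionary layout and the helper names `iter_configurations` and `count_configurations` are assumptions introduced here, not part of any of the three systems, and the table does not state whether every combination was actually run.

```python
from itertools import product

# Parameter names and values transcribed from Table 2.
# The nested-dictionary layout is an assumption made for this sketch.
PARAMETER_SPACE = {
    "NCBO Annotator": {
        "wholeWordOnly": ["YES", "NO"],
        "filterNumber": ["YES", "NO"],
        "stopWords": ["PubMed", "NONE"],
        "stopWordsCaseSensitive": ["YES", "NO"],
        "minTermSize": ["ONE", "THREE", "FIVE"],
        "withSynonyms": ["YES", "NO"],
    },
    "MetaMap": {
        "model": ["STRICT", "RELAXED"],
        "gaps": ["ALLOW", "NONE"],
        "wordOrder": ["ORDER MATTERS", "IGNORE"],
        "acronymAbb": ["NONE", "DEFAULT", "UNIQUE"],
        "derivationalVars": ["NONE", "ALL", "ONLY ADJ NOUN"],
        "scoreFilter": [0, 600, 800, 1000],
        "minTermSize": ["ONE", "THREE", "FIVE"],
    },
    "ConceptMapper": {
        "searchStrategy": ["CONTIGUOUS", "SKIP ANY", "SKIP ANY ALLOW OVERLAP"],
        "caseMatch": ["IGNORE", "INSENSITIVE", "SENSITIVE", "FOLD DIGIT"],
        "stemmer": ["Porter", "BioLemmatizer", "NONE"],
        "orderIndependentLookup": ["TRUE", "FALSE"],
        "findAllMatches": ["TRUE", "FALSE"],
        "stopWords": ["PubMed", "NONE"],
        "synonyms": ["EXACT ONLY", "ALL"],
    },
}


def iter_configurations(system):
    """Yield every combination of the listed values for one system as a dict."""
    params = PARAMETER_SPACE[system]
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def count_configurations(system):
    """Count the combinations in the full Cartesian product for one system."""
    return sum(1 for _ in iter_configurations(system))


if __name__ == "__main__":
    for system in PARAMETER_SPACE:
        print(system, count_configurations(system))
```

Under these assumptions, the full Cartesian product contains 96 configurations for NCBO Annotator, 864 for MetaMap, and 576 for ConceptMapper. These figures follow purely from the value lists above and should not be read as the number of runs reported in the study.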

ISSN: 1471-2105