Table 2 System parameter description and values

From: Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

NCBO Annotator parameters

| Parameter | Description and possible values |
| --- | --- |
| wholeWordOnly | Term recognition must match whole words. Values: YES, NO |
| filterNumber | Specifies whether the entity recognition step should filter numbers. Values: YES, NO |
| stopWords | List of stop words to exclude from matching. Values: PubMed (commonly found terms from PubMed, included as Additional file 1), NONE |
| stopWordsCaseSensitive | Whether stop words are case sensitive. Values: YES, NO |
| minTermSize | Specifies minimum length of terms to be returned. Values: ONE, THREE, FIVE |
| withSynonyms | Whether to include synonyms in matching. Values: YES, NO |
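Because each NCBO Annotator parameter takes only a few discrete values, the settings form a grid. A minimal Python sketch of enumerating that grid follows; the dictionary `NCBO_GRID` and the helper names are hypothetical, and nothing here calls the NCBO Annotator service. The full Cartesian product of the listed values has 2 × 2 × 2 × 2 × 3 × 2 = 96 combinations; this table does not state whether every one of them was run.

```python
from itertools import product

# Hypothetical encoding of the NCBO Annotator values listed in Table 2.
# Keys mirror the parameter names; this is not a client for the web service.
NCBO_GRID = {
    "wholeWordOnly": ["YES", "NO"],
    "filterNumber": ["YES", "NO"],
    "stopWords": ["PubMed", "NONE"],
    "stopWordsCaseSensitive": ["YES", "NO"],
    "minTermSize": ["ONE", "THREE", "FIVE"],
    "withSynonyms": ["YES", "NO"],
}


def parameter_combinations(grid):
    """Yield one dict per combination in the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def is_redundant(combo):
    # Assumption: stop-word case sensitivity has no effect when no stop-word
    # list is used, so only one of the two otherwise-identical settings is kept.
    return combo["stopWords"] == "NONE" and combo["stopWordsCaseSensitive"] == "YES"


combos = list(parameter_combinations(NCBO_GRID))
effective = [c for c in combos if not is_redundant(c)]
print(len(combos))     # 96
print(len(effective))  # 72
```

The pruning step encodes one labeled assumption: if stopWordsCaseSensitive is irrelevant when stopWords is NONE, then 24 of the 96 combinations duplicate others, leaving 72 distinct configurations.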
MetaMap parameters

| Parameter | Description and possible values |
| --- | --- |
| model | Determines which data model is used. Values: STRICT (lexical, manual, and syntactic filtering are applied), RELAXED (lexical and manual filtering are applied) |
| gaps | Specifies how to handle gaps in terms when matching. Values: ALLOW, NONE |
| wordOrder | Specifies how to handle word order when matching. Values: ORDER MATTERS, IGNORE |
| acronymAbb | Determines which generated acronyms or abbreviations are used. Values: NONE, DEFAULT, UNIQUE (restricts variants to only those with unique expansions) |
| derivationalVars | Specifies which type of derivational variants is used. Values: NONE, ALL, ONLY ADJ NOUN |
| scoreFilter | MetaMap reports a score from 0 to 1000 for every match, with 1000 being the highest; only matches with scores ≥ the selected value are returned. Values: 0, 600, 800, 1000 |
| minTermSize | Specifies minimum length of terms to be returned. Values: ONE, THREE, FIVE |
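The scoreFilter and minTermSize rows both describe thresholds on returned matches. The sketch below applies them as a post-processing step, assuming that matches scoring at or above the scoreFilter value are kept and that minTermSize counts characters of the matched text. The `Match` record, its fields, and the example matches are hypothetical; this is not MetaMap's output format or API.

```python
from dataclasses import dataclass

# Assumption: minTermSize is a minimum character length of the matched text.
MIN_TERM_SIZE = {"ONE": 1, "THREE": 3, "FIVE": 5}


@dataclass
class Match:
    """Hypothetical, simplified match record (not MetaMap's output format)."""
    text: str     # covered text in the document
    concept: str  # placeholder concept identifier
    score: int    # score on the 0-1000 scale described in Table 2


def post_filter(matches, score_filter, min_term_size):
    """Keep matches scoring at or above score_filter whose text is long enough."""
    min_len = MIN_TERM_SIZE[min_term_size]
    return [m for m in matches if m.score >= score_filter and len(m.text) >= min_len]


MATCHES = [
    Match("heart", "concept-A", 1000),
    Match("cardiac muscle", "concept-B", 861),
    Match("mm", "concept-C", 694),
    Match("cell", "concept-D", 593),
]

for score_filter, size in [(0, "ONE"), (600, "ONE"), (600, "THREE"), (1000, "ONE")]:
    kept = post_filter(MATCHES, score_filter, size)
    print(score_filter, size, [m.text for m in kept])
# 0 ONE ['heart', 'cardiac muscle', 'mm', 'cell']
# 600 ONE ['heart', 'cardiac muscle', 'mm']
# 600 THREE ['heart', 'cardiac muscle']
# 1000 ONE ['heart']
```

A scoreFilter of 0 keeps every match, while 1000 keeps only top-scoring ones; minTermSize then independently removes short strings such as "mm".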
ConceptMapper parameters

| Parameter | Description and possible values |
| --- | --- |
| searchStrategy | Specifies the dictionary lookup strategy. Values: CONTIGUOUS (returns the longest match of contiguous tokens), SKIP ANY (returns the longest match of not-necessarily-contiguous tokens; the next lookup begins in the next span), SKIP ANY ALLOW OVERLAP (returns the longest match of not-necessarily-contiguous tokens in the span; the next lookup begins after the next token) |
| caseMatch | Specifies the case folding mode to use. Values: IGNORE (fold everything to lowercase), INSENSITIVE (fold only tokens with initial caps to lowercase), SENSITIVE (no folding), FOLD DIGIT (fold only tokens with digits to lowercase) |
| stemmer | Name of the stemmer to use before matching. Values: Porter (classic stemmer that removes common morphological and inflectional endings from English words), BioLemmatizer (domain-specific lemmatization tool for the morphological analysis of biomedical literature, presented in Liu et al. [48]), NONE |
| orderIndependentLookup | Specifies whether the ordering of tokens within a span can be ignored. Values: TRUE, FALSE |
| findAllMatches | Specifies whether all matches are returned. Values: TRUE, FALSE (only the longest match is returned) |
| stopWords | List of stop words to exclude from matching. Values: PubMed (commonly found terms from PubMed, included as Additional file 1), NONE |
| synonyms | Specifies which synonyms are included when creating the dictionary. Values: EXACT ONLY, ALL |
Parameters evaluated for each system are listed with a description and their possible values; most values are given in all capital letters. Most parameters are self-explanatory; for more information, see the documentation for each system: CM (ConceptMapper) [29], NCBO Annotator [44], MM (MetaMap) [18].
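For ConceptMapper, the CONTIGUOUS searchStrategy together with caseMatch and orderIndependentLookup can be illustrated as a greedy longest-match dictionary lookup. The Python below is a simplified sketch, not ConceptMapper's implementation or configuration: it assumes whitespace tokenization, covers only the IGNORE and SENSITIVE case modes, returns only the longest match starting at each position (loosely analogous to findAllMatches = FALSE), and uses placeholder dictionary entries and concept ids.

```python
def normalize(token, case_match):
    """Apply a simplified version of two caseMatch modes from Table 2."""
    if case_match == "IGNORE":  # fold everything to lowercase
        return token.lower()
    if case_match == "SENSITIVE":  # no folding
        return token
    raise ValueError(f"caseMatch mode not covered by this sketch: {case_match}")


def make_key(tokens, order_independent):
    # With orderIndependentLookup, token order within a span is ignored.
    return tuple(sorted(tokens)) if order_independent else tuple(tokens)


def build_dictionary(entries, case_match, order_independent):
    """Index (term, concept_id) pairs by normalized token key.

    Entries whose keys collide overwrite earlier ones in this sketch.
    """
    dictionary, max_len = {}, 0
    for term, concept in entries:
        tokens = [normalize(t, case_match) for t in term.split()]
        dictionary[make_key(tokens, order_independent)] = concept
        max_len = max(max_len, len(tokens))
    return dictionary, max_len


def contiguous_longest_match(tokens, dictionary, max_len, case_match, order_independent):
    """Greedy left-to-right lookup returning (start, end, concept) per match."""
    norm = [normalize(t, case_match) for t in tokens]
    results, i = [], 0
    while i < len(norm):
        match = None
        # Try the longest contiguous span first and shrink until something matches.
        for length in range(min(max_len, len(norm) - i), 0, -1):
            key = make_key(norm[i:i + length], order_independent)
            if key in dictionary:
                match = (i, i + length, dictionary[key])
                break
        if match:
            results.append(match)
            i = match[1]  # resume lookup after the matched span
        else:
            i += 1
    return results


ENTRIES = [("white blood cell", "concept-1"), ("blood", "concept-2"), ("cell", "concept-3")]
TEXT = "counts of Cell Blood White".split()

for order_independent in (False, True):
    dictionary, max_len = build_dictionary(ENTRIES, "IGNORE", order_independent)
    print(order_independent,
          contiguous_longest_match(TEXT, dictionary, max_len, "IGNORE", order_independent))
# False [(2, 3, 'concept-3'), (3, 4, 'concept-2')]
# True [(2, 5, 'concept-1')]
```

With orderIndependentLookup disabled, the reversed phrase yields only the single-token matches; with it enabled, the whole span matches the three-token entry. Stemming, stop words, synonyms, and the SKIP ANY strategies are not modeled here.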