Table 2 System parameter description and values

From: Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

NCBO Annotator parameters

| Parameter | Description and possible values |
| --- | --- |
| wholeWordOnly | Term recognition must match whole words - (YES, NO) |
| filterNumber | Specifies whether the entity recognition step should filter numbers - (YES, NO) |
| stopWords | List of stop words to exclude from matching - (PubMed - commonly found terms from PubMed (included as Additional file 1), NONE) |
| stopWordsCaseSensitive | Whether stop words are case sensitive - (YES, NO) |
| minTermSize | Specifies minimum length of terms to be returned - (ONE, THREE, FIVE) |
| withSynonyms | Whether to include synonyms in matching - (YES, NO) |
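The possible values listed for the NCBO Annotator define a grid of parameter settings. The following is a minimal sketch of how such a grid could be enumerated. The names `NCBO_PARAMETERS` and `parameter_grid` are hypothetical, and the table itself does not state that every combination was run, so the full cross product here is an assumption made for illustration.

```python
from itertools import product

# Values copied from the NCBO Annotator rows of the table.
# The dictionary name and layout are illustrative, not part of any tool's API.
NCBO_PARAMETERS = {
    "wholeWordOnly": ["YES", "NO"],
    "filterNumber": ["YES", "NO"],
    "stopWords": ["PubMed", "NONE"],
    "stopWordsCaseSensitive": ["YES", "NO"],
    "minTermSize": ["ONE", "THREE", "FIVE"],
    "withSynonyms": ["YES", "NO"],
}


def parameter_grid(parameters):
    """Yield one dict per combination of the listed parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


grid = list(parameter_grid(NCBO_PARAMETERS))
print(len(grid))  # 2 * 2 * 2 * 2 * 3 * 2 = 96
print(grid[0])    # first combination: the first listed value of every parameter
```

The same helper would work for the MetaMap and ConceptMapper tables by swapping in their parameter names and values.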

MetaMap parameters

| Parameter | Description and possible values |
| --- | --- |
| model | Determines which data model is used - (STRICT - lexical, manual, and syntactic filtering are applied, RELAXED - lexical and manual filtering are used) |
| gaps | Specifies how to handle gaps in terms when matching - (ALLOW, NONE) |
| wordOrder | Specifies how to handle word order when matching - (ORDER MATTERS, IGNORE) |
| acronymAbb | Determines which generated acronyms or abbreviations are used - (NONE, DEFAULT, UNIQUE - restricts variants to only those with unique expansions) |
| derivationalVars | Specifies which type of derivational variants will be used - (NONE, ALL, ONLY ADJ NOUN) |
| scoreFilter | MetaMap reports a score from 0–1000 for every match, with 1000 being the highest; matches with scores ≥ the selected threshold are returned - (0, 600, 800, 1000) |
| minTermSize | Specifies minimum length of terms to be returned - (ONE, THREE, FIVE) |
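The effect of scoreFilter can be illustrated with a small sketch. It assumes the threshold is inclusive (matches scoring at or above the chosen value are kept). The `Match` type, the `apply_score_filter` helper, and the example matches and scores are all hypothetical and are not MetaMap output or part of its interface.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical container for one match; not a MetaMap data structure."""
    text: str
    score: int  # 0-1000, with 1000 being the highest


def apply_score_filter(matches: List[Match], score_filter: int) -> List[Match]:
    """Keep matches whose score is at or above the threshold (assumed inclusive)."""
    return [m for m in matches if m.score >= score_filter]


# Made-up matches and scores, for illustration only.
matches = [
    Match("cell", 1000),
    Match("cell cycle", 861),
    Match("cycle", 589),
]

# The four thresholds listed for scoreFilter in the table.
for threshold in (0, 600, 800, 1000):
    kept = apply_score_filter(matches, threshold)
    print(threshold, [m.text for m in kept])
```

Under this reading, a threshold of 0 keeps every match and a threshold of 1000 keeps only matches with the highest possible score.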

ConceptMapper parameters

| Parameter | Description and possible values |
| --- | --- |
| searchStrategy | Specifies the dictionary lookup strategy - (CONTIGUOUS - returns the longest match of contiguous tokens, SKIP ANY - returns the longest match of not-necessarily contiguous tokens, with the next lookup beginning in the next span, SKIP ANY ALLOW OVERLAP - returns the longest match of not-necessarily contiguous tokens in the span, with the next lookup beginning after the next token) |
| caseMatch | Specifies the case folding mode to use - (IGNORE - fold everything to lower case, INSENSITIVE - fold only tokens with initial caps to lower case, SENSITIVE - no folding, FOLD DIGIT - fold only tokens with digits to lower case) |
| stemmer | Name of the stemmer to use before matching - (Porter - classic stemmer that removes common morphological and inflectional endings from English words, BioLemmatizer - domain-specific lemmatization tool for the morphological analysis of biomedical literature presented in Liu et al. [48], NONE) |
| orderIndependentLookup | Specifies if ordering of tokens within a span can be ignored - (TRUE, FALSE) |
| findAllMatches | Specifies if all matches will be returned - (TRUE, FALSE - only the longest match will be returned) |
| stopWords | List of stop words to exclude from matching - (PubMed - commonly found terms from PubMed (included as Additional file 1), NONE) |
| synonyms | Specifies which synonyms will be included when creating the dictionary - (EXACT ONLY, ALL) |
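The four caseMatch modes can be sketched as a single token-folding function. This is an illustration of the descriptions in the table, not ConceptMapper's implementation. The function name `fold_token` is hypothetical, and "initial caps" is interpreted here as an assumption: an uppercase first character followed by no further uppercase characters.

```python
def fold_token(token: str, case_match: str) -> str:
    """Fold one token according to a caseMatch mode, as described in the table."""
    if case_match == "IGNORE":
        # Fold everything to lower case.
        return token.lower()
    if case_match == "INSENSITIVE":
        # Assumption: "initial caps" means an uppercase first character
        # and no uppercase characters after it (e.g. "Protein", not "DNA").
        has_initial_caps = token[:1].isupper() and token[1:] == token[1:].lower()
        return token.lower() if has_initial_caps else token
    if case_match == "SENSITIVE":
        # No folding.
        return token
    if case_match == "FOLD DIGIT":
        # Fold only tokens that contain at least one digit.
        return token.lower() if any(ch.isdigit() for ch in token) else token
    raise ValueError(f"unknown caseMatch mode: {case_match!r}")


tokens = ["Protein", "DNA", "BRCA1", "kinase"]
for mode in ("IGNORE", "INSENSITIVE", "SENSITIVE", "FOLD DIGIT"):
    print(mode, [fold_token(t, mode) for t in tokens])
```

Comparing the printed rows shows how the modes differ on all-caps tokens such as "DNA", which only IGNORE lowers under these assumptions.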

  1. Parameters that were evaluated for each system are listed along with a description; possible values are given in all capital letters. For the most part, the parameters are self-explanatory; for more information, see the documentation for each system: ConceptMapper (CM) [29], NCBO Annotator [44], MetaMap (MM) [18].