Fig. 2 | BMC Bioinformatics

Fig. 2

From: PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing

Assessment of the effects of PhyloSophos' core concepts on scientific name mapping. A. Partial mapping coverage could be improved by using multiple databases. (Whole): all canonical scientific names that uniquely appear in either CoL, EoL or GBIF. CoL: canonical scientific names that uniquely appear in CoL. EoL: canonical scientific names that uniquely appear in EoL. GBIF: canonical scientific names that uniquely appear in GBIF. Dark blue: theoretical maximum mapping coverage achievable with a single taxonomic database. Gold: mapping coverage achieved with multiple databases. B. Phylogenetic domain matching accuracy for scientific names with homonymic generic epithets. Dark blue: random choice (null hypothesis). Gold: PhyloSophos mapping result. C. Identification of name inputs with strain-like elements (n = 2,988). The fractions of name inputs that were assigned mapping status codes of either 0–5 (exact match codes) or 40 (strain name code) were calculated for each taxonomic reference. Dark blue: exact match. Gold: nearest match. Grey: strain-like element identified. D. Reconstruction accuracy of name inputs with Latin declension (n = 353). The fractions of name inputs that were assigned mapping status codes 30/31 were calculated for each taxonomic reference. Dark blue: fraction of name inputs mapped with edit distance (Damerau-Levenshtein) based correction. Gold: fraction of name inputs mapped with Latin declension correction.
