Fig. 3 | BMC Bioinformatics

Fig. 3

From: PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing

Mapping statistics of scientific names using PhyloSophos. A-B: Raw statistics of scientific name mapping. Dark blue: Exact match, Light blue: Match with simple correction, Dark green: Recursive mapping using other taxonomic DB, Light green: Nearest mapping using other taxonomic DB, Dark red: Mapping with Damerau-Levenshtein correction, Light red: Nearest mapping for strain-level scientific names, Gold: Mapping with Latin declension correction, Purple: Partial mapping, Grey: Unmapped (also see Table 5). A: Results of mapping scientific names found within four NP occurrence databases combined (n = 59,570) using different taxonomic references. B: Results of mapping scientific names found within four NP occurrence databases individually, using NCBI taxonomy as the target taxonomic reference. C: Comparative analysis between PhyloSophos mapping results and the taxonomic mapping provided by the original NP occurrence database. The X-axis represents the taxonomic reference-NP database pairs used for the analysis, while the Y-axis indicates the percentage of name inputs correctly mapped to a single corresponding taxon ID. Dark blue: Taxonomic mapping provided in DB metadata. Gold: PhyloSophos mapping. D-F: Diagrammatic representation of mapping status. Original: counts of taxon IDs provided in the original metadata. PhyloSophos: counts of name inputs that were uniquely and precisely assigned taxon IDs by PhyloSophos. D: Mapping of LOTUS species entries to NCBI taxonomy. E: Mapping of LOTUS species entries to GBIF. F: Mapping of NPASS species entries to NCBI taxonomy.
