Table 2 PhyloSophos mapping status for canonical scientific names (n = 453,779) uniquely appearing in one of Catalogue of Life (CoL, n = 33,639), Encyclopedia of Life (EoL, n = 3,399), or GBIF (n = 416,741), using NCBI taxonomy as the reference system

From: PhyloSophos: a high-throughput scientific name mapping algorithm augmented with explicit consideration of taxonomic science, and its application on natural product (NP) occurrence database processing

| Scientific name mapping status | Total (n = 453,779) | CoL (n = 33,639) | EoL (n = 3,399) | GBIF (n = 416,741) |
| --- | --- | --- | --- | --- |
| Exact mapping | 5,434 (1.20%) | 613 (1.82%) | 905 (26.62%) | 3,916 (0.94%) |
| (raw & simple correction) | 1,397 | 66 | 21 | 1,310 |
| (recursive mapping) | 4,021 | 543 | 881 | 2,597 |
| (multiple taxa linked) | 16 | 4 | 3 | 9 |
| Nearest mapping | 437,364 (96.3%) | 32,796 (97.5%) | 2,452 (72.1%) | 402,116 (96.5%) |
| (genus level) | 268,594 | 21,063 | 1,658 | 245,873 |
| (higher taxonomic level) | 168,770 | 11,733 | 794 | 156,243 |
| Exceptions | 7 | 0 | 4 | 3 |
| Unmapped | 10,974 | 230 | 38 | 10,706 |
| Theoretical maximum for single DB usage | 269,994 (59.5%) | 21,129 (62.8%) | 1,682 (49.5%) | 247,183 (59.3%) |
| PhyloSophos mapping | 442,798 (97.6%) | 33,409 (99.3%) | 3,357 (98.8%) | 406,032 (97.4%) |
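
The percentages in the table can be checked from the raw counts. The following is a minimal sketch, not part of PhyloSophos: the names `COLUMNS`, `COUNTS`, `coverage`, and `is_partition` are hypothetical and chosen only for illustration. It assumes, as the counts themselves suggest, that the "PhyloSophos mapping" row is the sum of the "Exact mapping" and "Nearest mapping" rows, and that exact, nearest, exception, and unmapped names together account for every name in a column.

```python
# Counts copied from Table 2; column order is Total, CoL, EoL, GBIF.
COLUMNS = ("Total", "CoL", "EoL", "GBIF")

COUNTS = {
    "names":      (453_779, 33_639, 3_399, 416_741),
    "exact":      (5_434, 613, 905, 3_916),
    "nearest":    (437_364, 32_796, 2_452, 402_116),
    "exceptions": (7, 0, 4, 3),
    "unmapped":   (10_974, 230, 38, 10_706),
}


def count(row: str, column: str) -> int:
    """Look up one cell of the table by row and column name."""
    return COUNTS[row][COLUMNS.index(column)]


def coverage(column: str) -> float:
    """Percentage of names with an exact or nearest mapping (assumed definition)."""
    mapped = count("exact", column) + count("nearest", column)
    return 100 * mapped / count("names", column)


def is_partition(column: str) -> bool:
    """Check that the four status rows add up to the column's name count."""
    parts = ("exact", "nearest", "exceptions", "unmapped")
    return sum(count(row, column) for row in parts) == count("names", column)


if __name__ == "__main__":
    for column in COLUMNS:
        print(f"{column}: {coverage(column):.1f}% mapped, "
              f"rows sum to n: {is_partition(column)}")
```

Under these assumptions the recomputed values agree with the "PhyloSophos mapping" row to one decimal place.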