Fig. 4 | BMC Bioinformatics

Fig. 4

From: Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life

NBC++ species-level accuracy, evaluated on simulated reads from different species. The x-axis shows the year; for each year, the training database contains only the genomes added or updated that year. Accuracy on reads from species known up to that year is above 80% and drops as more closely related species are added (yellow curve). Accuracy on all testing species (blue curve) increases with the percentage of reads that come from species known up to that year (green curve). The accuracy is therefore only as good as the knowledge in the database. For example, when only 3 species were known (as was the case in 1999), accuracy is poor because the test set spans the thousands of species known today (see Additional file 1 – The number of taxonomic labels per year: Figure A).
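
The relationship between the three curves can be illustrated with a small sketch. It is not taken from the article: it assumes that a read from a species absent from the database can never receive a correct species-level label, so overall accuracy is the known-species accuracy weighted by the fraction of reads from known species. The function name `overall_accuracy` and all input values are hypothetical and chosen only for illustration.

```python
def overall_accuracy(frac_known: float, acc_known: float,
                     acc_unknown: float = 0.0) -> float:
    """Accuracy over all test reads as a weighted mix of two groups.

    frac_known  -- fraction of test reads from species in the database
    acc_known   -- species-level accuracy on those reads
    acc_unknown -- accuracy on reads from species not in the database
                   (assumed 0.0: the correct label is unavailable)
    """
    if not 0.0 <= frac_known <= 1.0:
        raise ValueError("frac_known must be in [0, 1]")
    return frac_known * acc_known + (1.0 - frac_known) * acc_unknown


# Hypothetical inputs, not values read from the figure.
for frac in (0.01, 0.5, 1.0):
    print(f"known fraction {frac:.2f} -> overall {overall_accuracy(frac, 0.8):.3f}")
```

Under this assumption the overall accuracy can never exceed the fraction of reads from known species, which is consistent with the blue curve tracking the green one.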
