geneRFinder: gene finding in distinct metagenomic data complexities

BMC Bioinformatics

Table 3 Benchmark description

Dataset name	Genomes	Sequences	CDS	Description
Training1	20	108,004	54,002	First training set
Validation	5	19,337	14,016	Validation set used to set the parameters of the model
Training2	129	712,886	356,443	Second training set, used to build the final model
Test1	12	54,980	31,507	First test set used to evaluate geneRFinder
Test2low	40	255,589	41,068	Data extracted from the low-complexity metagenome (CAMI)
Test2medium	132	347,642	57,894	Data extracted from the medium-complexity metagenome (CAMI)
Test2high1	160^a	200,000	34,640	Data extracted from the high-complexity metagenome (CAMI) (sample 01)
Test2high2	156^a	200,000	34,445	Data extracted from the high-complexity metagenome (CAMI) (sample 02)
Test2high3	157^a	200,000	34,486	Data extracted from the high-complexity metagenome (CAMI) (sample 03)

^aThe estimated number of genomes was obtained by taxonomic analysis performed by the Kaiju tool [39]
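
The class balance of each dataset can be read off Table 3 by dividing the CDS column by the Sequences column. The following is a minimal sketch of that arithmetic, not part of geneRFinder itself. It assumes that the Sequences column counts all sequences in a dataset (CDS and non-CDS together); the names `BENCHMARK`, `cds_fraction`, and `summarize` are hypothetical and introduced only for illustration.

```python
# Rows transcribed from Table 3: (dataset name, sequences, CDS).
# Assumption: "Sequences" is the total count, of which "CDS" is a subset.
BENCHMARK = [
    ("Training1", 108_004, 54_002),
    ("Validation", 19_337, 14_016),
    ("Training2", 712_886, 356_443),
    ("Test1", 54_980, 31_507),
    ("Test2low", 255_589, 41_068),
    ("Test2medium", 347_642, 57_894),
    ("Test2high1", 200_000, 34_640),
    ("Test2high2", 200_000, 34_445),
    ("Test2high3", 200_000, 34_486),
]


def cds_fraction(sequences: int, cds: int) -> float:
    """Return the share of sequences in a dataset that are CDS."""
    return cds / sequences


def summarize(rows=BENCHMARK) -> dict:
    """Map each dataset name to its CDS fraction, rounded to 3 decimals."""
    return {name: round(cds_fraction(seqs, cds), 3) for name, seqs, cds in rows}


if __name__ == "__main__":
    for name, frac in summarize().items():
        print(f"{name:<12} {frac:.3f}")
```

Under that assumption, both training sets come out at a CDS fraction of exactly one half, whereas every CAMI-derived test set falls below one fifth, so the test data are considerably more imbalanced than the training data.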

ISSN: 1471-2105