From: geneRFinder: gene finding in distinct metagenomic data complexities
Dataset name | Genomes | Sequences | CDS | Description |
---|---|---|---|---|
Training1 | 20 | 108,004 | 54,002 | First training set |
Validation | 5 | 19,337 | 14,016 | Validation set used to set up the model parameters |
Training2 | 129 | 712,886 | 356,443 | Second training set used to build final model |
Test1 | 12 | 54,980 | 31,507 | First test set used to evaluate geneRFinder |
Test2low | 40 | 255,589 | 41,068 | Data extracted from the low-complexity metagenome (CAMI) |
Test2medium | 132 | 347,642 | 57,894 | Data extracted from the medium-complexity metagenome (CAMI) |
Test2high1 | 160a | 200,000 | 34,640 | Data extracted from the high-complexity metagenome (CAMI) (sample 01) |
Test2high2 | 156a | 200,000 | 34,445 | Data extracted from the high-complexity metagenome (CAMI) (sample 02) |
Test2high3 | 157a | 200,000 | 34,486 | Data extracted from the high-complexity metagenome (CAMI) (sample 03) |
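
The class balance of each dataset can be read off the table by dividing the CDS count by the sequence count. The following is a minimal sketch of that arithmetic, not part of geneRFinder itself. It assumes that the CDS column counts a subset of the Sequences column, which the table does not state explicitly; the names `datasets` and `cds_fraction` are hypothetical and chosen only for illustration.

```python
# Counts copied from the table above as (sequences, CDS) per dataset.
# Assumption: every CDS is also counted in the Sequences column.
datasets = {
    "Training1": (108_004, 54_002),
    "Validation": (19_337, 14_016),
    "Training2": (712_886, 356_443),
    "Test1": (54_980, 31_507),
    "Test2low": (255_589, 41_068),
    "Test2medium": (347_642, 57_894),
    "Test2high1": (200_000, 34_640),
    "Test2high2": (200_000, 34_445),
    "Test2high3": (200_000, 34_486),
}


def cds_fraction(sequences: int, cds: int) -> float:
    """Return the share of sequences that are counted as CDS."""
    return cds / sequences


# Print one line per dataset with the CDS share rounded to three decimals.
for name, (sequences, cds) in datasets.items():
    print(f"{name:12s} {cds_fraction(sequences, cds):.3f}")
```

Under that assumption, both training sets come out at exactly one half CDS, while the CAMI-derived test sets contain a much smaller share of CDS.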