Fig. 6 | BMC Bioinformatics

Fig. 6

From: PlasForest: a homology-based random forest classifier for plasmid detection in genomic datasets

Datasets and application of a hold-out method for supervised learning. Schematic representation of the processes used to generate the datasets on which PlasForest was built and its performance benchmarked. (A) 10,152 bacterial genomes from the NCBI RefSeq Genomes FTP server were randomly cut into contigs and distributed into the following datasets: the (balanced) training set contains 70% of the initial 10,152 genome assemblies and is used to train the random forest classifier; the testing set contains the remaining 30% of the genomes. (B) Other genome assemblies were drawn from more recent releases of the NCBI RefSeq Genomes FTP server or from other sources to build the COMGENOME, CONTIG, and METAGENOME datasets. Together with the testing set, they are used to benchmark the performance of PlasForest against other plasmid identification methods.
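
The 70%/30% hold-out split described in panel A can be illustrated with a minimal sketch. This is not the PlasForest code: the function name `holdout_split`, the `seed` parameter, and the placeholder genome identifiers are assumptions made for illustration only, and the random cutting into contigs and the balancing of the training set are not shown.

```python
import random


def holdout_split(genome_ids, train_fraction=0.7, seed=0):
    """Shuffle genome identifiers and split them into training and testing lists.

    Hypothetical helper for illustration; not taken from PlasForest.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(genome_ids)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 10,152 genome assemblies.
genomes = [f"genome_{i}" for i in range(10152)]
train_ids, test_ids = holdout_split(genomes)
print(len(train_ids), len(test_ids))
```

Splitting on genome identifiers, as the caption describes, means every contig cut from a given genome falls entirely in the training set or entirely in the testing set.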
