Figure 2 | BMC Bioinformatics

Figure 2

From: Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction

Assembly of E. coli strain MG1655. Statistics from overlaps derived from S. typhi training reads were used to train a J48 Weka model. Overlaps from the MG1655 test data were classified with this model, and any overlaps predicted to be false were removed. The remaining overlaps were used to assemble MG1655. The N50 contig length of the final assembly and the percentage of the reference MG1655 genome matched by the contigs are plotted. The percent cutoff for strains to be analyzed with the comparative score is shown in parentheses. For the 'One related genome' and 'Two related genomes' data, only one related genome (ATCC8739 for the test set) or two related genomes (ATCC8739 and E24377A for the test set), respectively, were used for the training and test sets.
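
The N50 contig length plotted in the figure can be illustrated with a minimal sketch. It assumes the common definition of N50: the largest length L such that contigs of length at least L together cover at least half of the total assembled bases. The function name `n50` and the example lengths are hypothetical and are not taken from the article, which may have computed the statistic with its assembler's own tooling.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: walk the contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches half of the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bases), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

With these example lengths the total is 54 bases, and the three longest contigs (10, 9 and 8) sum to 27, so the sketch returns 8. A higher N50 after removing predicted false overlaps would indicate a more contiguous assembly, which is why it is plotted alongside the percentage of the reference genome matched.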
