De novo Nanopore read quality improvement using deep learning

BMC Bioinformatics

Table 3 MiniScrub reduces downstream assembly errors

Metric	MECAT Raw	MiniScrub + MECAT	Canu Raw	MiniScrub + Canu
% genome assembled	79.39%	99.86%	99.69%	99.71%
NGA50	242478	1053459	1055037	696460
LGA50	12	3	2	5
# of contigs	38	11	7	19
# mis-assembled contigs	28	5	2	2
# local mis-assemblies	209	4	5	3
# indels > 5 bp	1099	394	84	46
Runtime (hours)	2.5	9	80	9

MiniScrub significantly improves assembly with MECAT [32], increasing genome coverage and NGA50 while reducing LGA50, mis-assemblies, mismatches, and indels. Canu's assembly had slightly fewer errors and mis-assemblies when reads were preprocessed with MiniScrub, but the assembly was more fragmented, likely due in part to the resolution of large mis-assemblies and indels. Notably, Canu assembly of raw reads took about 3.5 days, while the MiniScrub+Canu pipeline took about 9 hours, likely because less error correction was needed in the latter case. Results were evaluated using QUAST [33]. Best performance numbers are shown in bold.
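
For readers unfamiliar with the NGA50 and LGA50 rows, the following minimal sketch illustrates how such statistics are commonly defined: contigs are split into aligned blocks at mis-assembly breakpoints, the blocks are sorted from longest to shortest, and NGA50 is the length of the block at which the running total first reaches half of the reference genome length, with LGA50 being the number of blocks needed to get there. The function name `nga50_lga50` and the toy block lengths are hypothetical and are not taken from the paper or from QUAST's code; this is an illustration of the definition, not QUAST's implementation.

```python
def nga50_lga50(aligned_block_lengths, reference_length):
    """Return (NGA50, LGA50) for a list of aligned block lengths.

    aligned_block_lengths: lengths (bp) of contig pieces after splitting
        contigs at mis-assembly breakpoints (assumed to be precomputed).
    reference_length: length (bp) of the reference genome.
    Returns (None, None) if the blocks cover less than half the reference.
    """
    target = reference_length / 2
    covered = 0
    # Walk the blocks from longest to shortest, accumulating their lengths.
    for count, length in enumerate(
        sorted(aligned_block_lengths, reverse=True), start=1
    ):
        covered += length
        if covered >= target:
            return length, count
    return None, None


# Toy example with made-up lengths: half of a 200 bp reference is 100 bp,
# which is first reached after the three longest blocks (50 + 40 + 30).
print(nga50_lga50([50, 40, 30, 20, 10], 200))  # (30, 3)
```

Under this definition, a higher NGA50 together with a lower LGA50 indicates that fewer, longer correctly aligned blocks cover half of the genome, which is the direction of change reported for MiniScrub + MECAT relative to MECAT on raw reads.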

ISSN: 1471-2105