Fig. 2 | BMC Bioinformatics

Fig. 2

From: HALC: High throughput algorithm for long read error correction

Illustration of the HALC algorithm. Long reads r1 to r3 are aligned to contigs c1 to c3 with a relatively low identity requirement, using the similar-repeat-based alignment approach in (1). A contig graph is then constructed to validate the alignments and correct the long reads, using the long-read-support-based and adjacent-alignment-based validation approaches in (2). (1) The long read region r1(B) (region B of r1; the same notation is used below) is error rich, so it is aligned either to its true genome region in the contigs, c1(B), or to its similar repeat c3(E) (shaded). The regions r1(C), r2(C) and r3(C) do not have their true genome regions in the contigs and are therefore aligned to their similar repeat c3(G) (shaded). The aligned contig region c1(AB) is split into c1(A) and c1(B), and the long read regions are split accordingly. (2) A contig graph is constructed, with vertices A, B, D, E and G representing the aligned contig regions, connected by weighted edges. Edge (A,B) (the edge between vertices A and B; the same notation is used below) is weighted 0, since the contig regions A and B are adjacent. Edges (B,G) and (G,D) are weighted 0, since sufficient adjacent long read regions are aligned to contig regions B and G, and to G and D, respectively. As a result, a path of minimum total edge weight that corrects all the long reads is found, containing vertices A, B, G and D. The regions r1(C), r2(C) and r3(C) are corrected using their similar repeats and can be refined with the initial short reads.
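
The path selection in (2) can be sketched for read r1 as a small layered shortest-path problem. This is only an illustrative sketch and not HALC's implementation. The caption gives the zero-weight edges (A,B), (B,G) and (G,D); the weight of every other edge is not stated, so the `default_weight` of 1.0 below is an assumption. The function name `min_weight_path` and the layered representation of candidate contig regions are likewise hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

def min_weight_path(
    layers: List[List[str]],
    weights: Dict[Edge, float],
    default_weight: float = 1.0,  # assumed cost of an unvalidated edge
) -> Tuple[float, List[str]]:
    """Pick one contig region per read region so that the total edge weight is minimal."""
    # best[v] = (cost, path) of the cheapest path ending at vertex v in the current layer
    best = {v: (0.0, [v]) for v in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for v in layer:
            candidates = [
                (cost + weights.get((u, v), default_weight), path + [v])
                for u, (cost, path) in best.items()
            ]
            nxt[v] = min(candidates, key=lambda c: c[0])
        best = nxt
    return min(best.values(), key=lambda c: c[0])

# Candidate contig regions for the four regions of r1, as described in the caption:
# region B may align to c1(B) or to the similar repeat c3(E).
layers = [["A"], ["B", "E"], ["G"], ["D"]]

# Zero-weight edges named in the caption.
weights = {("A", "B"): 0.0, ("B", "G"): 0.0, ("G", "D"): 0.0}

cost, path = min_weight_path(layers, weights)
print(cost, path)
```

Under these assumed weights the selected path passes through B and not E, which matches the path A, B, G, D named in the caption.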
