Figure 7 | BMC Bioinformatics

Figure 7

From: Viral quasispecies inference from 454 pyrosequencing


QuasQ quasispecies sequence reconstruction. Post-processed, locally error-corrected reads are stacked to identify (a) all polymorphic sites (PS), that is, sites having two or more alleles. (b) The genomic sequence is then ‘reduced’ to only the polymorphic sites identified in (a), and all mapped reads are similarly ‘reduced’ to the bases they cover at PS. (c) These ‘reduced’ reads are then grouped into disjoint sets according to their starting PS positions, and in each set the longest representative reads, based on sequence identity, are identified (R4 collapsing into R5). (d) A read-graph method is then applied to connect the disjoint sets of representative merged reads: each node consists of the read's sequence of DNA bases, and a directed edge connects two nodes if the first node is a complex prefix of the second node. Solid arrows such as e(ii) represent possible directed edges, while non-probable edges such as e(iii) and e(iv), ruled out because the overlapping node sequences are not identical, are represented by dotted arrows. For e(i), although the overlap between nodes R1 + R2 and R6 is identical, QuasQ does not consider the edge probable because no sequence read spans both the nodes and the immediately neighboring polymorphic site. To make this decision, a region centered at the right-most polymorphic site of the overlap, with coverage greater than the 10th percentile of the total coverage of the reads, is defined. The coverage of this region matters because, when coverage is low, the absence of a supporting read does not necessarily indicate that merging the two contiguous segments is false. For pairs of reads with identical overlap, such as e(i), if the allele combination in the defined region does not match that of any other read in that region, the two nodes are not joined. In this figure, the only constructed haplotype is thus R4 + R5 + R7 + R8.
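The steps in panels (a) to (d) can be illustrated with a minimal Python sketch. This is not the QuasQ implementation: the function names (`polymorphic_sites`, `reduce_read`, `collapse`, `identical_overlap`) and the read representation, a `(start, sequence)` pair aligned to the reference without indels, are assumptions made for illustration. The sketch treats "collapsing" as dropping a reduced read that is a prefix of a longer read with the same starting PS, and it checks only for identical overlap between nodes; the coverage-based test for a supporting read is not modeled.

```python
from collections import defaultdict


def polymorphic_sites(reads):
    """(a) Return sorted genome positions where two or more alleles are seen.

    Each read is a hypothetical (start, sequence) pair with no indels.
    """
    alleles = defaultdict(set)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            alleles[start + offset].add(base)
    return sorted(pos for pos, seen in alleles.items() if len(seen) >= 2)


def reduce_read(read, sites):
    """(b) Keep only the bases of a read that fall on polymorphic sites.

    Returns (index of the first covered PS, reduced bases), or None if the
    read covers no polymorphic site.
    """
    start, seq = read
    end = start + len(seq)
    covered = [(i, seq[pos - start]) for i, pos in enumerate(sites)
               if start <= pos < end]
    if not covered:
        return None
    return covered[0][0], "".join(base for _, base in covered)


def collapse(reduced_reads):
    """(c) Group reduced reads by starting PS and keep the longest
    representatives: a read is dropped if it is a prefix of a longer read
    in the same group (an assumed reading of 'collapsing')."""
    groups = defaultdict(set)
    for first_ps, bases in reduced_reads:
        groups[first_ps].add(bases)
    return {
        first_ps: sorted(s for s in seqs
                         if not any(t != s and t.startswith(s) for t in seqs))
        for first_ps, seqs in groups.items()
    }


def identical_overlap(node_a, node_b):
    """(d) True if node_b starts inside node_a, extends beyond it, and the
    two nodes agree on every shared PS. Nodes are (first_ps, bases) pairs.
    The supporting-read and coverage checks are deliberately omitted."""
    (sa, a), (sb, b) = node_a, node_b
    if not (sa < sb < sa + len(a) < sb + len(b)):
        return False
    return a[sb - sa:] == b[:sa + len(a) - sb]


# Toy data, invented for illustration only.
reads = [(0, "ACGTA"), (0, "ATGTA"), (1, "CGTAC"), (1, "TGTAG"), (0, "AC")]
sites = polymorphic_sites(reads)
reduced = [r for r in (reduce_read(read, sites) for read in reads) if r]
representatives = collapse(reduced)
```

Keeping reads as strings over the polymorphic sites makes both the prefix test in `collapse` and the overlap test in `identical_overlap` plain string comparisons, which is the main point of the ‘reduction’ in panel (b).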
