Figure 1 | BMC Bioinformatics

Figure 1

From: Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer

Algorithm overview. Overview of the algorithm steps with reads of length 7, a minimal coverage of 2 and k-mers of length k=3. a) Representation of the sub-starter generation step. A set of reads is mapped to the starter s. First, reads are error-corrected according to a voting procedure (see the lower right read for an example). Then, one sub-starter (here s1 and s2) is computed from each perfect multiple read alignment. The Hamming distance between each sub-starter and s is required to be below a given threshold. b) Representation of an extension. Three reads have a prefix of length at least k that maps perfectly to the suffix of extension s. The fragments of these reads that extend beyond extension s are used to generate the extension of s. Because the minimal coverage is 2, the last character (T) of the first extending read, which is covered by that read alone, is not stored in the extension of s. The generated extension of s (ACT) is stored in a new node linked to extension s. Note that the suffix of length k−1 of extension s (TC) is stored as a prefix of the extension of s (which is then called an enriched extension). This avoids missing k-mers that overlap two extensions, such as TCA or CAC, when mapping reads to the extension of s.
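
The extension step in panel b can be illustrated with a minimal Python sketch. It is not the Mapsembler implementation: the function names `overhang` and `extend`, the sequence `GATTC` used for extension s, and the three reads are hypothetical, chosen only so that the result matches the caption (extension ACT, enriched extension TCACT). The sketch also assumes k ≥ 2 and simply stops at the first position where no single character reaches the minimal coverage, so it does not model branching.

```python
from collections import Counter

def overhang(read, s, k):
    """Return the part of `read` that extends past `s`, or None if no
    prefix of `read` of length >= k matches a suffix of `s` exactly."""
    for o in range(min(len(read), len(s)), k - 1, -1):
        if s.endswith(read[:o]):
            return read[o:]
    return None

def extend(s, reads, k, min_coverage):
    """Build the extension of `s` column by column from read overhangs.
    Returns (extension, enriched_extension). Assumes k >= 2."""
    overhangs = [h for h in (overhang(r, s, k) for r in reads) if h]
    extension = []
    pos = 0
    while True:
        # Count the characters that reads propose at this position.
        column = Counter(h[pos] for h in overhangs if len(h) > pos)
        supported = [c for c, n in column.items() if n >= min_coverage]
        if len(supported) != 1:
            # Either too little coverage or a branch: stop (simplification).
            break
        extension.append(supported[0])
        # Keep only the reads that agree with the chosen character.
        overhangs = [h for h in overhangs
                     if len(h) > pos and h[pos] == supported[0]]
        pos += 1
    ext = "".join(extension)
    # Prepend the last k-1 characters of s so that k-mers spanning the
    # junction (such as TCA and CAC) are present in the enriched extension.
    enriched = s[-(k - 1):] + ext
    return ext, enriched

# Hypothetical data: extension s and three reads of length 7.
s = "GATTC"
reads = ["TTCACTT", "ATTCACT", "GATTCAC"]
ext, enriched = extend(s, reads, k=3, min_coverage=2)
print(ext, enriched)
```

With these inputs the overhangs are ACTT, ACT and AC. The final T is proposed by a single read, which is below the minimal coverage of 2, so it is dropped, as described in the caption.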
