Figure 3 | BMC Bioinformatics

Figure 3

From: TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

Simplified example showing the calculation of shift values for 5-mers at the 3'-end. The 5-mers at the 3'-end of all sequences are extracted (A) and sorted by decreasing frequency (B). The first 5-mer in the list (highest frequency) is then aligned to the second 5-mer (C) to calculate the minimum number of shift operations to align the two 5-mers without gaps (D). The shift direction is based on the 5-mer with the higher frequency. Shifts to the left have negative values assigned, whereas shifts to the right have positive values assigned. If the number of shift operations is less than or equal to a given threshold (default: 2), then the two 5-mers are joined into one k-mer. In the next step, the third 5-mer is aligned with either the first 5-mer or the joined k-mer and the same operations are performed. These steps are repeated for the remaining 5-mers. The values of shift operations for the 3'-end are then adjusted (E) by the negative of the maximum number of shift operations (-max{-1, 1}).
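
The steps in the caption can be sketched in a few lines of Python. This is an illustrative simplification, not TagCleaner's actual implementation: the function names (`min_shift`, `shift_values`, `adjust_3prime`) are hypothetical, the sign convention (a positive shift means the second k-mer starts further right than the reference) is an assumption, and every k-mer is aligned only against the most frequent one rather than against a growing joined k-mer as the caption describes.

```python
from collections import Counter


def min_shift(ref, other, max_shift=2):
    """Return the smallest ungapped shift that aligns `other` to `ref`.

    Negative = shift to the left, positive = shift to the right (assumed
    convention). Returns None if no shift within `max_shift` works.
    Both k-mers are assumed to have the same length.
    """
    k = len(ref)
    # Never shift so far that the overlap becomes empty.
    for d in range(min(max_shift, k - 1) + 1):
        # Try shift 0 once, then -d before +d (tie order is arbitrary).
        for s in ((0,) if d == 0 else (-d, d)):
            if s >= 0:
                overlap_ref, overlap_other = ref[s:], other[:k - s]
            else:
                overlap_ref, overlap_other = ref[:k + s], other[-s:]
            if overlap_ref == overlap_other:
                return s
    return None


def shift_values(kmers, max_shift=2):
    """Steps (A)-(D): rank k-mers by frequency, then compute each shift
    relative to the most frequent k-mer (simplification, see lead-in)."""
    ranked = [kmer for kmer, _ in Counter(kmers).most_common()]
    ref = ranked[0]
    shifts = {ref: 0}
    for kmer in ranked[1:]:
        s = min_shift(ref, kmer, max_shift)
        if s is not None:  # within threshold: k-mer joins the group
            shifts[kmer] = s
    return shifts


def adjust_3prime(shifts):
    """Step (E): add the negative of the maximum shift to every value."""
    m = max(shifts.values())
    return {kmer: s - m for kmer, s in shifts.items()}


# Made-up 3'-end 5-mers, for illustration only.
kmers = ["ACGTA"] * 5 + ["CGTAC"] * 3 + ["GACGT"] * 2 + ["TTTTT"]
shifts = shift_values(kmers)
adjusted = adjust_3prime(shifts)
```

With this made-up input the raw shifts are 0, 1 and -1, so the maximum is 1 and the adjusted values are -1, 0 and -2, which mirrors the -max{-1, 1} adjustment in the caption. `TTTTT` cannot be aligned within two shifts and is left out.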
