Fig. 5 | BMC Bioinformatics

Fig. 5

From: A performant bridge between fixed-size and variable-size seeding

Analysis and effects of occurrence filtering. a) The four kinds of boxes visualize subset relations among the SMEMs computed by the FMD-index and Alg. 2a, where Alg. 2a uses (w, k)-minimizers. The light and dark shaded areas represent the sets of all SMEMs with l ≥ k and l ≥ w + k − 1, respectively. The top diagram shows the subset relations without occurrence filtering, while the bottom diagram shows the situation with occurrence filtering. For l ≥ w + k − 1 and without filters, Alg. 2a and the FMD-index deliver equal sets of SMEMs (the two innermost boxes in the top diagram). For l ≥ k, Alg. 2a misses some SMEMs discovered by the FMD-index (the dotted box within SMEMs l ≥ k). In the presence of occurrence filtering, all boxes shrink in size. Further, Alg. 2a computes false negatives (the blue box, case c) and false positives (the red box, case d). b) The diagram shows a positive-negative classification of Alg. 2a, based on the seeds remaining after occurrence filtering, with the FMD-index as ground truth, for (10, 19)-minimizers on the human genome. The four curves correspond to the sizes of the four correspondingly colored areas in a) for various error rates. The x-axis represents the error rate of reads as described in Fig. 3. Each dot shows the average value for 1000 CCS PacBio reads generated using Survivor [24]. c) There is a MEM covering the unique sequence CTCAGA on query and reference. Assuming the occurrence threshold for minimizers is set to two, the four colored 1,3-mers are purged. In this case, the MEM cannot be discovered by Alg. 2a. Using the FMD-index, this MEM is discovered directly and not purged by the occurrence filter, since it occurs only once. d) Generation of SMEMs that are false positives. The purple SMEM cannot be discovered using the 3,4-mers, since no 3,4-mer is contained within the seed. However, the orange MEM can be discovered using the orange 3,4-mer. Alg. 2a will not delete this MEM, since it is not enclosed by any other seed.
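
The length bound l ≥ w + k − 1 in a) and the loss of seeds in c) can be illustrated with a minimal Python sketch. It is not the paper's Alg. 2a: the function names `minimizers`, `occurrence_filter` and `covered` are hypothetical, k-mers are ordered lexicographically (an assumption), and occurrences are counted within a single sequence rather than in a reference index. The sketch only shows that every stretch of length w + k − 1 contains at least one (w, k)-minimizer, and that an occurrence filter can remove all minimizers from a region.

```python
from collections import Counter


def minimizers(seq, w, k):
    """Return the set of (position, k-mer) pairs chosen as (w, k)-minimizers.

    Assumption: k-mers are compared lexicographically; ties go to the
    leftmost position.
    """
    picked = set()
    n_kmers = len(seq) - k + 1
    for start in range(n_kmers - w + 1):
        # One window holds w consecutive k-mers and spans w + k - 1 bases.
        window = [(seq[i:i + k], i) for i in range(start, start + w)]
        kmer, pos = min(window)
        picked.add((pos, kmer))
    return picked


def occurrence_filter(mins, max_occ):
    """Drop every minimizer whose k-mer is selected more than max_occ times."""
    counts = Counter(kmer for _, kmer in mins)
    return {(pos, kmer) for pos, kmer in mins if counts[kmer] <= max_occ}


def covered(mins, begin, length, k):
    """True if some minimizer lies completely inside seq[begin:begin + length]."""
    end = begin + length
    return any(begin <= pos and pos + k <= end for pos, _ in mins)
```

Without filtering, `covered(minimizers(seq, w, k), b, w + k - 1, k)` holds for every valid start `b`, which matches the claim that seeds of length l ≥ w + k − 1 are never missed. After `occurrence_filter`, a region made up of repetitive k-mers may be left with no minimizer at all, the situation sketched in c).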
