Figure 1 | BMC Bioinformatics

Figure 1

From: Querying large read collections in main memory: a versatile data structure

Example of the read index data structure: the Gk arrays. Example for a collection R = (aacaact, caattca, aacaagc) of q = 3 reads of length m = 7, considering 3-mers (k = 3). The index is composed of three tables and uses a fourth one during construction (GkSA, GkIFA, GkCFA, and GkCFPS). The first table shows the starting indices of k-mers in C_R (the text made by concatenating all reads), the SA built on C_R, and the function g that renumbers the P-positions of C_R to make them consecutive. The P-positions are {0,1,2,3,4,7,8,9,10,11,14,15,16,17,18}; all other positions, that is, the starting positions where the k-mer overlaps two reads, are displayed with a gray background (lines j and SA[j]). Line SA refers to the usual suffix array of C_R. The k-mer caa occurs 4 times in C_R, at positions 2, 7, 12, and 16. Among those, only 2, 7, and 16 are P-positions. The lexicographic rank of the P_k-factor starting at position 16 is given by GkIFA[g(16)] = GkIFA[12] = 7, and the number of occurrences of the P_k-factor caa is given by GkCFA[7], which equals 3. The positions of these occurrences are thus obtained by the set {g^-1(GkSA[j]) | GkCFPS[7 - 1] ≤ j < GkCFPS[7]} = {2,7,16}. See also Figure 2.
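The worked example in the caption can be reproduced with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' construction: it sorts the P-positions directly by their k-mer instead of filtering a suffix array of C_R, it breaks ties between equal k-mers by position (the real suffix array may order them differently, although the resulting set of positions is the same), and it takes GkCFPS to be the inclusive prefix sum of GkCFA, which is what the caption's formula implies. The helper names `build_gk_arrays` and `occurrences` are hypothetical.

```python
from itertools import accumulate


def build_gk_arrays(reads, k):
    """Build simplified Gk-array tables for equal-length reads (illustrative only)."""
    m = len(reads[0])
    assert all(len(r) == m for r in reads), "all reads must have the same length"
    text = "".join(reads)  # C_R: concatenation of all reads

    # P-positions: k-mer start positions that do not overlap two reads.
    p_positions = [r * m + i for r in range(len(reads)) for i in range(m - k + 1)]
    # g renumbers P-positions consecutively; p_positions itself acts as g^-1.
    g = {p: i for i, p in enumerate(p_positions)}

    def kmer(i):
        p = p_positions[i]
        return text[p:p + k]

    # GkSA: renumbered P-positions sorted by k-mer.
    # Assumption: ties are broken by position, not by full suffix order.
    gksa = sorted(range(len(p_positions)), key=lambda i: (kmer(i), i))

    # GkIFA: rank of the distinct k-mer starting at each renumbered position.
    # GkCFA: number of occurrences of the k-mer with each rank.
    gkifa = [0] * len(p_positions)
    gkcfa = []
    previous = None
    for i in gksa:
        if kmer(i) != previous:
            gkcfa.append(0)
            previous = kmer(i)
        gkcfa[-1] += 1
        gkifa[i] = len(gkcfa) - 1

    # GkCFPS: inclusive prefix sums of GkCFA (assumed convention).
    gkcfps = list(accumulate(gkcfa))

    return {"g": g, "g_inv": p_positions, "GkSA": gksa,
            "GkIFA": gkifa, "GkCFA": gkcfa, "GkCFPS": gkcfps}


def occurrences(index, position):
    """Return all P-positions whose k-mer equals the one starting at `position`."""
    rank = index["GkIFA"][index["g"][position]]
    start = index["GkCFPS"][rank - 1] if rank > 0 else 0
    end = index["GkCFPS"][rank]
    return sorted(index["g_inv"][index["GkSA"][j]] for j in range(start, end))


reads = ["aacaact", "caattca", "aacaagc"]
index = build_gk_arrays(reads, k=3)
rank_of_caa = index["GkIFA"][index["g"][16]]  # GkIFA[g(16)] = GkIFA[12]
print(rank_of_caa, index["GkCFA"][rank_of_caa], occurrences(index, 16))
```

For the three reads of the caption, the sketch gives rank 7 for the k-mer at position 16, a count of 3, and the positions 2, 7, and 16; position 12 is excluded because it is not a P-position.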
