Figure 1 | BMC Bioinformatics

Figure 1

From: Refined repetitive sequence searches utilizing a fast hash function and cross species information retrievals


Index file A and data file L for sequence s1: CAATTACGAGCTCTGCCTACAATGAT. The formats of A and L are discussed in the text. To show how different regions map to different genes, the first 13 bases map to the gene with PID = 1234 and the last 13 bases map to the gene with PID = 5678. We add leading zeroes to each location so that every number in L is four bytes long, and we record this as numbersize in each line of A. Keys in this example are made from two bases of sequence, so A has 4^2 = 16 lines, ranging from m(AA) = 5 through m(CC) = 20. Keys m(GT) = 11 and m(GG) = 15 do not occur in the sequence. For clarity, each offset in A is repeated in the correct position above the line in L, and each PID is underlined. Two arrows map two different lines of A into L by pointing to two bubbles that show the contents of two hash bins.
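As a rough illustration of how the 16 hash bins in this example could be populated, the Python sketch below enumerates the overlapping two-base keys of s1, records each occurrence as a zero-padded four-character location plus a PID, and lists the keys that never occur. The key mapping is a hypothetical reconstruction. The base codes A=0, T=1, G=2, C=3 and the formula m = 5 + code(first base) + 4 * code(second base) were chosen only because they reproduce the four values quoted in the caption (m(AA) = 5, m(GT) = 11, m(GG) = 15, m(CC) = 20). The paper's actual hash function may differ. The helper names key_number and build_bins, the use of 0-based locations, and the rule that a key takes the PID of the region containing its first base are also assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import product

K = 2             # bases per key, as in the caption
NUMBER_SIZE = 4   # locations are zero-padded to four characters
FIRST_LINE = 5    # assumed line number of m(AA)

# Hypothetical base codes: picked only so that key_number() reproduces
# m(AA)=5, m(GT)=11, m(GG)=15 and m(CC)=20 from the caption.
BASE_CODE = {"A": 0, "T": 1, "G": 2, "C": 3}


def key_number(kmer):
    """Hypothetical m(): the first base is the least significant base-4 digit."""
    return FIRST_LINE + sum(BASE_CODE[b] * 4 ** i for i, b in enumerate(kmer))


def build_bins(seq, regions):
    """Map m(key) -> [(zero-padded 0-based location, PID), ...].

    Keys are overlapping K-mers. regions holds half-open (start, end, pid)
    intervals; a key takes the PID of the region containing its first base.
    """
    bins = defaultdict(list)
    for pos in range(len(seq) - K + 1):
        kmer = seq[pos:pos + K]
        pid = next(p for start, end, p in regions if start <= pos < end)
        bins[key_number(kmer)].append((f"{pos:0{NUMBER_SIZE}d}", pid))
    return bins


s1 = "CAATTACGAGCTCTGCCTACAATGAT"
regions = [(0, 13, 1234), (13, 26, 5678)]  # first 13 and last 13 bases
bins = build_bins(s1, regions)

# All 4**K possible keys, ordered by their line number in the index.
all_keys = sorted(("".join(p) for p in product("ACGT", repeat=K)), key=key_number)
missing = [k for k in all_keys if key_number(k) not in bins]

print(len(all_keys))           # 16
print(missing)                 # ['GT', 'GG']
print(bins[key_number("AA")])  # [('0001', 1234), ('0020', 5678)]
```

Under these assumptions the key AA occurs at locations 0001 (PID 1234) and 0020 (PID 5678), and only GT and GG are missing, which agrees with the caption. The key starting at position 12 spans the boundary between the two genes. Attributing it to PID 1234 is a choice made here, not something the caption specifies.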
