Figure 6 | BMC Bioinformatics

Figure 6

From: Comparing sequences without using alignments: application to HIV/SIV subtyping

The «local decoding of order N» computing strategy.
Figure 6a (top): Four 5-related sites (N = 5), each containing the letter T, are taken from three input nucleotide sequences seq1, seq2 and seq3. Each of the four boxed segments (2N - 1 = 9 letters long) has T at its center (in bold face type) and is identified by the sequence in which it lies and the position of T in that sequence (that is, seq1,11, seq2,5, seq3,5 and seq3,12; see Figure 6a bottom).
Figure 6a (bottom): each 9-letter segment (identified by the corresponding site, with T in bold face type) is displayed with its set of overlapping words (step 1) of length N = 5 underneath the boxed site. The four sites are 5-related: seq1,11, seq2,5 and seq3,12 are directly 5-related by TGGAC (in bold face type) at position 1; seq1,11 and seq3,12 are also directly 5-related by CTGGA at position 2; seq3,5 is directly 5-related only to seq2,5, by CACTT at position 5, so it is connected to the other two sites through seq2,5.
Figure 6b: the symbols that identify each class containing at least two sites are shown together with the segments covered by the overlapping 5-words that lie over the boxed letter.
Figure 6c: the re-written sequences generated by the program. Identifiers of classes containing only one site are represented simply by the corresponding letter of the input sequence, since such classes cannot contribute to the similarity computed between any pair of re-written sequences.
Figure 6d: the double-entry table used to construct a pairwise distance matrix between the three sequences (re-written in Figure 6c). Each class identifier with at least two sites labels a row, and the three sequences label the columns. Each cell gives the number of sites of that N-class that appear in that sequence.
Figure 6e: similarity matrix and the corresponding normalized dissimilarity matrix (see text) for the three sequences.
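
The grouping of sites into classes and the table of Figure 6d can be illustrated with a minimal Python sketch. It assumes that two sites are directly N-related when they occupy the same position within identical N-words, and that a class is a connected set of sites under that relation, as the caption describes. The function names `n_classes` and `class_counts` and the toy sequences are hypothetical; this is not the authors' program, and the similarity and dissimilarity formulas of Figure 6e are defined in the article text and are not reproduced here.

```python
from collections import defaultdict


def n_classes(seqs, n):
    """Group sites (seq_index, position) into classes of N-related sites.

    Assumption: two sites are directly N-related when they sit at the
    same offset inside identical words of length n. Classes are the
    connected components of that relation (computed with union-find).
    """
    parent = {(i, p): (i, p) for i, s in enumerate(seqs) for p in range(len(s))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (word, offset) -> first site found with that key
    for i, s in enumerate(seqs):
        for start in range(len(s) - n + 1):  # overlapping words, step 1
            word = s[start:start + n]
            for k in range(n):
                site, key = (i, start + k), (word, k)
                if key in first_seen:
                    ra, rb = find(first_seen[key]), find(site)
                    if ra != rb:
                        parent[rb] = ra
                else:
                    first_seen[key] = site

    classes = defaultdict(list)
    for site in parent:
        classes[find(site)].append(site)
    return [sorted(members) for members in classes.values()]


def class_counts(classes, n_seqs):
    """One row per class with at least two sites: number of sites per sequence."""
    table = []
    for members in classes:
        if len(members) < 2:
            continue  # singleton classes carry no similarity information
        row = [0] * n_seqs
        for seq_index, _ in members:
            row[seq_index] += 1
        table.append(row)
    return table


# Toy input (made up, not the sequences of Figure 6), with n = 2.
toy = ["AT", "ATG", "TG"]
toy_classes = n_classes(toy, 2)
toy_table = class_counts(toy_classes, len(toy))
```

In the toy input, the T of the first sequence and the T of the third share no 2-word, yet they fall in the same class because each is directly related to the T of the second sequence. This mirrors how seq3,5 joins the other sites through seq2,5 in the figure.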
