Table 2 Detection sample obtained in the human X chromosome with TRF at different alignment weights, Sputnik at different mismatch penalties, and Mreps at different resolutions.

From: Detecting microsatellites within genomes: significant variation among algorithms

   start end divergence motif sequence
TRF
alignment scores
2,7,7   304646 304658 0 CTCTC CTCTCCTCTCCTC
   304696 304713 5.55 TCCTC TCCTCTTCTCTCCTCTCC
   305863 305872 0 CCTTC CCTTCCCTTC
2,5,7 c 304646 304713 18.3099 TCTCC CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC
   305863 305872 0 TTCCC CCTTCCCTTC
2,5,5   304646 304713 18.0556 TCTCC CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC
  e 305836 305872 17.9487 TTCCC CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC
2,3,5 e 304643 304713 18.9189 CTCCT CTGCTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC
  n 305765 305800 25.641 CCA CCACACCACCTCTGACGCCCACCACAGCCCCCCACC
   305836 305872 17.9487 CCCTT CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC
Sputnik
mismatch penalty
-10   552928 552935 0 AG GAGAGAGA
   552939 552948 0 AG GAGAGAGAGA
   552954 552963 0 AAGAG AAGAGAAGAG
   552964 552975 0 AG AGAGAGAGAGAG
-6   552928 552935 0 AG GAGAGAGA
   552939 552948 0 AG GAGAGAGAGA
  c 552954 552975 9.09 AAGAG AAGAGAAGAGAGAGAGAGAGAG
-5 c 552928 552948 9.52 AG GAGAGAGAAAGGAGAGAGAGA
   552954 552975 9.09 AAGAG AAGAGAAGAGAGAGAGAGAGAG
Mreps
resolution
1   119591 119610 20 AAT ACAAAAAATAATAATTATAA
   119611 119628 5.56 AAAAAT ATAAATAAAAATAAAAAT
2 e 119591 119615 24 AAT ACAAAAAATAATAATTATAAATAAA
   119611 119628 5.56 AAAAAT ATAAATAAAAATAAAAAT
3 c 119591 119638 33.33 A ACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA
6 e 119590 119638 34.69 A TACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA
  1. The threshold alignment score of TRF was set to 20, and alignment weights varied from {2,7,7} to {2,3,5}. The Sputnik mismatch penalty was set to -10, -6, and -5. The Mreps resolution value varied from 1 to 6. For each detection, we report the start and end positions, the divergence from a pure repeat, the motif, and the actual sequence. Variation of detection when reducing weights is coded as follows: n, newly detected sequence; e, enlargement of a previous sequence; c, concatenation of previous sequences. New nucleotides detected by enlarging or concatenating previous sequences are underlined. The sequence at position 305765 is an example of a microsatellite detected only at low values of the TRF alignment weights. It is not detected until the alignment weights are lowered to {2,3,5}, because with larger weights the match bonuses cannot compensate for the penalties incurred by imperfections. Reducing alignment weights may also enlarge detections, as shown for alignment weights {2,5,5} at position 305836. A succession of close errors (in boldface) decreases the alignment score, which falls below the threshold score for weight values larger than {2,5,5}. Reducing alignment weights also causes concatenation, when an enlarged tandem repeat overlaps with one of its neighbors. At position 304696, two substitutions (in boldface) stop detection when alignment weights are set to {2,7,7}. With a smaller substitution penalty (5 or less), the detection is enlarged up to position 304646 and overlaps with the other detection. Reducing the Sputnik mismatch penalty allows detection of larger microsatellites by concatenating shorter, perfect ones. The two detections at positions 552928 and 552939 are concatenated with a mismatch penalty of -5, because the penalty induced by the two errors at positions 552936 and 552938 is compensated by the second detection. A second concatenation occurs at position 552964 with a mismatch penalty of -6. The two merged detections do not have the same motif, but with low values of the mismatch penalty the two errors induced by this difference are compensated by the matching bases.
  2. A larger resolution value for Mreps enlarges already-detected tandem repeats. In the first part of the tandem repeat at position 119591, adjacent repeats are separated by at most one error, and this part is detected at resolution 1; however, repeats TAT and AAA are separated by two errors, so the second part can only be found at resolution 2 or higher. Finally, increasing resolution causes concatenation. Detections for resolution 2 at positions 119591 and 119611 are enlarged when resolution is 3; both periods are reduced to 1 (see explanations in Methods), and the two sequences are merged.
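
The arithmetic behind the footnotes can be illustrated with a minimal Python sketch. It rests on two assumptions that the table does not state explicitly. First, divergence is taken to be the number of errors divided by the detection length, in percent; this reproduces the Sputnik and Mreps rows checked here, but it is an inference from the values, not a definition given in the source. Second, the three TRF alignment weights are read as (match bonus, mismatch penalty, indel penalty) and combined in a simple additive score compared against the threshold of 20. The real tools score local alignments, so this is only a toy model. The function names and the match, mismatch, and indel counts in the loop are hypothetical and chosen for illustration.

```python
def divergence_percent(start, end, errors):
    """Percent of positions deviating from a pure repeat (assumed definition)."""
    length = end - start + 1  # start and end are inclusive coordinates
    return 100.0 * errors / length


def additive_score(matches, mismatches, indels, weights):
    """Toy additive score: a bonus per match, a penalty per mismatch or indel."""
    match_bonus, mismatch_penalty, indel_penalty = weights
    return (match_bonus * matches
            - mismatch_penalty * mismatches
            - indel_penalty * indels)


def is_detected(matches, mismatches, indels, weights, threshold=20):
    """A candidate is reported only if its score reaches the threshold."""
    return additive_score(matches, mismatches, indels, weights) >= threshold


# Sputnik row at mismatch penalty -5: positions 552928-552948, two errors.
print(round(divergence_percent(552928, 552948, 2), 2))

# Hypothetical imperfect repeat (counts are NOT taken from the paper):
# 30 matches, 8 mismatches, 1 indel, scored under each weight set in the table.
for weights in [(2, 7, 7), (2, 5, 7), (2, 5, 5), (2, 3, 5)]:
    score = additive_score(30, 8, 1, weights)
    print(weights, score, is_detected(30, 8, 1, weights))
```

With these made-up counts the candidate reaches the threshold only under {2,3,5}, which mirrors the behaviour described for the detection at position 305765: the same imperfections are tolerated once the penalties are small enough for the match bonuses to outweigh them.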