
Table 2 Sample detections in the human X chromosome obtained with TRF at different alignment weights, Sputnik at different mismatch penalties, and Mreps at different resolutions.

From: Detecting microsatellites within genomes: significant variation among algorithms

  

| Program (parameter) | Value | Variation | Start | End | Divergence | Motif | Sequence |
|---|---|---|---|---|---|---|---|
| TRF (alignment scores) | 2,7,7 | | 304646 | 304658 | 0 | CTCTC | CTCTCCTCTCCTC |
| | | | 304696 | 304713 | 5.55 | TCCTC | TCCTCTTCTCTCCTCTCC |
| | | | 305863 | 305872 | 0 | CCTTC | CCTTCCCTTC |
| | 2,5,7 | c | 304646 | 304713 | 18.3099 | TCTCC | CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | | | 305863 | 305872 | 0 | TTCCC | CCTTCCCTTC |
| | 2,5,5 | | 304646 | 304713 | 18.0556 | TCTCC | CTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | | e | 305836 | 305872 | 17.9487 | TTCCC | CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC |
| | 2,3,5 | e | 304643 | 304713 | 18.9189 | CTCCT | CTGCTCTCCTCTCCTCCTTCTCCGCTCCCTGCACTGCCCTCCGCTCCCTCCGGTCCTCTTCTCTCCTCTCC |
| | | n | 305765 | 305800 | 25.641 | CCA | CCACACCACCTCTGACGCCCACCACAGCCCCCCACC |
| | | | 305836 | 305872 | 17.9487 | CCCTT | CCCTCTCCACTTCCTTCTCTTCCACCTCCTTCCCTTC |
| Sputnik (mismatch penalty) | -10 | | 552928 | 552935 | 0 | AG | GAGAGAGA |
| | | | 552939 | 552948 | 0 | AG | GAGAGAGAGA |
| | | | 552954 | 552963 | 0 | AAGAG | AAGAGAAGAG |
| | | | 552964 | 552975 | 0 | AG | AGAGAGAGAGAG |
| | -6 | | 552928 | 552935 | 0 | AG | GAGAGAGA |
| | | | 552939 | 552948 | 0 | AG | GAGAGAGAGA |
| | | c | 552954 | 552975 | 9.09 | AAGAG | AAGAGAAGAGAGAGAGAGAGAG |
| | -5 | c | 552928 | 552948 | 9.52 | AG | GAGAGAGAAAGGAGAGAGAGA |
| | | | 552954 | 552975 | 9.09 | AAGAG | AAGAGAAGAGAGAGAGAGAGAG |
| Mreps (resolution) | 1 | | 119591 | 119610 | 20 | AAT | ACAAAAAATAATAATTATAA |
| | | | 119611 | 119628 | 5.56 | AAAAAT | ATAAATAAAAATAAAAAT |
| | 2 | e | 119591 | 119615 | 24 | AAT | ACAAAAAATAATAATTATAAATAAA |
| | | | 119611 | 119628 | 5.56 | AAAAAT | ATAAATAAAAATAAAAAT |
| | 3 | c | 119591 | 119638 | 33.33 | A | ACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA |
| | 6 | e | 119590 | 119638 | 34.69 | A | TACAAAAAATAATAATTATAAATAAATAAAAATAAAAATTCAACTGTAA |

  1. The threshold alignment score of TRF was set to 20, and the alignment weights varied from {2,7,7} to {2,3,5}. The Sputnik mismatch penalty was set to -10, -6, and -5. The Mreps resolution value varied from 1 to 6. For each detection, we report the start and end positions, the divergence from a pure repeat, the motif, and the actual sequence. Variation of detection when reducing weights is coded as follows: n, newly detected sequence; e, enlargement of a previous sequence; c, concatenation of previous sequences. New nucleotides detected by enlarging or concatenating previous sequences are underlined. The sequence at position 305765 is an example of a microsatellite detected only at low values of the TRF alignment weights. It is not detected until the weights are lowered to {2,3,5}, because with larger penalties the match bonuses cannot compensate for the imperfection penalties. Reducing alignment weights may also enlarge detections, as shown for alignment weights {2,5,5} at position 305836. A succession of close errors (in boldface) decreases the alignment score, which falls below the threshold score for weight values larger than {2,5,5}. Reducing alignment weights also causes concatenation, when an enlarged tandem repeat overlaps with one of its neighbors. At position 304696, two substitutions (in boldface) stop detection when alignment weights are set to {2,7,7}. With a smaller substitution penalty (5 or less), the detection is enlarged up to position 304646 and overlaps with the other detection. Reducing the Sputnik mismatch penalty allows detection of larger microsatellites by concatenating shorter, perfect ones. The two detections at positions 552928 and 552939 are concatenated with a mismatch penalty of -5, because the penalty induced by the two errors at positions 552936 and 552938 is compensated by the second detection. A second concatenation occurs at position 552964 with a mismatch penalty of -6. The two merged detections do not have the same motif, but with low values of the mismatch penalty the two errors induced by this difference are compensated by the matching bases.
  2. A larger resolution value for Mreps enlarges already-detected tandem repeats. In the first part of the tandem repeat at position 119591, adjacent repeats are separated by at most one error, and this part is detected at resolution 1; however, the repeats TAT and AAA are separated by two errors, so the second part can only be found at resolution 2 or higher. Finally, increasing resolution causes concatenation. The detections for resolution 2 at positions 119591 and 119611 are enlarged when the resolution is 3; both periods are reduced to 1 (see explanations in Methods), and the two sequences are merged.
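
The compensation argument given for Sputnik in footnote 1 can be illustrated with a small additive model. This is only a sketch under stated assumptions, not Sputnik's actual scoring code: it assumes a bonus of +1 per matching base (the footnote does not give this value), treats the "second detection" as a run of perfectly matching bases, and allows concatenation when the bonus from that run at least offsets the penalties of the errors separating the two detections. The function names `net_score`, `can_concatenate`, and `percent_divergence` are hypothetical. The divergence helper further assumes that divergence is the number of errors per reported base, which agrees with the 9.52 and 9.09 values in the Sputnik rows but is not stated in the footnote.

```python
# Toy additive model of the "compensation" argument in footnote 1.
# ASSUMPTION: +1 per matching base; this value is not given in the footnote.
MATCH_BONUS = 1


def net_score(matching_bases: int, errors: int, mismatch_penalty: int) -> int:
    """Net score change from crossing `errors` errors into a perfect run.

    `mismatch_penalty` is negative, as in the table (-10, -6, -5).
    """
    return matching_bases * MATCH_BONUS + errors * mismatch_penalty


def can_concatenate(matching_bases: int, errors: int, mismatch_penalty: int) -> bool:
    """True when the matching bases at least offset the error penalties."""
    return net_score(matching_bases, errors, mismatch_penalty) >= 0


def percent_divergence(errors: int, length: int) -> float:
    """ASSUMPTION: divergence = errors per reported base, in percent."""
    return round(100 * errors / length, 2)


if __name__ == "__main__":
    # Detections at 552928 and 552939: the second one spans
    # 552939-552948 (10 bases) and is separated by two errors.
    for penalty in (-10, -6, -5):
        print("10 bases, 2 errors, penalty", penalty,
              "->", can_concatenate(10, 2, penalty))

    # Detections at 552954 and 552964: the second one spans
    # 552964-552975 (12 bases), again with two errors.
    for penalty in (-10, -6, -5):
        print("12 bases, 2 errors, penalty", penalty,
              "->", can_concatenate(12, 2, penalty))

    # Merged detections: 552928-552948 is 21 bases, 552954-552975 is 22.
    print(percent_divergence(2, 21), percent_divergence(2, 22))
```

Under these assumptions the first pair merges only at -5 and the second pair merges at -6 and -5, which matches the rows of the table. That agreement is suggestive but should not be read as confirmation of Sputnik's internal rule, since the real program also has to decide how errors are aligned (substitution versus insertion or deletion).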