
Table 1 Comparison of the performance of RepeatScout (RS), ReCon (RC), WindowMasker (WM), and Red. Repeats detected by RepeatMasker are treated as the ground truth in this study.

From: Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale

 

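| Genome (length) | Tool | SN te (%) | SN tr (%) | SN low (%) | SN other (%) | SN all (%) | SP exon (%) | PP (%) | FPL (bp) | PR (bp) | Time (sec) | Memory (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Homo sapiens (3,099,750,718 bp) | RS | 62.5 | 79.6 | 13.6 | 29.2 | 63.5 | 90.5 | 33.7 | 239474 | 9355324ᵃ | 948,350 | 4701 |
| | Red | 61.0 | 86.2 | 68.9 | 33.9 | 62.8 | 89.3 | 35.5 | 2657024 | 16414125ᵃ | 5184 | 6775 |
| | WM | 55.2 | 74.9 | 81.9 | 25.7 | 56.7 | 87.2 | 36.1 | 423707488 | 3109241ᵃ | 14866 | 615 |
| | RC | 55.0 | 75.2 | 11.0 | 11.5 | 56.2 | 95.4 | 29.2 | 137633 | 3575640ᵃ | 898,844 | 14666 |
| Drosophila melanogaster (143,726,002 bp) | Red | 90.0 | 59.4 | 43.3 | 83.0 | 84.1 | 94.0 | 23.2 | 312686 | 9401953 | 206 | 916 |
| | RS | 86.3 | 24.3 | 1.9 | 71.8 | 74.4 | 98.0 | 18.4 | 0 | 4913141 | 79008 | 979 |
| | RC | 86.7 | 18.2 | 1.8 | 80.0 | 74.0 | 99.0 | 17.6 | 0 | 4002422 | 13979 | 1513 |
| | WM | 45.5 | 64.8 | 62.7 | 42.1 | 48.8 | 90.8 | 22.3 | 15150087 | 17118084 | 2869 | 325 |
| Zea mays (2,059,943,587 bp) | RS | 96.7 | 55.5 | 25.8 | 89.9 | 96.3 | – | 80.0 | 44447 | 66587503 | 347082 | 7344 |
| | Red | 93.3 | 58.1 | 31.9 | 88.7 | 93.0 | – | 78.8 | 6257 | 94687287 | 2731 | 6741 |
| | RC | 91.6 | 33.5 | 12.9 | 88.8 | 91.1 | – | 74.3 | 20864 | 32450624 | 192223 | 3419 |
| | WM | 82.3 | 63.7 | 40.1 | 86.6 | 82.1 | – | 67.2 | 36189699 | 33998795 | 7589 | 639 |
| Glycine max (973,344,380 bp) | RC | 96.3 | 42.5 | 22.6 | 99.9 | 92.7 | 95.1 | 46.4 | 2144642 | 123719267 | 304490 | 8609 |
| | RS | 92.5 | 39.6 | 19.1 | 94.4 | 89.0 | 92.0 | 43.6 | 2690420 | 110068092 | 134936 | 1516 |
| | Red | 86.9 | 42.5 | 28.9 | 96.1 | 83.9 | 94.5 | 41.6 | 1794609 | 107704170 | 1653 | 1770 |
| | WM | 68.1 | 83.4 | 83.6 | 3.5 | 68.9 | 95.4 | 44.4 | 170352943 | 186081334 | 13319 | 356 |
| Dictyostelium discoideum (34,121,699 bp) | Red | 94.7 | 93.8 | 96.5 | 1.0 | 94.3 | – | 54.9 | 2378281 | 10582912 | 61 | 235 |
| | WM | 35.1 | 92.7 | 95.1 | 4.8 | 84.7 | – | 53.0 | 14238455 | 10769264 | 20 | 2 |
| | RC | 95.0 | 25.0 | 7.6 | 0.0 | 31.9 | – | 13.6 | 0 | 1883829 | 18317 | 957 |
| | RS | 79.4 | 27.0 | 4.6 | 0.0 | 30.4 | – | 13.5 | 0 | 1969788 | 17476 | 925 |
| Plasmodium falciparum (23,264,338 bp) | WM | – | 91.2 | 90.6 | 28.5 | 89.2 | – | 61.4 | 10902380 | 10246592 | 23 | 7 |
| | Red | – | 87.6 | 84.2 | 90.7 | 87.2 | – | 51.8 | 972553 | 8129833 | 63 | 416 |
| | RS | – | 43.9 | 9.9 | 40.1 | 39.4 | – | 15.3 | 36882 | 1797419 | 36194 | 918 |
| | RC | – | 20.3 | 7.7 | 34.5 | 19.1 | – | 9.3 | 4827 | 1314829 | 7011 | 1052 |
| Mycobacterium tuberculosis (4,403,837 bp) | Red | – | 88.7 | 81.0 | – | 88.5 | – | 44.0 | 160705 | 1914667 | 7 | 1 |
| | WM | – | 63.6 | 33.3 | – | 63.0 | – | 17.5 | 672523 | 755248 | 2 | 2 |
| | RS | – | 20.8 | 27.3 | – | 21.0 | – | 5.6 | 0 | 240852 | 331 | 640 |
| | RC | – | 0.0 | 0.0 | – | 0.0 | – | 0.0 | 0 | 2089 | 69 | 852 |

A dash (–) marks a value that is not reported for that genome.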

1. SN te is the sensitivity to all types of transposable elements. SN tr is the sensitivity to tandem repeats, including microsatellites and satellites. SN low is the sensitivity to low-complexity regions. SN other is the sensitivity to repeats that are not transposons, tandem repeats, or low-complexity regions. SN all is the sensitivity to all types of repeats. SP exon is the specificity with respect to coding regions. PP is the percentage of the nucleotides of a chromosome that are predicted to be repeats. The false positive length (FPL) is the total length of the repeats found in a synthetic random genome of the same length as the original genome. The synthetic genome is generated by a group of 6th-order Markov chains, each trained on one real chromosome. Repeats found in the synthetic genome by RepeatMasker were removed. Potential repeats (PR) is the number of nucleotides that fall within the repeats predicted by a tool but outside the repeats located by RepeatMasker. "bp" stands for base pair, and "MB" stands for megabyte. An "a" next to a PR value indicates that these repeats are confirmed novel repeats.
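The metric definitions above can be made concrete with a short Python sketch. It is not the authors' implementation, and it rests on several assumptions:

- Annotations are per-base boolean masks built from half-open intervals.
- SP exon is read as the percentage of exon bases that are not predicted as repeats.
- When the Markov chain reaches a context it never saw during training, sampling restarts from a random context that it did see.

All function names (`repeat_metrics`, `train_markov`, `sample_markov`, `synthetic_genome`) are hypothetical. Under this reading, FPL would be the total length a tool reports on the output of `synthetic_genome` after the RepeatMasker-found repeats are removed. That last step is not shown.

```python
import random
from collections import Counter, defaultdict


def percent(num, den):
    """Return 100 * num / den, or 0.0 when the denominator is zero."""
    return 100.0 * num / den if den else 0.0


def mask_from_intervals(intervals, length):
    """Turn half-open [start, end) intervals into a per-base boolean mask."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def repeat_metrics(pred, truth_by_class, exons):
    """Compute SN per class, SN all, SP exon and PP (in %), plus PR (in bp).

    pred and exons are boolean masks; truth_by_class maps a repeat class
    name (e.g. "te", "tr") to the RepeatMasker mask for that class.
    """
    n = len(pred)
    truth_all = [any(mask[i] for mask in truth_by_class.values()) for i in range(n)]
    metrics = {}
    for name, truth in truth_by_class.items():
        hits = sum(p and t for p, t in zip(pred, truth))
        metrics["SN " + name] = percent(hits, sum(truth))
    hits_all = sum(p and t for p, t in zip(pred, truth_all))
    metrics["SN all"] = percent(hits_all, sum(truth_all))
    # Assumption: SP exon = share of exon bases NOT predicted as repeats.
    kept = sum(e and not p for p, e in zip(pred, exons))
    metrics["SP exon"] = percent(kept, sum(exons))
    metrics["PP"] = percent(sum(pred), n)
    # PR: predicted repeat bases that RepeatMasker did not annotate.
    metrics["PR"] = sum(p and not t for p, t in zip(pred, truth_all))
    return metrics


def train_markov(seq, order=6):
    """Count which base follows each context of `order` bases (order >= 1)."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def sample_markov(counts, length, order=6, rng=None):
    """Sample a sequence of exactly `length` bases from a trained chain."""
    if rng is None:
        rng = random.Random()
    contexts = list(counts)
    out = list(rng.choice(contexts))  # start from a context seen in training
    while len(out) < length:
        successors = counts.get("".join(out[-order:]))
        if not successors:
            # Assumption: on an unseen context, restart from a random seen one.
            out.extend(rng.choice(contexts))
            continue
        bases, weights = zip(*successors.items())
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


def synthetic_genome(chromosomes, order=6, seed=0):
    """Train one chain per real chromosome and sample a same-length copy."""
    rng = random.Random(seed)
    return [sample_markov(train_markov(c, order), len(c), order, rng)
            for c in chromosomes]
```

Per-base masks keep the sketch short and easy to check. For a 3 Gbp genome, sorted and merged interval lists would be far more memory-efficient.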