Table 2 Mean Probalign and SSEARCH percent error for each RFAM family, in the full benchmark and in the subset of datasets with maximum pairwise sequence identity of 30%.

From: Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

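| RFAM Family | Complete benchmark dataset: Probalign | Complete benchmark dataset: SSEARCH | Complete benchmark dataset: Difference | Subset with pairwise identity up to 30%: Probalign | Subset with pairwise identity up to 30%: SSEARCH | Subset with pairwise identity up to 30%: Difference |
|---|---|---|---|---|---|---|
| 5S_rRNA | 22.7 | 20.7 | -2.0 | Zero datasets | | |
| U1 (4) | 15.0 | 15.6 | 0.6 | 87.3 | 100.0 | 12.7 |
| tRNA (256) | 62.0 | 74.4 | 12.3 | 69.8 | 84.8 | 15.0 |
| RNaseP_bact_a | 34.0 | 33.0 | -1.0 | Zero datasets | | |
| RNaseP_bact_b | 29.0 | 29.1 | -0.1 | Zero datasets | | |
| U3 | 41.3 | 38.8 | -2.5 | Zero datasets | | |
| U4 (8) | 25.3 | 22.2 | -3.1 | 52.8 | 11 | -41.8 |
| SRP_euk_arch (132) | 43.8 | 56.4 | 12.6 | 62.1 | 78.0 | 15.9 |
| tmRNA (180) | 32.0 | 36.3 | 4.3 | 50.5 | 59.8 | 9.4 |
| Intron_gpI (4) | 67.4 | 80.1 | 12.7 | 100.0 | 100.0 | 0.0 |
| SECIS (208) | 82.3 | 93.9 | 11.5 | 87.9 | 100.0 | 12.1 |
| IRE (216) | 44.4 | 48.7 | 4.2 | 88.7 | 96.5 | 7.7 |
| THI | 29.5 | 30.1 | 0.6 | Zero datasets | | |
| Hammerhead_1 | 43.7 | 46.0 | 2.3 | Zero datasets | | |
| Purine (4) | 16.2 | 16.4 | 0.2 | 17.4 | 1.8 | -15.6 |
| Lysine (16) | 48.0 | 57.3 | 9.3 | 73.1 | 100.0 | 26.9 |
| SRP_bact (80) | 28.5 | 25.7 | -2.8 | 62.6 | 65.0 | 2.3 |
| SSU_rRNA_5 (4) | 30.5 | 32.4 | 1.9 | 39 | 61 | 22 |
| T-box | 27.4 | 46.0 | 18.6 | Zero datasets | | |
| glmS (4) | 23.4 | 21.0 | -2.4 | 73.8 | 78.4 | 4.6 |
| RNaseP_arch (8) | 32.4 | 34.0 | 1.6 | 87 | 100.0 | 13 |
| IRES_Cripavirus | 5.7 | 3.9 | -1.8 | Zero datasets | | |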

  1. Unlike Table 1 above, where some datasets are omitted due to BLAST, all datasets of the benchmark are considered here. Difference is always calculated as the SSEARCH error minus the Probalign error, so a positive number indicates that Probalign outperforms SSEARCH. The number in parentheses after a family name is the number of datasets in that family with maximum pairwise sequence identity of 30% (the same query RNA with different flank sizes is counted as a separate dataset).
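
The Difference column described in the footnote can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: the function names `error_difference` and `better_method` are hypothetical, and the inputs are the rounded means printed in the table. Because the published differences were presumably computed from unrounded means, a value recomputed this way may differ from the table in the last digit.

```python
def error_difference(ssearch_error: float, probalign_error: float) -> float:
    """Return SSEARCH percent error minus Probalign percent error (one decimal)."""
    return round(ssearch_error - probalign_error, 1)


def better_method(difference: float) -> str:
    """Name the method with the lower error, given a Difference value."""
    if difference > 0:
        return "Probalign"  # SSEARCH error is larger
    if difference < 0:
        return "SSEARCH"  # Probalign error is larger
    return "tie"


# (Probalign, SSEARCH) mean percent errors copied from the table.
rows = {
    "U1, complete benchmark": (15.0, 15.6),
    "U4, identity up to 30%": (52.8, 11.0),
    "Intron_gpI, identity up to 30%": (100.0, 100.0),
}

for name, (probalign, ssearch) in rows.items():
    diff = error_difference(ssearch, probalign)
    print(f"{name}: difference = {diff:+.1f}, lower error: {better_method(diff)}")
```

Subtracting in this order keeps the sign convention of the footnote: a positive value means Probalign has the lower error.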