Table 2. Mean percent error of Probalign and SSEARCH for each RFAM family, in the full benchmark and in the subset of datasets with a maximum pairwise sequence identity of 30%.

From: Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

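| RFAM family | Complete benchmark: Probalign | Complete benchmark: SSEARCH | Complete benchmark: Difference | Identity up to 30%: Probalign | Identity up to 30%: SSEARCH | Identity up to 30%: Difference |
|---|---|---|---|---|---|---|
| 5S_rRNA | 22.7 | 20.7 | -2.0 | Zero datasets | | |
| U1 (4) | 15.0 | 15.6 | 0.6 | 87.3 | 100.0 | 12.7 |
| tRNA (256) | 62.0 | 74.4 | 12.3 | 69.8 | 84.8 | 15.0 |
| RNaseP_bact_a | 34.0 | 33.0 | -1.0 | Zero datasets | | |
| RNaseP_bact_b | 29.0 | 29.1 | -0.1 | Zero datasets | | |
| U3 | 41.3 | 38.8 | -2.5 | Zero datasets | | |
| U4 (8) | 25.3 | 22.2 | -3.1 | 52.8 | 11 | -41.8 |
| SRP_euk_arch (132) | 43.8 | 56.4 | 12.6 | 62.1 | 78.0 | 15.9 |
| tmRNA (180) | 32.0 | 36.3 | 4.3 | 50.5 | 59.8 | 9.4 |
| Intron_gpI (4) | 67.4 | 80.1 | 12.7 | 100.0 | 100.0 | 0.0 |
| SECIS (208) | 82.3 | 93.9 | 11.5 | 87.9 | 100.0 | 12.1 |
| IRE (216) | 44.4 | 48.7 | 4.2 | 88.7 | 96.5 | 7.7 |
| THI | 29.5 | 30.1 | 0.6 | Zero datasets | | |
| Hammerhead_1 | 43.7 | 46.0 | 2.3 | Zero datasets | | |
| Purine (4) | 16.2 | 16.4 | 0.2 | 17.4 | 1.8 | -15.6 |
| Lysine (16) | 48.0 | 57.3 | 9.3 | 73.1 | 100.0 | 26.9 |
| SRP_bact (80) | 28.5 | 25.7 | -2.8 | 62.6 | 65.0 | 2.3 |
| SSU_rRNA_5 (4) | 30.5 | 32.4 | 1.9 | 39 | 61 | 22 |
| T-box | 27.4 | 46.0 | 18.6 | Zero datasets | | |
| glmS (4) | 23.4 | 21.0 | -2.4 | 73.8 | 78.4 | 4.6 |
| RNaseP_arch (8) | 32.4 | 34.0 | 1.6 | 87 | 100.0 | 13 |
| IRES_Cripavirus | 5.7 | 3.9 | -1.8 | Zero datasets | | |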
  1. Unlike Table 1 above, where some datasets are omitted due to BLAST, all datasets of the benchmark are considered here. Difference is always calculated as the SSEARCH error minus the Probalign error, so a positive number indicates that Probalign outperforms SSEARCH. The number of datasets in each family with a maximum pairwise sequence identity of 30% is shown in parentheses (the same query RNA with different flank sizes is considered a separate dataset).
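
The sign convention in the footnote can be checked with a few lines of Python. This is only an illustrative sketch: the function names `error_difference` and `better_method` are hypothetical and not taken from the paper, and the inputs are the rounded values printed in the table. Recomputing from rounded values may differ from the printed Difference by 0.1 for some rows, presumably because the published figures were derived before rounding (an assumption).

```python
def error_difference(probalign_error, ssearch_error):
    """Return SSEARCH percent error minus Probalign percent error.

    Rounded to one decimal place to match the table's precision.
    """
    return round(ssearch_error - probalign_error, 1)


def better_method(probalign_error, ssearch_error):
    """Name the method with the lower percent error (hypothetical helper)."""
    diff = error_difference(probalign_error, ssearch_error)
    if diff > 0:
        return "Probalign"  # positive difference: Probalign has the lower error
    if diff < 0:
        return "SSEARCH"    # negative difference: SSEARCH has the lower error
    return "tie"


# (Probalign error, SSEARCH error) pairs copied from the table.
rows = {
    "U1, complete benchmark": (15.0, 15.6),
    "U4, identity up to 30%": (52.8, 11.0),
    "Intron_gpI, identity up to 30%": (100.0, 100.0),
}

for name, (probalign, ssearch) in rows.items():
    diff = error_difference(probalign, ssearch)
    print(f"{name}: difference = {diff}, lower error: {better_method(probalign, ssearch)}")
```

A positive result (U1) corresponds to Probalign having the lower error, a negative result (U4 subset) to SSEARCH having the lower error, and zero (Intron_gpI subset) to equal error for both methods.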