Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

Table 1 Mean and median percent error of each method on the full benchmark (13,716 datasets), which includes query RNAs with flanks of 50, 100, and 150 nucleotides.

Datasets (mean | median percent error)	Probalign	SSEARCH	BLAST	ClustalW
Complete benchmark	35.3 | 30.7	38.7 | 33.2	41.0 | 34.0	47.6 | 50.3
Pairwise sequence identity at most 30%	66.5* | 71.2*	73.0 | 83.4	75.9 | 85.3	82.9 | 85.0

BLAST returns no alignment for 425 datasets, so these are omitted from the calculations. HMMER is not shown because queries with unalignable flanks cannot be used to build a reliable model. Fourteen families contain datasets with at most 30% sequence identity. Probalign has the lowest mean and median error overall. Its advantage is larger on datasets with low sequence identity, where the difference is significant at P < 0.05 (marked with *).
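The footnote says Probalign's advantage grows on low-identity datasets. The minimal Python sketch below checks that claim using only the values in Table 1. For each row and each statistic, it finds the method with the lowest error and measures how far ahead it is of the second-lowest method. The helper name `margin_over_runner_up` is hypothetical. The sketch does not reproduce the significance test, because the test used is not stated here.

```python
# Mean | median percent error copied from Table 1 (lower is better).
TABLE = {
    "Complete benchmark": {
        "Probalign": (35.3, 30.7),
        "SSEARCH": (38.7, 33.2),
        "BLAST": (41.0, 34.0),
        "ClustalW": (47.6, 50.3),
    },
    "Pairwise identity <= 30%": {
        "Probalign": (66.5, 71.2),
        "SSEARCH": (73.0, 83.4),
        "BLAST": (75.9, 85.3),
        "ClustalW": (82.9, 85.0),
    },
}


def margin_over_runner_up(row, stat):
    """Return (best method, gap to the second-best method) for one statistic.

    stat is 0 for the mean and 1 for the median. The gap is rounded to one
    decimal place to match the precision of the table.
    """
    ranked = sorted(row.items(), key=lambda item: item[1][stat])
    (best, best_vals), (_, second_vals) = ranked[0], ranked[1]
    return best, round(second_vals[stat] - best_vals[stat], 1)


if __name__ == "__main__":
    for subset, row in TABLE.items():
        for stat, name in ((0, "mean"), (1, "median")):
            best, gap = margin_over_runner_up(row, stat)
            print(f"{subset}: lowest {name} = {best}, {gap} points ahead")
```

On the complete benchmark, Probalign leads the runner-up (SSEARCH) by 3.4 points in mean error and 2.5 points in median error. On the low-identity subset, these gaps widen to 6.5 and 12.2 points.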

ISSN: 1471-2105
