Table 1 Accuracy as average RMSD values for combinations of data models and estimators

From: Scoredist: A simple and robust protein sequence distance estimator

                            Test set
Estimator – training model  Dayhoff      MV     JTT     WAG   Average
Scoredist – Dayhoff           12.68   20.85   13.67   12.81     15.00
ML – Dayhoff                  12.70   28.40   14.75   15.15     17.75
ED – Dayhoff                  13.57   31.36   16.10   16.63     19.41
Scoredist – MV                19.28   13.15   16.29   18.73     16.86
ML – MV                       19.96   13.44   19.36   19.21     17.99
ED – MV                       15.68   13.35   13.95   14.75     14.43
Scoredist – JTT               13.67   17.16   12.89   13.47     14.30
ML – JTT                      12.15   25.07   12.10   13.44     15.69
ED – JTT                      12.56   27.71   12.70   14.37     16.84
Jukes-Cantor                  23.92   16.28   19.88   22.48     20.64
Kimura                        16.24   29.81   22.36   19.16     21.89

  1. For each test set and method, the average root mean square deviation (RMSD) from the true distance was calculated over 2,000 alignment samples in the interval 1–200 PAM units. Lower RMSD values indicate higher accuracy on a single test set. The column 'average' gives the mean over the four evaluated test sets. A low value in this column indicates robustness, since it measures accuracy across all four models, including "wrong" data models that differ from the one used for training. Scoredist was more robust than ML: for each training model, it had a lower average RMSD. The ED estimator gave good results when trained with MV but performed poorly in all other cases (see Discussion for details). Scoredist, Jukes-Cantor, and Kimura distances were calculated with the Belvu alignment viewer. The Maximum Likelihood (ML) and Expected Distance (ED) estimates were produced by lapd (L. Arvestad, unpublished).
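The RMSD and 'average' computations described in the footnote can be sketched in Python. This is a minimal sketch, not the authors' evaluation code. It assumes that the RMSD for a test set is taken over all alignment samples pooled together (the footnote does not say whether samples were first grouped by PAM distance). The helper names `rmsd` and `average_column` are hypothetical, and the only data used are the Table 1 values for the three trained estimators.

```python
import math

# Table 1 RMSD values (PAM units) for the three trained estimators.
# Key: (estimator, training model); value: RMSD on the Dayhoff, MV, JTT and WAG test sets.
RMSD_TABLE = {
    ("Scoredist", "Dayhoff"): (12.68, 20.85, 13.67, 12.81),
    ("ML", "Dayhoff"): (12.70, 28.40, 14.75, 15.15),
    ("ED", "Dayhoff"): (13.57, 31.36, 16.10, 16.63),
    ("Scoredist", "MV"): (19.28, 13.15, 16.29, 18.73),
    ("ML", "MV"): (19.96, 13.44, 19.36, 19.21),
    ("ED", "MV"): (15.68, 13.35, 13.95, 14.75),
    ("Scoredist", "JTT"): (13.67, 17.16, 12.89, 13.47),
    ("ML", "JTT"): (12.15, 25.07, 12.10, 13.44),
    ("ED", "JTT"): (12.56, 27.71, 12.70, 14.37),
}


def rmsd(estimated, true):
    """Hypothetical helper: RMSD of estimated distances from true distances.

    Assumption: all alignment samples of one test set are pooled into a single RMSD.
    """
    if not true or len(estimated) != len(true):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimated, true))
    return math.sqrt(squared / len(true))


def average_column(per_test_set):
    """Hypothetical helper: the 'average' column, an unweighted mean over test sets."""
    return sum(per_test_set) / len(per_test_set)


# For every training model, compare the average RMSD of Scoredist and ML.
for model in ("Dayhoff", "MV", "JTT"):
    scoredist = average_column(RMSD_TABLE[("Scoredist", model)])
    ml = average_column(RMSD_TABLE[("ML", model)])
    print(f"{model}: Scoredist {scoredist:.2f} vs ML {ml:.2f}")
```

Running it recomputes the 'average' column from the four per-test-set values and prints the comparison behind the robustness claim. For every training model, Scoredist has the lower average RMSD.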