Table 1 Accuracy as average RMSD values for combinations of data models and estimators

From: Scoredist: A simple and robust protein sequence distance estimator

Estimator – data model    Testset
                          Dayhoff    MV       JTT      WAG      average
Scoredist – Dayhoff       12.68      20.85    13.67    12.81    15.00
ML – Dayhoff              12.70      28.40    14.75    15.15    17.75
ED – Dayhoff              13.57      31.36    16.10    16.63    19.41
Scoredist – MV            19.28      13.15    16.29    18.73    16.86
ML – MV                   19.96      13.44    19.36    19.21    17.99
ED – MV                   15.68      13.35    13.95    14.75    14.43
Scoredist – JTT           13.67      17.16    12.89    13.47    14.30
ML – JTT                  12.15      25.07    12.10    13.44    15.69
ED – JTT                  12.56      27.71    12.70    14.37    16.84
Jukes-Cantor              23.92      16.28    19.88    22.48    20.64
Kimura                    16.24      29.81    22.36    19.16    21.89
  1. For each testset and method, the average root mean square deviation (RMSD) from the true distance was calculated for 2,000 alignment samples in the interval 1–200 PAM units. Lower RMSD values indicate higher accuracy on a single testset. The column 'average' gives the mean over the four evaluated testsets. A low value in this column indicates a robust estimator, because it measures accuracy over all four models (including "wrong" data models). Scoredist was more robust than ML: for each training set, its average RMSD was lower. The ED estimator gave good results when trained with MV, but performed poorly in all other cases (see Discussion for details). Scoredist, Jukes-Cantor, and Kimura distances were calculated with the Belvu alignment viewer. The Maximum Likelihood (ML) and Expected Distance (ED) estimates were produced by lapd (L. Arvestad, unpublished).
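The quantities described in the footnote can be illustrated with a minimal Python sketch. It assumes the standard RMSD definition (square root of the mean squared difference between estimated and true distances); the source does not give the exact formula or code, and the names `rmsd`, `TABLE_1`, and `averages` are hypothetical. The table values are copied from Table 1, and the 'average' column is recomputed as the plain mean of the four testset values.

```python
import math


def rmsd(estimates, true_distances):
    """Root mean square deviation between estimated and true distances.

    Assumption: the standard RMSD definition; the source does not spell it out.
    """
    if len(estimates) != len(true_distances) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_distances)]
    return math.sqrt(sum(squared) / len(squared))


# RMSD per testset (Dayhoff, MV, JTT, WAG), copied from Table 1.
# Keys are (estimator, training data model); the two classical
# corrections have no training model, so None is used there.
TABLE_1 = {
    ("Scoredist", "Dayhoff"): (12.68, 20.85, 13.67, 12.81),
    ("ML", "Dayhoff"): (12.70, 28.40, 14.75, 15.15),
    ("ED", "Dayhoff"): (13.57, 31.36, 16.10, 16.63),
    ("Scoredist", "MV"): (19.28, 13.15, 16.29, 18.73),
    ("ML", "MV"): (19.96, 13.44, 19.36, 19.21),
    ("ED", "MV"): (15.68, 13.35, 13.95, 14.75),
    ("Scoredist", "JTT"): (13.67, 17.16, 12.89, 13.47),
    ("ML", "JTT"): (12.15, 25.07, 12.10, 13.44),
    ("ED", "JTT"): (12.56, 27.71, 12.70, 14.37),
    ("Jukes-Cantor", None): (23.92, 16.28, 19.88, 22.48),
    ("Kimura", None): (16.24, 29.81, 22.36, 19.16),
}

# The 'average' column: mean RMSD over the four testsets.
averages = {key: sum(vals) / len(vals) for key, vals in TABLE_1.items()}

# Robustness comparison from the footnote: for each training model,
# check whether Scoredist has a lower average RMSD than ML.
scoredist_beats_ml = {
    model: averages[("Scoredist", model)] < averages[("ML", model)]
    for model in ("Dayhoff", "MV", "JTT")
}

if __name__ == "__main__":
    for (estimator, model), avg in sorted(averages.items(), key=lambda kv: kv[1]):
        label = estimator if model is None else f"{estimator} - {model}"
        print(f"{label:22s} {avg:6.2f}")
    print(scoredist_beats_ml)
```

Sorting by the recomputed average makes the robustness ranking easy to read, and the `scoredist_beats_ml` dictionary restates the footnote's Scoredist versus ML comparison for each training model.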