Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score
© Pandit and Skolnick; licensee BioMed Central Ltd. 2008
Received: 16 July 2008
Accepted: 12 December 2008
Published: 12 December 2008
Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true.
To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align.
Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments than TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
Protein tertiary structure comparison is widely employed in the field of structural biology, with applications ranging from protein fold classification [1–3], protein structure prediction and modeling [4–8] to structure-based protein function annotation [9–11]. With the rapid increase in the number of protein structures deposited in the Protein Data Bank (PDB), it is important to develop faster and better algorithms to compare protein structures. In general, there are two types of protein tertiary structure comparison approaches. The easier one involves the comparison of a pair of protein structures with an a priori specified equivalence between pairs of residues (as provided by sequence or threading alignments). The second type involves the comparison when the set of equivalent residues is not given a priori. Therefore, an optimal structural alignment needs to be identified; this problem is NP-hard, with no known efficient exact solution. Nevertheless, a number of methods have been developed that employ heuristics to search for the best structural alignment. These methods use different representations of protein structure, definitions of similarity measures and optimization algorithms [15, 16]. Some approaches compare the respective distance matrices of each protein structure, trying to minimize the intra-atomic distances for the set of aligned substructures [17–21]; this is the approach employed in the widely used DALI algorithm. Another approach tries to minimize the inter-atomic distances between two structures [22–28]; representatives of this type of algorithm include CE, MAMMOTH, and TM-align.
In practice, most structural alignment procedures start with the generation of an initial set of equivalent residues. Then, using a structural similarity score, the initial seed alignment is extended and/or refined using methods such as dynamic programming or Monte Carlo procedures. Finally, the structural similarity score is assessed; this is generally done on the basis of the statistical significance of the structural similarity. Many structural alignment algorithms use fragment assembly to build the initial set of equivalences [18–20, 26, 28–30]. This involves the comparison of many, if not all, small fragments in the two structures. Then, similar fragments are assembled into a larger, consistent set. Fragments can be secondary structure elements [20, 30] or arbitrary substructures of a given length, as in CE or MAMMOTH.
In most structural alignment methods, structural similarity is assessed by a structure comparison score. One commonly used measure is the root-mean-square deviation, RMSD, between a pair of structures with a specified set of equivalent residues. As pointed out by Zhang and Skolnick, among others, statistically significant values of the RMSD are length dependent. Another problem is that the RMSD can be reduced by decreasing the coverage (the number of aligned residues). These issues were addressed by the introduction of the TM-score, which is a modification of the Levitt-Gerstein (LG) weight factor that weights residue pairs at smaller distances more heavily than those at larger distances. The TM-score is length independent, with a value of 0.30 (0.01) for the average (standard deviation) of the TM-score of the best structural alignment for randomly selected pairs of proteins.
Based on the TM-score, a new structural alignment algorithm, TM-align, was developed. TM-align employs a very simple approach that uses both gapless threading and secondary structure similarity to generate the initial set of equivalent residues. This set of aligned residues is refined using dynamic programming to maximize the TM-score. The scoring matrix used for dynamic programming is derived from the TM-score rotation matrix, which results in faster convergence and a better structural alignment. On average, TM-align provides structural alignments with higher accuracy and coverage than the most often-used methods such as DALI and CE. In a separate study, TM-align was compared with other competitive structural alignment programs using different measures of structural similarity, with the result that TM-align is one of the best structural alignment programs [22, 35]. However, TM-align is sometimes unable to identify a good structural similarity. To address this issue, in the present work, we improve the TM-align program by generating a better initial set of equivalences using a fragment assembly approach. This is followed by heuristic iterations involving the TM-score rotation matrix and dynamic programming to obtain the structural alignment with the largest TM-score.
We have used two datasets for the evaluation of different structural alignment programs. The first dataset is the same as that used in the original TM-align paper and consists of 200 proteins that range in size from 46 to 1058 residues. This first dataset is also used to derive the various parameters of the Fr-TM-align algorithm. The second dataset also has 200 proteins, and is a randomly selected, representative subset of the PDB template library of non-homologous proteins with pairwise sequence identity of ≤ 35%. It comprises proteins whose length ranges from 40 to 910 residues and includes representatives of all secondary structure classes, viz. all α, all β or mixed α/β proteins.
Algorithm and implementation
Fr-TM-align employs the backbone Cα coordinates of the protein structures for the structural alignment. The Fr-TM-align algorithm consists of the following steps:
1. Generation and scoring of aligned fragment pairs
where d_a = 0.5 and L_F is the length of the fragment. Here, the value of d_a is derived empirically: we tested four reasonable values (d_a = 0.25, 0.5, 1.0 and 1.5) and selected the value that resulted in the maximum average TM-score for all pairs of alignments in the training dataset using Fr-TM-align. In addition, the secondary structure similarity between the two fragments is given by the SS-score, which is simply the number of residues with identical secondary structure normalized by the length of the fragment. The secondary structure assignment procedure is the same as in TM-align.
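The SS-score described above can be illustrated with a minimal sketch. The function name `ss_score` and the one-letter labels (H, E, C) are assumptions made for this example only; the secondary structure codes actually used by TM-align may differ.

```python
def ss_score(ss_a: str, ss_b: str) -> float:
    """Fraction of fragment positions with identical secondary structure.

    ss_a, ss_b: secondary structure strings of two fragments of equal
    length (the labels themselves are an assumption of this sketch).
    """
    if len(ss_a) != len(ss_b) or not ss_a:
        raise ValueError("fragments must be non-empty and of equal length")
    # Count positions where both fragments carry the same label.
    matches = sum(1 for a, b in zip(ss_a, ss_b) if a == b)
    # Normalize by the fragment length L_F.
    return matches / len(ss_a)


print(ss_score("HHHHEEEE", "HHHHCCEE"))  # 6 of 8 positions agree -> 0.75
```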
Let L_A and L_B be the lengths of the first and second chains, respectively, and let L_F be the length of the fragment. Then, the numbers of fragments are I = L_A/L_F and J = L_B/L_F. The length of the fragment is selected empirically: L_F is 8 residues if the length of the smaller protein is less than 100 residues; otherwise, L_F = 12 residues. We perform an all-against-all (I × J) structural superposition of the non-overlapping fragments from the two protein structures. The structural similarity and secondary structure similarity of the fragments from the two proteins, as given by their GL-score and SS-score, are stored in the G(I, J) and S(I, J) scoring matrices, respectively.
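As a rough illustration of the fragment bookkeeping, the sketch below picks L_F and enumerates the non-overlapping fragments of each chain. The function names are hypothetical, and the use of integer division (dropping trailing residues that do not fill a whole fragment) is an assumption; the text only states I = L_A/L_F and J = L_B/L_F.

```python
def fragment_length(len_a: int, len_b: int) -> int:
    """L_F = 8 if the smaller protein has fewer than 100 residues, else 12."""
    return 8 if min(len_a, len_b) < 100 else 12


def fragment_starts(chain_len: int, frag_len: int) -> list[int]:
    """Start indices of the non-overlapping fragments of one chain.

    Assumption: residues left over after the last full fragment are ignored.
    """
    return [k * frag_len for k in range(chain_len // frag_len)]


len_a, len_b = 46, 150              # example chain lengths
l_f = fragment_length(len_a, len_b)
starts_a = fragment_starts(len_a, l_f)
starts_b = fragment_starts(len_b, l_f)
# I x J fragment pairs would be superposed and scored into G and S.
print(l_f, len(starts_a), len(starts_b))  # 8 5 18
```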
2. Assembly of fragments and generation of seed alignments
In order to generate different seed alignments, the fragments are assembled into a large consistent set. For this, we employ dynamic programming (DP) with no end gap penalty to align fragments using the scoring matrices derived from the fragment comparison in the first step. We observed that the optimal fragment alignment does not usually result in the structural alignment with the highest TM-score. Hence, in addition to the optimal alignment obtained using DP, we generate three suboptimal alignments using a variation of the Waterman and Eggert algorithm. The generation of suboptimal alignments involves re-computing the forward trace matrix to find the optimal path, with the modification that the similarity score of the previously aligned positions is not used in the calculation of the forward trace matrix. The optimal path alignment is obtained by back tracing from the last column/row of this forward trace matrix, with no end gap penalty. Other variations of the DP algorithm to generate the suboptimal alignments would have been equally applicable.
We have used two scoring matrices for DP, G(I, J) and (G(I, J)+S(I, J))/2, along with three different gap opening penalties of -0.6, -0.1 and 0.0, which are chosen empirically, to generate various fragment alignments. The fragment alignment, in turn, provides the initial set of equivalent residues between the two protein structures. We could use this initial set of alignments as seed alignments and use heuristic iteration to obtain the best structural alignment. However, we observed that refining the alignment with another round of DP before heuristic iteration results in alignments with an improved TM-score.
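The fragment-level DP with free end gaps can be sketched as follows. This is a simplified illustration, not the Fr-TM-align implementation: it charges the penalty for every gapped position rather than only for gap opening, it returns a single optimal path (no suboptimal alignments), and the function name `align_fragments` is hypothetical.

```python
def align_fragments(score, gap=-0.6):
    """Align fragments with DP, leaving leading and trailing gaps unpenalized.

    score[i][j]: similarity of fragment i of chain A and fragment j of
    chain B (for example G, or (G + S) / 2).
    Returns (best_score, list of aligned (i, j) fragment pairs).
    """
    n, m = len(score), len(score[0])
    # First row and column stay 0.0, so leading gaps cost nothing.
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(h[i - 1][j - 1] + score[i - 1][j - 1],  # pair i, j
                          h[i - 1][j] + gap,                      # gap in B
                          h[i][j - 1] + gap)                      # gap in A
    # Trailing gaps are free: start the traceback at the best cell of the
    # last row or last column.
    cells = [(h[n][j], n, j) for j in range(m + 1)]
    cells += [(h[i][m], i, m) for i in range(n + 1)]
    best, i, j = max(cells)
    pairs = []
    while i > 0 and j > 0:
        if h[i][j] == h[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]


# Fragment 0 of A matches fragment 1 of B, fragment 1 of A matches fragment 2.
g = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(align_fragments(g))  # (2.0, [(0, 1), (1, 2)])
```

Calling the function once per scoring matrix and per gap penalty (-0.6, -0.1, 0.0) would give the different fragment alignments that seed the later refinement.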
The fragment alignment is used to generate initial equivalent residues, which are modified by DP using three different scoring matrices to form the final set of seed alignments. The three scoring matrices are: (1) A distance score matrix generated by rotating one of the structures by the RMSD rotation matrix based on the aligned residues. The RMSD rotation matrix is the matrix that minimizes the distance between pairs of equivalent residues for the two given structures. This provides the best structural superposition (lowest RMSD) of the predefined equivalent residues between the two structures and is used to calculate the distance matrix (as defined in equation 2(a)), which is used in the DP step. (2) A modification of the first scoring matrix based on the identity of the secondary structure assignment of the residues. When the secondary structure assignment of the pair of aligned residues is the same, 0.5 is added to the respective score. (3) A distance score matrix generated using the TM-score rotation matrix based on the aligned residues. Along with these scoring matrices, we have used gap opening penalties of -1.0 and 0.0. This step is repeated for each fragment alignment. Finally, this gives a set of seed alignments, which are refined using the heuristic iteration procedure discussed below. The generation of initial seed alignments is an exhaustive process. However, we can generate fewer seed alignments by using only one gap opening penalty rather than two. This decreases the computational time of the algorithm. We have implemented this in the faster version of the program (see Results section).
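The second scoring matrix can be pictured as a simple transformation of the first. The sketch below assumes the distance score matrix has already been computed elsewhere and that secondary structure is given as one label per residue; the name `add_ss_bonus` is hypothetical.

```python
def add_ss_bonus(score, ss_a, ss_b, bonus=0.5):
    """Return a copy of `score` with `bonus` added wherever residue i of
    chain A and residue j of chain B share the same secondary structure."""
    return [[score[i][j] + (bonus if ss_a[i] == ss_b[j] else 0.0)
             for j in range(len(ss_b))]
            for i in range(len(ss_a))]


base = [[0.25, 0.5],
        [0.75, 1.0]]
print(add_ss_bonus(base, "HE", "HE"))  # [[0.75, 0.5], [0.75, 1.5]]
```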
3. Heuristic Iteration
Here, d_kl is the distance of the kth residue in structure 1 from the lth residue in structure 2 under the TM-score superposition, with L_min the length of the smaller protein. A new alignment can be obtained by implementing DP on the matrix S(k, l), with optimal gap opening penalties of -0.6 and 0.0; these parameters were chosen empirically (see following paragraph). We then superimpose the structures by the TM-score rotation matrix according to the new alignment and obtain a newer alignment by implementing DP with the new score matrix. The procedure is repeated until the alignment becomes stable and the alignment with the highest TM-score is returned.
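The defining equation for S(k, l) is not reproduced in this text. The sketch below assumes the form used in the TM-align literature, S(k, l) = 1 / (1 + d_kl^2 / d_0(L_min)^2) with d_0(L) = 1.24 (L - 15)^(1/3) - 1.8, and should be read as an illustration under that assumption. The coordinates are assumed to be already superposed, the 0.5 Å floor on d_0 for short chains is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def d0(length: int) -> float:
    """Assumed TM-score distance scale; floored at 0.5 for short chains."""
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def score_matrix(coords_a, coords_b):
    """S[k][l] for two lists of superposed C-alpha coordinates (x, y, z)."""
    l_min = min(len(coords_a), len(coords_b))
    scale_sq = d0(l_min) ** 2
    return [[1.0 / (1.0 + math.dist(a, b) ** 2 / scale_sq) for b in coords_b]
            for a in coords_a]


# Toy example: a straight chain compared with itself.
chain = [(3.8 * i, 0.0, 0.0) for i in range(100)]
s = score_matrix(chain, chain)
print(s[0][0])  # identical positions score 1.0
```

In the iteration described above, a matrix of this kind would be recomputed after every new superposition and passed back to the DP step until the alignment no longer changes.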
In all the above mentioned DP procedures, we obtained gap opening penalties with the objective function being the maximization of the average TM-score of the final alignment for all pairs of proteins in the training dataset. For this, we spanned gap penalties from -1.6 to 0.0 with a bin width of 0.1. The gap opening penalty value selected was the one that resulted in the maximum average TM-score of the final alignment for all pairs of proteins in the training dataset.
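The parameter scan described above amounts to a one-dimensional grid search. In the sketch below, `average_tm_score` stands for a hypothetical callable that runs the alignment over the training set with a given gap opening penalty and returns the mean TM-score; it is replaced here by a toy objective so the example is self-contained.

```python
def scan_gap_penalties(average_tm_score, low=-1.6, high=0.0, step=0.1):
    """Return the gap opening penalty in [low, high] that maximizes the
    objective, scanning in increments of `step`."""
    n_steps = round((high - low) / step)
    # Rounding avoids accumulated floating-point drift in the grid values.
    candidates = [round(low + k * step, 10) for k in range(n_steps + 1)]
    return max(candidates, key=average_tm_score)


# Toy objective with its maximum at -0.6 (illustration only).
best = scan_gap_penalties(lambda gap: -(gap + 0.6) ** 2)
print(best)
```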
Alignment quality assessment
Here, L_target is the length of the target protein that other PDB structures are aligned to; L_ali is the number of aligned residues; d_i is the distance between the i-th pair of aligned residues; and d_0(L_target) is a distance scale, which is the average distance between a pair of residues for the best structural alignment in a randomly selected pair of structures of length L_target.
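Because the TM-score equation itself is not reproduced here, the following sketch relies on the definition given in the TM-score literature: the sum of 1 / (1 + (d_i / d_0)^2) over the aligned pairs, divided by L_target, with d_0(L_target) = 1.24 (L_target - 15)^(1/3) - 1.8. It evaluates the score for one fixed superposition only; the maximization over superpositions is not performed. The 0.5 Å floor on d_0 and the function names are assumptions of this sketch.

```python
def d0(l_target: int) -> float:
    """Assumed TM-score distance scale; floored at 0.5 for short chains."""
    if l_target <= 15:
        return 0.5
    return max(1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, l_target: int) -> float:
    """TM-score of one superposition.

    distances: d_i for each of the L_ali aligned residue pairs.
    l_target:  length of the target protein used for normalization.
    """
    scale = d0(l_target)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / l_target


# A perfect match over half of a 100-residue target gives 0.5.
print(tm_score([0.0] * 50, 100))
```

The example shows how coverage enters the score: aligning only half of the target perfectly cannot score above 0.5, which is why lowering the RMSD by dropping residues does not pay off under this metric.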
In addition, we have used other scores for the objective structural alignment quality assessment and for the comparison of structural alignment algorithms. The percentage of structural similarity (PSI) is defined as the number of aligned amino acid pairs with Cα atoms that are closer in space than 4 Å after optimal superposition, normalized by the length of the shorter chain in the alignment. The relevant PSI (rPSI) value excludes fragments shorter than four aligned amino acids from the calculated PSI value. The coordinate RMSD (cRMSD) is computed for all aligned pairs after optimal superposition. The cRMSD (core) is computed for those aligned pairs that contribute to the PSI value. Here, PSI/rPSI provides a more detailed view of the alignment. However, these values are length dependent.
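A minimal sketch of PSI and rPSI follows. It assumes that a "fragment" means a run of consecutive aligned pairs (residue indices increasing by one in both chains) that all lie within the 4 Å cutoff; the text does not spell this out, so that reading, like the function name, is an assumption.

```python
def psi_and_rpsi(aligned, len_a, len_b, cutoff=4.0, min_run=4):
    """Return (PSI, rPSI) in percent.

    aligned: list of (i, j, distance) for aligned residue pairs, sorted by
             i, with distances measured after optimal superposition.
    """
    shorter = min(len_a, len_b)
    close = [(i, j) for i, j, dist in aligned if dist < cutoff]
    # Split the close pairs into runs that are contiguous in both chains.
    runs, current, prev = [], 0, None
    for i, j in close:
        if prev is not None and i == prev[0] + 1 and j == prev[1] + 1:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 1
        prev = (i, j)
    if current:
        runs.append(current)
    psi = 100.0 * len(close) / shorter
    # rPSI keeps only runs of at least `min_run` residues.
    rpsi = 100.0 * sum(r for r in runs if r >= min_run) / shorter
    return psi, rpsi


pairs = [(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0), (3, 3, 1.0), (4, 4, 1.0),
         (6, 7, 2.0), (7, 8, 5.0), (8, 9, 3.0)]
print(psi_and_rpsi(pairs, len_a=10, len_b=20))  # (70.0, 50.0)
```

In the example, the two isolated close pairs raise PSI but are dropped from rPSI, which mirrors the gap between the two measures.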
Results and discussion
In the literature, various structural alignment methods have been evaluated in different ways. Most evaluations use SCOP or CATH as the gold standard and assess the structural alignment based on the fold classifications found in these databases [35, 39–41]. Because the SCOP and CATH classifications are discrete, a drawback of this kind of evaluation is that the detailed alignment quality is not taken into account. Moreover, the SCOP and CATH classifications use evolutionary information in addition to the structure of the protein under consideration. Recent studies [42, 43] have shown that significant structural similarity exists between proteins belonging to different fold families in the CATH and SCOP classifications. Here, we have compared the structural alignments of pairs of structures purely by their geometric match.
Comparison of alignment quality
In the present analysis, we have evaluated the performance of the structural alignment algorithms CE, TM-align and Fr-TM-align on two different datasets (as described in the Methods). We have compared the alignment quality of Fr-TM-align with that of the other structural alignment algorithms using cRMSD, TM-score, PSI and rPSI. These measures of alignment accuracy capture various features of the alignment. For example, PSI counts spatially close residues, whereas rPSI represents core continuous fragments that are spatially close. The TM-score represents a quality measure that combines both alignment coverage and accuracy.
Table 1. Structural alignment by different algorithms for two datasets. Values are averages over all pairs (39,800).
In Table 1, columns 2–6 show the alignment accuracy as measured by various scores, and columns 7–8 show the number of aligned residues and the coverage, averaged over 39,800 protein pairs. Based on cRMSD, the structural alignments from Fr-TM-align have on average better accuracy than those from both CE and TM-align for both datasets. Similarly, cRMSD (core) is on average better for both TM-align and Fr-TM-align than for CE. However, cRMSD (core) shows no difference between Fr-TM-align and TM-align. Based on the alignment accuracy scores PSI and rPSI, Fr-TM-align performs better than both CE and TM-align for both datasets. All three programs show a significantly lower value of rPSI than of PSI. This shows that all three programs achieve a high PSI value in part by finding aligned fragments that are less than four residues in length.
Based on another structural alignment accuracy measure (TM-score), Fr-TM-align resulted in structural alignments with a higher TM-score than both TM-align and CE. A higher TM-score could result from better identification of structurally similar regions, with or without including more residues in the aligned region, or from a decrease in the number of aligned residues. As shown in Table 1, the average coverage and average number of aligned residues are higher for Fr-TM-align than for TM-align and CE for both datasets. Hence, the improvement observed in TM-score by Fr-TM-align mainly results from the better identification of structurally similar regions in the pair of structures. This is also evident from the higher PSI/rPSI values observed for the alignments from Fr-TM-align. This suggests that Fr-TM-align better identifies structurally similar regions with increased accuracy and higher coverage. Next, we tested the statistical significance of the observed increase in the mean TM-score, PSI and rPSI for Fr-TM-align in comparison to TM-align and CE, using an unpaired t-test for independent samples. The improvement in the mean TM-score, PSI and rPSI for Fr-TM-align is found to be statistically significant (p-value << 0.001).
While the data in Table 1 are averaged over all structure pairs, where the majority have different folds and low TM-score, a more realistic assessment is to evaluate the performance of different methods for the most significant match to a given target protein. Various methods employ different metrics of structural similarity: CE uses the CE z-score, whereas TM-align and Fr-TM-align use the TM-score. In the present analysis, we used both measures of structural similarity, the CE z-score and TM-score, to extract the most significant structurally similar protein pairs for the evaluation of the three methods.
Table 2. Comparison of the best structural alignment (alignment with the highest CE z-score) from CE. Values are averages over the pairs with the best z-score from CE (200 proteins for each dataset).
Table 3. Comparison of the best structural alignment (alignment with the maximum TM-score) from Fr-TM-align. Values are averages over the pairs with the best TM-score from Fr-TM-align (200 proteins for each dataset).
For a more detailed analysis, we have mostly used the TM-score to compare the alignments from TM-align and Fr-TM-align. As discussed before, the TM-score provides an appropriate combined quality measure of coverage and accuracy (low RMSD). In addition, we have previously shown that the TM-score also has the strongest correlation with foldability using MODELLER, in comparison with other structural similarity scores.
Comparison of TM-align and Fr-TM-align
Next, we have analyzed the relative improvement in the TM-score by Fr-TM-align. The TM-score difference (dTM) is defined as (TM-score (Fr-TM-align) – TM-score (TM-align)). The exact value of dTM that indicates a significant change in the alignment quality is difficult to quantify, because the numerical value of the TM-score has a contribution from the mean square distance between residues as well as from the number of aligned residues. The origin of a given dTM value depends on the TM-score as well. For example, an increase of 0.05 (dTM) in the TM-score from an initial value of 0.85 most likely arises because alignments with lower RMSD at identical coverage are generated, whereas an increase from an initial TM-score of 0.45 to 0.5 most likely reflects both a decrease in RMSD and an increase in alignment coverage. Since we are interested in detecting more distant similarities, most TM-scores of interest are below 0.6. We have therefore empirically considered a dTM ≥ 0.05 to be a significant improvement and a dTM between 0 and 0.05 to be not so significant; these thresholds are used only for the heuristic comparison of the performance of TM-align and Fr-TM-align.
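The dTM thresholds above translate into a small helper. The function name and the label for dTM ≤ 0 are assumptions of this sketch. The difference is rounded to four decimals before comparison, because a plain floating-point subtraction such as 0.50 - 0.45 yields a value marginally below 0.05.

```python
def classify_dtm(tm_fr: float, tm_tm: float, threshold: float = 0.05) -> str:
    """Classify the TM-score difference between Fr-TM-align and TM-align."""
    # Round to 4 decimals so that 0.50 - 0.45 counts as exactly 0.05.
    dtm = round(tm_fr - tm_tm, 4)
    if dtm >= threshold:
        return "significant improvement"
    if dtm > 0:
        return "not so significant"
    return "no improvement"  # label chosen for this sketch


print(classify_dtm(0.50, 0.45))  # significant improvement
print(classify_dtm(0.47, 0.45))  # not so significant
```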
On average, Fr-TM-align takes ~4.5 seconds of CPU time per structure pair on a single core of a 2 GHz AMD Opteron processor. This is ~12 times slower than TM-align. The speed of the program might limit its application for performing structural alignments on a very large scale, but with the increasing number of cores per processor, this is not likely to be a practical impediment. To increase the speed of Fr-TM-align, we have implemented a slightly faster version of the algorithm, which essentially scans a smaller number of initial seed alignments with the heuristic iteration (as discussed in Methods). This is provided as an option at program execution. On average, for all structure pairs compared, Fr-TM-align with the fast option takes ~2.8 seconds, with a final average (standard deviation) TM-score of 0.277 (0.093). This is comparable to the average (standard deviation) TM-score of 0.279 (0.094) obtained using the slower and more sensitive version of Fr-TM-align. The faster version of the algorithm results in a TM-score improvement of ~8.6% in comparison to TM-align and is ~7 times slower than TM-align. It could potentially be used for large scale scanning to detect remote structural similarities. The slower version provides slightly better structural alignments for some structure pairs. The source code and executables of Fr-TM-align are available at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/.
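The timing and accuracy figures quoted above can be cross-checked with a few lines of arithmetic. All inputs are the approximate values reported in the text, so the derived numbers are only indicative.

```python
full_time = 4.5       # s per pair, full Fr-TM-align (reported)
fast_time = 2.8       # s per pair, fast option (reported)
full_slowdown = 12.0  # full version relative to TM-align (reported)
tm_full, tm_fast = 0.279, 0.277  # mean TM-scores (reported)
fast_gain = 0.086     # fast version vs TM-align (reported)

# Implied TM-align time per pair and slowdown of the fast option.
tm_align_time = full_time / full_slowdown
fast_slowdown = fast_time / tm_align_time

# Speed-up of the fast option and its relative TM-score loss.
speedup = full_time / fast_time
tm_loss = (tm_full - tm_fast) / tm_full

# Implied TM-align mean TM-score and gain of the full version.
tm_align_mean = tm_fast / (1.0 + fast_gain)
full_gain = tm_full / tm_align_mean - 1.0

print(f"TM-align time per pair: {tm_align_time:.3f} s")
print(f"fast option slowdown:   {fast_slowdown:.1f}x")
print(f"fast option speed-up:   {speedup:.2f}x")
print(f"TM-score loss:          {100 * tm_loss:.1f}%")
print(f"full version gain:      {100 * full_gain:.1f}%")
```

The derived values (about 0.375 s per pair for TM-align, a slowdown of roughly 7.5 for the fast option, a speed-up of about 1.6, a TM-score loss under 1% and a gain of roughly 9% for the full version) agree with the rounded figures in the text.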
We have developed an improved structural alignment algorithm, Fr-TM-align, that uses a fragment alignment and assembly approach to generate initial seed alignments which, when refined, yield more significant structural alignments than those provided by the original TM-align program. We have compared Fr-TM-align with other competing structural alignment algorithms, TM-align and CE, using various alignment quality assessment scores such as PSI and the TM-score. The evaluation shows that Fr-TM-align performs better than both TM-align and CE. On average, Fr-TM-align results in an improved TM-score (by ~9%) and increased coverage (by ~7%) in comparison to TM-align. A more detailed comparison between Fr-TM-align and TM-align shows that alignments from Fr-TM-align have an improved TM-score for ~86% of protein pairs. The improvement in TM-score by Fr-TM-align is observed for all lengths of protein pairs and over all TM-score (obtained from TM-align) ranges. However, Fr-TM-align achieves this higher accuracy and coverage at the expense of longer computation time. On average, Fr-TM-align is ~12 times slower than TM-align. Nevertheless, the ability of Fr-TM-align to detect more subtle structural similarities is a more desirable attribute. For more practical applications of Fr-TM-align, we have implemented a faster version of the algorithm that reduces the computation time by a factor of ~1.6, with a decrease in the final TM-score of only ~1% relative to the slower version of the algorithm.
This research was supported in part by NIH grant No. GM-48845.
1. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540.
2. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH–a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–1108.
3. Holm L, Sander C: The FSSP database: fold classification based on structure-structure alignment of proteins. Nucleic Acids Res 1996, 24(1):206–209.
4. Moult J, Fidelis K, Zemla A, Hubbard T: Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins 2003, 53(Suppl 6):334–339.
5. Sanchez R, Sali A: Evaluation of comparative protein structure modeling by MODELLER-3. Proteins 1997, (Suppl 1):50–58.
6. Akutsu T, Sim KL: Protein Threading Based on Multiple Protein Structure Alignment. Genome Inform Ser Workshop Genome Inform 1999, 10: 23–29.
7. Vogt G, Etzold T, Argos P: An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J Mol Biol 1995, 249(4):816–831.
8. Standley DM, Eyrich VA, An Y, Pincus DL, Gunn JR, Friesner RA: Protein structure prediction using a combination of sequence-based alignment, constrained energy minimization, and structural alignment. Proteins 2001, (Suppl 5):133–139.
9. Skolnick J, Fetrow JS, Kolinski A: Structural genomics and its importance for gene function analysis. Nat Biotechnol 2000, 18(3):283–287.
10. Lisewski AM, Lichtarge O: Rapid detection of similarity in protein structure and function through contact metric distances. Nucleic Acids Res 2006, 34(22):e152.
11. Baker D, Sali A: Protein structure prediction and structural genomics. Science 2001, 294(5540):93–96.
12. Berman HM: The Protein Data Bank: a historical perspective. Acta Crystallogr A 2008, 64(Pt 1):88–95.
13. Skolnick J, Kihara D, Zhang Y: Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm. Proteins 2004, 56(3):502–518.
14. Lathrop RH: The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Eng 1994, 7(9):1059–1068.
15. Sierk ML, Kleywegt GJ: Deja vu all over again: finding and analyzing protein structure similarities. Structure 2004, 12(12):2103–2111.
16. Koehl P: Protein structure similarities. Curr Opin Struct Biol 2001, 11(3):348–353.
17. Taylor WR, Orengo CA: Protein structure alignment. J Mol Biol 1989, 208(1):1–22.
18. Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol 1993, 233(1):123–138.
19. Alexandrov NN, Takahashi K, Go N: Common spatial arrangements of backbone fragments in homologous and non-homologous proteins. J Mol Biol 1992, 225(1):5–9.
20. Orengo CA, Taylor WR: SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol 1996, 266: 617–635.
21. Vriend G, Sander C: Detection of common three-dimensional substructures in proteins. Proteins 1991, 11(1):52–58.
22. Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005, 33(7):2302–2309.
23. Zhu J, Weng Z: FAST: a novel protein structure alignment algorithm. Proteins 2005, 58(3):618–627.
24. Ye Y, Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, 19(Suppl 2):ii246–255.
25. Russell RB, Barton GJ: Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins 1992, 14(2):309–323.
26. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11(9):739–747.
27. Oldfield TJ: CAALIGN: a program for pairwise and multiple protein-structure alignment. Acta Crystallogr D Biol Crystallogr 2007, 63(Pt 4):514–525.
28. Ortiz AR, Strauss CE, Olmea O: MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 2002, 11(11):2606–2621.
29. Alesker V, Nussinov R, Wolfson HJ: Detection of non-topological motifs in protein structures. Protein Eng 1996, 9(12):1103–1119.
30. Alexandrov NN, Fischer D: Analysis of topological and nontopological structural similarities in the PDB: new examples with old structures. Proteins 1996, 25(3):354–365.
31. Zhang Y, Skolnick J: Scoring function for automated assessment of protein structure template quality. Proteins 2004, 57(4):702–710.
32. Siew N, Elofsson A, Rychlewski L, Fischer D: MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics 2000, 16(9):776–785.
33. Levitt M, Gerstein M: A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci USA 1998, 95(11):5913–5920.
34. Zhang Y, Hubner IA, Arakaki AK, Shakhnovich E, Skolnick J: On the origin and highly likely completeness of single-domain protein structures. Proc Natl Acad Sci USA 2006, 103(8):2605–2610.
35. Teichert F, Bastolla U, Porto M: SABERTOOTH: protein structural alignment based on a vectorial structure representation. BMC Bioinformatics 2007, 8(1):425.
36. Waterman MS, Eggert M: A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. J Mol Biol 1987, 197(4):723–728.
37. Shibuya T, Imai H: Enumerating suboptimal alignments of multiple biological sequences efficiently. Pac Symp Biocomput 1997, 409–420.
38. Kolodny R, Koehl P, Levitt M: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol 2005, 346(4):1173–1188.
39. Leplae R, Hubbard TJ: MaxBench: evaluation of sequence and structure comparison methods. Bioinformatics 2002, 18(3):494–495.
40. Novotny M, Madsen D, Kleywegt GJ: Evaluation of protein fold comparison servers. Proteins 2004, 54(2):260–270.
41. Sierk ML, Pearson WR: Sensitivity and selectivity in protein structure comparison. Protein Sci 2004, 13(3):773–785.
42. Yang AS, Honig B: An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance. J Mol Biol 2000, 301(3):665–678.
43. Kihara D, Skolnick J: The PDB is a covering set of small protein structures. J Mol Biol 2003, 334(4):793–802.
44. Sali A, Blundell TL: Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 1993, 234(3):779–815.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.