- Research article
- Open Access
The meaning of alignment: lessons from structural diversity
© Pirovano et al; licensee BioMed Central Ltd. 2008
- Received: 20 August 2008
- Accepted: 23 December 2008
- Published: 23 December 2008
Protein structural alignment provides a fundamental basis for deriving principles of functional and evolutionary relationships. It is routinely used for structural classification and functional characterization of proteins and for the construction of sequence alignment benchmarks. However, the available techniques do not fully consider the implications of protein structural diversity and typically generate a single alignment between sequences.
We have taken alternative protein crystal structures and generated simulation snapshots to explicitly investigate the impact of structural changes on the alignments. We show that structural diversity has a significant effect on structural alignment. Moreover, we observe alignment inconsistencies even for modest spatial divergence, implying that the biological interpretation of alignments is less straightforward than commonly assumed. A salient example is the GroES 'mobile loop' where sub-Ångstrom variations give rise to contradictory sequence alignments.
A comprehensive treatment of ambiguous alignment regions is crucial for further development of structural alignment applications and for the representation of alignments in general. For this purpose we have developed an on-line database containing our data and new ways of visualizing alignment inconsistencies, which can be found at http://www.ibi.vu.nl/databases/stralivari.
- Root Mean Square Deviation
- Structural Alignment
- Reference Structure
- Reference Alignment
- Alignment Region
Sequence comparison has become a major tool for biological research in the post-genomic era, forming the basis for functional annotation, classification, and analysis of evolutionary relationships. At the residue level, however, the relation between sequence, structure and function can often be obscure, and examples abound of proteins with a clear functional and homologous relationship but sharing negligible similarity at the sequence level.
Structural alignment is therefore the method of choice for reliable homology assessment and for derived features such as functional classification and phylogeny. This importance is reflected in the number of tools available for structural alignment, such as DALI, SSAP, STRUCTAL, MAMMOTH, CE and COMPARER (for recent reviews on the topic, see Kolodny et al. and Mayr et al.). Databases for functional classification such as CATH, FSSP and PASS2 each derive directly from the use of one or more of these methods, whereas for SCOP expert input in the structural classification is deemed critical. Structural alignments are also routinely used for benchmarking sequence alignment methods. A number of databases have been developed for this purpose, among which BAliBASE, HOMSTRAD and SABmark are widely used. These databases often rely on expert knowledge and include a notion of 'core blocks', i.e. regions where alignment ambiguity does not occur and which can hence be trusted. The general problem of uncertainty in sequence alignment has recently been discussed by Wong et al. Because non-trivial alignment regions are complex to interpret, they are often omitted from large-scale evolutionary analyses, even though there is ample evidence for their fundamental importance [16, 17]. One approach to pinpointing alignment ambiguity is the generation of ensembles of suboptimal alignments, but computational demands remain prohibitive for genome-wide studies.
Our main results show that in many cases structural variation strongly affects structural alignments, even for highly similar sequences. Moreover, the derived alignment appears to be highly sensitive to even small conformational changes of the proteins. The uncertainty in pairing up structurally equivalent residues makes it difficult to determine which alignment alternative most closely describes the functional relationship between the proteins. To address this issue, we show how alternative alignment visualizations may be used to exploit the information contained within variable alignment regions.
Structural diversity and alignment stability
Although alignment uncertainty has been shown to have a great impact on large-scale sequence analysis [16, 17], its relation to structural variation has not been widely explored. This is remarkable given that structural alignments are generally employed to benchmark sequence alignment methods. We demonstrate that in many cases structural alignments can vary dramatically even for small structural changes. Trends observed in the set of crystal structures corroborate those observed in the set of simulation snapshots, although alignment differences in the latter set are more pronounced because the structural variations are larger.
A depository for alignment variability
It is questionable whether a single reference alignment captures the full breadth of naturally occurring sequence variability. Yet current visualization and alignment methods are not designed to take variable regions into account, and such regions are typically ignored in sequence alignment benchmark protocols. Since variable regions are often important structurally and/or functionally, new approaches for visualization, alignment and benchmarking are desirable.
To this end we have constructed a database of 'flexible' reference alignments. This database is available online at http://www.ibi.vu.nl/databases/stralivari and contains all structures and alignments used in this study. For each alignment in our database, variation is visualized using alignment matrices and consistency plots as shown in Figure 4B. In addition, the database contains the ensemble 'master-slave' alignments as shown in Figure 4A. Together these views pinpoint alignment regions that are affected by variability.
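One way to picture a consistency plot is as a per-residue agreement profile over an ensemble of alternative alignments. The sketch below is only an illustration under stated assumptions: alignments are represented as pairs of gapped strings, the function names `pairing_map` and `consistency_profile` are hypothetical, and the measure (the fraction of ensemble alignments that pair a residue of the first protein exactly as the reference alignment does) is an assumed definition rather than the one used to build the database.

```python
def pairing_map(row_a, row_b, gap="-"):
    """Map each residue index of sequence A to its partner index in B.

    A residue of A that faces a gap is mapped to None.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    mapping, i, j = {}, 0, 0
    for a, b in zip(row_a, row_b):
        if a != gap:
            mapping[i] = j if b != gap else None
            i += 1
        if b != gap:
            j += 1
    return mapping


def consistency_profile(reference, ensemble):
    """Per-residue fraction of ensemble alignments agreeing with the reference.

    `reference` is a (row_a, row_b) pair and `ensemble` a non-empty list of
    such pairs; all alignments are assumed to contain the same sequence A.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one alignment")
    ref = pairing_map(*reference)
    maps = [pairing_map(*aln) for aln in ensemble]
    return [
        sum(m.get(i) == ref[i] for m in maps) / len(maps)
        for i in sorted(ref)
    ]


# Toy example (hypothetical sequences): the third residue is paired
# differently in one of the two ensemble alignments.
reference = ("ABCD", "ABCD")
ensemble = [("ABCD", "ABCD"), ("AB-CD", "ABC-D")]
profile = consistency_profile(reference, ensemble)
```

Residues with a value well below 1 in such a profile would mark the variable alignment regions discussed above.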
Structural variation, as presented here by alternative crystal structures and molecular dynamics simulations, has a profound effect on structural alignment. The sensitivity to structural variation is a bottleneck for the effective application of structural alignment approaches. This undermines the current basis of all sequence alignment methodologies and is an underestimated problem for the homology assessment used in structural and functional classification. The GroES 'mobile loop' example demonstrates how functionally essential protein regions can coincide with variable structural alignment segments. Our database should therefore be useful for alignment verification and delineation of functionally important protein regions.
The HOMSTRAD database of homologous structure alignments was used as a source to select homologous proteins with known structure. HOMSTRAD families containing two homologous proteins (A and B in Figure 2) were selected. The corresponding structures were retrieved from the PDB and taken as reference. For each reference structure, after equilibration, molecular dynamics simulations were performed for up to 10 ns, and snapshot structures were stored every 1 ns. Standard solvated conditions in the GROMOS 43a1 force field and the GROMACS simulation package were used (details are summarized in additional file 2). In addition, for each reference structure, we retrieved all alternative PDB structures with 100% sequence identity. In the subsequent analysis only the residues corresponding to the HOMSTRAD sequences were used.
From each pair of reference HOMSTRAD structures, we constructed reference alignments with the widely used structural alignment tool DALI. We also used DALI to create pairwise alignments between each reference structure and the alternatives of the other reference structure (PDB and snapshots). The differences between the resulting sequence alignments were calculated using Sum-of-Pairs (SP) scoring as implemented in the BAliBASE alignment comparison tool. SP scores range from 0 (no aligned residue pairs in common) to 1 (identical sequence alignments). Finally, we calculated the root mean square deviation (RMSD) between the Cα atoms of the alternative structures and their reference structure using the McLachlan algorithm as implemented in the program ProFit version 2.5.3.
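The two measures described above can be sketched in a few lines. This is a minimal illustration, not the BAliBASE tool or ProFit: the function names are hypothetical, alignments are assumed to be pairs of gapped strings, the SP score is taken as the fraction of reference residue pairs reproduced in the test alignment, and the RMSD uses the Kabsch SVD formulation instead of the McLachlan algorithm (both seek the optimal rigid superposition, so they should agree up to numerical precision).

```python
import numpy as np


def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs matched in an alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs also present in the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


def superposed_rmsd(coords_p, coords_q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Rows are assumed to be equivalent atoms (e.g. C-alpha) in the same order.
    """
    p = np.asarray(coords_p, dtype=float)
    q = np.asarray(coords_q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove the translational component by centring both sets.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the optimum would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = (np.sum(p ** 2) + np.sum(q ** 2) - 2.0 * s.sum()) / len(p)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off
```

For instance, `sp_score(("AB-CD", "ABC-D"), ("ABCD", "ABCD"))` compares a toy alternative alignment against a toy reference; in the setting above, the reference would be the DALI alignment of the two HOMSTRAD structures and the test alignment one obtained with a snapshot or alternative crystal structure.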
Our final database consists of 496 proteins (divided over 341 families) for which 3309 snapshot structures could be made and 565 proteins (divided over 395 families) for which we found in total 2998 alternative crystal structures with redundant sequences. A full list of all aligned structures and relevant details is provided in additional file 3.
We would like to thank Sander W. Timmer and Anneke van der Reijden for development of the data-analysis scripts and Bernd W. Brandt for the set of redundant protein structures. Financial support was provided by the Netherlands Bioinformatics Centre, BioRange Bioinformatics research programmes SP 3.2.2 and SP 2.3.1.
- Holm L, Park J: DaliLite workbench for protein structure comparison. Bioinformatics 2000, 16(6):566–567. 10.1093/bioinformatics/16.6.566
- Taylor WR, Orengo CA: Protein structure alignment. J Mol Biol 1989, 208(1):1–22. 10.1016/0022-2836(89)90084-3
- Gerstein M, Levitt M: Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins. Protein Sci 1998, 7(2):445–456.
- Lupyan D, Leo-Macias A, Ortiz AR: A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics 2005, 21(15):3255–3263. 10.1093/bioinformatics/bti527
- Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 1998, 11(9):739–747. 10.1093/protein/11.9.739
- Sali A, Blundell TL: Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J Mol Biol 1990, 212(2):403–428. 10.1016/0022-2836(90)90134-8
- Kolodny R, Koehl P, Levitt M: Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol 2005, 346(4):1173–1188. 10.1016/j.jmb.2004.12.032
- Mayr G, Domingues FS, Lackner P: Comparative analysis of protein structure alignments. BMC Struct Biol 2007, 7: 50. 10.1186/1472-6807-7-50
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH – a hierarchic classification of protein domain structures. Structure 1997, 5(8):1093–1108. 10.1016/S0969-2126(97)00260-8
- Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G: A database of protein structure families with common folding motifs. Protein Sci 1992, 1(12):1691–1698.
- Bhaduri A, Pugalenthi G, Sowdhamini R: PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics 2004, 5: 35. 10.1186/1471-2105-5-35
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247(4):536–540.
- Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15(1):87–88. 10.1093/bioinformatics/15.1.87
- Mizuguchi K, Deane CM, Blundell TL, Overington JP: HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci 1998, 7(11):2469–2471.
- van Walle I, Lasters I, Wyns L: SABmark – a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics 2005, 21(7):1267–1268. 10.1093/bioinformatics/bth493
- Wong KM, Suchard MA, Huelsenbeck JP: Alignment uncertainty and genomic analysis. Science 2008, 319(5862):473–476. 10.1126/science.1151532
- Rokas A: Genomics. Lining up to avoid bias. Science 2008, 319(5862):416–417. 10.1126/science.1153156
- Godzik A: The structural alignment between two proteins: is there a unique answer? Protein Sci 1996, 5(7):1325–1338.
- Ye Y, Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, 19(Suppl 2):ii246–255.
- Shatsky M, Nussinov R, Wolfson HJ: A method for simultaneous alignment of multiple protein structures. Proteins 2004, 56(1):143–156. 10.1002/prot.10628
- Menke M, Berger B, Cowen L: Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol 2008, 4(1):e10. 10.1371/journal.pcbi.0040010
- Mosca R, Schneider TR: RAPIDO: a web server for the alignment of protein structures in the presence of conformational changes. Nucleic Acids Res 2008, 36(Web Server issue):W42–46. 10.1093/nar/gkn197
- Maiorov VN, Crippen GM: Size-independent comparison of protein three-dimensional structures. Proteins 1995, 22(3):273–283. 10.1002/prot.340220308
- Xu Z, Horwich AL, Sigler PB: The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex. Nature 1997, 388(6644):741–750. 10.1038/41944
- Maizel JV Jr, Lenk RP: Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci USA 1981, 78(12):7665–7669. 10.1073/pnas.78.12.7665
- Zuker M: Suboptimal sequence alignment in molecular biology. Alignment with error analysis. J Mol Biol 1991, 221(2):403–420. 10.1016/0022-2836(91)80062-Y
- Notredame C: Recent evolutions of multiple sequence alignment algorithms. PLoS Comput Biol 2007, 3(8):e123. 10.1371/journal.pcbi.0030123
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28(1):235–242. 10.1093/nar/28.1.235
- Hunenberger PH, Mark AE, van Gunsteren WF: Fluctuation and cross-correlation analysis of protein motions observed in nanosecond molecular dynamics simulations. J Mol Biol 1995, 252(4):492–503. 10.1006/jmbi.1995.0514
- Lindahl E, Hess B, van der Spoel D: GROMACS 3.0: a package for molecular simulation and trajectory analysis. J Mol Model 2001, 7(8):306–317.
- McLachlan A: Rapid comparison of protein structures. Acta Cryst 1982, A38: 871–873.
- Clamp M, Cuff J, Searle SM, Barton GJ: The Jalview Java alignment editor. Bioinformatics 2004, 20(3):426–427. 10.1093/bioinformatics/btg430
- Kaplan W, Littlejohn TG: Swiss-PDB Viewer (Deep View). Brief Bioinform 2001, 2(2):195–197. 10.1093/bib/2.2.195
- Persistence of Vision (TM) Raytracer [http://www.povray.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.