Amino-Acid Substitutions In Membrane Proteins: Applications To Homology Recognition And Comparative Modelling
© Author(s); licensee BioMed Central Ltd. 2005
Published: 21 September 2005
Structural genomics initiatives are expected to make experimental 3D data available for most proteins in nature in the near future. Yet the class of transmembrane (TM) proteins represents a major obstacle to this goal, because their physicochemical properties make them extremely difficult to crystallize for X-ray crystallography and hardly tractable for NMR spectroscopy. Consequently, computational methods for predicting 3D structures are highly valuable. Comparative (homology) modelling remains the most effective approach to protein structure prediction, because it takes advantage of experimental structures that are already available as templates to build 3D models of a related protein of interest. The tools that search for structural templates therefore need to be highly accurate. Many algorithms have been developed to increase the sensitivity and specificity of homology recognition for globular proteins, many of which exploit evolutionary and structural information. However, they may not be generally applicable to TM proteins, which have different structural features, amino acid composition and substitution rates. Thus, TM-specific algorithms are much needed. Our aim is to develop a sequence-structure homology recognition method that can use environment-specific substitution tables and structure-dependent gap penalties to (1) increase the accuracy of alignments involving TM protein sequences and structures and (2) improve the specificity and sensitivity of homology searches for TM proteins.
First, we have generated a highly curated set of structure-based alignments of TM protein structures, from which environment-specific amino acid substitutions are derived. We started by making an exhaustive search for all available high-resolution TM structures. These have either alpha-helical or beta-barrel folds. We did not include structures with the latter fold, since these are very few in number and most likely follow different folding rules. A large number of TM proteins are complexes of multiple domains, with only certain regions spanning the membrane. In order to distinguish these regions from the globular ones, we developed a geometry optimisation method, PDB2TMD, which searches for the most probable location of the membrane that spans any given TM protein structure.
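The idea of searching for the most probable membrane location can be illustrated with a deliberately simplified sketch. It is not PDB2TMD, whose objective function is not described here. It assumes that the structure has already been oriented with the membrane normal along the z axis, that the membrane is a slab of fixed thickness (30 Å is an assumed default), and that a crude hydrophobic/non-hydrophobic residue classification is enough to score a slab position. All names are hypothetical.

```python
# Assumed, crude set of hydrophobic residue types (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")

def best_slab_centre(residues, thickness=30.0, step=1.0):
    """Scan slab centres along z and return (centre, score) for the best one.

    residues: non-empty list of (one_letter_code, z_coordinate) tuples.
    The score of a slab is the number of hydrophobic residues inside it
    minus the number of other residues inside it.
    """
    zs = [z for _, z in residues]
    z_max = max(zs)
    centre = min(zs)
    best = None
    while centre <= z_max:
        lo, hi = centre - thickness / 2, centre + thickness / 2
        score = sum(
            1 if aa in HYDROPHOBIC else -1
            for aa, z in residues
            if lo <= z <= hi
        )
        # Keep the first slab that reaches the highest score.
        if best is None or score > best[1]:
            best = (centre, score)
        centre += step
    return best
```

A full treatment would also optimise the orientation of the membrane normal and use solvent or lipid accessibility rather than residue type alone; the scan above only shows the shape of such a search.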
Next, we used domain definitions and the structural-evolutionary classification of protein structures from two databases: SCOP and HOMSTRAD [5, 9]. We gathered 795 TM domains derived from 226 structures and grouped them into 65 homologous families. To remove redundant structures, we developed MKNEWFAM2. For a given family, this automatic procedure clusters sequences at a 90% threshold and selects representatives by favouring the domains that are closest to native and have the highest resolution. After this filtering step, 129 domains remain, making up 26 multi-member families and 39 single-member families. Structure-based alignments were then generated for each family using COMPARER, and the structural environments of the residues in these alignments were annotated using JOY.
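The redundancy-filtering step can be sketched as follows. This is not the MKNEWFAM procedure itself. It assumes a simple greedy clustering, interprets the 90% threshold as percentage sequence identity computed over aligned, ungapped columns, and uses resolution alone to choose representatives (the "closest to native" criterion is left out). The `Domain` record and both function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    aligned_seq: str   # row from a family alignment, '-' marks a gap
    resolution: float  # in angstroms; lower is better

def percent_identity(a, b):
    """Percent identity over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def select_representatives(domains, threshold=90.0):
    """Greedy filter: visit domains from best to worst resolution and keep
    a domain only if it is below `threshold` percent identity to every
    representative already kept."""
    representatives = []
    for dom in sorted(domains, key=lambda d: d.resolution):
        if all(
            percent_identity(dom.aligned_seq, rep.aligned_seq) < threshold
            for rep in representatives
        ):
            representatives.append(dom)
    return representatives
```

Because domains are visited in order of resolution, each cluster is represented by its best-resolved member under these assumptions.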
In order to account for the biased distribution of structures among the families, we enriched each family with sequence information. PSI-BLAST was used to search the UNIREF100 protein sequence database for close homologues (E-value cut-off 10⁻⁶), and only sequences predicted by TMHMM to contain all expected TM helices were retained. The members of an average family share 20–30% sequence identity (PID).
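The two retention criteria can be expressed as a simple filter. The sketch below does not run PSI-BLAST or TMHMM. It assumes their outputs have already been parsed into records holding an E-value and a predicted helix count, and it reads "all expected TM helices" as "at least the expected number", which is one possible interpretation. The `Hit` record and the function name are hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    seq_id: str
    evalue: float              # E-value reported by the homology search
    predicted_tm_helices: int  # helix count from a topology predictor

def filter_homologues(hits, expected_helices, evalue_cutoff=1e-6):
    """Keep hits that pass the E-value cut-off and are predicted to
    contain at least the expected number of TM helices (assumed reading)."""
    return [
        hit
        for hit in hits
        if hit.evalue <= evalue_cutoff
        and hit.predicted_tm_helices >= expected_helices
    ]
```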
The patterns of amino acid substitution derived from the constructed dataset have led to a number of findings that are of particular relevance to TM helix packing: (1) lipid-tail-accessible TM residues tend to be more hydrophobic and less conserved, and to comprise different residue types, compared with buried residues; (2) charged residues are not always buried and, when accessible to membrane lipid tails, they often interact with phospholipid headgroups or with other residue types, and few pair with another charge; and (3) residues that are lipid-accessible or located at the interface between different TM chains are more variable than those buried in the cores of individual chains. This suggests that helix-helix interactions within the same chain and those at the interface between different chains may arise differently. Substitution tables that take residue environments into account are being calculated and incorporated as scoring matrices into FUGUE, a homology search program we developed previously. Benchmarking is being carried out to examine the quality of the alignments and the gain in sensitivity and specificity that this may offer to homology searches for TM proteins.
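How an environment-specific substitution table might be derived from annotated structure-based alignments can be sketched in a few lines. This is a minimal illustration, not the procedure used for FUGUE. It assumes one environment label per residue (for example "lipid" or "buried", both hypothetical labels), counts every ordered pair of aligned residues, and converts the counts to log-odds scores with a flat pseudocount. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

def count_substitutions(alignment, environments):
    """Count aligned residue pairs per environment.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    environments: one list of per-column labels for each sequence.
    Returns counts[env][(a, b)]: how often residue a, in environment env,
    is aligned with residue b in another family member.
    """
    counts = defaultdict(Counter)
    for i, j in permutations(range(len(alignment)), 2):
        for a, b, env in zip(alignment[i], alignment[j], environments[i]):
            if a != "-" and b != "-":
                counts[env][(a, b)] += 1
    return counts

def log_odds(counts, env, a, b, alphabet, pseudocount=1.0):
    """log2 of P(b | a, env) over the background frequency of b,
    both smoothed with a flat pseudocount (an assumed choice)."""
    row_total = sum(counts[env][(a, x)] for x in alphabet)
    p_cond = (counts[env][(a, b)] + pseudocount) / (
        row_total + pseudocount * len(alphabet)
    )
    all_pairs = sum(sum(c.values()) for c in counts.values())
    b_total = sum(c[(x, b)] for c in counts.values() for x in alphabet)
    p_background = (b_total + pseudocount) / (
        all_pairs + pseudocount * len(alphabet)
    )
    return math.log2(p_cond / p_background)
```

A positive score means that the replacement is seen more often in that environment than expected from the background, so the same pair of residues can score differently at a lipid-exposed position and at a buried one.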
- Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 2004, 32(Database): D226–229. 10.1093/nar/gkh039
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LS: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, 33: D154–159. 10.1093/nar/gki070
- Chen CP, Rost B: Long membrane helices and short loops predicted less accurately. Protein Sci 2002, 11: 2766–2773. 10.1110/ps.0214602
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305: 567–580. 10.1006/jmbi.2000.4315
- Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP: JOY: protein sequence-structure representation and analysis. Bioinformatics 1998, 14: 617–623. 10.1093/bioinformatics/14.7.617
- Mizuguchi K: Fold recognition for drug discovery. Drug Discovery Today 2004, 3: 18–23. 10.1016/S1741-8372(04)02392-8
- Sali A, Blundell TL: Definition of general topological equivalence in protein structures. A procedure involving comparison of properties and relationships through simulated annealing and dynamic programming. J Mol Biol 1990, 212: 403–428. 10.1016/0022-2836(90)90134-8
- Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310: 243–257. 10.1006/jmbi.2001.4762
- Stebbings LA, Mizuguchi K: HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database. Nucleic Acids Res 2004, 32(Database): D203–207. 10.1093/nar/gkh027
- Williams MG, Shirai H, Shi J, Nagendra HG, Mueller J, Mizuguchi K, Miguel RN, Lovell SC, Innis CA, Deane CM, Chen L, Campillo N, Burke DF, Blundell TL, de Bakker PI: Sequence-structure homology recognition by iterative alignment refinement and comparative modeling. Proteins 2001, (Suppl 5): 92–97. 10.1002/prot.1169