- Research article
- Open Access
MPRAP: An accessibility predictor for α-helical transmembrane proteins that performs well inside and outside the membrane
© Illergård et al; licensee BioMed Central Ltd. 2010
- Received: 25 October 2009
- Accepted: 18 June 2010
- Published: 18 June 2010
In water-soluble proteins it is energetically favorable to bury hydrophobic residues and to expose polar and charged residues. In contrast to water-soluble proteins, transmembrane proteins face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane and an interface region rich in phospholipid head-groups. Therefore, it is energetically favorable for transmembrane proteins to expose different types of residues in the different regions.
Investigations of a set of structurally determined transmembrane proteins showed that the composition of solvent-exposed residues differs significantly inside and outside the membrane. In contrast, residues buried within the interior of a protein show a much smaller difference. However, in all regions exposed residues are less conserved than buried residues. Further, we found that current state-of-the-art predictors for surface area are optimized for one of the regions and perform poorly in the other regions. To circumvent this limitation we developed a new predictor, MPRAP, that performs well in all regions. In addition, MPRAP performs better on complete membrane proteins than a combination of specialized predictors, and acceptably on water-soluble proteins. A web server for MPRAP is available at http://mprap.cbr.su.se/
By including complete α-helical transmembrane proteins in the training, MPRAP is able to predict surface accessibility accurately both inside and outside the membrane. This predictor can aid in the prediction of 3D structure and in the identification of erroneous protein structures.
- Support Vector Machine
- Substitution Rate
- Interface Residue
- Relative Solvent Accessibility
- Membrane Protein Structure
During the folding of a protein some residues become exposed to the environment while others become buried in the protein interior. For water-soluble proteins the dominant driving force during folding is the hydrophobic effect, which minimizes unfavorable interactions between hydrophobic residues and (hydrophilic) water [1, 2]. Therefore, water-soluble proteins consist of a hydrophobic interior and a hydrophilic exterior. In contrast, the tendency to bury polar residues from the (hydrophobic) solvent environment within the membrane is much weaker. In membrane proteins residues face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane and an interface region in between. Studies of the bacteriorhodopsin structure suggested that membrane proteins are "inside-out", i.e. that they consist of a hydrophilic interior and a hydrophobic exterior [3–7]. However, later studies indicated that the "inside-out" rule is not generally applicable [6–10]. Since membrane proteins are exposed to distinctly different environments, the composition of exposed residues will differ significantly between regions. Also, the main driving forces of folding and stabilization differ from those of water-soluble proteins and are less well understood. However, irrespective of environment, buried residues are in general under stronger evolutionary constraints than exposed sites.
For membrane proteins most bioinformatics efforts have focused on the development of methods to predict the topology, i.e. the location of residues relative to the membrane. A topology prediction might be a useful first step towards structure prediction, while a predictor of solvent accessibility provides complementary information. Such a predictor might also be useful for predicting the functional relevance of individual residues, since residues responsible for e.g. catalysis or substrate binding are often buried in the protein interior, while residues involved in protein-protein interactions occur at solvent-exposed sites. For water-soluble proteins many methods for predicting the accessibility have been developed.
However, only a few attempts have been made to predict the accessibility of membrane proteins [9, 13–15]. To our knowledge all existing methods have been specialized to predict the exposure within the membrane. Therefore, these methods require an initial prediction step to determine the exact location of the transmembrane segment, which current topology predictors can only do with limited accuracy. Here, we constructed a single accessibility predictor for entire membrane proteins. First, the amino acid distributions and evolutionary conservation in a set of α-helical transmembrane proteins of known structure were analyzed. Thereafter, we examined the ability of state-of-the-art predictors to identify exposed and buried residues. In particular, we analyzed the performance of the predictors in the regions for which they had been optimized as well as in the regions for which they had not. Subsequently, we developed a novel predictor, MPRAP, optimized to perform well in all regions. Finally, we investigated some additional potential uses of MPRAP.
Membrane protein surfaces adapt to the environment
Surface area predictors optimized for one of the environments perform badly in the other environments
Quite a few methods for predicting the accessibility of soluble proteins have been developed in the past. Because few membrane protein structures have been determined, most methods have been developed primarily for water-soluble proteins. However, a few methods have been developed for membrane proteins, including BW, LIPS, ASAP mem and TMX. Here, we investigated the performance of the two most recent membrane predictors, ASAP mem and TMX, as well as three recent predictors for soluble proteins, ASAP glob, SABLE and ACCPRO. The ability to accurately predict the accessibility state of residues in membrane proteins was examined. The predictors differ in their output: ACCPRO and TMX predict accessibility in a binary alphabet, i.e. exposed and buried, with an approximately equal fraction in both classes, while SABLE, ASAP glob and ASAP mem predict the relative accessibility. Therefore, for comparisons the real-value predictions were transformed to binary states. The specific cutoffs for the transformations were optimized for each method independently, resulting in an approximately equal frequency of buried and exposed residues.
Table: Benchmarking accessibility predictors (region label: Z > 22)
Optimization of MPRAP
Inputs tested: R4S + Zpred; AA + Zpred; AA + R4S; AA + R4S + Zpred; PSI scr + Zpred; PSI scr + R4S; PSI scr + R4S + Zcoord; PSI scr + R4S + Zpred
Kernels tested: PSI scr + R4S + Zpred (Linear); PSI scr + R4S + Zpred (Polynomial); PSI scr + R4S + Zpred (Radial basis)
Different probe radii: PSI scr + R4S + Zpred (1.4 Å); PSI scr + R4S + Zpred (2.0 Å); PSI scr + R4S + Zpred (2.0 Å/1.4 Å)
As shown above, two major factors differentiate between exposed and buried residues in a membrane protein: substitution rate (conservation) and amino acid preferences. Exposed residues evolve faster than buried residues both inside and outside the membrane, see Figure 2. The preference for certain amino acids to be buried or exposed is, however, dependent on the location relative to the membrane; polar residues are more likely to be exposed outside the membrane than within the membrane, see Figure 1. Therefore, an optimal predictor needs to be able to determine the location of a residue relative to the membrane. This can be done either by explicitly providing this information as an input to the predictor or by assuming that the predictor will learn it indirectly.
During the optimization of MPRAP a two-state binary classification of exposed and buried residues was used, and testing was done by 5-fold cross-validation. The performance of the SVM was optimized by a systematic search of model parameters, and the best-performing parameters were used. First, it was found that a window size of 9 seemed to be optimal, and it was therefore used for all inputs. Several alternative methods to identify membrane and non-membrane residues, including topology predictions by OCTOPUS, were tested, and the best performance was obtained using the predicted distance from the membrane center by Zpred. It was also found that a radial basis kernel gave slightly higher performance than the alternative kernels, see Table 2.
After these initial optimizations four different inputs to the SVM were examined: amino acid information (AA), predicted distance to the membrane center (Zpred), substitution rate (R4S) and PSSM information (PSI scr), see Table 2. Each of these inputs contains some information that is useful for predicting accessibility on its own. However, the useful information in AA and Zpred is much lower (MCC = 0.19 and 0.19) than in R4S and PSI scr (MCC = 0.37 and 0.39). The good prediction performance of R4S is due to the strong correlation between accessibility and substitution rates, see Figure 2B. Since the PSSMs, like R4S, also contain conservation information, this is most likely also the reason why input consisting of only PSI scr performs so well. However, accessibility is also dependent on the topology and the polarity of the residue. Consequently, amino acid information (AA) in combination with R4S increased the prediction accuracy to MCC = 0.41, while using the PSSMs provided another slight increase to MCC = 0.43. The inclusion of the predicted distance from the membrane center by Zpred increased the performance only marginally. Interestingly, using the Z-coordinates from the structures provides slightly lower prediction accuracies than using the predicted Z-coordinates. This might very well be due to the observation that the most hydrophobic region does not always correspond to the central membrane region in the structures of membrane proteins.
Performance of MPRAP
Performance of predictors using absolute numbers
Table 1 shows the performance of the MPRAP predictions after a transformation to a binary alphabet. The accuracy in the non-membrane region and in the membrane is comparable, see Figure 3 and Table 1, while the performance in the water-lipid interface region is slightly worse, perhaps due to greater structural variability in this region. Nevertheless, MPRAP outperforms all other predictors in the membrane core and in the water-lipid interface region, Table 1. In the non-membrane region MPRAP is outperformed by ACCPRO. Further, MPRAP outperforms the combined predictor in all but the non-membrane region. Finally, we investigated the performance of three predictors on a dataset of water-soluble proteins, Table 1. As expected, ACCPRO and SABLE, methods optimized for such proteins, performed well (MCC = 0.63 and 0.56, respectively). However, MPRAP performs on par with SABLE (MCC = 0.55).
The improved prediction in the membrane region is probably mainly due to MPRAP being trained on a larger dataset than earlier methods. Increasing the dataset size from 40 to 80 proteins increased the performance of the predictions from MCC = 0.36 to MCC = 0.45. This also suggests that one reason why ACCPRO outperforms MPRAP in the non-membrane regions might be that it was trained on a considerably larger dataset. However, including a larger set of soluble proteins in the training set of MPRAP did not improve the performance significantly (data not shown).
In order to investigate the predictive performance on different proteins in the dataset, the proteins were divided into subgroups by the number of transmembrane regions, the fraction of TM residues and their multimeric state. In all these subgroups the performance was similar (MCC = 0.42-0.46). Thus, no particular type of membrane protein was identified for which MPRAP performed significantly better or worse.
Above, a probe with a van der Waals radius of 1.4 Å, mimicking the size of a water molecule, was used to calculate the accessibility of membrane proteins. However, within the membrane, a more realistic choice for calculating the accessibility might be to use a larger probe (2.0 Å) to mimic a CH2 group. Therefore, three different versions of MPRAP were ultimately developed, using probes of 1.4 Å, 2.0 Å or a combination of these. They all perform similarly, see Table 3, and the probe size of 1.4 Å is set as default.
The most important improvement over earlier predictors is that MPRAP is the first accessibility predictor that shows an acceptable prediction quality in all regions of membrane proteins. This is obtained without any pre-processing. We believe this is predominantly the result of careful selection of an appropriate training set consisting of entire membrane proteins.
MPRAP can identify some erroneous protein structures
Assessing the quality of protein structures
Some interaction surfaces can be identified
Identification of interface residues
One prominent feature of membrane proteins is that their surfaces face three distinct environments: a hydrophobic lipid environment inside the membrane, an interface environment and a hydrophilic water environment outside the membrane. Here, we have analyzed the properties of exposed and buried sites in a set of membrane proteins of known structure. As expected, we found that exposed sites are different inside and outside the membrane. In contrast, residues at buried sites are more similar, but also on average more hydrophobic inside the membrane than outside. Further, in all regions exposed residues are less conserved than buried residues.
Predicting the accessibility of individual residues is a well-studied problem for non-membrane proteins but less so for membrane proteins. We found that all state-of-the-art predictors for surface area are optimized for one of the environments and therefore perform poorly in the non-optimized environments. To circumvent this problem we included complete membrane proteins in the training set and developed a new predictor, the Membrane Protein Surface Accessibility Predictor (MPRAP). The new predictor performs well both inside and outside the membrane. Further, MPRAP is better than the combination of two specialized predictors. This shows that MPRAP is capable of recognizing that there are different preferences for exposed sites within and outside the membrane, i.e. it can adjust the predictions depending on the localization relative to the membrane. One reason why this is possible is the strong correlation between exposure and conservation.
The creation of the dataset started from 136 α-helical TM protein structures containing 601 polypeptide chains with TM segments, taken from OPM in April 2008. Poly-alanine chains, theoretical models and obsolete entries (as defined by PDB) were excluded. In addition, fragments (1D6G, 1ORS, 2AHY, 1R3J, 1S5H), very low-resolution structures (>3.9 Å; 1IFK, 2BG9) and structures with secondary structure or membrane boundary problems (2QFI, 1ORQ, 2A01, 1YEW) were removed.
Uniprot sequences corresponding to the remaining sequences were used to search for homologs by running three rounds of PSI-BLAST with a conservative E-value cutoff of 10⁻⁵ against uniref90 from November 2007. Chains with fewer than four identified homologs, mostly short transmembrane proteins, were removed in a second filtering step, to enable the use of substitution rate information. The structure of highest resolution from each OPM family was chosen as a representative. Blastclust was used to further reduce the number of chains. Default parameters were used with a sequence identity cutoff set to 20%, no E-value cutoff and the default length cutoff of 0.9. The obtained sequences were also checked afterwards by running blast for each protein on the new dataset, and no pairs showed an identity over 20% independent of length. Further, all chains from the same OPM superfamily were put in the same cross-validation group during the SVM training (see below). The final selection of protein chains and their grouping into the five cross-validation groups are provided as supplementary data [Additional file 1].
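The grouping constraint above (all chains of one OPM superfamily in the same cross-validation group, with five groups of roughly equal size) can be sketched as follows. This is only an illustration: the greedy balancing strategy, the function name `assign_folds` and the example identifiers are assumptions, not the procedure actually used to build the groups in Additional file 1.

```python
from collections import defaultdict


def assign_folds(chain_to_superfamily, n_folds=5):
    """Split chains into folds so that a superfamily never spans two folds.

    chain_to_superfamily: dict mapping a chain identifier to its superfamily.
    Balancing is greedy (an assumption): the largest superfamilies are placed
    first, each into the fold that currently holds the fewest chains.
    """
    families = defaultdict(list)
    for chain, family in chain_to_superfamily.items():
        families[family].append(chain)

    folds = [[] for _ in range(n_folds)]
    # Sort by decreasing size, then by name, to make the result deterministic.
    for family in sorted(families, key=lambda f: (-len(families[f]), f)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(sorted(families[family]))
    return folds


# Hypothetical chain and superfamily labels, for illustration only.
example = {"a1": "F1", "a2": "F1", "b1": "F2", "c1": "F3"}
example_folds = assign_folds(example, n_folds=2)
```

Keeping a superfamily inside one fold means that no test chain has a close relative in the corresponding training set.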
As in our previous studies [11, 25, 30], all proteins were oriented so that the predicted membrane center was located in the X-Y plane; thus the proteins could easily be studied as a function of the Z-coordinate. Here, the OPM method was used to find the orientation. The Z-coordinates were used to classify all residues into three main groups: non-membrane (Z > 22 Å), lipid-water interface (10 Å < Z < 22 Å) and membrane core (Z < 10 Å). In addition, membrane boundaries as defined by OPM were used. The classification into different groups was used only for evaluation purposes and not as input for predictions. The final dataset contained 21,624 residues, with 5,565 in the core, 7,114 in the lipid-water interface and 8,945 in the non-membrane region. The residues were grouped by physico-chemical similarity, using the biological hydrophobicity scale, into hydrophobic [A, F, I, L, M, V], weakly polar [G, Y, W, C, S, T] and strongly polar [D, E, K, R, H, N, P, Q]. In total 9,931 hydrophobic, 6,258 weakly polar and 5,435 strongly polar residues were found.
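A minimal sketch of the region classification described above. Two points are assumptions rather than statements from the text: the thresholds are applied to the absolute Z-coordinate (distance from the membrane center on either side), and a residue lying exactly on a boundary (10 Å or 22 Å) is assigned to the outer region. The function name is hypothetical.

```python
def classify_region(z):
    """Return the region of a residue from its Z-coordinate in Å.

    The membrane center is at Z = 0. Using abs(z) is an assumption so that
    both sides of the membrane are treated alike.
    """
    distance = abs(z)
    if distance < 10.0:
        return "membrane core"
    if distance < 22.0:
        return "lipid-water interface"
    return "non-membrane"


regions = [classify_region(z) for z in (-5.0, 15.0, -30.0)]
```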
The amino acid substitution rates were estimated as described previously, by using the PSI-BLAST derived multiple sequence alignments (mentioned above) as input to rate4site. The residue values from rate4site were normalized for each protein individually, by subtracting the average substitution score and dividing by the standard deviation.
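The per-protein normalization is a standard z-score. The sketch below assumes the population standard deviation, since the text does not say whether the population or sample form was used; the function name is hypothetical.

```python
from statistics import mean, pstdev


def normalize_rates(rates):
    """Z-score normalize the substitution rates of one protein.

    rates: list of per-residue rate4site scores for a single protein.
    pstdev (population standard deviation) is an assumption.
    """
    mu = mean(rates)
    sigma = pstdev(rates)
    if sigma == 0:
        raise ValueError("all rates are identical; cannot normalize")
    return [(r - mu) / sigma for r in rates]


normalized = normalize_rates([1.0, 2.0, 3.0])
```

After normalization, scores from different proteins are on a common scale with mean 0 and unit variance within each protein.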
Surface accessibility was calculated by Naccess 2.1.1 using probe sizes of 1.4 and 2.0 Å. The complete protein structure (including prosthetic groups and other polypeptide chains) was used, but lipids and water were removed. The relative surface area (RSA) was obtained directly from Naccess, where the accessible surface area of a residue is normalized by that of the residue in an extended A-X-A tri-peptide conformation. All accessibility values were also converted into a binary state alphabet, with residues below a certain cutoff classified as buried and all others as exposed. The cutoffs were optimized to give the highest MCC during the evaluation, independently for each method. This resulted in an approximately equal frequency for the two states. In addition, a pre-compiled subset of 1,607 sequences from the entire PDB, with maximum resolution, R-factor and mutual sequence identity of 1.6 Å, 0.25 and 20%, respectively, was downloaded in January 2010 from the PISCES database server. Chains predicted to have at least one transmembrane region were removed. The accessibility of the remaining, assumed water-soluble, proteins was calculated in the same way as for the membrane dataset.
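The conversion of relative accessibility values to the two-state alphabet can be illustrated as below. The function name is hypothetical, and the cutoff of 25% in the example is only an illustrative value; the text states that cutoffs were optimized per method.

```python
def binarize_rsa(rsa_values, cutoff):
    """Map relative surface area values (in %) to 'buried' or 'exposed'.

    Values strictly below the cutoff are buried; all others are exposed.
    """
    return ["buried" if value < cutoff else "exposed" for value in rsa_values]


states = binarize_rsa([5.0, 25.0, 80.0], cutoff=25.0)
```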
Development of MPRAP
The ability to predict the relative solvent accessibility was investigated using support vector machines implemented in the svmlight package. All experiments were performed using 5-fold cross-validated training of support vector machines, with approximately equal numbers of proteins in each subgroup. For each set of input variables three different kernels (linear, polynomial and radial basis) were tested. For all kernels a grid search was used to find the optimal parameters. For all kernels the trade-off between training error and margin (the -c parameter in svmlight) was varied between 0.25 and 50 in steps of 0.25. For the polynomial kernel three exponents were used (2, 3 and 4), and for the radial basis function values of the kernel parameter between 0.0005 and 0.05 were tested. In most cases the radial basis kernel was found to be optimal.
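The search space described above can be enumerated as in the following sketch. The trade-off values and polynomial exponents follow the text; the three radial basis values are placeholders, because only the range (0.0005 to 0.05) is given and the sampling within it is not. The function name and the dictionary keys are hypothetical, and no svmlight call is made here.

```python
def svm_parameter_grid(gamma_values=(0.0005, 0.005, 0.05)):
    """Enumerate kernel settings for a grid search.

    gamma_values is an assumption: the text gives only the tested range
    for the radial basis parameter, not the individual values.
    """
    # Trade-off between training error and margin: 0.25, 0.50, ..., 50.0
    c_values = [0.25 * i for i in range(1, 201)]

    grid = []
    for c in c_values:
        grid.append({"kernel": "linear", "c": c})
        for degree in (2, 3, 4):
            grid.append({"kernel": "polynomial", "c": c, "degree": degree})
        for gamma in gamma_values:
            grid.append({"kernel": "rbf", "c": c, "gamma": gamma})
    return grid


grid = svm_parameter_grid()
```

Each setting would then be evaluated by 5-fold cross-validation and the best-scoring one retained.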
A number of sequence-derived parameters were tested as inputs to the SVM, including amino acid frequencies from PSI-BLAST, substitution rates, and predicted distance from membrane center, see Table 2. Different sizes of a symmetric window were investigated and a window size of 9 was found to be optimal. It was also found that including the PSI-BLAST PSSM values directly was superior to amino acid frequencies or normalized PSSM values.
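A symmetric window of size 9 means that each residue is described by its own features together with those of the four residues on either side. The sketch below shows one way to build such input vectors; the zero padding at the chain termini and the function name are assumptions, as the text does not describe how termini were handled.

```python
def window_features(per_residue, window=9, pad=0.0):
    """Concatenate the features of each residue with those of its neighbors.

    per_residue: list of equal-length feature lists, one per residue.
    Positions beyond the chain ends are filled with `pad` (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("a symmetric window needs an odd size")
    half = window // 2
    n_features = len(per_residue[0]) if per_residue else 0
    padding = [pad] * n_features

    rows = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half, i + half + 1):
            inside = 0 <= j < len(per_residue)
            row.extend(per_residue[j] if inside else padding)
        rows.append(row)
    return rows


small = window_features([[1.0], [2.0], [3.0]], window=3)
```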
A number of ways of distinguishing TM and non-TM residues, including topology predictions, were tested. The best performance was obtained using the predicted distance from the membrane center by Zpred. In an attempt to increase the performance in the lipid-water interface region, residues were classified based on the predicted distance from the membrane center into two (non-membrane, membrane) or three (non-membrane, lipid-water interface and deep membrane core) different groups. Thereafter, SVMs were trained separately for each group. However, this did not result in any improved performance (not shown). Another attempt to increase the performance in the non-membrane region was to add a varying number of water-soluble proteins to the training set. However, no attempts in this direction improved the performance significantly, and therefore no soluble proteins were included in the training of MPRAP. Two different probe radii were used for accessibility calculations (see above). A probe radius of 1.4 Å might be ideal for the water-soluble region (to mimic water) and a probe radius around 2 Å might be better for the membrane region (to mimic a CH2 group). MPRAP trained for 1.4 Å, 2.0 Å, or a combination of 1.4 Å (outside the membrane) and 2.0 Å (inside the membrane) gave very similar performance.
For evaluation purposes the real-value predictions were transformed to a binary classification in a similar way as the training values. Values below a cutoff were in this step classified as buried and others as exposed. This procedure resulted in an approximately equal number of buried and exposed residues. Buried residues that were predicted to be buried were counted as true positives (TP), buried residues predicted to be exposed as false negatives (FN), exposed residues predicted to be exposed as true negatives (TN) and exposed residues predicted to be buried as false positives (FP). From these numbers the following measures were calculated:
Here, MCC is the Matthews Correlation Coefficient. Additionally, for real-value predictions the mean absolute error (MAE) and Pearson correlation coefficients (Cc) were calculated.
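As an illustration of the counting convention above (buried is the positive class), the sketch below derives the four counts from observed and predicted states and computes MCC with its standard definition. The function names and the convention of returning 0 when the denominator vanishes are assumptions.

```python
from math import sqrt


def confusion_counts(observed, predicted):
    """Count TP, FN, TN, FP with 'buried' as the positive class."""
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed, predicted):
        if obs == "buried":
            if pred == "buried":
                tp += 1
            else:
                fn += 1
        else:
            if pred == "exposed":
                tn += 1
            else:
                fp += 1
    return tp, fn, tn, fp


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from the four counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention chosen here for the undefined case
    return (tp * tn - fp * fn) / denominator


counts = confusion_counts(
    ["buried", "buried", "exposed", "exposed"],
    ["buried", "exposed", "exposed", "buried"],
)
```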
Benchmarking surface area predictors
Zpred, TMX and the ASAP predictors were run directly from their web servers, while ACCPRO and SABLE were run locally. The predictors use slightly different probe sizes, programs and methods for calculating accessibility. Therefore, the exact definition of the predicted feature differs. Both ACCPRO and TMX predict accessibility in a binary state alphabet at approximately equal frequency, while MPRAP, SABLE, ASAP glob and ASAP mem predict real accessibility values. Therefore, to make the comparison as fair as possible, the real values were transformed to binary states. The cutoffs used for the transformation were optimized separately for best performance, mostly resulting in an approximately equal frequency of buried and exposed states. The cutoffs were set to 20% for ASAP, 15% for SABLE and 25% for MPRAP. Thereafter, the binary states derived from the predictors were compared to the binary states derived from the known NACCESS values (see above). For the combination of TMX and ACCPRO, Zpred was used to decide which predictor to trust for each residue. If a residue was predicted to be closer than a certain cutoff to the membrane center TMX was trusted, and otherwise ACCPRO was used. The cutoff was optimized to be 12.5 Å.
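The combined TMX/ACCPRO predictor described above amounts to a per-residue switch on the predicted distance from the membrane center. A minimal sketch, assuming the three predictions are already available as equal-length lists; the function and argument names are hypothetical and no predictor is actually called.

```python
def combine_predictions(zpred, tmx_states, accpro_states, cutoff=12.5):
    """Choose the TMX or ACCPRO state for each residue.

    zpred: predicted distance from the membrane center in Å, per residue.
    Residues predicted closer than `cutoff` to the center take the TMX
    state; all others take the ACCPRO state.
    """
    if not (len(zpred) == len(tmx_states) == len(accpro_states)):
        raise ValueError("all inputs must have the same length")
    return [
        tmx if z < cutoff else accpro
        for z, tmx, accpro in zip(zpred, tmx_states, accpro_states)
    ]


combined = combine_predictions(
    [5.0, 12.5, 20.0],
    ["buried", "buried", "buried"],
    ["exposed", "exposed", "exposed"],
)
```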
Assessing quality of protein structures
Agreement between predicted and structurally derived accessibility was tested on seven structures downloaded from PDB: 1S7B, 2F2M, 1JSQ, 1PF4, 1Z2R, 1L7V and 2HYD. Four of the first five were marked as obsolete in PDB. The accessibility values were calculated for the full complexes using the same procedure as for the MPRAP training dataset (see above). The values considered for evaluation are from the A subunit of the full (homo-oligomeric) complex. The cross-validation group containing 2HYD was left out of the training set when assessing the quality of 1JSQ, 1PF4, 1Z2R, 1L7V and 2HYD.
Sites at polypeptide chain interfaces
Residues at interaction surfaces, I NACCESS , were identified as all residues that have higher accessibility than 25% in the single chain structure and lower than 25% in the full complex.
A residue was predicted to be in an interface if it is exposed in the single chain and MPRAP predicts it to have lower accessibility than a certain cutoff. The fraction of interface residues detected was evaluated over all sites that are exposed in the single-chain structure.
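The two definitions in this section can be sketched as set operations over residue indices. The 25% cutoff for the observed interface follows the text; the prediction cutoff is left as a parameter because its value is not stated, and the default used here is only a placeholder. All function names are hypothetical.

```python
def observed_interface(rsa_single, rsa_complex, cutoff=25.0):
    """Residues exposed in the single chain but buried in the full complex."""
    return {
        i
        for i, (single, full) in enumerate(zip(rsa_single, rsa_complex))
        if single > cutoff and full < cutoff
    }


def predicted_interface(rsa_single, predicted_rsa, exposed_cutoff=25.0,
                        prediction_cutoff=25.0):
    """Residues exposed in the single chain that are predicted to be buried.

    prediction_cutoff is a placeholder value, not taken from the text.
    """
    return {
        i
        for i, (single, pred) in enumerate(zip(rsa_single, predicted_rsa))
        if single > exposed_cutoff and pred < prediction_cutoff
    }


def fraction_detected(observed, predicted):
    """Fraction of observed interface residues that were also predicted."""
    if not observed:
        return 0.0
    return len(observed & predicted) / len(observed)


# Hypothetical accessibility values (in %) for four residues.
single = [40.0, 60.0, 10.0, 50.0]
full = [5.0, 55.0, 8.0, 20.0]
predicted = [10.0, 70.0, 5.0, 60.0]
obs = observed_interface(single, full)
pred = predicted_interface(single, predicted)
```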
Data analysis and visualization
The molecular illustrations were created with PyMol (DeLano, W.L. The PyMOL Molecular Graphics System (2002) DeLano Scientific, Palo Alto, CA, USA). The remaining figures were generated in R.
This work was supported by grants from the Swedish Natural Sciences Research Council and SSF (the Foundation for Strategic Research). The EU 6th Framework Program is gratefully acknowledged for support to the Embrace project, Contract No: LSHG-CT-2004-512092. We would like to thank Sara Light for useful suggestions regarding the manuscript.
- Honig B, Yang A: Free energy balance in protein folding. Adv Protein Chem 1995, 46: 27–58.
- Lins L, Brasseur R: The hydrophobic effect in protein folding. FASEB J 1995, 9(7):535–540.
- Engelman D, Zaccai G: Bacteriorhodopsin is an inside-out protein. Proc Natl Acad Sci USA 1980, 77(10):5894–5898. 10.1073/pnas.77.10.5894
- Rees D, DeAntonio L, Eisenberg D: Hydrophobic organization of membrane proteins. Science 1989, 245(4917):510–513. 10.1126/science.2667138
- Stevens T, Arkin I: Are membrane proteins "inside-out" proteins? Proteins 1999, 36: 135–143. 10.1002/(SICI)1097-0134(19990701)36:1<135::AID-PROT11>3.0.CO;2-I
- Rees D, Eisenberg D: Turning a reference inside-out: commentary on an article by Stevens and Arkin entitled: "Are membrane proteins 'inside-out' proteins?" (Proteins 1999;36:135–143). Proteins 2000, 38(2):121–122. 10.1002/(SICI)1097-0134(20000201)38:2<121::AID-PROT1>3.0.CO;2-M
- Stevens T, Arkin I: Turning an opinion inside-out: Rees and Eisenberg's commentary (Proteins 2000;38:121–122) on "Are membrane proteins 'inside-out' proteins?" (Proteins 1999;36:135–143). Proteins 2000, 40(3):463–464. 10.1002/1097-0134(20000815)40:3<463::AID-PROT120>3.0.CO;2-D
- Wallin E, Tsukihara T, Yoshikawa S, von Heijne G, Elofsson A: Architecture of helix bundle membrane proteins. An analysis of cytochrome c oxidase from bovine mitochondria. Protein Science 1997, 6: 808–815. 10.1002/pro.5560060407
- Adamian L, Liang J: Prediction of transmembrane helix orientation in polytopic membrane proteins. BMC Struct Biol 2006, 6: 13. 10.1186/1472-6807-6-13
- White S, Wimley W: Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct 1999, 28: 319–365. 10.1146/annurev.biophys.28.1.319
- Kauko A, Illergard K, Elofsson A: Coils in the membrane core are conserved and functionally important. J Mol Biol 2008, 380: 170–180. 10.1016/j.jmb.2008.04.052
- Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination number and relative solvent accessibility in proteins. Proteins 2002, 47(2):142–153. 10.1002/prot.10069
- Beuming T, Weinstein H: A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins. Bioinformatics 2004, 20(12):1822–1835. 10.1093/bioinformatics/bth143
- Yuan Z, Zhang F, Davis M, Boden M, Teasdale R: Predicting the solvent accessibility of transmembrane residues from protein sequence. J Proteome Res 2006, 5(5):1063–1070. 10.1021/pr050397b
- Park Y, Hayat S, Helms V: Prediction of the burial status of transmembrane residues of helical membrane proteins. BMC Bioinformatics 2007, 8: 302. 10.1186/1471-2105-8-302
- Elofsson A, von Heijne G: Membrane protein structure: prediction versus reality. Annu Rev Biochem 2007, 76: 125–140. 10.1146/annurev.biochem.76.052705.163539
- Lomize M, Lomize A, Pogozheva I, Mosberg H: OPM: orientations of proteins in membranes database. Bioinformatics 2006, 22(5):623–625. 10.1093/bioinformatics/btk023
- Fleishman S, Harrington S, Friesner R, Honig B, Ben-Tal N: An automatic method for predicting transmembrane protein structures using cryo-EM and evolutionary data. Biophys J 2004, 87(5):3448–3459. 10.1529/biophysj.104.046417
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 2004, 56(4):753–767. 10.1002/prot.20176
- Granseth E, Viklund H, Elofsson A: ZPRED: predicting the distance to the membrane center for residues in alpha-helical membrane proteins. Bioinformatics 2006, 22(14):e191–6. 10.1093/bioinformatics/btl206
- Viklund H, Elofsson A: OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar. Bioinformatics 2008, 24(15):1662–1668. 10.1093/bioinformatics/btn221
- Illergard K, Kauko A, Elofsson A: Polar residues in the membrane core are conserved and directly involved in function. 2010, in press.
- Kauko A, Hedin L, Thebaud E, Cristobal S, Elofsson A, von Heijne G: Repositioning of transmembrane alpha-helices during membrane protein folding. J Mol Biol 2010, 397: 190–201. 10.1016/j.jmb.2010.01.042
- Dor O, Zhou Y: Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins 2007, 66(4):838–845. 10.1002/prot.21298
- Granseth E, von Heijne G, Elofsson A: A study of the membrane-water interface region of membrane proteins. J Mol Biol 2005, 346: 377–385. 10.1016/j.jmb.2004.11.036
- Miller G: Scientific publishing. A scientist's nightmare: software problem leads to five retractions. Science 2006, 314(5807):1856–1857. 10.1126/science.314.5807.1856
- Papaloukas C, Granseth E, Viklund H, Elofsson A: Estimating the length of transmembrane helices using Z-coordinate predictions. Protein Sci 2008, 17(2):271–278. 10.1110/ps.073036108
- Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, Lipman D: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389–3402. 10.1093/nar/25.17.3389
- Suzek B, Huang H, McGarvey P, Mazumder R, Wu C: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 2007, 23(10):1282–1288. 10.1093/bioinformatics/btm098
- Viklund H, Granseth E, Elofsson A: Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J Mol Biol 2006, 361(3):591–603. 10.1016/j.jmb.2006.06.037
- Hessa T, Meindl-Beinker N, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White S, von Heijne G: Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature 2007, 450(7172):1026–1030. 10.1038/nature06387
- Mayrose I, Graur D, Ben-Tal N, Pupko T: Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 2004, 21(9):1781–1791. 10.1093/molbev/msh194
- Hubbard SJ, Thornton JM: NACCESS, Computer program. Department of Biochemistry and Molecular Biology 1993, 1: 1–2. [http://wolf.bi.umist.ac.uk/unix/naccess.html]
- Wang G, Dunbrack R Jr: PISCES: recent improvements to a PDB sequence culling server. Nucleic Acids Res 2005, (33 Web Server):W94–8. 10.1093/nar/gki402
- Joachims T: Making large-Scale SVM Learning Practical. In: B Schölkopf and C Burges and A Smola, (eds), Advances in kernel methods - support vector learning. MIT Press, Cambridge Massachusetts, London England; 1999.
- Matthews B: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975, 405(2):442–451.
- R Development Core Team 2006: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2005. [http://www.R-project.org]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.