Prediction of the burial status of transmembrane residues of helical membrane proteins
© Park et al; licensee BioMed Central Ltd. 2007
Received: 11 May 2007
Accepted: 20 August 2007
Published: 20 August 2007
Helical membrane proteins (HMPs) play a crucial role in diverse cellular processes, yet it still remains extremely difficult to determine their structures by experimental techniques. Given this situation, it is highly desirable to develop sequence-based computational methods for predicting structural characteristics of HMPs.
We have developed TMX (TransMembrane eXposure), a novel method for predicting the burial status (i.e. buried in the protein structure vs. exposed to the membrane) of transmembrane (TM) residues of HMPs. TMX derives positional scores of TM residues based on their profiles and conservation indices. Then, a support vector classifier is used for predicting their burial status. Its prediction accuracy is 78.71% on a benchmark data set, representing considerable improvements over 68.67% and 71.06% of previously proposed methods. Importantly, unlike the previous methods, TMX automatically yields confidence scores for the predictions made. In addition, a feature selection incorporated in TMX reveals interesting insights into the structural organization of HMPs.
A novel computational method, TMX, has been developed for predicting the burial status of TM residues of HMPs. Its prediction accuracy is much higher than that of previously proposed methods. It will be useful in elucidating structural characteristics of HMPs as an inexpensive, auxiliary tool. A web server for TMX is established at http://service.bioinformatik.uni-saarland.de/tmx and freely available to academic users, along with the data set used.
Helical membrane proteins (HMPs) play a crucial role in diverse cellular processes, including energy generation, signal transduction, the transport of solutes across the membrane, and the maintenance of ionic and proton concentrations. Several studies have suggested that HMPs account for 20 – 30% of the open reading frames of sequenced genomes [1, 2]. In spite of their biological importance and genomic abundance, less than 1% of the proteins with known structure are HMPs , and this situation is not expected to improve dramatically in the near future. Hence, it is desirable to develop sequence-based computational methods for predicting structural characteristics of HMPs. In the realm of soluble proteins, two particular structural characteristics have been the main target of computational prediction methods: secondary structure [4–10] and solvent accessibility [11–26] (often in a form of binary burial status; buried inside vs. exposed to the environment). For HMPs, the prediction of secondary structures does not carry as significant a momentum as for soluble proteins because transmembrane (TM) segments, which can be relatively reliably identified from the sequence by several techniques [27–37], are known to usually adopt helical conformations to satisfy the hydrogen bonding capacity of the backbone polar atoms. On the other hand, the problem of predicting the burial status (i.e. buried in the protein core vs. exposed to the membrane) of TM residues of HMPs has remained nearly untouched until now, in contrast to the situation for soluble proteins, which have been extensively studied (see the references listed above) following the pioneering work of Rost and Sander . This is quite "remarkable" given that it is much more difficult to determine the structures of HMPs than those of soluble proteins by experimental techniques. The virtue of the ability to predict the burial status of TM residues of HMPs was already appreciated by several studies around the early 90s [38–40]. The burial status prediction should be useful in several tasks. One simple example would be to help design mutational experiments aimed at identifying catalytically important TM residues [41–45] by providing a list of TM residues highly likely to be buried in the protein core because catalytically important TM residues are usually found buried in the protein core, not being exposed to the membrane. Another simple example would be to help design mutational experiments aimed at identifying TM residues important for protein-protein interactions in the membrane by providing a list of TM residues highly likely to be exposed to the membrane.
In 2004, Beuming and Weinstein pioneered the first sequence-based computational method for predicting the burial status of TM residues of HMPs (denoted hereafter as the BW method), which was based on sequence conservation patterns and a newly derived knowledge-based propensity scale of the 20 amino acids to be exposed to the membrane . For a rather small benchmark set, the BW method achieved an impressive prediction accuracy of 80%. Recently, Adamian and Liang reported the development of a similar method , but it predicts the face of a TM helix exposed to the membrane, not the burial status of individual TM residues. Hildebrand and his coworkers described a computational method for predicting whether a given residue is located at a helix-helix interface in the membrane . Yet, this is a distinct prediction problem from the one the current study deals with: a residue located outside of a helix-helix interface can still be buried. Quite recently, Yuan and his coworkers developed a method for predicting the relative solvent-accessible surface area (rSASA) of TM residues based on support vector regression (SVR, denoted hereafter as the YU method) . Even though the YU method does not explicitly predict the burial status of TM residues, it is possible to do so using the predicted rSASA values. To our best knowledge, the BW and YU methods are the only ones currently available for predicting the burial status of TM residues of HMPs.
We have developed TMX (TM eXposure), a novel sequence-based computational method for predicting the burial status of TM residues of HMPs. Its accuracy is 78.71% over a much larger data set of 3138 TM residues, representing a considerable improvement over 68.67% of the BW method when evaluated on the same data set. This prediction accuracy is also higher than 71.06% of the YU method. Importantly, unlike the BW and YU methods, TMX automatically yields confidence scores for the predictions made, a highly desirable component for any computational prediction method, which allows the user to selectively utilize prediction results depending on confidence scores in real application situations. In addition, a feature selection incorporated in TMX reveals interesting insights into the structural organization of HMPs.
Results and Discussion
Analysis of the BW method
TMX is novel in several aspects compared to the BW and YU methods and can be described without any reference to these previous methods. However, we prefer to describe the logic behind its development in reference to the BW method in order to contrast it with the BW method and highlight its novelties.
Prediction accuracies of different methods examined in the study
Prediction accuracy [%]1
The BW method
The YU method
There is an issue to be clarified before we move on. We implemented the BW method, and its performance was evaluated on the same data set as for TMX. This was necessary since it is often difficult to directly compare performance values of different prediction methods reported in different studies because of the variety of data sets used and the discrepancy in state definitions. A serious difficulty arose in implementing the BW method, namely setting thresholds manually. As mentioned above, upon computing the positional score of a target residue, the BW method compares it with a threshold that has been manually set. If the positional score is greater than the threshold, it is predicted to be buried. Otherwise, it is predicted to be exposed. In a leave-one-out (jack-knife) testing scheme, thresholds need to be manually set separately for each of 43 protein chains in the benchmark data set (see Methods). Admittedly, it is impossible for us to exactly reproduce this step in the way it was performed in the original publication for the BW method . In addition, we feel that it might be subjective to set thresholds manually. Then, is there any mathematical formalism that allows thresholds to be set in such a manner that (1) we exactly mimic the manual setting of thresholds as was done in the BW method and (2) yet, thresholds are set objectively and reproducibly? Our answer is a linear support vector classifier (lSVC, i.e. an SVC with a linear kernel). Since the hyperplane – f(x) = β0 + βTx = 0, where βT is the transpose of a column vector β – obtained by an lSVC in a one-dimensional space represents a scalar value of -β/β0 , setting a threshold via an lSVC is an exact computational analogue to setting it manually, yet in an objective, reproducible way. It is to be noted that the introduction of an lSVC to the prediction scheme transforms it to a two-step scheme because an lSVC also needs training and, as a result, the jack-knife scheme should be applied to both steps. We want to stress that the sole purpose of using an lSVC here is to mimic the manual assignment of thresholds as exactly as possible yet in an objective, reproducible fashion. Thus, we intentionally did not seek SVCs with a non-linear kernel or other sophisticated classifiers at this stage (but see below).
Improved use of conservation indices
Another point well worth considering in Eq. 3 is how conservation indices are incorporated. The average identities of sequences retrieved from sequence databases for different query sequences can be appreciably varying. Thus, without normalization, one may assign overall high conservation indices to one protein chain while assigning overall low conservation indices to another. Normalization of conservation indices effectively solves this bias problem, just as in microarray data processing. In the BW method, conservation indices are not normalized. We found that normalizing conservation indices by subtracting the mean followed by division by the standard deviation separately for each protein chain leads to a significant improvement in the prediction accuracy, raising it from 71.13% to 73.84%.
A second, minor aspect to be considered is how conservation indices are actually computed in the first place. The BW method computes conservation indices as follows.C(i) = 0.5 × V(i) + 0.5 × IC(i)(4)
In Eq. 4, V(i) is the volume of the polytope for sequence position i derived from a multiple sequence alignment (MSA), estimating the probability for the presence of a set of different amino acids from a set of pairwise distribution probabilities, and IC(i) is the information content of sequence position i . Eq. 4 relies on many assumptions that are yet to be validated. The first are ad hoc measures taken to enforce the Euclidean space to the distances between aligned sequences for computing V(i) . The second is the assumption used in computing IC(i) that the 20 naturally occurring amino acids are equally likely to occur in the TM region. The third is that even though it seems reasonable to assign equal weights to both terms in Eq. 4, it is not clear whether that choice is optimal.
In Eq. 5, the index j runs over the 20 naturally occurring amino acids, C(i) is the conservation index for sequence position i, f i (j) is the frequency of amino acid j in sequence position i, and f(j) is the overall frequency of amino acid j in the alignment. As expected, the use of Eq. 5 instead of Eq. 4 improved the prediction accuracy from 73.84% to 74.51%. It is to be noted that conservation indices obtained by Eqs. 4 and 5 were from the same MSAs.
Extending the window size for the input vector
Prediction accuracies obtained by linear regression with different window sizes
The logic behind increasing window sizes for better predictions is that one can better account for long-range effects with enlarged windows. However, the shortcoming of enlarged windows is that the signal-to-noise ratio deteriorates as the window size is increased, as demonstrated in Table 2. For example, compare the prediction accuracies for window sizes of 15 and 21. The tradeoff between long-range effects and signal-to-noise ratios would suggest a window size of 15 instead of 21. Is there any way of circumventing this unpleasant tradeoff? Feature selection might be an answer. A simple illustration will make this point clearer. An input vector for a window size of 21 consists of 441 elements (21 elements for each of the 21 residues). It is intuitively clear that not all 441 elements will contribute equally to the prediction. Many of them might simply be noise. Thus, it might be possible to use enlarged windows for a better consideration of long-range effects and still maintain a high signal-to-noise ratio by filtering out noisy elements.
Of many techniques available for feature selection, we chose the Fisher's index for the following reasons. First, the Fisher's index is conceptually attractive, having a clear meaning easy to understand . Put simply, the Fisher's index represents the ability of a given element to maximize the distance between the centroids of the two given classes and simultaneously minimize the overlap between them. Second, unlike techniques involving linear combinations of feature vectors, the Fisher's index is highly interpretable. This is a big advantage given the high dimensions of our feature spaces. Most importantly, one can gain interesting biological insights into the structural organization of HMPs from the Fisher's index (see below). Third, the Fisher's index can be computed cheaply. Fourth, the Fisher's index is well suited to continuous features (as opposed to discrete ones).
The 441 elements of a window of size 21 were ranked according to their Fisher's indices, and increasing fractions of them (in steps of 0.05) were input to the prediction (see first and second columns of Table 3). The best prediction accuracy, 77.21%, was obtained when using the top 20% elements only. This accuracy is higher than 75.97% obtained by an "unintelligently" increased window of size 15 in the above section. Which elements rank top? As shown in Table 4, the top-ranking elements are mostly conservation indices, in line with previous findings that conservation properties of TM residues correlate strongly with their degree of exposure to the membrane [52, 54–56]. Also, Table 4 shows that the frequencies of occurrence of L, I, V and F at the target residue are highly indicative of its burial status. In this regard, it is interesting to note that our previous study showed that these amino acids possess the highest propensities to preferentially interact with the membrane . The frequency of occurrence of G at the target residue is also strongly correlating with its burial status, ranking at the 9th place, which is consistent with earlier findings that glycine residues play a pivotal role in mediating helix-helix interactions in the membrane [57–60]. Table 4 also shows that the frequencies of occurrence of I, G and L at the 4th residue N terminal to the target one also strongly correlate with the burial status of the target residue, which makes sense considering the canonical helix conformation as mentioned above.
Prediction accuracies obtained by increasing fractions of the 441 elements of a window of size 21
Fraction used in the prediction
SVR C – 101
SVR C – 1
SVR C – 0.1
SVR C – 0.01
Top 20 elements of the 441 ones of a window of size 21 according to the Fisher's index
If there still remain untapped non-linear combinations of profile elements or profile elements and conservation indices that correlate with the burial status of TM residues, the use of non-linear methods might be profitable . Of the vast array of available non-linear regression techniques, we made use of SVR with a radial kernel because a nice interface with R is already available (see Methods) and it has performed respectably in studies of water-soluble proteins [23, 25]. Our preliminary analysis showed that SVR with a radial kernel tends to rival SVR with other kernels. Once a kernel type is chosen, another important parameter to be fine-tuned is the regularization constant C, i.e. how much weight one should put on minimizing the costs of violating a decision boundary relative to maximizing the closest distance of a data point to the boundary [50, 62]. The general expectation from the theory is that as the regularization constant gets higher, a heavier weight is put on minimizing the violation costs and, as a result, a more wiggly decision boundary is obtained with a possibly larger generalization error. The default C value is 1, and we tried 4 different C values, 10, 1, 0.1 and 0.01. As above, the 441 elements of a window of size 21 were ranked according to their Fisher's indices, and increasing fractions of them (in steps of 0.05) were input to the prediction via SVR with a radial kernel. Table 3 shows the results (columns 3 – 6). It is immediately clear that, in almost all cases, linear regression outperforms SVR, indicating that the generalization errors of SVR are larger than those of linear regression, presumably due to its over-flexibility in fitting a separating boundary to a given data set. Thus, SVR does not seem advantageous over linear regression on this data set. Admittedly, we can not rule out the possibility that highly fine-tuned SVR can outperform linear regression. Given limited computational resources and considerable amounts of computation required for a leave-one-out validation of a two-step prediction method (~40 CPU hours on a 2.4 GHz processor), it is beyond our capability to exhaustively scan all possible combinations of SVR parameters. However, it is our experience that SVR with all parameters set to default values generally performs nearly optimally. Thus, we are quite certain that, at least for the current purpose of predicting the burial status of TM residues of HMPs, linear regression is at least as effective as SVR. Supporting this conclusion, previous studies on water-soluble proteins demonstrated that sophisticated linear methods can rival non-linear ones in performance [14, 21, 26].
Upon computing a positional score for the target residue, a classifier is invoked to classify it as either buried in the protein core or exposed to the membrane. Although any machine-learning technique can be used as a classifier, we have only considered lSVCs so far. The original reason for choosing lSVCs was, as mentioned earlier, to implement the BW method as exactly as possible, yet in an objective, reproducible manner, so that the BW method can be justly compared with ours. However, we may choose other classifiers for our prediction method. Although there are tons of available classifiers, we primarily focused on SVCs for practical reasons as mentioned above. Preliminary analysis showed that SVCs with a linear or radial kernel tend to outperform others. Thus, SVCs with a linear or radial kernel were pursued further in combination with 5 different regularization constants, 1, 0.5, 0.1, 0.05 and 0.01, chosen on the basis of the results shown in Table 3. In addition to searching for a better classifier, it might also be helpful in boosting prediction accuracies to refine input vectors themselves. So far, the input vectors for a classifier have been one-dimensional, i.e. consisting of a positional score for a given target residue. The input vectors for a classifier can be straightforwardly refined exactly in the same way as the input vectors for computing positional scores were refined in Table 3.
Table 5 shows the best prediction accuracies for each combination of an SVC kernel and a regularization constant. An SVC with a linear kernel outperforms that with a radial one, and a regularization constant of 0.5 is optimal among those investigated. The best prediction accuracy, 78.71%, was obtained by an SVC with a linear kernel that considers the top 16 positional scores out of the 21 positional scores (i.e. the positional scores of the target residue and its 10 neighbors on the N terminus and its 10 neighbors on the C terminus) derived from considering the top 10% of the 441 elements of a window of size 21 (Table 3). An SVC with a radial kernel also achieved this prediction accuracy at a regularization constant of 0.5. Due to its sustained performance over the examined range of regularization constants, however, an SVC with a linear kernel is preferred. The method that gives rise to the best performance becomes the method of choice and is named "TMX (TM eXposure)." Detailed jack-knife test results of TMX are available as additional information [see Additional file 1] and also on its web server.
Best prediction accuracies for each combination of an SVC kernel and a regularization constant C
Regularization constant C
Comparison with the YU method
The YU method computes the positional score of a target residue via SVR using position-specific scoring matrices (PSSMs) obtained by PSI-BLAST . In studies of water-soluble proteins, it has been very popular to use PSSMs as input vectors in order to boost the accuracy of predicting solvent accessibility [8–10, 17, 19–24]. The popularity of PSSMs has partially stemmed from the fact that one does not have to explicitly generate an MSA for obtaining PSSMs. As with the BW method, we implemented the YU method for a transparent performance comparison using the R interface [64, 65] of the LIBSVM library . In implementing the YU method, we set all the parameters of SVR as optimized by Yuan et al. and did not intentionally seek any further optimizations.
Prediction accuracies obtained by SVR with different input vectors
PSSM (the YU method)
C – 11
C – 2
C – 5
C – 7
Analysis of the TMX predictions
In addition to prediction accuracies, there are other interesting aspects worthy of analyzing. For example, are there any amino acids for which it is easier to predict the burial status? Is it easier to correctly predict buried residues as being buried than exposed residues as being exposed?
Prediction accuracies for each amino acid
Number of occurrence
Prediction accuracy [%]
Fraction of exposed residues in the data set [%]
Specificity and sensitivity of TMX
Confidence scores for the predictions made
In this study, we have focused on HMPs because they are much more abundant and play a more important role in diverse cellular processes than beta-barrel membrane proteins (BMPs). But, obviously, one can apply the same strategy behind the development of TMX to BMPs. Since the burial status of TM residues of BMPs tends to alternate due to prevalent beta strand elements , it would be interesting to see whether one can achieve higher prediction accuracies than 78.71% for HMPs. Moreover, it would be worth investigating which features correlate strongly with the burial status of TM residues of BMPs via a feature selection of the sort used here for Table 4.
We have presented TMX, a novel sequence-based computational method for predicting the burial status of TM residues of HMPs. It significantly outperforms previously proposed methods. In addition, feature selection incorporated in TMX revealed interesting insights into the structural organization of HMPs. Importantly, unlike the previous methods, TMX automatically generates confidence scores for the predictions made, and it was shown that predictions with a high confidence score tend to be more accurate than those with a low one. Thus, in a real application setting, the user of TMX can selectively utilize prediction results on the basis of their confidence scores. The developmental course of TMX clearly highlighted the importance of conservation indices and feature selection in boosting prediction accuracies. In this regard, it was rather surprising to find that the most effective method for predicting the solvent accessibility of water-soluble proteins considers neither conservation indices nor feature selection. It would be interesting to investigate whether these two "new" findings can be favorably transferred to one of the classical bioinformatics problems of predicting the solvent accessibility of water-soluble proteins.
Generation of the benchmark data set
As is always the case in machine-learning studies, constructing a well-curated data set was the starting point of the current study. Special care was taken in selecting protein chains, delineating their TM boundaries and computing the rSASA values of TM residues.
43 Protein chains used in the study
KcsA potassium channel
Glycerol facilitator channel
Rotor of V-type Na+-ATPase
Photosynthetic reaction center
L, M, H
Fumarate reductase (E. coli)
Fumarate reductase (W. succinogenes)
Formate dehydrogenase N
Nitrate reductase A
Mitochondrial ADP/ATP carrier
Cytochrome C oxidase (aa3 type)
B, D, G, I, J, L, M
Cytochrome C oxidase (ba3 type)
Cytochrome bc1 complex
D, E, G, J
AcrB multidrug efflux transporter
GlpG rhomboid-family intramembrane protease
Putative metal-chelate-type ABC transporter
Exposed residues were defined as those with an rSASA greater than 0.00, as in a previous study . This threshold rSASA value is justified for HMPs given the large probe radius chosen in this study. As discussed before , this threshold is also free from artifacts arising from normalization and subsequent binary classification. Nevertheless, it was argued that the threshold for a binary classification should be set such that the data set is equally partitioned into the two classes to avoid statistical artifacts. An rSASA of 0.00 induces a slightly skewed partitioning of 44.14% of buried residues and 55.86% of exposed ones. Equipartitioning of the data set was achieved with an rSASA of 0.04. Additional analysis showed that the conclusions drawn in this study remain fully valid for this new threshold (data not shown).
Computation of profiles and conservation indices
In general, the use of a profile (the frequencies of the 20 amino acids for a sequence position) improves the performance of sequence-based prediction methods. For extracting profiles, one needs to generate MSAs. As with any sequence-based prediction methods, the careful choice of sequences in MSAs is very important for the performance of the prediction method. MSAs generated using different criteria would yield results of differing quality. Thus, it would be desirable to generate "optimal" MSAs for different query sequences. Unfortunately, it is currently impossible to do so in an objective, consistent manner without any prior knowledge about the three-dimensional structures of the query sequences. Thus, a reasonable approach that is also objective, consistent and easily reproducible by others, was taken for generating MSAs, even though it might produce suboptimal MSAs for some query sequences. Its detail has been described elsewhere [52, 53]. Briefly, for a given query sequence, a maximum of 1000 homologous sequences were retrieved from the non-redundant database using BLAST . Initial MSAs were built using ClustalW . Then, sequence fragments were deleted from the MSA. Sequences that are less than 25% identical to the query sequence were also removed. The remaining sequences were realigned using ClustalW to yield a final MSA, which was used to obtain profiles. When deriving profiles from an MSA, amino acid frequencies were weighted using a modified method of Henikoff and Henikoff as implemented in PSI-BLAST [63, 75]. Actual computations were performed using the program AL2CO . Conservation indices (Eq. 5) were also derived using AL2CO.
Support vector machines
The support vector classifier (SVC)/support vector regression (SVR) [50, 62] implementation in R [64–66] was used for the current work. The parameters for SVC/SVR were set to default values unless otherwise noted. The YU method was implemented using the SVR implementation in R as described in .
A leave-one-out ('jack-knife') test was carried out to measure the performance of different prediction methods examined in this study. For two-step prediction methods, the jack-knife scheme was applied to both steps as it should be. Prediction accuracies mean the fractions of the benchmark data set for which the burial status was correctly predicted.
List of abbreviations used
helical membrane protein
support vector classifier
SVC with a linear kernel
support vector regression
multiple sequence alignment
This work was supported by Grant I/80469 of the Volkswagen Foundation. We thank Thijs Beuming and Harel Weinstein for sharing their program for computing conservation indices. We thank crystallographers of HMPs because the current work was impossible without their work.
- Wallin E, von Heijne G: Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 1998, 7: 1029-1038.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu Y, Engelman DM, Gerstein M: Genomic analysis of membrane protein families: abundance and conserved motifs. Genome Biol. 2002, 3: research0054.1–research0054.12-PubMed CentralGoogle Scholar
- Chen CP, Rost B: State-of-the-art in membrane protein prediction. Appl Bioinformatics. 2002, 1: 21-35.PubMedGoogle Scholar
- Lim VI: Algorithms for prediction of alpha-helical and beta-structural regions in globular proteins. J Mol Biol. 1974, 88: 873-894. 10.1016/0022-2836(74)90405-7.View ArticlePubMedGoogle Scholar
- Chou PY, Fasman GD: Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry. 1974, 13: 211-222. 10.1021/bi00699a001.View ArticlePubMedGoogle Scholar
- Garnier J, Osguthorpe DJ, Robson B: Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol. 1978, 120: 97-120. 10.1016/0022-2836(78)90297-8.View ArticlePubMedGoogle Scholar
- Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993, 232: 584-599. 10.1006/jmbi.1993.1413.View ArticlePubMedGoogle Scholar
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol. 1999, 292: 195-202. 10.1006/jmbi.1999.3091.View ArticlePubMedGoogle Scholar
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 2000, 40: 502-511. 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q.View ArticlePubMedGoogle Scholar
- Guo J, Chen H, Sun Z, Lin Y: A novel method for protein secondary structure prediction using dual-layer SVM and profiles. Proteins. 2004, 54: 738-743. 10.1002/prot.10634.View ArticlePubMedGoogle Scholar
- Lee BK, Richards FM: The interpretation of protein structures: Estimation of static accessibility. J Mol Biol. 1971, 55: 379-400. 10.1016/0022-2836(71)90324-X.View ArticlePubMedGoogle Scholar
- Rost B, Sander C: Conservation and prediction of solvent accessibility in protein families. Proteins. 1994, 20: 216-226. 10.1002/prot.340200303.View ArticlePubMedGoogle Scholar
- Thompson MJ, Goldstein RA: Predicting solvent accessibility: higher accuracy using Bayesian statistics and optimized residue substitution classes. Proteins. 1996, 25: 38-47. 10.1002/(SICI)1097-0134(199605)25:1<38::AID-PROT4>3.3.CO;2-H.View ArticlePubMedGoogle Scholar
- Li X, Pan XM: New method for accurate prediction of solvent accessibility from protein sequence. Proteins. 2001, 42: 1-5. 10.1002/1097-0134(20010101)42:1<1::AID-PROT10>3.0.CO;2-N.View ArticlePubMedGoogle Scholar
- Pascarella S, De Persio R, Bossa F, Argos P: Easy method to predict solvent accessibility from multiple protein sequence alignments. Proteins. 1998, 32: 190-199. 10.1002/(SICI)1097-0134(19980801)32:2<190::AID-PROT5>3.0.CO;2-P.View ArticlePubMedGoogle Scholar
- Yuan Z, Burrage K, Mattick JS: Prediction of protein solvent accessibility using support vector machines. Proteins. 2002, 48: 566-570. 10.1002/prot.10176.View ArticlePubMedGoogle Scholar
- Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination number and relative solvent accessibility in proteins. Proteins. 2002, 47: 142-153. 10.1002/prot.10069.View ArticlePubMedGoogle Scholar
- Ahmad S, Gromiha MM, Sarai A: Real value prediction of solvent accessibility from amino acid sequence. Proteins. 2003, 50: 629-635. 10.1002/prot.10328.View ArticlePubMedGoogle Scholar
- Kim H, Park H: Prediction of protein relative solvent accessibility with support vector machines and long-range interaction 3D local descriptor. Proteins. 2004, 54: 557-562. 10.1002/prot.10602.View ArticlePubMedGoogle Scholar
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins. 2004, 56: 753-767. 10.1002/prot.20176.View ArticlePubMedGoogle Scholar
- Chen H, Zhou HX: Prediction of solvent accessibility and sites of deleterious mutations from protein sequence. Nucleic Acids Res. 2005, 33: 3193-3199. 10.1093/nar/gki633.PubMed CentralView ArticlePubMedGoogle Scholar
- Sim J, Kim SY, Lee J: Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method. Bioinformatics. 2005, 21: 2844-2849. 10.1093/bioinformatics/bti423.View ArticlePubMedGoogle Scholar
- Nguyen MN, Rajapakse JC: Two-stage support vector regression approach for predicting accessible surface areas of amino acids. Proteins. 2006, 63: 542-550. 10.1002/prot.20883.View ArticlePubMedGoogle Scholar
- Nguyen MN, Rajapakse JC: Prediction of protein relative solvent accessibility with a two-stage SVM approach. Proteins. 2005, 59: 30-37. 10.1002/prot.20404.View ArticlePubMedGoogle Scholar
- Yuan Z, Huang B: Prediction of protein accessible surface areas by support vector regression. Proteins. 2004, 57: 558-564. 10.1002/prot.20234.View ArticlePubMedGoogle Scholar
- Wang JY, Lee HM, Ahmad S: Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression. Proteins. 2005, 61: 481-491. 10.1002/prot.20620.View ArticlePubMedGoogle Scholar
- von Heijne G: Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J Mol Biol. 1992, 225: 487-494. 10.1016/0022-2836(92)90934-C.View ArticlePubMedGoogle Scholar
- Jones DT, Taylor WR, Thornton JM: A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry. 1994, 33: 3038-3049. 10.1021/bi00176a037.View ArticlePubMedGoogle Scholar
- Rost B, Casadio R, Fariselli P, Sander C: Transmembrane helices predicted at 95% accuracy. Protein Sci. 1995, 4: 521-533.PubMed CentralView ArticlePubMedGoogle Scholar
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305: 567-580. 10.1006/jmbi.2000.4315.View ArticlePubMedGoogle Scholar
- Tusnady GE, Simon I: Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol. 1998, 283: 489-506. 10.1006/jmbi.1998.2107.View ArticlePubMedGoogle Scholar
- Liakopoulos TD, Pasquier C, Hamodrakas SJ: A novel tool for the prediction of transmembrane protein topology based on a statistical analysis of the SwissProt database: the OrienTM algorithm. Protein Eng. 2001, 14: 387-390. 10.1093/protein/14.6.387.View ArticlePubMedGoogle Scholar
- Yuan Z, Mattick JS, Teasdale RD: SVMtm: support vector machines to predict transmembrane segments. J Comput Chem. 2004, 25: 632-636. 10.1002/jcc.10411.View ArticlePubMedGoogle Scholar
- Cao B, Porollo A, Adamczak R, Jarrell M, Meller J: Enhanced recognition of protein transmembrane domains with prediction-based structural profiles. Bioinformatics. 2006, 22: 303-309. 10.1093/bioinformatics/bti784.View ArticlePubMedGoogle Scholar
- Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A: Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 1997, 10: 673-676. 10.1093/protein/10.6.673.View ArticlePubMedGoogle Scholar
- Granseth E, Viklund H, Elofsson A: ZPRED: predicting the distance to the membrane center for residues in alpha-helical membrane proteins. Bioinformatics. 2006, 22: e191-e196. 10.1093/bioinformatics/btl206.View ArticlePubMedGoogle Scholar
- Viklund H, Elofsson A: Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci. 2004, 13: 1908-1917. 10.1110/ps.04625404.PubMed CentralView ArticlePubMedGoogle Scholar
- Donnelly D, Johnson MS, Blundell TL, Saunders J: An analysis of the periodicity of conserved residues in sequence alignments of G-protein coupled receptors. Implications for the three-dimensional structure. FEBS Lett. 1989, 251: 109-116. 10.1016/0014-5793(89)81438-3.View ArticlePubMedGoogle Scholar
- Donnelly D, Overington JP, Ruffle SV, Nugent JH, Blundell TL: Modeling alpha-helical transmembrane domains: the calculation and use of substitution tables for lipid-facing residues. Protein Sci. 1993, 2: 55-70.PubMed CentralPubMedGoogle Scholar
- Taylor WR, Jones DT, Green NM: A method for alpha-helical integral membrane protein fold prediction. Proteins. 1994, 18: 281-294. 10.1002/prot.340180309.View ArticlePubMedGoogle Scholar
- Abramson J, Smirnova I, Kasho V, Verner G, Kaback HR, Iwata S: Structure and mechanism of the lactose permease of Escherichia coli. Science. 2003, 301: 610-615. 10.1126/science.1088196.View ArticlePubMedGoogle Scholar
- Huang Y, Lemieux MJ, Song J, Auer M, Wang DN: Structure and mechanism of the glycerol-3-phosphate transporter from Escherichia coli. Science. 2003, 301: 616-620. 10.1126/science.1087619.View ArticlePubMedGoogle Scholar
- Pebay-Peyroula E, Rummel G, Rosenbusch JP, Landau EM: X-ray structure of bacteriorhodopsin at 2.5 angstroms from microcrystals grown in lipidic cubic phases. Science. 1997, 277: 1676-1681. 10.1126/science.277.5332.1676.View ArticlePubMedGoogle Scholar
- Palczewski K, Kumasaka T, Hori T, Behnke CA, Motoshima H, Fox BA, Le Trong I, Teller DC, Okada T, Stenkamp RE, Yamamoto M, Miyano M: Crystal structure of rhodopsin: A G protein-coupled receptor. Science. 2000, 289: 739-745. 10.1126/science.289.5480.739.View ArticlePubMedGoogle Scholar
- Sui H, Han BG, Lee JK, Walian P, Jap BK: Structural basis of water-specific transport through the AQP1 water channel. Nature. 2001, 414: 872-878. 10.1038/414872a.View ArticlePubMedGoogle Scholar
- Beuming T, Weinstein H: A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins. Bioinformatics. 2004, 20: 1822-1835. 10.1093/bioinformatics/bth143.View ArticlePubMedGoogle Scholar
- Adamian L, Liang J: Prediction of transmembrane helix orientation in polytopic membrane proteins. BMC Struct Biol. 2006, 6: 13-10.1186/1472-6807-6-13.PubMed CentralView ArticlePubMedGoogle Scholar
- Hildebrand PW, Lorenzen S, Goede A, Preissner R: Analysis and prediction of helix-helix interactions in membrane channels and transporters. Proteins. 2006, 64: 253-262. 10.1002/prot.20959.View ArticlePubMedGoogle Scholar
- Yuan Z, Zhang F, Davis MJ, Boden M, Teasdale RD: Predicting the solvent accessibility of transmembrane residues from protein sequence. J Proteome Res. 2006, 5: 1063-1070. 10.1021/pr050397b.View ArticlePubMedGoogle Scholar
- Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. 2001, New York, SpringerView ArticleGoogle Scholar
- Shi L, Simpson MM, Ballesteros JA, Javitch JA: The first transmembrane segment of the dopamine D2 receptor: accessibility in the binding-site crevice and position in the transmembrane bundle. Biochemistry. 2001, 40: 12339-12348. 10.1021/bi011204a.View ArticlePubMedGoogle Scholar
- Park Y, Helms V: How strongly do sequence conservation patterns and empirical scales correlate with exposure patterns of transmembrane helices of membrane proteins?. Biopolymers. 2006, 83: 389-399. 10.1002/bip.20569.View ArticlePubMedGoogle Scholar
- Park Y, Helms V: On the derivation of propensity scales for predicting exposed transmembrane residues of helical membrane proteins. Bioinformatics. 2007, 23: 701-708. 10.1093/bioinformatics/btl653.View ArticlePubMedGoogle Scholar
- Yeates TO, Komiya H, Rees DC, Allen JP, Feher G: Structure of the reaction center from Rhodobacter sphaeroides R-26: membrane-protein interactions. Proc Natl Acad Sci USA. 1987, 84: 6438-6442. 10.1073/pnas.84.18.6438.PubMed CentralView ArticlePubMedGoogle Scholar
- Baldwin JM, Schertler GF, Unger VM: An alpha-carbon template for the transmembrane helices in the rhodopsin family of G-protein-coupled receptors. J Mol Biol. 1997, 272: 144-164. 10.1006/jmbi.1997.1240.View ArticlePubMedGoogle Scholar
- Stevens TJ, Arkin IT: Substitution rates in alpha-helical transmembrane proteins. Protein Sci. 2001, 10: 2507-2517. 10.1110/ps.ps.10501.PubMed CentralView ArticlePubMedGoogle Scholar
- Lemmon MA, Flanagan JM, Treutlein HR, Zhang J, Engelman DM: Sequence specificity in the dimerization of transmembrane alpha-helices. Biochemistry. 1992, 31: 12719-12725. 10.1021/bi00166a002.View ArticlePubMedGoogle Scholar
- Javadpour MM, Eilers M, Groesbeek M, Smith SO: Helix packing in polytopic membrane proteins: role of glycine in transmembrane helix association. Biophys J. 1999, 77: 1609-1618.PubMed CentralView ArticlePubMedGoogle Scholar
- Russ WP, Engelman DM: The GxxxG motif: a framework for transmembrane helix-helix association. J Mol Biol. 2000, 296: 911-919. 10.1006/jmbi.1999.3489.View ArticlePubMedGoogle Scholar
- Senes A, Gerstein M, Engelman DM: Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J Mol Biol. 2000, 296: 921-936. 10.1006/jmbi.1999.3488.View ArticlePubMedGoogle Scholar
- Rögnvaldsson T, You L: Why neural networks should not be used for HIV-1 protease cleavage site prediction. Bioinformatics. 2004, 20: 1702-1709. 10.1093/bioinformatics/bth144.View ArticlePubMedGoogle Scholar
- Vapnik V: The Nature of Statistical Learning Theory. 2000, New York, SpringerView ArticleGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.PubMed CentralView ArticlePubMedGoogle Scholar
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2004Google Scholar
- Karatzoglou A, Meyer D, Hornik K: Support Vector Machines in R. Journal of Statistical Software. 2006, 15:Google Scholar
- Chang CC, Lin CJ: LIBSVM : a library for support vector machines. 2001, [http://www.csie.ntu.edu.tw/~cjlin/libsvm]Google Scholar
- Wimley WC: The versatile beta-barrel membrane proteins. Curr Opin Struct Biol. 2003, 13: 404-411. 10.1016/S0959-440X(03)00099-X.View ArticlePubMedGoogle Scholar
- White SH: Membrane Proteins of Known Structure. [http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html]
- Michel H: MEMBRANE PROTEINS OF KNOWN STRUCTURE. [http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html]
- Lomize MA, Lomize AL, Pogozheva ID, Mosberg HI: OPM: orientations of proteins in membranes database. Bioinformatics. 2006, 22: 623-625. 10.1093/bioinformatics/btk023.View ArticlePubMedGoogle Scholar
- Edelsbrunner H, Facello M, Fu P, Liang J: Measuring proteins and voids in proteins. "Proc 28th Ann Hawaii Internat Conf System Sciences, 1995". 1995, V: Biotechnology Computing: 256-264.Google Scholar
- Edelsbrunner H: The union of balls and its dual shape. Discrete Comput Geom. 1995, 13: 415-440. 10.1007/BF02574053.View ArticleGoogle Scholar
- Adamian L, Nanda V, Degrado WF, Liang J: Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins. 2005, 59: 496-509. 10.1002/prot.20456.View ArticlePubMedGoogle Scholar
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.PubMed CentralView ArticlePubMedGoogle Scholar
- Henikoff S, Henikoff JG: Position-based sequence weights. J Mol Biol. 1994, 243: 574-578. 10.1016/0022-2836(94)90032-9.View ArticlePubMedGoogle Scholar
- Pei J, Grishin NV: AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics. 2001, 17: 700-712. 10.1093/bioinformatics/17.8.700.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.