Lipid exposure prediction enhances the inference of rotational angles of transmembrane helices
- Jhih-Siang Lai†1,
- Cheng-Wei Cheng†1,
- Allan Lo2 (corresponding author),
- Ting-Yi Sung1 (corresponding author) and
- Wen-Lian Hsu1
© Lai et al.; licensee BioMed Central Ltd. 2013
Received: 24 May 2013
Accepted: 1 October 2013
Published: 11 October 2013
Since membrane protein structures are challenging to crystallize, computational approaches are essential for elucidating the sequence-to-structure relationships. Structural modeling of membrane proteins requires a multidimensional approach, and one critical geometric parameter is the rotational angle of transmembrane helices. Rotational angles of transmembrane helices are characterized by their folded structures and could be inferred by the hydrophobic moment; however, the folding mechanism of membrane proteins is not yet fully understood. The rotational angle of a transmembrane helix is related to the exposed surface of a transmembrane helix, since lipid exposure gives the degree of accessibility of each residue in lipid environment. To the best of our knowledge, there have been few advances in investigating whether an environment descriptor of lipid exposure could infer a geometric parameter of rotational angle.
Here, we present an analysis of the relationship between rotational angles and lipid exposure, together with a support-vector-machine method, called TMexpo, for predicting both structural features from sequences. First, we observed on the development set of 89 protein chains that lipid exposure, i.e., the relative accessible surface area (rASA) of residues in the lipid environment, computed from high-resolution protein structures could infer the rotational angles with a mean absolute angular error (MAAE) of 46.32˚. More importantly, the rASA predicted by TMexpo achieved an MAAE of 51.05˚, which is better than the 71.47˚ obtained by the best of the compared hydrophobicity scales. Lastly, TMexpo outperformed the compared methods in rASA prediction on the independent test set of 21 protein chains and achieved an overall Matthews correlation coefficient, accuracy, sensitivity, specificity, and precision of 0.51, 75.26%, 81.30%, 69.15%, and 72.73%, respectively. TMexpo is publicly available at http://bio-cluster.iis.sinica.edu.tw/TMexpo.
TMexpo predicts rASA and rotational angles better than the compared methods. When rotational angles can be accurately predicted, free modeling of transmembrane protein structures may in turn benefit from reduced complexity, because the ensembles contain significantly fewer packing arrangements. Furthermore, sequence-based prediction of both rotational angle and lipid exposure can provide essential information when high-resolution structures are unavailable and contribute to experimental design to elucidate transmembrane protein functions.
Integral membrane proteins participate in diverse cellular functions such as signal transduction, bioenergetics, ion transport, cell adhesion, and cell-cell recognition. It has also been estimated that about 20-30% of a typical genome encodes proteins with a transmembrane (TM) domain [1, 2]. Despite their biological importance and abundance, the mechanism by which TM proteins fold into native structures remains poorly understood because few structures have been solved; TM proteins account for less than 1% of all deposited structures in the Protein Data Bank (PDB). Therefore, computational methods play an important role in deciphering the sequence-to-structure relationships and advancing our knowledge of this particular class of proteins.
Though recently solved structures of several amino-acid transporters, e.g., eukaryotic CLC Transporter (coded as 3ORG  in PDB) and potassium ion transporter (coded as 3PJZ  in PDB) revealed the existence of short helices in the reentrant region , the canonical topologies of TM proteins can be viewed as pairs of interacting transmembrane helices (TMHs), connecting loops and extramembraneous domains. In particular, the interaction between TMHs is an important determinant of folding and stability by the proposed two-stage model [7, 8]. Such an interaction is mediated by structural contacts at the helical interfaces with the protein itself, the ligands, as well as the lipid environment. From the perspective of structural modeling, the rotational angle of a TMH is a strong determinant of its interacting faces with the rest of the protein structure and the lipids. At the stage of conformation space sampling, we could filter out decoys that severely deviate from the predicted rotational angles. To elucidate rotational angles, Eisenberg et al. [9, 10] showed that hydrophobic scales can be used to estimate the hydrophobic moment direction to approximate the lipid-facing direction and proposed equations to calculate the rotational angles of TMHs based on such property. Later, several hydrophobicity scales or propensities have been proposed [11-15] to predict exposed residues or faces. To the best of our knowledge, there have been few advances to use lipid exposure, specifically the relative accessible surface area (rASA) in the lipid environment, to predict rotational angles. Henceforth, we use rASA for convenience to represent rASA in the lipid environment since in this paper we focus on the residues in such environment.
To determine the rotational angle of each TMH in the tertiary structure of a TM protein requires the information of lipid-facing direction, which has been defined differently in the literature. Pilpel et al.  described the lipid-facing direction as the vector opposite to the bisector of the acute angle formed by the two lines from the geometric center of the target TMH pointing to the geometric centers of the two nearest TMHs in the whole molecule. Stevens and Arkin  defined lipid-facing direction of a TMH as the vector connecting the geometric centers of the target helix and the whole molecule. The molecule could be a single chain or the complete protein. Dastmalchi et al.  accepted both the above definitions for lipid-facing direction.
Lipid exposure of each TMH has been shown useful in distinguishing between surfaces and interior interfaces, identifying potential functional residues buried in the protein core and exposed residues for protein-protein interactions [14, 17], and therefore facilitates the prediction of helix-packing conformations [17, 18]. In this work, we propose to use predicted rASA to estimate lipid-facing direction and then determine rotational angles of TMHs.
To train machine learning models on observed rASA from solved structures, we calculated the accessible surface area (ASA) by rolling a spherical probe along the van der Waals (VDW) surface of a protein molecule. The observed rASA of each residue is defined as its ASA divided by its reference value in an extended Gly-X-Gly tripeptide conformation. Several methods have been proposed to predict ASA instead of rASA in the lipid environment. For example, ASAP uses evolutionary profiles to predict solvent accessibility by support vector regression (SVR) and reports a Pearson correlation coefficient (PCC) of 0.62 for ASA prediction. MPRAP uses a support vector machine (SVM) to predict the ASA of complete TM proteins from evolutionary profiles and residue distance from the membrane center. A number of methods have been proposed to predict the buried or exposed status of each TM residue, where the status is determined by whether its rASA falls below a predefined threshold. Several methods predict the status of TM residues directly from sequences without predicting their ASA; most of them rely on sequence conservation or knowledge-based propensities, including kPROT, ProperTM, TMLIP, MO and TMX. RHYTHM predicts the burial status of TM residues by matrix-based helix-helix contact prediction and sequence conservation, but this method requires prior knowledge of the TM protein type, such as membrane coil or transporter/channel.
In this paper, we present TMexpo, a method to predict rotational angles of TMHs. For each TM residue, TMexpo first predicts rASA by SVR and predicts the buried or exposed status by SVM; both models use evolutionary profiles, sequence conservation, helix insertion energy and biochemical properties as features. Next, TMexpo determines rotational angles of TMHs based on the predicted rASA. In rotational angle prediction, TMexpo outperformed predictors using hydrophobicity scales and propensities by at least 19.2˚ in terms of mean absolute angular error (MAAE) on the independent test set of 21 protein chains. Notably, the prediction results showed that rotational angles of TMHs could be better inferred by predicted rASA than by existing scales. We expect that rotational angle prediction could benefit structure prediction, especially free modeling of transmembrane protein structures, which is difficult and requires reducing the number of packing arrangements.
Results and discussion
Observed rASA can better infer the lipid-facing direction than hydrophobicity scales and lipid-facing propensities
The mechanism by which TM proteins pack and assemble in the lipid bilayer is not fully understood. One significant advance in this area is the mechanism of the Sec translocon, which demonstrates how TM proteins enter the membrane [25-27]. A commonly accepted model of TM protein folding is the two-stage model. Later, White and colleagues extended the two-stage model with a four-step thermodynamic cycle describing the folding energetics. Several scales and propensities were developed to understand the lipid-facing direction of TMHs based on sequence analysis or knowledge-based information; however, using these scales to predict the rotational angles of TMHs often results in low prediction accuracy. Therefore, we investigated whether an environment descriptor such as lipid exposure could better infer a geometric parameter such as the rotational angles of TMHs.
Table 1 The MAAE of rotational angles determined by various approaches
Method | MAAE in development set (554 TMHs) | MAAE in independent test set (188 TMHs)
NACCESS (observed rASA) | 46.32˚ | 45.55˚
TMexpo (predicted rASA) | 51.05˚ (LOOCV) | 48.31˚
The success of determining the rotational angle via the observed rASA calculated by NACCESS may be due to the following two reasons. First, observed rASA is derived from known protein structures, whereas hydrophobicity and lipid-facing propensities are derived from the sequence. Hydrophobicity and lipid-facing propensities alone are therefore insufficient for accurate inference of helical packing, which leads to worse rotational angle estimation. Second, researchers had a simplified view of membrane proteins as “inside-out” proteins, which have an interior polar core and an exterior apolar surface [30, 31]. However, this paradigm was challenged because a biased distribution of hydrophobic residues could not be detected in every membrane protein. As more solved structures became available, statistical analyses of these structures also supported this finding [33-36]. In contrast, observed rASA is a structural environment descriptor of helical packing and better infers the rotational angle.
Relative accessible surface area predicted by TMexpo can also better infer the lipid-facing direction than hydrophobicity scales and lipid-facing propensities
To evaluate the capability of our proposed method for TM proteins with unknown structures, we used rASA predicted by TMexpo to determine the rotational angles of TMHs. Then we evaluated how predicted rASA could infer the rotational angle in terms of MAAE by the following two experiments. First, we used the development set of 89 chains to develop the TMexpo prediction model and tested on the independent test set of 21 chains. Second, we performed leave-one-out cross validation (LOOCV) on the development set. The results of both experiments are shown in Table 1. Both rASA prediction results of the development set and the independent test set are provided in the Additional file 1: Dataset S1. Particularly, detailed prediction results of 188 TMHs in the independent test set are reported in the Additional file 2: Table S1. Notably, 44 of 188 TMHs had the angular error less than 15˚ and their rotational angles were predicted precisely by TMexpo.
In the first experiment, TMexpo-predicted rASA was comparable to the observed rASA derived from NACCESS for inferring the rotational angles; the predicted rASA achieved an MAAE of 48.31˚, slightly worse than the MAAE of 45.55˚ achieved by the observed rASA. Nevertheless, TMexpo-predicted rASA achieved a much better MAAE, by at least 19.2˚, than the other predictors using different hydrophobicity scales and propensities, including Eisenberg et al.’s consensus hydrophobicity scale (ES), kPROT, ProperTM, TMLIP, and MO. The second experiment gave results consistent with the first. Specifically, performing LOOCV on the development set resulted in an MAAE of 51.05˚, slightly worse than that inferred by the observed rASA, but better than the compared predictors by at least 20.42˚. Our results show that without known structures, TMexpo can infer the rotational angle within a close margin of that inferred by the observed rASA and improve the prediction compared to hydrophobicity scales or lipid-facing propensities.
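The MAAE comparison can be illustrated with a short Python sketch. It assumes that the error between a predicted and an observed angle is measured the short way around the circle, so that it never exceeds 180˚; that wrap-around convention and the function names are assumptions made for illustration and are not stated in the text.

```python
def angular_error(predicted_deg, observed_deg):
    """Absolute angular difference in degrees, wrapped to [0, 180].

    Assumption: errors are measured along the shorter arc of the circle.
    """
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angular_error(predicted, observed):
    """MAAE over paired lists of rotational angles given in degrees."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)
```

Under this convention a prediction of 350˚ for an observed angle of 10˚ counts as an error of 20˚ rather than 340˚.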
Comparison of relative accessible surface area prediction methods
Comparison of different methods for classifying exposed/buried residues on the independent test set without interface TM residues
Next, we compared TMexpo’s performance of predicting rASA with existing methods, including TMX , RHYTHM  and MPRAP . One distinction among these methods is that RHYTHM requires prior knowledge of protein types as membrane-coil or channel for prediction. For comparison, we retrieved the prediction results for TMX, RHYTHM and MPRAP from their web servers by using their default parameters. Table 2 shows that TMexpo outperforms the compared methods across most of the measures except a slightly lower specificity compared to MPRAP by 1.16%. The specificity of TMexpo is lower than its sensitivity by 12.15%, and most of the predictors except MPRAP have the same trend as TMexpo’s results. This observation implies that the detection of buried residues may be more difficult than that of exposed residues in the TM domains. To further gain insights into this issue, we extracted buried residues in our dataset of 110 chains from the helix-packing database TMPad , and found that over 77% residues have at least one interhelical contact. This suggests that prediction of buried residues could very likely be improved by detecting contacts of interhelical interactions; however, TMexpo and most of the compared predictors retrieve features by local information of subsequences only. Interestingly, interhelical contacts may be conserved in sequences and discovered from evolutionary information such as PSSM profiles . We consider that evolutionary information is an effective feature for capturing interhelical interactions that contributes to rASA/burial status prediction; however, interhelical interaction prediction is still a challenging problem.
Comparison of different methods for classifying exposed/buried residues on 392 interface TM residues of the independent test set by rASA derived from both subunit structure and complete structure (in parentheses)
Comparison of different methods for classifying exposed/buried residues on 3,553 entire TM residues of the independent test set by rASA derived from both subunit structure and complete structure (in parentheses)
Rotational angle can help in determining helical packing in transmembrane proteins
Harrington and Ben-Tal  characterized five structural features of interhelical interactions, namely, aromatic interactions, hydrogen bonds, salt bridges, and two interactions from packing motifs, that are useful for helical packing. They proposed an algorithm to pack the TMHs of TM proteins, as follows: First, the algorithm ordered the TMHs by the sequence from the N-terminus, and then iteratively grouped sequential TMHs by a scoring function based on the five types of interactions. They demonstrated helical packing on 15 diverse proteins, and the average RMSD of Cα in the native structure of the 15 reconstructed TM proteins ranged from 0.51 Å to 1.35 Å. In this subsection, we reexamined these proteins to study the rotational angle of TMHs and its relationship to helical packing. Since the protein 1AFO discussed in their work contains only one TMH, we excluded this protein in our analysis.
Rotational angle prediction on the protein chains from Harrington and Ben-Tal’s work (recoverable column header: TM helix sequence)
On the other hand, since rotational angle prediction is strongly correlated with the periodicity of a helix, predicted rotational angles may not work well for packing of helices that deviate from the regular periodicity of rASA, such as those that are severely kinked, disrupted, highly tilted, or associated with a reentrant loop. Six of the 73 TMHs were poorly predicted, with angular errors over 100.29˚. All six TMHs are classified as kinked or contain partially non-helical structure in the TM domain, which explains why the moment-based prediction performed poorly.
Rotational angles prediction based on both predicted topology and predicted relative accessible surface area
To illustrate the capability of the proposed method for rotational angle prediction based on predicted topology, we submitted the sequences of the independent test set to three web servers: SVMSignal, TOPCONS and MemBrain. As a pre-processing step, we removed the N-terminal signal peptide sequences predicted by SVMSignal. A predicted TMH is considered correct if it has a one-to-one overlap of at least eight residues with an observed TMH annotated in PDBTM. For the 188 TMHs of the independent test set, the recall and precision of TOPCONS are 93.09% (175/188) and 100% (188/188), respectively. The recall and precision of MemBrain are 97.87% (184/188) and 95.34% (184/193), respectively. For a fair comparison of rotational angles, we only discuss the 175 TMHs that were correctly predicted by both topology predictors. Prediction results from SVMSignal, TOPCONS and MemBrain are available in the Additional file 3: Dataset S2.
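The matching criterion can be sketched in Python as follows. Helices are represented as inclusive (start, end) residue ranges, and predicted and observed helices are paired greedily in sequence order; the representation, the greedy pairing and the function names are assumptions for illustration, since the text only states the one-to-one requirement and the eight-residue minimum overlap.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def match_helices(predicted, observed, min_overlap=8):
    """Greedy one-to-one pairing of predicted and observed TMHs."""
    obs = sorted(observed)
    used = set()   # indices of observed helices already paired
    pairs = []
    for p in sorted(predicted):
        for i, o in enumerate(obs):
            if i not in used and overlap(p, o) >= min_overlap:
                used.add(i)
                pairs.append((p, o))
                break
    return pairs


def recall_precision(predicted, observed, min_overlap=8):
    """Recall = correct / observed, precision = correct / predicted."""
    correct = len(match_helices(predicted, observed, min_overlap))
    return correct / len(observed), correct / len(predicted)
```

With these definitions, 184 correct helices out of 188 observed and 193 predicted give a recall of 97.87% and a precision of 95.34%, in agreement with the MemBrain figures quoted above.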
To obtain observed rotational angles corresponding to each predicted TMH, we removed atoms which were predicted outside the membrane by the topology prediction for each protein chain, and then we followed the definition in this work to calculate the rotational angles. Specifically, we did not directly assign rotational angles calculated by TMH of PDBTM to predicted TMHs, but we recalculated rotational angles based on atoms of protein structure selected by predicted topology within the TM region. There are two reasons to do that. First, since the sequence of a predicted TMH is not identical to that defined by PDBTM and the definition of rotational angle depends on the helical principal axis and the Cα vector of the first residue to its lipid-facing direction vector, we cannot simply assume their structural property is similar. Second, for a predicted 3D protein structure, the TM region information comes from the topology predictor, and the rotational angle of each TMH is established on the atoms within predicted TM region, not from the PDBTM. Therefore, we have to recalculate the rotational angles of predicted TMH for comparison. Finally, we ignored any predicted TMH which includes residues that do not have structural data within PDB entity, and 155 TMHs were left for comparison.
Additional file 4: Table S2 and Additional file 5: Table S3 provide the sequences of predicted TMHs corresponding to the TMHs annotated in PDBTM, the observed angles defined by the residues of the predicted topology, the predicted angles, and the moment lengths. For all 155 TMHs, the MAAE is 43.04˚ with TOPCONS topology and 56.59˚ with MemBrain topology. These two tables demonstrate the ability of TMexpo to predict rotational angles based on predicted topology. Interestingly, the MAAE of the 155 TMHs based on topology predicted by TOPCONS is better than that based on the topology annotated in PDBTM. There are two possible explanations for this observation. First, TMHs predicted by TOPCONS are longer than those annotated in PDBTM, which may help in calculating the helical principal axis. Second, we excluded TMHs that have partially incomplete structural data, so the performance on 155 TMHs would differ from that on the dataset of 188 TMHs. We conclude that rotational angles calculated from both predicted TMHs and predicted rASA are still consistent with the observed rotational angles defined by the predicted TMHs. Therefore, even when TMH boundaries are not perfectly predicted, the predicted rotational angle can still provide useful information about the interior side of a TM protein and constrain decoys of predicted 3D structures.
An application to an amino acid antiporter, AdiC
We selected from our independent test set an amino acid antiporter, called AdiC, of E. coli strain O157:H7 to demonstrate how predicted rASA and rotational angles in the helical wheel representation of TMHs facilitate the analysis of TM proteins. E. coli strain O157:H7 is a pathogen that causes hemorrhagic diarrhea, and AdiC is a multi-spanning TM protein that enables E. coli to resist acidic environments by exchanging extracellular arginine for intracellular agmatine [46, 47]. An arginine-bound structure of AdiC has been solved and is coded as 3L1L in PDB.
Interhelical contacts play an important role in relative accessible surface area prediction
Comparing Pearson correlation coefficients between contact-enriched set and reference set defined by different thresholds (c)
Columns: thresholds for contact-enriched set; PCC (number of residues) on contact-enriched set; PCC (number of residues) on reference set
Rows: c ≥ 1; c ≥ 2; c ≥ 3
Sequence-based prediction of both rotational angle and rASA can provide indispensable information for structure prediction when high-resolution structures are unavailable and contribute to experimental design to elucidate TM protein functions. In this paper, we present a novel concept of using lipid exposure to infer rotational angles and have developed a machine learning approach to predict rotational angles of TMHs. Significantly, using predicted rASA from our sequence-based model achieved an MAAE of 48.31˚ on the independent test set, which is better than that obtained by the best of the compared knowledge-based propensities (67.51˚). Furthermore, we demonstrate an application for structural analysis via an amino acid antiporter. We believe improving prediction of rotational angle can benefit the structure prediction because free modeling of TM protein structures is a tough task and reducing the number of packing arrangements is necessary.
The metric used for evaluating rotational angle prediction in this work is the mean absolute angular error (MAAE). To evaluate the classification model, i.e., classifying buried and exposed status, we used the following performance measures: Matthews correlation coefficient (MCC), accuracy, sensitivity, specificity, and precision. For the regression model, i.e., rASA prediction, we used mean absolute error (MAE), root mean squared error (RMSE), and Pearson correlation coefficient (PCC).
See the Additional file 6: Table S4 for definitions of Pearson correlation coefficient, accuracy, sensitivity, specificity and precision.
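For orientation, the measures named above can be written out in Python using their standard textbook definitions; the paper's own definitions are in the additional file and are not reproduced here, so this sketch assumes they coincide with the usual ones. Exposed residues are treated as the positive class, and all function names are illustrative.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard measures from confusion-matrix counts (exposed = positive)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def pcc(pred, obs):
    """Pearson correlation coefficient."""
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)
```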
The list of all protein chains (PDB:Chain) included in the development set and the independent test set
Independent test set
Calculation of relative accessible surface area from structures
To calculate lipid exposure or exposed area of a structure, we used NACCESS program [19, 29] with the probe radius set to 2.0 Å. The size of probe radius 2.0 Å was selected to mimic the -CH2 of hydrocarbon chains, and it is identical to that used in Yuan et al. , Illergård et al.  and Lo et al. . The ASA for a residue was the sum of ASA from all atoms belonging to that residue. To extract the helical boundaries from the protein chains, we used the annotations of PDBTM. From the 89 protein chains in the development set, we obtained a total of 10,441 residues in TM domain. For the independent test set of newly solved proteins, we obtained 3,581 residues in TM domain. To annotate missing residues and missing atoms, we used PDB Validation Suite . In order to obtain rASA as a normalized measure for a TM residue, we divided the ASA values by their reference values in a Gly-X-Gly tripeptide in an extended conformation. The reference values were derived from Samantha et al. . To classify burial status of each residue for model training and testing, we followed the rASA threshold defined in Miller et al.’s work , i.e., rASA <5% to characterize buried residues and otherwise exposed, though different thresholds have been used in the literature.
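The normalization and labeling steps can be sketched in Python as follows. The per-atom ASA values would come from a program such as NACCESS and the reference values from the cited Gly-X-Gly table; neither is reproduced here, so the numbers in the usage note are hypothetical and the function names are illustrative only.

```python
BURIAL_THRESHOLD = 0.05  # rASA < 5% marks a residue as buried


def residue_asa(atom_asas):
    """ASA of a residue: the sum of the ASA of all its atoms."""
    return sum(atom_asas)


def relative_asa(asa, reference_asa):
    """rASA: residue ASA divided by its Gly-X-Gly reference value."""
    if reference_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return asa / reference_asa


def burial_status(rasa, threshold=BURIAL_THRESHOLD):
    """Label a residue 'B' (buried) or 'E' (exposed)."""
    return "B" if rasa < threshold else "E"
```

For example, a residue with a hypothetical ASA of 8.0 Å² and a hypothetical reference value of 200.0 Å² has an rASA of 0.04 and is labeled buried.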
An SVM-based predictor for lipid exposure of TM helices
We proposed residue-wise predictors based on support vector machines (SVMs), i.e., an SVM classifier to predict the buried/exposed status and a support vector regression (SVR) model to predict the rASA value of each residue in the TM domain. Specifically, C-SVC and epsilon-SVR implemented in LIBSVM were used to develop the models, and both of them used the RBF kernel function. The parameters of the models were optimized by a chain-wise LOOCV procedure on the development set. In the LOOCV procedure, the best parameters for the buried/exposed status classification model were cost $c = 2^{1}$ and gamma $g = 2^{-4}$; the best parameters for the real-valued rASA regression model were cost $c = 2^{-1}$, gamma $g = 2^{-5}$, loss function $p = 10^{-3}$ and tolerance of termination criterion $e = 10^{-2}$. Details of LOOCV performances can be obtained in the Additional file 8: Text S1.
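As an illustration, the two parameter sets map onto LIBSVM's `svm-train` command-line options (`-s` SVM type, `-t` kernel type, `-c` cost, `-g` gamma, `-p` epsilon of the loss function, `-e` termination tolerance). The sketch below only builds the argument lists, taking the parameter values as powers of two and ten (c = 2^1, g = 2^-4 for the classifier; c = 2^-1, g = 2^-5, p = 10^-3, e = 10^-2 for the regressor). The file names are hypothetical, and the authors' actual invocation is not given in the text.

```python
def svm_train_args(svm_type, cost, gamma, train_file, model_file,
                   epsilon=None, tolerance=None):
    """Build an argument list for LIBSVM's svm-train with an RBF kernel."""
    args = ["svm-train", "-s", str(svm_type), "-t", "2",  # -t 2: RBF kernel
            "-c", str(cost), "-g", str(gamma)]
    if epsilon is not None:
        args += ["-p", str(epsilon)]    # epsilon in the epsilon-SVR loss
    if tolerance is not None:
        args += ["-e", str(tolerance)]  # tolerance of termination criterion
    return args + [train_file, model_file]


# -s 0 selects C-SVC; file names are hypothetical placeholders.
classifier_args = svm_train_args(0, 2.0 ** 1, 2.0 ** -4,
                                 "status.train", "status.model")
# -s 3 selects epsilon-SVR.
regressor_args = svm_train_args(3, 2.0 ** -1, 2.0 ** -5,
                                "rasa.train", "rasa.model",
                                epsilon=1e-3, tolerance=1e-2)
```

Either list could be passed to `subprocess.run` on a machine where LIBSVM is installed.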
Given a TM domain of a protein chain, each residue to be predicted was located at the center of a sliding window of length 17 and features were generated according to the 17-mer sequence. To train the classification model, exposed residues with label “E” were considered as positive data, and buried residues with label “B” as negative data. To train the regression model, the input was taken from the real-number rASA. We searched parameters by LOOCV procedure for the classification model and the regression model based on optimizing the MCC and the PCC, respectively. We did not directly predict ASA values because they are not normalized in a zero to one interval and this could produce bias in the presence of an outlier.
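The sliding-window step can be sketched in Python as below. The sketch pads positions that fall outside the given sequence with a placeholder character; whether the original implementation pads or instead uses flanking residues of the chain is not stated, so the padding behavior, the placeholder symbol and the function name are assumptions.

```python
WINDOW = 17
PAD = "-"  # hypothetical placeholder for positions with no residue


def residue_windows(tm_sequence, size=WINDOW, pad=PAD):
    """Return one window per residue, with that residue at the center."""
    half = size // 2
    padded = pad * half + tm_sequence + pad * half
    return [padded[i:i + size] for i in range(len(tm_sequence))]
```

Each window has length 17 and position 8 (zero-based) holds the residue being predicted, so a TM domain of n residues yields n feature windows.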
In training and testing, we excluded residues that participate in interchain contacts and the rationale is as follows: A sequence-based rASA predictor, which accepts the sequence of a structural subunit as input, can only describe structural properties of one subunit, not of the complete structure. Thus predicted rASA of those residues may be drastically different depending on their locations in the interacting interfaces. In the case of residues residing on the interchain surface, we observed rASA of these residues in a single chain may be significantly different from those seen in the complete structure with multiple subunits. Out of 110 representative protein chains used in our work, 86 protein chains are multimeric. Among all of their 9,800 TM residues without any missing atom, 2,167 (22.11%) residues have two different rASA values calculated from the single subunit and the complete structure, respectively; and the former-derived rASA is always larger than the latter-derived rASA. Notably, the maximum and average differences of the two rASA values of these 2,167 residues are 82.28% and 23.43%, respectively. Furthermore, 831 out of the 2,167 residues would be assigned inconsistent burial/exposed status according to their two different rASA values. In other words, 8.48% of the overall 9,800 transmembrane residues were considered as exposed from the perspective of a single subunit but turned out to be buried in their complete structures. For example, in 2OAR:A, 41 residues of 52 TM residues have different rASA values, and 13 residues are calculated as being exposed in the single chain but as being buried in the complete structure. It is noteworthy that the 60S (i.e., 37S by PDB indexing) and the 45 V (i.e., 22 V by PDB indexing) have drastic differences in their rASA values, i.e., 60.77% vs. 2.42% and 60.02% vs. 1.68%, as calculated by single chain and by complete structure, respectively. 
Since we did not know the native state of amino acids lying on the interchain surface, we excluded these residues from our training and testing data. For each protein chain, we calculated rASA for both single subunit structure and complete structure. Later, we excluded residues which were not identical in rASA by comparing the above two calculations, and also excluded residues that were missing partial or entire atoms.
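The exclusion rule can be sketched in Python as follows: a residue is kept only if its rASA is the same in the single subunit and in the complete structure, and residues whose buried/exposed label flips are reported separately. The dictionary-based interface, the numerical tolerance and the function name are assumptions for illustration.

```python
def split_by_interchain_effect(rasa_subunit, rasa_complex,
                               threshold=0.05, tol=1e-9):
    """Separate residues by whether interchain contacts change their rASA.

    Both arguments map residue identifiers to rASA values in [0, 1].
    Returns (kept, excluded, flipped); flipped is the subset of excluded
    residues whose buried/exposed status differs between the two structures.
    """
    kept, excluded, flipped = [], [], []
    for res_id, sub in rasa_subunit.items():
        comp = rasa_complex[res_id]
        if abs(sub - comp) <= tol:
            kept.append(res_id)
        else:
            excluded.append(res_id)
            if (sub < threshold) != (comp < threshold):
                flipped.append(res_id)
    return kept, excluded, flipped
```

With the two 2OAR:A residues quoted above (60.77% vs. 2.42% and 60.02% vs. 1.68%), both are excluded and both count as status flips. The reported fractions are also consistent: 2,167/9,800 is 22.11% and 831/9,800 is 8.48%.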
In the testing stage, we applied a simple post-processing step: because rASA values lie in the interval [0, 1], predicted values above 1 were set to 1 and predicted values below 0 were set to 0. To derive the ASA value of each residue, we multiplied the predicted rASA value by the corresponding reference value.
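A minimal Python sketch of this post-processing, reading the bound handling as clipping to [0, 1]; the reference value in the usage note is hypothetical and the function names are illustrative.

```python
def clip_rasa(value):
    """Clip a predicted rASA to the closed interval [0, 1]."""
    return min(1.0, max(0.0, value))


def predicted_asa(predicted_rasa, reference_asa):
    """Convert a predicted rASA back to ASA using the reference value."""
    return clip_rasa(predicted_rasa) * reference_asa
```

For instance, a raw regression output of 1.2 is clipped to 1.0, and a predicted rASA of 0.5 with a hypothetical reference value of 200.0 Å² corresponds to an ASA of 100.0 Å².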
Input features for predictors
In the design of TMexpo, we did not use a specific feature selection technique, and all the features used in TMexpo belong to one of three feature groups. The first group concerns interhelical contacts, specifically volume, polarity, charge and residue interhelical contact propensity. Because we observed that buried residues tend to have more interhelical contacts, we examined features related to such contacts. For example, the well-known GxxxG motif can be regarded as a small-xxx-small motif, and we use volume profiles to incorporate this feature in the machine learning model. Polarity and charge can also be seen as features related to hydrogen bonding and cation-pi interaction, respectively. To encode these features in TMexpo, the volume of each residue was divided by the maximum value, 237.2 for tyrosine. The polarity was encoded by the sigmoidal function $1 - 1/(1 + e^{-po})$, where $po$ denotes the mean residue polarity calculated by Radzicka and Wolfenden’s method. We defined positively charged residues as 1, neutral residues as 0.5, and negatively charged residues as 0 based on the index used by Klein et al. The residue interhelical contact propensity was developed by Lo et al., and in TMexpo we used the propensity normalized by dividing by the maximum value, 1.43 for cysteine.
The second group provides evolutionary information to the machine learning model in the form of position-specific scoring matrix (PSSM) profiles and conservation scores. Evolutionary information is an important feature and has been incorporated in interhelical interaction predictors [38, 63]. To encode the PSSM as features, the matrix was generated by performing PSI-BLAST against NCBI’s non-redundant database. This feature of a 17-mer peptide was encoded as a vector of size 17 × 20, where each entry was normalized by $1 - 1/(1 + e^{-PSSM})$. The conservation score was calculated by an algorithm developed by Capra and Singh on the multiple sequence alignment generated by MAFFT [65, 66] based on the 17-mer peptide. We used the raw scores without the local Z-score transformation described in their method.
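The PSSM encoding can be sketched in Python as follows. The input is assumed to be a list of 20-element score rows, one per residue, already parsed from PSI-BLAST output; positions of the window that fall outside the sequence are filled with 0.5, which matches the fill value described for nonexistent residues. The data layout and function names are assumptions.

```python
import math


def encode_pssm_score(score):
    """Map a raw PSSM score to (0, 1) with 1 - 1/(1 + e^(-score))."""
    return 1.0 - 1.0 / (1.0 + math.exp(-score))


def encode_pssm_window(pssm_rows, center, size=17, fill=0.5):
    """Flatten the PSSM rows of a window centered on `center`.

    Returns size * 20 values; positions with no residue get `fill`.
    """
    half = size // 2
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm_rows):
            features.extend(encode_pssm_score(s) for s in pssm_rows[pos])
        else:
            features.extend([fill] * 20)
    return features
```

A 17-residue window therefore always yields 340 values, and a raw score of 0 maps to 0.5.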
The third group includes the TMH insertion energy, the amphiphilicity of residues and turn propensities, which relate to structural information. The first two features can reveal the position of a residue relative to the hydrophobic membrane core or the water interface, which is akin to the Zpred features used in MPRAP that directly predict the relative position of each residue from the center of the membrane. The position-specific free energy of TMH insertion, termed “free energy” to describe the hydrophobic core, was encoded by the sigmoidal function $1 - 1/(1 + e^{-energy})$, where $energy$ denotes the free energy of TMH insertion estimated by Hessa et al.’s method. The amphiphilicity was encoded by the sigmoidal function $1 - 1/(1 + e^{am})$, where $am$ denotes the amphiphilicity derived by Mitaku et al.’s method. We also considered the helix turn propensities from Monné et al. in order to capture sequence information related to tight turns in naturally occurring TM helices. This feature was normalized to [0, 1] by dividing the propensities by the maximum value, 2.7 for proline.
All of the above features were normalized to the closed interval [0, 1]. A feature value close to 1 means that the corresponding residue is more hydrophobic, more amphiphilic, more polar, positively charged, larger in volume, more conserved, or more likely to form turns and interhelical contacts. For nonexistent residues in a window, we filled in 0.5 as the feature value, except for charge, interhelical contacts, and volume, for which we filled in zeroes.
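The padding rule for window positions that fall outside the sequence can be sketched as follows. The feature names, the function names, and the half-width of 8 (which gives the 17-mer window mentioned earlier) are illustrative assumptions, not identifiers from TMexpo.

```python
# Features padded with 0 rather than 0.5, following the rule stated in the text.
# The string labels are hypothetical names chosen for this sketch.
ZERO_FILLED_FEATURES = {"charge", "contact", "volume"}


def fill_value(feature_name):
    """Return the padding value used for nonexistent residues."""
    return 0.0 if feature_name in ZERO_FILLED_FEATURES else 0.5


def window_feature(values, center, feature_name, half_width=8):
    """Extract a (2 * half_width + 1)-long window of encoded values around center."""
    fill = fill_value(feature_name)
    window = []
    for i in range(center - half_width, center + half_width + 1):
        # Positions before the first or after the last residue are padded.
        window.append(values[i] if 0 <= i < len(values) else fill)
    return window
```

For a residue at the start of a chain, for example, the first eight entries of a polarity window would be 0.5, whereas those of a charge window would be 0.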
Predicting rotational angle based on relative accessible surface area
Determination of rotational angle of a transmembrane helix
The rotational angle of a TMH was calculated as follows. First, we removed atoms that were annotated as outside the membrane. Second, we computed the helical principal axis of the TMH of interest and aligned it with the z-axis with the N-terminus facing the screen, creating a top view of the protein with respect to the target helix. Third, we identified the geometric centers of the molecule and of each individual helix in the two-dimensional plane from the average x and y coordinates of the Cα atoms in the constituent TMH residues. We defined the lipid-facing direction of a TMH as the direction opposite to the vector that connects the geometric center of the target TMH to the molecular geometric center of the protein chain unit. The rotational angle of the target helix was measured as the angle rotated clockwise, viewed from the helix N-terminus to the C-terminus, from the Cα vector of the first residue to the lipid-facing direction vector. The angle ranges from 0˚ to 360˚.
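The geometric part of this procedure can be sketched in two dimensions as below. The sketch assumes that the coordinates have already been restricted to membrane atoms and projected into the top view described above, that the Cα vector of the first residue runs from the helix center to that residue's Cα atom, and that "clockwise" refers to a plane drawn with x to the right and y upward. These conventions and the function names are assumptions made for illustration.

```python
import math


def centroid(points):
    """Average the x and y coordinates of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def rotational_angle(helix_ca_xy, molecule_ca_xy):
    """Clockwise angle in degrees, in [0, 360), from the first-residue
    C-alpha vector to the lipid-facing direction of the helix."""
    hx, hy = centroid(helix_ca_xy)
    mx, my = centroid(molecule_ca_xy)
    # Lipid-facing direction: away from the molecular center, through the helix center.
    lipid = (hx - mx, hy - my)
    # Vector from the helix center to the C-alpha atom of the first residue.
    first = (helix_ca_xy[0][0] - hx, helix_ca_xy[0][1] - hy)
    angle_first = math.atan2(first[1], first[0])
    angle_lipid = math.atan2(lipid[1], lipid[0])
    # atan2 grows counter-clockwise, so the clockwise rotation from `first`
    # to `lipid` is the difference taken in this order.
    return math.degrees(angle_first - angle_lipid) % 360.0
```

If the helix center coincides with the molecular center, the lipid-facing direction is undefined; the sketch does not guard against that case.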
Calculation of relative accessible surface area moment direction
For each residue, the rASA value can be seen as the degree to which the residue faces the lipid in its own direction. Therefore, for one TMH, the sum of the lipid-facing tendencies of all its residues can characterize its rotational angle. In Equation 6, the x and y terms are the components of the vector sum over the n residues in a TMH. The moment length |M| is defined in Equation 7. The angle γ was first solved with the inverse cosine function as in Equation 8, and the moment direction θ was then determined as in Equation 9: θ is (360 - γ) if the sign of the y term is negative, and γ otherwise.
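A minimal sketch of this moment calculation follows. Because the equations themselves are not reproduced in this text, the sketch assumes an ideal α-helix in which successive residues are offset by 100˚ around the helical axis, with the first residue at 0˚; the function name and the `step_deg` parameter are hypothetical.

```python
import math


def rasa_moment_direction(rasa, step_deg=100.0):
    """Return (|M|, theta in degrees) for the per-residue rASA values of one TMH."""
    # Vector sum of per-residue rASA values placed around the helical wheel.
    x = sum(r * math.cos(math.radians(i * step_deg)) for i, r in enumerate(rasa))
    y = sum(r * math.sin(math.radians(i * step_deg)) for i, r in enumerate(rasa))
    length = math.hypot(x, y)
    if length == 0.0:
        raise ValueError("moment length is zero; direction is undefined")
    # Clamp to [-1, 1] to guard against rounding before the inverse cosine.
    gamma = math.degrees(math.acos(max(-1.0, min(1.0, x / length))))
    # The inverse cosine only covers 0 to 180 degrees; use the sign of y
    # to place the direction in the lower half-plane when needed.
    theta = 360.0 - gamma if y < 0 else gamma
    return length, theta
```

Under this assumption, a helix whose only exposed residue is the fourth one (index 3, at 300˚) gives γ = 60˚ with a negative y term, so θ = 300˚.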
Predicting rotational angles of transmembrane helices
Accessible surface area
Leave-one-out cross validation
Matthews correlation coefficient
Mean absolute angular error
Mean absolute error
Pearson correlation coefficient
Protein data bank
Relative accessible surface area
Root mean squared error
Support vector machine
Support vector regression
Van der Waals.
This work was supported in part by the National Science Council under grants NSC100-2319-B-010-002 and NSC101-2221-E-001-022. No additional external funding was received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Wallin E, Von Heijne G: Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 1998, 7: 1029-1038.
- Stevens TJ, Arkin IT: Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins: Struct, Funct, Bioinformatics. 2000, 39: 417-420. 10.1002/(SICI)1097-0134(20000601)39:4<417::AID-PROT140>3.0.CO;2-Y.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The protein data bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- Feng L, Campbell EB, Hsiung Y, MacKinnon R: Structure of a eukaryotic CLC transporter defines an intermediate state in the transport cycle. Science. 2010, 330: 635-641. 10.1126/science.1195230.
- Cao Y, Jin X, Huang H, Derebe MG, Levin EJ, Kabaleeswaran V, Pan Y, Punta M, Love J, Weng J: Crystal structure of a potassium ion transporter, TrkH. Nature. 2011, 471: 336-340. 10.1038/nature09731.
- Von Heijne G: Membrane-protein topology. Nat Rev Mol Cell Biol. 2006, 7: 909-918. 10.1038/nrm2063.
- Popot JL, Engelman DM: Membrane protein folding and oligomerization: the two-stage model. Biochemistry. 1990, 29: 4031-4037. 10.1021/bi00469a001.
- Popot J-L, Engelman DM: Helical membrane protein folding, stability, and evolution. Annu Rev Biochem. 2000, 69: 881-922. 10.1146/annurev.biochem.69.1.881.
- Eisenberg D, Weiss RM, Terwilliger TC: The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature. 1982, 299: 371-374. 10.1038/299371a0.
- Eisenberg D, Weiss RM, Terwilliger TC, Wilcox W: Hydrophobic moments and protein structure. Faraday Symp Chem Soc. 1982, 17: 109-120.
- Beuming T, Weinstein H: A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins. Bioinformatics. 2004, 20: 1822-1835. 10.1093/bioinformatics/bth143.
- Pilpel Y, Ben-Tal N, Lancet D: kPROT: a knowledge-based scale for the propensity of residue orientation in transmembrane segments, application to membrane protein structure prediction. J Mol Biol. 1999, 294: 921-935. 10.1006/jmbi.1999.3257.
- Adamian L, Nanda V, DeGrado WF, Liang J: Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins: Struct, Funct, Bioinformatics. 2005, 59: 496-509. 10.1002/prot.20456.
- Park Y, Helms V: On the derivation of propensity scales for predicting exposed transmembrane residues of helical membrane proteins. Bioinformatics. 2007, 23: 701-708. 10.1093/bioinformatics/btl653.
- Dastmalchi S, Beheshti S, Morris MB, Bret Church W: Prediction of rotational orientation of transmembrane helical segments of integral membrane proteins using new environment-based propensities for amino acids derived from structural analyses. FEBS J. 2007, 274: 2653-2660. 10.1111/j.1742-4658.2007.05800.x.
- Stevens TJ, Arkin IT: Substitution rates in α-helical transmembrane proteins. Protein Sci. 2001, 10: 2507-2517. 10.1110/ps.ps.10501.
- Adamian L, Liang J: Prediction of transmembrane helix orientation in polytopic membrane proteins. BMC Struct Biol. 2006, 6: 13. 10.1186/1472-6807-6-13.
- Hildebrand PW, Lorenzen S, Goede A, Preissner R: Analysis and prediction of helix-helix interactions in membrane channels and transporters. Proteins: Struct, Funct, Bioinformatics. 2006, 64: 253-262. 10.1002/prot.20959.
- Lee B, Richards FM: The interpretation of protein structures: estimation of static accessibility. J Mol Biol. 1971, 55: 379-400. 10.1016/0022-2836(71)90324-X.
- Yuan Z, Zhang F, Davis MJ, Bodén M, Teasdale RD: Predicting the solvent accessibility of transmembrane residues from protein sequence. J Proteome Res. 2006, 5: 1063-1070. 10.1021/pr050397b.
- Illergård K, Callegari S, Elofsson A: MPRAP: an accessibility predictor for α-helical transmembrane proteins that performs well inside and outside the membrane. BMC Bioinforma. 2010, 11: 333. 10.1186/1471-2105-11-333.
- Granseth E, Viklund H, Elofsson A: ZPRED: predicting the distance to the membrane center for residues in α-helical membrane proteins. Bioinformatics. 2006, 22: e191-e196. 10.1093/bioinformatics/btl206.
- Park Y, Hayat S, Helms V: Prediction of the burial status of transmembrane residues of helical membrane proteins. BMC Bioinforma. 2007, 8: 302. 10.1186/1471-2105-8-302.
- Rose A, Lorenzen S, Goede A, Gruening B, Hildebrand PW: RHYTHM - a server to predict the orientation of transmembrane helices in channels and membrane-coils. Nucleic Acids Res. 2009, 37: W575-W580. 10.1093/nar/gkp418.
- White SH, Von Heijne G: How translocons select transmembrane helices. Annu Rev Biophys. 2008, 37: 23-42. 10.1146/annurev.biophys.37.032807.125904.
- Driessen AJ, Manting EH, van der Does C: The structural basis of protein targeting and translocation in bacteria. Nat Struct Mol Biol. 2001, 8: 492-498. 10.1038/88549.
- Bibi E: The role of the ribosome-translocon complex in translation and assembly of polytopic membrane proteins. Trends Biochem Sci. 1998, 23: 51-55. 10.1016/S0968-0004(97)01134-1.
- White SH, Wimley WC: Membrane protein folding and stability: physical principles. Annu Rev Biophys Biomol Struct. 1999, 28: 319-365. 10.1146/annurev.biophys.28.1.319.
- Hubbard SJ, Thornton JM: ‘NACCESS’ Computer program, Department of Biochemistry and Molecular Biology. 1993, University College London.
- Eisenberg D, Schwarz E, Komaromy M, Wall R: Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol. 1984, 179: 125-142. 10.1016/0022-2836(84)90309-7.
- Rees DC, DeAntonio L, Eisenberg D: Hydrophobic organization of membrane proteins. Science. 1989, 245: 510-513. 10.1126/science.2667138.
- Stevens TJ, Arkin IT: Are membrane proteins “inside-out” proteins? Proteins: Struct, Funct, Bioinformatics. 1999, 36: 135-143. 10.1002/(SICI)1097-0134(19990701)36:1<135::AID-PROT11>3.0.CO;2-I.
- Mokrab Y, Stevens TJ, Mizuguchi K: Lipophobicity and the residue environments of the transmembrane α-helical bundle. Proteins: Struct, Funct, Bioinformatics. 2009, 74: 32-49. 10.1002/prot.22130.
- Park Y, Helms V: How strongly do sequence conservation patterns and empirical scales correlate with exposure patterns of transmembrane helices of membrane proteins? Biopolymers. 2006, 83: 389-399. 10.1002/bip.20569.
- Hill JR, Kelm S, Shi J, Deane CM: Environment specific substitution tables improve membrane protein alignment. Bioinformatics. 2011, 27: i15-i23. 10.1093/bioinformatics/btr230.
- Ulmschneider MB, Sansom MSP: Amino acid distributions in integral membrane protein structures. Biochimica et Biophysica Acta (BBA) - Biomembranes. 2001, 1512: 1-14. 10.1016/S0005-2736(01)00299-1.
- Lo A, Cheng CW, Chiu YY, Sung TY, Hsu WL: TMPad: an integrated structural database for helix-packing folds in transmembrane proteins. Nucleic Acids Res. 2011, 39: D347-D355. 10.1093/nar/gkq1255.
- Lo A, Chiu YY, Rødland EA, Lyu PC, Sung TY, Hsu WL: Predicting helix-helix interactions from residue contacts in membrane proteins. Bioinformatics. 2009, 25: 996-1003. 10.1093/bioinformatics/btp114.
- Harrington SE, Ben-Tal N: Structural determinants of transmembrane helical proteins. Structure. 2009, 17: 1092-1103. 10.1016/j.str.2009.06.009.
- Jasti J, Furukawa H, Gonzales EB, Gouaux E: Structure of acid-sensing ion channel 1 at 1.9 Å resolution and low pH. Nature. 2007, 449: 316-323. 10.1038/nature06163.
- Huang L-S, Shen JT, Wang AC, Berry EA: Crystallographic studies of the binding of ligands to the dicarboxylate site of complex II, and the identity of the ligand in the “oxaloacetate-inhibited” state. Biochimica et Biophysica Acta (BBA) - Bioenergetics. 2006, 1757: 1073-1083. 10.1016/j.bbabio.2006.06.015.
- Gonen T, Cheng Y, Sliz P, Hiroaki Y, Fujiyoshi Y, Harrison SC, Walz T: Lipid-protein interactions in double-layered two-dimensional AQP0 crystals. Nature. 2005, 438: 633-638. 10.1038/nature04321.
- Lai JS, Cheng CW, Sung TY, Hsu WL: Computational comparative study of tuberculosis proteomes using a model learned from signal peptide structures. Plos One. 2012, 7: e35018. 10.1371/journal.pone.0035018.
- Bernsel A, Viklund H, Hennerdal A, Elofsson A: TOPCONS: consensus prediction of membrane protein topology. Nucleic Acids Res. 2009, 37: W465-W468. 10.1093/nar/gkp363.
- Shen H, Chou JJ: MemBrain: improving the accuracy of predicting transmembrane helices. Plos One. 2008, 3: e2399. 10.1371/journal.pone.0002399.
- Gao X, Zhou L, Jiao X, Lu F, Yan C, Zeng X, Wang J, Shi Y: Mechanism of substrate recognition and transport by an amino acid antiporter. Nature. 2010, 463: 828-832. 10.1038/nature08741.
- Gao X, Lu F, Zhou L, Dang S, Sun L, Li X, Wang J, Shi Y: Structure and mechanism of an amino acid antiporter. Science. 2009, 324: 1565-1568. 10.1126/science.1173654.
- Faham S, Yang D, Bare E, Yohannan S, Whitelegge JP, Bowie JU: Side-chain contributions to membrane protein structure and stability. J Mol Biol. 2004, 335: 297-305. 10.1016/j.jmb.2003.10.041.
- Fleming KG, Engelman DM: Specificity in transmembrane helix-helix interactions can define a hierarchy of stability for sequence variants. Proc Natl Acad Sci U S A. 2001, 98: 14340-14344. 10.1073/pnas.251367498.
- Bowie JU: Membrane protein folding: how important are hydrogen bonds? Curr Opin Struct Biol. 2011, 21: 42-49. 10.1016/j.sbi.2010.10.003.
- Joh NH, Min A, Faham S, Whitelegge JP, Yang D, Woods VL, Bowie JU: Modest stabilization by most hydrogen-bonded side-chain interactions in membrane proteins. Nature. 2008, 453: 1266-1270. 10.1038/nature06977.
- Tusnády GE, Dosztányi Z, Simon I: PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank. Nucleic Acids Res. 2005, 33: D275-D278.
- Tusnády GE, Dosztányi Z, Simon I: Transmembrane proteins in the protein data bank: identification and classification. Bioinformatics. 2004, 20: 2964-2972. 10.1093/bioinformatics/bth340.
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.
- Westbrook J, Feng Z, Burkhardt K, Berman HM: Validation of protein structures for protein data bank. Methods Enzymol. 2003, 374: 370-385.
- Samanta U, Bahadur RP, Chakrabarti P: Quantifying the accessible surface area of protein residues in their local environment. Protein Eng. 2002, 15: 659-667. 10.1093/protein/15.8.659.
- Miller S, Janin J, Lesk AM, Chothia C: Interior and surface of monomeric proteins. J Mol Biol. 1987, 196: 641-656. 10.1016/0022-2836(87)90038-6.
- Chang C-C, Lin C-J: LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011, 2: 1-27.
- Johnson RM, Hecht K, Deber CM: Aromatic and cation-π interactions enhance helix-helix association in a membrane environment. Biochemistry. 2007, 46: 9208-9214. 10.1021/bi7008773.
- Pontius J, Richelle J, Wodak SJ: Deviations from standard atomic volumes as a quality measure for protein crystal structures. J Mol Biol. 1996, 264: 121-136. 10.1006/jmbi.1996.0628.
- Radzicka A, Wolfenden R: Comparing the polarities of the amino acids: side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution. Biochemistry. 1988, 27: 1664-1670. 10.1021/bi00405a042.
- Klein P, Kanehisa M, DeLisi C: Prediction of protein function from sequence properties: discriminant analysis of a data base. Biochim Biophys Acta Protein Struct Mol Enzymol. 1984, 787: 221-226. 10.1016/0167-4838(84)90312-1.
- Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D: Co-evolving residues in membrane proteins. Bioinformatics. 2007, 23: 3312-3319. 10.1093/bioinformatics/btm515.
- Capra JA, Singh M: Predicting functionally important residues from sequence conservation. Bioinformatics. 2007, 23: 1875-1882. 10.1093/bioinformatics/btm270.
- Katoh K, Misawa K, Kuma KI, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 2002, 30: 3059-3066. 10.1093/nar/gkf436.
- Katoh K, Kuma KI, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33: 511-518. 10.1093/nar/gki198.
- Hessa T, Meindl-Beinker NM, Bernsel A, Kim H, Sato Y, Lerch-Bader M, Nilsson I, White SH, Von Heijne G: Molecular code for transmembrane-helix recognition by the Sec61 translocon. Nature. 2007, 450: 1026-1030. 10.1038/nature06387.
- Mitaku S, Hirokawa T, Tsuji T: Amphiphilicity index of polar amino acids as an aid in the characterization of amino acid preference at membrane-water interfaces. Bioinformatics. 2002, 18: 608-616. 10.1093/bioinformatics/18.4.608.
- Monné M, Nilsson I, Elofsson A, Von Heijne G: Turns in transmembrane helices: determination of the minimal length of a “helical hairpin” and derivation of a fine-grained turn propensity scale. J Mol Biol. 1999, 293: 807-814. 10.1006/jmbi.1999.3183.
- Donnelly D, Overington JP, Ruffle SV, Nugent JHA, Blundell TL: Modeling α-helical transmembrane domains: the calculation and use of substitution tables for lipid-facing residues. Protein Sci. 1993, 2: 55-70.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.