- Open Access
A hydrophobic spine stabilizes a surface-exposed α-helix according to analysis of the solvent-accessible surface area
© The Author(s). 2016
Published: 22 December 2016
Most of hydrophilic and hydrophobic residues are thought to be exposed and buried in proteins, respectively. In contrast to the majority of the existing studies on protein folding characteristics using protein structures, in this study, our aim was to design predictors for estimating relative solvent accessibility (RSA) of amino acid residues to discover protein folding characteristics from sequences.
The proposed 20 real-value RSA predictors were designed on the basis of the support vector regression method with a set of informative physicochemical properties (PCPs) obtained by means of an optimal feature selection algorithm. Then, molecular dynamics simulations were performed for validating the knowledge discovered by analysis of the selected PCPs.
The RSA predictors had the mean absolute error of 14.11% and a correlation coefficient of 0.69, better than the existing predictors. The hydrophilic-residue predictors preferred PCPs of buried amino acid residues to PCPs of exposed ones as prediction features. A hydrophobic spine composed of exposed hydrophobic residues of an α-helix was discovered by analyzing the PCPs of RSA predictors corresponding to hydrophobic residues. For example, the results of a molecular dynamics simulation of wild-type sequences and their mutants showed that proteins 1MOF and 2WRP_H16I (Protein Data Bank IDs), which have a perfectly hydrophobic spine, have more stable structures than 1MOF_I54D and 2WRP do (which do not have a perfectly hydrophobic spine).
We identified informative PCPs to design high-performance RSA predictors and to analyze these PCPs for identification of novel protein folding characteristics. A hydrophobic spine in a protein can help to stabilize exposed α-helices.
Prediction of the dominant fold of proteins and discovery of protein folding characteristics in an aqueous solution have been challenging problems recently [1, 2] although many methods, such as molecular dynamics simulations, folding recognition, and homology modeling, have been used to study protein folding in recent years. To elucidate folding states of proteins, estimation of accessible surface areas (ASAs) is a simple method to determine whether a residue is buried or exposed and hence the function of this residue can be ascertained. Therefore, the ASA is considered a crucial factor for prediction of protein structure. Predicting the ASA is an important approach in studies on the structure and function of proteins.
Hikijata et al.  predicted three-dimensional (3D) structures of proteins using alignment results and solvent accessibility of residues. Huang et al.  indicated that the ASA is useful for identification of DNA-binding domains in sequences. Zhu and Blundell  analyzed secondary structures of proteins and found amino acid patterns of solvent-inaccessible faces of α-helices and solvent-accessible sides of β-strands. Kumar and Bansal  analyzed α-helices in globular proteins and suggested that Ncap is mostly composed of Ser, Asp, Thr, Asn, Gly, and Pro. Pascarella et al.  and Bartlett et al.  used 3D structure information to study solvent accessibility of residues and characteristics of catalytic sites, respectively. Shirota et al.  estimated the surface-to-volume ratio of residues to examine the sequence-structure relation.
Relevant studies on real-value RSA prediction
Amino acid proportions
Amino acid proportions
Amino acid proportions, PSSM, and sequence length
PSSM and secondary structure
enhance PSSM and sequence length
Basic exact regression
PSSM, PCPs and disorder probability
PSSM, secondary structure, and native disorder
PSSM, PCPs, secondary structure, disorder probability
PSSM, PCPs, and sequence length
Tung and Ho  proposed an informative property-mining algorithm that involves an inheritable biobjective combinatorial genetic algorithm (IBCGA)  to select informative physicochemical properties (PCPs) to predict immunogenicity of MHC class I-binding peptides. In the present study, our aim was to design a high-performance predictor of RSA using SVR with informative PCPs obtained by means of IBCGA with SVR to identify new protein folding characteristics. These features were combined with informative PCPs, PSSMs, and sequence length, and the resulting predictor, named SVR-RSA, turned out to be more accurate on the Barton502 dataset than the exiting RSA predictors. The analysis of informative PCPs of residues yielded a special set of exposed hydrophobic residues of an α-helix, named a hydrophobic spine. The latter consists of periodically repeating exposed hydrophobic residues: every three or four positions.
To characterize the hydrophobic spine, proteins 1MOF and 2WRP (Protein Data Bank IDs), which have a perfectly hydrophobic spine and an imperfectly hydrophobic spine, respectively, were used as examples to analyze structural stability by molecular dynamics simulations of 10 ns at 300, 400, and 500 K . Two mutants, 1MOF_I54D and 2WRP_H16I, which have an imperfectly hydrophobic spine and a perfectly hydrophobic spine, respectively, were also compared with their wild-type versions. The simulation results revealed that a hydrophobic spine in a protein can help to stabilize exposed α-helices, and this result may be helpful in protein engineering.
We used the IBCGA algorithm to select small feature sets of informative PCPs and to discover knowledge by analyzing these feature sets. Each model for one of 20 amino acid residues has its own feature set. The analysis of the informative PCPs deduced the hydrophobic spine, which was further studied using molecular dynamics simulations.
The Barton502 dataset was used for designing the high-performance RSA predictor. Barton502 contains 502 nonhomologous sequences collected by Cuff and Barton . Barton502 was randomly subdivided into a training set and test set, which contain 336 and 166 sequences, respectively. According to Chang et al. , there are 84 sequences in the training dataset that was randomly selected, named the Sma dataset, for feature selection. Every protein was divided into a number of small segments using a sliding window 11 amino acid residues long , where the central residue of the segment is the prediction target while the five nearest bilateral residues provide additional information. All the segments were grouped according to their central residues, and 20 RSA prediction models were built. The real solvent-accessible surface areas were calculated using the DSSP software . According to the definition of Singh and Ahmad , the RSA value of a residue was computed by dividing the real ASA value by the value observed in the extended Ala-X-Ala conformation of the residue. In the present study, the ASA value is the main parameter for evaluating the real-value RSA predictors.
PSSMs of the sequences in the Barton502 dataset were calculated using the PSI-BLAST software . The settings of PSI-BLAST were as follows: the E-value threshold was 10−3, the multipass inclusion E-value threshold was 2 × 10−3, and the iteration number was 4. Each residue of a segment was represented by a 21-dimensional vector that contains 20 values representing effective frequencies of occurrence at respective positions in a multiple alignment and an extra value for the terminal flag as described in the article by Chang et al. . Finally, the PSSM of a segment was represented by 231 values. The score values were normalized using equation (1).
The population size, cross-over rate, and mutation rate of IBCGA were set to 50, 0.8, and 0.05, respectively . The rstart and rend were set to 40 and 10, respectively. The encoded chromosomes were designed as described elsewhere . The gene number was 531, plus three 4-bit genes for tuning parameters C, γ, and ε for ε-SVR. The fitness function involved MAE (detailed in the next section). To select robust feature sets, 30 independent runs were performed for each amino acid, and the feature sets having minimal MAE were selected for constructing RSA predictors.
The proposed method SVR-RSA is designed not only to predict RSAs of amino acid residues, but also to select informative PCPs for identification of characteristics of proteins. The design of SVR-RSA includes two steps: selecting informative PCPs using IBCGA-SVR and implementing the RSA predictors based on the informative PCPs.
After selection of the PCP feature sets using IBCGA-SVR, these feature sets were combined with PSSMs and sequence lengths to construct predictors. The corresponding C, γ, and ε of the models were optimized using grid search software available in the LIBSVM package. The target residues were predicted utilizing the corresponding model of the 20 different models. To avoid the overfitting problems and for performance comparing, Barton502 is divided into the training and test parts which are respectively utilized for creating the predicting models and evaluating the predicting power for comparisons between the predictors in this study and other RSA predictors built using Barton502.
Definitions of hydrophobic residues and the α-helix exposure degree
The 20 residues were classified into “hydrophilic” and “hydrophobic” categories using the Kyte-Doolittle index  with the threshold of 0. Seven residues, Ala, Cys, Ile, Leu, Met, Phe and Val, were defined as hydrophobic residues, while the other 13, Asp, Glu, Gly, His, Lys, Asn, Pro, Gln, Arg, Ser, Thr, Trp, and Tyr as hydrophilic residues. The residues were also assigned an exposed or buried status according to the RSA values at the threshold of 25% . Hence, all residues of proteins can be defined as exposed hydrophilic, exposed hydrophobic, buried hydrophilic, and buried hydrophobic residues.
where n e and n denote the number of exposed residues and the total number of residues in an α-helix, respectively.
Thermal stability analysis
Molecular dynamics simulations were carried out in GROMACS v4.5.5 . The OPLS force field  was applied in this study. 2WRP and 1MOF were retrieved from the Protein Data Bank (PDB) and the mutant structures, 2WRP_H15I and 1MOF_I54D, were constructed using the PS2 web server . The templates were set up using the wild-type 2WRP and 1MOF, respectively, and other parameters were set to default values. These four proteins were supplemented with missing hydrogen atoms and were protonated by considering the protonation state corresponding to pH 7. The periodic boundary conditions were obtained using the dodecahedron box and the minimum distances of a nonhydrogen protein atom to the box wall of at least 1 nm. By means of the single point charge model, water molecules were soaked into the box around the proteins and the counterions were added to neutralize the net charge of the whole simulation system. To ensure that the solvent distribution was kept at the minimum energy, the atoms of proteins were first fixed and subjected to 50,000 iterations of the steepest descent energy minimization, and then 200-ps molecular dynamics  simulations were performed for solvation. Next, the protein and solvent molecules were unconstrained and optimized using the steepest descent energy minimization for 50,000 iterations followed by 10-ns molecular dynamics simulations. In the simulation system, the particle-mesh Ewald (PME) method was used for calculating the long-range electrostatics, and all bonds were constrained using the LINCS algorithm. Time steps were set to 2 fs for 5,000,000 iterations for 10-ns simulations. Three simulation temperatures, 300, 400, and 500 K, were applied to the constant temperature and pressure (NPT)-simulated environments using a weak coupling algorithm that had the pressure of 1 atm and pressure coupling time of 1 ps. The temperature coupling was also applied, with 0.1 ps as coupling time. The trajectories were recorded every 2 ps for the analysis.
Results and discussion
Several real-value RSA predictors using Barton502 were compared in terms of performance. The results showed that the performance of the predictors proposed in this study is better than that of the predictors using Barton502. The informative PCPs were also analyzed. These PCPs indicated that a hydrophobic spine can help to stabilize a protein structure. Two proteins that have a perfectly and imperfectly hydrophobic spine, respectively, and their mutants that show the reverse situation (imperfectly hydrophobic spine and perfectly hydrophobic spine, respectively) were used to conduct several molecular dynamics simulations to validate the hydrophobic spine.
Performance comparisons among the real-value RSA predictors
The PCP feature number of the predictor of RSA for each amino acid residue and MAE of each predictor
The feature usage and a performance summary from other studies that used Barton502 as a dataset
et al. (2008)a
et al. (2006)a
et al. (2005)a
et al. (2005)a
et al. (2004)a
et al. (2003)a
amino acid composition
Performance comparison among real-value RSA predictors
Knowledge retrieval from informative PCPs
The amino acid residues were defined as hydrophobic and hydrophilic using the Kyte-Doolittle index . This definition, which specifies Ala, Cys, Ile, Leu, Met, Phe, and Val as hydrophobic residues, was used in another thermodynamic study on peptides . In the present study, the PCPs that were used to predict RSA of hydrophobic and hydrophilic residues were compared and analyzed.
Lesk et al.  suggested that surface exposure of hydrophilic residues and burying of hydrophobic residues minimize the free energy of a protein. This concept has been successfully applied in most protein folding studies and protein engineering methods. Nonetheless, there are still some protein folding cases not conforming to this principle. For example, Reidhaar-Olson and Sauer  used the λ-repressor to analyze the acceptable substitutions of residues. The results revealed that among most positions on the surface, many positions tolerate substitution of hydrophilic residues with hydrophobic ones and vice versa. Some positions have a strong preference for hydrophilic residues. There are still some results indicating that positions containing exposed hydrophobic residues show a strong preference for hydrophobic residues. Leu12, one of the two exposed hydrophobic residues, in helix 1 of the λ-repressor can be substituted with one of 10 amino acids including six hydrophobic ones. Two of the five exposed positions in α-helix 5 can contain only a hydrophobic residue. At these two positions, Ile84 cannot be substituted and Met87 can be changed only to Leu. This phenomenon indicates that there are some protein folding principles that are unknown to science.
To retrieve more knowledge from our RSA predictors, all the feature sets were compiled according to the binary property: hydrophobic or hydrophilic residues. The RSA predictors were analyzed according to the PCPs used in hydrophobic-residue predictions, hydrophilic-residues predictions, or PCPs used in predictions related to hydrophobic and hydrophilic residues. The results are listed in Additional file 1.
Among the PCPs appearing both in hydrophilic-residue and hydrophobic-residue RSA predictors, BIOV880101 and BIOV880102 were generated from globular proteins. These features—which can be described as “Information value for accessibility; average fraction 35%” and “Information value for accessibility; average fraction 23%”—are the accessibility scales with a different average fraction for different amino acid residues. Because these PCPs are statistical results of RSA, combining these two PCPs within the RSA predictor improved the performance. The other PCPs including the hydrophobicity properties (e.g., GUOD860101, CIDH920103, and MITS02101) and secondary-structure properties (e.g., CHOP780211, MAXF760104, and PALJ810113) suggest that the peptide conformation properties are important for prediction of RSA. Secondary-structure information contributing to prediction of RSA is in agreement with the results of other researchers [11–20], who predicted hydrophilic and hydrophobic residues using single models directly using the secondary structure probability from other secondary structure predictors.
Among the PCPs that appear only in hydrophilic predictors, the PCPs containing 29 parameters were categorized into three types. In the first type, the PCPs correlate with hydrophobicity, such as CORJ870105, JANJ790102, and GEIM800109. This type of PCP constitutes 37% (11/29) of the parameters of all the PCPs appearing only in hydrophilic predictors. The second type is the feature correlating with maintenance of protein structures, such as salt bridge formation and hydrogen-bounding properties. There are three parameters of this type: FAUJ880109, RACS770103, and RACS770103. One study  showed that a salt bridge and hydrogen bonding from the side chain are mediated by hydrophilic residues. On the other hand, those residues forming the hydrogen bonds and salt bridges will be buried in the interior of proteins. The PCPs of the third type are the features corresponding to active sites, such as the orientation and electrostatic properties. Those PCPs include FAUJ880102, FAUJ880103, JANJ790102, SIMZ760101, RACS770103, RACS820109, RACS820110, and RADA880103. Although the active-site residues interact with solvent and substrate molecules, these residues are buried. Bartlett et al.  analyzed 178 active sites of enzymes. In the solvent accessibility analysis, 89% of the catalytic residues showed the RSA less than 30%. Moreover, approximately 50% and 25% of catalytic residues had RSA of 0–10% and 10–20%, respectively; 5% of all catalytic residues had 0% RSA.
According to the PCP analysis of the hydrophilic-residue RSA predictors, except for the first type, which is related to residue hydrophobicity, the other two types are the properties related to characteristics of buried residues. We assumed that the hydrophilic-residue RSA predictors need to use some characteristics of buried residues to estimate the buried status of hydrophilic residues that favor exposure on the protein surface.
Among the 55 PCPs that were used only in hydrophobic-residue RSA predictors, the secondary-structure features constituted 36% (20/55), while the hydrophobicity properties and conformation properties constituted 53% (29/55) and 11% (6/55), respectively. Among the secondary-structure features, the properties related to an α-helix (17 member parameters) represented the majority of the 20 secondary-structure features, while other secondary-structure characteristics (related to β-sheets, γ-turns, and random-coils) represented only 15%. The effects of residue hydrophobicity on protein folding are well studied [42–44], but retrieval of more information from secondary-structure characteristics would be helpful.
According to a study on α-helical structure , an α-helix can be exposed on the surface or buried in a protein. Among the α-helix structure features in our study, the position of a residue in the α-helix represents the majority. Those features include URR980103, AURR980113, AURR980117, CHOP780207, MAXF760106, QIAN880108, QIAN880111, RICJ880101, RICJ880111, RICJ880112, and RICJ880115, whereas ROBB760102, ROBB760103, ROBB760104, and ROBB760107 are the characteristics of the α-helix position in a protein. The fact that the residue position characteristics are in the majority indicates that the position of the α-helix is important.
The hydrophobic spine in α-helices
Because the residue position in an α-helix is important for prediction of the RSA for hydrophobic residues, the influence of a hydrophobic residue position on an α-helix was evaluated. In hydrophilic-residue RSA predictors, we used buried-status characteristics of hydrophilic residues that favor exposure on the surface of proteins. Therefore, we hypothesized that the residue position in an α-helix is related to surface exposure-related characteristics that may also be important for prediction of RSA for hydrophobic residues.
Hydrophobic spines are different from hydrophobic cores which are packing of hydrophobic residues existing in proteins . Hydrophobic spine taking place on a single a-helix is composed of the adjacent hydrophobic residue contacting. The hydrophobic spine is hypothesized to play two roles. One is to drive the protein-protein or protein-ligand interaction, such as that in the leucine zipper . In the present study, the dataset was collected for analysis of protein folding; therefore, a hydrophobic spine appears to stabilize protein structure.
Estimating the hydrophobic-spine stability using molecular dynamics simulations
In the 500-K simulation of 1MOF, the native structures (which are not refolded structures) during the unfolding process are stable at 0.5 ns, while most of native helical structures of 1MOF_I54D were found to be disrupted at 0.25 ns. Stable β-sheet structures appear at 3 ns of 1MOF_I54D suggesting that this mutant assumes molten-globular structure faster than the wild type does. These results indicate that the protein structure becomes unstable if the hydrophobic spine is disrupted. 2WRP lost its native structures after 0.25 ns, and the molten-globular structures that have stable β-sheets appeared in the molecular dynamics simulation at 6 ns and 500 K. 2WRP_H15I kept the native structure at 1 ns, and there were no emerging β-sheet structures. These results suggest that 2WRP_H15I has a perfectly hydrophobic spine and has more stable structures than 2WRP does.
Average a-helix contents (%) from DSSP analysis for 1mof, 1mof-I54D, 2wrp, and 2wrp-H16I at different temperatures
hydrophobic spine regions
We assumed that a hydrophobic spine can prevent a solvent molecule from attacking. A study on human lysozyme  revealed that when the hydrophilic residue is exposed on the surface, this residue can interact with the solvent molecules and initiate the two-stage unfolding process. The solvent molecules first disrupt the backbone hydrogen bounds, and then this disruption will attract more attacks by solvent molecules [54, 55]. The hydrophobic-spine characteristics can be applied to protein engineering or may explain why in existing studies the exposed hydrophobic residues make the protein stable. For example, Arc repressor research  revealed that Ile84 and Met87 are exposed but cannot be mutated to hydrophilic residues. This may be because these hydrophobic residues are located in the hydrophobic spine.
RSA and protein folding correlate strongly. Hence, in this study, the aim was to discover knowledge on protein folding using high-performance RSA predictors. Comparing to most existing protein folding characteristic studies which supposed hypotheses and then provided the statistical evidences, this study interpreted the optimal feature sets of the models and discovered hydrophobic spines on α-helices which would be helpful to protein engineering enhancing the thermal stability of proteins. Twenty models for different amino acid residues were built here using PSSM, sequence length information, and PCPs which were selected using IBCGA-SVR. The MAE and correlation coefficient of the predictors are 14.11 and 0.69, respectively. Those PCPs were analyzed according to the models. In the hydrophilic-residue models, the buried-status-related characteristics including active-site parameters, hydrogen-bonding characteristics, and salt bridge properties were used. In the hydrophobic-residue models, the secondary structure characteristics are in the majority. After further analysis of these secondary-structure characteristics, the effect of the hydrophobic spine manifests itself. To validate the hydrophobic spine stability, 1MOF and 2WRP, which have a perfectly and imperfectly hydrophobic spine, respectively, were used in molecular dynamics simulations to estimate the structure stability. 1MOF_I54D and 2WRP_H16I that have a disrupted hydrophobic spine and a repaired perfectly hydrophobic spine, respectively, were also simulated for comparison. In the simulations at 300 or 400 K, all these four proteins did not show significant secondary-structure disruption. In the simulations at 500 K, 1MOF and 2WRP_H16I (which have a perfectly hydrophobic spine) were found to have more stable structures than 1MOF_I54D and 2WRP do, which have an imperfectly hydrophobic spine. These results indicate that a hydrophobic spine can help to stabilize protein structure.
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 19, 2016. 15th International Conference On Bioinformatics (INCOB 2016): bioinformatics. The full contents of the supplement are available online https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-17-supplement-19.
Publication charges for this article have been funded by the Ministry of Science and Technology under the contract numbers MOST 105-2627-M-009-008-, MOST 105-2221-E-009-138-MY2-, and by the Center for Bioinformatics Research supported by the Top University Program of the National Chiao Tung University and Ministry of Education, Taiwan, R.O.C. for project 104W962. The funders had no role in study design, data collection or analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
Not applicable (only simulated data and public datasets were used in the study).
HLH, SYH, and YFL participated in manuscript preparation. YFL analyzed the PCPs and conducted protein visualization. HLH implemented the software. HLH and SYH participated in the system design, supervised and coordinated the whole project, and helped to write the manuscript. All the coauthors have read and approved the final version of the manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Anfinsen CB. Principles that govern the folding of protein chains. Science. 1973;181(4096):223–30.View ArticlePubMedGoogle Scholar
- Eichenberger AP, van Gunsteren WF, Riniker S, von Ziegler L, Hansen N. The key to predicting the stability of protein mutants lies in an accurate description and proper configurational sampling of the folded and denatured states. Biochim Biophys Acta. 2015;1850(5):983–95.View ArticlePubMedGoogle Scholar
- Hijikata A, Yura K, Noguti T, Go M. Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility. Proteins. 2011;79(6):1868–77.View ArticlePubMedPubMed CentralGoogle Scholar
- Huang HL, Lin IC, Liou YF, Tsai CT, Hsu KT, Huang WL, Ho SJ, Ho SY. Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties. BMC Bioinformatics. 2011;12(S1):S47.View ArticlePubMedPubMed CentralGoogle Scholar
- Blundell TL, Zhu ZY. The Alpha-helix as seen from the protein tertiary structure - a 3-D structural classification. Biophys Chem. 1995;55(1-2):167–84.View ArticlePubMedGoogle Scholar
- Kumar S, Bansal M. Dissecting alpha-helices: Position-specific analysis of alpha-helices in globular proteins. Proteins Struct Funct Genet. 1998;31(4):460–76.View ArticlePubMedGoogle Scholar
- Pascarella S, De Persio R, Bossa F, Argos P. Easy method to predict solvent accessibility from multiple protein sequence alignments. Proteins Struct Funct Genet. 1998;32(2):190–9.View ArticlePubMedGoogle Scholar
- Bartlett GJ, Porter CT, Borkakoti N, Thornton JM. Analysis of catalytic residues in enzyme active sites. J Mol Biol. 2002;324(1):105–21.View ArticlePubMedGoogle Scholar
- Shirota M, Ishida T, Kinoshita K. Effects of surface-to-volume ratio of proteins on hydrophilic residues: Decrease in occurrence and increase in buried fraction. Protein Sci. 2008;17(9):1596–602.View ArticlePubMedPubMed CentralGoogle Scholar
- Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins. 1999;34(4):508–19.View ArticlePubMedGoogle Scholar
- Ahmad S, Gromiha MM, Sarai A. Real value prediction of solvent accessibility from amino acid sequence. Proteins. 2003;50(4):629–35.View ArticlePubMedGoogle Scholar
- Yuan Z, Huang B. Prediction of protein accessible surface areas by support vector regression. Proteins. 2004;57(3):558–64.View ArticlePubMedGoogle Scholar
- Adamczak R, Porollo A, Meller J. Accurate prediction of solvent accessibility using neural networks-based regression. Proteins. 2004;56(4):753–67.View ArticlePubMedGoogle Scholar
- Wang JY, Lee HM, Ahmad S. Prediction and evolutionary information analysis of protein solvent accessibility using multiple linear regression. Proteins. 2005;61(3):481–91.View ArticlePubMedGoogle Scholar
- Garg A, Kaur H, Raghava GP. Real value prediction of solvent accessibility in proteins using multiple sequence alignment and secondary structure. Proteins. 2005;61(2):318–24.View ArticlePubMedGoogle Scholar
- Nguyen MN, Rajapakse JC. Two-stage support vector regression approach for predicting accessible surface areas of amino acids. Proteins. 2006;63(3):542–50.View ArticlePubMedGoogle Scholar
- Chang DT, Huang HY, Syu YT, Wu CP. Real value prediction of protein solvent accessibility using enhanced PSSM features. BMC Bioinformatics. 2008;9 Suppl 12:S12.View ArticlePubMedPubMed CentralGoogle Scholar
- Iqbal S, Mishra A, Hoque MT. Improved prediction of accessible surface area results in efficient energy function application. J Theor Biol. 2015;380:380–91.View ArticlePubMedGoogle Scholar
- Fan C, Liu D, Huang R, Chen Z, Deng L. PredRSA: a gradient boosted regression trees approach for predicting protein solvent accessibility. BMC Bioinformatics. 2016;17 Suppl 1:8.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang J, Chen W, Sun P, Zhao X, Ma Z. Prediction of protein solvent accessibility using PSO-SVR with multiple sequence-derived features and weighted sliding window scheme. BioData Min. 2015;8:3.View ArticlePubMedPubMed CentralGoogle Scholar
- Tung CW, Ho SY. POPI: predicting immunogenicity of MHC class I binding peptides by mining informative physicochemical properties. Bioinformatics. 2007;23(8):942–9.View ArticlePubMedGoogle Scholar
- Chimenti MS, Khangulov VS, Robinson AC, Heroux A, Majumdar A, Schlessman JL, Garcia-Moreno B. Structural Reorganization Triggered by Charging of Lys Residues in the Hydrophobic Interior of a Protein. Structure. 2012;20(6):1071–85.View ArticlePubMedPubMed CentralGoogle Scholar
- Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 2000;40(3):502–11.View ArticlePubMedGoogle Scholar
- Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22(12):2577–637.View ArticlePubMedGoogle Scholar
- Singh H, Ahmad S. Context dependent reference states of solvent accessibility derived from native protein structures and assessed by predictability analysis. BMC Struct Biol. 2009;9:25.View ArticlePubMedPubMed CentralGoogle Scholar
- Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 2008;36:D202–05.View ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.View ArticlePubMedPubMed CentralGoogle Scholar
- Ho SY, Shu LS, Chen JH. Intelligent evolutionary algorithms for large parameter optimization problems. IEEE T Evolut Comput. 2004;8(6):522–41.View ArticleGoogle Scholar
- Chang C-C, Lin C-J. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol. 2011;2(3):1–27.View ArticleGoogle Scholar
- Tung CW, Ho SY. Computational identification of ubiquitylation sites from protein sequences. BMC Bioinformatics. 2008;9:310.View ArticlePubMedPubMed CentralGoogle Scholar
- Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157(1):105–32.View ArticlePubMedGoogle Scholar
- Pollastri G, Baldi P, Fariselli P, Casadio R. Prediction of coordination number and relative solvent accessibility in proteins. Proteins. 2002;47(2):142–53.View ArticlePubMedGoogle Scholar
- Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC. GROMACS: Fast, flexible, and free. J Comput Chem. 2005;26(16):1701–18.View ArticleGoogle Scholar
- Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J Phys Chem B. 2001;105(28):6474–87.View ArticleGoogle Scholar
- Chen CC, Hwang JK, Yang JM. (PS)2: protein structure prediction server. Nucleic Acids Res. 2006;34(Web Server issue):W152–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Faraggi E, Zhang T, Yang Y, Kurgan L, Zhou Y. SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles. J Comput Chem. 2012;33(3):259–67.View ArticlePubMedGoogle Scholar
- Adamczak R, Porollo A, Meller J. Combining prediction of secondary structure and solvent accessibility in proteins. Proteins. 2005;59(3):467–75.View ArticlePubMedGoogle Scholar
- Ahmad S, Gromiha MM, Sarai A. RVP-net: online prediction of real valued accessible surface area of proteins from single sequences. Bioinformatics. 2003;19(14):1849–51.View ArticlePubMedGoogle Scholar
- Cheng Y, Liu GR, Li ZR, Lu C, Mi D. A thermodynamic study of peptides binding to carbon nanotubes based on a hydrophobic-polar lattice model using Monte Carlo simulations. J Phys D Appl Phys. 2008;41(5):1–7.Google Scholar
- Lesk AM, Chothia C, Ramsay W, Foster R, Ingold C. Solvent Accessibility, Protein Surfaces, and Protein Folding. Biophys J. 1980;32(1):35–47.View ArticlePubMedPubMed CentralGoogle Scholar
- Reidhaar-Olson JF, Sauer RT. Functionally acceptable substitutions in two alpha-helical regions of lambda repressor. Proteins. 1990;7(4):306–16.View ArticlePubMedGoogle Scholar
- Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: the energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600.View ArticlePubMedGoogle Scholar
- Shaytan AK, Shaitan KV, Khokhlov AR. Solvent accessible surface area of amino acid residues in globular proteins: correlation of apparent transfer free energies with experimental hydrophobicity scales. Biomacromolecules. 2009;10(5):1224–37.View ArticlePubMedGoogle Scholar
- Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH. Hydrophobicity of amino acid residues in globular proteins. Science. 1985;229(4716):834–38.View ArticlePubMedGoogle Scholar
- Zhu ZY, Blundell TL. The use of amino acid patterns of classified helices and strands in secondary structure prediction. J Mol Biol. 1996;260(2):261–76.View ArticlePubMedGoogle Scholar
- Qian N, Sejnowski TJ. Predicting the secondary structure of globular proteins using neural network models. J Mol Biol. 1988;202(4):865–84.View ArticlePubMedGoogle Scholar
- Biou V, Gibrat JF, Levin JM, Robson B, Garnier J. Secondary structure prediction: combination of three different methods. Protein Eng. 1988;2(3):185–91.View ArticlePubMedGoogle Scholar
- Munson M, Balasubramanian S, Fleming KG, Nagi AD, O‘Brien R, Sturtevant JM, Regan L. What makes a protein a protein? Hydrophobic core designs that specify stability and structural properties. Protein Sci. 1996;5(8):1584–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Landschulz WH, Johnson PF, Mcknight SL. The Leucine Zipper - a Hypothetical Structure Common to a New Class of DNA-Binding Proteins. Science. 1988;240(4860):1759–64.View ArticlePubMedGoogle Scholar
- Schrödinger L. The PyMOL molecular graphics system, Version 1.8. 2015.Google Scholar
- Pace CN. Polar group burial contributes more to protein stability than nonpolar group burial. Biochemistry-Us. 2001;40(2):310–13.View ArticleGoogle Scholar
- Eisenberg D. Three-dimensional structure of membrane and surface proteins. Annu Rev Biochem. 1984;53:595–623.View ArticlePubMedGoogle Scholar
- Sethuraman A, Vedantham G, Imoto T, Przybycien T, Belfort G. Protein unfolding at interfaces: Slow dynamics of alpha-helix to beta-sheet transition. Proteins. 2004;56(4):669–78.View ArticlePubMedGoogle Scholar
- Hua L, Zhou RH, Thirumalai D, Berne BJ. Urea denaturation by stronger dispersion interactions with proteins than water implies a 2-stage unfolding. P Natl Acad Sci USA. 2008;105(44):16928–33.View ArticleGoogle Scholar
- Bennion BJ, Daggett V. The molecular basis for the chemical denaturation of proteins by urea. P Natl Acad Sci USA. 2003;100(9):5142–147.View ArticleGoogle Scholar
- Cordes MHJ, Sauer RT. Tolerance of a protein to multiple polar-to-hydrophobic surface substitutions. Protein Sci. 1999;8(2):318–25.View ArticlePubMedPubMed CentralGoogle Scholar