A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants
© Li et al; licensee BioMed Central Ltd. 2010
Received: 28 September 2009
Accepted: 28 January 2010
Published: 28 January 2010
The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.
We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in training and the holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.
We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect the thermal adaptation induced substitution biases. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
Developing thermostable proteins has been a main focus of protein engineering because of its theoretical and practical significance [1–4]. Recently, computational protein design methods have attracted much attention because of their potential cost and time savings over conventional directed evolution approaches [3, 5, 6]. These approaches use information extracted from protein sequences and/or 3D structures to predict favorable mutations that may enhance protein thermostability. A key step in such approaches is therefore the development of reliable methods for estimating the relative stability of possible mutants, so that favorable mutations can be identified. Such methods may also help us better understand the protein-folding problem, since the outcome of protein folding is a native structure with the lowest free energy among the many possible structures of a protein.
A common approach to studying protein thermostability is to compare the sequences and/or structures of (hyper)thermophilic proteins (HPs) and their mesophilic counterparts (MPs) [7–15], because there is a direct positive correlation between the optimal growth temperature (OGT) of an organism and the melting temperature of its proteins, a key metric of protein thermostability [16, 17]. Numerous studies have focused on amino acid composition changes caused by thermal adaptation at the whole-genome level [7, 14, 18]. For example, Zeldovich et al. discovered that the total concentration of seven amino acids (IVYWREL) in the proteins of an organism is strongly correlated with its OGT. Overall, the proteins of thermophiles contain more charged and hydrophobic residues at the expense of polar ones [7, 14, 18]. The observed composition differences have prompted the development of predictive models that discriminate HPs from MPs [19–21]. For example, Gromiha and Suresh applied 12 different classification algorithms, and the best accuracy reached 89%.
Several amino acid substitution preference matrices have been created from sequence alignments of thermophilic proteins and their mesophilic homologues [22–24]. Analyzing these matrices and comparing the sequences and structures of HPs and MPs have revealed a number of substitution trends that potentially affect thermostability [7, 8]. Notable features include: more charged residues at the expense of polar residues on the surface of hyperthermophilic proteins compared with their mesophilic homologs [23, 25, 26]; elevated levels of proline or β-branched amino acids in loops, which reduce the conformational freedom of coil regions [1, 27]; fewer residues in coil regions and more residues in helical runs [28, 29]; more residues with high helix propensity, such as Lys and Glu; increased compactness of hydrophobic cores, resulting in enhanced apolar interactions and interior packing [30–32]; and reduced deamidation probability through replacement of Gln with Glu and Asn with Asp [33, 34].
The goal of this study was to develop a scoring function for predicting the relative thermostability of proteins and their mutants using an integrated statistical and machine learning approach. We used HP/MP orthologs as research subjects because they are equivalent to mutants with multiple substitutions and, as discussed above, the differences between them may encode thermal-adaptation mechanisms. A scoring function that can distinguish HP/MP orthologs should therefore presumably be able to rank the relative stability of a protein and its mutants, a key step in designing more thermostable proteins.
In this study, we first constructed a set of 540 non-redundant hyperthermophilic-mesophilic protein ortholog pairs. Because this dataset is substantially larger than those used in previous studies, we then calculated a substitution preference matrix using an established approach [11, 12, 22–24]. Next, we used a feature selection procedure based on the random forest algorithm to identify sequence-based features important for pairwise discrimination of hyperthermophilic and mesophilic protein orthologs. We then used a hill-climbing algorithm to fit a scoring function based on a linear combination of these discriminating features. Finally, we applied the scoring function to two experimental datasets to demonstrate that it can rank the thermostability of protein mutants with high accuracy.
The list of organisms whose proteins were used to generate the non-redundant hyperthermophilic (upper) and mesophilic (bottom) orthologous pairs (adopted from ).
Number of proteins
Aquifex aeolicus VF5
Methanocaldococcus jannaschii DSM
Thermotoga maritima MSB8
Pyrococcus abyssi GE5
Corynebacterium glutamicum ATCC
30 - 40
Escherichia coli K12
Mycobacterium tuberculosis H37Rv
Bacillus halodurans C-125
25 - 35
Streptococcus pneumoniae TIGR4
30 - 35
Ortholog pairs were retained only if they met all of the following criteria:
- Reciprocal best BLAST hits with E-values below 10⁻¹⁰;
- A difference in the number of residues of less than 5% of the shorter sequence, so that only small insertions/deletions were allowed;
- Amino acid sequence identity higher than 30%.
In addition, we removed transmembrane proteins, predicted by TMHMM 2.0 http://www.cbs.dtu.dk/services/TMHMM/, because they often use different strategies from soluble proteins to remain stable in high-temperature environments. Furthermore, to reduce the statistical bias caused by redundancy, we clustered paralogues using the blastclust program available in the BLAST package. The minimum length coverage of blastclust was set to 0.5 and the sequence similarity threshold to 0.25. Sequences longer than 600 or shorter than 50 residues were also removed. The final dataset consists of 540 non-redundant HP-MP ortholog pairs. Pfam http://pfam.sanger.ac.uk/ domain scans confirmed, as expected, that the two proteins of each ortholog pair contain the same domains; the selected pairs are therefore very likely true orthologs.
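The pair-level selection rules above can be expressed as a single predicate. The sketch below is illustrative only: the `CandidatePair` record and its field names are hypothetical, it assumes the BLAST E-value, identity and reciprocal-best-hit flag have already been computed, and it does not reproduce the blastclust redundancy step.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Hypothetical record for one HP/MP candidate pair (field names are illustrative)."""
    hp_len: int
    mp_len: int
    evalue: float            # BLAST E-value of the HP-MP hit
    identity: float          # fraction of identical aligned residues, 0..1
    reciprocal_best: bool    # True if each sequence is the other's best BLAST hit
    hp_transmembrane: bool = False  # e.g. as flagged by a predictor such as TMHMM
    mp_transmembrane: bool = False


def passes_ortholog_filters(p: CandidatePair) -> bool:
    """Apply the pair-level criteria described in the text (redundancy clustering not included)."""
    if not p.reciprocal_best or p.evalue >= 1e-10:
        return False
    shorter = min(p.hp_len, p.mp_len)
    # Only small indels: length difference must be below 5% of the shorter sequence
    if abs(p.hp_len - p.mp_len) >= 0.05 * shorter:
        return False
    # Sequence identity must exceed 30%
    if p.identity <= 0.30:
        return False
    # Transmembrane proteins are excluded
    if p.hp_transmembrane or p.mp_transmembrane:
        return False
    # Both sequences must be between 50 and 600 residues long
    return all(50 <= n <= 600 for n in (p.hp_len, p.mp_len))


if __name__ == "__main__":
    pair = CandidatePair(hp_len=200, mp_len=205, evalue=1e-30, identity=0.45, reciprocal_best=True)
    print(passes_ortholog_filters(pair))  # True
```

Keeping every rule in one function makes it easy to audit which threshold rejected a given pair; in practice one might return the failing rule instead of a bare boolean.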
For testing purposes, we also used a set of 373 structurally well-aligned protein pairs from (hyper)thermophilic and mesophilic organisms compiled by Glyakina et al. The dataset pairs 63 hyperthermophilic and 310 thermophilic proteins with their mesophilic counterparts.
Amino acid substitution matrix
The amino acid residue substitution matrix was constructed following an established procedure [11, 12, 22–24]. In brief, we counted each of the 380 types of amino acid residue substitutions (the 20 × 19 ordered pairs of distinct residues) in the BLAST sequence alignments of all MP/HP pairs. Substitutions converting MP residues into HP residues are considered the "forward" direction. Two-tailed binomial statistics were used to estimate the statistical significance of the asymmetry between the forward and reverse substitutions of any given pair of amino acids.
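The counting and significance steps above can be sketched as follows. This is a minimal illustration, assuming each alignment is supplied as a pair of equal-length gapped strings (MP first, HP second); the function names are hypothetical, and the test shown is an exact two-tailed binomial test with success probability 0.5, written with `math.comb` so no statistics library is needed.

```python
from collections import Counter
from itertools import combinations
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_substitutions(alignments):
    """Count directed MP->HP substitutions in gapped pairwise alignments.

    Each alignment is an (mp_aligned, hp_aligned) pair of equal-length strings;
    identical columns and columns containing gaps are skipped.
    """
    counts = Counter()
    for mp_seq, hp_seq in alignments:
        for a, b in zip(mp_seq, hp_seq):
            if a != b and a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[(a, b)] += 1  # forward direction: MP residue a -> HP residue b
    return counts


def binomial_two_sided_p(forward: int, reverse: int) -> float:
    """Exact two-tailed binomial test of forward vs reverse counts under p = 0.5."""
    n = forward + reverse
    if n == 0:
        return 1.0
    k = min(forward, reverse)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # symmetric null, so doubling the smaller tail is exact


def asymmetry_table(counts):
    """Yield (a, b, forward, reverse, ratio, p) for each of the 190 unordered residue pairs."""
    for a, b in combinations(AMINO_ACIDS, 2):
        f, r = counts[(a, b)], counts[(b, a)]
        ratio = f / r if r else (float("inf") if f else float("nan"))
        yield a, b, f, r, ratio, binomial_two_sided_p(f, r)
```

Each unordered pair covers two of the 380 directed substitution types, so the table has 190 rows; the forward-to-reverse ratio and its p-value are reported together.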
Two sets of experimentally investigated protein mutations
Table 2. Two wild-type adenylate kinases (ADKs) and a series of chimeric enzymes generated from these two enzymes.
Table 3. Ranking of the relative thermostability of wild-type proteins and their mutated sequences using the scoring function.
Dmeh (GI: 640374)
BsCSP (GI: 16077975)
PhyA (GI: 464382)
PTDH (GI: 194552172)
CbADH (GI: 187935035)
β-GUS (GI: 868020)
FAOX (GI: 20302586)
Shble (GI: 3891709)
EcHPH (GI: 12539)
PDAO (GI: 129305)
The list of the 83 features used in the study.
Number of Features
Sequence length (L)
Count and composition of amino acids
Number and percentage of positive, negative and all charged residues, as well as the net charges
Number and percentage of small (T and D), tiny (G, A, S and P), aromatic (F, H, Y, W), aliphatic, hydrophobic and polar residues
Number and percentage of residues which can form hydrogen bond in sidechain
Number of sulfur atoms
Average solubility of amino acids in aqueous solution at room temperature
The average of the maximum solvent accessible surface area (ASA) of each amino acid
Predicted isoelectric point (pI) of the protein and the average pI over all residues (pIa)
Instability index and instability class
Gravy hydropathy index
Composition of the predicted secondary structure residues
Predicted percentages of buried/exposed residues
The overall length and percentage of all coils, rem465, and hotloop
where k = 1, 2, ..., m indexes the possible classes and p_k is the relative frequency of class k in node A. I(A) therefore equals zero when all cases in the node belong to a single class and reaches its maximum when the cases are equally distributed among all classes.
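The formula defining I(A) is not preserved here. Its described behavior (zero for a pure node, maximal for an even class mix) matches the Gini index that the R randomForest package uses for its MeanDecreaseGini importance, so the sketch below assumes I(A) = 1 − Σ_k p_k². It shows the impurity of a node and the weighted impurity decrease produced by a split; summing such decreases for a feature over all trees gives a Gini-based importance ranking.

```python
from collections import Counter


def gini_impurity(labels) -> float:
    """Gini impurity I(A) = 1 - sum_k p_k^2 of the class labels in node A (assumed form)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right) -> float:
    """Weighted decrease in impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    return (gini_impurity(parent)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))
```

For two classes (HP and MP) the impurity ranges from 0 to 0.5, so a split that perfectly separates a balanced node yields the maximum decrease of 0.5.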
We used the randomForest package for the R environment in this study (http://cran.r-project.org/web/packages/randomForest/index.html). Because random forest models are usually insensitive to their parameters, the default parameters were used.
Results and Discussion
In this section, we first report an MP/HP residue substitution preference matrix generated from the BLAST pairwise alignments of MP and HP orthologs. Feature selection using the random forest algorithm is then described, followed by the construction of the scoring function. The performance of the scoring function in discriminating hyperthermophilic and mesophilic proteins was estimated with holdout test datasets. Finally, we present the application of the scoring function to predicting the relative stability of proteins and their mutants.
Amino acid composition
Comparison of the composition of the amino acids in hyperthermophilic and mesophilic proteins and their significance p-values of t-test and paired t-test.
Composition in HP
Composition in MP
p-value (paired t-test)
0.044 ± 0.016
0.050 ± 0.015
0.019 ± 0.011
0.037 ± 0.015
0.035 ± 0.014
0.035 ± 0.015
0.042 ± 0.014
0.055 ± 0.016
0.009 ± 0.011
0.010 ± 0.011
0.075 ± 0.019
0.079 ± 0.020
0.066 ± 0.023
0.080 ± 0.028
0.017 ± 0.010
0.024 ± 0.013
0.024 ± 0.011
0.026 ± 0.010
0.033 ± 0.014
0.027 ± 0.013
0.038 ± 0.015
0.033 ± 0.014
0.086 ± 0.021
0.082 ± 0.020
0.089 ± 0.021
0.089 ± 0.022
0.041 ± 0.015
0.040 ± 0.014
0.077 ± 0.020
0.066 ± 0.019
0.008 ± 0.007
0.007 ± 0.007
0.050 ± 0.015
0.057 ± 0.016
0.097 ± 0.023
0.079 ± 0.022
0.091 ± 0.023
0.060 ± 0.023
0.056 ± 0.023
0.055 ± 0.023
Amino acid residue substitutions
Many of the significant substitution asymmetries are consistent with various proposed mechanisms of protein thermostability. For example, in the MP-to-HP direction Asp tends to be replaced by Glu or Lys, both of which favor helices, whereas Asp favors coils. This is consistent with previous findings that HPs generally contain more helical regions, at the cost of disordered regions, than MPs [28, 29]. There is a strong preference for Ser, Thr, Asn and Gln to be replaced by Lys and Glu in HPs, which can be explained by the observed significant reduction of polar non-charged residues [23, 26] and of deamidation-vulnerable residues [33, 34] in HPs. Leu tends to be replaced by Ile, which may enhance thermostability; this is consistent with the finding that increasing the number of β-branched amino acids in loop regions enhances protein thermostability [1, 27].
It is worth noting that the significance threshold used in this study (p < 10⁻¹⁰, Fisher's exact test) was much more stringent than the criteria used in previous studies (e.g. p < 10⁻²), because we used approximately five times as many HP/MP pairs. The ratios of forward-to-reverse changes for these substitutions were also calculated from more examples than in previous studies. For example, the matrix reported by Haney et al. contained 72 residue replacements with zero or only one instance. In our matrix, the minimum count is 3, and only 14 substitutions have fewer than 10 examples. The ratios in this matrix may therefore better reflect thermal-adaptation-induced substitution biases and should be useful in designing thermostable proteins.
Ranking features using a random forest algorithm
The analysis of the residue substitution preference between MPs and HPs clearly indicates that different residues contribute to protein thermostability differentially. In this section, we describe a procedure for ranking the importance of all 83 features derived from protein sequences in discriminating MPs and HPs using the random forest algorithm.
Developing the scoring function
where i runs over the 10 features used in the scoring function and w_i is the weight of each feature. The sign of each feature's weight was determined by the location of the inflection point of its cumulative curve: positive for features located to the left of the zero-difference line and negative for those to the right. Thus the weights of x_K, x_E, x_pos, x_charge and ASA are positive, and those of x_small, x_tiny, x_A, x_Q and x_T are negative. We then used a hill-climbing algorithm to fit the weights of these features, restricting the absolute value of every weight to the range 0 to 1. We randomly assigned an initial weight to each feature and counted the number of correctly ranked ortholog pairs. The weights were then randomly updated and the number of correct rankings was recounted. The new weights were kept if they resulted in more correctly ranked ortholog pairs; otherwise the weights were rolled back to their previous values. This procedure was repeated 5 × 10⁷ times, and the set of weights that maximized the number of positive score values was recorded. To check whether the optimization was trapped in a local maximum, we repeated the procedure four more times using different random seeds. The results were very similar, so we used the average of the weights in the scoring function. We then used the same procedure to develop four more scoring functions, one for each of the remaining training datasets.
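The fitting loop described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the perturbation scheme (a uniform step of ±0.05 on every magnitude, clipped to [0, 1]) and the iteration count are assumptions, since the text only says the weights were "randomly updated". Because the score is linear, "HP scores higher than MP" is the same as a positive score on the HP − MP feature difference, which is what the text counts.

```python
import random

# Weight signs stated in the text: + for x_K, x_E, x_pos, x_charge, ASA; - for the rest
FEATURE_SIGNS = {"K": 1, "E": 1, "pos": 1, "charge": 1, "ASA": 1,
                 "small": -1, "tiny": -1, "A": -1, "Q": -1, "T": -1}


def score(weights, features):
    """Linear score S = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, features))


def count_correct(weights, pairs):
    """Count (hp, mp) feature-vector pairs in which the HP member scores higher."""
    return sum(score(weights, hp) > score(weights, mp) for hp, mp in pairs)


def hill_climb(pairs, signs, n_iter=10_000, step=0.05, seed=0):
    """Random-perturbation hill climbing with fixed signs and |w_i| in [0, 1]."""
    rng = random.Random(seed)
    mags = [rng.random() for _ in signs]                 # random initial magnitudes
    weights = [s * m for s, m in zip(signs, mags)]
    best = count_correct(weights, pairs)
    for _ in range(n_iter):
        # Randomly perturb every magnitude, clipping it back into [0, 1]
        trial_mags = [min(1.0, max(0.0, m + rng.uniform(-step, step))) for m in mags]
        trial = [s * m for s, m in zip(signs, trial_mags)]
        correct = count_correct(trial, pairs)
        if correct > best:                               # keep strict improvements only
            mags, weights, best = trial_mags, trial, correct
        # otherwise the previous weights are retained (rollback)
    return weights, best


def averaged_weights(pairs, signs, n_runs=5, **kwargs):
    """Repeat the search with different random seeds and average the resulting weights."""
    runs = [hill_climb(pairs, signs, seed=s, **kwargs)[0] for s in range(n_runs)]
    return [sum(ws) / n_runs for ws in zip(*runs)]
```

With real data, each element of `pairs` would hold the ten feature values of an HP sequence and of its MP ortholog, in the order of `list(FEATURE_SIGNS)`, and `signs = list(FEATURE_SIGNS.values())`.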
The discrimination ability of the scoring function
We calculated the accuracies of the discriminations made by the five scoring functions on their corresponding training datasets. With optimized weights, the scoring functions correctly ranked on average 427.1 ± 1.9 of the 432 ortholog protein pairs in the training datasets (98.9% accuracy). We then tested each scoring function on its corresponding holdout test dataset. Of the 108 protein pairs in each test set, on average 105.1 ± 0.5 were correctly ranked (97.3% accuracy), very close to the accuracy obtained on the training sets (98.9%). The scoring function is therefore robust and able to discriminate a broad spectrum of HP and MP homologous protein pairs.
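The 432/108 training/testing sizes are consistent with a partition of the 540 pairs into five disjoint folds. The sketch below assumes that design (the splitting procedure itself is not described in this section) and shows how such folds and the averaged accuracies could be produced; `k_fold_splits` and `summarize` are hypothetical helper names.

```python
import random
from statistics import mean, stdev


def k_fold_splits(n_items=540, k=5, seed=0):
    """Shuffle item indices and yield (train, test) index lists for k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = idx[i::k]                      # every k-th shuffled index forms fold i
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        yield train, test


def summarize(correct_counts, fold_size):
    """Mean and SD of correct counts across folds, plus the mean accuracy in percent."""
    m = mean(correct_counts)
    return m, stdev(correct_counts), 100 * m / fold_size
```

As an arithmetic check on the reported figures, 427.1 / 432 ≈ 0.9887 and 105.1 / 108 ≈ 0.9731, which round to 98.9% and 97.3%.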
The final weights of the ten features used in the scoring function.
We also applied the scoring function to discriminating (hyper)thermophilic and mesophilic proteins in the Glyakina dataset. The scoring function correctly discriminated 59 of the 63 hyperthermophilic/mesophilic pairs (93.7% accuracy) and 238 of the 310 thermophilic/mesophilic pairs (76.8% accuracy). The list of these proteins and their scores is provided in Table S2 in Additional file 1. We believe that the difference in accuracy between hyperthermophilic and thermophilic proteins reflects their different stabilization mechanisms, as previously suggested in the literature [17, 31].
Discriminating non-homologous protein pairs
Encouraged by these results, we further challenged the scoring function to discriminate non-homologous HP/MP protein pairs. In this test, we compared each HP sequence against all MP sequences. The overall accuracy of these 540 × 540 pairwise comparisons was 88.4%. Such a high accuracy in discriminating non-homologous HP and MP sequences indicates that HP sequences share common sequence patterns that confer sufficient stability at elevated temperatures.
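The all-against-all accuracy above is the fraction of (HP, MP) combinations in which the HP sequence receives the higher score. A minimal sketch, assuming each sequence has already been reduced to a single score and that ties count as failures (the text does not say how ties were treated): sorting the MP scores once and using binary search avoids the explicit 540 × 540 double loop. With no ties, this quantity coincides with the ROC AUC of the score for separating HP from MP sequences.

```python
from bisect import bisect_left


def all_vs_all_accuracy(hp_scores, mp_scores):
    """Fraction of (HP, MP) combinations in which the HP score is strictly higher."""
    mp_sorted = sorted(mp_scores)
    # bisect_left gives the number of MP scores strictly below each HP score
    correct = sum(bisect_left(mp_sorted, h) for h in hp_scores)
    return correct / (len(hp_scores) * len(mp_scores))
```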
Application in ranking the thermostability of proteins and their mutants
The first test was carried out on two wild-type ADKs and a series of chimeric enzymes generated from these two enzymes. The thermostability ranking predicted by the scoring function is highly consistent with the experimental results (Table 2). Of all 28 pairwise comparisons among the eight sequences (C(8,2) = 28), only two resulted in incorrect predictions (92.9% accuracy). Moreover, one of the two incorrect predictions was between VJV and JVJ, whose Tm values differed by only 6.5°C, and the other was between V160J and J36V, whose Tm values differed by just 1°C, probably not an experimentally detectable difference.
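The pairwise evaluation above can be sketched as a concordance count: for every pair of sequences, the prediction is correct when the score order agrees with the measured Tm order. This is an illustrative helper (the name is hypothetical); pairs with equal scores or equal Tm values are counted as disagreements, an assumption the text does not address.

```python
from itertools import combinations


def pairwise_ranking_accuracy(scores, tms):
    """Return (agreeing pairs, total pairs) between predicted scores and measured Tm values."""
    pairs = list(combinations(range(len(scores)), 2))
    # A pair agrees when the score difference and the Tm difference have the same sign
    agree = sum((scores[i] - scores[j]) * (tms[i] - tms[j]) > 0 for i, j in pairs)
    return agree, len(pairs)
```

For eight sequences the function evaluates C(8,2) = 28 pairs, so 26 agreements correspond to 26/28 ≈ 92.9%.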
In the second test, we used a set of sequences collected by Montanucci et al. The sequence lengths, the GI numbers of the wild-type proteins, and their melting temperatures are listed in Table 3. We used the scoring function to rank the relative thermostability of the wild-type proteins and their mutants. For proteins with two mutants, the relative stability of the two mutants was also predicted. Overall, there were 18 pairwise comparisons between these wild-type proteins and their mutants. The scoring function achieved an accuracy of 94.4% (17/18). The only incorrect prediction was for protein PDAO and its mutant (Table 3), which differ by a single mutation (F42C) and have a moderate Tm difference (10°C).
Overall, the scoring function has consistently demonstrated a remarkable ability to rank the relative thermostability of proteins and their mutants. A web server implementing it, http://www.abl.ku.edu/thermorank/, is freely available to the public.
Comparison with other methods
The current study differs in the granularity of its information from previous work that focused on amino acid composition differences between thermophilic and mesophilic organisms [7, 14, 18]. We focused on the differences between HP and MP ortholog pairs rather than on the differences between thermophilic and mesophilic proteins at the genome level. The difference between these two approaches is similar to that between unpaired and paired two-sample t-tests. While previous studies succeeded in revealing the overall changes caused by thermal adaptation at the genome level, our study focuses on the protein level. Such an approach may reduce or eliminate the effects of confounding factors such as protein family, because it is well established that amino acid composition varies among protein classes. In addition, a protein-level study may be more relevant to designing stable proteins because orthologs are essentially mutants with multiple mutations.
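The paired-versus-unpaired analogy can be made concrete with a toy calculation. The sketch below uses made-up composition values (not data from this study) in which protein-to-protein variation is large but each HP ortholog is consistently slightly richer in a residue than its MP partner. The paired t statistic, computed on within-pair differences, removes the between-family spread, whereas Welch's unpaired statistic treats the two groups as independent and is swamped by it.

```python
from math import sqrt
from statistics import mean, stdev


def composition(seq: str, residue: str) -> float:
    """Fraction of positions in `seq` occupied by `residue`."""
    return seq.count(residue) / len(seq)


def paired_t(hp_vals, mp_vals):
    """Paired t statistic on the per-ortholog differences HP - MP."""
    diffs = [h - m for h, m in zip(hp_vals, mp_vals)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(hp_vals, mp_vals):
    """Welch's two-sample t statistic (groups treated as independent)."""
    se = sqrt(stdev(hp_vals) ** 2 / len(hp_vals) + stdev(mp_vals) ** 2 / len(mp_vals))
    return (mean(hp_vals) - mean(mp_vals)) / se


if __name__ == "__main__":
    hp = [0.10, 0.20, 0.30, 0.40]   # hypothetical compositions of four HP proteins
    mp = [0.09, 0.18, 0.29, 0.38]   # their hypothetical MP orthologs
    print(round(paired_t(hp, mp), 2), round(unpaired_t(hp, mp), 2))
```

Here the paired statistic is about 5.2 while the unpaired one is below 0.2, even though the mean difference is identical in both.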
Comparing the performance of our algorithm with other approaches is difficult because very few algorithms have been developed to rank the relative thermostability of HP/MP orthologous pairs, and these studies used different datasets [20, 48]. TargetStar, a scoring function based on the analysis of 1006 decoy structures for a given protein, can discriminate HP/MP ortholog pairs with 77% accuracy. Recently, Montanucci and colleagues reported an SVM model, using residue and dipeptide compositions as predictive features, that achieves 88% accuracy on a set of redundancy-reduced HP/MP pairs. The 97.3% accuracy of our scoring function on the test dataset is thus considerably higher than the reported accuracies of both previous methods, although the comparison is indirect because the datasets differ. Moreover, in predicting the relative thermostability of proteins and their mutants, our approach achieved an accuracy of 94.4% (17/18) on the second blind test set, one more correct prediction than Montanucci et al. obtained on the same dataset.
We have presented a novel scoring function that can distinguish not only HP/MP ortholog pairs but also non-homologous pairs with high accuracy. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect thermal-adaptation-induced substitution biases better than those of previous studies because a larger dataset was used. The large set of HP/MP pairs is available on the supplementary website and should be useful to other researchers developing novel algorithms in this area.
Abbreviations: OGT, optimal growth temperature.
We thank Drs. Robert Hanzlik and Yang Zhang of the University of Kansas for their valuable scientific input. We also thank Dr. Ludovica Montanucci for kindly sharing the test sequences.
Funding: This work was partially supported by NIH grant P01 AG12993 (PI: E. Michaelis).
- Sterner R, Liebl W: Thermophilic adaptation of proteins. Critical Reviews in Biochemistry and Molecular Biology 2001, 36: 39–106. 10.1080/20014091074174
- Dahiyat BI: In silico design for protein stabilization. Current Opinion in Biotechnology 1999, 10: 387–390. 10.1016/S0958-1669(99)80070-6
- Korkegian A, Black ME, Baker D, Stoddard BL: Computational thermostabilization of an enzyme. Science 2005, 308: 857–860. 10.1126/science.1107387
- Lazar GA, Marshall SA, Plecs JJ, Mayo SL, Desjarlais JR: Designing proteins for therapeutic applications. Curr Opin Struct Biol 2003, 13: 513–518. 10.1016/S0959-440X(03)00104-0
- Schweiker KL, Makhatadze GI: A Computational Approach for the Rational Design of Stable Proteins and Enzymes: Optimization of Surface Charge-Charge Interactions. Methods in Enzymology: Computer Methods 2009, 454(Pt A): 175–211.
- Liao J, Warmuth MK, Govindarajan S, Ness JE, Wang RP, Gustafsson C, Minshull J: Engineering proteinase K using machine learning and synthetic genes. BMC Biotechnol 2007, 7: 16. 10.1186/1472-6750-7-16
- Zhou XX, Wang YB, Pan YJ, Li WF: Differences in amino acids composition and coupling patterns between mesophilic and thermophilic proteins. Amino Acids 2008, 34: 25–33. 10.1007/s00726-007-0589-x
- Razvi A, Scholtz JM: Lessons in stability from thermophilic proteins. Protein Science 2006, 15: 1569–1578. 10.1110/ps.062130306
- Menendez-Arias L, Argos P: Engineering protein thermal stability. Sequence statistics point to residue substitutions in alpha-helices. J Mol Biol 1989, 206: 397–406. 10.1016/0022-2836(89)90488-9
- Gianese G, Argos P, Pascarella S: Structural adaptation of enzymes to low temperatures. Protein Eng 2001, 14: 141–148. 10.1093/protein/14.3.141
- McDonald JH: Patterns of temperature adaptation in proteins from the bacteria Deinococcus radiodurans and Thermus thermophilus. Mol Biol Evol 2001, 18: 741–749.
- Mandrich L, Pezzullo M, Del Vecchio P, Barone G, Rossi M, Manco G: Analysis of thermal adaptation in the HSL enzyme family. J Mol Biol 2004, 335: 357–369. 10.1016/j.jmb.2003.10.038
- Metpally RP, Reddy BV: Comparative proteome analysis of psychrophilic versus mesophilic bacterial species: Insights into the molecular basis of cold adaptation of proteins. BMC Genomics 2009, 10: 11. 10.1186/1471-2164-10-11
- Zeldovich KB, Berezovsky IN, Shakhnovich EI: Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput Biol 2007, 3: e5. 10.1371/journal.pcbi.0030005
- Berezovsky IN, Zeldovich KB, Shakhnovich EI: Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Computational Biology 2007, 3: 498–507. 10.1371/journal.pcbi.0030052
- Gromiha MM, Oobatake M, Sarai A: Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophysical Chemistry 1999, 82: 51–67. 10.1016/S0301-4622(99)00103-9
- McFall-Ngai MJ, Horwitz J: A Comparative-Study of the Thermal-Stability of the Vertebrate Eye Lens - Antarctic Ice Fish to the Desert Iguana. Experimental Eye Research 1990, 50: 703–709. 10.1016/0014-4835(90)90117-D
- Greaves RB, Warwicker J: Mechanisms for stabilisation and the maintenance of solubility in proteins from thermophiles. BMC Struct Biol 2007, 7: 18. 10.1186/1472-6807-7-18
- Wu LC, Lee JX, Huang HD, Liu BJ, Horng JT: An expert system to predict protein thermostability using decision tree. Expert Syst Appl 2009, 36: 9007–9014. 10.1016/j.eswa.2008.12.020
- Montanucci L, Fariselli P, Martelli PL, Casadio R: Predicting protein thermostability changes from sequence upon multiple mutations. Bioinformatics (Oxford, England) 2008, 24: I190–I195. 10.1093/bioinformatics/btn166
- Gromiha MM, Suresh MX: Discrimination of mesophilic and thermophilic proteins using machine learning algorithms. Proteins-Structure Function and Bioinformatics 2008, 70: 1274–1279. 10.1002/prot.21616
- Das S, Paul S, Bag SK, Dutta C: Analysis of Nanoarchaeum equitans genome and proteome composition: indications for hyperthermophilic and parasitic adaptation. BMC Genomics 2006, 7: 186. 10.1186/1471-2164-7-186
- Haney PJ, Badger JH, Buldak GL, Reich CI, Woese CR, Olsen GJ: Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. P Natl Acad Sci USA 1999, 96: 3578–3583. 10.1073/pnas.96.7.3578
- Sadeghi M, Naderi-Manesh H, Zarrabi M, Ranjbar B: Effective factors in thermostability of thermophilic proteins. Biophysical Chemistry 2006, 119: 256–270. 10.1016/j.bpc.2005.09.018
- Cambillau C, Claverie JM: Structural and genomic correlates of hyperthermostability. J Biol Chem 2000, 275: 32383–32386. 10.1074/jbc.C000497200
- Xiao L, Honig B: Electrostatic contributions to the stability of hyperthermophilic proteins. Journal of Molecular Biology 1999, 289: 1435–1444. 10.1006/jmbi.1999.2810
- George RA, Heringa J: An analysis of protein domain linkers: their classification and role in protein folding. Protein Eng 2002, 15: 871–879. 10.1093/protein/15.11.871
- Vogt G, Woell S, Argos P: Protein thermal stability, hydrogen bonds, and ion pairs. J Mol Biol 1997, 269: 631–643. 10.1006/jmbi.1997.1042
- Thompson MJ, Eisenberg D: Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability. J Mol Biol 1999, 290: 595–604. 10.1006/jmbi.1999.2889
- Szilagyi A, Zavodszky P: Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure 2000, 8: 493–504. 10.1016/S0969-2126(00)00133-7
- Maugini E, Tronelli D, Bossa F, Pascarella S: Structural adaptation of the subunit interface of oligomeric thermophilic and hyperthermophilic enzymes. Computational Biology and Chemistry 2009, 33: 137–148. 10.1016/j.compbiolchem.2008.08.003
- Berezovsky IN, Shakhnovich EI: Physics and evolution of thermophilic adaptation. Proc Natl Acad Sci USA 2005, 102: 12742–12747. 10.1073/pnas.0503890102
- Heaton AL, Ye SJ, Armentrout PB: Experimental and theoretical studies of sodium cation complexes of the deamidation and dehydration products of asparagine, glutamine, aspartic acid, and glutamic acid. The Journal of Physical Chemistry A 2008, 112: 3328–3338.
- Xie M, Shahrokh Z, Kadkhodayan M, Henzel WJ, Powell MF, Borchardt RT, Schowen RL: Asparagine deamidation in recombinant human lymphotoxin: hindrance by three-dimensional structures. Journal of Pharmaceutical Sciences 2003, 92: 869–880. 10.1002/jps.10342
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402. 10.1093/nar/25.17.3389
- Trivedi S, Gehlot HS, Rao SR: Protein thermostability in Archaea and Eubacteria. Genetics and Molecular Research 2006, 5: 816–827.
- Glyakina AV, Garbuzynskiy SO, Lobanov MY, Galzitskaya OV: Different packing of external residues can explain differences in the thermostability of proteins from thermophilic and mesophilic organisms. Bioinformatics 2007, 23: 2231–2238. 10.1093/bioinformatics/btm345
- Haney PJ, Stees M, Konisky J: Analysis of thermal stabilizing interactions in mesophilic and thermophilic adenylate kinases from the genus Methanococcus. J Biol Chem 1999, 274: 28453–28458. 10.1074/jbc.274.40.28453
- Jones DT: Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology 1999, 292: 195–202. 10.1006/jmbi.1999.3091
- Cheng J, Randall AZ, Sweredoski MJ, Baldi P: SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res 2005, 33: W72–76. 10.1093/nar/gki396
- Breiman L: Random forests. Machine Learning 2001, 45: 5–32. 10.1023/A:1010933404324
- Jain P, Garibaldi JM, Hirst JD: Supervised machine learning algorithms for protein structure classification. Comput Biol Chem 2009, 33: 216–223. 10.1016/j.compbiolchem.2009.04.004
- Han P, Zhang X, Feng ZP: Predicting disordered regions in proteins using the profiles of amino acid indices. BMC Bioinformatics 2009, 10(Suppl 1): S42. 10.1186/1471-2105-10-S1-S42
- Breiman L, Friedman J, Olshen R, Stone C: Classification and Regression Trees. Norwell: Kluwer Academic Publishers; 1984.
- Zhang GY, Fang BS: Discrimination of thermophilic and mesophilic proteins via pattern recognition methods. Process Biochemistry 2006, 41: 552–556. 10.1016/j.procbio.2005.09.003
- Fisher RA: On the interpretation of χ2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society 1922, 85: 87–94. 10.2307/2340521
- Dubchak I, Holbrook SR, Kim SH: Prediction of Protein Folding Class from Amino-Acid-Composition. Proteins 1993, 16: 79–91. 10.1002/prot.340160109
- Kim H, Moon EJ, Moon S, Jung HJ, Yang YL, Park YH, Heo M, Cheon M, Chang I, Han DS: New method of evaluating relative thermal stabilities of proteins based on their amino acid sequences; Targetstar. International Journal of Modern Physics C 2007, 18: 1513–1526. 10.1142/S0129183107011534
- Goihberg E, Dym O, Tel-Or S, Levin I, Peretz M, Burstein Y: A single proline substitution is critical for the thermostabilization of Clostridium beijerinckii alcohol dehydrogenase. Proteins 2007, 66: 196–204. 10.1002/prot.21170
- Eisenhaber F, Argos P: Improved strategy in analytic surface calculation for molecular systems: Handling of singularities and computational efficiency. Journal of Computational Chemistry 1993, 14: 1272–1280. 10.1002/jcc.540141103
- Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A: Protein Identification and Analysis Tools on the ExPASy Server. In The Proteomics Protocols Handbook. Edited by Walker JM. Totowa: Humana Press; 2005: 571–607.
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics 2000, 16: 404–405. 10.1093/bioinformatics/16.4.404
- Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB: Protein disorder prediction: implications for structural proteomics. Structure 2003, 11: 1453–1459. 10.1016/j.str.2003.10.002
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.