- Research article
- Open Access
Protein contact order prediction from primary sequences
© Shi et al; licensee BioMed Central Ltd. 2008
Received: 20 August 2007
Accepted: 30 May 2008
Published: 30 May 2008
Contact order is a topological descriptor that has been shown to be correlated with several interesting protein properties such as protein folding rates and protein transition state placements. Contact order has also been used to select for viable protein folds from ab initio protein structure prediction programs. For proteins of known three-dimensional structure, their contact order can be calculated directly. However, for proteins with unknown three-dimensional structure, there is no effective prediction method currently available.
In this paper, we propose several simple yet very effective methods to predict contact order from the amino acid sequence only. One set of methods is based on a weighted linear combination of predicted secondary structure content and amino acid composition. Depending on the number of components used in these equations it is possible to achieve a correlation coefficient of 0.857–0.870 between the observed and predicted contact order. A second method, based on sequence similarity to known three-dimensional structures, is able to achieve a correlation coefficient of 0.977. We have also developed a much more robust implementation for calculating contact order directly from PDB coordinates that works for > 99% of PDB files. All of these contact order predictors and calculators have been implemented as a web server (see Availability and requirements section for URL).
Protein contact order can be effectively predicted from the primary sequence, in the absence of three-dimensional structure. Three factors, percentage of residues in alpha helices, percentage of residues in beta strands, and sequence length, appear to be strongly correlated with the absolute contact order.
Considerable computational and experimental efforts over the past three decades have been devoted to learning about or predicting how proteins fold. Experimentally, insights into protein folding mechanisms can be gained by measuring bulk properties such as protein folding rates [1, 2], free energies of folding  or hydrogen exchange rates  and correlating them with molecular properties such as secondary structure , molecular topology  and solvent accessibility . One of the more remarkable observations to emerge over the past decade is that protein folding rates vary over many orders of magnitude, from microseconds  to hours . These experimental observations, in combination with theoretical studies, have led to a general agreement that protein folding mechanisms and folding landscapes are largely determined by the topology of the native protein and are relatively insensitive either to the details of the inter-atomic interactions [6, 8–11] or to the length of the protein .
To better quantify the topology and the stability of protein native states, the concept of contact order (CO) was proposed in 1998 . Contact order is essentially a measure of non-adjacent amino acid proximity within a folded protein. More specifically, two distinct amino acid residues in a protein are said to form a contact when there is a pair of heavy atoms (C, O, S or N), one from each residue, whose physical (euclidean) distance is within 6 Å . The absolute contact order, denoted as Abs_CO, of a protein is defined as the average number of residues separating the contacts inside the protein (where two sequentially adjacent residues are separated by one residue). The relative contact order, or simply the contact order, is denoted as CO. Essentially, CO measures the average sequence separation between contacting residues in the native state of a protein normalized by the protein length, and intuitively, when the portion of interacting atoms which are far away in the protein sequence grows, CO increases.
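For reference, the definition described above is commonly written as follows. This is a reading aid in our own notation, not a reproduction of the paper's Formulas (1) and (2): N is assumed to be the number of contacts, L the number of residues, and ΔS_ij the sequence separation between contacting residues i and j.

```latex
\mathrm{Abs\_CO} = \frac{1}{N} \sum_{(i,j)\,\in\,\mathrm{contacts}} \Delta S_{ij},
\qquad
\mathrm{CO} = \frac{\mathrm{Abs\_CO}}{L} = \frac{1}{L \cdot N} \sum_{(i,j)\,\in\,\mathrm{contacts}} \Delta S_{ij},
\qquad
\Delta S_{ij} = |i - j|
```

With this convention, two sequentially adjacent residues contribute a separation of 1, which matches the wording in the text.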
Both positive and negative correlations have been found between CO and several bulk protein properties such as protein folding rate and transition state placements [6, 11–16]. For example, previous experimental results have shown that the logarithm of a protein's folding rate is linearly correlated with CO of the protein in its native state . A similar but inverse correlation between CO and the protein folding transition state placements has also been observed . Early studies have suggested that Abs_CO exhibited a weaker correlation with two-state protein folding kinetics than CO does [6, 9]. More recently, Ivankov et al.  showed that Abs_CO is a more appropriate parameter to predict the folding rate of proteins as it actually spans a wider range of folding state kinetics (i.e., two-state, multi-state, and short peptides) . Consequently, some of the more promising applications of CO prediction or calculation lie in the prediction of protein folding rates, folding transition state placements, and other folding properties.
In addition to its application in predicting protein folding kinetic properties, contact order has also been shown to have some utility in ab initio protein structure prediction . In particular, it has been observed that during the candidate structure generation stage in ab initio structure prediction programs, decoys with higher topological complexity are more likely to be under-sampled, especially among larger proteins. Normalizing the CO distribution of candidate structures has been shown to alleviate such a bias, and, as a result, better protein structure predictions were generally achieved . In fact, contact order filtering is now an integral part of the Rosetta protein structure prediction package .
In this study, we adopted the CO definition in which two distinct amino acid residues in a protein form a contact when there is a pair of heavy atoms (C, O, S or N), one from each residue, whose physical (euclidean) distance is within 6 Å [6, 11]. We note that in the literature, there are several different definitions of CO. For instance, Bonneau et al. suggested that two residues form a contacting pair if and only if they are sequentially at least 3 residues away from each other and their β-carbons are within 8 Å. Yuan studied different distance thresholds (6 Å, 8 Å, 10 Å, 12 Å, 14 Å) in Plaxco et al.'s definition and concluded that they did not significantly affect the prediction accuracy. It has also been suggested that sequentially adjacent residues should not be considered to be a contact in Plaxco et al.'s definition. Note that although these variants use different parameters in defining a contact, the underlying idea of using CO to quantify the topology of a protein's native state tertiary structure is the same. In the literature, there are also several well-studied concepts related to CO such as residue contact order [19–22], contact number [18, 23, 24], and residue contact number [25–28]. These quantities are largely used to characterize protein native structure, but unlike contact order, they are not directly correlated with global protein properties such as protein folding rate and folding transition state placements. While some researchers [14, 15] have tried to predict protein folding rate from the amino acid sequence directly, they typically tested their methods only on very small datasets and the results were subject to overfitting.
For proteins with solved three-dimensional structures, their COs can be calculated exactly using the equations given below (in Methods), according to the definition given by Plaxco et al. [6, 11]. In fact, a web server (albeit with limited functionality) has been developed that calculates contact order when given an appropriately formatted PDB coordinate file . However, to the best of our knowledge, there is no CO prediction method available when the three-dimensional structure of the target protein is unknown. Given that only a tiny fraction of protein 3D structures are known and given the utility of contact order in the understanding and prediction of protein folding rates and protein folds, we decided to tackle the problem of predicting CO for proteins with unknown three-dimensional structures (i.e., predicting CO using only the amino acid sequence as input). In addressing this problem we wanted to develop a method that could accurately predict or robustly calculate contact order regardless of whether the 3D structure was known or not. Therefore, three scenarios are possible: 1) the input sequence exactly matches a known 3D structure; 2) the input sequence is homologous (> 20% sequence identity, computed as the number of identical residues divided by the query sequence length) to a known 3D structure and 3) the input sequence does not match any known structure. As described below, we have succeeded in developing a combination of methods that is capable of predicting CO with a correlation between the observed and predicted values ranging from 0.857 (for scenario 3) to 0.977 (for scenario 2) to 1.000 (for scenario 1). Details regarding the implementation, testing and performance of these methods are given below.
Contact order calculation
Because the relative CO is defined as Abs_CO normalized over protein length, exactly the same prediction accuracy can be achieved for Abs_CO as for CO. In this study, we focus on calculating and predicting Abs_CO, from which the corresponding CO can be trivially calculated. We implemented a contact order calculator that determines the Abs_CO value from the PDB coordinates of an input protein using the methods described above. The program was tested and validated against a large number of files for which the Abs_CO values had been previously published.
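The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the heavy-atom coordinates have already been extracted from the PDB file into one list per residue, it counts each contacting residue pair once, and the function names `abs_contact_order` and `relative_contact_order` are hypothetical.

```python
import math
from itertools import combinations


def abs_contact_order(residues, cutoff=6.0):
    """Absolute contact order from per-residue heavy-atom coordinates.

    residues: list (in sequence order) of lists of (x, y, z) tuples,
              one inner list per residue (assumed input layout).
    cutoff:   contact distance in angstroms (6 A, as in the text).
    """
    total_separation = 0
    n_contacts = 0
    for i, j in combinations(range(len(residues)), 2):
        # Two residues are in contact if any heavy-atom pair is within the cutoff.
        in_contact = any(
            math.dist(a, b) <= cutoff
            for a in residues[i]
            for b in residues[j]
        )
        if in_contact:
            total_separation += j - i  # adjacent residues count as separation 1
            n_contacts += 1
    if n_contacts == 0:
        raise ValueError("no contacts found; contact order is undefined")
    return total_separation / n_contacts


def relative_contact_order(residues, cutoff=6.0):
    """Relative CO: Abs_CO normalized by the protein length."""
    return abs_contact_order(residues, cutoff) / len(residues)
```

For example, three single-atom residues placed 5 Å apart on a line form two contacts between adjacent residues only, so the sketch returns an Abs_CO of 1.0.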
Prediction by homology
Many protein properties, including tertiary structure, secondary structure and solvent accessibility, can be predicted via homology. In other words, the properties of a query sequence can be predicted by directly transferring the properties or features of a homologous protein to the query protein. Since CO is a property that is a function of structure, we hypothesized that the calculated CO of known 3D structures could be used to predict the CO of homologous proteins. In implementing this approach we calculated the Abs_CO (using the method described in the preceding "Contact order calculation" section) for 16,499 non-redundant proteins obtained from the PDB. These proteins were selected using the PDB culling/filtering service called PISCES. Structures were initially selected using a 95% identity sequence-redundancy cutoff and a requirement for better than 3 Å resolution (for X-ray structures). Structures were further processed by removing disordered structures (secondary structure content < 10%) as well as all membrane proteins (membrane beta barrel and transmembrane helix proteins). The resulting CO database consisted of 16,499 sequences in FASTA format with the Abs_CO value listed in the sequence name header. A local copy of BLAST was installed which used this FASTA-formatted CO database as the search database. For a hit to be considered significant, the query sequence must exhibit more than 20% sequence identity (computed as the number of identical residues divided by the query sequence length) to a protein in the CO database and the query sequence length must be within ± 40% of the length of the matching homologue. If both criteria are met, then the contact order of the homologue is transferred to the query protein. If either criterion is not met, then the contact order is predicted using the method described in the following "Prediction by regression" section.
Tests through 5-fold cross validation on the CO database were performed using a variety of sequence identity cutoffs and sequence-length thresholds to assess their influence on both the accuracy and the coverage (coverage refers to the percentage of query sequences that could be predicted by this homology-based method). Overall, the 20% sequence identity cutoff and the 40% length threshold provided the best accuracy-to-coverage tradeoff.
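The two acceptance criteria for a homology hit can be expressed as a small check. This is a sketch under stated assumptions: the number of identical residues and the two sequence lengths are taken as already parsed from the BLAST output, the length window is interpreted as relative to the homologue's length, and the function names are hypothetical.

```python
def percent_identity(n_identical, query_length):
    """Sequence identity as defined in the text: identical residues / query length."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * n_identical / query_length


def accept_hit(n_identical, query_length, hit_length,
               min_identity=20.0, length_tolerance=0.40):
    """Return True if a database hit may donate its Abs_CO to the query.

    min_identity:     identity must be strictly greater than this percentage.
    length_tolerance: query length must lie within this fraction of the
                      hit length (assumed reading of the "+/- 40%" rule).
    """
    identity_ok = percent_identity(n_identical, query_length) > min_identity
    length_ok = abs(query_length - hit_length) <= length_tolerance * hit_length
    return identity_ok and length_ok
```

A query of 100 residues with 30 identities to a 120-residue homologue passes both tests, whereas the same query against a 200-residue homologue fails the length test and would fall through to the regression-based predictor.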
Prediction by regression
In order to deal with the situation where no homologue can be found to predict the CO value (see the preceding "Prediction by homology" section) we developed and tested a regression-based approach that permits accurate prediction of CO for any water-soluble protein. Let p(α) denote the percentage of residues in alpha-helices and p(β) denote the percentage of residues in beta strands in the protein. We observed that Abs_CO correlates well with a linear combination of p(α), p(β), and the protein length L. Given this observation we decided to use linear regression to optimize the correlation between Abs_CO and the protein primary and secondary structures, as follows:
Abs_CO = χ1 · p(α) + χ2 · p(β) + χ3 · L + c, (3)
where χi, i = 1, 2, 3, are the coefficients of the three factors p(α), p(β), and L, and c is a constant value in the linear regression. Note that for proteins with unknown three-dimensional structure, their secondary structures are also unknown. Therefore, as part of the linear regression process as specified in Formula (3), we predict their secondary structure content using Proteus. Proteus is a secondary structure predictor that uses sequence alignment to achieve highly accurate predictions (Q3 accuracy score of 81.3% or greater), where homologs are identified using an E-value of < 0.01 and the secondary structures are derived from VADAR and the PPT-Database.
Using a large dataset of 933 high resolution three-dimensional protein structures (see Results section), the fitted parameters in Formula (3) are χ1 = -6.8968, χ2 = 7.6216, χ3 = 0.0612, and c = 8.0397.
Subsequently, given any query protein, we may use Proteus again to predict p(α) and p(β) values, and then report its Abs_CO as
Abs_CO = -6.8968 · p(α) + 7.6216 · p(β) + 0.0612 · L + 8.0397. (4)
In the Results section, we will demonstrate the effectiveness of this stunningly simple prediction method.
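The three-factor predictor is simple enough to evaluate by hand or in a few lines of code. The sketch below uses the coefficients reported above; it assumes that p(α) and p(β) are supplied as fractions between 0 and 1 (an assumption on our part, since percentages on a 0–100 scale would give implausibly large values with these coefficients), and the function names are hypothetical.

```python
def predict_abs_co(p_alpha, p_beta, length):
    """Three-factor linear predictor of absolute contact order.

    p_alpha, p_beta: predicted helix and strand content as fractions in [0, 1]
                     (assumed scale; see the accompanying text).
    length:          number of residues in the query protein.
    """
    if not (0.0 <= p_alpha <= 1.0 and 0.0 <= p_beta <= 1.0):
        raise ValueError("secondary structure content must be a fraction in [0, 1]")
    if p_alpha + p_beta > 1.0:
        raise ValueError("helix and strand fractions cannot sum to more than 1")
    if length <= 0:
        raise ValueError("length must be positive")
    return -6.8968 * p_alpha + 7.6216 * p_beta + 0.0612 * length + 8.0397


def predict_relative_co(p_alpha, p_beta, length):
    """Relative CO is Abs_CO normalized by the protein length."""
    return predict_abs_co(p_alpha, p_beta, length) / length
```

As a worked example, a 100-residue protein with 30% helix and 20% strand content gives -6.8968 · 0.3 + 7.6216 · 0.2 + 0.0612 · 100 + 8.0397 = 13.61498, which corresponds to a relative CO of about 0.136.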
In addition to this 3-factor CO predictor (Formula (4), denoted as F3-LR), which has been implemented on our web server, we also developed other linear equations that consider more factors that might be strongly correlated with Abs_CO. For example, we added four other factors to F3-LR to create a 7-factor linear regression formula. These four factors are 1) the number of beta hairpins (two adjacent beta-strand segments form a hairpin if they are separated by 2 to 5 residues), 2) the number of distant beta strands (two adjacent beta-strand segments are considered "distant" if they are separated by at least 5 residues), 3) the number of Cysteine residues (C), and 4) the number of hydrophobic amino acid residues (V, I, L, M, F, W, C). Among these four factors, the latter two are obtained from the primary sequence, while the former two are extracted from the secondary structures predicted using Proteus. This method is denoted as F7-LR. The third method, denoted as F27-LR, considers 27 factors: the first 5 factors of the F7-LR method, 19 of the 20 amino acid frequencies in the target protein, and 3 hydrophobicity frequencies defined as follows. The other two F7-LR factors (the number of Cysteine residues and the number of hydrophobic amino acid residues) are replaced by the amino acid frequencies. For each amino acid type, its frequency in the target protein is defined as the number of occurrences divided by L, the length of the protein. Since the sum of all 20 such frequencies is 1, only 19 of them are included in the regression (to avoid redundancy). Next, for each residue in the target sequence, the hydrophobicity of both the preceding and the succeeding residues is recorded. As a result, every residue, except the first and the last, is associated with one of the four labels: "HH", "HP", "PH", and "PP", where 'H' denotes hydrophobic and 'P' denotes hydrophilic.
The frequency of "HH" is defined as the number of residues labeled with "HH" divided by L - 2. The other three frequencies are similarly defined, and their sum is exactly 1. For the same reason, only 3 of them are included in the regression.
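The sequence-derived features of the 27-factor model can be illustrated as follows. This is a sketch, not the authors' code: it assumes the hydrophobic set is the same one listed for the 7-factor model (V, I, L, M, F, W, C), it returns all 20 composition values and all four label frequencies (the text does not say which one of each group is dropped before regression), and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("VILMFWC")  # assumed: same set as listed for the F7 model


def amino_acid_frequencies(seq):
    """Frequency of each amino acid type: occurrences divided by sequence length."""
    if not seq:
        raise ValueError("sequence must not be empty")
    length = len(seq)
    return {aa: seq.count(aa) / length for aa in AMINO_ACIDS}


def neighbor_hydrophobicity_frequencies(seq):
    """Frequencies of the HH/HP/PH/PP labels over all interior residues.

    Each interior residue is labeled by the hydrophobicity of its preceding
    and succeeding neighbors; counts are divided by L - 2.
    """
    if len(seq) < 3:
        raise ValueError("sequence must have at least 3 residues")
    counts = {"HH": 0, "HP": 0, "PH": 0, "PP": 0}
    for k in range(1, len(seq) - 1):
        before = "H" if seq[k - 1] in HYDROPHOBIC else "P"
        after = "H" if seq[k + 1] in HYDROPHOBIC else "P"
        counts[before + after] += 1
    n_interior = len(seq) - 2
    return {label: n / n_interior for label, n in counts.items()}
```

For the toy sequence VAVAV, the three interior residues are labeled HH, PP and HH, so the HH frequency is 2/3 and the PP frequency is 1/3.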
We also tested two other regression methods: Support Vector Regression (SVR) and Neural Network (NN). Applying these two regression methods to the same three factor sets gives F3-SVR, F7-SVR, F27-SVR, F3-NN, F7-NN, and F27-NN. The performance of all nine regression methods (the three linear regression variants plus these six) was assessed using a number of criteria to identify the best performing approach (see Results).
Public web server
We have implemented the above contact order calculator, the homology-based contact order predictor, and the linear regression based contact order predictors as a public web server. The input to the server can be either a three-dimensional structure (provided by uploading a PDB file or entering a PDB ID), or the primary sequence of the query protein. When the input is a sequence, our server will first use BLAST to identify sequences that are either identical or homologous to those in our CO database. There are three possible scenarios: 1) If the input is a 3D structure, or the input sequence exactly matches a known structure in our database, our server will calculate its Abs_CO directly using Formula (1); 2) If the input is a sequence and the BLAST search finds a homolog that is not an exact match but satisfies the criteria described in the "Prediction by homology" section, the pre-computed Abs_CO of the homologue is used as the predicted Abs_CO of the query sequence; 3) If the input is a sequence and has no BLAST match that falls into the second scenario, our server will call Proteus to predict the secondary structure content for the query protein, and then report its Abs_CO using Formula (4). Average calculation times are around 35 seconds for the CO calculator and about 27 seconds for the CO predictor.
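The three scenarios amount to a simple decision cascade, sketched below. This is only an illustration of the control flow described above and makes no claim about how the server is actually built; the function name `choose_method`, its parameters, and the returned labels are all hypothetical.

```python
def choose_method(input_kind, exact_match=False, homolog_ok=False):
    """Pick which of the three scenarios applies to a request.

    input_kind:  "structure" (PDB file or PDB ID) or "sequence".
    exact_match: the query sequence exactly matches a known structure.
    homolog_ok:  a non-identical BLAST hit satisfies the homology criteria.
    """
    if input_kind not in ("structure", "sequence"):
        raise ValueError("input_kind must be 'structure' or 'sequence'")
    if input_kind == "structure" or exact_match:
        return "calculate"    # scenario 1: compute Abs_CO from coordinates
    if homolog_ok:
        return "homology"     # scenario 2: transfer the homologue's Abs_CO
    return "regression"       # scenario 3: predict with the 3-factor formula
```

Ordering the checks this way means the most reliable source of a contact order value is always tried first.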
Results and Discussion
[Table: SCOP classification of the 933 training monomeric proteins. Classes listed: all alpha proteins, all beta proteins, and alpha and beta proteins.]
Contact order calculation
We implemented an absolute contact order calculator based on Formula (1) in the web server . We tested our server using the monomeric protein dataset (described in the above "Prediction by homology" section) and compared it with a previously published contact order calculation server . The two servers returned nearly identical contact order values with a correlation coefficient of 0.999. However, the other server failed (tested on May 2, 2007) to recognize 61% of the input PDB files while our server successfully processed all of the PDB files. As to the runtime, we observed no noticeable difference between the two servers.
Results on homology-based prediction
Results on regression-based prediction
[Table: Performances of all 9 regression-based Abs_CO prediction methods. Reported measures include the average percent correct; rows cover Methods 1, 2 and 3, each with the F3, F7 and F27 factor sets.]
Rows 6–11 in Table 2 summarize the correlation coefficients and the average percent correct values when the percentage of residues in alpha helices and beta strands (p(α) and p(β), Method 1) are substituted by the numbers of residues in alpha helices and beta strands (q(α) and q(β), Method 2), as well as by the numbers of alpha helices and beta strands (n(α) and n(β), Method 3), respectively, for all 9 regression based prediction methods. The three explicit three-factor linear regression formulae are also included in the caption. Two interesting observations are: 1) The coefficient for the term "number of residues in beta strands", q(β), is 0 in the second regression; 2) Using the percentage of residues in alpha helices and beta strands gave the best correlation coefficient, while using the other two sets of parameters on secondary structure content performed comparably well though slightly worse.
In this paper, we proposed a simple yet very effective method to predict protein contact order from primary sequences. We identified three factors (i.e., percentage of residues in alpha helices, percentage of residues in beta strands, and sequence length) that appear to be strongly correlated with the absolute contact order. Tests using a large dataset of high resolution monomeric proteins showed that our method achieved a correlation coefficient between the predicted and the actual absolute contact orders of 0.857–0.870. Several other factors were also identified and shown to correlate with the absolute contact order, including amino acid composition and adjacent residue hydrophobicity. In addition, we have shown that it is possible to use sequence homology to accurately predict the contact order for proteins for which no 3D structure exists. This latter approach, which is extremely fast (less than a second) and accurate (correlation coefficient 0.977), avoids the need to generate and refine an accurate 3D homology model or to use extensive computer resources to calculate the contact order. Therefore, using a combination of homology-based prediction and regression-based prediction, we have shown that it is possible to rapidly and accurately predict the contact order of any water-soluble protein for which the sequence is known. All of these methods for CO prediction and calculation are freely available through the web server.
Availability and requirements
Web server: http://www.copredictor.ca.
The authors are grateful for the research support from Alberta Prion Research Institute, PrioNet Canada, NSERC, CFI, and iCORE.
1. Kim PS, Baldwin RL: Intermediates in the folding reactions of small proteins. Annual Review of Biochemistry 1990, 59: 631–660. doi:10.1146/annurev.bi.59.070190.003215
2. Kubelka J, Hofrichter J, Eaton WA: The protein folding "speed limit". Curr Opin Struct Biol 2004, 14: 76–88. doi:10.1016/j.sbi.2004.01.013
3. Tanaka S, Scheraga HA: Model of protein folding: inclusion of short-, medium-, and long-range interactions. Proc Natl Acad Sci U S A 1975, 72: 3802–3806. doi:10.1073/pnas.72.10.3802
4. Fezoui Y, Braswell EH, Xian W, Osterhout JJ: Dissection of the de novo designed peptide alpha-t-alpha: stability and properties of the intact molecule and its constituent helices. Biochemistry 1999, 38: 2796–2804. doi:10.1021/bi9823838
5. Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22: 2577–2637. doi:10.1002/bip.360221211
6. Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 1998, 277: 985–994. doi:10.1006/jmbi.1998.1645
7. Lee B, Richards FM: The interpretation of protein structures: Estimation of static accessibility. Journal of Molecular Biology 1971, 55: 379–380. doi:10.1016/0022-2836(71)90324-X
8. Alm E, Baker D: Matching theory and experiment in protein folding. Current Opinion in Structural Biology 1999, 9: 189–196. doi:10.1016/S0959-440X(99)80027-X
9. Grantcharova V, Alm EJ, Baker D, Horwich AL: Mechanisms of protein folding. Current Opinion in Structural Biology 2001, 11: 70–82. doi:10.1016/S0959-440X(00)00176-7
10. Bonneau R, Ruczinski I, Tsai J, Baker D: Contact order and ab initio protein structure prediction. Protein Science 2002, 11: 1937–1944. doi:10.1110/ps.3790102
11. Ivankov DN, Garbuzynskiy SO, Alm E, Plaxco KW, Baker D, Finkelstein AV: Contact order revisited: Influence of protein size on the folding rate. Protein Sci 2003, 12: 2057–2062. doi:10.1110/ps.0302503
12. Baker D: A surprising simplicity to protein folding. Nature 2000, 405: 39–42. doi:10.1038/35011000
13. Koga N, Takada S: Roles of native topology and chain-length scaling in protein folding: a simulation study with a Go-like model. Journal of Molecular Biology 2001, 313: 171–180. doi:10.1006/jmbi.2001.5037
14. Ivankov DN, Finkelstein AV: Prediction of protein folding rates from the amino acid sequence-predicted secondary structure. Proceedings of the National Academy of Sciences of the USA 2004, 101: 8942–8944. doi:10.1073/pnas.0402659101
15. Gromiha MM, Thangakani AM, Selvaraj S: FOLD-RATE: prediction of protein folding rates from amino acid sequence. Nucleic Acids Research 2006, 34: W70-W74. doi:10.1093/nar/gkl043
16. Gromiha MM, Selvaraj S: Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction. Journal of Molecular Biology 2001, 310: 27–32. doi:10.1006/jmbi.2001.4775
17. Chivian D, Kim DE, Malmstrom L, Schonbrun J, Rohl CA, Baker D: Prediction of CASP6 structures using automated Robetta protocols. Website 2005. [http://robetta.bakerlab.org/pub/dylan/]
18. Yuan Z: Better prediction of protein contact number using a support vector regression analysis of amino acid sequence. BMC Bioinformatics 2005, 6: 248. doi:10.1186/1471-2105-6-248
19. Kinjo AR, Nishikawa K: Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structure from amino acid sequence using critical random networks. Biophysics 2005, 1: 67–74. doi:10.2142/biophysics.1.67
20. Kihara D: On the effect of long range interactions on secondary structure formation in proteins. Protein Science 2005, 14: 1955–1963. doi:10.1110/ps.051479505
21. Kinjo AR, Nishikawa K: CRNPRED: highly accurate prediction of one-dimensional protein structures by large-scale critical random networks. BMC Bioinformatics 2006, 7: 401. doi:10.1186/1471-2105-7-401
22. Song J, Burrage K: Predicting residue-wise contact orders in proteins by support vector regression. BMC Bioinformatics 2006, 7: 425. doi:10.1186/1471-2105-7-425
23. Kinjo AR, Nishikawa K: Recoverable one-dimensional encoding of three-dimensional protein structures. Bioinformatics 2005, 21: 2167–2170. doi:10.1093/bioinformatics/bti330
24. Kinjo AR, Horimoto K, Nishikawa K: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 2005, 58: 158–165. doi:10.1002/prot.20300
25. Fariselli P, Casadio R: RCNPRED: prediction of the residue co-ordination numbers in proteins. Bioinformatics 2001, 17: 202–204. doi:10.1093/bioinformatics/17.2.202
26. Pollastri G, Baldi P, Fariselli P, Casadio R: Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics 2001, 17: S234-S242.
27. Pollastri G, Baldi P, Fariselli P, Casadio R: Prediction of coordination number and relative solvent accessibility in proteins. Proteins 2002, 47: 142–153. doi:10.1002/prot.10069
28. Ishida T, Nakamura S, Shimizu K: Potential for assessing quality of protein structure based on contact number prediction. Proteins 2006, 64: 940–947. doi:10.1002/prot.21047
29. Mitchell T: Machine Learning. McGraw Hill; 1997.
30. Calculate the Contact Order of Proteins [http://depts.washington.edu/bakerpg/contact_order/]
31. Montgomerie S, Sundararaj S, Gallin W, Wishart DS: Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinformatics 2006, 7: 301. doi:10.1186/1471-2105-7-301
32. Wang G, Dunbrack RL Jr: PISCES: a protein sequence culling server. Bioinformatics 2003, 19: 1589–1591. doi:10.1093/bioinformatics/btg224
33. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. Journal of Molecular Biology 1990, 215: 403–410.
34. Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, Wishart DS: VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Research 2003, 31: 3316–3319. doi:10.1093/nar/gkg565
35. Wishart DS, Arndt D, Berjanskii M, Guo AC, Shi Y, Shrivastava S, Zhou J, Zhu Y, Lin GH: PPT-DB: The Protein Property Prediction and Testing Database. Nucleic Acids Research 2008, 36: D222-D229. doi:10.1093/nar/gkm800
36. Smola AJ, Schölkopf B: A tutorial on support vector regression. Statistics and Computing 2003, 14: 199–222. doi:10.1023/B:STCO.0000035301.49549.88
37. Anderson JA: An Introduction to Neural Networks. MIT Press; 1995.
38. Protein Contact Order Prediction/Calculation Web Server [http://www.copredictor.ca]
39. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Research 2000, 28: 235–242. doi:10.1093/nar/28.1.235
40. Andreeva A, Howorth D, Brenner SE, Hubbard TJP, Chothia C, Murzin AG: SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 2004, 32: D226-D229. doi:10.1093/nar/gkh039
41. Shindyalov IN, Bourne PE: Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Engineering 1998, 11: 739–747. doi:10.1093/protein/11.9.739
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.