Structure-based Markov random field model for representing evolutionary constraints on functional sites
- Chan-Seok Jeong^{1} and
- Dongsup Kim^{1}
https://doi.org/10.1186/s12859-016-0948-2
© Jeong and Kim. 2016
Received: 31 October 2015
Accepted: 15 February 2016
Published: 24 February 2016
Abstract
Background
Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure.
Results
In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated on three data sets, which annotate catalytic sites, allosteric sites, and comprehensively determined functional sites, respectively. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can represent positional coevolution information more accurately than the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time.
Conclusions
The results show that adopting a structure-based architecture could be an acceptable approximation for coevolution modeling, with efficient computational complexity.
Keywords
Background
Coevolution analysis is widely used to model the interdependency between protein residues in a multiple sequence alignment (MSA). Since it is generally believed that highly correlated mutation patterns represent evolutionary constraints resulting from structural or functional aspects [1], coevolutionary information has been widely used to describe residue-residue contacts [2], sequence comparisons [3], deleterious substitutions [4], drug-resistant positions [5], various types of functional sites [6, 7], allosteric signaling pathways [8], protein-protein interactions [9], and for protein design [10].
Despite the usefulness of coevolution information, its accurate estimation remains challenging because of various noise factors such as those derived from phylogenetic signals [11], indels [12] and indirect signals [13]. Recently, new coevolution analysis methods have been developed that are based on a type of probabilistic graphical model called the Markov random field (MRF), which have shown remarkable improvements for estimation [9, 14–19]. Unlike the earlier approaches based on local estimates [12, 20–22], the MRF methods utilize a global sequence context of multiple alignment, and thus can effectively overcome interference from indirect signal noise.
All of the MRF methods are broadly similar to each other with respect to graphical modeling and coevolution estimation. They represent an MSA as a graphical model in which each node encodes a distribution of amino acids at a specific residue position and each edge encodes a joint distribution of amino acids between two connected residues; coevolution scores are then calculated from the edge weights. However, because parameterization using a likelihood function is computationally challenging, recent studies have suggested different methods for learning an MRF model. GMRC [14] uses a greedy structure search that develops a graphical architecture by iteratively updating the edge set. mpDCA [9] and mfDCA [15] approximate the likelihood function by using a message-passing algorithm and a mean-field equation, respectively. PSICOV [17] uses a sparse inverse covariance estimation technique with a graphical LASSO penalty instead of directly computing the MRF model. Most recently, some methods have been proposed that replace the likelihood function with a more tractable alternative objective function [16, 19]. In particular, GREMLIN, which relies on a pseudo-likelihood objective and parameter regularization, has shown the most advanced performance [18]. Nevertheless, MRF methods have not been comprehensively assessed with respect to the functional aspect of the coevolutionary constraint; instead, most assessments have thus far focused on protein structure prediction. Moreover, the accuracy of MRF methods depends considerably on the number of sequences comprising the MSA [23, 24].
In this paper, we present a structure-based MRF (SMRF) model whose graphical architecture is determined by using the protein structure information, and then derive a novel positional coevolution estimate using the node weight. We further apply the SMRF model to three data sets with different types of functional annotations, and demonstrate the association between coevolution information and functional sites. In addition, we examine the computational robustness and efficiency of the proposed SMRF-based coevolution analysis.
Methods
Structure-based Markov random field
Overview of the Markov random field model
To estimate the evolutionary constraints on functional sites, we use an MRF, a class of probabilistic graphical model represented as an undirected graph. As in a Bayesian network, the nodes of an MRF represent the variables, and the edges represent direct dependencies between the variables of neighboring nodes. However, an MRF is defined on an undirected graph and may be cyclic. This distinctive feature enables the MRF to model certain dependencies such as the symmetric influence of neighboring variables, whereas a Bayesian network forces a directionality on the interactions. Recent studies [9, 14–19] have shown that MRF methods are suitable for modeling coevolutionary relationships between residues of a protein in an MSA.
Modeling of a multiple sequence alignment
Determining the Markov random field architecture from intramolecular contact
Parameterization procedure
To minimize the objective function, \(R(v, w) - \text{pll}(v, w | \mathcal{D})\), we use the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm [26] as implemented in libLBFGS [27]. Compared to the GREMLIN model, an SMRF model can be built more efficiently, despite the similarity of their parameterization procedures, because the structure-based graph topology of the SMRF effectively reduces the search space for edge weights.
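The objective above can be illustrated with a minimal sketch. It assumes the standard pairwise-MRF pseudo-log-likelihood (each position conditioned on its graph neighbors) and an L2 penalty for \(R(v, w)\); neither form is spelled out in this section, so both are assumptions, and the function name `neg_reg_pll`, the toy alignment, and the three-letter alphabet are hypothetical. The sketch only evaluates the objective; in practice the value would be handed to an L-BFGS routine.

```python
import math

def neg_reg_pll(v, w, msa, lam=0.01):
    """Return R(v, w) - pll(v, w | D) for a pairwise MRF (assumed L2 penalty).

    v[i][a]         : node weight of state a at position i
    w[(i, j)][a][b] : edge weight of states (a, b) on the contact edge (i, j)
    msa             : list of sequences, each a list of integer states
    """
    n_pos, n_states = len(v), len(v[0])
    # Neighbors come only from the edge set, i.e. the structure-based topology.
    neighbors = {i: [] for i in range(n_pos)}
    for i, j in w:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def coupling(i, a, j, b):
        # Look up the edge weight whichever endpoint is being conditioned.
        return w[(i, j)][a][b] if (i, j) in w else w[(j, i)][b][a]

    pll = 0.0
    for seq in msa:
        for i in range(n_pos):
            logits = [
                v[i][a] + sum(coupling(i, a, j, seq[j]) for j in neighbors[i])
                for a in range(n_states)
            ]
            peak = max(logits)  # log-sum-exp with a shift for stability
            log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
            pll += logits[seq[i]] - log_norm
    penalty = lam * sum(x * x for row in v for x in row)
    penalty += lam * sum(x * x for mat in w.values() for row in mat for x in row)
    return penalty - pll

# Toy example: 4 positions, 3 states, edges only between residues in contact.
n_pos, n_states = 4, 3
contacts = [(0, 1), (1, 2), (2, 3)]
msa = [[0, 0, 1, 2], [1, 1, 1, 2], [2, 2, 0, 2]]
v = [[0.0] * n_states for _ in range(n_pos)]
w = {edge: [[0.0] * n_states for _ in range(n_states)] for edge in contacts}
baseline = neg_reg_pll(v, w, msa)

# Rewarding identical states on edge (0, 1), which covary in the toy MSA,
# lowers the objective.
for a in range(n_states):
    w[(0, 1)][a][a] = 1.0
coupled = neg_reg_pll(v, w, msa)

# Edge-weight parameters: contact graph versus complete graph.
n_contact_params = len(contacts) * n_states ** 2
n_complete_params = (n_pos * (n_pos - 1) // 2) * n_states ** 2
```

The last two lines show why the topology matters for cost: the number of edge-weight parameters scales with the number of contacts rather than with all residue pairs.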
Measurement of evolutionary constraints
To normalize the different distributions, the coevolution and conservation scores are transformed to Z-scores over all of the residues of the protein, respectively.
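As a small illustration of this normalization, the sketch below converts per-residue scores of one protein to Z-scores. The use of the population standard deviation and the handling of a constant score vector are assumptions, since the text does not specify them; `z_scores` is a hypothetical helper name.

```python
import statistics

def z_scores(values):
    """Transform the per-residue scores of one protein to Z-scores."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (assumed)
    if sd == 0:
        # All residues share the same score; no residue stands out.
        return [0.0 for _ in values]
    return [(x - mean) / sd for x in values]

coevolution = [0.2, 0.5, 0.9, 0.4]
coevolution_z = z_scores(coevolution)
```

After this step, coevolution and conservation scores are on a common scale within each protein, which makes them comparable across proteins and usable together in a single model.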
Data sets
We built three data sets, each of which describes a different type of functional site information. The first data set was collected from the Catalytic Site Atlas (CSA) database [30], which annotates catalytic sites from the literature according to homology. For the CSA data set, the proteins with five or more catalytic sites annotated in the literature were collected. The second data set was collected from the AlloSteric Database (ASD) [31], which annotates allosteric sites from the literature. For the ASD data set, the proteins with five or more annotated allosteric sites were collected. The third data set was collected from the InterPro database [32], which provides comprehensive information on various types of functional sites. For the InterPro data set, the proteins with five or more annotated functional sites were collected. Next, we chose proteins whose Protein Data Bank structure has been determined by X-ray diffraction with a resolution of ≤2.5 Å. To remove sequence redundancy in the data set, the amino acid sequences of the proteins were clustered at a maximum sequence identity of <50 % using CD-HIT [33]. In addition, to parameterize the SMRF models reliably, we chose only proteins whose MSA consists of more than 300 sequences. The MSAs were constructed by running HHblits [34] with the option “-e 0.001” against the NR20 sequence database (last update August 2011) downloaded from the HHblits webpage. NR20 is the NCBI non-redundant database clustered at a maximum sequence identity of 20 %. Finally, the CSA data set consisted of 99 proteins with 628 catalytic sites, the ASD data set consisted of 54 proteins with 501 allosteric sites, and the InterPro data set consisted of 688 proteins with 15,607 functional sites.
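The per-protein selection criteria above can be summarized as a simple filter. This is only a sketch: the `Candidate` record, its field names, and the example entries are hypothetical, and the CD-HIT redundancy clustering step is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str                    # hypothetical identifier
    method: str                  # experimental method of the PDB entry
    resolution: Optional[float]  # in angstroms, None if not applicable
    n_sites: int                 # annotated functional sites
    n_msa_seqs: int              # sequences in the HHblits alignment

def passes_filters(c):
    """Apply the thresholds stated in the text to one candidate protein."""
    return (
        c.n_sites >= 5
        and c.method == "X-RAY DIFFRACTION"
        and c.resolution is not None
        and c.resolution <= 2.5
        and c.n_msa_seqs > 300
    )

candidates = [
    Candidate("prot_a", "X-RAY DIFFRACTION", 1.8, 6, 1200),
    Candidate("prot_b", "X-RAY DIFFRACTION", 3.0, 8, 900),   # resolution too low
    Candidate("prot_c", "SOLUTION NMR", None, 5, 500),       # not X-ray
    Candidate("prot_d", "X-RAY DIFFRACTION", 2.5, 5, 300),   # MSA too small
]
selected = [c.name for c in candidates if passes_filters(c)]
```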
Assessment
The central objective of our study was to use the structure information for MRF-based coevolution modeling, and to derive a novel positional coevolution estimate that could more accurately represent functional constraints. To assess the effectiveness of the protein structure-based MRF architecture, we compared the SMRF approach with the state-of-the-art MRF methods GREMLIN [18] and PSICOV [17] using the recommended default options. Additionally, we built a random predictor by randomly permuting the coevolution scores of the SMRF 100 times and calculating the average. Since GREMLIN and PSICOV were originally developed to predict residue-residue contacts, like other previously published MRF methods [9, 14–16, 19], they can only provide scores determined for a residue pair. However, the pairwise coevolution score is not commensurable with the functional annotation determined for an individual residue. Therefore, we computed the GREMLIN-style scores derived from the edge weights of the SMRF model, and compared them with the original GREMLIN and PSICOV scores. With this approach, a positive example is defined as a residue-residue contact composed of at least one functional site, a negative example is defined as a residue-residue contact with no functional site, and residue pairs not in contact are ignored.
where TP, FP, TN, and FN represent the numbers of true positive, false positive, true negative, and false negative predictions at a certain cutoff. We compared the overall performance by vertically averaging the ROC curves of the target proteins. Moreover, to evaluate the performance for each target protein, we used the area under the ROC curve (AUC), where an AUC value of 1 indicates perfect prediction, 0.5 indicates a random prediction, and <0.5 indicates worse than random. ROC curves and AUC scores were estimated using the ROCR package [35].
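The pairwise assessment can be sketched as follows. The labeling rule follows the definition of positive and negative examples given above; the ROC points use the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), which are assumed here. This is a plain-Python illustration rather than the ROCR workflow used in the study; the function names and toy data are hypothetical, and tied scores are not treated specially.

```python
def label_contacts(contacts, functional_sites):
    """Label a contact 1 if it involves at least one functional site, else 0.

    Residue pairs that are not in contact are simply never passed in.
    """
    return [int(i in functional_sites or j in functional_sites)
            for i, j in contacts]

def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the cutoff step by step."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

contacts = [(1, 2), (2, 3), (3, 4), (4, 5)]
labels = label_contacts(contacts, functional_sites={2})
scores = [0.9, 0.4, 0.6, 0.1]  # toy pairwise coevolution scores
toy_auc = auc(roc_points(scores, labels))
```

In this toy case three of the four positive-negative score pairs are ordered correctly, so the area is 0.75.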
Results
Evaluation of the protein structure-based architecture
For the SMRF model, we first determined the MRF architecture by describing individual residues and their intramolecular contacts as nodes and edges, respectively, and then calculated the coevolution scores by parameterizing the MRF model. We assumed that a coevolution model explicitly encoding the protein structure could provide a better representation of functional constraints rather than only encompassing structural constraints. To validate this assumption, we examined the association between the coevolution scores and functional sites, and compared the results to those of the conventional MRF method without a structure-based architecture. We used GREMLIN [18], which incorporates the MRF architecture of a complete graph topology connecting all available residues. Except for the MRF architecture, GREMLIN and SMRF calculated the coevolution scores in the same way. For GREMLIN, the coevolution scores for intramolecular contacts were considered. Consequently, the SMRF and GREMLIN scores differed only with respect to the network architecture of the MRF models. In addition, we used PSICOV [17], which utilizes sparse inverse covariance estimation. Similar to GREMLIN, only the coevolution scores of PSICOV for intramolecular contacts were considered.
Next, for the ASD data set, the functional sites were determined as allosteric sites. As shown in Fig. 2b, the SMRF resulted in a higher RPF rate than GREMLIN and PSICOV in the normalized rank range of 0.1–1.5, with a rate of 6.3–8.3 %. On the other hand, GREMLIN and PSICOV showed RPF rates of 4.4–5.0 % and 5.2–6.0 %, respectively, which are lower than the RPF rate obtained from the random prediction (5.9–6.2 %). This implies that the coevolution scores of GREMLIN and PSICOV are not associated with allosteric sites, whereas those of SMRF are more likely to correspond to allosteric sites as well as catalytic sites.
Evaluation of the positional coevolution measure
In contrast to the coevolution score determined for a residue pair, functionality is generally determined for an individual residue. The conventional methods used to convert the coevolution score for a residue pair to the positional coevolution score are averaging coevolution scores across neighboring links, denoted as EW, or calculating the fraction of strongly coevolving residue pairs, denoted as FC [7, 36–38]. Here, we propose a novel measure of calculating the positional coevolution score by utilizing the node weight of the MRF model, denoted as NW, and investigate the association with functional sites.
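The two conventional conversions can be sketched as below. EW averages the pairwise scores over a residue's links, and FC is taken here as the fraction of a residue's links whose score reaches a cutoff; the cutoff value, the choice of denominator, and the score of 0.0 for residues without links are assumptions, and `positional_scores` is a hypothetical name. The NW measure is not reproduced because its formula is not given in this section.

```python
def positional_scores(edge_scores, n_residues, strong_cutoff):
    """Convert pairwise coevolution scores to per-residue EW and FC scores.

    edge_scores   : dict mapping a residue pair (i, j) to its pairwise score
    strong_cutoff : score at or above which a pair counts as strongly coevolving
    """
    linked = [[] for _ in range(n_residues)]
    for (i, j), score in edge_scores.items():
        linked[i].append(score)
        linked[j].append(score)
    # EW: mean pairwise score across the residue's links.
    ew = [sum(s) / len(s) if s else 0.0 for s in linked]
    # FC: fraction of the residue's links that are strongly coevolving.
    fc = [sum(x >= strong_cutoff for x in s) / len(s) if s else 0.0
          for s in linked]
    return ew, fc

edge_scores = {(0, 1): 2.0, (1, 2): 0.5, (2, 3): 1.5}
ew, fc = positional_scores(edge_scores, n_residues=4, strong_cutoff=1.0)
```

Both measures summarize edges, so a residue's score depends on how many links it has and how they are thresholded, which is part of the motivation for a node-based measure.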
Next, for the ASD data set, we investigated the association between positional coevolution scores and allosteric sites, as shown in Fig. 3b. Similar to the above result, EW-SMRF and FC-GREMLIN showed higher fractions of allosteric sites than EW-GREMLIN and FC-SMRF, respectively. Furthermore, for the most part, NW-SMRF showed the highest fraction of allosteric sites below the normalized rank of 0.2. In the normalized rank range of 0.01–0.05, NW-SMRF showed a fraction of allosteric sites of 5.2–6.1 %, which is 1.5–3.0 times higher than that of the random prediction.
Effectiveness of positional coevolution information in combination with conservation information
Average AUC values of the logistic regression model for the InterPro data set
Feature | AUC_{0.1} | AUC_{0.2} | AUC_{0.5} | AUC |
---|---|---|---|---|
NW-SMRF | 0.023 | 0.069 | 0.275 | 0.733 |
KLD | 0.024 | 0.071 | 0.277 | 0.733 |
KLD + NW-SMRF | 0.027 | 0.076 | 0.293 | 0.758 |
JSD | 0.025 | 0.072 | 0.279 | 0.736 |
JSD + NW-SMRF | 0.027 | 0.077 | 0.294 | 0.758 |
Average AUC values of the logistic regression model for the CSA data set
Feature | AUC_{0.1} | AUC_{0.2} | AUC_{0.5} | AUC |
---|---|---|---|---|
NW-SMRF | 0.030 | 0.080 | 0.288 | 0.736 |
KLD | 0.047 | 0.115 | 0.362 | 0.839 |
KLD + NW-SMRF | 0.049 | 0.120 | 0.372 | 0.851 |
JSD | 0.045 | 0.110 | 0.353 | 0.829 |
JSD + NW-SMRF | 0.048 | 0.116 | 0.365 | 0.844 |
Average AUC values of the logistic regression model for the ASD data set
Feature | AUC_{0.1} | AUC_{0.2} | AUC_{0.5} | AUC |
---|---|---|---|---|
NW-SMRF | 0.012 | 0.040 | 0.189 | 0.607 |
KLD | 0.013 | 0.041 | 0.196 | 0.607 |
KLD + NW-SMRF | 0.013 | 0.042 | 0.202 | 0.618 |
JSD | 0.012 | 0.039 | 0.187 | 0.594 |
JSD + NW-SMRF | 0.012 | 0.041 | 0.194 | 0.609 |
Weight values for coevolution (NW-SMRF) and conservation (KLD) terms, and intercept values of logistic regression models for the InterPro, CSA, and ASD data sets
Data set | NW-SMRF | KLD | Intercept |
---|---|---|---|
InterPro | 0.241^{***} | 0.446^{***} | −3.033^{***} |
CSA | 0.226^{***} | 0.875^{***} | −4.638^{***} |
ASD | 0.125^{*} | 0.195^{**} | −3.775^{***} |
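To make the fitted weights concrete, the sketch below plugs the values from the table above into the usual logistic form, p = 1 / (1 + exp(−(intercept + w_NW · NW + w_KLD · KLD))). The logistic form itself and the assumption that both inputs are the per-protein Z-scores described in the Methods are not stated alongside the table, so treat this as an illustration; `site_probability` is a hypothetical name.

```python
import math

# (NW-SMRF weight, KLD weight, intercept) copied from the table above.
WEIGHTS = {
    "InterPro": (0.241, 0.446, -3.033),
    "CSA": (0.226, 0.875, -4.638),
    "ASD": (0.125, 0.195, -3.775),
}

def site_probability(dataset, nw_z, kld_z):
    """Predicted probability that a residue is a functional site."""
    w_nw, w_kld, intercept = WEIGHTS[dataset]
    linear = intercept + w_nw * nw_z + w_kld * kld_z
    return 1.0 / (1.0 + math.exp(-linear))

# A residue with average scores versus one that is two standard deviations
# above average in both coevolution and conservation.
p_average = site_probability("CSA", 0.0, 0.0)
p_high = site_probability("CSA", 2.0, 2.0)
```

Because all weights are positive, the predicted probability increases with either score; in every data set the conservation weight is larger than the coevolution weight.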
Robustness against the size of the multiple sequence alignment
Computational complexity
Discussion and conclusion
The effectiveness of SMRF for modeling evolutionary constraints derives from the fact that the graph topology is determined according to the proximity of protein residues. Explicitly encoding intramolecular contacts forces the MRF edges to share similar structural constraints, so that the edges become parameterized along other sorts of biochemical constraints, including those related to functional significance. Moreover, this approach could avoid the potential bias of a covariation measure to the core region [39], and improve the signal-to-noise ratio of coevolution information. Consequently, in conjunction with MRF methodology that considers the global context of a random variable distribution, SMRF can encode the evolutionary information associated with the functional aspect. Based on comparisons with the conventional MRF method, we have demonstrated that use of an MRF architecture derived from the three-dimensional protein structure can enhance the ability to derive information about the inter-dependencies among functional residues.
Although the edge weight of the MRF model has been commonly used for coevolution analysis, the node weight has not been utilized sufficiently. In the present work, we developed a novel positional coevolution estimate by using the node weight of the SMRF model. This positional coevolution score has a form comparable with a traditional conservation estimate; thus, the integrated analysis of coevolution and conservation information can be easily achieved. Moreover, various machine-learning methods could append the positional coevolution score as an additional component of their feature vector.
The use of a structure-based architecture in this context is particularly advantageous when there are too few available sequences for carrying out the conventional MRF method. Previous studies have suggested that an MSA consisting of more than 5L sequences [18] or 1000 sequences [24] is required for reliable coevolution analysis. However, in this paper, we demonstrated that the SMRF method could perform robustly for an MSA with fewer aligned sequences, which could extend the applicability of coevolution analysis. Although progress in high-throughput sequencing is continuously expanding sequence databases, information on certain kinds of proteins, such as newly evolved or rarely populated proteins, has not been expanded by this technology. Therefore, the extended applicability of the method proposed herein could be useful for large-scale coevolution studies.
Availability of supporting data
The data set supporting the results presented in this article is available in the Zenodo repository (http://dx.doi.org/10.5281/zenodo.32989). The repository holds 1) the structures, sequences, multiple alignments, and functionality annotations for proteins of the CSA, ASD, and InterPro data sets; and 2) the pairwise and positional coevolution scores.
Software availability
- Project name: SMRF
- Project home page: https://github.com/jeongchans/smrf
- Archived version: http://dx.doi.org/10.5281/zenodo.45543
- Programming language: Python
- License: MIT
Declarations
Acknowledgements
We thank all of the members of the Bioinformatics and Computational Biology Laboratory (BCBL) for helpful discussion. We are grateful to David Baker’s group for providing GREMLIN software. This work was supported by the Stem Cell Research Program (NRF-2012M3A9B4027957) from the Ministry of Science, ICT and Future Planning. This study was also supported by a grant of the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (HI12C0014).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Lee BC, Park K, Kim D. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins. 2008; 72(3):863–72.
- Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994; 18(4):309–17.
- Jeong CS, Kim D. Linear predictive coding representation of correlated mutation for protein sequence alignment. BMC Bioinformatics. 2010; 11(Suppl 2):2.
- Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol. 2010; 6(9):1000923.
- Khudyakov Y. Coevolution and HBV drug resistance. Antivir Ther (Lond). 2010; 15(3 Pt B):505–15.
- Kuipers RKP, Joosten H-J, Verwiel E, Paans S, Akerboom J, van der Oost J, Leferink NGH, van Berkel WJH, Vriend G, Schaap PJ. Correlated mutation analyses on super-family alignments reveal functionally important residues. Proteins. 2009; 76(3):608–16.
- Chakrabarti S, Panchenko AR. Coevolution in defining the functional specificity. Proteins. 2009; 75(1):231–40.
- Süel GM, Lockless SW, Wall MA, Ranganathan R. Evolutionarily conserved networks of residues mediate allosteric communication in proteins. Nat Struct Biol. 2003; 10(1):59–69.
- Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci U S A. 2009; 106(1):67–72.
- Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R. Surface sites for engineering allosteric control in proteins. Science. 2008; 322(5900):438–42.
- Lee BC, Kim D. A new method for revealing correlated mutations under the structural and functional constraints in proteins. Bioinformatics. 2009; 25(19):2506–13.
- Jeong CS, Kim D. Reliable and robust detection of coevolving protein residues. Protein Eng Des Sel. 2012; 25(11):705–13.
- Feizi S, Marbach D, Médard M, Kellis M. Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol. 2013; 31(8):726–33.
- Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of residue coupling in protein families. IEEE/ACM Trans Comput Biol Bioinform. 2008; 5(2):183–97.
- Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011; 108(49):1293–301.
- Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ. Learning generative models for protein fold families. Proteins. 2011; 79(4):1061–78.
- Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012; 28(2):184–90.
- Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci U S A. 2013; 110(39):15674–9.
- Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E Stat Nonlin Soft Matter Phys. 2013; 87(1):012707.
- Olmea O, Rost B, Valencia A. Effective use of sequence correlation and conservation in fold recognition. J Mol Biol. 1999; 293(5):1221–39.
- Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, Dress AW. Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol Biol Evol. 2000; 17(1):164–78.
- Dekker JP, Fodor A, Aldrich RW, Yellen G. A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments. Bioinformatics. 2004; 20(10):1565–72.
- Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nat Biotechnol. 2012; 30(11):1072–80.
- Tetchner S, Kosciolek T, Jones DT. Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction. Bio-Algorithms Med-Syst. 2014; 10(4):243–54.
- Monastyrskyy B, D’andrea D, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact prediction in CASP10. Proteins. 2014; 82 Suppl 2:138–53.
- Nocedal J. Updating quasi-Newton matrices with limited storage. Math Comp. 1980; 35(151):773–82.
- Okazaki N. libLBFGS: a library of limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), Version 1.10. 2010. http://www.chokkan.org/software/liblbfgs/.
- Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008; 24(3):333–40.
- Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007; 23(15):1875–82.
- Furnham N, Holliday GL, de Beer TAP, Jacobsen JOB, Pearson WR, Thornton JM. The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Res. 2014; 42(Database issue):485–9.
- Huang Z, Mou L, Shen Q, Lu S, Li C, Liu X, Wang G, Li S, Geng L, Liu Y, Wu J, Chen G, Zhang J. ASD v2.0: updated content and novel features focusing on allosteric regulation. Nucleic Acids Res. 2014; 42(Database issue):510–6.
- Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong S-Y, Bateman A, Punta M, Attwood TK, Sigrist CJA, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 2015; 43(Database issue):213–21.
- Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012; 28(23):3150–2.
- Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods. 2012; 9(2):173–5.
- Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005; 21(20):3940–1.
- Jeon J, Nam HJ, Choi Y, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily-coupled residues. Mol Biol Evol. 2011; 28(9):2675–85.
- Liu Y, Bahar I. Sequence evolution correlates with structural dynamics. Mol Biol Evol. 2012; 29(9):2253–63.
- Teppa E, Wilkins AD, Nielsen M, Buslje CM. Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction. BMC Bioinformatics. 2012; 13:235.
- Talavera D, Lovell SC, Whelan S. Covariation is a poor measure of molecular coevolution. Mol Biol Evol. 2015; 32(9):2456–68.