Sequence specificity between interacting and non-interacting homologs identifies interface residues – a homodimer and monomer use case
© Hou et al. 2015
Received: 30 March 2015
Accepted: 30 September 2015
Published: 8 October 2015
Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive for their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites.
We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9 % recall and up to 25.1 % precision.
To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
Protein-protein interactions (PPIs) play a central role in virtually all cellular processes. Proteins interact with other proteins to accomplish specific biological functions, such as DNA replication or RNA transcription, gene translation, gene regulation and protein transport, as well as signal transduction. Identification of interaction sites between two binding proteins is essential to understand complex formation and investigate their function (e.g., [13, 20, 53]). In particular, information about specific amino acid residues that play essential roles in protein interactions usually has a wide range of applications such as design of the targets of drugs and antimicrobials (e.g., ).
Despite continual improvement, certainly over the last decade, experimental techniques for large scale determination of PPIs are not yet able to provide comprehensive coverage over all PPIs in the detail needed to allow better understanding of the evolutionary and physical forces that govern them (e.g., [13, 23, 24]).
During the past decades, several types of computational methods have been developed for protein interaction prediction. Docking and modeling approaches rely mainly on surface complementarity and electrostatics to predict structural complexes. These approaches fit together two known structures through their interacting surfaces, or predict protein–protein interaction sites from known monomer structures [13, 23, 41, 47]. However, these methods require structure information of proteins, which remains relatively scarce and expensive to obtain. Therefore, with the increasing amount of sequence data from sequencing initiatives, a method that uses only sequence information, without known structures, to predict protein-protein interaction sites is becoming increasingly attractive.
Several such computational methods aim to predict the possibility of interaction between two proteins [15, 35, 52]. Perhaps the most well-known technique for predicting PPIs from sequence data is the ‘mirror tree’ method (e.g., ). This method infers interactions from the correlation of evolutionary patterns, as seen in phylogenetic trees representing each of the interaction partners. However, this correlation may instead arise from functional relatedness as well as a number of other general evolutionary mechanisms [8, 29].
Predicting intra-protein and inter-protein residue-residue contacts from sequence covariation has recently been revived [19, 22, 32–34, 51, 55]. This is directly due to the availability of large amounts of sequence data and the recent development of so-called direct-coupling methods (e.g., ). The idea was studied in the eighties (e.g., ) and nineties (e.g., [16, 27]). The main limitation of these methods is that, typically, five times more sequences than the alignment length (the ‘5 L’ rule) are required [21, 22, 32, 33, 51]. For most proteins, this many sequences are not available. In addition, the application to inter-protein residue contacts is hampered by the need to construct large correlated alignments. Here, for each sequence an ortholog in the other alignment must be included, so that positional variations of the alignment of one interaction partner may be correlated with those of the other protein.
For identifying protein-protein interaction (PPI) sites, conservation measures on sequence features are often used . For example, ISIS by Ofran and Rost combines PSI-BLAST profiles with predicted solvent accessibility and secondary structure to predict interface sites [35, 36]. SPPIDER  additionally uses several structure-derived features in an elaborate machine learning approach. In addition, sequence and network features [12, 48], as well as conservation in combination with specificity , are also used to predict interaction sites. Several findings indicate that the interface rim tends to be more conserved than the interface core (e.g., [5, 18, 44]), while localized conservation of single residues can indicate interaction hot spots [9, 35, 50]. At the level of PPI networks, mixed results are being reported. Some conserved PPI network motifs appear related to conserved sequence motifs [12, 48]. Overall conservation patterns, however, are found to be weak and mostly not significant (e.g., [28, 42]).
Although progress has been made in predicting binding sites from sequence information, the problem remains far from solved and several limitations persist. First, extracting evolutionary information from sequence data critically depends on sequence alignments containing large numbers of sequences. Second, most methods rely on a combination of structural and sequence features (e.g., [52, 54]). While combined methods can achieve high prediction performance, the performance of sequence-only methods remains modest [35, 37, 42].
Specificity of interaction, i.e., differences between groups of homologs that display different interactions, has previously been reported. Pirovano et al.  identified interface residues by comparing homologs with different binding partners. Manning et al.  predicted positions which define sequence subfamily specificity, where some of these positions were binding sites. Based on a dataset of yeast interaction data and fungal ortholog groups, it has been suggested that, in addition, specificity between non-interacting and heteromeric interacting protein pairs might be used to detect the interaction sites . Interestingly, here only up to one hundred sequences were needed to detect the specificity signal between binding and non-binding groups, far fewer than the ‘5 L’ needed for covariation-based methods. However, the performance of their predictions is only just above random, indicating a need for a cleaner dataset for obtaining proof of principle.
In this paper, we investigate whether specificity between interacting and non-interacting subgroups can be used to predict interaction sites. To address this question, we chose homodimers as a use case to construct interacting subgroups and monomers to constitute non-interacting subgroups. In this way, we can confirm that all sequences in the interacting subgroup physically interact, and that we have a sub-group of monomers known not to (self) interact. Furthermore, the specificity signal comes from compositional differences of one chain rather than multiple chains, as would be the case when comparing heteromeric interacting groups with non-interacting groups. All homodimers and monomers were obtained from PISA, which is a resource for exploring macromolecular interfaces . We compiled a new database derived entirely from crystallized proteins in the PDB , and compared homodimers with homologous non-interacting monomers in a multiple sequence alignment. Starting with 9152 homodimers and 13,355 monomers, we constructed 1,592 pair groups for which we predicted putative interface residues. We found that the compositional differences between interacting and non-interacting subgroups pinpoint interface positions. We also found that various filters on the input sequences yielded a stronger specificity signal and a better prediction performance. Finally, we compare our method with a sequence-only method, SPPIDER .
The length of the sequences is at least 50 amino acids.
No two sequences in either group are identical.
In addition to the PISA annotation, all homodimeric and monomeric proteins are also defined as homodimers and monomers respectively in PDB.
A list of the selected homodimers and monomers can be found in Additional file 1: Tables S8 and S9.
For Test-set 1, we constructed 10 datasets based on the sequence identity (% ID) filtering ranging from 40 % to 100 % ID (e.g., 40 % means no two sequences in the dataset have more than 40 % identity): <40 %, <50 %, <60 %, <70 %, <80 %, <90 %, <95 %, <98 %, <99 %, <100 % (i.e., non-identical sequences). The filtering was done by using CD-hit . Only the longest sequence was retained of a set of sequences above the sequence identity threshold. For Test-set 2, we only use the <100 % dataset.
Interface sites (interacting residues): ASA > 0 and BSA > 0
Surface residues (Solvent-accessible residues): ASA > 0 and BSA = 0
Buried residues (Inaccessible residues): ASA = 0 and BSA = 0
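The three residue classes above can be expressed as a small decision rule. The following is a minimal sketch, assuming that ASA (accessible surface area) and BSA (buried surface area) are available per residue as non-negative numbers; the function name `classify_residue` is hypothetical and not part of PISA or of the original pipeline.

```python
def classify_residue(asa: float, bsa: float) -> str:
    """Assign a residue class from its accessible (ASA) and buried (BSA) surface area.

    Hypothetical helper: the thresholds mirror the three definitions in the text.
    """
    if asa < 0 or bsa < 0:
        raise ValueError("ASA and BSA must be non-negative")
    if asa > 0 and bsa > 0:
        return "interface"  # accessible in the monomer, buried upon association
    if asa > 0 and bsa == 0:
        return "surface"    # solvent-accessible but not part of the interface
    if asa == 0 and bsa == 0:
        return "buried"     # inaccessible to solvent
    # ASA = 0 with BSA > 0 is not covered by the definitions in the text.
    raise ValueError("BSA > 0 requires ASA > 0")


if __name__ == "__main__":
    for asa, bsa in [(35.2, 12.0), (48.0, 0.0), (0.0, 0.0)]:
        print(asa, bsa, classify_residue(asa, bsa))
```

The combination ASA = 0 with BSA > 0 is rejected explicitly because the definitions above do not assign it to any class.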
Interacting and non-interacting homologs
To investigate conservation differences of interaction positions between interacting and non-interacting homologs, knowing the homologous relationship between a set of interacting homologs and a set of non-interacting homologs is essential for grouping the sequences. Using the sequence sets defined above, we created paired homodimer-monomer alignments. We refer to the paired alignments derived from the <100 % (non-identical) sequence set as the 'complete' set. Figure 1 summarizes the scheme to get homologous groups of interacting and non-interacting subgroups. We did this for each of the 10 sequence datasets of Test-set 1.
First, BLASTP  is used to detect homologous relationships in an all-against-all comparison in our custom database of all 9152 (homodimer) and 13,355 (monomer) sequences combined. For each homodimer query sequence, we search for the nearest non-interacting (monomer) homolog first. The set of interacting (homodimer) sequences that are closer (lower BLAST e-values, so higher in the hit list) than the nearest monomer sequence constitutes the interacting subgroup. A minimum of five homologs is required to form a subgroup. Subsequently, the first monomer hit is also used as query, and, symmetrically, all monomer hits closer than the first homodimer hit constitute the non-interacting subgroup (again with a minimum of five sequences). These two subgroups together then compose an interacting and non-interacting pair group for conservation and specificity analysis. The requirement that each sub-group has at least five sequences ensures that the necessary evolutionary information is obtained for analysis. MUSCLE , a fast alignment method, was used to build the approximately 20,000 alignments for all the sequences in the pair group. BLASTP and MUSCLE were both run with default parameters.
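The grouping procedure described above can be sketched in a few lines. This is an illustrative sketch only, not the original pipeline: it assumes that BLAST output has already been parsed into hit lists sorted by increasing e-value, that each hit is annotated as "homodimer" or "monomer", and that the query itself appears as the top hit of its own list. The names `hits_before_first`, `build_pair_group` and `hits_for` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# (sequence id, kind), where kind is "homodimer" or "monomer"; best e-value first.
Hit = Tuple[str, str]


def hits_before_first(hits: List[Hit], other_kind: str) -> Tuple[List[str], Optional[str]]:
    """Return the ids ranked above the first hit of `other_kind`, and that hit's id."""
    closer: List[str] = []
    for seq_id, kind in hits:
        if kind == other_kind:
            return closer, seq_id
        closer.append(seq_id)
    return closer, None  # no hit of the other kind was found


def build_pair_group(
    homodimer_hits: List[Hit],
    hits_for: Callable[[str], List[Hit]],
    min_size: int = 5,
) -> Optional[Tuple[List[str], List[str]]]:
    """Build (interacting, non-interacting) subgroups, or None if either is too small."""
    interacting, first_monomer = hits_before_first(homodimer_hits, "monomer")
    if first_monomer is None or len(interacting) < min_size:
        return None
    # Symmetric step: use the nearest monomer as query and stop at its first homodimer hit.
    non_interacting, _ = hits_before_first(hits_for(first_monomer), "homodimer")
    if len(non_interacting) < min_size:
        return None
    return interacting, non_interacting


if __name__ == "__main__":
    dimer_hits = [(f"D{i}", "homodimer") for i in range(1, 7)] + [("M1", "monomer")]
    monomer_hits = {"M1": [(f"M{i}", "monomer") for i in range(1, 6)] + [("D1", "homodimer")]}
    print(build_pair_group(dimer_hits, monomer_hits.__getitem__))
```

Passing the hit lookup as a callable keeps the sketch independent of how the BLAST results are stored.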
For Test-set 2, all 1416 homodimers and 2453 monomers are used to obtain homologous groups of interacting and non-interacting subgroups following the method described above. The difference is that we use all the sequences (Test-set 1 + Test-set 2) as the BLAST database to obtain enough sequence information.
Interacting (homodimer) protein length, with minimum sequence length cut-offs ranging from 50 to 600 amino acids.
High-scoring Segment Pair (HSP) length from BLAST between the homodimer and its first homologous monomer hit, with minimum length cut-offs ranging from 25 to 200 amino acids. The HSP length between a homodimer and its first monomer hit was used as reported by BLAST when finding the interacting and non-interacting homologs.
Scoring for conservation
Conservation at each position i was scored with the sequence entropy $S_i = -\sum_{x} p_{i,x} \log p_{i,x}$, where $p_{i,x}$ is the fraction of amino acid x at the i-th position of the sequence; the sum is over all 20 amino acids. A low sequence entropy $S_i$ represents higher evolutionary conservation. We calculated average entropies for each pair group for comparison among varied positions.
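A per-position entropy of this kind can be computed directly from an alignment column. The sketch below is an illustration under stated assumptions: gap and non-standard characters are ignored, and the logarithm is taken to base 2, which is a choice made here and not stated in the text. The function name `column_entropy` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column over the 20 amino acids.

    Assumptions (not from the source): gaps and non-standard symbols are
    skipped, and log base 2 is used.
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column with gaps only: no information to score
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    print(column_entropy("AAAA"))  # fully conserved column
    print(column_entropy("AALL"))  # two residue types in equal proportion
```

Averaging `column_entropy` over all columns of a given class (interface, surface, buried) gives the kind of per-group average entropy used for the comparison.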
Detecting Specificity signal
Sequence Harmony [6, 39], an entropy based method, is applied to detect the compositional differences between subgroups. The program is accessible as a web-server from: http://www.ibi.vu.nl/programs/seqharmwww/.
where $p^{H}_{i,x}$ indicates the observed frequency in homodimer group H for amino acid type x at position i in the sequence, and analogously $p^{M}_{i,x}$ for monomer group M, and the sum is over all 20 amino acids . Therefore, an SH score of 0 indicates an amino acid position that has no co-occurring residues in the two groups, indicating complete specificity between the two sequence groups, whereas an SH score of 1 indicates a complete compositional overlap between the two groups at this amino acid position. SH scores were calculated between the interacting (H) and non-interacting (M) pair group at each position in the alignment. The lower-scoring sites are then predicted to constitute the interface region.
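To make the behaviour of such a score concrete, the sketch below implements one symmetric, entropy-based overlap measure that has the properties stated above: 0 when the two groups share no residue types at a position and 1 when their compositions are identical. It is computed as one minus the Jensen-Shannon divergence (base 2) of the two frequency distributions. This is an assumption chosen for illustration; the exact expression used by the Sequence Harmony program should be taken from the cited publications. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(column: str) -> Dict[str, float]:
    """Observed amino acid frequencies in one alignment column (gaps ignored)."""
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def harmony_score(col_h: str, col_m: str) -> float:
    """Overlap score in [0, 1] between homodimer (H) and monomer (M) columns.

    Illustrative formulation: 1 - Jensen-Shannon divergence (log base 2).
    0 = no shared residue types, 1 = identical compositions.
    """
    p_h, p_m = frequencies(col_h), frequencies(col_m)
    if not p_h or not p_m:
        raise ValueError("both columns need at least one amino acid")
    score = 0.0
    for aa in set(p_h) | set(p_m):
        a, b = p_h.get(aa, 0.0), p_m.get(aa, 0.0)
        if a > 0:
            score += 0.5 * a * math.log2((a + b) / a)
        if b > 0:
            score += 0.5 * b * math.log2((a + b) / b)
    return score


if __name__ == "__main__":
    print(harmony_score("AAAA", "LLLL"))  # disjoint compositions
    print(harmony_score("AALL", "AALL"))  # identical compositions
    print(harmony_score("AAAL", "ALLL"))  # partial overlap
```

Positions would then be ranked by this score, with the lowest-scoring positions taken as interface candidates.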
Detecting Specificity signal
Recall (True Positive Rate, Sensitivity or Coverage) = TP/(TP + FN)
False Positive Rate (FPR) = FP/(TN + FP)
Precision (Positive Predictive Value) = TP/(TP + FP)
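The three measures above follow directly from the confusion counts. The sketch below is a minimal illustration, assuming predictions and annotations are given as sets of alignment or sequence positions and that both are subsets of the set of all evaluated positions; the function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Set


def _ratio(numerator: int, denominator: int) -> float:
    """Division that yields NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def evaluate(predicted: Set[int], interface: Set[int], all_positions: Set[int]) -> Dict[str, float]:
    """Recall, false positive rate and precision for a set of predicted positions."""
    tp = len(predicted & interface)                  # predicted and truly interface
    fp = len(predicted - interface)                  # predicted but not interface
    fn = len(interface - predicted)                  # interface but not predicted
    tn = len(all_positions - predicted - interface)  # neither predicted nor interface
    return {
        "recall": _ratio(tp, tp + fn),
        "fpr": _ratio(fp, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    result = evaluate(predicted={0, 1, 4}, interface={0, 1, 2, 3}, all_positions=set(range(10)))
    print(result)
```

In the toy call, two of four interface positions are recovered and one of six non-interface positions is predicted, so recall is 0.5, FPR is 1/6 and precision is 2/3.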
Comparison with another method
SPPIDER  is a machine learning-based method that can predict interaction sites using sequence information only, although support for sequence-only prediction in SPPIDER is experimental. We tested the performance of the SPPIDER web server using the same datasets as we used for our method. Precision and Recall were calculated for each method to compare performance.
We applied our analysis on the 10 pair-group sets created based on sequence identity (i.e., the complete set, <99 %, <98 %, <95 %, <90 %, <80 %, <70 %, <60 %, <50 % and <40 %) for Test-set 1. The number of pair-group alignments in each of the sets can be found in Additional file 1: Table S1. With lower sequence identity cut-off values, the numbers of homodimer and monomer sequences within each of the groups and the total number of groups are reduced. The complete set (Test-set 1) consists of 1593 pair-group alignments, each containing on average 25 homodimer and 14 monomer sequences per group.
Conservation differs for buried sites
As described in methods, interface sites stand for the residues for which accessible surface area becomes buried during association; surface residues are solvent-accessible, but not interface; and buried residues are inaccessible. When filtering on maximum sequence identity from <40 % to <100 % (<100 % yielding the complete dataset), average entropies went down because more similar sequences were included in the datasets. It was not surprising that the buried (inaccessible) sites were more conserved than the other two (interface and other surface sites). During evolution, proteins usually conserve their hydrophobic core to keep structural stability. However, the conservation differences between interface and the rest of the protein surface were small over the whole range of % ID cut-offs.
In summary, it is easy to separate the buried sites from surface residues because the conservation pattern differs. However, it is virtually impossible to distinguish the interaction sites from other surface sites using sequence conservation information only, because differences in conservation are generally negligible.
Specificity differs between interface, surface and buried positions
Next, we calculated the specificity between interacting and non-interacting groups for the three aforementioned types of positions using Sequence Harmony. In our hypothesis, the lower SH scores should be located at the interface positions. We took the complete dataset to show the specificity. Sequence Harmony detected compositional differences between interacting and non-interacting sub-groups for the three different types of positions. The overall average SH value for interface is 0.358, for other surface 0.375 and buried position 0.380 (p-value ≤ 0.05 between interface and other surface using two-sided Student's t-test). Indeed, the interface positions show the lowest SH scores. In other words, there is signal present in the specificity.
The second selection parameter was the interacting (homodimer) sequence length. Additional file 1: Figure S1 shows the changes of average SH score with increasing sequence length (of the interacting protein). The figure shows that with increasing sequence length, the differences between the SH scores of the three groups remained rather stable, unlike the diverging trend observed for HSP-length selection (Fig. 3).
Interface prediction depends on HSP length and % ID filtering
In light of the observed trends in differences of SH scores, we tested if the specificity signal could be used to predict interaction sites. Predictions were validated against interaction site annotations obtained from the PISA database (see Methods for details). For all pair groups, the Area Under the Curve (AUC) of Receiver Operating Characteristic (ROC) plots was calculated as a measure of performance.
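Because lower SH scores are taken as predictions of interface positions, the ROC AUC can be computed as the probability that a randomly chosen interface position scores lower than a randomly chosen non-interface position, counting ties as one half. The sketch below uses this pairwise (Mann-Whitney) formulation; it is an illustration, not the evaluation code used in the study, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc_low_is_positive(scores: Sequence[float], is_interface: Sequence[bool]) -> float:
    """ROC AUC when LOWER scores indicate the positive (interface) class.

    Pairwise formulation: the fraction of (interface, non-interface) pairs in
    which the interface position has the lower score; ties count as 0.5.
    """
    if len(scores) != len(is_interface):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, is_interface) if y]
    neg = [s for s, y in zip(scores, is_interface) if not y]
    if not pos or not neg:
        raise ValueError("need at least one interface and one non-interface position")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    sh_scores = [0.1, 0.2, 0.8, 0.9]
    labels = [True, True, False, False]
    print(roc_auc_low_is_positive(sh_scores, labels))
```

The quadratic loop is adequate for single proteins; a rank-based implementation would be preferable for very long alignments.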
Interface prediction is a challenging problem
Comparison of performance between SPPIDER and our SH
Complete set 1: <100 % ID (Test-set 1)
Complete set 2: <100 % ID (Test-set 2)
Selected Test-set 1: 200 HSP + 400 length
Selected Test-set 2: 200 HSP + 400 length
Predicting the interface that stabilizes the ligand binding regions in a phosphatase and a kinase family
To illustrate the impact of accurate interface prediction, we here show details of two protein families with the highest AUC in ROC performance in our dataset, indicating the specificity signal strongly correlates with the interface region.
The second example is Amino-imidazole riboside kinase (1TZ6), which has a homodimeric structure with one active site per monomer. The active site is covered by a lid which is supposed to be a morphological marker for evolution within the ribokinase superfamily [3, 56]. The homodimeric structure is formed through lid-to-lid interactions [3, 56]. The query monomer sequence found is 2ABS, adenosine kinase, another member of the ribokinase superfamily which can be active as a monomer. Below the SH value cut-off of 0.2, we identify 53 positions, which include 13 of the 24 binding sites: Recall 54.2 %, FPR 14.7 %, Precision 24.5 %. The AUC of ROC reaches 0.746 (Additional file 1: Figure S4).
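The reported percentages for this example can be checked from the stated counts. In the worked arithmetic below, TP, FN and FP follow from the 53 predicted positions, the 13 detected sites and the 24 binding sites. The total number of non-interface positions is not stated in the text; the last line only shows the value implied by the rounded FPR and should be read as an inference, not a reported number.

```latex
\begin{align*}
\mathrm{TP} &= 13, \qquad \mathrm{FN} = 24 - 13 = 11, \qquad \mathrm{FP} = 53 - 13 = 40 \\
\text{Recall} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{13}{24} \approx 54.2\,\% \\
\text{Precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{13}{53} \approx 24.5\,\% \\
\mathrm{TN} + \mathrm{FP} &= \frac{\mathrm{FP}}{\mathrm{FPR}} \approx \frac{40}{0.147} \approx 272
\end{align*}
```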
Interaction site prediction using sequence information alone remains a challenging problem. It is particularly important in the context of increasing protein sequence data and given the relative paucity of structure information which always needs expensive and time-consuming experiments. We demonstrate that sequence specificity information from interacting proteins and their non-interacting homologs is able to detect interaction sites. To the best of our knowledge, this is the first time that predicting interaction sites using subfamily specificity by including non-interacting information is performed at a scale beyond a few protein families.
Our results show that prediction is well beyond random: the SH signal obtains ROC AUC values greater than 0.6. The AUC increases to 0.7 if the dataset is filtered on sequence length and HSP-length between interacting and non-interacting groups. HSPs are formed by BLAST local alignments based upon residue similarity, thus residues should be similar. However, this goes against looking for differences between two subgroups, which is the hallmark of SH scoring. Since shorter HSPs should have more similar alignment positions to get a high-enough BLAST score, longer HSPs may contain less similar positions, i.e., positions that differ more between the subgroups, here implying interface residues that confer interaction specificity. Additionally, alignments comprising shorter HSPs contain larger regions of higher divergence, which will likely lead to more false positive predictions.
For our analysis, on average only 25 homodimer and 14 monomer sequences were used in a group. Furthermore, we do not lose prediction performance with low numbers of sequences, as long as the minimum of 5 required in our analysis is met. This is a vast improvement over covariance-based methods, which require an estimated five times more sequences than the length of the alignment to detect interacting sites. Thus, this opens up the possibility of deriving interaction signals from genomes with little sequence data available, or from sparsely sampled protein families.
We observe that conservation of interaction sites is indistinguishable from other surface sites (Fig. 2), which corroborates observations by others (e.g., [7, 28]). This also helps explain why predicting interaction sites only using sequence conservation information still remains a very difficult problem. We also observe low correlation (R = 0.22) between AUC of ROC plots and sequence length of query homodimers, which suggests that the size of protein and its interface are unimportant factors in interaction. That is similar to what Dhole et al.  reported recently.
The definition of 'non-interacting' in our manuscript is not that the monomer cannot interact with any other protein, but that the monomer has lost the interaction with another copy of itself. It has been reported that the same protein involved in different interactions might have different binding sites . If the monomer in our non-interacting group also binds to other proteins, the interaction sites might be different. The sequence specificity between the interacting sub-group (homodimer) and the monomer sub-group can then still be used to pinpoint the homodimer interaction sites.
Our dataset is obtained from PISA and PDB and might inherit any bias that the PDB has. To test this, we map the homodimer query sequences from each group in Test-set 1 (both Complete and Selected) onto CATH superfamilies and calculate the overall performance of our method (average Area Under the Curve, AUC) for each superfamily. Our Test-set 1 covers all four main classes, 62.5 % (25/40) of architectures, 11.5 % (158/1375) of topologies and 8 % (219/2738) of superfamilies in CATH, while comprising only 1593 out of 69058 (2.3 %) total proteins in CATH. Our method predicts not only superfamilies that are enriched in PDB but also those for which few (homodimer) structures are solved. For the selected dataset, the sequences map to 7 superfamilies. Interestingly, we also selected one superfamily which our method cannot predict well. The results of this can be found in Additional file 1: Figures S7 and S8.
Although our results demonstrate an advance in predicting interaction sites from sequence information alone, it is clear that, like other approaches, our approach also holds some limitations. First, detection of functionally conserved homologs is still a difficult problem. Thus, care should be taken in selecting homologous proteins that are likely to conserve interaction. Initially, our procedure was based on ortholog clustering databases, such as COG and OrthoMCL, but unfortunately, interaction appears to be an ill conserved property within these orthologous groups . This introduces considerable noise into the specificity signal . Second, for our prediction we need homologous sets of interacting and non-interacting groups. Not for all interacting proteins will one be able to identify non-interacting homologs. Proteins interacting with different partners might be used instead [14, 39], but that remains to be investigated further.
We have shown that it is possible to predict interaction sites out of all residues by combining sequence and group specificity information. When used as a prediction method in its current form, on homodimer versus monomer data, the Sequence Harmony specificity signal yields precision similar to that of other signals but may obtain higher coverage.
Qingzhen Hou is supported by Chinese Scholarship Council (No.2011627127). Part of this work was supported by the Netherlands Bioinformatics Centre, BioRange research programme. Bas E. Dutilh was supported by CAPES/BRASIL. The authors thank Sanne Abeln for her suggestion on using PISA database to get homodimer datasets and Paul De Geest for mapping all homodimer query sequences to CATH superfamilies.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Altschuh D, Lesk AM, Bloomer AC, Klug A. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J Mol Biol. 1987;193(4):693–707.View ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.View ArticlePubMedGoogle Scholar
- Baez M, Cabrera R, Pereira HM, Blanco A, Villalobos P, Babul J. A Ribokinase Family Conserved Monovalent Cation Binding Site Enhances the MgATP-induced Inhibition in E. coli Phosphofructokinase-2. Biophysical journal. 2013;105(1):185–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Berman HM, Westbrook JD, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Research. 2000;28(1):235–42.View ArticlePubMedPubMed CentralGoogle Scholar
- Bouvier B, Grunberg R, Nilges M, Cazals F. Shelling the Voronoi interface of protein-protein complexes reveals patterns of residue conservation, dynamics, and composition. Proteins. 2009;76(3):677–92.View ArticlePubMedGoogle Scholar
- Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res. 2010, 38 (Web Server issue):W35–40Google Scholar
- Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES. Are protein--protein interfaces more conserved in sequence than the rest of the protein surface? Protein Science. 2004;13(1):190–202.View ArticlePubMedPubMed CentralGoogle Scholar
- De Juan D, Pazos F, Valencia A. Emerging methods in protein co-evolution. Nat Rev Genet. 2013;14(4):249–61.View ArticlePubMedGoogle Scholar
- De Vries SJ, van Dijk AD, Bonvin AM. WHISCY: what information does surface conservation yield? Application to data-driven docking. Proteins. 2006;63(3):479–89.View ArticlePubMedGoogle Scholar
- Dhole K, Singh G, Pai PP, Mondal S. Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier. J Theor Biol. 2014;348:47–54.View ArticlePubMedGoogle Scholar
- Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.View ArticlePubMedPubMed CentralGoogle Scholar
- Evlampiev K, Isambert H. Conservation and topology of protein interaction networks under duplication-divergence evolution. Proc Natl Acad Sci USA. 2008;105(29):9863–8.View ArticlePubMedPubMed CentralGoogle Scholar
- Ezkurdia I, Bartoli L, Fariselli P, Casasio R, Valencia A, Tress ML. Progress and challenges in predicting protein-protein interaction sites. Brief Bioinformatics. 2009;10(3):233–46.View ArticlePubMedGoogle Scholar
- Feenstra KA, Bastianelli G, Heringa J. Predicting Protein Interactions from Functional Specificity. Jülich (Germany): John von Neumann Institute for Computing; 2008. p. 89–92.Google Scholar
- Gallet X, Charloteaux B, Thomas A, Brasseur R. A fast method to predict protein interaction sites from sequences. J Mol Biol. 2000;302:917–926.View ArticlePubMedGoogle Scholar
- Gobel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins. 1994;18(4):309–17.View ArticlePubMedGoogle Scholar
- Gohla A, Birkenfeld J, Bokoch GM. Chronophin, a novel HAD-type serine protein phosphatase, regulates cofilin-dependent actin dynamics. Nat Cell Biol. 2005;7(1):21–9.View ArticlePubMedGoogle Scholar
- Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. Proc Natl Acad Sci USA. 2005;102(43):15447–52.View ArticlePubMedPubMed CentralGoogle Scholar
- Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28(2):184–90.View ArticlePubMedGoogle Scholar
- Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996;93(1):13–20.View ArticlePubMedPubMed CentralGoogle Scholar
- Juan D, Pazos F, Valencia A. Co-evolution and co-adaptation in protein networks. FEBS Lett. 2008;582(8):1225–30.View ArticlePubMedGoogle Scholar
- Kamisetty H, Ovchinnikov S, Baker D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci USA. 2013;110(39):15674–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Kastritis PL, Bonvin AM. Are scoring functions in protein-protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res. 2010;9(5):2216–25.View ArticlePubMedGoogle Scholar
- Katz C, Levy-Beladev L, Rotem-Bamberger S, Rito T, Rudiger SG, Friedler A. Studying protein-protein interactions using peptide arrays. Chem Soc Rev. 2011;40(5):2131–45.View ArticlePubMedGoogle Scholar
- Kestler C, Knobloch G, Tessmer I, Jeanclos E, Schindelin H, Gohla A. Chronophin dimerization is required for proper positioning of its substrate specificity loop. J Biol Chem. 2014;289(5):3094–103.View ArticlePubMedGoogle Scholar
- Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J Mol Biol. 2007;372(3):774–97.View ArticlePubMedGoogle Scholar
- Lapedes AS, Giraud B, Liu L, Stormo GD. Correlated mutations in models of protein sequences: phylogenetic and structural effects. In: Seillier-Moiseiwitsch Fco, editor. Statistics in molecular biology and genetics Volume 33. Hayward, CA: Institute of Mathematical Statistics; 1999. p. 236–56.View ArticleGoogle Scholar
- Lewis AC, Jones NS, Porter MA, Deane CM. What evidence is there for the homology of protein-protein interactions? PLoS Comput Biol. 2012;8(9):e1002645.View ArticlePubMedPubMed CentralGoogle Scholar
- Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol. 1996;257(2):342–58.View ArticlePubMedGoogle Scholar
- Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.View ArticlePubMedGoogle Scholar
- Manning JR, Jefferson ER, Barton GJ. The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction. BMC Bioinformatics. 2008;9:51.View ArticlePubMedPubMed CentralGoogle Scholar
- Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nat Biotechnol. 2012;30(11):1072–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Michel M, Hayat S, Skwark MJ, Sander C, Marks DS, Elofsson A. PconsFold: improved contact predictions improve protein models. Bioinformatics. 2014;30(17):i482–8.
- Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA. 2011;108(49):E1293–301.
- Ofran Y, Rost B. ISIS: interaction sites identified from sequence. Bioinformatics. 2007;23(2):e13–6.
- Ofran Y, Rost B. Predicted protein-protein interaction sites from local sequence information. FEBS Lett. 2003;544(1–3):236–9.
- Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS Comput Biol. 2007;3(7):e119.
- Pazos F, Ranea JA, Juan D, Sternberg MJ. Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol. 2005;352(4):1002–15.
- Pirovano W, Feenstra KA, Heringa J. Sequence comparison by sequence harmony identifies subtype-specific functional sites. Nucleic Acids Res. 2006;34:6540–8.
- Pommier Y, Marchand C. Interfacial inhibitors: targeting macromolecular complexes. Nat Rev Drug Discov. 2012;11(1):25–36.
- Pons C, Grosdidier S, Solernou A, Pérez‐Cano L, Fernández‐Recio J. Present and future challenges and limitations in protein-protein docking. Proteins. 2010;78(1):95–108.
- Porollo A, Meller J. Computational Methods for Prediction of Protein-Protein Interaction Sites. In: Cai W, Hong H, editors. Protein-Protein Interactions - Computational and Experimental Tools. Vol. 472. Croatia: InTechOpen; 2012. p. 3–26.
- Porollo A, Meller J. Prediction-based fingerprints of protein-protein interactions. Proteins. 2007;66(3):630–45.
- Rahat O, Yitzhaky A, Schreiber G. Cluster conservation as a novel tool for studying protein-protein interactions evolution. Proteins. 2008;71(2):621–30.
- Res I, Mihalek I, Lichtarge O. An evolution based classifier for prediction of protein interfaces without using protein structures. Bioinformatics. 2005;21(10):2496–501.
- Sander C, Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1991;9(1):56–68.
- Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D. Progress in modeling of protein structures and interactions. Science. 2005;310(5748):638–42.
- Sharan R, Suthram S, Kelley RM, Kuhn T, McCuine S, Uetz P, et al. Conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA. 2005;102(6):1974–9.
- Shenkin PS, Erman B, Mastrandrea LD. Information-theoretical entropy as a measure of sequence variability. Proteins. 1991;11:297–313.
- Shulman-Peleg A, Shatsky M, Nussinov R, Wolfson HJ. Spatial chemical conservation of hot spot interactions in protein-protein complexes. BMC Biol. 2007;5:43.
- Taylor WR, Hamilton RS, Sadowski MI. Prediction of contacts from correlated sequence substitutions. Curr Opin Struct Biol. 2013;23(3):473–9.
- Tuncbag N, Kar G, Keskin O, Gursoy A, Nussinov R. A survey of available tools and web servers for analysis of protein-protein interactions and interfaces. Brief Bioinformatics. 2009;10(3):217–32.
- Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Curr Opin Struct Biol. 2002;12(3):368–73.
- Wass MN, David A, Sternberg MJ. Challenges for the prediction of macromolecular interactions. Curr Opin Struct Biol. 2011;21(3):382–90.
- Weigt M, White RA, Szurmant H, Hoch JA, Hwa T. Identification of direct residue contacts in protein–protein interaction by message passing. Proc Natl Acad Sci USA. 2009;106(1):67–72.
- Zhang Y, Dougherty M, Downs DM, Ealick SE. Crystal structure of an aminoimidazole riboside kinase from Salmonella enterica: implications for the evolution of the ribokinase superfamily. Structure. 2004;12(10):1809–21.