Volume 8 Supplement 4
Using structural motif descriptors for sequence-based binding site prediction
© Henschel et al; licensee BioMed Central Ltd. 2007
Published: 22 May 2007
Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by the identification of its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) on sequence level using 3D information. To this end, we use machine learning to compile sequential segments that constitute structural features of an interaction site into one profile Hidden Markov Model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites.
We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant enough to be used for binding site recognition. We further validate 230 PPIs that were extracted from the literature, where we additionally identify the interface residues. Finally, we test ligand-binding descriptors for the case of ATP. From sequences with Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal number of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite pattern.
The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions along with their binding site residues for proteins where merely sequence information is available.
Exhaustive knowledge about protein interactions is a prerequisite to understanding the molecular machinery of the cell. While comprehensive protein sequence databases are available, the number of known PPIs is still small. In addition, experimentally proven PPIs often do not reveal the binding sites involved. The implications of the discovery of binding sites are manifold: the discovery of patterns in amino acid arrangements is of general importance in the study of protein-protein interactions. Furthermore, docking algorithms benefit greatly from the correct prediction of binding sites. Finally, interaction prediction is the key to mapping global interaction networks and signalling pathways, and may help elucidate the functions of individual proteins. Complementary to experimental techniques are computational approaches that analyze and predict protein-protein interactions. Sequence-based methods include gene context conservation, synthetic lethality, phylogenetic profiling [3, 4] or co-evolution of gene expression. Various databases of binding sites and interfaces between proteins and their domains exist [6–8]. An extensive list of prediction methods can be found in .
As pointed out by Bailey and Gribskov, the signal-to-noise ratio in homology searches can be improved by using sets of motifs that characterize a family. In this study, we aim to create descriptors for all relevant sequence parts of structurally known protein-protein and protein-ligand binding sites. These binding sites are often well-conserved, but their segmented nature on sequence level has to be taken into account for sequence similarity searches (Figure 1B). In accordance with previous work [19, 20], we define an interface between two proteins to consist of two faces. Similar faces can be clustered geometrically into face types, and similar interfaces can be clustered into interface types.
Many approaches for interaction prediction and function annotation require structural knowledge about the protein of interest [21–23]. Waiving this requirement makes interaction prediction applicable to a much wider range of sequences, but it also makes the problem substantially harder. It has been addressed previously (see e.g. [24, 25]). Most notably, Li and Li discover stable and significant interface motifs and represent them with regular expressions, while Espadaler et al. demonstrate the usefulness of HMMs for this endeavor. Both approaches use single structural templates as seeds for generalization with further sequences, coming from either similarity search or random generation. However, several structures are often available for a particular kind of domain-domain interaction, each providing new insights into the sequential variability of the actual interacting residues. Novel to our method is the incorporation of as many structures as possible for each binding site descriptor. This is attractive because protein-protein interactions from complex structures are considered to be the most reliable source of interaction data.
We compiled a library that comprises profile HMM descriptors for 740 protein-protein and more than 3,000 protein-ligand binding sites in the Protein Data Bank (PDB). Each descriptor describes one face. These descriptors, totalling more than 3,740, characterize an interaction or ligand binding site on sequence level. Hence, given a query sequence of interest, it is possible to compare it to each interface descriptor, thus identifying binding sites to possible interaction partners including ligands. Gene Ontology (GO) annotations are linked to each descriptor from the original PDB entries that were used for its construction. The complete list of profile HMM descriptors is directly usable with the HMMer package and is freely available for academics upon request.
PPI descriptor construction
Based on the family level of the Structural Classification of Proteins (SCOP), we can extract and classify all domain-domain interactions found in the PDB. This classification is available in the SCOPPI database. As pointed out by Kim et al., even homologous domain pairs can associate in geometrically different ways by employing different sets of residues to form interfaces [19, 20]. Consequently, the corresponding interface profiles would differ substantially, and combining the information about interacting residues into a single profile would be meaningless. However, a number of domain-domain interactions often exhibit striking similarities, and it is desirable to collect all instances of one interface type for the calculation of the respective interface profile. We therefore compose descriptors for all interface types in SCOPPI by combining all instances of that interface type. When data for interface types is sparse, we utilize sequence data provided by HSSP. Often several sequentially remote segments contribute to a binding site. To accommodate this, we adopt the multiple-motif approach from PRINTS, MAST and Meta-Meme and represent a binding site as a collection of small HMMs, each covering one local binding motif. Thus we describe only the important sequence parts that form a structural feature. To represent the full sequence space of a whole family with a weight matrix or a profile HMM, a large number of sequences is required, in particular for families of strong sequence variability. However, considerably fewer sequences are needed for short, conserved motifs.
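To make the idea of one small model per local motif more concrete, the following sketch builds a per-column residue frequency profile for a single aligned segment. It is only a simplified stand-in for the match states of a profile HMM and is not the HMMer model used in this work; the function name `column_profile`, the pseudocount value and the example segments are assumptions chosen for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(aligned_segments, pseudocount=1.0):
    """Per-column residue frequencies for gap-free segments of equal length.

    Hypothetical helper: a frequency profile with additive pseudocounts,
    used here as a simplified stand-in for profile HMM match states.
    """
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("all segments must have the same length")
    total = len(aligned_segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seg[col] for seg in aligned_segments)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Illustrative, made-up segments for one local motif.
segments = ["GSGKST", "GAGKST", "GTGKTT"]
profile = column_profile(segments)
# A binding site descriptor would hold one such model per segment.
descriptor = [profile]
```

A short, well conserved column such as the first one above is dominated by a single residue even with only three sequences, which matches the observation that short motifs need fewer sequences than a whole-family profile.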
PLI descriptor construction
As large protein complexes are often difficult to crystallize, knowledge about protein-protein interactions can be drawn from the more abundant protein-peptide interactions. The descriptor construction method can be extended to these cases in particular and to PLI in general. We construct HMM profiles for faces that bind to small molecules and peptides. To this end, we scanned the PDB for the most frequently occurring co-crystallized ligands. We identify the residues surrounding the ligand, including possible cofactors. A profile is built solely from one structural template and aligned sequences from HSSP.
Assessing the performance of PLI descriptors
Benchmarking HMMer E-values
Expectation values (E-values) provided by the HMMer software are a means of assessing the significance of HMM hits. As demonstrated by Li et al., comparison against random data can be used to establish a Z-score that distinguishes significant from random hits. Here, we use comparisons to shuffled databases to gain further information about significance by calculating the ratio of the best E-values of hits from shuffled and unshuffled sequence databases. For database shuffling, we generated a random permutation of every single sequence in the database. In the following, we demonstrate the use of shuffled databases for one particular ligand, adenosine triphosphate (ATP).
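The shuffling and ratio steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the fixed random seed and the toy database are hypothetical, and the E-values themselves would come from a separate HMMer search. The two E-values in the usage example are the ones reported later in this article for the Yersinia bercovieri hit.

```python
import math
import random


def shuffle_sequence(seq, rng):
    """Return a random permutation of the residues of one sequence."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffle_database(sequences, seed=0):
    """Shuffle every sequence independently.

    Length and amino acid composition of each sequence are preserved,
    only the residue order is randomized.
    """
    rng = random.Random(seed)
    return {name: shuffle_sequence(seq, rng) for name, seq in sequences.items()}


def log_evalue_ratio(best_e_original, best_e_shuffled):
    """log10 of (best shuffled E-value / best original E-value).

    Large positive values mean the best hit in the real database is far
    more significant than the best hit obtainable by chance.
    """
    return math.log10(best_e_shuffled) - math.log10(best_e_original)


# Toy database with made-up sequences.
database = {"seq1": "MKTAYIAKQR", "seq2": "GSGKSTLLNA"}
shuffled = shuffle_database(database, seed=1)

# E-values quoted in the text: 4.4e-7 (original) vs 0.057 (shuffled).
ratio = log_evalue_ratio(4.4e-7, 0.057)
```

For the quoted E-values the ratio is a little above 5, i.e. the real hit is roughly five orders of magnitude more significant than the best hit in the shuffled database.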
Case study: ATP-binding sites
Matches for ATP- and ADP-binding descriptors with various E-value thresholds.
Figures 3A–D illustrate the use of E-value ratios of unshuffled vs shuffled sequence databases. In case of the ATP-binding descriptors searched against Swiss-Prot, the comparison of the two E-value distributions (shuffled/not shuffled) allows the identification of a significance threshold (Figure 2B).
Note that small motifs such as the poly-proline (PxxP) motif occur frequently in sequences, although only 5% of these occurrences are functional. Nevertheless, hits of small motifs help to identify candidate sets that should undergo manual postprocessing.
Assessing the performance of PPI descriptors
Cross validation with structure data sets
The initial data set comprised 740 interface descriptors of protein-protein interactions, each having at least three non-redundant instances (not more than 98% sequence identity). To check the ability of face descriptors to recognize faces from structures in the PDB, we performed 2-, 4- and 8-fold cross validations on interface types with at least 8 non-redundant instances. This yields a set of 45 interface types, i.e. 90 face types. This set was run against domains from the PDB that are classified by SCOP. For 61 face types (67%), a reasonable recall of 70%–100% was achieved (additional file 1). Face types with low recall often have small interfaces with short, dispersed segments producing insignificant hits, or have low face conservation. Another source of errors is misalignment of the sequences of an interface type. In five cases, the recall could be improved by adding homologous sequences from HSSP. The recall for interaction prediction, which requires both face types to be present, is bounded above by the lower of the two face recalls. Hence, the average recall for interface detection dropped to 39%. This problem is most prominent when predicting interactions of promiscuous faces. Using the Structural Classification of Protein-Protein Interactions (SCOPPI), it is then still possible to provide candidate interaction partners. In particular, for dedicated faces, i.e. those with just one opposite binding face, recognition of one face type suffices.
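The bound on two-sided recall can be illustrated with a small sketch. The detection outcomes below are made-up values, and the function name `face_and_interface_recall` is hypothetical; the point is only that an interface counts as found when both of its faces are found, so its recall cannot exceed the lower of the two face recalls.

```python
def face_and_interface_recall(detections):
    """Compute recall for face A, face B and the full interface.

    detections: list of (face_a_found, face_b_found) booleans, one pair
    per known interface instance. An interface is counted as detected
    only when both of its faces are detected.
    """
    n = len(detections)
    recall_a = sum(1 for a, _ in detections if a) / n
    recall_b = sum(1 for _, b in detections if b) / n
    recall_both = sum(1 for a, b in detections if a and b) / n
    return recall_a, recall_b, recall_both


# Hypothetical outcomes for four known instances of one interface type.
outcomes = [(True, True), (True, False), (False, True), (True, True)]
recall_a, recall_b, recall_both = face_and_interface_recall(outcomes)
```

In this made-up case each face is recognized in three of four instances, but only two instances have both faces recognized, so the interface recall is lower than either face recall.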
Literature protein-protein interaction benchmark
To investigate how well our method is suited to detect protein-protein interactions, we benchmark it against a set of high-quality literature-curated interactions. To this end, we use a subset of NetPro, an expert curated and annotated database containing ~15,000 protein-protein interactions. These were extracted from PubMed abstracts by a semi-automated method and then cross-checked by human experts.
Number of interactions found
Using the descriptors to annotate uncharacterized sequences
We obtained a corpus of 32,000 uncharacterized proteins from the NCBI non-redundant protein sequence database. Face descriptors were run against these sequences and against a shuffled version. The result is shown in Figure 3D. A number of hits in the upper left part show a significant difference between the log E-value of the best hit against the original sequences and that of the best hit against the shuffled sequences. One example is the uncharacterized Fe-S protein from Yersinia bercovieri (NCBI ZP_00820831).
The face descriptor constructed from E. coli Complex II 2Fe-2S ferredoxin domain (which binds to the N-terminal domain of succinate dehydrogenase/fumarate reductase flavoprotein) matches this protein with an E-value of 4.4e-7. In contrast, the same descriptor achieves only 0.057 as best E-value when run against the shuffled version of the uncharacterized sequence database.
gi|77956751|ref|ZP_00820831.1|: score 14.5, E = 4.4e-07
SCR ++CGS+ i vD
This suggests that the uncharacterized Fe-S protein could be part of Complex II of Yersinia bercovieri, interacting with the succinate dehydrogenase flavoprotein subunit. The latter was found to be present in the available Yersinia pestis genome via sequence similarity search.
To estimate the quality of descriptors, E-value ratios for all descriptors in shuffled and original databases were analyzed as a function of descriptor length (Figure 3). Not surprisingly, most descriptors perform well against Swiss-Prot, in particular long descriptors. Significant hits are rarer in uncharacterized sequences, which had little or no influence on the generation of descriptors (Figure 3C,D). Interestingly, the best results there were achieved by short and medium size descriptors. This suggests that long descriptors are less likely to discover binding behaviour in unknown sequences.
The match of a descriptor pair allows identification of the putative residues responsible for the interaction. This information can be used to guide site-directed mutagenesis experiments that aim at disrupting the interaction.
The proposed method for binding site prediction can be applied by itself or in combination with other methods. A common technique for protein interaction prediction is to identify interacting homologues. This concept of so-called interologues was first noted by Walhout et al. and was applied in PPI prediction in e.g. [36, 37]. The usage of descriptors can serve here as a refinement: the assumption that the residues responsible for the interaction are present can be easily confirmed by descriptor matches of the corresponding families. Since a match provides residue correspondences to the original structures used to create the descriptor, the match alignment serves as an initial setup for homology modeling of the interface region.
For some interface types, only few or no structures are available. This implies that such interface types are represented inaccurately, or not at all, by the descriptor library. On the other hand, a pair of homologous sequences from HSSP does not necessarily preserve the interaction of the structural template and thus does not necessarily belong to the same interface type. Such sequences "contaminate" the alignments of interface descriptors. It is therefore essential to ensure that recruited sequence pairs not only maintain an interaction but also agree in interface type, i.e. have similar sets of interface residues.
A related problem is that of interaction specificity. An interface descriptor with N and M hits for each face, respectively, induces N × M candidate interaction pairs. Aytuna et al. argue analogously for structural face descriptor pairs and Deane et al. verify interactions between pairs of yeast proteins by known paralogous interactions. Although the latter report only 1% false positives, results should be – as with any computational method – ideally supported by further evidence from experimental results.
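The growth in candidate pairs is easy to see in a short sketch. The sequence identifiers below are invented and the function name `candidate_pairs` is hypothetical; the code simply enumerates every combination of a hit for one face with a hit for the opposite face.

```python
from itertools import product


def candidate_pairs(hits_face_a, hits_face_b):
    """All combinations of a face A hit with a face B hit (N x M pairs)."""
    return list(product(hits_face_a, hits_face_b))


# Made-up hit lists for the two faces of one interface descriptor.
hits_a = ["protA1", "protA2", "protA3"]   # N = 3
hits_b = ["protB1", "protB2"]             # M = 2
pairs = candidate_pairs(hits_a, hits_b)
```

Three hits for one face and two for the other already yield six candidate pairs, which is why additional evidence is needed to decide which of them actually interact.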
Our technique to construct interface descriptors is inadequate for short, strongly dispersed or highly variable interfaces (e.g. loops in immunoglobulins). However, it is possible to create a descriptor with our method that spans over the surrounding secondary structures, which are often well conserved.
We provide a library of Hidden Markov Model based descriptors that capture important structural features such as protein interfaces, ligand binding sites and active sites of enzymes. The implications for predicting binding sites and binding partners of proteins are manifold. A descriptor match provides insights into the biological processes the matched protein might be involved in. Furthermore, the method can pinpoint interacting residues. It thus bears potential for functional annotation and for assisting in the discovery of new drug targets.
Cross validation with the available structural data for a number of interface types reveals that two thirds of the face descriptors have a recall between 70% and 100%. Interaction prediction by recognizing both faces is intrinsically harder than just one-sided binding site detection. The cross validation results reflect this fact, as the recall for predicted interactions drops to 39%.
To demonstrate the biological significance of our descriptors, we compare the descriptors to NetPro, a PPI database with literature evidence. This way we could validate the predicted interactions and moreover provide insights about the critical interacting residues.
We created a benchmark for ATP-binding site detection. From a database of Swiss-Prot annotated sequences, our descriptors successfully recognized 25% while producing far fewer false positives than Prosite's regular expression for the P-loop motif.
Finally, an example for a significant hit for a binding site in an uncharacterized protein from Yersinia bercovieri is presented, which suggests a possible function as Complex II (succinate dehydrogenase) subunit for this protein.
Protein-protein interaction (PPI) descriptors
Despite the fact that homodimers account for a large set of interface types, we omit them because it is often unclear whether a homodimer interface is genuine or an artifact from crystallization.
Protein-ligand interaction (PLI) descriptors
For each structure in the PDB containing a protein and a ligand:
Select a ligand
Iteratively expand the selection to include surrounding cofactors
Identify the residues within 4.5 Å of the ligand selection
If this yields at least three well conserved residues:
Include direct well conserved sequence neighbours
Include residues that are between selected sequence neighbors
Add sequences identified as structurally homologous by HSSP
Generate HMMs for each segment
Combine HMMs into one descriptor connected by inserting states that reflect the linker regions between segments
Add descriptors (one for each ligand) to library
The library can now be used to predict ligand binding sites.
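The residue selection and segment steps listed above can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: coordinates are passed in as plain tuples rather than parsed from PDB files, the set of well conserved positions is assumed to be given, and "residues between selected sequence neighbours" is interpreted as filling gaps of at most `max_gap` residues, which is a hypothetical parameter. The HSSP and HMM building steps are not covered.

```python
import math

CUTOFF = 4.5  # Angstrom, distance threshold stated in the text


def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=CUTOFF):
    """Indices of residues with any atom within `cutoff` of any ligand atom.

    residue_atoms: dict mapping residue index -> list of (x, y, z) tuples.
    ligand_atoms: list of (x, y, z) tuples for the ligand and its cofactors.
    """
    near = []
    for idx, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            near.append(idx)
    return sorted(near)


def build_segments(contact, conserved, min_conserved=3, max_gap=1):
    """Turn contact residues into contiguous sequence segments.

    Returns a list of (start, end) index pairs, or an empty list if fewer
    than `min_conserved` contact residues are well conserved.
    """
    core = sorted(i for i in contact if i in conserved)
    if len(core) < min_conserved:
        return []
    selected = set(core)
    # Include direct sequence neighbours that are themselves well conserved.
    for i in core:
        for j in (i - 1, i + 1):
            if j in conserved:
                selected.add(j)
    # Fill short gaps between selected residues (assumed interpretation).
    ordered = sorted(selected)
    for left, right in zip(ordered, ordered[1:]):
        if 1 < right - left <= max_gap + 1:
            selected.update(range(left + 1, right))
    # Group consecutive indices into segments.
    segments = []
    for i in sorted(selected):
        if segments and i == segments[-1][1] + 1:
            segments[-1][1] = i
        else:
            segments.append([i, i])
    return [tuple(s) for s in segments]


# Made-up coordinates: residues 1 and 3 lie close to the single ligand atom.
toy_residues = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)],
                3: [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]}
toy_ligand = [(1.0, 0.0, 0.0)]
near = residues_near_ligand(toy_residues, toy_ligand)

# Made-up contact and conservation data for the segment step.
segments = build_segments(contact=[10, 12, 40, 41],
                          conserved={9, 10, 12, 40, 41, 42})
```

Each resulting segment would then be aligned with homologous sequences and turned into one small HMM, with linker regions connecting the segments in the final descriptor.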
Conservation of residues is calculated using the von Neumann entropy in combination with the BLOSUM62 substitution matrix (details are given in ).
We evaluate the descriptors' accuracy in terms of standard precision and recall, where precision is defined as TP/(TP+FP) and recall is defined as TP/(TP+FN). TP, FP, and FN denote the numbers of true positives, false positives and false negatives, respectively.
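The two measures can be written down directly from these definitions. In the sketch below the function names are hypothetical and the counts are invented; they were chosen only so that the results reproduce the 89% precision and 25% recall quoted for the ATP case, and they are not the actual counts from the benchmark.

```python
def precision(tp, fp):
    """Fraction of reported hits that are correct: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("precision is undefined without any reported hits")
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true sites that are found: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("recall is undefined without any true sites")
    return tp / (tp + fn)


# Invented counts, not taken from the study.
tp, fp, fn = 89, 11, 267
p = precision(tp, fp)   # 89 / 100
r = recall(tp, fn)      # 89 / 356
```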
List of abbreviations
- ATP – Adenosine triphosphate
- PLI – Protein-Ligand Interaction
- PPI – Protein-Protein Interaction
- HMM – Hidden Markov Model
- SCOP – Structural Classification Of Proteins
- SCOPPI – Structural Classification Of Protein-Protein Interactions
- PDB – Protein Data Bank
- HSSP – Homology-derived secondary structure of proteins
- MSA – Multiple Sequence Alignment
- TP/FP/TN/FN – True/False Positives/Negatives, respectively
We thank the Center for High Performance Computing (ZIH) of TU Dresden for provision of supercomputing infrastructure. Further, we acknowledge Annalisa Marsico, Gihan Dawelbait, John Duperon, Gary Maynard, Ana Rodrigues and Iddo Friedberg for valuable comments regarding the manuscript.
This article has been published as part of BMC Bioinformatics Volume 8, Supplement 4, 2007: The Second Automated Function Prediction Meeting. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/8?issue=S4.
- Galperin MY, Koonin EV: Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000, 18(6):609–613. 10.1038/76443
- Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Menard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, Sdicu AM, Shapiro J, Sheikh B, Suter B, Wong SL, Zhang LV, Zhu H, Burd CG, Munro S, Sander C, Rine J, Greenblatt J, Peter M, Bretscher A, Bell G, Roth FP, Brown GW, Andrews B, Bussey H, Boone C: Global mapping of the yeast genetic interaction network. Science 2004, 303(5659):808–813. 10.1126/science.1091317
- Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96(8):4285–4288. 10.1073/pnas.96.8.4285
- Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y: Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics 2005, 21(16):3409–3415. 10.1093/bioinformatics/bti532
- Fraser HB, Hirsh AE, Wall DP, Eisen MB: Coevolution of gene expression among interacting proteins. Proc Natl Acad Sci USA 2004, 101(24):9033–9038. 10.1073/pnas.0402591101
- Keskin O, Tsai CJ, Wolfson H, Nussinov R: A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications. Protein Sci 2004, 13(4):1043–1055. 10.1110/ps.03484604
- Davis FP, Sali A: PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics 2005, 21: 1901–1907. 10.1093/bioinformatics/bti277
- Stein A, Russell RB, Aloy P: 3DID: interacting protein domains of known three-dimensional structure. Nucleic Acids Res 2005, 33(Database issue):D413-D417. 10.1093/nar/gki037
- Aloy P, Russell RB: Structural systems biology: modelling protein interactions. Nature Reviews Molecular Cell Biology 2006, 7(3):188–197. 10.1038/nrm1859
- Lichtarge O, Sowa ME: Evolutionary predictions of binding surfaces and interactions. Curr Opin Struct Biol 2002, 12: 21–27. 10.1016/S0959-440X(02)00284-1
- Bairoch A: PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res 1992, 20(Suppl):2013–2018.
- Espadaler J, Romero-Isart O, Jackson RM, Oliva B: Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships. Bioinformatics 2005, 21(16):3360–3368. 10.1093/bioinformatics/bti522
- Li H, Li J: Discovery of stable and significant binding motif pairs from PDB complexes and protein interaction datasets. Bioinformatics 2005, 21(3):314–324. 10.1093/bioinformatics/bti019
- Bateman A, Haft DH: HMM-based databases in InterPro. Brief Bioinform 2002, 3(3):236–45. 10.1093/bib/3.3.236
- Eddy SR: Profile hidden Markov models. Bioinformatics 1998, 14(9):755–63. 10.1093/bioinformatics/14.9.755
- Zdobnov EM, Apweiler R: InterProScan-an integration platform for the signature-recognition methods in InterPro. Bioinformatics 2001, 17(9):847–8. 10.1093/bioinformatics/17.9.847
- Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998, 14: 48–54. 10.1093/bioinformatics/14.1.48
- Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES: Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 2004, 13: 190–202. 10.1110/ps.03323604
- Kim WK, Ison JC: Survey of the geometric association of domain-domain interfaces. Proteins 2005, 61(4):1075–88. 10.1002/prot.20693
- Kim WK, Henschel A, Winter C, Schroeder M: The Many Faces of Protein-Protein Interactions: A Compendium of Interface Geometry. PLoS Computational Biology 2006, 2(9):e124. 10.1371/journal.pcbi.0020124
- Koike A, Takagi T: Prediction of protein-protein interaction sites using support vector machines. Protein Eng Des Sel 2004, 17(2):165–173. 10.1093/protein/gzh020
- Bradford JR, Westhead DR: Improved prediction of protein-protein binding sites using a support vector machines approach. Bioinformatics 2004, 21: 1487–1494. 10.1093/bioinformatics/bti242
- Torrance JW, Bartlett GJ, Porter CT, Thornton JM: Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. J Mol Biol 2005, 347(3):565–581. 10.1016/j.jmb.2005.01.044
- Ofran Y, Rost B: Predicted protein-protein interaction sites from local sequence information. FEBS Lett 2003, 544(1–3):236–239. 10.1016/S0014-5793(03)00456-3
- Obenauer JC, Yaffe MB: Computational prediction of protein-protein interactions. Methods Mol Biol 2004, 261: 445–468.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25: 25–9. 10.1038/75556
- Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res 2000, 28: 263–266. 10.1093/nar/28.1.263
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. Journal of Molecular Biology 1995, 247(4):536. 10.1006/jmbi.1995.0159
- Winter C, Henschel A, Kim WK, Schroeder M: SCOPPI: A Structural Classification of Protein-Protein Interfaces. Nucleic Acids Res 2006, (34 Database):310–314. 10.1093/nar/gkj099
- Sander C, Schneider R: Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 1991, 9: 56–68. 10.1002/prot.340090107
- Scordis P, Flower DR, Attwood TK: FingerPRINTScan: intelligent searching of the PRINTS motif database. Bioinformatics 1999, 15(10):799–806. 10.1093/bioinformatics/15.10.799
- Grundy WN, Bailey TL, Elkan CP, Baker ME: Meta-MEME: motif-based hidden Markov models of protein families. Comput Appl Biosci 1997, 13(4):397–406.
- Walhout AJ, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal M: Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 2000, 287(5450):116–122. 10.1126/science.287.5450.116
- Aloy P, Russell RB: Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA 2002, 99(9):5896–5901. 10.1073/pnas.092147999
- Deane CM, Salwinski L, Xenarios I, Eisenberg D: Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 2002, 1(5):349–356. 10.1074/mcp.M100037-MCP200
- Aytuna A, Gursoy A, Keskin O: Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics 2005.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32(5):1792–1797. 10.1093/nar/gkh340
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188–1190. 10.1101/gr.849004
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.