SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces
© Teyra et al; licensee BioMed Central Ltd. 2006
Received: 11 November 2005
Accepted: 02 March 2006
Published: 02 March 2006
Currently there is a strong need for methods that help to obtain an accurate description of protein interfaces in order to understand the principles that govern molecular recognition and protein function. Many of the recent efforts to computationally identify and characterize protein networks extract protein interaction information at atomic resolution from the PDB. However, they pay little or no attention to small protein ligands and solvent, which are key components and mediators of protein interactions and fundamental for a complete description of protein interfaces. Interactome profiling requires the development of computational tools to extract and analyze protein-protein, protein-ligand and detailed solvent interaction information from the PDB in an automatic and comparative fashion. Adding this information to the existing information on protein-protein interactions will allow us to better understand protein interaction networks and protein function.
SCOWLP (Structural Characterization Of Water, Ligands and Proteins) is a user-friendly and publicly accessible web-based relational database for detailed characterization and visualization of the PDB protein interfaces. The SCOWLP database includes proteins, peptidic-ligands and interface water molecules as descriptors of protein interfaces. It currently contains 74,907 protein interfaces and 2,093,976 residue-residue interactions formed by 60,664 structural units (protein domains and peptidic-ligands) and their interacting solvent.
The SCOWLP web-server allows detailed structural analysis and comparisons of protein interfaces at atomic level by text query of PDB codes and/or by navigating a SCOP-based tree. It includes a visualization tool to interactively display the interfaces and label interacting residues and interface solvent by atomic physicochemical properties. SCOWLP is automatically updated with every SCOP release.
SCOWLP substantially enriches the description of protein interfaces by adding detailed interface information on peptidic-ligands and solvent to the existing protein-protein interaction databases. SCOWLP may be of interest to many structural bioinformaticians. It provides a platform for automatic global mapping of protein interfaces at atomic level, representing a useful tool for classification of protein interfaces, protein binding comparative studies, reconstruction of protein complexes and understanding protein networks. The web-server with the database and its additional summary tables used for our analysis are available at http://www.scowlp.org.
One of the most interesting and important challenges in the so-called "Post-genomic Era" is the understanding of protein networks. Protein-protein interactions have been extensively investigated using a variety of methods, and many databases have been built that have become very helpful tools for the analysis of protein networks [2–4].
Protein interfaces have long been studied at protein chain and domain interface levels [5–12]. Furthermore, numerous analyses have used datasets of protein chain interfaces to investigate residue type propensities, sequence and structure conservation at protein interfaces [8, 11, 13–16]. Databases containing structural domain-domain interactions have also been recently created: 3did, PiBase, iPfam, PSIbase, InterPare, PRISM. However, these methods still fail to count many protein residues as "interfacial" or "interacting" because peptidic-ligands and solvent are frequently ignored in the protein interaction analysis.
Peptidic-ligands and solvent mediate protein interactions and are fundamental components for a complete description of protein interfaces. Proteins can interact with peptides to perform their biological function. In addition, peptides have been used to mimic protein binding interfaces, and their complexes with proteins have been used to study protein binding affinity/specificity properties in a simplified way [23–25]. For these reasons, many protein-peptide complexes have been experimentally studied by X-ray crystallography and/or NMR, providing additional information on protein interfaces. Moreover, protein interactions take place in an aqueous solution. Solvent molecules can bridge binding partners via hydrogen bonds, contributing significantly to molecular recognition and function [23, 26–31].
Most current methods do not provide an accurate description of protein interfaces, which is required to establish a basis for understanding the principles that govern molecular recognition and protein function.
Here we present SCOWLP (Structural Characterization Of Water, Ligands and Proteins), a platform for complete and detailed characterization and visualization of protein interfaces. Our database includes all protein-interacting components of the PDB including peptides and solvent, which until now have been excluded from systematic protein interface analysis and databases. In our database all interface interactions are described at atom, residue and domain level by using interaction rules based on atomic physicochemical criteria. This complete characterization makes SCOWLP useful for comparative structural analysis of molecular interfaces. The web application allows the user to access all the atomic interaction information by querying the PDB or the SCOP hierarchy. All interface information characterized by different interaction descriptors can be interactively visualized by using a Jmol 3D applet.
Construction and content
SCOWLP is a web-based relational database formed by eleven tables describing PDB interface interactions at atom, residue and domain level. The database contains 74,907 protein interfaces and 2,093,976 residue-residue interactions formed by 60,664 structural units and interacting solvent. To create SCOWLP, we extract 3D data of protein domains, peptidic-ligands and interface solvent from the PDB, and we define protein domains from SCOP 1.69. We compute protein interactions at atom, residue and domain level by using bounding shape-based algorithms. We have also developed a web application to handle and navigate through the interfacial data in an automatic and user-friendly fashion. We designed the SCOWLP methodology based on the following steps:
SCOL-Ligand (Structural Characterization Of Peptidic-Ligands)
Interacting structural unit pairs
We label all structural units of the PDB with the SCOL-peptide and the SCOP-domain definitions in order to compute their interactions. We consider a contact distance cut-off of 9 Å between two residues in order to allow up to two bridging water molecules in the shortest axes defining the interface. We use bounding shape-based algorithms to compute a 9 Å convex hull (the smallest convex set containing all atoms at 9 Å) for each structural unit of each PDB entry. Convex hull algorithms have been shown to reduce the computational time required for an interface calculation in two ways: by reducing the search space, which decreases the number of residues checked for the calculation, and by allowing distributed computations. Structural units with intersecting shapes and having at least one residue-residue interaction are considered interacting pairs (Fig. 1).
SCOW-Water (Structural Characterization Of Water)
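The two-stage test described above (a cheap shape intersection followed by an exact contact check) can be illustrated with a minimal Python sketch. It is not the SCOWLP implementation: padded axis-aligned bounding boxes are swapped in for the 9 Å convex hulls, a plain atom-atom distance test stands in for the residue-residue interaction rules, and the function names and input layout are hypothetical. Padding each box by half the cut-off guarantees that any two units with an atom pair within 9 Å have intersecting boxes, so the prefilter discards no true contact.

```python
from itertools import combinations
from math import dist

CUTOFF = 9.0  # contact distance cut-off in Å, as stated in the text


def padded_box(atoms, pad):
    """Axis-aligned bounding box of (x, y, z) atoms, grown by `pad` on every side."""
    xs, ys, zs = zip(*atoms)
    return ((min(xs) - pad, min(ys) - pad, min(zs) - pad),
            (max(xs) + pad, max(ys) + pad, max(zs) + pad))


def boxes_intersect(a, b):
    """True if the two boxes overlap on all three axes."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[i] <= bhi[i] and blo[i] <= ahi[i] for i in range(3))


def has_contact(atoms_a, atoms_b, cutoff=CUTOFF):
    """True if any atom pair across the two units lies within `cutoff`."""
    return any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)


def interacting_pairs(units, cutoff=CUTOFF):
    """units: dict mapping a unit name to a list of (x, y, z) atom coordinates."""
    boxes = {name: padded_box(atoms, cutoff / 2) for name, atoms in units.items()}
    pairs = []
    for a, b in combinations(sorted(units), 2):
        # Shape test first; the exact distance test runs only on the survivors.
        if boxes_intersect(boxes[a], boxes[b]) and has_contact(units[a], units[b], cutoff):
            pairs.append((a, b))
    return pairs
```

A convex hull is a tighter shape than a box and would reject more non-interacting pairs before the distance test; the box version only shows where the saving comes from.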
Interaction rules for interface computation
Only amino acid residues and water molecules placed in the intersection of structural unit shapes are potential interactors. We apply atom type and distance criteria to compute interactions between structural unit pairs at physicochemical level. For hydrogen bonds we apply a ≤ 3.2 Å donor-acceptor distance. For salt bridges, we apply a ≤ 4 Å distance criterion. Van der Waals interactions are defined between hydrophobic atoms at van der Waals radii distance. At atomic level, we characterize the interactions by: i) nature: hydrophilic, hydrophobic; ii) contact type: main chain, side chain, mixed; iii) number of bridging water molecules. At residue level, we characterize the interactions by: i) nature: hydrophilic, hydrophobic, dual; ii) contact type: main chain, side chain, mixed; iii) number of bridging water molecules; iv) total number of atoms contacting. At structural unit level, we characterize the interactions by: i) contact volume; ii) surface area from convex hull surface; iii) number of interacting atoms/residues per unit; iv) type of interaction: intra-/inter-molecular. All interfacial interaction information is stored in the SCOWLP database (Fig. 1).
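As an illustration of how atom type and distance criteria combine, the following Python sketch classifies a single atom pair using the two distance thresholds given above. Everything else is an assumption made for the example: the tiny atom-type sets, the van der Waals radii, the 0.5 Å tolerance around the sum of radii, the order in which the rules are tried, and the reading of "dual" as a residue pair that shows both hydrophilic and hydrophobic contacts.

```python
from math import dist

HBOND_MAX = 3.2        # Å, donor-acceptor distance from the text
SALT_BRIDGE_MAX = 4.0  # Å, from the text
VDW_TOLERANCE = 0.5    # Å, hypothetical slack around the sum of vdW radii

# Hypothetical, deliberately tiny atom typing keyed by (residue, atom name).
DONORS = {("LYS", "NZ"), ("ARG", "NH1"), ("SER", "OG"), ("HOH", "O")}
ACCEPTORS = {("ASP", "OD1"), ("GLU", "OE1"), ("SER", "OG"), ("HOH", "O")}
CATIONS = {("LYS", "NZ"), ("ARG", "NH1")}
ANIONS = {("ASP", "OD1"), ("GLU", "OE1")}
HYDROPHOBIC_RADIUS = {("LEU", "CD1"): 1.7, ("VAL", "CG1"): 1.7, ("PHE", "CZ"): 1.7}


def classify(atom_a, atom_b):
    """Each atom is (residue, atom_name, (x, y, z)). Returns a label or None."""
    key_a, key_b = atom_a[:2], atom_b[:2]
    d = dist(atom_a[2], atom_b[2])
    # Salt bridges are tried first (an assumption), so a short charged pair
    # is reported as a salt bridge rather than as a hydrogen bond.
    if (key_a in CATIONS and key_b in ANIONS) or (key_a in ANIONS and key_b in CATIONS):
        if d <= SALT_BRIDGE_MAX:
            return "salt_bridge"
    if (key_a in DONORS and key_b in ACCEPTORS) or (key_a in ACCEPTORS and key_b in DONORS):
        if d <= HBOND_MAX:
            return "hydrogen_bond"
    if key_a in HYDROPHOBIC_RADIUS and key_b in HYDROPHOBIC_RADIUS:
        if d <= HYDROPHOBIC_RADIUS[key_a] + HYDROPHOBIC_RADIUS[key_b] + VDW_TOLERANCE:
            return "van_der_waals"
    return None


def residue_nature(labels):
    """Summarize the atom-level labels of one residue pair as its nature."""
    kinds = {"hydrophobic" if label == "van_der_waals" else "hydrophilic"
             for label in labels}
    if not kinds:
        return None
    return "dual" if len(kinds) == 2 else kinds.pop()
```

A pair that is close in space but matches no rule returns `None`, which is the practical difference from a single general distance cut-off.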
We have created the following additional tables for the filtering and comparative analysis of the information contained in the database:
The Interface description table summarizes all interfaces of the SCOWLP database. It contains 74,907 interfaces constituted by SCOP domains labelled with the attributes: PDB Id code, atomic resolution, contact type (intra-/inter-molecular) and SCOP Id code. All interfaces are also labelled by number of interactions (total, all water-mediated and only water-mediated) and number of interacting residues per binding partner. Each interaction is classified by type (side-/main-chain or both) and by number of bridging water molecules.
Wet interfaces selection
This table stores interfaces of complexes at resolution ≤ 2.5 Å from the Interface description table for interfacial solvent analysis. This table does not include homodimer interfaces because of their patchy, poorly packed and highly hydrated nature. With the resultant dataset, we create three tables:
The Content table can be used to rank superfamilies based on their content of water-mediated interface interactions. For each interface, it contains the average of total interactions, the average of all water-mediated interactions and the resulting percentage of water-mediated interactions at superfamily level.
The Morphology table can be used to rank the interfaces by number of wet spots. In this table each family is represented by the complex with the highest number of wet spots, labelled with the total number of interacting residues and wet spots.
The Comparative table can be used to monitor solvent variations in interfaces and compare them at family level. It contains interfaces sorted by domain, and then by their respective ligands (protein or peptide). Because a protein-ligand interface can be found in different PDB entries, we select the interfaces that appear more than once and contain wet spots. When the same PDB file contains a repeated interface of two binding partners, we select as a representative the one with more wet spots.
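The filtering and per-family selection described above map naturally onto SQL. The sketch below uses an in-memory SQLite database through Python's `sqlite3` module as a stand-in for the MySQL database; the table layout, the column names, the `is_homodimer` flag and all rows are invented for illustration and do not reproduce the SCOWLP schema or its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced version of the interface summary table.
conn.execute("""
    CREATE TABLE interface_description (
        pdb_id TEXT, resolution REAL, contact_type TEXT,
        scop_family TEXT, is_homodimer INTEGER,
        interacting_residues INTEGER, wet_spots INTEGER)
""")
rows = [  # made-up placeholder rows, not real PDB entries
    ("pdbA", 1.8, "inter", "fam1", 0, 40, 6),
    ("pdbB", 2.1, "inter", "fam1", 0, 38, 9),
    ("pdbC", 3.0, "inter", "fam1", 0, 41, 12),  # dropped: resolution > 2.5 Å
    ("pdbD", 1.5, "inter", "fam2", 1, 55, 20),  # dropped: homodimer
    ("pdbE", 2.4, "intra", "fam2", 0, 22, 3),
]
conn.executemany("INSERT INTO interface_description VALUES (?,?,?,?,?,?,?)", rows)

# Wet interfaces selection: resolution <= 2.5 Å and no homodimer interfaces.
conn.execute("""
    CREATE TABLE wet_interfaces AS
    SELECT * FROM interface_description
    WHERE resolution <= 2.5 AND is_homodimer = 0
""")

# One representative per family: the complex with the most wet spots.
morphology = conn.execute("""
    SELECT w.scop_family, w.pdb_id, w.interacting_residues, w.wet_spots
    FROM wet_interfaces AS w
    WHERE w.wet_spots = (SELECT MAX(wet_spots) FROM wet_interfaces
                         WHERE scop_family = w.scop_family)
    ORDER BY w.wet_spots DESC
""").fetchall()
```

With these placeholder rows, `morphology` keeps `pdbB` for `fam1` and `pdbE` for `fam2`; the low-resolution and homodimer rows never reach the selection.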
We used MySQL and the Java programming language to generate and analyze the SCOWLP database. Interface calculations are performed on a 2.6 GHz Pentium IV in approximately 36 hours. SCOWLP is automatically updated with every SCOP release.
Utility and Discussion
The SCOWLP database contains detailed information on protein interfaces, including peptidic-ligands and solvent in the PDB, and classifies protein interfaces by using specific physicochemical atomic criteria. The database can be accessed through a user-friendly web application.
The use of atom type and distance rules allows us to characterize and classify interactions at physicochemical level. Other existing methods adopt exclusively a general distance criterion. PSIMAP, for example, considers as an interacting pair any two atoms at a distance ≤ 5 Å. For this reason, the total number of residue-residue and structural unit interactions we obtain by applying our interaction rules is reduced in comparison to PSIMAP (Fig. 2). This reduction translates into more accurate interface definitions.
Some proteins have been the subject of many structural studies in complex with peptides (e.g. Proteases, b.47.1). In addition, the superfamilies that have the highest occurrence of peptides are not necessarily those with the highest domain-domain representation (e.g. Cyclophilin, b.62.1). By taking into account information about protein-peptide complexes, SCOWLP contributes interfacial information on 8 SCOP superfamilies uniquely represented by protein-peptide complexes (a.23.4, a.50.1, d.76.1, a.8.5, d.195.1, g.33.1, a.144.1, a.12.1). In addition, it contributes more than 50% of the interacting information in other superfamilies. Our results show the importance of including protein-peptide interfacial information in order to enrich considerably the description of protein interfaces.
All superfamilies of the Content table contain solvent-mediated interactions. Furthermore, in some of these superfamilies water-mediated interactions represent up to 75% of the total interfacial interactions (e.g. d.250.1). Regarding the "only water-mediated" interactions, we observe from the Morphology table that 43 is the maximum number of wet spots found. Figures 4B and 4C illustrate how solvent, in particular wet spots, may play an important role in the morphological description of protein interfaces (shape and size). When the solvent is considered, a discontinuous surface formed by several small isolated patches becomes a bigger, rounded patch. These observations show that we can enrich the description of protein interfaces by considering interfacial solvent.
Although solvent molecules mediating protein interactions can be conserved in a protein family, variations may occur for different reasons: i) atomic resolution and/or quality of the structural data, ii) conformational changes upon ligand binding, iii) protein flexibility, iv) new interacting regions (e.g. loop insertions and deletions), v) residue mimicry. Wet spot variations may be used as indicators in these cases. The Comparative table allows us to compare the interfaces of 127 families in 751 complexes based on wet spot variations.
Solvent molecules play an important role in the replacement of residues in protein interfaces. Sometimes the atomic resolution, the existence of different rotamers or even small differences in contact distances defining the interaction may influence the number of wet spots. Nevertheless, small variations of wet spots in complexes of the same family that do not present changes in total number of interactions can be used to locate residue mimicry cases (e.g. Lys+H2O≈Arg). This information may be very useful in the analysis of protein interfacial evolution and in protein engineering/rational design when tuning the affinity and specificity of a protein for its ligands.
By using SCOWLP, the user can perform specific queries, SCOP family analysis, interface comparisons and a detailed 3D display of the atomic interaction data contained in the PDB.
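The screening criterion described above can be written as a short filter. The Python sketch below flags pairs of complexes from the same family that have the same total number of interactions but a small, non-zero difference in wet spots. The input layout, the function name and the threshold of two wet spots for "small" are assumptions made for the example, and a flagged pair is only a candidate that still needs structural inspection.

```python
from collections import defaultdict
from itertools import combinations


def mimicry_candidates(complexes, max_wet_spot_diff=2):
    """complexes: iterable of (family, complex_id, total_interactions, wet_spots).

    Returns (family, id_a, id_b, wet_spot_difference) for each candidate pair.
    """
    by_family = defaultdict(list)
    for family, complex_id, total, wet in complexes:
        by_family[family].append((complex_id, total, wet))

    hits = []
    for family, members in by_family.items():
        for (id_a, tot_a, wet_a), (id_b, tot_b, wet_b) in combinations(members, 2):
            diff = abs(wet_a - wet_b)
            # Same total number of interactions, small change in wet spots.
            if tot_a == tot_b and 0 < diff <= max_wet_spot_diff:
                hits.append((family, id_a, id_b, diff))
    return hits
```

For example, two complexes of one family with 30 interactions each and 4 versus 5 wet spots would be flagged, while a pair whose totals differ would not.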
Detailed analysis of the interfacial information contained in the PDB is very useful to obtain more accurate descriptions of protein interfaces. We have created SCOWLP to have a platform for the characterization and 3D visualization of protein interfaces. SCOWLP enlarges the available information on protein-protein interactions by introducing 3,413 new protein-peptide interfaces and 435,086 additional water-mediated interactions. All interactions contained in SCOWLP are characterized and classified at physicochemical level instead of using general distance criteria. This allows a more appropriate definition and enhanced comparison of the interfaces contained in our database.
As the origin of specificity and affinity in molecular recognition can be partially explained in terms of solvent's contribution to the interaction, our database constitutes a very useful tool to facilitate rational ligand design. In particular wet spots can be used as indicators of interfacial solvent variations, being helpful in comparison of protein family interfaces, and perhaps guiding docking experiments.
SCOWLP may be of interest to many structural bioinformaticians, representing a useful tool for classification of protein interfaces, protein binding comparative studies, reconstruction of protein complexes and understanding protein networks.
Availability and requirements
SCOWLP is available at http://www.scowlp.org. The database and all summary tables used in this paper can be freely downloaded for independent studies.
We thank Gerd Anders and Jens Lättig for useful comments on the manuscript. We thank Christof Winter for helping with the Java programming. M.T.P. group is funded by Klaus Tschira Stiftung GmbH.
- Phizicky EM, Fields S: Protein-protein interactions: methods for detection and analysis. Microbiol Rev 1995, 59(1):94–123.
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003, 31(1):248–250. 10.1093/nar/gkg056
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Lett 2002, 513(1):135–140. 10.1016/S0014-5793(01)03293-8
- Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002, 30(1):303–305. 10.1093/nar/30.1.303
- Argos P: An investigation of protein subunit and domain interfaces. Protein Eng 1988, 2(2):101–113.
- Janin J, Miller S, Chothia C: Surface, subunit interfaces and interior of oligomeric proteins. J Mol Biol 1988, 204(1):155–164. 10.1016/0022-2836(88)90606-7
- Tsai CJ, Lin SL, Wolfson HJ, Nussinov R: Protein-protein interfaces: architectures and interactions in protein-protein interfaces and in protein cores. Their similarities and differences. Crit Rev Biochem Mol Biol 1996, 31(2):127–152.
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA 1996, 93(1):13–20. 10.1073/pnas.93.1.13
- Lo Conte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999, 285(5):2177–2198. 10.1006/jmbi.1998.2439
- Park J, Lappe M, Teichmann SA: Mapping protein family interactions: intramolecular and intermolecular protein family interaction repertoires in the PDB and yeast. J Mol Biol 2001, 307(3):929–938. 10.1006/jmbi.2001.4526
- Aloy P, Ceulemans H, Stark A, Russell RB: The relationship between sequence and interaction divergence in proteins. J Mol Biol 2003, 332(5):989–998. 10.1016/j.jmb.2003.07.006
- Keskin O, Tsai CJ, Wolfson H, Nussinov R: A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications. Protein Sci 2004, 13(4):1043–1055. 10.1110/ps.03484604
- Valdar WS, Thornton JM: Conservation helps to identify biologically relevant crystal contacts. J Mol Biol 2001, 313(2):399–416. 10.1006/jmbi.2001.5034
- Ofran Y, Rost B: Analysing six types of protein-protein interfaces. J Mol Biol 2003, 325(2):377–387. 10.1016/S0022-2836(02)01223-8
- Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES: Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 2004, 13(1):190–202. 10.1110/ps.03323604
- Jones S, Marin A, Thornton JM: Protein domain interfaces: characterization and comparison with oligomeric protein interfaces. Protein Eng 2000, 13(2):77–82. 10.1093/protein/13.2.77
- Stein A, Russell RB, Aloy P: 3did: interacting protein domains of known three-dimensional structure. Nucleic Acids Res 2005, 33(Database issue):D413–417.
- Davis FP, Sali A: PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics 2005, 21(9):1901–1907. 10.1093/bioinformatics/bti277
- Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 2005, 21(3):410–412. 10.1093/bioinformatics/bti011
- Gong S, Yoon G, Jang I, Bolser D, Dafas P, Schroeder M, Choi H, Cho Y, Han K, Lee S, Choi H, Lappe M, Holm L, Kim S, Oh D, Bhak J, et al.: PSIbase: a database of Protein Structural Interactome map (PSIMAP). Bioinformatics 2005, 21(10):2541–2543. 10.1093/bioinformatics/bti366
- Gong S, Park C, Choi H, Ko J, Jang I, Lee J, Bolser DM, Oh D, Kim DS, Bhak J: A protein domain interaction interface database: InterPare. BMC Bioinformatics 2005, 6:207. 10.1186/1471-2105-6-207
- Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A: PRISM: protein interactions by structural matching. Nucleic Acids Res 2005, 33(Web Server issue):W331–336. 10.1093/nar/gki585
- Zeng J: Mini-review: computational structure-based design of inhibitors that target protein surfaces. Comb Chem High Throughput Screen 2000, 3(5):355–362.
- Pawson T: Specificity in signal transduction: from phosphotyrosine-SH2 domain interactions to complex cellular systems. Cell 2004, 116(2):191–203. 10.1016/S0092-8674(03)01077-8
- Castagnoli L, Costantini A, Dall'Armi C, Gonfloni S, Montecchi-Palazzi L, Panni S, Paoluzi S, Santonico E, Cesareni G: Selectivity and promiscuity in the interaction network mediated by protein recognition modules. FEBS Lett 2004, 567(1):74–79. 10.1016/j.febslet.2004.03.116
- Levy Y, Onuchic JN: Water and proteins: a love-hate relationship. Proc Natl Acad Sci USA 2004, 101(10):3325–3326. 10.1073/pnas.0400157101
- Palencia A, Cobos ES, Mateo PL, Martinez JC, Luque I: Thermodynamic dissection of the binding energetics of proline-rich peptides to the Abl-SH3 domain: implications for rational ligand design. J Mol Biol 2004, 336(2):527–537. 10.1016/j.jmb.2003.12.030
- Janin J: Wet and dry interfaces: the role of solvent in protein-protein and protein-DNA recognition. Structure Fold Des 1999, 7(12):R277–279. 10.1016/S0969-2126(00)88333-1
- Levitt M, Park BH: Water: now you see it, now you don't. Structure 1993, 1(4):223–226. 10.1016/0969-2126(93)90011-5
- Papoian GA, Ulander J, Wolynes PG: Role of water mediated interactions in protein-protein recognition landscapes. J Am Chem Soc 2003, 125(30):9170–9178. 10.1021/ja034729u
- Petukhov M, Rychkov G, Firsov L, Serrano L: H-bonding in protein hydration revisited. Protein Sci 2004, 13(8):2120–2129. 10.1110/ps.04748404
- Dafas P, Bolser D, Gomoluch J, Park J, Schroeder M: Using convex hulls to extract interaction interfaces from known structures. Bioinformatics 2004, 20(10):1486–1490. 10.1093/bioinformatics/bth106
- Rodier F, Bahadur RP, Chakrabarti P, Janin J: Hydration of protein-protein interfaces. Proteins 2005, 60(1):36–45. 10.1002/prot.20478
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.