SCOWLP update: 3D classification of protein-protein, -peptide, -saccharide and -nucleic acid interactions, and structure-based binding inferences across folds
© Teyra et al; licensee BioMed Central Ltd. 2011
Received: 9 March 2011
Accepted: 13 October 2011
Published: 13 October 2011
Protein interactions are essential for coordinating cellular functions. Proteomic studies have already identified a huge number of protein-protein interactions that require detailed functional analysis. Understanding the structural basis of each individual interaction through structure determination is necessary, yet unfeasible. Therefore, computational tools able to predict protein binding regions and recognition modes are required to rationalize putative molecular functions for proteins. With this aim, we previously created SCOWLP, a structural classification of protein binding regions at protein family level, based on the information obtained from high-resolution 3D protein-protein and protein-peptide complexes.
We present here a new version of SCOWLP that has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. SCOWLP takes interfacial solvent into account for a detailed characterization of protein interactions. In addition, the binding regions obtained per protein family have been enriched by the inclusion of predicted binding regions, which have been inferred from structurally related proteins across all existing folds. These inferences may help to suggest novel recognition regions and to compare structurally similar interfaces from different families.
The updated SCOWLP has new functionalities that allow both detection and comparison of protein regions recognizing different types of ligands, which include other proteins, peptides, nucleic acids and saccharides, within a solvated environment. SCOWLP now also allows the analysis of predicted protein binding regions based on structure-based inferences across fold space. These predictions may have a unique potential in assisting protein docking, in providing insights into protein interaction networks, and in guiding rational engineering of protein ligands. The newly designed SCOWLP web application has an improved user-friendly interface that facilitates its use, and is available at http://www.scowlp.org.
Proteins are ubiquitous and interact with other molecules to perform their functions, subject to timing and location. High-throughput technologies for the identification of protein interactions are generating a plethora of new data that should be independently studied to decipher the specific molecular role of the proteins and their cellular functions. Structural determination methods at atomic resolution are indispensable for the functional characterization of protein interactions. However, techniques for isolating protein complexes and determining their structures still face many challenges, and hence experimental structural studies are not always possible. Alternatively, the rapid accumulation of protein complex structures in the PDB repository [3, 4] provides an unprecedented opportunity for comparative analysis of protein interactions that can be used to predict binding regions and modes, to model protein complexes, and to improve our understanding of the principles governing protein recognition. To facilitate these comparative studies, it is necessary to generate tools that allow us to analyze the available experimental structures of protein complexes. In fact, several databases have been developed to structurally identify and classify protein-protein and protein-peptide interactions at family level, such as 3DID, SCOPPI and SCOWLP. Their classifications are based on collecting all interaction information per protein family, then calculating the similarity of the binding residues, and finally clustering the different binding regions and their partners. Another database, IBIS, contains binding regions and interacting partners inferred from the inspection of complexes formed by closely homologous proteins instead of using structural classification schemes.
Unlike the others, SCOWLP has been developed towards an atomic inspection of the interactions by applying physicochemical principles and by considering water molecules in protein interfaces, since solvent has been shown to be abundant and important in the mediation of protein interactions [13, 14], and to improve protein contact predictions and docking.
SCOWLP is a database and a web application containing a structural classification of protein binding regions at SCOP family level, including protein-protein and protein-peptide interactions. In the updated version we present here, we additionally include two biologically relevant protein ligands that are quite abundant in the PDB: saccharides (SAC) and nucleic acids (NA) [17–20]. We also considered solvent in the definition of protein interactions, since it has been shown to be critical in mediating both protein-NA and protein-SAC interactions, which highlights the importance of the new SCOWLP for in-detail inspection of these kinds of interactions. Another novelty in SCOWLP is the inclusion of predicted binding regions for each protein family. These predictions are inferred from significantly conserved binding regions belonging to structurally similar protein families independently of their fold [5, 23]. It has been observed that proteins with different folds and functions can recognize molecules through binding regions containing similar local structural features or interacting motifs [24–26]. Therefore, the predicted binding inferences may help to suggest alternative recognition regions for a protein family and to compare structurally similar binding regions from different families.
In summary, the updated SCOWLP classification with its newly designed web application represents a unique framework for the identification and comparative analysis of protein binding regions at atomic level. In the following sections, we explain the methodology used to build the database, and describe the architecture and possible uses of the web application.
Construction and Content
The new version of SCOWLP contains protein interactions with different ligand types, including proteins, peptides, nucleic acids (NA) and saccharides (SAC), taking into account interfacial solvent mediating protein interactions. Interacting residues and molecules are described at physicochemical level according to atom type and distance criteria. The following types of interactions are considered: hydrogen bonds, with donor/acceptor atom distance ≤ 3.6 Å; salt bridges, with charged atom distance ≤ 4 Å; and van der Waals contacts, with atoms at distance ≤ 4.5 Å. Residue interactions mediated by a water molecule are also considered in the interface definition. The specific definitions of the ligand types and the protein interfaces are as follows:
Protein-protein interactions: The 4,194 protein families from SCOP V1.75 are used to define protein domain boundaries within PDB files.
Protein-peptide interactions: All PDB chains that are labeled "ATOM", not defined in SCOP, and shorter than 90 residues are considered peptides.
Protein-nucleic acid interactions: PDB residues labeled as standard nucleotides are selected. We differentiate RNA from DNA by the presence of the O2' group in the ribose ring. Nucleic acid chains are merged into a single unit (double strand) if there is at least one inter-base atomic interaction between chains.
Protein-saccharide interactions: The SAC molecules are extracted from PDB files labeled with the terms "saccharide", "carbohydrate" and/or "sugar", and containing HETATM atoms. We obtained 307 unique molecules (three-letter code) that include neither standard nor modified nucleotides, nor SAC modifications bigger than the SAC moieties. The oligosaccharide units can be represented in the PDB either within a common HETATM type or as a collection of them. In the latter case, SAC units are identified and merged together into a single oligosaccharide molecule using the PDB connectivity. SAC connectivity to protein domains is also checked to differentiate covalent (intra) and non-covalent (inter) protein interactions.
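The non-covalent interaction criteria stated at the start of this section can be illustrated with a minimal sketch. The distance cutoffs are the ones given in the text; the `Atom` fields, the order in which the three interaction types are tested, and the rule used for water mediation (both atoms hydrogen-bonded to the same water oxygen) are assumptions made for illustration and are not taken from the SCOWLP implementation.

```python
from dataclasses import dataclass
from math import dist

# Distance cutoffs in angstroms, as stated in the text.
HBOND_MAX = 3.6
SALT_BRIDGE_MAX = 4.0
VDW_MAX = 4.5


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom record; the typing flags are illustrative."""
    name: str
    xyz: tuple
    donor: bool = False
    acceptor: bool = False
    charge: int = 0  # -1, 0 or +1


def classify_contact(a, b):
    """Return the interaction type for an atom pair, or None.

    The precedence (salt bridge, then hydrogen bond, then van der Waals)
    is an assumption of this sketch.
    """
    d = dist(a.xyz, b.xyz)
    if a.charge * b.charge < 0 and d <= SALT_BRIDGE_MAX:
        return "salt_bridge"
    donor_acceptor = (a.donor and b.acceptor) or (a.acceptor and b.donor)
    if donor_acceptor and d <= HBOND_MAX:
        return "hydrogen_bond"
    if d <= VDW_MAX:
        return "van_der_waals"
    return None


def water_mediated(a, b, water_oxygen):
    """Assumed rule: both atoms hydrogen-bond to the same water oxygen."""
    return (classify_contact(a, water_oxygen) == "hydrogen_bond"
            and classify_contact(b, water_oxygen) == "hydrogen_bond")
```

In this sketch a water oxygen would be passed as an `Atom` with both `donor` and `acceptor` set, so that either protein or ligand atoms can bridge through it.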
SCOWLP currently contains 97,252 protein-protein, 3,563 protein-peptide, 2,568 protein-NA (1,660 DNA, 908 RNA) and 10,590 protein-SAC complexes. Crystal packing contacts are filtered out using a support vector machine-based program, NOXclass (cutoff 70%), which takes into account the distinctive properties of these protein contacts.
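The ligand definitions above decide which of these counts a complex contributes to. The following sketch shows one way the rules could be encoded: the peptide length threshold and the O2' test come from the text, while the function names, the input shapes, and the use of a union-find structure for merging are assumptions of this example. The same merging helper is applied here to nucleic acid chains linked by inter-base contacts and to saccharide units linked by PDB connectivity.

```python
PEPTIDE_MAX_LEN = 90  # chains shorter than this are peptides (from the text)


def is_peptide(record_type, in_scop, n_residues):
    """Peptide rule from the text: ATOM chain, not in SCOP, < 90 residues."""
    return record_type == "ATOM" and not in_scop and n_residues < PEPTIDE_MAX_LEN


def nucleic_acid_type(atom_names):
    """RNA if the ribose carries an O2' atom, otherwise DNA.

    Older PDB files may name this atom differently; that is not handled here.
    """
    return "RNA" if "O2'" in atom_names else "DNA"


def merge_units(units, links):
    """Group units connected by at least one link (union-find).

    units: iterable of hashable, sortable identifiers (chain or residue ids).
    links: iterable of (unit_a, unit_b) pairs, e.g. inter-base contacts
           between chains or covalent links between saccharide units.
    """
    parent = {u: u for u in units}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for u in parent:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(g) for g in groups.values())
```

For example, `merge_units(["A", "B", "C"], [("A", "B")])` groups chains A and B into one double-stranded unit and leaves C on its own.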
The classification of family binding regions has been performed by clustering interacting domains based on binding region similarities. As described previously, this similarity is obtained from the overlap of the interacting residues once they are mapped onto the structure-based sequence alignments of the family members. Likewise, the inferred binding regions are obtained among members of different families aligned using non-sequential structural alignments, as previously described. SCOWLP contains a total of 7,121 protein binding regions identified at zero similarity cutoff, of which 2,315 have more than one interface. In addition, it contains 8,985 predicted binding regions, 786 of them in protein families that so far lack binding information in the PDB.
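To make the clustering step concrete, the sketch below groups interfaces into binding regions from the overlap of their interacting residues, expressed as alignment column indices. The exact similarity measure and clustering scheme are described in the earlier SCOWLP publications and are not reproduced here; the Jaccard overlap, the single-linkage grouping, and the reading of "zero similarity cutoff" as "any shared column joins two interfaces" are all assumptions of this example.

```python
def similarity(cols_a, cols_b):
    """Jaccard overlap of two sets of alignment columns (assumed measure)."""
    a, b = set(cols_a), set(cols_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_binding_regions(interfaces, cutoff=0.0):
    """Single-linkage grouping of interfaces into binding regions.

    interfaces: dict mapping an interface id to the alignment columns of
                its interacting residues.
    Two interfaces are joined when their similarity is strictly greater
    than `cutoff`, so cutoff=0.0 joins any pair sharing a column.
    """
    ids = list(interfaces)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, x in enumerate(ids):
        for y in ids[n + 1:]:
            if similarity(interfaces[x], interfaces[y]) > cutoff:
                parent[find(x)] = find(y)

    regions = {}
    for i in ids:
        regions.setdefault(find(i), []).append(i)
    return sorted(sorted(r) for r in regions.values())
```

Raising the cutoff splits loosely overlapping interfaces into separate regions, which is why the number of binding regions reported depends on the cutoff chosen.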
Utility & Discussion
The SCOWLP web application follows the SCOP hierarchical levels to classify protein structures: RT - root, CF - class family, SF - super family, where families are finally listed. In addition, it extends the SCOP classification with three protein interaction levels: FA - family, BR - binding region and IF - interface. The FA level contains a list of binding regions, defined as distinctive surface regions of a protein family used to recognize other molecules. The BR level contains a list of interfaces distinguishing the different partners or ligands that a specific region can recognize. The IF level contains a list of domains interacting with the same ligand, each linked to its original PDB file (e.g. 2oei:AB). Each binding region and interface is represented by an identifier (BR_24483 or IF_24486), since an automatic description cannot be associated with them.
The SCOWLP web application facilitates hierarchical navigation through the different levels. It also contains a keyword search box for SCOP descriptions, PDB IDs and similar SCOP domain sequences identified using the BLAST algorithm. Some specific examples of the query capabilities are shown on the SCOWLP main page.
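The three interaction levels can be pictured as a small nested data model. The sketch below is only an illustration of the FA, BR and IF levels described above: the class and field names are hypothetical, the identifiers are the examples quoted in the text (no relationship between them is implied), and reading a label such as 2oei:AB as a PDB code followed by a chain string is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interface:
    """IF level: domains interacting with the same ligand."""
    identifier: str                               # e.g. "IF_24486"
    members: list = field(default_factory=list)   # e.g. ["2oei:AB"]


@dataclass
class BindingRegion:
    """BR level: interfaces that share one surface region."""
    identifier: str                               # e.g. "BR_24483"
    interfaces: list = field(default_factory=list)


@dataclass
class Family:
    """FA level: binding regions of one SCOP family."""
    name: str
    binding_regions: list = field(default_factory=list)


_MEMBER = re.compile(r"([0-9][A-Za-z0-9]{3}):([A-Za-z0-9]+)")


def parse_member(label):
    """Split a label such as '2oei:AB' into (pdb_code, chain_string).

    The meaning of the part after the colon is assumed, so it is
    returned unsplit.
    """
    match = _MEMBER.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized member label: {label!r}")
    return match.group(1), match.group(2)
```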
The information at each interacting level is displayed on the web page in consecutive steps. Each level shares a common web page composed of three interconnected frames to facilitate the analysis of the information (Figure 1):
• Alignment frame: The structure-based sequence alignments of the corresponding domains are shown in each interacting level. In addition, the FA level also includes predicted binding regions, whereas the BR level includes predicted interfaces, information inferred from other structurally related protein families. The interacting residues are highlighted for better analysis of binding patterns. At IF level, these residues can be colored by their physicochemical properties (hydrophobic or hydrophilic), and by the water participation in the interactions (dry, wet or dual). The patterns and the physicochemical properties facilitate the distinction between conserved and variable interactions. The structure of each member can be visualized in the 3D Viewer frame by clicking the Jmol icon.
• 3D Viewer frame: The Jmol plug-in is available for 3D visualization of the members shown in the Alignment frame as follows: The FA level displays a surface representation of the binding regions on a representative structure for general visualization of the spatial locations used for recognition. The BR level allows the 3D visualization of the different interfaces containing different ligands and/or binding modes. The IF level allows the user to visualize atomic details of all domains interacting with a common interface and to label them according to the physicochemical and solvent criteria selected in the Alignment frame. Subtle structural differences within domains interacting with the same interface can be detected and analyzed.
• Control frame: This frame contains Jmol-interactive commands and includes links to the PDB and FA levels. In addition, the residue-residue interaction list is displayed with the physicochemical and water-mediation properties for each interacting domain.
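The dry, wet and dual labels used for coloring residues at IF level can be derived from a residue-residue interaction list like the one shown in the Control frame. The text does not define the three labels, so the sketch below assumes the usual reading: dry means direct contacts only, wet means water-mediated contacts only, and dual means both. The input tuple layout and the residue identifiers are hypothetical.

```python
def water_participation(direct, via_water):
    """Assumed labels: dry = direct only, wet = water only, dual = both."""
    if direct and via_water:
        return "dual"
    if via_water:
        return "wet"
    if direct:
        return "dry"
    return None


def label_residues(interactions):
    """Label each residue from a list of its interactions.

    interactions: iterable of (residue_id, partner_id, via_water) tuples,
                  where via_water is True for water-mediated contacts.
    """
    seen = {}
    for residue, _partner, via_water in interactions:
        direct, wet = seen.get(residue, (False, False))
        seen[residue] = (direct or not via_water, wet or via_water)
    return {res: water_participation(d, w) for res, (d, w) in seen.items()}
```

A residue that makes one direct and one water-mediated contact is labeled dual, which is the case that would be missed if interfacial solvent were ignored.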
The Frame Interconnectivity feature implemented in the IF level of the new SCOWLP makes it possible to automatically highlight (i.e. centered zoom and color) a specific interacting residue in the 3D structure of the viewer by clicking on it in either the Alignment or the Control frame (Figure 1).
The main page contains examples of the SCOWLP main functionalities: i) exploration of the different surface regions that a protein family uses to recognize other molecules, ii) identification of the different ligands that a given region can recognize, and iii) comparative analysis of the interacting properties of a group of domains in complex with the same ligand. These analyses include the conservation and variability not only of interfacial residues but also of their interactions, taking into consideration water-mediated interactions.
Another key feature of the new SCOWLP is the possibility to obtain alternative binding regions and interfaces, as schematically shown in Figure 2b. These predicted binding regions are inferred from other structurally similar protein families. For instance, selecting "Binding region type: only predicted" in the Search options retrieves those SCOP families that do not yet have any binding information available in the PDB (i.e. no structures in complex with other molecules) but that present predicted binding regions (786 families). Navigating through the SCOP hierarchy, the predicted binding regions for any of these families can be explored. An example is the "DEATH effector domain, DED" (search by keyword), where three different binding regions have been inferred. Detailed analysis of the binding regions (3D Viewer) and the structurally similar proteins and their ligands (Control frame) might be useful to explore putative binding regions and ligands for the DED domain.
The number of protein-protein interactions obtained from large-scale technologies is increasing, though protein-protein interaction networks contain a considerable amount of noise due to intrinsic errors. Structural information has already been incorporated into these networks in order to distinguish direct vs. indirect interactions between proteins, and competing vs. complementary interactions, that is, whether two proteins interact with a third one through the same or a different binding region [8, 33]. The structural classification and the predicted protein binding regions contained in SCOWLP might contribute towards a more accurate construction of protein-protein interaction networks.
In summary, the examples explained above point out the unique potential of SCOWLP for identification, analysis and prediction of protein interactions. Our ultimate goal is to facilitate the analysis of protein interactions that may contribute to a better understanding of the rules governing protein recognition and molecular function.
Here we present an updated and enhanced version of the SCOWLP database and its user-friendly web application. The new SCOWLP comprises its previous structural classification of all protein binding regions of the PDB at protein family level, including protein-peptide and water-mediated interactions, which has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. In addition, the original functionality of SCOWLP towards the prediction of protein binding regions has been augmented by the inclusion of binding regions inferred from structurally similar proteins across fold space. The new SCOWLP database and its newly designed web application, which includes new helpful features such as frame interconnectivity, represent useful tools for the detailed analysis of the protein interactome. They provide the user with valuable assistance in suggesting protein recognition regions and comparing structurally similar interfaces from different protein families, which underlines their unique potential for gaining a better understanding of protein interaction networks and for guiding protein docking and rational ligand design.
Availability & Requirements
Acknowledgements & Funding
The authors would like to thank the members of the Structural Bioinformatics Group for fruitful discussions, and Ralf Gey for technical assistance. SS is funded by European Structural Funds (EFRE 301270 UT 135) and JT by the Klaus Tschira Stiftung. This work has been funded by the German Research Council SFB-TRR 67 (TPA7).
- Yamada T, Bork P: Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nature Reviews Molecular Cell Biology. 2009, 10 (11): 791-803. 10.1038/nrm2787.
- Gentleman R, Huber W: Making the most of high-throughput protein-interaction data. Genome Biology. 2007, 8 (10): 112. 10.1186/gb-2007-8-10-112.
- PDB. [http://www.rcsb.org/pdb]
- Aloy P, Russell RB: Ten thousand interactions for the molecular biologist. Nature Biotechnology. 2004, 22 (10): 1317-1321. 10.1038/nbt1018.
- Teyra J, Hawkins J, Zhu H, Pisabarro MT: Studies on the inference of protein binding regions across fold space based on structural similarities. Proteins. 2011, 79 (2): 499-508. 10.1002/prot.22897.
- Kiel C, Beltrao P, Serrano L: Analyzing protein interaction networks using structural information. Annual Review of Biochemistry. 2008, 77: 415-441. 10.1146/annurev.biochem.77.062706.133317.
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA. 1996, 93 (1): 13-20. 10.1073/pnas.93.1.13.
- Aloy P, Russell RB: Structural systems biology: modelling protein interactions. Nature Reviews Molecular Cell Biology. 2006, 7 (3): 188-197. 10.1038/nrm1859.
- Stein A, Ceol A, Aloy P: 3did: identification and classification of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2011, 39 (Database issue): D718-723.
- Winter C, Henschel A, Kim WK, Schroeder M: SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res. 2006, 34 (Database issue): D310-314.
- Teyra J, Paszkowski-Rogacz M, Anders G, Pisabarro MT: SCOWLP classification: structural comparison and analysis of protein binding regions. BMC Bioinformatics. 2008, 9: 9. 10.1186/1471-2105-9-9.
- Shoemaker BA, Zhang D, Thangudu RR, Tyagi M, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR: Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res. 2009, 38 (Database issue): D518-524.
- Teyra J, Pisabarro MT: Characterization of interfacial solvent in protein complexes and contribution of wet spots to the interface description. Proteins. 2007, 67 (4): 1087-1095. 10.1002/prot.21394.
- Samsonov S, Teyra J, Pisabarro MT: A molecular dynamics approach to study the importance of solvent in protein interactions. Proteins. 2008, 73 (2): 515-525. 10.1002/prot.22076.
- Samsonov SA, Teyra J, Anders G, Pisabarro MT: Analysis of the impact of solvent on contacts prediction in proteins. BMC Structural Biology. 2009, 9: 22. 10.1186/1472-6807-9-22.
- van Dijk AD, Bonvin AM: Solvated docking: introducing water into the modelling of biomolecular complexes. Bioinformatics. 2006, 22 (19): 2340-2347. 10.1093/bioinformatics/btl395.
- Timmer MS, Stocker BL, Seeberger PH: Probing glycomics. Current Opinion in Chemical Biology. 2007, 11 (1): 59-65. 10.1016/j.cbpa.2006.11.040.
- Kim TH, Ren B: Genome-wide analysis of protein-DNA interactions. Annual Review of Genomics & Human Genetics. 2006, 7: 81-102. 10.1146/annurev.genom.7.080505.115634.
- Lee S, Blundell TL: BIPA: a database for protein-nucleic acid interaction in 3D structures. Bioinformatics. 2009, 25 (12): 1559-1560. 10.1093/bioinformatics/btp243.
- Ranzinger R, Herget S, Wetter T, von der Lieth CW: GlycomeDB - integration of open-access carbohydrate structure databases. BMC Bioinformatics. 2008, 9: 384. 10.1186/1471-2105-9-384.
- Jayaram B, Jain T: The role of water in protein-DNA recognition. Annual Review of Biophysics & Biomolecular Structure. 2004, 33: 343-361. 10.1146/annurev.biophys.33.110502.140414.
- Tschampel SM, Woods RJ: Quantifying the role of water in protein-carbohydrate interactions. J Phys Chem A. 2003, 107 (43): 9175-9181. 10.1021/jp035027u.
- Slabicki M, Theis M, Krastev DB, Samsonov S, Mundwiller E, Junqueira M, Paszkowski-Rogacz M, Teyra J, Heninger AK, Poser I: A genome-scale DNA repair RNAi screen identifies SPG48 as a novel gene associated with hereditary spastic paraplegia. PLoS Biology. 2010, 8 (6): e1000408. 10.1371/journal.pbio.1000408.
- Keskin O, Nussinov R: Favorable scaffolds: proteins with different sequence, structure and function may associate in similar ways. Protein Eng Des Sel. 2005, 18 (1): 11-24. 10.1093/protein/gzh095.
- Gao M, Skolnick J: Structural space of protein-protein interfaces is degenerate, close to complete, and highly connected. Proc Natl Acad Sci USA. 2010, 107 (52): 22517-22522. 10.1073/pnas.1012820107.
- Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A: PRISM: protein interactions by structural matching. Nucleic Acids Res. 2005, 33 (Web Server issue): W331-336.
- Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C: SCOP: a structural classification of proteins database. Nucleic Acids Res. 2000, 28 (1): 257-259. 10.1093/nar/28.1.257.
- Teyra J, Doms A, Schroeder M, Pisabarro MT: SCOWLP: a web-based database for detailed characterization and visualization of protein interfaces. BMC Bioinformatics. 2006, 7 (1): 104. 10.1186/1471-2105-7-104.
- Zhu H, Domingues FS, Sommer I, Lengauer T: NOXclass: prediction of protein-protein interaction types. BMC Bioinformatics. 2006, 7: 27. 10.1186/1471-2105-7-27.
- Carugo O, Argos P: Protein-protein crystal-packing contacts. Protein Science. 1997, 6 (10): 2261-2263.
- Jmol. [http://jmol.sourceforge.net]
- Charbonnier S, Gallego O, Gavin AC: The social network of a cell: recent advances in interactome mapping. Biotechnol Annu Rev. 2008, 14: 1-28.
- Kim PM, Lu LJ, Xia Y, Gerstein MB: Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006, 314 (5807): 1938-1941. 10.1126/science.1136174.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.