SCOWLP update: 3D classification of protein-protein, -peptide, -saccharide and -nucleic acid interactions, and structure-based binding inferences across folds

Background: Protein interactions are essential for coordinating cellular functions. Proteomic studies have already elucidated a huge number of protein-protein interactions that require detailed functional analysis. Understanding the structural basis of each individual interaction through its structural determination is necessary, yet unfeasible. Therefore, computational tools able to predict protein binding regions and recognition modes are required to rationalize putative molecular functions for proteins. With this aim, we previously created SCOWLP, a structural classification of protein binding regions at protein family level, based on the information obtained from high-resolution 3D protein-protein and protein-peptide complexes.
Description: We present here a new version of SCOWLP that has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. SCOWLP takes interfacial solvent into account for a detailed characterization of protein interactions. In addition, the binding regions obtained per protein family have been enriched by the inclusion of predicted binding regions, which have been inferred from structurally related proteins across all existing folds. These inferences might become very useful to suggest novel recognition regions and compare structurally similar interfaces from different families.
Conclusions: The updated SCOWLP has new functionalities that allow both detection and comparison of protein regions recognizing different types of ligands, which include other proteins, peptides, nucleic acids and saccharides, within a solvated environment. Currently, SCOWLP allows the analysis of predicted protein binding regions based on structure-based inferences across fold space. These predictions may have a unique potential in assisting protein docking, in providing insights into protein interaction networks, and in guiding rational engineering of protein ligands. The newly designed SCOWLP web application has an improved user-friendly interface that facilitates its usage, and is available at http://www.scowlp.org.


Background
Proteins are ubiquitous and interact with other molecules to perform their functions, subject to timing and location [1]. High-throughput technologies for the identification of protein interactions are generating a plethora of new data that should be independently studied to decipher the specific molecular role of the proteins and their cellular functions [2]. Structural determination methods at atomic resolution are indispensable for the functional characterization of protein interactions. However, techniques for isolating protein complexes and determining their structures still face many challenges, and hence experimental structural studies are not always possible. Alternatively, the rapid accumulation of protein complex structures in the PDB repository [3,4] provides an unprecedented opportunity for comparative analysis of protein interactions that can be used to predict binding regions and modes [5], to model protein complexes [6], and to improve our understanding of the principles governing protein recognition [7]. To facilitate these comparative studies, it is necessary to generate tools that allow us to analyze the available experimental structures of protein complexes [8]. In fact, several databases have been developed to structurally identify and classify protein-protein and protein-peptide interactions at family level, such as 3DID [9], SCOPPI [10] and SCOWLP [11]. Their classifications are based on collecting all interaction information per protein family, then calculating the similarity of the binding residues, and finally clustering the different binding regions and their partners.
* Correspondence: jt@biotec.tu-dresden.de; mayte@biotec.tu-dresden.de. 1 Structural Bioinformatics, BIOTEC TU Dresden, Tatzberg 47-51, 01037 Dresden, Germany. Full list of author information is available at the end of the article.
Another database, IBIS [12], contains binding regions and interacting partners inferred from the inspection of complexes formed by closely homologous proteins instead of using structural classification schemes. Unlike the others, SCOWLP has been developed towards an atomic inspection of the interactions by applying physicochemical principles and by considering water molecules in protein interfaces, since solvent has been shown to be abundant and important in the mediation of protein interactions [13,14], and to improve protein contact predictions [15] and docking [16].
SCOWLP is a database and a web application containing a structural classification of protein binding regions at SCOP family level, including protein-protein and protein-peptide interactions [11]. In the updated version we present here, we additionally include two biologically relevant protein ligands that are quite abundant in the PDB: saccharides (SAC) and nucleic acids (NA) [17-20]. We also considered solvent in the definition of protein interactions, since it has been shown to be critical in mediating both protein-NA [21] and protein-SAC [22] interactions, highlighting the importance of the new SCOWLP for detailed inspection of these kinds of interactions. Another novelty in SCOWLP is the inclusion of predicted binding regions for each protein family. These predictions are inferred from significantly conserved binding regions belonging to structurally similar protein families independently of their fold [5,23]. It has been observed that proteins with different folds and functions can recognize molecules through binding regions containing similar local structural features or interacting motifs [24-26]. Therefore, the predicted binding inferences might become very useful to suggest alternative recognition regions for a protein family and to compare structurally similar binding regions from different families.
In summary, the updated SCOWLP classification with its newly designed web application represents a unique framework for the identification and comparative analysis of protein binding regions at atomic level. In the following sections, we explain the methodology used to build the database, and describe the architecture and possible usages of the web application.

Construction and Content
The new version of SCOWLP contains protein interactions with different ligand types, including proteins, peptides, nucleic acids (NA) and saccharides (SAC), taking into account interfacial solvent mediating protein interactions. Interacting residues and molecules are described at physicochemical level according to atom type and distance criteria. The following types of interactions are considered: hydrogen bonds, with donor-acceptor atom distance ≤ 3.6 Å; salt bridges, with charged-atom distance ≤ 4 Å; and van der Waals contacts, with atoms at a distance ≤ 4.5 Å. Residue interactions mediated by a water molecule are also considered in the interface definition. The ligand types and the protein interfaces are defined as follows:
• Protein-protein interactions: The 4,194 protein families from SCOP V1.75 [27] are used to define protein domain boundaries within PDB files.
• Protein-peptide interactions: All PDB chains that are labeled "ATOM", not defined in SCOP, and shorter than 90 residues are considered peptides [28].
• Protein-nucleic acid interactions: PDB residues labeled as standard nucleotides are selected. We differentiate RNA from DNA by the presence of the O2' group in the ribose ring. Nucleic acid chains are merged into a single unit (double strand) if there is at least one inter-base atomic interaction between chains.
• Protein-saccharide interactions: The SAC molecules are extracted from PDB files labeled with the terms "saccharide", "carbohydrate" and/or "sugar", and containing HETATM atoms. We obtained 307 unique molecules (three-letter code) that include neither standard nor modified nucleotides, nor SAC modifications bigger than the SAC moieties. The oligosaccharide units can be represented in the PDB either within a common HETATM type or as a collection of them. In the latter case, SAC units are identified and merged together into a single oligosaccharide molecule using the PDB connectivity. SAC connectivity to protein domains is also checked to differentiate covalent (intra) and non-covalent (inter) protein interactions.
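The distance criteria and two of the ligand rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not the SCOWLP implementation: the `Atom` record, its typing flags, the function names and the order in which interaction types are tested are all assumptions made for the example. Only the cutoffs (3.6, 4 and 4.5 Å), the 90-residue peptide limit and the O2' criterion come from the text.

```python
import math
from dataclasses import dataclass

# Distance cutoffs in Å, as quoted in the text.
HBOND_MAX = 3.6
SALT_BRIDGE_MAX = 4.0
VDW_MAX = 4.5
PEPTIDE_MAX_LEN = 90  # non-SCOP chains shorter than this count as peptides


@dataclass(frozen=True)
class Atom:
    """Hypothetical atom record; the typing flags are illustrative only."""
    name: str
    xyz: tuple  # (x, y, z) coordinates in Å
    is_donor: bool = False
    is_acceptor: bool = False
    charge: int = 0  # -1, 0 or +1


def classify_contact(a, b):
    """Return the interaction type for an atom pair, or None if out of range.

    Testing salt bridges before hydrogen bonds is an assumption of this sketch.
    """
    d = math.dist(a.xyz, b.xyz)
    if a.charge * b.charge < 0 and d <= SALT_BRIDGE_MAX:
        return "salt_bridge"
    donor_acceptor = (a.is_donor and b.is_acceptor) or (a.is_acceptor and b.is_donor)
    if donor_acceptor and d <= HBOND_MAX:
        return "hydrogen_bond"
    if d <= VDW_MAX:
        return "van_der_waals"
    return None


def is_peptide(chain_length, in_scop):
    """Peptide rule: an ATOM chain not defined in SCOP and shorter than 90 residues."""
    return not in_scop and chain_length < PEPTIDE_MAX_LEN


def nucleic_acid_type(atom_names):
    """RNA if the O2' atom is present ("O2*" in older PDB files), otherwise DNA."""
    return "RNA" if {"O2'", "O2*"} & set(atom_names) else "DNA"
```

For example, `classify_contact(Atom("N", (0, 0, 0), is_donor=True), Atom("O", (3.0, 0, 0), is_acceptor=True))` falls under the hydrogen-bond criterion, while two uncharged carbon atoms 4.4 Å apart are reported as a van der Waals contact.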
The classification of family binding regions has been performed by clustering interacting domains based on binding region similarities. As described previously [11], the similarity value is obtained from the overlap of the interacting residues once they are mapped onto the structure-based sequence alignments of the family members. Likewise, the inferred binding regions are obtained among members of different families aligned using non-sequential structural alignments as previously described [5]. SCOWLP contains a total of 7,121 protein binding regions identified at zero similarity cutoff, of which 2,315 have more than one interface. In addition, it contains 8,985 predicted binding regions, 786 of them in protein families that so far lack binding information in the PDB.
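The clustering idea can be illustrated with a short Python sketch. The exact similarity measure is defined in ref. [11] and is not reproduced here; the Jaccard index over alignment columns and the single-linkage grouping used below are stand-ins chosen for illustration, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets of alignment columns (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_binding_regions(interfaces, cutoff=0.0):
    """Group domains whose interacting-residue sets overlap by more than `cutoff`.

    `interfaces` maps a domain id to the set of alignment columns occupied by
    its interacting residues. Single linkage is an assumption of this sketch.
    """
    ids = list(interfaces)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if jaccard(interfaces[i], interfaces[j]) > cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return sorted(clusters.values(), key=sorted)


# Hypothetical family with three interacting domains.
example = {"d1": {10, 11, 12}, "d2": {12, 13, 14}, "d3": {40, 41}}
regions_at_zero = cluster_binding_regions(example, cutoff=0.0)
regions_at_half = cluster_binding_regions(example, cutoff=0.5)
```

With a zero cutoff, any shared alignment column places two domains in the same binding region, so `d1` and `d2` are merged and `d3` forms a second region. Raising the cutoff to 0.5 splits the three domains into separate regions.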

Utility & Discussion
Web architecture
The SCOWLP web application follows the SCOP hierarchical levels to classify protein structures: RT - root, CF - class family, SF - super family, where families are finally listed. In addition, it extends the SCOP classification with three protein interaction levels: FA - family, BR - binding region and IF - interface. The FA level contains a list of binding regions, defined as distinctive surface regions of a protein family used to recognize other molecules. The BR level contains a list of interfaces distinguishing the different partners or ligands that a specific region can recognize. The IF level contains a list of domains interacting with the same ligand, each linked to its original PDB file (e.g. 2oei:AB). Each binding region and interface is represented by an identifier (BR_24483 or IF_24486), since they cannot be automatically associated with a description.
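The three interaction levels can be pictured as a simple nested data structure. The Python sketch below is only an illustration of that nesting, not the SCOWLP database schema: the class and field names are hypothetical, and although the identifier formats follow the examples in the text, the particular grouping of `IF_24486` and `2oei:AB` under `BR_24483` and the ligand type are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    """IF level: domains interacting with the same ligand."""
    if_id: str
    ligand_type: str
    members: list = field(default_factory=list)  # e.g. "2oei:AB" (PDB id:chains)


@dataclass
class BindingRegion:
    """BR level: interfaces sharing one surface region of the family."""
    br_id: str
    interfaces: list = field(default_factory=list)


@dataclass
class Family:
    """FA level: all binding regions of one SCOP family."""
    name: str
    binding_regions: list = field(default_factory=list)


def ligand_types(region):
    """Ligand types that a binding region can recognize."""
    return {interface.ligand_type for interface in region.interfaces}


# Illustrative nesting only; the grouping below is not taken from the database.
family = Family(
    name="example family",
    binding_regions=[
        BindingRegion("BR_24483", [Interface("IF_24486", "protein", ["2oei:AB"])]),
    ],
)
```

Navigating from FA to BR to IF then amounts to walking `family.binding_regions`, then `region.interfaces`, then `interface.members`.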
Query
The SCOWLP web application facilitates hierarchical navigation through the different levels. It also contains a keyword search box for SCOP descriptions, PDB IDs and similar SCOP domain sequences identified using the BLAST algorithm. Some specific examples of the query capabilities are shown on the SCOWLP main page.

Search options
The interacting levels of SCOWLP (FA, BR, IF) contain the Search options located at the top of the web application (Figure 1). This feature reduces the query and navigation to a specific list of interacting domains based on: i) Ligand type: proteins, peptides, DNA, RNA and saccharides; ii) Complex type: same (homo) or different (hetero) domain families; iii) Interaction type: within the same (intra) or between different (inter) domain chains; iv) Clustering cutoff: four cutoff values that define the final binding region clusters per family, as described in ref. [5] (default cutoff is zero). Note that the possibility of filtering out homo-dimers and intra-domain interactions may become very useful, since these are numerous and irrelevant for many analyses.
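The effect of the first three search options can be sketched as a filter over a list of interacting domains. The Python example below is a hedged illustration: the record layout, the function name `search` and its parameters are hypothetical and do not correspond to the web application's code, and treating non-protein ligands as never "homo" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DomainInteraction:
    """Hypothetical record for one interacting domain."""
    domain_family: str
    partner_family: Optional[str]  # None for non-domain ligands (assumption)
    ligand_type: str               # "protein", "peptide", "DNA", "RNA", "saccharide"
    same_chain: bool               # True = intra, False = inter


def is_homo(rec):
    """Homo complex: both partners belong to the same domain family."""
    return rec.partner_family is not None and rec.partner_family == rec.domain_family


def search(records, ligand_types=None, exclude_homo=False, exclude_intra=False):
    """Keep only the records that pass the selected search options."""
    kept = []
    for rec in records:
        if ligand_types is not None and rec.ligand_type not in ligand_types:
            continue
        if exclude_homo and is_homo(rec):
            continue
        if exclude_intra and rec.same_chain:
            continue
        kept.append(rec)
    return kept


# Made-up records for illustration.
records = [
    DomainInteraction("fam_A", "fam_A", "protein", same_chain=False),  # homo-dimer
    DomainInteraction("fam_A", "fam_B", "protein", same_chain=False),  # hetero, inter
    DomainInteraction("fam_A", "fam_B", "protein", same_chain=True),   # hetero, intra
    DomainInteraction("fam_A", None, "DNA", same_chain=False),
]
hetero_inter = search(records, ligand_types={"protein"}, exclude_homo=True, exclude_intra=True)
```

Here `hetero_inter` keeps only the second record, which mirrors the use case mentioned above of discarding homo-dimers and intra-chain interactions before an analysis.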

Data analysis
The information at each interacting level is displayed in the web page in consecutive steps. Each level shares a common web page composed of three interconnected frames to facilitate the analysis of the information (Figure 1):
• Alignment frame: The structure-based sequence alignments of the corresponding domains are shown in each interacting level. In addition, the FA level also includes predicted binding regions, whereas the BR level includes predicted interfaces, information inferred from other structurally related protein families. The interacting residues are highlighted for better analysis of binding patterns. At IF level, these residues can be colored by their physicochemical properties (hydrophobic or hydrophilic), and by the water participation in the interactions (dry, wet or dual). The patterns and the physicochemical properties facilitate the distinction between conserved and variable interactions. The structure of each member can be visualized in the 3D Viewer frame by clicking the Jmol icon.
• 3D Viewer frame: The Jmol plug-in [31] is available for 3D visualization of the members shown in the Alignment frame as follows: The FA level displays a surface representation of the binding regions on a representative structure for general visualization of the spatial locations used for recognition. The BR level allows the 3D visualization of the different interfaces containing different ligands and/or binding modes. The IF level allows the user to visualize atomic details of all domains interacting with a common interface and to label them according to the physicochemical and solvent criteria selected in the Alignment frame. Subtle structural differences within domains interacting with the same interface can be detected and analyzed.
• Control frame: This frame contains Jmol-interactive commands and includes links to the PDB and FA levels. In addition, the residue-residue interaction list is displayed with the physicochemical and water-mediation properties for each interacting domain.
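The dry, wet and dual labels used for coloring at IF level can be derived from a residue-residue interaction list such as the one shown in the Control frame. The Python sketch below assumes that "dry" means a residue makes only direct contacts, "wet" only water-mediated contacts, and "dual" both; this reading, the function name and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def water_participation(contacts):
    """Label each residue as 'dry', 'wet' or 'dual'.

    `contacts` is an iterable of (residue_id, water_mediated) pairs, one per
    interaction in which the residue takes part.
    """
    kinds = defaultdict(set)
    for residue, water_mediated in contacts:
        kinds[residue].add(bool(water_mediated))

    labels = {}
    for residue, seen in kinds.items():
        if seen == {False}:
            labels[residue] = "dry"    # direct contacts only
        elif seen == {True}:
            labels[residue] = "wet"    # water-mediated contacts only
        else:
            labels[residue] = "dual"   # both kinds of contact
    return labels


# Made-up interaction list: (residue, is the contact water-mediated?)
contacts = [
    ("ARG45", False),
    ("ARG45", True),
    ("LEU12", False),
    ("SER70", True),
]
labels = water_participation(contacts)
```

In this made-up list, `ARG45` takes part in one direct and one water-mediated contact and is therefore labeled dual, whereas `LEU12` is dry and `SER70` is wet.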
The Frame Interconnectivity feature implemented in the IF level of the new SCOWLP makes it possible to automatically highlight (i.e. centered zoom and color) a specific interacting residue in the 3D structure of the viewer by clicking on it in either the Alignment or the Control frame (Figure 1).

Applications
The main page contains examples of the SCOWLP main functionalities: i) exploration of the different surface regions that a protein family uses to recognize other molecules, ii) identification of the different ligands that a given region can recognize, and iii) comparative analysis of the interacting properties of a group of domains in complex with the same ligand. These analyses include the conservation and variability not only of interfacial residues but also of their interactions, taking into consideration water-mediated interactions.
It is important to highlight that one of the main strengths of the new SCOWLP is the rapid identification of protein families able to recognize one or several ligand types through the same region. For instance, the selection of a specific combination of ligands in the Search options ("protein and DNA") will show only those SCOP families that can recognize proteins and DNA through the same surface region. An example is the "heat-shock transcription factor" (search by keyword), which has a binding region at FA level (BR_1892) recognizing these two ligands. Clicking on the BR Id shows two different interfaces for this family binding region, one recognizing a protein and the other DNA, and therefore responsible for different functions (Figure 2a). After clicking on the IF Id, the web application also allows the analysis of the interacting features governing the two interfaces.
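The "protein and DNA" selection amounts to asking which binding regions recognize every ligand type in a given set. A minimal Python sketch of that query follows; the nested dictionary layout and the function name are hypothetical, the second family is made up, and only the heat-shock transcription factor entry (BR_1892 recognizing a protein and DNA) is taken from the text.

```python
def regions_recognizing(families, required):
    """Return (family, binding region) pairs whose region binds all `required` ligand types.

    `families` maps a family name to {binding_region_id: iterable of ligand types}.
    """
    required = set(required)
    hits = []
    for family, regions in families.items():
        for br_id, ligands in regions.items():
            if required <= set(ligands):
                hits.append((family, br_id))
    return sorted(hits)


families = {
    # From the text: one region with a protein interface and a DNA interface.
    "heat-shock transcription factor": {"BR_1892": ["protein", "DNA"]},
    # Hypothetical family whose two ligand types sit on different regions.
    "hypothetical family": {"BR_a": ["protein"], "BR_b": ["DNA"]},
}
shared = regions_recognizing(families, {"protein", "DNA"})
```

The hypothetical family is not reported because its protein and DNA ligands bind different regions, which is the distinction the search option is meant to capture.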
Another key feature of the new SCOWLP is the possibility to obtain alternative binding regions and interfaces, as schematically shown in Figure 2b. These predicted binding regions are inferred from other structurally similar protein families. For instance, selecting "Binding region type: only predicted" in the Search options restricts the list to those SCOP families that do not yet have any binding information available in the PDB (i.e. no structures in complex with other molecules) but that have predicted binding regions (786 families). Navigating through the SCOP hierarchy, the predicted binding regions for any of these families can be explored. An example is the "DEATH effector domain, DED" (search by keyword), where three different binding regions have been inferred. Detailed analysis of the binding regions (3D Viewer) and the structurally similar proteins and their ligands (Control frame) might be useful to explore putative binding regions and ligands for the DED domain.
The number of protein-protein interactions obtained from large-scale technologies is increasing [32], though protein-protein interaction networks contain a considerable amount of noise due to intrinsic errors [2]. Structural information has already been incorporated into these networks in order to distinguish direct from indirect interactions between proteins, and competing from complementary interactions, i.e. whether two proteins interact with a third one through the same or a different binding region [8,33]. The structural classification and the predicted protein binding regions contained in SCOWLP might contribute towards a more accurate construction of protein-protein interaction networks.
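The distinction between competing and complementary interactions can be expressed compactly once each partner of a protein is annotated with the binding region it uses. The Python sketch below is an assumption-laden illustration: it supposes that each partner binds the hub protein through exactly one region, and the function name, protein names and region identifiers are all made up.

```python
from itertools import combinations


def classify_partner_pairs(hub_contacts):
    """Label each pair of partners of one hub protein.

    `hub_contacts` maps a partner name to the binding region it uses on the hub.
    Partners sharing a region are 'competing'; otherwise 'complementary'.
    """
    labels = {}
    for (p1, r1), (p2, r2) in combinations(sorted(hub_contacts.items()), 2):
        labels[(p1, p2)] = "competing" if r1 == r2 else "complementary"
    return labels


# Hypothetical hub protein with three partners.
hub_contacts = {"partner_A": "BR_1", "partner_B": "BR_1", "partner_C": "BR_2"}
pair_labels = classify_partner_pairs(hub_contacts)
```

In this example `partner_A` and `partner_B` use the same region and are labeled competing, while each of them is complementary to `partner_C`.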
In summary, the examples explained above point out the unique potential of SCOWLP for identification, analysis and prediction of protein interactions. Our ultimate goal is to facilitate the analysis of protein interactions that may contribute to a better understanding of the rules governing protein recognition and molecular function.

Conclusions
Here we present an updated and enhanced version of the SCOWLP database and its user-friendly web application. The new SCOWLP comprises its previous structural classification of all protein binding regions of the PDB at protein family level, including protein-peptide and water-mediated interactions, which has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. In addition, the original functionality of SCOWLP towards the prediction of protein binding regions has been augmented by the inclusion of binding regions inferred from structurally similar proteins across fold space. The new SCOWLP database and its newly designed web application, which includes new helpful features such as frame interconnectivity, represent useful tools for the detailed analysis of the protein interactome. They provide the user with valuable assistance in suggesting protein recognition regions and comparing structurally similar interfaces from different protein families, which underlines their potential for gaining a better understanding of protein interaction networks and for guiding protein docking and rational ligand design.

Availability & Requirements
The SCOWLP database and web application are freely available at http://www.scowlp.org. MySQL files containing the entire database can be downloaded for independent studies. The SCOWLP classification is updated with each new SCOP release.
Programming language: Oracle Java 6.0, JavaScript.
Requirements, server side: Oracle Java 6.0 (or higher), Apache Tomcat 6.0 (or higher).
Requirements, client side: Oracle Java 6.0 (or higher), Mozilla Firefox 3.6 (or higher), Google Chrome 13 (or higher).
Software source code will be provided upon request for non-commercial usage.
SS is funded by European Structural Funds (EFRE 301270 UT 135) and JT by the Klaus Tschira Stiftung. This work has been funded by the German Research Council SFB-TRR 67 (TPA7).