A protein domain interaction interface database: InterPare
© Gong et al; licensee BioMed Central Ltd. 2005
Received: 03 May 2005
Accepted: 25 August 2005
Published: 25 August 2005
Most proteins function by interacting with other molecules. Their interaction interfaces are highly conserved throughout evolution, because undesirable interactions can lead to fatal disorders in cells. Rational drug discovery includes computational methods to identify the sites at which lead compounds interact with target molecules. Identifying and classifying protein interaction interfaces on a large scale can help researchers discover drug targets more efficiently.
We introduce a large-scale protein domain interaction interface database called InterPare (http://interpare.net). It contains both inter-chain (between chains) and intra-chain (within a chain) interfaces. InterPare uses three methods to detect interfaces: 1) the geometric distance method, which checks the distances between atoms that belong to different domains; 2) the Accessible Surface Area (ASA) method, which detects the regions of a protein that become buried, that is, excluded from the solvent, when it forms a multimer or complex; and 3) the Voronoi diagram, a computational geometry method that provides a mathematical definition of interface regions. InterPare includes visualization tools to display the protein interior, surface, and interaction interfaces. It also provides statistics such as the amino acid propensities of a queried protein in its interior, surface, and interface regions. The atom coordinates belonging to the interface, surface, and interior regions can be downloaded from the website.
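The propensity statistics mentioned above can be illustrated with a small sketch. The exact formula InterPare uses is not given here; the code assumes a common definition, in which the propensity of amino acid type a for a region is the fraction of that region's residues that are of type a divided by the fraction of all the protein's residues that are of type a, so values above 1 indicate enrichment. The function name and the residue lists are hypothetical.

```python
from collections import Counter

def propensities(region_residues, all_residues):
    """Amino acid propensities for one region (interface, surface, or interior).

    Assumed definition: (fraction of region residues of type a) /
    (fraction of all residues of type a). region_residues must be a
    subset of all_residues, so every denominator is non-zero.
    """
    region_counts = Counter(region_residues)
    total_counts = Counter(all_residues)
    n_region, n_total = len(region_residues), len(all_residues)
    return {
        aa: (count / n_region) / (total_counts[aa] / n_total)
        for aa, count in region_counts.items()
    }

# Hypothetical one-letter residue lists for a whole protein and its interface.
protein = list("LLKKAAGG")
interface = list("LLK")
print(propensities(interface, protein))
```

In this toy input, leucine makes up two thirds of the interface but only a quarter of the protein, so its propensity is well above 1.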
InterPare is an open and public database server for protein interaction interface information. It contains large-scale interface data for proteins whose 3D structures are known. As of November 2004, 10,583 (geometric distance), 10,431 (ASA), and 11,010 (Voronoi diagram) entries in the Protein Data Bank (PDB) contained interfaces according to the three methods above. With the geometric distance method alone, there are 31,620 inter-chain and 12,758 intra-chain domain-domain interaction interfaces.
Proteins are the most important class of molecules in a cell. Most proteins function by interacting with other molecules, especially other proteins. Protein interactions are highly regulated and tightly conserved throughout evolution [1, 2], mainly because unnecessary or unsatisfactory interactions (misinteractions) triggered by random mutations can lead to molecular dysfunction. Interaction interface regions are therefore under selective pressure and are more conserved than the other exposed, non-interface regions of proteins. Protein "structural interactomics", which maps all protein domain interactions, is becoming increasingly important as more complete genome sequences become available [4–7]. Scientists can now map the whole human interactome bioinformatically, using the ever-increasing experimental data from methods such as yeast two-hybrid analysis. Consequently, higher-resolution analysis of molecular interactions is also becoming more important.
Since the 1970s, there has been much effort to determine the principles of protein-protein recognition. Pioneers in the field, such as Chothia and Janin, studied the physical and chemical properties of protein interaction sites that contribute to recognition. Colman et al. [10, 11] focused on the electrostatic and shape complementarity of interaction interfaces, using the electrostatic complementarity (EC) and shape correlation indices, respectively. Argos studied interfaces between protein subunits and between protein domains. He not only investigated the physicochemical properties of protein interfaces but also tried to characterize their geometric features using a spline function [13, 14]. Jones and Thornton introduced a surface patch method to identify the parameters that contribute to protein-protein interaction. Chakrabarti and Janin [16, 17] investigated the structure of interface regions by dissecting them into a core and a rim according to solvent accessibility, and they also examined the chemical properties of each region.
Recently, a new trend has emerged in the study of protein interfaces: several groups have introduced computational geometry and topology methods. Most notably, the Voronoi diagram [18, 19, 23] has been used to study the interfaces of protein complexes. Richards [20, 21] introduced the Voronoi diagram to protein structure studies as early as 1974, although not specifically as an interface analysis tool.
Despite more than 30 years of effort to unveil the underlying principles of protein-protein interaction, there has not been much progress at the fundamental level since the work of Chothia and Janin. Interface data derived from different approaches are neither well maintained nor widely shared among scientists. Fortunately, faster X-ray crystallography and NMR methods in structural biology have increased the number of known three-dimensional protein structures, and this 3D structure information is a good source of data for the study of protein interfaces.
Here, we introduce a large-scale protein interaction interface database called InterPare (http://interpare.net or http://psimap.org). InterPare presents interfaces between protein domains identified by three methods. First, interfaces are detected by calculating the geometric distance between subunits of multidomain proteins or protein complexes in the PDB [22, 27]. Second, buried protein regions are identified by calculating the change in accessible surface area (ASA) when subunits or domains form a complex or aggregate; these regions are accessible to water when the subunit or domain is free. Third, interfaces are defined by a geometric and topological approach using the Voronoi diagram [18, 19, 23]. The interface structure of a queried protein can be viewed in the context of the whole protein with three molecular viewers on the results page: Chime, Jmol, and InterFacer. InterPare also provides atomic coordinate files for the protein surface, interior, and interface for further analysis.
Construction and content
Interface identification methods
Calculating the geometric distance between atoms in different domains (PSIMAP method).
Detecting differences in the Accessible Surface Area (ASA) of every residue between two states: the detached individual subunit state and the multimeric state.
Calculating Voronoi diagrams.
The geometric distance method checks the distance between atoms in two interacting domains.
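The distance check can be sketched as a brute-force comparison of every atom pair across two domains. This is an illustration under assumptions, not InterPare's code: the 5 Å cutoff, the `min_pairs` threshold for calling two domains interacting, and the function names are all hypothetical parameters chosen for the example.

```python
import math
from itertools import product

# Each atom is (residue_id, (x, y, z)); residue_id is any hashable label.
def contacting_residue_pairs(domain_a, domain_b, cutoff=5.0):
    """Residue pairs (one per domain) with at least one atom pair within cutoff (Å)."""
    pairs = set()
    for (res_a, xyz_a), (res_b, xyz_b) in product(domain_a, domain_b):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((res_a, res_b))
    return pairs

def domains_interact(domain_a, domain_b, cutoff=5.0, min_pairs=5):
    """Hypothetical rule: domains interact if enough residue pairs are in contact."""
    return len(contacting_residue_pairs(domain_a, domain_b, cutoff)) >= min_pairs

domain_a = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
domain_b = [("B1", (3.0, 0.0, 0.0)), ("B2", (30.0, 0.0, 0.0))]
print(contacting_residue_pairs(domain_a, domain_b))
```

The all-pairs loop is quadratic in the number of atoms; for whole PDB entries a spatial grid or k-d tree would normally be used to find neighbours.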
The Accessible Surface Area (ASA) method detects protein regions that are buried and hence excluded from a solvent when forming a multimer or a complex.
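If two or more subunits form a protein complex or aggregate, they lose part of the surface area that was accessible to the solvent (typically water). With the ASA method, we define interface residues as residues that lose more than 1 Å² of solvent accessible surface area upon aggregation or complexation [15, 38, 39]. For residue i, this can be formulated as follows:
$$\Delta \mathrm{ASA}_i = \mathrm{ASA}_i^{\mathrm{free}} - \mathrm{ASA}_i^{\mathrm{complex}} > 1\,\text{\AA}^2$$
where ASA_i^free and ASA_i^complex are the accessible surface areas of residue i in the isolated subunit (or domain) and in the complex, respectively.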
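Given per-residue ASA values for the two states, the 1 Å² criterion reduces to a simple filter. In this sketch the ASA values are assumed to be precomputed, for example by a program such as NACCESS; the function name and the residue labels are hypothetical.

```python
def interface_residues(asa_free, asa_complex, threshold=1.0):
    """Residues whose ASA drops by more than `threshold` (Å^2) on complex formation.

    asa_free:    residue_id -> ASA in the isolated subunit or domain
    asa_complex: residue_id -> ASA in the complex
    A residue missing from asa_complex is treated as unchanged.
    """
    return {
        res for res, free in asa_free.items()
        if free - asa_complex.get(res, free) > threshold
    }

asa_free = {"A10": 50.0, "A11": 30.0, "A12": 5.0}
asa_complex = {"A10": 20.0, "A11": 29.5, "A12": 3.5}
print(sorted(interface_residues(asa_free, asa_complex)))
```

Residue A11 loses only 0.5 Å², so it falls below the threshold even though its accessibility changes slightly.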
The Voronoi diagram, also known as the Dirichlet tessellation, has been widely used in science and engineering. Richards [20, 21] first applied it to the study of protein structures. Molecular interfaces have also been defined using the power diagram, a Voronoi diagram of a weighted point set. We used the protocol suggested by Varshney et al., but applied our own polygon filtering method and calculated interfaces only between domains rather than across whole protein complexes.
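In a power diagram each atom is a sphere (centre c, radius r, here a van der Waals radius), and a point x belongs to the cell of the atom with the smallest power distance |x − c|² − r². Setting the power distances of atoms i and j equal gives the plane 2(c_j − c_i)·x = (|c_j|² − r_j²) − (|c_i|² − r_i²), on which their shared face (if any) lies. The sketch below only evaluates these quantities; it does not construct the full 3D diagram, which would normally be done with a dedicated computational geometry algorithm. The function names and coordinates are hypothetical.

```python
def power_distance(point, center, radius):
    """Power of a point with respect to a sphere: |x - c|^2 - r^2."""
    return sum((p - c) ** 2 for p, c in zip(point, center)) - radius ** 2

def radical_plane(center_i, r_i, center_j, r_j):
    """Plane n . x = d on which atoms i and j have equal power.

    The face shared by the power cells of i and j, if it exists, lies on it.
    """
    normal = tuple(2 * (cj - ci) for ci, cj in zip(center_i, center_j))
    d = (sum(c * c for c in center_j) - r_j ** 2) - (sum(c * c for c in center_i) - r_i ** 2)
    return normal, d

def owning_atom(point, atoms):
    """Index of the atom (center, radius) whose power cell contains the point."""
    return min(range(len(atoms)), key=lambda k: power_distance(point, *atoms[k]))

atoms = [((0.0, 0.0, 0.0), 1.7), ((4.0, 0.0, 0.0), 1.5)]
print(radical_plane(atoms[0][0], atoms[0][1], atoms[1][0], atoms[1][1]))
print(owning_atom((1.0, 0.0, 0.0), atoms), owning_atom((3.0, 0.0, 0.0), atoms))
```

Unlike the ordinary Voronoi bisector, the radical plane shifts toward the smaller atom, which is why weighting by atomic radii matters for molecules.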
To retain only the polygons close to the interaction region, marginal polygons, which are irrelevant to the interacting interface, must be filtered out. We removed them using our two-stage polygon filtering method. In the first stage, we removed polygons that do not contain edges defined by interface atoms. Interface atoms are the atoms of the interface residues defined by the ASA method (see above). The default van der Waals radii of atoms were taken from Chothia. In the second stage, polygons were removed if one or more of their vertices lie more than 5 Å from the interface atoms. For each face in the power diagram P (Figure 3), if the two atoms defining the face belong to different domains, we call the face an interface-face. We define interface-cells as the cells in P that have at least one interface-face, and interface-atoms as the atoms whose cells are interface-cells. In the InterPare database, all interface-atoms of each interacting domain pair are stored in a PDB-style file format.
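The filtering and labelling steps can be sketched on an already computed set of power-diagram faces. Two simplifications are assumptions of this example, not statements of InterPare's method: stage 1 is approximated by requiring both atoms that define a polygon to be ASA interface atoms, and the 5 Å test in stage 2 measures distance to atom centres. All names and inputs are hypothetical.

```python
import math

def filter_and_extract(faces, coords, domain_of, asa_interface_atoms, max_dist=5.0):
    """Two-stage polygon filter, then interface-faces and interface-atoms.

    faces: list of (atom_i, atom_j, vertices); each is one polygon of the power
           diagram, shared by the cells of atom_i and atom_j
    coords: atom_id -> (x, y, z); domain_of: atom_id -> domain label
    asa_interface_atoms: set of atom ids in ASA-defined interface residues
    """
    iface_xyz = [coords[a] for a in asa_interface_atoms]
    kept = []
    for i, j, vertices in faces:
        # Stage 1 (assumed reading): polygon must be defined by interface atoms.
        if i not in asa_interface_atoms or j not in asa_interface_atoms:
            continue
        # Stage 2: drop the polygon if any vertex is > max_dist from every interface atom.
        if any(min(math.dist(v, p) for p in iface_xyz) > max_dist for v in vertices):
            continue
        kept.append((i, j, vertices))
    # Interface-faces: the two defining atoms lie in different domains.
    interface_faces = [(i, j, v) for i, j, v in kept if domain_of[i] != domain_of[j]]
    # Interface-cells own at least one interface-face; their atoms are interface-atoms.
    interface_atoms = {a for i, j, _ in interface_faces for a in (i, j)}
    return interface_faces, interface_atoms

coords = {1: (0, 0, 0), 2: (3, 0, 0), 3: (0, 3, 0), 4: (20, 0, 0)}
domain_of = {1: "A", 2: "B", 3: "A", 4: "B"}
faces = [(1, 2, [(1.5, 0, 0), (1.5, 1, 0)]), (1, 3, [(0, 1.5, 0)]),
         (2, 4, [(10, 0, 0)]), (2, 3, [(1.5, 1.5, 0), (1.5, 20, 0)])]
print(filter_and_extract(faces, coords, domain_of, {1, 2, 3})[1])
```

In the toy input, the face between atoms 2 and 3 is discarded in stage 2 because one vertex lies far from every interface atom, leaving atoms 1 and 2 as the only interface-atoms.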
The protein interfaceome can be defined as the whole set of protein interaction interfaces found in cells. Such an interface data set can be defined in many ways. We use the hierarchical classification of protein domains in SCOP and extend it to molecular interfaces. The advantage of this approach is that each interface can be classified in the context of domain evolution. The SCOP Superfamily level groups domains whose structural and functional features indicate a probable common evolutionary origin, even when sequence identity is low. The Family level is more functionally relevant: members of a Family are clearly related and functionally similar. Below Family are the individual domains. We applied the three interface detection methods to domains classified in SCOP, although any other protein domain classification system, such as FSSP or CATH, could also be used. The main contribution of this approach to structural bioinformatics is that interfaces can be searched and compared computationally (hence the name InterPare).
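Extending the SCOP hierarchy to interfaces can be pictured as grouping each domain-domain interface under the unordered pair of its two domains' SCOP classes at a chosen level. This sketch assumes each domain is labelled with a SCOP sccs string of the form class.fold.superfamily.family; the interface identifiers and sccs codes in the example are illustrative only, and the function names are hypothetical.

```python
from collections import defaultdict

LEVEL_DEPTH = {"class": 1, "fold": 2, "superfamily": 3, "family": 4}

def scop_level(sccs, level):
    """Truncate an sccs string (class.fold.superfamily.family) to the given level."""
    return ".".join(sccs.split(".")[:LEVEL_DEPTH[level]])

def group_interfaces(interfaces, level="superfamily"):
    """Group (interface_id, sccs_a, sccs_b) records by unordered class pair at `level`."""
    groups = defaultdict(list)
    for interface_id, sccs_a, sccs_b in interfaces:
        key = tuple(sorted((scop_level(sccs_a, level), scop_level(sccs_b, level))))
        groups[key].append(interface_id)
    return dict(groups)

interfaces = [("if1", "b.1.1.1", "b.1.1.2"),
              ("if2", "b.1.1.2", "b.1.1.1"),
              ("if3", "a.3.1.1", "b.1.1.1")]
print(group_interfaces(interfaces, "superfamily"))
print(group_interfaces(interfaces, "family"))
```

Sorting the two labels makes the key independent of domain order, so "if1" and "if2" fall into the same group at both levels.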
We expect that hierarchically similar clusters in the interfaceome will have highly conserved interfaces that maintain their interaction partners. This may provide a new level of functional prediction for designing novel molecules that interface with proteins and hence control protein activities.
InterPare is an open and public database server for protein interaction interface information. It contains large-scale interface data for proteins whose 3D structures are known. Using the geometric distance method, we identified 31,620 inter-chain and 12,758 intra-chain interfaces. As of November 2004, there were 10,583, 10,431, and 11,010 PDB entries whose domain interaction interfaces had been identified by the geometric distance, ASA, and Voronoi diagram methods, respectively. These interfaces are defined between protein domains from the SCOP database. Because it is built on SCOP, InterPare is tightly linked to the domain classification hierarchy, which makes searching and lookup convenient.
Availability and requirements
List of abbreviations used
PSIMAP: Protein Structural Interactome map
PDB: Protein Data Bank
SCOP: Structural Classification Of Proteins
FSSP: Fold classification based on Structure-Structure alignment of Proteins
We thank colleagues at Biomatics Lab in NGIC and KAIST. We also thank all the scientists in the field of protein-protein interaction. This project was supported by the Biogreen21 program of RDA, grant R01-2004-000-10172-0 of KOSEF, and grant M1040701000105N070100100 of MOST. JB is supported by a grant from the KRIBB Research Initiative Program. We especially thank Maryana Bhak for editing this manuscript.
- Bolser DM, Park J: Biological network evolution hypothesis applied to protein structural interactome. Genomics and Informatics 2003, 1: 7–19.
- Park J, Bolser D: Conservation of Protein Interaction Network in Evolution. Genome Informatics 2001, 12: 135–140.
- Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES: Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? Protein Science 2004, 13: 190–202.
- Kim WK, Bolser DM, Park JH: Large scale co-evolution analysis of Protein Structural Interlogues using the global Protein Structural Interactome Map (PSIMAP). Bioinformatics 2004, 20: 1138–1150.
- Bolser DM, Dafas P, Harrington R, Park J, Schroeder M: Visualisation and graph-theoretic analysis of a large-scale protein structural interactome. BMC Bioinformatics 2003, 4: 45.
- Park D, Lee S, Bolser D, Schroeder M, Lappe M, Oh D, Bhak J: Comparative interactomics analysis of protein family interaction networks using PSIMAP (protein structural interactome map). Bioinformatics 2005, 21: 3234–3240.
- Moon HS, Bhak J, Lee KH, Lee D: Architecture of basic building blocks in protein and domain structural interaction networks. Bioinformatics 2005, 21: 1479–1486.
- Kim HG, Park J, Han KS: Predicting Protein Interactions in Human by Homologous Interactions in Yeast. Lecture Notes in Computer Science 2003, 2637: 159–165.
- Chothia C, Janin J: Principles of protein-protein recognition. Nature 1975, 256: 705–708.
- McCoy AJ, Epa VC, Colman PM: Electrostatic Complementarity at Protein/Protein Interfaces. J Mol Biol 1997, 268: 570–584.
- Lawrence MC, Colman PM: Shape complementarity at protein/protein interfaces. J Mol Biol 1993, 234: 946–950.
- Argos P: An investigation of protein subunit and domain interfaces. Protein Eng 1988, 2: 101–113.
- Harder RL, Desmarais RN: Interpolation Using Surface Splines. J Aircraft 1972, 9: 189–191.
- Meinguet J: Multivariate Interpolation at Arbitrary Points Made Simple. J Appl Math Phys 1979, 30: 292–304.
- Jones S, Thornton JM: Analysis of Protein-protein interaction sites using surface patches. J Mol Biol 1997, 272: 121–132.
- Chakrabarti P, Janin J: Dissecting Protein-Protein Recognition Sites. Proteins 2002, 47: 334–343.
- Bahadur RP, Chakrabarti P, Rodier F, Janin J: Dissecting Subunit Interfaces in Homodimeric Proteins. Proteins 2003, 53: 708–719.
- Ban YEA, Edelsbrunner H, Rudolph J: Interface surfaces for protein-protein complexes. Proceedings of the Research in Computational Molecular Biology, San Diego 2004, 27–31.
- Poupon A: Voronoi and Voronoi-related tessellations in studies of protein structure and interaction. Curr Opin Struct Biol 2004, 14: 233–241.
- Richards FM: The interpretation of protein structures: total volume, group volume distributions and packing density. J Mol Biol 1974, 82: 1–14.
- Richards FM: Area, volumes, packing and protein structures. Ann Rev Biophys Bioeng 1977, 6: 151–176.
- Protein Data Bank [http://www.rcsb.org/pdb]
- Kim DS, Cho YS, Kim DG, Kim SS, Bhak J, Lee SH: Euclidean Voronoi Diagrams of 3D Spheres and Applications to Protein Structure Analysis. Japan Journal of Industrial and Applied Mathematics 2005, 22: 251–265.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242.
- Structural Classification Of Proteins [http://scop.mrc-lmb.cam.ac.uk/scop]
- Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540.
- Brenner SE, Koehl P, Levitt M: The ASTRAL compendium for sequence and structure analysis. Nucleic Acids Res 2000, 28: 254–256.
- Park J, Lappe M, Teichmann S: Mapping Protein Family Interactions: Intramolecular and Intermolecular Protein Family Interaction Repertoires in the PDB and Yeast. J Mol Biol 2001, 307: 929–938.
- Han KS, Park BK, Kim HG, Hong JS, Park J: HPID: The Human Protein Interaction Database. Bioinformatics 2004, 20: 2466–2470.
- Lappe M, Park J, Niggemann O, Holm L: Generating protein interaction maps from incomplete data: application to Fold assignment. Bioinformatics 2001, (Suppl 1):149–156.
- Gong SS, Yoon GS, Jang IS, Bolser DM, Dafas P, Schroeder M, Choi HS, Cho YB, Han KS, Lee SH, Choi HH, Lappe M, Holm L, Kim SS, Oh DH, Bhak JH: PSIbase: a database of Protein Structural Interactome map (PSIMAP). Bioinformatics 2005, 21: 2541–2543.
- Dafas P, Bolser DM, Gomoluch J, Park J, Schroeder M: Using convex hulls to extract interaction interfaces from known structures. Bioinformatics 2004, 20: 1486–1490.
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci 1996, 93: 13–20.
- Jones S, Marin A, Thornton JM: Protein domain interfaces: characterization and comparison with oligomeric protein interfaces. Protein Engineering 2000, 13: 77–82.
- Hubbard SJ, Thornton JM: NACCESS. Department of Biochemistry and Molecular Biology, University College, London; 1993.
- Lee B, Richards FM: The Interpretation of Protein Structures: Estimation of Static Accessibility. J Mol Biol 1971, 55: 379–400.
- Chothia C: The nature of the accessible and buried surfaces in proteins. J Mol Biol 1976, 105: 1–14.
- Miller S, Janin J, Lesk AM, Chothia C: Interior and surface of monomeric proteins. J Mol Biol 1987, 196: 641–656.
- Varshney A, Brooks F, Richardson D: Defining, Computing, and Visualizing Molecular Interfaces. IEEE Visualization 1995, 95: 36–43.
- Halperin D, Overmars MH: Spheres, Molecules, and Hidden Surface Removal. The Proceedings of the 10th Annual ACM Symposium of Computational Geometry 1994, 113–122.
- Dwyer RA: Higher-Dimensional Voronoi Diagrams in Linear Expected Time. The Proceedings of the 5th Annual ACM Symposium on Computational Geometry 1989, 326–333.
- Teichmann SA, Chothia C, Church GM, Park JH: Fast assignment of protein structures to sequences using the intermediate sequence library PDB-ISL. Bioinformatics 2000, 16: 117–124.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389–3402.
- Holm L, Sander C: Mapping the protein universe. Science 1996, 273: 595–602.
- Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH – A Hierarchic Classification of Protein Domain Structures. Structure 1997, 5: 1093–1108.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.