Oligomeric protein structure networks: insights into protein-protein interactions
BMC Bioinformatics volume 6, Article number: 296 (2005)
Protein-protein association is essential for a variety of cellular processes, and hence a large number of investigations are being carried out to understand the principles of protein-protein interactions. In this study, oligomeric protein structures are viewed from a network perspective to obtain new insights into protein association. Structure graphs of proteins have been constructed from a non-redundant set of protein oligomer crystal structures by considering amino acid residues as nodes and defining edges on the basis of the strength of the non-covalent interactions between the residues. The analysis of such networks has been carried out in terms of amino acid clusters and hubs (highly connected residues), with special emphasis on protein interfaces.
A variety of interactions that occur at the interfaces, such as hydrogen bonds, salt bridges, and aromatic and hydrophobic interactions, are identified in this study in a consolidated manner as amino acid clusters at the interface. Moreover, the characterization of the highly connected hub-forming residues at the interfaces, and their comparison with the hubs from the non-interface regions and with the non-hubs in the interface regions, shows that there is a predominance of charged interactions at the interfaces. Further, strong and weak interfaces are identified on the basis of the interaction strength between amino acid residues and the sizes of the interface clusters, which also show that many protein interfaces are stronger than their monomeric protein cores. The interface strengths evaluated from the interface clusters and hubs also correlate well with experimentally determined dissociation constants for known complexes. Finally, the interface hubs identified using the present method correlate very well with experimentally determined hot spots in the interfaces of protein complexes obtained from the Alanine Scanning Energetics database (ASEdb). A few predictions of interface hot spots have also been made based on the results of this analysis; these await experimental verification.
The construction and analysis of oligomeric protein structure networks and their comparison with monomeric protein structure networks provide insights into protein association. Further, the interface hubs identified using the present method can be effective targets for interface de-stabilizing mutations. We believe this analysis will significantly enhance our knowledge of the principles behind protein association and also aid in protein design.
It is well known that a vast majority of cellular functions are mediated through protein-protein and protein-DNA interactions. Protein association is implicated in cellular signal transduction, in antigen-antibody binding, in the regulation of gene expression and in the functioning of a huge variety of other constitutive multimers, where the multimeric state is the biologically active state. Hence, extensive research has been carried out to identify and understand the underlying principles of protein association and interactions. Some insights into such interactions at the atomic level have emerged from the analysis of a large number of high-resolution crystal structures. Such investigations involve the characterization of the geometrical, chemical and energetic features of the interfaces, as explained in various reviews [1–6]. Specific studies include obtaining residue preferences at the interfaces, calculations of geometric parameters and shape complementarities between the interacting protein chains [8–11], calculations of the loss in accessible surface upon multimerization [12–15], elucidation of the role of hydrogen bonds, salt bridges and hydrophobic and polar interactions at protein interfaces [16–21], and the analysis of conservation of residues at protein interfaces [22–26]. Various investigators have identified and analyzed energetic hot spots in protein interfaces using varied approaches [26–29]. Haliloglu et al. have compared protein folding and protein binding using vibrational motions of interface hot spots and conserved residues, and conclude that both processes involve similar packing of amino acid residues. They also provide a method for identifying hot spots at binding interfaces. Further, Ofran and Rost have classified and analyzed the differences between six interface types, including obligatory and transient homo- and hetero-oligomers.
De et al. have also distinguished obligatory and non-obligatory interfaces using differences in the amino acid contacts and interaction patterns between the two interface types. Bahadur et al. have distinguished biological oligomers from non-specific oligomers arising from crystal packing. There has also been speculation about whether folding and binding are completely decoupled from each other or whether they occur simultaneously, one coupled with the other. Wolynes and co-workers show through simulations that even if the monomers involved in binding are stable separately, binding might preferably occur through unfolded intermediates, thus implying that folding and binding may be coupled in vivo and driven by the native state topology of the functional protein. Further, a community-wide evaluation of the significance and success of different methods used in the prediction of protein-protein interactions and protein docking (CAPRI) has been carried out and has been hugely successful. However, though there have been significant advances in methods of protein docking, the methods generally used to identify binding sites on monomer surfaces and to predict protein-protein interaction sites are far from satisfactory. Hence, newer approaches are required to gain more insight into the factors contributing to protein-protein interactions.
We have earlier carried out an analysis on a limited set of twenty homodimers to understand the principles of protein-protein interactions from a graph perspective. This analysis was directed towards identifying clusters of amino acid residues with strong interactions at the protein interfaces, the nature of the residues involved in these interface clusters, and the accessibility and conservation of these interface cluster forming residues. We had also proposed a simple and straightforward method to identify interacting surfaces on protein monomers, which was highly successful on that dataset. The present study focuses on the network of amino acid interactions across protein interfaces and has been carried out on a larger dataset of protein homo- as well as hetero-multimers. Recently, Del Sol and co-workers have investigated protein-protein complexes from the small-world network perspective using parameters like clustering coefficients and betweenness, where the central residues identified at the interfaces have been found to correlate with the experimentally determined hot spots. Further, the same group also proposes the rewiring of the small-world networks at protein interfaces to form clusters of central residues at the interfaces. The current analysis also considers the protein structure in its multimeric form as a network of non-covalently interacting amino acids. However, we use a different definition of nodes and edges from the one used by Del Sol and co-workers [37, 38], and have also incorporated an interaction strength term in the network construction and in the analysis of different parameters to understand the network topology of protein multimers. Since protein-protein interactions are mainly mediated through non-covalent interactions, the connections (edges) between amino acids (nodes) are defined on the basis of the strength of the non-covalent interactions, as evaluated from the normalized number of contacts between them.
The results are analyzed in terms of network properties such as the hubs (nodes with a greater number of edges) and clusters of amino acid residues in the protein complex at a given interaction strength, with particular focus on the protein-protein interface. Such an approach gives a global perspective of the interactions across the interface, which is difficult to obtain from pair-wise interaction analysis or from analysis of the loss of accessible surface area. For example, our earlier analysis of the clusters of interacting residues at the protein interface has given insights into the sequence signatures responsible for the different types of quaternary association in legume lectins and has also helped in the identification of hot spots in the α-α dimeric interface of Escherichia coli RNA polymerase. The network representation presented here has also been used earlier to identify structural domains and domain interface residues in multi-domain proteins using a graph spectral method. However, in this analysis, we focus on the identification and analysis of amino acid clusters and hubs at protein-protein interfaces based on a generic network approach.
Interesting observations made from the present analysis on protein multimers include the fact that the strength of interfaces evaluated using the interface clusters and hubs identified by present method correlate well with the kinetic and thermodynamic parameters of complex formation evaluated experimentally. Further, the interface hubs identified here also correlate well with the experimentally identified hot spots on the basis of binding free energy. This result indicates that hotspots can be associated with interface hubs, the identification of which can be useful in rationally designing interface de-stabilizing mutants. Further, a comparison of the interface hubs to the hubs within the protein monomer and with the non-hubs at the interface show significant differences in the interface hub properties, such as the contribution of the charged interactions being considerably higher at the interfaces. The analysis of the interface clusters has also shown that the protein interfaces are as strong as or stronger than the protein cores in more than half the protein complexes considered in the dataset. Thus, the present algorithm has given a new perspective into analyzing protein structures in general and protein complexes in specific, which has shed light onto some of the factors involved in protein association.
Results and discussion
The concept of networks in biology has been explored in areas such as protein interaction networks and metabolic networks. The idea of considering protein structures as a network of amino acid connections is relatively new and has provided insights into protein structure, stability and folding. For instance, Vendruscolo et al. and Dokholyan et al. [43–45] have used a similar approach to understand protein folding, whereas Atilgan et al. and Greene and Higman have represented protein structures as amino acid networks to analyze residue fluctuations and the stability of protein structures. Del Sol and O'Meara have analyzed protein complexes as small-world networks in which the central residues in the interfaces correlate with experimental hot spots [37, 38]. We have previously used a similar network representation to understand the factors affecting protein stability, where the amino acid residues are the nodes in the protein structure network and the strength of the non-covalent interactions between them is used as the edge-determining criterion. In the present work, this approach has been extended from protein tertiary structures to protein quaternary structures so as to understand the factors responsible for protein association. We have extracted the interface cluster (a set of connected residues) and hub (a highly connected residue) information from the network representation of protein multimers as explained in the methods section. This has given insights into the role of specific amino acid residues in stabilizing inter-subunit interfaces. The hubs in many real-world networks are known to provide robustness against random attack. However, targeted attacks on these hubs are known to destabilize the networks. In the multimeric protein structure networks, the interface hubs can be considered as the centers providing stability to these networks due to their extensive interactions and their presence at the oligomeric interface.
Hence, the mutation of a hub can lead to the destabilization of the interface. Therefore, the hubs can be identified as hot spots at protein interfaces that can be targeted for interface de-stabilizing mutations.
A non-redundant set of 455 protein oligomers is used in this study. The oligomeric protein structures as a whole are represented as graphs, with each amino acid as a node and the strength of the non-covalent interactions between residues (I, evaluated as given in the methods section) determining the edges. Amino acid pairs with an interaction strength greater than a user-defined cutoff (Imin) are connected by edges. Such graphs, generated at various Imin values, are analyzed in this section to understand the details of protein-protein interfaces at the network level. Specifically, (1) the interface clusters (defined as distinct clusters of amino acid residues with contributions from more than one chain of the protein oligomer) and the interface hubs (defined as amino acid residues interacting with five or more residues, at least one of which belongs to a different chain) are analyzed. (2) The strength of the interface interaction, as measured from the clusters and hubs identified at different Imin values, is compared with the experimentally determined dissociation constants for known complexes. Finally, (3) the relevance of interface hubs to the stability of the oligomer is pointed out by comparing some of the identified interface hubs with experimental results.
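The definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the interaction strengths are taken as given inputs (their evaluation is described in the methods section), residues are represented as hypothetical `(chain, residue_number)` tuples, and the function names and toy values are assumptions made for the example.

```python
from collections import defaultdict

def build_graph(strengths, imin):
    """Adjacency map with an edge wherever interaction strength > imin.

    strengths maps a pair of residues to an interaction strength in percent;
    each residue is a (chain, residue_number) tuple (an assumed encoding).
    """
    adj = defaultdict(set)
    for (a, b), strength in strengths.items():
        if strength > imin:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def clusters(adj):
    """Connected components of the graph, found by depth-first search."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        found.append(members)
    return found

def interface_clusters(adj):
    """Clusters with residues contributed by more than one chain."""
    return [c for c in clusters(adj) if len({chain for chain, _ in c}) > 1]

def interface_hubs(adj, min_degree=5):
    """Residues with >= min_degree partners, at least one on another chain."""
    return {res for res, nbrs in adj.items()
            if len(nbrs) >= min_degree and any(n[0] != res[0] for n in nbrs)}

# Toy strengths (invented values, not taken from any structure).
toy = {
    (("A", 1), ("A", 2)): 8.0,
    (("A", 1), ("A", 3)): 7.0,
    (("A", 1), ("A", 4)): 6.5,
    (("A", 1), ("A", 5)): 9.0,
    (("A", 1), ("B", 1)): 7.5,
    (("B", 1), ("B", 2)): 3.0,
    (("A", 6), ("A", 7)): 10.0,
}
graph = build_graph(toy, imin=6.0)
print(interface_clusters(graph))
print(interface_hubs(graph))
```

In this toy case residue 1 of chain A has five partners at Imin = 6%, one of them on chain B, so it qualifies as an interface hub, while the A6-A7 pair forms a cluster that lies within a single chain.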
Analysis of interface clusters
Correlation of interface clusters with loss of accessible surface area and composition of interface clusters
Interface clusters have been identified and analyzed for the loss of accessible surface area, the interface cluster composition and the strength of the interface clusters based on Imin and the number of residues participating in the interface clusters. The results of these investigations are summarized in two figures in the additional material (Additional file 1, Figures A1 and A2). The comparison of the residues that formed interface clusters at Imin = 6% with those that lost accessible surface area on oligomerization (δASA) showed a very good correlation (correlation coefficient = 0.83, Figure A1), indicating that the clusters identified at Imin = 6% are a good representation of the oligomeric interfaces. Hence, all generic cluster analyses are carried out at this Imin. The correlation decreases when Imin is either increased or decreased: higher Imin values give specific strong clusters that fail to represent the complete interface, and at lower Imin values the monomeric protein core also becomes a part of the interface cluster. The interface cluster composition at Imin = 6% also correlated very well with the residue composition obtained from δASA calculations (Figure A2), with a preference for residues like Arginine, Histidine, Tryptophan, Tyrosine and Phenylalanine, though other residues are not left out. Such preferences have also been observed in several earlier interface analyses [7, 15, 33, 36]. In addition, the present investigation has provided information regarding the size and strength of oligomeric protein interfaces through parameters such as the number of interface clusters, the number of residues constituting the interface clusters and the size of the largest interface cluster. This is discussed in detail in a later section, where experimental dissociation constants are compared with the amino acid cluster and hub results from our analysis.
Largest cluster analysis
The size of the largest cluster is one of the parameters generally used to analyze the behavior and properties of complex networks. Here we have identified the largest cluster and its size (in terms of number of residues) in the protein complexes of the present dataset at varying Imin values. A plot of the normalized size of the largest cluster (normalized with respect to the total number of residues in the complex) versus Imin is shown in Figure 1. Interestingly, all the protein multimers show a very similar profile of the largest cluster plot (Figure 1), with a transition around Imin = 4%. (Incidentally, such a profile was also observed in the case of monomers, with a transition around the same Imin value.) The largest cluster in the complex at a given Imin, however, may or may not include the interface region. At Imin = 0% the largest cluster includes the interface in all cases, since the whole protein exists as one big cluster at this Imin. As Imin is increased, the cluster size decreases and the largest cluster may lie within a monomer or at the interface. This feature depends on the specific nature of the multimer complex and can be used to evaluate the strength of the interface with respect to the core of the monomers. Since the clusters obtained at Imin = 6% are significantly strong, this Imin can be used to assess the strength of the interface using the following criterion: if the largest cluster at this Imin is found at the interface, then the interface is strong, in the sense that it is stronger than the monomeric core. Our analysis has shown that 291 protein multimers in a dataset of 455 proteins have such strong interfaces. The PDB list of these 291 proteins is given in Additional file 1, Table A1a. The rest of the dataset, which does not form such strong interfaces, is given in Additional file 1, Table A1b.
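The largest-cluster profile and the strong-interface criterion described above can be illustrated with a short sketch. It assumes precomputed interaction strengths keyed by pairs of hypothetical `(chain, residue_number)` tuples, ignores residues without any edge when sizing clusters, and treats a tie for the largest cluster as strong if any of the tied clusters spans more than one chain; these choices are assumptions of the example, not details given in the text.

```python
from collections import defaultdict

def clusters_at(strengths, imin):
    """Connected components of the graph whose edges have strength > imin."""
    adj = defaultdict(set)
    for (a, b), strength in strengths.items():
        if strength > imin:
            adj[a].add(b)
            adj[b].add(a)
    seen, found = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        found.append(members)
    return found

def largest_cluster_fraction(strengths, imin, n_residues):
    """Size of the largest cluster divided by the residues in the complex."""
    sizes = [len(c) for c in clusters_at(strengths, imin)]
    return max(sizes, default=0) / n_residues

def has_strong_interface(strengths, imin=6.0):
    """True if a largest cluster at this Imin contains more than one chain."""
    found = clusters_at(strengths, imin)
    if not found:
        return False
    top = max(len(c) for c in found)
    return any(len(c) == top and len({chain for chain, _ in c}) > 1
               for c in found)

# Toy dimer (invented values): a three-residue core in chain A and a
# four-residue cluster across the interface, joined by one weak contact.
toy = {
    (("A", 1), ("A", 2)): 9.0,
    (("A", 2), ("A", 3)): 9.0,
    (("A", 4), ("B", 1)): 7.0,
    (("B", 1), ("B", 2)): 7.0,
    (("A", 4), ("A", 5)): 7.0,
    (("A", 3), ("A", 4)): 2.0,
}
for imin in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(imin, largest_cluster_fraction(toy, imin, n_residues=7))
print(has_strong_interface(toy))
```

Scanning Imin in this way gives the kind of profile plotted in Figure 1 for a single complex; applying `has_strong_interface` at Imin = 6% corresponds to the classification criterion stated above.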
Identification of interface patches
Apart from the interface strengths, the present study involving interface cluster analysis also enables us to identify the number of interacting regions or patches that constitute the interface. For example, if a protein dimer shows two interface clusters at a higher Imin (6%), which do not merge at a lower Imin (4%), then it clearly indicates the presence of more than one patch in the interface. This is shown in Figures 2a and 2b, which show the interface clusters obtained in the protein dimer of Urate oxidase at Imin = 6% and Imin = 4% respectively. The dimer forms two distinct interface clusters that are strong and independent at both Imin = 6% and at Imin = 4% without merging at the lower Imin. This indicates the presence of two separate patches at the interface. Thus, the present method of interface cluster analysis can provide information regarding the size, strength and the constitution of interfaces involved in protein oligomers.
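The patch criterion described above (interface clusters at Imin = 6% that stay separate at Imin = 4%) can be sketched as follows. The sketch relies on the fact that every edge present at the higher cutoff is also present at the lower one, so each cluster found at 6% lies inside exactly one cluster at 4%. The residue encoding, function names and strength values are assumptions made for illustration.

```python
from collections import defaultdict

def components(strengths, imin):
    """Connected components of the graph whose edges have strength > imin."""
    adj = defaultdict(set)
    for (a, b), strength in strengths.items():
        if strength > imin:
            adj[a].add(b)
            adj[b].add(a)
    seen, found = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        found.append(members)
    return found

def count_interface_patches(strengths, high=6.0, low=4.0):
    """Count interface clusters at `high` that remain unmerged at `low`."""
    high_interface = [c for c in components(strengths, high)
                      if len({chain for chain, _ in c}) > 1]
    low_components = components(strengths, low)
    merged_into = set()
    for cluster in high_interface:
        representative = next(iter(cluster))
        for index, component in enumerate(low_components):
            if representative in component:
                merged_into.add(index)
                break
    return len(merged_into)

# Two interface clusters joined only by a contact weaker than 4% (toy values).
two_patches = {
    (("A", 1), ("B", 1)): 8.0,
    (("A", 1), ("A", 2)): 7.0,
    (("A", 10), ("B", 10)): 9.0,
    (("B", 10), ("B", 11)): 7.0,
    (("A", 2), ("A", 10)): 3.0,
}
# Same clusters, but the linking contact is strong enough to merge them at 4%.
one_patch = dict(two_patches)
one_patch[(("A", 2), ("A", 10))] = 5.0

print(count_interface_patches(two_patches))
print(count_interface_patches(one_patch))
```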
Analysis of interface hubs
Hub composition at interfaces
The interface hubs are defined as those residues, which interact with five or more residues, out of which at least one residue belongs to the other monomer. Unlike interface cluster analyses (carried out mainly at Imin = 6%), analyses of interface hubs are carried out from Imin = 0% to Imin = 4%, because, beyond Imin = 4%, we do not get significant number of interface hubs for statistically significant results. The residue composition of the interface hubs, identified at Imin = 0% and Imin = 4% are presented in Figure 3. The hub composition of the non-interface regions (i.e., the other regions of the protein multimer devoid of the interface) at these Imins is also presented in the figure so as to compare the residue composition of the hubs in the interfaces and the non-interface regions. The values presented in the figure are percentage compositions with respect to the residue composition in the complete dataset.
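One way to read the normalization used for Figure 3 is, for each residue type, the number of interface hubs of that type expressed as a percentage of all residues of that type in the dataset. The text does not spell out the exact formula, so the sketch below is an assumed interpretation, with hypothetical function names and invented counts.

```python
from collections import Counter

def hub_percentages(hub_types, dataset_types):
    """Percentage of residues of each type that occur as hubs.

    hub_types: residue type of every hub found (one entry per hub).
    dataset_types: residue type of every residue in the dataset.
    Assumed normalization: 100 * hubs of type X / residues of type X.
    """
    hub_counts = Counter(hub_types)
    totals = Counter(dataset_types)
    return {res_type: 100.0 * hub_counts[res_type] / total
            for res_type, total in totals.items()}

# Invented toy counts, for illustration only.
hubs = ["ARG", "ARG", "TYR"]
dataset = ["ARG"] * 4 + ["TYR"] * 2 + ["LEU"] * 4
print(hub_percentages(hubs, dataset))
```

Running the same function on the interface hubs and on the non-interface hubs, at each Imin, would give the two profiles that are compared in the figure.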
It is evident from Figure 3 that Arginine, Tryptophan, Tyrosine, Phenylalanine, Histidine and Methionine are highly preferred as hubs at the protein interfaces at both higher and lower Imin values, making them the strong interface hubs. The interface hub preferences of the hydrophobic residues Leucine, Isoleucine and Valine are seen only at lower Imin values, making them the weak interface hubs. This overall profile is similar to the non-interface hub preference profile (Figure 3). However, there are some differences between the interface and non-interface hub preferences, as can be seen from Figure 3. First, the interface hub preferences for the hydrophobic and aromatic residues are much lower than those in the non-interface regions at the same Imin. Further, the interface hub preferences of the charged residues are comparable to their non-interface hub preferences, even though the non-interface regions are much larger than the interface regions. The differences between the interface and non-interface hub preferences become pronounced at higher Imin values, with Arginine and other charged amino acids predominating in the interface hubs, whereas the aromatic residues predominate in the non-interface hubs. The percentage of charged hubs is much higher in the interface regions than in the non-interface regions, and the percentage of aromatic and hydrophobic hubs is higher in the non-interface regions than in the interface regions. Further, Arginine makes a larger contribution at the interface at both high and low Imin values, ahead of the aromatic and hydrophobic amino acids (except for a slight preference for Tyrosine and Tryptophan over Arginine at Imin = 0%), unlike the non-interface hubs, where the hydrophobic residues, the aromatic residues or both are preferred ahead of Arginine at any Imin. This shows that the protein interfaces have major contributions from the charged amino acid residues.
The preference of Arginine and charged interactions at the protein interfaces has also been shown by a few previous analyses [7, 15, 17, 33, 36]. The present analysis also confirms this aspect with the charged interactions dominating the interfaces to a large extent.
Another important observation that can be made by comparing Figure A2 (see Additional file 1) and Figure 3 is that although the residues like Leucine, Isoleucine, Valine and Lysine are found significantly in the interface clusters even at higher Imins, they are not preferred as interface hubs at these Imins. Hence, there is marked difference in the residue preferences in interface clusters and the hub preferences at the interfaces.
Preferences of hub-forming residues to interact with other residue types
We have already seen the differences between the hub preferences in the interface and non-interface regions from Figure 3. It is also of interest to identify the preferences in the amino acid interactions that can lead to the formation of strong hubs at protein interfaces. Although the interaction preferences at protein interfaces have been studied earlier, we present them here from the hub perspective. The 20 × 20 matrix giving the preference of each of the 20 amino acid hubs to interact with its own residue type and with the other 19 residue types in the interface regions (normalized percentages) at Imin = 4% is presented in Table 1. The percentage of hubs of each residue type is also given. The obvious interaction preferences are those between the positively charged Arginine and the negatively charged Aspartate and Glutamate, those of the aromatic residues for other aromatic and hydrophobic residues, and those of the hydrophobic residues for other hydrophobic and aromatic residues. However, there are some interesting preferences apart from the commonly seen salt bridges, hydrogen bonds and aromatic stacking interactions. The most significant is the preference of Arginine hubs to interact with other Arginine residues in spite of their positive charge. Similarly, preferences of Histidine hubs for Asparagine, Proline hubs for Phenylalanine and Tyrosine, Leucine hubs for Arginine, and Tyrosine hubs for Arginine and other charged residues are also noticed. There seems to be a preference for charged and polar interactions and for those involving planar, charge-delocalized systems. A few examples are presented in detail in the next section.
It is to be noted that a similar 20 × 20 matrix for the non-interface hubs at Imin = 4%, shows a different profile (see Additional file 1, Table A2), where the Arg-Arg, His-Asn, Leu-Arg, Tyr-charged and Tyr-polar interaction preferences are much lower than what is observed for the interface hubs shown in Table 1. In the non-interface hubs, the Tyr-Aromatic and Tyr-Hydrophobic interactions are more preferred than Tyr-charged or Tyr-polar interactions. Similarly, Arg-Aromatic interactions are also more preferred than Arg-Arg interactions and Leu-Leu and Leu-Phe are more preferred than Leu-Arg in case of the non-interface hubs.
Interactions of interface hubs
We have seen from Figure 3 and Table 1 that Arginine, Histidine and Tyrosine form some of the important hubs in the protein interfaces, with some interesting interacting partners. We discuss the interactions of some of these interface hubs in this section.
(a) Arginine hubs
Arginine has been shown to play a major role at protein interfaces [15, 17, 33, 36]. In the present analysis, we find that there is a preference for Arginine in the interface clusters and in the interface hubs in comparison to the other amino acid residues. We also find that the interface Arginine hubs interact significantly with other Arginine residues and with aromatic residues from the same chain and from other chains, apart from the salt-bridge interactions in which they are most commonly involved. Figure 4a shows some of the details of the interactions made by the interface Arginine hubs, which form a large interface cluster in the 5-aminolevulinic acid dehydratase tetramer. Here, there are three Arginine hubs (Arg 17 C, Arg 14 C and Arg 186 B), and one of them (Arg 17 C) interacts simultaneously with four other Arginine residues (Arg 14 C, Arg 20 C, Arg 186 B and Arg 198 B) coming from two different chains. Moreover, the Arginine residues are also found to form stacking interactions with the π system of the aromatic Tyrosine residue and hydrogen bonds with Threonine, Serine and Glutamine side chains. Further, negatively charged Aspartate and Glutamate residues, together with polar Glutamine residues, are generously spread over this Arginine cluster; the acidic residues neutralize the positive charges of the Arginine residues. Arg-Arg stacking can also be seen, along with hydrogen bonds involving the backbone oxygen of one Arginine and the backbone or side chain nitrogens of other Arginines. Investigations carried out on many other interface Arginine hubs showed that Arg-Arg contacts can occur through a variety of interactions, including planar stacking of the guanidine groups, hydrogen bonding between guanidine groups or between a guanidine group and main chain atoms, and C-H···O hydrogen bonding of a backbone oxygen with the Cβ, Cγ and Cδ of the Arginine side chain.
One of the notable factors is that the Arginine hubs are invariably neutralized by the presence of negatively charged Glutamate and Aspartate side chains in and around the hub (need not necessarily form direct salt bridges), which have an overall neutralizing effect on the local environment. Thus, the versatile Arginine side-chain has been found to make extensive interactions stabilizing the oligomeric protein interfaces.
(b) Tyrosine hubs
One of the significant contributions to the interface hubs comes from the Tyrosine hubs, which make extensive interactions with charged and polar residues like Arginine, Aspartate, Asparagine, Glutamate and Glutamine, apart from the expected interactions with other aromatic residues and with other Tyrosines, as can be seen from Table 1. The interactions of Tyrosine with charged and polar residues are generally due to hydrogen bonding or cation-π interactions. Figure 4b shows an example of an interface Tyrosine hub (Tyr 275) making different kinds of interactions at Imin = 4%, including a short hydrogen bond involving the hydroxyl group (with Arg 282, donor-acceptor distance = 2.52 Å). (Tyrosine is also known to contribute to the stability of protein tertiary structure by means of short hydrogen bonds.) This Tyrosine residue also interacts with a Serine (279), a Valine (141), a Glutamine (278) and an Asparagine (114), with the Asparagine and Valine coming from the other chain. Thus, we find that the Tyrosine residue is also versatile in its interactions due to its planar delocalized side chain and its hydroxyl group.
Statistics of hub versus non-hub interactions at the interface
The pair-wise residue interactions across the interface can be grouped into three categories with respect to the hub status of the interacting residues: (a) Hub-Hub, (b) Hub-Nonhub and (c) Nonhub-Nonhub interactions. The percentages of charged and hydrophobic interactions in these categories at Imin = 0% and Imin = 4% are given in Table 2. It can be seen from the table that charged interactions dominate the hub-hub, hub-nonhub and nonhub-nonhub interactions at Imin = 4%, with a very high percentage in the hub-hub interactions. At Imin = 0%, however, the charged interactions still dominate the nonhub-nonhub interactions, whereas the hydrophobic interactions dominate the hub-hub and hub-nonhub interactions, with a very high percentage in the hub-hub interactions. Therefore, when Imin is varied, the profile changes dramatically for the interactions involving hubs (hub-hub and hub-nonhub), whereas there is no change in the overall profile of the nonhub-nonhub interactions. It is evident from Table 2 that the charged and hydrophobic interactions undergo a clear role reversal in the hub-hub interactions between Imin = 0% and Imin = 4%. This is also consistent with the residue preferences in the interface hubs shown in Figure 3, where the preferences change from aromatic/hydrophobic to charged/aromatic when Imin is increased from 0% to 4%. Further, as we move from nonhub-nonhub to hub-hub interactions at Imin = 4%, the charged+polar interactions (including salt bridges) as well as the aromatic-aromatic interactions increase, whereas the hydrophobic interactions decrease. At Imin = 0%, by contrast, the same comparison shows an increase in hydrophobic and aromatic interactions and a decrease in charged+polar interactions and salt bridges. In all cases, the hub-nonhub interactions are intermediate between the hub-hub and nonhub-nonhub interactions.
These statistics clearly show a distinct profile for the interactions involving the hub residues when compared to those of the non-hub residues at the oligomeric protein interfaces.
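The three-way categorization of cross-interface pairs can be sketched as below. The residue classes and the rule for calling a pair "charged" or "hydrophobic" (both partners in the class) are assumptions of this example, since the text does not define them here; residues are hypothetical `(chain, residue_number, residue_type)` tuples and all data are invented.

```python
from collections import Counter, defaultdict

# Assumed residue classes; the source does not list its classification.
CHARGED = {"ARG", "LYS", "ASP", "GLU"}
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET"}

def pair_category(a, b, hubs):
    """Label a pair by how many of its two residues are hubs."""
    n_hubs = (a in hubs) + (b in hubs)
    return ("nonhub-nonhub", "hub-nonhub", "hub-hub")[n_hubs]

def interaction_type(a, b):
    """Assumed rule: a pair is typed only when both residues share a class."""
    type_a, type_b = a[2], b[2]
    if type_a in CHARGED and type_b in CHARGED:
        return "charged"
    if type_a in HYDROPHOBIC and type_b in HYDROPHOBIC:
        return "hydrophobic"
    return "other"

def interface_pair_stats(pairs, hubs):
    """Percentage of each interaction type within each hub-status category."""
    counts = defaultdict(Counter)
    for a, b in pairs:
        if a[0] == b[0]:
            continue  # keep only pairs that cross the interface
        counts[pair_category(a, b, hubs)][interaction_type(a, b)] += 1
    return {category: {kind: 100.0 * n / sum(kinds.values())
                       for kind, n in kinds.items()}
            for category, kinds in counts.items()}

# Invented toy data.
arg, glu = ("A", 1, "ARG"), ("B", 1, "GLU")
asp, leu, val = ("B", 2, "ASP"), ("A", 3, "LEU"), ("B", 4, "VAL")
hubs = {arg, glu}
pairs = [(arg, glu), (arg, asp), (leu, val), (leu, asp), (arg, leu)]
print(interface_pair_stats(pairs, hubs))
```

Computing these tables from the graphs built at Imin = 0% and Imin = 4% would give the two sets of percentages that are contrasted in Table 2.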
Correlation with experiments
Correlation of interface clusters and hubs with dissociation constants
We have considered eight protein-protein complexes with known dissociation constants [50–52] and analyzed their interface cluster and hub parameters so as to correlate our results with experimentally available results on interface strength. These complexes have mainly been taken from an earlier study in which a similar comparison of generic interface parameters with dissociation constants was carried out. Table 3 summarizes the results of the present interface strength analysis. The number of interface hubs (with the cutoff relaxed to nodes with ≥ 4 edges so as to obtain a statistically significant number for analysis), the size of the largest interface cluster, the number of interface clusters (Nic) and the total number of residues in these interface clusters (Nires) at different Imin values are given in the table for the chosen complexes, along with the experimentally determined dissociation constants (Kd). The complexes with μM Kd are the weaker complexes and the ones with nM Kd are the stronger ones. In general, we find that the number of interface hubs, the number of interface clusters, the number of interface cluster residues and the largest interface cluster size are all higher for the nM Kd complexes than for the μM Kd complexes at all Imin values. This indicates that the interface clusters and hubs identified, and the Imin values used in the present method, are good indicators of the strengths of oligomeric protein interfaces.
Correlation of interface hubs with ΔΔG
Experimental results are available on the stability of interface mutants for some protein complexes. These have been comprehensively presented in the Alanine Scanning Energetics database, ASEdb [53, 60]. Here, we have compared our results with those from ASEdb. We have selected those complexes from ASEdb for which the dimeric structures are available, since the dimeric structure is a prerequisite for the present analysis. There are 15 such complexes in ASEdb, as listed in Table 4. We then obtain the interface hubs in these complexes at different Imins and compare them with the ΔΔG (the difference between the free energies of the mutant and the wild-type) of the specific mutants given in ASEdb. As in the previous section, we have relaxed the hub detection criterion to nodes with 4 or more edges so as to obtain statistically significant results for comparison with the experimental results. Interface hubs are identified at Imin values of 0%, 2% and 4% and are characterized by the highest Imin value at which they appear as interface hubs (since a residue that is a hub at a particular Imin value remains a hub at all lower Imin values). Since there are very few interface hubs at Imin > 4%, we have not considered these hubs separately in this analysis.
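The bookkeeping described above, in which each residue is labelled by the highest Imin at which it is still an interface hub, can be sketched in Python as follows. This is a minimal illustration rather than the program used in the study; the function name `highest_hub_imin`, the input layout and the residue labels are hypothetical.

```python
def highest_hub_imin(hubs_by_imin):
    """Map each residue to the highest Imin at which it is an interface hub.

    hubs_by_imin: dict {imin_percent: set of residue labels that are
    interface hubs at that Imin} (hypothetical input layout).
    """
    highest = {}
    # Visit the cutoffs in increasing order so that later (higher) Imin
    # values overwrite earlier ones for residues that remain hubs.
    for imin in sorted(hubs_by_imin):
        for residue in hubs_by_imin[imin]:
            highest[residue] = imin
    return highest
```

Residues that never appear in any of the sets are simply absent from the result and can be treated as non-hubs.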
Figure 5 summarizes the overall results of this analysis pertaining to the 15 complexes. The mutation results are categorized as those with ΔΔG in the ranges of <1, 1–2, 2–3, 3–4 and ≥ 4 kcal/mol. The frequency distribution of the mutated residues, according to their hub character is presented for different ΔΔG values in Figure 5. We find that a majority of the mutations with ΔΔG < 1 kcal/mol are not hubs at any Imin whereas, most of the mutations with ΔΔG ≥ 4 kcal/mol are interface hubs at Imin = 4% (though there are some hubs at Imin = 2% and Imin = 0%). The fraction of hubs at Imin = 4% with ΔΔG < 1 kcal/mol is insignificant and there is no mutation with ΔΔG ≥ 4 kcal/mol, which is not a hub. Mutations in the 1–2, 2–3 and 3–4 kcal/mol ranges, do show a combination of hub characters, which are however consistent over the range. In general, we find that the interface hubs obtained at higher Imin values (Imin = 4%) have higher ΔΔG values than the hubs obtained at lower Imin values (Imin = 0%) and the residues that do not form hubs at all. Hence, the interface hubs identified using the present method correlate well with the experimentally obtained ΔΔG values of the interface hot spots.
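The cross-tabulation behind such a frequency distribution can be sketched as below. This is an illustrative reconstruction only: the function names, the input dictionaries and the treatment of bin boundaries (a value of exactly 1 kcal/mol is placed in the 1–2 bin, and so on) are assumptions, since the text does not specify them.

```python
from collections import Counter

def ddg_bin(ddg):
    """Assign a ddG value (kcal/mol) to one of the five ranges used in the text."""
    if ddg < 1:
        return "<1"
    if ddg >= 4:
        return ">=4"
    lower = int(ddg)  # 1, 2 or 3 for values in [1, 4)
    return f"{lower}-{lower + 1}"

def tabulate_hub_character(ddg_by_residue, highest_imin):
    """Count mutated residues per (ddG range, hub class).

    ddg_by_residue: {residue label: ddG}; highest_imin: {residue label:
    highest Imin at which the residue is an interface hub}. Residues
    missing from highest_imin are counted as "non-hub".
    """
    counts = Counter()
    for residue, ddg in ddg_by_residue.items():
        hub_class = highest_imin.get(residue, "non-hub")
        counts[(ddg_bin(ddg), hub_class)] += 1
    return counts
```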
Out of the already mutated residues, ten are found to be hubs even at Imin = 6%, out of which, five have ΔΔG ≥ 4 kcal/mol and the other five have ΔΔG varying between 1 and 4 kcal/mol. None of these have ΔΔG < 1 kcal/mol. One of the residues in the Trypsin-BPTI complex (Lysine 15) is known to have a ΔΔG ≈ 10 kcal/mol  and this residue remains a hub even at Imin = 8%. This happens to be the only residue with such a high ΔΔG value and also the only one to remain a hub even at Imin = 8%.
Surprisingly, a large number of the interface hubs identified by the present method in these complexes, have not been mutated (not shown in figure). These include quite a few strong hubs identified at Imin = 4% (84 in number). These have been listed in Table 4 and are potential hot spots in these protein complexes, which can be mutated to destabilize the protein interface. It would be interesting to verify these predictions experimentally, which would then establish this as a rational method for the design of mutants that disrupt the protein-protein interfaces.
The oligomeric protein structures have been represented as networks, with amino acid residues as nodes and the edges have been constructed on the basis of non-covalent interaction strength (ranging from a cutoff of 0% to 6%) between amino acids. The analysis is focused on characterizing the interface clusters and hubs.
The interfaces have been characterized as strong, if the largest cluster in the protein appears at the interface at high (6%) interaction strength. Interestingly more than 50% of the complexes in the dataset exhibit such strong interfaces. The interface clusters identified and their amino acid composition correlate with those identified from previous studies as well as from δASA calculations.
The composition and the connections of the highly connected interface hubs have been evaluated at varying interaction strengths and compared with those of the non-interface hubs. The interfaces show an increase in Arginine hubs and a decrease in hydrophobic hubs when compared to the non-interface hubs. The hydrophobic residues, though present in the interface clusters, do not contribute to the interface hubs. Further, the interface hubs make the usual interactions such as salt bridges, stacking interactions and hydrogen bonds as well as unusual interactions such as Arginine-Arginine interactions. The hub and non-hub interactions at the interfaces also show specific profiles: the hub interactions are dominated by hydrophobic interactions at lower interaction cutoffs and by charged interactions at higher interaction cutoffs, whereas the non-hub interactions are dominated by charged interactions at all cutoffs. More importantly, the cluster and hub identification procedure picks up all types of interactions in a consolidated way, giving a global view of the interactions at the interface.
The interface clusters and hubs identified correlate well with the experimentally determined dissociation constants for known complexes indicating that we have a robust method of identifying the strength of oligomeric protein interfaces. Finally, the hubs at high interaction strength have been identified as hotspots by comparing the ΔΔG values from alanine scanning mutagenesis experiments. Several strong hubs that have not been mutated have been predicted to be hotspots and await confirmation from future experiments.
Materials and methods
The dataset consists of a non-redundant set of 455 protein multimer structures with resolution better than 2 Å, obtained from the Protein Data Bank. The dataset list is provided in Table A1 in Additional file 1. The sequence identity of the selected proteins is less than 25%. In the cases where the full multimer coordinates were not provided, they were generated from the rotation matrices and translation vectors. The dataset includes dimers and multimers of all types such as homo, hetero, functional as well as crystallographic multimers. 44 of the 455 oligomers (<10%) are crystal dimers as obtained from the BIOLOGICAL_UNIT record of the PDB file and the protein quaternary structure server. These proteins are indicated in Table A1 (see Additional file 1). The size of the monomers varies from 50 to 1000 residues and that of the multimers varies from 100 to 2500 residues.
Accessible surface area
The loss of accessible surface area upon dimerization/multimerization was calculated from the residue-wise accessible surface area of the multimeric proteins and that of their respective monomers, which were obtained from NACCESS. The multimer values were normalized to those of the dimers. The residues that lose more than 1% of their accessible surface area upon dimerization were identified as those contributing to the interface from δASA calculations.
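The δASA criterion can be sketched as follows. This is a minimal illustration under stated assumptions: the loss is taken as a percentage of the residue's accessible surface area in the isolated monomer, the two inputs are plain dictionaries of per-residue values (parsing of NACCESS output is not shown), and the function name and the residue keys are hypothetical.

```python
def interface_residues_by_dasa(asa_monomer, asa_multimer, min_loss_percent=1.0):
    """Return residues whose accessible surface area drops on association.

    asa_monomer / asa_multimer: {residue key: ASA in square angstroms},
    computed for the isolated monomer and for the multimer respectively.
    """
    interface = []
    for residue, asa_free in asa_monomer.items():
        if asa_free <= 0.0:
            continue  # fully buried in the monomer: no surface to lose
        asa_bound = asa_multimer.get(residue, asa_free)
        # Assumed definition: loss relative to the monomer value.
        loss_percent = 100.0 * (asa_free - asa_bound) / asa_free
        if loss_percent > min_loss_percent:
            interface.append(residue)
    return interface
```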
Protein structures have been considered as a network of interactions amongst amino acid residues. Each residue in a protein complex is considered as a node in the graph and the connections between these nodes are the edges. A group of interconnected nodes is defined as a cluster and a cluster with at least one residue belonging to a different protein chain in the multimer is denoted as an interface cluster. Contact number is defined as the number of edges made by a node and those nodes with a contact number greater than 4 (unless otherwise specified), have been identified as hubs. A hub with at least one residue belonging to a different protein chain in the multimer is denoted as an interface hub.
Evaluation of non-covalent interaction
The non-covalent interactions between side chain atoms of amino acid residues (with the exception of Glycine, where the Cα atom is taken) are considered. The interactions between sequence neighbors, however, have been ignored. The interaction between two residues i and j has been quantified as defined by Kannan and Vishveshwara:
Iij = (nij/N) × 100
where nij is the number of atom pairs, one atom from the side chain of residue i and one from the side chain of residue j, that lie within a distance of 4.5 Å of each other, and N is the normalization value for the amino acid type, which has been evaluated previously from a non-redundant set of proteins and correlates with the size of the residue. Cluster identification requires a symmetric Iij for graph construction, so both normalization values (Ni and Nj) enter the evaluation. We tried several combinations of the two values, namely sqrt(Ni × Nj), (Ni + Nj)/2 and min(Ni, Nj). Since they give qualitatively very similar results, we use the lesser of the two values, min(Ni, Nj), for cluster identification. Hub detection carries no such symmetry constraint, and hence we use the normalization value Ni of the residue i whose hub character is being evaluated.
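A minimal Python sketch of this evaluation is given below. It is an illustration rather than the original program: the function names are hypothetical, side-chain coordinates are passed as plain (x, y, z) tuples, "within 4.5 Å" is read as a distance less than or equal to 4.5 Å, and the normalization values used in any call are placeholders, not the published ones.

```python
import math

def count_close_atom_pairs(coords_i, coords_j, cutoff=4.5):
    """Count side-chain atom pairs (one atom from each residue) within cutoff."""
    n_ij = 0
    for atom_i in coords_i:
        for atom_j in coords_j:
            if math.dist(atom_i, atom_j) <= cutoff:
                n_ij += 1
    return n_ij

def interaction_strength(n_ij, norm):
    """Iij = (nij / N) x 100, with N the chosen normalization value."""
    return 100.0 * n_ij / norm

def cluster_interaction(n_ij, norm_i, norm_j):
    """Symmetric Iij for cluster identification: N = min(Ni, Nj)."""
    return interaction_strength(n_ij, min(norm_i, norm_j))

def hub_interaction(n_ij, norm_i):
    """Iij for hub detection: N = Ni of the residue being evaluated."""
    return interaction_strength(n_ij, norm_i)
```

Because the hub version uses only Ni, the value obtained for residue i with respect to j can differ from that obtained for residue j with respect to i, whereas the cluster version is the same in both directions.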
Contact criterion on the basis of interaction strength
We choose an interaction cutoff, referred to as Imin, and any pair of non-sequential residues i and j whose Iij value is greater than the chosen Imin is connected by an edge in the graph. Such a graph is referred to as a protein structure graph for a given interaction strength Imin. The protein structure graphs are generated for all the multimers in the dataset using Imin values ranging from 0 to 10%. Physically, a higher Imin indicates strong interactions between the connected residues, while a lower Imin includes the weakly interacting residues as well. For instance, at Imin = 0% even a single atom-atom contact between the side chains of two residues is sufficient to connect them by an edge in the protein structure graph, and more contacts are required for connections at higher Imins. The interface clusters and hubs were identified and analyzed in these protein structure graphs at varying Imins. Finally, an Imin of 6% was chosen for the interface cluster analyses because it gives better correlation with the results from δASA, and Imin values of 0% to 4% were chosen for the interface hub analyses so as to obtain statistically significant numbers for analysis.
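The contact criterion can be sketched as follows, assuming the pairwise Iij values have already been computed. The function name, the dictionary layout and the convention that residues are indexed consecutively along each chain (so that sequence neighbors differ by one in index) are assumptions made for this illustration.

```python
def build_structure_graph(n_residues, interactions, chain_of, i_min):
    """Build a 0/1 adjacency matrix for a given interaction cutoff Imin.

    interactions: {(i, j): Iij in percent} for residue index pairs.
    chain_of: sequence giving the chain label of each residue index.
    """
    adjacency = [[0] * n_residues for _ in range(n_residues)]
    for (i, j), strength in interactions.items():
        same_chain = chain_of[i] == chain_of[j]
        # Skip self-pairs and sequence neighbors within the same chain.
        if i == j or (same_chain and abs(i - j) == 1):
            continue
        # Strict inequality: at Imin = 0 any non-zero Iij gives an edge.
        if strength > i_min:
            adjacency[i][j] = 1
            adjacency[j][i] = 1
    return adjacency
```

Calling the function repeatedly with increasing `i_min` on the same `interactions` dictionary yields the series of progressively sparser graphs described in the text.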
Cluster and hub analysis
The protein structure graphs have been represented as an adjacency matrix, which is an N × N matrix, where N is the number of residues in the protein structure. Each ijth element in the matrix is 1 if the two nodes (residues) are connected (interacting) at the chosen Imin and 0 otherwise. The diagonal elements are set to 0 since connections with self are avoided. The amino acid residues forming disjoint clusters (with a minimum of three residues in each) are identified from the adjacency matrix by using a standard graph algorithm, the depth first search (DFS) algorithm. This gives the clusters of all the interacting residues in the protein structure, from which the interface clusters are selected.
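A compact sketch of this step is shown below, using an iterative depth first search over the adjacency matrix. It is not the DFS program used in the study; the function names and the representation of chains as a list of labels indexed by residue are assumptions.

```python
def find_clusters(adjacency, min_size=3):
    """Return disjoint clusters (connected components) with >= min_size nodes."""
    n = len(adjacency)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        # Iterative depth first search from an unvisited node.
        stack = [start]
        seen[start] = True
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in range(n):
                if adjacency[node][neighbor] and not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters

def interface_clusters(clusters, chain_of):
    """Keep clusters whose residues come from more than one chain."""
    return [c for c in clusters if len({chain_of[r] for r in c}) > 1]
```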
Similarly, the residues with a contact number greater than 4 are detected as hubs, from which the interface hubs are identified. The hub definition is relaxed to a contact number equal to or greater than 4 when investigating the interface hubs of single multimeric complexes in detail, as given in Tables 3 and 4, in order to obtain statistically significant numbers for analysis. The interfacial hub preferences of amino acid residues and the preferences of the residues with which these hubs interact are obtained and compared with similar properties of the non-interface hubs and of the non-hubs at interfaces, identified from the same dataset.
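Hub detection can be sketched in the same spirit. The example assumes that the contacts of each residue have already been determined with the hub version of Iij and are supplied as a dictionary of neighbor sets; the function name and this input layout are hypothetical. The default `min_contacts=5` corresponds to a contact number greater than 4, and `min_contacts=4` to the relaxed definition.

```python
def find_hubs(neighbors, chain_of, min_contacts=5):
    """Return (hubs, interface_hubs) as sorted lists of residue indices.

    neighbors: {residue: set of residues it is connected to}.
    chain_of: {residue: chain label}.
    A hub is an interface hub if at least one of its contacts lies in a
    different chain.
    """
    hubs, interface_hubs = [], []
    for residue, partners in neighbors.items():
        if len(partners) < min_contacts:
            continue
        hubs.append(residue)
        if any(chain_of[p] != chain_of[residue] for p in partners):
            interface_hubs.append(residue)
    return sorted(hubs), sorted(interface_hubs)
```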
Size of the largest cluster
When analyzing complex networks, one of the most commonly used parameters is the size of the largest cluster. Here, we have used this parameter to analyze the structure networks of protein oligomers. The clusters in the protein oligomers are obtained using DFS at various Imins, and the size of the largest cluster, in terms of the number of residues constituting it, is recorded at each Imin. This size has been found to be a function of protein size, and hence it is normalized with respect to the protein size and plotted as a function of Imin. The largest cluster size decreases as Imin increases, and the largest cluster obtained at a higher Imin may or may not be present at the oligomeric interface. All the proteins in the dataset are analyzed to find out whether or not the largest cluster is at the interface at Imin = 6%. This provides an idea of the strength of the oligomeric interface with respect to its monomeric protein core.
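The two quantities used here, the normalized size of the largest cluster and whether that cluster spans the interface, can be sketched as follows for a single Imin. The function name and inputs are hypothetical, and ties between equally large clusters are resolved by taking the first one, which is an arbitrary choice not specified in the text.

```python
def largest_cluster_summary(clusters, chain_of, n_residues):
    """Return (normalized size of the largest cluster, is it at the interface?).

    clusters: list of lists of residue indices at one Imin.
    chain_of: sequence giving the chain label of each residue index.
    n_residues: protein size used for normalization.
    """
    if not clusters:
        return 0.0, False
    largest = max(clusters, key=len)  # first cluster wins a tie
    at_interface = len({chain_of[r] for r in largest}) > 1
    return len(largest) / n_residues, at_interface
```

Evaluating this function on the clusters obtained at each Imin gives the points of the normalized size versus Imin curve, and the second return value at Imin = 6% gives the strong-interface criterion described above.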
Janin J, Wodak SJ: Structural basis for macromolecular recognition. In Protein modules and protein-protein interactions. Advances in protein chemistry. Harcourt publishers Ltd; 2002.
Russell RB, Alber F, Aloy P, Davis FP, Korkin D, Pichaud M, Topf M, Sali A: A structural perspective on protein-protein interactions. Curr Opin Struct Biol 2004, 14: 313–324. 10.1016/j.sbi.2004.04.006
Valencia A, Pazos F: Computational methods for prediction of protein interactions. Curr Opin Struct Biol 2002, 12: 368–372. 10.1016/S0959-440X(02)00333-0
Jones S, Thornton JM: Analysis and classification of protein-protein interactions from a structural perspective. In Protein-Protein Recognition. Edited by: Kleanthous C. Oxford University Press, Oxford; 2000.
Janin J: Kinetics and thermodynamics of protein-protein interactions. In Protein-Protein Recognition. Edited by: Kleanthous C. Oxford University Press, Oxford; 2000.
Smith GR, Sternberg MJ: Prediction of protein-protein interactions by docking methods. Curr Opin Struct Biol 2002, 12: 28–35. 10.1016/S0959-440X(02)00285-3
Glaser F, Steinberg DM, Vakser IA, Ben-Tal N: Residue frequencies and pairing preferences at protein-protein interfaces. Proteins: Struct Funct Genet 2001, 43: 89–102. 10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H
Lawrence MC, Colman PM: Shape complementarity at protein/protein interfaces. J Mol Biol 1993, 234: 946–950. 10.1006/jmbi.1993.1648
Gabb HA, Jackson RM, Sternberg MJ: Modeling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol 1997, 272: 106–120. 10.1006/jmbi.1997.1203
Jones S, Thornton JM: Analysis of protein-protein interaction sites using surface patches. J Mol Biol 1997, 272: 121–132. 10.1006/jmbi.1997.1234
Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci USA 1996, 93: 13–20. 10.1073/pnas.93.1.13
Miller S, Lesk AM, Janin J, Chothia C: The accessible surface area and stability of oligomeric proteins. Nature 1987, 328: 834–836. 10.1038/328834a0
Lo Conte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999, 285: 2177–2198. 10.1006/jmbi.1998.2439
Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins: Struct Funct Genet 2002, 47(3):334–43. 10.1002/prot.10085
Bahadur RP, Chakrabarti P, Rodier F, Janin J: Dissecting subunit interfaces in homodimeric proteins. Proteins: Struct Funct Genet 2003, 53(3):708–19. 10.1002/prot.10461
Fernandez A, Scheraga HA: Insufficiently dehydrated hydrogen bonds as determinants of protein interactions. Proc Natl Acad Sci USA 2003, 100(1):113–118. 10.1073/pnas.0136888100
Xu D, Tsai CJ, Nussinov R: Hydrogen bonds and salt bridges across protein-protein interfaces. Protein Engg 1997, 10: 999–1012. 10.1093/protein/10.9.999
Young L, Jernigan RL, Covell DG: A role of surface hydrophobicity in protein-protein recognition. Protein Sci 1994, 3: 717–729.
Tsai CJ, Lin SL, Wolfson HJ, Nussinov R: Study of protein-protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci 1997, 6: 53–64.
Li Y, Huang Y, Swaminathan CP, Smith-Gill SJ, Mariuzza RA: Magnitude of the hydrophobic effect at central versus peripheral sites in protein-protein interfaces. Structure 2005, 13(2):297–307. 10.1016/j.str.2004.12.012
Shanahan HP, Thornton JM: Amino acid architecture and the distribution of polar atoms on the surfaces of proteins. Biopolymers 2005, 78(6):318–28. 10.1002/bip.20295
Valdar WSJ, Thornton JM: Protein-protein interfaces: analysis of amino acid conservation in homodimers. Proteins: Struct Funct Genet 2001, 42: 108–124. 10.1002/1097-0134(20010101)42:1<108::AID-PROT110>3.0.CO;2-O
Ma B, Elkayam T, Wolfson H, Nussinov R: Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci USA 2003, 100: 5772–5777. 10.1073/pnas.1030237100
Landgraf R, Xenarios I, Eisenberg D: Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J Mol Biol 2001, 307: 1487–1502. 10.1006/jmbi.2001.4540
Lichtarge O, Bourne HR, Cohen FE: An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996, 257: 342–358. 10.1006/jmbi.1996.0167
Keskin O, Ma B, Nussinov R: Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol 2005, 345(5):1281–94. 10.1016/j.jmb.2004.10.077
Bogan AA, Thorn KS: Anatomy of hot spots in protein interfaces. J Mol Biol 1998, 280(1):1–9. 10.1006/jmbi.1998.1843
Kortemme T, Baker D: A simple physical model for binding energy hot spots in protein-protein complexes. Proc Natl Acad Sci USA 2002, 99: 14116–14121. 10.1073/pnas.202485799
Ying G, Wang R, Lai L: Structure based method for analyzing protein-protein interfaces. J Mol Model 2004, 10(1):44–54. 10.1007/s00894-003-0168-3
Haliloglu T, Keskin O, Nussinov R: How similar are protein folding and protein binding nuclei? Examination of vibrational motions of energy hot spots and conserved residues. Biophys J 2005, 88(3):1552–9. 10.1529/biophysj.104.051342
Ofran Y, Rost B: Analyzing six types of protein-protein interfaces. J Mol Biol 2003, 325: 377–387. 10.1016/S0022-2836(02)01223-8
De S, Krishnadev O, Srinivasan N, Rekha N: Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Struct Biol 2005, in press.
Bahadur RP, Chakrabarti P, Rodier F, Janin J: A dissection of specific and non-specific protein-protein interfaces. J Mol Biol 2004, 336: 943–955. 10.1016/j.jmb.2003.12.073
Levy Y, Wolynes PG, Onuchic JN: Protein topology determines binding mechanism. Proc Natl Acad Sci USA 2004, 101(2):511–516. 10.1073/pnas.2534828100
Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ: CAPRI: a Critical Assessment of PRedicted Interactions. Proteins: Struct Funct Genet 2003, 52: 2–9. 10.1002/prot.10381
Brinda KV, Kannan N, Vishveshwara S: Analysis of homodimeric protein interfaces by graph-spectral methods. Protein Engg 2002, 15(4):265–77. 10.1093/protein/15.4.265
Del Sol A, O'meara P: Small-world network approach to identify key residues in protein-protein interaction. Proteins: Struct Funct Bioinf 2005, 58(3):672–82. 10.1002/prot.20348
Del Sol A, Fujihashi H, O'meara P: Topology of small-world networks of protein-protein complex structures. Bioinformatics 2005, 21(8):1311–5. 10.1093/bioinformatics/bti167
Brinda KV, Mitra N, Surolia A, Vishveshwara S: Determinants of quaternary association in legume lectins. Protein Sci 2004, 13: 1735–1749. 10.1110/ps.04651004
Kannan N, Preethi C, Ghosh P, Vishveshwara S, Chatterji D: Stabilizing interactions in the dimer interface of α-subunit in Escherichia coli RNA polymerase: A graph spectral and point mutation study. Protein Sci 2001, 10: 46–54. 10.1110/ps.26201
Sistla RK, Brinda KV, Vishveshwara S: Identification of domains and domain interface residues in multidomain proteins from graph spectral method. Proteins: Struct Funct Bioinf 2005, 59(3):616–626. 10.1002/prot.20444
Barabasi AL: Linked: The new science of networks. Perseus Publishing, Cambridge, Massachusetts; 2002.
Vendruscolo M, Paci E, Dobson CM, Karplus M: Three key residues form a critical contact network in a protein folding transition state. Nature 2001, 409: 641–645. 10.1038/35054591
Vendruscolo M, Dokholyan NV, Paci E, Karplus M: Small-world view of the amino acids that play a key role in protein folding. Phys Rev E 2002, 65: 061910. 10.1103/PhysRevE.65.061910
Dokholyan NV, Li L, Ding F, Shakhnovich EI: Topological determinants of protein folding. Proc Natl Acad Sci USA 2002, 99(13):8637–8641. 10.1073/pnas.122076099
Atilgan AR, Akan P, Baysal C: Small-world communication of residues and significance for protein dynamics. Biophys J 2004, 86: 85–91.
Greene LH, Higman VA: Uncovering network systems within protein structures. J Mol Biol 2003, 334: 781–791. 10.1016/j.jmb.2003.08.061
Brinda KV, Vishveshwara S: A network representation of protein structures: implications to protein stability. Biophys J 2005, in press.
Sathyapriya R, Vishveshwara S: Short hydrogen bonds in proteins. FEBS J 2005, 272: 1819–1832. 10.1111/j.1742-4658.2005.04604.x
Nooren IMA, Thornton JM: Structural characterization and functional significance of transient protein-protein interactions. J Mol Biol 2003, 325: 991–1018. 10.1016/S0022-2836(02)01281-0
Schnittman SM, Lane HC, Roth J, Burrows A, Folks TM, Kehrl JH, Koenig S, Berman P, Fauci AS: Characterization of GP120 binding to CD4 and an assay that measures ability of sera to inhibit this binding. J Immunol 1988, 141(12):4181–6.
Lascols O, Cherqui G, Capeau J, Caron M, Picard J: Alteration by concanavalin A of the slow dissociable component in the human growth hormone-receptor interaction. Horm Metab Res 1986, 18(8):512–6.
Thorn KS, Bogan AA: ASEdb: A database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 2001, 17(3):284–285. 10.1093/bioinformatics/17.3.284
Castro MJ, Anderson S: Alanine point mutations in the reactive regions of bovine pancreatic trypsin inhibitor: effects on the kinetics and thermodynamics of binding to beta-trypsin and alpha-chymotrypsin. Biochemistry 1996, 35(35):11435–46. 10.1021/bi960515w
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235–242. 10.1093/nar/28.1.235
Henrick K, Thornton JM: PQS: a protein quaternary structure file server. Trends Biochem Sci 1998, 23(9):358–61. 10.1016/S0968-0004(98)01253-5
Hubbard SJ: NACCESS: program for calculating accessibilities. Department of Biochemistry and Molecular Biology, University College London; 1992.
Kannan N, Vishveshwara S: Identification of side-chain clusters in protein structures by a graph spectral method. J Mol Biol 1999, 292(2):441–64. 10.1006/jmbi.1999.3058
West DB: Introduction to Graph theory. Prentice-Hall of India Private Limited; 2000.
Alanine Scanning Energetics DataBase (AsEDB)[http://thornlab.cgr.harvard.edu/hotspot/index.php]
We acknowledge the Computational Genomics Initiative at the Indian Institute of Science, funded by the Department of Biotechnology (DBT), India, for support. KVB would like to thank the Council of Scientific and Industrial Research (CSIR), India for the award of a fellowship. We also acknowledge Rakesh Kumar Pandey for providing the DFS program and generating the oligomer dataset.
KVB carried out the construction and analysis of oligomeric protein structure networks. SV devised the concepts and the formalism used in this study. Both authors contributed to the interpretation of results and the preparation of the manuscript.
Electronic supplementary material
Additional File 1: Two tables (Table A1 and Table A2) are provided as additional material, giving the list of PDB entries in the dataset and the 20 × 20 matrix for the residue preferences of the non-interface hubs to interact with the 20 different amino acid types at Imin = 4%, respectively. Two figures (Figure A1 and Figure A2) are also provided, giving the correlation of δASA with interface clusters and the amino acid composition in the interface clusters, respectively. All four items (two tables and two figures) are provided as a single Word document (Additional file 1). (PDF 58 KB)