Water is an integral part of protein complexes. It shapes protein binding sites by filling cavities and bridges local contacts through hydrogen bonds. However, water molecules have usually not been included in protein interface models, and little is known about the distribution profiles of water molecules in protein binding interfaces.
In this work, we use a tripartite protein-water-protein interface model and a nested-ring atom re-organization method to detect hydration trends and patterns in an interface data set that includes immobilized interfacial water molecules. This data set consists of 206 obligate interfaces, 160 non-obligate interfaces, and 522 crystal packing contacts. Both types of biological interfaces are found to be drier than the crystal packing interfaces in our data. This is consistent with a hydration pattern reported earlier, although the earlier definition of immobilized water was purely distance-based. The biological interfaces in our data set are also found to be subject to stronger water exclusion during their formation. To study the overall hydration trend in protein binding interfaces, atoms at the same burial level in each tripartite protein-water-protein interface are organized into a ring. The rings of an interface are then ordered with the core atoms placed in the middle of the structure, forming a nested-ring topology. We find that water molecules on the rings of an interface are generally arranged in a dry-core-wet-rim pattern, with solvation increasing level by level towards the rim of the interface. This solvation trend becomes even sharper when counterexamples are separated.
Immobilized water molecules are regularly organized in protein binding interfaces and they should be carefully considered in the studies of protein hydration mechanisms.
Water is an important component of biomolecules and is crucial to their formation and association, particularly in protein folding and binding. Many studies, based on energetic models, experiments, or statistical analysis, have sought to uncover the precise roles of water in protein-protein binding. It is widely understood that water molecules can shape binding sites by filling cavities and can bridge local contacts through hydrogen bonds [4, 5]. Although its importance has long been recognized, water is usually excluded from models of protein binding interfaces. An interface is often defined by the change in the solvent accessibility of residues upon binding [6, 7], or by the distance between the two chains in the complex [8, 9]. Because these definitions do not involve water molecules, residues that contact the other chain only indirectly through water molecules (e.g., wet spot residues [10, 11]) are missing from these interface models, and the size of an interface is therefore underestimated. In fact, wet spots can account for as much as 14.5% of the interface residues. As these missing residues resemble interface residues more than surface residues in terms of their mobility and energy contribution [10, 11], it is unreasonable to overlook interfacial water molecules even when a study focuses only on interfacial residues. Water molecules have also been ignored in most protein-protein interaction studies, especially computational ones. For example, water is rarely considered in protein docking, interface analysis [6, 13, 14], interface classification [15–18], etc.
Few results have been reported on the spatial arrangement of water molecules and their solvation trend in protein binding interfaces. An earlier work pioneered the study of hydration patterns in protein interfaces; however, its patterns were identified only within individual interfaces and were not generalized into an overall trend. Its definition of interfacial water is also prone to including many exposed water molecules. As some of these "interfacial" water molecules are not actually in interfaces at all, bias may be introduced when the analysis turns to the fine-grained solvation trend in protein interfaces.
Recently, we introduced a tripartite model of protein binding interfaces. Under this model, an interface is defined as an object with three compartments: the two binding sites of the two interacting chains and the interfacial water molecules. The interfacial water molecules are determined by a recursive computational method. As this model differs from traditional definitions of protein binding interfaces, we named it a protein-water-protein interface, or a tripartite interface. A protein-water-protein interface can be represented by a tripartite graph, in which the nodes represent residues or atoms, depending on the level of the study, and the edges represent the contacts among them.
In this work, we conduct a topological analysis of water molecules in protein-water-protein interfaces. We investigate the distribution profiles of water molecules in three types of interfaces: obligate interfaces, non-obligate interfaces, and crystal packing contacts. In the analysis, a feature of atoms and residues called burial level is explored in detail. Burial level is defined with respect to the atomic contact network of a protein complex and describes the extent to which an atom or residue is buried in the complex. The atoms of an interface are then organized into a nested-ring topology in which atoms at the same burial level are grouped into level-wise rings. We examine both overall and level-wise views of water arrangements in the interface and on the rings. We find that the interior of a protein binding interface is not homogeneous in terms of a variety of properties such as wetness, water detectability, polarity and mobility. Moreover, water molecules in protein binding interfaces are distributed in a dry-core-wet-rim style, suggesting that the solvation of protein interfaces progresses ring by ring from core to rim. We also find that the function of an interaction appears to be another constraint on the associated water arrangement. All of these results indicate that water is an active player in protein binding interfaces and should be considered in studies of them.
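The exact definition of burial level is given in the Methods, outside this section. Purely as an illustration, the sketch below assumes that an atom's burial level is its shortest-path distance in an atomic contact graph from the atoms exposed to bulk solvent. This matches the later description of burial level as the distance to the bulk solvent, but it may differ from the authors' procedure. The contact dictionary, the set of exposed atoms and the function name `burial_levels` are hypothetical.

```python
from collections import deque

def burial_levels(contacts, exposed):
    """Hypothetical burial level: BFS distance from solvent-exposed atoms.

    contacts: dict mapping atom id -> iterable of atom ids in contact
    exposed:  set of atom ids assumed to touch bulk solvent (level 1)
    Returns a dict atom id -> burial level; atoms that cannot be reached
    from any exposed atom are omitted.
    """
    level = {a: 1 for a in exposed}
    queue = deque(exposed)
    while queue:
        atom = queue.popleft()
        for nb in contacts.get(atom, ()):
            if nb not in level:
                # the first visit in BFS order gives the shortest distance
                level[nb] = level[atom] + 1
                queue.append(nb)
    return level

# Toy chain of contacts a-b-c-d in which only 'a' is exposed
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(burial_levels(contacts, {"a"}))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Breadth-first search is used because it assigns each atom its minimum number of contact hops to the surface in linear time.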
Detectability of water molecules at different burial levels of protein interfaces
The number of water molecules in a protein complex that can be detected by X-ray crystallography is closely correlated with the resolution at which the crystal structure is solved. A previous work also found that the quality of interfacial water information depends on the resolution of the crystal structure. We investigated the correlations between the wetness of protein interfaces and the resolution of their crystal structures. The average correlation coefficients between the wetness of an interface and the resolution value of its crystal structure are negative for the obligate, non-obligate and crystal packing interfaces in our data: -0.4015, -0.5460 and -0.5632, respectively. Because a larger resolution value means a poorer structure, fewer water molecules are detected at poorer resolution, which indicates that water-related properties of protein interfaces depend on the detectability of the water molecules. This observation is consistent with previous results reported by Rodier et al.
We are especially interested in the quality of water information at the core of protein binding interfaces, which we assess by comparing the quality of water information at different burial levels. We find that the number of deeply buried water molecules is less correlated with crystal structure resolution: the deeper the burial level, the weaker the correlation (see Figure 1). Thus, water molecules in a protein or protein complex cannot be classified simply as exposed or buried; rather, their properties change gradually as they move away from the bulk solvent towards the center of the interface. On the whole, water molecules are underreported, as noted in [21, 22]. More importantly, this observation implies that the reported water molecules at the core of an interface are closer to completeness (the real number of water molecules) than those in other parts. This increases our confidence in the quality of our results on the buried water molecules in the core.
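The kind of calculation behind these coefficients can be sketched as follows. The sketch assumes, as a hypothetical simplification, that wetness is the fraction of atoms in a tripartite interface that are water oxygens, and it uses made-up toy values rather than the data set; the names `wetness` and `pearson` are illustrative only.

```python
from math import sqrt

def wetness(n_water, n_protein_atoms):
    """Assumed definition: fraction of interface atoms that are water oxygens."""
    return n_water / (n_water + n_protein_atoms)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up toy values: (water molecules, protein atoms, resolution in Å)
interfaces = [(30, 600, 1.5), (20, 600, 1.9), (12, 600, 2.4), (8, 600, 2.8)]
w = [wetness(nw, na) for nw, na, _ in interfaces]
res = [r for _, _, r in interfaces]
print(pearson(w, res) < 0)  # True: poorer resolution, fewer detected waters
```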
Wetness of different types of interfaces
Table 1 shows wetness-related statistics of the obligate interfaces, non-obligate interfaces, and crystal packing contacts in our data set. The significance of the differences in wetness, average polarity and relative water burial level is tested with the one-sided Mann-Whitney U test, both between the obligate and non-obligate interfaces and between the biological and crystal packing interfaces. The p-values are shown in Table 2. In general, the difference between the biological and crystal packing interfaces is more pronounced than that between the obligate and non-obligate interfaces, both of which are biological interfaces.
The obligate interfaces are the largest and can hold more water molecules. More specifically, there are about 29 water molecules per interface in the obligate interactions, far more than in the non-obligate interactions (13 per interface). The crystal packing interfaces are significantly smaller than the non-obligate interfaces, yet they hold almost the same number of water molecules (10 per interface). It has been reported that the number of water molecules held by an interface is correlated with the size of the interface, and this correlation is also observed in our data: the correlation coefficients between the number of water molecules and the number of atoms in an interface are 0.8232, 0.6177 and 0.6540 for the obligate, non-obligate and crystal packing interfaces, respectively. Moreover, the wetness of an interface is also bounded by its size. Figure 2 shows the relationship between wetness and interface size. When the interface is small (fewer than 500 atoms), wetness is strictly bounded by interface size for both the obligate and non-obligate interfaces. In the crystal packing interfaces, by contrast, although the average wetness appears to be related to interface size, the wetness values are extremely high: the average wetness of crystal packing interfaces with fewer than 200 atoms is as high as 0.050, a very high value for such small interfaces. Note that this correlation between interface size and wetness reflects an upper bound on the wetness of an interface of a given size; an interface of any size can be very dry. A possible reason why wetness is bounded by interface size is that immobilizing a water molecule in an interface requires multiple interacting atoms. Larger interfaces can therefore offer more water-holding atom clusters, resulting in wetter interfaces.
Figure 3 shows the wetness distributions of the three types of interfaces. Together with column 2 of Table 2, it shows that the obligate interfaces tend to be wetter than the non-obligate interfaces, and that these biological interfaces are drier than the crystal packing interfaces. Obligate interactions generally have high binding affinity; the binding is so strong that the interaction partners have to be denatured to be separated. The higher wetness of the obligate interfaces (compared with the non-obligate interfaces in our data) and the even higher wetness of the crystal packing interfaces (compared with the obligate interfaces) suggest that there is no simple correlation between the amount of water and binding strength.
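For readers who want to reproduce this kind of comparison, a minimal one-sided Mann-Whitney U test is sketched below in plain Python. It uses the normal approximation with average ranks for ties, but no tie correction of the variance and no continuity correction, so its p-values will differ slightly from exact or corrected ones; in practice, `scipy.stats.mannwhitneyu(x, y, alternative="greater")` is a standard implementation. The function names and the toy values are illustrative only.

```python
from math import erf, sqrt

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_greater(x, y):
    """One-sided test of H1: values in x tend to be larger than values in y.

    Returns (U, p) for sample x using the normal approximation.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail standard normal probability
    return u1, p

# Toy wetness values (made up): is the first group wetter than the second?
crystal = [0.05, 0.06, 0.07, 0.08, 0.09]
biological = [0.01, 0.02, 0.03, 0.04, 0.05]
u, p = mann_whitney_greater(crystal, biological)
print(u, p < 0.05)  # 24.5 True
```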
Level-wise distribution of water in protein interfaces
Given a tripartite interface, we partition its atoms according to their burial levels. Atoms at the same burial level are organized into a ring. The ring of "core atoms" consists of the atoms with the highest burial level in the interface. The rings are then ordered with the ring of core atoms in the middle, so a tripartite interface can be viewed as a nested-ring structure. The ring of core atoms is denoted by O0, the ring closest to the core by O1, and so on for O2, etc. We examine how water molecules are distributed over these rings by looking at level-wise wetness. As the highest burial level varies considerably from one interface to another, we use the core of each interface as the starting point and follow the level-wise wetness towards the rim.
Table 3 shows that a progressive dry-core-wet-rim water distribution pattern exists in protein interfaces, with the core O0 more desolvated than the rings closer to the rim. Like the proportion of water molecules (i.e., wetness), the proportion of polar atoms (i.e., polarity) also increases from core to rim, even in the crystal packing interfaces. Thus, although the overall wetness and polarity of the three types of interfaces differ, their level-wise wetness and polarity follow the same trend from core to rim, in a cone pattern.
To show the trend in level-wise wetness more clearly, Figure 4 plots one curve for each of the three types of interfaces. A clear, smooth increase in wetness from core to rim is observed in the obligate, non-obligate and crystal packing interfaces alike.
The crystal packing interfaces have the largest inter-level wetness differences. However, this does not mean that crystal packing interfaces are the most capable of excluding interfacial water from core to rim; rather, it is due to the small size of crystal packing interfaces and the extremely high wetness of their outer rims. To quantify the extent to which water molecules are "excluded" from the core of an interface, we introduce the relative water burial level (rWBL, see Methods), defined as the average burial level of the water molecules in the interface divided by the average burial level of all the interfacial atoms. If the rWBL of an interface is high, its water molecules are deeply buried in the interface; if it is low, the water molecules are distributed towards the rim of the interface. The distribution of rWBL is shown in Figure 5. The obligate interfaces have lower average rWBLs than the non-obligate interfaces (see also row 8 of Table 1), although the difference is only marginally significant (p-value: 0.0541; Table 2). However, the crystal packing interfaces have significantly higher rWBL (p-value: 2.6622 × 10^-5) than the obligate or non-obligate interfaces, indicating stronger water exclusion during the formation of biological interfaces.
One might expect interfaces with a higher rWBL to be more twisted, as twisted interfaces can accommodate more water molecules in their core, giving higher wetness and higher rWBL. We investigated the relationship between interface wetness and planarity, but found no significant correlation: the correlation coefficients between wetness and planarity are 0.10 and 0.12 for obligate and non-obligate interfaces, respectively. For rWBL, although its correlation coefficient with planarity is even lower than that of wetness, an interesting pattern emerges. Figure 6 shows a scatter plot of rWBL versus planarity in biological interfaces. When water molecules are strongly excluded (low rWBL, < 0.9), the corresponding interfaces are usually very flat. This suggests that planarity is usually a necessary condition for an interface to exclude its water. However, it is not sufficient, as many flat interfaces with a high rWBL were also observed.
Recall that the (negative) correlation between wetness and crystal structure resolution is stronger at shallower burial levels. The wetness of the outer rims of interfaces is therefore more likely to be underestimated than that of the cores. Because this underreporting lowers the observed wetness of the rims rather than of the cores, it works against the dry-core-wet-rim pattern; the observed increase in wetness from core to rim is therefore reliable despite the uneven quality of water information at different burial levels.
To better understand the influence of uneven water information quality, we divided the interfaces into three groups according to their level-wise wetness trend: strictly dry-core-wet-rim interfaces, strictly wet-core-dry-rim interfaces, and other interfaces. Strictly dry-core-wet-rim interfaces are those whose level-wise wetness increases monotonically from core to rim, while strictly wet-core-dry-rim interfaces are those whose level-wise wetness decreases monotonically from core to rim. As expected, strictly dry-core-wet-rim interfaces are much more abundant than strictly wet-core-dry-rim interfaces. Among the obligate, non-obligate, and crystal packing interfaces in the data set, there are 87, 83, and 342 strictly dry-core-wet-rim interfaces, respectively, but only 17, 26, and 124 strictly wet-core-dry-rim interfaces. The strictly wet-core-dry-rim interfaces suffer more from poor resolution and hence from poor water information quality. The average resolutions of strictly dry-core-wet-rim obligate, non-obligate and crystal packing interfaces are 1.98 Å, 2.18 Å and 2.11 Å, respectively, while those of strictly wet-core-dry-rim obligate, non-obligate and crystal packing interfaces are 2.35 Å, 2.29 Å and 2.16 Å, respectively (p-values of the one-sided difference test: 0.0015, 0.1037 and 0.0403, respectively). This indicates that some water molecules in the rims of these interfaces are not reported, so the actual wetness of these rims is underestimated, resulting in an overestimate of the number of strictly wet-core-dry-rim interfaces. Nevertheless, there are some high-resolution strictly wet-core-dry-rim interfaces: in our data set, 4 obligate and 5 non-obligate interfaces are strictly wet-core-dry-rim with a resolution better than 2.0 Å. As they are not abundant, we refer to them as counterexamples to the dry-core-wet-rim hydration pattern.
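The nested-ring organization and level-wise wetness can be sketched as follows, assuming each interface atom (water oxygens included) comes with a precomputed burial level and a water flag. The input format and the names `nested_rings` and `level_wise_wetness` are hypothetical. When a burial level is absent from an interface, this sketch simply treats the next level present as the adjacent ring.

```python
from collections import defaultdict

def nested_rings(atoms):
    """Group interface atoms into rings O0 (core), O1, O2, ... by burial level.

    atoms: iterable of (burial_level, is_water) pairs, one per interface atom.
    Distinct burial levels are sorted from highest (core) to lowest (rim),
    so ring Ok holds the atoms at the k-th highest level present.
    """
    by_level = defaultdict(list)
    for level, is_water in atoms:
        by_level[level].append(is_water)
    return [by_level[lv] for lv in sorted(by_level, reverse=True)]

def level_wise_wetness(atoms):
    """Fraction of water atoms in each ring, listed from core (O0) to rim."""
    return [sum(ring) / len(ring) for ring in nested_rings(atoms)]

# Toy interface (made up): (burial level, is water) for each atom
atoms = [(4, False), (4, False), (3, False), (3, True),
         (2, True), (2, True), (2, False), (1, True)]
print(level_wise_wetness(atoms))  # [0.0, 0.5, 0.6666666666666666, 1.0]
```

Indexing rings from the core outward, rather than by absolute burial level, aligns interfaces whose highest burial levels differ, which is the reason given in the text for starting from the core.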
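A minimal sketch of rWBL as defined above, assuming the same hypothetical (burial level, is water) input as a ring decomposition would use, and assuming that water oxygens are counted among "all the interfacial atoms". The function name is illustrative.

```python
def relative_water_burial_level(atoms):
    """rWBL: mean burial level of water / mean burial level of all interface atoms.

    atoms: list of (burial_level, is_water) pairs; water oxygens are assumed
    to be included in "all interfacial atoms".
    """
    levels = [lv for lv, _ in atoms]
    water_levels = [lv for lv, is_water in atoms if is_water]
    if not water_levels:
        # interfaces without water are excluded from the data set for this reason
        raise ValueError("rWBL is undefined for an interface without water")
    mean_water = sum(water_levels) / len(water_levels)
    mean_all = sum(levels) / len(levels)
    return mean_water / mean_all

atoms = [(4, False), (4, False), (3, False), (3, True),
         (2, True), (2, True), (2, False), (1, True)]
print(round(relative_water_burial_level(atoms), 3))  # 0.762
```

A value below 1 means the water sits, on average, closer to the rim than the average interface atom; a value above 1, as in the TIM counterexample, means the water is concentrated towards the core.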
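The three-way grouping and the counterexample criterion can be sketched as below. The sketch assumes that "monotonically" means strictly increasing or strictly decreasing between consecutive rings, and that an interface with a single ring falls into the "other" group; both are assumptions, as are the function names.

```python
def classify_trend(wetness_core_to_rim):
    """Label an interface from its level-wise wetness (core ring first).

    'dry-core-wet-rim': strictly increasing from core to rim
    'wet-core-dry-rim': strictly decreasing from core to rim
    'other': anything else, including interfaces with a single ring
    """
    w = list(wetness_core_to_rim)
    if len(w) < 2:
        return "other"
    steps = list(zip(w, w[1:]))
    if all(a < b for a, b in steps):
        return "dry-core-wet-rim"
    if all(a > b for a, b in steps):
        return "wet-core-dry-rim"
    return "other"

def is_counterexample(wetness_core_to_rim, resolution, cutoff=2.0):
    """Strictly wet-core-dry-rim interface solved at better than `cutoff` Å."""
    return (classify_trend(wetness_core_to_rim) == "wet-core-dry-rim"
            and resolution < cutoff)

print(classify_trend([0.0, 0.5, 0.67, 1.0]))  # dry-core-wet-rim
print(is_counterexample([0.3, 0.2, 0.1], 1.8))  # True
```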
One counterexample, the yeast triosephosphate isomerase (TIM) dimer interface, is shown in Figure 7(a). In this protein binding interface, the rim is not rich in water molecules, while the core is occupied by a cluster of water molecules. The rWBL of this interface is extremely high (1.304), and the core is the wettest place in the interface. The binding of the two TIM subunits into a dimer is important, as the enzyme is active only in its dimer form. In fact, human TIM deficiency is a rare disease that causes chronic hemolytic anemia and neuromuscular disorders in children. Although it is not a strictly wet-core-dry-rim interface, the human TIM dimer interface is similar to the yeast TIM dimer interface, with a very high rWBL (1.282). The most frequent mutation that leads to TIM deficiency, E104D, is located in the interface. The mutation is believed to disrupt the network formed by interfacial water molecules, weakening the binding between the two subunits and thus reducing the activity of the enzyme.
Three examples of dry-core-wet-rim topological arrangements of interfacial water are presented in Figures 7(b), 7(c) and 7(d). In the dTDP-glucose 4,6-dehydratase dimer interface shown in Figure 7(b), a large desolvated interface core is surrounded by rings holding progressively more water molecules towards the rim of the interface. In another obligate interface, in aspartate aminotransferase (Figure 7(c)), more water molecules are observed than in the first example, and several of them penetrate into the core of the interface; yet they are not as abundant there as in the rim. A twisted non-obligate interface between eEF1A and eEF1Balpha is shown in Figure 7(d). It also shows a dry-core-wet-rim water topology, with a higher wetness than the first two examples. In all three cases, level-wise wetness increases progressively from core to rim, so all three are strictly dry-core-wet-rim interfaces.
Function and interfacial water arrangement
Interfacial water enrichment and organization differ among functional groups of interfaces. We have manually examined the non-obligate interactions in our data set. Here we describe three types of them: enzyme-inhibitor interactions, antibody-antigen interactions, and interactions containing shared hub proteins.
There are 42 enzyme-inhibitor interfaces in our data set, accounting for about 25% of all non-obligate interfaces. All of them are hydrolase-inhibitor interfaces except one cyclin A-cyclin-dependent kinase 2 interaction [PDB:1JSU] and one cell division protein kinase 2 interface [PDB:2CO5]. On average, these enzyme-inhibitor interfaces have medium wetness (mean: 0.042) and relatively low rWBL (mean: 1.042). However, the water arrangements within this type of interface are extremely heterogeneous. The interfaces between proteases (Enzyme Commission number 3.4.-.-) and their inhibitors are significantly drier and have lower rWBL than the other enzyme-inhibitor interfaces; see Table 4. The non-protease-inhibitor interfaces are very wet, with deeply buried water; their wetness and rWBL are nearly the same as those of crystal packing interfaces.
Inhibitors usually bind to the active site of an enzyme to block access by its substrate. Proteases are enzymes that hydrolyze peptide bonds. As most protease inhibitors are proteins, one way for an inhibitor to avoid being hydrolyzed by the protease it binds is to bind so tightly that water, which the hydrolysis reaction requires, is blocked from reaching the active site [27, 28]. It is therefore functionally important that water molecules are excluded from the active site in protease-inhibitor interfaces, which results in their low wetness. Moreover, the active site is usually located at the center of an interface, so keeping water away from it generally lowers the burial level of the water molecules and hence the rWBL, making protease-inhibitor interfaces archetypal dry-core-wet-rim interfaces.
Figures 8(a) and 8(b) show two examples of protease-inhibitor interfaces, a wet one and a dry one. Both structures have a resolution better than 2.0 Å. No matter how wet the interface is, water molecules cannot access its active-site residues, which reside at the core of the interface. In both cases, a pocket is observed in the enzyme, into which the inhibitor anchors deeply to achieve tight binding. In the wetter interface in Figure 8(a), the active-site residues are located in the pocket, so the pocket is dry, with no interfacial water molecules observed inside. In the drier interface in Figure 8(b), the active-site residues are not in the pocket, and water molecules are observed inside it. We emphasize that anchoring into the binding pocket shown in Figure 8(b) is very important for the inhibitor to bind tightly to the enzyme (beta-trypsin). Mutating the anchor residue of the inhibitor (LYS15) to alanine changes the binding affinity dramatically, with a ΔΔG of about 10 kcal/mol, much larger than the ΔΔG values of hot spot residues without surrounding water molecules. The contrast between the two figures suggests that water molecules can strongly reinforce binding even at a very important site, as long as the function of the binding is preserved.
There are 10 antibody-antigen interfaces in the data set. They are very wet, with an average wetness of 0.047; if only crystal structures with a resolution better than 2.0 Å are considered, the average wetness rises to 0.064. Their average rWBL is only 1.037, lower than the average rWBL of all the non-obligate interfaces in the data set. The major difference between antibody-antigen interactions and other non-obligate interactions is that an antibody and its antigen are evolutionarily unrelated, yet their binding still has very high affinity and specificity.
This dual requirement for high binding affinity and high specificity has resulted in a particular water distribution topology in antibody-antigen interfaces. Polar and charged residues are often used in antibody-antigen interfaces to enhance binding specificity. These residues can form hydrogen bonds and salt bridges, and the electrostatic distribution on the antigen and antibody binding sites can selectively determine which partners bind. This leads to high hydrophilicity at the interface. To achieve high binding affinity at the same time, the hydrogen bonds and salt bridges are usually networked through interfacial water molecules [31, 32], which in turn raises the wetness of the interface.
Figure 9 shows the interface between the anti-hen egg white lysozyme antibody D1.3 and hen egg white lysozyme. This is the wettest antibody-antigen interface in the data set, yet a dry-core-wet-rim water distribution topology is still observed. There is a layer of water molecules near the edge of the interface and a cluster of water molecules penetrating to a deeper level, shaping the binding site by filling a pocket. Two residues in this interface, TYR101 and ASP100, contribute significantly more to the binding free energy than the other residues. As shown in the figure, water molecules crowd around these two residues, yet the ability of the two residues to contact the antigen directly is not disturbed.
Interfaces involving hub proteins
Some proteins can interact with many different partners and maintain many different functions. These proteins are typically called "hub" proteins. We investigated the water distribution topology of hub proteins using the "shared proteins" proposed by Keskin and Nussinov, whose similar binding sites are observed to bind different partners. In protein-protein interaction networks, these proteins have high connectivity. Structurally, their interfaces are smaller, with a larger gap between the two partners, and flatter.
In our non-obligate interface data set, 10 interfaces were also reported in that work as this kind of interface (Type 3 in its classification). Their average wetness is 0.036, slightly but not significantly lower than the overall wetness of the non-obligate interfaces. This is unexpected, as interfaces containing shared proteins are believed to have more water molecules bridging inter-protein contacts. Moreover, their rWBL is very low (mean: 0.992), significantly lower than that of the other non-obligate interfaces (p-value: 0.021, one-sided Mann-Whitney U test). This suggests that water exclusion is important for these interfaces.
Figure 10 shows an example: the binding site of transducin with cGMP phosphodiesterase (PDE). Transducin is an important G protein in the vertebrate phototransduction cascade; its connectivity is 30 according to the MINT database [34, 35]. It is activated by the G-protein-coupled receptor rhodopsin after the receptor itself is activated, and it then binds to and activates PDE to enable downstream reactions. There are only 7 water molecules in this interface, and the dry-core-wet-rim pattern is again observed. Its rWBL is extremely low (0.87). One possible reason why the core of this interface is so dry is the transient nature of the binding. The association and dissociation of transducin and PDE are triggered by upstream and downstream signals, and the binding site is veiled when it is not active. A hydrophobic, dry core may lower the energy barrier of these processes, as less solvation and desolvation of the binding site is needed. However, detailed and systematic experimental or computational analysis is required to uncover the dynamics of these processes.
It is widely known that exposed protein surfaces directly accessible to bulk solvent are dramatically different from the interiors of protein interfaces. We also find that the interior of a protein interface is not uniform in terms of wetness, water detectability or polarity. Among the reasons for this unevenness, the distance to the bulk solvent, i.e., burial level, is an important one. As discussed earlier, if an interface is organized into rings of residues from its core to its rim, the properties of the rings differ. This is reminiscent of the well-known "O-ring" theory [38, 39], which suggests that a cluster of residues at the core of an interface contributes most to the binding free energy, while the other interfacial residues surround them in a ring-like manner to protect them from the bulk solvent. Our results suggest that there are indeed nested rings of residues in a protein binding interface, progressively growing from the center to the rim of the interface and showing a level-wise pattern. Moreover, the core of an interface is sheltered from water molecules by several rings of atoms, whose desolvation power increases deeper into the interface.
The nested rings of atoms in protein binding interfaces also differ in their mobility, which can be seen in a level-wise analysis of B factors. Figure 11 shows the average B factors at different burial levels. The deeply buried parts have lower B factors; not only interfacial residues but also interfacial water molecules show this layered pattern. This indicates that interfacial water molecules in the inner rings are indeed "trapped" by the outer rings of atoms.
The role of water molecules may also differ between levels of the interface. One of the most important roles of water in protein binding interfaces is bridging inter-protein contacts by forming hydrogen bonds with both sides. Specifically, interfacial water molecules prefer to form donor-water-donor or acceptor-water-acceptor hydrogen bond bridges, in which the two bridged groups are not complementary to each other. We investigated the hydrogen bonds formed by interfacial water molecules at different burial levels using HBPLUS. The percentage of non-complementary hydrogen bond bridges at different burial levels is shown in Figure S1 (see Additional file 1). Although this percentage fluctuates for transient interfaces, in obligate and crystal packing interfaces deeply buried water molecules are more likely to mediate non-complementary hydrogen bonds.
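The per-level percentage can be sketched once each water-mediated bridge has been reduced to the water's burial level and the donor/acceptor role of the protein group on each side. Parsing HBPLUS output into this form is not shown, groups that can act as both donor and acceptor are ignored in this simplification, and the tuple format and function name are hypothetical.

```python
from collections import defaultdict

def non_complementary_fraction(bridges):
    """Percentage of water-mediated bridges that are donor-water-donor or
    acceptor-water-acceptor, per water burial level.

    bridges: iterable of (water_burial_level, role_a, role_b), where each role
    is "donor" or "acceptor" for the protein group on that side.
    """
    totals = defaultdict(int)
    noncomp = defaultdict(int)
    for level, role_a, role_b in bridges:
        totals[level] += 1
        if role_a == role_b:  # same role on both sides: not complementary
            noncomp[level] += 1
    return {lv: 100.0 * noncomp[lv] / totals[lv] for lv in sorted(totals)}

bridges = [(1, "donor", "acceptor"), (1, "donor", "donor"),
           (2, "acceptor", "acceptor"), (2, "donor", "donor"),
           (2, "donor", "acceptor"), (3, "donor", "donor")]
print(non_complementary_fraction(bridges))  # rises with burial level here
```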
These observations suggest that protein interfaces do not simply follow a hot spot/O-ring dichotomy. Rather, a protein binding interface is subject to a progressive change in the physicochemical properties from core to rim.
According to the "O-ring" theory, the energy contribution of hot spots in the core is much stronger than the outer ring in the rim. We believe that the energy importance is growing progressively from rim to core, ring by ring. A direct correlation between the energy and burial level can be seen from the Generalized Born model  of solvation free energy, in which the atoms are characterized with an effective Born radius. Similar to burial level, the effective Born radius of an atom generally reflects how deep the atom is buried in the solute. However, it is set as a constant in practice. The electrostatic energies also seem to be related to burial level, as the dielectric constant of water is different from that of protein interior. The dielectric constant of water is around 80 , while the dielectric constant of protein interior is roughly in the range between 1 and 20 . In energy functions, this difference is considered in a very rough manner, previously. For example, in the FoldX energy function , the dielectric constant is linearly scaled from 8 to 80, according to the volumes of the nearby atoms within a distance of 6 Å. There is no further differentiation when atoms are more than 6 Å underneath the surface.
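The FoldX formula is not given in this text, so the following is only a schematic linear interpolation under stated assumptions. The dielectric constant falls linearly from 80 to 8 as the volume occupied by atoms within 6 Å rises to a saturation volume, `MAX_VOLUME`, which is a hypothetical value and not a FoldX parameter. It illustrates the limitation noted above: once the 6 Å shell is saturated, atoms at different depths all receive the same value.

```python
EPS_WATER = 80.0      # dielectric constant of water (value from the text)
EPS_BURIED = 8.0      # lower end of the FoldX scaling range (from the text)
MAX_VOLUME = 1000.0   # hypothetical saturation volume in Å^3, not FoldX's value

def scaled_dielectric(occupied_volume):
    """Linearly interpolate between water and buried dielectric constants.

    occupied_volume: volume (Å^3) of atoms within 6 Å of the atom of interest.
    The occupied fraction is clamped to [0, 1], so every atom whose 6 Å shell
    is saturated gets EPS_BURIED, however deep it actually lies.
    """
    f = min(max(occupied_volume / MAX_VOLUME, 0.0), 1.0)
    return EPS_WATER - (EPS_WATER - EPS_BURIED) * f

print(scaled_dielectric(0.0), scaled_dielectric(500.0), scaled_dielectric(2000.0))
# 80.0 44.0 8.0
```

A burial-level-aware variant could keep lowering the value past saturation, but any such scheme would be an extension, not part of FoldX.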
In our previous work , we proposed a hot spot prediction model based on the burial level of residues. We found that the average burial level of the atoms in a residue correlates positively with the ΔΔG upon alanine mutation, with a correlation coefficient of 0.4588. We therefore believe that incorporating burial level into energy functions, explicitly or implicitly, will improve the accuracy of binding free energy and hot spot prediction.
We also note that the water distribution topology differs between obligate and non-obligate interfaces, and between biological and crystal packing interfaces. This encourages us to perform interface classification that takes interfacial water into account. In other applications such as protein docking, adding water to the model has already been shown to be useful . The general dry-core-wet-rim distribution topology may also be used in such applications to assess a modeled binding interface or a real one.
We have studied level-wise water distribution profiles of protein interfaces using a tripartite graph model of protein binding interfaces, i.e., protein-water-protein interfaces. The water arrangement in biological interfaces can be distinguished from that in crystal packing interfaces in several ways, such as higher wetness and lower relative water burial level. Differences between obligate and non-obligate interfaces are also observed, but they are less significant than those between biological and crystal packing interfaces. Water molecules in an interface are generally organized in a dry-core-wet-rim hydration pattern, suggesting that the core of an interface is protected incrementally by rings of progressively desolvated atoms. We have also analyzed the water arrangements in different functional groups of protein interfaces and found that the water distributions depend on the function of the interfaces.
Our sets of obligate and non-obligate interactions are taken from several previous works. The obligate interactions include the obligate interactions used by Mintseris and Weng  and Zhu et al., as well as the homodimeric proteins used by Ponstingl et al. and Bahadur et al. Our non-obligate interactions include the protein complexes used by Bahadur et al., the transient interactions used by Mintseris and Weng  and the non-obligate interactions used by Zhu et al. Crystal packing interactions are collected from the Protein Data Bank (PDB)  by taking interfaces between two chains that belong to different biological assemblies according to "REMARK 350". For a protein complex, if another PDB entry of the same complex with a better resolution (a smaller resolution value) is available, only the better one is used. Redundancy is removed with a sequence similarity threshold of 30%: if the two chains of one interaction (one from each side) both have more than 30% sequence similarity to the corresponding chain pair of another interaction, one of the two interfaces is removed. To guarantee the quality of the water information, interfaces whose PDB structure contains fewer than 20 reported water molecules, or whose water oxygen atoms make up less than 1% of all heavy atoms, are eliminated. If any chain of an interface requires a coordinate transformation, the interface is removed. Interfaces with fewer than 100 heavy atoms or with no interfacial water molecules are also eliminated. Interfaces without water are rare, and we removed them because the water burial level (WBL, defined later) cannot be defined for them.
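The selection criteria above can be sketched as two filters: per-interface quality checks and a greedy redundancy pass. This is a minimal sketch under stated assumptions: the `InterfaceRecord` fields, the `similarity` callback (returning percent sequence similarity) and the greedy keep-first order are all hypothetical, and matching the chain pair in either orientation is our reading of "corresponding chain pair".

```python
from dataclasses import dataclass


@dataclass
class InterfaceRecord:
    """Summary of one candidate interface (hypothetical fields)."""
    pdb_id: str
    n_reported_waters: int        # water molecules reported in the PDB entry
    n_water_oxygens: int          # water oxygen atoms in the entry
    n_heavy_atoms: int            # all heavy atoms in the entry
    n_interface_heavy_atoms: int  # heavy atoms in the interface
    n_interfacial_waters: int     # immobilized waters bridging the two sides
    needs_transformation: bool    # any chain requires a coordinate transformation


def passes_quality_filters(rec: InterfaceRecord) -> bool:
    if rec.n_reported_waters < 20:
        return False
    if rec.n_water_oxygens < 0.01 * rec.n_heavy_atoms:
        return False
    if rec.needs_transformation:
        return False
    if rec.n_interface_heavy_atoms < 100:
        return False
    return rec.n_interfacial_waters > 0  # WBL is undefined without water


def is_redundant(pair1, pair2, similarity, threshold=30.0):
    """pair = (chain on side A, chain on side B); both chains must exceed the threshold."""
    a1, b1 = pair1
    a2, b2 = pair2
    same_order = similarity(a1, a2) > threshold and similarity(b1, b2) > threshold
    swapped = similarity(a1, b2) > threshold and similarity(b1, a2) > threshold
    return same_order or swapped


def remove_redundant(pairs, similarity):
    """Greedily keep the first interface of every redundant group."""
    kept = []
    for p in pairs:
        if not any(is_redundant(p, q, similarity) for q in kept):
            kept.append(p)
    return kept
```

A greedy pass is order-dependent; sorting candidates by resolution first would keep the better-resolved member of each redundant group.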
This process yields 206 obligate interactions, 160 non-obligate interactions and 522 crystal packing interactions. Complete lists of these interfaces are given in Tables S1-S3 (see Additional file 1). Note that the "REMARK 350" record in a PDB header is not always correct. However, we believe that such errors are rare in this relatively large data set [48, 49], so our conclusions should be robust to them.
Construction of atomic contact graphs and protein-water-protein interfaces
We distinguish immobilized water molecules from exposed water molecules in a protein complex by an iterative procedure. First, the solvent accessible surface area (SASA) of each atom is calculated, and water molecules with SASA larger than 10 Å² are removed. The SASAs are then recalculated for the updated structure. This procedure is repeated until no water molecule in the structure has a SASA larger than 10 Å². We refer to the removed water molecules as exposed water molecules and to those remaining in the structure as immobilized or buried water molecules.
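The iterative peeling procedure can be sketched as a loop that removes every over-threshold water in one pass and then recomputes SASA. This is a minimal sketch: `compute_sasa` is a hypothetical callback standing in for whatever SASA program is used (it must return one value per atom, in Å²), and atoms can be any records.

```python
def remove_exposed_waters(atoms, is_water, compute_sasa, threshold=10.0):
    """Iteratively strip water molecules whose SASA exceeds `threshold` (Å^2).

    atoms        : list of atom records (any type)
    is_water     : predicate marking water oxygens
    compute_sasa : hypothetical callback returning one SASA value per atom
    Returns (remaining_atoms, exposed_waters).
    """
    current = list(atoms)
    exposed = []
    while True:
        sasa = compute_sasa(current)
        # Flag every water above the threshold in this round
        keep = [not (is_water(a) and s > threshold) for a, s in zip(current, sasa)]
        removed = [a for a, k in zip(current, keep) if not k]
        if not removed:
            # No exposed water left: the remaining waters are immobilized
            return current, exposed
        exposed.extend(removed)
        current = [a for a, k in zip(current, keep) if k]
```

Recomputing SASA after each round matters: removing an outer water can expose a water behind it, which is then removed in the next round.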
An atomic contact graph is built from the structure that remains after the exposed water molecules are removed. The nodes of the graph are atoms and the edges are contacts between atoms. Two atoms are in contact if (i) they share a Voronoi facet and (ii) their distance is less than the sum of their radii plus 2.75 Å, the approximate diameter of a water molecule. Two residues are in contact if at least one pair of their atoms, one from each residue, is in contact. The nodes of the atomic contact graph are labeled as "exposed" or "buried" according to their SASA, with a threshold of 10 Å². A pseudo node representing the bulk solvent is added to the graph and connected directly to all exposed atoms.
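The graph construction can be sketched as follows. This is a minimal sketch, assuming the Voronoi-facet-sharing pairs are already available (for example from a Delaunay triangulation of the atom centres, which is an unweighted stand-in for whatever tessellation the authors used), and assuming "their radii" means per-atom radii supplied by the caller; the `BULK` label and the function names are hypothetical.

```python
import math

WATER_DIAMETER = 2.75  # Å, offset quoted in the text
SASA_THRESHOLD = 10.0  # Å^2, exposed/buried cutoff from the text
BULK = "BULK"          # hypothetical label for the bulk-solvent pseudo node


def build_contact_graph(coords, radii, sasa, voronoi_pairs):
    """Adjacency dict over atom indices plus the bulk-solvent pseudo node.

    coords        : list of (x, y, z) in Å
    radii         : per-atom radius in Å
    sasa          : per-atom SASA in Å^2 after exposed waters are removed
    voronoi_pairs : (i, j) index pairs whose Voronoi cells share a facet (precomputed)
    """
    graph = {i: set() for i in range(len(coords))}
    graph[BULK] = set()
    for i, j in voronoi_pairs:
        # Criterion (ii): distance below the summed radii plus one water diameter
        if math.dist(coords[i], coords[j]) < radii[i] + radii[j] + WATER_DIAMETER:
            graph[i].add(j)
            graph[j].add(i)
    for i, area in enumerate(sasa):
        if area > SASA_THRESHOLD:  # exposed atom: link it to the bulk solvent
            graph[i].add(BULK)
            graph[BULK].add(i)
    return graph


def residue_contacts(graph, residue_of):
    """Residue pairs with at least one atomic contact; residue_of maps atom index to residue id."""
    pairs = set()
    for i, neighbours in graph.items():
        if i == BULK:
            continue
        for j in neighbours:
            if j != BULK and residue_of[i] != residue_of[j]:
                pairs.add(frozenset((residue_of[i], residue_of[j])))
    return pairs
```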
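The atomic/residue contact graph of a protein complex is denoted by $G = \langle V, C \rangle$, where $V$ is the set of atoms/residues and $C \subseteq V \times V$ is the set of contacts. Water molecules in $G$ are denoted by the subset $V_W$. The interfacial water $V_{IW}$ in the interface between $V_A$ and $V_B$ ($V_A, V_B \subseteq V$) is defined as:
$$V_{IW} = \{\, w \in V_W \mid \exists a \in V_A,\ \exists b \in V_B : (w, a) \in C \wedge (w, b) \in C \,\}$$
Interfacial contacts are then defined as:
$$C_I = \{\, (u, v) \in C \mid u \in V_A,\ v \in V_B \,\} \cup \{\, (u, w) \in C \mid u \in V_A \cup V_B,\ w \in V_{IW} \,\}$$
Our tripartite model of protein-water-protein interfaces is defined as the edge-induced subgraph $G_I$ of $G$:
$$G_I = \langle V_{IA} \cup V_{IB} \cup V_{IW},\ C_I \rangle$$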
We use $V_{IA}$ and $V_{IB}$ to denote the interfacial atoms/residues from chains A and B, respectively. Our model captures the water molecules that directly bridge the two parts, i.e., water molecules that form protein-water-protein contacts; hence we call interfaces under this model protein-water-protein interfaces. We do not consider higher-order interfacial water bridges, such as protein-water-water-protein contacts, which we believe are less abundant and less important. More details about the Voronoi facets and the initial idea of the tripartite model of protein binding interfaces can be found in our earlier work .
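Extracting the tripartite interface from a contact set can be sketched directly from the textual definitions: interfacial waters touch both chains, and interfacial contacts are direct A-B contacts plus contacts between protein atoms and those waters. This is a minimal sketch, assuming contacts are stored as `frozenset` pairs of atom identifiers without self-contacts; the function name is hypothetical.

```python
def tripartite_interface(contacts, side_a, side_b, waters):
    """Return (V_IA, V_IB, V_IW, C_I) for the interface between side_a and side_b.

    contacts : set of frozenset({u, v}) atom contacts (no self-contacts)
    side_a, side_b, waters : disjoint sets of atom identifiers
    """
    neighbours = {}
    for edge in contacts:
        u, v = tuple(edge)
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Interfacial waters: in direct contact with both protein sides
    v_iw = {
        w for w in waters
        if neighbours.get(w, set()) & side_a and neighbours.get(w, set()) & side_b
    }

    protein = side_a | side_b
    c_i = set()
    for edge in contacts:
        u, v = tuple(edge)
        direct = (u in side_a and v in side_b) or (u in side_b and v in side_a)
        bridged = (u in v_iw and v in protein) or (v in v_iw and u in protein)
        if direct or bridged:
            c_i.add(edge)

    # Edge-induced subgraph: keep only atoms that appear in an interfacial contact
    nodes = set().union(*c_i)
    return nodes & side_a, nodes & side_b, v_iw, c_i
```

Water-water contacts are never added to the interfacial contacts, matching the decision to ignore protein-water-water-protein bridges.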
Calculation of wetness
Suppose $O$ is a protein-water-protein interface. We denote its atom-level tripartite graph as
$$O = \langle V_{IA}(O) \cup V_{IB}(O) \cup V_{IW}(O),\ C_I(O) \rangle,$$
where $V_{IW}(O)$ is the set of oxygen atoms of the interfacial water molecules.
The wetness of $O$ is defined as:
$$\mathrm{wetness}(O) = \frac{|V_{IW}(O)|}{|V_{IA}(O) \cup V_{IB}(O) \cup V_{IW}(O)|}$$
where $|X|$ is the cardinality of set $X$.
The burial level of an atom $a$ in a given protein complex, denoted $BL(a)$, is defined as the length of its shortest path to the nearest exposed atom in the associated atomic contact graph, which equals the length of its shortest path to the pseudo node minus one. The average burial level of all water oxygen atoms in $O$, denoted $WBL(O)$, is calculated as:
$$WBL(O) = \frac{1}{|V_{IW}(O)|} \sum_{w \in V_{IW}(O)} BL(w)$$
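Because the contact graph is unweighted, all burial levels can be obtained with one breadth-first search started from the pseudo node, then subtracting one. This is a minimal sketch, assuming an adjacency-dict graph in which the bulk-solvent pseudo node is labeled `"BULK"` (a hypothetical label); atoms not connected to the pseudo node get no burial level.

```python
from collections import deque


def burial_levels(graph, bulk="BULK"):
    """BL(a) = shortest-path length from a to the pseudo node minus one (BFS)."""
    dist = {bulk: 0}
    queue = deque([bulk])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Exposed atoms are one step from the pseudo node, so they get BL = 0
    return {a: d - 1 for a, d in dist.items() if a != bulk}


def water_burial_level(levels, interfacial_waters):
    """WBL(O): mean burial level of the interfacial water oxygens of O."""
    return sum(levels[w] for w in interfacial_waters) / len(interfacial_waters)
```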
The size of an interface $O$ is its number of interfacial atoms, including the atoms of the amino acids from both sides and the oxygen atoms of the interfacial water molecules, namely $|V_{IA}(O) \cup V_{IB}(O) \cup V_{IW}(O)|$.
The relative water burial level describes in general how deep the water molecules are buried with respect to the average interface burial level. It is defined as:
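The level-wise wetness is the proportion of water oxygen atoms among all interface atoms at a given burial level $i$:
$$\mathrm{wetness}_i(O) = \frac{|\{\, w \in V_{IW}(O) \mid BL(w) = i \,\}|}{|\{\, a \in V_{IA}(O) \cup V_{IB}(O) \cup V_{IW}(O) \mid BL(a) = i \,\}|}$$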
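Overall and level-wise wetness reduce to counting atoms. This is a minimal sketch, assuming burial levels are given as a dict from atom id to level; the overall wetness here uses all interface atoms (water oxygens included) in the denominator, an assumption chosen to match the level-wise definition.

```python
from collections import Counter


def wetness(protein_atoms, water_oxygens):
    """|V_IW| / |V_IA ∪ V_IB ∪ V_IW| (assumed form of the wetness definition)."""
    interface = set(protein_atoms) | set(water_oxygens)
    return len(set(water_oxygens)) / len(interface)


def levelwise_wetness(levels, protein_atoms, water_oxygens):
    """Share of water oxygens among all interface atoms at each burial level."""
    waters = set(water_oxygens)
    interface = set(protein_atoms) | waters
    all_counts = Counter(levels[a] for a in interface)
    water_counts = Counter(levels[w] for w in waters)  # missing levels count as 0
    return {i: water_counts[i] / all_counts[i] for i in sorted(all_counts)}
```

Plotting `levelwise_wetness` against the level index, from core (high levels) to rim (level 0), is one way to check whether an interface follows a dry-core-wet-rim trend.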
We also define the overall polarity and the level-wise polarity of an interface as the proportion of polar atoms (O, N and S atoms) in the whole interface and at each burial level, respectively.
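Polarity is a proportion over element symbols, computed either for the whole interface or per burial level. This is a minimal sketch; which atoms to pass in (for example, whether water oxygens are included) is left to the caller because the text does not specify it, and the function names are hypothetical.

```python
POLAR_ELEMENTS = {"O", "N", "S"}  # polar atoms as defined in the text


def polarity(elements):
    """Proportion of polar atoms in a list of element symbols."""
    return sum(e in POLAR_ELEMENTS for e in elements) / len(elements)


def levelwise_polarity(atoms):
    """atoms: iterable of (element, burial_level) pairs; returns {level: polarity}."""
    by_level = {}
    for element, level in atoms:
        by_level.setdefault(level, []).append(element)
    return {lvl: polarity(els) for lvl, els in sorted(by_level.items())}
```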
The planarity of an interface is defined as the root mean square deviation (RMSD) of the non-water interfacial atoms from their least-squares plane .
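The least-squares plane passes through the centroid of the atoms, and the mean squared orthogonal distance to it equals the smallest eigenvalue of the coordinate covariance matrix, so the planarity is the square root of that eigenvalue. This is a minimal dependency-free sketch using the closed-form (trigonometric) eigenvalues of a symmetric 3x3 matrix; with NumPy one would simply call `numpy.linalg.eigvalsh` on the covariance matrix instead. The caller is assumed to pass only non-water interfacial atom coordinates.

```python
import math


def planarity(points):
    """RMS deviation of 3-D points from their least-squares plane."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    d = [(x - cx, y - cy, z - cz) for x, y, z in points]

    def cov(i, j):
        return sum(v[i] * v[j] for v in d) / n

    a11, a22, a33 = cov(0, 0), cov(1, 1), cov(2, 2)
    a12, a13, a23 = cov(0, 1), cov(0, 2), cov(1, 2)

    # Closed-form eigenvalues of a symmetric 3x3 matrix (trigonometric method)
    p1 = a12 ** 2 + a13 ** 2 + a23 ** 2
    if p1 == 0.0:
        smallest = min(a11, a22, a33)  # covariance is already diagonal
    else:
        q = (a11 + a22 + a33) / 3.0
        p2 = (a11 - q) ** 2 + (a22 - q) ** 2 + (a33 - q) ** 2 + 2.0 * p1
        p = math.sqrt(p2 / 6.0)
        b11, b22, b33 = (a11 - q) / p, (a22 - q) / p, (a33 - q) / p
        b12, b13, b23 = a12 / p, a13 / p, a23 / p
        det_b = (b11 * (b22 * b33 - b23 * b23)
                 - b12 * (b12 * b33 - b23 * b13)
                 + b13 * (b12 * b23 - b22 * b13))
        r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
        phi = math.acos(r) / 3.0
        smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return math.sqrt(max(smallest, 0.0))
```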
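The correlation coefficient between two random variables $X$ and $Y$ is calculated as the Pearson correlation coefficient:
$$r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
Here, $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$, and $n$ is the sample size.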
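The Pearson formula can be sketched directly. For example, it could be used to correlate per-residue average burial levels with alanine-scanning ΔΔG values; the inputs in the test are toy numbers, not data from this study. On Python 3.10 or later, `statistics.correlation(xs, ys)` computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```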
Chaplin M: Do we underestimate the importance of water in cell biology? Nat Rev Mol Cell Biol 2006, 7(11):861–866. 10.1038/nrm2021
Cheung MS, Garcia AE, Onuchic JN: Protein folding mediated by solvation: Water expulsion and formation of the hydrophobic core occur after the structural collapse. Proc Natl Acad Sci USA 2002, 99(2):685–690. 10.1073/pnas.022387699
Keskin O, Ma B, Nussinov R: Hot regions in protein-protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol 2005, 345(5):1281–1294. 10.1016/j.jmb.2004.10.077
Teyra J, Pisabarro MT: Characterization of interfacial solvent in protein complexes and contribution of wet spots to the interface description. Proteins Struct Funct Bioinf 2007, 67(4):1087–1095. 10.1002/prot.21394
Rodriguez-Almazan C, Arreola R, Rodriguez-Larrea D, Aguirre-Lopez B, de Gomez-Puyou MT, Perez-Montfort R, Costas M, Gomez-Puyou A, Torres-Larios A: Structural Basis of Human Triosephosphate Isomerase Deficiency: mutation e104d is related to alterations of a conserved water network at the dimer interface. J Biol Chem 2008, 283(34):23254–23263. 10.1074/jbc.M802145200
Castro MJM, Anderson S: Alanine Point-Mutations in the Reactive Region of Bovine Pancreatic Trypsin Inhibitor: Effects on the Kinetics and Thermodynamics of Binding to beta-Trypsin and alpha-Chymotrypsin. Biochemistry 1996, 35(35):11435–11446. 10.1021/bi960515w
Sinha N, Mohan S, Lipschultz CA, Smith-Gill SJ: Differences in Electrostatic Properties at Antibody-antigen Binding Sites: Implications for Specificity and Cross-Reactivity. Biophys J 2002, 83(6):2946–2968. 10.1016/S0006-3495(02)75302-2
Guerois R, Nielsen JEE, Serrano L: Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320(2):369–387. 10.1016/S0022-2836(02)00442-4
Additional file 1: One figure and three tables are contained in this file. The figure shows the hydrogen bond bridges at different burial levels. The three tables list all interfaces used in this paper, along with their properties. (PDF 146 KB)
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Li, Z., He, Y., Wong, L. et al. Progressive dry-core-wet-rim hydration trend in a nested-ring topology of protein binding interfaces.
BMC Bioinformatics 13, 51 (2012). https://doi.org/10.1186/1471-2105-13-51