Correlation analysis of the side-chains conformational distribution in bound and unbound proteins

Background Protein interactions play a key role in life processes. Characterization of conformational properties of protein-protein interactions is important for understanding the mechanisms of protein association. The rapidly increasing amount of experimentally determined structures of proteins and protein-protein complexes provides foundation for research on protein interactions and complex formation. The knowledge of the conformations of the surface side chains is essential for modeling of protein complexes. The purpose of this study was to analyze and compare dihedral angle distribution functions of the side chains at the interface and non-interface areas in bound and unbound proteins. Results To calculate the dihedral angle distribution functions, the configuration space was divided into grid cells. Statistical analysis showed that the similarity between bound and unbound interface and non-interface surface depends on the amino acid type and the grid resolution. The correlation coefficients between the distribution functions increased with the grid spacing increase for all amino acid types. The Manhattan distance showing the degree of dissimilarity between the distribution functions decreased accordingly. Short residues with one or two dihedral angles had higher correlations and smaller Manhattan distances than the longer residues. Met and Arg had the slowest growth of the correlation coefficient with the grid spacing increase. The correlations between the interface and non-interface distribution functions had a similar dependence on the grid resolution in both bound and unbound states. The interface and non-interface differences between bound and unbound distribution functions, caused by biological protein-protein interactions or crystal contacts, disappeared at the 70° grid spacing for interfaces and 30° for non-interface surface, which agrees with an average span of the side-chain rotamers. Conclusions The two-fold difference in the critical grid spacing indicates larger conformational changes upon binding at the interface than at the rest of the surface. At the same time, transitions between rotamers induced by interactions across the interface or the crystal packing are rare, with most side chains having local readjustments that do not change the rotameric state. The analysis is important for better understanding of protein interactions and development of flexible docking approaches.


Background
Protein-protein interactions play a key role in life processes. Characterization of conformational changes in proteins upon binding is important for understanding the mechanisms of protein association and for our ability to model it. Dependence of side-chain dihedral angle distribution on the conformation of the backbone has been investigated in earlier studies [1][2][3][4][5]. The side-chain dihedral angles are not evenly distributed, but for the most part are tightly clustered. A number of unbound rotamer libraries have been described previously [1][2][3][4][5][6][7][8][9][10][11][12][13][14] (see [15] for a review). Dunbrack and Cohen [1] used Bayesian statistics to estimate populations and dihedral angles for all amino acids rotamers at all φ and ψ values. A backbonedependent rotamer library [15] was obtained by dividing φ and ψ dihedral space into 10°× 10°bins, χ angles into 120°b ins, and calculating frequencies and average values of rotamers for each amino acid. A backbone-independent rotamer library was generated in a similar way. In a recent study [16], a new version of the backbone-dependent rotamer library was developed. It consists of rotamer frequencies, mean dihedral angles, and variances as a function of the backbone dihedral angles. In one of the latest backbone-independent rotamer libraries, the "Penultimate rotamer library" [5] by Lovell, Richardson and colleagues, the dihedral angle space was clustered and rotamer positions were defined as the distribution mode.
Comparison of the side-chain distribution in the core and on the surface [3], conducted on 19 protein structures available in 1978, revealed a small variation of the χ 1 rotamers distribution. A later study [17] on a set of 50 nonhomologous proteins showed that for all side chains, except Asp, Asn and Glu, the distributions of χ 1 rotamers on the surface and in the core are not significantly different.
Comparison of the χ 1 and χ 2 distributions at the interface and non-interface surface was performed by Guharoy et al. [18]. Distributions were divided into bins as in the Dunbrack's backbone-independent rotamer library [1]. Empirical free energies of inter-rotamer transitions were calculated and compared for the interface and non-interface areas. The rotamers free energies were different at the interface and non-interface, whereas bound and unbound free energies were essentially the same.
Conformations of surface residues in protein structures determined by crystallography are affected by the crystal packing. The area of the protein surface involved in the crystal contacts is generally smaller than in biological interfaces [19], and the interface packing is looser [20]. Studies of the crystal packing effect on the surface side chains [21][22][23] showed that~20% of the exposed side chains change conformation, and the change increases with the increase of the side-chain solvent accessibility. Large polar or charged residues Arg, Lys, Glu, Gln, as well as Ser were found to be most flexible [21].
The purpose of this study was to analyze and compare dihedral angle distribution functions of the side chains at the interface and non-interface areas in bound and unbound proteins. Such analysis is important for better understanding of protein interactions and development of flexible docking approaches. The dihedral-angle distribution functions (DADF) were calculated on a cubic grid dividing the dihedral space into cells for each residue type, at interface and non-interface surface, in bound and unbound structures. The correlation coefficients between bound and unbound, interface and non-interface DADFs were calculated, along with the Manhattan distance, as a measure of dissimilarity between the DADFs. All the correlation coefficients depended on the amino acid type and the grid resolution. The correlation coefficients always increased with the increase of the grid spacing, whereas the Manhattan distances decreased accordingly. Short residues with one or two dihedral angles had higher correlations and smaller Manhattan distances at small grid spacing than the longer residues. The correlation between the interface and non-interface DADFs showed a similar dependence on the grid resolution in both bound and unbound states. The differences between bound and unbound DADFs induced by biological protein-protein interactions or crystal contacts disappeared at the 70°grid spacing for interfaces and 30°f or non-interface surface. The two-fold difference in the critical grid spacing indicates larger changes at the interface than on the rest of the surface. While the earlier studies [18,24,25] observed this trend for the side-chain rotamers, this study validates it by a more general approach based on the DADFs.

Methods
The analysis was performed on the non-redundant DOCKGROUND Benchmark 3 set of bound and corresponding unbound protein structures [26]. The set consists of 233 complexes, with the unbound structures of both interacting proteins for 99 complexes, and the unbound structure of one interacting protein for 134 complexes. The following criteria were used for generating the set: sequence identity between bound and unbound structures > 97%; sequence identity between complexes < 30%; and homomultimers, crystal packing, and obligate complexes excluded.
The core residues change conformation upon binding less than the surface ones [24]. Thus, our study focused on the surface residues only. Surface residues were defined as those with the relative solvent-accessible surface area ≥ 25% in bound and unbound state. The change of the residue solvent-accessible surface area (SASA) upon binding was used to differentiate the interface residues from the non-interface ones. SASA was calculated using Naccess [27]. The interface residues were defined as those losing > 1 Å 2 SASA upon binding. The statistics of the interface and non-interface residues in the bound and unbound structures are summarized in Table 1. The difference between the numbers of bound and unbound interface/non-interface residues reflects the difference between the number of bound and unbound protein structures in the DOCKGROUND set.
Side chain conformations were represented by dihedral angles, calculated by Dangle [28]. All dihedral angles varied from −180°to 180°, with exception of the last dihedral angle in Phe, Tyr, Asp and Glu [2], which varied from 0°to 180°due to the symmetry of the terminal aromatic and charged groups. To calculate the distribution functions, the configuration space was divided into cells by a cubic grid.
DADFs were calculated as the occupancy of the grid cells separately for each residue type for interface and non-interface, bound and unbound residues. Thus, there were four DADFs for each residue type: interface bound, interface unbound, non-interface bound, and noninterface unbound. Figure 1 shows a two-dimensional distribution function of Asp dihedral angles for the noninterface unbound residues.
To compare distributions X and Y, the corresponding n-dimensional space (n is the number of the dihedral angles in the side chain) was split into m cubes with a fixed side length. The occupancy in each cell was calculated ( Figure 1). The correlation coefficient r [29] between unbound (X) and bound (Y) DADFs was calculated as: where X i and Y i are the probabilities of bound and unbound side-chain conformations in a grid cell i, Y i are the average probabilities of bound and unbound side-chain conformations.
To determine the degree of similarity between two  Bound. probability distributions the Manhattan distance [30] was calculated as: The Manhattan distance equals 0 for two identical DADFs, and increases up to 1 with the decrease of the DADFs similarity (higher similarity between the DADFs corresponds to lower values of the Manhattan distance).

Results and discussion
The discrete probability distribution of the amino acid side-chain χ angles depended on the starting point of splitting and the size of the grid spacing. An example of a probability function with 20°grid spacing and different starting points of splitting for non-interface unbound Ser is shown in Figure 2. The distribution was divided into cells with a predefined step size, starting with a randomly chosen point, and the probability in each cell was calculated. To remove the effect of splitting, correlation coefficients were calculated 100 and 1000 times with the same splitting step but random starting point of splitting. Then, the average correlation coefficients were calculated. We found no significant difference between the correlation coefficients averaged 100 or 1000 times. Tests of statistical significance of the correlation [31] between bound and unbound distributions, and noninterface and interface distributions showed that all correlation values were significant, with p-values far below 0.001.
Analysis showed that the correlation coefficients depend on the grid spacing ( Figure 3). Generally, larger steps corresponded to higher correlation values (larger  cells yielded more smooth/similar distributions). Table 2 shows the grid spacing at which the correlation reaches a high level of 0.7. Most amino acids had high correlation between bound and unbound interface/non-interface distributions for grid spacing ≤ 20°, except Met and Arg at the interface and non-interface, and Glu and Gln at the interface. The correlation coefficient for Met and Arg increased with the grid spacing increase and reached the high level of 0.7 at the 70°grid spacing for interface, and 30°for non-interface. The two-fold difference in the critical grid spacing indicates higher flexibility of these amino acids at the interface [24]. Since the 120°distance between two adjacent side-chain rotamers is significantly larger than the critical grid spacing, the use of large clustering radii for bound and unbound rotamer libraries [24] would produce similar results.
Although the results showed high degree of similarity between the distributions, correlation values for Met and Arg were noticeably lower than for other amino acids. Analysis of the results for Met revealed that although the covariance of distributions for all amino acids with three dihedral angles were the same, the standard deviation for Met was higher (Table 3), leading to the lower correlation value for Met. In the case of Arg, although the standard deviations of Lys were twice larger than that of Arg, the covariance of Arg was ten times smaller than that of Lys, yielding the overall lower correlation for Arg. Table 2 The minimal grid spacing corresponding to correlation coefficient 0.7 between bound and unbound interface/non-interface dihedral angle distribution  Equation 2 was used to calculate the Manhattan distance between bound and unbound interface/non-interface distributions. As in the case of correlation, the metric value depended on the grid spacing, with larger steps corresponding to more coarse-grained distributions. Thus, tests were conducted with different steps: 10°, 30°, 50°, 70°, and 90°. The distance between the distributions decreased with the step increase ( Figure 4). In most cases, the Manhattan distances for the interface were greater than for the non-interface. The distances between interface unbound and bound distributions for all long amino acids with three and four dihedral angles were the largest ( Figure 4A). It agrees with our previous findings that long amino acids have higher flexibility in binding [24]. The Manhattan distance between the probability functions was < 30% for most amino acids, starting with 50°grid spacing, except for Met and Arg interface bound vs. unbound and non-interface vs. interface distributions. For these distributions, the distance was < 30% at grid spacing 70°, and < 35% for Met interface bound vs. unbound and Arg bound non-interface vs. interface. The high similarity between the DADFs at the 50°grid spacing is a result of the small number of rotamer-to-rotamer transitions induced by interactions across the interface or the crystal packing. Most side chains have local readjustments ( Figure 5) that do not change the rotameric state.

Conclusions
The dihedral-angle distribution functions were calculated for each amino acid type for interface and noninterface surface residues, in bound and unbound protein structures. To generate the distribution functions, the configuration space was divided into cells by a cubic  grid. Correlation coefficients between bound and unbound interface and non-interface distribution functions were calculated. The similarity between the distributions was also quantified by the Manhattan distance. The results showed that all the correlation coefficients depend on amino acid type and the grid resolution. For all amino acid types, the correlation coefficients increased with the increase of the grid spacing. The Manhattan distances between the distribution functions decreased accordingly. Short residues with one or two dihedral angles had higher correlations and smaller Manhattan distances than the longer residues. Met and Arg had the lowest correlation coefficients at any grid spacing. The correlations between the interface and non-interface distribution functions had a similar dependence on the grid resolution in both bound and unbound states. The interface and non-interface difference between bound and unbound distribution functions, induced by biological protein-protein interactions or crystal contacts, disappeared at the 70°grid spacing for interfaces and 30°for non-interface surface, in agreement with an average span of a side-chain rotamer. The two-fold difference in the critical grid spacing indicates larger conformational changes upon binding at the interface than at the rest of the surface. At the same time, transitions between rotamers induced by interactions across the interface or the crystal packing are rare, with most side chains having local readjustments that do not change the rotameric state. Conformational sampling based on the side chain dihedral angle distributions may optimize flexible docking protocols by reflecting conformational preferences of the bound proteins. The results suggest that the site-(interface vs. non-interface) and residue-specific grid spacing smaller than the critical values should be used in the sampling. The minimal grid spacing (Table 2) reflects intra-rotamer amino acid local readjustments upon binding. Thus, using such steps in conformational sampling may accelerate the flexible docking search by reflecting the size of these readjustments. The core residues are shown as surface. The interface residues are shown in bold colors. The bound structure of the immunoglobulin is 1a2y [32], the unbound structure is 1vfa [33]. The bound structure of the alpha-chymotrypsin is 1acb [34], and the unbound structure is 1gct [35].