
Table 1 Parameters delineating the clustering of conserved residues in interfaces

From: Conserved residue clusters at protein-protein interfaces and their use in binding site identification

| Interface type | Average^a Ms,cons | Average^a Ms,int | Average^a ρ | Number of interfaces^b: total | Number of interfaces^b: with Ms,cons > Ms,int | P value^c |
|---|---|---|---|---|---|---|
| Homodimers | 0.081 (0.02) [0.079 (0.02)] | 0.071 (0.02) [0.071 (0.02)] | 1.13 (0.08) [1.11 (0.09)] | 121 | 117 [108] | 1.57E-04 [1.50E-03] |
| | 0.087 (0.02) | 0.070 (0.02) | 1.24 (0.20) | 103 | 94 | 1.93E-08 |
| Complexes | 0.102 (0.03) [0.101 (0.02)] | 0.089 (0.02) [0.089 (0.02)] | 1.14 (0.14) [1.13 (0.17)] | 392 | 340 [308] | 9.64E-14 [3.74E-11] |
| | 0.113 (0.04) | 0.090 (0.02) | 1.26 (0.30) | 309 | 252 | < 2.2E-16 |
| Complexes (antibody-antigen excluded) | 0.103 (0.03) | 0.088 (0.02) | 1.16 (0.14) | 348 | 313 | 4.86E-14 |
| | 0.115 (0.04) | 0.089 (0.02) | 1.28 (0.30) | 271 | 229 | < 2.2E-16 |
| Antibody-antigen complexes | 0.101 (0.02) | 0.097 (0.01) | 1.04 (0.15) | 44 | 23 | 0.59 |
| | 0.103 (0.03) | 0.097 (0.01) | 1.07 (0.29) | 38 | 21 | 0.57 |

  1. Two sets of values are provided, corresponding to two different ways of identifying the subset of conserved residues (see Methods). In the first method, the conserved residues in each interface are those whose sequence entropy, s (calculated using Eq. 1), is lower than the mean sequence entropy for that interface, ⟨s⟩int. In the second method, conserved residues are those with s < (⟨s⟩int − σ), where σ is the standard deviation of the s values over all residues in that interface. The first method was also repeated using Eq. 1a (instead of Eq. 1) to calculate the sequence entropy; those results are given in square brackets.
  2. a Standard deviations are in parentheses.
  3. b Multiple sequence alignments were available in the HSSP database for all proteins in our datasets except one homodimer, for which the analysis could not be carried out. In total, 121 homodimeric interfaces and 408 interfaces belonging to 204 protein-protein complexes were analyzed. Because the subunit interfaces in homodimers are identical, the analysis was performed for only a single subunit in homodimers; for protein complexes, each of the two components was analyzed separately. The average numbers of aligned homologous sequences in the HSSP files were 768 for homodimers and 1391 for protein complexes, and the percentage sequence identities of the aligned proteins ranged between 30 and 100%. For 16 protein chains in the dataset of complexes, all interface residue positions in the multiple sequence alignments were fully conserved, so the average interface entropy was 0.0. This made it impossible to identify a subset of conserved residues within the whole set of interface residues, and hence to calculate the clustering of conserved residues relative to the whole interface. The statistics for complexes are therefore shown for the remaining 392 interfaces only. Fewer interfaces are reported in the second row of data for each interface type (corresponding to Method 2) because the more stringent conservation criterion leaves some interfaces with 0 or 1 conserved residue, and these are excluded from consideration.
  4. c The non-parametric Mann-Whitney U-test was used to test whether Ms,cons is significantly greater than Ms,int. P < 0.01 indicates that Ms,cons is significantly greater than Ms,int at the 1% level. All statistical calculations (including P values) were carried out in R [64].
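
The two selection rules described in the first note can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `conserved_residues` and the example entropy values are hypothetical. The per-residue entropies are assumed to be already computed (for example with Eq. 1), and the population standard deviation is assumed for σ because the note does not say which form was used.

```python
from statistics import mean, pstdev


def conserved_residues(entropies, method=1):
    """Return the indices of interface residues classed as conserved.

    entropies: per-residue sequence entropy values s for ONE interface
               (assumed precomputed; the entropy formula is not shown here).
    method 1:  residue is conserved if s < <s>_int
    method 2:  residue is conserved if s < <s>_int - sigma
    """
    if method not in (1, 2):
        raise ValueError("method must be 1 or 2")
    s_mean = mean(entropies)
    # Assumption: population standard deviation over all interface residues.
    sigma = pstdev(entropies) if method == 2 else 0.0
    cutoff = s_mean - sigma
    return [i for i, s in enumerate(entropies) if s < cutoff]


# Hypothetical entropy values for a six-residue interface.
example = [0.15, 0.9, 0.4, 1.1, 0.1, 0.7]
method1 = conserved_residues(example, method=1)
method2 = conserved_residues(example, method=2)

# A fully conserved interface (all entropies 0.0) yields no conserved subset,
# because no residue lies strictly below the mean.
fully_conserved = conserved_residues([0.0, 0.0, 0.0], method=1)
```

With these made-up values, Method 1 selects three residues and the stricter Method 2 selects two. The last call returns an empty list, which is consistent with the 16 fully conserved chains being left out of the statistics.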