Mapping the distribution of packing topologies within protein interiors shows predominant preference for specific packing motifs
 Sankar Basu^{1},
 Dhananjay Bhattacharyya^{2} and
 Rahul Banerjee^{1}
DOI: 10.1186/1471-2105-12-195
© Basu et al; licensee BioMed Central Ltd. 2011
Received: 20 October 2010
Accepted: 24 May 2011
Published: 24 May 2011
Abstract
Background
Mapping protein primary sequences to their three dimensional folds, referred to as the 'second genetic code', remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design.
Results
In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique, was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys.
Conclusions
Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry.
Background
Despite several decades of arduous effort, mapping of protein primary sequences to their three dimensional folds, referred to as the second genetic code, remains an unsolved scientific problem. What appears to be lacking is a comprehensive theory, integrating two factors which definitely condition the isomorphism between sequence and fold, namely (1) the pattern of hydrophobicities embedded in the polypeptide chain [1] and (2) the packing of amino acid side chains to give densely packed [2] protein interiors. Under the present circumstances, the more tractable approach is the 'inverse protein folding problem' [3, 4], that is, to identify protein primary sequences [5] consistent with and supportive of a given fold, an idea which has found considerable application in the de novo design of targeted protein structures [6–9]. Yet even here, it was realized early on that in de novo design, attainment of dense, well-packed protein cores (a hallmark of native, correctly folded proteins) was neither an automatic part of the design process nor acquired simply by chance [10, 11]. Most often, it was observed (especially for longer sequences) that design led to molten globules or complete unraveling of the structure [12, 13]. An instructive example was the repeated failure to design a parallel (α/β)_{8} TIM barrel [14, 15], finally resolved successfully by Offredi et al. [16], where a term optimizing for side chain packing specificity was deliberately included in the computational process. In fact, one indicator of successful computational design [17–20] is the attainment of densely packed side chains in the interior of the targeted protein, experimentally characterized by the absence of ANS binding [16]. Thus, a comprehensive theory with regard to the packing of side chain atoms within proteins would not only provide insight into protein structures and facilitate their prediction [21] but would also be a valuable aid in the design of novel proteins.
Traditionally, there have been two models of protein packing: (1) the 'jigsaw puzzle' and (2) the 'nuts and bolts' model, which lie on opposite ends of the spectrum. The jigsaw puzzle model, attributed to Crick [22], postulates the stereo-specific interdigitation of amino acid side chains giving rise to densely packed protein interiors. On the other hand, the nuts and bolts model [23] does not require the association of side chains with specific geometry and asserts that the internal architecture of proteins arises simply due to the high compaction of side chain atoms within a constrained volume. Lately, another model referred to as the 'oil drop' model [24, 25] has been proposed in order to capture the dynamic fluctuations in protein cores. Possibly, all these models concentrate exclusively on some special features of interior packing. However, using a surface complementarity function, a previous report from this laboratory [26] demonstrated that binary association between two hydrophobic side chains (Leu-Leu, Leu-Phe etc.), with high surface fit and maximal overlap between their corresponding residue surfaces, did exhibit specific inter-residue geometry. It was thus clear that at least for a subset of contacts (with high fit and overlap) predictions of the jigsaw puzzle model were indeed valid.
One drawback of all such studies is that the inter-residue interactions which sustain a native fold are more accurately modeled as a network rather than a discrete assortment of binary interacting pairs. Several attempts have been made to view protein structures as contact networks [27–35] wherein the amino acids have been designated as nodes and their mutual non-covalent interactions as edges. The character of these networks (in terms of degree distribution, clustering coefficients, characteristic path length etc.) exhibits variability depending on the cutoffs used to define interatomic contact. By and large, most protein contact networks preserve 'small-world' character (local cohesiveness, global reach) [29, 34, 36–38] and display signatures of assortative mixing (preferential attachment of new nodes to pre-existing high degree nodes) [34]. However, degree distribution can be exponential, sigmoidal or dependent on a single exponent, as a function of the criteria used to define atomic interactions [32]. It has also been noted that in certain aspects protein contact networks differ significantly from other real world networks, for example in the restricted number of edges a node can have. Apart from providing insights into protein structures, these networks have been used to identify residues implicated in folding nuclei [35] and transition states [28], to identify functional residues involved in the active site [30] and hubs stabilizing the packing of secondary structural elements [32], to rationalize the difference in stabilities of proteins from thermophilic and mesophilic organisms [32] and to estimate folding rates [27, 31]. The utility of the network view of the protein structure is thus fairly well established.
In this study, the distribution of such networks from a database of protein structures has been analyzed in order to identify specific topological patterns in side chain association within protein cores. Such an analysis led to the recognition that certain packing topologies, defined as packing motifs, were found preferentially in proteins. A limited region of the topological space was exhaustively mapped in terms of frequently occurring packing motifs, combinations of which could lead to networks of larger sizes. It was found that larger networks could indeed be assembled out of a basis set of smaller ones. One such frequently occurring motif, namely the three residue clique, received special attention with regard to its composition and the geometry of its associating residues.
Central to pursuing the research objectives outlined above was the extension of the jigsaw puzzle model into protein contact networks. Thus protein contact networks have been defined primarily in terms of surfaces rather than distance between point atoms (although such networks have also been studied in parallel for the sake of comparison). As mentioned previously, earlier studies [26] had established quantitative measures (in terms of surface complementarity and overlap) to identify those residue pairs whose interacting side chains exhibit specific geometry. These measures have now been used to define 'surface contact networks' based only on those inter-residue interactions which severely constrain geometry and thus could play a predominant role in stabilizing a particular fold.
Results and Discussion
The primary objective of this study is to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. In this task we have deliberately chosen those contacts which strongly and specifically condition the inter-residue geometry of association. Since the majority of atomic contacts inside a protein are contributed by side chain atoms, it is natural to represent such interior packing as a network, defined primarily in terms of contact between their corresponding van der Waals surfaces (ASCN). In addition, point atom contact networks (APCN) have also been studied simultaneously (albeit with a fairly strong interaction cutoff: 3.8 Å), by way of comparison.
Contact between any two surfaces can be characterized in terms of overlap (Ov), that is, the extent to which two surfaces are conjoined, and by their goodness-of-fit or surface complementarity (S_{m}) (see Methods, section: Surface Complementarity). In a previous work, it was demonstrated that when the surface association between two amino acid side chains was greater than or equal to 0.1 in Ov and 0.5 in S_{m} (defined on a Connolly surface), angular distributions specifying inter-residue geometry exhibited significant deviations from a random distribution [26]. For a corresponding van der Waals surface, the values of S_{m} were found to be marginally lower for the same binary interactions. In contrast to point atoms, the definition of 'contact' (see Methods, section: Surface Complementarity) between two surfaces is not necessarily commutative (i.e. A contact B does not imply B contact A). In networks based on surface contact, nodes representing residues A and B have been connected with an edge only when (1) the contact between A and B is commutative and (2) their reciprocal S_{m} and Ov are both greater than or equal to 0.4 and 0.08 respectively. For strong association between two residue surfaces their contact is expected to be commutative, which also effectively simplifies the network to an undirected graph. For both point atom and surface contact networks, interatomic distance and surface overlap bear a strong linear correlation. S_{m}, on the other hand, appears to be an additional feature for the latter. Interestingly, the choice of 3.8 Å as the interaction cutoff for point atoms led to maximum resemblance between the two categories of networks.
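The edge rule described above can be illustrated with a minimal sketch. The input layout (a dictionary keyed by directed residue pairs holding hypothetical S_{m} and Ov values) and the function name `build_surface_network` are assumptions made for illustration; only the two cutoffs (0.4 and 0.08) and the reciprocity requirement come from the text.

```python
from typing import Dict, Set, Tuple

SM_CUT, OV_CUT = 0.4, 0.08  # cutoffs on S_m and Ov quoted in the text

def build_surface_network(
    contacts: Dict[Tuple[str, str], Tuple[float, float]]
) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map.

    `contacts` maps a directed pair (A, B) to (S_m, Ov) of A against B.
    This layout is hypothetical; it is not the authors' data format.
    """
    graph: Dict[str, Set[str]] = {}
    for (a, b), (sm_ab, ov_ab) in contacts.items():
        reverse = contacts.get((b, a))
        if reverse is None:
            continue  # contact is not commutative: no edge
        sm_ba, ov_ba = reverse
        # both directions must pass both cutoffs
        if min(sm_ab, sm_ba) >= SM_CUT and min(ov_ab, ov_ba) >= OV_CUT:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

# Made-up values, for illustration only
example = {
    ("L10", "F25"): (0.55, 0.12),
    ("F25", "L10"): (0.48, 0.09),
    ("L10", "V40"): (0.60, 0.10),  # no reverse contact recorded
    ("F25", "I33"): (0.45, 0.10),
    ("I33", "F25"): (0.35, 0.10),  # reciprocal S_m below cutoff
}
network = build_surface_network(example)
```

In this toy input only the L10/F25 pair survives, since the other two pairs fail either the reciprocity or the cutoff condition.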
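Table 1. Frequency distribution of contact networks according to size (number of networks). ASCN columns are labelled by the cutoffs (S_{m}, Ov).

| Network Size | APCN | ASCN (0.0, 0.0) | ASCN (0.2, 0.05) | ASCN (0.3, 0.07) | ASCN (0.4, 0.08) | ASCN (0.5, 0.1) |
|---|---|---|---|---|---|---|
| 3 | 1168 | 58 | 162 | 336 | 707 | 1995 |
| 4 | 614 | 51 | 99 | 165 | 349 | 1016 |
| 5 | 433 | 29 | 57 | 99 | 187 | 641 |
| 6 | 273 | 28 | 33 | 62 | 147 | 452 |
| 7 | 198 | 11 | 32 | 49 | 90 | 314 |
| 8 | 148 | 6 | 28 | 34 | 71 | 230 |
| 9 | 125 | 12 | 16 | 29 | 58 | 195 |
| 10 | 99 | 4 | 17 | 27 | 53 | 134 |
| 11-20 | 564 | 32 | 78 | 152 | 435 | 476 |
| 21-30 | 236 | 40 | 119 | 224 | 341 | 47 |
| 31-40 | 130 | 72 | 107 | 153 | 217 | 8 |
| 41-50 | 72 | 50 | 60 | 104 | 105 | 4 |
| 51-100 | 165 | 241 | 243 | 228 | 203 | - |
| 101-150 | 60 | 127 | 122 | 99 | 63 | - |
| 151-200 | 33 | 99 | 88 | 69 | 11 | - |
| 201-250 | 10 | 70 | 51 | 29 | 6 | - |
| 251-300 | - | 26 | 19 | 8 | - | - |
| 301-350 | - | 4 | 1 | - | - | - |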
The same calculations repeated for polypeptide chains distributed in bins with 75-150, 151-300 and 301-500 residues gave similar curves, though for bins of larger chain length, networks of larger size appeared, thereby extending the long tail of the distribution. As expected, frequency distributions of polypeptide chains containing networks of a particular size gave a similar decaying trend with increasing network size; that is, networks of smaller size were found embedded in polypeptide chains regardless of the chain length, whereas instances of larger graphs were progressively rare. These distributions tend to indicate that (in the subset of contacts with specific inter-residue geometry of association) small (3-10 nodes) to medium (11-20 nodes) sized networks are found universally in all protein structures, whereas linkage and/or fusion of these smaller networks to form larger ones is protein specific and is thus context dependent. Very large networks (> 150 nodes) were found only in 17 proteins (additional file 2: Table S1), almost all of which had chain length exceeding 400 residues with close packing between extended secondary structural elements (helices and sheets). Overall, most of the very large networks were found in αβ proteins (additional file 3: Table S2).
In a protein contact network, there is an obvious upper bound to the highest possible degree a node can have (dependent on the contact criteria) due to the limited volume of the residues involved in packing. For the present set of criteria, the highest degree of a node was found to be restricted to 8 and 9 for ASCN and APCN respectively.
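As a rough sketch of how the quantities discussed here could be obtained, network sizes correspond to connected component sizes of the contact graph and the degree bound is the largest neighbour count. The adjacency-map representation and the toy graph are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter, deque

def components(graph):
    """Connected components of an undirected adjacency map (breadth-first search)."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps

# Toy graph: a branched triangle, a three-node chain and an isolated pair
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e", "g"}, "g": {"f"},
    "h": {"i"}, "i": {"h"},
}
size_histogram = Counter(len(c) for c in components(graph))
max_degree = max(len(neighbours) for neighbours in graph.values())
```

Binning `size_histogram` by size range would give a frequency distribution of the kind tabulated above.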
Packing Motif
One of the central concepts formulated in this study is that of a 'packing motif'. To start with, a packing motif is defined as a graph with a limited number of nodes (3-7), consisting of unique topological connections, which can be found either in isolation or can appear as a component or an induced subgraph, embedded within a larger graph. It follows that no two distinct motifs are superimposable onto each other. In other words, two motifs are identical (or topologically isomorphic) if there exists a one-to-one correspondence between their vertex sets which preserves adjacency [41]. The same motif can be found in different proteins and since a node (in the motif) does not conventionally represent any particular amino acid, it could stand for different sets of residues associated with diverse inter-residue geometries in the actual three dimensional assemblies. Thus a packing motif is a reduced representation of three dimensional residue clusters, rather analogous to super-secondary structural motifs where, for example, different combinations of residues in unrelated proteins can fold into (say) a helix-turn-helix.
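The isomorphism criterion stated above (a one-to-one vertex correspondence that preserves adjacency) can be checked by brute force for graphs as small as packing motifs. The sketch below is only an illustration under that assumption; the function name and the integer node labelling are hypothetical and the paper does not state which algorithm was used.

```python
from itertools import permutations

def are_isomorphic(edges_a, edges_b, n):
    """True if two graphs on nodes 0..n-1 are topologically identical.

    Tries every vertex relabelling, which is affordable for n <= 7.
    """
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if len(set_a) != len(set_b):
        return False
    for perm in permutations(range(n)):
        mapped = {frozenset(perm[v] for v in edge) for edge in set_a}
        if mapped == set_b:
            return True
    return False

chain = [(0, 1), (1, 2)]
relabelled_chain = [(1, 0), (0, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
star4 = [(0, 1), (0, 2), (0, 3)]
path4 = [(0, 1), (1, 2), (2, 3)]
```

Here `chain` and `relabelled_chain` represent the same motif, whereas `star4` and `path4` have equal edge counts but different topology.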
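Table 2. Frequency distributions of small (3-10 nodes) networks with their corresponding number of motifs

| Network Size | ASCN Networks | ASCN Motifs | APCN Networks | APCN Motifs |
|---|---|---|---|---|
| 3 | 707 | 2 | 1168 | 2 |
| 4 | 349 | 5 | 614 | 5 |
| 5 | 187 | 12 | 433 | 13 |
| 6 | 147 | 28 | 273 | 37 |
| 7 | 90 | 47 | 198 | 60 |
| 8 | 71 | 55 | 148 | 76 |
| 9 | 58 | 46 | 125 | 93 |
| 10 | 53 | 51 | 99 | 91 |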
All networks were systematically searched for the size of the maximal clique (n_{c}) (see Methods, section: Cliquishness), which interestingly was found to be no more than 4 for embedded cliques (n_{c} being 3 for a large majority of cases) and not exceeding 3 for complete graphs (or isolated cliques). In fact, the numbers of networks with a maximal clique of 3 and 4 nodes were found to be 1548 and 77 respectively in the case of ASCN (1662 and 146 in the same order for APCN). Since an n-clique should have exactly (^{n}C_{2} - n) diagonal edges, it follows that any possible closed-ring topology of n > 4 to be found in the database can have at most (^{n}C_{2} - n - 1) diagonal edges. Thus the possible network architectures spanning the space under study are expected to be restricted to a few basic topologies, namely linear chains, closed triplets (with or without branching), closed quadruplets (including embedded 4-cliques), higher order ring closures (n > 4) with a restricted number of diagonal edges and possibly a series of non-planar graphs.
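The diagonal-edge count can be made concrete with a short sketch. The brute-force clique search below is an assumption chosen for clarity on small graphs and is not the procedure of the Methods section; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def max_clique_size(nodes, edges):
    """Size of the largest clique, by exhaustive search (small graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(len(nodes), 1, -1):
        for subset in combinations(nodes, k):
            if all(frozenset(pair) in edge_set for pair in combinations(subset, 2)):
                return k
    return 1 if nodes else 0

def diagonals_of_clique(n):
    """An n-clique has C(n, 2) edges, n of which form the outer ring."""
    return comb(n, 2) - n

ring5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
pentagon_one_diagonal = ring5 + [(0, 2)]
k4 = list(combinations(range(4), 2))
```

For a five-membered ring, `diagonals_of_clique(5)` gives 5, so a pentagon observed under the constraint n_{c} ≤ 4 can carry at most 4 diagonals.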
For n = 3 there were trivially only two possible motifs: (1) the open linear chain (motif id: 2111212) and (2) the isolated closed triplet clique (motif id: 222222222). Both possibilities were found in protein contact networks, though with considerable difference in the number of their respective occurrences. The overwhelming majority of these three-node graphs are found to be open linear chains (660: ASCN; 1070: APCN), which offer greater flexibility, unlike isolated closed triplet cliques (47: ASCN; 98: APCN), which can only occur when additional geometric constraints are satisfied. It could also be possible that triplet cliques, once formed, display an inherent tendency to evolve into larger networks, given the fact that a significantly larger number of these cliques were found to be embedded as induced subgraphs in larger graphs (8876: ASCN; 9102: APCN) relative to isolated instances. Out of a total of 719 polypeptide chains in the database, embedded triplet cliques were found at least once in 696 (ASCN) and 689 (APCN) chains, whereas for isolated instances the corresponding numbers were 47 (ASCN) and 90 (APCN).
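The distinction between isolated and embedded triplet cliques can be sketched as follows, assuming the network is held as an adjacency map. A triangle is treated as isolated when none of its three nodes has a neighbour outside the triangle; the function name and the toy graph are illustrative assumptions.

```python
from itertools import combinations

def triplet_cliques(graph):
    """Split all triangles of an undirected adjacency map into isolated and embedded."""
    isolated, embedded = [], []
    for a in graph:
        for b, c in combinations(sorted(graph[a]), 2):
            # require a < b < c so that each triangle is reported once
            if a < b and c in graph[b]:
                triangle = (a, b, c)
                if all(len(graph[v]) == 2 for v in triangle):
                    isolated.append(triangle)
                else:
                    embedded.append(triangle)
    return isolated, embedded

graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},        # isolated clique
    "D": {"E", "F"}, "E": {"D", "F"}, "F": {"D", "E", "G"},   # clique with a branch
    "G": {"F"},
    "H": {"I"}, "I": {"H", "J"}, "J": {"I"},                  # open linear chain
}
isolated, embedded = triplet_cliques(graph)
```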
It is a relatively simple task (at least up to n = 5) to enumerate the possible number of motifs and then find their respective number of members (or the frequency of their occurrence) in the protein database. It is however a more complex exercise to propose a sound classification scheme, which leads to the regular ordering of actually observed motifs. To this end two additional concepts were defined, namely family and path. Two motifs g(n) and g'(n+1) (with n and n+1 nodes respectively) are related by a path if the motif g'(n+1) can be formed from g(n) such that the node added to g(n) is linked to only one pre-existing node by a single edge. In other words, the transformation g(n) → g'(n+1) is a path provided the newly added node has a degree of one and the degree of one and only one pre-existing node in g(n) increases by one. Again, all motifs which can be linked by successive paths: g(n) → g'(n+1) → g''(n+2) ... etc. fall within the same family. However, in case the intermediate g'(n+1) was missing, g''(n+2) was still retained in the same family. Thus, essentially a path leads to linear branching(s) about nodes belonging to a basic core topology. It follows that a motif of larger size (greater than 7 nodes) can either belong to an already existing family, provided it is appropriately linked by a path, or belong to an entirely new family (for example, ring closures of n > 7), which was found to be remarkably less frequent.
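The path relation defined above amounts to the following test: g'(n+1) contains a node of degree one whose removal leaves a graph isomorphic to g(n). The sketch below implements that reading with a brute-force isomorphism check; the function names and integer node labels are hypothetical, and the authors' actual implementation is not described in this section.

```python
from itertools import permutations

def _isomorphic(edges_a, edges_b, n):
    """Brute-force isomorphism test for graphs on nodes 0..n-1."""
    set_a = {frozenset(e) for e in edges_a}
    set_b = {frozenset(e) for e in edges_b}
    if len(set_a) != len(set_b):
        return False
    return any(
        {frozenset(p[v] for v in e) for e in set_a} == set_b
        for p in permutations(range(n))
    )

def is_path_step(g_edges, n, g_next_edges):
    """True if the (n+1)-node graph arises from the n-node graph by adding
    one new node attached through a single edge."""
    degree = {v: 0 for v in range(n + 1)}
    for u, v in g_next_edges:
        degree[u] += 1
        degree[v] += 1
    for leaf in [v for v, d in degree.items() if d == 1]:
        kept = [v for v in range(n + 1) if v != leaf]
        relabel = {old: new for new, old in enumerate(kept)}
        reduced = [(relabel[u], relabel[v]) for u, v in g_next_edges if leaf not in (u, v)]
        if _isomorphic(g_edges, reduced, n):
            return True
    return False

triangle = [(0, 1), (1, 2), (0, 2)]
branched_triangle = triangle + [(2, 3)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
chain3 = [(0, 1), (1, 2)]
chain4 = [(0, 1), (1, 2), (2, 3)]
```

Under this reading, the branched triangle stays in the family of the closed triplet, while the four-membered ring cannot be reached from any three-node motif by a path and would seed a new family.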
More or less the same trend was preserved for n = 5, where new motifs with a significant number of members were again placed in the families f1 and f2. Additional motifs with marginal membership were included in f3a and f3b, which were essentially branched four-membered rings. Two more families (f4a and f4b) were created at this point, the former (f4a) originating from the closed pentagon (with no diagonals) whereas the latter (f4b) includes the pentagon with a single diagonal edge (additional file 6: Figure S3). Other families at this point include topologies formed by two or more closed triplets: fused along their edges (f5) (additional file 7: Figure S4), connected at a node (f6a) or connected by an edge (f6b) (additional file 8: Figure S5). Once again, families other than f1 and f2 exhibited negligible memberships. Moving up to levels n = 6, 7 led to the inclusion of only five more families: (a) linkage of two four-membered rings through a node (f7: 1 member each in ASCN and APCN) (additional file 8: Figure S5), (b) embedded quadruplet cliques with additional linear branching (f8a: 1 in ASCN and 4 in APCN), (c) non-planar graphs excluding quadruplet cliques (f8b: 1 in ASCN and 3 in APCN) (additional file 9: Figure S6), (d) closed six-membered rings with or without diagonal edges (f4c: 3 each in ASCN and APCN) (additional file 6: Figure S3) and (e) graphs where two non-adjacent nodes are connected by more than two sequences of successively connected nodes (f8c: 4 each in ASCN and APCN) (additional file 9: Figure S6). The addition of nodes from n = 5 to n = 6, 7 primarily led to the addition of motifs in the pre-existing families by (1) increasing the length and branching of the linear chain (f1), (2) increased linear branching about the triplet cliques (f2) and (3) progressive branching and inclusion of diagonal edges in the higher order closed rings (f3a, f3b, f4a, f4b, f4c, f5).
At this stage it became obvious that the initial definitions were leading to a proliferation of families with almost negligible membership. Thus, to reduce the number of such families, some exceptions were made. For families originating from five-membered rings (f4a & f4b), motifs with a closed triplet fused about any two vertices of the pre-existing pentagon (33323322332123323223213: f4a & 4332234323432243242232, 5332213532353225325223215, 4432244321344224424223212: f4b) (additional file 6: Figure S3) were included in the same respective families. Finally, up to n = 7, a total of 94 (ASCN) and 117 (APCN) motifs with 1480 (ASCN) and 2686 (APCN) members respectively were organized into 13 families.
The same procedure described above was performed separately for polypeptide chains in each individual protein class (all α, all β, αβ, α+β), in order to investigate any preference for specific motifs or families. By and large no outstanding preference was observed (after suitably normalizing for the number of polypeptide chains in each class) (see Methods, section: propensity), though a somewhat reduced propensity was found for family f2 (originating from closed triplet cliques) in the case of all α proteins (0.67) with a relative increase in propensity for αβ (1.20). The statistics were not robust for most families barring f1 and f2 due to their extremely low frequency of occurrence (additional file 10: Table S4).
The trend regarding the distributions of motifs (with preference for families f1 and f2) was not radically changed for different cutoffs on S_{m} and Ov, other than the reduction of smaller (isolated) motifs on gradually relaxing the cutoffs. Notably, the population of f2 (family of closed triplet cliques) was found to gradually approach that of f1 (family of open linear chains) upon systematic lowering of the cutoffs, allowing weaker links to close the cluster. On the other hand, the application of more stringent cutoffs (S_{m} ≥ 0.5, Ov ≥ 0.1) led to an increase in the population of smaller motifs, predominantly in the f1 family. Most notable was the increase in frequencies of motifs with 7 nodes (f1), probably due to the exclusion of a few weaker links leading to 'minimally connected' open linear chains (additional file 11: Table S5).
Since from n = 5 to n = 6, 7 a diminishing number of families (only 5) are added with negligible membership, it is highly likely that larger networks (n > 10) will generate motifs either populating already existing families or will be assembled by joining pre-existing motifs following a defined set of rules. Since the same trend of preferential membership in the first two families was followed in networks of size n = 8, 9, 10 (data not shown), it was decided to begin the construction of higher order graphs out of a motif basis set obtained from networks of sizes up to n = 7, with the understanding that motifs of sizes greater than 7 nodes (located in appropriate families) would also be utilized depending on the context of a particular network. Variants of a motif with branching(s) from nodes different from those originally observed (especially for closed ring topologies), though preserving core topology, would also be used in the resolution of larger graphs into subgraphs. For n = 10, the total number of motifs became comparable to the number of networks or members (Table 2). Thus, the resolution of larger graphs in terms of the proposed basis set was attempted for n greater than 10.
Possibly, introduction of cofactors and metals into proteins could distort contact networks to give rise to novel topological architectures. Preliminary analyses of networks associated with metal binding sites (on a reduced database consisting of 63 polypeptide chains) exhibit novel 'highly connected' motifs not found in ASCN/APCN which show a greater tendency to form higher order cliques (n_{c} = 5, 6) provided the metal(s) is also included in the contact network.
Triplet Clique
The classification of motifs into families reveals that the overwhelming majority of contact networks found in protein structures occur in the first two families (f1 + f2) originating from core topologies of either open linear chains or closed triplet cliques. Although the simple rule governing the classification of motifs leads to about thirteen families in all, a significant proportion of these families have such negligible membership that they can be currently disregarded. To investigate whether the most frequently occurring motifs exhibit any preference in their constituent amino acid residues and whether their side chains pack with specific geometry, the closed triplet clique (regarded as the 'clustering unit') was chosen for further investigation. Analysis of the relative frequencies of isolated and embedded triplet cliques appeared to suggest that isolated cliques (or in other words, complete graphs of three nodes) have an inbuilt tendency for further branching(s) about the three constituent nodes, resulting in their being embedded in larger graphs. Thus, to improve the statistics, both isolated triplet cliques and those embedded as induced subgraphs in larger graphs were pooled together. Further, since hydrophobic residues show greater propensity for burial and inclusion into contact networks, only the subspace of triplet cliques composed exclusively of hydrophobic residues (Ala, Val, Leu, Ile, Phe, Tyr, Trp) was considered. The resultant number of triplet cliques was thus reduced to 4874 (ASCN) and 1545 (APCN) out of totals of 8923 and 9200 respectively. Interestingly, the number of such cliques was found to be significantly higher for ASCN relative to APCN, and since surface contact networks have been defined with a view to identifying side chain associations with specific inter-residue geometry, results from ASCN alone are being discussed, which in any case should give superior statistics.
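The restriction to the hydrophobic subspace, followed by a tally of compositions, could be sketched as below. The three-letter residue codes, the function name and the toy input are assumptions for illustration; only the list of seven hydrophobic residues is taken from the text.

```python
from collections import Counter

# The seven residues named in the text
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "PHE", "TYR", "TRP"}

def composition_counts(cliques):
    """Count triplet cliques by residue composition, keeping only those
    made exclusively of hydrophobic residues. Order within a clique is ignored."""
    counts = Counter()
    for residues in cliques:
        if set(residues) <= HYDROPHOBIC:
            counts[tuple(sorted(residues))] += 1
    return counts

# Toy input, not data from the paper
cliques = [
    ("LEU", "ILE", "VAL"),
    ("VAL", "LEU", "ILE"),
    ("LEU", "LEU", "PHE"),
    ("LEU", "ASP", "VAL"),  # dropped: contains a non-hydrophobic residue
]
counts = composition_counts(cliques)
```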
Thus, the data indicate that even though most of the possible residue combinations are realized in local closed triplets within proteins, there is a wide divergence in their respective frequencies. Some residue combinations definitely appear to be preferred over others. Moreover, since only a subspace has been studied, the compositional propensities appear to be fairly pronounced, rather than outstanding. Without the use of surfaces and careful classification of triplet cliques (based on their compositions) these could well be overlooked. The overall pattern in composition remained fairly unmodified upon changing cutoffs in S_{m} and Ov. Even then, the formation of well packed three-residue cliques in proteins appears to be constrained in terms of the total volume occupied by the triplet and probably their inter-residue geometry. In all probability, only some residue combinations optimally satisfy these constraints. The question then is: what are the geometrical constraints imposed on these three-residue cliques?
1. For C1 (all three residues different), the three residues (R1, R2, R3) were first sorted on the basis of their side chain volume R1 > R2 > R3. Let the vector directed from the origin of R1 to R2 be v1 and that from R1 to R3 be v2. Then the global Z axis was defined as v1 × v2 and the global X axis as the unit vector directed from the global origin towards the origin of R1.
2. For a given composition in C2 (e.g., Leu-Phe-Phe) one specific example was arbitrarily chosen whose unique residue was designated R1, and R2, R3 were assigned such that the identical procedure outlined above (in 1) resulted in an acute angle being subtended between the global Z and the internal Z of R1. All other triangles with the same composition were superposed onto this template. The calculations were repeated starting from different templates to confirm that the results were not artifacts of this geometrical procedure.
3. In case of C3 (e.g., Leu-Leu-Leu), a randomly chosen triplet was arbitrarily assigned R1, R2, R3 and the global frame was defined following procedure (1). All other triangles of the same composition were superposed onto this template. To select the best possible superposition (additional file 14: Figure S7), in each case 6 combinatorial possibilities were checked. Similar to C2, the calculation was repeated for different starting templates.
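Procedure (1) for the C1 case can be sketched in a few lines. Two points are assumptions, since this section does not specify them: the global origin is taken here as the centroid of the three residue origins, and the global Y axis is completed as Z × X. The volumes in the example are placeholder ranks, not measured side chain volumes.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def global_frame(residues):
    """residues: three (side_chain_volume, origin_xyz) tuples.

    Sorts by volume so that R1 > R2 > R3, then builds the global axes.
    ASSUMPTION: global origin = centroid of the three residue origins.
    """
    (_, r1), (_, r2), (_, r3) = sorted(residues, key=lambda r: r[0], reverse=True)
    z_axis = unit(cross(sub(r2, r1), sub(r3, r1)))   # Z = v1 x v2
    centroid = tuple(sum(c) / 3.0 for c in zip(r1, r2, r3))
    x_axis = unit(sub(r1, centroid))                 # from origin towards R1
    y_axis = cross(z_axis, x_axis)                   # completes a right-handed frame
    return x_axis, y_axis, z_axis

# Placeholder volumes (3 > 2 > 1) and coordinates
x_axis, y_axis, z_axis = global_frame([
    (1.0, (0.0, 0.0, 0.0)),
    (3.0, (1.0, 0.0, 0.0)),
    (2.0, (0.0, 1.0, 0.0)),
])
```

With the centroid as origin, X lies in the plane of the triangle and is therefore perpendicular to Z by construction.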
The triangle constructed from the associated residues in a triplet clique approximates to being equilateral

| Composition | Frequency | ⟨r_{12}⟩ (Å) | ⟨r_{13}⟩ (Å) | ⟨r_{23}⟩ (Å) | ⟨Ω_{1}⟩ (deg.) | ⟨Ω_{2}⟩ (deg.) | ⟨Ω_{3}⟩ (deg.) |
|---|---|---|---|---|---|---|---|
| ILE-LEU-VAL | 322 | 5.9 (0.7) | 5.9 (0.7) | 5.6 (0.5) | 56.5 (7.5) | 61.6 (9.2) | 61.9 (8.6) |
| ILE-LEU-LEU | 291 | 6.3 (0.6) | 5.6 (0.6) | 5.5 (0.6) | 55.0 (7.9) | 55.9 (5.9) | 69.1 (7.1) |
| PHE-ILE-LEU | 276 | 5.8 (0.8) | 5.4 (0.6) | 5.9 (0.6) | 63.2 (9.5) | 55.5 (9.3) | 61.3 (10.9) |
| VAL-LEU-LEU | 268 | 5.4 (0.5) | 5.9 (0.4) | 5.6 (0.6) | 59.2 (7.8) | 65.2 (5.6) | 55.5 (5.4) |
| PHE-LEU-VAL | 246 | 5.5 (0.6) | 5.3 (0.6) | 5.6 (0.6) | 61.6 (8.2) | 57.7 (9.4) | 60.7 (9.4) |
| PHE-LEU-LEU | 237 | 5.1 (0.5) | 5.8 (0.5) | 5.6 (0.5) | 62.1 (8.4) | 65.7 (7.6) | 52.1 (5.7) |
| LEU-LEU-LEU | 202 | 5.2 (0.5) | 6.1 (0.3) | 5.7 (0.4) | 59.5 (4.2) | 67.8 (4.9) | 52.6 (4.7) |
| LEU-ILE-ILE | 187 | 6.4 (0.5) | 5.7 (0.6) | 6.3 (0.8) | 63.4 (11.7) | 52.8 (7.0) | 63.8 (7.6) |
| LEU-PHE-PHE | 162 | 5.1 (0.5) | 5.8 (0.5) | 5.5 (0.5) | 60.7 (9.3) | 66.1 (7.0) | 53.3 (5.9) |
| TYR-ILE-LEU | 151 | 5.7 (0.8) | 5.3 (0.6) | 6.0 (0.6) | 65.5 (8.8) | 53.9 (9.5) | 60.5 (11.1) |
| PHE-ILE-VAL | 150 | 5.7 (0.7) | 5.4 (0.7) | 5.9 (0.7) | 64.9 (10.6) | 55.6 (10.7) | 59.5 (10.3) |
| VAL-ILE-ILE | 134 | 5.5 (0.6) | 6.3 (0.6) | 6.2 (0.8) | 63.1 (11.1) | 64.9 (7.5) | 52.0 (7.2) |
| LEU-VAL-VAL | 128 | 5.9 (0.4) | 5.3 (0.5) | 5.7 (0.5) | 61.1 (8.0) | 54.8 (4.7) | 64.1 (6.1) |
| ILE-VAL-VAL | 119 | 6.3 (0.5) | 5.5 (0.6) | 5.7 (0.5) | 57.8 (8.1) | 54.4 (6.4) | 67.8 (5.9) |
| TYR-PHE-LEU | 117 | 5.5 (0.6) | 5.4 (0.6) | 5.4 (0.5) | 59.6 (9.0) | 59.6 (10.1) | 60.7 (9.5) |
| ILE-PHE-PHE | 105 | 6.3 (0.6) | 5.5 (0.7) | 5.5 (0.6) | 54.6 (8.5) | 54.9 (7.3) | 70.5 (7.2) |
| TYR-LEU-LEU | 104 | 4.9 (0.4) | 5.9 (0.5) | 5.5 (0.6) | 60.9 (8.5) | 67.9 (7.1) | 51.2 (6.3) |
| TYR-LEU-VAL | 98 | 5.4 (0.6) | 5.4 (0.7) | 5.5 (0.4) | 60.7 (7.6) | 59.4 (10.1) | 59.9 (10.4) |
| PHE-ILE-ILE | 94 | 5.5 (0.6) | 6.3 (0.7) | 6.4 (0.9) | 66.0 (11.3) | 63.1 (8.5) | 50.8 (6.7) |
| PHE-VAL-VAL | 88 | 4.9 (0.6) | 5.8 (0.5) | 5.7 (0.5) | 63.5 (10.0) | 65.7 (7.7) | 50.8 (6.8) |
| TYR-PHE-ILE | 85 | 5.6 (0.5) | 5.7 (0.9) | 5.8 (0.8) | 61.8 (12.1) | 59.9 (13.3) | 58.2 (9.9) |
| TYR-PHE-VAL | 77 | 5.5 (0.6) | 5.5 (0.6) | 5.5 (0.7) | 59.9 (10.7) | 59.3 (10.0) | 60.7 (9.9) |
| ILE-ILE-ILE | 70 | 5.5 (0.6) | 6.4 (0.5) | 7.0 (0.6) | 72.1 (6.2) | 59.9 (4.5) | 48.0 (5.8) |
| TYR-ILE-VAL | 69 | 5.8 (0.9) | 5.4 (0.6) | 5.9 (0.7) | 63.4 (10.6) | 55.0 (10.3) | 61.5 (11.7) |
| VAL-PHE-PHE | 67 | 5.8 (0.5) | 5.0 (0.5) | 5.4 (0.5) | 59.8 (9.0) | 52.3 (6.9) | 67.8 (8.0) |
| TRP-PHE-LEU | 56 | 5.5 (0.6) | 5.6 (0.6) | 5.4 (0.6) | 58.7 (8.5) | 61.6 (10.3) | 59.7 (9.3) |
| TYR-PHE-PHE | 53 | 5.9 (0.5) | 5.1 (0.4) | 5.5 (0.5) | 59.8 (9.2) | 52.6 (5.7) | 67.5 (8.1) |
| LEU-VAL-ALA | 50 | 5.7 (0.5) | 4.7 (0.4) | 4.7 (0.4) | 53.4 (5.3) | 53.1 (6.8) | 73.5 (7.7) |
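The relation between the side lengths and the internal angles in the table above follows from the law of cosines. The sketch below assumes that Ω_{i} is the angle at the vertex occupied by residue i (so that Ω_{1} is opposite r_{23}); that convention is an inference from the tabulated values, not a definition quoted from the paper. Because the table lists averages, angles computed from mean side lengths only approximate the mean angles.

```python
import math

def triangle_angles(r12, r13, r23):
    """Internal angles (degrees) at vertices 1, 2 and 3 from the three side lengths."""
    def angle(opposite, side_a, side_b):
        cos_value = (side_a ** 2 + side_b ** 2 - opposite ** 2) / (2.0 * side_a * side_b)
        return math.degrees(math.acos(cos_value))
    omega1 = angle(r23, r12, r13)  # vertex 1 is opposite side r23
    omega2 = angle(r13, r12, r23)  # vertex 2 is opposite side r13
    omega3 = angle(r12, r13, r23)  # vertex 3 is opposite side r12
    return omega1, omega2, omega3

# Mean side lengths of the ILE-LEU-VAL row
omegas = triangle_angles(5.9, 5.9, 5.6)
```

For this row the sketch gives angles within about one degree of the tabulated means.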
χ^{2} for angular variables for triplet clique compositions exhibiting specific geometry

| Composition | Frequency | χ^{2}(θ1_{t}) | χ^{2}(θ2_{t}) | χ^{2}(θ3_{t}) | χ^{2}(φ1_{s}) | χ^{2}(φ2_{s}) | χ^{2}(φ3_{s}) |
|---|---|---|---|---|---|---|---|
| PHE-ILE-LEU | 276 | 48.5 | 11.2 | 35.8 | 15.3 | 14.5 | 5.8 |
| PHE-LEU-VAL | 246 | 60.0 | 20.6 | 5.2 | 14.0 | 8.5 | 6.8 |
| PHE-LEU-LEU | 237 | 72.1 | 25.5 | 13.5 | 13.9 | 20.8 | 3.8 |
| TYR-ILE-LEU | 151 | 39.0 | 2.9 | 7.3 | 20.5 | 16.0 | 10.0 |
| PHE-ILE-VAL | 150 | 37.7 | 9.3 | 16.7 | 19.4 | 4.9 | 19.5 |
| TYR-PHE-LEU | 117 | 44.6 | 6.2 | 2.8 | 18.5 | 5.7 | 10.6 |
| TYR-LEU-LEU | 104 | 46.7 | 15.2 | 8.6 | 6.4 | 5.3 | 16.4 |
| TYR-LEU-VAL | 98 | 35.5 | 8.7 | 9.5 | 7.4 | 3.1 | 6.6 |
| PHE-VAL-VAL | 88 | 21.6 | 24.9 | 5.1 | 14.1 | 4.2 | 4.5 |
| TYR-PHE-VAL | 77 | 20.3 | 3.0 | 4.0 | 9.0 | 10.2 | 2.0 |
| TYR-ILE-VAL | 69 | 23.4 | 2.5 | 3.8 | 4.5 | 2.0 | 4.7 |
| TRP-LEU-LEU | 47 | 29.5* | 4.0* | 5.1* | 3.9 | 17.7 | 2.9 |
| TRP-LEU-VAL | 41 | 23.9* | 5.3* | 3.0* | 6.3 | 6.3 | 3.9 |
Distribution in θ1_{t} for triplet clique compositions exhibiting high χ^{2}
% Occupancy in bins with θ (deg.) range  

Composition  χ^{2}(θ1_{t})  0-30  30-60  60-90  90-120  120-150  150-180
Random (6-bin)  -  6.7  18.3  25.0  25.0  18.3  6.7
Random (3-bin)  -  13.4  36.6  50.0
PHE  ILE  LEU  48.5  4.0  26.1  69.9       
PHE  LEU  VAL  60.0  4.5  21.1  74.4       
PHE  LEU  LEU  72.1  2.1  21.1  76.8       
TYR  ILE  LEU  39.0  2.0  23.8  74.2       
PHE  ILE  VAL  37.7  4.0  21.3  74.7       
TYR  PHE  LEU  44.6  1.7  17.9  80.4       
TYR  LEU  LEU  46.7  0.0  17.3  82.7       
TYR  LEU  VAL  35.5  2.0  18.3  79.7       
PHE  VAL  VAL  21.6  0.0  28.4  71.6       
TYR  PHE  VAL  20.3  3.9  20.8  75.3       
TYR  ILE  VAL  23.4  1.4  20.3  78.3       
TRP  LEU  LEU  29.5  0.0  2.1  44.7  44.7  6.4  2.1 
TRP  LEU  VAL  23.9  2.4  7.4  43.9  43.9  2.4  0.0 
In order to investigate the rotation of the residue planes (XY plane of the residue-internal frames) about an axis parallel to their own internal Z, the component of the global Z axis of the triangle was projected onto the respective planes and the orientation of this vector (Z_{p}) with respect to the internal X axis (defined as the swivel angle φ_{s}, ranging from 0 to 360°) was computed. Since the angle φ_{s} is restricted to a plane, each quadrant is expected to be equally populated for a random distribution. χ^{2} in φ_{s} did not show any significant preference for any residue. Therefore, for a given tilt, the residue plane can adopt multiple orientations about an axis perpendicular to it.
In order to determine whether the formation of triplet cliques was due to long or short range contacts, the position of residues along the polypeptide chain was examined. The sequence separation of two residues involved in triplet clique formation was termed local when they were separated by 10 or fewer contiguous residues and non-local when separated by more than 10. Overall, out of a total of 26769 clique forming contacts, 6585 were local and 20184 non-local. The same calculations repeated for polypeptide chains distributed in bins of 75-150, 151-300 and 301-500 residues followed the same trends. Thus, although the majority of contacts were non-local, a non-negligible fraction (~ 25%) were between residues closely located along the polypeptide chain.
As has been mentioned previously (section: Packing motifs), motifs from the f2 family were relatively favored in αβ proteins and disfavored in the all α class. The effect was accentuated for the set of all closed triplet cliques (isolated and embedded), for which the propensities for the classes were: all α (0.53), all β (0.88), αβ (1.45) and α+β (0.87).
Utility of surface contact networks in fold recognition: a case study
Finally, an attempt was made to probe the efficacy of surface contact networks in correctly identifying sequences (situated amidst decoys) consistent with a given fold. The cyclophilin-like fold (pdb code: 2HAQ) was chosen as a test case and two decoy sets, (a) random sequences and (b) spliced sequences from other folds, both of the same length as 2HAQ (166 residues), were generated and threaded onto the 2HAQ backbone template (see Methods, section: Decoy sets and threading). Each decoy set consisted of 250 sequences. In addition to 2HAQ, two more sequences native to the given fold (from pdb files 3BO7_B and 3KOP_F), having sequence identities of 37% and 11% with 2HAQ respectively, were also included, the latter to ascertain whether the method worked for sequences with low identity that nevertheless map to the same fold. Surface contact networks of 2HAQ and 17 other close homologues (sequence identities upon structural alignment with 2HAQ ≥ 40%, rmsd < 1.5 Å) were analyzed (see Methods, section: Topological fold detection measures) to identify a subset of links in 2HAQ where each link was conserved in at least 80% of the homologues and thus could be regarded as a characteristic signature of the fold. This fold-specific subgraph, {Lcyp}, consisting of 31 links between 36 nodes (mostly buried, hydrophobic residues), spans the three-dimensional structure of the entire protein, spatially connecting almost all the secondary structural elements (additional file 15: Table S7). The links were also found to have appreciably high surface complementarity (average S_{m}: 0.608 (0.07), min: 0.48, max: 0.71) and overlap (average Ov: 0.15 (0.03), min: 0.1, max: 0.21). In order to characterize the subgraph corresponding to {Lcyp} in a threaded structure, all nodes in {Lcyp} were mapped in that structure (see Methods, section: Topological fold detection measures) and the corresponding nodes extracted along with all their interconnecting links.
This (induced) subgraph (in the threaded structure) was then compared with {Lcyp} (serving as a template) by means of two complementary topological measures (snet, dnet) defined (see Methods, section: Topological fold detection measures) to quantify the compatibility of a sequence to a given fold. snet essentially evaluates the fraction of links in {Lcyp} that are conserved in the corresponding subgraph, whereas dnet estimates the dissimilarity in the topological pattern between {Lcyp} and the corresponding subgraph in terms of an abstract distance measure. Thus, for the corresponding subgraph extracted from the 2HAQ native structure, snet will attain its highest possible value of 1.00 and dnet will be very close to its lowest possible value of 0.00.
Statistical analysis involved computing snet, dnet and estimating their mean (μ) and standard deviation (σ) for the 250 decoys of each set. For the sequences 2HAQ, 3BO7_B and 3KOP_F, the same sequence was threaded 250 times (onto the backbone template of 2HAQ), each time with a randomly selected set of side chain conformers (see Methods, section: Decoy sets and threading).
Topological measures to discriminate between native and decoy sets: a case study on the cyclophilin-like fold
Category  snet Mean  snet Min  snet Max  dnet Mean  dnet Min  dnet Max
Random Sequences  0.156 (0.093)  0.032  0.452  0.880 (0.081)  0.650  1.000 
Sequences from other folds  0.143 (0.097)  0.000  0.387  0.880 (0.079)  0.676  1.000 
2HAQ  0.405 (0.089)  0.129  0.677  0.688 (0.069)  0.486  0.909 
3BO7_B  0.390 (0.081)  0.194  0.613  0.734 (0.060)  0.558  0.880 
3KOP_F  0.307 (0.063)  0.032  0.484  0.766 (0.049)  0.639  0.969 
Validation of the topological measures (snet, dnet) used to compare fold specific subgraphs
PDB ID  Sequence identity (%)  Number of residues aligned  Rmsd (Å)  Similarity (snet)  Distance (dnet) 

2POE_A  39  152  1.2  0.806  0.390 
3BO7_B  37  152  1.5  0.742  0.489 
2NUL_A  37  147  1.8  0.774  0.385 
2OSE_A  30  153  2.1  0.548  0.614 
1ZX8_A  15  108  2.5  0.516  0.679 
3KOP_F  11  126  2.8  0.484  0.712 
2P0O_A  10  96  2.9  0.452  0.725 
Conclusions
The work presented here is based on the confluence of two related though distinct ideas: (1) some network topologies are preferred within protein interiors, leading to the concept of packing motifs, and (2) the 'jigsaw puzzle' model can be successfully extended into the domain of protein contact networks.
The implementation of both these ideas depends partly on representing the internal architecture of proteins in terms of surfaces rather than point atoms. It has been noted previously that the use of surfaces improved the performance of a side chain (torsion) prediction test [46, 47] and provided simple, well defined criteria to identify those contacts which definitely constrain the inter-residue geometry of the associating amino acid side chains [26]. Presumably this set of interactions could be playing a more critical role in sustaining the native fold. Networks based on surface contacts (with appropriate cutoffs on S_{m} and Ov) are in effect a straightforward extension of the jigsaw puzzle model. In the search for compositional or geometrical bias, surface contact networks appear to be indispensable. In particular, triplet cliques composed exclusively of hydrophobic residues had a frequency 3 fold higher in ASCN than in APCN, starting from a comparable (total) number of triplet cliques. Furthermore, compositional preferences along with strong geometrical constraints were far better captured by surfaces than by point atoms (additional file 16: Table S8).
One feature which appears to be more or less conserved in surface contact networks (irrespective of the cutoff criteria in surface complementarity and overlap) is the almost ubiquitous presence of smaller networks (3-10 nodes) in all proteins, which probably coalesce to produce larger networks specific to the particular fold. Thus, the distribution in network sizes and topologies appears to favor a nucleation-condensation phenomenon [48, 49] in protein packing, wherein open linear chains, closed triplet cliques and other closed ring topologies could serve as basic packing units which could either get linked or recruit neighboring residues to grow into networks of larger size. This notion of packing units led to the definition of 'packing motifs', which could serve as a 'basis set' in the assembly of extended graphs.
Since graphs ranging from 3 to more than 200 nodes have been detected in proteins, the concept of a 'basis set of motifs' should represent sets of similar topologies (along with their variations in terms of linear branching) rather than a rigid set of isolated unique graphs. This was the rationale behind the organization of motifs into families (sets of similar graphs with gradual addition of nodes following a path such that the core topology remains unaltered) and it soon became clear that some families were overwhelmingly preferred in protein topological space. These families emanated from the 'minimally connected' open linear chains and three residue cliques (regarded as clustering units) and dominated the frequency distribution of motifs. Other families occurred with such low frequencies that they could be considered oddities rather than the rule. Thus, in accord with the inductive approach of the current work, it was felt that larger graphs (n > 10) would either fall into pre-existing families or could be assembled from known motifs or their variants. This possibility was explored for networks of 15 nodes and the observations tended to support the hypothesis.
One major drawback of the present study is the lack of a computational method to analyze larger graphs. Although a suitable algorithm, starting from the novel numerical scheme (Figure 3), was able to identify all possible induced subgraphs of a given larger graph, manual intervention was indispensable to resolve the larger graph into an optimal set of constituent motifs. Despite its rigor, the tedious visual method restricted the analysis to a small fraction of the network space. However, even with this handicap, the general trend appears to be unmistakable. It thus appears that the topological space available to protein contact networks is severely constrained, with clearly defined preferences. A unique constraint is the definite upper bound found in the size of the maximal clique (n_{c} ≤ 4), a property rarely observed in real world networks [50]. This is presumably due to the atomic environment in protein interiors, which restricts the permissible number of edges a node can have. Thus, although only a limited portion of the topological space has been actively explored, the conclusions are expected to hold in their general direction.
The next step was to enquire whether packing motifs exhibited any preferences in terms of their constituent residues and geometry. For this, triplet cliques were selected due to their ubiquitous presence, primarily as induced subgraphs embedded in larger graphs. It soon became evident that in the subspace of hydrophobic residues, regular trends of propensities favoring specific residues or their combinations do indeed exist and certain geometrical features exhibit very strong constraints (especially the approximately equilateral triangle connecting the three residue origins and the tilt angles of aromatic residues). What is perhaps notable is that these compositional and geometric preferences can be detected only when statistical analysis is performed subsequent to the precise classification of motifs and the appropriate partitioning of the topological space. In other words, looking for preferences in a pooled set of three residue graphs or subgraphs without adequate classification or characterization is most likely to end in failure.
The most direct application of this study should be in the area of protein fold recognition, which is to select a polypeptide chain belonging to a particular fold [21, 51-53] from a set of decoys. The most challenging aspect of this problem is to identify those chains consistent with a fold (represented by a set of main chain coordinates) even though their identity upon alignment with the sequence native to the three-dimensional structure is low (< 20%). Preliminary calculations show that topological measures (snet and dnet) defined on surface contact networks are indeed able to discriminate sequences native to a particular fold (cyclophilin-like) from decoy sets. Although the threading procedure was fairly straightforward and the decoy sets rather limited, all indications appeared to suggest the robustness of the fold prediction method. Most probably the scores could be improved by the adoption of more sophisticated threading procedures which actually solve for the optimal side chain packing arrangement, rather than averaging over a large number of random side chain conformers. However, the interesting fact is that the native sequences specific to a particular fold could be distinguished from the decoy sets based on the statistics of the topological measures alone, despite having randomized side chain conformers. The utility of these functions to sort networks based on topological resemblance (with respect to a template) is also notable. Large scale improvement and application of these methods are currently being investigated.
Another fruitful area of research could be to explore the possibility of introducing triplet cliques into designed proteins to stabilize their packing analogous to the engineering of disulfide bridges, in order to improve thermal stability. A library of triplets exhaustively documenting their conformational, geometrical and topological features might be useful in this regard.
Thus in conclusion, it appears that out of innumerable topological possibilities, only a finite number are actually realized in protein contact networks which either are themselves or could be assembled from a limited set of preferred motifs. One such recurrent motif, the triplet clique, exhibits clear preferences in its constituent residues and very strong constraints with regard to certain geometrical features.
Methods
The Database
Initially, 918 unique protein crystal structures were selected from the protein data bank (RCSB PDB) [54] with a maximum R factor of 20% and a resolution cutoff of 2.0 Å. Upon sequence alignment of any two proteins from the database, in no case was their sequence identity greater than 30%. For oligomeric proteins the largest polypeptide chain was retained and for atoms with multiple occupancies, those with the highest occupancy were used in the calculations. In case of equal occupancy the first conformer was selected. Proteins with incomplete side chain atoms and those with missing stretches of amino acid residues were individually surveyed in RasMol [55]. If the missing stretch(es) or residue(s) involving incomplete side chain atoms were found to be either at the extremities (N/C terminal) of the chain or on completely exposed loop regions with no participation in interior packing, the protein was included in the database; otherwise it was rejected. The length of the chains ranged from 75 to 500 amino acid residues. The final database consisted of 719 polypeptide chains, of which 18.5% were all α, 19.9% all β, 32.5% αβ and 29.1% α+β (additional file 17: Table S9). The protein class for each chain was decided by visual examination in RasMol and a search in the SCOP database [56]. 53 multidomain proteins were appropriately truncated and their domains allotted to the relevant class. The program Reduce [57, 58] was used to geometrically fix hydrogen atoms on the proteins prior to the calculations.
Burial ratio
The exposure of residues to solvent (probe radius 1.4 Å) was estimated by the ratio (burial) of the solvent accessible area (SAA) of the amino acid X in the polypeptide chain to that of an identical residue located in a Gly-X-Gly peptide fragment with a fully extended conformation. Only residues that were completely buried (0.00 ≤ burial ratio ≤ 0.05) or partially buried (0.05 < burial ratio ≤ 0.3) were considered in the analysis.
Algorithm to construct networks
As is well known, every network can be represented as a graph, G = (V, E), which formally consists of a set of vertices (or nodes) V and a set of edges (or links) E between them. Trivially, a graph can contain one or more standalone nodes (a node which is not connected to any other node in the graph), and a subgraph is called a component [41] of the graph provided each node is connected to at least one other node of the graph. In the protein contact networks defined here, no standalone node was considered. Thus, in this context, 'graph' and 'component' were treated synonymously. In the present study, a node stands for the side chain of a particular residue, and two types of networks were defined, based on surfaces and on point atoms. For the case of point atoms, if any two atoms located on two different side chains were within 3.8 Å of each other, the two representative nodes were connected by a link. The number of atomic contacts between two side chains was considered to be the weight of the connecting edge. The network spanning the entire protein was constructed by exhaustively searching for contacts in the neighborhood of buried residues until no more nodes could be included in the network. Thus a protein could have more than one contact network embedded within it, with no common nodes between them. The smallest networks considered had three nodes. With the exception of glycine, any residue could be represented by a node.
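The point atom construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' PERL implementation: the function names and the input layout (a dictionary of side chain atom coordinates per residue) are assumptions, and the seeding of the search from buried residues is omitted for brevity.

```python
import math
from itertools import combinations

CUTOFF = 3.8  # Å, point atom contact distance quoted in the text


def point_atom_network(side_chains, cutoff=CUTOFF):
    """Build weighted edges between residues.

    side_chains: dict mapping a residue id to a list of (x, y, z) side chain
    atom coordinates (hypothetical input layout).
    Returns a dict mapping (res_a, res_b) to the number of atom pairs within
    the cutoff, which serves as the edge weight.
    """
    edges = {}
    for a, b in combinations(sorted(side_chains), 2):
        n_contacts = sum(
            1
            for p in side_chains[a]
            for q in side_chains[b]
            if math.dist(p, q) <= cutoff
        )
        if n_contacts:
            edges[(a, b)] = n_contacts
    return edges


def components(edges, min_size=3):
    """Return connected components (sets of residue ids) with >= min_size nodes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            found.append(comp)
    return found
```

Each returned component corresponds to one contact network in the sense used here; components with fewer than three nodes are dropped, matching the minimum network size stated in the text.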
Van der Waals surface generation
The van der Waals surfaces for the proteins (including all hydrogen atoms) were sampled at 10 dots/Å^{2}, the atomic radii being assigned from the all atom molecular mechanics force field [59]. The details of the surface generation have been discussed elsewhere [26]. In case of disulphide bridges care was taken to remove the extra points due to the interpenetration of the van der Waals spheres of the covalently linked sulphur atoms. Thus, the entire surface of the polypeptide chain was sampled as an array of discrete area elements defined by their location (x,y,z) and the direction cosines (dl,dm,dn) of their normals.
Surface Complementarity
The overlap was defined as Ov^{A→B} = N_{AB} / N_{A}, where N_{AB} is the number of points on the target (A) that have their nearest neighboring points on B and N_{A} is the total number of surface points for A. The surface complementarity for this patch involving A, B will henceforth be referred to as S_{m}^{A→B}. Contact between any two residues (target and neighbor) can now be defined in terms of surfaces (based on S_{m} and Ov). Any two residues (target: A, neighbor: B) are said to 'interact' with each other when S_{m}^{A→B} and Ov^{A→B} are greater than or equal to 0.4 and 0.08 respectively. It will be noted that the measures S_{m} and Ov are non-commutative, that is, S_{m}^{A→B} and Ov^{A→B} are not necessarily equal to S_{m}^{B→A} and Ov^{B→A}. We formally define an inter-residue surface 'contact' when the 'interactions' are mutually reciprocal, that is, both S_{m}^{A→B}, S_{m}^{B→A} and Ov^{A→B}, Ov^{B→A} simultaneously satisfy the interaction criteria. For any contact, S_{m} and Ov were taken to be the mean of (S_{m}^{A→B}, S_{m}^{B→A}) and (Ov^{A→B}, Ov^{B→A}) respectively. As in point atom contact networks, a node in this case is also representative of the residue side chain (surface). Two nodes are connected by an edge when their corresponding residue surfaces are in 'contact'. The weight of such an edge was defined as w = sqrt(S_{m}^{2} + Ov^{2}), analogous to calculating the magnitude of a vector from two mutually orthogonal components. Since contact is defined reciprocally, such networks, henceforth referred to as 'surface contact networks', will also be undirected.
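The reciprocal contact rule can be illustrated with a short sketch. It assumes the directional values S_{m}^{A→B}, Ov^{A→B}, S_{m}^{B→A} and Ov^{B→A} have already been computed elsewhere; the function names are hypothetical, and the edge weight is taken as sqrt(S_{m}^{2} + Ov^{2}), which is one reading of the "magnitude of two mutually orthogonal vector components" description and should be treated as an assumption.

```python
import math

SM_MIN = 0.4   # surface complementarity threshold from the text
OV_MIN = 0.08  # overlap threshold from the text


def interacts(sm, ov):
    """One-directional 'interaction': both thresholds met for target -> neighbor."""
    return sm >= SM_MIN and ov >= OV_MIN


def surface_contact(sm_ab, ov_ab, sm_ba, ov_ba):
    """Return (S_m, Ov, weight) for a reciprocal contact, or None otherwise.

    S_m and Ov are the means of the two directional values; the weight is
    assumed to be sqrt(S_m**2 + Ov**2).
    """
    if not (interacts(sm_ab, ov_ab) and interacts(sm_ba, ov_ba)):
        return None
    sm = (sm_ab + sm_ba) / 2.0
    ov = (ov_ab + ov_ba) / 2.0
    return sm, ov, math.hypot(sm, ov)
```

Because a contact requires both directions to pass, the resulting edge has no direction, which is why the surface contact network can be stored as a symmetric adjacency matrix.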
Thus, two distinct types of networks have been defined and used in this study: (1) All Residue Surface Contact Network (ASCN) and (2) All Residue Point Atom Contact Network (APCN). All contact networks were represented computationally in terms of one-zero adjacency matrices (N×N, for a network of N nodes), where the matrix element a_{ij} = 1 denotes node i to be connected to node j and 0 otherwise. Since both types of networks were undirected, these adjacency matrices were symmetric. Based on the adjacency matrices the following network parameters were estimated:
Degree: defined as the number of edges emanating from a node.
Strength: s_{i} = Σ_{j=1}^{N} a_{ij} w_{ij}, where w_{ij} is the weight of the edge linking the i^{th} and the j^{th} node and the summation is over all nodes (N) of the network [40].
Unweighted and weighted clustering coefficients
Expressions for these coefficients are defined as follows:
Unweighted
C_{i} = {e_{jh}} / ^{k_{i}}C_{2}, where k_{i} is the degree of the i^{th} node, {e_{jh}} is the total number of actually existing connections among the set of nodes (taken pairwise, {j,h}) from the direct neighborhood of node i, and ^{k_{i}}C_{2} = k_{i}(k_{i} - 1)/2 is the maximum possible number of connections within the same set [39].
Weighted
where the symbols have the same significance as given above and under identical conditions [40].
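As an illustration of the unweighted parameters, the sketch below computes degree and the unweighted clustering coefficient directly from a one-zero adjacency matrix. The `strength` helper assumes the usual weighted-degree form (sum of the weights of the edges at a node); the weighted clustering coefficient is not reproduced here, since its expression is not given in the text.

```python
def degree(adj, i):
    """Number of edges emanating from node i (adj is a symmetric 0/1 matrix)."""
    return sum(adj[i])


def strength(adj, weights, i):
    """Sum of edge weights at node i (assumed definition of node strength)."""
    return sum(a * w for a, w in zip(adj[i], weights[i]))


def clustering(adj, i):
    """Existing links among the neighbors of i divided by k_i choose 2."""
    nbrs = [j for j, a in enumerate(adj[i]) if a]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to be connected
    existing = sum(
        adj[j][h] for pos, j in enumerate(nbrs) for h in nbrs[pos + 1:]
    )
    return existing / (k * (k - 1) / 2)
```

For a closed triplet clique every node has a clustering coefficient of 1, whereas the interior node of an open linear chain has 0, which is what makes the averaged coefficient a useful contrast against random graphs.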
Cliquishness
A clique is an induced subgraph in which every node is connected to every other node. In an undirected graph containing a clique of n nodes, the embedded clique contains ^{n}C_{2} edges. A complete graph, on the other hand, is a graph in which any two nodes are connected to each other. In this analysis the term 'isolated clique' refers to such complete graphs. The order (or size, i.e., the number of constituent nodes, n_{c}) of the maximal clique was searched progressively in all networks, starting from triplets. Initially, a systematic search over all possible combinations of 3 nodes (from a network) was performed to identify the closed triplet cliques and, on occurrence, n_{c} was set to 3. Then, from the immediate neighborhood of a 3-clique, each node was sampled to test for adjacency with all three nodes of the clique. A new node satisfying this criterion was added to the previous clique and n_{c} was increased by one. The search was continued till convergence.
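The progressive search can be sketched as follows. This is a simplified reading of the procedure: every closed triplet is used as a seed and grown greedily, which yields a maximal clique for each seed but is not guaranteed to find the largest clique in an arbitrary graph. Function names are illustrative only.

```python
from itertools import combinations


def triplet_cliques(adj):
    """All closed triplets (i, j, k) in a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [
        (i, j, k)
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[i][k] and adj[j][k]
    ]


def grow_clique(adj, seed):
    """Greedily add neighbors adjacent to every current member until none fit."""
    clique = set(seed)
    grown = True
    while grown:
        grown = False
        neighborhood = {
            j for i in clique for j, a in enumerate(adj[i]) if a
        } - clique
        for cand in sorted(neighborhood):
            if all(adj[cand][m] for m in clique):
                clique.add(cand)
                grown = True
                break  # recompute the neighborhood for the enlarged clique
    return clique


def max_clique_order(adj):
    """Largest n_c reached from any triplet seed; 0 if no closed triplet exists."""
    return max((len(grow_clique(adj, t)) for t in triplet_cliques(adj)), default=0)
```

Given the reported upper bound of n_{c} ≤ 4 in protein contact networks, the growth loop would terminate after at most one addition per seed on such data.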
Deviation from random topology
To estimate deviation from a random topology, unweighted and weighted clustering coefficients were individually averaged over all nodes in a network and compared with the same measure obtained for random graphs of identical size. Following standard methods, the link density (L_{d}) of a graph was first estimated, defined as the ratio of the total number of actually existing edges in the graph to the maximum possible number of edges if it were a complete graph. Random graphs of identical size were generated by systematically visiting each pair of nodes and drawing a random number; the pair was assigned a weighted connection if the random number was less than the L_{d} value obtained from the original graph. The weight of an edge was also assigned randomly, scaled appropriately to the values obtained from the observed contact networks.
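A minimal sketch of this randomization is given below. The way random weights are "scaled appropriately" is not specified in the text; here they are simply drawn from a list of observed weights, which is an assumption, as are the function names.

```python
import random


def link_density(adj):
    """Existing edges divided by the number of edges in a complete graph."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def random_weighted_graph(n, density, observed_weights, seed=None):
    """Connect each node pair with probability `density`.

    Edge weights are sampled from `observed_weights` (assumed scaling scheme).
    Returns a symmetric n x n weight matrix with zeros for absent edges.
    """
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < density:
                w[i][j] = w[j][i] = rng.choice(observed_weights)
    return w
```

Averaging the clustering coefficient over many such random graphs of the same size and link density gives the baseline against which the observed networks are compared.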
Geometry of threenode packing motifs
The methodology of Singh and Thornton [45] was adopted to identify preferred modes of packing in terms of the specific geometry of interacting amino acid side chains. An internal right handed frame of reference was defined for all the hydrophobic residues based on their side chain atoms. Conventionally, the Z axis was taken to be normal to the principal plane defined by either the ring atoms (phenyl for Phe, Tyr and indole for Trp) for aromatic residues or a defined set of three side chain atoms (forming the fork) for branched chain amino acids (Val, Leu, Ile) (Figure 10).
To characterize the geometry of graphs or subgraphs consisting of three nodes, a plane, P_{triangle}, was defined passing through the origins of the three internal frames of reference (Figure 11). The resulting triangle defined by connecting the three origins was characterized by three internal angles Ω_{1}, Ω_{2} and Ω_{3} and the lengths of the three sides of the triangle r_{12}, r_{13} and r_{23}. A preferred right-handed frame was placed at the centroid of this triangle such that the X axis (X_{tr}) points towards the origin of a preferred residue chosen according to the composition of the triplet (see Results, section: Triplet Clique), the Z axis (Z_{tr}) is normal to P_{triangle} and Y_{tr} = Z_{tr} × X_{tr}. Three interplanar tilt angles, namely θ1_{t}, θ2_{t} and θ3_{t}, were then defined as the angles subtended between Z_{tr} and the Z axes of the three residue-internal frames. Three additional swivel angles φ1_{s}, φ2_{s}, φ3_{s} were further defined as those subtended by Z_{p} (the component of Z_{tr} projected on the residue XY planes) and the X axes of the three residue-internal frames. The distributions of these angles in appropriate bins were analyzed for their deviation from a random distribution by means of χ^{2}. The angle subtended by two randomly oriented vectors has a probability density given by sin θ' dθ'/2, where θ' is the angle between the vectors [45], whereas for two coplanar random vectors each bin should be equally populated. Thus, for a random distribution, the probability of θ1_{t}, θ2_{t}, θ3_{t} follows sin θ' dθ'/2 (three-bin models for Phe and Tyr and six-bin models for Val, Leu, Ile, Trp: 30° bins) and each bin should be equally populated for φ1_{s}, φ2_{s} and φ3_{s} (six-bin models for Phe, Tyr, Trp, Val, Leu, Ile: 60° bins).
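The random expectation for the tilt angle bins follows from integrating sin θ'/2 over each 30° bin, and can be checked numerically. The sketch below is an illustration with hypothetical function names; it assumes that the three-bin model folds the distribution about 90° (θ and 180° - θ treated as equivalent) and that χ^{2} is the standard Pearson statistic computed on counts.

```python
import math


def expected_tilt_fractions(folded=False):
    """Expected fraction per 30-degree bin for density sin(theta)/2 on [0, 180].

    folded=True returns the three-bin model (0-90 degrees), in which each bin
    also collects its mirror image about 90 degrees.
    """
    upper = 90 if folded else 180
    scale = 2.0 if folded else 1.0
    edges = [math.radians(d) for d in range(0, upper + 1, 30)]
    # The integral of sin(t)/2 from lo to hi is (cos(lo) - cos(hi)) / 2
    return [scale * (math.cos(lo) - math.cos(hi)) / 2.0
            for lo, hi in zip(edges, edges[1:])]


def chi_square(observed_counts, expected_fractions):
    """Pearson chi-square of observed bin counts against expected fractions."""
    total = sum(observed_counts)
    return sum(
        (obs - total * frac) ** 2 / (total * frac)
        for obs, frac in zip(observed_counts, expected_fractions)
    )
```

The six-bin fractions evaluate to about 6.7, 18.3, 25.0, 25.0, 18.3 and 6.7%, and the three-bin fractions to 13.4, 36.6 and 50.0%, in agreement with the 'Random' rows of the occupancy table. As a further check, counts of 11, 72 and 193 (back-calculated from the tabulated occupancies of 4.0, 26.1 and 69.9% for the 276 PHE-ILE-LEU cliques) give a statistic close to the tabulated value of 48.5.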
Packing density
The method is considered an improvement over previous algorithms due to the fact that cavities are critically distinguished and eliminated from the actual spaces between two molecular entities and also the neighboring surfaces are cut about non planar boundaries.
Propensity
The propensity of a motif f for class C was calculated as P(f, C) = (N_{fC} / N_{f}) / (N_{C} / N), where N_{fC} is the number of motifs 'f' found in chains belonging to class C, N_{f} is the number of motifs 'f' found in all classes, N_{C} is the number of chains belonging to class C in the database and N is the total number of chains in the database. C stands for one of the classes (all α, all β, αβ, α+β).
Similarly, the propensity of networks of size S for class C was calculated as P(S, C) = (N_{SC} / N_{S}) / (N_{C} / N), where N_{SC} is the number of networks of size S found in chains belonging to class C, N_{S} is the number of networks of size S found in all classes and N_{C}, N, C have the same significance as above.
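For illustration, a propensity of this kind can be computed as a ratio of two fractions. The sketch assumes the conventional normalized-frequency form, (N_{fC}/N_{f}) / (N_{C}/N), built from the four quantities defined above; the function names and the example counts in the usage note are hypothetical.

```python
def propensity(n_fc, n_f, n_c, n):
    """Fraction of motif occurrences in class C relative to the fraction of
    chains in class C (assumed form of the propensity)."""
    return (n_fc / n_f) / (n_c / n)


def class_propensities(motif_counts, chain_counts):
    """motif_counts, chain_counts: dicts keyed by class name.

    Returns a dict of propensities for every class present in chain_counts.
    """
    n_f = sum(motif_counts.values())
    n = sum(chain_counts.values())
    return {
        cls: propensity(motif_counts.get(cls, 0), n_f, chain_counts[cls], n)
        for cls in chain_counts
    }
```

A value above 1 indicates that the motif (or network size) is over-represented in the class relative to the share of chains in that class, and a value below 1 that it is under-represented. For example, a motif seen 10 times out of 40 in a class that holds 25 of 100 chains has a propensity of exactly 1.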
Decoy sets and threading
To study the application of surface contact networks in fold recognition, the cyclophilin-like fold (pdb code: 2HAQ) was selected as the test case and two decoy sets of 250 sequences each were assembled. The first set consisted of random sequences of the same length (166 residues) as 2HAQ and the second was composed of naturally occurring 166 residue stretches from other folds, truncated from the N terminal end. In general, the sequence identity of these decoys with respect to 2HAQ was less than 10% and no two sequences in each decoy set had identities greater than 15% between them. To determine whether the fold recognition method could identify sequences compatible with the same fold (cyclophilin-like) even though exhibiting low sequence identity (less than 20%) with 2HAQ, the method was tested on the sequence extracted from 3KOP_F (11%). In addition, another chain, 3BO7_B (37%), was also included. To simplify matters, 3KOP_F and 3BO7_B were purposely chosen as their native chain lengths were identical to that of 2HAQ.
To start with, the actual three-dimensional coordinates of all the residue conformers listed in Dunbrack's rotamer library [65] were generated. The main chain coordinates were extracted from 2HAQ and were considered to be representative of the cyclophilin-like fold. For threading any sequence onto this template, the main chain N, CA, C coordinates of the appropriate residue (to be threaded) were selected from the library and superposed onto the corresponding native coordinates. For every threaded residue, the rotamer was selected randomly from the possibilities present in the library. Since CA is a tetrahedral center, this procedure automatically superposes the CB atom as well. The root mean square deviations of the N, CA, C, CB atoms of the threaded structures (with respect to the 2HAQ native coordinates) were found to be less than 0.1 Å, which vouched for the correctness of the method. For each residue position, the superposed side chain coordinates of the rotamer were then appended to the original main chain coordinates of the template. Subsequent to threading, each structure was energy-minimized by 500 steps of Steepest Descents (SD) followed by 20000 steps of the Adopted Basis Newton-Raphson (ABNR) method with a gradient tolerance (tolgrd) of 0.001 and a distance dependent dielectric multiplied by 4.0, using the CHARMM22 force field [66, 67]. The constant harmonic force parameter was set to 250.0 for N, CA, C and O atoms and 10.0 for CB to conserve the main chain three-dimensional representation of the fold. Every structure was checked to have reasonably acceptable geometry using PROCHECK [68]. For 3KOP_F, 3BO7_B and the native sequence 2HAQ, the threading procedure was performed 250 times, each time with a different randomized combination of rotamers. For each threaded structure, surface contact networks were generated as described previously (see Methods, section: Surface complementarity).
Topological fold detection measures
17 additional structures belonging to the cyclophilin-like fold were chosen from the protein data bank [54] which had greater than 40% sequence identity upon structural alignment with 2HAQ (PDB ID_Chain (rmsd (Å), sequence identity (%)): 1XO7_A (0.5, 74), 3ICH_A (0.8, 65), 2PLU_A (1.3, 63), 2X25_B (1.2, 61), 2CFE_A (1.2, 60), 1QOI_A (0.8, 57), 1A58_A (1.2, 57), 1IHG_A (1.2, 57), 2R99_A (1.3, 57), 1DYW_A (1.4, 57), 2HQJ_A (1.4, 57), 2CMT_A (1.2, 56), 3K2C_B (1.4, 54), 2GW2_A (0.8, 53), 2HE9_A (0.8, 53), 2FU0_A (1.3, 47), 1ZKC_A (1.2, 42)). Surface contact networks (at S_{m} ≥ 0.4, Ov ≥ 0.08) were generated for all the 17 native structures along with 2HAQ. Unlike the networks defined while describing packing motifs, these networks could contain unconnected components and even isolated binary links. Here the primary emphasis was to represent a fold as a unique subset of relevant links, highly conserved amongst members of that fold. Pairwise structural alignment (using the Dali server [69]) with 2HAQ (considered to be the template) provided the mapping between the nodes of 2HAQ and each of the 17 homologous proteins. In case of insertions, deletions or non-alignment, the node was considered to be absent in the related protein. Every link in the contact network of 2HAQ was searched systematically in the 17 homologues, counting the number of times the corresponding (mapped) nodes were found to be present and connected. Only those links from 2HAQ which were present in at least 80% of the 17 homologues were retained. This subset of links was considered to be representative of the cyclophilin-like fold and designated as {Lcyp}.
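The selection of conserved links can be sketched as a simple counting procedure. The data layout below (links as frozensets of node ids, one alignment mapping per homologue) and the function name are assumptions made for illustration; the structural alignment itself is taken as given.

```python
def conserved_links(template_links, homologue_links, mappings, threshold=0.8):
    """Keep template links found in at least `threshold` of the homologues.

    template_links: set of frozenset({i, j}) in template numbering.
    homologue_links: list of sets of frozenset({p, q}), one per homologue,
        in each homologue's own numbering.
    mappings: list of dicts (template node -> homologue node), one per
        homologue; unaligned template nodes are simply absent from the dict.
    """
    kept = set()
    for link in template_links:
        i, j = tuple(link)
        hits = 0
        for links, mapping in zip(homologue_links, mappings):
            # Both nodes must be aligned and connected in the homologue
            if i in mapping and j in mapping:
                if frozenset((mapping[i], mapping[j])) in links:
                    hits += 1
        if hits >= threshold * len(homologue_links):
            kept.add(link)
    return kept
```

With 17 homologues and a threshold of 0.8, a link must be present and connected in at least 14 of them to be retained.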
To test for fold compatibility of any sequence threaded onto 2HAQ, two complementary topological measures (snet, dnet) were defined based on {Lcyp} and the corresponding subgraph in the threaded structure. It should be noted that for the threaded structure, the specification of a node was identical to that of 2HAQ depending on its residue number.
Similarity
snet = N_{t} / N_{s}
where N_{t} is the number of equivalent links from {Lcyp} found in the threaded structure and N_{s} is the total number of links in {Lcyp}.
Distance
where A(i,j) and A'(i,j) are the matrix elements of adjacency matrices A and A' based on 2HAQ and the threaded structure respectively and nL is the number of elements in the set E ∪ E' where E and E' are the sets of links corresponding to graphs A and A'. It can be shown that dnet (A, A') is formally a metric in a vector space (proof not given).
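As a rough illustration of how the two measures could be evaluated, the sketch below works on link sets in Python. Two assumptions are made and should be read as such: the similarity is taken to be the fraction N_{t}/N_{s}, and, because the displayed expression for dnet is not reproduced in this text, the distance is taken to be the sum of |A(i,j) − A'(i,j)| over node pairs divided by nL, which for unweighted graphs equals the Jaccard distance between E and E'. This assumed form may differ from the definition actually used. The function names `snet` and `dnet_assumed` are hypothetical.

```python
def snet(fold_links, threaded_links):
    """Fraction of fold-defining links that are also present in the threaded structure."""
    n_s = len(fold_links)
    n_t = sum(1 for link in fold_links if link in threaded_links)
    return n_t / n_s


def dnet_assumed(links_a, links_b):
    """ASSUMED distance: mismatched adjacency entries divided by |E ∪ E'|.

    For 0/1 adjacency matrices, |A(i,j) - A'(i,j)| is 1 exactly for links
    present in only one of the two graphs, so the numerator is the size of
    the symmetric difference of the link sets.
    """
    union = links_a | links_b
    if not union:
        return 0.0
    return len(links_a ^ links_b) / len(union)


def link(i, j):
    """Undirected link between nodes i and j."""
    return frozenset((i, j))


# Toy data: four fold-defining links and the subgraph found in a threaded model.
l_fold = {link(1, 2), link(2, 3), link(3, 4), link(4, 5)}
threaded = {link(1, 2), link(2, 3), link(7, 8)}

similarity = snet(l_fold, threaded)
distance = dnet_assumed(l_fold, threaded)
```

Under these assumptions the toy model recovers two of the four fold links, and three of the five links in the union are mismatched. The Jaccard distance is a known metric on finite sets, which is one reason it was picked for this sketch.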
Languages/Software used
Code for network generation and calculation of network parameters was developed in Perl (v.5.8). Surface generation and surface complementarity/overlap calculations were performed on a DEC Alpha server with programs written in Fortran 90. Matlab (v.7.5) was used to analyze geometry. Networks were visually analyzed using Cytoscape [43] (v.2.6.2), and related crystal structures were surveyed in RasMol [55] (v.2.4.7.2) and PyMOL [44] (v.1.3). The threading program was written in Fortran 90 and energy minimization was carried out using CHARMM [66]. Structural alignments were performed using the Dali server [69].
List of Abbreviations used
ASCN: All Residue Surface Contact Network
APCN: All Residue Point Atom Contact Network
Declarations
Acknowledgements and Funding
We convey our sincerest gratitude to Mr. Abhirup Bandyopadhyay (NIT, Durgapur) and Prof. P.K. Mohanty (TCMP division, SINP) for many fruitful suggestions. We acknowledge the computational facilities available at SINP. We are also thankful to Prof. Dipak Dasgupta (Biophysics division, SINP) for his constant support during this project. The work was funded by the 'Chemical and Biophysical Approaches for Understanding Natural Processes' project of the Saha Institute of Nuclear Physics.
Authors’ Affiliations
References
1. Beasley JR, Hecht MH: Protein design: the choice of de novo sequences. J Biol Chem 1997, 272: 2031–2034. 10.1074/jbc.272.4.2031
2. Richards FM: The interpretation of protein structures: total volume, group volume distributions and packing density. J Mol Biol 1974, 82: 1–14. 10.1016/0022-2836(74)90570-1
3. Godzik A, Kolinski A, Skolnick J: Topology fingerprint approach to the inverse protein folding problem. J Mol Biol 1992, 227: 227–238. 10.1016/0022-2836(92)90693-E
4. Yue K, Dill KA: Inverse protein folding problem: designing polymer sequences. Proc Natl Acad Sci USA 1992, 89: 4163–4167. 10.1073/pnas.89.9.4163
5. Kuhlman B, Baker D: Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci USA 2000, 97: 10383–10388.
6. Ponder JW, Richards FM: Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 1987, 193: 775–791. 10.1016/0022-2836(87)90358-5
7. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D: Design of a novel globular protein fold with atomic-level accuracy. Science 2003, 302: 1364–1368.
8. Taylor WR, Bartlett GJ, Chelliah V, Klose D, Lin K, Sheldon T, Jonassen I: Prediction of protein structure from ideal forms. Proteins 2008, 70: 1610–1619. 10.1002/prot.21913
9. Koh SK, Ananthasuresh GK, Vishveshwara S: A deterministic optimization approach to protein sequence design using continuous models. Int J Rob Res 2005, 24: 109–130. 10.1177/0278364905050354
10. Desjarlais JR, Handel TM: New strategies in protein design. Curr Opin Biotechnol 1995, 6: 460–466. 10.1016/0958-1669(95)80076-X
11. Dahiyat BI, Sarisky CA, Mayo SL: De novo protein design: towards fully automated sequence selection. J Mol Biol 1997, 273: 789–796. 10.1006/jmbi.1997.1341
12. Dahiyat BI, Mayo SL: Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997, 94: 10172–10177. 10.1073/pnas.94.19.10172
13. Lazar GA, Desjarlais JR, Handel TM: De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997, 6: 1167–1178. 10.1002/pro.5560060605
14. Goraj K, Renard A, Martial J: Synthesis, purification and initial structural characterization of octarellin, a de novo polypeptide modeled on the alpha/beta barrel packing. Protein Eng 1990, 3: 259–266. 10.1093/protein/3.4.259
15. Tanaka T, Kuroda Y, Kimura H, Kidokoro S, Nakamura H: Cooperative deformation of a de novo designed protein. Protein Eng 1994, 7: 969–976. 10.1093/protein/7.8.969
16. Offredi F, Dubail F, Kischel P, Sarinski K, Stern AS, Van de Weerdt C, Hoch JC, Prosperi C, Francois JM, Mayo SL, Martial JA: De novo backbone and sequence design of an idealized α/β-barrel protein: evidence of stable tertiary structure. J Mol Biol 2003, 325: 163–174. 10.1016/S0022-2836(02)01206-8
17. Desjarlais JR, Handel TM: Side-chain and backbone flexibility in protein core design. J Mol Biol 1999, 290: 305–318. 10.1006/jmbi.1999.2866
18. Summa CM, Rosenblatt MM, Hong JK, Lear JD, DeGrado WF: Computational de novo design, and characterization of an A_{2}B_{2} diiron protein. J Mol Biol 2002, 321: 923–938. 10.1016/S0022-2836(02)00589-2
19. Butterfoss GL, Kuhlman B: Computer-based design of novel protein structures. Annu Rev Biophys Biomol Struct 2006, 35: 49–65. 10.1146/annurev.biophys.35.040405.102046
20. Hu X, Wang H, Ke H, Kuhlman B: Computer-based redesign of a β-sandwich protein suggests that extensive negative design is not required for de novo β-sheet design. Structure 2008, 16: 1799–1805. 10.1016/j.str.2008.09.013
21. Misura KMS, Morozov AV, Baker D: Analysis of anisotropic side-chain packing in proteins and application to high-resolution structure prediction. J Mol Biol 2004, 342: 651–664. 10.1016/j.jmb.2004.07.038
22. Crick FHC: The packing of α-helices: simple coiled coils. Acta Crystallogr 1953, 6: 689–697. 10.1107/S0365110X53001964
23. Bromberg S, Dill KA: Side chain entropy and packing in proteins. Protein Sci 1994, 3: 997–1009. 10.1002/pro.5560030702
24. Brylinski M, Konieczny L, Roterman I: Fuzzy-oil-drop hydrophobic force field - a model to represent late-stage folding (in silico) of lysozyme. J Biomol Struct Dyn 2006, 23: 519–528.
25. Brylinski M, Prymula K, Jurkowski W, Kochanczyk M, Stawowczyk E, Konieczny L, Roterman I: Prediction of functional sites based on the fuzzy oil drop model. PLoS Comput Biol 2007, 3: e94. 10.1371/journal.pcbi.0030094
26. Banerjee R, Sen M, Bhattacharya D, Saha P: The Jigsaw Puzzle Model: Search for Conformational Specificity in Protein Interiors. J Mol Biol 2003, 333: 211–226. 10.1016/j.jmb.2003.08.013
27. Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol 1998, 277: 985–994. 10.1006/jmbi.1998.1645
28. Vendruscolo M, Paci E, Dobson CM, Karplus M: Three key residues form a critical contact network in a protein folding transition state. Nature 2001, 409: 641–645. 10.1038/35054591
29. Greene LH, Higman VA: Uncovering Networks within protein structures. J Mol Biol 2003, 334: 781–791. 10.1016/j.jmb.2003.08.061
30. Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, Venger I, Pietrokovski S: Network analysis of protein structures identifies functional residues. J Mol Biol 2004, 344: 1135–1146. 10.1016/j.jmb.2004.10.055
31. Punta M, Rost B: Protein folding rates estimated from contact predictions. J Mol Biol 2005, 348: 507–512. 10.1016/j.jmb.2005.02.068
32. Brinda KV, Vishveshwara S: A network representation of the protein structures: implications for protein stability. Biophys J 2005, 89: 4159–4170. 10.1529/biophysj.105.064485
33. Aftabuddin M, Kundu S: Hydrophobic, Hydrophilic, and Charged Amino Acid Networks within Protein. Biophys J 2007, 93: 225–231. 10.1529/biophysj.106.098004
34. Bagler G, Sinha S: Assortative mixing in protein contact networks and protein folding kinetics. Bioinformatics 2007, 23: 1760–1767. 10.1093/bioinformatics/btm257
35. Li J, Wang J, Wang W: Identifying folding nucleus based on residue contact networks of proteins. Proteins 2008, 71: 1899–1907. 10.1002/prot.21891
36. Vendruscolo M, Dokholyan NV, Paci E, Karplus M: Small-world view of the amino acids that play a key role in protein folding. Phys Rev E 2002, 65: 061910-1–061910-4.
37. Atilgan AR, Akan P, Baysal C: Small-World Communication of Residues and Significance for Protein Dynamics. Biophys J 2004, 86: 85–91. 10.1016/S0006-3495(04)74086-2
38. Bagler G, Sinha S: Network properties of protein structures. Physica A 2005, 346: 27–33. 10.1016/j.physa.2004.08.046
39. Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature 1998, 393: 440–442. 10.1038/30918
40. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A: The architecture of complex weighted networks. Proc Natl Acad Sci USA 2004, 101: 3747–3752. 10.1073/pnas.0400087101
41. Harary F: Graphs. In Graph Theory. Addison-Wesley Publishing Company Inc, USA & Narosa Publishing House, New Delhi, India; 2001:10–13. 10th Reprint.
42. Cheriyan J, Maheshwari SN: Finding non-separating induced cycles and independent spanning trees in 3-connected graphs. J Algorithms 1988, 9: 507–537. 10.1016/0196-6774(88)90015-6
43. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13: 2498–2504. 10.1101/gr.1239303
44. DeLano WL: The PyMOL Molecular Graphics System [http://www.pymol.org/]
45. Singh J, Thornton JM: The interaction between phenylalanine rings in proteins. FEBS Lett 1985, 191: 1–6. 10.1016/0014-5793(85)80982-0
46. Liang S, Grishin NV: Side-chain modeling with an optimized scoring function. Protein Sci 2002, 11: 322–331.
47. Liang S, Grishin NV: Effective scoring function for protein sequence design. Proteins 2004, 54: 271–281.
48. Itzhaki LS, Otzen DE, Fersht AR: The structure of the transition state for folding of chymotrypsin inhibitor 2 analyzed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J Mol Biol 1995, 254: 260–288. 10.1006/jmbi.1995.0616
49. Fersht AR: Optimization of rates of protein folding: the nucleation-condensation mechanism and its implications. Proc Natl Acad Sci USA 1995, 92: 10869–10873. 10.1073/pnas.92.24.10869
50. Du N, Wu B, Xu L, Wang B, Pei X: A Parallel Algorithm for Enumerating All Maximal Cliques in Complex Network. Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06) 2006, 324.
51. Zhou H, Zhou Y: Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 2002, 11: 2714–2726.
52. Shen MY, Sali A: Statistical potential for assessment and prediction of protein structures. Protein Sci 2006, 15: 2507–2524. 10.1110/ps.062416606
53. Rykunov D, Fiser A: New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010, 11: 128. 10.1186/1471-2105-11-128
54. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M: The protein data bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977, 112: 535–542. 10.1016/S0022-2836(77)80200-3
55. Sayle RA, Milner-White EJ: RASMOL: biomolecular graphics for all. Trends Biochem Sci 1995, 20: 374–376. 10.1016/S0968-0004(00)89080-5
56. Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536–540.
57. Word JM, Lovell SC, Richardson JS, Richardson DC: Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol 1999, 285: 1735–1747. 10.1006/jmbi.1998.2401
58. Reduce [http://kinemage.biochem.duke.edu/software/reduce.php]
59. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA: A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J Am Chem Soc 1995, 117: 5179–5197. 10.1021/ja00124a002
60. Lawrence MC, Colman PM: Shape complementarity at protein/protein interfaces. J Mol Biol 1993, 234: 946–950. 10.1006/jmbi.1993.1648
61. Gerstein M, Tsai J, Levitt M: The volume of atoms on the protein surface: calculated from simulation, using Voronoi polyhedra. J Mol Biol 1995, 249: 955–966. 10.1006/jmbi.1995.0351
62. Rother K, Hildebrand PW, Goede A, Gruening B, Preissner R: Voronoia: analyzing packing in protein structures. Nucleic Acids Res 2009, 37: D393–D395. 10.1093/nar/gkn769
63. Voronoia [http://bioinformatics.charite.de/voronoia/index.php?site=download]
64. Goede A, Preissner R, Frommel C: Voronoi cell: new method for allocation of space among atoms: elimination of avoidable errors in calculation of atomic volume and density. J Comput Chem 1997, 18: 1113–1123. 10.1002/(SICI)1096-987X(19970715)18:9<1113::AID-JCC1>3.0.CO;2-U
65. Dunbrack RL Jr, Karplus M: A backbone-dependent rotamer library for proteins: application to side-chain prediction. J Mol Biol 1993, 230: 543–571. 10.1006/jmbi.1993.1170
66. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M: CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations. J Comput Chem 1983, 4: 187–217. 10.1002/jcc.540040211
67. MacKerell AD Jr, Bashford D, Bellott M, Dunbrack RL Jr, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE III, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M: All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J Phys Chem B 1998, 102: 3586–3616. 10.1021/jp973084f
68. Laskowski RA, MacArthur MW, Moss DS, Thornton JM: PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993, 26: 283–291. 10.1107/S0021889892009944
69. Holm L, Rosenström P: Dali server: conservation mapping in 3D. Nucleic Acids Res 2010, 38: W545–W549. 10.1093/nar/gkq366
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.