Adjusting protein graphs based on graph entropy
 Sheng-Lung Peng^{1} and
 Yu-Wei Tsay^{1}
https://doi.org/10.1186/1471-2105-15-S15-S6
© Peng and Tsay; licensee BioMed Central Ltd. 2014
Published: 3 December 2014
Abstract
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling instead of looking for a minimum superimposition of pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity becomes one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that graph entropy helps in modeling the conformation of a protein graph. Furthermore, it indirectly contributes to protein structural comparison if the protein graph is sound.
Background
Graph theory is now widely used in information theory, combinatorial optimization, structural biology, chemistry, and many other fields. Measuring graph similarity is a practical approach in many of them. When graphs are used to represent structured objects, the problem of measuring similarities between objects becomes one of computing similarities between graphs [1]. Protein remodeling is another such field, in which the multiple domains within structures are considerably complicated.
Proteins are important molecules for living organisms. They are essential parts of organisms and participate in almost every process within cells. A protein contains at least one linear chain of amino acid residues called a polypeptide. Through various kinds of synthesis, e.g., biosynthesis and chemical synthesis, a polypeptide is folded into a unique 3-dimensional structure. Usually, the structure of a protein determines the biological function it performs in an organism. Knowledge of a protein structure can help us understand biological functions and evolution. Measuring protein similarities according to the 3-dimensional structures of proteins provides a valuable tool for evaluating proteins with low sequence similarities when evolutionary relations among proteins cannot be detected by sequence alignment techniques. To perform a structural comparison of molecules, accurate information about two superimposed protein structures must be obtained. However, optimizing these two quantities simultaneously is difficult. Unlike the sequence alignment problem, the structural alignment problem has not even been classified as solvable.
For decades, studies have attempted to define topological relations and notations on protein structures; a schematic description is essentially expected to describe a protein's topology. Mathematical formulations of structural patterns can facilitate describing the composition of a polypeptide chain. A schematic description has the advantage of simplicity, making the implementation of graph theory as an alternative approach possible [2]. By selectively ignoring protein structural features, it has the potential to detect further homologous relationships based on various geometric methods and motivations.
The structure of a protein can be regarded as a conformation with various local elements (e.g., helices, sheets) and forces (e.g., van der Waals forces, hydrogen bonds), folding into its specific characteristic and functional structure. With the help of graph transformation, folded polypeptide chains can be represented as a graph using several mapping rules. Proteins contain complex relationships within their polymers: residual reactions, covalent interactions, peptide bonding, and hydrophobic packing are essential parts of structural determination. The intention is to transform a protein structure into a graph. Formally, a graph transformation system consists of a set of graph rewrite rules L → R, where L is called the pattern graph and R is called the replacement graph [3]. Applying such a rule is the key operation in graph transformation.
Protein Remodeling
Recent studies for constructing protein graphs.
Proteins have been represented in various ways using different levels of detail. The conformation of a protein structure has been shown to be determined geometrically by various constraints [5]. Therefore, the most common method for protein modeling is to preserve its topological relationships in graphs. From the perspective of graph theory, a simplified representation of protein structure focuses on connectivity patterns. It helps to examine in detail the interactions within a folded polypeptide. In brief, geometry-based protein modeling refines the edges (relations) among vertices (objects), using the information from inter-object distances for all pairs of objects.
Compared with geometric relationships, chemical properties provide a more complicated description in the protein graph model. Amino acids have various chemical properties, including electrostatic charge, hydrophobicity, bonding type, size, and specific functional groups [6]. By assigning values to the edges and nodes of a graph, each labeled component can reflect a different type of chemical relation.
Entropy
Entropy defines a quantitative equilibrium property within a system, and it implies the principle of disorder from the second law of thermodynamics [7]. It is particularly important in describing how energy is applied and transferred in an isolated system. The higher the disorder, the greater the entropy of the system [8]. This concept also appears in living systems. As we know, living organisms are composed of many cells, tissues, and organs, with protein as a vital element. Since proteins are biochemical compounds consisting of one or more polypeptide chains, the arrangement of protein polymers is assumed to be in a compact state, according to their backbone dihedral angles and side chain rotamers. This is called conformational entropy. There is considerable evidence that the same observation can be applied to a protein graph model. In such a case, a graph model should also follow the second law of thermodynamics.
In graph theory, the entropy of a graph is usually defined by its degree sequence. For example, consider the cycle with 4 vertices, i.e., C_{4}. The degree sequence is (2, 2, 2, 2), so the degree sum is 8. Thus, the p_{ i } for each vertex v_{ i } is its degree divided by the degree sum, $\frac{2}{8}=0.25$. By definition, I(C_{4}) = −4 × 0.25 × log_{2}(0.25) = 2.
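The C_{4} calculation can be reproduced with a few lines of Python. This is a minimal sketch that assumes the entropy is the Shannon entropy (base 2) of the normalized degree sequence, which is what the worked example implies; the function name `degree_entropy` is hypothetical and not taken from the authors' implementation.

```python
import math

def degree_entropy(degrees):
    """Shannon entropy (base 2) of p_i = deg(v_i) / sum of all degrees.

    Assumption: this mirrors the C_4 example in the text; it is not
    necessarily the authors' extended entropy I'(G).
    """
    total = sum(degrees)
    if total == 0:
        raise ValueError("the graph has no edges, so p_i is undefined")
    # Vertices of degree 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)

# Cycle C_4: every vertex has degree 2, so p_i = 2/8 = 0.25.
print(degree_entropy([2, 2, 2, 2]))  # 2.0
```

For a regular graph on n vertices this quantity always equals log_{2}(n), because every p_{ i } is 1/n.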
Methods
In this section, we extend the concept of graph entropy to measuring protein graphs. To demonstrate the calculation of graph entropy by example, peptide chains of MHC (Major Histocompatibility Complex) are selected as the materials for examining the utility of graph entropy.
Graph entropy
Edge adjustment
By the definition of I′(G), its value does not increase monotonically as the density of G increases. Thus, we have the following cases to determine how to adjust the graph. Assume that G is the current graph and I′(G) = x. Let I′(G − e) = y and I′(G + e) = z, where G − e means that we remove the longest edge from G and G + e means that we add a shortest nonedge to G.

Case 1: y = 0. It means that after this edge is removed, G is no longer a connected graph.

Case 2: z > x > y. It means that by adding a new edge, G will become more stable.

Case 3: x > z > y. It means that G is stable enough.

Case 4: y > x > z. It means that by removing an old edge, G will become more stable.
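The four cases can be read as a small decision procedure. The sketch below only classifies three already computed entropy values; it does not compute I′(G) itself. The function name, the returned labels, and the fallback for orderings that the four cases do not cover are assumptions made for illustration.

```python
def adjust_decision(x, y, z):
    """Classify one adjustment step.

    x = I'(G), y = I'(G - e), z = I'(G + e), as defined in the text.
    The returned labels are hypothetical names for Cases 1 to 4.
    """
    if y == 0:
        return "disconnected"  # Case 1: removing the edge disconnects G
    if z > x > y:
        return "add"           # Case 2: adding an edge makes G more stable
    if x > z > y:
        return "stable"        # Case 3: G is stable enough
    if y > x > z:
        return "remove"        # Case 4: removing an edge makes G more stable
    return "unclassified"      # ordering not covered by the four cases

# Illustrative values only, not taken from the experiments.
print(adjust_decision(4.2, 4.1, 4.3))  # add
print(adjust_decision(4.2, 4.4, 4.0))  # remove
```

Orderings such as z > y > x, and ties, are not addressed by the four cases, so the sketch reports them separately rather than guessing an action.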
Graph spectra
Given two graphs G_{ A } = (V_{ A }, E_{ A }) and G_{ B } = (V_{ B }, E_{ B }), the graph matching problem is to find a one-to-one mapping f : V_{ A } → V_{ B } such that if (u, v) ∈ E_{ A }, then the probability that (f(u), f(v)) ∈ E_{ B } is as high as possible. Numerous attempts have been made in recent years to compute graph similarity efficiently. It was shown in [11] that the problem of graph matching may be divided into different types depending on their levels. Graph scoring qualitatively measures the mutual dependence of two objects [12]. Generally, a similarity value ranges between 0 and 1, from dissimilar to identical.
Consider two graphs G_{ X } and G_{ Y } with |V_{ X }| = 4 < |V_{ Y }| = 5. When spectra are of different sizes, the smaller one may be padded with zero values so that the spectra of G_{ X } and G_{ Y } have equal length. By definition, the spectra λ_{ X } and λ_{ Y } can be obtained, i.e., λ_{ X } = [ 4, 4, 2, 0, 0 ] and λ_{ Y } = [ 4, 4, 1, 1, 0 ]. The similarity between G_{ X } and G_{ Y } can be simply measured by the Euclidean distance of λ_{ X } and λ_{ Y } . In this case, the similarity score (distance) of G_{ X } and G_{ Y } is $\sqrt{2}\approx 1.414$.
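The padding and distance step can be sketched as follows. The sketch assumes the spectra are already available as lists of eigenvalues, that they are compared in descending order, and that the unpadded spectrum of G_{ X } is [4, 4, 2, 0]; the function name `spectral_distance` is hypothetical.

```python
import math

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra after zero padding.

    Assumption: eigenvalues are sorted in descending order and the
    shorter spectrum is padded with zeros at the end.
    """
    n = max(len(spec_a), len(spec_b))
    a = sorted(spec_a, reverse=True) + [0.0] * (n - len(spec_a))
    b = sorted(spec_b, reverse=True) + [0.0] * (n - len(spec_b))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

lam_x = [4, 4, 2, 0]     # 4 vertices, padded to length 5 inside the function
lam_y = [4, 4, 1, 1, 0]  # 5 vertices
print(round(spectral_distance(lam_x, lam_y), 3))  # 1.414
```

Because the score is a distance, smaller values indicate more similar spectra, and identical spectra give 0.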
Results and discussion
In this experiment, we validated the remodeling function of the Pgraph by using extended graph entropy to verify the stability of a given Pgraph. For the Pgraph construction, please refer to [15]. Here, we were interested only in the impact of connectivity on protein structural similarities. Various types of MHC were chosen as the material to verify the proposed method: 1HDM, 1K5N, 2ENG, 1VCA, 1ZXQ, 1UXW, 1A2Y, 3ARD, 2Q3Z, and 2CRY. MHC, a part of the immune system in most vertebrates, encodes a small complex cell surface protein. It is also known as HLA (Human Leukocyte Antigen), one of the most intensively studied gene families in humans [16]. Due to the great diversity of microbes in the environment, MHC genes vary their peptides widely through several mechanisms [17]; this is also the major reason why MHC proteins were selected as materials for this study.
Pgraph comparison
Let G = (V, E) be the Pgraph after remodeling from the construction proposed by [15]. Vertices of V in G are created according to the DSSP. Under this metric, a protein secondary structure is represented by a single letter code: H (helix; containing G, H, and I), T (hydrogen turn; containing T, E, and B), or C (coiled; containing only C). To control for a single variable in this experiment, the edge set E of G is varied over a specific range.
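The grouping of DSSP codes into the three classes can be written as a lookup table. This is a sketch of the grouping exactly as stated above; the names `DSSP_TO_CLASS` and `reduce_dssp` are hypothetical, and DSSP codes that the text does not list are rejected rather than assigned to a class.

```python
# Grouping as stated in the text: H <- {G, H, I}, T <- {T, E, B}, C <- {C}.
DSSP_TO_CLASS = {
    "G": "H", "H": "H", "I": "H",
    "T": "T", "E": "T", "B": "T",
    "C": "C",
}

def reduce_dssp(codes):
    """Map a string of per-residue DSSP codes to the three-letter alphabet."""
    reduced = []
    for code in codes:
        if code not in DSSP_TO_CLASS:
            # Assumption: codes not listed in the text are treated as errors.
            raise ValueError(f"DSSP code not covered by the grouping: {code!r}")
        reduced.append(DSSP_TO_CLASS[code])
    return "".join(reduced)

print(reduce_dssp("CHHHGTEEBC"))  # CHHHHTTTTC
```

How the reduced string is turned into vertices of the Pgraph is described in [15] and is not reproduced here.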
Selected proteins with corresponding extended entropies.
PID    −3e     −2e     −1e     AVG     +1e     +2e     +3e
1HDM   3.343   3.396   3.563   3.319   3.705   3.845   3.765
Dens   0.357   0.393   0.464   0.524   0.535   0.607   0.643
1K5N   4.305   5.545   5.564   4.537   4.614   4.732   3.787
Dens   0.436   0.457   0.475   0.509   0.527   0.564   0.571
2ENG   4.000   4.091   4.144   4.212   4.294   4.344   4.480
Dens   0.422   0.444   0.467   0.489   0.511   0.533   0.578
1VCA   3.106   3.171   3.254   3.221   3.249   3.467   3.493
Dens   0.381   0.429   0.476   0.524   0.571   0.619   0.667
1ZXQ   3.494   3.551   3.641   3.709   3.774   3.712   3.907
Dens   0.429   0.464   0.500   0.535   0.571   0.607   0.643
1UXW   5.562   5.563   5.646   5.764   5.855   5.950   6.079
Dens   0.456   0.463   0.478   0.500   0.515   0.529   0.551
1A2Y   NaC     NaC     2.414   2.507   2.581   2.512   2.510
Dens                   0.400   0.500   0.600   0.700   0.800
3ARD   4.460   4.641   4.698   4.756   4.801   4.860   4.932
Dens   0.424   0.470   0.485   0.500   0.515   0.530   0.554
2Q3Z   6.611   6.730   6.775   6.834   6.885   6.996   7.302
Dens   0.474   0.486   0.493   0.503   0.511   0.525   0.547
2CRY   NaC     NaC     NaC     NaN     NaN     NaN     NaN
Dens                           0.667   1.000   1.000   1.000
The relationship between E and I′(G) is as follows. First, when the density of G increases, the graph G goes from sparse to dense. However, its extended entropy does not increase strictly with its density, which appears somewhat anomalous. Second, the edge set in the protein remodeling problem can be determined from its extended entropy. By definition, the Pgraph G should be a connected graph. Once G becomes a disconnected graph, its entropy cannot be determined. For example, 1A2Y is not a connected graph when the density is lower than 0.400. Third, E appears to be considerably related to V in graph entropy. Consider the Pgraph 2CRY as another example. If a protein remodeling function adopts a specific value for its geometric edges, then it might be an error to assume that fixed value as a criterion. This is an essential fact to stress. It is worth pointing out that the construction of a Pgraph is limited by |V|.
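The text does not state how the density values are computed. The sketch below assumes the usual definition for a simple undirected graph, the number of edges divided by the number of possible vertex pairs. That assumption is at least consistent with the 1A2Y row, whose densities step by 0.100 as they would for a 5-vertex graph, and with the 2CRY row, whose values 0.667 and 1.000 match a 3-vertex graph. The vertex counts are inferred from the table and are not given in the source.

```python
def density(num_vertices, num_edges):
    """Density of a simple undirected graph: |E| / (|V| choose 2).

    Assumption: this is the standard definition; the source does not
    say which definition was used for the reported values.
    """
    max_edges = num_vertices * (num_vertices - 1) // 2
    if max_edges == 0:
        raise ValueError("density is undefined for fewer than two vertices")
    if not 0 <= num_edges <= max_edges:
        raise ValueError("edge count out of range for a simple graph")
    return num_edges / max_edges

# Inferred sizes, for illustration only.
print(density(5, 4))            # 0.4
print(round(density(3, 2), 3))  # 0.667
print(density(3, 3))            # 1.0
```

Under this reading, a 3-vertex graph saturates at density 1.000 after a single added edge, which illustrates why the range of admissible edge sets is limited by the number of vertices.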
Pgraph verification
A comparison of protein structure remodelings.
PID           1K5N   2CRY   1VCA   2Q3Z   1ZXQ   1A21   2ENG   1UXW   1A2Y   3ARD
1HDM  Old     7.93   23.36  15.68  24.01  13.74  6.54   12.57  7.92   5.75   8.27
      New     7.75   21.12  13.87  23.67  12.11  5.64   11.03  7.25   5.41   7.79
      Result  +      +      +      +      +      +      +      +      +      +
1K5N  Old     ·      26.58  19.55  26.91  18.02  14.65  17.69  20.44  18.41  25.72
      New     ·      23.70  17.39  21.13  15.99  12.83  15.84  17.17  16.94  23.64
      Result  ·      +      +      +      +      +      +      +      +      +
2CRY  Old     ·      ·      14.87  12.33  17.13  14.39  15.62  19.33  6.81   18.42
      New     ·      ·      12.91  34.10  14.92  12.45  17.54  19.35  5.17   19.63
      Result  ·      ·      +      −      +      +      −      =      +      −
1VCA  Old     ·      ·      ·      17.71  5.39   4.83   7.75   11.42  5.45   12.80
      New     ·      ·      ·      29.68  4.47   3.21   6.82   10.07  4.83   11.67
      Result  ·      ·      ·             +      +      +      +      +      +
2Q3Z  Old     ·      ·      ·      ·      27.57  29.30  23.46  25.45  38.30  24.14
      New     ·      ·      ·      ·      26.31  28.35  21.11  24.52  36.72  23.00
      Result  ·      ·      ·      ·      +      +      +      +      +      +
1ZXQ  Old     ·      ·      ·      ·      ·      3.98   7.96   10.52  6.21   9.14
      New     ·      ·      ·      ·      ·      3.41   7.49   9.67   6.53   8.87
      Result  ·      ·      ·      ·      ·      +      +      +      −      +
1A21  Old     ·      ·      ·      ·      ·      ·      6.24   12.76  7.37   12.85
      New     ·      ·      ·      ·      ·      ·      5.41   11.38  6.91   11.06
      Result  ·      ·      ·      ·      ·      ·      +      +      +      +
2ENG  Old     ·      ·      ·      ·      ·      ·      ·      4.65   11.42  14.19
      New     ·      ·      ·      ·      ·      ·      ·      4.17   10.39  13.82
      Result  ·      ·      ·      ·      ·      ·      ·      +      +      +
1UXW  Old     ·      ·      ·      ·      ·      ·      ·      ·      16.24  5.71
      New     ·      ·      ·      ·      ·      ·      ·      ·      15.41  3.45
      Result  ·      ·      ·      ·      ·      ·      ·      ·      +      +
1A2Y  Old     ·      ·      ·      ·      ·      ·      ·      ·      ·      12.24
      New     ·      ·      ·      ·      ·      ·      ·      ·      ·      11.41
      Result  ·      ·      ·      ·      ·      ·      ·      ·      ·      +
CATH codes for the selected macromolecules.
PID   Domain  C    A    T    H    S    O    L    I    D
1HDM  A2      2    60   40   10   152  1    1    1    1
      B2      2    60   40   10   137  1    2    1    1
1K5N  A2      2    60   40   9    1    1    1    1    1
2ENG  A2      2    40   40   10   1    1    1    1    1
1VCA  A1      2    60   40   10   135  1    1    1    1
      A2      2    60   40   10   62   1    1    1    1
1ZXQ  A1      2    40   40   10   123  1    1    1    1
      A2      2    40   40   10   121  2    1    1    1
1UXW  A2      2    60   40   10   9    1    1    1    1
      B1      2    60   40   10   3    1    1    1    1
1A2Y  A1      2    60   40   10   8    2    2    1    1
      B1      2    60   40   10   36   2    1    1    1
3ARD  C1      2    60   40   10   18   2    3    2    1
      D1      2    60   40   10   21   4    12   1    1
Program and environment
The procedure for computing the extended entropy for a Pgraph was implemented and tested with the MHC PDB dataset. The experiments were run on a 2 GHz PC with 512 MB of main memory under Linux 2.6.11-1.1369. The implementation was written using Bash 3.00.16(1) and Octave 3.0.0.
Conclusion
In this paper, we proposed a measurement to determine graph stability for protein structure remodeling based on graph entropy. Our modified entropy validation shows a positive result for protein structural comparison. This graphbased approach offers a practical concept to support protein structural alignment.
Declarations
Acknowledgements
This work is supported in part by the National Science Council, Taiwan, under Grant No. NSC 101-2221-E-259-004.
Declarations
Publication charges for this article have been funded by the authors.
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 15, 2014: Proceedings of the 2013 International Conference on Intelligent Computing (ICIC 2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S15.
References
 1. Bunke H: Graph Matching: Theoretical Foundations, Algorithms, and Applications. Proc Vision Interface 2000. 2000, 21:
 2. Gilbert D, Westhead DR, Nagano N, Thornton JM: Motif-based searching in TOPS protein topology databases. Bioinformatics. 1999, 15 (4): 317-326. 10.1093/bioinformatics/15.4.317.
 3. Ehrig H, Engels G, Kreowski H: Handbook of Graph Grammars and Computing by Graph Transformation: Applications, Languages and Tools. 1997, World Scientific Publishing Company
 4. Vishveshwara S, Brinda K, Kannan N: Protein Structure: Insights from Graph Theory. Journal of the Comp Chem. 2002, 1: 187-211. 10.1142/S0219633602000117.
 5. Lund O, Hansen J, Brunak S, Bohr J: Relationship between Protein Structure and Geometrical Constraints. Protein Science: a Publication of the Protein Society. 1996, 5 (11): 2217-2225. 10.1002/pro.5560051108.
 6. Nelson DL, Cox MM: Lehninger Principles of Biochemistry. 2004, Freeman, 4
 7. Shannon C: Prediction and Entropy of Printed English. Bell Systems Technical Journal. 1951, 30: 50-64. 10.1002/j.1538-7305.1951.tb01366.x.
 8. Chang R: Physical Chemistry for the Biosciences. 2005, University Science
 9. Simonyi G: Graph Entropy: a Survey. Combinatorial Optimization. 1995, 20: 399-441.
 10. Dehmer M, Emmert-Streib F: Structural Information Content of Networks: Graph Entropy based on Local Vertex Functionals. Computational Biology and Chemistry. 2008, 32 (2): 131-138. 10.1016/j.compbiolchem.2007.09.007.
 11. Zager LA, Verghese GC: Graph similarity scoring and matching. Applied Mathematics Letters. 2008, 21: 86-94. 10.1016/j.aml.2007.01.006.
 12. Rand WM: Objective Criteria for the Evaluation of Clustering Methods. J Amer Statistical Assoc. 1971, 66 (336): 846-850. 10.1080/01621459.1971.10482356.
 13. Brouwer AE, Haemers WH: The Gewirtz graph: an exercise in the theory of graph spectra. Eur J Comb. 1993, 14: 397-407. 10.1006/eujc.1993.1044.
 14. Cvetković DM, Doob M, Sachs H: Spectra of Graphs: Theory and Applications, 3rd Revised and Enlarged Edition. 1998, Vch Verlagsgesellschaft Mbh
 15. Hsu CH, Peng SL, Tsay YW: An Improved Algorithm for Protein Structural Comparison based on Graph Theoretical Approach. Chiang Mai Journal of Science. 2011, 38 (2): 71-81.
 16. Goodsell DS: The Machinery of Life. 2009, Springer, 2nd edition
 17. Pamer E, Cresswell P: Mechanisms of MHC class I-restricted antigen processing. Annual Review of Immunology. 1998, 16 (10): 323-358.
 18. Berman HM, Westbrook J, Feng Z: The Protein Data Bank. Nucl Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
 19. Peng SL, Tsay YW: On the Usage of Graph Spectra in Protein Structural Similarity. Journal of Computers. 2012, 23: 95-102.
 20. Huan J, Bandyopadhyay D, Wang W, Snoeyink J, Prins J, Tropsha A: Comparing Graph Representations of Protein Structure for Mining Family-specific Residue-based Packing Motifs. J Computational Biology. 2005, 12 (6): 657-671. 10.1089/cmb.2005.12.657.
 21. Peng SL, Tsay YW: Measuring Protein Structural Similarity by Maximum Common Edge Subgraphs. Advanced Intelligent Computing Theories and Applications. 2010, 100-107. LNCS, 6216
 22. Canutescu A, Shelenkov A, Dunbrack R: A Graph-theory Algorithm for Rapid Protein Side-chain Prediction. Protein Science. 2003, 12 (9): 2001-2014. 10.1110/ps.03154503.
 23. Samudrala R, Moult J: A Graph-theoretic Algorithm for Comparative Modeling of Protein Structure. J Mol Biology. 1998, 279: 279-287.
 24. Borgwardt K, Ong C, Schönauer S, Vishwanathan SN, Smola A, Kriegel HP: Protein Function Prediction via Graph Kernels. Bioinformatics. 2005, 21 (suppl 1): i47-i56. 10.1093/bioinformatics/bti1007.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.