Comparative analysis of thermophilic and mesophilic proteins using Protein Energy Networks
© Vijayabaskar and Vishveshwara. 2010
Published: 18 January 2010
Thermophilic proteins sustain themselves and function at higher temperatures. Despite their structural and functional similarities with their mesophilic homologues, they show enhanced stability. Various comparative studies at genomic, protein sequence and structure levels, and experimental works highlight the different factors and dominant interacting forces contributing to this increased stability.
In this comparative structure-based study, we have used interaction energies between amino acids to generate structure networks called Protein Energy Networks (PENs). These PENs are used to compute network, sub-graph, and node-specific parameters. These parameters are then compared between the thermophile-mesophile homologues.
The results show an increased number of clusters and low energy cliques in thermophiles as the main contributing factors for their enhanced stability. Furthermore, we see an increase in the number of hubs in thermophiles. We also observe that no community of electrostatic cliques forms in PENs.
In this study we took an energy-based network approach to identify, by comparative analysis, the factors responsible for the enhanced stability of thermophiles. We were able to point out that the sub-graph parameters are the prominent contributing factors. The thermophiles have a better-packed hydrophobic core. We have also discussed, from a cliques and communities perspective, how thermophiles retain conformational flexibility while increasing stability through higher connectivity.
Proteins are macromolecules that preserve their structural integrity to perform functions. Thermophilic proteins function at higher temperatures than normal life forms. Although they are structurally, functionally and, in most instances, sequentially homologous to their mesophilic partners, they have optimal catalytic activity above 60°C. Comparative studies of thermophiles and mesophiles, aimed at identifying the factors contributing to the stability of the thermophiles, have been carried out at different levels, from the genomic sequence level and the structure level to experimental elucidations. These studies indicate that thermophilic proteins have good hydrophobicity with a propensity towards branched side chains, better packing with fewer loops and fewer cavities, more helical content, increased hydrogen bonding, and a high occurrence of charged residues resulting in high electrostatic interactions. Although we have different views on the forces contributing to stability, we do not have a consolidated view of them.
Protein Structure Networks (PSNs) have been used extensively to understand the stability of protein structures [5, 6]. Protein structure is the result of complex interactions among residues. PSNs are convenient because this complexity is simplified as edges between nodes. Earlier PSN studies used contact-based parameters to define edges [5–7].
In this study we have used energies to construct structure networks, known as Protein Energy Networks (PENs), for the first time. Since different types of interactions manifest eventually as interacting energies, we have tried to remove the ambiguities of defining each interaction (VdW, electrostatics, hydrogen bonding) separately, by considering energies calculated using classical force fields to define edges. We were also able to define Lennard-Jones dominant interaction and electrostatics dominated interaction regions in PENs, details of which will be presented elsewhere (work in progress).
In this study we have simulated twelve thermophile-mesophile pairs to obtain their equilibrium ensembles. These ensembles were then used to generate PENs of each protein. Parameters representing the whole protein network such as largest connected component, and parameters focusing on sub-graphs such as clusters, cliques and communities and node specific parameters like hubs are used to obtain structural insights on the stability of thermophilic proteins as compared to their mesophilic homologues.
In this comparative network study, we find that cluster population and clique population, along with communities of cliques, are the major factors contributing to the stability of thermophiles. The thermophiles appear to have a highly packed hydrophobic core, achieved by employing amino acid hotspots, thus increasing the enthalpy change between the folded and unfolded states, supporting previous studies. The thermophiles seem to have low energy communities and segregated high-energy electrostatic cliques, implying that they prefer to maintain a degree of conformational plasticity whilst increasing stability, probably for performing their functions. We have also seen a change in global network connectivity in some thermophiles, supporting earlier suggestions on global evolution for thermal adaptation. Thermophiles seem to employ more than one of these methods to increase their stability.
Dataset taken for analysis
Triose Phosphate Isomerases
Signal Recognition Particle Receptor
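The interaction energy between residues i and j is $E_{ij} = V_{LJ}(r_{ij}) + V_{c}(r_{ij})$, where $V_{LJ}(r_{ij})$ and $V_{c}(r_{ij})$ are the potential energies due to Lennard-Jones interactions and Coulombic interactions respectively, of residues i and j, averaged over the ensemble.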
Similarly, we have constructed PENs in which we have considered only LJ interactions ($V_{LJ}(r_{ij})$) and only Coulombic interactions ($V_{c}(r_{ij})$) respectively. The details of these graphs are not discussed.
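As an illustration of how a PEN at a given energy cutoff 'e' could be built from pairwise residue energies, here is a minimal Python sketch. The edge rule (an edge is drawn when the pair energy is at or below the cutoff), the function name `build_pen` and the toy energy matrix are assumptions made for this example; they are not taken from the study.

```python
from itertools import combinations

def build_pen(energies, cutoff):
    """Build an adjacency dict from a symmetric matrix of pair energies (kJ/mol).

    Assumption for this sketch: residues i and j are joined by an edge
    when their interaction energy is at or below the cutoff 'e'.
    """
    n = len(energies)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if energies[i][j] <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

# Hypothetical 4-residue energy matrix (values invented for illustration).
toy = [
    [0.0, -30.0, -6.0, 0.0],
    [-30.0, 0.0, -12.0, -1.0],
    [-6.0, -12.0, 0.0, -26.0],
    [0.0, -1.0, -26.0, 0.0],
]

pen_low = build_pen(toy, -5.0)    # weak cutoff: most interacting pairs become edges
pen_high = build_pen(toy, -25.0)  # strict cutoff: only the strongest pairs remain
```

Passing a matrix that holds only the Lennard-Jones or only the Coulombic term would give the corresponding single-interaction network under the same assumed rule.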
We use the term "low energy" to denote low negative energies (e.g. -5 kJ/mol) and "high energy" for high negative energies (e.g. -25 kJ/mol).
Clusters are the connected components in a $\mathrm{PEN}_e$ and can be identified using the standard depth-first search (DFS) algorithm.
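A minimal Python sketch of cluster identification by depth-first search is given below. The adjacency-dict representation, the function name `connected_components` and the toy graph are assumptions for illustration only.

```python
def connected_components(adjacency):
    """Return the connected components of an undirected graph via iterative DFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Explore everything reachable from 'start' using an explicit stack.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components

# Hypothetical network: two 2-node clusters and one isolated residue.
toy_adjacency = {0: {1}, 1: {0}, 2: {3}, 3: {2}, 4: set()}
clusters = connected_components(toy_adjacency)
```

Isolated nodes are returned as single-node components in this sketch; whether they should count as clusters is a choice left to the analysis.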
Hubs are highly connected nodes in the network. In packing-based Protein Structure Network (PSN) studies, a node is declared a hub if its degree is at least 4. The same definition is followed here because the proteins analyzed are subject to similar packing constraints.
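The hub criterion can be sketched in a few lines of Python. The adjacency-dict representation, the function name `find_hubs` and the toy graph are assumptions for illustration; only the degree threshold of 4 comes from the text.

```python
def find_hubs(adjacency, min_degree=4):
    """Return the nodes whose degree (number of neighbours) is at least min_degree."""
    return sorted(
        node for node, neighbours in adjacency.items()
        if len(neighbours) >= min_degree
    )

# Hypothetical network in which only residue 0 has four neighbours.
toy_adjacency = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
hubs = find_hubs(toy_adjacency)
```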
Cliques are sub-graphs in which every node is connected to every other node. For $\mathrm{PEN}_e$, we identify the k-cliques using CFinder. A clique of size k has k × (k-1)/2 edges among its nodes; for example, a 3-clique has 3 edges. Two k-cliques are said to be adjacent if they share k-1 nodes. A community is a collection of adjacent k-cliques.
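The definitions of k-cliques, adjacency and communities can be illustrated with a brute-force Python sketch. This is not CFinder's implementation; it simply enumerates node subsets, so it is only practical for small graphs. The function names and the toy graph are assumptions for illustration.

```python
from itertools import combinations

def k_cliques(adjacency, k):
    """Enumerate all fully connected k-node subsets (brute force, small graphs only)."""
    found = []
    for subset in combinations(sorted(adjacency), k):
        # A k-clique needs all k*(k-1)/2 pairs to be edges.
        if all(b in adjacency[a] for a, b in combinations(subset, 2)):
            found.append(frozenset(subset))
    return found

def communities(cliques, k):
    """Group k-cliques linked by chains of adjacency (sharing k-1 nodes)."""
    parent = list(range(len(cliques)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    groups = {}
    for index, clique in enumerate(cliques):
        groups.setdefault(find(index), set()).update(clique)
    return sorted(sorted(nodes) for nodes in groups.values())

# Hypothetical network: two triangles sharing an edge, plus one separate triangle.
toy_adjacency = {
    0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2},
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},
}
triangles = k_cliques(toy_adjacency, 3)
found_communities = communities(triangles, 3)
```

In the toy graph the two triangles that share nodes 1 and 2 merge into one community, while the third triangle stays an isolated clique.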
The size of a graph/network or a sub-graph (clusters, cliques and communities) is the total number of nodes in it.
The Largest Connected Component (LCC) is an important parameter in network analysis since it provides information on the connectivity of the network [5, 11]. The LCC is obtained as a function of 'e'. It is well connected at low energy regions but breaks up at the transition region (Fig 1). Comparison of the LCC transition profiles shows that the thermophiles of Adenylate Kinases (Ad Kinases) (Fig 1), Subtilisins, Carboxypeptidases, PhosphoFructo (PF) Kinases and Endo-1,4-Beta Xylanases (E14B Xylanases) have a more connected LCC than the corresponding mesophiles (Additional file 1, Fig S1). From the LCC profiles we observe that global evolution from mesophile to thermophile is not a prominent contributing factor; nonetheless, certain thermophiles follow this behavior (Fig 1 and Additional file 1, Fig S1).
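To make the idea of an LCC profile over 'e' concrete, the following Python sketch computes the size of the largest connected component at several cutoffs. The edge rule (pair energy at or below the cutoff), the function name `largest_component_size`, the toy energies and the chosen cutoffs are all assumptions for illustration; no normalisation by protein size is applied.

```python
from itertools import combinations

def largest_component_size(energies, cutoff):
    """Size of the largest connected component of the network at one cutoff.

    Assumption for this sketch: an edge joins i and j when energies[i][j] <= cutoff.
    """
    n = len(energies)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if energies[i][j] <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)

    seen = set()
    best = 0
    for start in range(n):
        if start in seen:
            continue
        # Iterative DFS that counts the nodes reachable from 'start'.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

# Hypothetical 4-residue energy matrix in kJ/mol (values invented for illustration).
toy = [
    [0.0, -30.0, -6.0, 0.0],
    [-30.0, 0.0, -12.0, -1.0],
    [-6.0, -12.0, 0.0, -26.0],
    [0.0, -1.0, -26.0, 0.0],
]
cutoffs = (-5.0, -10.0, -20.0, -25.0, -35.0)
profile = {e: largest_component_size(toy, e) for e in cutoffs}
```

In this toy case the component spans all four residues at the weaker cutoffs and fragments as the cutoff becomes more negative, which is the kind of transition the profile is meant to expose.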
Although communities provide insight into how rigid sub-graphs collate to give rigidity to the protein, isolated cliques can also provide similar rigidity to parts of the protein, empowering islands in protein structures to withstand extreme temperatures. By profiling the change in clique population with changes in 'e', we were able to capture this effect in proteins. Analysis of the clique population profile in $\mathrm{PEN}_e$ shows that almost all thermophiles, with the exception of TIM, show an increased population of cliques at low energies (Additional file 1, Fig S5 and Fig 3). Unlike in the community transition profile, we see a considerable number of cliques dominated by electrostatics (identified by constructing PENs with $E_{ij} = V_{c}(r_{ij})$). Conformational plasticity might be a factor influencing this behavior. The thermophilic proteins seem to employ low energy cliques and communities to maintain stability, but use only segregated high energy, electrostatics-dominated cliques to maintain rigidity. This strategy might help thermophiles to maintain stability while retaining a degree of flexibility, enabling them to function.
In this study, we have used Protein Energy Networks for representing interactions in protein structures, to compare a dataset of thermophile-mesophile homologues. Network parameters like the largest connected component, sub-graph properties like clusters, cliques and communities and node specific parameters like hubs were compared across a range of interaction energies, to identify the factors contributing to the enhanced stability of thermophiles. In this work, we have consolidated all the interaction types by using their resultant interaction energies (calculated using classical force fields). This effort has eliminated ambiguities in defining and analyzing interactions separately. Also, we have considered the complexity of amino acid interactions by representing them as networks.
From the results obtained for PENs, we were able to see that thermophilic proteins like Adenylate Kinases, Subtilisins, Carboxypeptidases, 3PG Kinases and Endo-1,4-Beta Xylanases show enhanced global connectivity, as seen from their increased LCC transition profiles. But thermophiles seem to prefer more than one factor for stabilization. For example, the Adenylate Kinase thermophile seems to employ larger communities and a larger clique population than the mesophile, whereas the E14B Xylanase thermophile has a larger number of cliques and hubs to enhance its stability. The LCCs of thermophilic Neutral Proteases and Lipodimide Dehydrogenases are less connected than those of the corresponding mesophiles. Hence, the notion that evolution may prefer a global change from a mesophile to a thermophile might be more case specific. This was further strengthened by our observations of increased cluster population in thermophiles, showing that they use more autonomous stabilizing units to enhance stability. Also, the increased number of electrostatic clusters in certain thermophiles at higher energy ranges (< -20 kJ/mol) suggests that these clusters might play a vital role in imparting stability, supporting earlier works in this area [16, 17].
Thermophilic proteins show a higher clique population than their mesophilic homologues. But the clique population and largest community formation are in the low energy regime. PENs have electrostatic cliques, but these do not form communities. These observations suggest that the thermophilic proteins may employ a higher number of low energy communities to gain increased stability, but refrain from introducing much rigidity to the protein by keeping the high energy electrostatic cliques isolated. This property might allow thermophiles to be more stable but retain flexibility, probably to perform their functions.
The studies on hub population suggest that the number of hubs (especially LJ dominated hubs) in thermophilic proteins is higher than in mesophiles. Hence, the hydrophobic core connectivity (packing) in thermophilic proteins is better than in their mesophilic partners, probably enabling them to stay folded under harsh conditions. This observation supports many studies on improved core packing in thermophilic proteins.
This comparative network-based study suggests that global evolution of local enhancements has resulted in an increase in overall network connectivity and hence an increase in the global stability of thermophiles. An increase in clusters and hubs in thermophiles brings about these local enhancements. Apart from these changes, thermophiles seem to employ low energy communities (highly connected sub-graphs) to maintain a level of rigidity in the network. The presence of electrostatic clusters and cliques, but the absence of electrostatic communities, points to localized electrostatic interactions rather than global network influences. Thermophilic proteins seem to have evolved by exploiting more than one of the above-mentioned methods.
The Department of Science and Technology (DST Mathematical Biology Grant, DST0773) and the Department of BioTechnology (DBT) are acknowledged for funding the computing facility, which was used to perform MD simulations. Dhruba Deb and Brinda KV are acknowledged for their discussions and contributions.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 1, 2010: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/11?issue=S1.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.