- Open Access
Studying the unfolding process of protein G and protein L under physical property space
© Zhao et al; licensee BioMed Central Ltd. 2009
- Published: 30 January 2009
Studies on protein folding/unfolding indicate that native state topology is an important determinant of the protein folding mechanism. The folding/unfolding behaviors of proteins with similar topologies have been studied in Cartesian space, and the results indicate that some of these proteins share similar folding/unfolding characteristics.
We construct a physical property space from twelve different physical properties. By studying the unfolding of protein G and protein L in this property space, we find that the two proteins have similar unfolding pathways, which can be divided into three types; the umbrella-shaped type represents the preferred pathway. Moreover, the unfolding simulation times of the two proteins differ: protein L unfolds faster than protein G. Additionally, the unfolded state ensemble of protein L occupies a larger area than that of protein G.
In the physical property space, protein G and protein L show similar folding/unfolding behaviors, in agreement with previous results obtained in Cartesian coordinate space. At the same time, some differences in unfolding properties that cannot be analyzed in Cartesian coordinate space can be detected easily.
- State Ensemble
- Cartesian Space
- Native Contact
- Physical Property Parameter
- Essential Subspace
Most proteins exist in unique three-dimensional conformations exquisitely suited to their function. Protein folding is one of the important unsolved problems in the life sciences. Several sophisticated theories have been proposed over decades of extensive experimental and theoretical research. The most popular is that native state topology is an important determinant of the protein folding mechanism [1, 2]. Studies on some small single-domain proteins suggest that proteins with similar native structures but low sequence identity have similar transition state ensembles [3, 4], and the folding rates of two-state proteins have been shown to correlate very well with contact order, a property linked to topology [5, 6]. There are, however, exceptions: some studies indicate that proteins with similar native structures may have different folding pathways [7–9].
The above studies are usually performed in conformational (geometrical) space constructed from the Cartesian coordinates of atoms. The three-dimensional structure of a protein changes during folding/unfolding, and along with the changing atomic Cartesian coordinates, physical parameters such as native contact number, accessible surface area and radius of gyration change correspondingly. Some of these parameters, such as the fraction of native contacts Q, the number of unfolded links μ and the fraction of ordered residues N_f, have been chosen as reaction coordinates to describe the folding/unfolding process. In other words, physical property parameters that represent properties of a protein can describe the folding/unfolding process just as the three-dimensional Cartesian coordinates of its atoms do. In this study, we investigate protein folding/unfolding behaviors in a physical property space built from physical property parameters, in which some novel characteristics of protein folding/unfolding can be revealed.
Number of trajectories of each unfolding type
Type I trajectories are umbrella-shaped (see Fig. 2(a) and Fig. 3(a)); 22 (55%) of the forty unfolding trajectories for the two proteins belong to this type, a much higher proportion than for either of the other two types.
Unfolded state ensemble
A conformation was assigned to the unfolded state ensemble when its native contact number was less than 20% of that in the native state. Under the native contact definition used in this study, fifty-five native contacts were identified in protein G (fifty-six residues), and forty-nine native contacts were identified in protein L (sixty-two residues).
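As a minimal sketch, the unfolded-state criterion can be written as a predicate on native contact counts. The 20% threshold and the contact totals (55 for protein G, 49 for protein L) come from the text; the function and constant names are illustrative and are not taken from the authors' analysis code.

```python
# Sketch of the unfolded-state criterion: a conformation is unfolded when its
# native contact number falls below 20% of the native-state value.
# Function and constant names are illustrative (hypothetical), not the authors' code.

UNFOLDED_FRACTION = 0.20  # threshold stated in the text

# Native contact totals reported in the text.
NATIVE_CONTACTS = {"protein G": 55, "protein L": 49}


def unfolded_threshold(n_native: int, fraction: float = UNFOLDED_FRACTION) -> float:
    """Native contact count below which a conformation counts as unfolded."""
    return fraction * n_native


def is_unfolded(n_contacts: int, n_native: int, fraction: float = UNFOLDED_FRACTION) -> bool:
    """True if a conformation with n_contacts native contacts is in the unfolded ensemble."""
    return n_contacts < unfolded_threshold(n_native, fraction)


if __name__ == "__main__":
    for name, n_native in NATIVE_CONTACTS.items():
        print(f"{name}: unfolded below {unfolded_threshold(n_native):.1f} native contacts")
```

With these totals, a protein G conformation counts as unfolded once it keeps fewer than 11 native contacts, and a protein L conformation once it keeps fewer than 9.8 (that is, 9 or fewer).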
The three principal components of the unfolded state ensemble
Unfolding simulation time
The average unfolding simulation time of each unfolding type (ps)
As the protein unfolded, the native contact number N decreased; at the same time, the total contact distance TCD became smaller and the secondary structure content SSC, including the β-sheet content, decreased. Because the first principal component PC1 varies inversely with these parameters, PC1 increased accordingly (Fig. 2, 3). The second principal component PC2 was dominated by the loadings of the radius of gyration of the Cα atoms, Rg1, and the number of hydrogen bonds between the protein and water, HB4. During unfolding, the protein unwound, its volume increased and more hydrogen bonds formed between the protein and water, so PC2 increased in the physical property space until the unfolding simulation converged. PC3 varied in the same way as PC1 and PC2.
In the essential property subspace, the unfolding trajectories of protein G and protein L fell into the same three types, consistent with the view that native state topology determines the folding mechanism. Type I, with its umbrella shape, had the highest probability of occurrence and the shortest unfolding simulation time of the three types, suggesting that it is the preferred pathway among the multiple pathways. The shape of the type I unfolding trajectories suggested that the two proteins are fast-folding two-state proteins, which agrees with the experimental observation that protein G and protein L fold with fast two-state kinetics [18, 19]. Reflecting their similar topologies, protein G and protein L also had similar, ellipsoid-shaped unfolded state ensembles.
Protein G and protein L have low sequence homology (16% identity), although they share a similar native state topology. The most obvious difference between them is the orientation of the α-helix: in protein L the helix is almost parallel to the β-sheet, whereas in protein G it runs diagonally across the sheet. Studies indicate that in the folding transition state one of the two β-turns is largely formed and the other largely disrupted: in protein L it is the N-terminal β-turn that is formed in the transition state ensemble, and in protein G the C-terminal β-turn [7–9, 20]. However, this difference disappears in the property space. Because the property space is constructed from physical property parameters, different transient states of the two proteins in Cartesian space may have the same physical properties. At the same time, other differences in the unfolding behaviors of the two proteins were readily observed in the property space.
First, the unfolding simulation times differed. Protein L unfolded much faster than protein G, especially via type I, the preferred pathway, in accordance with experimental studies [19, 21]. This may be related to the amino acid sequences of the two proteins: although protein G and protein L have similar native topologies, they differ in residue number and native contact number. The fraction of local contacts among all native contacts also differs between protein G and protein L [7, 8], and local and non-local contacts influence the unfolding rate differently.
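The fraction of local contacts mentioned above can be computed from a list of native contact pairs. The text does not say how "local" is defined; this sketch assumes a contact (i, j) is local when the sequence separation |i - j| is at most a cutoff, and the value 12 used here is a hypothetical choice for illustration only.

```python
# Fraction of local contacts among a set of native contacts.
# LOCAL_MAX_SEPARATION is a HYPOTHETICAL cutoff; the study does not define "local".
from typing import Iterable, Tuple

LOCAL_MAX_SEPARATION = 12


def local_contact_fraction(
    contacts: Iterable[Tuple[int, int]], max_sep: int = LOCAL_MAX_SEPARATION
) -> float:
    """Fraction of (i, j) residue contacts with sequence separation |i - j| <= max_sep."""
    pairs = list(contacts)
    if not pairs:
        raise ValueError("at least one contact is required")
    n_local = sum(1 for i, j in pairs if abs(i - j) <= max_sep)
    return n_local / len(pairs)


if __name__ == "__main__":
    toy_contacts = [(1, 4), (2, 20), (5, 10), (3, 40)]  # made-up residue pairs
    print(local_contact_fraction(toy_contacts))
```

Changing `max_sep` shows how sensitive the ratio is to the chosen definition of "local", which is worth keeping in mind when comparing values across studies.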
Second, the area occupied by the unfolded state ensemble differed. The unfolded states resided in a finite region of the property space, in agreement with the view that the unfolded state ensemble is finite rather than infinite [22, 23]. For protein L, the ranges of the three principal components were larger than those of protein G, so the unfolded state ensemble of protein L occupied a larger area than that of protein G. Despite the similar native topology, protein L sampled more unfolded states.
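The comparison above rests on the ranges of the three principal components. A minimal sketch, assuming the unfolded conformations have already been projected into the (PC1, PC2, PC3) subspace as an array of shape (states, 3); the bounding-box volume is an illustrative summary, not a measure reported in the study.

```python
# Per-component ranges of unfolded states projected into (PC1, PC2, PC3).
import numpy as np


def pc_ranges(points) -> np.ndarray:
    """Range (max - min) of each principal component over a set of projected states."""
    pts = np.asarray(points, dtype=float)
    return pts.max(axis=0) - pts.min(axis=0)


def bounding_box_volume(points) -> float:
    """Product of the per-component ranges: a crude, illustrative size measure."""
    return float(np.prod(pc_ranges(points)))
```

Computing `pc_ranges` separately for the unfolded states of each protein gives the kind of per-component comparison described in the paragraph above.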
The difficulty of unfolding differed among the multiple pathways. For protein L, unfolding via type II was more difficult than via the other two types, whereas for protein G, type II was intermediate in difficulty among the three types. Correspondingly, protein L unfolded more slowly than protein G via type II.
The closer two points are in the property subspace, the more similar the properties of the corresponding conformations. In type II, the proteins unfolded from the native state to the unfolded state with an obvious stagnation at certain states, suggesting a possible intermediate state for both proteins; further study is required to confirm this.
In the physical property space, a single point represents a protein conformation that is described by thousands of atomic coordinates in Cartesian space, so the overall folding/unfolding behavior of a protein can be observed easily. With effective analysis tools such as network analysis, further details of the folding mechanism may be revealed, which is worth future study.
In this study, a physical property space was constructed from twelve physical parameters and reduced to a three-dimensional essential property subspace, in which the unfolding behaviors of protein G and protein L were studied. Statistical analysis of the forty unfolding property trajectories showed that the two proteins, which have similar native state topologies, had similar unfolding property trajectories and similar unfolded state ensembles in the property space, in agreement with previous studies in Cartesian space. At the same time, some unfolding properties that could not be revealed by studies in Cartesian space could be detected easily, for example the unfolding pathway types, the difference in unfolding simulation time and the difference in the area occupied by the unfolded state ensemble. Finally, because only two proteins were studied, we cannot conclude that all proteins with similar native topologies share the characteristics that protein G and protein L show in the property space; this requires further research.
Molecular dynamics (MD) simulations
The initial conformations of protein G and protein L were taken from the Protein Data Bank (PDB) entries 2GB1 and 2PTL, respectively, both solved by NMR spectroscopy [13–15]. Unfolding simulations were carried out with the GROMACS software package using the GROMOS96 43a1 force field and explicit water described by the SPC water model. After energy minimization, some water molecules were replaced with the same number of chloride or sodium ions to neutralize the system. Position-restrained MD simulations were then performed for 500 ps at 300 K and 1 bar, and forty conformations were extracted every 10 ps from the last 400 ps of the trajectory. For each of these conformations, a 100 ps position-restrained MD simulation was first performed at 540 K and 1 bar, followed by a 12 ns free MD simulation under the same conditions. The simulation time step was 2 fs, and the total simulation time was up to 0.96 μs.
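The figures in this protocol fit together as a small arithmetic check. This sketch assumes, as the 0.96 μs total implies, that forty starting conformations were drawn separately for each protein, and it counts only the 12 ns free-MD segments; the constant names are illustrative.

```python
# Arithmetic check of the simulation protocol (values taken from the text).
# Assumes forty starting conformations per protein and counts only free MD.
N_PROTEINS = 2                  # protein G and protein L
SAMPLING_WINDOW_PS = 400        # last 400 ps of the 300 K position-restrained run
SNAPSHOT_INTERVAL_PS = 10       # one starting conformation every 10 ps
FREE_MD_NS = 12                 # free MD length of each unfolding run at 540 K
TIME_STEP_FS = 2
FS_PER_NS = 1_000_000

starts_per_protein = SAMPLING_WINDOW_PS // SNAPSHOT_INTERVAL_PS
total_free_md_ns = N_PROTEINS * starts_per_protein * FREE_MD_NS
steps_per_run = FREE_MD_NS * FS_PER_NS // TIME_STEP_FS

print(f"{starts_per_protein} runs per protein")
print(f"total free MD: {total_free_md_ns / 1000} us")
print(f"{steps_per_run:,} integration steps per 12 ns run")
```

This gives 40 runs per protein, 960 ns (0.96 μs) of free MD in total, and 6,000,000 time steps per 12 ns run.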
For convenience of analysis, 6000 conformations were selected at equal intervals from the 12 ns unfolding trajectories. The following twelve physical properties were calculated: α-helix content, β-sheet content, secondary structure content (including α-helix, β-sheet, β-bridge, bend and turn), hydrophobic solvent-accessible surface area, radius of gyration of the Cα atoms, number of hydrogen bonds within the protein, radius of gyration of the hydrophobic core, number of hydrogen bonds within the hydrophobic core, number of hydrogen bonds between the protein and water, number of hydrogen bonds between the protein and water within the hydrophobic core, native contact number within the protein, and total contact distance (TCD).
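If the 6000 conformations are taken from each 12 ns trajectory, the spacing is 12,000 ps / 6000 = 2 ps. The sketch below picks frames at a constant stride; the 1 ps frame-saving interval is a hypothetical value, since the text does not give the output frequency.

```python
# Selecting conformations at equal intervals from one unfolding trajectory.
# SAVE_INTERVAL_PS is HYPOTHETICAL: the text does not state the output frequency.
TRAJECTORY_PS = 12_000   # 12 ns unfolding run
N_SELECTED = 6000        # conformations kept for analysis
SAVE_INTERVAL_PS = 1     # assumed frame-saving interval


def equally_spaced_indices(n_saved: int, n_select: int) -> list[int]:
    """Indices of n_select frames taken at a constant stride from n_saved frames."""
    if n_select <= 0 or n_saved % n_select != 0:
        raise ValueError("n_saved must be a positive multiple of n_select")
    stride = n_saved // n_select
    return list(range(0, n_saved, stride))


n_saved = TRAJECTORY_PS // SAVE_INTERVAL_PS
indices = equally_spaced_indices(n_saved, N_SELECTED)
spacing_ps = (indices[1] - indices[0]) * SAVE_INTERVAL_PS
print(len(indices), "conformations, one every", spacing_ps, "ps")
```

The resulting 2 ps spacing does not depend on the assumed saving interval, as long as that interval divides it evenly.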
A contact between residues i and j is defined as present if their Cα atoms are within 6.5 Å of each other. Native contacts are defined as all contacts formed between residues that are not adjacent in sequence and that are present for more than two-thirds of the simulation time in both reference native simulations.
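A minimal NumPy sketch of this contact definition follows. It assumes Cα coordinates in Å stored as arrays of shape (frames, residues, 3), reads "not adjacent in sequence" as |i - j| ≥ 2 and "within 6.5 Å" as a distance of at most 6.5 Å (both assumptions), and uses hypothetical function names. The toy coordinates are invented for illustration.

```python
import numpy as np

CUTOFF_ANGSTROM = 6.5   # Cα-Cα distance cutoff from the text
MIN_SEPARATION = 2      # assumption: "not adjacent in sequence" means |i - j| >= 2
MIN_FRACTION = 2 / 3    # must be present in more than two-thirds of the frames


def contact_maps(ca_coords, cutoff=CUTOFF_ANGSTROM, min_sep=MIN_SEPARATION):
    """Boolean contact maps of shape (frames, n, n) from Cα coordinates (frames, n, 3).

    Only pairs with j - i >= min_sep are marked, so each contact is counted once.
    """
    ca = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(ca[:, :, None, :] - ca[:, None, :, :], axis=-1)
    n = ca.shape[1]
    i, j = np.indices((n, n))
    return (dist <= cutoff) & ((j - i) >= min_sep)


def native_contacts(ref_a, ref_b, min_fraction=MIN_FRACTION):
    """Pairs in contact for more than min_fraction of the frames in BOTH reference runs."""
    freq_a = contact_maps(ref_a).mean(axis=0)
    freq_b = contact_maps(ref_b).mean(axis=0)
    return (freq_a > min_fraction) & (freq_b > min_fraction)


def native_contact_number(ca_frame, native):
    """Number of native contacts present in a single conformation of shape (n, 3)."""
    current = contact_maps(np.asarray(ca_frame, dtype=float)[None, ...])[0]
    return int(np.count_nonzero(current & native))


if __name__ == "__main__":
    # Invented 4-residue toy coordinates in Å: a compact square and an extended chain.
    compact = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 0.0]])
    extended = np.array([[3.8 * k, 0.0, 0.0] for k in range(4)])
    reference = np.stack([compact] * 3)  # stand-in for a native reference run
    native = native_contacts(reference, reference)
    print(native_contact_number(compact, native), native_contact_number(extended, native))
```

In the toy example the compact square keeps all three of its native contacts, while the extended chain keeps none.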
Principal component analysis and property space
The twelve parameters listed above were calculated for every conformation in the simulations. Each parameter was normalized to the range 0 to 1, with 0 corresponding to its lowest value across the trajectory and 1 to its highest. The property covariance matrix C was then calculated, with elements $c_{ij}$ given by
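The per-trajectory min-max scaling described above can be sketched as follows, assuming one trajectory is stored as an array of shape (frames, 12) with one column per property. The guard for a constant property is an added assumption; the text does not say how that case is handled.

```python
import numpy as np


def minmax_normalize(props):
    """Scale each property column of a (frames, properties) array to [0, 1].

    0 maps to the lowest value in the trajectory and 1 to the highest.
    """
    props = np.asarray(props, dtype=float)
    lo = props.min(axis=0)
    hi = props.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # assumption: constant columns map to 0
    return (props - lo) / span
```

Because the minimum and maximum are taken per trajectory, the same raw value can map to different normalized values in different trajectories.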
$$c_{ij} = \left\langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \right\rangle$$
where $\langle \cdot \rangle$ denotes the average over all structures sampled in the trajectory and $x_i = x_i(t)$ is the $i$th physical parameter of the conformation at time $t$.
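Because ⟨·⟩ is a plain average over the sampled structures, $c_{ij}$ is the population covariance (dividing by the number of frames N rather than N - 1). A minimal NumPy sketch, assuming the normalized properties of one trajectory are stored in an array of shape (frames, 12); the result agrees with `numpy.cov(x, rowvar=False, bias=True)`.

```python
import numpy as np


def property_covariance(x):
    """c_ij = <(x_i - <x_i>)(x_j - <x_j>)>, averaged over all N frames.

    x has shape (frames, properties); the result has shape (properties, properties).
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)          # subtract <x_i> from each property column
    return centered.T @ centered / x.shape[0]
```

Writing the average out explicitly keeps the correspondence with the formula visible; in practice `numpy.cov` with `bias=True` gives the same matrix.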
The property matrix C is a 12 × 12 symmetric matrix. An average property matrix was obtained by averaging the forty property matrices. Using principal component analysis, the average property matrix was diagonalized to obtain twelve new orthogonal eigenvectors and their corresponding eigenvalues. The three eigenvectors with the largest eigenvalues were selected as the three principal components PC1, PC2 and PC3 to construct a three-dimensional essential physical property subspace, and the unfolding trajectories were projected into this subspace.
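The averaging, diagonalization and projection steps can be sketched with NumPy as below. This is an illustration, not the authors' code: each trajectory is assumed to be a (frames, 12) array of normalized properties, and because the text does not say whether trajectories were mean-centred before projection, the offset `center` is left to the caller.

```python
import numpy as np


def property_covariance(x):
    """Population covariance (divide by N) of a (frames, properties) array."""
    return np.cov(np.asarray(x, dtype=float), rowvar=False, bias=True)


def essential_subspace(trajectories, n_components=3):
    """Average per-trajectory covariance matrices, then keep the top eigenvectors.

    Returns (eigenvalues, eigenvectors): eigenvectors are columns, largest eigenvalue first.
    """
    c_avg = np.mean([property_covariance(t) for t in trajectories], axis=0)
    eigvals, eigvecs = np.linalg.eigh(c_avg)          # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    return eigvals[order], eigvecs[:, order]


def project(x, components, center):
    """Coordinates of each frame along the selected components (PC1, PC2, PC3)."""
    return (np.asarray(x, dtype=float) - center) @ components
```

For example, passing the mean over all frames of all forty trajectories as `center` places the pooled data at the origin of the subspace; `numpy.linalg.eigh` is used because the averaged matrix is symmetric.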
We gratefully thank Prof. H.J.C. Berendsen for providing us with the GROMACS programs. This work was supported by grants from the Chinese National Key Fundamental Research Project (No. 90403120), the Shandong Fundamental Research Project (No. Y2005D12) and the Shandong domestic visiting researcher project for excellent young university teachers.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S1
- Baker D: A surprising simplicity to protein folding. Nature. 2000, 405: 39-42. 10.1038/35011000.
- Alm E, Baker D: Matching theory and experiment in protein folding. Curr Opin Struct Biol. 1999, 9: 189-196. 10.1016/S0959-440X(99)80027-X.
- Riddle DS, Grantcharova VA, Santiago JD, Alm E, Ruczinski I, Baker D: Experiment and theory highlight role of native state topology in SH3 folding. Nature Struct Biol. 1999, 6: 1016-1024. 10.1038/14901.
- Martinez JC, Serrano L: The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nature Struct Biol. 1999, 6: 1010-1016. 10.1038/14896.
- Plaxco KW, Simons KT, Baker D: Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1998, 277: 985-994. 10.1006/jmbi.1998.1645.
- Ivankov DN, Garbuzynskiy SO, Alm E, Plaxco KW, Baker D, Finkelstein AV: Contact order revisited: influence of protein size on the folding rate. Protein Sci. 2003, 12 (9): 2057-62. 10.1110/ps.0302503.
- McCallister EL, Alm E, Baker D: Critical role of β-hairpin formation in protein G folding. Nat Struct Biol. 2000, 7: 669-673. 10.1038/77971.
- Karanicolas J, Brooks CL III: The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci. 2002, 11: 2351-2361. 10.1110/ps.0205402.
- Kim DE, Fisher C, Baker D: A breakdown of symmetry in the folding transition state of protein L. J Mol Biol. 2000, 298: 971-984. 10.1006/jmbi.2000.3701.
- Clementi C, Nymeyer H, Onuchic JN: Topological and energetic factors: what determines the structural details of the transition state ensemble and "en-route" intermediates for protein folding? An investigation for small globular proteins. J Mol Biol. 2000, 298: 937-953. 10.1006/jmbi.2000.3693.
- Alm E, Baker D: Prediction of protein-folding mechanism from free-energy landscape derived from native structures. PNAS. 1999, 96: 11305-11310. 10.1073/pnas.96.20.11305.
- Ji-Hua W, Li-Ling Z, Xiang-Hua D: Study of Multiple Unfolding Trajectories and Unfolded States of the Protein GB1 Under the Physical Property Space. J Biomol Struct Dyn. 2008, 25: 609-620.
- Gronenborn AM, Filpula DR, Essig NZ, Achari A, Whitlow M, Wingfield PT, Clore GM: A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G. Science. 1991, 253: 657-661. 10.1126/science.1871600.
- Wikstrom M, Sjobring U, Kastern W, Bjorck L, Drakenberg T, Forsen S: Proton nuclear magnetic resonance sequential assignments and secondary structure of an immunoglobulin light chain-binding domain of protein L. Biochemistry. 1993, 32: 3381-3386. 10.1021/bi00064a023.
- Wikstrom M, Drakenberg T, Forsen S, Sjobring U, Bjorck L: Three-dimensional solution structure of an immunoglobulin light chain-binding domain of protein L. Comparison with the IgG-binding domains of protein G. Biochemistry. 1994, 33: 14011-14017. 10.1021/bi00251a008.
- Sheinerman FB, Brooks CL: Calculations on folding of segment B1 of streptococcal protein G. J Mol Biol. 1998, 278: 439-456. 10.1006/jmbi.1998.1688.
- Lazaridis T, Karplus M: "New view" of protein folding reconciled with the old through multiple unfolding simulations. Science. 1997, 278: 1928-1931. 10.1126/science.278.5345.1928.
- Alexander P, Orban J, Bryan P: Kinetic analysis of folding and unfolding the 56 amino acid IgG-binding domain of streptococcal protein G. Biochemistry. 1992, 31: 7243-7248. 10.1021/bi00147a006.
- Scalley ML, Yi Q, Gu H, McCormack A, Yates JR, Baker D: Kinetics of folding of the IgG binding domain of peptostreptococcal protein L. Biochemistry. 1997, 36 (11): 3373-3382. 10.1021/bi9625758.
- Zhou H, Zhou Y: Folding rate prediction using total contact distance. Biophys J. 2002, 82: 458-463. 10.1016/S0006-3495(02)75410-6.
- Plaxco KW, Spitzfaden C, Campbell ID, Dobson CM: A comparison of the folding kinetics and thermodynamics of two homologous fibronectin type III modules. J Mol Biol. 1997, 270: 763-770. 10.1006/jmbi.1997.1148.
- Daura X, Jaun B, Seebach D, van Gunsteren WF, Mark AE: Reversible peptide folding in solution by molecular dynamics simulation. J Mol Biol. 1998, 280: 925-932. 10.1006/jmbi.1998.1885.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- Daura X, van Gunsteren WF, Mark AE: Folding-unfolding thermodynamics of a β-heptapeptide from equilibrium simulations. Proteins Struct Funct Genet. 1999, 34: 269-280. 10.1002/(SICI)1097-0134(19990215)34:3<269::AID-PROT1>3.0.CO;2-3.
- Scott WRP, Huenenberger PH, Tironi IG, Mark AE, Billeter SR, Fennen J, Torda AE, Huber T, Krueger P, van Gunsteren WF: The GROMOS biomolecular simulation program. J Phys Chem A. 1999, 103: 3596-3607. 10.1021/jp984217f.
- van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger P, Mark AE, Scott WRP, Tironi IG: The GROMOS96 Manual and User Guide. 1996, Vdf Hochschulverlag AG, Zürich.
- Berendsen HJC, Grigera JR, Straatsma TP: The missing term in effective pair potentials. J Phys Chem. 1987, 91: 6269-6271. 10.1021/j100308a038.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.