Volume 13 Supplement 2
Amino acid function and docking site prediction through combining disease variants, structure alignments, sequence alignments, and molecular dynamics: a study of the HMG domain
© Prokop et al.; licensee BioMed Central Ltd. 2012
Published: 13 March 2012
The DNA binding domain of HMG proteins is known to be important in many diseases, with the Sox sub-family of HMG proteins of particular significance. Numerous natural variants in HMG proteins are associated with disease phenotypes. Integrating these natural variants with molecular dynamics simulations of DNA interaction and with sequence and structure alignments gives detailed molecular knowledge of potential amino acid function, such as DNA or protein interaction.
A total of 33 amino acids in HMG proteins are known to have natural variants in diseases. Eight of these amino acids are normally conserved in human HMG proteins and 27 are conserved in the human Sox sub-family. Among the six amino acids not conserved in Sox, amino acids 16 and 45 are likely targets for interaction with other proteins. Docking studies between the androgen receptor and Sry/Sox9 reveal a stable, amino acid-specific interaction involving several Sox-conserved residues.
The HMG box has structural conservation between the first two of the three helices in the domain as well as some DNA contact points. Individual sub-groups of the HMG family have specificity in the location of the third helix, DNA-specific contact points (such as amino acids 4 and 29), and conserved amino acids interacting with other proteins such as the androgen receptor. Studies such as this help to distinguish individual members of a much larger family of proteins and can be applied to any protein family of interest.
Predicting function from protein sequence is a complex and challenging task. Multiple sequence alignments can give insights into functional conservation over evolutionary time but are limited to what can be observed at the level of primary structure. Combining these sequences with known protein tertiary structures provides a three-dimensional explanation of potential evolutionary pressures, but correlating the conservation to specific functions is still a challenge. This study compares natural variants (NV) associated with disease phenotypes to molecular dynamics (MD) simulations of DNA binding, predicting the functionality of specific amino acids within a medically important protein domain.
The high mobility group (HMG) box is composed of three helices that form an "L" shape able to bind the minor groove of DNA (reviewed in [1]). Many members of this protein family bind DNA with low sequence specificity, such as the HMGB1 protein, which is important in the inflammatory response [2]. Some members, such as the Sox sub-family, bind DNA with a higher degree of sequence specificity [3]. The Sox family consists of 20 known human proteins, the most thoroughly studied being the mammalian testis-determining factor, Sry [4]. Recent work has shown Sry to have additional functions outside testis determination. These functions may include brain development [5, 6], activation of the sympathetic nervous system [7], and blood pressure regulation [8]. Identifying and understanding the roles of conserved amino acids in Sry and other Sox proteins may lead to insights into particular amino acid functions. These might be HMG specific, such as DNA binding and structure, or specific to individual protein members. Combined analysis of amino acids known to have natural variants in disease phenotypes via multiple sequence alignment, structure alignment and MD simulation reveals several amino acids in the Sox family that may contribute to Sox-specific functions such as interactions with the androgen receptor (AR).
Natural variations of amino acids in HMG proteins associated with various diseases were collected from Uniprot [9] and are listed in Additional file 1 along with all sequence accession codes. These amino acids were highlighted on the sequence of Sry, which could then be used to identify conserved regions in multiple sequence alignments.
Sequence and structure alignments
All sequence alignments were performed with ClustalW [10] using the BLOSUM62 matrix [11]. Human HMG proteins were retrieved from Uniprot, and proteins containing multiple HMG domains were parsed into individual domain sequences. Human sequences were used to study conservation of the HMG family, while sequences from multiple species (from invertebrates to vertebrates) were used in studying conservation of an individual member of the family across evolutionary time. HMG protein structures were identified by searching the Protein Data Bank (PDB) [12] with the sequence of the HMG box of Sry, using blastp from NCBI with default settings [13]. All structures were cleaned by removing all molecules (water, salts, DNA, additional protein sequence) that were not part of the HMG domain. For NMR structures containing multiple models, only regions of high agreement from the first reported ensemble member were used. The multiple structures of HMG proteins were superposed using MUSTANG [14] onto the structure of SRY, which remained bound to DNA. Sox proteins were also superposed to identify Sox-specific features.
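The structure-cleaning step can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `clean_pdb`, the chain identifier and the residue range are hypothetical, and the sketch assumes standard fixed-column PDB records. It keeps only protein ATOM records of one chain within a chosen residue range and stops at the first ENDMDL record, so that only the first member of an NMR ensemble is retained.

```python
# Three-letter codes of the 20 standard amino acids.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TYR", "TRP", "VAL",
}


def clean_pdb(lines, chain_id, first_res, last_res):
    """Keep protein ATOM records of one chain within a residue range.

    Water, salts and other HETATM records are dropped, as are nucleotide
    residues (DNA) and protein residues outside the requested range.
    Only the first model of a multi-model (NMR) file is read.
    """
    kept = []
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first ensemble member
        if not line.startswith("ATOM"):
            continue  # skips HETATM, MODEL, REMARK, TER, ...
        # Fixed PDB columns: residue name 18-20, chain 22, residue number 23-26.
        res_name = line[17:20].strip()
        chain = line[21]
        res_num = int(line[22:26])
        if res_name not in AMINO_ACIDS:
            continue  # drops DNA residues such as DA, DC, DG, DT
        if chain == chain_id and first_res <= res_num <= last_res:
            kept.append(line)
    return kept
```

A call such as `clean_pdb(open("structure.pdb"), "A", 4, 60)` would return the retained coordinate lines; the file name, chain and residue limits here are placeholders that have to be chosen for each structure.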
Molecular dynamics (MD) simulations
All MD simulations were run using YASARA Structure [15] with the Amber03 force field [16] for 1000 picoseconds (ps). The md_run macro [17] was used with a water density of 0.997 g/mL. Simulations were analyzed using both the md_analyze and md_analyzeres macros [17]. Structure 1j46 was used for MD of SRY. As no known structure exists for Sox9, models were created using I-TASSER [18], superposed onto DNA using structure 1j46, and energy minimized with YASARA. Although HMGB1 contains two HMG domains, only the second (which contains the NV) was used to run the MD simulation of HMGB1. Amino acid substitutions were performed by swapping amino acids in YASARA.
Sry-AR predicted interactions
A short peptide of the AR was docked into the model by placing the fragment in close proximity to the proposed contact amino acids of SRY, and the energy of the system was minimized in vacuo. The starting model for docking was derived from 1j46 coordinates. The model was placed in a simulation cell of 57 × 72 × 57 Å, water was added to the system at 0.997 g/mL, and the system was energy minimized. Three different simulations were run on both SRY and Sox9 for 1500 ps each: docked AR (Docked), AR in which all the amino acids were swapped with alanine (Docked A's) to show sequence specificity, and the AR pulled away from interaction (Free). Movement of the AR peptide in each system was recorded over the simulation every 25 ps. Sox9-AR interactions were investigated by replacing amino acids in the structure of Sry with those present in Sox9.
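As an illustration of how peptide movement sampled every 25 ps could be summarized, the Python sketch below computes the root mean square deviation of the peptide atoms in each snapshot from their starting positions. The text does not state which measure was recorded, so the metric (RMSD without re-fitting, with the Sry-DNA complex as the fixed frame of reference), the function names and the input layout (one list of (x, y, z) tuples per snapshot) are all assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def movement_series(frames, step_ps=25):
    """Return (time in ps, RMSD from the first snapshot) for every snapshot.

    frames: list of snapshots, each a list of (x, y, z) tuples in Angstrom
    for the same peptide atoms in the same order.
    """
    reference = frames[0]
    return [(i * step_ps, rmsd(reference, frame)) for i, frame in enumerate(frames)]
```

Under these assumptions, a peptide that stays docked would give a series that remains low over the run, while one that drifts away from the contact residues would give a series that rises with time. Comparing the Docked, Docked A's and Free series for the same system is then a matter of plotting the three lists against the shared time axis.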
Results and discussion
Most NVs were conserved in Sox family members rather than in non-Sox HMG sequences. Because of the paucity of NVs in non-Sox HMG proteins, and with only 8 of the 33 NV amino acids conserved in the HMG family sequences, we asked whether any amino acids were conserved only in the Sox family. Nineteen additional amino acids with NVs were conserved at 90% or greater in the Sox family, and the 8 HMG-conserved NVs were also conserved in Sox proteins. Structure alignments of the Sox family members show highly conserved first, second and third helices (Figure 1B), with several clustered regions of conserved NVs. A hydrophobic core is conserved between the N-terminus and the C-terminus of the Sox proteins. All of the NVs involved in Sry-based disease associations were conserved in multiple sequence alignments of Sry.
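The 90% conservation criterion can be sketched in Python as follows. This is a simplified illustration rather than the analysis that was run: it assumes the aligned sequences are already available as equal-length strings (for example, parsed from ClustalW output), it scores conservation as the fraction of sequences sharing the single most common residue in a column, and the function names are hypothetical. Whether the original threshold was applied to identity or to similarity is not stated in the text.

```python
from collections import Counter


def conserved_positions(aligned, threshold=0.9):
    """Map 1-based alignment columns to the residue conserved at >= threshold."""
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = {}
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        residue, count = counts.most_common(1)[0]
        # Columns dominated by gaps are never reported as conserved.
        if residue != "-" and count / len(aligned) >= threshold:
            conserved[col + 1] = residue
    return conserved


def sox_specific(nv_positions, hmg_conserved, sox_conserved):
    """NV positions conserved in the Sox alignment but not across all HMG boxes."""
    return sorted(
        pos for pos in nv_positions
        if pos in sox_conserved and pos not in hmg_conserved
    )
```

Running `conserved_positions` once on the alignment of all human HMG boxes and once on the Sox-only alignment, then passing both results to `sox_specific` together with the NV positions, separates family-wide positions from those conserved only within the Sox sub-family. Both alignments must share the same column numbering for this comparison to be meaningful.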
Molecular dynamics simulations support functional conservation for DNA binding and structure of the 8 HMG-conserved NVs. Most NVs identified were conserved in the Sox sub-family of HMG proteins. Of these conserved amino acids, amino acids 4 and 29 were found to contact base pairs of the minor groove, contributing to DNA specificity. Several NV amino acids, such as 16 and 45, were not as highly conserved in HMG proteins and likely contribute to individual member specificity. Some Sox-conserved amino acids that do not appear to contribute to proper packing or DNA interaction were identified as a potential docking site for interaction with AR. The use of sequences, structures, natural variants in disease phenotypes and molecular dynamics simulations of protein-DNA interaction offers new insights into the HMG domain at the amino acid level. This approach serves as a hypothesis generator for molecular mutagenesis and for protein-protein and protein-DNA interactions.
List of abbreviations
HMG: high mobility group
PDB: Protein Data Bank
RMSD: root mean square deviation
Acknowledgements and funding
Funding was through the Choose Ohio First Bioinformatics scholarship.
This article has been published as part of BMC Bioinformatics Volume 13 Supplement 2, 2012: Proceedings from the Great Lakes Bioinformatics Conference 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/13/S2
- Grosschedl R, Giese K, Page J: HMG domain proteins: architectural elements in the assembly of nucleoprotein structures. Trends Genet. 1994, 10: 94-100. 10.1016/0168-9525(94)90232-1.
- Wang H, Bloom O, Zhang M, Vishnubhakat JM, Ombrellino M, Che J, Frazier A, Yang H, Ivanova S, Borovikova L, Manogue KR, Faist E, Abraham E, Andersson J, Andersson U, Molina PE, Abumrad NN, Sama A, Tracey KJ: HMG-1 as a late mediator of endotoxin lethality in mice. Science. 1999, 285: 248-251. 10.1126/science.285.5425.248.
- Mertin S, McDowall SG, Harley VR: The DNA-binding specificity of SOX9 and other SOX proteins. Nucleic Acids Res. 1999, 27: 1359-1364. 10.1093/nar/27.5.1359.
- Gubbay J, Collignon J, Koopman P, Capel B, Economou A, Munsterberg A, Vivian N, Goodfellow P, Lovell-Badge R: A gene mapping to the sex-determining region of the mouse Y chromosome is a member of a novel family of embryonically expressed genes. Nature. 1990, 346: 245-250. 10.1038/346245a0.
- Wu JB, Chen K, Li YM, Lau YFC, Shih JC: Regulation of monoamine oxidase A by the SRY gene on the Y chromosome. FASEB J. 2009, 23: 4029-4038. 10.1096/fj.09-139097.
- Milsted A, Serova L, Sabban EL, Dunphy G, Turner ME, Ely DL: Regulation of tyrosine hydroxylase gene transcription by Sry. Neurosci Lett. 2004, 369: 203-207. 10.1016/j.neulet.2004.07.052.
- Ely D, Milsted A, Dunphy G, Boehme S, Dunmire J, Hart M, Toot J, Turner M: Delivery of sry1, but not sry2, to the kidney increases blood pressure and sns indices in normotensive wky rats. BMC Physiol. 2009, 9: 10. 10.1186/1472-6793-9-10.
- Ely D, Underwood A, Dunphy G, Boehme S, Turner M, Milsted A: Review of the Y chromosome, Sry and hypertension. Steroids. 2010, 75: 747-753. 10.1016/j.steroids.2009.10.015.
- Uniprot. [http://www.uniprot.org/]
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31: 3497-3500. 10.1093/nar/gkg500.
- Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. PNAS. 1992, 89: 10915-10919. 10.1073/pnas.89.22.10915.
- Protein Data Bank (PDB). [http://www.rcsb.org/pdb/home/home.do]
- NCBI blastp. [http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome]
- Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM: MUSTANG: a multiple structural alignment algorithm. Proteins. 2006, 64: 559-574. 10.1002/prot.20921.
- YASARA. [http://www.yasara.org/products.htm]
- Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, Zhang W, Yang R, Cieplak P, Luo R, Lee T, Caldwell J, Wang J, Kollman P: A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem. 2003, 24: 1999-2012. 10.1002/jcc.10349.
- Elmar Krieger: YASARA macros. [http://www.yasara.org/macros.htm#HeadTarget]
- Roy A, Kucukural A, Zhang Y: I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010, 5: 725-738. 10.1038/nprot.2010.5.
- McDowall S, Argentaro A, Ranganathan S, Weller P, Mertin S, Mansour S, Tolmie J, Harley V: Functional and structural studies on wild type SOX9 and mutations causing Campomelic Dysplasia. J Biol Chem. 1999, 274: 24023-24030. 10.1074/jbc.274.34.24023.
- Xiang YY, Wang DY, Tanaka M, Suzuki M, Kiyokawa E, Igarashi H, Naito Y, Shen Q, Sugimura H: Expression of high-mobility group-1 mRNA in human gastrointestinal adenocarcinoma and corresponding non-cancerous mucosa. Int J Cancer. 1997, 74: 1-6. 10.1002/(SICI)1097-0215(19970220)74:1<1::AID-IJC1>3.0.CO;2-6.
- Ely DL, Salisbury R, Hadi D, Turner M, Johnson ML: Androgen receptor and the testes influence hypertension in a hybrid rat model. Hypertension. 1991, 17: 1104-1110.
- Yuan X, Lu ML, Li T, Balk SP: SRY interacts with and negatively regulates androgen receptor transcriptional activity. J Biol Chem. 2001, 276: 46647-46654. 10.1074/jbc.M108404200.
- Wang HY, McKnight NC, Zhang T, Lu ML, Balk SP, Yuan X: SOX9 is expressed in normal prostate basal cells and regulates androgen receptor expression in prostate cancer cells. Cancer Res. 2007, 67: 528-536. 10.1158/0008-5472.CAN-06-1672.
- Boonyaratanakornkit V, Melvin V, Prendergast P, Altmann M, Ronfani L, Bianchi ME, Taraseviciene L, Nordeen SK, Allegretto EA, Edwards DP: High-mobility group chromatin proteins 1 and 2 functionally interact with steroid hormone receptors to enhance their DNA binding in vitro and transcriptional activity in mammalian cells. Mol Cell Biol. 1998, 18: 4471-87.
- Melvin VS, Harrell C, Adelman JS, Kraus WL, Churchill M, Edwards DP: The role of the C-terminal extension (CTE) of the estrogen receptor α and β DNA binding domain in DNA binding and interaction with HMGB. J Biol Chem. 2004, 279: 14763-71. 10.1074/jbc.M313335200.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.