Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome
© Asbury et al; licensee BioMed Central Ltd. 2010
Received: 4 May 2010
Accepted: 2 September 2010
Published: 2 September 2010
New technologies are enabling the measurement of many types of genomic and epigenomic information at scales ranging from the atomic to the nuclear. Much of these new data are increasingly structural in nature and are often difficult to coordinate with other data sets. There is a clear need to integrate and visualize these disparate data sets to reveal structural relationships that are not apparent when the data are examined in isolation.
We have applied object-oriented technology to develop a downloadable visualization tool, Genome3D, for integrating and displaying epigenomic data within a prescribed three-dimensional physical model of the human genome. To integrate and visualize large volumes of data, we developed novel statistical and mathematical approaches that reduce the size of the data. To our knowledge, this is the first tool that can visualize the human genome in three dimensions. We describe the major features of Genome3D and discuss our multi-scale data framework using a representative basic physical model. We then demonstrate many of the issues and benefits of multi-resolution data integration.
Genome3D is a software visualization tool that explores a wide range of structural genomic and epigenetic data. Data from various sources and of differing scales can be integrated within a hierarchical framework that is easily adapted to new findings about the structure of the physical genome. In addition, our tool has a simple annotation mechanism for incorporating non-structural information. Genome3D is unique in its ability to manipulate large amounts of multi-resolution data from diverse sources to uncover complex and new structural relationships within the genome.
A significant portion of the genomic data currently being generated extends beyond traditional primary sequence information. Genome-wide epigenetic characteristics, such as DNA and histone modifications and nucleosome distributions, along with structural insights into transcription and replication centers, are rapidly changing the way the genome is understood. Indeed, these new high-throughput data often demonstrate that much of the genome's functional landscape resides in extra-sequential properties.
With this influx of new detail about the higher-level structure and dynamics of the genome, new techniques will be required to visualize and model the full extent of genomic interactions and function. Genome browsers, such as the UCSC Genome Database Browser, are specifically aimed at viewing primary sequence information. Although supplemental information can easily be added as new annotation tracks, representing structural hierarchies and interactions is quite difficult, particularly across non-contiguous genomic segments. In addition, despite many recent efforts to measure and model genome structure at various resolutions and levels of detail [3–10], little work has focused on combining these models into a plausible aggregate or on taking advantage of the large amount of genomic and epigenomic data available from new high-throughput approaches.
To address these issues, we have created an interactive 3D viewer, Genome3D, to enable integration and visualization of genomic and epigenomic data. The viewer is designed to display data from multiple scales and uses a hierarchical model of the relative positions of all nucleotide atoms in the cell nucleus, i.e., the complete physical genome. Our model framework is flexible and can be adapted to new, more precise structural information as details emerge about the genome's physical arrangement. The large amounts of data generated by high-throughput or whole-genome experiments raise issues of scale, storage, interactivity and abstraction, and novel methods will be required to extract useful knowledge from them. Genome3D is an early step toward such approaches.
Genome3D is a GUI-based C++ program that runs on Windows (XP or later). Its software architecture is based on the Model-View-Controller pattern. Genome3D is a viewer application that explores an underlying physical model, displaying selections and annotations according to its current user settings. To support multiple resolutions and maintain a high level of interactivity, the model is designed using an object-oriented, hierarchical data architecture. Genome3D loads the model incrementally as needed to satisfy user requests. Once a model is loaded, Genome3D supports UCSC Genome Browser annotation tracks in the BED and WIG formats.
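The object-oriented, incrementally loaded hierarchy described above can be pictured roughly as follows. This Python sketch is only an illustration under our own assumptions, not Genome3D's C++ code: the `ModelNode` class, the `loader` callback, the `split_loader` stand-in and the level names are hypothetical, whereas the real model is read from per-resolution XML files.

```python
from typing import Callable, List, Optional

# Hypothetical level names, coarse to fine, following the four scales named in
# the text (nuclear, fiber, nucleosome, DNA).
LEVELS = ["nuclear", "fiber", "nucleosome", "dna"]


class ModelNode:
    """One element of the physical-genome hierarchy (e.g. a chromosome territory,
    a fiber segment or a nucleosome). Child elements are loaded on first access."""

    def __init__(self, level: str, start: int, end: int,
                 loader: Callable[["ModelNode"], List["ModelNode"]]):
        self.level = level      # one of LEVELS
        self.start = start      # genomic start, 0-based inclusive (bp)
        self.end = end          # genomic end, exclusive (bp)
        self.loader = loader    # hypothetical callback that reads the next-finer level
        self._children: Optional[List["ModelNode"]] = None

    @property
    def is_loaded(self) -> bool:
        return self._children is not None

    def children(self) -> List["ModelNode"]:
        # Incremental loading: finer-scale data are read only when first requested.
        if self._children is None:
            if self.level == LEVELS[-1]:
                self._children = []   # finest stored level has no children
            else:
                self._children = self.loader(self)
        return self._children

    def select(self, start: int, end: int) -> List["ModelNode"]:
        """Children overlapping the half-open interval [start, end)."""
        return [c for c in self.children() if c.start < end and c.end > start]


def split_loader(node: ModelNode, parts: int = 4) -> List[ModelNode]:
    """Stand-in for reading XML: split the node's span evenly into finer elements."""
    next_level = LEVELS[LEVELS.index(node.level) + 1]
    bounds = [node.start + (node.end - node.start) * i // parts for i in range(parts + 1)]
    return [ModelNode(next_level, bounds[i], bounds[i + 1], node.loader)
            for i in range(parts)]


# Usage: selecting a low-resolution region loads only that level's children.
root = ModelNode("nuclear", 0, 1_000_000, split_loader)
fibers = root.select(200_000, 300_000)
# fibers span [0, 250000) and [250000, 500000); their own children are not yet loaded
```

The point of the design is that memory and I/O grow with what the user actually inspects rather than with the size of the whole genome.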
At the highest level of detail, a model of the physical genome requires a 3D position (x, y, z) for every atom of every base pair. The large amount of such data (3 × 10⁹ bp × 20 atoms/bp × 3 coordinates × 4 bytes ≈ 7.2 × 10¹¹ bytes, roughly 720 gigabytes for humans) is reduced by exploiting the data's hierarchical organization. We store three scales of data for each chromosome in compressed XML format; atomic positions are computed on demand and not saved. This reduces the storage size for a human genome to ~1.5 gigabytes, a reduction of more than 400×. Several sample models are available for download from the Genome3D project homepage. More information on our representative model and its data format can be found in Additional file 1.
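The raw-size estimate follows directly from the per-base-pair assumptions stated above. The short Python calculation below multiplies them out and compares the result with the ~1.5 gigabyte stored size; the constant and function names are ours, introduced only for illustration.

```python
# Assumptions taken from the text: ~3 x 10^9 bp, 20 atoms per bp,
# three coordinates (x, y, z) per atom, 4-byte single-precision floats.
BP_HUMAN = 3 * 10**9
ATOMS_PER_BP = 20
COORDS_PER_ATOM = 3
BYTES_PER_COORD = 4


def raw_model_bytes(bp: int = BP_HUMAN, atoms_per_bp: int = ATOMS_PER_BP,
                    coords: int = COORDS_PER_ATOM,
                    bytes_per_coord: int = BYTES_PER_COORD) -> int:
    """Bytes needed to store every atomic coordinate explicitly."""
    return bp * atoms_per_bp * coords * bytes_per_coord


def savings_factor(raw_bytes: float, stored_bytes: float) -> float:
    """How many times smaller the stored hierarchical model is than the raw data."""
    return raw_bytes / stored_bytes


raw = raw_model_bytes()
print(f"raw atomic data: {raw / 1e9:.0f} GB")                   # raw atomic data: 720 GB
print(f"savings vs 1.5 GB: {savings_factor(raw, 1.5e9):.0f}x")  # savings vs 1.5 GB: 480x
```

Using decimal gigabytes (10⁹ bytes), the raw coordinates come to about 720 GB, so a 1.5 GB stored model corresponds to a reduction of roughly 480×, comfortably above 400×.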
Results and Discussion
Genome3D Program Features
Genome3D features include:
Display of genomic data from nuclear to atomic scale.
Genome3D has multiple windows to visualize the physical genome model simultaneously from different viewpoints and at different scales. The model resolution of each viewing window is set by the user, and its camera is controlled with the mouse. Appropriate resolutions and viewpoints depend on the type of data being visualized.
A fully interactive point-and-select 3D environment
The user can navigate to an arbitrary region of interest by selecting a low-resolution region and then loading the corresponding higher-resolution data, which appears in another viewing window.
Loading of multiple resolution user-created models with an open XML format
The Genome3D application adheres to the Model-View-Controller software design pattern. The viewing software is completely separated from the multi-scale model being viewed. We have chosen a simple open format for each resolution of the model, so users can easily add their own models.
Image capture and PovRay/PDB model export support
Genome3D supports capturing the current display image in JPG format. For high-quality renders, it can export the current model and view in PovRay format for off-line, print-quality rendering. In addition, atomic positions of selected DNA can be saved to a PDB-format file for downstream analysis.
Incorporation and user-defined visualization of UCSC annotation tracks onto the physical model
The UCSC Genome Database Browser provides a variety of epigenetic information that can be exported directly from its website. These data can be loaded into Genome3D and displayed on the currently loaded genome model.
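Attaching a BED track to the model is essentially an interval-overlap lookup between annotation records and model elements, such as nucleosomes, that carry genomic coordinates. The Python sketch below assumes the standard BED convention of 0-based, half-open intervals and reads only the first four columns. The nucleosome interval list and the function names are hypothetical, this is not Genome3D's own loader, and WIG tracks would need a separate parser.

```python
import bisect
from typing import Dict, List, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed(lines: List[str]) -> List[BedRecord]:
    """Read the first four BED columns, skipping blank, comment, track and browser lines."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name))
    return records


def annotate_nucleosomes(
    nucleosomes: Dict[str, List[Tuple[int, int]]],
    records: List[BedRecord],
) -> Dict[Tuple[str, int], List[str]]:
    """Map each BED record to every nucleosome it overlaps.

    `nucleosomes` maps a chromosome name to sorted, non-overlapping (start, end)
    intervals; the result maps (chromosome, nucleosome index) to annotation names."""
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in nucleosomes.items()}
    hits: Dict[Tuple[str, int], List[str]] = {}
    for rec in records:
        ivs = nucleosomes.get(rec.chrom, [])
        # Begin at the last nucleosome that starts at or before the record start;
        # earlier nucleosomes end before it because the intervals do not overlap.
        i = max(bisect.bisect_right(starts.get(rec.chrom, []), rec.start) - 1, 0)
        while i < len(ivs) and ivs[i][0] < rec.end:
            if ivs[i][1] > rec.start:
                hits.setdefault((rec.chrom, i), []).append(rec.name)
            i += 1
    return hits
```

For example, with nucleosomes at [100, 247), [300, 447) and [500, 647) on chr1, a BED record `chr1 440 520` overlaps the second and third nucleosomes. Sorting plus binary search keeps each lookup proportional to the logarithm of the number of nucleosomes plus the number of hits, which matters at genome scale.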
Visualizing Integrated Epigenetic and Genomic Data
We now give a few examples of applying biological information to a model and suggest possible methods of inferring unique structural relationships at various resolutions. One advantage of a multi-scale model is the ability to integrate data from various sources and perhaps gain insight into higher-level relationships or organization. We concentrate on high-throughput data sets that are becoming commonplace in current research: genome-wide nucleosome positions, SNPs, histone methylation and gene expression profiles. The sample images, which can be visualized in Genome3D, were exported and rendered in PovRay.
Another important source of epigenomic information is histone modification. Genome-wide histone modifications are being studied by combining chromatin immunoprecipitation with DNA microarrays (ChIP-chip assays). Histone methylation has important implications for gene regulation, and methylated residues have been shown to serve as binding platforms for transcription machinery. The ENCODE initiative is generating high-resolution epigenetic information for ~1% of the human genome. Although these modifications occur on histone proteins, current approaches to mapping and visualizing them are limited to sequence coordinates in the genome. Our physical genome model visualizes methylation of histone proteins at atomic detail, as determined by crystal structure. Figure 2C shows histone methylations for several histones within an ENCODE region. An integrated physical genome model can show the interplay between histone modifications and other genomic data, such as SNPs, DNA methylation, and the structure of genes, promoters and transcription machinery.
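Drawing a histone mark at atomic detail first requires resolving its name to a particular residue on a particular core histone. The Python sketch below parses the conventional shorthand (for example, H3K4me3 denotes tri-methylation of lysine 4 on histone H3) and builds a lookup key for that residue in a nucleosome model. The key format and function names are hypothetical, and the step that retrieves atom coordinates from the crystal structure is omitted; this is an illustration, not Genome3D's implementation.

```python
import re
from typing import NamedTuple

# Single-letter codes of residues commonly modified on histones.
RESIDUES = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}

# Histone, residue letter, residue number, modification (me/me1-3, ac, ph).
MARK_RE = re.compile(r"^(H2A|H2B|H3|H4)([KRST])(\d+)(me[123]?|ac|ph)$")


class HistoneMark(NamedTuple):
    histone: str       # core histone carrying the mark
    residue: str       # full residue name
    position: int      # residue number along the histone chain
    modification: str  # e.g. "me3" (tri-methylation) or "ac" (acetylation)


def parse_mark(mark: str) -> HistoneMark:
    """Split shorthand such as 'H3K4me3' into histone, residue, position and mark."""
    m = MARK_RE.match(mark)
    if m is None:
        raise ValueError(f"unrecognised histone mark: {mark!r}")
    histone, aa, pos, mod = m.groups()
    return HistoneMark(histone, RESIDUES[aa], int(pos), mod)


def residue_key(mark: HistoneMark, copy: int = 1) -> str:
    """Hypothetical key for looking up the residue's atoms in a nucleosome model.

    Each nucleosome contains two copies of every core histone, so the copy
    must be chosen explicitly."""
    if copy not in (1, 2):
        raise ValueError("a nucleosome holds two copies of each core histone")
    return f"{mark.histone}.{copy}:{mark.position}"


print(parse_mark("H3K4me3"))
# HistoneMark(histone='H3', residue='lysine', position=4, modification='me3')
```

Histone tail residues may be incompletely resolved in crystal structures, so a real mapping would also need a fallback for residues that have no coordinates.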
To illustrate the capability of Genome3D to integrate and examine data of appropriate scales, we constructed an elementary model of the physical genome (see Additional file 1 for details). This basic model is approximate, since the precise structure of the physical genome is largely unknown at present. However, the model's inaccuracies are secondary to its multi-scale approach, which provides a framework for improving and refining the model. Current technologies are making significant progress toward capturing chromosome conformation within the nucleus at various scales [24, 25]. Because our multi-scale model is purely descriptive beyond the nucleosome core particle (NCP) scale, it can easily incorporate more accurate structural folding information, such as 'fractal globule' behaviour. The Genome3D viewer, being decoupled from the genome model, can be used to view any model that follows our model framework.
Building a 3D model of a complete physical genome is a non-trivial task. The structure and organization at a physical level are dynamic and heavily influenced by local and global constraints. A typical experiment may provide new data only at a specific resolution or for a portion of the genome, and integrating these data with other information to flesh out a multi-resolution model is challenging. For example, an experiment may measure local chromatin structure around a transcription site. This structure can be expressed as a collection of DNA strands, NCPs, and perhaps lower-resolution 30 nm chromatin fibers. Our data formats are flexible enough to allow partial integration of this information when the larger global structure is undetermined, or is inferred from more global stochastic measurements in other experiments. Combining such data across resolutions is often difficult, but established data formats and visualization tools provide a framework that may simplify the integration process.
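One way to picture this kind of partial integration is to keep each locally determined structure in its own coordinate frame, together with an optional rigid-body pose relative to its parent scale. Nuclear-frame coordinates are then composed on demand when every pose along the chain is known, and remain undefined otherwise instead of being filled in arbitrarily. The Python sketch below illustrates the idea under our own assumptions; the `LocalStructure` class, its fields and the rotation helper are hypothetical and do not describe Genome3D's actual data format.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def rotation_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def transform(rot: Mat3, trans: Vec3, p: Vec3) -> Vec3:
    """Rigid-body transform of point p: rot * p + trans."""
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))


@dataclass
class LocalStructure:
    """Coordinates known in the structure's own frame (e.g. NCPs around a
    transcription site), plus an optional pose in the parent frame."""
    points: List[Vec3]
    placement: Optional[Tuple[Mat3, Vec3]] = None  # None: pose not yet determined
    parent: Optional["LocalStructure"] = None       # None: this is the root frame

    def global_points(self) -> Optional[List[Vec3]]:
        """Coordinates in the root (nuclear) frame, computed on demand, or None
        if any placement on the path to the root is still unknown."""
        pts, node = self.points, self
        while node.parent is not None:
            if node.placement is None:
                return None
            rot, trans = node.placement
            pts = [transform(rot, trans, p) for p in pts]
            node = node.parent
        return pts
```

For example, a nucleosome shifted 5 units along z inside a fiber segment that is itself rotated 90° about z and shifted 10 units along x maps a local point (1, 0, 0) to approximately (10, 1, 5) in the root frame; if the fiber's pose is unknown, the same query returns None.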
Recent advances in determining chromosome folding principles highlight the need for new visualization methods. More detailed three-dimensional genomic models will help in discovering and characterizing epigenetic processes. We have created a multi-scale genomic viewer, Genome3D, to display and investigate genomic and epigenomic information in a three-dimensional representation of the physical genome. The viewer software and its underlying data architecture are designed to handle the visualization and integration issues that arise when dealing with large amounts of data at multiple resolutions. Our data structures can easily accommodate new advances in chromosome folding and organization.
A common framework of established scales and formats could vastly improve multi-scale data integration and the ability to infer previously unknown relationships within the composite data. Our model architecture defines clear demarcations between four scales (nuclear, fiber, nucleosome and DNA), which allows data to be integrated in a consistent and well-behaved manner. As more data become available, the ability to model, characterize, visualize and, perhaps most crucially, integrate information at many scales will be necessary to achieve a fuller understanding of the human genome.
Availability and Requirements
Project name: Genome3D
Project homepage: http://genomebioinfo.musc.edu/Genome3D/Index.html
Operating System: Windows-based operating systems (XP or later)
Programming Language: C++ and Python
Other requirements: OpenGL v2.0 and GLSL v2.0 (may not be present on some older graphics adapters; see Additional file 2)
Any restrictions to use by non-academics: None
This work is partly supported by grants IRG 97-219-08 from the American Cancer Society, Computational Biology Core of 1 UL1 RR029882-01, 3 R01 GM063265-09S1, a pilot project and statistical core of Grant 5 P20 RR017696-05, PhRMA Foundation Research Starter Grant, a pilot project from 5P20RR017677 to W.J.Z., and NSF 0904179 and 3 R01 GM078991-03S1 to J.T. T.M.A. is supported by NLM training grant 5-T15-LM007438-02. The authors thank Y. Ruan for valuable discussion about the project, K. Zhao and D.E. Schones for providing nucleosome positioning data, M. Boehnke for critical reading of the manuscript, and T. Qin, L.C. Tsoi and K. Sims for software testing. The high-performance computing facility utilized in this project is supported by NIH grants 1R01LM009153, P20RR017696, 1T32GM074934 and 1T15 LM07438.
- Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2008, 36(Database issue):D773–779.
- Dekker J: Gene regulation in the third dimension. Science 2008, 319(5871):1793–1794. 10.1126/science.1152850
- Hahnfeldt P, Hearst JE, Brenner DJ, Sachs RK, Hlatky LR: Polymer models for interphase chromosomes. Proc Natl Acad Sci USA 1993, 90(16):7854–7858. 10.1073/pnas.90.16.7854
- Sachs RK, van den Engh G, Trask B, Yokota H, Hearst JE: A random-walk/giant-loop model for interphase chromosomes. Proc Natl Acad Sci USA 1995, 92(7):2710–2714. 10.1073/pnas.92.7.2710
- Ponomarev AL, Brenner D, Hlatky LR, Sachs RK: A polymer, random walk model for the size-distribution of large DNA fragments after high linear energy transfer radiation. Radiat Environ Biophys 2000, 39(2):111–120. 10.1007/s004119900040
- Woodcock CL, Grigoryev SA, Horowitz RA, Whitaker N: A chromatin folding model that incorporates linker variability generates fibers resembling the native structures. Proc Natl Acad Sci USA 1993, 90(19):9021–9025. 10.1073/pnas.90.19.9021
- Dekker J, Rippe K, Dekker M, Kleckner N: Capturing chromosome conformation. Science 2002, 295(5558):1306–1311. 10.1126/science.1067799
- Balaeff A, Mahadevan L, Schulten K: Modeling DNA loops using the theory of elasticity. Phys Rev E Stat Nonlin Soft Matter Phys 2006, 73(3 Pt 1):031919.
- Beard DA, Schlick T: Computational modeling predicts the structure and dynamics of chromatin fiber. Structure 2001, 9(2):105–114. 10.1016/S0969-2126(01)00572-X
- Sharma S, Ding F, Dokholyan NV: Multiscale modeling of nucleosome dynamics. Biophys J 2007, 92(5):1457–1470. 10.1529/biophysj.106.094805
- Burbeck S: Applications Programming in Smalltalk-80: How to Use Model-View-Controller (MVC). 1992. [http://st-www.cs.illinois.edu/users/smarch/st-docs/mvc.html]
- Shegogue D, Zheng WJ: Object-oriented biological system integration: a SARS coronavirus example. Bioinformatics 2005, 21(10):2502–2509. 10.1093/bioinformatics/bti344
- Foley JD: Computer Graphics: Principles and Practice. 2nd edition. Reading, Mass.: Addison-Wesley; 1995.
- Persistence of Vision Pty. Ltd.: Persistence of Vision Raytracer (Version 3.6). [http://www.povray.org/download/]
- Narlikar GJ, Fan HY, Kingston RE: Cooperation between complexes that regulate chromatin structure and transcription. Cell 2002, 108(4):475–487. 10.1016/S0092-8674(02)00654-2
- Strahl BD, Allis CD: The language of covalent histone modifications. Nature 2000, 403(6765):41–45. 10.1038/47412
- Eisfeld K, Candau R, Truss M, Beato M: Binding of NF1 to the MMTV promoter in nucleosomes: influence of rotational phasing, translational positioning and histone H1. Nucleic Acids Res 1997, 25(18):3733–3742. 10.1093/nar/25.18.3733
- Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang Z, Wei G, Zhao K: Dynamic regulation of nucleosome positioning in the human genome. Cell 2008, 132(5):887–898. 10.1016/j.cell.2008.02.022
- Willer CJ, Sanna S, Jackson AU, Scuteri A, Bonnycastle LL, Clarke R, Heath SC, Timpson NJ, Najjar SS, Stringham HM, et al.: Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet 2008, 40(2):161–169. 10.1038/ng.76
- Schones DE, Zhao K: Genome-wide approaches to studying chromatin modifications. Nat Rev Genet 2008, 9(3):179–191. 10.1038/nrg2270
- The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004, 306(5696):636–640. 10.1126/science.1105136
- Wei CL, Wu Q, Vega VB, Chiu KP, Ng P, Zhang T, Shahab A, Yong HC, Fu Y, Weng Z, Liu J, Zhao XD, Chew JL, Lee YL, Kuznetsov VA, Sung WK, Miller LD, Lim B, Liu ET, Yu Q, Ng HH, Ruan Y: A global map of p53 transcription-factor binding sites in the human genome. Cell 2006, 124(1):207–219. 10.1016/j.cell.2005.10.043
- Yoon H, Liyanarachchi S, Wright FA, Davuluri R, Lockman JC, de la Chapelle A, Pellegata NS: Gene expression profiling of isogenic cells with different TP53 gene dosage reveals numerous genes that are affected by TP53 dosage and identifies CSPG2 as a direct target of p53. Proc Natl Acad Sci USA 2002, 99(24):15632–15637. 10.1073/pnas.242597299
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009, 326(5950):289–293. 10.1126/science.1181369
- Horowitz-Scherer RA, Woodcock CL: Organization of interphase chromatin. Chromosoma 2006, 115(1):1–14. 10.1007/s00412-005-0035-3
- Grosberg AY, Nechaev SK, Shakhnovich EI: The role of topological constraints in the kinetics of collapse of macromolecules. J Phys France 1988, 49:2095–2100. 10.1051/jphys:0198800490120209500 [http://hal.archives-ouvertes.fr/docs/00/21/08/91/PDF/ajp-jphys_1988_49_12_2095_0.pdf]
- Koch CM, Andrews RM, Flicek P, Dillon SC, Karaoz U, Clelland GK, Wilcox S, Beare DM, Fowler JC, Couttet P, James KD, Lefebvre GC, Bruce AW, Dovey OM, Ellis PD, Dhami P, Langford CF, Weng Z, Birney E, Carter NP, Vetrie D, Dunham I: The landscape of histone modifications across 1% of the human genome in five human cell lines. Genome Res 2007, 17(6):691–707. 10.1101/gr.5704207
- Luger K, Mäder AW, Richmond RK, Sargent DF, Richmond TJ: Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997, 389(6648):251–260. 10.1038/38444
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.