Marmal-aid - a database for Infinium HumanMethylation450
© Lowe and Rakyan; licensee BioMed Central Ltd. 2013
Received: 16 September 2013
Accepted: 5 December 2013
Published: 12 December 2013
DNA methylation is indispensable for normal human genome function. An increasingly large amount of DNA methylomic data is being released into the public domain, creating an opportunity to investigate the relationships between the DNA methylome, genome function, and human phenotypes. The Illumina450K is one of the most popular platforms for assessing DNA methylation, with over 10,000 samples available in the public domain. However, accessing all this data requires downloading each individual experiment, and inconsistent annotation makes finding the right data a challenge.
Here we introduce ‘Marmal-aid’, the first standardised database for DNA methylation (freely available at http://marmal-aid.org). In Marmal-aid, the majority of publicly available Illumina HumanMethylation450 data is incorporated into a single repository, allowing the data to be re-processed, including normalisation and imputation of missing values. The database is accessible in two ways: (1) Through an R package, aimed at bioinformaticians with experience in R, which allows the database to be incorporated into existing analysis pipelines and easily queried to gain insight into the functionality of particular CpG sites. (2) Through a graphical interface that allows general biologists to query a pre-defined set of tissues (currently 15), providing a reference database of the methylation state in these tissues for the 450,000 CpG sites profiled by the Illumina HumanMethylation450.
Marmal-aid is the largest publicly available Illumina HumanMethylation450 methylation database, combining Illumina HumanMethylation450 data from a number of sources into a single location with a single common annotation format. This allows for automated extraction using the R package and inclusion into existing analysis pipelines. Marmal-aid also provides an easy-to-use GUI to visualise methylation data in user-defined genomic regions for various reference tissues.
DNA methylation, particularly in CpG Islands located near promoters, has long been known to repress gene expression; however, the role of methylation outside these regions is still not well understood. It is essential for the development of mammals, and its importance is further demonstrated by recent epigenome-wide association studies (EWAS) that have identified DNA methylation signatures for different human pathologies. As a consequence, there is now a vast amount of DNA methylomic data in the public domain, providing unprecedented opportunities to use computational approaches to uncover novel and fundamental relationships between the DNA methylome, genome function, and human phenotypes. However, until now, extracting this data for analysis in specific pipelines has required downloading each individual experiment by hand, with limited ability to search due to inconsistent annotation.
Here we report the first dedicated DNA methylomic database called Marmal-aid. In Marmal-aid, all publicly available Illumina450K data is incorporated into a single repository and re-processed. This enables a wide variety of customizable meta-analyses to gain insights into the functionality of key CpG sites, without having to rely on gene-centric approaches such as Gene Ontology (GO). Marmal-aid has the potential to become a powerful database that will work both independently and in combination with GO and/or gene pathway analyses in the context of a variety of epigenomic investigations such as EWAS, genetic-epigenetic interactions, and environmental epigenomics.
Construction and content
A description of the column names contained in the annotation file in Marmal-aid
ID used to identify sample. If from GEO this is the GEO accession number
A long descriptive name of the sample
The GSE number, which can be used to ascertain whether samples are from the same experiment.
The lineage of the tissue
Main tissue type, e.g. Blood, Brain, Liver, Kidney
Further sub-categorisation of the tissue, e.g. for Blood this may be CD19, CD4, etc.
If a transformed cell line the name of the line is given in this column
Indication of the disease state of the sample. NA here means no information was recorded; such samples are most likely healthy.
A further characterisation of the disease state, e.g. if the DISEASE column contains Cancer then this column will indicate what type
The sex of the sample if given as taken from the annotation file
The prediction of the sex of the sample using autosomal probes
The age of the sample if given.
Any additional information that could not be described in the other columns.
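As an illustration of how such a standardised annotation table can be used to select samples, the following is a minimal sketch in R. It does not use the Marmal-aid package itself: the data frame, its rows, and the column names Id and TISSUE are hypothetical stand-ins (only the DISEASE column name appears in the description above), so the real annotation file may be laid out differently.

```r
# Toy annotation table. All rows are invented for illustration, and the
# column names Id and TISSUE are assumptions; only DISEASE is named in the text.
annotation <- data.frame(
  Id      = c("S1", "S2", "S3", "S4"),
  TISSUE  = c("Blood", "Blood", "Liver", "Brain"),
  DISEASE = c(NA, "Cancer", NA, NA),
  stringsAsFactors = FALSE
)

# Return the IDs of samples from one tissue with no recorded disease state.
# NA in DISEASE means no information was recorded (most likely healthy).
select_samples <- function(annot, tissue) {
  keep <- annot$TISSUE == tissue & is.na(annot$DISEASE)
  annot$Id[keep]
}

blood_ids <- select_samples(annotation, "Blood")
print(blood_ids)
```

A vector of IDs selected this way is the kind of input that a sample-based query would take.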
Utility and discussion
The website http://marmal-aid.org provides the location for the R package as well as tutorials, important update information, a samples page with a real-time searchable table, and a forum. A particularly useful part of the website is the methylation reference visualisation, which provides a GUI to examine the methylation of a reference set of samples. This GUI was designed to allow a biologist with limited knowledge of bioinformatics to view the methylation state of a number of reference tissues at a particular gene or genomic location. A user can supply genomic co-ordinates of interest, and the methylation state of the array probes that fall within these co-ordinates will be shown. This tool can be used to quickly ascertain whether a particular gene is differentially methylated in different tissues. For example, at chr1:67,217,505-67,219,505, around the TSS of TCTEX1D1, we find an unmethylated CpG Island in the majority of tissues (Figure 2B).
For more in-depth analysis, the R package provides an easy interface to access samples and quickly analyse them in existing pipelines. The R package can be installed using a single command [install.packages("directory_of_download/marmalaid_1.2.tar.gz", repos = NULL, type = "source")]. Extracting data from Marmal-aid then requires a single command, [beta = getbeta(samples, probes)], where samples is a vector of IDs obtainable from the annotation file, e.g. ("GSM1052413", "GSM1052414", "GSM1052415"), and probes are the Illumina IDs of the probes required, e.g. "cg09432775" or "cg03134653". A more detailed tutorial is provided on the website. We currently provide a small number of functions that we have found useful, such as a heatmap visualisation of the methylation state of each of your samples and probes. While Marmal-aid is useful for investigating the methylation state of a large number of samples in comparison to a 450K study design, it may also be used to quickly assay the methylation state of a region of interest (perhaps from an RNA-Seq experiment); hence it is possible to input genomic co-ordinates instead of probes.
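The idea of querying by genomic co-ordinates rather than by probe ID can be sketched as follows. This is a self-contained toy example, not the package's implementation: the probe IDs, positions, and beta values are invented, the function probes_in_region is hypothetical, and the probes-by-samples layout of the matrix is an assumption about what a call such as getbeta might return.

```r
# Toy probe map. Probe IDs and positions are invented for illustration only.
probe_map <- data.frame(
  probe = c("cg_toy1", "cg_toy2", "cg_toy3", "cg_toy4"),
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  pos   = c(67217600, 67219000, 67300000, 67218000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IDs of probes lying inside [start, end] on one chromosome.
probes_in_region <- function(map, chr, start, end) {
  map$probe[map$chr == chr & map$pos >= start & map$pos <= end]
}

# Toy beta matrix with probes in rows and samples in columns (assumed layout).
set.seed(1)
beta <- matrix(runif(12), nrow = 4,
               dimnames = list(probe_map$probe, c("s1", "s2", "s3")))

# Query the region used as an example in the text, then subset the matrix.
region_probes <- probes_in_region(probe_map, "chr1", 67217505, 67219505)
region_beta   <- beta[region_probes, , drop = FALSE]

print(region_probes)
print(dim(region_beta))
```

Keeping the co-ordinate lookup separate from the data extraction means the same probe list can be reused for any set of samples.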
By allowing easy programmable access to Marmal-aid and its hand-curated annotation data, we envision that it has the potential to become a powerful meta-analytical tool that could work both independently and in combination with GO and/or pathway analyses in the context of a variety of epigenomic investigations such as EWAS, genetic-epigenetic interactions, and environmental epigenomics.
Marmal-aid is the largest publicly available Illumina HumanMethylation450 methylation database, combining data from a number of public databases. While the original raw data is available from Marmal-aid, we also provide processed data that has undergone normalisation and imputation of missing values, providing extra information not originally available. Marmal-aid also benefits from a standardised annotation format that has been hand annotated to allow for quick and easy searching and selection of available samples. Currently, a large number of Illumina HumanMethylation450 experiments are submitted as beta values, which means that information from the two channels in which the intensity is read is lost. A number of normalisation techniques used for Illumina HumanMethylation450 data rely on this information and hence cannot be applied to such submissions. A number of recent public submissions have included raw data files known as IDATs, and in future we plan to include these in the database. We also plan to introduce functions that will allow for initial QC of samples, such as checks for contamination in blood samples or cell/tissue prediction algorithms, as well as more automated meta-analysis of DMPs called in an experiment. Marmal-aid will be updated continually as more data becomes available, and we aim to update it at least once every two months.
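To make the loss of information in beta-value submissions concrete, the sketch below computes beta values from methylated and unmethylated intensities. The formula, with an offset of 100 in the denominator, is the commonly used Illumina convention and is an assumption here, since the text does not state how beta values are computed; the intensities are invented.

```r
# Beta value from methylated and unmethylated signal intensities.
# The offset of 100 is the commonly used default and is an assumption here.
beta_value <- function(meth, unmeth, offset = 100) {
  meth / (meth + unmeth + offset)
}

# Two made-up probes with very different total signal.
b_weak   <- beta_value(meth = 450,  unmeth = 450)   # 450 / 1000
b_strong <- beta_value(meth = 4950, unmeth = 5950)  # 4950 / 11000

# Both reduce to the same beta value, so the original intensities
# cannot be recovered from the beta value alone.
print(c(weak = b_weak, strong = b_strong))
print(isTRUE(all.equal(b_weak, b_strong)))
```

Because different intensity pairs map to the same beta value, methods that operate on the separate channel intensities cannot be applied once only beta values are available.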
Availability and requirements
Project name: Marmal-aid
Project home page: http://marmal-aid.org
Operating system(s): Platform independent
Programming language: R
References
- Bird AP, Wolffe AP: Methylation-induced repression - belts, braces and chromatin. Cell. 1999, 99 (5): 451-454. 10.1016/S0092-8674(00)81532-9.
- Okano M, Bell DW, Haber DA, Li E: DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell. 1999, 99 (3): 247-257. 10.1016/S0092-8674(00)81656-6.
- Rakyan VK, Down TA, Balding DJ, Beck S: Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011, 12: 529-541. 10.1038/nrg3000.
- Laird PW: Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet. 2010, 11: 191-203. 10.1038/nrg2732.
- Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, Esteller M: Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics. 2011, 6: 692-702. 10.4161/epi.6.6.16196.
- Hastie T, Tibshirani R, Sherlock G, Eisen M, Brown P, Botstein D: Imputing missing data for gene expression arrays. 1999, http://www-stat.stanford.edu/~hastie/Papers/missing.pdf
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.