- Open Access
MiMiR – an integrated platform for microarray data sharing, mining and analysis
© Tomlinson et al; licensee BioMed Central Ltd. 2008
- Received: 22 May 2008
- Accepted: 18 September 2008
- Published: 18 September 2008
Despite considerable efforts within the microarray community for standardising data format, content and description, microarray technologies present major challenges in managing, sharing, analysing and re-using the large amount of data generated locally or internationally. Additionally, it is recognised that inconsistent and low quality experimental annotation in public data repositories significantly compromises the re-use of microarray data for meta-analysis. MiMiR, the Microarray data Mining Resource, was designed to tackle some of these limitations and challenges. Here we present new software components and enhancements to the original infrastructure that increase accessibility, utility and opportunities for large scale mining of experimental and clinical data.
A user-friendly Online Annotation Tool allows researchers to submit detailed experimental information via the web at the time of data generation rather than at the time of publication. This ensures easy access to, and high accuracy of, the meta-data collected. Experiments are programmatically built in the MiMiR database from the submitted information, and details are systematically curated and further annotated by a team of trained annotators using a new Curation and Annotation Tool. Clinical information can be annotated and coded with a clinical Data Mapping Tool within an appropriate ethical framework. Users can visualise experimental annotation, assess data quality, download and share data via a web-based experiment browser called MiMiR Online. All requests to access data in MiMiR are routed through a middleware security layer, thereby allowing secure data access and sharing amongst MiMiR registered users prior to publication. Data in MiMiR can be mined and analysed using the integrated EMAAS open source analysis web portal or via export of data and meta-data into the Rosetta Resolver data analysis package.
The new MiMiR suite of software enables systematic and effective capture of extensive experimental and clinical information with the highest MIAME score, and secure data sharing prior to publication. MiMiR currently contains more than 150 experiments corresponding to over 3000 hybridisations and supports the Microarray Centre's large microarray user community and two international consortia. The MiMiR flexible and scalable hardware and software architecture enables secure warehousing of thousands of datasets, including clinical studies, from microarray and potentially other -omics technologies.
Microarray technologies have matured rapidly over the past few years and present major challenges in managing, sharing, analysing and re-using the large amount of data generated, despite the considerable international efforts in standardising data format, content and description [2–6]. Vast numbers of microarray experiments are performed worldwide every year, many of which become available upon publication via public repositories like Gene Expression Omnibus http://www.ncbi.nlm.nih.gov/geo/ and ArrayExpress http://www.ebi.ac.uk/arrayexpress. Many microarray databases have been created to support local communities with various focuses, for example on species (http://rgd.mcw.edu and http://www.bugs.sgul.ac.uk), microarray platform [10, 11], disease [12–14], institutions or research projects [15–18].
There are two major limitations of public repositories and most microarray databases. First, although most microarray databases are 'MIAME-compliant', i.e. are designed to capture the MIAME minimal experimental information, these standards and guidelines are often not enforced, leading to variable, often very minimal, levels of experimental detail stored alongside microarray data. Because researchers who submit data to public repositories are ultimately responsible for the completeness, quality and accuracy of their submission, the majority of data sets in public repositories lack the experimental information needed for the data to be re-used effectively in a different analysis. A recent study looking at Affymetrix data in GEO and ArrayExpress found that only 38% of the microarray data meets the quality and format standards necessary for further integrative analysis. Second, the absence of appropriate security models in public repositories and many microarray databases makes it difficult or sometimes impossible to share data online prior to publication or to securely store sensitive biological or clinical information that would be important for meta-analysis. In addition, the effective collection, annotation and mining of detailed information on clinical samples (e.g. patient age at diagnosis, detailed disease and treatment information, clinical treatment follow-up and outcome data) is particularly challenging due to legal restrictions associated with storing and disclosing patient and volunteer clinical information, even in an anonymised way.
MiMiR, the Microarray Data Mining Resource, is an integrated platform for microarray data sharing, mining and analysis that addresses many of these limitations. MiMiR stores experimental information to a level of detail higher than that suggested by MIAME using ontologies and naming conventions. It provides a powerful platform for large scale data mining and analysis and enables deposition of data in ArrayExpress on publication. MiMiR was initially developed to be used within the Microarray Centre and was not directly accessible to researchers. Here we describe new software components and enhancements of the original infrastructure that allow researchers to securely submit, access, share and analyse microarray and meta-data. Specifically, we have created: (i) a re-engineered hardware and software architecture that protects the MiMiR database integrity and enables secure online sharing of unpublished and public data amongst registered users; (ii) a new web-based annotation tool allowing researchers to easily and quickly submit information about their experiments and samples; (iii) new curation and annotation tools which automatically create annotated experiments in MiMiR and enable in-house annotators to check them and add ontology terms and systematic naming conventions; (iv) a clinical Data Mapping Tool to securely capture clinical information in a systematic way within an appropriate ethical framework; (v) a new user-friendly web interface that is used by researchers to visualise extensive experimental annotation, to download data and quality assessment reports and to share unpublished datasets with collaborators or other registered users of the system; (vi) a re-engineered MAGE-ML pipeline for exporting experiments from MiMiR into the ArrayExpress repository or into the Rosetta Resolver package for data analysis; (vii) programmatic access to MiMiR from the new open source microarray data analysis software package EMAAS, allowing users to export selected data and associated meta-data for analysis.
• MiMiR security model and software architecture
• Experimental information capture, curation and annotation
MiMiR stores a high level of experimental information which exceeds that required by the MIAME guidelines. The experimental annotation process was enhanced by implementing an online data collection tool to allow users to easily and quickly submit, via the web, detailed experimental information. An internal curation and annotation tool automatically constructs an experimental model in MiMiR based on the information provided, which can be checked and further annotated by trained staff.
MiMiR online experimental data collection
Experimental information is collected from users at the time of data generation rather than at the time of publication. This ensures easy recall, access and high accuracy of the meta-data provided and recorded. A web application, built using Apache/php5/MySQL and Secure Sockets Layer (SSL) technologies, enables efficient capture and automatic submission of comprehensive experimental information at no cost to Centre staff time. Data collection is done through successive stages (Additional File 1), at each of which a comprehensive set of fields is presented, some of which are mandatory. Drop-down menus are available where possible to limit the use of free text and to facilitate data capture by minimising typing. Additional information can also be uploaded, for example Agilent Bioanalyser traces or Excel spreadsheets of quality control (QC) information. The successive stages follow a logical order and enable customised fields to be presented depending on the choices made in the previous step (Additional File 2).
Each stage was implemented in a flexible way to enable the easy capture of diverse experimental designs including complex pooling and splitting strategies. Data captured can be saved at any stage with the option to complete the remaining stages at a later time. Options to duplicate entries are available, where appropriate, to reduce the amount of typing for capturing details about multiple similar samples. The Online Annotation Tool is currently configured to capture data from gene expression studies including single-channel (Affymetrix 3', Exon and Gene arrays) and two-colour (Agilent) arrays, miRNA profiling, and can, in future, be extended to other microarray applications such as ChIP-on-chip. A detailed Help menu is available at each stage with comprehensive examples of experiment, sample or QC information recorded. The Online Annotation Tool allows users to rapidly and efficiently submit many experimental and QC details: it takes less than one hour to complete the entire process for the majority of experiments submitted to the Centre (involving up to 50 samples). Large scale experiments (with more than 200 samples) can be submitted using the Online Annotation Tool or via a standardised spreadsheet-based pipeline under development that can be customised for individual projects and parsed programmatically for storage into MiMiR.
Once all stages of data capture are completed online, the information provided is automatically checked for inconsistencies and missing data and is then ready for internal curation using the Curation and Annotation tools.
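The automatic check for inconsistencies and missing data can be illustrated with a minimal Python sketch. The field names, the `n_samples` entry and the `check_submission` function are hypothetical and chosen only for illustration; the actual tool is a php5/MySQL web application and its field list is not given here.

```python
# Hypothetical mandatory fields; the real tool's field list is not described in the text.
MANDATORY_FIELDS = {
    "experiment": ["title", "species", "array_type"],
    "sample": ["sample_name", "tissue"],
}


def _is_blank(value):
    """Treat None and whitespace-only values as missing."""
    return value is None or not str(value).strip()


def check_submission(experiment, samples):
    """Return a list of problems; an empty list means the submission passes."""
    problems = []

    # Missing-data check on experiment-level fields.
    for name in MANDATORY_FIELDS["experiment"]:
        if _is_blank(experiment.get(name)):
            problems.append(f"experiment: missing '{name}'")

    # Missing-data and duplicate-name checks on each sample.
    seen = set()
    for i, sample in enumerate(samples, start=1):
        for name in MANDATORY_FIELDS["sample"]:
            if _is_blank(sample.get(name)):
                problems.append(f"sample {i}: missing '{name}'")
        sample_name = sample.get("sample_name")
        if not _is_blank(sample_name):
            if sample_name in seen:
                problems.append(f"sample {i}: duplicate name '{sample_name}'")
            seen.add(sample_name)

    # Consistency check: declared sample count must match what was submitted.
    declared = experiment.get("n_samples")
    if declared is not None and declared != len(samples):
        problems.append(
            f"experiment: declares {declared} samples but {len(samples)} were submitted"
        )
    return problems
```

A submission would only move on to internal curation when the returned list is empty; otherwise the messages could be shown to the submitter at the relevant stage.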
Curation and further annotation of experimental information
• Clinical data capture, annotation and link with microarray data
The storage and analysis of individual clinical and genetic data derived from human patients and volunteers is highly sensitive and requires that appropriate policies and procedures are defined in respect of ethical issues. MiMiR has been given formal approval to operate within strict guidelines under the jurisdiction of a Multi-centre Research Ethics Committee (MREC, Reference: 05/MREC05/69). The approval covers the handling of anonymised subjects' clinical information, which is typically recorded in hospital patient management systems. The ethical framework that governs the supply of data to the clinical part of MiMiR, called cMiMiR, and the subsequent use by researchers is described in Additional File 4 and Additional Files 5, 6, 7, 8.
Data Mapping Tool
Clinical data is commonly recorded in Access, Excel or similar databases that are used as routine patient management systems or clinical trial-specific databases. We developed a Data Mapping Tool to translate clinical information into codified clinical ontology terms and concepts and to allow these descriptions to be imported into MiMiR in a standardised and structured way. Several coding schemes exist, providing recognised sets of unique concept identifiers. These include SNOMED-CT http://www.snomed.org and the Unified Medical Language System (UMLS) http://umlsinfo.nlm.nih.gov. The UMLS was chosen and implemented in MiMiR as it is used by international efforts such as the National Cancer Institute caBIG™ https://cabig.nci.nih.gov/ and it can provide access to SNOMED-CT terms via its knowledge source web site http://umlsinfo.nlm.nih.gov/. The UMLS API is used to map each entity in the source data to the corresponding clinical ontology term, and the associated encoded values are then automatically assigned (Additional File 5). The resulting encoded record is represented in an XML format and linked in the database to the corresponding biosamples and experimental information. A comprehensive user guide with a detailed practical example showing screenshots of the various stages of clinical annotation is available in Additional File 9.
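The mapping step described above (source entity to coded concept, then an encoded XML record linked to a biosample) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Data Mapping Tool itself: a small in-memory table stands in for the UMLS API, the concept identifiers are made-up placeholders, and the XML element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for a terminology service.
# The identifiers are placeholders, NOT real UMLS concept identifiers.
CONCEPT_TABLE = {
    ("diagnosis", "chronic myeloid leukaemia"): "CUI-PLACEHOLDER-0001",
    ("sex", "female"): "CUI-PLACEHOLDER-0002",
}


def encode_record(biosample_id, source_row):
    """Map one row of source clinical data to coded concepts.

    Returns the encoded XML record and the list of source fields that
    could not be mapped and therefore need manual annotation.
    """
    record = ET.Element("clinical_record", biosample=biosample_id)
    unmapped = []
    for field_name, raw_value in source_row.items():
        # Normalise field name and value before looking up the concept.
        key = (field_name.strip().lower(), str(raw_value).strip().lower())
        concept = CONCEPT_TABLE.get(key)
        if concept is None:
            unmapped.append(field_name)
            continue
        item = ET.SubElement(record, "item", field=key[0], concept=concept)
        item.text = str(raw_value).strip()
    return record, unmapped
```

For example, `encode_record("BS1", {"Diagnosis": "Chronic Myeloid Leukaemia", "Age": 54})` would return a record with one coded item and report `"Age"` as unmapped, since the placeholder table has no entry for it.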
• MiMiR Online experiment browser
Upon logging into MiMiR Online, a list of available experiments is displayed and each experiment can be individually selected to visualise the design, sample and hybridisation details (Figure 3b). Users can navigate through the comprehensive experimental information recorded, including factors, biosources, biosamples, treatments, labelled extracts and protocols. Quality control assessments are displayed at several key steps (e.g. total RNA, labelled extract and scan) to flag potentially problematic samples that may lead to unreliable data.
The raw data files can be downloaded from several locations either in bulk (all the files for a single experiment), by factor group or for a single hybridisation. Clicking on the 'Download' icon initiates retrieval of the relevant files from the database, which are zipped and sent to the web browser. A progress bar monitors the download, which typically takes less than 30 seconds per .CEL file for an Affymetrix U133Plus 2.0 array.
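The three download granularities can be sketched with Python's standard `zipfile` module. This is a minimal illustration only: the `build_archive` function and the dictionary keys are hypothetical, and the file contents are passed in directly rather than retrieved from the MiMiR database through the middleware.

```python
import io
import zipfile


def build_archive(hybridisations, factor_group=None, filename=None):
    """Zip raw data files in memory and return the archive bytes.

    hybridisations: list of dicts with keys 'filename', 'factor_group'
    and 'content' (bytes). With no filter, every file of the experiment
    is included; 'factor_group' or 'filename' narrows the selection.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for hyb in hybridisations:
            if factor_group is not None and hyb["factor_group"] != factor_group:
                continue
            if filename is not None and hyb["filename"] != filename:
                continue
            archive.writestr(hyb["filename"], hyb["content"])
    return buffer.getvalue()
```

The returned bytes could then be streamed to the browser by whatever web layer is in use.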
The quality and reliability of each dataset is assessed using a number of quality assessment (QA) plots and metrics generated with the Bioconductor open source software framework http://www.bioconductor.com and the Affymetrix Expression Console™ software http://www.affymetrix.com. These are compiled into a comprehensive QA report available for download in PDF format (Figure 3b). A comprehensive list of pheno-data (e.g. experimental factors, biological and technical variables) can also be downloaded and saved in a format suitable for import into both open source (Bioconductor) and proprietary (Partek®) software. This can be extremely useful for analysing systematic errors (e.g. technical batch effects) alongside the biological effects under study.
• Data Analysis using EMAAS
• MAGE-ML export pipelines to ArrayExpress and Resolver
Data in MiMiR is sent to ArrayExpress upon publication, and the original ArrayExpress export pipeline has been re-engineered into a more generic tool. A model-driven approach was adopted, whereby a local UML model was designed to represent all experimental meta-data required for a valid MAGE-ML submission to ArrayExpress or to the Resolver analysis package. Two sets of Java classes were created: the first interrogates the MiMiR/middleware layer and extracts data elements, and the second populates a MAGEstk/Java data model to generate a MAGE-ML (XML) file. The ArrayExpress validation toolkit http://www.ebi.ac.uk/~ele/ext/submitter.html#val is incorporated into the MAGE-ML building process to provide automated validation of MAGE-ML files generated for export to the relevant system. A total of 24 experiments (corresponding to 730 whole genome arrays) have been submitted to ArrayExpress to date and all the experiment annotations are of the highest quality, as confirmed by the highest MIAME score assigned by ArrayExpress.
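The two-step structure of the pipeline (extract into a local model, then populate and validate an XML document) can be sketched as follows. The sketch is written in Python rather than Java for brevity, and everything in it is an assumption made for illustration: the class names, the validation rules and the XML element names are placeholders and do not follow the MAGE-ML schema, the MAGEstk API or the ArrayExpress validation toolkit.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Hybridisation:
    name: str
    array_design: str


@dataclass
class ExperimentModel:
    """Local model holding the meta-data needed for an export."""
    accession: str
    title: str
    hybridisations: list = field(default_factory=list)


def extract(raw):
    """Step 1: build the local model from data returned by the data layer."""
    hybs = [Hybridisation(h["name"], h["array_design"]) for h in raw["hybridisations"]]
    return ExperimentModel(raw["accession"], raw["title"], hybs)


def validate(model):
    """Return a list of validation errors (placeholder rules only)."""
    errors = []
    if not model.title.strip():
        errors.append("title is empty")
    if not model.hybridisations:
        errors.append("no hybridisations")
    return errors


def to_xml(model):
    """Step 2: validate the model, then serialise it to an XML string."""
    errors = validate(model)
    if errors:
        raise ValueError("; ".join(errors))
    # Element names below are placeholders, not MAGE-ML elements.
    root = ET.Element("experiment", accession=model.accession, title=model.title)
    for hyb in model.hybridisations:
        ET.SubElement(root, "hybridisation", name=hyb.name, array_design=hyb.array_design)
    return ET.tostring(root, encoding="unicode")
```

Keeping extraction separate from serialisation means the same local model can feed more than one target, which mirrors the stated aim of serving both ArrayExpress and Resolver from one generic tool.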
It is recognised that inconsistent and low quality experimental annotation in public data repositories significantly compromises the re-use of microarray data for meta-analysis [1, 23]. MiMiR was designed to overcome this major limitation. Users can submit experimental information in an easy, fast and secure way via the web. The meta-data is collected and stored in MiMiR at the time of data generation rather than at the point of publication and submission to ArrayExpress or GEO, which can be up to several years later. As a result, MiMiR captures more accurate and comprehensive experiment information than public repositories and most other microarray databases, and therefore provides the rich experimental details often required for data mining and cross-experiment re-analysis. The experimental annotation process is efficiently performed by programmatically building the experiment structure from the submitted information and automatically populating over 60 percent of the required fields. This is recognised as a major advantage, and other systems are looking at improving the performance and speed of sample annotation.
Data is centralised in MiMiR in a highly secure way, enabling researchers to share data prior to publication: this is particularly useful for the national and international consortia that MiMiR supports. MiMiR is compliant with MAGE and uses MAGE-ML for data exchange with other MAGE databases (e.g. ArrayExpress and Resolver) rather than the simplified MAGE-TAB format.
MiMiR is fully integrated with the Rosetta Resolver analysis package, and experimental information is automatically built in Resolver from annotations stored in MiMiR. Analysis of MiMiR data can also be done using the freely available EMAAS portal. The EMAAS user base is growing very rapidly and the system is continuously being updated with the latest analysis algorithms to support new chip types and applications.
It is well known that molecular signatures derived from microarray clinical studies can be unstable and highly dependent on the selection of patients used in the training set. Michiels et al., for example, found that five of the seven largest published studies addressing cancer prognosis did not classify patients better than chance. Good validation of prognostic or predictive gene expression profiles requires large patient cohorts, and the clinical part of MiMiR could be used as a platform to build centralised data sets for this purpose.
MiMiR stores raw unprocessed microarray data, as GEO and ArrayExpress do, in order to maximise the long term value of datasets and enable processing and re-analysis of data. However, normalisation is necessary in order to mine data across different experiments, and we are planning to develop a dynamic normalisation pipeline to allow such comparisons. We also envisage developing standard analysis pipelines to generate lists of differentially expressed genes that will be made available for mining, querying and further analysis. Query and search functionalities will be implemented in the system to interrogate and retrieve datasets of interest, for example by species, tissue or array type.
MiMiR is a mature microarray data warehouse containing over 3000 arrays' worth of data for mining and analysis and supports over 200 research groups, including two international consortia. MiMiR is not a new microarray public repository but it provides a secure environment for collection, capture, consistent annotation, visualisation and dissemination of data to our large user community and collaborators. The clinical part of MiMiR also represents a unique resource for clinicians and researchers to effectively share, mine and analyse clinical information and large scale molecular profiling data within an ethically approved environment. Analysis of MiMiR data is enabled through integration with commercial and freeware analysis packages and will be enhanced by additional normalisation and analysis pipelines. MiMiR is a powerful, scalable and flexible resource that can potentially be extended to new data modalities like next generation sequencing data, for which similar ethical, social and clinical constraints apply and are beginning to be addressed by the research and clinical communities [29, 30].
MiMiR Online and the Online Annotation Tool can be accessed from the Microarray Centre-MiMiR User Centre web site http://microarray.csc.mrc.ac.uk. The code for the Curation and Annotation tools, as well as the MAGE-ML export pipeline and the Data Mapping Tool, can be made available on request. A comprehensive user manual for the Annotation and Curation tools is also available from the Microarray Centre web site. The tools have been optimised for a Windows environment and, although untested, could be used with other operating systems.
The authors acknowledge funding from the Medical Research Council, the Department of Health (NEAT), the BBSRC (BEP), and the European Union (EURATools). We thank the NEAT Management Group and Consumer Advisory Group and in particular Lady Sarah Riddle, Prof Hani Gabra, Prof Junia Melo and other clinical collaborators at the Hammersmith Hospital. We are grateful to Dr Helen Causton and Dr Jonathan Mangion for helpful discussions and comments, and to the Microarray Centre users for providing feedback on using MiMiR.
- Larsson O, Sandberg R: Lack of correct data format and comparability limits future integrative microarray research. Nat Biotechnol 2006, 24(11):1322–1323. 10.1038/nbt1106-1322
- Stoeckert C, Parkinson H: The MGED Ontology: a framework for describing functional genomics experiments. Comparative and Functional Genomics 2003, 4: 127–132. 10.1002/cfg.234
- Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game L, Heiskanen M, Morrison N, Rocca-Serra P, et al.: The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 2006, 22(7):866–873. 10.1093/bioinformatics/btl005
- Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, et al.: Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 2002, 3(9):RESEARCH0046. 10.1186/gb-2002-3-9-research0046
- Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al.: Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001, 29(4):365–371. 10.1038/ng1201-365
- Strauss E: Arrays of hope. Cell 2006, 127(4):657–659. 10.1016/j.cell.2006.11.005
- Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res 2007, (35 Database):D760–765. 10.1093/nar/gkl887
- Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway E, Kolesnykov N, Lilja P, Lukk M, et al.: ArrayExpress–a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 2007, (35 Database):D747–750. 10.1093/nar/gkl995
- Smith CM, Finger JH, Hayamizu TF, McCright IJ, Eppig JT, Kadin JA, Richardson JE, Ringwald M: The mouse Gene Expression Database (GXD): 2007 update. Nucleic Acids Res 2007, (35 Database):D618–623. 10.1093/nar/gkl1003
- Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol 2002, 3(8):SOFTWARE0003. 10.1186/gb-2002-3-8-software0003
- Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, et al.: The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003, 31(1):94–96. 10.1093/nar/gkg078
- Mazzarelli JM, Brestelli J, Gorski RK, Liu J, Manduchi E, Pinney DF, Schug J, White P, Kaestner KH, Stoeckert CJ Jr: EPConDB: a web resource for gene expression related to pancreatic development, beta-cell function and diabetes. Nucleic Acids Res 2007, (35 Database):D751–755. 10.1093/nar/gkl748
- Pan F, Chiu CH, Pulapura S, Mehan MR, Nunez-Iglesias J, Zhang K, Kamath K, Waterman MS, Finch CE, Zhou XJ: Gene Aging Nexus: a web database and data mining platform for microarray data on aging. Nucleic Acids Res 2007, (35 Database):D756–759. 10.1093/nar/gkl798
- Splendiani A, Brandizi M, Even G, Beretta O, Pavelka N, Pelizzola M, Mayhaus M, Foti M, Mauri G, Ricciardi-Castagnoli P: The genopolis microarray database. BMC Bioinformatics 2007, 8(Suppl 1):S21. 10.1186/1471-2105-8-S1-S21
- Marzolf B, Deutsch EW, Moss P, Campbell D, Johnson MH, Galitski T: SBEAMS-Microarray: database software supporting genomic expression analyses for systems biology. BMC Bioinformatics 2006, 7: 286. 10.1186/1471-2105-7-286
- Demeter J, Beauheim C, Gollub J, Hernandez-Boussard T, Jin H, Maier D, Matese JC, Nitzberg M, Wymore F, Zachariah ZK, et al.: The Stanford Microarray Database: implementation of new analysis tools and open source release of software. Nucleic Acids Res 2007, (35 Database):D766–770. 10.1093/nar/gkl1019
- Ameur A, Yankovski V, Enroth S, Spjuth O, Komorowski J: The LCB Data Warehouse. Bioinformatics 2006, 22(8):1024–1026. 10.1093/bioinformatics/btl036
- Le Brigand K, Barbry P: Mediante: a web-based microarray data manager. Bioinformatics 2007, 23(10):1304–1306. 10.1093/bioinformatics/btm106
- Navarange M, Game L, Fowler D, Wadekar V, Banks H, Cooley N, Rahman F, Hinshelwood J, Broderick P, Causton HC: MiMiR: a comprehensive solution for storage, annotation and exchange of microarray data. BMC Bioinformatics 2005, 6: 268. 10.1186/1471-2105-6-268
- Barton G, Saleem A, Krznaric M, Abbott J, MJ S, Tiwari B, Aitman T, Game LJMS, Huang Y, et al.: EMAAS: An extensible grid-based portal for microarray data analysis and management. BMC Bioinformatics 2008, in press.
- The Chipping Forecast II: Supplement to Nature Genetics. 2002, 32.
- Sherman BT, Huang da W, Tan Q, Guo Y, Bour S, Liu D, Stephens R, Baseler MW, Lane HC, Lempicki RA: DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. BMC Bioinformatics 2007, 8: 426. 10.1186/1471-2105-8-426
- Day A, Carlson MR, Dong J, O'Connor BD, Nelson SF: Celsius: a community resource for Affymetrix microarray data. Genome Biol 2007, 8(6):R112. 10.1186/gb-2007-8-6-r112
- Safran M, Chalifa-Caspi V, Shmueli O, Olender T, Lapidot M, Rosen N, Shmoish M, Peter Y, Glusman G, Feldmesser E, et al.: Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res 2003, 31(1):142–146. 10.1093/nar/gkg050
- Draghici S, Tarca AL, Yu L, Ethier S, Romero R: KUTE-BASE: storing, downloading and exporting MIAME-compliant microarray experiments in minutes rather than hours. Bioinformatics 2008, 24(5):738–740. 10.1093/bioinformatics/btm559
- Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, et al.: A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics 2006, 7: 489. 10.1186/1471-2105-7-489
- Abdullah-Sayani A, Bueno-de-Mesquita JM, Vijver MJ: Technology Insight: tuning into the genetic orchestra using microarrays–limitations of DNA microarrays in clinical practice. Nat Clin Pract Oncol 2006, 3(9):501–516. 10.1038/ncponc0587
- Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005, 365(9458):488–492. 10.1016/S0140-6736(05)17866-0
- McGuire AL, Cho MK, McGuire SE, Caulfield T: Medicine. The future of personal genomics. Science 2007, 317(5845):1687. 10.1126/science.1147475
- McGuire AL, Caulfield T, Cho MK: Research ethics and the challenge of whole-genome sequencing. Nat Rev Genet 2008, 9(2):152–156. 10.1038/nrg2302
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.