Eureka-DMA: an easy-to-operate graphical user interface for fast comprehensive investigation and analysis of DNA microarray data
© Abelson; licensee BioMed Central Ltd. 2014
Received: 8 July 2013
Accepted: 14 February 2014
Published: 24 February 2014
In the past decade, the field of molecular biology has become increasingly quantitative; the rapid development of new technologies enables researchers to address fundamental questions quickly and efficiently, including questions that were once impossible to investigate. Among these technologies, the DNA microarray provides a methodology for many applications, such as gene discovery, disease diagnosis, drug development and toxicological research, and its use has grown steadily since it first emerged. Multiple tools have been developed to interpret the high-throughput data produced by microarrays. However, less consideration has often been given to the fact that extensive and effective interpretation requires close interplay between the bioinformaticians who analyze the data and the biologists who generate it. To bridge this gap and to simplify the use of such tools, we developed Eureka-DMA, an easy-to-operate graphical user interface that allows bioinformaticians and bench biologists alike to initiate analyses and to investigate the data produced by DNA microarrays.
In this paper, we describe Eureka-DMA, a user-friendly software package that comprises a set of methods for the interpretation of gene expression arrays. Eureka-DMA includes methods for identifying genes that are differentially expressed between conditions; it searches for enriched pathways and gene ontology terms and combines them with other relevant features. It thus supports a thorough understanding of the data, both for testing existing hypotheses and for generating new ones. Here we present two example analyses that demonstrate how Eureka-DMA can be used and its capability to produce relevant and reliable results.
We have integrated several elementary expression analysis tools behind a unified interface. Eureka-DMA's simple graphical user interface provides an effective and efficient framework in which the investigator has a full set of tools for visualizing and interpreting the data, with the option of exporting the analysis results for later use in other platforms. Eureka-DMA is freely available for academic users and can be downloaded at http://blue-meduza.org/Eureka-DMA.
Keywords: GUI, Software, Microarray, Analysis, Differential expression, Pathways, Gene ontology
Since it was first introduced, the use of microarray technology has grown rapidly. The capacity to survey the expression of thousands of genes in a single experiment has proven extremely valuable for gene expression profiling in many fields, including basic biological studies; medical diagnostics and personalized medicine; drug discovery and development; toxicogenomics; and cancer research [1–3]. Gene expression analysis has moved well beyond the simple goal of identifying a few genes of interest. Other approaches (data visualization and profiling, assessing the involvement of genes in particular molecular pathways, and searching for enrichment of a common ontology) have all become important, both for the pursuit of hypothesis-driven inquiries and for the generation of new hypotheses. Efficient analysis and interpretation of the high-volume data produced by microarrays therefore represents a major challenge; it may become the most formidable obstacle that biologists face when trying to extract meaningful information from their experiments. Many software products designed specifically for microarray analysis are available. These range from free, simple programs that perform several basic tasks with relatively limited scientific benefit to comprehensive programs that usually require prior knowledge of statistics; the latter are also usually expensive, and their complexity tends to restrict their use to researchers already familiar with the program's interface. Eureka-DMA is an application that combines simplicity of operation and data management with the execution of multiple analysis tasks to transform high-throughput data into meaningful and understandable information.
Eureka-DMA provides algorithms for identifying genes that are differentially expressed between groups, and for detecting enrichment of pathways from the KEGG PATHWAY database (http://www.genome.jp/kegg/pathway.html) and of Gene Ontology (GO) terms from the Gene Ontology database (http://www.geneontology.org). Furthermore, it incorporates options for the analysis of time series datasets and for expression profiling, all implemented in an explicit, easy-to-use graphical user interface (GUI) designed to provide intuitive control throughout data processing. The accessibility and simplicity of Eureka-DMA, combined with its set of bioinformatics tools for comprehensive analysis, should appeal to a wide range of scientists: specialists as well as enthusiastic researchers with no prior knowledge of bioinformatics.
General software descriptions, aspects and design
Eureka-DMA was written in the MATLAB programming language and can be run as a GUI in the MATLAB environment or as a standalone Microsoft Windows executable. Many effective microarray analysis algorithms are currently available as MATLAB functions, but their use is restricted to researchers experienced in computer science and familiar with MATLAB's computing environment. To provide a unified platform for extensive interpretation of microarray data, we combined new tools with several existing MATLAB analysis tools, used either as is or with slight modifications to allow proper use with Eureka-DMA's GUI. Eureka-DMA's main GUI enables simple and quick control throughout the processing of the input data through interaction with several sub-GUIs, all designed to operate with limited user input. The main GUI also includes “ToolTip” explanations for the different options, helping less advanced users understand the purpose of each software component. For programmers and bioinformaticians who might wish to edit the code, the script contains detailed explanations of the embedded functions and is designed to be easy to understand and modify. Finally, Eureka-DMA provides functional annotation files for mouse (M. musculus) and human, allowing the integration of a standalone gene description tool that retrieves a gene's summary paragraph from the NCBI Gene database (http://www.ncbi.nlm.nih.gov/gene) into the main GUI and generates description reports about genes of interest. The appropriate species is recognized automatically from the user's data file and does not require user input.
Loading and exporting of data
Raw data must be supplied as Microsoft Excel or text files. These familiar formats spare users from dealing with the multiple, less common file types produced by different microarray manufacturers, and they can easily be obtained by export from the manufacturers' software. When pulling data from microarray databases such as Gene Expression Omnibus (GEO) [4] and ArrayExpress [5], small manual adjustments may be required. The download package provides example data files that can be used either as templates or to experiment with Eureka-DMA's different options. Results can be exported at any time during the analysis in ‘xls’ (Microsoft Excel) or ‘txt’ format for further processing with external tools. Furthermore, the visualization tools enable export to common figure formats.
Normalization of raw data
Microarrays are intended to detect variation in expression levels between samples, which makes normalization an important step preceding any analysis. A number of normalization methods have been designed to address the many sources of non-biological inconsistency between samples. Quantile normalization is one method that is frequently used when interpreting microarray data [6]. At the user's discretion, when a set of arrays is available, this method makes the distribution of intensities the same for each array; the process can be tracked with the box plot visualization tool in Eureka-DMA's main GUI.
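The rank-averaging idea behind quantile normalization can be sketched in a few lines. The Python example below is a minimal illustration only and is not Eureka-DMA's MATLAB code; the function name `quantile_normalize` and the toy intensities are hypothetical, and tied values are handled naively (each value simply receives the reference value of its sorted position).

```python
from statistics import mean


def quantile_normalize(arrays):
    """Quantile-normalize a list of equally long intensity lists (one per array)."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Sort each array, then average across arrays at every rank to obtain
    # the shared reference distribution.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(column) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: two arrays, three probes each (hypothetical values).
arrays = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
]
normalized = quantile_normalize(arrays)
```

After this step every array holds the same set of values, only in a different probe order, which is why box plots of the normalized arrays line up.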
Filtering non-relevant data
Eureka-DMA offers two options for removing transcripts that are expressed below background intensities. 1) User-defined p-value threshold: Raw data include manufacturer-specific probe IDs assigned to internal controls, which are used to calculate detection p-values [7]. When detection p-values are available, the user can set a threshold to filter out transcripts with high detection p-values. 2) User-defined intensity threshold: When detection p-values are not available, the user can estimate and set an intensity threshold to filter out transcripts with low intensities, based on prior knowledge (e.g., genes expressed only in brain tissue will have low intensities when liver tissue is tested) or on inspection of the raw data for the internal negative controls. For both options, a transcript is removed only when none of the samples meets the criterion, which avoids eliminating transcripts that went undetected in only some of the samples.
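The rule that a transcript is dropped only when no sample passes the threshold can be illustrated with a short Python sketch. The function name, the dictionary layout and the inclusive comparison at the threshold are assumptions made for illustration; they are not taken from Eureka-DMA's code.

```python
def filter_transcripts(rows, threshold, mode="pvalue"):
    """Keep transcripts that are detected in at least one sample.

    rows: dict mapping a transcript ID to its per-sample values
          (detection p-values or intensities, depending on mode).
    mode: "pvalue"    -> a sample passes if its p-value is <= threshold
          "intensity" -> a sample passes if its intensity is >= threshold
    """
    if mode not in ("pvalue", "intensity"):
        raise ValueError("mode must be 'pvalue' or 'intensity'")

    def detected(value):
        return value <= threshold if mode == "pvalue" else value >= threshold

    # A transcript is removed only if it fails in every sample.
    return {
        transcript: values
        for transcript, values in rows.items()
        if any(detected(v) for v in values)
    }


# Hypothetical detection p-values for three probes across three samples.
detection_p = {
    "probe_a": [0.001, 0.20, 0.50],  # detected in one sample: kept
    "probe_b": [0.30, 0.40, 0.90],   # detected in no sample: removed
    "probe_c": [0.01, 0.02, 0.03],   # detected in all samples: kept
}
kept = filter_transcripts(detection_p, threshold=0.05, mode="pvalue")
```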
Detecting differential expression among conditions
Where A is the vector 1 through the number of samples and B is the vector of the gene's intensities across samples. Genes that pass the user-defined r threshold are displayed in rank order, determined by the intra-group variation and the differential expression of the gene between the groups. Additionally, Eureka-DMA provides a filtering option based on minimal variation criteria across samples.
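If r is read as a Pearson correlation coefficient between A and B, which is an assumption on our part because the defining formula is not shown here, the calculation for a single gene could look like the following Python sketch. The function names are hypothetical, and whether the threshold applies to r itself or to its absolute value is likewise not specified in the text.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_a * var_b)


def trend_r(intensities):
    """Correlate a gene's intensities (B) with the sample index 1..n (A)."""
    sample_index = list(range(1, len(intensities) + 1))
    return pearson_r(sample_index, intensities)


# Hypothetical gene whose intensity rises steadily across four samples.
r_rising = trend_r([2.0, 4.0, 6.0, 8.0])
# Hypothetical gene whose intensity falls steadily.
r_falling = trend_r([8.0, 6.0, 4.0, 2.0])
```

Under this reading, a gene with a steady increase across ordered samples has r close to 1 and a gene with a steady decrease has r close to -1.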
Pathway enrichment analysis
Mapping genes onto cellular pathways is one of the primary goals in the analysis of transcriptomic data and is very important for inferring the biological mechanisms underlying diverse biological conditions. Eureka-DMA contains a statistical feature that searches the KEGG PATHWAY database for biological and chemical pathways that are significantly over-represented in the user's gene list. With respect to a background set of genes, a hypergeometric probability density test [10] is conducted to check whether the number of differentially expressed genes from the user's data that fall in a pathway is greater than expected by chance. Nominal p-values are assigned, and the user can browse through the enriched pathways and create an illustration of a pathway with the differentially expressed genes highlighted. Note that this analysis can be initiated with any list of genes and does not depend on any previous analysis.
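The hypergeometric calculation behind such an over-representation test can be sketched as follows. This Python example is an illustration under stated assumptions, not Eureka-DMA's implementation: the function and parameter names are hypothetical, and because the text speaks of a "probability density" test, both the point probability P(X = k) and the upper-tail probability P(X ≥ k), which is the form more commonly reported as an enrichment p-value, are shown.

```python
from math import comb


def hypergeom_pmf(background_size, pathway_size, list_size, overlap):
    """P(X = overlap): exactly `overlap` pathway genes in a list of `list_size`
    genes drawn without replacement from `background_size` genes, of which
    `pathway_size` belong to the pathway."""
    return (
        comb(pathway_size, overlap)
        * comb(background_size - pathway_size, list_size - overlap)
        / comb(background_size, list_size)
    )


def hypergeom_enrichment_p(background_size, pathway_size, list_size, overlap):
    """P(X >= overlap): probability of seeing at least `overlap` pathway genes."""
    upper = min(pathway_size, list_size)
    return sum(
        hypergeom_pmf(background_size, pathway_size, list_size, k)
        for k in range(overlap, upper + 1)
    )


# Toy numbers: 10 background genes, 4 of them in the pathway,
# a user list of 3 genes, 2 of which fall in the pathway.
p_point = hypergeom_pmf(10, 4, 3, 2)
p_tail = hypergeom_enrichment_p(10, 4, 3, 2)
```

In the toy case the point probability is 36/120 and the tail probability is 40/120, so the two readings can differ noticeably when the overlap is well below its maximum.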
Functional enrichment of gene ontology
After identifying the differentially expressed genes, one will wish to ascribe some biological meaning to them. Groups of genes with similar properties (i.e., genes that share a cellular component, a biological process, or a molecular function) can be identified with the ‘Gene Ontology’ gene annotation scheme. Eureka-DMA applies the hypergeometric probability density test [10] to pinpoint GO categories that are statistically over-represented in the set of genes defined by the user's previous analyses. Results may be illustrated as hierarchical graphical displays that summarize the significant GO terms. The GO database version can be updated automatically at the user's request.
Clustering, classification and visualization tools
Clustering and classification algorithms applied to high-dimensional expression data have the potential to provide deeper insight into the underlying data structure and are commonly used when a priori knowledge about specific subgroups is lacking. Eureka-DMA supports several widely used clustering and classification algorithms, such as hierarchical and K-means clustering [11–13] as well as probabilistic principal component analysis (PPCA) [14], all of which can be used when the data suffer from missing values [15]. Eureka-DMA also provides the following visualization tools: an interactive volcano plot that shows genes with differential expression; a heat map accompanying the hierarchical clustering analysis; a gene ontology bio-graph showing the terms and their ancestry; an illustration of enriched KEGG pathways; a bar plot; and, finally, box plots that display the differences between all the analyzed entries across all samples or the differences between single entries across the user-defined groups. The visualization tools can be used separately for their intended purpose or in combination for quality control and assessment of the analysis.
Primer design for RT-qPCR validation
Reverse transcription quantitative PCR (RT-qPCR) is often used to validate gene expression measurements from DNA microarray experiments. Eureka-DMA offers a primer design tool that can either import the list of genes from the software's main GUI or be used as a standalone tool without requiring any previous analysis. The primer design GUI offers several options: 1) cDNA sequence data can be imported directly from GenBank (http://www.ncbi.nlm.nih.gov/genbank) or entered manually. 2) The user can apply adjustable filters that control the desired primer length, G and C nucleotide content, and melting temperature. 3) The user can apply additional filters to exclude primers that contain nucleotide repeats and primers that might form secondary structures such as hairpins, dimers and cross-dimers. 4) The length of the desired transcript is controllable. The impact of individual filters can be observed in a two-dimensional plot, which can also reveal problematic regions within the cDNA transcript.
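The length, GC-content, melting-temperature and repeat filters listed above can be illustrated with a small sliding-window sketch in Python. All names and default values here are hypothetical, the melting temperature uses the simple Wallace rule (2 °C per A or T, 4 °C per G or C), which may differ from the formula used by Eureka-DMA, and the secondary-structure checks (hairpins and dimers) are deliberately left out.

```python
def gc_fraction(seq):
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 2 * at + 4 * gc


def has_long_run(seq, max_run):
    """True if any single nucleotide repeats more than max_run times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def candidate_primers(cdna, length, gc_range=(0.4, 0.6),
                      tm_range=(50, 65), max_run=4):
    """Return (start, primer) pairs for every window that passes all filters."""
    cdna = cdna.upper()
    hits = []
    for start in range(len(cdna) - length + 1):
        primer = cdna[start:start + length]
        if not gc_range[0] <= gc_fraction(primer) <= gc_range[1]:
            continue
        if not tm_range[0] <= wallace_tm(primer) <= tm_range[1]:
            continue
        if has_long_run(primer, max_run):
            continue
        hits.append((start, primer))
    return hits


# Toy sequence and an 8-nt window; real primers are typically longer.
hits = candidate_primers("ATGCATGCAA", 8, tm_range=(20, 30))
```

Recording which filter rejects each window, rather than only the surviving windows, would give the kind of per-position view that the two-dimensional plot provides.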
Results and discussion
Eureka-DMA has been tested on a number of datasets to assess the program's capability to deliver meaningful and relevant biological insights into the analyzed data. The results of two such analyses, demonstrating the feasibility of Eureka-DMA, are reported here. The first dataset contains expression data from 83 colorectal cancer (CRC) patients divided into two groups: responders and non-responders to FOLFOX chemotherapy. The second dataset comes from a time series study that assessed global changes in gene expression patterns in the lungs of mice infected with influenza virus over a period of 60 days.
Analysis of colorectal tumors of responders and non-responders to FOLFOX chemotherapy
Analysis of the lung transcriptome in mice infected with influenza A virus
Conclusions
Eureka-DMA aspires to provide a unified and flexible platform for microarray data analysis, interpretation and visualization, and it can also be used as a fast validation tool for results obtained by other analysis methods. It was programmed in MATLAB, exploiting many elements from its various toolboxes while offering friendly integration with other essential features. Eureka-DMA is the only implementation tool to generate reports that include descriptive biological information about genes of interest. It is also the only software for investigation and analysis of DNA microarray data to include a tool for primer design. Through its novel layout, this tool gives quick access to RT-qPCR validation, an inseparable part of the workflow that ultimately leads to the complete analysis of the data. In many other software products the analysis is performed “behind the scenes” and only the final output is provided; this disconnects users from their data, potentially leading to sub-optimal decisions. Eureka-DMA's design, on the other hand, allows users to navigate conveniently through the data and to understand easily how their actions affect the outcome. Thanks to minimal system requirements and a simple interface, these tools can be applied conveniently by a broad range of researchers, including biologists with limited programming or scripting skills. Having tested Eureka-DMA and confirmed its capability to deliver biologically meaningful and reliable results, we hope that it will prove useful to many others in their various research areas. Eureka-DMA's files and a step-by-step manual are freely available at the software website and are also provided as Additional files 3 and 4, respectively.
Availability and requirements
Home page: http://blue-meduza.org/Eureka-DMA
Operating system: Windows, if used as a stand-alone application; platform independent, if used under MATLAB.
Requirements: If used under MATLAB, MATLAB 2010a or newer with the Bioinformatics, Statistics and Image Processing Toolboxes is required. If used as a stand-alone application, the MATLAB Component Runtime (MCR) is required (available to download with the software package).
Other requirements: Internet connection.
License: Free for non-commercial and academic use.
Abbreviations
GUI: Graphical user interface
GEO: Gene expression omnibus
PPCA: Probabilistic principal component analysis
RT-qPCR: Reverse transcription quantitative PCR
MCR: MATLAB component runtime.
Acknowledgements
I would like to thank Professor Karl Skorecki (Rappaport Faculty of Medicine and Research Institute, Technion-Israel Institute of Technology, Rambam Medical Center, Haifa, Israel) for critical reading of the manuscript, and the Ed Satell foundation for its support during my graduate research.
References
1. Yoo SM, Choi JH, Lee SY, Yoo NC: Applications of DNA microarray in disease diagnostics. J Microbiol Biotechnol. 2009, 19 (7): 635-646.
2. Zarbl H: DNA microarrays: an overview of technologies and applications to toxicology. Curr Protoc Toxicol. 2001, (Supplement 9): 1.4.2-1.4.16.
3. Sadi AM, Wang DY, Youngson BJ, Miller N, Boerner S, Done SJ, Leong WL: Clinical relevance of DNA microarray analyses using archival formalin-fixed paraffin-embedded breast cancer specimens. BMC Cancer. 2011, 11: 253-10.1186/1471-2407-11-253.
4. Barrett T, Edgar R: Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol. 2006, 411: 352-369.
5. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, et al: ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 2003, 31 (1): 68-71. 10.1093/nar/gkg091.
6. Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19 (2): 185-193. 10.1093/bioinformatics/19.2.185.
7. Archer KJ, Reese SE: Detection call algorithms for high-throughput gene expression microarray data. Brief Bioinform. 2010, 11 (2): 244-252. 10.1093/bib/bbp055.
8. Kohane IS, Kho AT, Butte AJ: Microarrays for an integrative genomics. 2003, Cambridge, MA: MIT Press, ISBN: 026211271X.
9. Cui X, Churchill GA: Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 2003, 4 (4): 210-10.1186/gb-2003-4-4-210.
10. Mood AM, Graybill FA, Boes DC: Introduction to the theory of statistics. Third Edition. 1974, New York: McGraw-Hill, ISBN-13: 9780070854659.
11. Bar-Joseph Z, Gifford DK, Jaakkola TS: Fast optimal leaf ordering for hierarchical clustering. Bioinformatics. 2001, 17 (Suppl 1): S22-29. 10.1093/bioinformatics/17.suppl_1.S22.
12. Seber GAF: Multivariate observations. 1984, New York: John Wiley & Sons, Inc, ISBN 10: 047188104X.
13. Spath H: Cluster dissection and analysis: theory, FORTRAN programs, examples. Translated by J. Goldschmidt. 1985, New York: Halsted Press.
14. Tipping ME, Bishop CM: Probabilistic principal component analysis. J R Stat Soc Ser B (Stat Methodol). 1999, 61 (3): 611-622. 10.1111/1467-9868.00196.
15. Ilin A, Raiko T: Practical approaches to principal component analysis in the presence of missing values. J Mach Learn Res. 2010, 11: 1957-2000.
16. Tsuji S, Midorikawa Y, Takahashi T, Yagi K, Takayama T, Yoshida K, Sugiyama Y, Aburatani H: Potential responders to FOLFOX therapy for colorectal cancer by random forests analysis. Br J Cancer. 2012, 106 (1): 126-132. 10.1038/bjc.2011.505.
17. Breiman L: Random forests. Mach Learn. 2001, 45: 5-32. 10.1023/A:1010933404324.
18. Rodriguez-Antona C, Ingelman-Sundberg M: Cytochrome P450 pharmacogenetics and cancer. Oncogene. 2006, 25 (11): 1679-1691. 10.1038/sj.onc.1209377.
19. Krueger SK, Williams DE: Mammalian flavin-containing monooxygenases: structure/function, genetic polymorphisms and role in drug metabolism. Pharmacol Ther. 2005, 106 (3): 357-387. 10.1016/j.pharmthera.2005.01.001.
20. Hyung SW, Lee MY, Yu JH, Shin B, Jung HJ, Park JM, Han W, Lee KM, Moon HG, Zhang H, et al: A serum protein profile predictive of the resistance to neoadjuvant chemotherapy in advanced breast cancers. Mol Cell Proteomics. 2011, 10: M111.011023-10.1074/mcp.M111.011023.
21. Han Y, Huang H, Xiao Z, Zhang W, Cao Y, Qu L, Shou C: Integrated analysis of gene expression profiles associated with response of platinum/paclitaxel-based treatment in epithelial ovarian cancer. PLoS One. 2012, 7 (12): e52745-10.1371/journal.pone.0052745.
22. Huang H, Li Y, Liu J, Zheng M, Feng Y, Hu K, Huang Y, Huang Q: Screening and identification of biomarkers in ascites related to intrinsic chemoresistance of serous epithelial ovarian cancers. PLoS One. 2012, 7 (12): e51256-10.1371/journal.pone.0051256.
23. Sherman-Baust CA, Becker KG, Wood WH III, Zhang Y, Morin PJ: Gene expression and pathway analysis of ovarian cancer cells selected for resistance to cisplatin, paclitaxel, or doxorubicin. J Ovarian Res. 2011, 4 (1): 21-10.1186/1757-2215-4-21.
24. Krupp M, Maass T, Marquardt JU, Staib F, Bauer T, Konig R, Biesterfeld S, Galle PR, Tresch A, Teufel A: The functional cancer map: a systems-level synopsis of genetic deregulation in cancer. BMC Med Genomics. 2011, 4: 53-10.1186/1755-8794-4-53.
25. Kubisch R, Meissner L, Krebs S, Blum H, Gunther M, Roidl A, Wagner E: A comprehensive gene expression analysis of resistance formation upon metronomic cyclophosphamide therapy. Transl Oncol. 2013, 6 (1): 1-9.
26. Pommerenke C, Wilk E, Srivastava B, Schulze A, Novoselova N, Geffers R, Schughart K: Global transcriptome analysis in influenza-infected mouse lungs reveals the kinetics of innate and adaptive host immune responses. PLoS One. 2012, 7 (7): e41169-10.1371/journal.pone.0041169.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.