Construct and Compare Gene Coexpression Networks with DAPfinder and DAPview
© Skinner et al; licensee BioMed Central Ltd. 2011
Received: 7 March 2011
Accepted: 14 July 2011
Published: 14 July 2011
DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes.
Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools.
DAPfinder is a new user-friendly tool for reconstruction and comparison of biological networks.
Microarray researchers need easy-to-use tools to identify differences in the coexpression and coregulation of genes between phenotypes that cannot be identified with traditional tools. Often researchers compute Student's t-tests, analysis of variance (ANOVA), significance analysis of microarrays [1] or empirical Bayes analysis [2] for each gene on their microarray to identify individual differentially expressed genes (DEGs) among two or more phenotypes [3]. Unfortunately, these approaches ignore coexpression because they cannot account for the complex multivariate relationships among genes. Multivariate statistical methods like hierarchical clustering and principal components analysis (PCA) are often used for quality control and exploration of microarray data. However, these multivariate methods do not effectively model coexpression, nor do they allow for hypothesis tests to compare phenotypes. Gene-gene association networks built using ARACNe [4], context likelihood of relatedness (CLR) [5], maximum relevancy (MR) [6, 7] and other methods often provide helpful models of coexpression and coregulation, but the networks are based on data from a single phenotype and are not easily compared using statistical tests. New methods are needed to account for the complex relationships among genes while providing hypothesis tests to compare phenotypes.
Several research groups have addressed the question of comparing the coexpression of specific gene-gene pairs or coexpression networks among two or more phenotypes. Two early examples used search algorithms to identify optimally sized clusters of coexpressed genes and resampling tests to identify significant differences among the coexpressed clusters between phenotypes [8, 9]. Other published methods used variations on familiar statistical techniques like Fisher's Z-tests or modified F-statistics to directly compare pairwise gene-gene correlations between two phenotypes [10–12]. Some of these methods [10, 11, 13] are readily available as source scripts or package libraries in R (http://www.r-project.org). Some interesting approaches apply the results from statistical tests that compare pairwise gene-gene associations between two phenotypes to the construction and interpretation of gene coexpression networks [10, 14]. Both of these methods allow researchers to explore the complex differences among gene expression networks using statistical tests, but unfortunately neither method has been implemented in a user-friendly tool.
DAPfinder and DAPview are plug-ins for BRB-ArrayTools (http://linus.nci.nih.gov/BRB-ArrayTools.html) that provide researchers with accessible tools to test differences in coexpression between two phenotypes and explore those results on gene association networks. BRB-ArrayTools is a comprehensive microarray analysis package that does not require specific skills in programming or direct script usage. It is available for free to non-commercial users and has more than 11,000 users in 65 countries [15]. Our DAPfinder and DAPview tools identify and visualize individual significant differences in gene-gene association between the two classes, each of which we call a Differentially Associated Pair (DAP). Output from these tools can be used to construct gene-gene association networks and identify the significant differences in coexpression between two groups. Our hope is that these tools can be used to identify systems-level features in the gene-gene association networks like network growth or decay, network merging or splitting, and network birth or death, reflecting functional changes in biological pathways.
DAPfinder is used to compute pairwise gene-gene associations (i.e. gene-gene correlations) for two groups of microarray experiments, then compare each specific gene-gene association between the two groups with a statistical test (Additional file 1, Figure S1). Gene-gene associations can be estimated using Pearson correlation coefficients, Spearman rank correlation coefficients, Kendall rank correlation coefficients or mutual information. Pearson correlations are the most familiar metric and the easiest to compute, but only the Spearman, Kendall and mutual information metrics are appropriate for nonlinear associations between genes. Significant Pearson correlations within each class are identified using a one-sample Fisher's Z-test. Differences in gene-gene correlations (i.e. Pearson, Spearman and Kendall) are automatically tested using Fisher's Z-test methods, while optional permutation tests are used to compare differences in gene-gene correlation or mutual information. P-values from the Fisher's Z-test methods are approximate p-values that assume large sample sizes; permutation tests make no assumption about sample size, but they require lengthy computation times. Permutation test calculations can be hastened by choosing one of four gene-gene pair subset selection methods (Additional file 1, Figure S1). Tests can be computed with equal numbers of permutations for each gene-gene pair or with an adaptive method that identifies the minimum number of permutations required for each gene-gene pair. Fisher's Z-tests of individual Pearson correlations within each class or differences in correlation between the two classes can be corrected for multiple testing using false discovery rate (FDR) methods [16, 17], q-value methods [18–20] or Bonferroni family-wise error rate (FWER) methods using step-up adjusted p-values [21]. The same multiple testing adjustments can be applied to the optional permutation tests.
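The two-sample Fisher's Z-test and an FDR adjustment of the kind described above can be sketched in a few lines. This is an illustration only, not the DAPfinder source (which is written in R): the function names `fisher_z_difference` and `benjamini_hochberg` are hypothetical, the test uses the standard large-sample normal approximation, and the adjustment shown is the Benjamini-Hochberg step-up procedure, which is one of several FDR options the tool offers.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Approximate two-sided test for a difference between two correlations.

    r1, r2 are correlation coefficients strictly between -1 and 1;
    n1, n2 are the sample sizes of the two groups (each must exceed 3).
    Returns the Z statistic and its two-sided normal p-value.
    """
    z1 = math.atanh(r1)  # Fisher's z-transformation of each correlation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative inputs only (not results from the paper).
z_stat, p_value = fisher_z_difference(0.8, 50, 0.0, 50)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because the p-value relies on a normal approximation, it is most trustworthy when both groups are reasonably large, which is the same caveat the text gives for the Fisher's Z-test methods.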
Researchers can pre-filter individual genes by the coefficient of variation (CV) of their gene expression, by a minimum sample size criterion (after outliers and missing data have been removed) or using the internal methods of BRB-ArrayTools. Researchers can also upload a specific list of gene-gene pairs for testing. Outliers among the individual expression values from each gene can be removed using univariate standard deviation or interquartile range (IQR) criteria.
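A minimal sketch of this kind of pre-filtering is shown below. It is not the DAPfinder implementation: the function names, the 1.5 × IQR fence, and the `min_cv` and `min_n` thresholds are assumptions chosen for illustration, and missing values are represented here as `None`.

```python
import statistics


def remove_iqr_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed fence."""
    if len(values) < 4:
        return list(values)  # too few points to estimate quartiles sensibly
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")
    return statistics.stdev(values) / abs(mean)


def prefilter(expression, min_cv=0.1, min_n=5):
    """Keep genes that pass the sample-size and CV thresholds.

    expression maps a gene ID to a list of expression values (None = missing).
    The sample-size check is applied after missing values and outliers
    have been removed, as described in the text.
    """
    kept = {}
    for gene, values in expression.items():
        observed = [v for v in values if v is not None]
        cleaned = remove_iqr_outliers(observed)
        if len(cleaned) >= min_n and coefficient_of_variation(cleaned) >= min_cv:
            kept[gene] = cleaned
    return kept


# Made-up expression values for three hypothetical genes.
example = {
    "flat": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "variable": [1.0, 2.0, 3.0, 4.0, 5.0, 100.0],
    "sparse": [1.0, None, 3.0],
}
filtered = prefilter(example)
```

In this made-up example only the `variable` gene survives: `flat` has zero CV and `sparse` has too few observed values.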
Output from DAPfinder includes a hyper-text markup language (HTML) report and comprehensive output stored as an Excel spreadsheet or tab-delimited text file. The HTML report opens automatically in a web browser to display the current user settings and diagnostics from the analyses. Reported user settings include choices of pre-filtering methods, association metrics and statistical tests, plus the directory location of the results. Diagnostics include the amount of missing data, the number of genes and gene-gene pairs used in the calculations and the computation time required. Optionally, the 10 most significant results from the Fisher's Z-tests and permutation tests can be added to the HTML report. The comprehensive output includes the unique IDs and related annotations for both genes in each gene-gene pair, the individual gene-gene associations for each of the two groups with test statistics and p-values reported for the Pearson correlations in each group, the Fisher's Z-test statistics and p-values for comparisons between the two groups and finally the differences in association and permutation p-values between the two groups (if requested). These results can be sorted and reorganized in Excel to identify the most significant gene-gene associations in a single group, the most significant Fisher's Z-test results, etc. Results from the comprehensive output file can be directly imported into visualization software packages like Cytoscape [22, http://www.cytoscape.org] to create network graphs using the two columns of unique IDs to define nodes and the columns of correlation coefficients or p-values to define edge weights. Both the HTML report and the comprehensive output are automatically saved to the user's BRB-ArrayTools project folder.
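To illustrate the shape of a table that network software can read as an edge list, the sketch below writes a tab-delimited file with two ID columns defining the nodes and numeric columns available as edge weights. The column names, the `write_edge_table` helper and the 0.05 cut-off are assumptions for illustration; they are not the actual DAPfinder output layout.

```python
import csv
import io


def write_edge_table(rows, handle, alpha=0.05):
    """Write significant gene-gene pairs as a tab-delimited edge list.

    rows is an iterable of dicts with hypothetical keys: gene1, gene2,
    r_group1, r_group2 and p_difference. Returns the number of edges written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "r_group1", "r_group2", "p_difference"])
    written = 0
    for row in rows:
        if row["p_difference"] <= alpha:
            writer.writerow([
                row["gene1"], row["gene2"],
                row["r_group1"], row["r_group2"], row["p_difference"],
            ])
            written += 1
    return written


# Made-up rows with placeholder gene IDs, for illustration only.
rows = [
    {"gene1": "geneA", "gene2": "geneB",
     "r_group1": 0.7, "r_group2": 0.1, "p_difference": 0.001},
    {"gene1": "geneC", "gene2": "geneD",
     "r_group1": 0.2, "r_group2": 0.3, "p_difference": 0.8},
]
buffer = io.StringIO()
n_edges = write_edge_table(rows, buffer)
```

Writing to any file-like handle keeps the sketch testable in memory; in practice the handle would be a file opened with `newline=""`.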
Evaluation of DAPfinder with Simulated Microarray Data
The efficacy of the DAPfinder procedures was evaluated using simulated microarray data with known gene-gene correlations to ensure its statistical methods can detect known differences in gene-gene association with high levels of statistical power and low levels of false positives. See the supplementary materials (Additional file 1) for details on the generation of simulated microarray data and other simulation methods. Simulation results were used to create receiver-operator characteristic (ROC) curves that explore the relationships between statistical power, sample size and effect strength under several different simulation conditions. Other simulations examined the relationship between approximate p-values from the Fisher's Z-tests and exact p-values from the permutation tests. Simulations were conducted entirely in R using the same R source code used to build DAPfinder.
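The idea of simulating gene pairs with a known correlation and testing the difference by permutation can be sketched as follows. This is not the simulation code from Additional file 1: the bivariate normal generator, the function names and the label-shuffling scheme are assumptions, and the p-value is estimated from a random sample of permutations rather than a full enumeration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_pair(n, rho, rng):
    """Draw n (x, y) samples from a standard bivariate normal with correlation rho."""
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        samples.append((x, y))
    return samples


def permutation_test(group1, group2, n_perm, rng):
    """Permutation p-value for a difference in correlation between two groups.

    Samples (arrays) are reassigned to groups at random; each sample keeps
    its own (x, y) pair, so only the class labels are permuted.
    """
    def correlation_difference(g1, g2):
        return pearson(*zip(*g1)) - pearson(*zip(*g2))

    observed = abs(correlation_difference(group1, group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(correlation_difference(pooled[:n1], pooled[n1:])) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_perm + 1)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
group1 = simulate_pair(40, 0.9, rng)   # strongly positive correlation
group2 = simulate_pair(40, -0.9, rng)  # strongly negative correlation
p_perm = permutation_test(group1, group2, 199, rng)
```

Repeating such a simulation over many gene pairs, sample sizes and effect strengths is one way to tabulate true and false positive rates of the kind summarized by ROC curves.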
Discoveries from Glioma Data
We noticed three features in the network that were not necessarily expected. First, more than half of the genes from this network were differentially expressed between the two classes of glioma (46 out of 76 genes). This suggests there may be a general correlation between differential expression and differences in association between phenotypes. Second, the relationship between differential expression and direction of correlation from consistent edges may represent potential regulatory relationships among genes. Positive correlations occur whenever both genes are up- or down-regulated, while negative correlations occur whenever one gene is up-regulated and the other is down-regulated. Note that because the correlations are estimated within the same type of samples, either ODG or GBM, the fact that genes are up- or down-regulated in GBM relative to ODG should not influence the correlation results. This phenomenon is seen in all 48 correlations that are consistent between the ODG and GBM tumors. Third, the significant differences in gene-gene association seem to reflect the biological differences between ODG and GBM. Correlations that change direction between glioma types typically show strong positive or negative correlations consistent with regulation in ODG, while having zero correlation in GBM. This suggests that evolution of the tumor may lead to the loss of regulatory relationships in the de-differentiating tissue. The gene-gene association network shrinks from 76 genes and 110 gene-gene pairs in ODG to 69 genes and 87 gene-gene pairs in GBM, suggesting systems-level network shrinkage from ODG to GBM resulting in loss of regulatory functions.
Among the significant correlation changes in the network, we find three genes (MYT1L, EGFR, POSTN) known to have meaningful roles in glioma pathogenesis [27–29]. Myelin transcription factor 1-like (MYT1L) is upregulated in the less malignant ODG tumors and is a major factor necessary for neuronal differentiation [30]. The significant difference in Pearson correlation between SOX5 and MYT1L in ODG and GBM tumors is visualized with DAPview (Figure 1). Epidermal growth factor receptor (EGFR) is a well-known member of the erbB family of receptors that is involved in regulation of cell proliferation and differentiation. Deregulation of EGFR was shown to have a critical role in gliomas [31] as well as in several other malignancies [32–36]. Up-regulation of the protein-coding gene POSTN (periostin) is correlated with metastasis in both melanoma and breast cancer [37]. Although this analysis does not allow for definitive biological conclusions, it both recovers previously established genes essential for tumorigenesis and points to a new, previously unexplored area of transcriptional regulation in gliomas. These results support the idea that estimating not only the structure but also the changes in gene coexpression networks can be a useful approach for understanding the disease process.
Analyses of empirical and simulated microarray data have shown that DAPfinder is a powerful tool to reconstruct and compare gene regulatory networks. Its design is not restricted to gene expression data from single channel and dual channel microarray experiments. The tool can also be used with expression data from RNA-Seq reads or it can analyze complex quantitative biological data like comparative genomic hybridization (CGH), metabolome, microbiome and proteome data. DAPfinder can also be used to compute gene-gene associations and construct gene coexpression networks, even when there is not a second phenotype for comparisons of gene-gene associations and networks. DAPfinder can be used within BRB-ArrayTools by biologists without specific skills in programming and/or direct script usage. Indeed, we have recently employed the tool in the meta-analysis of cervical cancer gene expression and comparative genomic hybridization data revealing critical events of tumor progression (Mine KL, Shulzhenko N, Yambartsev A, et al.: Reconstruction of an integrative gene regulatory meta-network reveals cell cycle and antiviral response as major drivers of cervical cancer, submitted). Future versions may extend the utility of the statistical tests and graphs to problems with 3 or more phenotypes, while alternative gene-gene association metrics and statistical tests can also be explored to ensure proper network construction.
Availability and requirements
DAPfinder and DAPview may be downloaded for free from the NIAID Exon website (http://exon.niaid.nih.gov/dapfinder/index.html). Complete installation instructions are provided on the website. DAPfinder and DAPview require the installation of BRB-ArrayTools. BRB-ArrayTools currently requires the installation of Microsoft Excel, Java Virtual Machine, R 2.12.0 or higher and statconnDCOM on a computer using the Microsoft Windows operating system. DAPfinder and DAPview are BRB-ArrayTools plug-ins, which mostly utilize open source R script files. A complete description of the DAPfinder and DAPview files can be found in our supplementary materials (Additional file 1). DAPfinder and DAPview are also available to download as Additional Files 2 and 3.
List of abbreviations
ANOVA: Analysis of Variance
ARACNe: Algorithm for the Reconstruction of Accurate Cellular Networks
AUC: Area Under Curve
CGH: Comparative Genomic Hybridization
CLR: Context Likelihood of Relatedness
CV: Coefficient of Variation
DAP: Differentially Associated Pair
DEG: Differentially Expressed Gene
FDR: False Discovery Rate
FWER: Family-Wise Error Rate
GMDI: Glioma Molecular Diagnostics Initiative
HTML: Hyper Text Markup Language
MR: Maximum Relatedness or Minimum Redundancy
PCA: Principal Components Analysis
PDF: Portable Document Format
REMBRANDT: Repository of Molecular Brain Neoplasia Data
ROC: Receiver Operator Characteristic
Acknowledgements
This work was supported by funding from the Department of Intramural Research (DIR) and the Office of the Director (OD) at NIAID, NIH. Vivek Gopalan, Supriya Menezes and Ming-Chung Li helped with the initial coding of DAPfinder. Vijay Nagarajan and Michael Dolan helped produce and edit the figures. Natalia Shulzhenko discussed biological questions motivating our tool.
1. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. P Natl Acad Sci USA 2001, 98: 5116–5121. doi:10.1073/pnas.091062498
2. Efron B, Tibshirani R, Storey JD, Tusher V: Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc 2001, 96: 1151–1160. doi:10.1198/016214501753382129
3. Pan W: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics 2002, 18: 546–554. doi:10.1093/bioinformatics/18.4.546
4. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 2006, 7(Suppl 1):S7. doi:10.1186/1471-2105-7-S1-S7
5. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007, 5: e8. doi:10.1371/journal.pbio.0050008
6. Ding C, Peng H: Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol 2005, 3: 185–205. doi:10.1142/S0219720005001004
7. Peng H, Long F, Ding C: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 2005, 27: 1226–1238.
8. Kostka D, Spang R: Finding disease specific alterations in the co-expression of genes. Bioinformatics 2004, 20(Suppl 1):i194–199. doi:10.1093/bioinformatics/bth909
9. Xiao YH, Frisina R, Gordon A, Klebanov L, Yakovlev A: Multivariate search for differentially expressed gene combinations. BMC Bioinformatics 2004, 5(20). doi:10.1186/1471-2105-10-20
10. Choi JK, Yu US, Yoo OJ, Kim S: Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics 2005, 21: 4348–4355. doi:10.1093/bioinformatics/bti722
11. Dettling M, Gabrielson E, Giovanni P: Searching for differentially expressed gene combinations. Genome Biol 2005, 6(164).
12. Lai Y, Wu B, Chen L, Zhao H: A statistical method for identifying differential gene-gene co-expression patterns. Bioinformatics 2004, 20: 3146–3155. doi:10.1093/bioinformatics/bth379
13. Watson M: CoXpress: differential co-expression in gene expression data. BMC Bioinformatics 2006, 7(509).
14. Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, Califano A: A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol Syst Biol 2008, 4: 169.
15. Simon R, Lam A, Li MC, Ngan M, Menenzes S, Zhao Y: Analysis of Gene Expression Data Using BRB-Array Tools. Cancer Inform 2007, 3: 11–17.
16. Benjamini Y, Hochberg Y: Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met 1995, 57(1):289–300. doi:10.2307/2346101
17. Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency. Ann Stat 2001, 29: 1165–1188. doi:10.1214/aos/1013699998
18. Storey JD: A direct approach to false discovery rates. J Roy Stat Soc B 2002, 64: 479–498. doi:10.1111/1467-9868.00346
19. Storey JD: The positive false discovery rate: A Bayesian interpretation and the q-value. Ann Stat 2003, 31: 2013–2035. doi:10.1214/aos/1074290335
20. Storey JD, Tibshirani R: Statistical significance for genomewide studies. P Natl Acad Sci USA 2003, 100: 9440–9445. doi:10.1073/pnas.1530509100
21. Wright SP: Adjusted P-Values for Simultaneous Inference. Biometrics 1992, 48: 1005–1013. doi:10.2307/2532694
22. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T, Bader GD: Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2007, 2: 2366–2382. doi:10.1038/nprot.2007.324
23. Madhavan S, Zenklusen JC, Kotliarov Y, Sahni H, Fine HA, Buetow K: Rembrandt: helping personalized medicine become a reality through integrative translational research. Mol Cancer Res 2009, 7: 157–167. doi:10.1158/1541-7786.MCR-08-0435
24. Behin A, Hoang-Xuan K, Carpentier AF, Delattre JY: Primary brain tumours in adults. Lancet 2003, 361: 323–331. doi:10.1016/S0140-6736(03)12328-8
25. Sun LX, Hui AM, Su Q, Vortmeyer A, Kotliarov Y, Pastorino S, Passaniti A, Menon J, Walling J, Bailey R, Rosenblum M, Mikkelsen T, Fine HA: Neuronal and glioma-derived stem cell factor induces angiogenesis within the brain. Cancer Cell 2006, 9: 287–300. doi:10.1016/j.ccr.2006.03.003
26. Li A, Walling J, Ahn S, Kotliarov Y, Su Q, Quezado M, Oberholtzer JC, Park J, Zenklusen JC, Fine HA: Unsupervised analysis of transcriptomic profiles reveals six glioma subtypes. Cancer Res 2009, 69: 2091–2099.
27. Ducray F, Idbaih A, de Reynies A, Bieche I, Thillet J, Mokhtari K, Lair S, Marie Y, Paris S, Vidaud M, Hoang-Xuan K, Delattre O, Delattre JY, Sanson M: Anaplastic oligodendrogliomas with 1p19q codeletion have a proneural gene expression profile. Mol Cancer 2008, 7: 41. doi:10.1186/1476-4598-7-41
28. Mukasa A, Ueki K, Ge X, Ishikawa S, Ide T, Fujimaki T, Nishikawa R, Asai A, Kirino T, Aburatani H: Selective expression of a subset of neuronal genes in oligodendroglioma with chromosome 1p loss. Brain Pathol 2004, 14: 34–42.
29. Wong AJ, Bigner SH, Bigner DD, Kinzler KW, Hamilton SR, Vogelstein B: Increased Expression of the Epidermal Growth-Factor Receptor Gene in Malignant Gliomas Is Invariably Associated with Gene Amplification. P Natl Acad Sci USA 1987, 84: 6899–6903. doi:10.1073/pnas.84.19.6899
30. Vierbuchen T, Ostermeier A, Pang ZP, Kokubu Y, Sudhof TC, Wernig M: Direct conversion of fibroblasts to functional neurons by defined factors. Nature 2010, 463: 1035-U1050. doi:10.1038/nature08797
31. Aguirre A, Rubio ME, Gallo V: Notch and EGFR pathway interaction regulates neural stem cell number and self-renewal. Nature 2010, 467: 323–327. doi:10.1038/nature09347
32. Huang PH, Xu AM, White FM: Oncogenic EGFR signaling networks in glioma. Sci Signal 2009, 2: re6. doi:10.1126/scisignal.287re6
33. Libermann TA, Nusbaum HR, Razon N, Kris R, Lax I, Soreq H, Whittle N, Waterfield MD, Ullrich A, Schlessinger J: Amplification, enhanced expression and possible rearrangement of EGF receptor gene in primary human brain tumours of glial origin. Nature 1985, 313: 144–147. doi:10.1038/313144a0
34. Sainsbury JR, Farndon JR, Needham GK, Malcolm AJ, Harris AL: Epidermal-growth-factor receptor status as predictor of early recurrence of and death from breast cancer. Lancet 1987, 1: 1398–1402.
35. Veale D, Ashcroft T, Marsh C, Gibson GJ, Harris AL: Epidermal growth factor receptors in non-small cell lung cancer. Br J Cancer 1987, 55: 513–516. doi:10.1038/bjc.1987.104
36. Yano S, Kondo K, Yamaguchi M, Richmond G, Hutchison M, Wakeling A, Averbuch S, Wadsworth P: Distribution and function of EGFR in human tissue and the effect of EGFR tyrosine kinase inhibition. Anticancer Res 2003, 23: 3639–3650.
37. Soikkeli J, Podlasz P, Yin M, Nummela P, Jahkola T, Virolainen S, Krogerus L, Heikkila P, von Smitten K, Saksela O, Holtta E: Metastatic outgrowth encompasses COL-I, FN1, and POSTN up-regulation and assembly to fibrillar networks regulating cell adhesion, migration, and growth. Am J Pathol 2010, 177: 387–403. doi:10.2353/ajpath.2010.090748
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.