Mulcom: a multiple comparison statistical test for microarray data in Bioconductor
 Claudio Isella^{1, 2}Email author,
 Tommaso Renzulli^{1, 2},
 Davide Corà^{3, 4} and
 Enzo Medico^{1, 2}Email author
DOI: 10.1186/1471210512382
© Isella et al; licensee BioMed Central Ltd. 2011
Received: 26 January 2011
Accepted: 28 September 2011
Published: 28 September 2011
Abstract
Background
Many microarray experiments search for genes with differential expression between a common "reference" group and multiple "test" groups. In such cases currently employed statistical approaches based on ttests or close derivatives have limited efficacy, mainly because estimation of the standard error is done on only two groups at a time. Alternative approaches based on ANOVA correctly capture withingroup variance from all the groups, but then do not confront single test groups with the reference. Ideally, a ttest better suited for this type of data would compare each test group with the reference, but use withingroup variance calculated from all the groups.
Results
We implemented an RBioconductor package named Mulcom, with a statistical test derived from the Dunnett's ttest, designed to compare multiple test groups individually against a common reference. Interestingly, the Dunnett's test uses for the denominator of each comparison a withingroup standard error aggregated from all the experimental groups. In addition to the basic Dunnett's t value, the package includes an optional minimal foldchange threshold, m. Due to the automated, permutationbased estimation of False Discovery Rate (FDR), the package also permits fast optimization of the test, to obtain the maximum number of significant genes at a given FDR value. When applied to a timecourse experiment profiled in parallel on two microarray platforms, and compared with two commonly used tests, Mulcom displayed better concordance of significant genes in the two array platforms (39% vs. 26% or 15%), and higher enrichment in functional annotation to categories related to the biology of the experiment (p value < 0.001 in 4 categories vs. 3).
Conclusions
The Mulcom package provides a powerful tool for the identification of differentially expressed genes when several experimental conditions are compared against a common reference. The results of the practical example presented here show that lists of differentially expressed genes generated by Mulcom are particularly consistent across microarray platforms and enriched in genes belonging to functionally significant groups.
Background
A frequent approach to analyse gene expression data involves the use of ttests, or their derivatives, to identify lists of genes with differential expression between two experimental groups [1]. Indeed, several microarray expression datasets encompass multiple experimental points to be compared with a common reference point such as timecourse designs or multiple different treatments versus a control condition. The analysis is then aimed at assessing for each gene in which experimental group the expression is significantly different from the control group.
A frequently chosen approach is to run a ttest for each comparison. However, when applied to this type of data, the ttest has two main problems: (i) it does not correct the result of each comparison for the total number of comparisons made and (ii) information about experimental variability (the standard error) is extracted only from the two groups actually compared. Consequently, in the instance of limited replicates, inaccurate estimation of standard error leads to high type I and type II errors in the analysis. For these two reasons, simple remedies like Bonferroni or other types of multiple testing correction of the threshold tvalue may avoid excessive false positives only at the cost of a significant reduction of the power. Alternatively, limitations of the ttest in this context have been addressed by implementing Bayesian modeling of the error [2] or by implementing sample permutationbased estimation of False Discovery Rate (FDR), like in the SAM approach [3, 4]. In particular, SAM compares the number of null hypothesis rejections in the dataset organized in the subgroups of interest against the median number of rejections on a series of randomly generated subgroups. These approaches however do not benefit, within each comparison, from information on withingroup variability available in the additional experimental groups.
As an alternative, ANOVAbased methods accumulate withingroup variability from all the groups. However, this strategy does not permit a pairwise comparison of each test group with the reference group [5]. Therefore, if a gene is differentially expressed in only one group versus the reference, this difference is diluted in the betweengroup variance calculated from all the groups.
The ideal approach would therefore be to estimate withingroup variance from all the groups and then to perform single pairwise comparisons. Towards this end, we designed the Mulcom test, a derivative of the Dunnett's ttest [6] adapted to microarray data analysis. The test, implemented as an RBioconductor package [7], includes an optional tuneable foldchange threshold (m) and Monte Carlo simulation performing sample permutations to assess FDR in each comparison. We also implemented a streamlined procedure for automated optimization of test parameters, to maximize the number of significant genes at a given FDR.
In the present work we provide a detailed description of the Mulcom algorithm and the results of comparative analyses between Mulcom and other widely used Bioconductor packages (SAM and Limma). Comparative analyses were run on a microarray dataset obtained on two different array platforms from the same set of samples.
Implementation
The Mulcom test was implemented using the statistical programming language R [8] with some functions wrapped from C++ to improve the performance of the script. The package is included in the opensource Bioconductor project [7].
The Mulcom package is designed to analyse ExpressionSet objects from the "Biobase" package as well as standard numeric matrices from the R environment. The Mulcom algorithm is based on the Dunnett's ttest [6], which estimates the withingroup variability across all the different groups to be compared with the common reference.
The Mulcom analysis takes place according to the following steps:
where s^{2} = square error, for each group i, including the reference group and a = degrees of freedom
Where:
FC = fold change (difference), as calculated in (1)
m = minimal difference threshold (optional)
Nh = harmonic mean of sample replicates for the two conditions tested
t = tvalue of the test
MSE_{ wg } = mean square error within group, as calculated in (2)
To estimate the False Discovery Rate (FDR), steps (1) to (3) are repeated after random sample permutation for n times, to generate a distribution of the number of positive hits from n randomly assembled sample groups.
For each experimenttoreference comparison, the median number of positives in permuted sample groups is calculated, and FDR is estimated as $FDR=\frac{\overline{MRP}}{EP}$
Where:
MRP = Median Random Positives, i.e. the median number of null hypothesis rejections by the Mulcom test in all random sample permutations.
EP = Experimental Positives.
Results and Discussion
A previous spreadsheetbased implementation of the Dunnett's ttest was successfully applied to gene expression studies comparing multiple independent points against a common reference [9, 10]. To evaluate the performance of the Mulcom test implemented as a Bioconductor package, we generated and analyzed transcriptomics data on a set of 10 RNA samples profiled with two independent microarray platforms (Affymetrix hgu133a and Illumina RS8 Human Beadchip). This enabled crossplatform concordance analysis of the results. The experiment explored gene expression changes induced in MDAMB435 human melanoma cells by 1, 6 or 24 hours of stimulation with Hepatocyte Growth Factor (HGF), known to trigger proliferation, motility and invasion [11]. The same cells were also transduced with IntegrinBeta4 (ITGB4) to stably upregulate its expression. Therefore the dataset encompassed both a timecourse experiment and one positive control condition, each repeated to generate biological duplicates. Data were normalized and filtered for significant detection as described in the Preprocessing section.
Validation across microarray platforms of Mulcom, Limma and SAM tests
Mulcom  Limma  SAM  

HGF 1 h  Significant genes in Affy  867  672  723 
Validated in Illumina  150  0  48  
HGF 6 h  Significant genes in Affy  681  518  561 
Validated in Illumina  317  237  100  
HGF 24 h  Significant genes in Affy  4  0  82 
Validated in Illumina  1  0  1  
ITGB4 +/  Significant genes in Affy  26  6  75 
Validated in Illumina  1  1  0  
Total  Significant genes in Affy  1249  956  1006 
Validated in Illumina  487  246  151  
True positive rate  39%  26%  15% 
To extend the significance of the results we performed the same comparison between Mulcom, Limma and SAM on an unrelated time course experiment (GSE19044) [12]. In this experiment, germ line cellderived pluripotent stem cells (GPSC) were induced to differentiate into hepatocytes and subsequently profiled at different stages. In order to identify differentially regulated genes between the reference time 0, and the different temporal stages, (namely day 2, day 7 day 21 and day 27) we applied the same settings for the statistical analysis as previously described (FDR < 0.01 or corrected pvalue < 0.01) on the data series GPSCA and GPSCB (two independent cell lines). Cross validation of the results highlighted that Mulcom test was the most efficient in identifying a high number of differentially regulated genes, which were systematically validated in the second experiment. The results are presented in Additional file 1, Figure S1.
Conclusions
Overall, these results show that, in a multiple comparison setting, the Mulcom package is particularly good at generating reliable lists of biologically informative genes. In our opinion, the main reasons for the good performance of Mulcom under these conditions are as follows: (i) Withingroup variability is estimated from all experimental groups even if only two of them are compared at a time. It is therefore more reliable when few replicates are available for each group; (ii) The optional foldchange threshold m avoids false positives due to aberrantly low withingroup variability and (iii) Automated test optimization linked to permutationbased FDR analysis allows sensitivity to be maximised without compromising specificity. Indeed, such an approach could be prone to overfitting i.e. identification of apparently optimal settings, which are highly dependent on the dataset. Of particular relevance to this issue is the fact that in the abovedescribed dual platformdataset the Mulcom test, albeit having been separately optimized on each of the two microarray platforms, yielded the most concordant lists of significant genes. Mulcom can also easily be applied to other omics studies, like miRNomics, proteomics and metabolomics, where multiple experimental points are compared against the same reference.
Preprocessing
Microarray Data generation and preliminary treatment
Gene expression profiling was performed on the same set of RNAs independently on Affymetrix hgu133a and Illumina Human 8V1 arrays, according to the manufacturer's protocols. Affymetrix raw data were processed with the RBioconductor suite http://www.bioconductor.org. Technical quality analysis was performed with the "Affy" package [13]. Probe data was summarized and normalized with RMA [14] Probe sets without a positive presence call in at least two samples were excluded from further analyses. Illumina data were processed with the BeadStudio software 1.5.13 (Illumina) with Rank Invariant Normalization. Probes for which all samples showed a Detection Score lower than 0.99 were excluded from further analyses. Raw and normalized microarray data from both platforms are deposited in NCBI's Gene Expression Omnibus repository (accession ID: GSE26736).
Ingenuity Pathway Analysis
All lists of gene symbols generated by the various tests were uploaded on IPA http://www.ingenuity.com and tested for enrichment in molecular and cellular functions. Enrichment (chisquare pvalue) was estimated against the MDAMB435 background provided by the IPA software.
Availability and requirements
Project name: Mulcom
Project home page: Operating system(s): Platform independent
Programming language: R
License: GNU GPL
Any restrictions to use by nonacademics: none
Availability: http://bioconductor.org/help/biocviews/2.8/bioc/html/Mulcom.html
List of abbreviations
 EP:

Experimental Positives
 FC:

Fold Change
 FDR:

False Discovery Rate
 GPSC:

germ line cellderived pluripotent stem cells
 HGF:

Hepatocyte Growth Factor
 IPA:

Ingenuity Pathway Analysis
 ITGB4:

IntegrinBeta4
 MRP:

Median Random Positives
 MSE:

Mean Square Error
 SAM:

Significance Analysis of Microarray
Declarations
Acknowledgements
We thank Daniela Cantarella for technical support, Simona Destefanis for secretarial support and Riccardo Roasio, George Church and Eva Maria Pinatel for early discussions and helpful suggestions. We warmly thank Emily Hannah Crowley for careful language revision. This work was supported by Fondazione Piemontese per la Ricerca sul CancroONLUS, Associazione Italiana per la Ricerca sul Cancro (IG 9127 and "5xMille Project" n. 9970), Regione Piemonte (PRESTO and ELAB) and Ministero della Salute.
Authors’ Affiliations
References
 Allison DB, Cui X, Page GP, Sabripour M: Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006, 7(1):55–65. 10.1038/nrg1749View ArticlePubMedGoogle Scholar
 Smyth GK: Limma: linear models for microarray data. In 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. Edited by: Gentleman R, Carey V, Dudoit S, Irizarry R, Huber W. Springer, New York; 2005:397–420.View ArticleGoogle Scholar
 Tibshirani R, Chu G, Hastie T, Narasimhan B: samr: SAM: Significance Analysis of Microarrays. R package version 1.28 [http://CRAN.Rproject.org/package=samr]
 Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98: 5116–5121. 10.1073/pnas.091062498PubMed CentralView ArticlePubMedGoogle Scholar
 Lowry , Richard : One Way ANOVA  Independent Samples. Vassar.edu Retrieved on December 4th, 2008 Retrieved on December 4th, 2008
 Dunnett C: New tables for multiple comparisons with a control. Biometrics 1964, 1020: 482–491.View ArticleGoogle Scholar
 Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80. 10.1186/gb2004510r80PubMed CentralView ArticlePubMedGoogle Scholar
 R: Development core team: R: A language and enviroment for statistical computing. R foundation for statistical computing.Vienna, Austria; 2004. [http://www.rproject.org]Google Scholar
 Mira A, Isella C, Renzulli T, Cantarella C, Martelli ML, Medico E: The GAB2 signaling scaffold promotes anchorage independence and drives a transcriptional response associated with metastatic progression of breast cancer. Oncogene 2009, 28: 4444–4455. 10.1038/onc.2009.296View ArticlePubMedGoogle Scholar
 Tardito S, Isella C, Medico E, Marchiò L, Bevilacqua E, Hatzoglou M, Bussolati O, FranchiGazzola R: The thioxotriazole copper(II) complex A0 induces endoplasmic reticulum stress and paraptotic death in human cancer cells. J Biol Chem 2009, 284: 24306–24319. 10.1074/jbc.M109.026583PubMed CentralView ArticlePubMedGoogle Scholar
 Gentile A, Trusolino L, Comoglio PM: The Met tyrosine kinase receptor in development and cancer. Cancer Metastasis 2008, 85–94. Rev RevGoogle Scholar
 Fagoonee S, Hobbs RM, De Chiara L, Cantarella D, Piro RM, Tolosano E, Medico E, Provero P, Pandolfi PP, Silengo L, Altruda F: Generation of functional hepatocytes from mouse germ line cellderived pluripotent stem cells in vitro. Stem Cells Dev 2010, 19: 1183–94. 10.1089/scd.2009.0496View ArticlePubMedGoogle Scholar
 Gautier L, Cope L, Bolstad BM, Irizarry R: A. affyanalysis of Affymetrix GeneChip data at the probe level. Bioinformatics 2004, 20: 307–315. 10.1093/bioinformatics/btg405View ArticlePubMedGoogle Scholar
 Bolstad BM, Irizarry RA, Astrand M, Speed TP: A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 19(2):185–19.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.