Many researchers have embraced microarray technology, and its extensive use has produced an explosion of publicly available datasets in recent years. Examples of such repositories include the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (http://www.ebi.ac.uk/microarray-as/ae/) and the Stanford Microarray Database (SMD, http://genome-www5.stanford.edu/), as well as researchers' and institutions' websites. The value of these datasets is far from exhausted; used wisely, they can yield a wealth of information. Demand has increased to use these datasets effectively in current research as additional data for analysis and verification.

Meta-analysis refers to an integrative data analysis method that is traditionally defined as a synthesis, or at times a review, of results from datasets that are independent but related [1]. Meta-analysis has wide-ranging benefits. The increase in the sample size of the study adds power to an analysis, improving its ability to find effects that truly exist; this is termed 'integration-driven discovery' [2]. Meta-analysis can also be important when studies reach conflicting conclusions, as it may estimate an average effect or highlight an important subtle variation [1, 3].

There are a number of issues associated with applying meta-analysis to gene expression studies. These include problems common to traditional meta-analysis, such as reconciling the differing aims, designs and populations of interest of the individual studies. There are also concerns specific to gene expression data, including challenges with probes and probe sets, comparisons across differing platforms, and laboratory effects. As different microarray platforms contain probes pertaining to different genes, cross-platform comparison of the resulting gene lists is difficult; often only the probes in the intersection of these lists are retained for further analysis. Moreover, when probes are mapped to their Entrez IDs [4] for cross-platform comparison, multiple probes often pertain to the same gene, and for reasons ranging from alternative splicing to probe location these probes may produce different expression results [5]. How best to aggregate these probe results in a meaningful and powerful way is currently the topic of much discussion. Laboratory effects are important because array hybridisation is a sensitive procedure: influences that may affect it include different experimental procedures and laboratory protocols [6], sample preparation and ozone level [7]. For more details of the difficulties associated with microarray meta-analysis, please refer to Ramasamy et al. 2008 and other works [5, 8–12].

We propose a new meta-analysis approach and provide a comprehensive comparison study of available meta-analysis methods. Our method, 'meta differential expression via distance synthesis' (mDEDS), identifies differentially expressed (DE) genes and extends the DEDS method [13]. Like DEDS, it makes use of multiple statistical measures to obtain a DE list, but it becomes a novel tool through its ability to integrate multiple datasets: the meta-method concatenates statistics from the datasets in question and establishes a single gene list. Such integration should be resilient to the range of complexity levels inherent in meta-analysis situations. The strength of mDEDS as a meta-method over DEDS as a method for selecting DE genes is highlighted by comparing the two approaches in a meta-analysis context. Throughout this paper the statistics used within mDEDS and DEDS are the t and moderated t statistics [14], SAM [15], the B statistic [16] and fold change (FC), although any statistic can be chosen.
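The distance-synthesis idea can be illustrated with a toy sketch: suppose each gene's row holds several DE statistics (for a meta-analysis, concatenated across datasets), and genes are ranked by their closeness to the most extreme observed value of each statistic. This is a simplified conceptual illustration under our own assumptions, with names of our own choosing; it is not the authors' implementation.

```python
import math

def mdeds_rank(stat_matrix):
    """Rank genes by Euclidean distance to a coordinate-wise 'extreme point'.

    stat_matrix: one row per gene, one column per (dataset, statistic)
    combination, e.g. [t, moderated t, SAM, B, FC] concatenated over datasets.
    This is an illustrative sketch only, not the DEDS/mDEDS code.
    """
    ncols = len(stat_matrix[0])
    # The extreme point: the largest absolute value seen for each statistic.
    extreme = [max(abs(row[c]) for row in stat_matrix) for c in range(ncols)]
    # A gene close to the extreme point on all statistics is ranked first.
    dists = [math.sqrt(sum((abs(row[c]) - extreme[c]) ** 2 for c in range(ncols)))
             for row in stat_matrix]
    return sorted(range(len(stat_matrix)), key=lambda g: dists[g])
```

In this toy version, a gene that scores near the maximum on every statistic sits closest to the extreme point and heads the DE list.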

We also perform a comparison study of meta-analysis methods including the Fisher's inverse chi-square method [17], GeneMeta [2, 18], Probability of Expression (POE) [19], POE with Integrative Correlation (*IC*) [20], RankProd [21] (the latter four are available from Bioconductor) and mDEDS as well as two naive methods, 'dataset cross-validation' and a 'simple' meta-method. For meta-methods with several varying parameters, we have made use of the suggested or default options.

The performance of the different meta-analysis methods is assessed in two ways: through a simulation study and through two case studies. In the simulation study, performance is measured through receiver operating characteristic (ROC) curves as well as the area under these curves (AUC). The two case studies vary in complexity, and performance is assessed through prediction accuracy in a classification framework. Warnat et al. [22] also use validation across multiple datasets to evaluate performance, but our validation differs slightly from theirs: their method takes a random selection of samples from multiple datasets to form the test and training sets, whereas we keep the original datasets intact. Our approach aims to simulate the real situation in which an additional dataset must be classified after a discriminant rule has been developed. Although within this paper mDEDS is used in a binary setting, it is a capable multi-class meta-analysis tool, a concept examined by Lu et al. [23].

It is possible to consider meta-analysis at two levels, 'relative' and 'absolute'. 'Relative' meta-analysis looks at how genes or features correlate with a phenotype within each dataset [10]; multiple datasets are either aggregated or compared to obtain the features that are commonly considered important. Meta-methods of this type include Fisher's inverse chi-square, GeneMeta, RankProd and the 'dataset cross-validation' meta-method. 'Absolute' meta-analysis seeks to combine the raw or transformed data from multiple experiments; by increasing the number of samples used, the statistical power of a test is increased, and traditional microarray analysis tools are then applied to these larger datasets. The 'simple' meta-method is an example of the 'absolute' approach.

In this paper we begin by describing existing meta-analysis methods, then outline our proposed mDEDS method. This is followed by the comparison study, where publicly available datasets are combined by the different meta-analysis methods and their performance is examined under varying degrees of complexity, as well as a comparison of mDEDS to DEDS. Finally, we discuss the results and draw conclusions.

### Existing meta-analysis methods

Let *X* represent an expression matrix, with *i* = 1, ..., *I* genes and *j* = 1, ..., *N* samples. If there are *k* = 1, ..., *K* datasets, *n*_{k} represents the number of samples in the *k*th dataset. For simplicity, and without loss of generality, we focus on a dichotomous response, i.e. two-group comparisons. We designate the groups as treatment *T* and control *C*. For two-channel competitive hybridisation experiments, we assume that the comparisons of log-ratios are all indirect; that is, we have *n*_{T} arrays in which samples from group *T* are hybridised against a reference sample *R*, and we can obtain *n*_{T} log-ratios, *M*_{ij} = log_{2}(*T*_{ij}/*R*_{ij}); *j* = 1, ..., *n*_{T}, from group *T*. In an identical manner *n*_{C} log-ratios are calculated from group *C*. For Affymetrix oligonucleotide array experiments, we have *n*_{T} chips with gene expression measures from group *T* and *n*_{C} chips with gene expression measures from group *C*.

#### Fisher's inverse chi-square

Fisher, in the 1930s, developed a meta-analysis method that combines the p-values from independent datasets. One of a plethora of methods for combining p-values [17] is the Fisher summary statistic,

*S*_{i} = -2 Σ_{k=1}^{K} ln(*p*_{ik}),

which tests the null hypothesis that for gene *i* there is no difference in expression means between the two groups. Here *p*_{ik} is the p-value for the *i*th gene from the *k*th dataset. In assessing *S*_{i}, the theoretical null distribution is χ^{2} with 2*K* degrees of freedom. It is also possible to extend the Fisher method by weighting the different datasets based on, for example, quality.
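As a concrete sketch, the combined statistic and its chi-square p-value can be computed per gene with nothing beyond the standard library; the closed-form survival function below exploits the fact that a chi-square with 2*K* degrees of freedom is an Erlang distribution. Function name and interface are our own, not from any of the cited implementations.

```python
import math

def fisher_combined_pvalue(pvals):
    """Fisher's method for one gene: S = -2 * sum_k ln(p_k) follows a
    chi-square distribution with 2K degrees of freedom under the null.
    Returns (combined p-value, S)."""
    K = len(pvals)
    S = -2.0 * sum(math.log(p) for p in pvals)
    # Chi-square with 2K df is Erlang(K, scale 2), so its survival function
    # is exp(-S/2) * sum_{j=0}^{K-1} (S/2)^j / j!  -- accumulated below.
    term, total = 1.0, 0.0
    for j in range(K):
        total += term
        term *= (S / 2.0) / (j + 1)
    return math.exp(-S / 2.0) * total, S
```

For example, two individually modest p-values such as 0.01 and 0.02 combine to a p-value below either one, which is the source of the method's added power.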

#### GeneMeta

One of the first methods to integrate multiple gene expression datasets was proposed by Choi et al. [2], who describe a t-statistic based approach for combining datasets with two groups. An implementation of this method is found in GeneMeta [18], an R package containing meta-analysis tools for microarray experiments.

Choi et al. [2] described a meta-analysis method to combine estimated *effect sizes* from the *K* datasets. In a two-group comparison, a natural effect size is the *t*-statistic. For a typical gene *i*, the effect size for the *k*th dataset is defined as

*d*_{ik} = (x̄_{Tik} - x̄_{Cik})/*s*_{pk},

where x̄_{Tik} and x̄_{Cik} represent the means of the treatment and the control group respectively in the *k*th study, and *s*_{pk} is the pooled standard deviation for the *k*th dataset.

For the *K* observed effect sizes, Choi et al. [2] proposed a random effects model

*d*_{ik} = *μ*_{i} + *δ*_{k} + *ε*_{ik}, *ε*_{ik} ~ *N*(0, *s*_{k}^{2}),

where *μ* is the parameter of interest, *s*_{k} denotes the within-study variances and *δ*_{k} ~ *N*(0, *τ*^{2}) represents the between-study random effects with variance *τ*^{2}. Choi et al. [2] further note that when *τ*^{2} = 0 the model reduces to a fixed effects model. The random effects model is then estimated using a method proposed by DerSimonian and Laird [24], and a permutation test is used to assess the false discovery rate (FDR).
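The DerSimonian-Laird step for a single gene can be sketched as follows. This is a generic method-of-moments implementation of the estimator, not GeneMeta's code, and the names are our own.

```python
import math

def dersimonian_laird(d, s2):
    """Combine per-study effect sizes d[k] with within-study variances s2[k]
    via the DerSimonian-Laird random effects estimator (generic sketch).
    Returns (pooled effect, between-study variance tau^2, standard error)."""
    K = len(d)
    w = [1.0 / v for v in s2]                      # fixed-effect weights
    sw = sum(w)
    mu_fe = sum(wk * dk for wk, dk in zip(w, d)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    Q = sum(wk * (dk - mu_fe) ** 2 for wk, dk in zip(w, d))
    # Method-of-moments estimate of tau^2, truncated at zero.
    tau2 = max(0.0, (Q - (K - 1)) / (sw - sum(wk ** 2 for wk in w) / sw))
    # Re-weight with tau^2 added to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in s2]
    mu_re = sum(wk * dk for wk, dk in zip(w_re, d)) / sum(w_re)
    return mu_re, tau2, math.sqrt(1.0 / sum(w_re))
```

When the studies agree perfectly, Q falls below K - 1, tau^2 is truncated to zero and the estimator coincides with the fixed effects model, mirroring the τ^{2} = 0 case above.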

#### metaArray

The R package metaArray contains a number of meta-analysis methods. Its main function is a two-step procedure which transforms the data into a probability of expression (POE) matrix [19], followed by a gene selection method based on 'integrative correlation' (*IC*) [20].

Given a study, the POE method transforms the expression matrix *X* into a matrix *E* that represents the probability of differential expression. Each element *E*_{ij} is defined as the chance of the conditions being present across the *N* samples for gene *i*. The transformed matrix *E* consists of three values, -1, 0 and 1, representing the conditions 'under-expressed', 'not differentially expressed' and 'over-expressed'. After the transformation into a POE matrix, genes of interest are established using *IC* [20]; note that this integrative correlation method is not restricted to use with a POE matrix. The *IC* method begins by calculating all possible pairwise Pearson correlations *ρ*_{ii'}^{(k)}, where *i* ≠ *i*', between genes *i* and *i*' across all samples within a dataset *k*. This generates a pairwise correlation matrix *P* with *I*(*I* - 1)/2 rows, one for each pairwise correlation, and *K* columns, one for each dataset.

For a selected pair of datasets *k* and *k*', let ρ̄^{(k)} and ρ̄^{(k')} denote the means of the correlations for each study. Gene-specific reproducibility for gene *i* is obtained by only considering comparisons that contain the *i*th gene; that is,

*IC*_{i} = Σ_{i' ≠ i} (*ρ*_{ii'}^{(k)} - ρ̄^{(k)})(*ρ*_{ii'}^{(k')} - ρ̄^{(k')}),

where *i* ≠ *i*'. When more than two datasets are being compared, all integrative correlations for a particular gene are aggregated. This method provides a combined ranking for genes across the *K* datasets.
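For two datasets, the gene-specific reproducibility computation might be sketched as follows. This is a naive illustration with helper names of our own, not the metaArray implementation; it recomputes all *I*(*I* - 1)/2 pairwise correlations, so it is only practical for small numbers of genes.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def integrative_correlation(X1, X2):
    """Gene-specific integrative correlation for two datasets
    (rows = genes, columns = samples). Illustrative sketch only."""
    I = len(X1)
    # All pairwise gene-gene correlations within each dataset.
    r1 = {(i, j): pearson(X1[i], X1[j]) for i in range(I) for j in range(i + 1, I)}
    r2 = {(i, j): pearson(X2[i], X2[j]) for i in range(I) for j in range(i + 1, I)}
    m1 = sum(r1.values()) / len(r1)
    m2 = sum(r2.values()) / len(r2)
    ic = []
    for g in range(I):
        # Only the comparisons involving gene g contribute to its score.
        pairs = [(min(g, j), max(g, j)) for j in range(I) if j != g]
        ic.append(sum((r1[p] - m1) * (r2[p] - m2) for p in pairs))
    return ic
```

A gene whose co-expression pattern is reproduced across both datasets accumulates positive products of centred correlations, and hence a high *IC* score.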

In this comparison study, two metaArray results are used. A distinction will be made between them using the terms 'POE with *IC*' and 'POE with *Bss/Wss*' (the ratio of between-group to within-group sums of squares) to indicate which type of analysis was performed after the construction of the POE matrix.

#### RankProd

RankProd is a non-parametric meta-analysis method developed by Breitling et al. [21]. Fold change (FC) is used as a selection method to compare and rank the genes within each dataset. These ranks are then aggregated to produce an overall score for the genes across datasets, obtaining a ranked gene list.

Within a given dataset *k*, a pairwise fold change (pFC) is computed for each gene *i* as

*pFC*_{l,m} = *x*_{il}^{T}/*x*_{im}^{C},

producing *n*_{T} × *n*_{C} *pFC*_{l,m} values per gene, with *l* = 1, ..., *n*_{T} and *m* = 1, ..., *n*_{C}. The corresponding pFC ratios are ranked, and we denote the rank of gene *i* in comparison *r* as *pFC*_{(i;r)}, where *i* = 1, ..., *I* indexes the genes and *r* = 1, ..., *R* indexes the pairwise comparisons between samples. The rank product for each gene *i* is then defined as

*RP*_{i} = (Π_{r=1}^{R} *pFC*_{(i;r)})^{1/R}.

Expression values are independently permuted *B* times within each dataset relative to the genes, and the above steps are repeated to produce *RP*_{i}^{(b)}, where *b* = 1, ..., *B*. A reference distribution is obtained from all the *RP*_{i}^{(b)} values, and an adjusted p-value for each of the *I* genes is obtained. Genes considered significant are used in further analysis.
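The rank-product score for a single dataset can be sketched as follows. This is an illustration of the idea, not the Bioconductor RankProd code; here rank 1 denotes the largest pairwise fold change, i.e. a one-sided score for up-regulation, and the permutation step is omitted.

```python
import itertools
import math

def rank_products(treat, ctrl):
    """Rank-product score per gene from one dataset (illustrative sketch).

    treat, ctrl: lists of per-gene expression value lists (genes x samples).
    Returns the geometric mean of each gene's ranks over all n_T x n_C
    pairwise treatment/control comparisons; small scores suggest consistent
    up-regulation.
    """
    I = len(treat)
    nT, nC = len(treat[0]), len(ctrl[0])
    comparisons = []
    for l, m in itertools.product(range(nT), range(nC)):
        # Pairwise fold change of every gene for this sample pair (l, m).
        pfc = [treat[i][l] / ctrl[i][m] for i in range(I)]
        # Rank 1 = largest fold change in this comparison.
        order = sorted(range(I), key=lambda i: -pfc[i])
        ranks = [0] * I
        for pos, i in enumerate(order):
            ranks[i] = pos + 1
        comparisons.append(ranks)
    R = len(comparisons)
    # Geometric mean of ranks across the R comparisons.
    return [math.exp(sum(math.log(c[i]) for c in comparisons) / R)
            for i in range(I)]
```

A gene ranked first in every comparison attains the minimum possible score of 1, which is why consistently regulated genes float to the top even when individual fold changes are noisy.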

#### Naive meta-methods

Two naive meta-methods are used in the comparison study. The 'simple' meta-method takes the microarray expression matrices and simply combines the datasets, forming a final matrix made up of all samples with no expression adjustment. The 'dataset cross-validation' meta-method applies the analysis to one dataset and then uses the results on the other dataset(s), with the expectation that they will be transferable; in a classification context this means that one dataset is used for feature selection and development of the discriminant rule, and we predict the outcome of the other dataset(s) via this rule.
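The 'simple' meta-method amounts to concatenating samples over the genes common to all datasets, with no cross-study adjustment. A minimal sketch, assuming gene identifiers have already been matched across platforms (the data structure and function name are our own):

```python
def simple_meta(datasets):
    """Naive 'simple' meta-method: concatenate expression values sample-wise,
    keeping only genes present in every dataset.

    datasets: list of dicts mapping gene id -> list of expression values.
    Returns one combined dict; no normalisation or batch adjustment is
    applied, as in the naive method described above.
    """
    common = set(datasets[0])
    for d in datasets[1:]:
        common &= set(d)                 # intersection of gene lists
    return {g: [v for d in datasets for v in d[g]] for g in sorted(common)}
```

Note that the intersection step mirrors the cross-platform issue discussed earlier: genes absent from any one dataset are discarded before analysis.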