A comparative evaluation of datamerging and metaanalysis methods for reconstructing genegene interactions
 Vincenzo Lagani^{1, 4},
 Argyro D. Karozou^{1},
 David GomezCabrero^{2, 3, 5, 6},
 Gilad Silberberg^{2, 3, 5, 6} and
 Ioannis Tsamardinos^{1, 4}Email author
https://doi.org/10.1186/s1285901610381
© Lagani et al. 2016
Published: 6 June 2016
The Erratum to this article has been published in BMC Bioinformatics 2016 17:290
Abstract
Background
We address the problem of integratively analyzing multiple gene expression, microarray datasets in order to reconstruct genegene interaction networks. Integrating multiple datasets is generally believed to provide increased statistical power and to lead to a better characterization of the system under study. However, the presence of systematic variation across different studies makes network reverseengineering tasks particularly challenging. We contrast two approaches that have been frequently used in the literature for addressing systematic biases: metaanalysis methods, which first calculate opportune statistics on single datasets and successively summarize them, and datamerging methods, which directly analyze the pooled data after removing eventual biases. This comparative evaluation is performed on both synthetic and real data, the latter consisting of two manually curated microarray compendia comprising several E. coli and Yeast studies, respectively. Furthermore, the reconstruction of the regulatory network of the transcription factor Ikaros in human Peripheral Blood Mononuclear Cells (PBMCs) is presented as a casestudy.
Results
The metaanalysis and datamerging methods included in our experimentations provided comparable performances on both synthetic and real data. Furthermore, both approaches outperformed (a) the naïve solution of merging data together ignoring possible biases, and (b) the results that are expected when only one dataset out of the available ones is analyzed in isolation. Using correlation statistics proved to be more effective than using pvalues for correctly ranking candidate interactions. The results from the PBMC casestudy indicate that the findings of the present study generalize to different types of network reconstruction algorithms.
Conclusions
Ignoring the systematic variations that differentiate heterogeneous studies can produce results that are statistically indistinguishable from random guessing. Metaanalysis and data merging methods have proved equally effective in addressing this issue, and thus researchers may safely select the approach that best suit their specific application.
Keywords
Genenetwork Reconstruction MetaAnalysis Batcheffect Removal Surrogate Variable Analysis Integrative Analysis Escherichia coli Yeast Peripheral Blood Mononuclear Cells Ikaros Transcription FactorBackground
Reverse engineering of gene regulatory network is a vibrant research area [1], whose scope is reconstructing the biological mechanisms underlying gene activity. Several types of statistical models and algorithms have been proposed for deriving and representing gene interaction networks [2]. Relevance networks [3] are one of the most basic models, where gene pairs showing highly significant correlation in their expression values are assumed to be functionally associated.
Unfortunately, this assumption is not valid when data from different studies are integratively analyzed. Systematic biases across studies can originate spurious correlations that do not actually reflect any interactions among genes. On the other side, they can hide associations that are actually present among the measured quantities [4]. These systematic variations are usually known as “batcheffects”, and they can arise even when all studies share the same experimental design and measure the same quantities. The name originates from systematic biases that are present across “sample batches” within single studies, due to small differences in the processing of each batch [5].
MetaAnalysis (MA, [6]) and DataMerging (DM [7]) are two approaches widely employed in the literature for addressing systematic variations in studies that share the same experimental design. In MA statistical methods are separately applied on each dataset for obtaining statistics of interest, e.g., differential expression pvalues. The results from each study are then combined for creating summary statistics. The latter approach merges samples from different studies in a unique dataset, on which subsequent analyses are performed. While MA methods implicitly take in account batcheffects, DM require suitable BatchEffect Removal (BER) algorithms [8].
In this work we compare metaanalysis and datamerging methods in the context of retrieving genegene interactions in compendia of microarray studies. To this scope we compiled two different collections of microarray experiments, containing 11 and 7 studies on Escherichia coli and Yeast, respectively. For each collection we identified candidate interactions for multiple transcription factors by combining relevance networks with metaanalysis and datamerging methods, in turn. The candidate interactions are then compared against lists of known, experimentally verified interactions, in order to contrast the effectiveness of MA and DM methods in retrieving actual relationships.
The comparison between the two approaches is furthermore deepened on synthetic data, where a large variety of scenarios is simulated across different networks, levels of systematic bias, number of considered studies and number of samples in each study. All experimentations underlined that batcheffects are detrimental for the analyses, and that MA and DM prove similarly effective in addressing issues arising from systematic variations.
Finally, we present an application on human Peripheral Blood Mononuclear Cells (PBMCs), for the reconstruction of the Ikaros transcription factor regulatory network. For this specific application we used a BayesianNetwork, constraintbased learning approach in place of relevance networks, providing evidences that the results of this study transfer on more complex networklearning approaches.
Related work
To the best of our knowledge, there is no other study that systematically contrasts MA and DM methods in the context of retrieving genegene interactions. Several studies exist that evaluate the relative performances of MA methods for gene network reconstruction [9]–[14]. In short, it is not possible to rigorously come to a unique conclusion regarding the best metaanalysis algorithm for network reconstruction. The observed discrepancy among these studies is a result of numerous factors, including data complexity and heterogeneity, difficulties in determining a golden truth, and the inclusion of a limited number of metaanalysis approaches in the experimentations.
The most common MA techniques applied in the spectrum of gene network reconstruction are based on Fisher’s method [15], [16], vote counting approaches [17]–[19], fixed and random effect sizes [20]. Segal et al. [21] was the first one that marched towards unlocking hidden biological knowledge by using metaanalysis for network reconstruction. Numerous approaches then followed, as described in [22]. In all cases, metaanalysis approaches seemed to perform better than individual reverseengineering methods.
Similarly, the applicability of datamerging methods in the context of network reverse engineering has been investigated in several works [5], [23]–[27]. In earlier studies, the vast majority merely used normalization methods to merge the compendium of expression data [23], [28]. Robust MultiArray Average method (RMA) [27]–[29] seemed to outperform other normalization methods such as linear scaling procedures based on the median signal intensity [30], quantile normalization through MAS algorithm [31], GCRMA [32], Dchip PM [33]. However, RMA normalization proved to be ineffective in removing batch effects which affect particular genes and may affect different genes in different ways [5].
Recent approaches have been developed for identifying and removing batch effects [8], [24], [34] but have not been widely used. Such approaches include ComBat [35], Surrogate variable analysis (SVA) [36], Distanceweighted discrimination (DWD) [37], Meancentering (PAMR) [38], and Geometric ratiobased method [39]. In relevant studies, ComBat seems to outperform these methods as it robustly manages high dimensional data with small sample sizes. A previous attempt to evaluate the effectiveness of batch adjustment methods was made by the MAQCII project [40]. It is necessary to bear in mind that even the most effective batch effect removal method cannot sufficiently reduce the batch effects in cases of poor experimental design [41].
The literature regarding MA and DM application in the context of differential expression is particularly rich [6], [42]–[48], and a complete review is out of the scope of the present work. We point out that we found only a single study [49] that directly compares the performances of the two approaches on finding differentially expressed genes. Interestingly, this study concludes that both approaches achieve comparable results.
Methods
Experimentation protocol
Let M be a collection (or compendium) of m microarray datasets. All studies in M follow the same experimental protocol, analyze the same type of biological specimens, and measure the same n expression values (probesets). However each dataset D _{ j } includes a separate set of s _{ j } samples. This means that each study in M investigates the same generegulatory network, and that the data of all studies have been generated according to this network. Thus, any systematic bias across datasets should be due to (unknown) technical differences occurred during the measurement process or to the presence of confounding factors.
For each collection M there is a set T = {TF _{1}, TF _{2}, …, TF _{ t } , …, TF _{}_{ T }_{}} of T  transcription factors of interest. We assume to know the list I _{ t } of genes that interact with each transcription factor TF _{ t } , i.e., I _{ t } contains all genes that are targets of TF _{ t } along with the genes that regulate TF _{ t } .
We apply a relevance network approach for retrieving these known interactions. In detail, for each collection M and each transcription factor TF _{ t } the correlations among the expression values of TF _{ t } and the remaining n − 1 probesets are calculated over all datasets in M , using in turn an MA or DM approach. MA algorithms separately compute the correlations on each dataset and then summarize them, while DM methods pool together the data from all datasets and directly compute the final correlation values.
Let C _{ t }_{,}i_{,}X be the correlation between transcription factor t and probeset i produced with the MA or DM method X, and P _{ t }_{,}i_{,}X the pvalue assessing the null hypothesis H _{0} : C _{ t }_{,}i_{,}X = 0. The set of n − 1 correlations (pvalues) for transcription factor t is indicated as C _{ t }_{,}X (P _{ t }_{,}X ). Both C _{ t }_{,}X and P _{ t }_{,}X are sorted according to the absolute values of the correlations, so that the most relevant associations appear at the top of both vectors.
Relevance networks postulate that genes included in I _{ t } should be strongly correlated with TF_{t}, therefore MA and DM methods are evaluated with respect to their ability of assigning highly significant correlations to known interactions. Different metrics are used to compare each C _{ t }_{,}X against its corresponding I _{t}, and DM / MA approaches are ranked according to their respective performances.
The following sections describe in detail the experimental and synthetic data collections used in the experimentations, along with the algorithms, correlation measures and performance metrics included in the analysis.
All simulations and analyses were performed in the R software [50].
Data
Escherichia coli data compendium
The regulatory network of the Escherichia coli (E. coli) K12 bacterium has been extensively studied [51], and consequently it is an ideal test bed for our experimentation. Studies in the GEO repository on E. coli comprising more than twenty expression profiles and using the Affymetrix E. coli Antisense Genome Array were taken in consideration for inclusion in the analysis. Imposing a single microarray platform ensures that all datasets measure the same probesets. Studies applying experimental interventions known to artificially disrupt genegene interactions, as for example gene knockout, were excluded from the compendium. Eleven studies were included in the collection, whose characteristics are reported in the (Additional file 1: Table S1), for a total of sixhundred eighteen samples measured under a variety of conditions. Probesets without annotations were excluded from the analysis, leaving a total of 4088 probesets, each corresponding to a specific gene (no gene was measured by multiple probesets).
The RegulonDB database was used in order to retrieve known TFgene interactions in the E. coli regulation program [52]. This database publicly and freely provides more than 4131 transcriptional regulatory interactions, manually retrieved and curated from the literature. Interestingly, each interaction is assigned to an evidence class, ranging within the levels ‘weak’, ‘strong’ and ‘confirmed’. The level of evidence is determined by the experimental method used in the original study reporting the interaction. Experimental procedures where false positives are prevalent, like computational predictions or gene expression analysis, are catalogued as ‘weak’. Other procedures providing evidence of physical interaction or anyhow excluding explanations alternative to a genegene interaction (e.g., site mutation [53]) are considered ‘strong’. When a regulatory relationship is supported by multiple, independent strong evidences, then it is classified as ‘confirmed’.
Preliminary experimentation including all RegulonDB regulatory relationships led to poor results, close to random guessing (results not shown). We hypothesized that large number of false positives in the weak interactions could negatively affect the results, thus we decided to exclude them from the analysis, leaving a total of 2475 strong and confirmed regulatory relationships.
Finally, we decided to consider only transcription factors having at least three known interactions, for a total of 124 genes included in T _{ EColi } .
Yeast data compendium
The same criteria used for compiling the E. coli compendium were used for building a collection of seven Yeast datasets, all measured with the Affymetrix Yeast Genome S98 Array platform and containing a total of fourhundred twenty seven (427) samples (Additional file 1: Table S2). A total of 4218 probesets were not associated with a given gene name, and 149 genes were associated to more than one probeset. We removed nonannotated genes and we randomly selected a single probeset for genes with multiple measurements, leaving a total of 4944 probesets.
Known interactions were retrieved from Yeastract [54], which is the largest database of this type for the yeast organism to date, with more than 200,000 reported genegene interactions. Similarly to RegulonDB, Yeastract lists manually curated regulatory relationships retrieved from the literature, and it also provides information about the experimental procedure used for assessing each reported interactions. We again required ‘strong’ evidence, leaving 257 genegene known and reliable interactions in the analysis. Also for this compendium genes with at least three known interactions were included in T _{ Yeast } , for a total of 44 transcription factors.
Synthetic data
Several collections of synthetic datasets were produced for better characterizing MA and DM performances under different scenarios.
Data were sampled from artificial networks specifically devised in order to resemble reallife gene regulatory programs, following the scalefree theory introduced by Barabási [55]. According to this theory, biological networks are not randomly organized, and the number of connections incident to a node is regulated by the power law P(k) ~ k ^{− }^{ γ } , where k is the number of interactions, γ is a parameter whose value depends by the specific domain, and P(k) is the fraction of genes having k connections. In other words, realworld gene regulatory programs have few transcription factors (hubs) that regulate large numbers of genes, while the remaining nodes have relatively few connections. Each synthetic network is represented by a Direct Acyclic Graph (DAG) composed by a set of nodes (genes) V = {1, …, n} and a set of directed edges E = {(i, j)}. If the edge (i, j) is present in the network, then gene i is a parent of (regulates) gene j. The set of parents of node j is indicated as IN(j).
These artificial networks were equipped with a parameterization suitable for the simulation of gene expression data and batch effects. Each gene i was associated with a baseline expression value α _{ i } uniformly sampled in the interval [0, 1], while each edge (i, j) is equipped with a randomly generated coefficient β _{ ij } ∈ [−1, − 0.5] ∪ [0.5, 1] representing the strength of the interactions between i and j.
Batcheffects across studies are assumed to be composed of an additive and a multiplicative component, following an approach already used in [24]. The first component shifts the gene average value, while the multiplicative error intensifies the samplespecific variance.
According to this formula, each expression value y _{ sjk } is a linear combination of its baseline value α _{ j } and the expression values of its regulating genes y _{ sik } , i ∈ IN(j). The quantity ϵ_{ j } is random noise distributed as N(0, 1) (normal distribution with zero mean and unitary standard deviation) that represents unmodeled regulatory mechanisms that concur in determining the expression of the gene. The two factors γ _{ jk } and δ _{ jk } ϵ_{ sj } ^{'} respectively represent the addictive and multiplicative component of the systematic bias in study k, and are both randomly sampled from the distribution N(τ, τ). The random variable ϵ_{ j } ^{'} is again distributed as N(0, 1).
During our experimentations, five independent synthetic networks with fourthousand nodes each were created using the barabasi.game function from the R package igraph [56]. For each network we simulated different compendia by varying the number of studies in [5], [10], [50], the number of samples for each study in [5], [20], [50], and the hyperparameter τ controlling the level of systematic bias in [0.1, 0.5, 1], thus obtaining 27 different scenarios for each network and 135 in total.
Finally, for each network the list T of transcription factors includes all genes directly connected to at least twenty other genes. This leads to an average of 18 transcription factors for each network, each one connected on average with 40 genes. We consider only direct interactions in order to ensure that the corresponding associations are strong enough to be effectively retrieved from the data.
Relevance networks reconstruction
When a single dataset is available, the relevance network for the transcription factor TF _{ t } can be easily reconstructed by computing the vector C _{ t } containing n − 1 associations between TF _{ t } and all other probesets i. These association measures are eventually coupled with measures of statistical significance P _{ t } , and the genes Q _{ t } belonging to the reconstructed network can be selected by imposing an appropriate decision threshold θ to either the association or the significance values. When multiple datasets are available, the same procedure can be followed with the vectors C _{ t }_{,}X and P _{ t }_{,}X computed through the metaanalysis or datamerging method X (see Fig. 1).
where $\overline{x},\overline{y}$ are sample means and s _{ x } , s _{ y } sample standard deviations. The Spearman correlation uses the same formula on x and y rankings. The null hypothesis H _{0} : ρ _{ x }_{,}y = 0 can be properly assessed for both correlation measures [59], [60].
Performances metrics
The correlation values C _{ t }_{,}X and corresponding pvalues P _{ t }_{,}X are compared with the list of known interactions I _{ t } in order to assess X effectiveness in correctly retrieving gene regulatory relationships. In the ideal case high correlations would be assigned exclusively to actual interactions, while any other genepair would be reported as weakly associated. However, in real cases I _{ t } is probably incomplete and noisy, undermining a fair evaluation. Moreover, only a handful of regulatory relationships are usually known for each gene, while the number of possible genepairs is two or three order of magnitudes larger, dramatically increasing the possibility of retrieving false positives due to mere multipletesting issues.
In order to better characterize the performances of each method, we adopted several metrics commonly used in the machinelearning area of Information Retrieval, a field whose operational settings strictly resemble the one depicted above [61].
The Receiver Operator Characteristic (ROC) Area Under the Curve (AUC, [62]) is a metric that integrate sensitivity and specificity information for all possible values of the decision threshold θ. The AUC ranges in the interval [0, 1], where one corresponds to perfect rank (i.e., all true interactions are at the top of C _{ t }_{,}X ), 0.5 corresponds to random ordering and zero to perfectly inverted predictions. Interestingly, AUC values can be interpreted as the probability of correctly ranking two randomly selected interactions according to their status (true/false interaction).
The Area Under the Precision Recall Curve (AUPRC, [63]) is similar to the AUC and summarizes precision and recall information for varying θ. With respect to AUC, AUPRC has demonstrated to have higher discriminative power when very few positive cases (true interactions) are available [64].
Both AUC and AUPRC evaluate the whole list of correlation values, providing a measure of global performance. However, researchers using network reconstruction algorithms often restrict their attention to a few predicted genegene interactions, the ones deemed more reliable. These interactions are ideal candidate for subsequent in vitro or in vivo experimental validation, which are usually too expensive or demanding to be performed on all predictions.
Thus, we are interested in evaluating the partial performances of the methods on the interactions corresponding to the highest correlations in C _{ t }_{,}X . To this end we use a version of AUC known as partial AUC (pAUC, [65]), which considers a restricted region of the whole sensitivity / specificity curve (specificity in [0, 0.2] for our experimentations). The McClish formula [65] standardizes pAUC values in [0, 1], allowing the pAUC to have the same probabilistic interpretation of the AUC.
We also devised a new metric that is specific for assessing partial performances, namely the Area Under the False Discovery Rate (AUFDR). Let Q _{ t }_{,}X_{,}R be the list of R interactions with highest correlation according to C _{ t }_{,}X . The AUFDR integrates the proportion of correctly predicted interactions in the range [1, R], i.e., $\mathit{AUFDR}={\displaystyle \sum _{i=1:R}}\frac{{\mathit{Q}}_{t,X,i}\cap {\mathit{I}}_{t}}{i}$, and it is subsequently normalized in order to assume values in [0, 1], with one indicating that all top R predictions are known interactions.
In all our analysis we use in turn vectors C _{ t }_{,}X and P _{ t }_{,}X for evaluating methods’ performances. Highly significant associations often corresponds to pvalues that are indistinguishable from zero at machine precision, leading to ties in P _{ t }_{,}X that severely affect performance computations. In contrast, the vector C _{ t }_{,}X does not suffer from this drawback, varying in ranges that seldom include particularly low values. The impact of these issues on performance assessment is discussed in detail in the Result section.
Integrative approaches
MA, DM and baseline methods included in the experimentations. For each method a synthetic description is provided describing its main characteristics
Approach  Method  Description 

MetaAnalysis  Fisher  Combines pvalues in a statistic that follows a χ ^{2} distribution. 
Stouffer  Transforms pvalues in Zscores and merges them with a weighted average  
FixedEffects  Assumes all studies measure the same effect and combines estimates with a weighted average  
RandomEffects  Combines estimated effects by assuming that each study measures a biased version of the true effect  
FREffects  Estimates whether Fixed or RandomEffects assumptions hold and use one of the two methods accordingly  
RankProduct  Combines statistics’ ranks by multiplication.  
DataMerging  SVA  Provides surrogate variables that approximate the effect of confounding factors and batcheffects present in the data 
Combat  Assumes additive and multiplicative batcheffects and estimates them by pooling information across genes  
RMA  Normalizes data across expression profiles using Quantile Normalization  
RMACombat  Applies RMA and Combat one after the other  
Scaling  Scales the value of each gene in each study to have zero mean and unitary standard deviation  
NoCorrection  Merges samples from all studies in a single dataset without any correction  
Baseline  SingleDatasets  Computes the performance that is expected by analyzing a single, randomly chosen dataset 
RandomGuessing  Produces randomly sampled correlation values 
Metaanalysis
Metaanalysis has been described as “the process of synthesizing data from a series of separate studies” [66]. A typical MA application investigates a set of statistics (e.g., pvalues) derived in different studies and produces a summary statistic, for example a weighted average (see Fig. 1). Other, sophisticated MA approaches exist for more complex applications, for example metaregression [67], where differences in the design of the studies or the sampling strategy are treated with a regression approach.
We selected from the literature five MA methods whose operation matches the above definition and that are based on different assumptions and theoretical backgrounds.

Fisher method [68] is one of the first known MA approaches. Under the assumption that all P _{ t }_{,}i_{,}X ^{1} , …, P _{ t }_{,}i_{,}X ^{ m } assess the same nullhypothesis in multiple, independent studies following an identical design, then the quantity ${\chi}_{t,i}^{2}=2\xb7{\displaystyle \sum _{j}}log\left({P}_{t,i}^{j}\right)$ follows a χ ^{2} distribution with 2 · m degrees of freedom, and can be used for calculating the summarized pvalue P _{ t }_{,}i_{,}Fisher . We set C _{ t }_{,}i_{,}Fisher = χ _{ t }_{,}i ^{2} .

Stouffer method [69] is conceptually similar to Fisher’s, although it combines Zscores defined as Z _{ t }_{,}i ^{ j } = Φ^{− 1}(P _{ t }_{,}i ^{ j } ) instead of pvalues. Φ^{− 1} is the inverse of the standard normal cumulative distribution function, and the statistic ${Z}_{t,i}=\frac{{\displaystyle {\sum}_{j}}{Z}_{t,i}^{j}}{\sqrt{m}}$ follows a standard normal distribution that can be used for deriving P _{ t }_{,}i_{,}Stouffer . Also in this case C _{ t }_{,}i_{,}Stouffer = Z _{ t }_{,}i

FixedEffects approach [70] assumes that all studies investigate the same correlation Ĉ _{ t }_{,}i , whose estimation is biased by a studyspecific error factor, i.e., C _{ t }_{,}i ^{ j } = Ĉ _{ t }_{,}i + ϵ _{ j } , j = 1 … m. On the basis of these assumptions, C _{ t }_{,}i_{,}Fixed can be computed through a weighted mean${C}_{t,i,\mathit{Fixed}}=\frac{{\displaystyle {\sum}_{j}}{w}_{j}\xb7{C}_{t,i}^{j}}{{\displaystyle {\sum}_{j}}{w}_{j}}$
where the weights w _{ j } are inversely proportional to the correlations variances. P _{ t }_{,}i_{,}Fixed is computed by comparing C _{ t }_{,}i_{,}Fixed Fisher ztransformation against its theoretical normal distribution [71].

RandomEffects models do not assume that each study estimates the same correlation Ĉ _{ t }_{,}i ; the datasets are assumed to be enough ‘similar’ to be jointly analyzed, but at the same time the ground truth correlation ${\widehat{C}}^{j}{}_{t,i}$ may differ across studies. Particularly, ${\widehat{C}}^{j}{}_{t,i},\phantom{\rule{0.25em}{0ex}}j=1\dots m$ are assumed to be sampled from a distribution with mean Ĉ _{ t }_{,}i and unknown variance $\widehat{\tau}$, while in turn each C _{ t }_{,}i ^{ j } is an estimation of its corresponding ${\widehat{C}}^{j}{}_{t,i}$ subject to a studyspecific error ϵ _{ j } , i.e., ${C}_{t,i,X}^{j}={\widehat{C}}^{j}{{}_{t,i}}_{,X}+{\u03f5}_{j}$.
The summary correlation C _{ t }_{,}i_{,}Random is estimated with the FixedEffects weighted average, with the weights w _{ j } computed as inversely proportional to the sum of the studyspecific and betweenstudy variance, i.e., ${w}_{j}=\frac{1}{{v}_{j}+\widehat{\tau}}$. Interestingly, if all studies share the same ground truth effect (i.e., $\widehat{\tau}=0$), then the RandomEffects model reduces to the FixedEffects one.

The RankProduct method differs from the previous approaches since it combines correlation ranks instead of correlations or pvalues [72]. The vector C _{ t } ^{ j } containing the correlations between the transcription factor t and all other probesets in study j can be easily converted in a vector of ranks R _{ t } ^{ j } , where higher correlations rank first. The RankProduct method combines ranks R _{ t }_{,}i ^{1} , …, R _{ t }_{,}i ^{ m } from different studies by multiplying them: ${R}_{t,i,\mathit{Rank}\mathsf{Product}}={\displaystyle {\prod}_{j}{R}_{t,i}^{j}}$. True genegene interactions are then expected to be placed on the top of the vector R _{ t } of combined ranks.
The RankProduct is actually a special case of a larger family of rankbased methods [73], differing among each other mainly for the formula used for combining the single ranks (e.g., summation, average, product). Some authors have reported that rankbased methods can provide more reliable results than classical MA methods when heterogeneous datasets are analyzed together [74].
A common drawback of these methods is that statistical significance must be assessed through permutationbased procedures, which usually are quite computationally demanding. However, in this study we adopt a recently introduced formula [75] for computing approximate, yet accurate pvalues for the RankProduct results.
These five approaches were implemented in R and included in the analyses. Moreover, we included one further method, namely the FREffects model, based on a combination of Fixed and RandomEffects models. In short, the FREffect model first estimates $\widehat{\tau}$, and if the betweenstudy variance is significantly different from zero (Cochran’s Q test [76] pvalue < 0.1) the RandomEffects model is used, otherwise the FixedEffects is used.
Datamerging
In contrast with metaanalysis, the datamerging approach pools all data together and then estimates statistics on the resulting dataset. Expression profiles measured in different studies, or even in the same study but in different batches, present systematic variations in their distribution [8], and these variations are detrimental for the analysis. Batcheffect removal methods attempt to alleviate this problem, by identifying and removing systematic biases. We selected five different DM approaches, among the ones most often used on microarray data:

Combat is a method specifically devised for removing batch effects in geneexpression data [35]. This method assumes the batches to be known, and that systematic variations follow an additivemultiplicative model${y}_{sjk}={\alpha}_{j}+\mathbf{X}{\mathit{\beta}}_{j}+{\gamma}_{jk}+{\delta}_{jk}{\u03f5}_{sjk}^{\text{'}}$
where y _{ sjk } is the expression of gene j in sample s in batch k, a _{ j } is the overall gene expression of j, X and β _{j} are respectively the design matrix and the genespecific coefficients vector, while the remaining terms are the additive and multiplicative batch effects, respectively. These effects are estimated through an approach that uses hyperpriors and pool information across all available probesets. We used the Combat implementation of the R package sva in all analyses.

RMA (Robust Multiarray Average, [23]) is an algorithm for background correcting, normalizing and summarizing microarray data. The normalization phase is carried out with the Quantile Normalization method, that substitutes the expression value of each probe t with the average expression calculated over all probes that rank equally across all available profiles. In our experimentation we used the RMA function of the R package affy.

RMACombat. We also include the hybrid solution RMACombat, consisting in a pipeline that first applies the RMA method and then Combat.

Surrogate Variable Analysis (SVA). The SVA approach introduced by Leek and Storey [36] attempts to identify and remove all confounding factors negatively affecting the analysis, including eventual batcheffects. Similarly to Combat, this method explicitly takes in account the study design. In the common case–control scenario, the SVA model is the following: ${y}_{sj}={\alpha}_{j}+{\mathrm{\beta}}_{\mathrm{j}}{\mathrm{x}}_{\mathrm{s}}+{\displaystyle {\sum}_{\mathrm{k}}}{\gamma}_{jk}{g}_{ks}+{\u03f5}_{sj}$, where y _{ sj } is the expression of gene j in sample s, a _{ j } is the overall gene expression of j, x_{s} is a binary variable indicating whether sample s is a case or a control, β _{ j } represents the average difference in expression between the two conditions in gene j, and ϵ_{ sj } is a random error. The term ${\sum}_{\mathrm{k}}}{\gamma}_{jk}{g}_{ks$ represents the cumulative effect on y _{ sj } of K unknown confounding factors g _{ ks } , multiplied by their genespecific coefficients γ _{ jk } . SVA attempts to estimate confounding factors’ global effect by deriving a set of surrogate variables h _{1}, h _{2}, …, h _{ K } whose span covers the same linear space spanned by the vectors g _{ k } . These surrogate variables can then be used as covariates in all subsequent analysis in order to rule out the effect of the unknown confounding factors.
To the best of our knowledge, no previous study applied SVA on genenetwork reconstruction, and a detailed discussion about how to adapt SVA for this task is reported in the Additional file 2. Briefly, assuming that each TF_{ i } has a significant effect only on a restricted subset of genes, all major systematic variations involving a large portion of transcripts should be due to experimental factors, batcheffects or confounding factors. Given this assumption, for the data collections used in this study the SVA model becomes: ${y}_{sj}={\alpha}_{j}+{\displaystyle {\sum}_{\mathrm{k}}}{\gamma}_{jk}{g}_{ks}+{\u03f5}_{sj}$. From a computational perspective this formulation implies that the surrogate variables are estimated by applying a Singular Value Decomposition to the expression matrix, after having centered each gene on its mean. The estimated surrogate variables are then used for computing the vectors C _{ t }_{,}SVA and P _{ t }_{,}SVA . This means that C _{ t }_{,}i_{,}SVA is a partial correlations [77], quantifying the linear association between the transcription factor TF _{ t } and gene i given the information embedded within the surrogate variables.
Scaling the expression values of each dataset so that all genes have the same mean and standard deviation is a further suitable approach. In particular, we scale the expression of each probeset in each dataset to zero mean and unitary standard deviation.

Nocorrection. The naïve solution of pooling all data together without removing systematic variations is included in the analysis as well, in order to contrast the effectiveness of the other methods.
Baseline approaches
A relevant question is whether employing complicate statistical techniques in order to coanalyze several datasets actually provides any advantage with respect to analyze a single dataset in isolation. DM and MA methods heavily process the data, following assumptions that are not always satisfied. Consequently, these methods may induce biases rather than remove batcheffects. To answer this question we adopted a SingleDataset approach, consisting in separately analyzing each dataset and then averaging the performance within each data collection. More in detail, let π_{Π} ^{1} … π_{Π} ^{ m } be the performances obtained on datasets D _{1}, …, D _{ m } in collection M by using the metric Π. The SingleDataset approach calculates a weighted performance ${\pi}_{\mathrm{\Pi}}=\frac{{\displaystyle \sum}{s}_{j}\xb7{\mathrm{\pi}}_{\mathrm{\Pi}}^{j}}{{\displaystyle \sum}{s}_{j}}$, that can be interpreted as the result to be expected if a single dataset randomly chosen from the collection is analyzed.
Finally, we also include a RandomGuessing approach consisting in randomly sampling C _{ t }_{,}i from a uniform distribution. Theoretically, we expect this method to achieve the lowest performances among all other algorithms.
Reconstruction of the Ikaros interaction network on PBMC data
Generalizing the results of this work to any network learning algorithm is out of the scope of this paper. However, we perform a proofofconcept application in order to provide initial evidence that the results obtained in the context of relevance networks, arguably the simplest type of reverse engineering networks, are also valid when more complicated algorithms are used.
To this purpose, we analyze a set of Peripheral Blood Mononuclear Cells (PBMC) gene expression datasets extracted from GEO. We attempt to reconstruct the regulatory network of the Ikaros transcription factor by applying the SES (Statistically Equivalent Signatures) algorithm [78]. The predictions were validated against a list of experimentally determined Ikaros targets as retrieved from the literature [79], [80]. The IKZF1 gene encodes the transcription factor that belongs to the family of zincfinger DNA proteins [81]. Ikaros displays crucial functions in the fetal and adult hemolymphopoietic system. It functions as a regulator of lymphocyte differentiation and its loss has been connected with the development of lymphoid leukemia.
The following sections describe in detail the used data and the analysis pipeline.
PBMC Compendium and Ikaros known regulatory relationships
We assembled a compendium of seven public microarray gene expression datasets of human PBMC. PBMC are the populations of blood cells having a round nucleus that constitute a pivotal part of the peripheral immune system. These include lymphocytes (T cells, B cells and NK cells), monocytes, macrophages, dendritic cells. Their abundance and the simplicity of their extraction (an intravenous injection is sufficient for collecting a sample) render them interesting candidate for scientific studies. Note that the selection of human microarray datasets serves for further testing the validity of our results in the spectrum of human subject studies.
For assembling this compendium, only studies comprising randomlyselected healthycontrol subjects were taken in consideration. In particular, for each study only the control group was retained for our analysis. The idea is that control groups formed by randomly chosen healthy individuals can be considered as independent sampling from the same population, and are thus suitable for being analyzed through MA and DM methods. In total, the collection counts 181 expression profiles all measured with the Affymetrix Human Genome U133 Plus 2.0 Array (41245 probesets). The expression of Ikaros is measured by nine of these probesets. We used in turn each of these probesets and we merged together their respective networks.
Finally, a list I _{ Ikaros } of Ikaros regulatory relationships was built from literature information and computational analyses. Particularly, we built a list I _{ Ikaros } containing 2658 unique interactions by merging together 2497 Ikaros targets identified through Chipseq and microarray analysis [79] along with 137, 115, 133 and 154 Ikarosgene interactions found in CD43 (young mature Bcells) CD19+ (mature Bcells), Tnaïve and Treg cells, respectively. These latter lists were derived from the analysis of DNAseseq data from the ENCODE project [80], following the approach presented in [82]. Briefly, DNase hypersensitive regions (DHS) were identified using Hotspot v4 [83], and DHS peaks were subsequently scanned for footprints of DNAbinding proteins by the Wellington algorithm using pyDNase [84]. Transcription start sites (TSS) were obtained from the University of California, Santa Cruz (UCSC) Genes Track, and the region flanking 5Kb upstream to 5Kb downstream of the TSS was defined as the promoter region. The footprints within the promoters were subsequently scanned for identifying binding motifs specific for 483 transcription factors, using the TRANSFAC database [85] and the Match algorithm [86]. Genes whose promoter contained a motif instance were considered as potential regulatory targets. This allowed identifying (a) candidate regulators and (b) candidate targets for each TF, including Ikaros.
Deconvolution of PBMC and outlier identification
The presence of different celltypes in the PBMC samples implies that expression values are averaged over a mixture of different distributions. Subjects included in each study may have significantly different cell proportions, and this in turn may generate correlations among probesets that do not reflect any underlying genegene interaction [4]. In order to avoid this scenario, we estimate the cellproportions for each sample through a deconvolution approach and then we eliminate subjects that appear to be outliers and that may prejudice the analysis. We use the deconvolution method introduced by Abbas and coauthors [87] and implemented in the CellMix R package [88]. This approach uses a fixed set of expression signatures characterizing the expression profiles of seventeen different cell types in order to estimate the proportion of these cell types in the PBMC data. The multivariate outlier detection was conducted by using the PCout [89] algorithm from the “mvoutlier” R package [90]. This algorithm utilizes simple properties of principal components and is particularly effective in highdimensional data.
SES algorithm
The SES algorithm [78] as implemented in the ‘MXM’ R package was used in order to reconstruct Ikaros regulatory network. The SES algorithm attempts to identify highly predictive signatures for a given target. In this context, a gene expression signature consists of the minimal set of gene expression measurements that is necessary in order to predict the value of Ikaros. As demonstrated in [91], the signature of a target corresponds, under broadly accepted assumptions, to the variables that are adjacent to the target in the Bayesian Network representing the data distribution at hand. Consequently, these gene expression signatures also correspond to the set of potential regulators/targets of Ikaros in the context of the available measurements. Lack of statistical power may make two or more signatures statistically indistinguishable. The SES algorithm is specifically devised in order to cope with this problem and to attempt to retrieve statistically equivalent signatures.
SES belongs to the class of constraintbased, Bayesian Network reconstruction algorithms [92]. While relevance networks assess the presence of genegene interactions through simple pairwise correlations, constraintbased algorithms use tests of conditional independence in order to find variables that are associated to the target given any subset of other measurements. This implies that SES should return only genes whose association with Ikaros is not mediated by any other measured gene. In contrast, relevance network cannot distinguish among direct and indirect associations.
SES requires the user to set a priori two hyperparameters, a threshold for assessing pvalues significance and the size of the maximum conditioning set. In our analyses these hyperparameters were set to 0.01 and 5, respectively. The signatures found on single probesets were merged together, as well as the results retrieved on the nine different probesets measuring Ikaros.
Network reconstruction and validation
Based on our previous findings, we picked the Combat and FixedEffects methods as representatives for the DM and MA approaches, respectively. We also used the NoCorrection and SingleDataset approaches in order to characterize the scenarios where batcheffects are ignored or a randomly chosen dataset of the PBMC collection is analyzed in isolation. For the Combat and NoCorrection approaches the deconvolution and outlier deletion steps were performed on their respective merged datasets, while for the FixedEffects and SingleDatasets methods the two preprocessing steps were performed independently for each study of the PBMC collection.
Network reconstruction performances were measured in terms of precision, recall and odds ratio. Let Q _{ Ikaros }_{,}X be the list of Ikaros interactions retrieved using SES couple with the MA or DM method X, and ¬ I _{ Ikaros } the list of genes that are not part of Ikaros regulatory network (I _{ Ikaros } ∪ ¬ I _{ Ikaros }  = n). Precision is defined as $PRE{C}_{X}=\frac{\left{\mathit{Q}}_{\mathit{Ikaros}}\cap {\mathit{I}}_{\mathit{ikaros}}\right}{{\mathit{Q}}_{\mathit{Ikaros}}}$, and indicates the proportion of actual interactions that are present in the retrieved signature. Recall (or sensitivity) is computed as $\mathit{RECAL}{L}_{X}=\frac{\left{\mathit{Q}}_{\mathit{Ikaros}}\cap {\mathit{I}}_{\mathit{ikaros}}\right}{{\mathit{I}}_{\mathit{Ikaros}}}$, that is the proportion of genes that are in the Ikaros regulatory program and are classified as such.
The odds ratio quantifies the likelihood that a given proportion of regulatory relationships is retrieved by chance, and is computed as $\mathit{OddsRati}{o}_{X}=\frac{PRE{C}_{X}}{PRE{C}_{\mathit{Trivial}}}$, where $PRE{C}_{\mathit{Trivial}}=\frac{{\mathit{I}}_{\mathit{Ikaros}}}{n}$ represents the sensitivity achievable by classifying all n genes as belonging to the Ikaros regulatory program. An odds ratio of one indicates performances that are indistinguishable from random guessing, and we used a hypergeometric test [93] in order to assess the null hypothesis H _{0} : OddsRatio _{ X } = 1.
Results
E. coli and Yeast compendia
All four panels present a similar picture, with several DM and MA methods clustering together and achieving comparable performances, while the RandomGuess, SingleDataset and NoCorrection approaches usually providing significantly worst results. Best performing methods usually present a variability that is smaller than the one of the outperformed methods.
All in all, the results show that systematic biases across studies must be taken into account for retrieving genegene interactions, and that both MA and DM approaches are effective in dealing with such systematic variations.
Retrieving genegene interactions in the Yeast dataset collection have proven to be harder than in E. coli. Performances were generally poorer, with AUC and pAUC values up to 5 point inferior than the corresponding performances in the E. coli compendium, and both AUPRC and AUFDR ranging far below 0.05.
The SVA, Combat, RMACombat, FixedEffect and Scaling methods are confirmed as the best performing methods, occupying the first position in the RankProduct analysis. SVA shows a relatively high variance, indicating that sometimes it fails in reaching the top positions in terms of performances. The RandomGuess approach is stable in last position, followed by the SingleDataset, Stouffer and NoCorrection methods. The two bottom panels in Fig. 3 restrict the RankProduct analysis to the DM and MA methods, respectively. The SVA, RMACombat and Combat method should be the methods of choice within the DM approaches, while FixedEffects, RankProduct and Fisher excel among the MA methods.
Similar figures restricting the RankProduct analysis to Global and Local performances only, as well as Pearson and Spearman correlations and E. coli versus Yeast are available in the Additional file 1. The conclusions that can be drawn from these figures are in close agreement to the ones discussed until now.
Proportion of pvalues being exactly zero for E. coli and Yeast, Pearson correlation results
E. coli  Yeast  

RankProduct  0 %  0 % 
FREffects  0 %  0 % 
RandomEffects  0 %  0 % 
FixedEffects  0 %  0 % 
Fisher  32.5 %  19.1 % 
Stouffer  9.5 %  11.7 % 
SVA  0 %  0 % 
RMACombat  9.0 %  2.5 % 
Combat  9.4 %  2.8 % 
Scaling  9.1 %  2.7 % 
RMA  15.6 %  13.4 % 
NoCorrection  85.6 %  98.7 % 
RandomGuess  0 %  0 % 
Synthetic data
Also for the synthetic data results computed using the pvalue vectors P _{ t }_{,}X show a decrease in performance (Additional files 3 and 4). Particularly, across all simulation scenarios, correlation functions and performance metrics results based on correlations outperform the corresponding results based on pvalues 52 % of the times. The average difference in performance varies depending on the metric: 0.001 for AUC, 0.12 for AUPRC, 0.01 for pAUC and 0.1 for AUFDR. Interestingly, this effect becomes more marked with increasing sample size and decreasing systematic bias (Additional file 1: Figures S10 – S41), confirming that the performance loss is due to an excess of statistical power that generate zero or close to zero pvalues.
Reconstruction of the Ikaros interaction network on PBMC data
Reconstruction of Ikaros regulatory program in PBMC data collection. For each method the number of predicted and correctly retrieved interactions is reported, along with the odds ratio, precision and recall performances (see text for further details on these metrics)
Method  # Predicted interactions  # Retrieved interactions  Odds ratio  Odds ratio significance  Precision  Recall 

Combat  82  21  2.2726  0.00022  0.2561  0.0093 
FixedEffect  102  21  1.827  0.00440  0.2059  0.0093 
SingleDataset  387  70  1.6513  [<0.0001  0.65914]  0.1861  0.0310 
NoCorrection  113  21  1.6491  0.01435  0.185841  0.0092 
Discussion
In the present work we have compared two different approaches, DataMerging and MetaAnalysis, on the reconstruction of relevance networks in collection of microarray, geneexpression data. The comparison has been performed on two compendia of studies retrieved from the literature, on Escherichia coli and Yeast, respectively. Further analyses on simulated data have been used for strengthening and deepening the conclusion of the comparison. Finally, a contrived casestudy on human PBMC data have been presented for showing how the results of this study might transfer on more sophisticated network reconstruction approaches.
 1.
Batcheffects must be carefully taken into consideration for retrieving genegene interactions from microarray data. The naïve solution of ignoring systematic biases (NoCorrection approach) was outperformed by the other methods in all experimentations. This result supports our claim that batcheffects can hide actual dependencies between the measured quantities or create spurious associations between elements that are not functionally related.
 2.
DM and MA methods are equally effective in contrasting batcheffects. According to the results it is not possible to state that one approach is universally better than the other one. However, within their respective approaches, and acknowledging that the results vary across the performed experimentations, the SVA/Combat/RMACombat and the FixedEffects methods have usually achieved the best performances. In contrast, the SingleDataset method usually provides poorer results, supporting the hypothesis that integratively analyzing multiple datasets leads to improved and more robust findings.
 3.
Correlation statistics should be preferred to pvalues in ranking associations. Performances have proven to drastically change depending on whether they are computed on correlations or pvalues. We have observed that this effect is mainly due to ties generated by zero or close to zero pvalues.
This study presents a number of limitations that should be carefully considered when implementing the recommendations above. First, withinstudy batcheffects were only partially addressed, by preprocessing each single dataset with RMA. While the Quantile Normalization step included in the RMA algorithm should have removed at least part of the withinstudy biases, it is known that this approach is not optimal [5]. This is also demonstrated by our results, where the RMA method never achieved the best performances. Secondly, the design of the comparison slightly advantages DM method, particularly because all datasets belong to the same data collection and thus measure the same probesets. When this is not the case (e.g., when data from different microarray platforms are coanalyzed), DM method are not easily applicable, while MA methods can be straightforwardly used. Finally, we also notice that in our experimentations we did not explore joint uses of correlations and pvalues for ranking genegene interactions. A possible practice is to filter the candidate interactions by using the pvalues and then raking the most significant genepairs according to their correlation values.
The SVA method merits a separate note. To the best of our knowledge, this is the first study employing this methodology in the context of retrieving genegene interactions. Adapting SVA for this task has required a dedicated substudy, reported and commented in the Additional file 2. Despite the excellent performances obtained on the real data, we notice that this method performed quite poorly on the synthetic data. This drop in performances is particular evident for large samples sizes. A possible explanation might be the inclusion of several irrelevant surrogate variables when large datasets are analyzed: out of 60 surrogate variables produced when 2500 samples are available in the merged dataset, only 3 explain more than 1 % of variance. These noisy variables might in turn make the estimation of partial correlations and respective pvalues quite inaccurate. Further studies are needed in order to better investigate this phenomenon.
Future work will also focus on the generalization of the present results towards more sophisticated network reconstruction algorithms, particularly Bayesian and Causal Networks [94]. We already presented a first, contrived casestudy where we have reconstructed (part of) the regulatory network of the Ikaros transcription factor from human PBMC data. This casestudy presented several characteristics that made it harder to solve than the reconstruction of the E. coli and Yeast regulatory networks: different celltype proportions across subjects, a manytomany correspondence between genes and probesets, the list of known interactions was partially derived from animal models instead than human data. Moreover, we used a constraintbased network reconstruction algorithm instead of relevance networks. Despite all these difference both Combat and FixedEffects method demonstrated to be able to retrieve subsets of genes significantly enriched for known Ikaros interactions and to outperform both the NoCorrection and SingleDataset approach, as expected from the results of the comparison presented in this study.
Conclusions
Batcheffects should be carefully taken into account when retrieving genegene interactions, and researchers can adopt either a DM or MA approach depending on the specific application at hand. Correlation statistics should be preferred over pvalues for assessing and comparing the strength of associations, especially for large sample sizes.
Availability of supporting data
The data sets use in this article are available from their respective repositories. See Tables S1 to S3 in the Additional file 1 for the appropriate references.
Code for replicating the analysis is available at http://www.mensxmachina.org/.
Additional files
Notes
Declarations
Acknowledgements
This work was funded by the STATegra EU FP7 project, No 306000, and by the European Research Council (ERC, project No 617393, “CAUSALPATH  Next Generation Causal Analysis project”).
Declarations
Publication costs for this article were funded by a grant from the European Research Council (ERC, project No 617393, “CAUSALPATH  Next Generation Causal Analysis project”).
This article has been published as part of BMC Bioinformatics Volume 17 Supplement 5, 2016: Selected articles from Statistical Methods for Omics Data Integration and Analysis 2014. The full contents of the supplement are available online at http://bmcbioinformatics.biomedcentral.com/articles/supplements/volume17supplement5.
Authors’ Affiliations
References
 Hartemink AJ: Reverse engineering gene regulatory networks. Nat Biotechnol. 2005, 23: 554555. 10.1038/nbt0505554.View ArticlePubMedGoogle Scholar
 Rachel Wang YX, Huang H: Review on statistical methods for gene network reconstruction using expression data. J Theor Biol. 2014, 362: 5361. 10.1016/j.jtbi.2014.03.040.View ArticleGoogle Scholar
 Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci U S A. 2000, 97: 1218212186. 10.1073/pnas.220392197.View ArticlePubMedPubMed CentralGoogle Scholar
 Lagani V, Tsamardinos I, Triantafillou S: Learning from mixture of experimental data: a constraintbased approach. SETN’12 Proceedings of the 7th Hellenic conference on Artificial Intelligence: theories and applications. Volume 7297. Edited by: Maglogiannis I, Plagianakos V, Vlahavas I. 2012, Springer Berlin Heidelberg, Berlin, Heidelberg, 124131. [Lecture Notes in Computer Science]Google Scholar
 Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA. Tackling the widespread and critical impact of batch effects in highthroughput data. Nat Rev Genet. 2010;11:733–9.View ArticlePubMedGoogle Scholar
 Ramasamy A, Mondry A, Holmes CC, Altman DG: Key issues in conducting a metaanalysis of gene expression microarray datasets. PLoS Med. 2008, 10: 13201332.Google Scholar
 Warnat P, Eils R, Brors B: Crossplatform analysis of cancer microarray data improves gene expression based classification of phenotypes. BMC Bioinform. 2005, 6: 26510.1186/147121056265.View ArticleGoogle Scholar
 Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, WeissSolís DY, Duque R, Bersini H, Nowé A. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform. 2013;14:469–90.View ArticlePubMedGoogle Scholar
 Langfelder P, Mischel PS, Horvath S: When is hub gene selection better than standard metaanalysis?. PLoS One. 2013, 8: e6150510.1371/journal.pone.0061505.View ArticlePubMedPubMed CentralGoogle Scholar
 Campain A, Yang YH: Comparison study of microarray metaanalysis methods. BMC Bioinform. 2010, 11: 40810.1186/1471210511408.View ArticleGoogle Scholar
 Wang K, Narayanan M, Zhong H, Tompa M, Schadt EE, Zhu J: Metaanalysis of interspecies liver coexpression networks elucidates traits associated with common human diseases. PLoS Comput Biol. 2009, 5: e100061610.1371/journal.pcbi.1000616.View ArticlePubMedPubMed CentralGoogle Scholar
 Huttenhower C, Hibbs M, Myers C, Troyanskaya OG: A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics. 2006, 22: 28907. 10.1093/bioinformatics/btl492.View ArticlePubMedGoogle Scholar
 Steele E, Tucker A: Consensus and Metaanalysis regulatory networks for combining multiple microarray gene expression datasets. J Biomed Inform. 2008, 41: 91426. 10.1016/j.jbi.2008.01.011.View ArticlePubMedGoogle Scholar
 Nazri A, Lio P: Investigating metaapproaches for reconstructing gene networks in a mammalian cellular context. PLoS One. 2012, 7: e2871310.1371/journal.pone.0028713.View ArticlePubMedPubMed CentralGoogle Scholar
 RodriguezZas SL, Ko Y, Adams HA, Southey BR: Advancing the understanding of the embryo transcriptome coregulation using meta, functional, and gene network analysis tools. Reproduction. 2008, 135: 213224. 10.1530/REP070391.View ArticlePubMedGoogle Scholar
 Srivastava GP, Li P, Liu J, Xu D: Identification of transcription factor’s targets using tissuespecific transcriptomic data in Arabidopsis thaliana. BMC Syst Biol. 2010, 4 (Suppl 2): S210.1186/175205094S2S2.View ArticlePubMedPubMed CentralGoogle Scholar
 Varrault A, Gueydan C, Delalbre A, Bellmann A, Houssami S, Aknin C, Severac D, Chotard L, Kahli M, Le Digarcher A, Pavlidis P, Journot L: Zac1 regulates an imprinted gene network critically involved in the control of embryonic growth. Dev Cell. 2006, 11: 711722. 10.1016/j.devcel.2006.09.003.View ArticlePubMedGoogle Scholar
 Niida A, Imoto S, Nagasaki M, Yamaguchi R, Miyano S: A novel metaanalysis approach of cancer transcriptomes reveals prevailing transcriptional networks in cancer cells. Genome Inform. 2010, 22: 121131.PubMedGoogle Scholar
 Bell D, Berchuck A, Birrer M, Chien J, Cramer DW, Dao F, Dhir R, DiSaia P, Gabra H, Glenn P, Godwin AK, Gross J, Hartmann L, Huang M, Huntsman DG, Lacocca M, Imielinski M, Kalloger S, Karlan BY, Levine DA, Mills GB, Morrison C, Mutch D, Olvera N, Orsulic S, Park K, Petrelli N, Rabeno B, Rader JS, Sikic BI, et al. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;609–615.Google Scholar
 Page I: Fixedeffect versus randomeffects models. Introd Metaanal. 2009, 21: 450Google Scholar
 Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004, 36: 10908. 10.1038/ng1434.View ArticlePubMedGoogle Scholar
 Tseng GC, Ghosh D, Feingold E: Comprehensive literature review and statistical considerations for microarray metaanalysis. Nucleic Acids Res. 2012, 40: 378599. 10.1093/nar/gkr1265.View ArticlePubMedPubMed CentralGoogle Scholar
 Bolstad BM, Irizarry R, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003, 19: 185193. 10.1093/bioinformatics/19.2.185.View ArticlePubMedGoogle Scholar
 Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, Liu C: Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One. 2011, 6: e1723810.1371/journal.pone.0017238.View ArticlePubMedPubMed CentralGoogle Scholar
 Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Largescale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5:e8.View ArticlePubMedPubMed CentralGoogle Scholar
 Bevilacqua V, Pannarale P, Abbrescia M, Cava C, Paradiso A, Tommasi S: Comparison of datamerging methods with SVM attribute selection and classification in breast cancer gene expression. BMC Bioinform. 2012, 13 (Suppl 7): S910.1186/1471210513S7S9.View ArticleGoogle Scholar
 Giorgi FM, Bolger AM, Lohse M, Usadel B: Algorithmdriven artifacts in median Polish summarization of microarray data. BMC Bioinform. 2010, 11: 55310.1186/1471210511553.View ArticleGoogle Scholar
 Irizarry RA, Hobbs B, Collin F, BeazerBarclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249–64.View ArticlePubMedGoogle Scholar
 Carrera J, Rodrigo G, Jaramillo A, Elena SF: Reverseengineering the Arabidopsis thaliana transcriptional network under changing environmental conditions. Genome Biol. 2009, 10: R9610.1186/gb2009109r96.View ArticlePubMedPubMed CentralGoogle Scholar
 Frericks M, Meissner M, Esser C: Microarray analysis of the AHR system: tissuespecific flexibility in signal and target genes. Toxicol Appl Pharmacol. 2007, 220: 32032. 10.1016/j.taap.2007.01.014.View ArticlePubMedGoogle Scholar
 Hubbell E, Liu WM, Mei R: Robust estimators for expression analysis. Bioinformatics. 2002, 18: 158592. 10.1093/bioinformatics/18.12.1585.View ArticlePubMedGoogle Scholar
 Wu Z, Irizarry RA: Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J Comput Biol. 2005, 12: 88293. 10.1089/cmb.2005.12.882.View ArticlePubMedGoogle Scholar
 Li C, Wong WH: Modelbased analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A. 2001, 98: 316. 10.1073/pnas.98.1.31.View ArticlePubMedGoogle Scholar
 Sedaghat N, Saegusa T, Randolph T, Shojaie A: Comparative study of computational methods for reconstructing genetic networks of cancerrelated pathways. Cancer Inform. 2014, 13 (Suppl 2): 5566.PubMedPubMed CentralGoogle Scholar
 Johnson WE, Li C, Rabinovic A: Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007, 8: 11827. 10.1093/biostatistics/kxj037.View ArticlePubMedGoogle Scholar
 Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007, 3: 172435. 10.1371/journal.pgen.0030161.View ArticlePubMedGoogle Scholar
 Benito M, Parker J, Du Q, Wu J, Xiang D, Perou CM, Marron JS: Adjustment of systematic microarray data biases. Bioinformatics. 2003, 20: 105114. 10.1093/bioinformatics/btg385.View ArticleGoogle Scholar
 Sims AH, Smethurst GJ, Hey Y, Okoniewski MJ, Pepper SD, Howell A, Miller CJ, Clarke RB. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets  improving metaanalysis and prediction of prognosis. BMC Med Genomics. 2008;1:42.View ArticlePubMedPubMed CentralGoogle Scholar
 Luo J, Schumacher M, Scherer A, Sanoudou D, Megherbi D, Davison T, Shi T, Tong W, Shi L, Hong H, Zhao C, Elloumi F, Shi W, Thomas R, Lin S, Tillinghast G, Liu G, Zhou Y, Herman D, Li Y, Deng Y, Fang H, Bushel P, Woods M, Zhang J. A comparison of batch effect removal methods for enhancement of prediction performance using MAQCII microarray gene expression data. Pharmacogenomics J. 2010;10:278–91.View ArticlePubMedPubMed CentralGoogle Scholar
 Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L, Shaughnessy JD, Oberthuer A, Thomas RS, Paules RS, Fielden M, Barlogie B, Chen W, Du P, Fischer M, Furlanello C, Gallas BD, Ge X, Megherbi DB, Symmans WF, Wang MD, Zhang J, Bitter H, Brors B, Bushel PR, Bylesjo M, et al. The MicroArray Quality Control (MAQC)II study of common practices for the development and validation of microarraybased predictive models. Nat Biotechnol. 2010;28:827–38.View ArticlePubMedGoogle Scholar
 Wiley: Batch Effects and Noise in Microarray Experiments: Sources and Solutions  Andreas SchererGoogle Scholar
 De Magalhães JP, Curado J, Church GM: Metaanalysis of agerelated gene expression profiles identifies common signatures of aging. Bioinformatics. 2009, 25: 875881. 10.1093/bioinformatics/btp073.View ArticlePubMedPubMed CentralGoogle Scholar
 Hong F, Breitling R: A comparison of metaanalysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics. 2008, 24: 374382. 10.1093/bioinformatics/btm620.View ArticlePubMedGoogle Scholar
 Goonesekere NCW, Wang X, Ludwig L, Guda C: A meta analysis of pancreatic microarray datasets yields new targets as cancer genes and biomarkers. PLoS One. 2014, 9: 113. 10.1371/journal.pone.0093046.View ArticleGoogle Scholar
 Chou HL, Yao CT, Su SL, Lee CY, Hu KY, Terng HJ, Shih YW, Chang YT, Lu YF, Chang CW, Wahlqvist ML, Wetter T, Chu CM. Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees. BMC Bioinform. 2013;14:100.View ArticleGoogle Scholar
 Kim SC, Lee SJ, Lee WJ, Yum YN, Kim JH, Sohn S, Park JH, Lee J, Lim J, Kwon SW. Stouffer’s test in a large scale simultaneous hypothesis testing. PLoS One. 2013;8:1–11.Google Scholar
 Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz R A, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol. 2009;5.Google Scholar
 Chang LC, Lin HM, Sibille E, Tseng GC: Metaanalysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinform. 2013, 14: 36810.1186/1471210514368.View ArticleGoogle Scholar
 Taminau J, Lazar C, Meganck S, Nowé A: Comparison of merging and metaanalysis as alternative approaches for integrative gene expression analysis. ISRN Bioinforma. 2014, 2014: 17. 10.1155/2014/345106.View ArticleGoogle Scholar
 R Development Core Team R: R: A Language and Environment for Statistical Computing. 2013, R Foundation for Statistical Computing, ViennaGoogle Scholar
 FreyreGonzales JA, TrevinoQuintanilla LG: Analyzing regulatory networks in bacteria. Nat Educ. 2010, 3: 24Google Scholar
 Salgado H, GamaCastro S, PeraltaGil M, DíazPeredo E, SánchezSolano F, SantosZavaleta A, MartínezFlores I, JiménezJacinto V, BonavidesMartínez C, SeguraSalazar J, MartínezAntonio A, ColladoVides J. RegulonDB (version 5.0): Escherichia coli K12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. 2006;34:D394–7.View ArticlePubMedGoogle Scholar
 Storici F, Resnick MA: The delitto perfetto approach to in vivo sitedirected mutagenesis and chromosome rearrangements with synthetic oligonucleotides in yeast. Methods Enzymol. 2006, 409: 329345. 10.1016/S00766879(05)090191.View ArticlePubMedGoogle Scholar
 Teixeira MC, Monteiro P, Jain P, Tenreiro S, Fernandes AR, Mira NP, Alenquer M, Freitas AT, Oliveira AL, SáCorreia I. The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Nucleic Acids Res. 2006;34:D446–51.View ArticlePubMedGoogle Scholar
 Barabási A: Emergence of scaling in random networks. Science. 1999, 286: 509512. 10.1126/science.286.5439.509.View ArticlePubMedGoogle Scholar
 Csárdi G, Nepusz T: The igraph software package for complex network research. Int J Complex Syst. 2006, 1695 (5): 19.Google Scholar
 Pearson K. Note on Regression and Inheritance in the Case of Two Parents. Proc Royal Soc London (1854–1905). 2006;10:240–242.Google Scholar
 Spearman C: The proof and measurement of association between two things. Int J Epidemiol. 2010, 39: 113750. 10.1093/ije/dyq191.View ArticlePubMedGoogle Scholar
 Kendall M, Stuart A: The advanced theory of statistics, volume 2: inference and relationship. 1973Google Scholar
 Best D, Roberts D: Algorithm AS 89: The upper tail probabilities of Spearman’s rho. J R Stat Soc Ser C. 1975, 24: 377379.Google Scholar
 Manning CD, Raghavan P, Schütze H. Introduction to Information Retrieval. Volume 1. Cambridge, UK: Cambridge University Press; 2008.View ArticleGoogle Scholar
 Fawcett T: An introduction to ROC analysis. Pattern Recognit Lett. 2006, 27: 861874. 10.1016/j.patrec.2005.10.010.View ArticleGoogle Scholar
 Boyd K, Eng KH, Page CD: Area under the precisionrecall curve: point estimates and confidence intervals. Machine learning and knowledge discovery in databases. Lecture notes in computer science volume 8190. 2013, 451466.Google Scholar
 Davis J, Goadrich M: The relationship between precisionrecall and ROC curves. Proc 23rd Int Conf Mach Learn  ICML’06. 2006, 233240. 10.1145/1143844.1143874.View ArticleGoogle Scholar
 McClish DK: Analyzing a portion of the ROC curve. Med Decis Mak. 1989, 9: 190195. 10.1177/0272989X8900900307.View ArticleGoogle Scholar
 Borenstein M, Hedges LV, Higgins JPT, Rothstein HR: Introduction to metaanalysis. 2009View ArticleGoogle Scholar
 Stanley TD, Jarrell SB: Metaregression analysis: a quantitative method of literature surveys. J Econ Surv. 2005, 19: 299308. 10.1111/j.09500804.2005.00249.x.View ArticleGoogle Scholar
 Fisher RA: Statistical methods for research workers. 1925Google Scholar
 van Zwet WR, Oosterhoff J: On the combination of independent test statistics. Ann Math Stat. 1967, 38: 659680. 10.1214/aoms/1177698861.View ArticleGoogle Scholar
 Borenstein M, Hedges LV, Higgins JPT, Rothstein HR: A basic introduction to fixedeffect and randomeffects models for metaanalysis. Res Synth Methods. 2010, 1: 97111. 10.1002/jrsm.12.View ArticlePubMedGoogle Scholar
 Fisher RA, Fisher RA: Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika. 1915, 10: 507521.Google Scholar
 Breitling R, Armengaud P, Amtmann A, Herzyk P: Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 2004, 573: 8392. 10.1016/j.febslet.2004.07.055.View ArticlePubMedGoogle Scholar
 Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Aderhold A, Allison KR, Bonneau R, Camacho DM, Chen Y, Collins JJ, Cordero F, Costello JC, Crane M, Dondelinger F, Drton M, Esposito R, Foygel R, de la Fuente A, Gertheiss J, Geurts P, Greenfield A, Grzegorczyk M, Haury AC, Holmes B, Hothorn T, Husmeier D, HuynhThu VA, et al. Wisdom of crowds for robust gene network inference. Nat Methods. 2012;9:796–804.View ArticlePubMedPubMed CentralGoogle Scholar
 Ciofani M, Madar A, Galan C, Sellars M, MacE K, Pauli F, Agarwal A, Huang W, Parkurst CN, Muratet M, Newberry KM, Meadows S, Greenfield A, Yang Y, Jain P, Kirigin FK, Birchmeier C, Wagner EF, Murphy KM, Myers RM, Bonneau R, Littman DR. A validated regulatory network for Th17 cell specification. Cell. 2012;151:289–303.View ArticlePubMedPubMed CentralGoogle Scholar
 Heskes T, Eisinga R, Breitling R: A fast algorithm for determining bounds and accurate approximate pvalues of the rank product statistic for replicate experiments. BMC Bioinform. 2014, 15: 367Google Scholar
 Higgins JPT, Thompson SG, Deeks JJ, Altman DG: Measuring inconsistency in metaanalyses. BMJ Br Med J. 2003, 327: 557560. 10.1136/bmj.327.7414.557.View ArticleGoogle Scholar
 Fisher RA: The distribution of the partial correlation coefficient. Metron. 1923, 3: 329332.Google Scholar
 Tsamardinos I, Lagani V, Pappas D. Discovering multiple, equivalent biomarker signatures. In 7th Conference of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB12). Heraklion;. 2012, https://sites.google.com/site/hscbb12/program, .
 FerreirósVidal I, Carroll T, Taylor B, Terry A, Liang Z, Bruno L, Dharmalingam G, Khadayate S, Cobb BS, Smale ST, Spivakov M, Srivastava P, Petretto E, Fisher AG, Merkenschlager M. Genomewide identification of Ikaros targets elucidates its contribution to mouse Bcell lineage specification and preBcell differentiation. Blood. 2013;121:1769–82.View ArticlePubMedGoogle Scholar
 The ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489: 5774. 10.1038/nature11247.View ArticlePubMed CentralGoogle Scholar
 IKZF1 IKAROS family zinc finger 1 (Ikaros) [Homo sapiens (human)]  Gene  NCBIGoogle Scholar
 Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, Johnson AK, Maurano MT, Humbert R, Rynes E, Wang H, Vong S, Lee K, Bates D, Diegel M, Roach V, Dunn D, Neri J, Schafer A, Hansen RS, Kutyavin T, Giste E, Weaver M, Canfield T, Sabo P, Zhang M, Balasundaram G, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90.View ArticlePubMedPubMed CentralGoogle Scholar
 John S, Sabo PJ, Thurman RE, Sung MH, Biddie SC, Johnson TA, Hager GL, Stamatoyannopoulos JA. Chromatin accessibility predetermines glucocorticoid receptor binding patterns. Nat Genet. 2011;43:264–8.View ArticlePubMedGoogle Scholar
 Piper J, Elze MC, Cauchy P, Cockerill PN, Bonifer C, Ott S: Wellington: a novel method for the accurate identification of digital genomic footprints from DNaseseq data. Nucleic Acids Res. 2013, 41: e20110.1093/nar/gkt850.View ArticlePubMedPubMed CentralGoogle Scholar
 Wingender E, Dietze P, Karas H, Knüppel R: TRANSFAC: A database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996, 24: 238241. 10.1093/nar/24.1.238.View ArticlePubMedPubMed CentralGoogle Scholar
 Kel AE, Gößling E, Reuter I, Cheremushkin E, KelMargoulis OV, Wingender E: MATCHTM: a tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003, 31: 35763579. 10.1093/nar/gkg585.View ArticlePubMedPubMed CentralGoogle Scholar
 Abbas AR, Wolslegel K, Seshasayee D, Modrusan Z, Clark HF: Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus. PLoS One. 2009, 4: e609810.1371/journal.pone.0006098.View ArticlePubMedPubMed CentralGoogle Scholar
 Gaujoux R, Seoighe C: Cell Mix: a comprehensive toolbox for gene expression deconvolution. Bioinformatics. 2013, 29: 22112. 10.1093/bioinformatics/btt351.View ArticlePubMedGoogle Scholar
 Filzmoser P, Maronna R, Werner M: Outlier identification in high dimensions. Comput Stat Data Anal. 2008, 10: 16941711. 10.1016/j.csda.2007.05.018.View ArticleGoogle Scholar
 Filzmoser P, Gschwandtner M. mvoutlier: Multivariate outlier detection based on robust methods. R package version 2.0.6. 2015, https://CRAN.Rproject.org/package=mvoutlier, .
 Tsamardinos I, Aliferis CF: Towards principled feature selection: relevancy, filters, and wrappers. Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics. 2003Google Scholar
 Tsamardinos I, Brown LE, Aliferis CF: The maxmin hillclimbing Bayesian network structure learning algorithm. Mach Learn. 2006, 65: 3178. 10.1007/s1099400668897.View ArticleGoogle Scholar
 Rivals I, Personnaz L, Taing L, Potier MC: Enrichment or depletion of a GO category within a class of genes: Which test?. Bioinformatics. 2007, 23: 401407. 10.1093/bioinformatics/btl633.View ArticlePubMedGoogle Scholar
 Pearl J. Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press; 2009.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.