Detecting disease-associated genes with confounding variable adjustment and the impact on genomic meta-analysis: With application to major depressive disorder
- Xingbin Wang^{1},
- Yan Lin^{1},
- Chi Song^{1},
- Etienne Sibille^{2}Email author and
- George C Tseng^{1, 3, 4}Email author
DOI: 10.1186/1471-2105-13-52
© Wang et al; licensee BioMed Central Ltd. 2012
Received: 18 July 2011
Accepted: 29 March 2012
Published: 29 March 2012
Abstract
Background
Detecting candidate markers in transcriptomic studies often encounters difficulties in complex diseases, particularly when overall signals are weak and sample size is small. Covariates including demographic, clinical and technical variables are often confounded with the underlying disease effects, which further hampers accurate biomarker detection. Our motivating example came from an analysis of five microarray studies in major depressive disorder (MDD), a heterogeneous psychiatric illness with mostly uncharacterized genetic mechanisms.
Results
We applied a random intercept model to account for confounding variables and case-control paired design. A variable selection scheme was developed to determine the effective confounders in each gene. Meta-analysis methods were used to integrate information from five studies and post hoc analyses enhanced biological interpretations. Simulations and application results showed that the adjustment for confounding variables and meta-analysis improved detection of biomarkers and associated pathways.
Conclusions
The proposed framework simultaneously considers correction for confounding variables, selection of effective confounders, random effects from paired design and integration by meta-analysis. The approach improved disease-related biomarker and pathway detection, which greatly enhanced understanding of MDD neurobiology. The statistical framework can be applied to similar experimental design encountered in other complex and heterogeneous diseases.
Background
Microarray experiment enables researchers to examine the expression of thousands of genes in parallel. Differentially expressed (DE) gene detection is one of the most common analyses in microarray. In such an analysis, genes differentially expressed under multiple conditions are detected and are used for generating further biological hypotheses, developing potential diagnostic tools, or investigating therapeutic targets. The extensive applications of microarray technology have led to an explosion of gene expression profiling studies publicly available. However, the noisy nature of microarray data, together with small sample size in each study, often results in inconsistent biological conclusions [1–3]. Therefore, meta-analysis, a set of statistical techniques to combine multiple studies under related research hypotheses, has been widely applied to microarray analysis to increase the reliability and robustness of results from individual studies. In the literature, three major categories of meta-analysis methods have been applied to genomic meta-analysis: combining effect sizes [4, 5], combining p-values [6–8] and combining rank statistics [9, 10]. In general, different approaches have different underlying assumptions and pros and cons in the applications [11, 12].
Major depressive disorder (MDD) is a heterogeneous illness with mostly uncharacterized pathology. Despite several gene expression studies of MDD [13–17] published, the biological mechanisms of MDD remain mostly uncharacterized [18]. Although biomarkers and pathways have been identified in specific studies, the findings are not consistently observed from study to study. This variability may be due to several factors. Firstly, MDD is thought to be a complex and heterogeneous disease [19], associated with multiple genetic, post-translational, and environmental factors. Furthermore, patients might have varying disease severity, with some having psychotic features as well as exposure to a variety of medications and dosage levels to control their illness. Secondly, the genetic disease effects are potentially confounded by many covariates, which include (1) demographic variables such as age, gender and race; (2) clinical variables such as anti-depressant drug usage, death by suicide and alcohol dependence; (3) technical variables inherent in the use of post-mortem brain samples, such as the pH level of brain tissues, brain region and post-mortem interval (PMI). In statistical terms, confounding variables are defined as extraneous variables that can adversely affect the relationship between the independent variable (i.e. disease state) and dependent variable (i.e. gene expression). If the statistical models employed to identify differentially expressed genes fail to incorporate these sources of heterogeneity (potential confounding variables), not only can this reduce the statistical power, but also it will introduce sources of spurious signals to the gene detection. Finally, sample sizes for these studies are generally small (between 10-25 pairs of MDDs and controls) due to the limited availability of suitable brain specimens and the significant costs associated with their collection.
In this paper, we propose a statistical framework to tackle overall weak signal expression profiles in MDD that have small sample size, case-control paired design and confounding covariates in each study. We use a set of five MDD expression profiles as an illustrative example. In the literature, most analyses of similar data structure either ignored the potentially confounding covariates by using paired or unpaired t-test [16, 20, 21] or applied simple linear regression model to incorporate all covariates [22, 23]. The former approach undoubtedly ignored effects from confounding covariates; the latter approach was not efficient or even not applicable when the number of covariates is large and the number of samples in each study is small. In this paper, we will propose a framework that uses a random intercept model (RIM) to account for the case-control paired design and confounding covariates in single study analysis. An improved RIM with novel gene-specific variable selection (namely RIM_minP or RIM_BIC to be introduced later) will be performed to accommodate the small sample size and relatively large number of covariates in individual studies. We will then apply and compare two popular meta-analysis methods: Fisher's method [24, 25] and maximum p-value method [26–28]. The application of this approach to combine five MDD microarray studies show improved DE gene detection competency and superior statistical significance of pathway detection using our proposed method. Simulations considering various correlation structures among disease state, gene expression and covariates will be performed to demonstrate better performance of this framework. Our proposed framework is general and applicable in commonly encountered microarray meta-analysis of complex genetic diseases.
Methods
Description of motivating MDD data
This research is motivated from the meta-analysis of combining five MDD transcriptomic studies. Brain tissues of three patient cohorts (MD1, MD2 and MD3) obtained from different sources at different time were analyzed. For all three patient cohorts, tissues from the anterior cingulated cortex (ACC) brain region were analyzed by microarray experiments independently to generate three microarray studies: MD1_ACC, MD2_ACC and MD3_ACC. Similarly, tissues from the amygdala (AMY) brain region in MD1 and MD3 cohorts were analyzed to generate MD1_AMY and MD3_AMY. Details of the five patient cohorts and microarray studies are available in Additional file 1: Table S1. In each patient cohort, MDD patients were matched to control patients by three demographic variables: age, sex and race. Case-control paired design is proven to increase statistical power significantly, especially for expensive experiments that inhibit large sample size. This is exactly the case in most brain disorder studies using post-mortem samples. Three additional clinical variables (alcohol dependence, evidence of taking anti-depressant drugs and death by suicide) and two technical variables (pH level of brain tissues and post-mortem interval PMI) were also available for each patient. Among the covariates described above, six variables (age, alcohol, drug, suicide, pH and PMI) were considered potential confounders in the DE gene detection of MDD. These six covariates were not highly correlated with each other in our analysis and thus the collinearity issue does not exist in the linear models below (see Additional file 1: Table S2).
Data pre-processing, gene matching and gene filtering
Microarray images were scanned and summarized by manufacturers' defaults. Data from Affymetrix arrays were processed by RMA method and data from Illumina were processed by manufacturer's software for probe analysis. When samples in each study were processed in multiple batches, potential batch effects were evaluated and samples were normalized to have the same sample means and sample standard deviations to correct batch biases when necessary. Probes (or probe sets) were then matched to official gene symbols using Bioconductor packages. When multiple probes (or probe sets) matched to an identical gene symbol, the probe that presented the greatest inter-quartile range (IQR) was selected to represent the target gene symbol. Larger IQR represents greater variability (and thus greater information content) in the data and this probe matching method has been recommended in Bioconductor [29]. After genes were matched across five studies, 16,715 unique gene symbols were available across all five studies and intensities were all log-transformed (base 2). Two sequential steps of gene filtering were then performed. In the first step, we filtered out genes with very low gene expression that were identified with small average expression values across majority of studies. Specifically, mean intensities of each gene across all samples in each study were calculated and the corresponding ranks were obtained. The sum of such ranks across five studies of each gene was calculated and genes with the lowest 30% rank sum were considered un-expressed genes (i.e. small expression intensities) and were filtered out. Similarly, in the second step, we filtered out non-informative (small variation) genes by replacing mean intensity in the first step with standard deviation. Genes with the lowest 40% rank sum of standard deviations were filtered out. Additional file 1: Figure S1 shows the pre-processing diagram and the number of genes remained in each pre-processing step. Finally, 7,020 matched genes (16715 × 0.7 × 0.6 = 7020) in five studies were analysed. We note that the somewhat ad hoc gene filtering procedure is necessary and is commonly used in microarray analysis. Although it runs the risk to ignore important DE genes, it has benefits of reducing false positives from non-expressed or non-informative genes and increasing statistical power in multiple comparison procedure. Filtering down to ~7000 genes is moderate and adequate since the size usually keeps majority of acting genes of the genome in the analysis.
Single study analysis for DE gene detection
Paired t-test Paired t-test was performed as a comparison to the method we developed below. This method took into the MDD and control paired design into consideration but ignored the confounding covariates. We also tested non-parametric Wilcoxon signed rank test. The results were similar to paired t-test and thus not shown in the result.
In the model, Y_{ gik } was the gene expression value of gene g (1 ≤ g ≤ G) and disease status i (i = 1 for control and 2 for MDD) in pair k (1 ≤ k ≤ K). X_{ 0ik } was the disease label that took value one if the sample was MDD and zero if the sample was a control. X_{ lik } represented values for potential confounding covariate l (1 ≤ l ≤ 6; 0-1 binary variables for alcohol, drug and suicide; numerical variables for age, pH and PMI). α_{ k } was the random intercept from a normal distribution with mean zero and variance τ_{ g }^{ 2 }, which represented the deviation of averaged expression values in the k^{th} pair from the average in the whole population. Finally, ε_{ gik } were independent random noises that followed a normal distribution with mean zero and variance σ_{ g }^{ 2 }. Under this model, β_{ g0 } was the disease effect of gene g and was the parameter of major interest. To obtain an MDD-associated biomarker candidate list in a single study analysis, likelihood ratio test (LRT) was used to calculate the p-values of testing H_{0}: β_{ g0 } = 0 (vs H_{A}: β_{ g0 } ≠ 0). We denote this method as RIM_ALL (random intercept model containing all covariates) as opposed to the models with variable selection in the next section. The p-values from RIM_ALL were then corrected by Benjamini-Hochberg procedure [30] for multiple comparison to control false discovery rate (using "p.adjust" function in R).
RIM and FEM with variable selection Although RIM model can effectively adjust for confounding covariates in DE gene detection, the small sample size (10-25 pairs) and relatively high number of potential confounders (6 covariates) can make the model inefficient and impractical. In this paper, we developed and evaluated two choices of variable selection procedures in the random intercept model (namely, RIM_BIC and RIM_minP). Specifically, all possible RIM models that included at most two (0, 1 or 2) clinical variables were computed and compared. In RIM_BIC, the model with the smallest Bayesian Information Criterion (BIC) [31] value was selected. For RIM_minP, we selected the model that yielded the smallest p-value associated with the likelihood ratio test for testing the disease effect H_{0}: β_{ g0 } = 0. Conceptually, BIC selected the model with the best overall model fitting and prediction while minP focused on the model that gave the best statistical significance of the disease effect β_{ g0 }. This additional variable selection avoided to include more than 2 clinical variables in the model and allowed assessment of biomarkers affected by different sets of covariates in each gene (e.g. disease effect of gene A may be confounded by alcohol and age while gene B may be confounded by antidepressant drug), which biologically gave more appealing interpretations and conclusions. Similar to RIM model, the likelihood ratio test was used to generate p-values of testing H_{0}: β_{ g0 } = 0 in each gene for the selected model by BIC or minP. The resulting p-values from the LRT were, however, biased from the variable selection procedure and the type I error control was invalid. As a result, we performed a permutation test that randomly permuted the disease labels within each case-control pair to generate a null distribution for p-value assessment. Additional file 1: Figure S2 shows the simulated null distribution from permutation analysis. The skewed distribution deviating from uniform distribution between 0 and 1 showed the need of the permutation analysis for p-value correction. Similar to RIM_ALL, the p-values from RIM_minP and RIM_BIC are corrected by Benjamini-Hochberg procedure for multiple comparisons separately. Detailed algorithm of the permutation analysis and inference is described in Additional file 1: Part I.
where the notations were the same as in the FEM model and RIM model but now with only one covariate (i.e. a given variable l) included and a corresponding interaction term involved. We performed likelihood ratio test for H_{0}: γ_{ gl } = 0 to test the statistical significance of the interaction term of gene g and covariate l. Additional file 1: Table S3 summarizes the number of significant interaction terms in the genome of each covariate. The result shows that the interaction terms between each covariate (Age, pH or PMI) and MDD disease effect were not significant in most of the genes under false discovery rate FDR = 5% (Benjamini-Hochberg correction). As a result, we did not consider the interaction terms in our models throughout this paper.
Meta-analysis for DE gene detection
Among the many microarray meta-analysis methods used in the literature, all methods have their pros and cons depending on the data structure and biological goal [11, 34]. According to Birnbaum [35], Li and Tseng [36], and Tseng et al. [12], meta-analysis methods can be categorized into two types: one detects gene markers differentially expressed "in one or more studies" and the other detects genes differentially expressed in "all studies". Fisher's method and maxP method are two popular methods in the two categories, respectively, and we will apply both methods in this paper. Fisher's method calculates the sum of log-transformed p-vales in the meta-analysis: V_{ g }^{ Fisher } = - 2Σ_{ k }^{ K } = _{1}log(Pgk) where K is the number of studies combined and P_{ gk } is the p-value of gene g and study k. Assuming independence among studies and that the model to derive p-values is correctly specified, V_{g}Fisher follows a chi-squared distribution with degree of freedom 2K under the null hypothesis. In contrast, the maxP method combines p-values by taking the maximum of p-values across studies: V_{ g }^{ maxP }= max_{1 ≤ k ≤ kPgk}V_{ g }^{ maxP }. It can be show that V_{ g }^{ maxP } follows a beta distribution with shape parameters K and 1 under the null hypothesis. Note that in Fisher's method, an extremely small p-value of one study can drive strong statistical significance of that gene no matter whether p-values of the other studies are small or not. On the other hand, the maxP method requires small p-values of all studies for a gene to be detected.
Although the parametric inference by chi-squared and beta distributions is convenient, we chose to perform permutation analysis for the inference to avoid violation of underlying assumptions. We also noted that two pairs of studies (MD1_ACC and MD1_AMY, MD3_ACC and MD3_AMY) used the same cohorts with different brain regions. The studies may be dependent from each other. To account for the dependence structure among studies from the same patients, we kept the same permutation order for each pair of studies of the same cohort in the permutation analysis of individual studies. Benjamini-Hochberg procedure implemented in R is then used to control false discovery rate.
Evaluation by detection compentency and pathway analysis
For evaluation purpose, we compared three single-study DE gene detection methods (PT, RIM_minP and RIM_BIC). Each method was applied to five single studies and then the five single study results were combined by Fisher and maxP meta-analysis methods. As a result, a total of 3 × 7 = 21 DE gene detection results were generated and compared. To assess performance of these different methods, we applied two evaluation criteria. The first criterion compared the numbers of detected DE genes from different methods under different p-value thresholds using detection competency curves (x-axis: p-value threshold; y-axis: number of detected DE genes). This first criterion is a reflection of detection competency of different methods.
The detection competency curves alone, however, do not guarantee an improved biological finding. Thus, we evaluated the biological associations of detected DE genes from different methods with existing pathway knowledge databases in the second criterion. We collected 2,287 pathways from MsigDB [37] http://www.broadinstitute.org/gsea/msigdb/; 1,454 pathways from Gene ontology, 217 from Biocarta, 186 from KEGG and 430 from Reactome). Kolmogorov-Smirnov (KS) test was applied for each DE gene result and each pathway for pathway analysis (a.k.a. gene set analysis) [27, 37]. KS test improved a commonly used Fisher's exact test in that it does not arbitrarily apply a DE gene cut-off but directly evaluate the DE evidence ordering in the genome. For the DE detection result of a given method, a smaller p-value from KS test reflects a stronger biological validation of the detected DE genes. The detailed KS test algorithm and inference are described in the Additional file 1: Part II.
Since 2,287 pathways were tested, aggregating the total information to conclude one method being better than the other was not a trivial task. We denote by p_{ rm } the pathway enrichment p-value of method m for pathway r. Since the majority of the 2,287 pathways were irrelevant to MDD, we first identified the top V most "disease-related" pathways by committee decision of the 21 DE gene detection results under comparison. Specifically, disease relatedness of a pathway r in method m was derived as the rank of p-values ${s}_{r}^{\left(m\right)}=ran{k}_{r}\left({p}_{rm}\right)$. The disease relatedness score of pathway r was then defined as ${S}_{r}={\sum}_{m=1}^{M}{s}_{r}^{\left(m\right)}$, where M = 21 was the total number of DE gene detection methods under comparison. A small S_{ r } reflected an overall significant pathway enrichment p-value for pathway r in the 21 DE gene detection results and thus was believed to be a disease related pathway. The V pathways with the smallest S_{ r } are selected as surrogates of "gold standard" disease-related pathways for evaluation purpose. For any two selected DE gene detection results, Wilcoxon signed rank test can be used to compare pathway enrichment p-values of the V selected pathways to determine if one method is statistically better than the other method, in terms of association with the V disease-related pathways. In this paper, we use V = 100.
Post hoc analysis on confounding variables after meta-analysis
An important advantage of the variable selection scheme is the availability of post hoc analysis to compare selected confounders across studies. Three questions can be explored and answered: (1) Which variable(s) is the most or least frequently included in the model selection to confound with disease effect? (2) Are variables repeatedly selected across studies more frequently than by chance (e.g. alcohol is selected in most or all studies in a given gene)? (3) Are the directions of effect sizes of a variable consistent across studies (e.g. alcohol-dependent patients have higher expression than non-alcohol-dependent in most studies for a given gene)?
For the first question, we first generated a list of DE genes under a given FDR threshold and counted the frequency of each variable being selected in the gene list. The variables were ranked according to the frequencies in each study and a rank average of each variable was calculated across five studies. A small averaged rank of a covariate showed frequent appearance of the variable in the model selection and thus a frequent confounder.
The direction of covariates effect in RIM_minP (Table 1A) and RIM_BIC (Table 1B) models for 9 MDD related genes selected from the literature
S | Age | Alcohol | Antidep | pH | PMI | Suicide | Co-appearance T_{1} | Concordance T_{2} | Ratio R = T_{2}/T_{1} | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | ||||
VGF | 0 | 0 | -1 | 0 | -1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 7 | 7 | |
SST | 0 | 0 | -1 | 0 | -1 | 1 | 1 | 0 | 1 | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 7 | 7 | |
CNP | 0 | 1 | 0 | 1 | 0 | -1 | 0 | 0 | 1 | 0 | -1 | 0 | 1 | 0 | -1 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 1 | 0 | 0 | 5 | 2 | |
NPY | 0 | -1 | -1 | 0 | -1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 1 | 1 | 10 | 7 | |
TAC1 | 0 | -1 | -1 | 0 | -1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 1 | 0 | 1 | 7 | 5 | |
MBP | 0 | 0 | 0 | 1 | 1 | 0 | -1 | 0 | 1 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | -S1 | 0 | 1 | 0 | -1 | 5 | 2 | |
MOBP | 0 | 0 | 0 | 0 | 1 | 0 | -1 | 1 | 0 | 0 | -1 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | -1 | 0 | 1 | -1 | -1 | 8 | 4 | |
RGS4 | -1 | 0 | 0 | 0 | -1 | 0 | 1 | -1 | 1 | 0 | 1 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 6 | 3 | |
HTR2A | 0 | -1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | -1 | -1 | 0 | -1 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 5 | 3 | |
Total | T_{1} = 60 | T_{2} = 40 | R = T_{2}/T_{1} = 0.67 | ||||||||||||||||||||||||||||||
p-value | 0.39 | 0.014 | |||||||||||||||||||||||||||||||
Age | Alcohol | Antidep | pH | PMI | Suicide | Co-appearance T_{1} | Concordance T_{2} | Ratio R = T_{2}/T_{1} | |||||||||||||||||||||||||
A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | 1 | 1 | ||
VGF | 0 | 0 | -1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 7 | 7 | |
SST | -1 | 0 | -1 | -1 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 3 | 3 | |
CNP | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | -1 | 0 | 7 | 7 | |
NPY | -1 | 0 | -1 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 6 | 6 | |
TAC1 | -1 | 0 | -1 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | |
MBP | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 3 | 3 | |
MOBP | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 3 | 3 | |
RGS4 | -1 | 0 | 0 | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | |
HTR2A | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | -1 | -1 | 0 | 0 | 0 | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | |
Total | T_{1} = 32 | T_{2} = 32 | R = T_{2}/T_{1} = 1 | ||||||||||||||||||||||||||||||
p-value | 0.011 | 0.005 |
Number of detected DE genes using different single study analysis methods (PT, RIM_ALL, RIM_minP and RIM_BIC) in the five individual studies and by two meta-analysis methods (Fisher and maxP)
method | FDR | Individual analysis | Meta-analysis (3ACC+2AMY) | |||||
---|---|---|---|---|---|---|---|---|
MD1_ACC | MD2_ACC | MD3_ACC | MD1_AMY | MD3_AMY | Fisher | maxP | ||
RIM_minP | FDR = 0.05 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
FDR = 0.1 | 0 | 0 | 2 | 0 | 725 | 109 | 99 | |
FDR = 0.15 | 0 | 0 | 5 | 0 | 1442 | 810 | 683 | |
RIM_BIC | FDR = 0.05 | 0 | 0 | 0 | 0 | 101 | 0 | 0 |
FDR = 0.1 | 0 | 0 | 1 | 0 | 506 | 0 | 0 | |
FDR = 0.15 | 0 | 0 | 6 | 0 | 873 | 38 | 0 | |
RIM_ALL | FDR = 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
FDR = 0.1 | 0 | 3 | 0 | 0 | 1 | 0 | 1 | |
FDR = 0.15 | 0 | 3 | 1 | 0 | 1 | 0 | 1 | |
PT | FDR = 0.05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
FDR = 0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
FDR = 0.15 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
Simulation
Results
Recommended statistical framework
Comparison of various methods in single study analysis
Conclusion Incorporation of potential confounding covariates with variable selection and considering paired design in the model of single study analysis generally increased the detection competency of disease related biomarkers.
Comparing individual study analysis and meta-analysis
Smaller sample size in each study often results in a smaller statistical power of DE gene detection. In Table 2, the first five columns show the number of biomarkers detected by RIM_minP, RIM_BIC and PT under different false discovery rate (FDR) thresholds. After multiple comparison correction by Benjamini-Hochberg procedure, the first four single study analyses detected almost none DE genes under FDR = 5, 10 or 15%. This motivated us to perform meta-analysis to increase statistical power and provide validated findings on DE gene detection. In Table 2, the last two columns show the number of biomarkers detected by Fisher method and maxP method, respectively. The two meta-analysis methods detected more biomarkers than individual study analysis based on RIM_minP except for MD3_AMY.
Comparing fisher and maxP
In the literature, many microarray meta-analysis methods have been proposed and compared [11, 34, 38]. As was discussed in the method section, different methods have different strength for detecting different types of differentially expressed genes. In Li and Tseng [8], genes that are differentially expressed in all studies were termed as HS_{A} type (hypothesis setting A) while genes differentially expressed in at least one study were called HS_{B} type. Among the three methods compared in this paper, maxP were methods that detect HS_{A} type DE genes, while Fisher's method detected HS_{B} type DE genes. The result showed that the two meta-analysis methods detected different sets of DE genes, suggesting different algorithms and assumptions behind the methods. Additional file 1: Figure S3 shows heatmaps on genes detected by Fisher alone (Additional file 1: Figure 3A), maxP alone (Additional file 1: Figure S3B) or both (Additional file 1: Figure S3C). In Additional file 1: Figure S3A, majority of DE genes detected by Fisher but not by maxP were dominated by strong differential expression in one or two studies (many in MD3_AMY and some in MD2_ACC or MD3_ACC). Although Fisher's method has been popularly applied in the microarray meta-analysis literature, the result showed its weakness to be dominated by single strong signal studies that included potential false positives. For example, Fisher's method detected 810 DE genes among which 445 DE genes (about 55%) could also be detected in MD3_AMY) while only 169 (about 24%) among 683 DE genes detected by maxP method could be detected in MD3_AMY. On the other hand, maxP had better statistical power to detect many genes with weak DE evidence in all studies (Additional file 1: Figure 3B) that Fisher's method cannot detect. Conceptually, we were more interested in identifying genes differentially expressed across all studies through maxP.
Post hoc analysis for confounding covariates
Frequency of covariates appearing in RIM_minP model selection among 683 DE genes detected by maxP method under threshold FDR = 15%
MD1_ACC | MD2_ACC | MD3_ACC | MD1_AMY | MD3_AMY | Rank average | |
---|---|---|---|---|---|---|
Age | 142(5) | 213 (4) | 205 (4) | 173 (3) | 218 (3) | 3.6 |
Alcohol | 299 (2) | 279 (2) | 221 (3) | 368 (1) | 195 (4) | 2.4 |
Antidepressant | 348 (1) | 119 (6) | 271 (2) | 346 (2) | 362 (1) | 2.4 |
pH | 208 (3) | 150 (4) | 116 (6) | 108 (6) | 86 (6) | 5 |
PMI | 93 (6) | 120 (5) | 141 (5) | 133 (5) | 120 (5) | 5.2 |
Suicide | 150 (4) | 325 (1) | 340 (1) | 149 (4) | 322 (2) | 2.4 |
To further explore effects of covariates, we analysed a set of 9 genes that have been previously associated with MDD in the literature (see Table 1A and 1B). Intuitively, we expected that a covariate should be included in the model across studies more frequently than by random and effects of a covariate should have consistent differential expression direction across studies. We constructed two hypothesis testing using the co-appearing statistics T_{ 1 } and concordant ratio statistics R described in the Method section and performed the tests on the set of 9 MDD-related genes. The result showed weak to marginal statistical significance of the first hypothesis (p = 0.397 based on RIM_minP and p = 0.011 based on RIM_BIC), suggesting covariates were consistently selected across studies by RIM_BIC but not RIM_minP. For the second hypothesis, tests for both 9 MDD gene list was statistically significant (p = 0.014 and 0.005). The effects sizes of selected confounding variables have concordant direction across studies. For example, age was found a confounding variable in gene NPY and TAC1 in three out of five studies and the effect sizes were all negative (in log scale). The moderate statistical significance is reasonable since the hypothesis tests were performed only on the nine selected genes. This result demonstrated that covariates overall impacted gene expression changes consistently and confounded with disease effects among the 9 MDD-related gene list.
Simulation results
Evaluation of t-test, FEM_minP, FEM_BIC and FEM_ALL methods by simulations
Type I error (s.e.) | Power (%) (s.e) | Number of DE genes (s.e) | # of variables in Z selected | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Scenario | A | B | C | D | A | B | C | D | A | B | C | D | B | C |
I | 0.051 (.001) | 0.046 (.001) | 0.049 (.001) | 0.051 (.001) | 67.9 (.006) | 72.9 (.007) | 74.6 (.006) | 69.7 (.006) | 12.5 (1.03) | 20.4 (1.11) | 23.3 (1.04) | 17.6 (1.09) | 0.97/1.97* | 0.78/1.21* |
II | 0.051 (.001) | 0.052 (.001) | 0.050 (.001) | 0.051 (.001) | 93.8 (.003) | 92.9 (.003) | 92.5 (.003) | 85.0 (.005) | 73.4 (.85) | 73.0 (.92) | 69.8 (.96) | 49.7 (1.37) | 1.7 | 0.59 |
III | 0.051 (.001) | 0.053 (.001) | 0.051 (.001) | 0.051 (.001) | 93.8 (.003) | 92.5 (.004) | 91.6 (.004) | 85.1 (.005) | 71.8 (.93) | 68.3 (.94) | 66.5 (.88) | 45.8 (1.05) | 1.8 | 0.6 |
Discussion
In this paper, we described a statistical approach adjusted for confounding variables (i.e. a random intercept model with variable selection), to tackle weak signal expression profiles that have small sample size, case-control paired design and confounding covariates in each study. The results showed increased statistical power from confounding variable adjustment, paired design modelling and meta-analysis in this genomic setting and improved biological findings and interpretations have been discovered in MDD neurobiology. Pathway analysis and post hoc analysis of variable selection revealed insightful biological conclusions. Simulations under three correlation structures were performed to verify improved performance of the proposed model. In the literature, most psychiatric disease related microarray studies of similar design either ignored the clinical variables or applied simple linear regression to include all variables in the model. Our simulations clearly show limits to those two approaches. Our approach systematically considers the critical elements in the data structure in order to obtain more accurate DE gene and pathway detection. The framework is general and can be applied to microarray meta-analysis of other complex diseases with similar data structure. Specifically, this approach will be of great use in human post-mortem studies of the brain, where confounding factors are intrinsic (1) to the nature of the cohorts (demographic parameters), (2) to their method of collection (post-mortem interval) and (3) to the illness per se (clinical heterogeneity). Since dilution of expression signal is likely to occur in complex tissue such as the brain, DE genes often show small and weak effects and eliminating the effects of confounding factors is critical to detect disease associated markers.
In the variable selection of the RIM model, we tested both BIC and minP approaches. The real data analysis showed that minP seemed to identify more DE genes in the MDD example while simulations showed similar performance and statistical power of the two methods. Another potential alternative is to apply popular regularization and shrinkage methods, such as Lasso or ridge regression, in the variable selection. A prohibitive down-side of such approaches is its expensive computational load for genome-wide analysis, particularly in the estimation of the tuning parameters. In our analysis, BIC and minP procedures were limited to up to two covariates in the model, which balanced well in biological interpretation and computation feasibility.
The goal of this study was to determine optimal analytical approaches for complex datasets with multiple putative confounding variables. For this purpose, we focused on datasets produced by a single laboratory, in order to avoid additional confounding factors due to differences in laboratory protocols, brain bank collection, tissue treatment and sample handling. Now that we have established such analytical guidelines, the next step will be to increase the scope of meta-analyses by including additional datasets that are progressively made available in the literature. However, as expected, this also comes with added variability, which necessitates the development of complementary mathematical tools. For instance, we have designed a data-driven "meta-QC" quality control approach to rigorously assess the quality of microarray datasets to be combined [39]. The quality control test is critical to assess whether the inclusion of additional datasets will increase the analytical power, or be detrimental to the meta-analysis. Finally, as briefly elucidated in this report, mechanisms underlying neurological and neuropsychiatric disorders are likely to involve a distributed sets of brain regions linked in functional neural networks. The detection of molecular pathologies associated with those disorders will thus also critically depend on a priori hypotheses for converging or opposing effects in selected brain regions, for the presence (or not) of control brain regions. For instance, genetic risk factors may be hypothesized to similarly affect biological pathways across brain regions, while compensatory mechanisms leading to pathological dysfunction may display regional specificity, depending on the respective activation or inhibition of different components of neural networks. Hence, the biological impact of the studies performed here will be investigated, validated and discussed more in-depth elsewhere.
The studies combined in this paper have significant cohort features that may introduce significant heterogeneity. The five studies came from three distinct cohorts (MD1, MD2 and MD3), different sexes (male and female), array platforms (Affymetrix and Illumina) and brain regions (ACC and AMY). Future research is currently pursued to decipher such study-specific features. A future direction is to collect more studies and apply meta-regression techniques to identify sex-specific or brain-region-specific genes in a unified meta-analysis.
Declarations
Acknowledgements
The authors would like to thank David A. Lewis for access to brain specimen and Allan Sampson for discussion. This research was supported in part by Computational Resources on PittGrid http://www.pittgrid.pitt.edu. This work was support by National Institute of Health (NIH) grants MH094862, MH077159 and MH084060.
Authors’ Affiliations
References
- Tan PK, Downey TJ, Spitznagel EL Jr, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM, Cam MC: Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res 2003, 31(19):5676. 10.1093/nar/gkg763PubMed CentralView ArticlePubMedGoogle Scholar
- Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2005, 21(2):171–178. 10.1093/bioinformatics/bth469View ArticlePubMedGoogle Scholar
- Zhang M, Zhang L, Zou J, Yao C, Xiao H, Liu Q, Wang J, Wang D, Wang C, Guo Z: Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes. Bioinformatics 2009, 25(13):1662–1668. 10.1093/bioinformatics/btp295PubMed CentralView ArticlePubMedGoogle Scholar
- Choi JK, Yu U, Kim S, Yoo OJ: Combining multiple microarray studies and modeling interstudy variation. Bioinformatics 2003, 19(Suppl 1):i84-i90. 10.1093/bioinformatics/btg1010View ArticlePubMedGoogle Scholar
- Marot G, Foulley JL, Mayer CD, Jaffrezic F: Moderated effect size and P-value combinations for microarray meta-analyses. Bioinformatics 2009, 25(20):2692–2699. 10.1093/bioinformatics/btp444View ArticlePubMedGoogle Scholar
- Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM: Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res 2002, 62(15):4427–4433.PubMedGoogle Scholar
- Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA 2004, 101(25):9309–9314. 10.1073/pnas.0401994101PubMed CentralView ArticlePubMedGoogle Scholar
- Li J, Tseng GC: An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Annals of Applied Statistics 2011. accepted acceptedGoogle Scholar
- DeConde RP, Hawley S, Falcon S, Clegg N, Knudsen B, Etzioni R: Combining results of microarray experiments: a rank aggregation approach. Stat Appl Genet Mol Biol 2006., 5: Article15 Article15Google Scholar
- Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J: RankProd: a bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 2006, 22(22):2825–2827. 10.1093/bioinformatics/btl476View ArticlePubMedGoogle Scholar
- Ramasamy A, Mondry A, Holmes CC, Altman DG: Key issues in conducting a meta-analysis of gene expression microarray datasets. PLoS Med 2008, 5(9):e184. 10.1371/journal.pmed.0050184PubMed CentralView ArticlePubMedGoogle Scholar
- Tseng GC, Ghosh D, Feingold E: Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Research 2011, in press.Google Scholar
- Sibille E, Wang Y, Joeyen-Waldorf J, Gaiteri C, Surget A, Oh S, Belzung C, Tseng GC, Lewis DA: A molecular signature of depression in the amygdala. Am J Psychiatry 2009, 166(9):1011–1024. 10.1176/appi.ajp.2009.08121760PubMed CentralView ArticlePubMedGoogle Scholar
- Sequeira A, Gwadry FG, Ffrench-Mullen JM, Canetti L, Gingras Y, Casero RA Jr, Rouleau G, Benkelfat C, Turecki G: Implication of SSAT by gene expression and genetic variation in suicide and major depression. Arch Gen Psychiatry 2006, 63(1):35–48. 10.1001/archpsyc.63.1.35View ArticlePubMedGoogle Scholar
- Kang HJ, Adams DH, Simen A, Simen BB, Rajkowska G, Stockmeier CA, Overholser JC, Meltzer HY, Jurjus GJ, Konick LC, et al.: Gene expression profiling in postmortem prefrontal cortex of major depressive disorder. J Neurosci 2007, 27(48):13329–13340. 10.1523/JNEUROSCI.4083-07.2007PubMed CentralView ArticlePubMedGoogle Scholar
- Aston C, Jiang L, Sokolov BP: Transcriptional profiling reveals evidence for signaling and oligodendroglial abnormalities in the temporal cortex from patients with major depressive disorder. Mol Psychiatry 2005, 10(3):309–322. 10.1038/sj.mp.4001565View ArticlePubMedGoogle Scholar
- Sibille E, Arango V, Galfalvy HC, Pavlidis P, Erraji-Benchekroun L, Ellis SP, John Mann J: Gene expression profiling of depression and suicide in human prefrontal cortex. Neuropsychopharmacology 2004, 29(2):351–361. 10.1038/sj.npp.1300335View ArticlePubMedGoogle Scholar
- Belmaker RH, Agam G: Major depressive disorder. N Engl J Med 2008, 358(1):55–68. 10.1056/NEJMra073096View ArticlePubMedGoogle Scholar
- Nestler EJ, Barrot M, DiLeone RJ, Eisch AJ, Gold SJ, Monteggia LM: Neurobiology of depression. Neuron 2002, 34(1):13–25. 10.1016/S0896-6273(02)00653-0View ArticlePubMedGoogle Scholar
- Iwamoto K, Kakiuchi C, Bundo M, Ikeda K, Kato T: Molecular characterization of bipolar disorder by comparing gene expression profiles of postmortem brains of major mental disorders. Mol Psychiatry 2004, 9(4):406–416. 10.1038/sj.mp.4001437View ArticlePubMedGoogle Scholar
- Tochigi M, Iwamoto K, Bundo M, Sasaki T, Kato N, Kato T: Gene expression profiling of major depression and suicide in the prefrontal cortex of postmortem brains. Neurosci Res 2008, 60(2):184–191. 10.1016/j.neures.2007.10.010View ArticlePubMedGoogle Scholar
- Park T, Yi SG, Shin YK, Lee S: Combining multiple microarrays in the presence of controlling variables. Bioinformatics 2006, 22(14):1682–1689. 10.1093/bioinformatics/btl183View ArticlePubMedGoogle Scholar
- Mistry M, Pavlidis P: A cross-laboratory comparison of expression profiling data from normal human postmortem brain. Neuroscience 2010, 167(2):384–395. 10.1016/j.neuroscience.2010.01.016PubMed CentralView ArticlePubMedGoogle Scholar
- Fisher R: Statistic Methods for Research works. 4th edition. Edinburgh: Oliver and Boyd; 1932.Google Scholar
- Fisher R: Combining independent tests of significance. American Statistician 1948, 2(5):30. 10.2307/2681650Google Scholar
- Wilkinson B: A statistical consideration in psychological research. Psychol Bull 1951, 48(3):156–158.View ArticlePubMedGoogle Scholar
- Shen K, Tseng GC: Meta-analysis for pathway enrichment analysis when combining multiple genomic studies. Bioinformatics 2010, 26(10):1316–1323. 10.1093/bioinformatics/btq148PubMed CentralView ArticlePubMedGoogle Scholar
- Lu S, Li J, Song C, Shen K, Tseng GC: Biomarker detection in the integration of multiple multi-class genomic studies. Bioinformatics 2010, 26(3):333–340. 10.1093/bioinformatics/btp669PubMed CentralView ArticlePubMedGoogle Scholar
- Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (Eds): Bioinformatics and Computational Biology Solutions Using R and Bioconductor: Springer; 2005.Google Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate - a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B-Methodological 1995, 57(1):289–300.Google Scholar
- Schwarz GE: Estimating the dimension of a model. Annals of Statistics 1978, 6(2):461–464. 10.1214/aos/1176344136View ArticleGoogle Scholar
- Glorioso C, Oh S, Douillard GG, Sibille E: Brain molecular aging, promotion of neurological disease and modulation by Sirtuin5 longevity gene polymorphism. Neurobiol Dis 41(2):279–290.
- Glorioso C, Sibille E: Between destiny and disease: Genetics and molecular pathways of human central nervous system aging. Prog Neurobiol 93(2):165–181.
- Hong F, Breitling R: A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments. Bioinformatics 2008, 24(3):374–382. 10.1093/bioinformatics/btm620View ArticlePubMedGoogle Scholar
- Birnbaum A: Combining independent tests of significance. J Am Stat Assoc 1954, 49: 559–574. 10.2307/2281130Google Scholar
- Li J, Tseng GC: An adaptively weighted statistic for detecting differential gene expression when combining multiple transcriptomic studies. Ann Appl Stat 2011, 5(2):994–1019. 10.1214/10-AOAS393View ArticleGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005, 102(43):15545–15550. 10.1073/pnas.0506580102PubMed CentralView ArticlePubMedGoogle Scholar
- Campain A, Yang YH: Comparison study of microarray meta-analysis methods. BMC Bioinformatics 11: 408.
- Kang DD, Sibille E, Kaminski N, Tseng GC: MetaQC: objective quality control and inclusion/exclusion criteria for genomic meta-analysis. Nucleic Acids Research 2011, in press.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.