An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function

Background In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes including the multivariate analysis of variance (MANOVA), the principal component analysis (PCA), the generalizing estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data. Results Our simulation study shows that the proposed method has higher power than existing methods while controlling for the type I error rate. The GEE and the classical Fisher combination test, on the other hand, do not control the type I error rate and thus are not recommended. In general, the power of the competing methods decreases as the correlation between phenotypes increases. All the methods tend to have lower power when the multivariate phenotypes come from long tailed distributions. The real data analysis also demonstrates that the proposed method allows us to compare the marginal results with the multivariate results and specify which SNPs are specific to a particular phenotype or contribute to the common construct. Conclusions The proposed method outperforms existing methods in most settings and also has great applications in GWAS on complex diseases with multiple phenotypes such as the substance abuse disorders.


Background
In the past decade, genome-wide association studies (GWAS) have produced rich single-nucleotide polymorphism (SNP) data available to researchers. Among them, the large scale studies including the HapMap project [1] and the 1000 Genomes project [2] have provided publicly accessible databases of reference ancestral populations for imputation and quality control purposes. The idea of GWAS is to conduct fast SNP-based association tests to scan the whole genome using case-control samples.
*Correspondence: jjyang@umich.edu 1 School of Nursing, University of Michigan, Ann Arbor, Michigan Full list of author information is available at the end of the article Yet, many complex diseases such as mental health disorders may have multiple phenotypic traits with continuous outcomes [3]. This pleiotropy in complex traits [4] provides several potential advantages to the direct modeling of pleiotropic associations. First, a model search for loci that are simultaneously associated with multiple phenotypes would likely have higher power than a model search that only considers each phenotype individually. Second, more exact modeling may yield more accurate prediction of either or both phenotypes. Third, pleiotropic genes may tend to have a more central role in the relevant functional pathways.
Existing statistical methods for complex diseases with multivariate phenotypes can be categorized into three types of approaches. The first approach is to conduct a GWAS for each marginal phenotype and then aggregate the results. The major issue with this approach is that it does not make use of the correlation structure among phenotypes. The second approach is to summarize multiple phenotypic traits into a composite score and then conduct a GWAS on the score. This approach, however, may have difficulty in identifying proper summary scores. The third approach involves multiple phenotypic traits simultaneously. Thus, it may gain power as well as avoid the issue of multiple testing. However, it is based on stronger assumptions that may not be satisfied in some practical settings.
In this study, we provide a comprehensive review of existing statistical methods for conducting GWAS on complex diseases with multiple phenotypic traits. We also propose a new statistical method based on the Fisher combination function. The performance of competing methods is evaluated by a simulation study. In order to demonstrate applications of the proposed method, we conduct statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE).

Methods
Let X i (= 0, 1, 2) be the number of reference alleles corresponding to a candidate SNP and Y i = (Y i1 , . . . , Y im ) be the measures of multiple phenotypes for the individual i. In this study, we conduct a comprehensive review of existing statistical methods that can be used to test the association between X i and Y i .

Multivariate analysis of variance (MANOVA)
When the phenotype is univariate, we can use the oneway analysis of variance (ANOVA) with three levels of the genotype for GWAS. When we have correlated multivariate phenotypic traits, the natural extension of the one-way ANOVA is the one-way multivariate analysis of variance (MANOVA) [5]. Similar to ANOVA, MANOVA tests the equality of mean phenotypic vectors by comparing the within genotypes and between genotypes variancecovariance matrices. The strength of MANOVA is that the multivariate normal distribution provides many good statistical properties for testing and estimation [6]. However, in practice, multivariate phenotype data are very unlikely to meet the multivariate normal assumption. Furthermore, MANOVA is most powerful when the phenotypes are negatively correlated and yet this situation is also unlikely in practice, especially when the number of phenotypes is larger than 2. With respect to its relevant applications, this method has been used in GWAS on dose-response [7] and facial morphology [8].

Principal component analysis (PCA)
The principal component analysis (PCA) [9,10] is another classical statistical method for multivariate analysis. The primary objective of PCA is to find a small set of linear combinations of the original variables (i.e. principal components) that account for the most variability in the original variables. Thus, it can be employed to reduce the dimension of multivariate phenotypes. The PCA has been used in gene-based studies to increase the power of statistical testing [11,12]. Furthermore, He et al. [13] has used PCA to combine four highly correlated obesity phenotypes for a whole genome linkage scan. When the phenotypes are highly correlated, the first principal component (corresponding to the largest eigenvalue) contains most information about the phenotype data. Thus, testing the association between a SNP and the first principal component is a commonly adopted approach to effectively change the multivariate setting associated with multiple phenotypes in GWAS to the univariate setting (e.g. Zhang et al. [14] and Karasik et al. [15]). In this study, we investigate the statistical properties of this approach.

Generalized estimating equations (GEE)
The method of generalized estimating equations (GEE) [16] was developed for analyzing correlated multivariate outcomes primarily from longitudinal studies. It can be applied to test the association between a candidate SNP and multivariate phenotypes. The GEE only requires specification of the link function and the working correlation matrix. The former depends on the measurement scale of the outcomes (e.g. the identify link for continuous outcomes). The latter assumes the correlation structure among multivariate outcomes. The estimation of GEE is usually robust against this assumption. GEE was widely used in GWAS. For example, GEE was proposed as one of the multivariate approaches in Solovieff et al. [4]. For another example, Liu et al. [17] proposed to use GEE for bivariate association analyses for the mixture of continuous and binary traits. However, to the best of our knowledge, none of the existing studies have conducted simulations to investigate whether GEE can control the type I error rate when multivariate traits are involved in GWAS. To fill in this knowledge gap, we conduct a simulation study to examine the statistical properties of this approach. In this study, we only consider the identity link because we are mainly interested in continuous phenotypic traits. We also assume the working correlation matrix to be compound symmetry because it only requires us to estimate one additional parameter.

Trait-based association test involving the extended Simes procedure
Recently, van der Sluis et al. [18] developed a trait-based association test involving the extended Simes procedure (TATES). The TATES calculates a global p-value based on individual p-values of association tests for marginal phenotypes. Specifically, for m-variate phenotypic traits, one can conduct m tests of the association between a candidate SNP and each marginal phenotypic trait and derive m p-values: p 1 , . . . p m . Let p (1) , . . . p (m) be the ordered p-values from the smallest to the largest. The Simes multiple procedure declares significance between a SNP and multivariate phenotypic traits at the α level if any of the pvalues satisfy p (j) < jα/m [19]. Hence, the global p-value based on the Simes procedure is p traits = min{mp (j) /j, j = 1, . . . , m}. The TATES improves this procedure by replacing m and j with the effective number of independent traits, m e and j e , which are estimated from the eigenvalues of the correlation matrix [20,21]. Since m e ≤ m, this new adjusted global p-value, defined as p traits = min{m e p (j) /j e , j = 1, . . . , m} is smaller than the Simes global p-value. Therefore, the TATES is more powerful than the Simes. A simulation study also showed that the TATES is more powerful than MANOVA. In this study, we conduct a comprehensive simulation study to compare this method with not only the classical methods reviewed above but also the proposed methods.

The methods based on the Fisher combination function
Combining independent tests of significance to form a join statistic has been used as an alternative approach to tackling complex multivariate location problems [22]. This approach is quite popular in practice because it is much easier to develop a univariate association test statistic than a multivariate association test statistic. Birnbaum [23] discussed various combination functions among which the Fisher combination function has been proven to be asymptotically Bahadur optimal [24,25]. Thus, we focus on the Fisher combination function in this study.

Fisher combination test with the independence assumption
Based on the Fisher combination test [22], to test the association between a SNP and multivariate phenotypic traits, we only need to test the association between the SNP and each marginal phenotypic trait individually. Thus, for mmultivariate phenotypes, we have m marginal p-values: p 1 , . . . , p m . The Fisher combination statistic is defined as ( 1 ) T is used to infer the association between the SNP and multivariate phenotypic traits. When the marginal p-values are independent, the statistic T follows a chisquared distribution with 2m degrees of freedom so the p-value of T can be obtained straightforwardly. In reality, however, the phenotypic traits are always correlated so the chi-squared distribution with 2m degrees of freedom tends to underestimate the variance of the T statistic. The resulting chi-squared test is, therefore, too liberal.

The permutation method
Because of the negative consequence of the independence assumption, it is desirable to conduct the Fisher combination test without the assumption. Ideally, we could calculate the exact p-value of the T statistic in Eq. (1) using the permutation method, which does not require the unrealistic assumption and also controls for the type I error [26]. Yet, the permutation method is a very time-consuming procedure, particularly in a genome-wide context. Thus, an improvement in computational efficiency is warranted.

The proposed efficient method
For correlated phenotypic traits, T is the sum of dependent chi-squared statistics. Brown [27] and Yang [28] have shown that, under the null hypothesis of no association between a SNP and multivariate phenotypic traits, the distribution of T statistic follows a scale chi-squared distribution (γ χ 2 v ), or equivalently, a gamma distribution with the shape parameter v/2 and the scale parameter 2γ . Therefore, to calculate the global p-value of T statistic, we only need to estimate the parameters v and γ . Suppose that the mean of T is μ and the variance of T is σ 2 . Using the first and second moments of T, the values of v and γ can be calculated as v = 2μ 2 /σ 2 and γ = σ 2 /(2μ). The following are technical details of the derivation of the mean and variance of T statistic when the marginal pvalues are based on two-sided tests (see Brown [27] and Yang [28] for the case of one-sided marginal tests): Without loss of generality, we assume that the association test statistic for the jth phenotypic trait is z j where j = 1, . . . , m. The corresponding two-sided p-value is defined as p j = 2 (−|z j |), where is the standard Gaussian distribution function. Under the null hypothesis of no association between a SNP and multivariate phenotypic traits, the distribution of T statistic is approximated by a Gaussian distribution with the mean of T as and the variance of T as Therefore, in order to calculate the variance of T, we need to calculate the covariance for each pair (j, k) which can be expressed as where F is the standard bivariate Gaussian distribution. Let Thus, δ jk is a function of the correlation between z j and z k : ρ j,k . We explore the relationship between δ jk and ρ j,k by calculating δ jk numerically for the values of ρ j,k from -0.99 to 0.99 with the step of 0.01. The results are shown in Fig. 1. Since the curve of Fig. 1 is a convex curve symmetric about the y-axis, we can approximate the relationship between δ jk and ρ j,k using a tenth-order polynomial: Using the adapt function in the R package fCopulae [29], we obtained the following estimates: c 1 = 3.9081, c 2 = 0.0313, c 3 = 0.1022, c 4 = −0.1378 and c 5 = 0.0941; the maximum residual was less than 0.0001.
To estimate δ jk in Eq. (2) accurately, we have taken two steps to remove potential biases. First, since the sample correlationρ j,k is not an unbiased estimator of ρ j,k [30], we estimate ρ j,k by the bias-corrected sample correlationr j,k : where n is the samples size used to calculateρ j,k . Now, let's define the right hand side of Eq.
(2) as The estimate f (r j,k ) is still a biased estimator of δ jk . Thus, we propose a second step to remove the bias. Using the Taylor series expansion, we can estimate the bias as Fig. 1 The relationship between the covariance δ (y-axis) and the correlation ρ (x-axis) Therefore, the proposed unbiased estimator of δ jk is Hence, based on Eqs. (3) and (4), the variance of T can be estimated as Given the proposed estimators of μ and σ 2 , the global p-value of T statistic can be computed efficiently using the gamma distribution function as follows: is the gamma distribution function with the shape parameter v/2 and the scale parameter 2γ .
In this study, we compare two alternative methods to calculateρ j,k : the Pearson sample correlation coefficient and the rank correlation coefficient of Kendall's τ . Kendall and Gibbons [31] have shown the relation between ρ and τ as ρ = sin πτ 2 .
Thus, we can use Kendall's τ to deriveρ j,k as sin πτ j,k 2 . In the simulation study, we evaluate the robustness of Kendall's τ in comparison to Pearson's sample correlation coefficient.

Simulation study
We conducted a simulation study to evaluate the performance of the four methods reviewed (MANOVA, PCA, GEE, and TATES) as well as the methods based on the Fisher combination function. In the simulation, we adopted four different approaches to calculating the p-value of the Fisher combination test:

Simulation configurations
For each subject, the relationship between the SNP, x, and the multivariate phenotypes, Y , was defined as where β is the effect of the SNP on the phenotypes and is the error term. In this simulation study, we simulated x based on a minor allele frequency of 0.25. We evaluated the performance of competing methods under three different settings of effect sizes: Null hypothesis (no effects) β = (0, 0, 0, 0, 0) . Moderate equal effect sizes β = (0.3, 0.3, 0.3, 0.3, 0.3) . Varied effect sizes β = (0.1, 0.2, 0.3, 0.4, 0.5) .
For the error term, we considered two cases. In the first case, the error term was simulated from a multivariate normal distribution with the mean 0 and the variancecovariance , which is a compound symmetry matrix with the value of 1 on the diagonal and the value of = 0, 0.25, 0.5, or 0.75 on the off-diagonal. In the second case, the error term was simulated from a mixture of two multivariate normal distributions: 90 % from the same multivariate normal distribution in the first case and 10 % from the multivariate normal distribution with the mean 0 and the variance-covariance matrix 5 . The purpose of the second case was to simulate long tailed distributions of phenotypic traits which are common in real data. We generated simulated data of 100 subjects under each configuration. In addition, each configuration was repeated 10,000 times. The nominal type I error rate was set at 0.05 and the power was calculated as the proportion of p-values less than 0.05. Table 1 presents the simulation results when the multivariate phenotypes come from a multivariate normal distribution with the value of the correlation varied from 0 to 0.75. The numbers in each cell are the mean (standard deviation) of the indicator variable for p-value < 0.05 among the 10,000 replications. The top panel corresponds to the case of β = (0, 0, 0, 0, 0) (i.e. when the null hypothesis is true) and thus can be used to evaluate if each of the competing methods was able to control the type I error.

Simulation results
The results indicate that GEE and the Fisher combination test with χ 2 2m did not control the type I error rate when > 0 while all the other methods did quite well under all values of . We, thus, did not find it meaningful to further compare these two methods with the other methods in terms of the statistical power.
The middle panel of Table 1 compares the power of competing methods under the situation that the SNP has the same level of association with each of the phenotypic traits: β = (0.3, 0.3, 0.3, 0.3, 0.3) . The power of MANOVA decreased rapidly as the correlation increased. When = 0.75, for instance, all the other methods had the power of at least 0.4 but the power of MANOVA was only 0.2. Further, PCA and TATES had higher power than MANOVA when > 0. Yet, none of these three methods can beat the three Fisher combination tests The three different effect sizes are: no effect β = (0, 0, 0, 0, 0) ; moderate effects β = (0. 3 The bottom panel of Table 1 evaluates the performance of competing methods under the situation that the strength of association between the SNP and phenotypic traits varies from 0.1 to 0.5. Similar to the previous situation, the three Fisher combination tests had almost identical performance and their power decreased as the correlation increased. Furthermore, the Fisher combination tests had higher power than the other three methods (MANOVA, PCA, and TATES) in most conditions. MANOVA only beat the Fisher when = 0.75; TATES had higher power than the Fisher when ≥ 0.5. Table 2 shows the simulation results when the multivariate phenotypes come from a mixture of two multivariate normal distributions. In comparison to the corresponding settings under the multivariate normal distributions in Table 1, all the competing methods tended to have lower power under these long tailed distributions. Yet, Table 2 demonstrates similar patterns to the ones observed in Table 1 in general.

Real data analysis The Study of Addiction: Genetics and Environment(SAGE)
The National Center for Biotechnology Information (NCBI) has been managing and distributing the large The three different effect sizes are: no effect β = (0, 0, 0, 0, 0) ; moderate effects β = (0. 3 database of Genotypes and Phenotypes (dbGaP) for scientific investigation of various human diseases [32]. In order to demonstrate the application of the proposed method, we conducted statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data [33], http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study. cgi?study_id=phs000092.v1.p1. The institutional review board of the University of Michigan has approved this secondary data analysis project (HUM00084927). The SAGE is a case-control study that aggregated together the data from three large scale studies in the substance abuse field: the Collaborative Study on the Genetics of Alcoholism (COGA), the Family Study of Cocaine Dependence (FSCD), and the Collaborative Genetic Study of Nicotine Dependence (COGEND). The total number of individuals with individual level data available is 4121. Each individual was genotyped using the Illumina Human 1M-Duo beadchip which contains over 1 million SNP markers.
We selected unrelated individuals that passed the quality control measures according to the Gene Environment Association Studies Initiative (GENEVA) quality control report. The final number of unrelated individuals is 3,741 (1,732 male, 2,079 female) and the total number of SNP markers is 917,694. Because the purpose of our analysis is to identify the genes that are associated with addiction, we used the symptomatology variables

SAGE data analysis results
The values of phenotype variables range from 0 to 7. Figure 2 shows the frequency distributions of the 4 phenotype variables. Since they are not normally distributed, we calculated their correlations using the Kendall rank correlation. Table 3 shows moderate correlations ranged from 0.34 to 0.51. For each trait, we conducted a genome-wide association test using the hurdle model [34] because of the discrete nature and excess zero values associated with the symptom counts. The hurdle regression model assumes the observed data are generated from two processes: one generates zero and the other generates positive values. Since our interest is in the severity of symptomatology, the pvalues from the positive component of the hurdle model were used for further analysis. For each phenotype of addiction, the estimated p-values are summarized using QQ-plots in Fig. 3. The diagonal straight lines have the slope 1 and intercept 0. When the curve of the p-values deviate far away from the diagonal line, it indicates that there are many SNPs significantly associated with the corresponding phenotype trait. By examining the 4 plots in Fig. 3, we obtain the following two findings: (1) For the nicotine symptoms, the p-values fall on the diagonal line  Since the four symptomatology variables are moderately correlated and they all measure the common construct of addiction, we used them as the multivariate phenotype and applied the proposed Fisher combination approach (FC-Kendall) to identify the SNPs associated with it. The QQ-plot of the p-values for this multivariate analysis is shown in Fig. 4 indicating that some SNPs are associated with addiction across substances.
To identify the SNPs associated with the phenotypes, we adopted the commonly used significance level for GWAS of 10 −6 to account for multiplicity. Based on the results of the marginal tests, the numbers of SNPs identified to be associated with the individual phenotypes are 917 for alcohol dependence symptoms, 0 for nicotine symptoms, 9 for marijuana symptoms, and 0 for cocaine symptoms. Using the proposed Fisher combination method (FC-Kendall), on the other hand, we identified 6 SNPs associated with the multivariate phenotype of addiction. Among them, 5 SNPs were also identified by the marginal test for alcohol symptoms. This implies that if we ignore the correlations among the 4 phenotypic traits and conduct the marginal tests, we would identify many SNPs that may be specific to alcohol dependence. Thus, if our goal is to identify the genes associated with the construct of addiction that contributes to the 4 types of substance dependence symptomatology, the proposed method is a better approach.

Discussion
In GWAS for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power

Multivariate Analysis
Expected − log 10 (p) Observed − log 10 (p) Fig. 4 The QQ-plot of p-values for the Fisher combination test of association between the SNPs and the multivariate phenotype of addiction. The x-axis is the expected −log 10 (p-value), and the y-axis is the observed −log 10 (p-value). The diagonal gray straight lines have the slope 1 and intercept 0 of gene search and thus is a practically important area that requires methodology work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes including MANOVA, PCA, GEE, TATES, and the classical Fisher combination test. Built upon the Fisher combination test, we proposed a new method that relaxes the unrealistic independence assumption and is also computationally efficient. Particularly, in an exploratory study where multiple sets of phenotypes may be of interest, when the set is changed, our proposed methods only require recalculation of the correlation between phenotypes and then the available marginal p-values for each SNP can be re-used. The competing methods which do not involve marginal p-values such as the PCA, MANOVA, and GEE, on the other hand, would require a complete re-analysis. We conducted a simulation study to compare the performance of the competing methods. The GEE and the Fisher combination test with the independence assumption did not control the type I error rate and thus are not recommended. In general, the power of the methods decreased as the correlation between phenotypes increased. Furthermore, all the competing methods tended to have lower power when the multivariate phenotypes come from long tailed distributions. The proposed method (with the correlation being estimated by the Pearson's sample correlation coefficient or the Kendall's τ ) performed as well as the permutation method and yet only required 10 −2 computational time. In most settings of the simulation, these three Fisher combination tests outperformed the other methods. The real data analysis also demonstrated that the Fisher combination tests allow us to compare the marginal results with the multivariate results and specify which SNPs are specific to a particular phenotype or contribute to the common construct.
In our simulation study, we only considered continuous multivariate phenotypes. Future studies may extend the methodology work to the case of correlated discrete phenotypes. For example, in the substance abuse field, many outcomes are zero-inflated count data [35] or ordinal data [36]. A future direction that is particularly challenging is how to analyze multivariate phenotypes with different measurement scales.