Family-based association analysis: a fast and efficient method of multivariate association analysis with multiple variants

Background Many disease phenotypes are outcomes of the complicated interplay between multiple genes, and multiple phenotypes are affected by a single or multiple genotypes. Therefore, joint analysis of multiple phenotypes and multiple markers has been considered as an efficient strategy for genome-wide association analysis, and in this work we propose an omnibus family-based association test for the joint analysis of multiple genotypes and multiple phenotypes. Results The proposed test can be applied for both quantitative and dichotomous phenotypes, and it is robust under the presence of population substructure, as long as large-scale genomic data is available. Using simulated data, we showed that our method is statistically more efficient than the existing methods, and the practical relevance is illustrated by application of the approach to obesity-related phenotypes. Conclusions The proposed method may be more statistically efficient than the existing methods. The application was developed in C++ and is available at the following URL: http://healthstat.snu.ac.kr/software/mfqls/.


Background
During the last decade, more than a hundred genomewide association studies (GWAS) have been initiated, and GWAS have been successful in identifying many susceptibility loci involved in human disease. However, phenotypic variance explained by significant findings has often been small, even for most heritable phenotypes [1,2]. For example, SNPs significantly associated with human height in GWAS involving tens of thousands of subjects explain only about 5% of the phenotypic variance [3]. Various reasons for the so-called missing heritability have been provided [2], but the effect-size distribution for many phenotypes [4] reveals that further investigation of an efficient strategy for genetic association analysis remains necessary.
It has been found that analysis with secondary phenotypes [5][6][7][8][9] reduces false negative findings, and several different methods, such as the linear mixed model [9] and combining of p-values [7], have been proposed. The most efficient approach of multiple phenotypes depends on the unknown disease model between multiple phenotypes and genotypes. For instance, if multiple genes have a causal effect on multiple phenotypes, and the genotype-phenotype models are multidimensional, multivariate analyses are often expected to be most efficient [7]. In such a case, if the marginal effects of genotypes on multiple phenotypes are separately tested, multiple p-values for each marginal effect need to be adjusted with multiple comparison correction methods [10][11][12], and for a large number of p-values, the chance to identify the disease susceptibility loci becomes smaller. However, joint analysis of multiple phenotypes is much less affected by multiple comparison issues, and is thought to improve power. Furthermore, the presence of linkage disequilibrium (LD) between markers reveals the benefit of multi-marker association analysis [13,14]. For instance, two-marker genome-wide association analysis can sometimes be more efficient than onemarker analysis, if the large-scale genetic information is sufficiently dense [15][16][17]. Therefore in this report, we focus on the joint analysis of multiple phenotypes and genotypes.
The family-based design has been considered to be an important strategy in genetic association analysis. However the parameter estimations for the analysis of family data is numerically complicated, and few methods other than the linear mixed model for quantitative phenotypes have been available for family-based samples. In particular, FBAT statistics [18], based on the within-family component, has been extended for the joint analysis of multiple phenotypes and genotype [19][20][21]. Given the nature of FBAT statistics, they are robust against the population substructure and can be combined with rank-based p-values [22,23] based on the between-family component in a robust way [24]. However, even though this approach provides global robustness against population substructure, the phenotypic information is only partially utilized and the loss of power can be substantial if the number of founders is large.
In this report, we propose a new statistical method for the joint analysis of multiple phenotypes and genotypes with family-based samples. Our method can be utilized for both quantitative and dichotomous phenotypes, and is robust against the population substructure if the correlation matrix between individuals can be estimated from large-scale genetic data. The proposed method consists of two steps. First, phenotypes are adjusted with the offset based on the best linear unbiased predictor (BLUP) [25] or disease prevalence. Second adjusted phenotypes are utilized for statistical inference. Using extensive simulations, we showed that our method is statistically more efficient than existing methods, and its computational simplicity makes possible large-scale genome-wide association analysis. The proposed method was applied to the joint analysis of obesity-related phenotypes with the healthy twin study, Korea (HTK) and our significant results illustrate the practical value of the proposed method.

Notations and the disease model
The genetic association between M variants and Q phenotypes is considered. We assume that there are n families and ni individuals in family i. If we denote the sample size by N, N is equal to X n i¼1 n i . We let x ijm and y ijq denote the coded genotype of individual j in family i at variant m and the qth phenotype respectively, where m = 1, …, M and q = 1, …, Q. We let Here X is a N × M matrix and Y is a N × Q matrix. We also define We assume that covariate column vector, Zij, which affects the phenotype, is observed for individual j in family i, and the intercept is included in Zij. We let In addition, we assume that b ijq is a random effect for an additive polygenic effect for the qth phenotype and the variable e ijq is a random error. We let Covariances between individuals are explained by the random effect b ijq , and the variance-covariance matrix for B q can be parameterized by the function of kinship coefficient matrix Ф. If we let π ij,i'j' be the kinship coefficient between individual i in family j and individual i' in family j', and d ij be the inbreeding coefficient for individual j in family i, Ф i is denoted by 1 þ d 11 2π 11;12 2π 11;13 ⋯ 2π 11;12 1 þ d 12 2π 12;13 ⋯ 2π 11;13 2π 12; 13 Under the presence of population substructure, Ф should be replaced with the genetic relationship matrix estimated with large-scale genetic data to provide the robustness of the proposed method [26,27]. However the robustness of proposed method depends on the accuracy of the estimated genetic relationship matrix, and if the level of population substructure depends on the genomic location, the proposed method is not valid [23,28]. In such a case, transmission disequilibrium tests based on Mendelian transmission [18,29] are unique choices robust against the population substructure.

Quasi-likelihood for association analysis
If we let the effect of variant m on phenotype q be β mq , the null and alternative hypotheses are Either prospective or retrospective analysis for this hypothesis testing can be selected depending on the sampling scheme. While prospective analysis assumes that phenotypes are response variable and compares the phenotype distributions between each genotype group, retrospective analysis assumed that individuals were selected based on their phenotypes, and compares genotype distributions between affected and unaffected individuals. In particular large numbers of genotypes enables the estimation of genotypic correlations between individuals, and analysis robust against nonnormality of phenotypes can be conducted with retrospective analysis. As a result, we focus on the retrospective analysis which compares genotype frequencies according to disease phenotypes. When comparing the genotype distribution, it has been shown that the statistical efficiency of the test statistic can be improved by adjusting phenotype [30], and we introduce the offset μ ijq for qth phenotype of individual j in family i at variant m to improve the efficiency of the proposed score test. We set For any positive integer w, we let 1 w be the w × 1 column vector that consisted of 1 and I w be the w × w identity matrix. We denoted an MAF of variant m in unaffected individuals by p m , and p = (p 1 , … , pM) t . We assumed [31] that for a constant γ m,q , where 0 < 2 p m + γ m,q < 1. If we let V be the working correlation matrix for X m , the score for a variant m can be defined by Here V and μ were incorporated to generalize the quasi-likelihood score function and can be estimated by maximizing the efficiency of the score statistics. Their incorporation is a main difference from the assumptions for WQLS and MQLS statistics [31,32]. The most efficient choices of them makes the proposed score test equivalent to MQLS statistic [33], and we extend this approach to the joint analysis of multiple phenotypes and genotypes. If we let ⊗ indicate the Kronecker product, the quasi-likelihood score corresponding to the null hypothesis is If we let e ij be an N × 1 vector where the j þ X j−1 i¼1 n i th element is 1 and the others are 0, Thus if we define Efficient choices of μ and V Statistical efficiency depends on the choices of V and μ [31,33,34], and the optimal choices have been provided by maximizing the non-centrality parameters under the alternative hypothesis (see Won and Lange [33] for detailed information). For V, the identity matrix maximizes the statistical efficiency of the quasi-likelihood [33], and we consider it for V. The most efficient choice of μ may be related with the sampling scheme, and either BLUP or the prevalence were shown to be the most efficient [33], depending on the sampling scheme. If families are randomly selected, BLUP was shown to be the most efficient for both dichotomous and quantitative phenotypes [33], and if families with a large number of affected family members are selectively utilized for association analysis, it was recommended that prevalence was used for dichotomous phenotypes [31,33]. In this report, we focus on randomly selected families, and we incorporate BLUP from the linear mixed model for μ. The linear mixed model [35] for quantitative phenotype is given by Here, we denote the qth diagonal element for Σ B by σ 2 B;q . Several algorithms to estimate variance parameters such as Σ B and σ 2 E;q for linear mixed model exist [36][37][38], and the average information method [36] has often been recommended because of its computational efficiency. If we denote the estimates for σ 2 B;q and σ 2 E;q bŷ σ 2 B;q andσ 2 E;q under the null hypothesis respectively, and then incorporation of BLUP as offset makes For a dichotomous phenotype, the generalized linear mixed model [39] might be considered as an appropriate approach but the generalized linear mixed models cannot be directly optimized. Approximations to avoid numerical integration sometimes lead to serious bias [40,41], and Crowder [42,43] showed that the choice of a linear mixed model for dichotomous phenotypes is reasonable in this context. Therefore we consider the dichotomous phenotypes as quantitative phenotypes, and T q estimated by the same way for quantitative phenotypes was recommended for dichotomous phenotypes when individuals were randomly selected [33]. Therefore, for randomly selected families, we utilize the identity matrix for V and BLUP for μ for both quantitative and dichotomous.

Quasi-likelihood maximum estimator for minor allele frequencies
We denote var(X ij ) by Ψ and we assume that Then we can have var vec X ð Þ ð ÞÞ¼Ψ⊗Φ: Ψ was estimated with a sample variance covariance matrix, and we found that this choice usually works well. Therefore the marginal quasi-likelihood score function for p is and without any knowledge about Ψ, the quasi maximum likelihood estimates for p can be calculated bŷ The quasi-likelihood maximum estimator for p is equivalent to the best linear unbiased estimator. We can simply assume that Therefore Gauss-Markov theorem indicates that the best linear unbiased estimator for p is Family-based multivariate association test and utilize the proposed choices of V and μ and the quasi likelihood maximum estimator for p, S ij becomes and our score is For the statistic based on quasi-likelihood score, we can calculate the covariance of S ij and S i'j' as follows: and we have The proposed statistic will be denoted as MFQLS in the remainder of this report.

Utilizing individuals with incomplete information
Individuals with missing genotypes and nonmissing phenotypes, or vice versa, can be utilized for genetic association analysis. For individuals with missing phenotypes and nonmissing genotypes, tijq are assumed to be 0 and these individuals are utilized for the proposed analysis. These individuals are informative only for enhancing the accuracy of the estimated variance-covariance matrix of genotypes Ψ. For individuals with missing genotypes and nonmissing phenotypes, the missing genotypes can be replaced with the conditional expectations for the association analysis [44]. We let the superscripts U and O indicate individuals with missing genotypes and individuals with nonmissing genotypes, respectively. We assume that N O (N U ) indicates the numbers of individuals with nonmissing (missing) genotypes, and in a similar way we define Then, if we denote the minor allele frequency for variant m by p m , the conditional mean vector of the missing genotypes for multiple variants is and the incorporation of best linear unbiased estimator [45] to pm makes it This is an extension of the conditional expectation for a single variant [44]. Therefore, if the proposed score and its variance, respectively, become The simulation model In our simulation studies, we considered large families with 10 subjects that extended over three generations (see Figure 1). We assumed the existence of two disease susceptibility loci, and that minor (major) alleles for both loci were denoted by A(a) and B(b), respectively. If we denote the minor allele frequencies as p A and p B , and the linkage disequilibrium between these two loci by d, the haplotype frequencies for AB, Ab, aB, and ab were calculated by In our simulation, Lewontin's D' [46] was assumed to be 0 or 0.5. Genotypes were assumed to be in Hardy-Weinberg equilibrium and founders' genotypes were generated by multinomial distributions defined by genotype frequencies. The non-founders' genotypes were obtained by simulated Mendelian transmissions from their parents, and we assumed that there was no recombination between two loci.  The empirical type-I errors were estimated with 10,000 replicates at several significance levels. We assumed that the number of markers is two, and that their minor allele frequencies were generated as U(0.1, 0.4). ρ was assumed to be 0.2.
The quantitative phenotypes were defined by summing the phenotypic mean, polygenic effect, main genetic effect, and random error. We assumed that Q = 2 and 5, and denoted the phenotypic means for Q phenotypes by α 1 , …, and α Q . The genetic effect at variant m for phenotype q was generated by the product of βmq and the number of disease alleles. Under the null hypothesis, the genetic effect size parameters β mq were set to 0. The polygenic effects B for Q phenotypes for each founder was independently generated from MVN(0,Σ B ), and the average of maternal and paternal polygenic effects was combined with values independently sampled from MVN(0, 0.5Σ B ) for the polygenic effects of offspring. The random errors for Q phenotypes were assumed to be independent and were independently sampled from MVN(0,σ E,q2 I N ). We assume that if Q = 2, and if Q = 5, they were Here ρ indicates the correlation between different phenotypes.
Furthermore, the robustness of the proposed statistic in the presence of population substructure was evaluated with simulated data. We assumed that there were two subpopulations and each founder was assigned to one of the two subpopulations with 0.5 probability. Means of Q phenotypes in both populations differed by 0.2. The amounts of linkage disequilibrium for both populations were assumed to be same and the allele frequencies for each marker in two subpopulations were generated by the Balding-Nichols model [47]. The allele frequencies, q A and q B , in an ancestral population was generated from U(0.1, 0.4) and if we let FST be the fixation index by Wright [48], the marker allele frequencies for the two subpopulations were independently sampled from the beta distributions (p k (1 -FST)/FST, (1-p k )(1 -FST)/ FST). The value for Wright's FST was assumed to be 0.01, and 0.05.
Last, the simulations of the dichotomous phenotypes were performed using the liability threshold model. Once the quantitative phenotypes with polygenic effect and random error were generated, they were transformed to being affected if quantitative phenotypes are larger than the threshold, but to unaffected when not. The threshold was chosen to preserve the assumed prevalence. We assumed that prevalence was 0.1 and 0.2 if Q = 2, and it was 0.1, 0.1, 0.2, 0.2, and 0.3 if Q = 5. The statistical validity of the proposed method for dichotomous phenotypes was also evaluated under the presence of population substructure. Genotypes and liability scores were generated under the same model as used for the quantitative traits with the Balding-Nichols model, and liabilities for each individual were transformed to either being affected or unaffected, respectively.

Evaluation of the proposed statistical approach using simulated data
For the evaluation of statistical validity, the empirical type-1 error estimates for extended families were calculated at the various significance levels from 10,000 replicates for both dichotomous and quantitative phenotypes. One hundred extended families were generated in each replicate, and we assume that ρ = 0.2. Table 1 shows that the empirical type-1 error rates always preserve the 0.005, 0.01, and 0.05 nominal significance levels for both quantitative and dichotomous phenotypes. The quantile quantile (QQ) plots in Figures 2 and 3 also confirmed the overall validity of our statistical approach for both dichotomous and quantitative phenotypes.
For comparison of power with existing methods, the empirical power estimates were calculated from 2,000 replicates at the 0.005 significance level for quantitative and dichotomous phenotypes. We assumed that ρ were 0.2 and 0.5. For the proposed method, results from different choices of V and μ were compared, and they were Figure 3 QQ-plots for dichotomous phenotypes in the absence of population substructure. QQ-plots were generated from results of 10,000 replicates for quantitative phenotypes. We assumed that the number of markers was 2, and that their minor allele frequencies were generated as U(0.1, 0.5). ρ was assumed to be 0. with an omnibus family-based association test (MFBAT) [21]. We let diag(var(Y 1 ), …, var(Y Q )) be the block diagonal matrix that consists of submatrices, var(Y 1 ), …, and var(Y Q ). Then it is a NQ × NQ dimensional matrix. If diag(var(Y 1 ), …, var(Y Q )) and BLUP are utilized for V and μ, respectively, the proposed method for quantitative phenotypes becomes an extension of the mixedmodel association score test on related individuals (MASTOR) [9] for the joint analysis of multiple phenotypes and multiple genotypes. For dichotomous phenotypes, if I NM and the prevalence are utilized for V and μ, respectively, our score is an extension of the more powerful quasi-likelihood score test (MQLS) [27,31] for the joint analysis of multiple phenotypes and multiple genotypes. Therefore, they will be denoted as MMAS-TOR and MMQLS in the remainder of this report. Table 2 shows that MFQLS are always most efficient for both quantitative and dichotomous phenotypes, and it is followed by MMASTOR for quantitative traits, and by MMQLS for dichotomous traits. Even though MFBAT is always least efficient, this method is globally robust to population substructure, and thus MFBAT is still preferred in some scenarios, such as candidate gene analysis. In addition, our results show that the power improvement for each method is proportional to Q and D', but inversely related with ρ. This result is reminiscent of the analysis of repeated measures, even though results may vary depending on the situation. For the analysis of repeated measurements, it has been shown that power improvement is proportionally related with the number of observations for each individual, but inversely related with the correlation between different measurements [49]. This may be because the larger D' leads to reduced standard deviation of the statistics, while the larger ρ may induce sample size reduction.
Evaluation with simulated data in the presence of population substructure The proposed methods for both dichotomous and quantitative phenotypes were evaluated in the presence of population substructure. Wright's F ST indicates the level of population substructure and we assumed that FST = 0.01 and 0.05. Robustness of the proposed method to population substructure is provided if the genetic relationship matrix is estimated with large-scale genetic information and replace the kinship coefficient matrix [27]. In our simulation studies, we generated 100,000 common variants of which minor allele frequencies were larger than 0.1, and which are not related to the phenotypes. With these large-scale genotypes, we empirically estimated the genetic relationship matrix [27], which was then used as Φ in the proposed methods. The empirical type-1 error rates were calculated from 10,000 replicates at the 0.005, 0.01, and 0.05 significance levels.  The empirical power was estimated using 2,000 replicates at the 0.005 significance level. We assumed that the number of markers was two, and that their minor allele frequencies were 0.2. The empirical type-I errors were estimated using 10,000 replicates at several significance levels. We assumed that the number of markers is two, and that their minor allele frequencies were generated as U(0.1, 0.5). The phenotypic correlations were assumed to be 0.2. Table 3 shows that the empirical type-1 error rates for MFQLS are approximately equal to the nominal significance levels in the presence of the population substructure. Figures 4 and 5 respectively show QQ plots from results for quantitative and dichotomous phenotypes when F ST was assumed to be 0.01 and ρ was 0.2. The QQ plots showed that the statistical validities for both dichotomous and quantitative phenotypes were preserved at various significance levels.
The empirical power estimates for quantitative and dichotomous phenotypes are shown in Tables 4 and 5. The empirical power estimates were calculated from 2,000 replicates and the nominal significance levels were assumed to be 0.001 and 0.01 for quantitative and dichotomous phenotypes, respectively. The empirical power estimates for the proposed method were compared with those of MMASTOR and MFBAT for quantitative phenotypes, and with those of MMQLS and MFBAT for dichotomous phenotypes. The results showed that our method is always the most efficient, followed by MMAS-TOR for quantitative phenotypes and by MMFBAT for dichotomous phenotypes; this was also the case in the absence of population substructure. In particular, a greater reduction in power was observed along with the larger FST.
Applications to a genome-wide association in the HTK cohort The HTK cohort which consisted of families ascertained with healthy twins was initiated to identify genetic variation responsible for complex traits and the role of the environment in the etiology of complex diseases. HTK cohort consists of 2,473 individuals including 900 monozygotic (MZ) twins and 234 dizygotic (DZ) twins. In particular, MZ twins have same genotypes, and a single individual from each twin was randomly selected for genotyping. 1861 individuals were genotyped with Affymetrix Genome-Wide Human SNP array 6.0. We discarded SNPs with p-values for Hardy-Weinberg equilibrium (HWE) less than 10 -5 or MAF less than 0.01, leaving 516,610 SNPs for subsequent analysis. The proportion of genotypes identical between individuals in Figure 4 QQ-plots for quantitative phenotypes in the presence of population substructure. QQ-plots were generated from results of 10,000 replicates for quantitative phenotypes. We assumed that the number of markers was 2, and that their minor allele frequencies were generated as U(0.1, 0.5). ρ was assumed to be 0.2, and Wright's F ST was assumed to be 0.01. (a) Q=2 and D'=0, (b) Q=2 and D'=0.5, (c) Q=5 and D'=0, and (d) Q=5 and D'=0.5 were assumed respectively. each family was calculated and individuals with inconsistency between the genetic and reported relationship (n = 58) were excluded. At the same time, individuals with coding error about type of twin status were excluded, and in total genotypes for 1801 individuals were used for analysis.
The body mass index (BMI) is defined as individuals' body mass divided by the square of their height and the waist-hip ratio (WHR) is the ratio of the circumference of the waist to that of the hips. The triglyceride (TG) is an ester derived from glycerol and three fatty acids, and we took a logarithm to TG. With these three phenotypes we performed joint analysis to identify the disease susceptibility loci for obesity-related phenotypes. Age and sex were included as covariates for the linear mixed model and BLUP was utilized as offset for MFQLS. The number of individuals with missing phenotypes for BMI, WHR, and TG were 4, 1, and 28, respectively, and their tijq were assumed to be 0. For comparison, EMMAX [26] based on linear mixed model was separately applied for each phenotype and covariates used for MFQLS were also included as those for EMMAX. We calculate genetic relationship matrix with common SNPs and they were used as variance-covariance matrix for EMMAX to adjust the population substructure.
The QQ plots in Figure 6 show that the results for the EMMAX and MFQLS preserve the nominal significance level, and Manhattan plots in Figure 7 demonstrate that the results from MFQLS are more significant than the results from EMMAX. Genome-wide significance level with Bonferroni correction is 9.68 × 10 -8 and Table 6 shows the results for SNPs of which p-values were less than 5 × 10 -7 for EMMAX or MFQLS. rs651821 is an unique genome-wide significant result and the p-value of rs651821 derived by MFQLS was markedly less than those derived by EMMAX. P-values of rs17119975 and rs4417316 were larger than the significance level by Bonferroni correction but they are still expected to be promising candidate disease susceptible loci. In particular, the genetic positions of these three SNPs were similar, and we checked the linkage disequilibrium between theses SNPs with pairwise r 2 from the Chinese and Japanese data in the HapMap Release 3. rs17119975 and rs4417316 were in linkage disequilibrium with r 2 = 0.823, but r 2 between rs651821 and the others are less than 0.01. Small p-values of rs17119975 and rs4417316 may be generated with the same genetic component even though both are located in different genes, and it should be noticed that the smallest p-value for rs17119975 and rs4417316 was found with MFQLS.
Based on those results, we conducted the gene-based analysis with MFQLS for those three genes. All SNPs in each gene were utilized for the joint analysis of multiple phenotypes and multiple genotypes. Single SNP is located in APOA5, and three SNPs are in BUD13 and ZNF259. The result for APOA5 is same as results for rs651821. Thus, our MQLS statistics assumes that Q = 3 and M = 1 for APOA5, and Q = 3 and M = 3 for BUD13 and ZNF259. Table 7 shows results from the MFQLS analyses, and we found that APOA5 and ZNF259 are genome-wide significant even though the genome-wide association analyses with M = 1 identified only a single genome-wide significant SNP. Therefore, the analyses of multiple genotypes provided more genome-wide significant results, and seem to be efficient strategy for association analysis.

Discussion
In this report, we have extended a score test based on the quasi-likelihood to joint analysis of multiple phenotypes and genotypes. The proposed method can be applied to dichotomous and quantitative phenotypes, and it is statistically valid even in the presence of population substructure. With extensive simulation studies, we found that the proposed method is statistically more efficient than existing methods. The genome-wide association analysis of the HTK cohort with M = 1 and Q = 3 required 13 minutes and 26 seconds. The pedigree structure does not affect the computational intensity and thus we can conclude that the proposed method is computationally efficient enough to complete genome-wide association analysis using a few thousand individuals within a few hours. The software for the proposed method is downloadable from http://healthstat.snu.ac.kr/software/ mfqls/.
The proposed method is based on quasi-likelihood [31][32][33]44] and the relationship of the proposed method with the existing methods based on quasi-likelihood can be explained by different choices of V and μ. For instance, if M and Q are 1, the MASTOR statistic [44] used the phenotypic variance covariance matrix and BLUP for V and μ, respectively. If an identity matrix and prevalence are used, our method is equivalent to MQLS [31]. We empirically confirmed that, in retrospective analysis, the identity matrix was the most efficient choice for V and the most efficient choice of offset can The empirical power was estimated using 2,000 replicates at the 0.005 significance level. We assumed that the number of markers was two, and that their minor allele frequencies were generated as U(0.1, 0.5). The empirical power was estimated using 2,000 replicates at the 0.005 significance level. We assumed that the number of markers was 2, and that their minor allele frequencies were 0.2.  be either BLUP or prevalence, depending the sampling schemes [31,33]. Our results for the joint analysis of multiple genotypes and phenotypes also yielded similar results. However, families for association analysis are often ascertained based on some family members and the choice of offset is not clear in such a scenario. This will be further investigated in our follow-up studies. The proposed methods test the homogeneity of genotype distribution along the phenotypes, but this retrospective analysis is expected to be less efficient than the prospective analysis of random samples. However, it has recently been shown that power loss for retrospective analysis is often negligible [33], and the retrospective analysis can be preferred because of their flexibilities for genetic association analysis. For instance, first, the proposed method is robust to outliers and nonnormality of phenotypes. While the genetic heterogeneity between individuals can be adjusted with an estimated kinship coefficient matrix, nonnormality and outliers of phenotypes often lead to loss of validity or efficiency of the statistical inference [33]. In particular, when multiple samples are pooled, the heterogeneity of phenotypic distributions between samples requires stratified analysis, but the heterogeneity of genotypes between individuals may be controlled by using a genetic relationship matrix for retrospective analysis, which enables the direct analysis of the pooled sample. Second, the uncertainty of missing genotypes can be controlled using the proposed method. Missing genotypes are usually imputed based on linkage disequilibrium, and they were utilized for association analysis without consideration of the uncertainty of the imputed genotypes. However if the variation of the imputed genotypes is substantial and it is not considered for genetic association analysis, statistical inference can be invalidated. However the proposed method can consider the uncertainty of the imputed genotypes, and it enables the valid statistical inference in such a scenario.
Even though GWAS have successfully identified many genetic variants for diseases in the past decade, our experience has revealed that further investigation of the analysis strategies for reducing false negative findings is necessary. The significant results from our analysis with simulated data and real data for obesity indicated that joint analysis with multiple phenotypes and genotypes may provide a breakthrough in genetic association analysis.

Conclusion
We proposed a new method for the joint analysis of multiple phenotypes and genotypes. There is no uniformly most powerful method for the joint analysis and the statistically most efficient method depends on the unknown disease model. The proposed method assumes that multiple genes have a causal effect on multiple phenotypes, and the genotype-phenotype models are multidimensional, multivariate analyses. In such a scenario, our method is expected to be an efficient strategy. The proposed method is implemented with C++ and the computationally efficient at the genome-wide scale. We feel the current methods open new ways to identify the disease susceptibility loci.