Family-based association analysis: a fast and efficient method of multivariate association analysis with multiple variants
- Sungho Won^{1, 2, 3},
- Wonji Kim^{2},
- Sungyoung Lee^{2},
- Young Lee^{4},
- Joohon Sung^{1, 3} and
- Taesung Park^{2, 5}
https://doi.org/10.1186/s12859-015-0484-5
© Won et al.; licensee BioMed Central. 2015
Received: 22 June 2014
Accepted: 29 January 2015
Published: 15 February 2015
Abstract
Background
Many disease phenotypes result from a complicated interplay between multiple genes, and multiple phenotypes can be affected by one or more genotypes. Therefore, joint analysis of multiple phenotypes and multiple markers has been considered an efficient strategy for genome-wide association analysis, and in this work we propose an omnibus family-based association test for the joint analysis of multiple genotypes and multiple phenotypes.
Results
The proposed test can be applied to both quantitative and dichotomous phenotypes, and it is robust to population substructure as long as large-scale genomic data are available. Using simulated data, we showed that our method is statistically more efficient than existing methods, and we illustrate its practical relevance by applying it to obesity-related phenotypes.
Conclusions
The proposed method may be more statistically efficient than existing methods. The software was implemented in C++ and is available at the following URL: http://healthstat.snu.ac.kr/software/mfqls/.
Background
During the last decade, more than a hundred genome-wide association studies (GWAS) have been initiated, and GWAS have successfully identified many susceptibility loci involved in human disease. However, the phenotypic variance explained by significant findings has often been small, even for highly heritable phenotypes [1,2]. For example, SNPs significantly associated with human height in GWAS involving tens of thousands of subjects explain only about 5% of the phenotypic variance [3]. Various explanations for this so-called missing heritability have been proposed [2], but the effect-size distribution estimated for many phenotypes [4] indicates that more efficient strategies for genetic association analysis are still needed.
Analysis that incorporates secondary phenotypes [5-9] has been found to reduce false negative findings, and several methods, such as the linear mixed model [9] and the combination of p-values [7], have been proposed. The most efficient approach to analyzing multiple phenotypes depends on the unknown disease model relating the phenotypes to the genotypes. For instance, if multiple genes have causal effects on multiple phenotypes and the genotype–phenotype model is multidimensional, multivariate analyses are often expected to be the most efficient [7]. In such a case, if the marginal effects of genotypes on each phenotype are tested separately, the resulting p-values must be adjusted with multiple comparison correction methods [10-12], and the more p-values there are, the smaller the chance of identifying the disease susceptibility loci becomes. Joint analysis of multiple phenotypes, by contrast, is much less affected by multiple comparison issues and is thought to improve power. Furthermore, linkage disequilibrium (LD) between markers motivates multi-marker association analysis [13,14]; for instance, two-marker genome-wide association analysis can sometimes be more efficient than one-marker analysis if the large-scale genetic data are sufficiently dense [15-17]. Therefore, in this report we focus on the joint analysis of multiple phenotypes and multiple genotypes.
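To make the multiple-comparison argument concrete, the following minimal Python sketch applies Holm's step-down procedure [11] to a set of marginal p-values, for example one per phenotype. The function name `holm_reject` and the example p-values are hypothetical and serve only as an illustration.

```python
from typing import List


def holm_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm's step-down procedure; returns one rejection flag per hypothesis.

    The k-th smallest p-value (k = 0, 1, ...) is compared with alpha / (m - k),
    and testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Three marginal tests (e.g. one per phenotype) at family-wise level 0.05.
print(holm_reject([0.01, 0.04, 0.03]))  # [True, False, False]
```

Both 0.03 and 0.04 would be significant on their own at the 0.05 level, but only the smallest p-value survives correction; this is the loss of power that separate marginal testing incurs as the number of tests grows.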
The family-based design has been considered an important strategy in genetic association analysis. However, parameter estimation for family data is numerically complicated, and few methods other than the linear mixed model for quantitative phenotypes have been available for family-based samples. In particular, FBAT statistics [18], which are based on the within-family component, have been extended to the joint analysis of multiple phenotypes and genotypes [19-21]. Because of their construction, FBAT statistics are robust against population substructure and can be combined in a robust way [24] with rank-based p-values [22,23] derived from the between-family component. However, even though this approach provides global robustness against population substructure, it uses the phenotypic information only partially, and the loss of power can be substantial if the number of founders is large.
In this report, we propose a new statistical method for the joint analysis of multiple phenotypes and genotypes in family-based samples. Our method can be used for both quantitative and dichotomous phenotypes, and it is robust against population substructure if the correlation matrix between individuals can be estimated from large-scale genetic data. The proposed method consists of two steps. First, phenotypes are adjusted with an offset based on either the best linear unbiased predictor (BLUP) [25] or the disease prevalence. Second, the adjusted phenotypes are used for statistical inference. Using extensive simulations, we showed that our method is statistically more efficient than existing methods, and its computational simplicity makes large-scale genome-wide association analysis feasible. We applied the proposed method to the joint analysis of obesity-related phenotypes in the Healthy Twin Study, Korea (HTK), and our significant results illustrate its practical value.
Methods
Notations and the disease model
In the presence of population substructure, the kinship coefficient matrix Φ should be replaced with a genetic relationship matrix estimated from large-scale genetic data so that the proposed method remains robust [26,27]. However, this robustness depends on the accuracy of the estimated genetic relationship matrix, and if the level of population substructure varies along the genome, the proposed method is not valid [23,28]. In such a case, transmission disequilibrium tests based on Mendelian transmission [18,29] are the only choices that remain robust against population substructure.
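A common way to estimate a genetic relationship matrix from genome-wide SNPs is the standardized-genotype estimator A = ZZᵀ/M, where each column of Z is a SNP's allele counts centered by 2p and scaled by √(2p(1 − p)). The sketch below assumes that estimator (the text does not say which one was used) and takes Φ ≈ A/2, because the genetic relationship of two non-inbred individuals is twice their kinship coefficient. The function name and the simulated unrelated genotypes are illustrative only.

```python
import numpy as np


def genetic_relationship_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Estimate an N x N genetic relationship matrix from an N x M matrix of
    allele counts (0, 1, 2) with the standardized-genotype estimator Z Z^T / M."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # sample allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)              # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / g.shape[1]


rng = np.random.default_rng(1)
freqs = rng.uniform(0.05, 0.5, size=5000)
genotypes = rng.binomial(2, freqs, size=(50, 5000))   # 50 unrelated individuals
grm = genetic_relationship_matrix(genotypes)
kinship = grm / 2.0                                    # plug-in estimate of Phi
print(grm.shape, float(np.diag(grm).mean()))
```

For unrelated individuals the diagonal averages close to 1 and the off-diagonal entries scatter around 0; relatives would show off-diagonal values near twice their kinship coefficients.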
Quasi-likelihood for association analysis
Efficient choices of μ and V
For a dichotomous phenotype, the generalized linear mixed model [39] might seem an appropriate approach, but its likelihood involves integrals over the random effects and cannot be maximized directly. Approximations that avoid numerical integration sometimes lead to serious bias [40,41], and Crowder [42,43] showed that using a linear mixed model for dichotomous phenotypes is reasonable in this context. We therefore treat dichotomous phenotypes as quantitative phenotypes; estimating T^{q} in the same way as for quantitative phenotypes was recommended for dichotomous phenotypes when individuals were randomly selected [33]. Accordingly, for randomly selected families, we use the identity matrix for V and BLUP for μ for both quantitative and dichotomous phenotypes.
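The offset μ relies on BLUP [25]. As a minimal sketch under stated assumptions (a single phenotype, the model y = Xβ + g + e with Var(g) = σ_g²K and Var(e) = σ_e²I, and variance components already estimated, for example by REML), Henderson's quantities can be computed as follows: β̂ is the generalized least-squares (BLUE) estimate and ĝ = σ_g²KΣ⁻¹(y − Xβ̂) is the BLUP of the polygenic effect, with Σ = σ_g²K + σ_e²I. This sketch does not claim to reproduce exactly how MFQLS builds μ from these quantities, and the function name is hypothetical. Because the text treats a 0/1 phenotype as quantitative, the same computation applies to dichotomous phenotypes coded 0/1.

```python
import numpy as np


def blue_and_blup(y, X, K, sigma_g2, sigma_e2):
    """Return (beta_hat, g_hat) for y = X beta + g + e, where
    Var(g) = sigma_g2 * K, Var(e) = sigma_e2 * I, and both variance
    components are treated as known."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    K = np.asarray(K, dtype=float)
    sigma = sigma_g2 * K + sigma_e2 * np.eye(len(y))   # phenotypic covariance
    sigma_inv_X = np.linalg.solve(sigma, X)             # Sigma^{-1} X
    # GLS / BLUE: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
    beta_hat = np.linalg.solve(X.T @ sigma_inv_X, sigma_inv_X.T @ y)
    residual = y - X @ beta_hat
    # BLUP of the polygenic effect: sigma_g2 K Sigma^{-1} (y - X beta_hat)
    g_hat = sigma_g2 * K @ np.linalg.solve(sigma, residual)
    return beta_hat, g_hat


# Toy example: intercept-only model, four individuals treated as unrelated.
y = np.array([1.0, 2.0, 3.0, 6.0])
X = np.ones((4, 1))
beta_hat, g_hat = blue_and_blup(y, X, np.eye(4), sigma_g2=1.0, sigma_e2=1.0)
# With K = I and equal variance components, beta_hat is the mean (3) and
# g_hat shrinks each deviation y - 3 by one half.
print(beta_hat, g_hat)
```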
Quasi-likelihood maximum estimator for minor allele frequencies
Family-based multivariate association test
The proposed statistic will be denoted as MFQLS in the remainder of this report.
Utilizing individuals with incomplete information
The simulation model
In our simulation, Lewontin's D' [46] was set to 0 or 0.5. Genotypes were assumed to be in Hardy–Weinberg equilibrium, and founders' genotypes were generated from multinomial distributions defined by the genotype frequencies. Non-founders' genotypes were obtained by simulating Mendelian transmission from their parents, assuming no recombination between the two loci.
Here ρ indicates the correlation between different phenotypes.
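The genotype-generation steps described above (two loci, Hardy–Weinberg equilibrium among founders, Mendelian transmission without recombination) can be sketched in Python as follows. This is a minimal sketch, not the authors' simulation code: it assumes a nuclear family with two founders, a positive D (the two minor alleles in coupling), and haplotype frequencies derived from the minor-allele frequencies q_{A}, q_{B} and D' through D = D'·D_max. Drawing each founder's two haplotypes independently yields genotype frequencies under Hardy–Weinberg equilibrium. The function names are hypothetical.

```python
import numpy as np

# Haplotype order: AB, Ab, aB, ab, where A and B are the minor alleles.
HAPLOTYPE_ALLELES = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])


def haplotype_frequencies(q_a: float, q_b: float, d_prime: float) -> np.ndarray:
    """Two-locus haplotype frequencies for minor-allele frequencies q_a, q_b
    and a non-negative Lewontin's D' (D = D' * D_max, with D_max for D > 0)."""
    d_max = min(q_a * (1 - q_b), (1 - q_a) * q_b)
    d = d_prime * d_max
    return np.array([
        q_a * q_b + d,              # AB
        q_a * (1 - q_b) - d,        # Ab
        (1 - q_a) * q_b - d,        # aB
        (1 - q_a) * (1 - q_b) + d,  # ab
    ])


def simulate_nuclear_family(rng: np.random.Generator, hap_freqs: np.ndarray,
                            n_children: int) -> np.ndarray:
    """Minor-allele counts at the two loci, shape (2 + n_children, 2).

    Rows 0 and 1 are the founders; each child receives one randomly chosen
    haplotype from each parent, with no recombination between the loci.
    """
    parents = rng.choice(4, size=(2, 2), p=hap_freqs)   # haplotype indices
    children = np.empty((n_children, 2), dtype=int)
    for c in range(n_children):
        children[c, 0] = parents[0, rng.integers(2)]    # transmitted by parent 1
        children[c, 1] = parents[1, rng.integers(2)]    # transmitted by parent 2
    haplotypes = np.vstack([parents, children])
    # (individual, haplotype slot, locus) -> sum over the two haplotypes
    return HAPLOTYPE_ALLELES[haplotypes].sum(axis=1)


rng = np.random.default_rng(2015)
freqs = haplotype_frequencies(q_a=0.2, q_b=0.3, d_prime=0.5)
family = simulate_nuclear_family(rng, freqs, n_children=2)
print(freqs)   # approximately [0.13 0.07 0.17 0.63]
print(family)  # rows: parent 1, parent 2, child 1, child 2; columns: loci
```

Transmitting whole haplotypes is what "no recombination between two loci" means in practice: the LD set in the founders is passed to the offspring unchanged.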
Furthermore, the robustness of the proposed statistic in the presence of population substructure was evaluated with simulated data. We assumed two subpopulations, and each founder was assigned to one of them with probability 0.5. The means of the Q phenotypes differed by 0.2 between the two subpopulations. The amount of linkage disequilibrium was assumed to be the same in both subpopulations, and the allele frequencies of each marker in the two subpopulations were generated by the Balding–Nichols model [47]. The allele frequencies q_{A} and q_{B} in the ancestral population were generated from U(0.1, 0.4), and, letting F_{ST} denote Wright's fixation index [48], the marker allele frequencies in the two subpopulations were independently sampled from Beta(p_{k}(1 – F_{ST})/F_{ST}, (1 – p_{k})(1 – F_{ST})/F_{ST}), where p_{k} is the ancestral allele frequency of marker k. Wright's F_{ST} was set to 0.01 or 0.05.
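The Balding–Nichols step can be sketched as follows. The Beta parameters are those given in the text; with a + b = (1 − F_ST)/F_ST, the draws have mean p_k and variance p_k(1 − p_k)F_ST, so F_ST directly controls how far subpopulation frequencies drift from the ancestral one. The function name and seed are hypothetical.

```python
import numpy as np


def balding_nichols_frequencies(rng: np.random.Generator, p_ancestral,
                                f_st: float, n_subpops: int = 2) -> np.ndarray:
    """Subpopulation allele frequencies drawn from
    Beta(p (1 - F_ST) / F_ST, (1 - p)(1 - F_ST) / F_ST) for each marker.

    Returns an array of shape (n_subpops, number of markers).
    """
    p = np.asarray(p_ancestral, dtype=float)
    scale = (1.0 - f_st) / f_st
    return rng.beta(p * scale, (1.0 - p) * scale, size=(n_subpops,) + p.shape)


rng = np.random.default_rng(7)
ancestral = rng.uniform(0.1, 0.4, size=2)       # q_A and q_B
subpop_freqs = balding_nichols_frequencies(rng, ancestral, f_st=0.05)
print(subpop_freqs)   # rows: subpopulations, columns: markers
```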
Last, dichotomous phenotypes were simulated using the liability threshold model. After quantitative phenotypes (liabilities) with a polygenic effect and random error were generated, an individual was coded as affected if the liability exceeded a threshold and as unaffected otherwise. The threshold was chosen to preserve the assumed prevalence. We assumed prevalences of 0.1 and 0.2 for Q = 2, and of 0.1, 0.1, 0.2, 0.2, and 0.3 for Q = 5. The statistical validity of the proposed method for dichotomous phenotypes was also evaluated in the presence of population substructure: genotypes and liabilities were generated under the same Balding–Nichols model as for the quantitative traits, and each individual's liabilities were then transformed into affected or unaffected status.
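The liability-threshold step can be sketched as follows, assuming the liability is normal with mean 0 and known standard deviation, so that for prevalence K the threshold is the (1 − K) quantile. The text says only that the threshold preserves the assumed prevalence; the normal-quantile form, the function names, and the variance split are assumptions for illustration, and the sketch ignores the mean shift between subpopulations in the substructure setting.

```python
from statistics import NormalDist

import numpy as np


def liability_threshold(prevalence: float, liability_sd: float = 1.0) -> float:
    """Threshold t with P(L > t) = prevalence for L ~ N(0, liability_sd**2)."""
    return liability_sd * NormalDist().inv_cdf(1.0 - prevalence)


def dichotomize(liabilities, prevalence: float, liability_sd: float = 1.0) -> np.ndarray:
    """Affected (1) if the liability exceeds the threshold, unaffected (0) otherwise."""
    t = liability_threshold(prevalence, liability_sd)
    return (np.asarray(liabilities) > t).astype(int)


rng = np.random.default_rng(3)
n = 200_000
# Liability = polygenic effect + random error with variances 0.5 + 0.5 = 1
# (hypothetical values; within families the polygenic effects would be correlated).
liability = rng.normal(0.0, np.sqrt(0.5), n) + rng.normal(0.0, np.sqrt(0.5), n)
status = dichotomize(liability, prevalence=0.1)
print(liability_threshold(0.1), status.mean())   # threshold about 1.2816; mean near 0.1

# Q = 5 prevalences from the text, one threshold per phenotype.
thresholds = [liability_threshold(k) for k in (0.1, 0.1, 0.2, 0.2, 0.3)]
```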
Results
Evaluation of the proposed statistical approach using simulated data
Empirical type-I error estimates in the absence of population substructure
Type | Q | D' | α = 0.005 | α = 0.01 | α = 0.05 |
---|---|---|---|---|---|
Quantitative | 2 | 0 | 0.0056 | 0.0105 | 0.0481 |
Quantitative | 2 | 0.5 | 0.0043 | 0.0091 | 0.0482 |
Quantitative | 5 | 0 | 0.0059 | 0.0115 | 0.0506 |
Quantitative | 5 | 0.5 | 0.0044 | 0.0103 | 0.0526 |
Dichotomous | 2 | 0 | 0.0038 | 0.0088 | 0.0455 |
Dichotomous | 2 | 0.5 | 0.0039 | 0.0095 | 0.0502 |
Dichotomous | 5 | 0 | 0.0041 | 0.0083 | 0.0509 |
Dichotomous | 5 | 0.5 | 0.0056 | 0.0098 | 0.0501 |
To compare power with existing methods, empirical power was estimated from 2,000 replicates at the 0.005 significance level for quantitative and dichotomous phenotypes, with ρ set to 0.2 or 0.5. For the proposed method, results from different choices of V and μ were compared with each other and with an omnibus family-based association test (MFBAT) [21]. Let diag(var(Y^{1}), …, var(Y^{Q})) denote the block diagonal matrix whose diagonal blocks are var(Y^{1}), …, var(Y^{Q}); it is an NQ × NQ matrix. If diag(var(Y^{1}), …, var(Y^{Q})) and BLUP are used for V and μ, respectively, the proposed method for quantitative phenotypes becomes an extension of the mixed-model association score test on related individuals (MASTOR) [44] to the joint analysis of multiple phenotypes and multiple genotypes. For dichotomous phenotypes, if I_{NQ} and the prevalence are used for V and μ, respectively, our score is an extension of the more powerful quasi-likelihood score test (MQLS) [27,31] to the joint analysis of multiple phenotypes and multiple genotypes. We therefore denote these two special cases as MMASTOR and MMQLS in the remainder of this report.
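The block-diagonal choice of V can be assembled as follows. This is a hypothetical illustration: the family (two unrelated parents and one child), the variance components, and the ordering that stacks all N individuals for phenotype 1 before phenotype 2 are assumptions, with each var(Y^{q}) modeled as σ²_{g,q}·2Φ + σ²_{e,q}·I. `scipy.linalg.block_diag` performs the same construction; plain NumPy keeps the sketch dependency-light.

```python
import numpy as np


def block_diagonal(blocks):
    """diag(B_1, ..., B_Q): place Q square N x N blocks on the diagonal of an
    NQ x NQ matrix; all off-diagonal blocks are zero."""
    n = blocks[0].shape[0]
    q = len(blocks)
    out = np.zeros((n * q, n * q))
    for k, block in enumerate(blocks):
        out[k * n:(k + 1) * n, k * n:(k + 1) * n] = block
    return out


# Kinship matrix Phi for two unrelated parents and their child (N = 3).
phi = np.array([[0.50, 0.00, 0.25],
                [0.00, 0.50, 0.25],
                [0.25, 0.25, 0.50]])
identity = np.eye(3)
var_y1 = 0.4 * 2 * phi + 0.6 * identity   # hypothetical variance components
var_y2 = 0.3 * 2 * phi + 0.7 * identity

V_mastor = block_diagonal([var_y1, var_y2])   # the choice behind MMASTOR
V_identity = np.eye(V_mastor.shape[0])        # identity matrix of the same size
print(V_mastor.shape)                          # (6, 6)
```

The zero off-diagonal blocks mean this choice of V ignores the correlation between different phenotypes; only the within-phenotype correlation among relatives enters through 2Φ.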
Empirical power estimates in the absence of population substructure
Phenotype | ρ | Q | D' | MMASTOR (quantitative) or MMQLS (dichotomous) | MFBAT | MFQLS |
---|---|---|---|---|---|---|
Quantitative | 0.2 | 2 | 0 | 0.5180 | 0.2025 | 0.5830 |
Quantitative | 0.2 | 2 | 0.5 | 0.7235 | 0.3750 | 0.7805 |
Quantitative | 0.2 | 5 | 0 | 0.7800 | 0.3855 | 0.7870 |
Quantitative | 0.2 | 5 | 0.5 | 0.9200 | 0.6430 | 0.9240 |
Quantitative | 0.5 | 2 | 0 | 0.4915 | 0.1655 | 0.5400 |
Quantitative | 0.5 | 2 | 0.5 | 0.6785 | 0.3340 | 0.7505 |
Quantitative | 0.5 | 5 | 0 | 0.7015 | 0.3020 | 0.7350 |
Quantitative | 0.5 | 5 | 0.5 | 0.8725 | 0.5405 | 0.8885 |
Dichotomous | 0.2 | 2 | 0 | 0.2015 | 0.0530 | 0.2340 |
Dichotomous | 0.2 | 2 | 0.5 | 0.3050 | 0.1070 | 0.3470 |
Dichotomous | 0.2 | 5 | 0 | 0.3205 | 0.0995 | 0.3710 |
Dichotomous | 0.2 | 5 | 0.5 | 0.6215 | 0.2460 | 0.6660 |
Dichotomous | 0.5 | 2 | 0 | 0.1795 | 0.0535 | 0.2130 |
Dichotomous | 0.5 | 2 | 0.5 | 0.2945 | 0.0915 | 0.3270 |
Dichotomous | 0.5 | 5 | 0 | 0.2670 | 0.0910 | 0.3085 |
Dichotomous | 0.5 | 5 | 0.5 | 0.5200 | 0.2130 | 0.5900 |
Evaluation with simulated data in the presence of population substructure
Empirical type-I error estimates in the presence of population substructure
Type | F_{ST} | Q | D' | α = 0.005 | α = 0.01 | α = 0.05 |
---|---|---|---|---|---|---|
Quantitative | 0.01 | 2 | 0 | 0.0048 | 0.0098 | 0.0546 |
Quantitative | 0.01 | 2 | 0.5 | 0.0066 | 0.0105 | 0.0513 |
Quantitative | 0.01 | 5 | 0 | 0.0046 | 0.0098 | 0.0521 |
Quantitative | 0.01 | 5 | 0.5 | 0.0058 | 0.0105 | 0.0534 |
Quantitative | 0.05 | 2 | 0 | 0.0054 | 0.0094 | 0.0514 |
Quantitative | 0.05 | 2 | 0.5 | 0.0050 | 0.0108 | 0.0521 |
Quantitative | 0.05 | 5 | 0 | 0.0057 | 0.0094 | 0.0509 |
Quantitative | 0.05 | 5 | 0.5 | 0.0046 | 0.0094 | 0.0496 |
Dichotomous | 0.01 | 2 | 0 | 0.0050 | 0.0107 | 0.0488 |
Dichotomous | 0.01 | 2 | 0.5 | 0.0039 | 0.0082 | 0.0472 |
Dichotomous | 0.01 | 5 | 0 | 0.0059 | 0.0108 | 0.0499 |
Dichotomous | 0.01 | 5 | 0.5 | 0.0045 | 0.0089 | 0.0465 |
Dichotomous | 0.05 | 2 | 0 | 0.0065 | 0.0125 | 0.0529 |
Dichotomous | 0.05 | 2 | 0.5 | 0.0049 | 0.0108 | 0.0477 |
Dichotomous | 0.05 | 5 | 0 | 0.0053 | 0.0115 | 0.0525 |
Dichotomous | 0.05 | 5 | 0.5 | 0.0046 | 0.0093 | 0.0480 |
Empirical power estimates for quantitative phenotypes in the presence of population substructure
F_{ST} | ρ | Q | D' | MMASTOR | MFBAT | MFQLS |
---|---|---|---|---|---|---|
0.01 | 0.2 | 2 | 0 | 0.5020 | 0.1935 | 0.5680 |
0.01 | 0.2 | 2 | 0.5 | 0.6860 | 0.3530 | 0.7570 |
0.01 | 0.2 | 5 | 0 | 0.7380 | 0.3610 | 0.7965 |
0.01 | 0.2 | 5 | 0.5 | 0.9065 | 0.6430 | 0.9180 |
0.01 | 0.5 | 2 | 0 | 0.4765 | 0.1630 | 0.5300 |
0.01 | 0.5 | 2 | 0.5 | 0.6710 | 0.3365 | 0.7390 |
0.01 | 0.5 | 5 | 0 | 0.6820 | 0.2990 | 0.6975 |
0.01 | 0.5 | 5 | 0.5 | 0.8450 | 0.5057 | 0.8600 |
0.05 | 0.2 | 2 | 0 | 0.4880 | 0.1925 | 0.5330 |
0.05 | 0.2 | 2 | 0.5 | 0.6550 | 0.3250 | 0.6925 |
0.05 | 0.2 | 5 | 0 | 0.7210 | 0.3465 | 0.7375 |
0.05 | 0.2 | 5 | 0.5 | 0.8765 | 0.6430 | 0.8885 |
0.05 | 0.5 | 2 | 0 | 0.4555 | 0.1620 | 0.4830 |
0.05 | 0.5 | 2 | 0.5 | 0.6335 | 0.3150 | 0.6745 |
0.05 | 0.5 | 5 | 0 | 0.6525 | 0.2995 | 0.6570 |
0.05 | 0.5 | 5 | 0.5 | 0.8160 | 0.4850 | 0.8190 |
Empirical power estimates for dichotomous phenotypes in the presence of population substructure
F_{ST} | ρ | Q | D' | MMQLS | MFBAT | MFQLS |
---|---|---|---|---|---|---|
0.01 | 0.2 | 2 | 0 | 0.2075 | 0.0565 | 0.2350 |
0.01 | 0.2 | 2 | 0.5 | 0.3365 | 0.1135 | 0.3795 |
0.01 | 0.2 | 5 | 0 | 0.3455 | 0.0975 | 0.3825 |
0.01 | 0.2 | 5 | 0.5 | 0.6025 | 0.2330 | 0.6455 |
0.01 | 0.5 | 2 | 0 | 0.1830 | 0.0545 | 0.2140 |
0.01 | 0.5 | 2 | 0.5 | 0.2900 | 0.1165 | 0.3120 |
0.01 | 0.5 | 5 | 0 | 0.2855 | 0.0910 | 0.3240 |
0.01 | 0.5 | 5 | 0.5 | 0.5345 | 0.2200 | 0.5965 |
0.05 | 0.2 | 2 | 0 | 0.1975 | 0.0575 | 0.2300 |
0.05 | 0.2 | 2 | 0.5 | 0.2840 | 0.0995 | 0.3210 |
0.05 | 0.2 | 5 | 0 | 0.2990 | 0.0915 | 0.3335 |
0.05 | 0.2 | 5 | 0.5 | 0.5405 | 0.2140 | 0.5860 |
0.05 | 0.5 | 2 | 0 | 0.1680 | 0.0595 | 0.2065 |
0.05 | 0.5 | 2 | 0.5 | 0.2605 | 0.1095 | 0.2930 |
0.05 | 0.5 | 5 | 0 | 0.2620 | 0.0910 | 0.3025 |
0.05 | 0.5 | 5 | 0.5 | 0.4835 | 0.1800 | 0.5370 |
Applications to a genome-wide association in the HTK cohort
The HTK cohort, which consists of families ascertained through healthy twins, was initiated to identify genetic variation responsible for complex traits and to study the role of the environment in the etiology of complex diseases. The cohort consists of 2,473 individuals, including 900 monozygotic (MZ) twins and 234 dizygotic (DZ) twins. Because MZ twins have identical genotypes, a single individual from each MZ twin pair was randomly selected for genotyping. A total of 1,861 individuals were genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0. We discarded SNPs with Hardy–Weinberg equilibrium (HWE) p-values less than 10^{–5} or minor allele frequency (MAF) less than 0.01, leaving 516,610 SNPs for subsequent analysis. The proportion of identical genotypes between individuals in each family was calculated, and individuals whose genetic relationship was inconsistent with their reported relationship (n = 58) were excluded. Individuals with coding errors in their twin-type status were also excluded, leaving genotypes for 1,801 individuals for analysis.
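The SNP filter described above (HWE p-value below 10^{–5} or MAF below 0.01) can be sketched per SNP from genotype counts. The text does not say which HWE test was applied; this sketch assumes the 1-degree-of-freedom chi-square goodness-of-fit test for brevity (an exact test is often preferred for rare variants), and the function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from the genotype counts of a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p_b = (n_ab + 2 * n_bb) / (2 * n)
    return min(p_b, 1 - p_b)


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0                                # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))         # upper tail of chi-square, 1 df


def passes_qc(counts, hwe_threshold: float = 1e-5, maf_threshold: float = 0.01) -> bool:
    """Keep a SNP unless its HWE p-value < 1e-5 or its MAF < 0.01."""
    return (hwe_chisq_pvalue(*counts) >= hwe_threshold
            and minor_allele_frequency(*counts) >= maf_threshold)


print(passes_qc((81, 18, 1)), passes_qc((50, 0, 50)), passes_qc((999, 1, 0)))
```

The three example SNPs respectively pass, fail HWE (no heterozygotes at all), and fail the MAF filter.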
Body mass index (BMI) is defined as body mass divided by the square of height, and the waist–hip ratio (WHR) is the ratio of waist circumference to hip circumference. Triglyceride (TG) is an ester derived from glycerol and three fatty acids; TG levels were log-transformed. We jointly analyzed these three phenotypes to identify susceptibility loci for obesity-related phenotypes. Age and sex were included as covariates in the linear mixed model, and BLUP was used as the offset for MFQLS. The numbers of individuals with missing BMI, WHR, and TG were 4, 1, and 28, respectively, and their t_{ijq} were set to 0. For comparison, EMMAX [26], which is based on the linear mixed model, was applied separately to each phenotype, with the same covariates as those used for MFQLS. We calculated the genetic relationship matrix from common SNPs and used it as the variance–covariance matrix for EMMAX to adjust for population substructure.
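The phenotype definitions above are simple formulas, sketched below for one individual. The sketch assumes SI units for BMI (kg and m), the natural logarithm for TG (the base is not stated in the text), and that t_{ijq} denotes an offset-adjusted phenotype, so a missing value contributes 0; the helper names are hypothetical.

```python
import math
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: body mass (kg) divided by squared height (m^2)."""
    return weight_kg / height_m ** 2


def whr(waist: float, hip: float) -> float:
    """Waist-hip ratio; both circumferences in the same unit, which cancels."""
    return waist / hip


def log_tg(tg: float) -> float:
    """Natural-log-transformed triglyceride level (log base assumed)."""
    return math.log(tg)


def adjusted_or_zero(value: Optional[float], offset: float) -> float:
    """Offset-adjusted phenotype; a missing value (None) contributes 0."""
    return 0.0 if value is None else value - offset


print(bmi(70.0, 1.75), whr(80.0, 100.0), adjusted_or_zero(None, 24.0))
```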
Significant results from genome-wide association study
SNP | CHR | POS | Gene | Minor allele | EMMAX | MFQLS |
---|---|---|---|---|---|---|
rs651821 | 11 | 116167789 | APOA5 | C | 1.075 × 10^{-12} | 2.295 × 10^{-14} |
rs17119975 | 11 | 116139767 | BUD13 | C | 2.191 × 10^{-7} | 1.940 × 10^{-7} |
rs4417316 | 11 | 116157511 | ZNF259 | T | 3.121 × 10^{-7} | 3.138 × 10^{-7} |
Gene-based association analysis for APOA5, BUD13 and ZNF259
CHR | Gene | List of SNPs | P-value |
---|---|---|---|
11 | APOA5 | rs651821 | 2.295 × 10^{-14} |
11 | BUD13 | rs11600380, rs17119975, rs1145208 | 1.331 × 10^{-5} |
11 | ZNF259 | rs4417316, rs6589566, rs603446 | 2.044 × 10^{-9} |
Discussion
In this report, we have extended a score test based on quasi-likelihood to the joint analysis of multiple phenotypes and genotypes. The proposed method can be applied to dichotomous and quantitative phenotypes, and it is statistically valid even in the presence of population substructure. Extensive simulation studies showed that the proposed method is statistically more efficient than existing methods. The genome-wide association analysis of the HTK cohort with M = 1 and Q = 3 required 13 minutes and 26 seconds. Because the pedigree structure does not affect the computational intensity, we expect that the proposed method can complete a genome-wide association analysis of a few thousand individuals within a few hours. The software for the proposed method can be downloaded from http://healthstat.snu.ac.kr/software/mfqls/.
The proposed method is based on quasi-likelihood [31-33,44], and its relationship with existing quasi-likelihood methods can be explained by different choices of V and μ. For instance, when M and Q are both 1, the MASTOR statistic [44] uses the phenotypic variance–covariance matrix for V and BLUP for μ, and if the identity matrix and prevalence are used, our method is equivalent to MQLS [31]. We empirically confirmed that, in retrospective analysis, the identity matrix was the most efficient choice for V and that the most efficient offset can be either BLUP or prevalence, depending on the sampling scheme [31,33]. Our results for the joint analysis of multiple genotypes and phenotypes were similar. However, families for association analysis are often ascertained through some of their members, and the choice of offset is not clear in such a scenario. This will be investigated further in our follow-up studies.
The proposed method tests the homogeneity of the genotype distribution across phenotypes, and this retrospective analysis is expected to be less efficient than the prospective analysis of random samples. However, it has recently been shown that the power loss of retrospective analysis is often negligible [33], and retrospective analysis can be preferred because of its flexibility in genetic association analysis. First, the proposed method is robust to outliers and non-normality of phenotypes. Whereas genetic heterogeneity between individuals can be adjusted for with an estimated kinship coefficient matrix, non-normality and outliers in phenotypes often reduce the validity or efficiency of statistical inference [33]. In particular, when multiple samples are pooled, heterogeneity of the phenotypic distributions between samples requires a stratified analysis, whereas in retrospective analysis the heterogeneity of genotypes between individuals can be controlled with a genetic relationship matrix, which enables direct analysis of the pooled sample. Second, the proposed method can account for the uncertainty of missing genotypes. Missing genotypes are usually imputed based on linkage disequilibrium, and the imputed genotypes are often used for association analysis without considering their uncertainty. If the variation of the imputed genotypes is substantial and is ignored, statistical inference can be invalid. The proposed method can incorporate the uncertainty of the imputed genotypes and thus enables valid statistical inference in such a scenario.
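The uncertainty of an imputed genotype can be summarized from its genotype probabilities. The sketch below computes the expected minor-allele count (dosage) and its variance, the quantity a best-guess call discards. This illustrates the idea only; how MFQLS incorporates imputation uncertainty is not specified in this text, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def dosage_and_variance(probs: Sequence[float]) -> Tuple[float, float]:
    """Expected minor-allele count and its variance from imputed genotype
    probabilities (P(G = 0), P(G = 1), P(G = 2))."""
    p0, p1, p2 = probs
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    mean = p1 + 2.0 * p2                 # E[G]
    second_moment = p1 + 4.0 * p2        # E[G^2]
    return mean, second_moment - mean ** 2


# A confidently imputed genotype has variance near 0; an uncertain one does not.
print(dosage_and_variance((0.0, 0.0, 1.0)), dosage_and_variance((0.1, 0.3, 0.6)))
```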
Even though GWAS have successfully identified many genetic variants for diseases in the past decade, our experience has revealed that further investigation of the analysis strategies for reducing false negative findings is necessary. The significant results from our analysis with simulated data and real data for obesity indicated that joint analysis with multiple phenotypes and genotypes may provide a breakthrough in genetic association analysis.
Conclusion
We proposed a new method for the joint analysis of multiple phenotypes and genotypes. No method for joint analysis is uniformly most powerful, and the most efficient method depends on the unknown disease model. The proposed method targets the scenario in which multiple genes have causal effects on multiple phenotypes and the genotype–phenotype model is multidimensional; in such a scenario, our method is expected to be an efficient strategy. The proposed method is implemented in C++ and is computationally efficient at the genome-wide scale. We expect that it will open new ways to identify disease susceptibility loci.
Declarations
Acknowledgement
This study was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2013R1A1A2010437); the Industrial Core Technology Development Program (10040176, Development of Various Bioinformatics Software using Next Generation Bio-data) funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea); and by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2014S1A2A2028559); and by NRF grant funded by the Korea government (MSIP) (No. 2012R1A3A2026438).
References
- Maher B. Personal genomes: the case of the missing heritability. Nature. 2008;456(7218):18–21.
- Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53.
- Visscher PM. Sizing up human height variation. Nat Genet. 2008;40(5):489–90.
- Park JH, Wacholder S, Gail MH, Peters U, Jacobs KB, Chanock SJ, et al. Estimation of effect size distribution from genome-wide association studies and implications for future discoveries. Nat Genet. 2010;42(7):570–5.
- O'Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin MR, et al. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS One. 2012;7(5):e34861.
- Wang J, Shete S. Analysis of secondary phenotype involving the interactive effect of the secondary phenotype and genetic variants on the primary disease. Ann Hum Genet. 2012;76(6):484–99.
- van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235.
- Li H, Gail MH. Efficient adaptively weighted analysis of secondary phenotypes in case–control genome-wide association studies. Hum Hered. 2012;73(3):159–73.
- Schifano ED, Li L, Christiani DC, Lin X. Genome-wide association analysis for multiple continuous secondary phenotypes. Am J Hum Genet. 2013;92(5):744–59.
- Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc. 1955;50(272):1096–121.
- Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1979;6(2):65–70.
- Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75(4):800–2.
- Wang X, Morris NJ, Schaid DJ, Elston RC. Power of single- vs. multi-marker tests of association. Genet Epidemiol. 2012;36(5):480–7.
- Han F, Pan W. Powerful multi-marker association tests: unifying genomic distance-based regression and logistic regression. Genet Epidemiol. 2010;34(7):680–8.
- Kim S, Morris NJ, Won S, Elston RC. Single-marker and two-marker association tests for unphased case–control genotype data, with a power comparison. Genet Epidemiol. 2010;34(1):67–77.
- Kim S, Abboud HE, Pahl MV, Tayek J, Snyder S, Tamkin J, et al. Examination of association with candidate genes for diabetic nephropathy in a Mexican American population. Clin J Am Soc Nephrol. 2010;5(6):1072–8.
- Slavin TP, Feng T, Schnell A, Zhu X, Elston RC. Two-marker association tests yield new disease associations for coronary artery disease and hypertension. Hum Genet. 2011;130(6):725–33.
- Laird NM, Horvath S, Xu X. Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000;19 Suppl 1:S36–42.
- Horvath S, Xu X, Laird NM. The family based association test method: strategies for studying general genotype–phenotype associations. Eur J Hum Genet. 2001;9(4):301–6.
- Lange C, Laird NM. On a general class of conditional tests for family-based association studies in genetics: the asymptotic distribution, the conditional power, and optimality considerations. Genet Epidemiol. 2002;23(2):165–80.
- Lasky-Su J, Murphy A, McQueen MB, Weiss S, Lange C. An omnibus test for family-based association studies with multiple SNPs and multiple phenotypes. Eur J Hum Genet. 2010;18(6):720–5.
- Raby BA, Van Steen K, Celedon JC, Litonjua AA, Lange C, Weiss ST. Paternal history of asthma and airway responsiveness in children with asthma. Am J Respir Crit Care Med. 2005;172(5):552–8.
- Won S, Wilk JB, Mathias RA, O'Donnell CJ, Silverman EK, Barnes K, et al. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies. PLoS Genet. 2009;5(11):e1000741.
- Lange C, Lyon H, DeMeo D, Raby B, Silverman EK, Weiss ST. A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies. Hum Hered. 2003;56(1–3):10–7.
- Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31(2):423–47.
- Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–54.
- Thornton T, McPeek MS. ROADTRIPS: case–control association testing with partially or completely unknown population and pedigree structure. Am J Hum Genet. 2010;86(2):172–84.
- Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11(7):459–63.
- Spielman RS, Ewens WJ. The TDT and other family-based tests for linkage disequilibrium and association. Am J Hum Genet. 1996;59(5):983–9.
- Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71(6):1330–41.
- Thornton T, McPeek MS. Case–control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet. 2007;81(2):321–37.
- Bourgain C, Hoffjan S, Nicolae R, Newman D, Steiner L, Walker K, et al. Novel case–control test in a founder population identifies P-selectin as an atopy-susceptibility locus. Am J Hum Genet. 2003;73(3):612–26.
- Won S, Lange C. A general framework for robust and efficient association analysis in family-based designs: quantitative and dichotomous phenotypes. Stat Med. 2013;32(25):4482–98.
- Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM. Using the noninformative families in family-based association tests: a powerful new testing strategy. Am J Hum Genet. 2003;73(4):801–11.
- George VT, Elston RC. Testing the association between polymorphic markers and quantitative traits in pedigrees. Genet Epidemiol. 1987;4(3):193–201.
- Gilmour AR, Thompson R, Cullis BR. Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics. 1995;51(4):1440–50.
- Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38(4):963–74.
- Lindstrom MJ, Bates DM. Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. J Am Stat Assoc. 1988;83(404):1014–22.
- Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993;88(421):9–25.
- Gilmour AR, Anderson RD, Rae AL. The analysis of binomial data by a generalized linear mixed model. Biometrika. 1985;72:593–9.
- Schall R. Estimation in generalized linear models with random effects. Biometrika. 1991;78:719–27.
- Crowder M. On linear and quadratic estimating functions. Biometrika. 1987;74(3):591–7.
- Crowder M. Gaussian estimation for correlated binomial data. J R Stat Soc B. 1985;47(2):229–37.
- Jakobsdottir J, McPeek MS. MASTOR: mixed-model association mapping of quantitative traits in samples with related individuals. Am J Hum Genet. 2013;92(5):652–66.
- McPeek MS, Wu X, Ober C. Best linear unbiased allele-frequency estimation in complex pedigrees. Biometrics. 2004;60(2):359–67.
- Lewontin RC. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics. 1964;49(1):49–67.
- Balding DJ, Nichols RA. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica. 1995;96(1–2):3–12.
- Wright S. Genetical structure of populations. Nature. 1950;166(4215):247–9.
- Liu G, Liang KY. Sample size calculations for studies with correlated observations. Biometrics. 1997;53(3):937–47.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.