Robust joint analysis allowing for model uncertainty in twostage genetic association studies
 Dongdong Pan^{1, 2},
 Qizhai Li^{2}Email author,
 Ningning Jiang^{2},
 Aiyi Liu^{3} and
 Kai Yu^{4}
DOI: 10.1186/14712105129
© Pan et al; licensee BioMed Central Ltd. 2011
Received: 29 June 2010
Accepted: 7 January 2011
Published: 7 January 2011
Abstract
Background
The cost efficient twostage design is often used in genomewide association studies (GWASs) in searching for genetic loci underlying the susceptibility for complex diseases. Replicationbased analysis, which considers data from each stage separately, often suffers from loss of efficiency. Joint test that combines data from both stages has been proposed and widely used to improve efficiency. However, existing joint analyses are based on test statistics derived under an assumed genetic model, and thus might not have robust performance when the assumed genetic model is not appropriate.
Results
In this paper, we propose joint analyses based on two robust tests, MERT and MAX3, for GWASs under a twostage design. We developed computationally efficient procedures and formulas for significant level evaluation and power calculation. The performances of the proposed approaches are investigated through the extensive simulation studies and a real example. Numerical results show that the joint analysis based on the MAX3 test statistic has the best overall performance.
Conclusions
MAX3 joint analysis is the most robust procedure among the considered joint analyses, and we recommend using it in a twostage genomewide association study.
Background
The twostage design is often adopted in genomewide association studies (GWASs) to search for genetic variants underlying susceptibility for complex diseases. The advantages of the twostage design have been investigated extensively (see e.g., [1–12]). In a typical twostage design for GWASs, a proportion of the available samples are genotyped at the initial stage on a large number of single nucleotide polymorphisms (SNPs) using a commercial genotyping platform. Based on association test results obtained at this stage, a small percentage of SNPs are selected and further genotyped on the remaining samples in the second stage. To analyze data generated from such a twostage design, the joint analysis strategy has been recommended, which combines the test statistics from both stages as the final test statistic, and is shown to be more powerful than the replicationbased analysis that only utilizes the second stage data [12].
The efficiency of joint analysis based on the allelefrequencydifferencebased test (AFDT) was evaluated in detail in comparison to the replicationbased analysis [12]. It is commonly adopted as a single marker test in GWASs. The AFDT is valid when HardyWeinberg equilibrium (HWE) holds in the target population, and is powerful when the underlying genetic models are additive or multiplicative. The CochranArmitage trend test (CATT) [13, 14] derived under the additive (in log scale) genetic risk model is also used in singlemaker analysis, which is optimal when the underlying additive genetic model is true. However, both tests are not so powerful compared with other methods such as MAX3 [15] when the underlying genetic model is not additive. Since in most cases there is no evidence suggesting that the additive risk model is most appropriate for the underlying disease model, especially in the typical GWASs where we most likely evaluate only the tagging SNPs, but not the causal SNPs directly. Thus, it is advantageous to adopt a more robust single marker test that has a relatively good performance under all possible disease models. To this end, two types of such robust tests, the MERT (maximin efficiency robust test) [15, 16] and MAX3 (the maximum values of CATTs under recessive, additive and dominant models) have been recently considered [15, 17]. Nevertheless, their performances under the twostage design have not been thoroughly investigated.
In this report we propose two types of joint test statistics for the twostage design based on the two robust tests, MERT and MAX3. We derive closedform formula to calculate the power of the MERTbased joint analysis, and propose a computationally efficient Monte Carlo procedure to evaluate the significance level of the MAX3based joint analysis. Facilitated by these two procedures, we evaluate the performances of the two robust test based joint analyses, in comparison with the ones based on AFDT, under various twostage design setups and disease models.
Methods
Notations
Suppose that r cases and s controls are randomly sampled from the source population in a GWAS. Denote the number of SNPs genotyped and the proportion of the subjects in Stage 1 by m and π, respectively. Throughout, we only consider biallelic SNPs with two alleles G and g, with G being the risk allele. Then there are three genotypes: gg, Gg, and GG. Using the disease risk at gg as the baseline, we define the relative risks of Gg and GG as λ_{1} = f_{1}/f_{0} and λ_{2} = f_{2}/f_{0}, respectively, where f_{0} = Pr(casegg) > 0, f_{1} = Pr(caseGg), f_{2} = Pr(caseGG) are the penetrances. Let K = Pr(case) be the disease prevalence. Denote the genotype frequencies in case population as p_{0} = Pr(ggcase) = Pr(gg)f_{0}/K, p_{1} = Pr(Ggcase) = Pr(Gg)f_{1}/K, p_{2} = Pr(GGcase) = Pr(GG)f_{2}/K and in control population as q_{0} = Pr(ggcontrol) = Pr(gg)(1f_{0})/(1K), q_{1} = Pr(Ggcontrol) = Pr(Gg)(1f_{1})/(1K), q_{2} = Pr(GGcontrol) = Pr(GG)(1f_{2})/(1K). Then the null hypothesis of no association is H_{0} : p_{ i } = q_{ i } , i = 0, 1, 2, which is equivalent to H_{0} : λ_{1} = λ_{2} = 1. The alternative hypothesis is H_{1} : λ_{2} ≥ λ_{1} ≥ 1 with λ_{2} > 1. The commonly used three genetic models, recessive, additive and dominant models are corresponding to λ_{2} >λ_{1} = 1, 2λ_{1} = λ_{2} +1 and λ_{1} = λ_{2} > 1, respectively. We assume that SNPs with pvalues less than γ in Stage 1 will be further investigated in Stage 2 and α be the whole genomewide type I error.
Genotype frequencies in case population and control population for both stages
cases  controls  

gg  Gg  GG  gg  Gg  GG  
Stage 1  p _{10}  p _{11}  p _{12}  q _{10}  q _{11}  q _{12} 
Stage 2  p _{20}  p _{21}  p _{22}  q _{20}  q _{21}  q _{22} 
AlleleFrequencyDifferenceBased Joint Analysis
The joint statistic is ${Z}_{J}=\sqrt{\pi}{Z}_{1}+\sqrt{1\pi}{Z}_{2}$. The Bonferroni correction threshold (b_{ J } ) for Z_{ J } is the solution of the equation ${\mathrm{Pr}}_{{\text{H}}_{\text{0}}}\left(\left{\text{Z}}_{\text{1}}\right>{b}_{1},\left{Z}_{J}\right>{b}_{J}\right)=\alpha /m$,
where ${({Z}_{1},{Z}_{J})\text{'}}_{{\text{H}}_{\text{0}}}~{N}_{2}((0,0)\text{'},\mathrm{\Gamma}\mathrm{\Gamma}\text{'})$, $\mathrm{\Gamma}=\left(\begin{array}{cc}1& 0\\ \sqrt{\pi}& \sqrt{1\pi}\end{array}\right)$. So the power of the joint test under the alternative hypothesis is given by ${\mathrm{Pr}}_{{\text{H}}_{\text{1}}}\left(\left{\text{Z}}_{\text{1}}\right>{b}_{1},\left{Z}_{J}\right>{b}_{J}\right)$, where ${\left(\begin{array}{c}{Z}_{1}\\ {Z}_{J}\end{array}\right)}_{{\text{H}}_{\text{1}}}~{N}_{2}\left(\left(\begin{array}{c}{\mu}_{1}\\ \sqrt{\pi}{\mu}_{1}+\sqrt{1\pi}{\mu}_{2}\end{array}\right),\mathrm{\Gamma}{\Delta}_{1}\mathrm{\Gamma}\text{'}\right)$, with $\begin{array}{l}{\mu}_{1}=\frac{{\theta}_{1}{\varpi}_{1}}{\sqrt{{\scriptscriptstyle \frac{1}{2r\pi}}+{\scriptscriptstyle \frac{1}{2s\pi}}}}\\ \times \frac{1}{\sqrt{\left[{\theta}_{1}\xi +{\varpi}_{1}(1\xi )\right]\left[1{\theta}_{1}\xi {\varpi}_{1}(1\xi )\right]}}\end{array}$, $\begin{array}{l}{\mu}_{2}=\frac{{\theta}_{2}{\varpi}_{2}}{\sqrt{{\scriptscriptstyle \frac{1}{2r(1\pi )}}+{\scriptscriptstyle \frac{1}{2s(1\pi )}}}}\\ \times \frac{1}{\sqrt{\left[{\theta}_{2}\xi +{\varpi}_{2}(1\xi )\right]\left[1{\theta}_{2}\xi {\varpi}_{2}(1\xi )\right]}}\end{array}$, ${\Delta}_{1}=\left(\begin{array}{cc}{\delta}_{1}& 0\\ 0& {\delta}_{2}\end{array}\right)$, ${\delta}_{1}=\frac{(1\xi ){\theta}_{1}(1{\theta}_{1})+\xi {\varpi}_{1}(1{\varpi}_{1})}{\left[{\theta}_{1}\xi +{\varpi}_{1}(1\xi )\right]\left[1{\theta}_{1}\xi {\varpi}_{1}(1\xi )\right]}$ and ${\delta}_{2}=\frac{(1\xi ){\theta}_{2}(1{\theta}_{2})+\xi {\varpi}_{2}(1{\varpi}_{2})}{\left[{\theta}_{2}\xi +{\varpi}_{2}(1\xi )\right]\left[1{\theta}_{2}\xi {\varpi}_{2}(1\xi )\right]}$.
The calculation of ${\mathrm{Pr}}_{{\text{H}}_{\text{1}}}\left(\left{\text{Z}}_{\text{1}}\right>{b}_{1},\left{Z}_{J}\right>{b}_{J}\right)$ is based on twofold integration which can be computed using the builtin function, "pmvnorm", in the R package "mvtnorm" [18–20].
The above approach is slightly different from the one considered in [12], where the authors constructed the test statistics by estimating the variance of the differences of allele frequency between case population and control population using the cases and controls separately under the null hypothesis. In our joint analysis, we estimated the variance using the combined data of case sample and control sample. Results (not show here) show that the two approaches have very similar performance.
CochranArmitage Trend Test under the Additive ModelBased Joint Analysis
CochranArmitage trend test under the additive model (CATTA) (see e.g., [13, 15]) is often used in the genetic association studies including GWASs. Denote CATTA for both stages by ${T}_{1}^{A}$ and ${T}_{2}^{A}$, respectively. Then the threshold for selecting SNPs in Stage 1 is d_{1} = Φ^{1}(1γ/2). The joint test statistic is ${T}_{J}^{A}=\sqrt{\pi}{T}_{1}^{A}+\sqrt{1\pi}{T}_{2}^{A}$. The threshold (d_{ J }) for ${T}_{J}^{A}$ can be obtained by solving the equation ${\mathrm{Pr}}_{{\text{H}}_{\text{0}}}\left(\left{T}_{1}^{A}\right>{d}_{1},\left{T}_{J}^{A}\right>{d}_{J}\right)=\alpha /m$. The power of the joint analysis is ${\mathrm{Pr}}_{{\text{H}}_{\text{1}}}\left(\left{T}_{1}^{A}\right>{d}_{1},\left{T}_{J}^{A}\right>{d}_{J}\right)$, which can be calculated again using the R package "mvtnorm". The joint distributions of $\left({T}_{1}^{A},{T}_{J}^{A}\right)\text{'}$ under the null and alternative hypotheses are given in Appendix A in Additional file 1.
MERTBased Joint Analysis
The power of the test is given by ${\mathrm{Pr}}_{{\text{H}}_{\text{1}}}\left(\left{T}_{1}^{mert}\right>{u}_{1},\left{T}_{J}^{mert}\right>{u}_{J}\right)$, whose numerical values can be calculated using the R package "mvtnorm". The joint distributions of $\left({T}_{1}^{mert},{T}_{J}^{mert}\right)\text{'}$ under the null and alternative hypotheses are derived in Appendix B in Additional file 1.
MAX3Based Joint Analysis
MAX3, the maximal value of CATT under three genetic models, is another commonly used robust test in the current GWASs (see e.g., [7, 15, 17]). Once we have $\left({T}_{1}^{R},{T}_{1}^{A},{T}_{1}^{D}\right)$ and $\left({T}_{2}^{R},{T}_{2}^{A},{T}_{2}^{D}\right)$, the test statistic in Stage 1 is ${T}_{1}^{\mathrm{max}}=\mathrm{max}\left\{\left{T}_{1}^{R}\right,\left{T}_{1}^{A}\right,\left{T}_{1}^{D}\right\right\}$ and the joint analysis based on MAX3 can be defined as ${T}_{J}^{\mathrm{max}}=\mathrm{max}\left\{\left{T}_{J}^{R}\right,\left{T}_{J}^{A}\right,\left{T}_{J}^{D}\right\right\}$, where ${T}_{J}^{R}=\sqrt{\pi}{T}_{1}^{R}+\sqrt{1\pi}{T}_{2}^{R}$, ${T}_{J}^{A}=\sqrt{\pi}{T}_{1}^{A}+\sqrt{1\pi}{T}_{2}^{A}$, and ${T}_{J}^{D}=\sqrt{\pi}{T}_{1}^{D}+\sqrt{1\pi}{T}_{2}^{D}$. For a given significance level γ in Stage 1, the threshold (v_{1}) can be obtained by solving the equation ${\mathrm{Pr}}_{{\text{H}}_{\text{0}}}\left(\mathrm{max}\left\{\left{T}_{1}^{R}\right,\left{T}_{1}^{A}\right,\left{T}_{1}^{D}\right\right\}>{v}_{1}\right)=\gamma $.
 1)
Generate B identical and independently distributed bivariate normal random variates $\left({T}_{11}^{R},{T}_{11}^{D}\right)\text{'},\left({T}_{12}^{R},{T}_{12}^{D}\right)\text{'},\cdots ,\left({T}_{1B}^{R},{T}_{1B}^{D}\right)\text{'}~{N}_{2}\left(\left(\begin{array}{c}0\\ 0\end{array}\right),\left(\begin{array}{cc}1& {\rho}_{1}^{RD}\\ {\rho}_{1}^{RD}& 1\end{array}\right)\right)$. Then calculate ${T}_{1i}^{A}={\omega}_{\text{11}}{T}_{1i}^{R}+{\omega}_{12}{T}_{1i}^{D}$, and ${T}_{1i}^{\mathrm{max}}=\mathrm{max}\left\{\left{T}_{1i}^{R}\right,\left{T}_{1i}^{A}\right,\left{T}_{1i}^{D}\right\right\}$ for i = 1, 2, ⋯, B. Without loss of generality, we assume ${T}_{1i}^{\mathrm{max}}>{v}_{1}$ for i = 1, 2, ⋯, B _{1} and ${T}_{1i}^{\mathrm{max}}\le {v}_{1}$ for i = B _{1} + 1, B _{1} + 2, ⋯. B.
 2)Generate B _{1} identical and independently distributed bivariate normal random variates $\left({T}_{21}^{R},{T}_{21}^{D}\right)\text{'},\left({T}_{22}^{R},{T}_{22}^{D}\right)\text{'},\cdots ,\left({T}_{2{B}_{1}}^{R},{T}_{2{B}_{1}}^{D}\right)\text{'}~{N}_{2}\left(\left(\begin{array}{c}0\\ 0\end{array}\right),\left(\begin{array}{cc}1& {\rho}_{2}^{RD}\\ {\rho}_{2}^{RD}& 1\end{array}\right)\right)$. Then calculate ${T}_{2i}^{A}={\omega}_{\text{21}}{T}_{2i}^{R}+{\omega}_{22}{T}_{2i}^{D}$, ${\omega}_{21}=\frac{{\rho}_{2}^{RA}{\rho}_{2}^{RD}{\rho}_{2}^{AD}}{1{\left({\rho}_{2}^{RD}\right)}^{2}}$ and ${\omega}_{22}=\frac{{\rho}_{2}^{AD}{\rho}_{2}^{RD}{\rho}_{2}^{RA}}{1{\left({\rho}_{2}^{RD}\right)}^{2}}$, with ${\rho}_{2}^{RA}$ and ${\rho}_{2}^{AD}$ given in Appendix C in Additional file 1. For i = 1, 2, ⋯, B _{1}, calculate${T}_{Ji}^{\mathrm{max}}=\mathrm{max}\left\{\begin{array}{l}\left\sqrt{\pi}{T}_{1i}^{R}+\sqrt{1\pi}{T}_{2i}^{R}\right,\\ \left\sqrt{\pi}{T}_{1i}^{A}+\sqrt{1\pi}{T}_{2i}^{A}\right,\\ \left\sqrt{\pi}{T}_{1i}^{D}+\sqrt{1\pi}{T}_{2i}^{D}\right\end{array}\right\}.$
 3)
Find v_{ J } that yields $\mathrm{min}\left\frac{\#\left\{{T}_{Ji}^{\mathrm{max}}>{v}_{J},i=1,2,\cdots ,{B}_{1}\right\}}{{B}_{1}}\frac{\alpha}{m\gamma}\right$ with $\frac{\#\left\{{T}_{Ji}^{\mathrm{max}}>{v}_{J},i=1,2,\cdots ,{B}_{1}\right\}}{{B}_{1}}\le \frac{\alpha}{m\gamma}$ and ${v}_{J}\in \left\{{T}_{Ji}^{\mathrm{max}},i=1,2,\cdots ,{B}_{1}\right\}$.
Once we have v_{1} and v_{ J } , we generate the data under the alternative hypothesis to calculate the power empirically. In the simulation studies, we generate 10,000 data sets under the alternative hypothesis. For the i^{th} data set (i = 1, 2. ⋯, 10000), we calculate ${T}_{1}^{\mathrm{max}}$ and ${T}_{J}^{\mathrm{max}}$, denote them again by ${T}_{1i}^{\mathrm{max}}$ and ${T}_{Ji}^{\mathrm{max}}$, respectively. Then the empirical power is $\frac{\#\left\{{T}_{1i}^{\mathrm{max}}>{v}_{1},{T}_{Ji}^{\mathrm{max}}>{v}_{J};i=1,2,\cdots ,10000\right\}}{10000}$.
Results
Simulation Setup
Power comparison for MAF = 0.15 (K = 0.1, α = 0.05, m = 5 × 10^{5})
π  γ  ALLEJ  CATAJ  MERTJ  MAX3J  

Recessive Model  0.5  0.0001  0.070  0.058  0.385  0.759 
r = s = 5000  0.0002  0.076  0.064  0.420  0.798  
λ_{1} = 1, λ_{2} = 2  0.4  0.0001  0.049  0.042  0.273  0.583 
0.0002  0.058  0.048  0.317  0.646  
0.3  0.0001  0.029  0.025  0.155  0.346  
0.0002  0.036  0.031  0.193  0.423  
Additive Model  0.5  0.0001  0.601  0.613  0.440  0.555 
r = s = 2000  0.0002  0.643  0.655  0.477  0.599  
λ_{1} = 1.4, λ_{2} = 1.8  0.4  0.0001  0.450  0.460  0.317  0.406 
0.0002  0.507  0.517  0.364  0.451  
0.3  0.0001  0.271  0.277  0.183  0.226  
0.0002  0.326  0.334  0.226  0.281  
Dominant Model  0.5  0.0001  0.679  0.711  0.356  0.726 
r = s = 2000  0.0002  0.720  0.752  0.388  0.768  
λ_{1} = λ_{2} = 1.5  0.4  0.0001  0.520  0.551  0.254  0.552 
0.0002  0.579  0.611  0.293  0.621  
0.3  0.0001  0.322  0.345  0.146  0.339  
0.0002  0.383  0.408  0.181  0.400 
Power comparison for MAF = 0.25 (K = 0.1, α = 0.05, m = 5 × 10^{5})
π  γ  ALLEJ  CATAJ  MERTJ  MAX3J  

Recessive Model  0.5  0.0001  0.075  0.066  0.220  0.517 
r = s = 5000  0.0002  0.083  0.073  0.242  0.546  
λ_{1} = 1, λ_{2} = 1.5  0.4  0.0001  0.053  0.047  0.154  0.365 
0.0002  0.062  0.055  0.180  0.408  
0.3  0.0001  0.031  0.028  0.087  0.197  
0.0002  0.039  0.035  0.110  0.254  
Additive Model  0.5  0.0001  0.835  0.846  0.782  0.799 
r = s = 2000  0.0002  0.868  0.878  0.820  0.838  
λ_{1} = 1.4, λ_{2} = 1.8  0.4  0.0001  0.687  0.700  0.625  0.639 
0.0002  0.742  0.754  0.683  0.700  
0.3  0.0001  0.462  0.474  0.405  0.413  
0.0002  0.530  0.542  0.472  0.476  
Dominant Model  0.5  0.0001  0.717  0.757  0.511  0.826 
r = s = 2000  0.0002  0.758  0.796  0.551  0.853  
λ_{1} = λ_{2} = 1.5  0.4  0.0001  0.557  0.597  0.375  0.651 
0.0002  0.617  0.656  0.427  0.726  
0.3  0.0001  0.350  0.382  0.221  0.425  
0.0002  0.413  0.447  0.270  0.495 
Power comparison for MAF = 0.35 (K = 0.1, α = 0.05, m = 5 × 10^{5})
π  γ  ALLEJ  CATAJ  MERTJ  MAX3J  

Recessive Model  0.5  0.0001  0.420  0.384  0.536  0.824 
r = s = 4000  0.0002  0.456  0.418  0.578  0.860  
λ_{1} = 1, λ_{2} = 1.5  0.4  0.0001  0.302  0.274  0.393  0.657 
0.0002  0.348  0.317  0.447  0.717  
0.3  0.0001  0.175  0.158  0.231  0.436  
0.0002  0.216  0.196  0.282  0.492  
Additive Model  0.5  0.0001  0.891  0.900  0.882  0.864 
r = s = 2000  0.0002  0.916  0.925  0.909  0.895  
λ_{1} = 1.4, λ_{2} = 1.8  0.4  0.0001  0.760  0.773  0.747  0.715 
0.0002  0.809  0.821  0.797  0.767  
0.3  0.0001  0.537  0.551  0.522  0.481  
0.0002  0.604  0.618  0.590  0.548  
Dominant Model  0.5  0.0001  0.558  0.607  0.464  0.758 
r = s = 2000  0.0002  0.600  0.649  0.502  0.806  
λ_{1} = λ_{2} = 1.5  0.4  0.0001  0.413  0.455  0.337  0.599 
0.0002  0.468  0.512  0.385  0.660  
0.3  0.0001  0.246  0.274  0.197  0.374  
0.0002  0.298  0.330  0.242  0.437 
Power comparison for MAF = 0.45 (K = 0.1, α = 0.05, m = 5 × 10^{5})
π  γ  ALLEJ  CATAJ  MERTJ  MAX3J  

Recessive Model  0.5  0.0001  0.282  0.253  0.263  0.542 
r = s = 2000  0.0002  0.308  0.277  0.288  0.572  
λ_{1} = 1, λ_{2} = 1.5  0.4  0.0001  0.199  0.178  0.184  0.380 
0.0002  0.231  0.207  0.215  0.442  
0.3  0.0001  0.114  0.101  0.104  0.220  
0.0002  0.142  0.127  0.131  0.263  
Additive Model  0.5  0.0001  0.886  0.896  0.894  0.854 
r = s = 2000  0.0002  0.912  0.921  0.919  0.881  
λ_{1} = 1.4, λ_{2} = 1.8  0.4  0.0001  0.753  0.767  0.765  0.701 
0.0002  0.803  0.815  0.813  0.760  
0.3  0.0001  0.529  0.543  0.542  0.473  
0.0002  0.597  0.610  0.609  0.545  
Dominant Model  0.5  0.0001  0.279  0.317  0.302  0.590 
r = s = 2000  0.0002  0.305  0.346  0.329  0.626  
λ_{1} = λ_{2} = 1.5  0.4  0.0001  0.197  0.225  0.214  0.428 
0.0002  0.229  0.260  0.248  0.483  
0.3  0.0001  0.112  0.128  0.123  0.241  
0.0002  0.140  0.160  0.153  0.291 
Simulation Results
For convenience, we refer to the aforementioned four joint analysis approaches as, respectively, ALLEJ (allelefrequencydifferencebased joint analysis), CATAJ (CochranArmitage trend test under the additive modelbased joint analysis), MERTJ (MERTbased joint analysis), MAX3J (MAX3based joint analysis). Table 2, 3, 4 and 5 report powers of the four joint analysis methods corresponding to MAF equal to 0.15, 0.25, 0.35 and 0.45, respectively. From these tables, we have the following observations. Under the recessive model, MERTJ and MAX3J are more powerful than ALLEJ and CATAJ, with MAX3J being most powerful among the four methods under consideration. In some cases, the advantage of MAX3J is quite impressive. For example, in Table 2, with π = 0.5, γ = 0.0001, the powers of ALLEJ, CATAJ, MERTJ and MAX3J are 0.070, 0.058, 0.385 and 0.759, respectively. Under the additive model, CATAJ and ALLEJ have comparable power and are more powerful than the other two tests. However, the power difference between CATAJ and MAX3J is mostly at the level of 6.6%, with the largest discrepancy of 7%. Under the dominant model, CATAJ and MAX3J are more powerful than ALLEJ and MERTJ. Both tests have comparable power when MAF = 0.15, and MAX3J is much more powerful than CATAJ when MAF = 0.25, 0.35 and 0.45. In summary, it appears that MAX3J has the best overall performance.
A Real Example: Type 2 Diabetes Mellitus
Genotype counts and pvalues of SNPs rs1005316 and rs2876711 for type 2 diabetes mellitus
SNP ID  r_{0}  r_{1}  r_{2}  s_{0}  s_{1}  s_{2}  ALLEJ  CATAJ  MERTJ  MAX3J  

rs1005316  Stage 1  13  224  457  44  211  399  6.13 × 10^{5}  7.78 × 10^{6}  3.87 × 10^{6}  8.12 × 10^{7} 
Stage 2  89  669  1708  89  913  1856  
rs2876711  Stage 1  99  322  272  121  351  182  2.92 × 10^{7}  2.07 × 10^{8}  5.97 × 10^{8}  3.10 × 10^{8} 
Stage 2  389  1191  989  484  1404  987 
Discussion and Conclusions
In genetic association studies, the underlying genetic inheritance model is often unknown, and thus hinders the use of methods such as CATT, which has to be derived under an assumed genetic model. Robust tests, such as MERT and MAX3, had been proposed to relax the dependence on the underlying genetic models. Extending these tests to a twostage setting, we construct two robust joint analyses based on MERT and MAX3. Numerical results show that MAX3J has the best overall performance among the four considered joint analysis approaches. For type 2 diabetes mellitus, based on MAX3J, we found that SNP rs2876711 was significantly associated with type 2 diabetes mellitus besides their findings.
Pearson Chisquare test is a robust test that was used in genetic association studies (see e.g., [25]). Recently, a comprehensive power comparison between MAX3 and Pearson Chisquare test and CochranArmitage trend test under the additive model was conducted in [17]. They reported that MAX3 has the most robust performances. The proposed joint analysis combing the test statistics of both stages considers the betweenstage heterogeneity. It is intractable for Pearson Chisquare test to consider the relative risk heterogeneity of both stages, especially when the relative risk in Stage 1 is larger than one and that in Stage 2 is less than one.
Recently, a joint analysis based on genetic model selection [26] to overcome the genetic model uncertainty was proposed in [22]. Based on the data in Stage 1, they used HardyWeinberg disequilibrium trend test studied in [27] to determine a score that corresponds to a genetic model. This score was then used to construct the trend test based on the data of Stage 2. Results (not shown here) show that the proposed joint analysis has comparable power. Therefore, the proposed MAX3J can be used as an alternative procedure in twostage genomewide association studies.
Abbreviations
 GWAS:

genomewide association study
 SNP:

single nucleotide polymorphism
 MAF:

minor allele frequency
 AFDT:

allelefrequencydifferencebased test
 CATT:

CochranArmitage trend test
 MERT:

maximin efficiency robust test
 MAX3:

maximum values of CochranArmitage trend tests under recessive, additive and dominant models
 ALLEJ:

allelefrequencydifferencebased joint analysis
 CATAJ:

CochranArmitage trend test under the additive modelbased joint analysis
 MERTJ:

MERTbased joint analysis
 MAX3J:

MAX3based joint analysis.
Declarations
Acknowledgements
We would like to thank the editor and three anonymous reviewers for their very constructive comments and suggestions, which significantly improved our presentation. This work is partially support by the National Young Science Foundation of China, No. 10901155.
Authors’ Affiliations
References
 Zuo YJ, Zou GH, Zhao HY: Twostage designs in casecontrol association analysis. Genetics 2006, 173: 1747–1760. 10.1534/genetics.105.042648PubMed CentralView ArticlePubMedGoogle Scholar
 Goll A, Bauer P: Twostage designs applying methods differing in costs. Bioinformatics 2007, 23: 1519–1526. 10.1093/bioinformatics/btm140View ArticlePubMedGoogle Scholar
 Muller HH, Pahl R, Schafer H: Including sampling and phenotyping costs into the optimization of twostage designs for genomewide association studies. Genetic Epidemiology 2007, 31: 844–852. 10.1002/gepi.20245View ArticlePubMedGoogle Scholar
 Satagopan JM, Elston RC: Optimal twostage genotyping in populationbased association studies. Genetic Epidemiology 2003, 25: 149–157. 10.1002/gepi.10260View ArticlePubMedGoogle Scholar
 Satagopan JM, Venkatraman ES, Begg CB: Twostage designs for genedisease association studies with sample size constraints. Biometrics 2004, 60: 589–597. 10.1111/j.0006341X.2004.00207.xView ArticlePubMedGoogle Scholar
 Skol AD, Scott LJ, Abecasis GR, Boehnke M: Optimal designs for twostage genomewide association studies. Genetic Epidemiology 2007, 31: 776–788. 10.1002/gepi.20240View ArticlePubMedGoogle Scholar
 Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, Boutin P, Vincent D, Belisle A, Hadjadj S, Balkau B, Heude B, Charpentier G, Hudson TJ, Montpetit A, Pshezhetsky AV, Prentki M, Posner BI, Balding DJ, Meyre D, Polychronakos C, Froguel P: A genomewide association study identifies novel risk loci for type 2 diabetes. Nature 2007, 445: 881–885. 10.1038/nature05616View ArticlePubMedGoogle Scholar
 Thomas D, Xie R, Gebregziabher M: Twostage sampling designs for gene association studies. Genetic Epidemiology 2004, 27: 401–414. 10.1002/gepi.20047View ArticlePubMedGoogle Scholar
 Wang H, Thomas DC, Pe'er I, Stram DO: Optimal twostage designs for genomewide association scans. Genetic Epidemiology 2006, 30: 356–368. 10.1002/gepi.20150View ArticlePubMedGoogle Scholar
 Yu K, Chatterjee N, Wheeler W, Li Q, Wang S, Rothman N, Wacholder S: Flexible design for following up positive findings. American Journal of Human Genetics 2007, 81: 540–551. 10.1086/520678PubMed CentralView ArticlePubMedGoogle Scholar
 Zheng G, Meyer M, Li W, Yang Y: Comparison of twophase analyses for casecontrol genetic association studies. Statistics in Medicine 2008, 27: 5054–5075. 10.1002/sim.3336View ArticlePubMedGoogle Scholar
 Skol AD, Scott LJ, Abecasis GR, Boehnke M: Joint analysis is more efficient than replicationbased analysis for twostage genomewide association studies. Nature Genetics 2006, 38: 209–213. 10.1038/ng1706View ArticlePubMedGoogle Scholar
 Sasieni PD: From genotypes to genes: Doubling the sample size. Biometrics 1997, 53: 1253–1261. 10.2307/2533494View ArticlePubMedGoogle Scholar
 Zheng G, Gastwirth JL: On estimation of the variance in CochranArmitage trend tests for genetic association using casecontrol studies. Statistics in Medicine 2006, 25: 3150–3159. 10.1002/sim.2250View ArticlePubMedGoogle Scholar
 Zheng G, Friedlin B, Gastwirth JL: Comparison of robust tests for genetic association using casecontrol studies. In Optimality: The Second Erich L. Lehmann Symposium. Edited by: Rojo J. Beachwood: Institute of Mathematical Statistics; 2006:253–265. [DasGupta A (Series Editor): Lecture NotesMonograph Series, vol 49.] [DasGupta A (Series Editor): Lecture NotesMonograph Series, vol 49.]View ArticleGoogle Scholar
 Gastwirth JL: The use of maximin efficiency robust tests in combining contingency tables and survival analysis. Journal of the American Statistical Association 1985, 80: 380–384. 10.2307/2287901View ArticleGoogle Scholar
 Li Q, Zheng G, Liang X, Yu K: Robust tests for singlemarker analysis in casecontrol genetic association studies. Annals of Human Genetics 2009, 73: 245–252. 10.1111/j.14691809.2009.00506.xView ArticlePubMedGoogle Scholar
 Genz A: Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics 1992, 1: 141–150. 10.2307/1390838Google Scholar
 Genz A: Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics 1993, 25: 400–405.Google Scholar
 Tong YL: The multivariate normal distribution. New York: SpringerVerlag; 1990.View ArticleGoogle Scholar
 Zheng G, Yang Y, Zhu X, Elston RC: Casecontrol studies of genetic association. New York: Springer; 2010.Google Scholar
 Kwak M, Joo J, Zheng G: A robust test for twostage design in genomewide association studies. Biometrics 2009, 65: 1288–1295. 10.1111/j.15410420.2008.01187.xView ArticlePubMedGoogle Scholar
 Schaid DJ, Sommer SS: Genotype relative risks: methods for design and analysis of candidategene association studies. American Journal of Human Genetics 1993, 53: 1114–1126.PubMed CentralPubMedGoogle Scholar
 Terwilliger JD, Ott J: Handbook of human genetic linkage. Baltimore: Johns Hopkins University Press; 1994.Google Scholar
 Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, CancelTassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, Fraumeni JF Jr, Hoover R, Hunter DJ, Chanock SJ, Thomas G: Genomewide association study of prostate cancer identifies a second risk locus at 8q24. Nature Genetics 2007, 39: 645–649. 10.1038/ng2022View ArticlePubMedGoogle Scholar
 Zheng G, Ng HKT: Genetic model selection in twophase analysis for case control association studies. Biostatistics 2008, 9: 391–399. 10.1093/biostatistics/kxm039PubMed CentralView ArticlePubMedGoogle Scholar
 Song K, Elson RC: A powerful method of combining measures of association and HardyWeinberg disequilibrium for finemapping in casecontrol studies. Statistics in Medicine 2006, 25: 105–126. 10.1002/sim.2350View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.