### Point estimation for *γ*

Consider an X-linked diallelic locus with normal allele *a* and mutant allele *A*. We only select the females because XCI is unrelated to males. For females, suppose that *aa*, *Aa* and *AA* are three genotypes and let *X*={0, *γ*, 2} be the corresponding genotypic value, respectively, with *γ*∈ [ 0, 2]. For a case-control design, let *Y*=1 (0) denote that the female is affected (unaffected). Then, the association between *Y* and *X* can be expressed using a logistic regression model

$$ \text{Logit}(\text{Pr}(Y=1|X,\boldsymbol{z})) =\beta_{0} + \beta X+\boldsymbol{b}^{T}\boldsymbol{z}, $$

(1)

where *β*_{0} is the intercept, *β* is the regression coefficient for *X*, *z* is a vector of covariates that need to be adjusted (e.g. age), and *b*^{T} is a vector of regression coefficients for *z*.

To estimate *γ*, we decompose the genotypic value *X* as *X*=*γ**X*_{1}+(2−*γ*)*X*_{2}, where *X*_{1}=*I*_{{G=Aa or AA}}, *X*_{2}=*I*_{{G=AA}}, *G* denotes the genotype of the female and *I*_{{.}} is the indicator function. It can be seen that *X*_{1} indicates if the genotype contains the mutant allele *A* and *X*_{2} represents if the genotype is the homozygote *AA*. As such, Model (1) becomes

$$\begin{array}{*{20}l} \text{Logit}(\text{Pr}(Y &= 1|X_{1},X_{2},\boldsymbol{z}))\\ &= \beta_{0} + \beta \gamma X_{1} + \beta (2-\gamma) X_{2}+\boldsymbol{b}^{T}\boldsymbol{z}. \end{array} $$

Let *β*_{1}=*β**γ* and *β*_{2}=*β*(2−*γ*). Then, the above model can be rewritten as

$$\begin{array}{*{20}l} \text{Logit}(\text{Pr}(Y &= 1|X_{1},X_{2},\boldsymbol{z})) \\ &= \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2}+\boldsymbol{b}^{T}\boldsymbol{z}. \end{array} $$

(2)

Further, due to this reparameterization, *γ* can be represented as

$$ \gamma =\frac {2\beta_{1}}{\beta_{1}+\beta_{2}}, $$

(3)

when *β*=(*β*_{1}+*β*_{2})/2≠0. *γ* can only be well defined in presence of the association between the disease and the allele *A*. Note that *γ*∈ [ 0, 2] means that *β*_{1} and *β*_{2} have the same sign. That is, the genetic effect of heterozygous genotype in females lies between those of two homozygotes, which is generally satisfied in real applications. From Eq. (3), we have *γ*=0 (2) if and only if *β*_{1}=0 and *β*_{2}≠0 (*β*_{1}≠0 and *β*_{2}=0), representing XCI-S fully towards the normal (mutant) allele *a* (*A*), while *γ*=1 if and only if *β*_{1}=*β*_{2}≠0, which means XCI-R. So, if we get the MLEs \(\hat {\beta }_{1}\) and \(\hat {\beta }_{2}\) of *β*_{1} and *β*_{2}, then \(\hat {\gamma }=2\hat {\beta }_{1}/(\hat {\beta }_{1}+\hat {\beta }_{2})\) is the MLE of *γ* by the invariance property of MLE.

Note that \(\hat {\beta }_{1}\) and \(\hat {\beta }_{2}\) can be easily calculated through the standard logistic regression procedure. Specifically, suppose that we collect *N* unrelated females from a homogeneous population. Then, the log-likelihood function of the sample can be written as

$$\begin{array}{*{20}l} &l_{1}\left(\beta_{0},\beta_{1},\beta_{2},\boldsymbol{b}^{T}\right) \\ &=\sum\limits_{i=1}^{N} \left[{y_{i}}\left(\beta_{0}+\beta_{1} x_{i1}+\beta_{2} x_{i2}+\boldsymbol{b}^{T}\boldsymbol{z_{i}}\right)\right.\\ &\quad- \left.\text{log}\left(1+\text{exp}\left(\beta_{0}+\beta_{1} x_{i1}+\beta_{2} x_{i2}+\boldsymbol{b}^{T}\boldsymbol{z}_{i}\right)\right)\right], \end{array} $$

where *y*_{i}, *x*_{i1}, *x*_{i2} and *z*_{i} respectively are the values of *Y*, *X*_{1}, *X*_{2} and *z* of female *i*. Then, \(\hat {\beta }_{1}\) and \(\hat {\beta }_{2}\) are obtained by maximizing the above log-likelihood function, i.e.

$$l_{1}\left(\hat{\beta_{0}},\hat{\beta}_{1},\hat{\beta}_{2},\hat{\boldsymbol{b}}^{T}\right)= {\underset{\beta_{0}, \beta_{1},\beta_{2}, \boldsymbol{b}^{T}}{\arg\max}} \ l_{1}\left(\beta_{0},\beta_{1},\beta_{2},\boldsymbol{b}^{T}\right), $$

where \(\hat {\beta _{0}}\) and \(\hat {\boldsymbol {b}}^{T}\) are the MLEs of *β*_{0} and *b*^{T}, respectively.

### Confidence interval of *γ* based on delta method

Once the point estimate of *γ* is derived, we need to calculate the standard error or CI to evaluate its precision. Since \(\hat {\gamma }\) is also a ratio estimate, a natural idea is to use the first order Taylor series expansion of \(\hat {\gamma }\) and then obtain its asymptotic variance. Specifically, by the consistency of MLE, \(\hat {\beta }_{1}\) and \(\hat {\beta }_{2}\) are close to *β*_{1} and *β*_{2}, respectively, when *N* is large. Note that *β*=(*β*_{1}+*β*_{2})/2 and thus *γ* can be rewritten as *β*_{1}/*β*. Making a first order Taylor expansion of \(\hat {\gamma }\) around the point (*β*_{1}, *β*) and evaluating this at \(\left (\hat {\beta }_{1},\ \hat {\beta }\right)\), we have

$$\hat{\gamma}\approx\frac {\beta_{1}}{\beta}+\left(\hat{\beta_{1}}-\beta_{1}\right) \frac{1}{\beta}-\left(\hat{\beta}-\beta\right) \frac{\beta_{1}}{\beta^{2}}, $$

where \(\hat {\beta }=\left (\hat {\beta _{1}} +\hat {\beta _{2}}\right)/2\). Taking variance from both sides, the above equation becomes

$$\begin{array}{*{20}l} \text{Var}(\hat{\gamma})&\approx \frac{1}{\beta^{2}}\text{Var}\left(\hat{\beta_{1}}\right)\\ &+\frac{\beta_{1}^{2}}{\beta^{4}}\text{Var}\left(\hat{\beta}\right)- \frac{2\beta_{1}}{\beta^{3}}\text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right). \end{array} $$

(4)

Notice that

$$\text{Var}\left(\hat{\beta}\right)=\frac{1}{4}\left(\text{Var}\left(\hat{\beta_{1}}\right)+\text{Var}\left(\hat{\beta_{2}}\right)+2\mbox{Cov}\left(\hat{\beta_{1}},\ \hat{\beta_{2}}\right)\right) $$

and

$$\text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right)=\frac{1}{2}\left({\text{Var}}\left(\hat{\beta_{1}}\right)+ \text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta_{2}}\right)\right), $$

where \(\text {Var}\left (\hat {\beta _{1}}\right)\), \(\text {Var}\left (\hat {\beta _{2}}\right)\) and \({\text {Cov}}\left (\hat {\beta _{1}},\hat {\beta _{2}}\right)\) are the elements of the variance-covariance matrix *V* of \(\hat {\beta _{1}}\) and \(\hat {\beta _{2}}\). Generally, *V* has no simple form when covariates are included in the model, but can be derived from the empirical Fisher’s information matrix \(\boldsymbol {\hat {I}}\) for (*β*_{0}, *β*_{1}, *β*_{2}, *b*^{T})^{T} [33]. For Model (2),

$$\boldsymbol{\hat{I}}=\boldsymbol{U^{T}\hat{W}U}, $$

where *U*=(*1*, *X*_{1}, *X*_{2}, *z*) is the design matrix, *X*_{1}=(*x*_{11},*x*_{21},...,*x*_{N1})^{T}, *X*_{2}=(*x*_{12},*x*_{22},...,*x*_{N2})^{T}, *z*=(*z*_{1},*z*_{2},...,*z*_{N})^{T}, \(\boldsymbol {\hat {W}}={\text {diag}}\left (\hat {w}_{1},\hat {w}_{2},...,\hat {w}_{N}\right)\) is a diagonal matrix with diagonal elements

$$\hat{w}_{i}=\hat{f}_{i} \left(1-\hat{f}_{i}\right)\ (i=1,2,...,N), $$

and

$$\hat{f}_{i}=\frac{\text{exp}\left(\hat{\beta}_{0} + \hat{\beta}_{1} x_{i1} + \hat{\beta}_{2} x_{i2} +\boldsymbol{\hat{b}}^{T}\boldsymbol{z}_{i}\right)}{1+{\text{exp}}\left(\hat{\beta}_{0} + \hat{\beta}_{1} x_{i1} + \hat{\beta}_{2} x_{i2} +\boldsymbol{\hat{b}}^{T}\boldsymbol{z}_{i}\right)} $$

represents the estimated penetrances for female *i*. Once \(\boldsymbol {\hat {I}}\) is estimated, the partial information matrix \(\boldsymbol {\hat {I}}_{1}\) for *β*_{1} and *β*_{2} given *β*_{0} and *b*^{T} can be computed and thus \(\boldsymbol {V}=\boldsymbol {\hat {I}}^{-1}_{1}\). If there is no covariate in the model, then *V* has the following form [see Appendix A of Additional file 1]

$$\left(\begin{array}{cc} \frac{1}{n_{aa} \hat{w}_{aa}}+\frac{1}{n_{Aa} \hat{w}_{Aa}} & -\frac{1}{n_{Aa} \hat{w}_{Aa}}\\ -\frac{1}{n_{Aa} \hat{w}_{Aa}} & \frac{1}{n_{Aa} \hat{w}_{Aa}}+ \frac{1}{n_{AA} \hat{w}_{AA}} \end{array} \right), $$

where *n*_{aa}, *n*_{Aa} and *n*_{AA} are the numbers of the females with *aa*, *Aa* and *AA*, respectively, and *N*=*n*_{aa}+*n*_{Aa}+*n*_{AA}; \(\hat {w}_{aa}\), \(\hat {w}_{aa}\) and \(\hat {w}_{aa}\) are the weighted elements for *aa*, *Aa* and *AA*, respectively, with

$$\hat{w}_{G}=\hat{f}_{G} \left(1-\hat{f}_{G}\right) \ (G=aa,Aa, \text{or} AA), $$

and

$$\begin{array}{*{20}l} \hat{f}_{aa} &=\frac{\text{exp}\left(\hat{\beta}_{0}\right)}{1+\text{exp}\left(\hat{\beta}_{0}\right)},\\ \hat{f}_{Aa} &=\frac{\text{exp}\left(\hat{\beta}_{0} + \hat{\beta}_{1}\right)}{1+\text{exp}\left(\hat{\beta}_{0}+\hat{\beta}_{1}\right)} \end{array} $$

and

$$\hat{f}_{AA} =\frac{\text{exp}\left(\hat{\beta}_{0}+ \hat{\beta}_{1}+\hat{\beta}_{2}\right)}{1+\text{exp}\left(\hat{\beta}_{0}+\hat{\beta}_{1}+\hat{\beta}_{2}\right)} $$

representing the estimated penetrances for *aa*, *Aa* and *AA*, respectively.

Replacing *β* and *β*_{1} by \(\hat {\beta }\) and \(\hat {\beta _{1}}\) in Eq. (4), we estimate the delta-type standard error [34]

$$\begin{array}{*{20}l} \hat{\text{Var}}\left(\hat{\gamma}\right)&\approx\frac{1}{\hat{\beta}^{2}}\text{Var}\left(\hat{\beta_{1}}\right)\\ &+\frac{\hat{\beta}_{1}^{2}}{\hat{\beta}^{4}}\text{Var}\left(\hat{\beta}\right)- \frac{2\hat{\beta}_{1}}{\hat{\beta}^{3}}\text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right). \end{array} $$

As such, the delta-type CI \(\left (\gamma _{L}^{d},\gamma _{U}^{d}\right)\) at level (1−*α*) can be expressed as

$$\begin{array}{*{20}l} \left(\ \hat{\gamma}-Z_{1-\frac{\alpha}{2}} \sqrt{\hat{\text{Var}}\left(\hat{\gamma}\right)},\ \hat{\gamma}+Z_{1-\frac{\alpha}{2}} \sqrt{\hat{\text{Var}}\left(\hat{\gamma}\right)}\ \right), \end{array} $$

where *Z*_{1−α/2} denotes the (1−*α*/2)-quantile of the standard normal distribution. Note that the estimated CI may be out of the range of [0, 2] when the variation is large, which should be cut off. To test the null hypothesis *H*_{0}:*γ*=*γ*_{0} against the alternative hypothesis *H*_{1}:*γ*≠*γ*_{0}, we have

$$\frac{\hat{\gamma}-\gamma_{0}}{\sqrt{\text{Var}\left(\hat{\gamma}\right)}} \sim N(0,1) $$

under *H*_{0}, where *γ*_{0} is an arbitrary constant between [ 0, 2], such as 1 (XCI-R).

The delta method is a non-iterative procedure and thus is easy to be implemented. However, the CI of a ratio estimate is generally skewed, while the delta-type CI is symmetrical [35, 36]. Therefore, it is necessary to propose the Fieller’s and likelihood ratio methods to overcome this shortcoming in the following sections.

### Confidence interval of *γ* based on Fieller’s method

The Fieller’s method is another widely used non-iterative approach for constructing CI for ratio estimate [37]. This type of CI can be asymmetrical around \(\hat {\gamma }\). To propose the Fieller’s CI, we first need to build a Wald test for testing *γ*=*γ*_{0}. Specifically, under *γ*=*γ*_{0}, we have *β*_{1}−*γ*_{0}*β*=0. Therefore, the Wald test for testing *γ*=*γ*_{0} can be written as

$$\frac{\hat{\beta}_{1}-\gamma_{0} \hat{\beta}}{\sqrt{\text{Var}\left(\hat{\beta}_{1}\right)+\gamma_{0}^{2} \text{Var}\left(\hat{\beta}\right) -2\gamma_{0} \text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right) }}, $$

which follows a standard normal distribution. Then, the confidence limits \(\gamma _{L}^{f}\) and \(\gamma _{U}^{f} \left (\gamma _{L}^{f}<\gamma _{U}^{f}\right)\) for Fieller’s CI at level (1−*α*) can be found by solving the following equation

$$\frac{\hat{\beta}_{1}-\gamma_{0} \hat{\beta}}{\sqrt{\text{Var}\left(\hat{\beta}_{1}\right)+\gamma_{0}^{2} \text{Var}\left(\hat{\beta}\right) -2\gamma_{0} \text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right) }}=Z_{1-\frac{\alpha}{2}}. $$

Rearranging the above equation yields a quadratic equation with respect to *γ*_{0}

$$D\gamma_{0}^{2}+E \gamma_{0}+F=0, $$

where

$$\begin{array}{*{20}l} D&=\hat{\beta}^{2}-Z_{1-\frac{\alpha}{2}}^{2} \text{Var}\left(\hat{\beta}\right), \\ E&=2 \left(Z_{1-\frac{\alpha}{2}}^{2}\text{Cov}\left(\hat{\beta_{1}},\ \hat{\beta}\right) - \hat{\beta}_{1} \hat{\beta} \right) \end{array} $$

and

$$F=\hat{\beta}_{1}^{2}-Z_{1-\frac{\alpha}{2}}^{2} \text{Var}\left(\hat{\beta}_{1}\right). $$

Suppose *Δ*=*E*^{2}−4*D**F*>0, then this equation must have two unequal roots with \(\gamma _{L}^{f}\) and \(\gamma _{U}^{f}\) being \(\left (-E\pm \sqrt {\Delta }\right)/2D\). According to Fieller’s theorem, we know that *D*>0 implies *Δ*>0. In this situation, the Fieller’s CI is continuous and can be denoted by \(\left (\gamma _{L}^{f}, \gamma _{U}^{f}\right)\). Note that *D*>0 is equivalent to \(\left |\hat {\beta }/\sqrt {\text {Var}\left (\hat {\beta }\right)}\right | >Z_{1-\frac {\alpha }{2}}\). That is, there exists statistically significant association between the disease and the allele *A* at the significance level *α*. However, if there is no such association (i.e. *D*<0), the Fieller’s CI will be unbounded. For instance, if *Δ*<0, the Fieller’s CI will be (−*∞*,*∞*). If *Δ*>0, the Fieller’s CI will be \(\left (-\infty, \gamma _{L}^{f}\right)\bigcup \left (\gamma _{U}^{f},\infty \right)\), which is the discontinuous CI. In real applications, it generally makes little sense to infer about *γ* if there is no association between the disease and the allele *A* according to its definition. In addition, the Fieller’s CI should also be restricted to the interval [0,2] when needed.

The Fieller’s method usually demonstrates better coverage probability than the delta method. Notice that the Fieller’s CI is based on the inversion of the Wald test. Since the LR test is expected to have more robust properties in small samples, so it is desirable to propose the LR method in the next section.

### Confidence interval of *γ* based on likelihood ratio method

To obtain the LR-based CI, we first construct a likelihood ratio test for testing *γ*=*γ*_{0}. As mentioned above, we have derived the MLEs \(\hat {\beta }_{0}\), \(\hat {\beta }_{1}\), \(\hat {\beta }_{2}\) and \(\hat {\boldsymbol {b}}^{T}\) of *β*_{0}, *β*_{1}, *β*_{2} and *b*^{T} under *H*_{1}. To calculate the likelihood ratio test statistic *λ*, we further evaluate the likelihood function under *H*_{0}:*γ*=*γ*_{0}. If *H*_{0} holds, the genotypic value *X* equals 0, *γ*_{0} and 2 for *aa*, *Aa* and *AA*, respectively. In this regard, Model (1) is reduced to be a standard logistic model and the log-likelihood function under *H*_{0} can be written as

$$\begin{array}{*{20}l} l_{0}\left(\beta_{0},\beta,\boldsymbol{b}^{T}\right)&= \sum\limits_{i=1}^{N} \left[{y_{i}}\left(\beta_{0}+\beta x_{i}+\boldsymbol{b}^{T}\boldsymbol{z_{i}}\right)\right.\\ &\quad-\left. \text{log}\left(1+\text{exp}\left(\beta_{0}+\beta x_{i}+\boldsymbol{b}^{T}\boldsymbol{ z}_{i}\right)\right)\right]\!, \end{array} $$

where *x*_{i} is the genotypic value of *X* of female *i*. Let \(\tilde {\beta }_{0}\), \(\tilde {\beta }\) and \(\tilde {\boldsymbol {b}}^{T}\) be the MLEs of *β*_{0}, *β* and *b*^{T} under *H*_{0}, respectively. Then,

$$l_{0}\left(\tilde{\beta}_{0},\tilde{\beta},\tilde{\boldsymbol{b}}^{T}\right)= {\underset{\beta_{0}, \beta, \boldsymbol{b}^{T}}{\arg\max}} \ l_{0}\left(\beta_{0},\beta,\boldsymbol{b}^{T}\right), $$

and *λ* can be computed as

$$\lambda=2\left(l_{1}\left(\hat{\beta_{0}},\hat{\beta}_{1},\hat{\beta}_{2},\hat{\boldsymbol{b}}^{T}\right) -l_{0}\left(\tilde{\beta}_{0},\tilde{\beta},\tilde{\boldsymbol{b}}^{T}\right)\right). $$

*λ* asymptotically follows a chi-square distribution with the degree of freedom being one \(\left (\text {i.e.}\, \chi _{1}^{2}\right)\).

Now, we introduce how to obtain the LR-type CI. For each given *γ*_{0}, we can calculate the corresponding value of the log-likelihood function *l*_{0} and \(\tilde {\boldsymbol {\theta }}=\left (\tilde {\beta }_{0}, \tilde {\beta }, \tilde {\boldsymbol {b}}^{T}\right)^{T}\) under *H*_{0}. So, *l*_{0} and \(\tilde {\boldsymbol {\theta }}\) are single variable functions with respect to *γ*_{0} and can be denoted by *l*_{0}=*l*_{0}(*γ*_{0}) and \(\tilde {\boldsymbol {\theta }}=\tilde {\boldsymbol {\theta }}(\gamma _{0})\), respectively. At the significance level *α*, the confidence limits \(\gamma _{L}^{l}\) and \(\gamma _{U}^{l} \left (\gamma _{L}^{l}<\gamma _{U}^{l}\right)\) of the LR-type CI is determined by the following equation with respect to *γ*_{0}

$$ l_{0}(\gamma_{0})-l_{1}\left(\hat{\beta_{0}},\hat{\beta}_{1},\hat{\beta}_{2},\hat{\boldsymbol{b}}^{T}\right)+\frac{q_{1-\alpha}}{2}=0, $$

(5)

where *q*_{1−α} denotes the (1−*α*)-quantile of the \(\chi _{1}^{2}\) distribution. Obviously, Eq. (5) has no closed form solutions and numerical method can be adopted. Note that *θ*=(*β*_{0},*β*,*b*^{T})^{T} are nuisance parameters in Eq. (5) which depend on *γ*_{0}. To solve this equation, it generally requires several iterations with different values of *γ*_{0}, and for each *γ*_{0}, the iterative maximization over the remaining parameters is also needed to determine \(\tilde {\boldsymbol {\theta }}\). This procedure is relatively time-consuming. Therefore, to reduce the computational burden, borrowing the idea of Venzon and Moolgavkar [38], we can find the roots of Eq. (5) by solving the following system of non-linear equations

$$\begin{array}{*{20}l} l_{0}(\gamma_{0}, \boldsymbol{\theta})-l_{1}\left(\hat{\beta_{0}},\hat{\beta}_{1},\hat{\beta}_{2},\hat{\boldsymbol{b}}^{T}\right)+\frac{q_{1-\alpha}}{2}&=0 \\ \frac{\partial l_{0}}{\partial \boldsymbol{\theta}}(\gamma_{0}, \boldsymbol{\theta})&=\boldsymbol{0}, \end{array} $$

which is easily implemented in the commonly used software (e.g. nleqslv package in R). Note that the above system differs only in the first equation from the system (with the first equation being replaced by \(\frac {\partial l_{0}}{\partial \gamma _{0}}(\gamma _{0}, \boldsymbol {\theta })=0\)) that defines the MLEs \(\hat {\gamma }\) and \(\hat {\boldsymbol {\theta }}=\left (\hat {\beta }_{0}, \hat {\beta }, \hat {\boldsymbol {b}}^{T}\right)^{T}\). Therefore, finding a root of such system almost has the same difficulty as that of finding the MLEs of Model (2) [38]. As such, this algorithm is generally more efficient.

On the other hand, based on the fact that the Wald test and the LR test are asymptotically equivalent in large samples, we know that the confidence limits of the Fieller’s CI and the LR-type CI should be close to each other. Therefore, we used the confidence limits of the Fieller’s CI as the initial values for *γ*_{0}. For example, when searching the lower limit, we chose the initial values for *γ*_{0} and *θ* as \(\gamma _{L}^{f}\) and \(\tilde {\boldsymbol {\theta }}\!\left (\gamma _{L}^{f}\right)\), respectively, where \(\tilde {\boldsymbol {\theta }}\!\left (\gamma _{L}^{f}\right)\) can be computed from the standard logistic regression procedure. Similarly, we used the same strategy to search the upper limit. The algorithm based on this choice of the initial values works well in most situations. However, in some scenarios, the Fieller’s CI and the LR-type CI may be very different. Thus, using the confidence limits of the Fieller’s CI as the initial values may cause that the algorithm does not converge. In this regard, we should directly solve the single variable function of Eq. (5). For example, we can use the bisection method to find the roots of Eq. (5) within the interval [ 0,2] (e.g. rootSolve package in R).

Like the Fieller’s CI, the LR-type CI can be unbounded when there is no association between the disease and the allele *A*. Specifically, when Equation (5) has no root, then the LR-type CI will be (−*∞*,*∞*). Otherwise, there will be two roots \(\gamma _{L}^{l}\) and \(\gamma _{U}^{l}\). If \(\gamma _{L}^{l}<\hat {\gamma }<\gamma _{U}^{l}\), then the LR-type CI is continuous and can be represented as \(\left (\gamma _{L}^{l},\gamma _{U}^{l}\right)\). If \(\hat {\gamma }\not \in \left (\gamma _{L}^{l},\gamma _{U}^{l}\right)\), then the LR-type CI will be \(\left (-\infty, \gamma _{L}^{l}\right)\bigcup \left (\gamma _{U}^{l},\infty \right)\), which is the discontinuous CI. Similar to the delta and Fieller’s methods, the LR-type CI is also truncated by [ 0, 2] if necessary.

The LR-based CI and the Fieller’s CI can be asymmetrical which is an appealing choice, compared to the delta method. This is because the distribution of a ratio estimate is generally non-normal with a heavy tail, especially when *N* is small. Additionally, it will be quite straightforward to incorporate covariates using the LR method.

### Simulation settings

For simplicity, we assumed that there is no covariate included in the model in our simulation study. We incorporated the covariate into the real data analysis later. For a case-control design, we presumed that the genotype distribution in the case group and that in the control group of females follow trinomial distributions with probabilities (*h*_{0}, *h*_{1}, *h*_{2}) and (*g*_{0}, *g*_{1}, *g*_{2}), respectively, where *h*_{0} (*g*_{0}), *h*_{1} (*g*_{1}) and *h*_{2} (*g*_{2}) are the frequencies of *aa*, *Aa* and *AA* in the case (control) group, respectively. Let *h*_{0}/*g*_{0}=*m*, *h*_{1}/*g*_{1}=*λ*_{1}(*h*_{0}/*g*_{0})=*λ*_{1}*m* and *h*_{2}/*g*_{2}=*λ*_{2}(*h*_{0}/*g*_{0})=*λ*_{2}*m*, where *λ*_{1} and *λ*_{2} are the odds ratios for *Aa* and *AA* compared to *aa* in females, respectively. By *h*_{0}+*h*_{1}+*h*_{2}=1, we have *m*=1/(*g*_{0}+*λ*_{1}*g*_{1}+*λ*_{2}*g*_{2}), *h*_{0}=*g*_{0}/(*g*_{0}+*λ*_{1}*g*_{1}+*λ*_{2}*g*_{2}), *h*_{1}=*λ*_{1}*g*_{1}/(*g*_{0}+*λ*_{1}*g*_{1}+*λ*_{2}*g*_{2}) and *h*_{2}=*λ*_{2}*g*_{2}/(*g*_{0}+*λ*_{1}*g*_{1}+*λ*_{2}*g*_{2}). Thus, (*h*_{0}, *h*_{1}, *h*_{2}) is calculated by (*g*_{0}, *g*_{1}, *g*_{2}), *λ*_{1} and *λ*_{2}. The value of (*g*_{0}, *g*_{1}, *g*_{2}) is determined by the allele frequency of *A* (denoted by *p*=1−*q*) and the inbreeding coefficient *ρ*, i.e. *g*_{0}=*q*^{2}+*ρ**p**q*, *g*_{1}=2(1−*ρ*)*p**q* and *g*_{2}=*p*^{2}+*ρ**p**q*. Specifically, *ρ*=0 implies that Hardy-Weinberg equilibrium holds in the control group and *ρ*≠0 indicates Hardy-Weinberg disequilibrium. Furthermore, from Model (2), *β*_{0}=log(*m*), *β*_{1}=log(*λ*_{1}) and *β*_{1}+*β*_{2}=log(*λ*_{2}). Then, *γ*=2log(*λ*_{1})/log(*λ*_{2}), i.e. \(\lambda _{1}=\lambda _{2}^{\gamma /2}\). As such, we defined the simulation settings as follows: *p* is fixed at 0.1 and 0.3, and *ρ* is set to be 0 and 0.05. The true value of *γ* is fixed at 0, 0.5, 1, 1.5 and 2. *λ*_{2} is assigned to be 1.5 and 2. We selected *N* as 500 (2000) where both the case and control groups have 250 (1000) females. The confidence level is fixed at (1−*α*)=95*%* and the number of replications *k* is 10,000.

We compared the performance of three types of CIs based on CP, ML, MR, ML/(ML+MR) and DP. CP is defined to be the proportion that the CI contains the true value of *γ* among *k* replications, regardless of whether the CI is continuous or not. ML and MR are calculated by \(\text {ML}=\#\left [(\gamma _{0}<\gamma _{L})\cap \left (\gamma _{L}\le \hat {\gamma }\le \gamma _{U}\right)\right ]/k\), and \(\text {MR}=\#\left [(\gamma _{0}>\gamma _{U})\cap \left (\gamma _{L}\le \hat {\gamma }\le \gamma _{U}\right)\right ]/k\), respectively, where *#* denotes the counting measure, and *γ*_{L} and *γ*_{U} are the confidence limits of the estimated CI. Note that \(\gamma _{L}\le \hat {\gamma }\le \gamma _{U}\) means that the CI is continuous. As such, we only consider the continuous CIs when estimating ML and MR, since it is impossible to distinguish between the left side and the right side if the CI is discontinuous. Further, DP is computed as \(1-\#(\gamma _{L}\le \hat {\gamma }\le \gamma _{U})/k\). We believed that a good CI should control the CP well as well as have the balanced ML and MR. ML and MR are used together to measure the location of CI. If a balance between ML and MR is achieved, then ML/(ML+MR) is close to 0.5. On the other hand, note that the delta-type CI is always bounded. Therefore, the DP for the delta method will stay at 0. However, for the Fieller’s CI and the LR-type CI, small DP is desirable.