BEXCIS: Bayesian methods for estimating the degree of the skewness of X chromosome inactivation

X chromosome inactivation (XCI) is an epigenetic phenomenon that one of two X chromosomes in females is transcriptionally silenced during early embryonic development. Skewed XCI has been reported to be associated with some X-linked diseases. There have been several methods measuring the degree of the skewness of XCI. However, these methods may still have several limitations. We propose a Bayesian method to obtain the point estimate and the credible interval of the degree of XCI skewing by incorporating its prior information of being between 0 and 2. We consider a normal prior and a uniform prior for it (respectively denoted by BN and BU). We also propose a penalized point estimate based on the penalized Fieller’s method and derive the corresponding confidence interval. Simulation results demonstrate that the BN and BU methods can solve the problems of extreme point estimates, noninformative intervals, empty sets and discontinuous intervals. The BN method generally outperforms other methods with the lowest mean squared error in the point estimation, and well controls the coverage probability with the smallest median and the least variation of the interval width in the interval estimation. We apply all the methods to the Graves’ disease data and the Minnesota Center for Twin and Family Research data, and find that SNP rs3827440 in the Graves’ disease data may undergo skewed XCI towards the allele C. We recommend the BN method for measuring the degree of the skewness of XCI in practice. The R package BEXCIS is publicly available at https://github.com/Wen-YiYu/BEXCIS.

. There are three patterns of XCI [4], which are random XCI (XCI-R), skewed XCI (XCI-S) and escape from XCI (XCI-E). Generally, XCI-R is a random and independent selection process in each cell of females, i.e., 50% cells have either the paternal or maternal allele silenced and the remaining 50% keep the other allele inactivated at an X-chromosomal locus [4]. XCI-E means that both the paternal and maternal alleles at a locus will be active. In humans, 15-30% X-linked genes have been shown to undergo XCI-E [5,6]. Besides, XCI-S is the observation that the same allele is inactivated in more than 75% cells in females [7][8][9], and the extreme XCI-S is a phenomenon that at least 90% cells in females keep the same allele inactivated [10]. Due to the analytical complications caused by XCI, association tests for detecting disease-associated single nucleotide polymorphisms (SNPs) on autosomes may not be directly applied to X chromosome.
Researchers have proposed some methods to test for the association on X chromosome for qualitative traits [11][12][13][14][15][16][17] and quantitative traits [18][19][20][21]. For qualitative traits, Zheng et al. [11] took account of XCI-E and put forward a series of test statistics combining the genetic effect in two sexes. Clayton [12] first incorporated XCI-R into the association analysis by regarding males as homozygous females. However, Clayton's methods do not consider the XCI-E or all the XCI-S patterns. As such, Wang et al. [13] proposed a resampling-based maximum likelihood ratio approach for qualitative traits, which is robust to any XCI pattern. For XCI-E, Wang et al. [13] coded three genotypes in females (dd, Dd and DD) as 0, 1 and 2 and coded two genotypes in males (d and D) as 0 and 1, where d is the normal allele and D is the deleterious allele at the SNP under study. For XCI-R and XCI-S, three genotypes in females were coded as 0, γ and 2 and two genotypes in males were coded as 0 and 2, respectively, where γ ∈ [0, 2] is an unknown genotypic value for heterozygous females and can be used to measure the degree of XCI-S [13]. The value of γ not only reveals the potential XCI pattern but also gives us a hint about the proportion of the cells in females expressing the normal allele d or the deleterious allele D at the SNP. Specifically, γ ∈ [0, 1) means XCI-S skewed towards D, γ = 1 represents XCI-R, and γ ∈ (1, 2] suggests XCI-S skewed towards d. If the estimate of γ is significantly different from 1, the SNP is statistically inferred to undergo XCI-S, otherwise, the SNP may undergo XCI-R or XCI-E. For example, γ = 0.4 represents XCI-S skewed towards D, where only about 20% (0.4/2) of the cells have D active and the other 80% of the cells have d active. For quantitative traits, Zhang et al. [18] proposed an association test based on nuclear families, which requires the quantitative trait being normally distributed and assumes that the variances of the trait value for the three genotypes in females are the same. However, Ma et al. [19] reported that XCI and other factors (e.g., gene-gene interactions and gene mutation) may cause higher variance of the trait in heterozygous females compared to homozygous females. As a result, Ma et al. [19] proposed three methods for testing the association based on unrelated females, which take account of the inflated variance of the quantitative trait in heterozygous females. Gao et al. [20] further developed a software toolset, which can implement the three test statistics in Ma et al. [19].
In addition to the detection of the disease-associated SNPs on X chromosome, it is also important to measure the degree of XCI-S. It has been reported that the degree of XCI-S may increase with age [4] and is associated with many diseases such as scleroderma, rheumatoid arthritis, breast cancer, ovarian cancer, severe combined immunodeficiency and so on [22][23][24][25][26][27][28]. For heterozygous females, larger proportion of the cells with active deleterious allele will lead to more severe expression of the related diseases, while smaller proportion can protect the body from negative effects, which suggests that XCI-S is somehow both a confounding factor in genetic association analysis and a critical tool providing valuable information about the pathogenesis at the X-chromosomal locus [22]. Therefore, methods for measuring the skewness of XCI are necessary and researchers have provided several methods for qualitative traits [29,30] and quantitative traits [31]. Specifically, Xu et al. [29] proposed a statistical measure for γ based on family trios and derived the corresponding confidence interval (CI) with the likelihood ratio (LR) test. Based on case-control design, Wang et al. [30] showed that γ can be expressed as a ratio of two logistic regression coefficients and derived three types of the CIs for γ (the LR, Fieller's and delta methods). The Fieller's and LR methods outperform the delta method and the Fieller's method is recommended because it is noniterative and requires much less computations than the LR method. Since the approach of Xu et al. [29] and those of Wang et al. [30] are only applicable to qualitative traits, Li et al. [31] extended the methods of Wang et al. [30] to make them accommodate quantitative traits. Note that both the Fieller's and LR methods may cause unbounded CIs if the denominator of the ratio is not significantly deviated from 0 [30]. Fortunately, Wang et al. [32] proposed a penalized Fieller's (PF) method for the ratio estimate, which can always obtain a bounded CI with an appropriate penalty parameter. The PF method has never been used to measure the degree of XCI-S, and we will apply it to such task for the first time. However, all the existing methods for measuring γ do not consider the constraint condition that the value of γ should be between 0 and 2. They simply cut off the point estimates and the corresponding CIs into [0, 2] to get the final results, which may lead to extreme point estimates (0 or 2) as well as noninformative CIs ( [0, 2] ) or invalid CIs (empty sets). In contrast, the Bayesian method [33,34] can incorporate the prior information and has been widely used in statistical genetics in recent years [35]. To make an improvement, we will apply the Bayesian method to the γ measuring problem so that we can make full use of the prior information of γ and obtain more accurate and robust point estimate and credible interval for γ.
Therefore, in this article, borrowing the idea of Wang et al. [32], we first derive a penalized point estimate to measure the degree of XCI-S ( γ ) and compute the corresponding CI by the PF method. Then, we propose a Bayesian method to obtain the samples of γ from its approximate posterior distribution and calculate the mode of the samples as its point estimate and the highest posterior density interval (HPDI) as its credible interval [36]. We conduct extensive simulation studies to compare the proposed Bayesian and penalized point estimates with the existing point estimate, as well as to compare the Bayesian and PF methods with the existing Fieller's method in the interval estimation, respectively. Finally, we apply all the methods to the Graves' disease data and the Minnesota Center for Twins and Family Research (MCTFR) data for their practice on the qualitative trait and the quantitative trait, respectively.

Simulation results
To evaluate the performances of the proposed point estimation and interval estimation methods, we conduct extensive simulation studies. Assume that σ 2 0 , σ 2 1 and σ 2 2 are the variances of the quantitative trait for females with genotypes dd, Dd and DD, respectively. We consider the qualitative trait and the quantitative trait when (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) and (σ 2 0 , σ 2 1 , σ 2 2 ) = (4, 4.8, 4) , and the sample size n is taken as 500 and 2,000, the minor allele frequency (MAF) is fixed at 0.3 and 0.1, and the inbreeding coefficient ρ is set to be 0, -0.05 and 0.05, where ρ = 0 means that the Hardy-Weinberg equilibrium (HWE) holds in females and ρ = 0 denotes the departure from HWE in females. We simulate 500 SNPs with stochastic underlying γ 's for each scenario. The penalized point estimate and the existing point estimate of γ may obtain the point estimate less than 0 or larger than 2 while the value of γ should be within [0, 2] , so we need to truncate the penalized point estimate and the existing point estimate into [0, 2] to get the final results. We denote the penalized point estimate and the existing point estimate before truncation as γ * origin and γ origin , and denote those after truncation as γ PF and γ F , respectively. We also use the Bayesian methods with the normal prior and the uniform prior for γ (represented by BN and BU) to obtain the point estimate of γ , which are denoted as γ BN and γ BU , respectively. To reveal the accuracy and robustness of γ BN , γ BU , γ PF and γ F , we calculate their mean squared errors (MSEs) and summarize the proportions of the extreme values (0 or 2) they get among the 500 replicates, respectively. Here, , where K is the number of replicates, γ k is the kth true value of γ , and γ k is the estimate of γ k . We also draw scatter plots to directly display the four point estimates against the true values of γ . To investigate the performances of the BN, BU, PF and Fieller's methods, we respectively assess the coverage probability (CP) as well as the mean, median, standard deviation and interquartile range of the widths of the 95% HPDIs or CIs of γ (denoted by W mean , W median , W SD and W IQR ) for them. We compute the proportions of the noninformative interval ( [0, 2] ), empty set and discontinuous interval they obtain among the 500 replicates (denoted by NP, EP and DP) to further confirm these methods' validity. Scatter plots are drawn to show the widths of the 95% HPDIs or CIs of these methods against the true values of γ.
The proportions of the extreme values of γ PF and γ F among the 500 replicates for qualitative trait and quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) are presented in Table 1, the MSEs of γ BN , γ BU , γ PF and γ F for qualitative trait and quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) are shown in Table 2, and the scatter plots of these four point estimates against the true values of γ under these settings are respectively displayed in Figs. 1, 2 and Additional file 1: Figs. S1-S22. Note that γ BN and γ BU can solve the problem of extreme point estimates and thus are not listed in Table 1. Comparing the proportions of the extreme values of γ PF with those of γ F in Table 1, γ PF can reduce the proportion of the extreme point estimates equal to 2. An explanation for this is that γ PF is obtained by shrinking the denominator of γ F away from 0 and accordingly adjusting the numerator of γ F , which can avoid the point estimate being positive infinity before the truncation (since β 1 and β 2 in γ origin = 2β 1 β 1 +β 2 usually have the same sign [30]) and hence can cut down the proportion of the point estimates equal to 2 after the truncation. On the other hand, we can see from Table 1 that the proportions of the extreme point estimates equal to 0 are the same for γ PF and γ F . Note that γ * origin and γ origin always have the same sign if they are not zero. When γ * origin and γ origin are negative ( β 1 and β 2 have different signs), γ PF =γ F = 0 , and when they are positive, γ PF and γ F will both be greater than 0. That is why γ PF and γ F always have the same amount of the extreme point estimates equal to 0. It can also be observed from Table 1 that the total proportions of the extreme point estimates in γ PF and γ F both decrease when n becomes larger, MAF gets higher or the trait turns from qualitative into quantitative.
In addition to the advantage of avoiding the extreme point estimates, it can be seen from Table 2 that γ BN and γ BU always have smaller MSEs than γ PF and γ F , and the MSEs of γ BN remain the smallest across all the situations. Irrespective of other factors, we find that ρ generally has a little effect on the MSEs of the four point estimates, which means that the four point estimates are robust to the deviation from HWE in general. When other parameters remain unchanged, the MSEs of the four point estimates all become smaller with larger n or higher MAF. Compared to the qualitative trait, all the four point estimation methods give less MSEs for the quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) , regardless of the values of n, MAF and ρ.   Fig. 1a shows good agreement between γ BN and the true values of γ , while Fig. 1b presents larger discrepancies between γ BU and the true values of γ , which means that γ BN performs better than γ BU under this situation. Compared to Fig. 1a-c for γ PF and Fig. 1d for γ F both display worse point estimates with the existence of extreme values (represented by red points). Similar results can be found in all the other cases ( Fig. 2 and Additional file 1: Figs. S1-S22), which indicates that γ BN and γ BU have better performances than γ PF and γ F , and γ BN is generally the best one among these four point estimates across all the simulation scenarios. Figure 2 gives the four point estimates of γ against the true values of γ with n = 500 , MAF = 0.3 and ρ = 0 for the quantitative trait when (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) . In the comparison of Figs. 1 and 2, we see that the four point estimation methods provide better point estimates for the quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) than for the qualitative trait (can also be seen in Additional file 1: Figs. S1-S11 vs. S12-S22). Similarly, we have the same findings on the effects of n, MAF and ρ on the performances of these four point estimation methods from Figs. 1, 2 and Additional file 1: Figs. S1-S22 as we did in Table 2. In addition, the four point estimates are generally scattered evenly around the true values of γ except for those settings when n = 500 and MAF = 0.1 for qualitative trait, where γ BN and γ BU tend to underestimate the true value of γ (Additional file 1: Figs. S3-S5). The four point estimation methods obtain their best performance at n = 2000 and MAF = 0.3 for quantitative trait when (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) (Additional file 1: Figs. S17-S19), where γ PF and γ F still have a small amount of extreme point estimates (represented by red points) when the true values of γ are smaller than 0.5 or larger than 1.5.
The NP, EP and DP of the PF and Fieller's methods among the 500 replicates for qualitative trait and quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) are displayed in Table 3, the CP, W mean and W median of the BN, BU, PF and Fieller's methods for qualitative trait and quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) are listed in Table 4, and the widths of the 95% HPDIs or CIs for these four interval estimation methods against the true values of γ under these settings are respectively presented in Figs. 3, 4 and Additional file 2: Figs. S23-S44. Notice that the BN and BU methods are not listed in Table 3 because of the superiority of the BN and BU methods over the other two methods that they have no noninformative HPDI, empty set or discontinuous HPDI under all the situations. We can see from Table 3 that the DPs of the PF method are all equal to 0 because we choose 4 ) for the PF method, while the Fieller's method may obtain nonzero DPs especially when n = 500 and MAF = 0.1 . Moreover, the PF method always has less NP than the Fieller's method. The reason for this result is that the PF method tends to obtain shorter CIs than the Fieller's method before the truncation [32], which benefits for the reduction of NPs since a noninformative CI is created by the truncation when [0, 2] is totally contained by the wide original CI. Although the zero DP and lower NPs show the advantages of the PF method over the Fieller's method, the PF method may have greater EPs when MAF is low, which is actually caused by the shorter CIs of the PF methods as well. Specifically, an empty set is created by the truncation when the original CI is disjoint from [0, 2] , which can occur when the original point estimates locate outside [0, 2] . In these cases, the shorter the CI is, the larger the probability for the original CI to be disjoint from [0, 2] is, which causes bigger EPs of the PF method in some scenarios. It is also shown in Table 3 that the NP of the PF method as well as the NP and DP of the Fieller's method get smaller if n is larger, MAF is higher or the trait changes from qualitative to quantitative. The EP of the PF method gets lower From Table 4, we find that the CPs of the BN and BU methods are controlled around 95% in all the simulated situations, while the CPs of the PF and Fieller's methods are usually underestimated or overestimated when MAF is low. Moreover, we observe from Table 4 that the W mean 's and W median 's of the BN and BU methods are smaller than those of the PF and Fieller's methods under all the scenarios. Specifically, among these four interval estimation methods, the BN method has the smallest W mean in most cases and owns the least W median under all the circumstances. The W median 's of the Fieller's method are all 2 when n = 500 and MAF = 0.1 for qualitative trait, which means that more than half of the CIs obtained by the Fieller's method are noninformative in this case. Irrespective of other factors, ρ has a little effect on the W mean 's and W median 's of the four methods, which indicates that all the four methods are robust to the departure from HWE. The W mean 's and W median 's of these four methods decrease when n gets larger, MAF becomes higher or the trait changes from qualitative to quantitative.
In addition to the support of the findings from Table 4  the results with n = 500 , MAF = 0.3 and ρ = 0 for the qualitative trait. It is shown in Fig. 3a, b that the BN and BU methods obtain similar widths of the 95% HPDIs, which are both close to 1.5. The widths of the 95% CIs for the PF and Fieller's methods shown in Fig. 3c, d are quite dispersive and a great amount of noninformative CIs (represented by red points) can be seen in these two subplots. Comparing Fig. 4 with Fig. 3, we notice that the four methods obtain shorter intervals with less variation, and the PF and Fieller's methods have less noninformative CIs for the quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) than for the qualitative trait when n = 500 , MAF = 0.3 and ρ = 0 . This result is true for all other simulation settings (Additional file 2: Figs. S23-S33 vs. S34-S44). Similarly, the findings from Table 4   The W SD 's and W IQR 's of the four methods for qualitative trait and quantitative trait with (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) are listed in Additional file 3: Table S1 and described in Additional file 4: Text. Note that the BN method has the lowest W SD and W IQR among the four methods. When the variances of the quantitative trait become larger, i.e., (σ 2 0 , σ 2 1 , σ 2 2 ) = (4, 4.8, 4) , the results are given in Additional file 3: Tables S2-S6 and Additional file 5: Figs. S45-S68. By comparing these results with those under (σ 2 0 , σ 2 1 , σ 2 2 ) = (1, 1.2, 1) , the four point estimation methods and the four interval estimation methods generally perform worse, and even worse than those for qualitative

Application to the Graves' disease data
According to Chu et al. [37], SNP rs3827440 within the GPR174 gene on X chromosome was detected to be associated with the Graves' disease. In fact, in addition to the Graves' disease, SNP rs3827440 was also reported to be significantly associated with the autoimmune Addison's disease [38]. There were two stages of the association analysis in Chu et al. [37], i.e., the genome-wide association study (GWAS) stage and the replication stage. The association between SNP rs3827440 and the Graves' disease was identified in both of two stages and the pooled data of these two stages. There are 2941 subjects (699 males and 2242 females) in the GWAS stage and 8074 subjects (1814 males and 6260 females) in the replication stage. We exclude the males and get 1115 (1127) females in the case (control) group in the GWAS stage, and 3375 (2885) females in the case (control) group in the replication stage. Note that there are two alleles T and C at rs3827440, where T is the deleterious allele leading to higher expression of the GPR174 gene. In the GWAS stage, there are respectively 163, 508 and 444 (219, 541 and 367) females with genotypes CC, TC and TT in the case (control) group. In the replication stage, the sample sizes of the females with genotypes CC, TC and TT are 471, 1606 and 1298 (584, 1344 and 957) in the case (control) group, respectively. The allele frequency of T in females is 0.57 in the GWAS stage and 0.56 in the replication stage.
We respectively obtain γ BN , γ BU , γ PF and γ F , and derive the corresponding intervals with the BN, BU, PF and Fieller's methods based on the data in the GWAS stage and replication stage without considering any covariate, and apply these methods to the pooled data by regarding stage as a covariate [30]. The hyperparameters in the Bayesian methods are set to be the same as those in the Methods section and we choose N (0, 10 2 ) as the prior distribution of the effect size of the stage. The point estimates and the corresponding 95% HPDIs or CIs of γ at SNP rs3827440 are given in Table 5. From Table 5, we find that the results of the Fieller's method we get are consistent with those in Wang et al. [30].The HPDIs or CIs obtained by these four interval estimation methods do not contain 1 in the replication stage and the pooled data, which suggests XCI-S at rs3827440. In the replication stage, the four point estimates are all close to 1.5, which indicates XCI-S towards allele C, and about 75% (1.5/2) cells in a heterozygous female have allele T active at this locus. The four point estimates are all close to 1.37 in the pooled data, which suggests XCI-S towards allele C, with allele T active in about 68.50% (1.37/2) cells in a heterozygous female at rs3827440. Note that all the HPDIs or CIs in the GWAS stage contain 1, which indicates XCI-R or XCI-E. This difference may be caused by the heterogeneity of the data in these two stages. Furthermore, the BN method always has the shortest interval among the four methods, which highlights its advantage.

Application to the MCTFR data
The Minnesota Center for Twin and Family Research Genome-Wide Association Study of Behavioral Disinhibition from the database of Genotypes and Phenotypes is a large, ongoing and family-based epidemiological study of substance abuse and related psychopathology with 2183 families, including 7377 participants (3546 males and 3831 females). Among them, 5960 participants have both phenotypic data and genotypic data while the others only have phenotypic data. There are five quantitative traits: the nicotine composite score, the alcohol consumption composite score (CON), the alcohol dependence composite score (DEP), the illicit drug composite score and the behavioral disinhibition composite score (BD) in the dataset. To avoid family structure and population structure, we exclude all the offspring in the dataset. Because we only need the information of females, we also exclude males in the dataset. Eventually, we get 1998 female individuals. There are 12,354 SNPs genotyped on X chromosome in the dataset. We use the standard quality control procedures [39] as follows. Firstly, we exclude those female individuals with missing genotype rate over 10% . Secondly, we delete those SNPs with missing rate over 10% . Thirdly, we exclude those SNPs whose MAF is less than 5% . Finally, we conduct the HWE tests for the remaining SNPs with the PLINK software (version 1.90) [39] and set the significance level to be 1 × 10 −4 [40], and those SNPs out of HWE are also excluded. After the quality control procedures, we include 1996 female individuals with 11,344 SNPs on X chromosome in this application. Note that all the point estimation methods ( γ BN , γ BU , γ PF and γ F ) and the interval estimation methods (BN, BU, PF and Fieller) mentioned above require the presence of association between the X-chromosomal SNP and the trait under study. So, the association analysis for each SNP and each trait in the MCTFR dataset is required before we apply these methods to measure the degree of XCI-S. We use linear regression to test for the association by including the age as a covariate. However, we notice that all the residuals derived from the regressions of the five quantitative traits do not satisfy the normality assumption. So, we use the association tests based on the direct inverse normal transformation (D-INT), the indirect inverse normal transformation (I-INT) and the adaptive omnibus test (O-INT) proposed by McCaw et al. [41]. The significance level of the association tests is set to be 4.408 × 10 −6 (0.05/11344) after the Bonferroni correction. We then select the SNPs with at least one of the three P values of D-INT, I-INT and O-INT is less than 4.408 × 10 −6 . After obtaining the associated SNPs, we calculate the point estimates ( γ BN , γ BU , γ PF and γ F ) of γ , and use the BN, BU, PF and Fieller's methods to derive the corresponding intervals of γ for these SNPs, respectively. Since the methods proposed in this article require the normality of the trait, each trait is first regressed on the age to obtain the residuals, and the inverse normal transformation is respectively applied to the residuals of the five quantitative traits which can be treated as new outcomes to measure the degree of XCI-S [41]. The hyperparameters in the Bayesian methods are set to be the same as those in the Methods section.
There are four SNPs (rs331318, rs5928558, rs10522027 and rs12849233) associated with the DEP trait, six SNPs (rs12557060, rs3008896, rs5961051, rs4489437, rs2097322 and rs463233) associated with the BD trait and one SNP (rs4240042) associated with the CON trait. The positions, the alleles, the MAFs, the P values of the HWE tests and three association tests (D-INT, I-INT and O-INT) together with the related traits and the genes of these associated SNPs are presented in Table 6.
Among these SNPs, rs12557060 is within the gene interleukin-1 receptor accessory protein-like 1 (IL1RAPL1) [42] and SNP rs331318 is located in the gene Duchenne muscular dystrophy (DMD) [43]. It was reported that IL1RAPL1 and DMD are two large genes located immediately adjacent to each other within the common fragile site region of instability, which are active in normal brain tissue but are under-expressed in every brain tumor cell line and xenograft [46]. The disruption or deletion of the IL1RAPL1 gene is found to be associated with the BD trait in our association analysis whose disruption or deletion was previously detected in individuals with mental retardation and/ or autism spectrum disorder [42]. According to Miyagoe-Suzuki et al. [43], the DMD gene encodes the dystrophin protein required for the stability of the sarcolemma and the mutations of DMD may cause X-linked Duchenne muscular dystrophy. Miyagoe-Suzuki et al. [43] also found that many induce pluripotent stem clones derived from a manifesting female carrier of DMD had two active X chromosomes or mixed XCI patterns, which means that the DMD gene may escape from XCI or undergo different XCI patterns within different female subgroups. SNP rs10522027 is within the gene transmembrane protein 47 (TMEM47), which may be a useful biomarker for predicting the response to chemotherapy and a potential therapeutic target for overcoming hepatocellular carcinoma cell chemoresistance [44]. SNP rs12849233 is in PAS domain containing repressor 1 (PASD1), which might possibly serve as a new target for the prognosis and the future treatment of glioma [45].
The point estimates and the corresponding 95% HPDIs or CIs of γ for these SNPs are given in Table 7. Note that the CIs of the PF and Fieller's methods are obtained by truncating the original CI into [0, 2] . As a result, some CIs of these two methods have the left endpoints equal to 0 or the right endpoints equal to 2, while the HPDIs of the BN and BU methods will generally be an open interval, and the left (right) endpoints of the HPDIs are generally larger (less) than 0 (2). Although the 95% HPDIs or CIs of the SNPs all contain 1, which is indicative of the XCI-R or XCI-E pattern, we can still observe the advantage of the BN and BU methods that they generally get shorter intervals than the PF and Fieller's methods. On the other hand, notice that the HPDIs and CIs for SNPs rs4489437, rs2097322 and rs463233 are strongly asymmetrical and the corresponding point estimates (1.5543, 1.6712, 1.7697 and 1.7802 for SNP rs4489437; 1.6407, 1.7820, 1.9987 and 2.0000 for SNP rs2097322, and 0.3859, 0.1586, 0.1715 and 0.1728 for SNP rs463233) are either all greater than 1.5 or all smaller than 0.5. So, this may give a clue that these three SNPs are possible to undergo XCI-S, which needs to be further confirmed by, for example, larger sample sizes or molecular genetics.

Discussion
In this article, we proposed a Bayesian method to obtain the point estimate and the credible interval of the degree of XCI-S ( γ ) by incorporating its prior information. We calculated the mode and the HPDI of the samples of γ as the point estimate and the credible interval for γ , respectively. In fact, we also used the median and the percentile interval (the 2.5th and 97.5th percentiles of the width of interval) of the samples as the point estimate and the credible interval of γ . However, their performances are worse than the mode and the HPDI (data not shown) and hence we chose the latter instead. We considered a normal prior and a uniform prior for the degree of XCI-S in the Bayesian method, which are respectively denoted as the BN and BU methods. We also derived a penalized point estimate γ PF based on the idea of the PF method and obtained its corresponding CI [32]. We compared the proposed γ BN , γ BU and γ PF with the existing point estimate γ F , and investigated the performances of the BN, BU, PF and Fieller's methods in the interval estimation for both the qualitative and quantitative traits via extensive simulation studies. The framework of these four estimation methods is illustrated in Fig. 5. As summarized in Fig. 5, there is no extreme value (0 or 2) to occur for γ BN and γ BU while the extreme point estimates can be found in both γ PF and γ F under all the scenarios. Besides, the BN and BU methods can solve the problems of noninformative intervals, empty sets and discontinuous intervals which can be found in the Fieller's method, while the PF method can only avoid the discontinuous CIs to occur. Note that the extreme point estimate 0 (2) means that 100% of the cells have the deleterious (normal) allele inactivated at a SNP, which is not a common case in reality. On the other hand, it is hard for us to identify the XCI pattern with the noninformative CIs and the discontinuous CIs, and the empty sets even provide no information on the XCI pattern. These facts highlight We applied the four point estimation methods and the four interval estimation methods to the Graves' disease data and the MCTFR data for their practical use on the qualitative trait and the quantitative trait, respectively. In the Graves' disease data application, we found that SNP rs3827440 may undergo the XCI-S pattern towards the allele C in the replication stage and the pooled data. Although we did not detect the XCI-S pattern in the GWAS stage, the BN and BU methods still show their superiority by providing shorter HPDIs, compared to the PF and Fieller's methods. In the MCTFR data application, the 95% HPDIs and CIs of the SNPs all contain 1, which indicates the XCI-R or XCI-E pattern. However, we also found three suspectable SNPs rs4489437, rs2097322 and rs463233 which may undergo the XCI-S pattern based on their extremely asymmetrical HPDIs and CIs. Since the inverse normal transformation applied to the original traits may lead to the loss of the information in the four interval estimation methods, we expect shorter intervals of these three SNPs that do not contain 1 if we have larger samples or a normally distributed trait. However, these conclusions need to be further confirmed by molecular genetics.
On the other hand, in our simulation study, we did not incorporate any covariate. To further investigate the performances of the four point estimates and the four interval estimation methods with a covariate, here we conducted additional simulation studies by considering a covariate under HWE (i.e., ρ = 0 ). The simulation settings can be found in Additional file 4: Text, and the simulation results are listed in Additional file 3: Tables S7-S11 and Additional file 5: Figs. S69-S84 and described in Additional file 4: Text, respectively. From these results, we observed that although the performances of all the proposed methods under the scenarios with a covariate are worse than those without any covariate, the trends are similar to those in the Results section, and the Bayesian methods also show their own advantages over the PF and Fieller's methods.
The last but not least, the proposed methods have the following issues to discuss. Firstly, the prior distributions of the unknown parameters are required in the Bayesian methods and the choice of them may have influence on the results. We considered two prior distributions for γ , U (0, 2) is a noninformative prior that should have little impact on the posterior distribution, and N (1, 1) ∈ [0, 2] is chosen based on its own genetic background. We also considered weakly informative priors for each of the unknown parameters other than γ , which should be robust to different kinds of parameters. The researchers can choose the priors based on their own study background or refer to the priors used in this article if they have limited knowledge of the distributions of the parameters. Secondly, although we assume that all the unknown parameters are independent of each other in the Bayesian method because the Hamiltonian Monte Carlo (HMC) algorithm used for sampling in the Bayesian method does not greatly suffer from the correlated parameters, we expect better performance of the Bayesian method by considering the correlations between the unknown parameters and regard it as our future work. Thirdly, the HPDI or CI containing 1 indicates the XCI-R or XCI-E pattern at the SNP. How to further distinguish between the XCI-R and XCI-E patterns is our future work. On the other hand, note that there is an assumption that the underlying genetic model is additive to guarantee that the estimated γ value departing from 1 indicates the XCI-S rather than the non-additive models, such as the genotypic values X i = {0, 2, 2} for the dominant model and X i = {0, 0, 2} for the recessive model. It is also true that γ can be greater than 2 or less than 0 in the situations of the overdominance and the underdominance, respectively. It may not be possible to distinguish a non-additive model from the XCI-S by considering the estimation of γ simply based on the mean effects of a generalized linear regression model. However, the variance-based tests may be alternative [19,21], which is our future work. However, it should be noted that Dobyns et al. recommended discontinuing the use of the terms "X-linked dominant inheritance" and "X-linked recessive inheritance" because both are incomplete and fail to explain some aspects of the X-linked inheritance due to some biological mechanisms including cell autonomy or non-autonomy of the gene product, XCI status and mosaicism [47,48]. Fourthly, the normality assumption of quantitative traits is required for all the methods we discussed in this article. In future, we will extend the methods to accommodate the traits which do not follow a normal distribution. Finally, all the methods are only applicable to unrelated female subjects. Thus, we will extend the methods and make them suitable for data with family structure in future studies.

Conclusion
In summary, the existing point estimate and the existing Fieller's method cannot consider the prior information of the degree of XCI-S, and respectively have the problems of the extreme point estimates (0 or 2) and the noninformative CIs, empty sets as well as discontinuous CIs. To solve these problems, we proposed a penalized point estimate and obtained its CI with the PF method to make an improvement, and proposed two Bayesian methods (BN and BU) to incorporate the prior information of the degree of XCI-S by using a normal prior or a uniform prior for the degree of XCI-S in the model. We recommend the Bayesian methods in practice because it can avoid obtaining the extreme point estimates and guarantee continuous HPDIs to provide useful information on the XCI pattern all the time. The BN method can also provide point estimates with the smallest MSE and HPDIs with well controlled CP, the shortest width and the lowest variation across all the simulation settings. In the real data application, we found that SNP rs3827440 in the Graves' disease data may undergo XCI-S towards the allele C, which need to be confirmed by molecular genetics.

Notations
To detect the SNPs undergoing XCI-S and measure their degree of XCI-S, we focus on females because only females can provide the information on XCI-S. Assume that n females are sequenced at a candidate diallelic SNP on X chromosome, where d (D) is the normal (deleterious) allele. Then, for female i, the genotypes G i = dd, Dd, DD and the corresponding genotypic values X i = {0, γ , 2} , i = 1, 2, · · · , n , where γ ∈ [0, 2] represents the degree of XCI-S. Let Z i be a M × 1 covariates vector and Y i be the trait, which can be either qualitative or quantitative. As such, the following generalized linear regression model is used to describe the association between G i and Y i , where β 0 is the intercept and β is the regression coefficient for X i . b is a M × 1 vector of the regression coefficients for Z i . E(Y i |X i , Z i ) is the conditional expected value of Y i given X i and Z i , and h(•) is a link function. When Y i is a qualitative trait, h(•) is the logit function. Then, Eq. (1) can be written as where Y i is the disease status of female i, and Y i = 1 (0) denotes that female i is affected (unaffected). When Y i is a quantitative trait, h(•) is the identity function, and Y i has a random error ε i . In this case, Equation (1) becomes and I(•) is the indicator function. According to Ma et al. [19], the variance of the quantitative trait for heterozygous females may be higher than those for homozygous females, i.e., σ 2 1 may be greater than σ 2 0 and σ 2 2 . The genotypic value X i can be decomposed into X 1i and X 2i according to Wang et al. [30], i.e., X i = γ X 1i + (2 − γ )X 2i , where X 1i and X 2i are two indicator variables. X 1i = I {G i =Dd or DD} indicates if female i has at least one deleterious allele D and X 2i = I {G i =DD} denotes if female i has two deleterious alleles D. So, Eq. (1) can be reexpressed as Let β 1 = βγ and β 2 = β(2 − γ ) . So, γ = β 1 β and β = β 1 +β 2 2 . Equation (3) turns to be After respectively obtaining the maximum likelihood estimates β 0 ,β 1 ,β 2 and b of β 0 , β 1 , β 2 and b , we have β =β 1 +β 2 2 . Assume that v 1 , v 2 and v 12 are respectively the variance of β 1 , the variance of β and the covariance of β 1 and β . To derive v 1 , v 2 and v 12 , the empirical Fisher information matrix is used for qualitative traits [30] and the glm function in R software is applied for quantitative traits [31].

Existing point estimate and CI of γ by Fieller's method
Here, we recall the existing point estimate and the corresponding CI obtained by the Fieller's method [30,31]. The existing point estimate of γ can be given as a ratio of two regression coefficients Since γ represents the degree of XCI-S, which should be within [0, 2] , the final point estimate can be derived by cutting the γ origin in Eq. (4) into [0, 2] . So, we have γ F = 2β 1 To obtain the corresponding CI of γ by the Fieller's method, a Wald test can be built to test for H 0 : γ = γ 0 . Since γ can be expressed as γ = of Equation (6) as the confidence limits with γ L F < γ U F .

Penalized point estimate and CI of γ by PF method
Here, we propose a penalized point estimate and obtain its corresponding CI by the PF method [32]. Notice that if the denominator β =β 1 +β 2 2 of γ origin is not statistically significantly different from 0, then γ origin will tend to be infinite (mainly positive infinite because β 1 and β 2 usually have the same sign according to Wang et al. [30]) and the corresponding CI of the Fieller's method before the truncation will tend to be unbounded, which is the common case if the denominator β has a large variance. To solve this problem, Wang et al. [32] proposed the PF method to reduce the variance of the denominator of a ratio estimate by imposing a penalty on it and adjusting the numerator accordingly. Borrowing this idea, we define a penalized log-likelihood function of β as where > 0 is the penalty parameter. Maximizing the log-likelihood function (7), we obtain the penalized denominator β * =β/2 + sign(β) β2 /4 + v 2 , where sign(•) is the signum function [32]. Making a Taylor expansion for β * around β and v 2 , we get According to Wang et al. [32], if we simply replace β by β * in γ origin =β 1 β , we will get a biased estimate of γ . To reduce the bias caused by the penalized denominator, we need to further adjust the numerator β 1 by β * 1 =β 1 +γ (β * −β) , where γ =β 1 β * . Making a Taylor expansion for β * 1 around β 1 and β , we have As such, the original penalized point estimate is γ * origin =β * 1 β * . Although we may avoid the situation of the denominator approaching 0, γ * origin may still be out of [0, 2] . Therefore, we need to cut γ * origin into [0, 2] and get the final penalized point estimate as follows, For the construction of the corresponding CI of γ PF , the PF method uses the same theory as the Fieller's method. So, we only need to respectively replace β ,β 1 ,v 1 ,v 2 and v 12 by β * , β * 1 , v * 1 , v * 2 and v * 12 in Eqs. (5) and (6) and choose an appropriate penalty parameter for the PF method to get the penalized CI. From Wang et al. [32], we know that when ≥ z 2 1−α/2 4 , the PF method can always produce a bounded CI. But when → ∞ , A < 0 and � < 0 ∅, the width of the CI will tend to be 0 and the CP will also tend to be 0. So, we select = z 2 1−α/2 4 , which enables the PF method to produce a bounded CI and control the CP at the same time. However, although the PF method can always get a bounded CI when = z 2 1−α/2 4 , the CI may still be out of [0, 2] and needs to be cut off in [0, 2]. The point estimates and CIs of the Fieller's and PF methods we discussed above are not able to include the prior information that γ ∈ [0, 2] in the model. By contrast, the Bayesian approach can flexibly incorporate this prior information into the analysis.

Point estimate and credible interval of γ by Bayesian method
Bayesian method has been widely used in genetic analysis in recent years [35] and various algorithms such as HMC [36] make sampling from the parameters' approximate posterior distributions possible even if the analytical solutions of those posterior distributions are not available. Assume that θ · represents θ d (the unknown parameters for qualitative trait) or θ c (the unknown parameters for quantitative trait). For the qualitative trait, we suppose that Y i follows a Bernoulli distribution, i.e., . In this case, the unknown parameters For the quantitative trait, we assume that Y i is normally distributed, i.e., where µ i = β 0 + βγ X 1i + β(2 − γ )X 2i + b T Z i . In this case, the unknown parameters θ c =(β 0 , β, γ , b T , σ 0 , σ 1 , σ 2 ) T . Let Y = (Y 1 , Y 2 , · · · , Y n ) T and D = (X 1 , X 2 , Z) , where X 1 = (X 11 , X 12 , · · · , X 1n ) T , X 2 = (X 21 , X 22 , · · · , X 2n ) T and Z = (Z 1 , Z 2 , · · · , Z n ) T . Then, the posterior distribution of θ d or θ c is We find that f (θ · |Y , D) is hard to calculate, which means that the closed form of the posterior distribution of θ · is difficult for us to obtain. So, instead of directly computing their posterior distributions of θ · , we use the HMC algorithm (e.g., the rstan package in R) to sample the parameters from the approximate posterior distribution. We choose the HMC algorithm because it can improve the independence of the samples and has higher efficiency than the other Markov-Chain Monte Carlo methods.
The HMC algorithm requires the prior distributions of γ and the other parameters in θ · . Since HMC does not dramatically suffer from the correlated parameters in model, we assume that the unknown parameters are independent of each other for simplicity [36]. Then, f (θ · ) can be given as f (θ · ) = θ # · g=1 f (θ ·g ) , where θ # · is the number of the parameters in θ · , and f (θ ·g ) is the prior distribution of the gth parameter in θ · .
Since the value of γ should be between 0 and 2, we consider a uniform distribution on [0, 2] as the prior distribution for γ , i.e., γ ∼ U (0, 2) , which is a noninformative prior.
When it comes to quantitative traits, we need to provide the prior distributions for σ 0 , σ 1 and σ 2 additionally and sample them respectively because the variances of the quantitative trait may be different across different genotypes in females according to Ma et al. [19]. We also choose a weakly informative prior for σ j (j=0, 1, 2), which is an exponential distribution with the mean being 1 [36], i.e., σ j ∼ exp(a j ) (j=0, 1, 2), where a 0 , a 1 and a 2 are the hyperparameters needed to be pre-defined and are all set to be 1 in this article. The hyperparameters µ β 0 , µ β , µ b , σ 2 β 0 , σ 2 β , , a 0 , a 1 and a 2 can also be selected based on the research background or experience.
Once the likelihood function of Y and the prior distributions of the parameters in θ · are provided, we can obtain as many samples of θ · as we want by the HMC algorithm. After getting enough samples of θ · , we calculate the mode and the HPDI of the samples of γ as the point estimate and the credible interval for γ , respectively.

Simulation settings
Since males provide no information on XCI-S, we only include females in simulation studies. We consider the qualitative trait and the quantitative trait, respectively. For simplicity, we do not include any covariate in the simulation.
For the qualitative trait, according to Wang et al. [30], we set the frequencies of genotypes dd, Dd and DD in the control (case) group to be g 0 , g 1 and g 2 ( c 0 , c 1 and c 2 ), respectively. Assume that the frequency of the deleterious allele D is p in the control group, which is usually the MAF at the SNP considered. Assume that the frequency of the normal allele d in the control group is q (p + q = 1) . As such, we have (g 0 , g 1 , g 2 ) = (q 2 + ρpq, 2(1 − ρ)pq, p 2 + ρpq) , where ρ is the inbreeding coefficient. In our simulation, MAF is fixed at 0.3 and 0.1, and ρ is set to be 0, -0.05 and 0.05. We define 1 and 2 as the odds ratios for genotypes Dd and DD compared to genotype dd in females, respectively. Then, we have 1 = exp(βγ ) and 2 = exp(2β) . Notice that 1 = γ /2 2 and γ = 2ln( 1 )/ln( 2 ) . Fixing 2 = 2 and randomly sampling γ from U(0, 2), we can calculate β and 1 . So, we have c 0 g 0 = exp(β 0 ), c 1 g 1 = 1 exp(β 0 ), and c 2 g 2 = 2 exp(β 0 ). With c 0 + c 1 + c 2 = 1 , we can calculate ( c 0 , c 1 , c 2 ) and β 0 from the values of ( g 0 , g 1 , g 2 ), 1 and 2 . Then, we generate the samples of three genotypes for the control group and the case group by the trinomial distributions with probabilities ( g 0 , g 1 , g 2 ) and ( c 0 , c 1 , c 2 ), respectively. Finally, we can accordingly get X 1i and X 2i for all the females. Further, we assume that the case-control ratio is 1 : 1, with the sample size n = 500 and 2000.
In the Bayesian methods, the prior distributions of γ , β 0 , β and σ j (j=0, 1, 2) are set as we mentioned in the Methods section, i.e., γ ∼ U (0, 2) and γ ∼ N (1, 1) ∈ [0, 2] , β 0 ∼ N (0, 10 2 ) , β ∼ N (0, 10 2 ) and σ j ∼ exp(1) . We set 8 chains to extract the samples parallelly and simultaneously. We extract 20,000 samples in each chain, among which the first 10,000 samples are only used for warming up and are discarded when the sampling is finished. So eventually, we get 80,000 samples in total. The target acceptance rate is set to be 0.99 to ensure the convergence. The convergence diagnostic R for Markov chains in the Bayesian method is done, which compares the between-chain and withinchain estimates for the model parameters. If the chains have not mixed well (i.e., the between-chain and within-chain estimates do not agree with each other), the R of the convergence diagnostic will be larger than 1. Note that the calculated R 's in our Bayesian models are all less than 1.05 which indicates good convergence (data not shown). The simulation study is implemented by the R software (version 4.0.0).