A powerful parent-of-origin effects test for qualitative traits on X chromosome in general pedigrees

Background Genomic imprinting is one of the well-known epigenetic factors causing the association between traits and genes, and has generally been examined by detecting parent-of-origin effects of alleles. Many methods have been proposed to test for parent-of-origin effects on autosomes based on nuclear families and general pedigrees. Although these parent-of-origin effects tests on autosomes have been available for more than 15 years, no statistical test for parent-of-origin effects on X chromosome existed until the parental-asymmetry test on X chromosome (XPAT) and its extensions were recently proposed. However, these methods on X chromosome are only applicable to nuclear families and thus are not suitable for general pedigrees.
Results In this article, we propose the pedigree parental-asymmetry test on X chromosome (XPPAT) statistic to test for parent-of-origin effects in the presence of association, which can accommodate general pedigrees. When there are missing genotypes in some pedigrees, we further develop the Monte Carlo pedigree parental-asymmetry test on X chromosome (XMCPPAT) to test for parent-of-origin effects, by inferring the missing genotypes given the observed genotypes based on a Monte Carlo estimation. An extensive simulation study has been carried out to investigate the type I error rates and the powers of the proposed tests. Our simulation results show that the proposed methods control the size well under the null hypothesis of no parent-of-origin effects. Moreover, XMCPPAT substantially outperforms the existing tests and has a much higher power than XPPAT, which only uses complete nuclear families (with both parents) from pedigrees. We also apply the proposed methods to rheumatoid arthritis data to illustrate their practical use.
Conclusions The proposed XPPAT and XMCPPAT test statistics are valid and powerful in detecting parent-of-origin effects on X chromosome for qualitative traits based on general pedigrees and thus are recommended.
Electronic supplementary material The online version of this article (10.1186/s12859-017-2001-5) contains supplementary material, which is available to authorized users.


Appendix B: Simulation study for the validity of XPPAT when testing parent-of-origin effects under X chromosome inactivation
Imprinting effects and X chromosome inactivation (XCI) are two important biological mechanisms on X chromosome. XCI occurs during early embryonic development in females, in whom either the paternal or the maternal X chromosome is silenced to achieve dosage compensation between the two sexes. It is generally a random process in which the paternal and maternal X chromosomes have an equal chance of being inactivated. In this regard, XCI is easily confounded with imprinting effects. Here, we denote random XCI as XCI-R. However, recent studies have revealed that skewed XCI (XCI-S) is biologically plausible. XCI-S is defined as a significant deviation from XCI-R, for instance, the inactivation of one of the alleles in more than 75% of cells. In mice, XCI-S can be controlled by the Xce gene or influenced by parent-of-origin effects. In humans, XCI-S is more likely caused by secondary selection. The initial choice of the active X chromosome is considered random, but during growth, when an X-linked mutation affects cell proliferation or survival, the proportion of cells with the mutant allele active becomes larger or smaller. For heterozygous females, positive selection of cells with the mutant allele active leads to more severe expression of the disease, whereas negative selection of such cells can provide protection from deleterious effects [1][2]. To investigate whether our proposed methods remain valid for testing parent-of-origin effects under XCI-R and XCI-S, we conduct the following simulation study.
Consider a case-control design. For females, let $X \in \{0, r, 2\}$ be the genotypic value of the three unordered genotypes dd, Dd and DD at the candidate SNP locus on X chromosome, where $r \in [0, 2]$. For males, let $X \in \{0, s\}$ denote the allelic value of alleles d and D, where $s \ge 0$. Let $Y = 1$ ($Y = 0$) denote that the individual (female or male) is affected (unaffected). Then, when there are no parent-of-origin effects, borrowing the idea of Wang et al. [3], the association between $Y$ and $X$ can be expressed using the logistic regression model
$$\mathrm{Logit}(\Pr(Y = 1 \mid X, z)) = \beta_0 + \beta X + \beta_z z, \tag{1}$$
where $\beta_0$ is the intercept; $z$ is the gender of the individual, coded as 1 for females and 0 for males, which allows either females or males to be at increased risk for disease; and $\beta$ and $\beta_z$ are the regression coefficients for $X$ and $z$, respectively. In Wang et al. [3], $s$ is set to 2, which corresponds to dosage compensation (the effect of two risk alleles in females is equivalent to that of one risk allele in males). According to Wang et al. [3], when $1 < r \le 2$, this coding assumes XCI-S skewed toward the disease allele D. Similarly, when $0 \le r < 1$, this coding assumes XCI-S skewed toward the normal allele d. In addition, $r = 1$ corresponds to XCI-R.
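As an illustration of the coding described above, the following minimal Python sketch evaluates the logistic model Logit(Pr(Y = 1 | X, z)) = β0 + βX + βz z for a heterozygous female under different values of r. The function names (`female_x`, `male_x`, `disease_prob`) are hypothetical and not taken from the article; the parameter values are among those used in the simulation settings of this appendix.

```python
import math

def female_x(genotype, r):
    """Genotypic value X for a female: dd -> 0, Dd -> r, DD -> 2."""
    return {"dd": 0.0, "Dd": r, "DD": 2.0}[genotype]

def male_x(allele, s=2.0):
    """Allelic value X for a male: d -> 0, D -> s (s = 2 as in Wang et al. [3])."""
    return {"d": 0.0, "D": s}[allele]

def disease_prob(x, z, beta0, beta, beta_z):
    """Pr(Y = 1 | X = x, z) under the logistic model; z = 1 female, z = 0 male."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta * x + beta_z * z)))

# Illustrative parameter values (one of the settings listed in this appendix).
beta0, beta, beta_z = -2.55, 0.2624, -0.0513
for r in (0.0, 0.5, 1.0, 1.5, 2.0):
    risk = disease_prob(female_x("Dd", r), 1, beta0, beta, beta_z)
    print(f"r = {r:.1f}: Pr(affected | Dd female) = {risk:.4f}")
```

With r = 0 a heterozygous female has the same risk as a dd female, and with r = 2 the same risk as a DD female, which is how the coding expresses skewing toward the normal or the disease allele.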
On the other hand, suppose that the genotype distributions in the control group and in the case group of females follow trinomial distributions with probabilities $(g_0, g_1, g_2)$ and $(h_0, h_1, h_2)$, respectively, where $g_0$ ($h_0$), $g_1$ ($h_1$) and $g_2$ ($h_2$) are the genotype frequencies of dd, Dd and DD in the control (case) group, respectively. If the frequency $p$ of allele D is given and Hardy-Weinberg equilibrium holds, then with $q = 1 - p$, we have $g_0 = q^2$, $g_1 = 2pq$, $g_2 = p^2$ and
$$h_0 = \frac{g_0}{g_0 + \lambda_{f1} g_1 + \lambda_{f2} g_2}, \quad h_1 = \frac{\lambda_{f1} g_1}{g_0 + \lambda_{f1} g_1 + \lambda_{f2} g_2}, \quad h_2 = \frac{\lambda_{f2} g_2}{g_0 + \lambda_{f1} g_1 + \lambda_{f2} g_2},$$
where $\lambda_{f1}$ and $\lambda_{f2}$ are the odds ratios of genotypes Dd and DD compared to dd in females.
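The relation between control and case genotype frequencies can be checked numerically. The sketch below assumes that the case frequencies are obtained by reweighting the Hardy-Weinberg control frequencies by the odds ratios and renormalizing, which is what the odds-ratio definition together with h0 + h1 + h2 = 1 implies. The function names are hypothetical, and the odds ratios are taken as exp(βr) and exp(2β) for illustration.

```python
import math

def control_genotype_freqs(p):
    """Hardy-Weinberg frequencies (g0, g1, g2) of dd, Dd, DD for allele-D frequency p."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)

def case_genotype_freqs(p, lam_f1, lam_f2):
    """Case frequencies (h0, h1, h2): control frequencies reweighted by the
    odds ratios of Dd and DD relative to dd, then renormalized to sum to 1."""
    g0, g1, g2 = control_genotype_freqs(p)
    weights = (g0, lam_f1 * g1, lam_f2 * g2)
    total = sum(weights)
    return tuple(w / total for w in weights)

# Illustrative values: p = 0.3, beta = 0.2624, r = 1 (random XCI).
p, beta, r = 0.3, 0.2624, 1.0
g = control_genotype_freqs(p)
h = case_genotype_freqs(p, math.exp(beta * r), math.exp(2.0 * beta))
print("controls:", [round(v, 4) for v in g])
print("cases:   ", [round(v, 4) for v in h])
```

Setting both odds ratios to 1 returns the control frequencies unchanged, which corresponds to no association.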
Under model (1), $\lambda_{f1} = \exp(\beta r)$ and $\lambda_{f2} = \exp(2\beta)$. Similarly, assume that the allele distributions in the control group and in the case group of males follow binomial distributions, and let $\lambda_m = \exp(\beta s)$ denote the odds ratio of allele D compared to d in males. According to Chen et al. [4], for females, the relation between $\lambda_{f1}$ and $\lambda_{f2}$ determines whether the genetic model is dominant, recessive or additive. Further, the association between the disease and $X$ is present when $\beta \neq 0$.
For simplicity, we only generate $N$ parents-daughter trios under model (1), each consisting of an affected daughter and her parents. For each trio, we first generate the allele of the father according to the allele frequencies $p$ and $q$, and simulate the genotype of the mother from the genotype distribution $(g_0, g_1, g_2)$. We then generate the genotype of the daughter from her parental genotypes. From Equation (1) with $z = 1$, the penetrances $f_0$, $f_1$ and $f_2$ of genotypes dd, Dd and DD are
$$f_0 = \frac{\exp(\beta_0 + \beta_z)}{1 + \exp(\beta_0 + \beta_z)}, \quad f_1 = \frac{\exp(\beta_0 + \beta r + \beta_z)}{1 + \exp(\beta_0 + \beta r + \beta_z)}, \quad f_2 = \frac{\exp(\beta_0 + 2\beta + \beta_z)}{1 + \exp(\beta_0 + 2\beta + \beta_z)}.$$
The affection status of the daughter is simulated based on these penetrances and her genotype.
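The trio-generation steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code, and it rests on two assumptions that the text does not spell out: the mother's genotype is drawn as two independent alleles (equivalent to the Hardy-Weinberg distribution (g0, g1, g2)), and trios are redrawn until the daughter is affected (rejection sampling). All function names are hypothetical.

```python
import math
import random

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def female_penetrances(beta0, beta, beta_z, r):
    """Penetrances (f0, f1, f2) of dd, Dd, DD in females (z = 1) under model (1)."""
    return tuple(expit(beta0 + beta * x + beta_z) for x in (0.0, r, 2.0))

def simulate_affected_trio(p, pens, rng):
    """Draw parents and daughter until the daughter is affected (assumed scheme).
    Alleles are coded 0 = d, 1 = D.
    Returns (father_allele, mother_alleles, (paternal_allele, maternal_allele))."""
    while True:
        father = 1 if rng.random() < p else 0
        # Two independent allele draws reproduce the HWE distribution (g0, g1, g2).
        mother = (1 if rng.random() < p else 0, 1 if rng.random() < p else 0)
        paternal = father              # a daughter always inherits her father's X
        maternal = rng.choice(mother)  # one maternal X, chosen at random
        if rng.random() < pens[paternal + maternal]:
            return father, mother, (paternal, maternal)

# Illustrative setting: p = 0.3, r = 1 (XCI-R), N = 100 trios.
rng = random.Random(2017)
pens = female_penetrances(-2.55, 0.2624, -0.0513, 1.0)
trios = [simulate_affected_trio(0.3, pens, rng) for _ in range(100)]
print(len(trios), "affected-daughter trios generated")
```

Keeping the paternal and maternal alleles of the daughter separate preserves the parental origin of each allele, which a parent-of-origin test needs.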
Here, $N$ is taken to be 100 or 200, and the value of the XPPAT test statistic is calculated from these $N$ parents-daughter trios. The frequency $p$ of allele D is fixed at 0.1 or 0.3. We assume $\beta_0 = -2.55$ and $\beta_z = -0.0513$, and $\beta$ takes the values 0.0953 and 0.2624.
The true value of $r$ is set to 0, 0.5, 1, 1.5 or 2. We use the nominal significance levels $\alpha = 5\%$ and $1\%$ for the type I error rate assessment. The simulation study is based on 10,000 replications.
Finally, we investigate the analogy and distinction between the simulation of parent-of-origin effects on X chromosome and the simulation of XCI. For ease of comparison, in the simulation of parent-of-origin effects we again generate only $N$ parents-daughter trios, each consisting of an affected daughter and her parents, just as in the above simulation of XCI.
In the main text, we simulate the parent-of-origin effects (without XCI) by fixing the values of the four penetrances $f_{00}$, $f_{01}$, $f_{10}$ and $f_{11}$ corresponding to genotypes d/d, d/D, D/d and D/D.
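In fact, we can also use the following logistic regression model to simulate the parent-of-origin effects:
$$\mathrm{Logit}(\Pr(Y = 1 \mid X_1, X_2, z)) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_z z, \tag{2}$$
where $(X_1, X_2)$ takes the values $(0, 0)$, $(0, 1)$, $(1, 0)$ and $(1, 1)$ for the ordered genotypes d/d, d/D, D/d and D/D, respectively. For daughters ($z = 1$), the penetrances follow from model (2) as $f_{ij} = \exp(\beta_0 + \beta_1 i + \beta_2 j + \beta_z) / [1 + \exp(\beta_0 + \beta_1 i + \beta_2 j + \beta_z)]$ for $i, j \in \{0, 1\}$. Then, based on these penetrances, we can generate $N$ parents-daughter trios in a way similar to the simulation of XCI. Note that in model (2), $\beta_1 = \beta_2 = \beta$ implies $f_{01} = f_{10}$ (i.e., no parent-of-origin effects), and model (2) reduces to $\mathrm{Logit}(\Pr(Y = 1 \mid X^*, z)) = \beta_0 + \beta X^* + \beta_z z$, where $X^* = X_1 + X_2$ takes the values 0, 1 and 2 for the unordered genotypes dd, Dd and DD, respectively. Therefore, model (2) under no parent-of-origin effects is equivalent to model (1) under XCI-R.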
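The stated equivalence can be verified numerically. The sketch below assumes that model (2) has the linear predictor β0 + β1 X1 + β2 X2 + βz z with X1, X2 ∈ {0, 1}, as implied by its reduction to the model in X* = X1 + X2. It compares the resulting penetrances with those of model (1) at r = 1. The function names are hypothetical.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def poo_penetrances(beta0, beta1, beta2, beta_z):
    """Penetrances f[(x1, x2)] for a daughter (z = 1) under model (2)."""
    return {(x1, x2): expit(beta0 + beta1 * x1 + beta2 * x2 + beta_z)
            for x1 in (0, 1) for x2 in (0, 1)}

def xci_penetrances(beta0, beta, beta_z, r):
    """Penetrances (f0, f1, f2) of dd, Dd, DD for a daughter under model (1)."""
    return tuple(expit(beta0 + beta * x + beta_z) for x in (0.0, r, 2.0))

beta0, beta, beta_z = -2.55, 0.2624, -0.0513

# No parent-of-origin effects: beta1 = beta2 = beta.
f = poo_penetrances(beta0, beta, beta, beta_z)
f0, f1, f2 = xci_penetrances(beta0, beta, beta_z, r=1.0)
print("f01 == f10:", math.isclose(f[(0, 1)], f[(1, 0)]))
print("matches model (1) with r = 1:",
      math.isclose(f[(0, 0)], f0)
      and math.isclose(f[(0, 1)], f1)
      and math.isclose(f[(1, 1)], f2))

# Parent-of-origin effects: beta1 != beta2 gives f01 != f10.
f_poo = poo_penetrances(beta0, 0.0953, 0.2624, beta_z)
print("f01 != f10 under unequal coefficients:", f_poo[(0, 1)] != f_poo[(1, 0)])
```

The comparison uses `math.isclose` because the two models evaluate the same linear predictor with a different order of floating-point operations.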