The admixture maximum likelihood test to test for association between rare variants and disease phenotypes

Background The development of genotyping arrays containing hundreds of thousands of rare variants across the genome and advances in high-throughput sequencing technologies have made feasible empirical genetic association studies to search for rare disease susceptibility alleles. As single variant testing is underpowered to detect associations, the development of statistical methods to combine analysis across variants – so-called “burden tests” - is an area of active research interest. We previously developed a method, the admixture maximum likelihood test, to test multiple, common variants for association with a trait of interest. We have extended this method, called the rare admixture maximum likelihood test (RAML), for the analysis of rare variants. In this paper we compare the performance of RAML with six other burden tests designed to test for association of rare variants. Results We used simulation testing over a range of scenarios to test the power of RAML compared to the other rare variant association testing methods. These scenarios modelled differences in effect variability, the average direction of effect and the proportion of associated variants. We evaluated the power for all the different scenarios. RAML tended to have the greatest power for most scenarios where the proportion of associated variants was small, whereas SKAT-O performed a little better for the scenarios with a higher proportion of associated variants. Conclusions The RAML method makes no assumptions about the proportion of variants that are associated with the phenotype of interest or the magnitude and direction of their effect. The method is flexible and can be applied to both dichotomous and quantitative traits and allows for the inclusion of covariates in the underlying regression model. The RAML method performed well compared to the other methods over a wide range of scenarios. Generally power was moderate in most of the scenarios, underlying the need for large sample sizes in any form of association testing.


Background
Genome-wide association studies (GWAS) have successfully identified common genetic variants, mainly common single nucleotide polymorphisms (SNP), with small to modest effects for many complex human diseases and traits [1]. However, for most diseases common susceptibility variants identified to date explain only a small proportion of the heritable component of disease risk. A range of genetic models may explain the missing heritability from a very large number of common variants that confer risks too small to be detected to many rare variants with stronger effects. Rare disease-susceptibility alleles identified so far have mostly been in the coding sequence of genes and are associated with higher disease risks than known common susceptibility alleles.
Until recently a limited understanding of the architecture of rare genetic variation across the genome and limitations of genotyping technologies have restricted the search for rare disease-susceptibility alleles to the analysis of a small number of candidate genes for specific diseases. The development of genotyping arrays containing hundreds of thousands of rare variants across the genome and advances in high-throughput sequencing technologies have made feasible empirical genetic association studies to search for rare disease susceptibility alleles. Even so, standard methods used for association testing, in which association with the trait of interest is tested one variant at a time, are limited by statistical power. As a result there has been an increase in interest in alternative analytic methods in which the information across multiple variant sites is combinedfor example all variants in a specified gene or genomic region. Depending on the underlying model of genetic association these "burden" tests can enhance statistical power. Examples of these include the cohort allelic sum test (CAST) [2,3], the combined multivariate and collapsing (CMC) test [2], the weighted sum test (WST) [4], the variable-threshold test (VTT) [5], the C-alpha test (CAT) [6], the sequence kernel association test (SKAT) [7,8], and the estimated regression coefficient (EREC) test [9].
The CAST and CMC methods collapse information on all rare variants within a region into a single dichotomous variable for each subject and then apply a univariate test. Rare variants are defined by a fixed threshold for minor allele frequency. The WST is a non-parametric test in which rare variants are grouped according to function (e.g. gene) and each individual is scored by a sum of the mutation counts weighted by the variance under the null hypothesis. The VT approach selects an allele frequency threshold by maximising the test statistic over all thresholds and assesses statistical significance by permutation. The major limitation of these simple burden tests is that they do not account for the direction of effects of the functional alleles that are assumed to be the same. However, a gene harbouring phenotypically relevant variation could include a handful of rare Mendelian mutations that cause disease, some variants that moderately increase or decrease risk, along with numerous variants of no effect.
The CAT contrasts the variance of each observed count with the expected variance. However, the method does not allow for covariate adjustment. SKAT is a score-based variance-component test that makes no assumption about directionality of effect by performing multiple regressions of a phenotype on genotype for all variants in a region [7]. P-values are calculated analytically. The most recent implementation, called SKAT-O is a generalisation of the C-alpha test that enables the incorporation of covariates and is more powerful than simple burden tests over a range of plausible genetic models [8]. The EREC test is a modification of the WST and VTT in which the weighting is based on the estimated regression coefficient.
We have previously developed an admixture maximum likelihood (AML) test to test for association of multiple common genetic variants with the trait of interest [10]. The method is flexible and can be applied to both disease and quantitative traits and can include covariates. In this paper we propose an extension of the AML test, hereafter called RAML (rare admixture maximum likelihood test), for the analysis of uncommon variants. RAML takes account of variants that increase or decrease risk or have no effect on risk. We have compared the performance of RAML with SKAT-O and with five tests implemented by Score-SEQ (two fixed threshold methods with the minor allele frequency threshold set to 1 percent and 5 percent, a modified WST, a modified VTT and the EREC test) [9].

Methods
The RAML method provides an omnibus test for joint effects of multiple variants on a phenotype and formulates the alternative hypothesis in terms of the probability that a given variant is associated with disease (α), the average effect of the associated variants (η) and the expected standard error of this effect (σ). The effect of each variant is estimated as the signed z-statistic (Z) from the score test. To generate the omnibus test statistic the distribution of the effects under the alternative hypothesis need to be defined. It is desirable to use a distribution with a conjugate prior so that the likelihood will have a tractable computational form. We defined this as a normal distribution of the z-statistic. Using the z-statistic gives flexibility to incorporate covariates and the approach can be easily extended to quantitative trait and survival-time analyses. The average expected z-statistic can be positive or negative corresponding to an excess of deleterious or protective variants respectively. Given that a variant is associated with the phenotype the likelihood of observing the test statistic (Z) To solve this equation we can think of it in terms of the sum of two normal distributions x~N(0,1) and y~N (μ,σ 2 ). The likelihood of observing S = x + y can then be expressed as The distribution of Z given μ is distributed as a N(0,1) distribution so it can be seen that equations (1) and (2) are equivalent. Thus in the case that a variant is associated the test statistic Z is distributed as N(η,1 + σ 2 ). The log-likelihood is of the form where L 1i is the likelihood given the variant is associated (which will be from a N(η,1 + σ 2 ) distribution) and L 0i is the null likelihood (which is from a N(0,1) distribution). Thus the log-likelihood can be expressed as which we maximised using the Bound Optimization BY Quadratic Approximation (BOBYQA) algorithm [11].         One major problem in testing multiple variants in the same region of the genome is correlation due to linkage disequilibrium (LD), although this may be less of an issue for rare genetic variation. A permutation test can account for type I error, but there is still a loss of power as variants in strong LD with each other have a disproportionate effect on the test statistic. To deal with this, the RAML groups variants using a single link cluster approach, such that every variant in a group has a squared correlation (r 2 ) greater than a specified threshold with at least one other variant in the same group. For each group a proxy variant is generated, where the proxy is the maximum number of rare alleles for any variant in the group (0, 1 or 2) carried by each subject. The default r 2 threshold is set to 0.9. This deals with perfectly correlated variants whilst still being able to test most variants individually.
Restricting the parameter space to plausible values should help to improve power. Therefore bounds for α, η and σ 2 need to be defined as well as a definition of a rare variant. We set the bound for specifying a variant as rare as having a minor allele frequency of 0.04 or less. We expect most variants will not be associated so we set an upper limit for α of 0.2. As we want to be able to model both strong protective and deleterious effects we took the bounds for η to be from −5 to 5. We chose the minimum value of σ 2 to be 0.25. This represents the minimum amount of variability we could expect about the associated variant effects. Different choices of bounds will do better in some scenarios and worse in others. Our aim was to try to find a reasonable choice for likely effects that are seen in genetic association studies.
Two haplotypes were selected at random for each individual. The overall risk associated with any pair of haplotypes was calculated under a log-additive model by summing the risks from each causal variant carried. An individual in the population was assigned as a case or control at random based on this risk and a disease prevalence of 10 per cent. Two thousand cases and two thousand controls were then selected randomly from the population.
For each risk distribution we tested three scenarios in which 5%, 10% or 20% of variants were causal. The effect was set as proportional to the log of the variant minor allele frequency (p). The standard error (σ) varied under the different distributions: For 5% of associated variants, where k is 1 for distributions 1 and 2, 1.5 for distributions 3 and 4, 2 for distribution 5 and 6 and 2.5 for distribution 7; for 10% and 20% of associated variants the average effect was 0.7 and 0.5 times this respectively. We sampled haplotypes from three different regions of the genome (TERT chr5: 1253287-1295162, BRCA2 chr13: 32889617-32973809, BRCA1 chr17: 41243452-41277500) in order to evaluate the influence of different local LD structure on the different tests. Two hundred replicate data sets were simulated under each of the 21 different scenarios. We derived the power of each test as the proportion of replicates for which the empirical significance level achieved P < 0.05, P < 0.01 and P < 0.001.
We compared the RAML method to SKAT-O using the default weights and the five methods included in the program ScoreSeq. We also applied our tests to three different genes to evaluate the effects that varying genomic architecture has on the relative efficacy of the methods.
In order to evaluate the power for a larger sample size and slightly stronger effects we repeated the simulations for four thousand cases and four thousand controls across the same genomic regions with the effect of associated variants being 25 per cent larger. This data set was used to compare the performance of RAML with SKAT-O.

Results and discussion
There were 145 rare variants (MAF < 0.04) in BRCA1 (109 variants with MAF < 0.01, 27 with 0.01 < MAF < 0.02, 9 with 0.02 < MAF < 0.04), 274 rare variants in BRCA2 (196,22,56) and 193 rare variants in TERT (155, 23, 15). The power of the seven methods for 21 scenarios at the three thresholds for statistical significance are shown in Tables 1, 2 and 3 for BRCA1, BRCA2 and TERT respectively. Generally, power was limited (< 50 per cent) for all methods across a wide range of plausible scenarios for the underlying genetic model. The RAML test had the greatest power for most scenarios for both BRCA2 and TERT. For BRCA1, RAML tended to have the greatest power for the scenarios with a small proportion of associated variants, whereas SKAT-O performed a little better for the scenarios with a higher proportion of associated variants. Whether 100% or 80% of variants are associated with effects in the same direction did not change the relative efficacy of the two methods.
Given the limited power of all the methods for the analysis of 2,000 cases and 2,000 controls, we repeated the evaluation of RAML and SKAT-O using data simulated for 4,000 cases and 4,000 controls across the same genomic regions. There were 145 rare variants (MAF < 0.04) in BRCA1 (109 variants with MAF < 0.01, 27 with 0.01 < MAF < 0.02, 9 with 0.02 < MAF < 0.04), 274 rare variants in BRCA2 (196, 22, 56) and 193 rare variants in TERT (155, 23, 15). The power of the two methods for 21 scenarios at the three thresholds for statistical significance are shown in Tables 4, 5 and 6 for BRCA1, BRCA2 and TERT respectively. As expected, the power of the both methods is improved. The RAML method was still more powerful than SKAT-O under most scenarios, but the difference was greater than for the smaller sample sizes.
The apparent difference between RAML and SKAT-O for the BRCA1 and the other two regions is related to the fact that SKAT-O does not use a fixed threshold for the minor allele frequency, but uses a weighting function based on the minor allele frequency [7]. Thus the method is sensitive to the number of variants around the threshold MAF of interest. There were relatively fewer variants with a MAF just above 4 per cent in BRCA1 than in the other two genes.

Conclusion
We have described a new method for association testing of multiple rare variants that makes no assumptions about the proportion of variants that are associated with the phenotype of interest or the magnitude and direction of their effect. The method is flexible and can be applied to both dichotomous and quantitative traits and allows for the inclusion of covariates in the underlying regression model. We have compared the performance of RAML with six other similar methods using data simulated under 21 plausible scenarios for the underlying genetic model of association. Under most of these scenarios, RAML was found to have the greatest power, although SKAT-O performed better under some circumstances.
Genome-wide association studies have been very successful in identifying common alleles associated with many disease and physiological traits. However, these alleles explain a small fraction of the genetic component of the variance for most traits. It is very likely that rare variants will contribute to some of the so-called missing heritability. A systematic search for disease associated rare variants has been made possible by the availability of high-throughput, affordable sequencing technologies and the development of genotyping arrays that include hundreds of thousands of rare variants. Given that the underlying genetic model for association between rare genetic variants and disease related phenotypes is not knowneffect allele frequency, effect size and proportion of associated variants -it is not possible to provide a definitive guide to the situations in which RAML should be preferred to SKAT-O or other methods. Until empirical evidence emerges for association of multiple rare variants across a genomic region it would seem reasonable to use multiple methods for burden testing including both RAML and SKAT-O.