The RAML method provides an omnibus test for joint effects of multiple variants on a phenotype and formulates the alternative hypothesis in terms of the probability that a given variant is associated with disease (α), the average effect of the associated variants (η) and the expected standard error of this effect (σ). The effect of each variant is estimated as the signed zstatistic (Z) from the score test. To generate the omnibus test statistic the distribution of the effects under the alternative hypothesis need to be defined. It is desirable to use a distribution with a conjugate prior so that the likelihood will have a tractable computational form. We defined this as a normal distribution of the zstatistic. Using the zstatistic gives flexibility to incorporate covariates and the approach can be easily extended to quantitative trait and survivaltime analyses. The average expected zstatistic can be positive or negative corresponding to an excess of deleterious or protective variants respectively. Given that a variant is associated with the phenotype the likelihood of observing the test statistic (Z)
{\displaystyle \int p\left(T\mu =x\right)p\left(\mu =x\eta ,\sigma \right)\mathit{dx}}
(1)
To solve this equation we can think of it in terms of the sum of two normal distributions x ~ N(0,1) and y ~ N(μ,σ^{2}). The likelihood of observing S = x + y can then be expressed as
{\displaystyle \int p\left(Sy=Y\right)p\left(y=YN\left(\mu ,{\sigma}^{2}\right)\right)\mathit{dy}}
(2)
The distribution of Z given μ is distributed as a N(0,1) distribution so it can be seen that equations (1) and (2) are equivalent. Thus in the case that a variant is associated the test statistic Z is distributed as N(η,1 + σ^{2}). The loglikelihood is of the form
l\left(\alpha ,\eta ,\sigma \right)={\displaystyle \sum _{i}\mathit{\text{log}}\left(1\alpha +\alpha {L}_{1i}/{L}_{0i}\right)}
(3)
where L_{
1i
} is the likelihood given the variant is associated (which will be from a N(η,1 + σ^{2}) distribution) and L_{
0i
} is the null likelihood (which is from a N(0,1) distribution). Thus the loglikelihood can be expressed as
l\left(\alpha ,\eta ,\sigma \right)={\displaystyle \sum _{i}\mathit{\text{log}}\left(1\alpha +\frac{\alpha}{\sqrt{1+{\sigma}^{2}}}\mathit{\text{exp}}\left\{\frac{1}{2}\left(\frac{{\left({x}_{i}\eta \right)}^{2}}{1+{\sigma}^{2}}{x}_{i}{}^{2}\right)\right\}\right)}
(4)
which we maximised using the Bound Optimization BY Quadratic Approximation (BOBYQA) algorithm [11].
One major problem in testing multiple variants in the same region of the genome is correlation due to linkage disequilibrium (LD), although this may be less of an issue for rare genetic variation. A permutation test can account for type I error, but there is still a loss of power as variants in strong LD with each other have a disproportionate effect on the test statistic. To deal with this, the RAML groups variants using a single link cluster approach, such that every variant in a group has a squared correlation (r^{2}) greater than a specified threshold with at least one other variant in the same group. For each group a proxy variant is generated, where the proxy is the maximum number of rare alleles for any variant in the group (0, 1 or 2) carried by each subject. The default r^{2} threshold is set to 0.9. This deals with perfectly correlated variants whilst still being able to test most variants individually.
Restricting the parameter space to plausible values should help to improve power. Therefore bounds for α, η and σ^{2} need to be defined as well as a definition of a rare variant. We set the bound for specifying a variant as rare as having a minor allele frequency of 0.04 or less. We expect most variants will not be associated so we set an upper limit for α of 0.2. As we want to be able to model both strong protective and deleterious effects we took the bounds for η to be from −5 to 5. We chose the minimum value of σ^{2} to be 0.25. This represents the minimum amount of variability we could expect about the associated variant effects. Different choices of bounds will do better in some scenarios and worse in others. Our aim was to try to find a reasonable choice for likely effects that are seen in genetic association studies.
Simulation testing
We simulated population data using phased haplotypes from the 1000 Genome Project data (http://www.1000genomes.org/). In order to determine the risk associated with each haplotype the risk associated with each variant was based on seven different scenarios with risk distributions defined below

1.
β ~ N(2σ, σ ^{2})

2.
80 per cent of risk variants β ~ N(2σ, σ ^{2}), 20 per cent of risk variants β ~ N(−2σ, σ ^{2})

3.
β ~ N(σ, σ ^{2})

4.
80 per cent of risk variants β ~ N(σ, σ ^{2}), 20 per cent of risk variants β ~ N(−σ, σ ^{2})

5.
β ~ N(σ/2, σ ^{2})

6.
80 per cent of risk variants β ~ N(σ/2, σ ^{2}), 20 per cent of risk variants β ~ N(−σ/2, σ ^{2})

7.
β ~ N(0, σ ^{2})
These scenarios are similar to the ones presented by Lee et al.[8] with the main difference being that we added some variability. The parameters were chosen to give roughly the same power for each scenario.
Two haplotypes were selected at random for each individual. The overall risk associated with any pair of haplotypes was calculated under a logadditive model by summing the risks from each causal variant carried. An individual in the population was assigned as a case or control at random based on this risk and a disease prevalence of 10 per cent. Two thousand cases and two thousand controls were then selected randomly from the population.
For each risk distribution we tested three scenarios in which 5%, 10% or 20% of variants were causal. The effect was set as proportional to the log of the variant minor allele frequency (p). The standard error (σ) varied under the different distributions: For 5% of associated variants,
\sigma =0.04\mathit{\text{log}}\left(p\right)k
where k is 1 for distributions 1 and 2, 1.5 for distributions 3 and 4, 2 for distribution 5 and 6 and 2.5 for distribution 7; for 10% and 20% of associated variants the average effect was 0.7 and 0.5 times this respectively.
We sampled haplotypes from three different regions of the genome (TERT chr5: 12532871295162, BRCA2 chr13: 3288961732973809, BRCA1 chr17: 4124345241277500) in order to evaluate the influence of different local LD structure on the different tests. Two hundred replicate data sets were simulated under each of the 21 different scenarios. We derived the power of each test as the proportion of replicates for which the empirical significance level achieved P < 0.05, P < 0.01 and P < 0.001.
We compared the RAML method to SKATO using the default weights and the five methods included in the program ScoreSeq. We also applied our tests to three different genes to evaluate the effects that varying genomic architecture has on the relative efficacy of the methods.
In order to evaluate the power for a larger sample size and slightly stronger effects we repeated the simulations for four thousand cases and four thousand controls across the same genomic regions with the effect of associated variants being 25 per cent larger. This data set was used to compare the performance of RAML with SKATO.