In genetics, heterogeneity is a major feature of human traits. Genetic heterogeneity occurs when the same or clinically indistinguishable phenotypes are caused by different genetic factors. This can be due to multiple variants located in the same locus (allelic heterogeneity) or to mutations located in different loci (locus heterogeneity).
The focus of this work is locus heterogeneity, specifically the case in which an unknown subset of pedigrees in a sample is unlinked to a disease locus while the rest are linked [1, 2].
There are many reported examples of locus heterogeneity, including breast cancer [3–6], maturity-onset diabetes of the young (MODY), epilepsy, early-onset Alzheimer's disease, rheumatoid arthritis, non-polyposis colorectal cancer, non-syndromic hearing loss [12–14] and retinitis pigmentosa [15–17].
Locus heterogeneity can substantially affect the power of linkage and association analyses [18–27]. In linkage analysis, many methods address this issue. Examples include: the M-test (also known as the K-test [29, 30]), a likelihood ratio test (LRT) that estimates the value of the (assumed fixed) recombination fraction (θ) for each pedigree in a sample; the B-test, a more powerful version of the M-test that assumes an underlying beta null distribution for each estimated θ; the admixture test (A-test), which is based on the difference between the log-likelihood of the admixture model (data are composed of linked and unlinked families) and that of the homogeneity model (families are all linked with a common θ) [2, 31–36]; the D-test, a combination of the A- and B-tests; and finally, the C-test, which is based on the M-test and for which the underlying null probability distribution is determined by simulation. The M- and B-tests were originally developed to identify different values of θ for different pedigrees. For the A-test, families are grouped into two types: a proportion α that are linked to the disease locus (θ < 1/2) and a proportion 1 − α that are unlinked (θ = 1/2) [1, 2]. In contrast to the M- and B-tests, which place pedigrees into classes a priori, the A-test accounts for heterogeneity by maximizing the standard log-odds (LOD) score over α and θ. That is, each pedigree has some probability of being in the linked or unlinked group. This statistic is known as the heterogeneity LOD score (HLOD).
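To make the admixture idea concrete, the following is a minimal sketch of an HLOD computation by grid search. It assumes that per-family LOD scores have already been computed on a grid of θ values by some linkage program; the function name `hlod`, the argument `family_lods`, and the α grid are illustrative choices, not part of any of the packages cited here. The quantity maximized is the standard admixture form, the sum over families of log10(α·10^LOD + 1 − α).

```python
import numpy as np


def hlod(family_lods, alphas=None):
    """Grid-search sketch of the heterogeneity LOD score.

    family_lods: array of shape (n_families, n_thetas); entry [i, j] is the
        LOD score of family i at the j-th theta on a grid (assumed input).
    alphas: grid of candidate proportions of linked families (assumed).
    Returns (max HLOD, alpha at the maximum, index of theta at the maximum).
    """
    family_lods = np.asarray(family_lods, dtype=float)
    if alphas is None:
        alphas = np.linspace(0.0, 1.0, 101)
    alphas = np.asarray(alphas, dtype=float)

    # Per-family likelihood ratios (linked vs. unlinked) at each theta.
    lr = 10.0 ** family_lods                      # shape (F, T)

    # Admixture likelihood ratio for every (alpha, family, theta) triple.
    a = alphas[:, None, None]
    mix = a * lr[None, :, :] + (1.0 - a)          # shape (A, F, T)

    # Sum log10 over families to get the HLOD surface over (alpha, theta).
    surface = np.log10(mix).sum(axis=1)           # shape (A, T)

    a_idx, t_idx = np.unravel_index(np.argmax(surface), surface.shape)
    return surface[a_idx, t_idx], alphas[a_idx], t_idx


# Toy example: two families support linkage at the first theta, two do not.
lods = [[1.0, 0.2], [1.0, 0.2], [-2.0, -0.1], [-2.0, -0.1]]
score, alpha_hat, theta_idx = hlod(lods)
```

In this toy input the summed (homogeneity) LOD is negative at the first θ, yet the HLOD is positive there because the mixture lets roughly half of the families be treated as unlinked. A finer grid or a numerical optimizer could replace the grid search.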
The A-test has been implemented in a suite of programs to test for heterogeneity vs. homogeneity (HOMOG). More complex heterogeneity scenarios are also available in this package: HOMOG1 allows for gender-specific differences in θ. HOMOG2, HOMOG3, and HOMOG4 distinguish two, three and four types of families respectively, each linked to different disease loci on the same chromosome. HOMOG3R is a special case of HOMOG3 where there are three family classes: the first class is linked to a given marker; the second is linked to another marker on a different chromosome; and the third is linked to neither marker. Lastly, HOMOGM, an extension of HOMOG3R, allows for any number of disease loci.
Linkage analysis methods for quantitative trait loci (QTL) that account for locus heterogeneity also deserve mention. Yang et al. proposed a QTL mapping model for sib pair data. Knight et al. and Ekstrøm et al. independently developed LRT-based models in which the underlying null probability distributions are determined by simulation, while Wang and Peng proposed three test statistics with known null asymptotic distributions. Fewer publications appear to consider locus heterogeneity for association than for linkage. Using the search terms "(locus heterogeneity) AND (linkage)" in ISI Web of Knowledge, we retrieve a total of 2,418 titles. By contrast, using the search terms "(locus heterogeneity) AND (association)", we retrieve a total of 884 titles, a reduction of roughly 63%. That said, methods that address locus heterogeneity in association-based analyses have been developed.
Latent class models have been used to estimate membership-class probabilities for individuals with similar genetic backgrounds [45–48]. Ordered Subset Analysis (OSA)-based models have been extended to association, including the sequential addition (SA) procedure and the OSA case-control (OSACC) method. For family-based data, the OSA-TDT applies OSA to the transmission disequilibrium test (TDT), and the APL-OSA similarly applies OSA to the "association in the presence of linkage" test (APL).
Yang et al. extended the Posterior Probability of Linkage (PPL) method to one that incorporates linkage disequilibrium information between marker and disease alleles. Huang et al. extended the PPL method to case-control data. These methods maintain all the features of the original PPL method for linkage; namely, they do not require correction for multiple testing and they can sequentially update information across multiple data sets.
Wang and Huang developed two LRT extensions of the HLOD: the LD-Het for general pedigrees and the LD-multinomial for affected sib pair data. Here, LD stands for linkage disequilibrium. Schmidt et al. proposed using a two-stage linkage/association approach for affected sib pair data. Finally, Zhou and Pan used a mixture model to allow for locus heterogeneity in a case-control design.
The purpose of this work is to develop a new test statistic, which we call TDT-HET, that allows for locus heterogeneity when applying the TDT statistic. This work is largely motivated by the recent work of Zhou and Pan. As in their paper, our statistic is based on an underlying mixture model. We apply an expectation-maximization (EM) algorithm to compute log-likelihoods of the data under the null and alternative hypotheses. The EM algorithm also produces maximum likelihood estimates of parameters such as the probability that a heterozygous parent transmits the disease allele to an affected child, the probability that a trio (mother, father, affected child) is linked to the locus in question, and the probability that certain trio types (determined by the constellation of genotypes) are linked to the locus being studied. In addition, we extend our TDT-HET method to a statistic that can evaluate multiple loci jointly. This extension is motivated by, and similar to, the work of Hoh, Ott, and colleagues, who called their method SumStat [59–62].
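As an illustration of how an EM algorithm can fit a two-component mixture to trio transmission data, the sketch below uses a deliberately simplified model; it is not the TDT-HET likelihood itself, which this section does not spell out. The assumptions are: each trio is linked with probability `alpha`; in a linked trio each heterozygous parent transmits the allele of interest with probability `tau`; in an unlinked trio the transmission probability is 1/2; and both parents of a trio share the same linked status. The trio-type-specific linkage probabilities mentioned above are omitted. All names (`em_tdt_mixture`, `b`, `n`) are hypothetical.

```python
import numpy as np


def em_tdt_mixture(b, n, alpha=0.5, tau=0.6, n_iter=500, tol=1e-10):
    """EM sketch for a simplified linked/unlinked mixture of trios.

    b[i]: transmissions of the allele of interest from heterozygous
          parents in trio i; n[i]: number of heterozygous parents (1 or 2).
    Returns estimates of alpha and tau and a likelihood ratio statistic
    against the null tau = 1/2.
    """
    b = np.asarray(b, dtype=float)
    n = np.asarray(n, dtype=float)

    def components(alpha, tau):
        linked = alpha * tau ** b * (1.0 - tau) ** (n - b)
        unlinked = (1.0 - alpha) * 0.5 ** n
        return linked, unlinked

    linked, unlinked = components(alpha, tau)
    ll = np.log(linked + unlinked).sum()
    for _ in range(n_iter):
        # E-step: posterior probability that each trio is linked.
        w = linked / (linked + unlinked)
        # M-step: update mixing proportion and transmission probability.
        alpha = w.mean()
        tau = (w * b).sum() / (w * n).sum()
        linked, unlinked = components(alpha, tau)
        new_ll = np.log(linked + unlinked).sum()
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break

    # Under the null (tau = 1/2) both components coincide.
    ll_null = (n * np.log(0.5)).sum()
    return {"alpha": alpha, "tau": tau, "lrt": 2.0 * (ll - ll_null)}


# Toy data: 65 trios, each with two heterozygous parents.
b = [2] * 40 + [1] * 20 + [0] * 5
n = [2] * 65
fit = em_tdt_mixture(b, n)
```

The returned statistic is not assumed to follow a standard chi-square distribution here, because the mixing proportion is not identifiable under the null; in a sketch like this, a null distribution would more safely be obtained by simulation or permutation.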
For both single-locus and multi-locus simulations, we evaluate the type I error rate and the power of the TDT-HET method to detect association. In addition, we apply the TDT-HET method to candidate loci from a study of idiopathic scoliosis trios to determine if there is any suggestion of locus heterogeneity at the loci considered, and whether the results suggest evidence for association in the presence of heterogeneity.