 Methodology article
 Open Access
 Published:
TDTHET: A new transmission disequilibrium test that incorporates locus heterogeneity into the analysis of familybased association data
BMC Bioinformatics volume 13, Article number: 13 (2012)
Abstract
Background
Locus heterogeneity is one of the most documented phenomena in genetics. To date, relatively little work had been done on the development of methods to address locus heterogeneity in genetic association analysis. Motivated by Zhou and Pan's work, we present a mixture model of linked and unlinked trios and develop a statistical method to estimate the probability that a heterozygous parent transmits the disease allele at a diallelic locus, and the probability that any trio is in the linked group. The purpose here is the development of a test that extends the classic transmission disequilibrium test (TDT) to one that accounts for locus heterogeneity.
Results
Our simulations suggest that, for sufficiently large sample size (1000 trios) our method has good power to detect association even the proportion of unlinked trios is high (75%). While the median difference (TDTHET empirical power  TDT empirical power) is approximately 0 for all MOI, there are parameter settings for which the power difference can be substantial. Our multilocus simulations suggest that our method has good power to detect association as long as the markers are reasonably wellcorrelated and the genotype relative risk are larger. Results of both singlelocus and multilocus simulations suggest our method maintains the correct type I error rate.
Finally, the TDTHET statistic shows highly significant pvalues for most of the idiopathic scoliosis candidate loci, and for some loci, the estimated proportion of unlinked trios approaches or exceeds 50%, suggesting the presence of locus heterogeneity.
Conclusions
We have developed an extension of the TDT statistic (TDTHET) that allows for locus heterogeneity among coded trios. Benefits of our method include: estimates of parameters in the presence of heterogeneity, and reasonable power even when the proportion of linked trios is small. Also, we have extended multilocus methods to TDTHET and have demonstrated that the empirical power may be high to detect linkage. Last, given that we obtain PPBs, we conjecture that the TDTHET may be a useful method for correctly identifying linked trios. We anticipate that researchers will find this property increasingly useful as they apply nextgeneration sequencing data in family based studies.
Background
In genetics, heterogeneity is a major feature of human traits. Genetic heterogeneity occurs when the same or clinically indistinguishable phenotypes are caused by different genetic factors. This can be due to multiple variants located in the same locus (allelic heterogeneity) or to mutations located in different loci (locus heterogeneity).
The focus of this work is locus heterogeneity, specifically heterogeneity caused by having an unknown subset of pedigrees in a sample being unlinked to a disease locus while the rest are linked [1, 2].
There are many reported examples of locus heterogeneity, including breast cancer [3–6], maturityonset diabetes of the young (MODY) [7], epilepsy [8], earlyonset Alzheimer's Disease [9], rheumatoid arthritis [10], nonpolyposis colorectal cancer [11], nonsyndromic hearing loss [12–14] and retinitis pigmentosa [15–17].
Locus heterogeneity can substantially affect the power of linkage and association analyses [18–27]. In linkage analysis, there are many examples of methods that address this issue. For example, we have: the M test [28] (also known as Ktest [29, 30]), a likelihood ratio test (LRT) that estimates the value of the (assumed fixed) recombination fraction (θ) for each pedigree in a sample; the Btest [29], which is a more powerful version of the Mtest that assumes an underlying beta null distribution for each estimated θ; the admixture test (Atest), which is based on the difference between the loglikelihood of the admixture model (data are composed of linked and unlinked families) and the homogeneity model (families are all linked with a common θ) [2, 31–36]; the Dtest [30], a combination of the A and B tests and finally, the Ctest [19], which is based on the Mtest and for which the underlying null probability distribution is determined by simulation. The M and B tests were originally developed to identify different values of θ for different pedigrees. For the Atest, families are grouped into two types: a proportion a that are linked to the disease locus (θ < 1/2) and a proportion 1 α that are unlinked (θ = 1/2) [1, 2]. As contrasted with M and the B tests, which place pedigrees into classes a priori, the A test accounts for heterogeneity by maximizing the standard logodds (LOD) score [37] over α and θ. That is, each pedigree has some probability of being in the linked or unlinked group. This statistic is known as the heterogeneity LOD score (HLOD) [38].
The Atest has been implemented in a suite of programs to test for heterogeneity vs. homogeneity (HOMOG) [38]. More complex heterogeneity scenarios are also available in this package: HOMOG1 allows for gender specific differences in θ. HOMOG2, HOMOG3, HOMOG4, distinguish two, three and four types of families respectively, each linked to different disease loci on the same chromosome. HOMOG3R is a special case of HOMOG3 where there are three family classes: the first class is linked to a given marker; the second is linked to another marker on a different chromosome and the third is linked to neither marker. Lastly, HOMOGM [39], an extension of HOMOG3R, allows for any number of disease loci.
It is important to mention linkage analysis methods for quantitative trait loci (QTL) that account for locus heterogeneity in the analysis. Yang et al. [40] proposed a QTL mapping model for sib pair data. Knight et al. [41] and Ekstrøm et al. [42] independently developed LRTbased models in which the underlying null probability distributions are determined by simulation while Wang and Peng [43] proposed three test statistics with known null asymptotic distributions. It appears that relatively fewer publications considering locus heterogeneity for association have been published as compared with heterogeneity for linkage. When using the search terms "(locus heterogeneity) AND (linkage)" in ISI Web of Knowledge, we retrieve a total of 2,418 titles. By contrast, using the using the search terms "(locus heterogeneity) AND (association)", we retrieve a total of 884 titles, an almost 67% reduction. Having documented that, we do note that methods to address locus heterogeneity for associationbased methods have been developed.
Latent class models [44] have been used to estimate membershipclass probabilities for individuals with similar genetic backgrounds [45–48].Ordered Subset Analysis (OSA)based models have been extended to association, including the sequential addition (SA) procedure [49] and the OSA casecontrol (OSACC) method [50]. For familybased data, the OSATDT [51] applies OSA to the transmission disequilibrium test (TDT) [52], and the APLOSA [53] similarly applies OSA to the "association in the presence of linkage" test (APL) [54].
Yang et al. [55] extended the Posterior Probability of Linkage (PPL) method to one that incorporates linkage disequilibrium information between marker and disease alleles. Huang et al. [56] extended the PPL method to casecontrol data. These methods maintain all the features of the original PPL method for linkage, namely, they do not require correction for multiple testing and they can sequentially update information across multiple data sets.
Wang and Huang [22] developed two LRT extensions of the HLOD: the LDHet for general pedigrees and the LDmultinomial for affected sib pair data. Here, LD stands for linkage disequilibrium. Schmidt et al. [57] proposed using a twostage linkage/association approach for affected sib pair data. Finally, Zhou and Pan [58] used a mixture model to allow for locus heterogeneity in a casecontrol design.
The purpose of this work is the development of a new test statistic that we call TDTHET, that allows for locus heterogeneity when applying the TDT statistic. This work is largely motivated by the recent work of Zhou and Pan [58]. As in their paper, our statistic is based on an underlying mixture model. We apply an expectationmaximization (EM) algorithm to compute loglikelihoods of the data under null and alternative hypotheses. The EM algorithm also produces maximum likelihood estimates of parameters such as the probability that a heterozygous parent transmits the disease allele to an affected child, the probability that a trio (mother, father, affected child) is linked to the locus in question, and the probability that certain trio types (determined by the constellation of genotypes) are linked to the locus being studied. In addition, we extend our TDTHET method to statistic that can evaluate multiple loci jointly. This extension is motivated by and similar to the work of Hoh, Ott, and colleagues. They called their method SumStat [59–62].
For both singlelocus and multilocus simulations, we evaluate the type I error rate and the power of the TDTHET method to detect association. In addition, we apply the TDTHET method to candidate loci from a study of idiopathic scoliosis trios to determine if there is any suggestion of locus heterogeneity at the loci considered, and whether the results suggest evidence for association in the presence of heterogeneity.
Methods
Notation
Much of the notation we use comes from the work of Zhou and Pan [58], who developed a test statistic for casecontrol data that allows for locus heterogeneity. Also, much of the TDT notation comes from the work of Schaid and Sommer [63]. Here we present notation used in the main body of this work. A fuller notation list may be found in the additional file 1, Appendix (Notation section).
M = The disease allele at the putative disease SNP locus.
N = The nondisease allele at the putative disease SNP locus.
x_{ abc }= The trio where parent 1, parent 2, and affected child have a, b, and c copies of the M allele at the putative disease locus (range for all copies: 0  2). For example, x_{222} is the trio with mating type MM × MM and affected child genotype MM. Throughout this work, we will use the notation abc interchangeably with x_{ abc }.
n_{ abc }= The number of trios x_{ abc }in the sample.
n = The total number of trios in the study.
D = Event that the child in a trio is affected.
A = Event that individual in a population is affected.
ϕ = Pr(A) = Disease prevalence.
f_{ i }= Pr(Ai copies of M allele in individual's genotype) = Disease penetrances, i = 0,1,2.
, i = 1,2 = genotype relative risks (GRR) [63]. R_{1} corresponds to the heterozygote GRR and R_{2} to the GRR for disease allele homozygote. We consider three kinds of disease modes of inheritance (MOI) in this work: R_{1} = R_{2} (dominant); R_{1} = 1 (recessive); ${R}_{1}^{2}={R}_{2}$ (multiplicative).
p = Pr(M) = Disease allele frequency (DAF).
q = Pr(N) = 1  p = Nondisease allele frequency.
t = Pr(heterozygous parent transmits M allele to affected offspring). In this work, the null hypothesis, H_{0}, is t = 0.5. The alternative hypothesis, H_{1}, is t ≠ 0.5.
μ_{k,i}= Pr(Mating type = iD, pop = k) = probability that the mating type is i given that the child is affected and the trio comes from the k^{th} population, 1 ≤ k ≤ 2. Throughout this work, we shall use the notation k = 1 to indicate that the trio is in the linked population (t ≠ 0.5) and k = 2 to indicate that the trio is in the unlinked population (t = 0.5). Similar to Schaid and Sommer [63], we consider 6 mating types in this work. We recognize that other models, such as those considered by Weinberg and colleagues [64, 65], require more than six mating type frequencies. We conjecture that our work extends to such situations.
π_{1} = Pr(trio is linked to trait locus) = Pr(t ≠ 0.5). In this work we specify that t is the same for all linked trios. This specification is also made for the recombination fraction in some tests of linkage allowing for heterogeneity (see, e.g., work by C. A. B. Smith [1, 2] and Ott [38], specifically the method implemented in programs such as HOMOG [38], GENEHUNTER [66], SIMWALK2 [67], VITESSE [68], MERLIN [69], and other programs).
= Maximum likelihood estimate (MLE) of the parameter x. This MLE is determined by means of the EM algorithm.
z_{ k,j }= The indicator variable for population k and trio x_{ j }, where the subscript j indicates the j^{th} trio in the sample.
iteration step estimate that the j^{th} trio is in the k^{th} population, k = 1,2. Without loss of clarity, we will use sometimes write ${\tau}_{k,abc}^{\left(r\right)}$, where abc refers to the trio x_{ abc }(see above).
TDTHET Test Statistic
The TDTHET statistic is a likelihood ratio statistic. Loglikelihoods under the null hypothesis, H_{0}: t = 0.5 or π_{1} = 0, and under the alternative hypothesis,H_{1} : t ≠ 0.5 and π_{1} ≠ 0, are computed by maximizing these parameters for the observed data. We compute the maximum likelihood estimates under H_{0} and H_{1} using the ExpectationMaximization method [70]. Pvalues are computed using permutation methods. Full details are provided in the additional file 1, Appendix (TDTHET Statistic section).
All trios drawn from a population with one set of parental mating types
Simulations
Single locus
To evaluate the type I error rate and power of the test statistic under different scenarios, we perform simulations. In this section, we describe simulations where we consider type I error rate and power for a single disease locus that has been genotyped. The parameter settings that we consider are presented in Table 1.
We comment that, in item 3 in Table 1, we specify that the disease locus is in Hardy Weinberg Equilibrium. In our simulations, we use the value p to determine the mating type frequencies. Specifically, we specify random mating in the singlelocus simulations, so that the matingtype frequencies μ_{ i }are the products of the parental genotype frequencies, which themselves are determined by the allele frequency p according to HardyWeinberg Equilibrium. For example, the frequency of the matingtype MN × NN is 2 × (2pq) × q^{2} = 4pq^{3}, where q = 1  p is the frequency of the N allele. Schaid and Sommer provide similar results in their Table 1 [63]. While we do not simulate nonHardyWeinberg situations in our singlelocus simulations, we do so in our multilocus simulations (see below).
Multilocus
To evaluate the TDTHET statistic for multiple loci, we apply a slight variant of the "SumStat " procedure developed by Hoh, Ott, and colleagues [59–62]. While these researchers consider sums of everincreasing number of SNPs, in this work, we consider just the full sum. Specifically, for each of the k loci, 1 ≤ k ≤ L, where L is the number of loci in the simulation, we compute TDTHET(k), the value of the statistic at the k^{th} locus. We then compute:
Empirical significance levels are determined through permutation. Since each locus k has 500 permuted TDTHET(k) statistics associated with it, we can compute the permuted SumStat statistics for each permutation number (1 to 500) (This value changes to 100,000 for the Idiopathic scoliosis Candidate Loci  see below). The empirical significance level is defined as the proportion of SumStat values that exceed the SumStat value for the observed data. Table 2 lists the parameter settings for our multilocus simulations (those parameters not listed there are the same as listed in Table 1).
Here, we consider 4 correlated SNPs in each simulation. Mating types for the first locus (labeled MT[1] [i]) are determined using the disease allele frequency p (setting 2b in Table 2). As above, each mating type i = 1, ..., 6 will have its frequency determined using HWE proportions. See above (Simulations  Single Locus) for formulas determining mating type frequencies for the first locus. For each consecutive locus l, 2 ≤ l ≤ 4, the mating type frequencies for the i^{th} mating type is determined in the following fashion.

1.
Define [l][i] = ρ × MT[l  1][i] + (1  ρ) × X, where X~U(0,1).

2.
Compute $SumY\left[l\right]={\sum}_{i=1}^{6}Y\left[l\right]\left[i\right]$.

3.
$$MT\left[l\right]\left[i\right]=\frac{Y\left[l\right]\left[i\right]}{SumY\left[l\right]}$$
.
Note that, if ρ = 1 (perfect correlation), then the mating type frequencies for each locus are identical. If ρ = 0 (no correlation), then each locus has mating type frequencies that are essentially random numbers that sum to 1. In the Results section, power is computed at the 1% significance level (see below).
Idiopathic Scoliosis Candidate Loci
We applied our method to a dataset that included selected loci from our published genomewide association study (GWAS) of adolescent idiopathic scoliosis (AIS) [71]. Briefly, AIS is a common spinal deformity with a prevalence of ~3% in school age children worldwide. The underlying genetics of AIS are generally complex and heterogeneity is apparent [71, 72]. In the work presented here we selected genotypes for five loci derived in a total of 447 trios (1849 samples) from 447 families that were included in our previous publication [71]. Of the five loci, four (rs1400180, rs10510181, rs1040315, and rs2222973) were selected due to their significance by TDT analysis, their evidence of clustering, and their proximity to genes of potential biological relevance. We also selected an additional locus, rs11770843, because of its proximity to haplotypes previously linked and associated with AIS [73].
While we keep a number of the settings fixed (Table 1, settings 89), we alter the number of permutations per statistic to 100,000. Note that this number is much larger than the number performed in our simulation studies. The reason for this is that we are analyzing far fewer markers here than in our simulations, so time/CPU constraints are not really an issue. Also, the SumStat Pvalue is based on 100,000 permutations, since we have 100,000 permutation TDTHET statistics for each locus.
As a comparison, we compute the TDT statistic [52] as implemented in the PLINK software [74]. We also compute pointwise and familywise permutation pvalues (labeled Emp1 and Max(T), respectively by Purcell et al. [74]). The Max(T) permutation statistic is based on the maximum observed test per permutation and so accurately reflects the familywise error rate in the presence of LD.
While this description is for a genomewide study, we consider only the situation max(T) applied to 5 candidate SNPs. We compare the max(T) statistic to our Bonferronicorrected maximum TDTHET SumStat statistic (corrected over 2 chromosomes, since one chromosome has one locus).
Results
Simulations
Null hypothesis (Type I error rate)
R_{ 2 }= 1. 0
For the situation where R_{2} = 1.0, we present empirical type I error rates at the 5% and 1% levels in Figure 1. The minimum observed type I error rate at the 5% level for TDTHET is 0.04 (log(0.04) = 1.46), which occurs for the settings: ϕ = 0.05, π_{1} = 0.50, p = 0.10, and the maximum observed type I error rate is 0.07 (log(0.07) = 1.17), which occurs for the settings: ϕ = 0.15, π_{1} = 0.50, p = 0.75. The median type I error rate is 0.05.
At the 1% level, the minimum observed type I error rate for TDTHET is 0.006 (log(0.006) = 2.22), which occurs for the settings: ϕ = 0.05, π_{1} = 0.25, p = 0.25, and the maximum observed type I error rate is 0.02 (log(0.02) = 1.74), which occurs for the settings: ϕ = 0.15, π_{1} = 0.50, p = 0.75. The median type I error rate is 0.01.
Given that the type I error rate is computed over 250 replicates for each simulation vector setting in Table 1, we can use the method implemented in the BINOM program [38] to compute exact 95% confidence intervals for each empirical type I error rate. For the minimum and maximum empirical rates presented above from Figure 1, BINOM indicates that 0.05 and 0.01 are contained in in each respective 95% confidence interval. In addition, in Figure 1 we include linear trend lines using the method implemented in the MS Office 2007 Excel Spreadsheet software. Note that the 5% and 1% trend lines are very close to the constant lines y = 1.30 and y = 2.00, which are the logtransformed values of 0.05 and 0.01, respectively. This result suggests that the TDTHET maintains the correct type I error rate under the null hypotheses.
As a confirmation of our simulation code, we comment that the minimum observed type I error rate at the 5% level for TDT is 0.03 (log(0.03) = 1.55), the maximum observed type I error rate is 0.06 (log(0.06) = 1.19), and the median type I error rate is 0.05. At the 1% level, the minimum observed type I error rate for TDT is 0.004 (log(0.004) = 2.40), the maximum observed type I error rate is 0.02 (log(0.02) = 1.72), and the median type I error rate is 0.01. These results suggest that our simulation code is correctly simulating null data.
Alternative hypotheses (Power)
Single Locus
In Figures 2, 3 and 4 we present contour plots of empirical powers of the TDTHET method for a single locus at the 5% significance level for dominant, multiplicative and recessive MOIs, respectively. The contour plots provide empirical power ranges as a function of the proportion of linked trios π_{1} and DAF (p), where settings of the input values are stated in Table 1, items 4 and 5). Other parameter settings that are used in these simulations are also provided in Table 1. In each of these figures, the prevalence is 0.05.
Each contour in each figure represents a range of empirical power values. In each figure, there are five contours, corresponding to power ranges (x, x + 0.20), where x = 0.00, 0.20, 0.40, 0.60, 0.80. For example, the black contour represents the power range (0.00, 0.20). The light gray contour contiguous to the black contour represents the power range (0.20, 0.40) and so forth. The lightest contour represents the power range (0.80, 1.00).
Studying these figures, we can draw a number of conclusions. First, we see that, independent of the disease MOI, as the proportion of linked trios π_{1} increases, the empirical power increases as well. This result is not surprising. It is interesting to note that power for a fixed DAF is very much dependent upon disease MOI. For example, we see in Figure 2 that empirical power for a dominant MOI tends to be larger when p ≤ 0.50. For a multiplicative MOI (Figure 3), empirical power tends to be larger for 0.25 ≤ p ≤ 0.75. Finally, for a recessive MOI (Figure 4), MOI tends to be larger when p ≥ 0.50.
How does the TDTHET statistic's power compare with that of the TDT in the presence of heterogeneity? Our previous work determining the noncentrality parameter of the TDT in the presence of heterogeneity [27] allows us to answer this question directly. However, we use TDT empirical power instead to compare "apples to apples" for power. In Figure 5, we present a Box and Whiskers plot [75] that reports a summary of the distribution of differences (TDTHET empirical power  TDT empirical power) for the various MOIs (Dominant, Multiplicative, Recessive) and significance levels (5%, 1%). TDT empirical power is computed using the method we previously published [27].
While the median power difference is approximately 0 for all six categories, we see that there is a pattern associated with disease MOI. That is, for the dominant MOIs, TDT tends to have larger power than TDTHET (gray quartile boxes below 0 in Figure 5), while for multiplicative and recessive MOIs, TDTHET tends to have higher power than TDT (gray quartile boxes above 0 in Figure 5). The minimum value for power difference of 0.05 occurs for the parameter settings: ϕ = 0.05; MOI = Dominant; R_{2} = 2.25; DAF = 0.50; π_{1} = 0.50; Significance Level = 1%. For these settings, TDTHET empirical power is 0.45, while TDT empirical power is 0.50. The maximum value for power difference of 0.17 occurs for the parameter settings: ϕ = 0.05; MOI = Recessive; R_{2} = 2.25; DAF = 0.25; π_{1} = 0.75; Significance Level = 1%. For these settings, TDTHET empirical power is 0.74, while TDT empirical power is 0.56.
Multilocus
In Figures 6, 7 and 8, we present TDTHET and TDT SumStat empirical power values (type I error rate values in Figure 6) for parameter settings listed in the Methods. The results presented in Figure 6 (R_{2} = 1.0) indicate that our SumStat statistic appears to maintain the correct type I error rate for all simulation parameters considered, with the exception of the settings: TDT SumStat Statistic, π_{1} = 0.25, DAF = 0.90, which gives an empirical type I error rate of 0.03 at the 1% level. According to BINOM, the exact 95% confidence interval for this value does not contain 0.01 (the lower bound of the interval is 0.02). However, all other simulations do contain 0.01 in their 95% confidence intervals.
Regarding empirical power, when R_{1} = 1.5 (Figure 7), TDTHET and TDT produce nearly identical powers. This can be seen from the fact that the hollow symbols of the TDT empirical powers do not seem to appear in Figure 7. The reason is that they are covered by the TDTHET empirical power symbols. When R_{1} = 3.0 (Figure 8), TDTHET and TDT also produce nearly identical powers. Note that the values on the vertical axis for Figure 8 are much higher than those for Figures 6 and 7. We comment that, when R_{1} = 3.0, we have very high power at the 1% significance level even with the proportion of linked trios is low (π_{1} = 0.25; diamonds and triangles; Figure 8). This result suggests that genotype relative risk can "trump" locus heterogeneity. We have observed this phenomenon in previous studies, where genotype relative risk is the most significant factor in determining power [76], even in the presence of "missing data" (e.g., misclassification errors).
Idiopathic Scoliosis Candidate Loci
In Table 3, we present the results of our TDTHET analysis for the five candidate loci mentioned in the Methods section. They are: RS1400180, RS10510181 (both on Chromosome 3), RS11770843 (on Chromosome 7); and RS1040315, RS2222973 (both on Chromosome 21).
The first thing to notice about these results is that the statistic values are similar. For example, on Chromosome 3, locus RS1400180 has a TDTHET statistic value of 14.78 versus a TDT value of 14.35. Similarly, on Chromosome 21, locus RS2222973 has a TDTHET statistic value of 22.53 versus a TDT value of 22.25. However, as noted above, the TDTHET statistic does not follow a central chisquared distribution with 1 degree of freedom under the null hypothesis. For that, we must compare permutation pvalues. If we compare the pointwise permutation pvalues (Pvalue (Perm) column for TDTHET and Perm01 column for PLINK TDT), we see that, for most loci the permutation pvalues are quite similar (same order of magnitude). In fact, according to BINOM, for most of the loci, the exact 95% confidence intervals overlap (full results not shown). The one exception is for locus RS11770843 on Chromosome 7. For this locus, the upper bound of the exact 95% confidence interval of the TDTHET permutation pvalue as computed by BINOM is 5.6 × 10^{5}, while the lower bound of the exact 95% confidence interval of the PLINK TDT permutation pvalue (Perm01) is 9.1 × 10^{5}. This result suggests that, for this marker locus, the TDTHET has slightly more power.
As for the multilocus results, the situation is quite similar. The Bonferroni corrected minimum pvalue of the TDTHET SumStat statistic is 0.00, on Chromosome 21. The upper bound of the exact 95% confidence interval is 3.0 × 10^{5}. The lower bound of the exact 95% confidence interval for the minimum max(T) pvalue is 2.8 × 10^{5}, indicating that the pvalues overlap. Thus power for each method is equivalent for this data set. While additional studies need to be performed, this result suggests that the SumStat method for TDTHET may not be as advantageous when loci are in HWE and/or are in linkage disequilibrium.
If there is no gain in power for the TDTHET method over the standard TDT method, what is its utility? We suggest that the value comes from the estimates of the transmission probability, the proportion of linked trios, and most especially, the estimates of the probabilities that each of the trios is linked to a particular locus. Similar information is available for the HLOD statistic in that we may obtain probability estimates that each family is linked to a particular locus [38].
We demonstrate this utility for locus RS2222973, on Chromosome 21. In Table 4, we present the posterior probability estimates ${\widehat{\tau}}_{1,abc}^{\left(r\right)}$ that each of the 10 coded trios abc (see Table 5) is linked to the locus. We note that in Table 3, the estimate of the overall proportion of linked trios, $\hat{{\pi}_{1}}$, is 0.87. As Ott points out [32] and [38] (page 224), this value can be used as a cutoff for determining coded trios that are linked to the locus as compared with coded trios that are not. Specifically, if ${\widehat{\tau}}_{1,abc}^{\left(r\right)}>\hat{{\pi}_{1}}$, then we conclude that the coded trio x_{ abc }is linked to the locus. Similarly, if ${\widehat{\tau}}_{1,abc}^{\left(r\right)}<\hat{{\pi}_{1}}$, then we conclude that the coded trio x_{ abc }is unlinked. The results in Table 4 suggest that the coded trios x_{000}, x_{100}, x_{110}, x_{201}, x_{211}, x_{222} are linked to locus RS2222973, while the remaining 4 coded trios x_{101}, x_{111}, x_{112}, x_{212} are not. Of the coded trios for which at least one parent is heterozygous (i.e., either a or b equals 1), the linked coded trios are defined by the fact that the affected child always receives the "1" allele from a heterozygous parent. This result suggests that, for this locus, the "1" allele is the risk allele.
Discussion
In this work, we present a mixture model of linked and unlinked trios and develop a statistical method to estimate the probability t that a heterozygous parent transmits the disease allele at a diallelic locus, as well as the probability π_{1} that any trio is in the linked group. The null hypothesis is that t = 0.5. The purpose here is the development of a test, the TDTHET, which extends the classic transmission disequilibrium test (TDT) to one that accounts for locus heterogeneity. Our results suggest that use of permutation pvalues enable us to correctly maintain correct type I error rates at the 5% and 1% significance levels. Power simulations using disease MOIs suggest that power can be disease model dependent, with the TDT being slightly more powerful for dominant MOIs, and the TDTHET being more power for recessive MOIs. Also, we find that our statistic can have high power, even in the presence of locus heterogeneity, when the GRR is larger.
It is interesting to note that the value of the TDTHET statistic and the corresponding permutation pvalue appears to be about the same as that of ordinary TDT for the Idiopathic scoliosis Candidate Loci data set even though results of the TDTHET analysis suggest that there is locus heterogeneity for several loci. Based on our simulations, we might conjecture that the singlelocus MOI for each SNP is multiplicative.
We computed parameters for the situation where linked and unlinked trio types come from populations with different sets of parental mating type frequencies, but apart from determining the r^{th} iteration step estimates, we did not investigate this form of the TDTHET statistic further. Given the extensive amount of work already present, we consider this work to be beyond scope of the present manuscript. We plan to follow up this research and report our findings in another manuscript.
As noted in Results, Idiopathic Scoliosis Candidate Loci section, Ott documents that a decision rule for determining whether a particular trio type x_{ abc }is linked to a locus is seeing whether the inequality:
is satisfied. Having said that, Terwilliger and Ott [77] report that, for linkage, the conditional probabilities "...should be taken with a grain of salt, and they cannot ever be validly used to separate families for the remainder of a linkage study. It should be required that any further marker typings be done on all families combined..." Their rationale for this statement is that selectively typing only linked families would introduce bias and increase the type I error rate of the linkage statistic. However, this book was published in 1994, even before the advent of SNPs. We are now producing next generation sequence data, so that the causative variant may well be typed in the first set. It remains an open question whether one can use the parameter estimates ${\widehat{\tau}}_{1,abc}^{\left(r\right)}$ to find trios that contain the causative variant(s). We recognize that there are situations where parameter estimation may be quite difficult. Vieland and Logue [78] documented that when the genetic models at linked and unlinked loci differ, maximizing the HLOD yields incorrect parameter estimates. These authors found that the admixture parameter α does not even measure the proportion of linked families within the sample, as is commonly supposed.
We conjecture that having additional information on the posterior probabilities ${\widehat{\tau}}_{1,abc}^{\left(r\right)}$ may increase the probability of correctly identifying linked trios. One of the advantages of the TDTHET statistic is that it provides estimates that each of the 10 types of trios (Table 5) is linked/unlinked. We can use this information to create a decision rule about whether a particular trio type is linked (i.e., harbors the disease allele). One possible decision rule is the inequality documented by Ott [38] and listed above (2). Ott reports that, for linkage analysis allowing for locus heterogeneity, a decision rule for determining whether a particular family is linked to a locus is checking whether the posterior probability that the family is linked is larger than or equal to the overall estimate of the proportion of linked families. We can extend this rule to our work by making the decision rule be that a trio type x_{ abc }is linked to a locus if and only if the inequality is satisfied. Here r is the iteration step such that the loglikelihoods are less than the stopping criterion.
This decision rule potentially reduces the number of trios that we need consider when looking for linked trios. We can further reduce the number of trios considered by adding the condition that we only consider trios in which at least one parent is heterozygous. Thus, the two decision rules we consider here for selecting linked trios using the TDTHET statistic are: (i) all trios that satisfy inequality (2); and (ii) all trios for which at least one parent is heterozygous and that also satisfy inequality (2).
For the TDT statistic, our analogous decision rules are: (i) all trios; and (ii) all trios for which at least parent is heterozygous.
We plan to perform an extensive analysis to evaluate the empirical probabilities that each statistic can correctly identify linked trios. We can simulate linked and unlinked trios using the method implemented in the FASTSLINK software [79, 80]. We can use different genetic model parameter settings, specifically, settings in which the genetic effect is small/large. Since FASTLINK produces pedigree files that indicate which pedigrees are linked or unlinked, we can directly test our decision rules. This is work in progress.
Given that next generation sequencing data applied to families is bound to identify large amounts of locus heterogeneity, any methods that increase the probability of identifying true disease variants should be welcome. We realize that, though, the probabilities of correctly identifying linked trios may be dependent upon the true proportion of linked trios. One way we can reduce heterogeneity is to look at larger family sizes. We plan to apply our statistic to such families and investigate its performance.
Conclusions
Motivated by the recent work of Zhou and Pan [58], we have developed a TDT statistic, TDTHET, that allows for locus heterogeneity among coded trios. This method is an extension of TDT, in that our simulation results suggest it has approximately the same power as the original TDT. Results of our simulations suggest that our method maintains correct type I error for the null hypothesis (R_{1} = 1.0). Benefits of our method include: estimates of parameters in the presence of heterogeneity (e.g., the proportion of linked coded trios, the posterior probabilities that a particular trio type is linked to a locus), and reasonable power even when the proportion of linked trios is lower. Also, we have extended Hoh, Ott, and colleagues' SumStat method to TDTHET. The parameter estimation above, particular, estimation of the probability that a trio is linked will be useful as we enter the age of nextgeneration sequencing, where one can expect extensive levels of locus heterogeneity given the rare disease frequencies.
References
 1.
Smith CAB: Homogeneity test for linkage data. Proc Sec Int Congr Hum Genet 1961, 1: 212–213.
 2.
Smith CAB: Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet 1963, 27: 175–182.
 3.
Duncan JA, Reeves JR, Cooke TG: BRCA1 and BRCA2 proteins: roles in health and disease. Mol Pathol 1998, 51(5):237–247.
 4.
Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King MC: Linkage of earlyonset familial breast cancer to chromosome 17q21. Science 1990, 250(4988):1684–1689.
 5.
Miki Y, Swensen J, ShattuckEidens D, Futreal PA, Harshman K, Tavtigian S, Liu Q, Cochran C, Bennett LM, Ding W, Bell R, Rosenthal J, Hussey C, Tran T, McClure M, Frye C, Hattier T, Phelps R, HaugenStrano A, Katcher H, Yakumo K, Gholami Z, Shaffer D, Stone S, Bayer S, Wray C, Bogden R, Dayananth P, Ward J, Tonin P, et al.: A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994, 266(5182):66–71.
 6.
Wooster R, Neuhausen SL, Mangion J, Quirk Y, Ford D, Collins N, Nguyen K, Seal S, Tran T, Averill D, Fields P, Marshall G, Narod S, Lenoir GM, Lynch H, Feunteun J, Devilee P, Cornelisse CJ, Menko FH, Daly PA, Ormiston W, McManus R, Pye C, Lewis CM, CannonAlbright LA, Peto J, Ponder BAJ, Skolnick MH, Easton DF, Goldgar DE, et al.: Localization of a breast cancer susceptibility gene, BRCA2, to chromosome 13q12–13. Science 1994, 265(5181):2088–2090.
 7.
Froguel P, Velho G: Molecular Genetics of Maturityonset Diabetes of the Young. Trends Endocrinol Metab 1999, 10(4):142–146.
 8.
De Marco EV, Gambardella A, Annesi F, Labate A, Carrideo S, Forabosco P, Civitelli D, Candiano IC, Tarantino P, Annesi G, Quattrone A: Further evidence of genetic heterogeneity in families with autosomal dominant nocturnal frontal lobe epilepsy. Epilepsy Res 2007, 74(1):70–73.
 9.
Selkoe DJ: Amyloid betaprotein and the genetics of Alzheimer's disease. J Biol Chem 1996, 271(31):18295–18298.
 10.
Criswell LA, Chen WV, Jawaheer D, Lum RF, Wener MH, Gu X, Gregersen PK, Amos CI: Dissecting the heterogeneity of rheumatoid arthritis through linkage analysis of quantitative traits. Arthritis Rheum 2007, 56(1):58–68.
 11.
NystromLahti M, Parsons R, Sistonen P, Pylkkanen L, Aaltonen LA, Leach FS, Hamilton SR, Watson P, Bronson E, Fusaro R, Cavalieri J, Lynch J, Lanspa S, Smyrk T, Lynch P, Drouhard T, Kinzler KW, Vogelstein B, Lynch HT, Chapelle Adl, Peltomäki P: Mismatch repair genes on chromosomes 2p and 3p account for a major share of hereditary nonpolyposis colorectal cancer families evaluable by linkage. Am J Hum Genet 1994, 55(4):659–665.
 12.
Kelsell DP, Dunlop J, Stevens HP, Lench NJ, Liang JN, Parry G, Mueller RF, Leigh IM: Connexin 26 mutations in hereditary nonsyndromic sensorineural deafness. Nature 1997, 387(6628):80–83.
 13.
Grifa A, Wagner CA, D'Ambrosio L, Melchionda S, Bernardi F, LopezBigas N, Rabionet R, Arbones M, Monica MD, Estivill X, Zelante L, Lang F, Gasparini P: Mutations in GJB6 cause nonsyndromic autosomal dominant deafness at DFNA3 locus. Nat Genet 1999, 23(1):16–18.
 14.
Van Laer L, Huizing EH, Verstreken M, van Zuijlen D, Wauters JG, Bossuyt PJ, Van de Heyning P, McGuirt WT, Smith RJ, Willems PJ, Legan PK, Richardson GP, Van Camp G: Nonsyndromic hearing impairment is associated with a mutation in DFNA5. Nat Genet 1998, 20(2):194–197.
 15.
Dryja TP, Li T: Molecular genetics of retinitis pigmentosa. Hum Mol Genet 1995, 4: 1739–1743. Spec No Spec No
 16.
Papaioannou M, Chakarova CF, Prescott DC, Waseem N, Theis T, Lopez I, Gill B, Koenekoop RK, Bhattacharya SS: A new locus (RP31) for autosomal dominant retinitis pigmentosa maps to chromosome 9p. Hum Genet 2005, 118(3–4):501–503.
 17.
Tong Z, Yang Z, Meyer JJ, McInnes AW, Xue L, Azimi AM, Baird J, Zhao Y, Pearson E, Wang C, Chen Y, Zhang K: A novel locus for Xlinked retinitis pigmentosa. Ann Acad Med Singapore 2006, 35(7):476–478.
 18.
Huang J, Vieland VJ: Comparison of 'modelfree' and 'modelbased' linkage statistics in the presence of locus heterogeneity: single data set and multiple data set applications. Hum Hered 2001, 51(4):217–225.
 19.
MacLean CJ, Ploughman LM, Diehl SR, Kendler KS: A new test for linkage in the presence of locus heterogeneity. Am J Hum Genet 1992, 50(6):1259–1266.
 20.
Teare DM, Barrett JH: Genetic linkage studies. The Lancet 2005, 366(9490):1036–1044.
 21.
Vieland VJ, Wang K, Huang J: Power to detect linkage based on multiple sets of data in the presence of locus heterogeneity: comparative evaluation of modelbased linkage methods for affected sib pair data. Hum Hered 2001, 51(4):199–208.
 22.
Wang D, Huang J: Detecting linkage disequilibrium in the presence of locus heterogeneity. Ann Hum Genet 2006, 70(Pt 3):397–409.
 23.
Abreu PC, Greenberg DA, Hodge SE: Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases. Am J Hum Genet 1999, 65(3):847–857.
 24.
Abreu PC, Hodge SE, Greenberg DA: Quantification of type I error probabilities for heterogeneity LOD scores. Genet Epidemiol 2002, 22(2):156–169.
 25.
Falk CT: Effect of genetic heterogeneity and assortative mating on linkage analysis: a simulation study. Am J Hum Genet 1997, 61(5):1169–1178.
 26.
Chiano MN, Yates JR: Bootstrapping in human genetic linkage. Ann Hum Genet 1994, 58(Pt 2):129–143.
 27.
Chen C, Yang G, Buyske S, Matise T, Finch SJ, Gordon D: Transmission disequilibrium test power and sample size in the presence of locus heterogeneity. Stat Appl Genet Mol Biol 2009, 8(1):44. Article Article
 28.
Morton NE: The detection and estimation of linkage between the genes for elliptocytosis and the Rh blood type. Am J Hum Genet 1956, 8: 80–96.
 29.
Risch N: A new statistical test for linkage heterogeneity. Am J Hum Genet 1988, 42(2):353–364.
 30.
Goldstein DR: A combined test of linkage heterogeneity. Am J Hum Genet 1994, 55(4):841–848.
 31.
Hodge SE, Anderson CE, Neiswanger K, Sparkes RS, Rimoin DL: The search for heterogeneity in insulindependent diabetes mellitus (IDDM): Linkage studies, twolocus models, and genetic heterogeneity. Am J Hum Genet 1983, 35: 1139–1155.
 32.
Ott J: Linkage analysis and family classification under heterogeneity. Ann Hum Genet 1983, 47: 311–320.
 33.
Risch N, Baron M: Xlinkage and genetic heterogeneity in bipolarrelated major affective illness: reanalysis of linkage data. Ann Hum Genet 1982, 46(Pt 2):153–166.
 34.
Ott J: Counting methods (EM algorithm) in human pedigree analysis: linkage and segregation analysis. Ann Hum Genet 1977, 40(4):443–454.
 35.
Faraway JJ: Distribution of the admixture test for the detection of linkage under heterogeneity. Genet Epidemiol 1993, 10(1):75–83.
 36.
Ott J: Strategies for characterizing highly polymorphic markers in human gene mapping. Am J Hum Genet 1992, 51(2):283–290.
 37.
Morton NE: Sequential tests for the detection of linkage. Am J Hum Genet 1955, 7(3):277–318.
 38.
Ott J: Analysis of Human Genetic Linkage. Third edition. Baltimore, MD: The John Hopkins University Press; 1999.
 39.
Bhat A, Heath SC, Ott J: Heterogeneity for multiple disease loci in linkage analysis. Hum Hered 1999, 49(4):229–231.
 40.
Yang X, Wang K, Huang J, Vieland VJ: Genomewide linkage analysis of blood pressure under locus heterogeneity. BMC Genet 2003, 4(Suppl 1):S78.
 41.
Knight J, North BV, Sham PC, Curtis D: Mapping loci influencing blood pressure in the Framingham pedigrees using modelfree LOD score analysis of a quantitative trait. BMC Genet 2003, 4(Suppl 1):S74.
 42.
Ekstrom CT, Dalgaard P: Linkage analysis of quantitative trait loci in the presence of heterogeneity. Hum Hered 2003, 55(1):16–26.
 43.
Wang K, Peng Y: Quantitativetraitlocus mapping in the presence of locus heterogeneity. Ann Hum Genet 2006, 70(Pt 6):882–892.
 44.
Lazarsfeld PFWHN: Latent Structure Analysis. Boston: Houghton Mifflin; 1968.
 45.
Holliday EG, McLean DE, Nyholt DR, Mowry BJ: Susceptibility locus on chromosome 1q23–25 for a schizophrenia subtype resembling deficit schizophrenia identified by latent class analysis. Arch Gen Psychiatry 2009, 66(10):1058–1067.
 46.
Todd RD, Rasmussen ER, Neuman RJ, Reich W, Hudziak JJ, Bucholz KK, Madden PA, Heath A: Familiality and heritability of subtypes of attention deficit hyperactivity disorder in a population sample of adolescent female twins. Am J Psychiatry 2001, 158(11):1891–1898.
 47.
Bureau A, Croteau J, Tayeb A, Merette C, Labbe A: Latent class model with familial dependence to address heterogeneity in complex diseases: adapting the approach to familybased association studies. Genet Epidemiol 2011, 35(3):182–189.
 48.
Derks EM, Allardyce J, Boks MP, Ophoff RA: Improvement of phenotyping in genome wide association studies on schizophrenia: an application of latent class factor analysis. Schizophrenia Research 2010, 117(2–3):184–185.
 49.
Macgregor S, Craddock N, Holmans PA: Use of phenotypic covariates in association analysis by sequential addition of cases. Eur J Hum Genet 2006, 14(5):529–534.
 50.
Qin X, Hauser ER, Schmidt S: Ordered subset analysis for casecontrol studies. Genet Epidemiol 2010, 34(5):407–417.
 51.
Perdry H, Maher BS, Babron MC, McHenry T, ClergetDarpoux F, Marazita ML: An ordered subset approach to including covariates in the transmission disequilibrium test. BMC Proc 2007, 1(Suppl 1):S77.
 52.
Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulindependent diabetes mellitus (IDDM). Am J Hum Genet 1993, 52(3):506–516.
 53.
Chung RH, Schmidt S, Martin ER, Hauser ER: Orderedsubset analysis (OSA) for familybased association mapping of complex traits. Genet Epidemiol 2008, 32(7):627–637.
 54.
Martin ER, Bass MP, Hauser ER, Kaplan NL: Accounting for linkage in familybased tests of association with missing parental genotypes. Am J Hum Genet 2003, 73(5):1016–1026.
 55.
Yang X, Huang J, Logue MW, Vieland VJ: The posterior probability of linkage allowing for linkage disequilibrium and a new estimate of disequilibrium between a trait and a marker. Hum Hered 2005, 59(4):210–219.
 56.
Huang Y, Vieland VJ: Association statistics under the PPL framework. Genet Epidemiol 2010, 34(8):835–845.
 57.
Schmidt S, Schmidt MA, Qin X, Martin ER, Hauser ER: Increased efficiency of casecontrol association analysis by using allelesharing and covariate information. Hum Hered 2008, 65(3):154–165.
 58.
Zhou H, Pan W: Binomial mixture modelbased association tests under genetic heterogeneity. Ann Hum Genet 2009, 73(Pt 6):614–630.
 59.
Hoh J, Ott J: A train of thoughts on gene mapping. Theor Popul Biol 2001, 60(3):149–153.
 60.
Hoh J, Ott J: Mathematical multilocus approaches to localizing complex human trait genes. Nat Rev Genet 2003, 4(9):701–709.
 61.
Hoh J, Ott J: Genetic dissection of diseases: design and methods. Curr Opin Genet Dev 2004, 14(3):229–232.
 62.
Hoh J, Wille A, Ott J: Trimming, weighting, and grouping SNPs in human casecontrol association studies. Genome Res 2001, 11(12):2115–2119.
 63.
Schaid DJ, Sommer SS: Genotype relative risks: methods for design and analysis of candidategene association studies. Am J Hum Genet 1993, 53(5):1114–1126.
 64.
Weinberg CR: Allowing for missing parents in genetic studies of caseparent triads. Am J Hum Genet 1999, 64(4):1186–1193.
 65.
Weinberg CR, Wilcox AJ, Lie RT: A loglinear approach to caseparenttriad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998, 62(4):969–978.
 66.
Kruglyak L, Daly MJ, ReeveDaly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 1996, 58(6):1347–1363.
 67.
Sobel E, Lange K: Descent graphs in pedigree analysis: applications to haplotyping, location scores, and markersharing statistics. Am J Hum Genet 1996, 58(6):1323–1337.
 68.
O'Connell JR, Weeks DE: The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype setrecoding and fuzzy inheritance. Nat Genet 1995, 11(4):402–408.
 69.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlinrapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 2002, 30(1):97–101.
 70.
Dempster A, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1977, 39: 1–38.
 71.
Sharma S, Gao X, Londono D, Devroy SE, Mauldin KN, Frankel JT, Brandon JM, Zhang D, Li QZ, Dobbs MB, Gurnett CA, Grant SF, Hakonarson H, Dormans JP, Herring JA, Gordon D, Wise CA: Genomewide association studies of adolescent idiopathic scoliosis suggest candidate susceptibility genes. Hum Mol Genet 20(7):1456–1466.
 72.
Wise CA, Gao X, Shoemaker S, Gordon D, Herring JA: Understanding genetic factors in idiopathic scoliosis, a complex disease of childhood. Current genomics 2008, 9(1):51–59.
 73.
Nelson LM, Kenneth W: Genetic Markers of Chromosome 7 Associated With Scoliosis And Use Thereof. In., vol. WO/2008/033813. Switzerland: World Intellectual Property Organization; 2008.
 74.
Purcell S, Neale B, ToddBrown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC: PLINK: a tool set for wholegenome association and population based linkage analyses. Am J Hum Genet 2007, 81(3):559–575.
 75.
Tukey JW: Exploratory Data Analysis. Upper Saddle River, NJ: Pearson Education  Addison Wesley; 1977.
 76.
Ji F, Yang Y, Haynes C, Finch SJ, Gordon D: Computing asymptotic power and sample size for casecontrol genetic association studies in the presence of phenotype and/or genotype misclassification errors. Stat Appl Genet Mol Biol 2005, 4: 37. Article Article
 77.
Terwilliger JD, Ott J: Handbook of Human Genetic Linkage. Baltimore: Johns Hopkins University Press; 1994.
 78.
Vieland VJ, Logue M: HLODs, trait models, and ascertainment: implications of admixture for parameter estimation and linkage detection. Hum Hered 2002, 53(1):23–35.
 79.
Ott J: Computersimulation methods in human linkage analysis. Proceedings of the National Academy of Sciences of the United States of America 1989, 86(11):4175–4178.
 80.
Weeks DE, Ott J, Lathrop GM: SLINK: a general simulation program for linkage analysis. Am J Hum Genet 1990, 47: A204.
 81.
Abel L, MullerMyhsok B: Maximumlikelihood expression of the transmission/disequilibrium test and power considerations. Am J Hum Genet 1998, 63(2):664–667.
 82.
Gordon D, Heath SC, Liu X, Ott J: A transmission/disequilibrium test that allows for genotyping errors in the analysis of singlenucleotide polymorphism data. Am J Hum Genet 2001, 69(2):371–380.
 83.
Tu IP, Whittemore AS: Power of association and linkage tests when the disease alleles are unobserved. Am J Hum Genet 1999, 64(2):641–649.
Acknowledgements
The authors gratefully acknowledge the patients and families for their participation, and referring surgeons and associates for their assistance. This work was supported by NIH grant R01 HD052973, the Crystal Charity Ball, the Scoliosis Research Society, the Cain Foundation, and the TSRHC Research Fund (to C.A.W.).
Author information
Affiliations
Corresponding author
Additional information
Authors' contributions
DL codeveloped the TDThet statistic, performed a number of analyses and simulations and wrote a significant portion of the manuscript. SB and SJF both helped develop the TDTHET method and provided advice on application of the ExpectationMaximization Algorithm. SS performed the genotyping for the Idiopathic Scoliosis candidate loci samples and, together with CAW, wrote the section describing the Idiopathic Scoliosis candidate loci sample collection. CAW graciously provided samples for the Idiopathic Scoliosis candidate loci example. DG codeveloped the TDTHET statistic, wrote software code to compute the statistic, performed a number of simulations and wrote the majority of the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Appendix
Additional file 1: . Full details of the derivation of the TDTHET statistic, including notation. (PDF 204 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Londono, D., Buyske, S., Finch, S.J. et al. TDTHET: A new transmission disequilibrium test that incorporates locus heterogeneity into the analysis of familybased association data. BMC Bioinformatics 13, 13 (2012). https://doi.org/10.1186/147121051313
Received:
Accepted:
Published:
Keywords
 Mating Type
 Adolescent Idiopathic Scoliosis
 Transmission Disequilibrium Test
 Locus Heterogeneity
 Empirical Power