Accounting for uncertainty when assessing association between copy number and disease: a latent class model
DOI: 10.1186/1471-2105-10-172
© González et al; licensee BioMed Central Ltd. 2009
Received: 12 November 2008
Accepted: 06 June 2009
Published: 06 June 2009
Abstract
Background
Copy number variations (CNVs) may play an important role in disease risk by altering dosage of genes and other regulatory elements, which may have functional and, ultimately, phenotypic consequences. Therefore, determining whether a CNV is associated or not with a given disease might be relevant in understanding the genesis and progression of human diseases. Current stage technology give CNV probe signal from which copy number status is inferred. Incorporating uncertainty of CNV calling in the statistical analysis is therefore a highly important aspect. In this paper, we present a framework for assessing association between CNVs and disease in case-control studies where uncertainty is taken into account. We also indicate how to use the model to analyze continuous traits and adjust for confounding covariates.
Results
Through simulation studies, we show that our method outperforms other simple methods based on inferring the underlying CNV and assessing association using regular tests that do not propagate call uncertainty. We apply the method to a real data set in a controlled MLPA experiment showing good results. The methodology is also extended to illustrate how to analyze aCGH data.
Conclusion
We demonstrate that our method is robust and achieves maximal theoretical power since it accommodates uncertainty when copy number status are inferred. We have made R functions freely available.
Background
With the recent technological advances, various genome-wide studies have uncovered an unprecedented number of structural variants throughout the human genome [1–3], mainly in the form of copy number variations (CNVs). The considerable number of genes and other regulatory elements that fall within these variable regions make CNVs very likely to have functional and, ultimately, phenotypic consequences [4, 5]. In fact, recent studies have reported a correlation between copy number of specific genes and degree of disease predisposition [6–8], indicating that identification of DNA copy number is important in understanding genesis and progression of human diseases.
Several techniques and platforms have been developed for genome-wide analysis of DNA copy number, such as array-based comparative genomic hybridization (aCGH). The goal of this approach is to identify contiguous DNA segments where copy number changes are present. The ability of aCGH to distinguish between different numbers of copies is limited, so various quantitative techniques are required for more precise, targeted analysis of genomic regions. For known CNVs, real time PCR assays can be used to compare the copy number status of particular loci in cases and controls. Individuals are typically binned into copy number categories using pre-defined thresholds of probe signal intensity. Recently, Multiplex Ligation-dependent Probe Amplification (MLPA) [9] has also been used to quantify copy number classes. This method allows the analysis of several loci at the same time in a single assay. MLPA is usually used to identify gains or losses in test samples with respect to controls [10], but it can also be used in the context of association studies in a case-control or cohort settings [11, 12].
The statistical methods used in CNV-disease association studies are currently very simple. Quantitative methods give CNV probe signal intensity measurements for each individual as a continuous variable, from which copy number status is inferred, generally using pre-defined thresholds. Differences in copy number distribution between cases and controls are then assessed using χ2, Fisher or Mann-Whitney tests [6, 13, 14]. However, the distribution of CNV probe measurements is continuous and multimodal, meaning that signal intensity should be considered as a mixture of curves. In many instances, these curves overlap with various underlying distributions leading to uncertainty. Therefore, scoring copy number by binning and then assessing the association may lead to misclassification and unreliable results.
Ionita-Laza et al. (2009) pointed out that it is not inmediately clear how this uncertainty of CNV calling should be incorporated in the statistical analysis [15]. To overcome this difficulty in assessing association between CNVs and disease, we propose a latent class (LC) model that incorporates possible uncertainty that appear when CNV calling is performed. After inferring copy number using Gaussian finite mixture distributions, or any other calling algorithm, the model assesses the relationship between the trait and a CNV using a mixture of generalized linear models. Association is then tested using a likelihood ratio procedure. We validate and compare our method with existing methods through a simulation study. We then illustrate how to test association between CNVs and the trait by using two real examples. One of them corresponds to a case-control study using data from a MLPA experiment where the true copy number status is known. The second example belongs to a study where breast cancer cell lines are analyzed using aCGH.
Methods
Inference of copy number status
mutually exclusive latent classes c = 1, ...,
(e.g. copy number status). Instead of observing these classes, we observe a surrogate variable, X, corresponding to a continuous variable arising from any quantitative method. For instance, in targeted studies using MLPA or real-time PCR, X corresponds to peak intensities for each CNV probe. In the context of a whole genome scan, one may have quantitative data from aCGH or any other platform such as Illumina or Affymetrix, where, for each probe, the variable X corresponds to a ratio of intensities. Figure 1 shows a number of possible distributions that signal intensities may have. Some variants clearly show different underlying copy number status with multimodal signal intensities distributions (CNV2, CNV4 and CNV6). In other cases, where the existence of different copy numbers is not clear, inferring copy number by binning the data may be difficult or unfeasible.CNV quantitative measurements. Examples of CNV data showing different clustering quality and copy number status.
classes using the surrogate variable X. We propose to model the unobserved latent classes using a finite mixture model with
components of the form
) is the Gaussian distribution with Θ denoting all model parameters (e.g., Θ = (η
c
,
), c = 1, ...,
), and x is the surrogate variable that corresponds to the quantitative measure of copy number status. For the component weights π
c
it holds that
to be used is chosen by applying Bayesian Information Criteria (BIC) [16]. This mixture model approach for calling is similar to some used for the analysis of aCGH data [17, 18] where correlation among probes should be considered. When analyzing MLPA data, it should be pointed out that in some instances, especially when there are individuals with 0 copies, the intensity distributions (see CNV2 and CNV4 in Figure 1) for a null allele is meant to be equal to 0. However, due to experimental noise it is fact that in some cases this ratio shows values that slightly deviate from this theoretical value. After our experience with hundreds of home-made MLPA probes, the value for null alleles is typically below 0.1; nevertheless, we recommend this parameter to be determined experimentally for each of the probes used in the MLPA experiments using the appropriate control samples. For these cases, the procedure used to estimate the parameters in (1) fails because the underlying distribution of individuals with 0 copies is not normal. In these situations we propose to fit the following mixture model to determine the latent classesLatent class model
Discrete traits
binomial variables, as follows:Quantitative traits
Covariate Adjustment
Parameter estimation
, are known and that they are given by the surrogate variable X from equation (3). Therefore, they can be used in the log-likelihood calculation, resulting in
= c, Z, θ) is given by equations (9) and (10) for discrete and quantitative traits, respectively. The maximum likelihood estimators (MLE) of the model parameters maximize this log-likelihood function. We propose to use a Newton-Raphson procedure to find parameter estimates. The k-th component of the score, S, is given byFormulae for the derivatives of h ic for covariates and for discrete and qualitative traits are given in the Appendix.
where zα/2 denotes the (1 - α/2)-th quantile of a standard normal distribution, α is the desired type-I error, and subindex [·, ·] denotes the position in the inverse of Fisher's information matrix.
Hypothesis testing
. In many instances, we are interested in studying the trend in effect with respect to copy number status (e.g., additive model). This can be done by generalizing equation (11) in the form
+ K). For example, a trend test on copy number status without covariates D would have the formand the trend hypothesis on copy number status is tested using a likelihood ratio test, comparing this model with the null model. Notice that this formulation allows us to accommodate different or common effects for each latent class. In this case, parameter estimates are obtained as shown above. Formulae for the derivatives obtained in the score and Hessian, where coefficients are not shared by each latent class, are shown in the Appendix. R language functions for the methods discussed in this paper are freely available at http://www.creal.cat/jrgonzalez/software.htm[25]
Results
Simulation study
We performed computer simulation studies to empirically examine the properties of the parameter estimators developed in the previous sections. The specific goals of these studies were: (i) to evaluate the performance of the proposed likelihood ratio trend test based on the latent class model for a number of CNV measurement distributions; (ii) to examine the effect of sample size (I) on the distributional properties of the estimators; (iii) to examine the bias and mean square error (MSE) of the estimators; (iv) to assess the accuracy whether of the variance and parameter estimates obtained using the observed information matrix. Simulations were performed as follows: To study (i), we simulated a binary trait using 300 cases and 300 controls. The unobserved copy number statuses (e.g. latent classes) were simulated depending on 3 different copy number status (
= 3), with the proportion of individuals in each category set as π = (0.5, 0.4, 0.1). The trend OR was set equal to 1.5. The observed signal intensity ratio (X variable) were simulated as a finite mixture of
normal distributions using different means, η, and variances, σ2, to assess whether the separation of clusters and their variance affects power.
To study (ii)–(iv) we simulated binary and quantitative traits. For the binary trait, simulation was performed as above but simulating various scenarios of sample size (I), OR and proportion of individuals with each copy number status, π. Again, we simulated different CNV distributions by varying η and σ2. For quantitative traits, we used the same simulation procedure but copy number status was simulated depending on a fixed mean trait level for the reference copy number status and a desired mean difference with respect to other copy number statuses. Next, we describe the settings for the different simulation parameters. Sample size: We chose the values of I: I ∈ {50, 300}. Although current studies are analyzing thousands of individuals, these values were chosen to evaluate the performance of our proposed method in moderately large samples. Copy number status: Since we were interested in evaluating the performance of the parameter estimates, we only simulated two different copy number statuses
= {1, 2}. Odds ratio: To assess the impact of the strength of association between the disease and CNV, we chose two values for OR: OR ∈ {1.3, 2} in order to consider a moderate association and a strong one. Proportion of cases with normal copy number status: To evaluate the impact of classes with different number of individuals we set π ∈ {(0.8, 0.2), (0.5, 0.5)}. Finite mixture: To asses the impact of distribution of intensity ratio, X, we simulated two normal distributions with the following parameters: η ∈ {1, 1.5}, which correspond to having 2 (considered as normal copy number status) and 3 copies, respectively, and σ ∈ {(0.15, 0.15), (0.15, 0.2), (0.2, 0.2)}. In this case, these scenarios also helped us to model different situations regarding misclassification or how latent classes were separated.
We compared three different approaches. The first (NAIVE) was based on assessing association between disease and copy number status obtained using MAP from the finite mixture model (2). That is, association was assessed using a χ2 test from Table 1. The second is the approach that has been used predominantly to date when analyzing this kind of data and is based on assigning CNV status using pre-defined thresholds (THRES). Association is then assessed using a χ2 test. As mentioned previously, we simulated data from two mixtures of normal distributions with means of 1 and 1.5. This is equivalent to simulating individuals with 2 and 3 copies, respectively. In this situation, it is considered that individuals with intensity (or intensity-ratio) greater than 1.33 correspond to individuals with 3 copies [10]. The third method is the one proposed in this paper, based on latent class (LC) using a χ2 test. In order to make the results comparable, the performance of LC based on likelihood ratio trend test was compared with that of the two other methods using a χ2 trend test (e.g. 1 degree of freedom). To evaluate bias and MSE of parameter estimates, χ2 of association was used for all three methods.
Empirical power for simulation studies. Empirical power for the three different approaches analyzed, varying the quality of clustering for underlying copy number status. Left panel is for fixed variance and varying means, while the right panel is for fixed mean and varying variances.
The left panel shows the power for each method, varying the CNV measurement distribution with regard to the mean of each latent class, η, while the right panel gives the same information but with fixed means and varying variances, σ2. Figure 2 also depicts the distribution of CNV signal intensities for various scenarios. We observe that our proposed latent class model performs better in all cases, even when distribution of copy number status are not very well separated (e.g. more uncertainty).
Simulation study
Mean Square Error (×103) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
I | π | e β | σ | SIM | NAIVE | THRES | LC | NAIVE | THRES | LC |
50 | 0.8 | 1.3 | (0.15,0.15) | 1.23 | 1.17 | 1.15 | 1.20 | 57 | 87 | 42 |
50 | 0.8 | 1.3 | (0.2,0.2) | 1.24 | 1.14 | 1.09 | 1.21 | 107 | 131 | 114 |
50 | 0.8 | 1.3 | (0.15,0.2) | 1.28 | 1.18 | 1.15 | 1.24 | 134 | 148 | 112 |
50 | 0.8 | 2 | (0.15,0.15) | 1.60 | 1.40 | 1.28 | 1.48 | 54 | 85 | 44 |
50 | 0.8 | 2 | (0.2,0.2) | 1.82 | 1.36 | 1.29 | 1.52 | 152 | 158 | 126 |
50 | 0.8 | 2 | (0.15,0.2) | 1.89 | 1.42 | 1.33 | 1.57 | 180 | 253 | 162 |
50 | 0.5 | 1.3 | (0.15,0.15) | 1.26 | 1.24 | 1.21 | 1.26 | 39 | 51 | 32 |
50 | 0.5 | 1.3 | (0.2,0.2) | 1.32 | 1.28 | 1.25 | 1.35 | 82 | 79 | 97 |
50 | 0.5 | 1.3 | (0.15,0.2) | 1.26 | 1.23 | 1.20 | 1.26 | 66 | 72 | 60 |
50 | 0.5 | 2 | (0.15,0.15) | 2.04 | 1.94 | 1.83 | 2.05 | 40 | 67 | 34 |
50 | 0.5 | 2 | (0.2,0.2) | 2.04 | 1.76 | 1.68 | 2.05 | 107 | 128 | 92 |
50 | 0.5 | 2 | (0.15,0.2) | 2.06 | 1.78 | 1.72 | 1.99 | 87 | 107 | 71 |
300 | 0.8 | 1.3 | (0.15,0.15) | 1.30 | 1.25 | 1.18 | 1.30 | 13 | 32 | 10 |
300 | 0.8 | 1.3 | (0.2,0.2) | 1.32 | 1.25 | 1.15 | 1.34 | 27 | 50 | 29 |
300 | 0.8 | 1.3 | (0.15,0.2) | 1.30 | 1.22 | 1.16 | 1.29 | 24 | 42 | 21 |
300 | 0.8 | 2 | (0.15,0.15) | 2.01 | 1.87 | 1.49 | 2.01 | 21 | 120 | 13 |
300 | 0.8 | 2 | (0.2,0.2) | 2.03 | 1.70 | 1.36 | 1.99 | 69 | 203 | 43 |
300 | 0.8 | 2 | (0.15,0.2) | 2.03 | 1.62 | 1.38 | 1.86 | 78 | 189 | 38 |
300 | 0.5 | 1.3 | (0.15,0.15) | 1.31 | 1.27 | 1.26 | 1.30 | 7 | 9 | 5 |
300 | 0.5 | 1.3 | (0.2,0.2) | 1.30 | 1.23 | 1.22 | 1.30 | 15 | 17 | 12 |
300 | 0.5 | 1.3 | (0.15,0.2) | 1.30 | 1.24 | 1.23 | 1.29 | 12 | 14 | 9 |
300 | 0.5 | 2 | (0.15,0.15) | 2.00 | 1.87 | 1.77 | 2.00 | 11 | 23 | 5 |
300 | 0.5 | 2 | (0.2,0.2) | 2.00 | 1.72 | 1.66 | 2.02 | 36 | 51 | 15 |
300 | 0.5 | 2 | (0.15,0.2) | 2.00 | 1.76 | 1.71 | 1.97 | 26 | 37 | 10 |
Regarding variance estimates, the estimation based on ASYM showed good performance in all scenarios (see Additional file 1, Table S1). Despite slightly overestimating of EMP, the bias was less pronounced for I = 300, as expected. Confidence intervals based on the LC method outperform those obtained by other methods with regard to power.
Application to real data
MLPA example
Association between Gene 1 and disease. Graphical representation of peak intensities (CNV quantitative measurement) of individuals for Gene 1 analyzed in the example. The various colors indicate copy number status inferred using our proposed finite mixture model.
Association between Gene 2 and disease. Graphical representation of peak intensities (CNV quantitative measurement) of individuals for Gene 2 analyzed in the example. The various colors indicate copy number status inferred using our proposed finite mixture model.
Contingency table of estimated and true copy number status for the two genes examined in the real data example.
True copy number status | |||
|---|---|---|---|
0 | 1 | 2 | |
Gene 1 | |||
0 | 426 | 0 | 0 |
1 | 0 | 201 | 0 |
2 | 0 | 0 | 24 |
Gene 2 | |||
0 | 85 | 0 | 0 |
1 | 5 | 287 | 0 |
2 | 0 | 73 | 204 |
Association analysis of disease status and copy number category using the true copy number status and the estimated status obtained using the finite mixture proposed.
True CN | Estimated CN | ||||||
|---|---|---|---|---|---|---|---|
Co | Ca | OR (CI95%) | Co | Ca | ORnaïve (CI95%) | ORLC (CI95%) | |
Gene 1 | |||||||
0 | 210 | 216 | 1 | 210 | 216 | 1 | 1 |
1 | 75 | 126 | 1.63 (1.16,2.30) | 75 | 126 | 1.63 (1.16,2.30) | 1.63 (1.16,2.30) |
2 | 6 | 18 | 2.92 (1.14,7.49) | 6 | 18 | 2.92 (1.14,7.49) | 2.92 (1.14,7.50) |
P association | 0.0027 | 0.0027 | 0.0023 | ||||
P trend | 5.0 × 10-4 | 5.0 × 10-4 | 5.0 × 10-4 | ||||
Gene 2 | |||||||
0 | 24 | 66 | 1 | 22 | 63 | 1 | 1 |
1 | 159 | 201 | 0.46 (0.27,0.77) | 129 | 178 | 0.44 (0.26,0.75) | 0.47 (0.27,0.82) |
2 | 108 | 93 | 0.31 (0.18,0.54) | 140 | 119 | 0.33 (0.19,0.57) | 0.31 (0.18,0.54) |
P association | 7.2 × 10-5 | 2.3 × 10-4 | 8.4 × 10-5 | ||||
P trend | 2.1 × 10-5 | 1.0 × 10-4 | 2.1 × 10-5 | ||||
aCGH example
Steps used to assess association between CNVs and traits when aCGH is used.
Step 1. Use any aCGH calling procedure that provides MAP (uncertainty) |
|---|
Step 2. Build blocks/regions of consecutive probes with similar signatures |
Step 3. Use the signature that occurs most in a block to perform association unsing LC model |
Step 4. Correct for multiple testing considering dependency among signatures |
We applied the methodology to the breasts cancer data studied by Neve et al. [29], which is freely available from the bioconductor website http://www.bioconductor.org/[30]. The data consists on CGH arrays of 1 MB resolution [31]. The authors chose the 50 samples that could be matched to the name tokens of caArrayDB data (June 9th 2007).
Number of CNV blocks (out of 459) associated with estrogen receptor positivity from 50 aCGH breast cancer cell lines.
Significance level | |||||
|---|---|---|---|---|---|
10-6 | 10-5 | 10-4 | 10-3 | 10-2 | |
Latent class model | 1 | 4 | 27 | 64 | 117 |
Chi-square test | 0 | 2 | 10 | 41 | 93 |
Discussion
In this paper we have shown that the assessment of association between CNVs and disease using analysis methods that do no take into account uncertainty when inferring copy number status lead to larger p-values and underestimate the model parameters. This confounds the need to increase statistical power, which is reduced by the multiple comparison correction for the simultaneous testing of several loci. False positives are typically controlled by a dramatic reduction in the nominal p-value, such that very low values are required to reach statistical significance. Thus, a precise computation of these values is essential in genetic association studies.
Here we have proposed a latent class model (LC) that accounts for the uncertainty of assessing CNV status and also accommodates potential confounding factors. In the case of analyzing quantitative traits, we also provide formulae to further propagate call uncertainty, as other authors have proposed in another context [32]. By analyzing quantitative traits, we have assumed that the response variable follows a normal distribution, although this assumption does not hold in some instances. In this situation, one possibility is to analyze the log-transformed variable, although log transformation may not be not sufficient. The model could easily be extended to fit a response variable that has any exponential family distribution (e.g. normal, gamma, Poisson). However, we have not yet implemented this option in the functions reported here. The extension of our proposed latent-class model to assess survival time, possibly with right-censored data, is not trivial but could be a very interesting avenue for future investigation. The parameter estimation procedure proposed here, allows the estimation of confidence intervals. The LC model was remarkably consistent with simulated data. In particular, we found that the p-values obtained with the LC model were more similar to the expected values than those obtained by the threshold and naïve methods.
We maximize the likelihood function, assuming fixed weights for each copy number status, which accounts for possible misclassification. The main advantage of considering weights as known constants is that the Newton-Raphson procedure is much simpler, faster and feasible for obtaining the Hessian matrix analytically. We confirmed that the proposed model captures very well the nature of the synthetic data and variance estimates. Interestingly, we observed that the variance estimates using MLE were also reproduced when a bootstrap procedure was used (see Additional file 1, Table S2). In the interest of generalization, one can consider maximizing the likelihood function for both model parameters and weights. In that case, an EM algorithm should be used instead. However, one should bear in mind that EM does not allow for estimation of the variance of the model parameters and is computationally expensive, which may be particularly costly if this method is used in whole genome scan settings.
Conclusion
We have shown that the LC model can incorporate uncertainty of CNV calling in the analysis. We have also illustrated how to analyze quantitative traits as well as how to accomodate confounding variables. This is of particular importance in complex diseases studies where other clinical or biochemical factors need to be taken into account. The formulation can also be generalized to assess survival times or counts in longitudinal studies. The model has showed good performance when analyzing both targeted (MLPA data) and whole genome (aCGH data) studies.
Appendix
= c, Z, θ) is given by equations (9) and (10) for discrete and quantitative traits, respectively. As previously mentioned, the k-th component of the score, S, is given byHerein we provide formulae for the derivatives of h ic for all cases discussed in this paper. Although the following expressions may appear complicated, they are straightforward to program and are included in the >R functions available at http://www.creal.cat/jrgonzalez/software.htm.
Binary Traits
Binary Traits without covariates
Binary Traits with covariates
Quantitative traits
Trend test
Declarations
Acknowledgements
The first author would like to thank Xavier Bassagaña for his comments and helpful conversations about the model proposed. Gavin Lucas is also acknowledged for his comments on a last version of the manuscript. The authors also want to thank helpful comments on how to analyze aCGH data given by one of the reviewers. This work was supported by the Spanish Ministry for Science and Innovation [MTM2008-02457 to JRG and SAF2008-00357 to XE]; and the European Commission [AnEUploidy project; FP6-2005-LifeSciHealth contract #037627].
Authors’ Affiliations
References
- Locke DP, Sharp AJ, McCarroll SA, McGrath SD, Newman TL, Cheng Z, Schwartz S, Albertson DG, Pinkel D, Altshuler DM, Eichler EE: Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am J Hum Genet 2006, 79(2):275–90. 10.1086/505653PubMed CentralView ArticlePubMedGoogle Scholar
- Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Grata-cos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Armengol L, Conrad DF, Es-tivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME: Global variation in copy number in the human genome. Nature 2006, 444(7118):444–54. 10.1038/nature05329PubMed CentralView ArticlePubMedGoogle Scholar
- Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE, Lam WL: A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 2007, 80: 91–104. 10.1086/510560PubMed CentralView ArticlePubMedGoogle Scholar
- Feuk L, Carson AR, Scherer SW: Structural variation in the human genome. Nat Rev Genet 2006, 7(2):85–97. 10.1038/nrg1767View ArticlePubMedGoogle Scholar
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 2007, 315(5813):848–53. 10.1126/science.1136678PubMed CentralView ArticlePubMedGoogle Scholar
- Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs RJ, Freedman BI, Quinones MP, Bamshad MJ, Murthy KK, Rovin BH, Bradley W, Clark RA, Anderson SA, O'Connell RJ, Agan BK, Ahuja SS, Bologna R, Sen L, Dolan MJ, Ahuja SK: The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science 2005, 307(5714):1434–40. 10.1126/science.1101160View ArticlePubMedGoogle Scholar
- Rovelet-Lecrux A, Hannequin D, Raux G, Le Meur N, Laquerriere A, Vital A, Dumanchin C, Feuillette S, Brice A, Vercelletto M, Dubas F, Frebourg T, Campion D: APP locus duplication causes autosomal dominant early-onset Alzheimer disease with cerebral amyloid angiopathy. Nat Genet 2006, 38: 24–6. 10.1038/ng1718View ArticlePubMedGoogle Scholar
- Le Marechal C, Masson E, Chen JM, Morel F, Ruszniewski P, Levy P, Ferec C: Hereditary pancreatitis caused by triplication of the trypsinogen locus. Nat Genet 2006, 38(12):1372–4. 10.1038/ng1904View ArticlePubMedGoogle Scholar
- Schouten JP, McElgunn CJ, Waaijer R, Zwijnenburg D, Diepvens F, G P: Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification. Nucleic Acids Res 2002, 30(12):e57. 10.1093/nar/gnf056PubMed CentralView ArticlePubMedGoogle Scholar
- González J, Carrasco J, Armengol L, Villatoro S, Jover L, Yasui Y, Estivill X: Probe-specific mixed-model approach to detect copy number differences using multiplex ligation-dependent probe amplification (MLPA). BMC Bioinformatics 2008, 9: 261. 10.1186/1471-2105-9-261PubMed CentralView ArticlePubMedGoogle Scholar
- Engert S, Wappenschmidt B, Betz B, Kast K, Kutsche M, Hellebrand H, Goecke T, Kiechle M, Niederacher D, Schmutzler R, Meindl A: MLPA screening in the BRCA1 gene from 1,506 German hereditary breast cancer cases: novel deletions, frequent involvement of exon 17, and occurrence in single early-onset cases. Hum Genet 2008, 29(7):948–58.Google Scholar
- Hansen T, Jonson L, Albrechtsen A, Andersen M, Ejlertsen B, Nielsen F: Large BRCA1 and BRCA2 genomic rearrangements in Danish high risk breast-ovarian cancer families. Breast Cancer Res Treat 2008, in press.Google Scholar
- Aitman T, Dong R, Vyse T, Norsworthy P, Johnson M, Smith J, Mangion J, Roberton-Lowe C, Marshall A, Petretto M, Hodges E, Bhangal G, Patel S, Sheehan-Rooney K, Duda M, Cook P, Evans D, Domin J, Flint J, Boyle J, Pusey C, Cook H: Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature 2006, 439(7078):851–5. 10.1038/nature04489View ArticlePubMedGoogle Scholar
- Fellermann K, Stange D, Schaeffeler E, Schmalzl H, Wehkamp J, Bevins C, Reinisch W, Teml A, Schwab M, Lichter P, Radlwimmer B, Stange E: A chromosome 8 gene-cluster polymorphism with low human beta-defensin 2 gene copy number predisposes to Crohn disease of the colon. Am J Hum Genet 2006, 79(3):439–48. 10.1086/505915PubMed CentralView ArticlePubMedGoogle Scholar
- Ionita-Laza I, Rogers AJ, Lange C, Raby BA, Lee C: Genetic association analysis of copy-number variation (CNV) in human disease pathogenesis. Genomics 2009, 93: 22–26. 10.1016/j.ygeno.2008.08.012PubMed CentralView ArticlePubMedGoogle Scholar
- Fraley C, Raftery AE: How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal 1998, 41: 578–588. 10.1093/comjnl/41.8.578View ArticleGoogle Scholar
- Picard F, Robin S, Lebarbier E, Daudin JJ: A segmentation/clustering model for the analysis of array CGH data. Biometrics 2007, 63(3):758–766. 10.1111/j.1541-0420.2006.00729.xView ArticlePubMedGoogle Scholar
- Wiel MA, Kim KI, Vosse SJ, van Wieringen WN, Wilting SM, Ylstra B: CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics 2007, 23(7):892–894. 10.1093/bioinformatics/btm030View ArticlePubMedGoogle Scholar
- Leisch F: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software 2004, 11(8):1–18.View ArticleGoogle Scholar
- Du J: Combined Algorithms for Fitting Finite Mixture Distributions. PhD thesis. McMaster University, Ontario, Canada; 2002.Google Scholar
- Bashir S, Duffy S: The correction of risk estimates for measuremente error. Ann Epidem 1993, 7: 156–164.Google Scholar
- Davidov O, Faraggi D, Reiser B: Misclassification in logistic regression with discrete covariates. Biometrical Journal 2003, 5: 541–553. 10.1002/bimj.200390031View ArticleGoogle Scholar
- Greenland S: Basic methods for sensitivity analysis of biases. Int J Epi 1996, 25: 1107–1115. 10.1093/ije/25.6.1107-aView ArticleGoogle Scholar
- Spiegelman D, Rosner B, Logan R: Estimation and inference for logistic regression with covariate missclassification and measurement error, in main study/validation study designs. J Am Stat Assoc 2000, 95: 51–61. 10.2307/2669522View ArticleGoogle Scholar
- CREAL's web-page[http://www.creal.cat/jrgonzalez/software.htm]
- Wiel M, van Wieringen W: CGHregions: dimension reduction for array CGH data with minimal information loss. Cancer Informatics 2007, 2: 55–63.Google Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Statist Soc Ser B 1995, 57: 289–300.Google Scholar
- Sarkar S: False discovery and false nondiscovery rates in single-step multiple testing procedures. The Annals of Statistics 2006, 34: 394–415. 10.1214/009053605000000778View ArticleGoogle Scholar
- Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, Speed T, Spellman PT, DeVries S, Lapuk A, Wang NJ, Kuo WL, Stilwell JL, Pinkel D, Albertson DG, Waldman FM, McCormick F, Dickson RB, Johnson MD, Lippman M, Ethier S, Gazdar A, Gray JW: A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006, 10(6):515–527. 10.1016/j.ccr.2006.10.008PubMed CentralView ArticlePubMedGoogle Scholar
- Bioconductor's web-page[http://www.bioconductor.org/]
- M Neve et al in Gray Lab at LBL: Neve2006: expression and CGH data on breast cancer cell lines. [R package version 0.1.6].Google Scholar
- van Wieringen WN, Wiel MA: Nonparametric testing for DNA copy number induced differential mRNA gene expression. Biometrics 2009, 65: 19–29. 10.1111/j.1541-0420.2008.01052.xView ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.






























































