 Methodology article
 Open Access
A Hidden Markov Model to estimate population mixture and allelic copy-numbers in cancers using Affymetrix SNP arrays
 Philippe Lamy^{1},
 Claus L Andersen^{2},
 Lars Dyrskjot^{2},
 Niels Torring^{2} and
 Carsten Wiuf^{1, 2}
https://doi.org/10.1186/1471-2105-8-434
© Lamy et al; licensee BioMed Central Ltd. 2007
 Received: 10 May 2007
 Accepted: 09 November 2007
 Published: 09 November 2007
Abstract
Background
Affymetrix SNP arrays can interrogate thousands of SNPs at the same time. This allows us to look at the genomic content of cancer cells and to investigate the underlying events leading to cancer. Genomic copy-numbers are today routinely derived from SNP array data, but the proposed algorithms for this task most often disregard the genotype information available from germline cells in paired germline-tumour samples. Including this information may deepen our understanding of the "true" biological situation, e.g. by enabling analysis of allele-specific copy-numbers. Here we rely on matched germline-tumour samples and have developed a Hidden Markov Model (HMM) to estimate allelic copy-number changes in tumour cells. With this approach we are also able to estimate the proportion of normal cells in the tumour (mixture proportion).
Results
We show that our method is able to recover the underlying copy-number changes in simulated data sets with high accuracy (above 97.71%). Moreover, although the known copy-numbers could be well recovered in simulated cancer samples with more than 70% cancer cells (and less than 30% normal cells), we demonstrate that including the mixture proportion in the HMM increases the accuracy of the method. Finally, the method is tested on HapMap samples and on bladder and prostate cancer samples.
Conclusion
The HMM method developed here uses the genotype calls of germline DNA and the allelic SNP intensities from the tumour DNA to estimate allelic copy-numbers (including changes) in the tumour. It differentiates between different events such as uniparental disomy and allelic imbalances. Moreover, the HMM can estimate the mixture proportion, and thus inform about the purity of the tumour sample.
Keywords
 Hidden Markov Model
 Transition Parameter
 Hidden State
 Allelic Imbalance
 Mixture Proportion
Background
Chromosomal abnormalities such as loss-of-heterozygosity (LOH) or genomic copy-number changes are frequent in tumour cells. LOH occurs when a heterozygous marker in germline DNA of an individual becomes homozygous in cancer DNA of the same individual. This event is the result of losing one allele of a chromosomal region while the other allele is retained, duplicated (uniparental disomy), or multiplied (uniparental polysomy). In the same way, chromosomal amplifications can be unbalanced (if only one allele of a chromosomal region is multiplied) or balanced (if both alleles are multiplied). Detecting chromosomal abnormalities is important in cancer research as it allows the discovery of chromosomal regions possibly harbouring cancer-related genes such as tumour suppressor genes or oncogenes. It may also be used to identify genomic markers (i.e. chromosomal abnormalities) that may distinguish between clinically important stages in the disease course, e.g. markers of metastasis or markers of treatment response.
Single nucleotide polymorphisms (SNPs) account for most of the genetic variation in the human genome. They occur every 100 to 300 bases along the 3-billion-base human genome [1]. Different techniques (e.g. Illumina [2], Affymetrix [3], Perlegen [4]) have been developed in order to genotype thousands of SNPs distributed all over the genome at the same time. In this paper, we focus on Affymetrix SNP arrays, but note that the method we have developed can be applied to data obtained from other experimental platforms as well.
The Affymetrix technique is based on genomic hybridization to synthetic high-density oligonucleotide microarrays. Each of the two alleles of a SNP is represented by 10 oligonucleotides (together called a probeset) and hybridization (probe) intensities are measured for all probes in the probeset [3]. Different algorithms [5–8] have been developed to genotype SNPs correctly from the Affymetrix intensities. A very high accuracy and concordance of genotype calls is observed for normal samples as the ploidy is always two. However, it is much more difficult to genotype cancer samples due to genomic alterations that might change the ploidy.
Hidden Markov Models (HMMs) have been used extensively to recover unobserved underlying states that give rise to an observed sequence of data. In relation to LOH analyses, HMMs have been used to infer whether an allele is lost or retained (i.e. two hidden states) from genotype data [9–11]. Lin et al. [10] and Koed et al. [9] developed HMM methods that score the presence of allelic imbalance mainly based on converted SNPs (when an AB call becomes AA or BB in the cancer sample). In [11], Beroukhim et al. describe an HMM-based method to identify LOH from unpaired tumour samples. They use the genotype calls to identify whether a SNP marker is in a retention state or in a LOH state. By integrating copy-number analysis into the analysis, they can distinguish LOH from allelic imbalance. However, the LOH analysis and the copy-number analysis are performed separately. Besides, the LOH analysis is highly dependent on the genotype calls even if the possibility of genotyping errors is taken into account.
HMMs have also been used for copy-number analysis. In [12], Fridlyand et al. developed an HMM to analyse microarray-based comparative genomic hybridization (array CGH) data. In [13], Zhao et al. developed a method to infer DNA copy-numbers using Affymetrix SNP arrays. They combined probeset intensities for each SNP into a single value and used the values as an observed sequence of data in their HMM. These methods are not allele-specific and thus cannot distinguish e.g. retention (keeping both alleles) from uniparental disomy (losing one allele and duplicating the other one), which appears to be very important and widespread in certain cancers [14].
More recently, methods to infer allele-specific copy-numbers have been published [15, 16]. LaFramboise et al. [15] used a circular binary segmentation (CBS) algorithm which was originally used for array CGH [17]. Huang et al. [16] used a kernel smoothing method to estimate allelic copy-number changes. In [18], Nannya et al. describe an HMM to infer allelic copy-numbers that is based on the observed sequence of SNP intensity ratios for which the corresponding normal SNP markers are heterozygous.
In this study, we developed an HMM method to infer allele-specific copy-numbers using Affymetrix SNP arrays. The method works on paired normal-tumour samples. It takes as input the genotype calls of the normal sample and the allele-specific intensities of the tumour sample, and outputs the estimated copy-number states of each allele for each SNP. To limit the state space of the underlying Markov chain, we restricted the possible copy-numbers of each allele to 0, 1, 2 and > 2. Many tumour samples contain a large fraction of normal cells and this potentially affects the performance of the method. We therefore included the possibility to estimate the population mixture (proportion of cancer cells; henceforth called mixture proportion) from the data and used this in the analysis. We did this in a way similar to Fridlyand et al.'s method for array CGH [12]. We tested our HMM on simulated data sets, on normal samples from the HapMap project and on bladder and prostate tumour samples.
Results and Discussion
We first normalized the 90 HapMap arrays and the 134 cancer arrays and transformed the allele intensities as described in Methods. We then selected SNPs for each of the three groups of arrays: the HapMap, bladder and prostate groups. The selection was done using only the normal samples from each group as described in Methods. After selection, we had 17,198 SNPs selected for the HapMap group, 15,237 SNPs for the bladder group and 17,541 SNPs for the prostate group.
Estimating the parameters
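Table 1. Estimation of the transition parameters and the states

| True p | Estimate | True r = 0.001 | True r = 0.01 | True r = 0.05 | True r = 0.1 |
|---|---|---|---|---|---|
| 0.001 | p | 0.00115 | 0.00104 | 0.00101 | 0.00107 |
| 0.001 | r | 0.00108 | 0.01028 | 0.04889 | 0.09462 |
| 0.001 | Accuracy | 0.99965 | 0.99893 | 0.99487 | 0.99227 |
| 0.01 | p | 0.00959 | 0.00994 | 0.00934 | 0.01006 |
| 0.01 | r | 0.00112 | 0.01057 | 0.04931 | 0.09588 |
| 0.01 | Accuracy | 0.99725 | 0.99622 | 0.99303 | 0.98850 |
| 0.05 | p | 0.04861 | 0.04831 | 0.04794 | 0.04849 |
| 0.05 | r | 0.00109 | 0.01034 | 0.04812 | 0.09402 |
| 0.05 | Accuracy | 0.99180 | 0.99000 | 0.98517 | 0.98018 |
| 0.1 | p | 0.09551 | 0.09356 | 0.09603 | 0.09300 |
| 0.1 | r | 0.00140 | 0.01006 | 0.04857 | 0.09480 |
| 0.1 | Accuracy | 0.99233 | 0.98980 | 0.98269 | 0.97708 |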
Accuracy of the method when the sample is a population mixture
In the simulated data, amplification of an allele always implies that the copy-number increases by one; e.g. if the SNP is heterozygous then amplification of the A allele results in two A alleles. In real data, this is not always true: amplification may increase the allelic copy-number by more than one. Thus it is easier for the method to recover the true hidden states in the simulated data than in real data.
Estimating the population mixture
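Table 2. Estimation of the mixture proportion

| True mixture | Estimated mixture, average | Estimated mixture, stdev | Accuracy (%), without mixture | Accuracy (%), with mixture |
|---|---|---|---|---|
| 0.60 | 0.603 | 0.012 | 90.31 | 95.45 |
| 0.70 | 0.719 | 0.069 | 94.43 | 96.16 |
| 0.80 | 0.803 | 0.009 | 96.87 | 97.52 |
| 0.90 | 0.902 | 0.014 | 98.16 | 98.25 |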
Reliable information about the mixture is only available if the sample contains SNPs with different copy-number alterations. For example, if the observed copy-number of a SNP is 4.7, then it is not possible to distinguish between a mixture of 1) 90% tumour cells with 5 copies and 10% normal cells (2 copies) and 2) 54% tumour cells with 7 copies and 46% normal cells. However, if SNPs exist in several different states, then it becomes possible to distinguish between different mixtures. In case 1), a SNP in state 1 will have an observed copy-number of 1.1 and in case 2), the observed copy-number will be 1.46.
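The arithmetic behind this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `observed_copy_number` is ours, and it assumes that normal cells carry 2 copies and that the observed value is the weighted average of the normal and tumour copy-numbers.

```python
def observed_copy_number(m, tumour_copies, normal_copies=2):
    """Average copy-number in a sample containing a proportion m of tumour cells."""
    return (1 - m) * normal_copies + m * tumour_copies


# An amplified SNP cannot separate the two mixtures: both give about 4.7.
case1_amplified = observed_copy_number(0.90, 5)
case2_amplified = observed_copy_number(0.54, 7)

# A SNP in state 1 (one copy left in the tumour cells) does separate them.
case1_deleted = observed_copy_number(0.90, 1)
case2_deleted = observed_copy_number(0.54, 1)

print(round(case1_amplified, 2), round(case2_amplified, 2))
print(round(case1_deleted, 2), round(case2_deleted, 2))
```

The first pair of values coincides, while the second pair differs (1.1 against 1.46), which is why SNPs in several different states are needed to pin down the mixture.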
Table 3. Estimation of the mixture proportion (m) in the bladder and the prostate groups

| Cancer type | Number of samples | Number of samples with m = 1 | Average (m < 1) | Stdev (m < 1) |
|---|---|---|---|---|
| Bladder | 18 | 12 | 0.793 | 0.193 |
| Prostate | 25 | 21 | 0.93 | 0.055 |
Varying the transition parameters across the chromosomes
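Table 4. Comparison of two estimation methods on 40 simulated samples

| Method | Calls used | Number of samples | Accuracy (%), average | Accuracy (%), stdev | Average posterior probability, true state | Average posterior probability, false state |
|---|---|---|---|---|---|---|
| One-array | only normal heterozygous calls | 20 | 99.495 | 0.058 | 0.997 | 0.812 |
| One-array | all calls | 20 | 98.548 | 0.147 | 0.992 | 0.845 |
| All-array | only normal heterozygous calls | 20 | 99.552 | 0.042 | 0.997 | 0.806 |
| All-array | all calls | 20 | 98.653 | 0.150 | 0.992 | 0.830 |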
In order to account for the similarities between different samples from the same cancer type, we also estimated the transition parameters for each chromosome across all samples of a group. This modified version allows the chromosomes to be ranked according to the frequency of changes occurring as reflected in the estimated transition parameters. We ran the modified version on the same 40 simulated samples (Table 4). As expected, we achieved a slightly better accuracy in recovering the hidden states. The two methods agreed on 99.70% of the recovered states when we put no restrictions on the genotypes and on 99.86% of the recovered states when all SNPs were assumed to be heterozygous.
We applied our all-array method to the set of bladder and prostate tumours and compared the results with those obtained when analysing one sample at a time. The two methods agreed on 95.71% of the states for the bladder group and on 96.24% of the states for the prostate group. From the results of the all-array method, we were also able to classify chromosomes according to how often a change in copy-number occurs. For the bladder group, copy-number changes occurred most often in chromosomes 8 and 9. These two chromosomes are known to be frequently abnormal in bladder tumours [22, 23]. For the prostate group, copy-number changes occurred most often in chromosomes 3, 7, 8 and 16. A combined analysis of published CGH studies [24] and a study based on SNP arrays [25] showed that these chromosomes are frequently abnormal in prostate tumours.
Uniparental disomy
Using the homozygous SNPs to estimate allelic copy-number changes
In our HMM approach, there are two ways to estimate allelic copy-number changes. One can choose to use only the SNPs which are heterozygous in the germline sample or to use all SNPs including the homozygous SNPs. Both ways have been tested here. Using only the SNPs which are heterozygous in the germline sample is the best way to obtain good estimates of the underlying states because all states of the HMM are differentiable. In this paper, we obtained a high rate of recovery (above 99.40%, Table 4) when the simulated samples had only heterozygous calls. However, the average heterozygosity in the Affymetrix GeneChip 100 k SNP arrays is only around 0.3 [27]. This implies that less than one third of the SNPs are heterozygous in a normal sample. Therefore, including SNPs with homozygous calls in the germline sample may improve the resolution of the map of allelic copy-number changes. When we included all genotype calls in the analysis, we still obtained a high rate of recovery (above 98.50%, Table 4). SNPs with homozygous genotypes in the germline sample can also differentiate different states based on their different copy-numbers. However, some states are very similar: states 0 and 3 or states 3, 4 and 5. The presence of heterozygous SNPs helps in differentiating these states. Conversely, the presence of homozygous SNPs might help in differentiating between heterozygous states (e.g. states 0 and 4; see Figure 1); for example, if noise is corrupting the signal from a heterozygous SNP with homozygous neighbours, then the copy-numbers of the neighbours can point to whether the heterozygous SNP is in state 0 or 4.
Comparison with PLASQ
PLASQ [15] was run on 10 of the real samples (6 prostate and 4 bladder samples). The states estimated by PLASQ were converted into the corresponding states in our model and the results were compared. Agreement between the two methods was on average 90.47% (ranging from 77.05% to 98.11%). Generally, our method detects more abnormalities than PLASQ. This is concordant with previous observations concerning PLASQ, where it has been found that PLASQ is conservative [28]; i.e. PLASQ has a tendency to prefer the normal state. In order to be more conservative, we ran our HMM with a higher standard deviation in the emission density for the normal state. As expected, agreement between the two methods increased to an average of 95.02% (ranging from 88.86% to 99.38%). Additionally we calculated the average posterior probabilities of the states when both methods agree or disagree. As expected, when both methods agree, the average is higher than when they disagree (0.983 against 0.909).
Conclusion
In this study, we described an HMM-based method to estimate allelic SNP copy-number changes, LOH and allelic imbalance using Affymetrix GeneChip SNP arrays. The method takes as input the genotype call of the germline sample and the allelic SNP intensities of the tumour sample and outputs the estimated copy-number states for each SNP. The different hidden states estimated by the HMM correspond to different events occurring in the cancer cell. A chromosomal region may remain unchanged in a cancer cell, may lose one allele (LOH event) or both alleles (homozygous deletion), may lose one allele and multiply the other one (LOH + uniparental disomy), may multiply one allele (allelic imbalance) or both alleles (see Figure 1). Our method is able to reliably differentiate between these events.
When samples are taken from tumour tissue, they often contain a mixture of normal and cancer cells. Different techniques such as microdissection can help keep the percentage of normal cells low but this is a procedure that cannot be done automatically and is not always done. In this study, we showed that it is possible to estimate the true mixture proportion of a sample. We also showed that knowledge of the mixture proportion improves estimation of the allelic copy-numbers. In fact, the SNP intensity reflects the average copy-number of that particular SNP in the different cells in the sample. However, population mixture of normal and cancer cells might be confused with tumour heterogeneity. Multiclonality has been shown to occur in bladder cancer as well as in prostate cancer and this could also lead to non-integer copy-numbers, i.e. the average over all cells is not an integer. It would be interesting to tackle this issue in future work.
Finally, we discussed the utility of using SNPs which are homozygous in the germline samples in the estimation of allelic copy-number changes. We showed that although they cannot by themselves differentiate between some events, e.g. the normal state and uniparental disomy with a single duplication, they are useful in getting a finer map of copy-number changes in the cancer.
Methods
Materials
We used tumour and blood samples from 38 patients diagnosed with prostate cancer and 29 patients diagnosed with bladder cancer. All the bladder tumour samples were macrodissected. This implies that any connective tissue and muscle tissue were scraped away with a scalpel while looking at the tumour section in a microscope [29]. All the prostate samples were laser microdissected [30]. The GeneChip Mapping 100 K array was applied to all samples. Only the array probes for Xba I cleaved DNA were used. We also downloaded the 100 K Affymetrix SNP arrays from the 30 CEPH trios (90 samples) used in the international HapMap project [31]. Only the Xba arrays were used.
Normalization and allele copy-numbers of SNPs
where PM_{ij}(α) is the intensity of the jth probe of allele α for SNP i. Here j = 1,..., p, where p = 10 is the number of probes in a probeset interrogating one allele, and i = 1,..., 57290 runs over the SNPs on the array.
where α is allele A or B, ${M}_{i}^{2}(\alpha )$ is the mean intensity for allele α in samples with two copies of α, ${M}_{i}^{1}(\alpha )$ is the mean intensity for allele α in samples with one copy of α and c_{1} and c_{2} are SNP-independent parameters (see additional file 1 for an illustration). Note that the means are SNP-dependent. Assuming that the logarithm of the copy-number of a SNP allele is proportional to the logarithm of its intensity, see e.g. [32], we have for C_{i}(α) > 0:
where C_{i}(α) and ${M}_{i}^{c}(\alpha )$ are the copy-number and intensity of allele α in SNP i, respectively. The parameters α_{i} and b_{i} are SNP-specific. Here we allow C_{i}(α) to be an arbitrary number to allow for mixed samples.
C_{ i }(α) > 0. This equation remains true if C_{ i }(α) is not an integer.
As we have only I_{ i }(α), an estimate of ${M}_{i}^{c}(\alpha )$, we can only obtain ${X}_{i}^{c}(\alpha )$, an estimate of log_{2}(C_{ i }(α)). We assume that ${X}_{i}^{c}(\alpha )$ is normally distributed around log_{2}(C_{ i }(α)) with standard deviation σ_{ c }.
The parameters ${M}_{i}^{1}(\alpha )$, c_{1}, c_{2}, σ_{1}, σ_{2} were estimated using the HapMap data set based on the knowledge of the allelic copy-numbers of each SNP; i.e. 0, 1, or 2 depending on whether the SNP is heterozygous or homozygous. Here c_{1} = 0.38, c_{2} = 1.08, σ_{1} = 0.3 and σ_{2} = 0.35. We assume that σ_{c} = σ_{2} for c > 2. When the copy-number is 0, we can still use equation 4 with ${X}_{i}^{0}(\alpha )$ being normally distributed with mean -2 and σ_{0} = 0.55 (empirical observation; see additional file 2). A level of -2 corresponds to 2^{-2} = 0.25 copies, and not to 0 copies. This slightly elevated level can be explained by cross-hybridization and background noise. Those values were also obtained using the HapMap data set.
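To make the noise model concrete, the following minimal sketch evaluates the normal log-density of an observed log2 copy-number for a single allele and picks the most likely integer copy-number. It uses the standard deviations quoted above and takes the zero-copy mean to be log2(0.25) = -2; the function names and the candidate set (0 to 3 copies) are illustrative assumptions, not part of the published method.

```python
import math

# Standard deviations quoted in the text; sigma_c = sigma_2 is assumed for c > 2.
SIGMA = {0: 0.55, 1: 0.30, 2: 0.35}


def emission_params(copies):
    """Mean and standard deviation of the observed log2 copy-number of one allele."""
    if copies == 0:
        # Zero copies are modelled at log2(0.25) = -2 rather than at minus infinity.
        return -2.0, SIGMA[0]
    return math.log2(copies), SIGMA.get(copies, SIGMA[2])


def log_density(x, copies):
    """Log of the normal density of the observation x given the copy-number."""
    mu, sd = emission_params(copies)
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)


def most_likely_copies(x, candidates=(0, 1, 2, 3)):
    """Copy-number with the highest density for one observation (no HMM smoothing)."""
    return max(candidates, key=lambda c: log_density(x, c))


for x in (-2.0, 0.0, 1.0):
    print(x, most_likely_copies(x))
```

This per-SNP classification ignores the neighbouring SNPs; in the HMM these densities are instead combined with the transition probabilities between states.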
Selection of SNPs
We selected only the SNPs that conformed well to the model. Those that do not conform to the model are less likely to be useful for copy-number analysis [8]. The selection is based on the normal samples from each group: the 90 HapMap samples, the 38 normal samples from the prostate group and the 29 normal samples from the bladder group. A SNP is selected if it has a high call rate (above 90%; Affymetrix genotype call) and if there is a high correspondence between the inferred allelic copy-number given by equation 4 and the true allelic copy-number given by the genotype (see [8] for more details).
A Hidden Markov Model (HMM) to estimate the allelic copy-number
The model
We used an HMM to estimate the allelic copy-number of the selected SNPs. Our HMM has six states (Figures 1A, 1B) corresponding to the germline state (state 0) and five chromosomal abnormalities: heterozygous deletion (state 1), homozygous deletion (state 2), uniparental di/polysomy (state 3), unbalanced amplification (state 4) and balanced amplification (state 5).
We defined the transition matrix using 3 parameters (Figure 1C). The transition probabilities are the probabilities of moving from one state to another state, when moving from one SNP to its neighbour SNP. The p-parameter is the probability of moving from the germline state (state 0) to an abnormal state (states 1 to 5). The r-parameter is the probability of moving from one abnormal state to another different abnormal state and the ε-parameter is the probability of what is considered an improbable transition (Figure 1D). We considered a transition improbable if the transition implies two breakpoints between two consecutive SNPs. For example, the transition between the germline state (state 0) and the uniparental di/polysomy state (state 3) is improbable as it implies one breakpoint to lose one allele and a second breakpoint to multiply the other allele.
For each state, the emission density is defined as a bivariate normal distribution where the mean is the logarithm of the allelic copy-numbers and the covariance matrix (including σ_{c}) is estimated from the normal samples. For the states with copy-numbers 2+, 3+ or 4+, we take the mean to be the logarithm of 2, 3 or 4 copies, respectively. Looking at the HapMap data set, we could estimate the mean of ${X}_{i}^{0}$. When a SNP marker was homozygous in the germline DNA, we defined the emission density as a univariate normal density, as only one allele could be present in the cancer cells.
The Viterbi algorithm is used to recover the hidden states and a modified version of the Baum-Welch algorithm is used to estimate the p- and r-parameters. The ε-parameter is set to an arbitrary but small value. Here ε = 0.00001.
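As a rough illustration of how a three-parameter transition matrix and Viterbi decoding fit together, consider the sketch below. It is not the authors' implementation: the exact layout of the matrix is given in Figure 1C, which is not reproduced here, so the code assumes that p and r apply to each individual target state, that p is also used for returning to state 0, that the diagonal takes the remaining probability, and that the set of improbable transitions is supplied by the caller (only the pair 0 and 3 is named in the text). The emission log-densities are passed in as a plain table.

```python
import math

N_STATES = 6  # state 0 = germline, states 1 to 5 = abnormal


def transition_matrix(p, r, eps, improbable):
    """Row-stochastic 6x6 matrix built from p, r and eps (layout is an assumption)."""
    T = [[0.0] * N_STATES for _ in range(N_STATES)]
    for i in range(N_STATES):
        for j in range(N_STATES):
            if i == j:
                continue
            if (i, j) in improbable:
                T[i][j] = eps  # would need two breakpoints between neighbouring SNPs
            elif i == 0 or j == 0:
                T[i][j] = p    # assumption: p is also used for the return to state 0
            else:
                T[i][j] = r
        T[i][i] = 1.0 - sum(T[i])  # remaining mass: stay in the same state
    return T


def viterbi(log_emis, T, log_init):
    """Most probable state path; log_emis[t][s] is the log emission density at SNP t."""
    n = len(T)
    log_T = [[math.log(x) for x in row] for row in T]
    score = [log_init[s] + log_emis[0][s] for s in range(n)]
    back = []
    for t in range(1, len(log_emis)):
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda k: prev[k] + log_T[k][s])
            ptr.append(best)
            score.append(prev[best] + log_T[best][s] + log_emis[t][s])
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy usage: three SNPs favouring state 0 followed by three favouring state 1.
T = transition_matrix(p=0.01, r=0.01, eps=1e-5, improbable={(0, 3), (3, 0)})
log_emis = [[0.0 if s == k else -10.0 for s in range(N_STATES)]
            for k in (0, 0, 0, 1, 1, 1)]
print(viterbi(log_emis, T, [math.log(1.0 / N_STATES)] * N_STATES))
```

Working in log space avoids numerical underflow along chromosomes with thousands of SNPs. Estimating p and r with Baum-Welch would additionally require the forward-backward recursions, which are omitted here.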
Simulation of data sets
In order to test if the method can recover known transition parameters and known states, we simulated data sets with different transition parameters. For the simulation, we used 18 arrays from the international HapMap project in order to estimate the noise corresponding to 0, 1 or 2 copies of an allele. The noise was defined as the difference between the observed log copy-number ${X}_{i}^{c}(\alpha )$ and the true log copy-number, log_{2}(C_{i}(α)). To estimate the noise corresponding to 0 copies, we used a log copy-number equal to -2, as described previously. Then, we used one HapMap sample and replaced the normalized intensity for each SNP and allele by a simulated value corresponding to a known state with noise estimated from the HapMap sample. The states were determined randomly using the HMM. Here, all the SNPs were given a heterozygous call.
Further, we simulated a mixture of cancer and normal cells. Here we determined the observed values by adding noise obtained from the HapMap data set to the allelic copy-number defined as follows:
C_{O} = (1 - m)C_{N} + mC_{T}    (5)
where C_{O} is the observed allelic copy-number (i.e. the average copy-number in the mixture population), C_{N} and C_{T} are the allelic copy-numbers in the normal and the cancer cells and m is the proportion of cancer cells present in the mixture.
Including the mixture proportion
The mixture model
We modified the HMM defined above in order to account for a mixture of normal and cancer cells. This was done in the emission probabilities where the allelic copy-number was considered a weighted sum of the copy-numbers from the normal and the abnormal cells (see equation 5). Then we used an iterative procedure to estimate the mixture proportion, m, in a sample (m is the proportion of cancer cells).

Initialisation: We ran the method on the sample assuming there is no mixture (m = 100%) and obtained a sequence of hidden states corresponding to the sample.

Update 1: Assuming the sequence of hidden states, we used a least-squares method to fit the optimal mixture value m.

Update 2: Assuming m, we applied the method with the mean intensity given as in equation 5. A new sequence of hidden states was obtained.

Iteration step: We repeated Update 1 and Update 2 until the mixture m did not change.
As only SNPs (or alleles) in abnormal states can help in obtaining an estimate of the mixture level, a minimum number of copy-number changes must occur in order to obtain a reasonable estimate. The iterative procedure was applied only to samples containing more than 5% of abnormal states after the initialisation.
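The alternation between state recovery and mixture fitting can be sketched as follows. This is a hedged illustration rather than the published procedure: it assumes the least-squares fit is done on the linear copy-number scale using the weighted-sum model (the paper does not state the scale), it clips m to the interval [0, 1], and it takes a hypothetical `decode(observed, m)` callable that stands in for running the HMM and returning one tumour allelic copy-number per SNP.

```python
def fit_mixture(observed, tumour, normal):
    """Least-squares m for observed = (1 - m) * normal + m * tumour (Update 1)."""
    num = den = 0.0
    for c_o, c_t, c_n in zip(observed, tumour, normal):
        d = c_t - c_n  # zero for unchanged alleles, which carry no information on m
        num += d * (c_o - c_n)
        den += d * d
    if den == 0.0:
        raise ValueError("no allele in an abnormal state: m cannot be estimated")
    return min(1.0, max(0.0, num / den))  # clipping to [0, 1] is our choice


def estimate_mixture(observed, normal, decode, min_abnormal=0.05,
                     tol=1e-4, max_iter=50):
    """Alternate between decoding states and fitting m until m stops changing."""
    m = 1.0
    tumour = decode(observed, m)  # initialisation: assume no mixture
    abnormal = sum(t != n for t, n in zip(tumour, normal))
    if abnormal <= min_abnormal * len(normal):
        return m, tumour  # too few changes for a reliable estimate
    new_m = m
    for _ in range(max_iter):
        new_m = fit_mixture(observed, tumour, normal)  # Update 1
        tumour = decode(observed, new_m)               # Update 2
        if abs(new_m - m) < tol:
            break
        m = new_m
    return new_m, tumour
```

Because alleles that are unchanged in the tumour contribute nothing to the fit, the estimate rests entirely on the abnormal states, which is the motivation for the 5% threshold.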
Simulation of data sets
In order to test the iterative procedure on more realistic simulated samples, we designed the simulations in a different way. We first ran the HMM on the real bladder and prostate tumour samples, then we used the sequence of hidden states recovered to produce new simulated samples with known population mixture. The observed allelic copy-numbers were determined as in equation 5.
The all-array method
Until now, the transition parameters were estimated for each sample but were the same for all chromosomes. Here, we modified the method to allow different transition parameters for each chromosome.
Simulation of data sets
We simulated two data sets of 20 samples where the transition parameters were randomly chosen between 0.001 and 0.05 on the logarithmic scale. For each of the 40 samples, a sequence of hidden states was determined according to the HMM. Each chromosome had its own transition parameters across all samples. In the first data set of 20 samples, only heterozygous SNP intensities were simulated. Each sample had the same number of SNPs and the same positions in the genome as the HapMap samples. In the second data set of 20 samples, each sample had the same number of SNPs, the same positions in the genome and the same genotype calls as one randomly chosen HapMap sample. SNP intensities were simulated according to the genotype call and to the hidden state determined previously for this SNP.
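One way to read "randomly chosen between 0.001 and 0.05 on the logarithmic scale" is that the logarithm of each parameter is drawn uniformly. The sketch below draws one (p, r) pair per chromosome under that reading; the function name, the fixed seed and the restriction to the 22 autosomes are illustrative assumptions.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so that the simulation is reproducible
chromosomes = range(1, 23)  # assumption: autosomes only

# One pair of transition parameters per chromosome, shared by all samples.
params = {
    chrom: {
        "p": sample_log_uniform(0.001, 0.05, rng),
        "r": sample_log_uniform(0.001, 0.05, rng),
    }
    for chrom in chromosomes
}

for chrom in (1, 2, 3):
    print(chrom, params[chrom])
```

Sampling on the log scale gives small values such as 0.001 to 0.005 about the same weight as 0.01 to 0.05, whereas uniform sampling on the raw scale would rarely produce the smallest transition rates.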
Declarations
Acknowledgements
PL and CW are supported by the Danish Cancer Society. CLA and LD are supported by the Danish Research Council and the John and Birthe Meyer Foundation. NT is supported by the Danish Cancer Society and the John and Birthe Meyer Foundation. Karsten Zieger is thanked for helpful discussions.
References
1. The NCBI dbSNP database. [http://www.ncbi.nlm.nih.gov/projects/SNP/index.html]
2. Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, Yeakley J, Bibikova M, Wickham Garcia E, McBride C, Steemers F, Garcia F, Kermani BG, Gunderson K, Oliphant A: High-throughput SNP genotyping on universal bead arrays. Mutat Res. 2005, 573: 70-82.
3. Matsuzaki H, Dong S, Loi H, Di X, Liu H, Hubbell E, Law J, Berntsen T, Chadha M, Hui H, Yang G, C KG, Webster TA, Cawley S, Walsh PS, Jones KW, Fodor SPA, Mei R: Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat Methods. 2004, 1: 109-111. 10.1038/nmeth718.
4. Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR: Whole-genome patterns of common DNA variation in three human populations. Science. 2005, 307: 1072-1079. 10.1126/science.1105436.
5. Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, Su X, Cao M, Chen W, Zhang J, Liu W, Yang G, Di X, Ryder T, He Z, Surti U, Phillips MS, Boyce-Jacino MT, Fodor SP, Jones KW: Large-scale genotyping of complex DNA. Nat Biotechnol. 2003, 21: 1233-1237. 10.1038/nbt869.
6. Di X, Matsuzaki H, Webster TA, Hubbell E, Liu G, Dong S, Bartell D, Huang J, Chiles R, Yang G, Shen MM, Kulp D, Kennedy GC, Mei R, Jones KW, Cawley S: Dynamic model based algorithms for screening and genotyping over 100 K SNPs on oligonucleotide microarrays. Bioinformatics. 2005, 21: 1958-1963. 10.1093/bioinformatics/bti275.
7. Rabbee N, Speed TP: A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics. 2006, 22: 7-12. 10.1093/bioinformatics/bti741.
8. Lamy P, Andersen CL, Wikman FP, Wiuf C: Genotyping and annotation of Affymetrix SNP arrays. Nucleic Acids Res. 2006, 34: e100. 10.1093/nar/gkl475.
9. Koed K, Wiuf C, Christensen LL, Wikman FP, Zieger K, Moller K, von der Maase H, Orntoft TF: High-density single nucleotide polymorphism array defines novel stage and location-dependent allelic imbalances in human bladder tumors. Cancer Res. 2005, 65: 34-45.
10. Lin M, Wei LJ, Sellers WR, Lieberfarb M, Wong WH, Li C: dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics. 2004, 20: 1233-1240. 10.1093/bioinformatics/bth069.
11. Beroukhim R, Lin M, Park Y, Hao K, Zhao X, Garraway LA, Fox EA, Hochberg EP, Mellinghoff IK, Hofer MD, Descazeaud A, Rubin MA, Meyerson M, Wong WH, Sellers WR, Li C: Inferring loss-of-heterozygosity from unpaired tumors using high-density oligonucleotide SNP arrays. PLoS Comput Biol. 2006, 2: e41. 10.1371/journal.pcbi.0020041.
12. Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN: Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis. 2004, 90: 132-153. 10.1016/j.jmva.2004.02.008.
13. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo C, Gray JW, Sellers WR, Meyerson M: An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 2004, 64: 3060-3071. 10.1158/0008-5472.CAN-03-3308.
14. Andersen CL, Wiuf C, Kruhoffer M, Korsgaard M, Laurberg S, Orntoft TF: Frequent occurrence of uniparental disomy in colorectal cancer. Carcinogenesis. 2007, 28: 38-48. 10.1093/carcin/bgl086.
15. LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, Meyerson M: Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput Biol. 2005, 1: e65. 10.1371/journal.pcbi.0010065.
16. Huang J, Wei W, Chen J, Zhang J, Liu G, Di X, Mei R, Ishikawa S, Aburatani H, Jones KW, Shapero MH: CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics. 2006, 7: 83. 10.1186/1471-2105-7-83.
17. Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008.
18. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC, Ogawa S: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 2005, 65: 6071-6079. 10.1158/0008-5472.CAN-05-0465.
19. Hartmann A, Rosner U, Schlake G, Dietmaier W, Zaak D, Hofstaedter F, Knuechel R: Clonality and genetic divergence in multifocal low-grade superficial urothelial carcinoma as determined by chromosome 9 and p53 deletion analysis. Lab Invest. 2000, 80: 709-718.
20. Haggarth L, Auer G, Busch C, Norberg M, Haggman M, Egevad L: The significance of tumor heterogeneity for prediction of DNA ploidy of prostate cancer. Scand J Urol Nephrol. 2005, 39: 387-392. 10.1080/00365590500239883.
21. van der Poel HG, Oosterhof GO, Schaafsma HE, Debruyne FM, Schalken JA: Intratumoral nuclear morphologic heterogeneity in prostate cancer. Urology. 1997, 49: 652-657. 10.1016/S0090-4295(96)00557-2.
22. Blaveri E, Brewer JL, Roydasgupta R, Fridlyand J, DeVries S, Koppie T, Pejavar S, Mehta K, Carroll P, Simko JP, Waldman FM: Bladder cancer stage and outcome by array-based comparative genomic hybridization. Clin Cancer Res. 2005, 11: 7012-7022. 10.1158/1078-0432.CCR-05-0177.
23. Koo SH, Kwon KC, Ihm CH, Jeon YM, Park JW, Sul CK: Detection of genetic alterations in bladder tumors by comparative genomic hybridization and cytogenetic analysis. Cancer Genet Cytogenet. 1999, 110: 87-93. 10.1016/S0165-4608(98)00193-9.
24. Sun J, Liu W, Adams TS, Sun J, Li X, Turner AR, Chang B, Kim JW, Zheng SL, Isaacs WB, Xu J: DNA copy number alterations in prostate cancers: a combined analysis of published CGH studies. Prostate. 2007, 67 (7): 692-700. 10.1002/pros.20543.
25. Lieberfarb ME, Lin M, Lechpammer M, Li C, Tanenbaum DM, Febbo PG, Wright RL, Shim J, Kantoff PW, Loda M, Meyerson M, Sellers WR: Genome-wide loss of heterozygosity analysis from laser capture microdissected prostate cancer using single nucleotide polymorphic allele (SNP) arrays and a novel bioinformatics platform dChipSNP. Cancer Res. 2003, 63: 4781-4785.
26. Raghavan M, Lillington DM, Skoulakis S, Debernardi S, Chaplin T, Foot NJ, Lister TA, Young BD: Genome-wide single nucleotide polymorphism analysis reveals frequent partial uniparental disomy due to somatic recombination in acute myeloid leukemias. Cancer Res. 2005, 65: 375-378.
27. The Affymetrix GeneChip Human Mapping 100 k Set. [http://www.affymetrix.com/products/arrays/specific/100k.affx]
28. PLASQ 10 k instructions. [http://genome.dfci.harvard.edu/~tlaframb/PLASQ/PLASQ10K.pdf]
29. Zieger K, Dyrskjot L, Wiuf C, Jensen JL, Andersen CL, Jensen KM, Orntoft TF: Role of activating fibroblast growth factor receptor 3 mutations in the development of bladder tumors. Clin Cancer Res. 2005, 11: 7709-7719. 10.1158/1078-0432.CCR-05-1130.
30. Torring N, Borre M, Sorensen KD, Andersen CL, Wiuf C, Orntoft TF: Genome-wide analysis of allelic imbalance in prostate cancer using the Affymetrix 50 K SNP mapping array. Br J Cancer. 2007, 96: 499-506. 10.1038/sj.bjc.6603476.
31. The Affymetrix HapMap trio data. [http://www.affymetrix.com/support/technical/sample_data/hapmap_trio_data.affx]
32. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW, Wei W, Stratton MR, Futreal PA, Weber B, Shapero MH, Wooster R: High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 2004, 14: 287-295. 10.1101/gr.2012304.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.