Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays
 Youngmi Hur^{1} and
 Hyunju Lee^{2}
https://doi.org/10.1186/1471-2105-12-146
© Hur and Lee; licensee BioMed Central Ltd. 2011
Received: 13 October 2010
Accepted: 11 May 2011
Published: 11 May 2011
Abstract
Background
Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations often span large chromosomal regions, making it hard to distinguish cancer related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become important to develop computational methods that detect the cancer driving genes located in the focal regions of CNAs.
Results
In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). Because wavelet analysis is a multiresolution approach, it makes it possible to effectively identify focal genomic aberrations within broadly aberrant regions. The proposed method integrates multiple cancer samples, enabling the detection of aberrations that are consistent across samples. We apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform and confirm that it detects previously known cancer related genes from both cancer types with high accuracy. In addition, applying this approach to the lung cancer data set identifies focal amplification regions containing the known oncogenes SMAD7 (chr18q21.1) and FGF10 (chr5p12), even though these regions are not reported by GISTIC, a recent CNA detection algorithm.
Conclusions
Our results suggest that WIFA can be used to reveal cancer related genes in various cancer data sets.
Background
With recent advances in molecular cancer studies, DNA copy number aberrations (CNAs) have been studied as important causes and consequences in the initiation, development, and progression of cancer. To date, many researchers have focused on detecting chromosomal regions with amplifications and deletions using array comparative genomic hybridization (CGH) data sets. These studies have generated valuable observations about cancer metastasis [1–7]. For example, it is now known that many oncogenes and tumor suppressor genes are located in regions of amplification and deletion, and that chromosome regions with aberrations can be used to distinguish between cancer types. New cancer related genes have also been discovered. These advances have been accelerated by the development of computational methods and software [8–14]; segmentation and denoising methods such as circular binary segmentation (CBS) [8], wavelets [9], and the Gaussian-based likelihood approach (GLAD) [10] have been developed to separate true aberrations from background noise in a single sample. With the accumulation of copy number aberration data sets, it has become increasingly important to find concordant aberrations in multiple samples, and algorithms such as minimum common regions (MCRs) [15] and significance testing for aberrant copy number (STAC) [16] have been developed to address this issue.
However, even though each method can identify aberrant regions, the regions identified by different methods are not concordant. As one possible explanation for this lack of concordance, Beroukhim et al. (2007) [17] argued that many aberrations occur randomly, whereas most methods do not explicitly consider the background rate of random aberrations. For instance, in short-term survivors of glioblastoma multiforme (GBM), most of chr7 is amplified and most of chr10 is deleted [18], yet only a few genes on these chromosomes are known oncogenes or tumor suppressors in GBM. If random aberrations are not taken into account, most chr7 and chr10 genes will therefore be regarded as relevant. Hence, an important issue is to distinguish cancer driving genes, i.e., genes involved in cancer development, within broad chromosomal aberrations. Fortunately, driving genes have been observed to carry larger aberrations than their neighboring genes, and these aberrations tend to occur consistently across multiple cancer patients. A few algorithms, such as the genomic identification of significant targets in cancer (GISTIC) [17], have been developed to incorporate these observations and to detect focal aberrations. Here, the term "focal aberrations" refers to relatively short regions that are consistently aberrant in multiple samples. The use of GISTIC revealed that these focal aberrations contain many cancer related genes; in a comparison with MCR [15] on three independent data sets, GISTIC consistently identified more cancer related genes than MCR. GISTIC first selects copy number aberration regions by applying a segmentation method to each sample, and then sums the amount of aberration over the multiple samples. Differences between the aberrations and their neighbors are then computed using a peel-off method.
However, GISTIC has an inherent weakness: because it sums log2 ratios over all aberrant samples first, differences between neighbors in individual samples may cancel out. The key difference between GISTIC and our proposed approach is that we first consider the differences between neighbors within each individual sample, before identifying focal regions across multiple samples. In this study, we propose a novel algorithm, referred to as the wavelet-based identification of focal genomic aberrations (WIFA), that addresses the following issues: (i) distinguishing signal from noise among probes with high aberrations, (ii) detecting focal aberrations by considering the differences between aberrations and their neighbors as well as the amount of aberration, and (iii) accounting for the consistency of aberrations across multiple samples.
Wavelet analysis is a mathematical technique for representing data. Wavelets can be used to remove noise from observed (noise-contaminated) data while preserving important features of the true data; this process is called wavelet denoising. In this study, we use a variant of the translation-invariant, level-dependent wavelet denoising method in [19] to obtain, from the observed data y, translation-invariant approximations of the smooth (low-frequency) part of the true data, y_{ LOW }, and of the local (high-frequency) behavior of the true data, y_{ HIGH }. In brief, y_{ LOW } is based on averages of neighboring values of y, and y_{ HIGH } is based on differences of neighboring values of y followed by thresholding. Thresholding is performed only for y_{ HIGH }, since noise is likely to be more pronounced in the high-frequency content. After obtaining y_{ HIGH } via the wavelet analysis, we obtain a processed profile for each sample by adjusting some obvious artifacts in y_{ HIGH }, and then cluster contiguous focal aberrations across multiple samples. By applying this approach to GBM and lung cancer data sets, we find previously known cancer related genes in the focal aberrations. In addition, a similar procedure based on y_{ LOW } enables us to detect broad regions of chromosomal aberration.
A difficulty in assessing performance in detecting focal aberrations is that the true answer is often not known, since the regions containing cancer related genes are still being uncovered. Hence, we compare genes identified by our approach with known cancer genes obtained from GISTIC [17]. In addition to confirming regions identified by GISTIC, we find new regions not identified by GISTIC; the literature shows that these new regions contain known oncogenes. We also compare WIFA with STAC and MCR; WIFA outperforms both methods on the simulated data and on the GBM data. The source code for WIFA is available at http://www.gcancer.org/wifa/WIFA.html.
Materials and methods
Materials
We collected and reanalyzed three single nucleotide polymorphism (SNP) data sets: 154 GBM tumor samples [17], 178 GBM tumor samples [20], and 371 lung tumor samples [21]. We downloaded the signal intensities of the data sets either from the websites of the original publications or from the GEO database, and used all chromosomes except X and Y. Both GBM data sets were generated on the Affymetrix 100K SNP microarray, and the lung cancer data set on 250K Sty SNP arrays. Since the 100K SNP array consists of independent 50K Xba and 50K Hind arrays, we merged these two arrays along the chromosome positions. Next, to calculate copy number changes from signal intensities, we applied the following procedure (similar to those of the original publications): (i) signal intensities were log2 transformed to make the noise approximately constant; (ii) for each sample, the median value across all probes was subtracted from every probe; (iii) to obtain the log2 ratio of each tumor sample relative to normal samples, the log2 transformed normal samples were subtracted from the log2 transformed tumor samples; and (iv) to remove copy number variants (CNVs) that occur in the normal population, positions with CNVs obtained from [22] were omitted from the data sets.
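A minimal NumPy sketch of steps (i)-(iv) is shown below. It is an illustration under stated assumptions, not the authors' preprocessing code: the Xba and Hind probes are assumed to be already merged and sorted by chromosome position, each tumor array is assumed to have a matching normal array (a single reference normal row also works through broadcasting), median centering in step (ii) is assumed to apply to the normal arrays as well, and the CNV positions from [22] are assumed to be available as a boolean mask. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(tumor_intensity, normal_intensity, cnv_mask):
    """Hypothetical sketch of steps (i)-(iv); returns (n_samples, n_kept_probes) log2 ratios."""
    def log2_median_centered(x):
        logged = np.log2(np.atleast_2d(np.asarray(x, dtype=float)))   # (i) log2 transform
        return logged - np.median(logged, axis=1, keepdims=True)      # (ii) per-sample median centering

    # (iii) tumor minus normal on the log2 scale (a 1-D normal broadcasts over all tumors)
    ratio = log2_median_centered(tumor_intensity) - log2_median_centered(normal_intensity)
    keep = ~np.asarray(cnv_mask, dtype=bool)                          # (iv) drop known CNV probes
    return ratio[:, keep]
```

For example, a tumor row with intensities 2, 4, 8, 16 against a flat normal row becomes -1.5, -0.5, 0.5, 1.5 before the CNV probes are removed.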
WIFA methodology
Wavelet transform and its use in WIFA
Wavelet transform
Let J and L be integers such that 1 ≤ L ≤ J - 1. The (discrete) wavelet transform (WT) maps a given data set y of length 2^{ J } into the scaling coefficients s := {s_{L,t}: t = 0, 1, ..., 2^{ L } - 1} and the wavelet coefficients w := {w_{j,t}: j = L, L + 1, ..., J - 1; t = 0, 1, ..., 2^{ j } - 1}. Note that the WT is linear and can be represented by a 2^{ J } × 2^{ J } orthogonal matrix W. The WT depends on the specific wavelet selected; in this paper, we use the WT based on the Haar wavelet. The Haar wavelet transform simply pairs up input values, storing their difference and passing on their average, and repeats this process recursively on the averages to produce the next level, finally resulting in 2^{ L } averages (stored in the scaling coefficients s) and 2^{ J } - 2^{ L } differences (stored in the wavelet coefficients w). For more details about wavelets, refer to [23], [24], and [25].
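To make the pairing-and-averaging description concrete, the sketch below implements an orthonormal Haar transform in NumPy. The pairwise sums and differences are divided by √2 rather than 2 so that the transform matrix W is orthogonal, which is why the energy of y equals the energy of {s, w}. The function names `haar_forward` and `haar_inverse` are hypothetical; WaveLab, which the authors used, provides its own routines.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_forward(y, L):
    """Orthonormal Haar WT of a length-2**J signal down to level L.

    Returns (s, w): s holds the 2**L scaling coefficients and w maps each
    level j = L, ..., J-1 to its 2**j wavelet coefficients.
    """
    a = np.asarray(y, dtype=float)
    J = int(np.log2(a.size))
    if a.size != 2 ** J or not 1 <= L <= J - 1:
        raise ValueError("need len(y) == 2**J and 1 <= L <= J - 1")
    w = {}
    for j in range(J - 1, L - 1, -1):
        even, odd = a[0::2], a[1::2]
        w[j] = (even - odd) / SQRT2      # scaled pairwise differences are stored
        a = (even + odd) / SQRT2         # scaled pairwise averages are passed on
    return a, w

def haar_inverse(s, w):
    """Invert haar_forward, rebuilding from the coarsest level upward."""
    a = np.asarray(s, dtype=float)
    for j in sorted(w):
        out = np.empty(2 * a.size)
        out[0::2] = (a + w[j]) / SQRT2
        out[1::2] = (a - w[j]) / SQRT2
        a = out
    return a
```

For example, an input of length 8 with L = 1 yields 2 scaling coefficients and 6 wavelet coefficients (4 at level 2 and 2 at level 1), and `haar_inverse` reproduces the input exactly up to rounding.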
Wavelet procedure for WIFA
A drawback of the traditional WT is that it is not translation-invariant. To remedy this problem, a number of translation-invariant wavelet transforms have been employed [9, 19, 26]. Among them, we use the stationary wavelet transform of [27] for our WIFA methodology.
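We model the observed data as
$$y_i = f(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$
where f is an underlying function representing the true copy number change at locations x_{ i }, and ε_{ i } is stationary Gaussian noise with zero mean [19].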
Given the data y := {y_{ i } : i = 1, ..., n}, the basic principle of wavelet denoising is to identify and zero out the wavelet coefficients of y that are likely to contain noise, and thereby to estimate {f (x_{ i } ): i = 1, ..., n} in (1). Instead of the usual wavelet denoising procedure [29], we use the following modified steps as the main wavelet denoising procedure of our methodology:
(Step 1) Given the data y of length n = 2^{ J } and an integer L such that 1 ≤ L ≤ J - 1, compute W y = {s, w}. For an integer M ≥ L, let w_{ MID } be the wavelet coefficients of y at levels j = L, L + 1, ..., M - 1, and let w_{ HIGH } be the wavelet coefficients of y at levels j = M, M + 1, ..., J - 1.
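(Step 2) Define T_{ LOW } : {s, w_{ MID }, w_{ HIGH }} ↦ {s, 0, 0} and T_{ HIGH } : {s, w_{ MID }, w_{ HIGH }} ↦ {0, 0, w̃_{ HIGH }}, where w̃_{ HIGH } is obtained from w_{ HIGH } by thresholding with a hard threshold function [30], using a level-dependent threshold value λ_{ j } for each level j = M, M + 1, ..., J - 1. Here, λ_{ j } depends on σ̂_{ j }^{ 2 }, the estimate of the noise variance for the wavelet coefficients at level j; on n_{ j }, the length of the subsignal at level j; and on a constant C to be determined later.
(Step 3) Compute y_{ LOW } := W^{ -1 } T_{ LOW } W y and y_{ HIGH } := W^{ -1 } T_{ HIGH } W y.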
The threshold value used in (Step 2) is a variant of the threshold value used in [19]. After (Step 1) through (Step 3), we obtain y_{ LOW } and y_{ HIGH }. Note that y_{ LOW } gives a translation-invariant approximation of the smooth (low-frequency) part of the true data, which provides a rough estimate for detecting broad regions of chromosomal aberration. It is based on the Haar scaling coefficients, which can be considered averages of neighboring values of y. On the other hand, y_{ HIGH } gives a translation-invariant approximation of the local (high-frequency) behavior of the true data, which provides a rough indication of focal chromosomal aberrations; y_{ HIGH } is based on the Haar wavelet coefficients, which are differences between neighboring values of y, after thresholding.
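The LOW/MID/HIGH split can be sketched in NumPy with an orthonormal Haar transform, obtaining translation invariance by cycle spinning, i.e., averaging the result over all circular shifts of the data, the translation-invariant scheme of Coifman and Donoho [27]. This is closely related to, but not a reproduction of, the stationary wavelet transform in WaveLab. Two details are assumptions, not taken from the paper: the noise level at level j is estimated robustly as σ̂_j = median(|w_j|)/0.6745, and the threshold is taken as λ_j = C σ̂_j √(2 log n_j), a guessed form of the level-dependent threshold. All function names are hypothetical.

```python
import numpy as np

def _haar(y, L):
    # orthonormal Haar analysis down to level L: scaling coeffs and {level: wavelet coeffs}
    a, w = np.asarray(y, dtype=float), {}
    for j in range(int(np.log2(a.size)) - 1, L - 1, -1):
        w[j] = (a[0::2] - a[1::2]) / np.sqrt(2)
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    return a, w

def _ihaar(s, w):
    # inverse of _haar
    a = np.asarray(s, dtype=float)
    for j in sorted(w):
        out = np.empty(2 * a.size)
        out[0::2], out[1::2] = (a + w[j]) / np.sqrt(2), (a - w[j]) / np.sqrt(2)
        a = out
    return a

def low_high(y, L, M, C):
    """One shift of (Step 1)-(Step 3): keep s for y_LOW, thresholded levels >= M for y_HIGH."""
    s, w = _haar(y, L)                                          # Step 1
    high_w = {}
    for j, c in w.items():
        if j < M:
            high_w[j] = np.zeros_like(c)                        # MID levels are discarded
        else:
            sigma = np.median(np.abs(c)) / 0.6745               # ASSUMED noise estimate
            lam = C * sigma * np.sqrt(2.0 * np.log(c.size))     # ASSUMED threshold form
            high_w[j] = np.where(np.abs(c) > lam, c, 0.0)       # hard thresholding
    y_low = _ihaar(s, {j: np.zeros_like(c) for j, c in w.items()})   # T_LOW, then inverse
    y_high = _ihaar(np.zeros_like(s), high_w)                        # T_HIGH, then inverse
    return y_low, y_high

def ti_low_high(y, L, M, C):
    """Translation-invariant y_LOW and y_HIGH by averaging over all circular shifts."""
    y = np.asarray(y, dtype=float)
    low, high = np.zeros_like(y), np.zeros_like(y)
    for k in range(y.size):
        lo, hi = low_high(np.roll(y, -k), L, M, C)
        low += np.roll(lo, k)       # undo the shift before accumulating
        high += np.roll(hi, k)
    return low / y.size, high / y.size
```

Because every circular shift is averaged, shifting the input simply shifts both outputs, which is the translation invariance the text asks for; raising M moves more levels into the discarded MID band and leaves fewer nonzero values in y_HIGH.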
Treating the wavelet coefficients differently depending on their level has been done in many studies (see [19] and [31]), although the exact form varies. The main difference between our method and other level-dependent wavelet denoising methods is that we use only the low-frequency scaling coefficients and the high-frequency wavelet coefficients, and discard the mid-frequency wavelet coefficients. To do this, we add the parameter M to the usual wavelet thresholding process; as discussed in the Results section, this parameter allows us to identify focal genomic aberrations more effectively.
The values of y_{ LOW } for all chromosomes of a given sample y are obtained simply by processing each chromosome separately, and then concatenating the values of y_{ LOW } for each chromosome; similarly, the values of y_{ HIGH } for all chromosomes can be found.
Next, we explain how we treat the boundary of each chromosome. The boundary problem arises from our earlier assumption that a chromosome has n = 2^{ J } locations for a positive integer J, which does not hold in general; for a more detailed discussion of boundary conditions, refer to [32]. We handle this problem by extending each chromosome first symmetrically and then periodically, which proved effective in our experiments. Other parameters used in our methodology include:

Constant C in the threshold value: in (Step 2), C enters the level-dependent threshold used to threshold the high-frequency wavelet coefficients at level j. A smaller C lowers the threshold and therefore allows more nonzero values in y_{ HIGH }.

Level L: parameter L can be as small as 1 and as large as J - 1. A smaller L increases the coarseness of the y_{ LOW } approximation, whereas a larger L makes it finer.

Level M: parameter M can be as small as L and as large as needed. A smaller M produces a y_{ HIGH } with more nonzero values, whereas a larger M produces a y_{ HIGH } with fewer nonzero values. Since this is not a standard parameter in the wavelet literature, we pay special attention to it and discuss its effect on our methodology by varying M; see the Results section.
In the Results section, we further discuss which values of the above parameters C, L, and M are used for each data set in our experiments. To implement the wavelet transforms, we used WaveLab http://www-stat.stanford.edu/%7Ewavelab/Wavelab_850/index_wavelab850.html.
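Boundary handling and per-chromosome processing can be sketched as follows. We read "first symmetrically and then periodically" as: pad each chromosome to the next power of two by mirroring its right end, then let the wavelet transform treat the padded signal as periodic (as the circular shifts of a cycle-spinning implementation do). This reading, the function names, and the `transform` argument (for example, a routine that returns y_LOW for one padded chromosome) are assumptions for illustration.

```python
import numpy as np

def extend_to_power_of_two(x):
    """Mirror-pad x to the next power of two; returns (extended, original_length)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    target = max(2, 1 << (n - 1).bit_length())    # smallest power of two >= n (at least 2)
    pad = target - n                              # pad < n whenever n >= 2
    return np.concatenate([x, x[::-1][:pad]]), n  # symmetric extension at the right end

def process_genome(values, chromosomes, transform):
    """Apply `transform` chromosome by chromosome, crop the padding, and reassemble."""
    values = np.asarray(values, dtype=float)
    chromosomes = np.asarray(chromosomes)
    out = np.empty_like(values)
    for chrom in np.unique(chromosomes):
        idx = np.flatnonzero(chromosomes == chrom)
        extended, n = extend_to_power_of_two(values[idx])
        # `transform` is expected to treat `extended` as periodic
        out[idx] = transform(extended)[:n]        # keep only the original positions
    return out
```

For example, a chromosome with values 1, 2, 3, 4, 5 is padded to 1, 2, 3, 4, 5, 5, 4, 3 before the transform, and only the first five outputs are kept.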
Identification of broad and focal aberrations in WIFA
Identification of focal aberrations
To identify focal aberration regions, we consider neighboring positions together rather than each position separately. For this task, we consider groups of positions with positive (or negative) values located within 1 MB of each other along the chromosome. Then, to find regions of focal aberration within a group, we construct clusters such that consecutive positions with positive (or negative) values in a cluster are no more than a distance d apart. From the clusters in a group, we select the cluster c with the maximum score S(c). In the clustering process, clusters containing nonzero values from only a single patient are removed. Statistically significant clusters are then ranked by their S(c) scores. In Figure 3(c), the cluster on the right is a focal aberration that contains a known cancer related gene (EGFR).
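The grouping and clustering steps can be sketched as below for amplifications (pass `sign=-1` for deletions). Several details are assumptions because the text does not fully specify them: a "group" is taken to be a run of positive positions in which consecutive positions are at most 1 MB apart, a cluster is a run in which consecutive positions are at most d apart, and the score S(c) is replaced by the total positive signal in the cluster, which is only a placeholder for the paper's score. The function name `focal_clusters` is hypothetical.

```python
import numpy as np

def focal_clusters(positions, values, d, group_span=1_000_000, sign=1):
    """Best cluster per group as (start_bp, end_bp, score, n_patients), sorted by score."""
    positions = np.asarray(positions)
    vals = np.clip(sign * np.asarray(values, dtype=float), 0.0, None)  # keep one direction only
    active = np.flatnonzero(vals.sum(axis=0) > 0)                       # positive in any sample

    def split(idx, max_gap):
        # cut a sorted run of probe indices wherever neighbors are more than max_gap apart
        if idx.size == 0:
            return []
        cuts = np.flatnonzero(np.diff(positions[idx]) > max_gap) + 1
        return np.split(idx, cuts)

    results = []
    for group in split(active, group_span):
        best = None
        for cluster in split(group, d):
            sub = vals[:, cluster]
            n_patients = int((sub > 0).any(axis=1).sum())
            if n_patients < 2:                       # drop clusters seen in a single patient
                continue
            score = float(sub.sum())                 # ASSUMED placeholder for S(c)
            if best is None or score > best[2]:
                best = (int(positions[cluster[0]]), int(positions[cluster[-1]]),
                        score, n_patients)
        if best is not None:
            results.append(best)
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Here `values` holds one row per sample of the processed HIGH profile and `positions` the sorted probe coordinates in base pairs.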
The statistical significance of each cluster is assessed with the following permutation procedure:
1. In each sample, segments of aberrations (sets of consecutive probes with nonzero processed values) are randomly repositioned on the chromosome. This random positioning is applied to all samples, generating randomly permuted data from the multiple samples. This permutation approach is described in detail in [11].
2. The process for detecting focal aberrations in multiple samples is then applied to the randomly permuted data, generating a set of clusters.
3. Steps 1 and 2 are repeated N times. Let max_score(i) be the maximum cluster score from the i th permutation.
4. For each observed cluster c, P_{cluster}(c) is computed as the fraction of the N permutations for which max_score(i) ≥ S(c).
5. Clusters with P_{cluster}(c) less than α are considered statistically significant.
In this paper, we use N = 1,000 and α = 0.1, and the permutation and the calculation of P_{cluster} are performed separately for each chromosome.
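The permutation test can be sketched as follows. As a simplification of the segment repositioning in [11], each sample's runs of zero and nonzero values are shuffled, which keeps every segment intact but moves it to a random place on the chromosome. In this sketch P_cluster(c) is computed as the fraction of permutations whose maximum score reaches S(c). The scoring routine is passed in as `max_score_fn`, and all names are hypothetical.

```python
import numpy as np

def permute_segments(row, rng):
    """Shuffle the runs (zero gaps and nonzero segments) of one sample's profile."""
    row = np.asarray(row, dtype=float)
    cuts = np.flatnonzero(np.diff((row != 0).astype(int)) != 0) + 1  # zero/nonzero boundaries
    runs = np.split(row, cuts)
    return np.concatenate([runs[i] for i in rng.permutation(len(runs))])

def cluster_pvalues(observed_scores, values, max_score_fn, n_perm=1000, seed=0):
    """Empirical P_cluster for each observed score against the permutation maxima."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)          # (n_samples, n_probes) for one chromosome
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        permuted = np.vstack([permute_segments(r, rng) for r in values])
        null_max[i] = max_score_fn(permuted)          # max_score(i)
    obs = np.asarray(observed_scores, dtype=float)
    return (null_max[None, :] >= obs[:, None]).mean(axis=1)
```

With `n_perm=1000` and a threshold of 0.1, a cluster is kept when fewer than 100 of the permuted maxima reach its score.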
Identification of broad aberrations
After the y_{ LOW } values are generated from the wavelet transform for each sample, we use y_{ LOW } directly as the processed LOW profile of that sample; unlike y_{ HIGH }, y_{ LOW } does not require an additional processing step. To integrate multiple samples, we sum the LOW profiles of all samples at each probe. If all probes in a chromosome arm are statistically significant in this summed profile, we consider the arm a broad aberration. We calculate the statistical significance of the summed profile as follows. The null hypothesis is that the LOW values are independent among samples, so that the distribution of their sum is the same for all probes in the chromosomes. To generate the null distribution, we first construct a histogram h_{ i } of the LOW values of a single sample i by splitting the values into bins at intervals of 0.01. Next, the null distribution of the sum is calculated by convolving the histograms h_{ i } of all samples, and the p-value of an observed sum is calculated by summing the null probabilities of all values at least as extreme as the observed value. The p-values are calculated separately for amplifications and deletions. To correct for multiple testing, the p-values are converted into q-values [33]. This approach is similar to the calculation of the statistical significance of aberrations used in [17]. Note that these p-values are calculated for each probe, whereas P_{cluster}, discussed above, is calculated for each cluster.
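The convolution-based null distribution can be sketched in NumPy as follows. Values are mapped to integer bins of width 0.01, each sample's histogram is normalized to a probability vector, and the histograms are convolved, so the result is the null distribution of the per-probe sum under independence among samples. The function names are hypothetical, and the q-value conversion [33] is omitted.

```python
import numpy as np

BIN_WIDTH = 0.01  # histogram bin width used in the text

def _bin_index(values, bin_width):
    return np.rint(np.asarray(values, dtype=float) / bin_width).astype(int)

def null_distribution(y_low, bin_width=BIN_WIDTH):
    """Return (offset, probs): probs[k] = null P(summed bin index == offset + k)."""
    offset, dist = 0, np.array([1.0])
    for row in _bin_index(y_low, bin_width):      # one row per sample
        lo = row.min()
        h = np.bincount(row - lo).astype(float)   # histogram h_i of sample i
        dist = np.convolve(dist, h / h.sum())     # add one more independent sample
        offset += lo
    return offset, dist

def probe_pvalues(y_low, bin_width=BIN_WIDTH):
    """Per-probe p-values for amplification (upper tail) and deletion (lower tail)."""
    offset, dist = null_distribution(y_low, bin_width)
    observed = _bin_index(y_low, bin_width).sum(axis=0) - offset
    upper = np.cumsum(dist[::-1])[::-1]           # P(sum >= value)
    lower = np.cumsum(dist)                       # P(sum <= value)
    return upper[observed], lower[observed]
```

For two samples that each take the values 0 and 0.01 equally often, the null distribution of the summed bin index is (0.25, 0.5, 0.25), so a probe at 0.01 in both samples gets an amplification p-value of 0.25.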
Results
Broad and focal aberrations in glioblastoma
Clusters with focal aberrations in GBM.
| Score | P_{cluster} | Cytoband | Start (kb) | End (kb) | # of PA | Gene Symbol§ |
|---|---|---|---|---|---|---|
| 363 | 0 | 9p21.3,p22.1 | 19,639 | 24,327 | 42 | CDKN2A |
| 266 | 0 | 7p11.2 | 54,145 | 55,790 | 25 | EGFR |
| 101 | 0 | 4q12 | 52,600 | 55,926 | 9 | PDGFRA |
| 66 | 0 | 1q32.1 | 200,858 | 202,110 | 5 | MDM4† |
| 60 | 0 | 12q15 | 67,074 | 68,482 | 6 | MDM2 |
| 36 | 0.015 | 12q13.3,q14.1 | 55,820 | 57,257 | 4 | CDK4 |
| 35 | 0 | 13q14.2 | 46,386 | 47,510 | 3 | RB1† |
| 23 | 0.067 | 7q31.2 | 115,813 | 116,895 | 2 | MET |
| 18 | 0 | 10q23.2,q23.31 | 88,974 | 89,943 | 3 | PTEN |
| 16 | 0 | 19q13.2,q13.31 | 46,084 | 48,423 | 3 | |
| 15 | 0 | 9p21.1 | 32,101 | 32,432 | 3 | |
| 8 | 0.001 | 17q22 | 47,672 | 49,744 | 2 | |
| 7 | 0.034 | 1p33 | 50,351 | 50,689 | 2 | |
| 6 | 0 | 2p24.3 | 15,746 | 16,670 | 2 | MYCN |
| 4 | 0 | 9p13.1 | 38,288 | 39,006 | 3 | IGFBPL1 |
| 2 | 0.005 | 14q31.3 | 84,913 | 86,323 | 2 | |
| 2 | 0 | 9p12 | 40,722 | 41,675 | 3 | |
| 0.5 | 0 | 3p14.2 | 60,046 | 60,153 | 2 | |
| 0.2 | 0.033 | 14q21.2,q21.3 | 42,887 | 43,214 | 2 | |
We also applied the proposed method to a GBM data set obtained by Kotliarov et al. [20]. We used the same parameter values for C, L, M, and d as for the previous GBM data, since both are generated using the same SNP array platform. The HIGH analysis for this GBM data generates eight clusters; among these clusters, one focal deletion contains CDKN2A, and five focal amplifications contain MDM4, PDGFRA, EGFR, MDM2, and CDK4 (Additional File 3 and Additional File 4). Note that MET is not included in the focal aberrations because its amplification occurs in only a single sample. The six GBM genes identified using this data are also found in the previous GBM data set. This result confirms that our method is able to detect focal aberrations that are consistent across different experiments.
Broad and focal aberrations in lung cancer
With the value of C = 1.94, the number of nonzero values in y_{ HIGH } of all chromosomes in the GBM data set is about 10% of the number of nonzero values in the sample. The value of C that provides approximately the same percentage of nonzero values in y_{ HIGH } for the lung data set is C = 5.2.
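One way to carry out this kind of calibration is a bisection search on C. The sketch is hypothetical, since the paper does not state how C was searched: `nonzero_fraction` stands for a routine that reruns the HIGH analysis with a given C and reports the fraction of nonzero y_HIGH values, and it is assumed to be non-increasing in C (a larger C gives a higher threshold and fewer nonzero values).

```python
def calibrate_c(nonzero_fraction, target=0.10, lo=0.0, hi=10.0, tol=1e-3):
    """Bisection for C such that nonzero_fraction(C) is close to `target`.

    Assumes nonzero_fraction is non-increasing in C and that
    nonzero_fraction(lo) > target >= nonzero_fraction(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nonzero_fraction(mid) > target:
            lo = mid   # still too many nonzero values: the threshold must go up
        else:
            hi = mid   # at or below the target: try a smaller C
    return 0.5 * (lo + hi)
```

Each call reruns the HIGH analysis, so the tolerance controls the cost: the search needs about log2((hi - lo) / tol) evaluations, roughly 14 for the defaults.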
Using the LOW analysis, we then attempt to detect broad aberrations in lung cancer. For this task, the q-value is first calculated for each probe. As shown in Additional File 5 (a) and 5 (b), with a q-value threshold of 0.01, chr1p, 3p, 4q, 5q, 6q, 8p, 9, 10q, 13p, 15, 16q, 18, 21p, and 22 are deleted, and chr1q, 2p, 5p, 6p, 7, 8q, 14p, 17q, and 20q are amplified at the level of whole chromosome arms.
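The arm-level rule from the Methods (an arm is a broad aberration if every probe in it is significant) can be sketched as follows, applied separately to the amplification and deletion q-values. The per-probe arm labels are assumed to come from a cytoband annotation that is not part of this document, and the function name is hypothetical.

```python
import numpy as np

def broad_arms(qvalues, arms, q_threshold=0.01):
    """Arms (in order of first appearance) in which every probe has q < q_threshold."""
    qvalues = np.asarray(qvalues, dtype=float)
    arms = np.asarray(arms)
    return [arm for arm in dict.fromkeys(arms.tolist())
            if np.all(qvalues[arms == arm] < q_threshold)]
```

A single non-significant probe is enough to keep an arm out of the list, which makes this a deliberately strict definition of a broad aberration.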
Clusters with focal aberrations in lung cancer.
| Score | P_{cluster} | Cytoband | Start (kb) | End (kb) | # of PA | Gene Symbol§ |
|---|---|---|---|---|---|---|
| 118 | 0 | 14q13.2-q13.3 | 35,467 | 36,690 | 10 | MBIP, NKX2-1 |
| 107 | 0 | 12p11.23-p12.1 | 24,055 | 26,685 | 4 | KRAS |
| 84 | 0 | 18q11.2-q12.1 | 20,300 | 23,537 | 4 | SS18 |
| 62 | 0 | 7p11.2-p12.1 | 53,796 | 55,553 | 3 | EGFR |
| 54 | 0.006 | 18q21.1 | 43,971 | 45,322 | 2 | SMAD7‡ |
| 51 | 0.064 | 12q15 | 67,983 | 69,669 | 4 | MDM2† |
| 47 | 0.006 | 19q12 | 35,835 | 36,303 | 2 | CCNE1† |
| 47 | 0 | 11q13.2-q13.3 | 68,164 | 69,500 | 3 | CCND1 |
| 42 | 0.012 | 19q13.11-q13.12 | 37,716 | 41,040 | 5 | |
| 35 | 0 | 22q11.21 | 19,057 | 19,785 | 2 | |
| 33 | 0 | 17q12-q21.1 | 33,845 | 35,540 | 4 | ERBB2 |
| 26 | 0.018 | 8p11.23-p12 | 38,417 | 39,171 | 2 | FGFR1 |
| 24 | 0 | 10p11.21 | 37,594 | 38,717 | 2 | |
| 22 | 0 | 6p21.33 | 30,143 | 30,911 | 2 | |
| 15 | 0.023 | 6p22.1-p22.2 | 26,089 | 26,937 | 2 | |
| 14 | 0.007 | 5p12 | 42,982 | 44,452 | 2 | FGF10‡ |
| 10 | 0.064 | 6p21.1 | 43,240 | 44,227 | 2 | VEGFA |
| 10 | 0.019 | 10q11.21 | 42,184 | 43,133 | 2 | |
| 9 | 0 | 9p21.3 | 24,546 | 25,375 | 2 | |
| 3 | 0 | 9p21.3 | 21,181 | 22,194 | 2 | CDKN2A |
To validate our choice of M, we conducted experiments with various values of M. For example, with M = 9, 37 clusters spanning 161 MB included 12 cancer related genes, as shown in Additional File 8(a). However, even though M = 9 identified two more genes than M = 11, it required searching five times as many genomic regions; refer to Additional File 8 for the results for the other values of M. Our analysis of lung cancer confirms that WIFA is useful for identifying cancer related genes in focal aberrations across different cancer types.
Comparison with other methods
We then compared our method with MCR and STAC. For implementation, we used MCR from waviCGH [13] (http://wavi.bioinfo.cnio.es/) and STAC from the authors' website (http://www.cbil.upenn.edu/STAC/). Because both methods require input files containing discrete aberration calls (amplification, deletion, or no change), GLAD [10], a segmentation method, was first applied to each sample, and thresholds for amplification and deletion were then used to determine the aberrant regions. In MCR, the fraction of samples with aberrations in a region was used to determine significant regions. In STAC, the p-value of the footprint was used as the measure of significance of aberrant regions. For WIFA, the cluster score is used for this purpose.
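The input preparation for MCR and STAC, together with the MCR-style per-probe fraction of aberrant samples, can be sketched as follows. The segmented values are assumed to be GLAD output computed elsewhere, and the amplification and deletion thresholds are hypothetical parameters, since their values are not stated here.

```python
import numpy as np

def call_and_fraction(segmented, amp_threshold, del_threshold):
    """Discrete calls (+1 amp, -1 del, 0 none) and per-probe fractions of aberrant samples."""
    seg = np.asarray(segmented, dtype=float)          # (n_samples, n_probes) segmented log2 ratios
    calls = np.where(seg > amp_threshold, 1, np.where(seg < del_threshold, -1, 0))
    amp_fraction = (calls == 1).mean(axis=0)          # share of samples amplified at each probe
    del_fraction = (calls == -1).mean(axis=0)         # share of samples deleted at each probe
    return calls, amp_fraction, del_fraction
```

The calls matrix is the kind of discrete input both tools expect, while the fractions give a simple per-probe frequency comparable in spirit to the MCR significance measure.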
We used a series of simulated data sets as the basis of our comparison and generated them in two steps. First, ten different underlying true data sets were generated using the Multiple Sample Analysis [11] software (http://www.cbil.upenn.edu/MSA/). For each true data set, the genome length was 4,500 markers, the number of samples was 50, the number of markers in each underlying concordant aberration was 30, and the percentage of samples sharing a concordant aberration varied from 50% to 70%. Across the ten true data sets, the number of concordant aberrant regions varied from five to seven, and there were one or two nonconcordant regions. Second, background aberrations were generated from a normal distribution. Because the maximum absolute values of the markers differed between samples, we set the standard deviation of the normal distribution for each sample to the product of a fixed number, which we refer to as the noise level, and the maximum (absolute) value of that sample's true data.
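The noise model of the second step can be written in a few lines of NumPy. `toy_true_data` is only a hypothetical stand-in for the MSA software: it places non-overlapping concordant regions of 30 markers in 50% to 70% of the samples with an arbitrary amplitude and omits the nonconcordant regions. `add_background` implements the per-sample standard deviation rule described above.

```python
import numpy as np

def toy_true_data(n_samples=50, n_markers=4500, n_regions=6, width=30,
                  amplitude=1.0, seed=0):
    """Hypothetical stand-in for MSA output: concordant regions only."""
    rng = np.random.default_rng(seed)
    data = np.zeros((n_samples, n_markers))
    slots = np.arange(0, n_markers - width + 1, width)        # non-overlapping start positions
    for start in rng.choice(slots, size=n_regions, replace=False):
        frac = rng.uniform(0.5, 0.7)                           # share of samples carrying it
        carriers = rng.choice(n_samples, size=int(round(frac * n_samples)), replace=False)
        data[carriers, start:start + width] = rng.choice([-1.0, 1.0]) * amplitude
    return data

def add_background(true_data, noise_level, seed=0):
    """Gaussian background with per-sample SD = noise_level * max |true value| of that sample."""
    rng = np.random.default_rng(seed)
    true_data = np.asarray(true_data, dtype=float)
    sd = noise_level * np.abs(true_data).max(axis=1, keepdims=True)
    return true_data + rng.standard_normal(true_data.shape) * sd
```

Scaling the noise by each sample's own maximum keeps the signal-to-noise ratio comparable across samples whose aberration amplitudes differ, which is the motivation given in the text.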
Discussion and Conclusions
Our work is based on wavelet analysis, which has been used in other studies to analyze array CGH data (cf. [9], [43]); for example, [9] shows that it performs well compared with approaches such as CBS, a change-point method [8], and HMM [44]. Compared with other wavelet-based approaches for analyzing array CGH data, the main differences in WIFA are: (i) a new parameter M is introduced to identify focal genomic aberrations more effectively; and (ii) a new method that integrates multiple samples, as a post-processing step of the wavelet analysis, is proposed to identify cancer related genes from a data set with multiple samples. As a result, we were able to detect cancer related genes with high accuracy in both the GBM and lung data sets.
Copy number polymorphisms (CNPs) are another type of DNA variation that is abundant in the normal population; they usually appear as kilobase- to megabase-scale DNA deletions or duplications. When the HIGH analysis was applied to SNP microarrays, deletions of a single SNP probe were frequently observed. When these were compared with the positions of known CNPs [22], many of them were found to overlap (data not shown); we removed these single SNP probes from our analysis, since the relevance of CNPs to cancer requires further study. However, if CNPs are the main subject of analysis, a new method based on our HIGH analysis could be developed for this task. As a promising example, in the GBM data set [17], a single-probe deletion shared by 13 patients was observed at base position 55,205,890 on chr11. Olfactory receptor (OR) genes such as OR4C11, OR4P4, OR4S2, and OR4C6 are located at this position, and the OR genomic locations have previously been shown to be frequently affected by CNPs [45]. This observation suggests that our wavelet analysis has the potential to be applied broadly to detect various kinds of focal aberrations.
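Recurrent single-probe deletions like the chr11 example can be flagged with a short NumPy routine. Here a single-probe deletion is assumed to mean a negative y_HIGH value whose two neighboring probes are zero, with chromosome ends padded by zeros; the function names and the `min_patients` parameter are hypothetical, and the comparison with known CNP positions [22] is left out.

```python
import numpy as np

def single_probe_deletions(high):
    """Mask of probes with a negative value whose two neighbors are both zero."""
    high = np.atleast_2d(np.asarray(high, dtype=float))   # (n_samples, n_probes)
    padded = np.pad(high, ((0, 0), (1, 1)))               # zero padding at chromosome ends
    left, centre, right = padded[:, :-2], padded[:, 1:-1], padded[:, 2:]
    return (centre < 0) & (left == 0) & (right == 0)

def recurrent_single_probe_deletions(high, min_patients=2):
    """Probe indices where at least `min_patients` samples show a single-probe deletion."""
    counts = single_probe_deletions(high).sum(axis=0)
    return np.flatnonzero(counts >= min_patients), counts
```

Setting `min_patients=13` would reproduce the kind of recurrence threshold that the chr11 example satisfies; the returned indices could then be intersected with a CNP list.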
Declarations
Acknowledgements
We would like to thank the three anonymous reviewers for their helpful comments. This work was supported by the National Research Foundation (NRF) of Korea funded by the Ministry of Education, Science and Technology (MEST) (20100003597).
References
 Liu F, Park PJ, Lai W, Maher E, Chakravarti A, Durso L, Jiang X, Yu Y, Brosius A, Thomas M, Chin L, Brennan C, DePinho RA, Kohane I, Carroll RS, Black PM, Johnson MD: A genomewide screen reveals functional gene clusters in the cancer genome and identifies EphA2 as a mitogen in glioblastoma. Cancer Res 2006, 66(22):10815–10823. 10.1158/00085472.CAN061408View ArticlePubMedGoogle Scholar
 Chaudhary J, Schmidt M: The impact of genomic alterations on the transcriptome: a prostate cancer cell line case study. Chromosome Res 2006, 14(5):567–586. 10.1007/s1057700610554View ArticlePubMedGoogle Scholar
 Tonon G, Wong KK, Maulik G, Brennan C, Feng B, Zhang Y, Khatry DB, Protopopov A, You MJ, Aguirre AJ, Martin ES, Yang Z, Ji H, Chin L, Depinho RA: Highresolution genomic profiles of human lung cancer. Proc Natl Acad Sci USA 2005, 102(27):9625–9630. 10.1073/pnas.0504126102PubMed CentralView ArticlePubMedGoogle Scholar
 Pole JCM, CourtayCahen C, Garcia MJ, Blood KA, Cooke SL, Alsop AE, Tse DML, Caldas C, Edwards PAW: Highresolution analysis of chromosome rearrangements on 8p in breast, colon and pancreatic cancer reveals a complex pattern of loss, gain and translocation. Oncogene 2006, 25(41):5693–5706. 10.1038/sj.onc.1209570View ArticlePubMedGoogle Scholar
 Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K: Molecular subclasses of highgrade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell 2006, 9(3):157–173. [Comparative Study] [Comparative Study] 10.1016/j.ccr.2006.02.019View ArticlePubMedGoogle Scholar
 Myllykangas S, Bohling T, Knuutila S: Specificity, selection and significance of gene amplifications in cancer. Semin Cancer Biol 2007, 17: 42–55. 10.1016/j.semcancer.2006.10.005View ArticlePubMedGoogle Scholar
 Jong K, Marchiori E, van der Vaart A, Chin SF, Carvalho B, Tijssen M, Eijk PP, van den Ijssel P, Grabsch H, Quirke P, Oudejans JJ, Meijer GA, Caldas C, Ylstra B: Crossplatform array comparative genomic hybridization metaanalysis separates hematopoietic and mesenchymal from epithelial tumors. Oncogene 2007, 26(10):1499–1506. [Evaluation Studies] [Evaluation Studies] 10.1038/sj.onc.1209919View ArticlePubMedGoogle Scholar
 Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of arraybased DNA copy number data. Biostatistics 2004, 5(4):557–572. 10.1093/biostatistics/kxh008View ArticlePubMedGoogle Scholar
 Hsu L, Self S, Grove D, Randolph T, Wang K, Delrow J, Loo L, Porter P: Denoising arraybased comparative genomic hybridization data using wavelets. Biostatistics 2005, 6(2):211–26. 10.1093/biostatistics/kxi004View ArticlePubMedGoogle Scholar
 Hupe P, Stransky N, Thiery J, Radvanyi F, Barillot E: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20(18):3413–22. 10.1093/bioinformatics/bth418View ArticlePubMedGoogle Scholar
 Guttman M, Mies C, DudyczSulicz K, Diskin S, Baldwin D, Stoeckert CJJ, Grant G: Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays. PLoS Genet 2007, 3(8):e143. 10.1371/journal.pgen.0030143PubMed CentralView ArticlePubMedGoogle Scholar
 Lee H, Kong S, Park P: Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes. Bioinformatics 2008, 24(7):889–96. 10.1093/bioinformatics/btn034PubMed CentralView ArticlePubMedGoogle Scholar
 Carro A, Rico D, Rueda O, DíazUriarte R, Pisano D: waviCGH: a web application for the analysis and visualization of genomic copy number alterations. Nucleic Acids Res 2010, 38(Suppl):W182–7.PubMed CentralView ArticlePubMedGoogle Scholar
 Oh M, Song B, Lee H: CAM: a web tool for combining array CGH and microarray gene expression data from multiple samples. Comput Biol Med 2010, 40(9):781–5. 10.1016/j.compbiomed.2010.07.006View ArticlePubMedGoogle Scholar
 Maher E, Brennan C, Wen P, Durso L, Ligon K, Richardson A, Khatry D, Feng B, Sinha R, Louis D, Quackenbush J, Black P, Chin L, DePinho R: Marked genomic differences characterize primary and secondary glioblastoma subtypes and identify two distinct molecular and clinical secondary glioblastoma entities. Cancer Res 2006, 66(23):11502–13. 10.1158/00085472.CAN062072View ArticlePubMedGoogle Scholar
 Diskin S, Eck T, Greshock J, Mosse Y, Naylor T, Stoeckert CJJ, Weber B, Maris J, Grant G: STAC: A method for testing the significance of DNA copy number aberrations across multiple arrayCGH experiments. Genome Res 2006, 16(9):1149–58. 10.1101/gr.5076506PubMed CentralView ArticlePubMedGoogle Scholar
 Beroukhim R, Getz G, Nghiemphu L, Barretina J, Linhart HsuehTD, et al.: Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma. Proc Natl Acad Sci USA 2007, 104(50):20007–20012. 10.1073/pnas.0710052104PubMed CentralView ArticlePubMedGoogle Scholar
 Nigro JM, Misra A, Zhang L, Smirnov I, Colman H, Griffin C, Ozburn N, Chen M, Pan E, Koul D, Yung WKA, Feuerstein BG, Aldape KD: Integrated arraycomparative genomic hybridization and expression array profiles identify clinically relevant molecular subtypes of glioblastoma. Cancer Res 2005, 65(5):1678–1686. [Comparative Study] [Comparative Study] 10.1158/00085472.CAN042921View ArticlePubMedGoogle Scholar
 Johnstone IM, Silverman BW: Wavelet Threshold Estimators for Data with Correlated Noise. J Royal Statist Soc 1997, B 59(2):319–351.View ArticleGoogle Scholar
Kotliarov Y, Steed M, Christopher N, Walling J, Su Q, Center A, Heiss J, Rosenblum M, Mikkelsen T, Zenklusen J, Fine H: High-resolution global genomic survey of 178 gliomas reveals novel regions of copy number alteration and allelic imbalances. Cancer Res 2006, 66(19):9428–36. 10.1158/0008-5472.CAN-06-1691
Weir B, Woo M, Getz G, Perner S, Ding L, Beroukhim R, Lin W, Province M, Kraja A, Johnson L, Shah K, Sato M, Thomas R, Barletta J, Borecki I, Broderick S, Chang A, Chiang D, Chirieac L, Cho J, Fujii Y, Gazdar A, Giordano T, Greulich H, Hanna M, Johnson B, Kris M, Lash A, Lin L, Lindeman N, Mardis E, McPherson J, Minna J, Morgan M, Nadel M, Orringer M, Osborne J, Ozenberger B, Ramos A, Robinson J, Roth J, Rusch V, Sasaki H, Shepherd F, Sougnez C, Spitz M, Tsao M, Twomey D, Verhaak R, Weinstock G, Wheeler D, Winckler W, Yoshizawa A, Yu S, Zakowski M, Zhang Q, Beer D, Wistuba I, Watson M, Garraway L, Ladanyi M, Travis W, Pao W, Rubin M, Gabriel S, Gibbs R, Varmus H, Wilson R, Lander E, Meyerson M: Characterizing the cancer genome in lung adenocarcinoma. Nature 2007, 450(7171):893–8. 10.1038/nature06358
Conrad D, Andrews T, Carter N, Hurles M, Pritchard J: A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 2006, 38: 75–81. 10.1038/ng1697
Mallat SG: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Machine Intell 1989, 11(7):674–693. 10.1109/34.192463
Daubechies I: Ten Lectures on Wavelets. Philadelphia: Soc Ind Appl Math; 1992.
Meyer Y: Wavelets and operators. Cambridge: Cambridge University Press; 1992.
Wang XH, Istepanian RSH, Song YH: Microarray image enhancement by denoising using stationary wavelet transform. IEEE Trans Nanobiosci 2003, 2(4):184–189. 10.1109/TNB.2003.816225
Coifman RR, Donoho DL: Translation-Invariant De-Noising. In Wavelets and Statistics. Volume 103. Berlin: Springer-Verlag; 1995:125–150.
Sardy S, Percival DB, Bruce AG, Gao HY, Stuetzle W: Wavelet shrinkage for unequally spaced data. Statistics and Computing 1999, 9: 65–75. 10.1023/A:1008818328241
Donoho DL, Johnstone IM: Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81: 425–455. 10.1093/biomet/81.3.425
Rosas-Orea MCE, Hernandez-Diaz M, Alarcon-Aquino V, Guerrero-Ojeda LG: A Comparative Simulation Study of Wavelet Based Denoising Algorithms. Proceedings of the 15th International Conference on Electronics, Communications and Computers 2005, 125–130.
Barford P, Kline J, Plonka D, Ron A: A Signal Analysis of Network Traffic Anomalies. Proceedings of ACM SIGCOMM Internet Measurement Workshop: November 2002; France, ACM 2002, 71–82.
Strang G, Nguyen T: Wavelets and filter banks. Wellesley: Wellesley-Cambridge Press; 1996.
Storey J, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003, 100(16):9440–5. 10.1073/pnas.1530509100
Reifenberger G, Collins VP: Pathology and molecular genetics of astrocytic gliomas. J Mol Med 2004, 82(10):656–670. 10.1007/s00109-004-0564-x
Futreal P, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton M: A census of human cancer genes. Nat Rev Cancer 2004, 4(3):177–83. 10.1038/nrc1299
Smith P, Nicholson L, Syed N, Payne A, Hiller L, Garrone O, Occelli M, Gasco M, Crook T: Epigenetic inactivation implies independent functions for insulin-like growth factor binding protein (IGFBP)-related protein 1 and the related IGFBPL1 in inhibiting breast cancer phenotypes. Clin Cancer Res 2007, 13(14):4061–8. 10.1158/1078-0432.CCR-06-3052
Kleeff J, Ishiwata T, Maruyama H, Friess H, Truong P, Büchler M, Falb D, Korc M: The TGF-beta signaling inhibitor Smad7 enhances tumorigenicity in pancreatic cancer. Oncogene 1999, 18(39):5363–72. 10.1038/sj.onc.1202909
Luo X, Ding Q, Wang M, Li Z, Mao K, Sun B, Pan Y, Wang Z, Zang Y, Chen Y: In vivo disruption of TGF-beta signaling by Smad7 in airway epithelium alleviates allergic asthma but aggravates lung carcinogenesis in mouse. PLoS One 2010, 5(4):e10149. 10.1371/journal.pone.0010149
Memarzadeh S, Xin L, Mulholland D, Mansukhani A, Wu H, Teitell M, Witte O: Enhanced paracrine FGF10 expression promotes formation of multifocal prostate adenocarcinoma and an increase in epithelial androgen receptor. Cancer Cell 2007, 12(6):572–85. 10.1016/j.ccr.2007.11.002
Nomura S, Yoshitomi H, Takano S, Shida T, Kobayashi S, Ohtsuka M, Kimura F, Shimizu H, Yoshidome H, Kato A, Miyazaki M: FGF10/FGFR2 signal induces cell migration and invasion in pancreatic cancer. Br J Cancer 2008, 99(2):305–13. 10.1038/sj.bjc.6604473
Clark J, Tichelaar J, Wert S, Itoh N, Perl A, Stahlman M, Whitsett J: FGF10 disrupts lung morphogenesis and causes pulmonary adenomas in vivo. Am J Physiol Lung Cell Mol Physiol 2001, 280(4):L705–15.
Calvo R, West J, Franklin W, Erickson P, Bemis L, Li E, Helfrich B, Bunn P, Roche J, Brambilla E, Rosell R, Gemmill R, Drabkin H: Altered HOX and WNT7A expression in human lung cancer. Proc Natl Acad Sci USA 2000, 97(23):12776–81. 10.1073/pnas.97.23.12776
Ben-Yaacov E, Eldar Y: A fast and flexible method for the segmentation of aCGH data. Bioinformatics 2008, 24(16):i139–45. 10.1093/bioinformatics/btn272
Fridlyand J, Snijders AM, Pinkel D, Albertson DG, Jain AN: Hidden Markov models approach to the analysis of array CGH data. J Multivar Anal 2004, 90: 132–153. 10.1016/j.jmva.2004.02.008
Hasin Y, Olender T, Khen M, Gonzaga-Jauregui C, Kim P, Urban A, Snyder M, Gerstein M, Lancet D, Korbel J: High-resolution copy-number variation map reflects human olfactory receptor diversity and evolution. PLoS Genet 2008, 4(11):e1000249. 10.1371/journal.pgen.1000249
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.