HAT: Hypergeometric Analysis of Tiling-arrays with application to promoter-GeneChip data
© Taskesen et al; licensee BioMed Central Ltd. 2010
Received: 24 November 2009
Accepted: 21 May 2010
Published: 21 May 2010
Tiling-arrays are applicable to multiple types of biological research questions. Because of their advantages (high sensitivity, high resolution, unbiased measurement), the technology is often employed in genome-wide investigations. A major challenge in the analysis of tiling-array data is to define regions-of-interest, i.e., contiguous probes with increased signal intensity (as a result of hybridization of labeled DNA) in a region. Currently, no standard criteria are available to define these regions-of-interest: there is no single probe-intensity cut-off level, different regions-of-interest can contain various numbers of probes, and they can vary in genomic width. Furthermore, the chromosomal distance between neighboring probes can vary across the genome and among different arrays.
We have developed Hypergeometric Analysis of Tiling-arrays (HAT), and first evaluated its performance for tiling-array datasets from a Chromatin Immunoprecipitation study on chip (ChIP-on-chip) for the identification of genome-wide DNA binding profiles of the transcription factor Cebpa (used for method comparison). Using this assay, we show that HAT refines the detection of regions-of-interest: regions detected by HAT are more highly enriched for expected motifs than those detected by an alternative method (MAT). Subsequently, data from a retroviral insertional mutagenesis screen were used to examine the performance of HAT in a different application of tiling-arrays. In both studies, detected regions-of-interest were validated with (q)PCR.
We demonstrate that HAT has increased specificity for the analysis of tiling-array data compared with the alternative method, and that it accurately detects regions-of-interest in two different applications of tiling-arrays. HAT has several advantages over previous methods: i) as there is no single cut-off level for probe intensity, HAT can detect regions-of-interest at various thresholds, ii) it can detect regions-of-interest of any size, iii) it is independent of probe-resolution across the genome and across tiling-array platforms, and iv) it employs a single user-defined parameter: the significance level. Regions-of-interest are detected by computing the hypergeometric probability while controlling the Family Wise Error. Furthermore, the method does not require experimental replicates, common regions-of-interest are indicated, a sequence-of-interest can be examined for every detected region-of-interest, and flanking genes can be reported.
Tiling-arrays are used to identify specific genomic DNA regions that can be enriched using various procedures to study certain molecular biological features. For example, DNA fragments that are bound by a protein of interest, e.g., a transcription factor, can be enriched by Chromatin Immunoprecipitation (ChIP). When these enriched fragments are hybridized to an array, a genome-wide binding profile of this particular protein can be obtained for the cell type studied (ChIP-on-chip). Other applications of tiling-arrays are: Methylated-DNA immunoprecipitation (MeDIP-on-chip), transcriptome mapping, recognition of hypersensitive sites such as segments of open chromatin that are cleaved more readily by DNase I (DNase-chip), and identification of copy number variations or breakpoints (Array CGH). The use of tiling-arrays to detect enriched DNA regions has several advantages, such as i) high sensitivity, which allows the detection of small DNA fragments associated with rare molecules, and ii) high probe-resolution, which results in accurate acquisition of unbiased data.
A tiling-array is an array of short DNA fragments, which represent 'probes' that cover the entire genome, or contigs of the genome. The hybridization of labeled DNA to an array (for example DNA enriched using ChIP) produces a quantitative signal intensity for each probe. Multiple contiguous probes with increased signal intensity across a particular genomic region form a putative region-of-interest and, in a ChIP experiment, suggest the presence of a protein binding site.
As there are no standard criteria to accurately define a region-of-interest, a major challenge in the analysis of tiling-array data is to define such a region and to discriminate a positive signal from non-specific signals. Defining regions-of-interest requires a threshold on the continuous probe-intensity levels. Next, one must choose how many consecutive probes above the threshold are required before a region-of-interest is called. Together, this threshold and the number of probes above it directly influence the size of the region-of-interest that can be detected. As biologically relevant regions may vary in intensity, employing a single threshold is insufficient. Additionally, as the probe-resolution varies across the genome and across different tiling-array platforms, a fixed number of consecutive probes is also an inadequate definition of a region-of-interest. Various methods have been developed to detect regions-of-interest in ChIP-on-chip data, such as the Welch t-test, HMM, TileMap, MAT, a mixture model approach, CMARRT, Starr and Ringo [8–15]. MAT (Model-based analysis of tiling-arrays for ChIP-chip) is one of the most cited methods for analyzing ChIP-on-chip data, and it has been shown to outperform the Welch t-test, HMM and TileMap [8–10]. MAT uses various user-defined parameters to model a region-of-interest, such as the maximum bandwidth, the maximum gap size between probes, the minimum number of probes in a region and a fixed threshold. Major limitations of this method are that it assumes a uniform probe-resolution across the genome and that it depends on many user-defined parameters.
Here, we propose a statistical framework (HAT: Hypergeometric Analysis of Tiling-arrays) to identify regions-of-interest in tiling-array data. HAT has several advantages over previous methods, including MAT: i) as there is no single cut-off level for probe intensity, HAT can detect regions-of-interest for a large number of thresholds, ii) it can detect regions-of-interest of any size, iii) it is independent of probe-resolution across the genome and across tiling-array platforms, and iv) it employs only a single user-defined parameter: the significance level. HAT can be seen as a generalization of the transcript discovery approach used in Bertone et al.
We used two promoter tiling-array datasets to evaluate HAT. In the first assay, tiling-array data were used to identify genome-wide DNA binding profiles of the transcription factor Cebpa in a cell line model. Using these data, we showed that although HAT detected fewer regions-of-interest than MAT, the detected regions were more highly enriched for CEBP binding motifs and included known Cebpa target genes. In the second experiment, a retroviral insertional mutagenesis assay, HAT identified novel putative transforming loci that may play a role in tumor development. Two of these loci were subsequently validated using PCR.
HAT can also detect and compare regions-of-interest across multiple samples. Each sample is analyzed independently, but when multiple samples within one experiment are used, detected regions-of-interest at the same genomic location among different samples are combined into 'common regions-of-interest', thereby increasing the confidence. In addition, HAT can incorporate sequence information for the detection of pre-defined sequences (e.g., binding location within or near the region). These are highlighted in the graphical output for every detected region-of-interest and indicated in the output file.
Results and Discussion
Two distinct experimental datasets were used in this study: ChIP-on-chip data derived from an inducible Cebpa-expressing myeloid cell line model, and data obtained from genomic DNA of retrovirus-induced murine leukemias. Data were generated using the Affymetrix GeneChip Mouse Promoter 1.0 Array. This array contains 4.6 million perfect-match probes covering over 28,000 mouse promoter regions. Each promoter region spans from 6 kb upstream to 2.5 kb downstream of the 5' transcription start site, and each probe is 25 nt long.
Detection of regions-of-interest for Cebpa chromatin immunoprecipitation by applying HAT
To compare different methods and to analyze the promoter array data, we made use of a dataset that was obtained from a ChIP of beta-estradiol induced Cebpa in a myeloid cell line, 32D, followed by promoter array hybridizations. The data were used to examine the validity of detected regions-of-interest in two ways: i) at the 'CCAAT' binding level; Cebpa interacts with the nucleotide sequence 'CCAAT' within the promoter regions represented on the chip, therefore CEBP binding motifs are expected to be enriched, and ii) at the gene level; examination of the presence of known Cebpa target genes, by taking the genes flanking the detected region-of-interest into account. Furthermore, one selected region-of-interest was validated by Real Time Quantitative PCR (qPCR).
The experimental setup was as follows: clones were derived from the myeloid cell line 32D and express either beta-estradiol-inducible Cebpa-ER (3 clones) or control ER (2 clones). Chromatin immunoprecipitations were carried out in the beta-estradiol-treated cells using an antibody directed against ER, and the immunoprecipitated DNA was hybridized to Affymetrix promoter chips.
Table: Motif enrichment analysis (HAT, α = 0.05).
In addition, the HAT and MAT results were compared with the regions detected by Starr. Starr implements the CMARRT algorithm and thereby incorporates the correlation structure for the identification of regions-of-interest in tiling-array data. For the detection of regions-of-interest, we used parameter settings similar to those used for HAT and MAT (fragment size = 600 bp, minimum number of probes in a region = 8 and α = 1 × 10⁻⁵). Using these settings, Starr detected 1664 regions-of-interest, which showed high enrichment for CEBP binding motifs (Additional file 1: Supplemental Table S1). Next, we examined the overlap of the regions-of-interest detected by all methods, as depicted in Figure 2. All regions-of-interest detected by HAT (except one) were also detected by MAT, either alone or together with Starr (64 and 791, respectively). Note that an overlapping region can contain multiple regions-of-interest detected by a single method. To assess the validity of the regions-of-interest detected by HAT, Starr and MAT, we examined the enrichment for CEBP binding motifs in each part of the Venn diagram, depicted in different colors in Figure 2 (blue, red, green, orange and pink). High enrichment for CEBP motifs was found for: i) the overlap of HAT with the other two methods (pink: 719), ii) the overlap of HAT with MAT only (blue: 64), and iii) the overlap between Starr and MAT (orange: 652). No significantly enriched motifs were found in the regions detected only by Starr (red: 70), and only limited enrichment for CEBP motifs was found in the regions detected only by MAT (green: 3092). We therefore conclude that HAT had the highest specificity, as the regions-of-interest it detected were highly enriched for CEBP binding motifs.
Detection of retroviral insertion sites by HAT
Retroviral Integration Mutagenesis (RIM) in mice is a powerful tool to identify new genes that play an important role in oncogenesis. Mice are injected with retroviruses that can integrate into the murine genome upon infection. Viral integration can lead to gene deregulation and, depending on the genes affected, tumors may develop. Genes located proximal to viral integration sites are therefore potentially oncogenic. Genomic regions that have been targeted by proviral DNA in multiple tumors are called common viral integration sites (VIS) and are likely to drive tumor development. Using retroviral insertional mutagenesis, many oncogenes have been identified in large sequencing screens of multiple tumors [21–24]. We hypothesise that, within tumors, genes may be silenced after proviral integration through hypermethylation of the CpGs in the viral long terminal repeat and, subsequently, in the promoters of the target genes. Such methylated genes may be studied by Methyl-DNA immunoprecipitation (MeDIP-on-chip) followed by inverse PCR using long terminal repeat (LTR)-specific primers. After combining these two technologies, we hybridized the samples to Affymetrix promoter chips to identify genomic locations involved in viral integration that potentially harbour new tumor suppressor genes (TSG).
Extended applications of HAT
The scope of this method is not limited to the presented studies (i.e., detecting transcription factor binding sites and DNA-methylated regions). For example, we have successfully applied HAT to the detection of regions enriched for histone modifications, such as trimethylation of histone 3 at lysine 4 or lysine 27 (H3K4me3 and H3K27me3) (data not shown). Some of the detected regions-of-interest were selected for further validation and confirmed by qPCR. For tiling-array data spanning the entire genome (e.g., RNA transcript mapping data), we do not expect the performance of the algorithm (detection of regions-of-interest) to be affected by increased variability in hybridization consistency, because the applied normalization method [11, 26] corrects for two major causes of such variability, i.e., probe sequence and the presence of repeats within the genome. Furthermore, in addition to one-color arrays (e.g., Affymetrix tiling-arrays), we envision that HAT can also be applied to data from two-color arrays (e.g., Nimblegen tiling-arrays), because the data structure is similar. We stress, however, that normalization is an important step and strongly depends on the type of tiling-array dataset.
Here we propose a statistical framework, HAT (Hypergeometric Analysis of Tiling-arrays), to analyze tiling-array data. We showed that the method is robust and has increased specificity in the detection of regions-of-interest compared with two alternative methods. This is achieved by computing the hypergeometric probability for every region-of-interest across different threshold levels of probe intensity and window sizes, while controlling the Family Wise Error (FWE) by Bonferroni correction. Besides detecting regions-of-interest, HAT also determines sequences-of-interest, flanking genes and the distances to 5' transcription start sites on both DNA strands. We describe the performance of HAT when applied to different experimental tiling-array datasets. For each experimental dataset, the selected downstream genes flanking the detected regions-of-interest were successfully confirmed by (q)PCR. We compared the regions-of-interest detected by HAT with those of two other methods (MAT and Starr) and showed that HAT detected fewer regions-of-interest at the same significance level as used for MAT and Starr. However, motif enrichment analysis showed that the regions-of-interest detected by HAT were more enriched for the expected CEBP binding motifs than those detected by MAT, and showed similar enrichment to those detected by Starr, illustrating the increased specificity of HAT.
Besides ChIP-on-chip data, HAT is also suitable for the analysis of other types of tiling-array data. Applying HAT to the data from the MeDIP, inverse-PCR and promoter-GeneChip hybridization experiment, we discovered methylated viral integration sites (mVISs) and common mVISs (cmVISs) and identified the genes (unpublished data) that flank these methylated viral integration sites (Figures 4 and 5).
HAT is applicable to the detection of regions-of-interest in different applications of tiling-arrays, and has the advantage of being independent of thresholds, the number of probes in a region, and probe-resolution. It does not depend on setting multiple user-defined parameters: only the significance level and, optionally, a maximum fragment size need to be set.
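The hypergeometric test with Bonferroni control of the FWE summarized above can be sketched as follows. This is a minimal Python illustration, not HAT's Matlab implementation; the notation (M probes considered, K of them at or above threshold t, a window of N probes containing x positive probes), the use of the upper-tail probability P(X ≥ x), and the function names are assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(x: int, M: int, K: int, N: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(M probes, K positive, N drawn).

    Assumed notation: M = total probes considered, K = probes at or above
    threshold t, N = probes in the window, x = positive probes in the window.
    """
    start = max(x, 0)
    upper = min(K, N)
    # math.comb(n, k) returns 0 when k > n, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(M - K, N - i) for i in range(start, upper + 1))
    # Exact integer arithmetic; Python's int / int division yields a float.
    return tail / comb(M, N)


def bonferroni_significant(p: float, n_tests: int, alpha: float = 0.05) -> bool:
    """Bonferroni control of the family-wise error: reject if p * n_tests <= alpha."""
    return p * n_tests <= alpha
```

For example, `hypergeom_tail(8, 4_600_000, 100_000, 10)` gives the tail probability for a hypothetical window in which 8 of 10 probes are positive while 100,000 of 4.6 million probes pass the threshold. Here `n_tests` is assumed to be the total number of (g, t, N) combinations evaluated; `scipy.stats.hypergeom.sf(x - 1, M, K, N)` should give the same tail if SciPy is available.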
Extracting candidate gene-regions from high-throughput tiling-array data is a multi-step process (Figure 1). The first step is to normalize the probe-intensity data from the chip (Figure 1A). For this purpose, we use the normalization from Model-based analysis of tiling-arrays for ChIP-chip (MAT) [11, 26], but other normalization procedures can also be applied. The normalization removes systematic variation between experimental conditions that is unrelated to biological differences. As a result of this normalization, the probe-intensity values follow a normal distribution with a negative mean; hence the majority of probes have values below zero and are ignored in all subsequent analyses. Probes whose intensity may result from hybridization of labeled DNA to the chip (e.g., DNA present in the immunoprecipitated chromatin sample) have values greater than zero and are used to determine candidate regions-of-interest.
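The filtering of normalized probe intensities described above can be sketched as follows, together with one way to generate the many threshold levels t that HAT evaluates. This is a sketch under stated assumptions: the normalization itself is not reproduced, a probe is assumed to be positive at threshold t when its value is ≥ t, and the threshold spacing (every step-th sorted positive value) is hypothetical, not HAT's documented choice.

```python
from typing import List


def positive_probe_indices(intensities: List[float]) -> List[int]:
    """Indices of probes with normalized intensity > 0; all others are ignored."""
    return [i for i, v in enumerate(intensities) if v > 0]


def threshold_levels(intensities: List[float], n_levels: int = 10) -> List[float]:
    """Hypothetical threshold grid: every step-th sorted positive intensity.

    HAT evaluates many thresholds; this particular spacing is an assumption.
    """
    positives = sorted(v for v in intensities if v > 0)
    if not positives:
        return []
    step = max(1, len(positives) // n_levels)
    return sorted(set(positives[::step]))


def count_positive(intensities: List[float], t: float) -> int:
    """Number of probes at or above threshold t (t > 0 assumed)."""
    return sum(1 for v in intensities if v >= t)
```

Each value returned by `threshold_levels` would serve as one threshold t, and `count_positive` gives the number of positive probes at that level.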
To define a region-of-interest, we determine the significance of all possible window positions g for which the window contains at least one positive probe. Because the exact number of probes in a region-of-interest is undefined, and may differ greatly between regions-of-interest due to differences in local probe-resolution, the window width N is varied. To avoid evaluating many highly similar windows, which would incur a high multiple-testing penalty, only those window widths for which the number of probes in the window changes are evaluated. Therefore, N is defined in terms of the number of probes contained in the window rather than in base pairs. The number of positive probes in a window of width N at genomic position g, for threshold value t, is denoted by x(g, t, N). In the example presented in Figure 6, we varied N from 1 to 3. For the case k(t) = 2 (Panels B and C), x(g, t, N) ranges from 1 to 2, and for the case k(t) = 4 (Panels E and F), x(g, t, N) ranges from 1 to 4.
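The window enumeration described above can be sketched as follows: windows start at each probe index g, their width N is counted in probes, and only windows with x(g, t, N) > 0 are kept. This is a minimal illustration, not HAT's code; `max_bp` is a hypothetical stand-in for the optional maximum fragment size, measured here as the start-to-start span of the first and last probe, and the function name is illustrative.

```python
from typing import Iterator, List, Optional, Tuple


def enumerate_windows(
    intensities: List[float],
    positions: List[int],
    t: float,
    n_max: int,
    max_bp: Optional[int] = None,
) -> Iterator[Tuple[int, int, int]]:
    """Yield (g, N, x) with x = x(g, t, N) for each window of N consecutive
    probes starting at probe index g that contains >= 1 probe with value >= t.

    positions are probe start coordinates (bp); max_bp is a hypothetical
    stand-in for the optional maximum fragment size (start-to-start span).
    """
    # Prefix sums make each x(g, t, N) an O(1) lookup.
    prefix = [0]
    for v in intensities:
        prefix.append(prefix[-1] + (1 if v >= t else 0))
    n = len(intensities)
    for g in range(n):
        for N in range(1, n_max + 1):
            end = g + N  # exclusive probe index
            if end > n:
                break
            if max_bp is not None and positions[end - 1] - positions[g] > max_bp:
                break  # wider windows at this g only span more base pairs
            x = prefix[end] - prefix[g]
            if x > 0:
                yield g, N, x
```

Because N counts probes rather than base pairs, the same code handles dense and sparse probe spacing; running it once per threshold t gives all (g, t, N) candidates to test.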
Because various values of t and N are used, similar or partly overlapping regions are found. To obtain a single region-of-interest at each genomic location, overlapping regions are merged by joining regions that share one or more probes. For simplicity, we assume in our example that windows with x(g, t, N) ≥ 2 are statistically significant. These statistically significant regions are colored blue and green in Figure 6C and Figure 6F, respectively. The merging procedure is illustrated in Figure 6D, where four blue regions are merged into a single region, and in Figure 6G, where 18 green regions are merged.
In our example so far, regions are detected within a single sample. When multiple samples from the same experiment are available, regions-of-interest are detected in each array separately and then compared to identify common regions-of-interest (Figure 1D). A radius, in base pairs, can be set to define the maximum distance between regions across samples (default: zero).
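The merging rule described above (join regions that share at least one probe) is a standard interval merge and can be sketched as follows. The function names are hypothetical; regions are represented as inclusive (first_probe, last_probe) index pairs, so adjacent but non-overlapping regions stay separate.

```python
from typing import Iterable, List, Tuple


def window_to_interval(g: int, N: int) -> Tuple[int, int]:
    """A window of N probes starting at probe g covers probes g..g+N-1."""
    return (g, g + N - 1)


def merge_regions(regions: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Join regions (first_probe, last_probe), inclusive, sharing >= 1 probe."""
    merged: List[List[int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:  # shares at least one probe
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```

Sorting by start position first means each region only needs to be compared with the most recently merged one, so the merge runs in O(n log n) for n significant windows.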
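Combining per-sample regions into common regions-of-interest with a base-pair radius can be sketched as follows. This is an illustration under assumptions: regions are (chromosome, start, end) tuples with inclusive coordinates, the distance between regions is taken as the gap between the start of one and the end of the previous one (so radius 0 requires overlap), and a group is called common when at least `min_samples` distinct samples contribute to it. The names are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp), inclusive


def common_regions(
    samples: Dict[str, List[Region]], radius: int = 0, min_samples: int = 2
) -> List[Tuple[str, int, int, List[str]]]:
    """Pool per-sample regions and join those within `radius` bp of each other.

    Returns (chromosome, start, end, sample_names) for groups supported by at
    least `min_samples` distinct samples.
    """
    tagged = sorted(
        (chrom, start, end, name)
        for name, regions in samples.items()
        for chrom, start, end in regions
    )
    groups: List[list] = []
    for chrom, start, end, name in tagged:
        last = groups[-1] if groups else None
        # Gap between this region and the current group; <= 0 means overlap.
        if last is not None and last[0] == chrom and start - last[2] <= radius:
            last[2] = max(last[2], end)
            last[3].add(name)
        else:
            groups.append([chrom, start, end, {name}])
    return [
        (chrom, start, end, sorted(names))
        for chrom, start, end, names in groups
        if len(names) >= min_samples
    ]
```

Increasing `radius` lets nearby but non-overlapping regions from different samples be reported as one common region.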
Additional properties of HAT
The HAT method has two additional properties besides the detection of regions-of-interest. i) The determination of sequences-of-interest surrounding and within the detected regions-of-interest. For example, the CCAAT/enhancer binding protein Cebpa is known to interact with 'CCAAT' sequences, so detected regions-of-interest in a chromatin IP experiment are expected to contain this sequence. The presence and positions of the sequences-of-interest can be indicated in the (graphical) output of HAT. In this graphical output, an upward-facing green bar indicates that the sequence is detected on the positive strand, and a downward-facing green bar indicates a sequence on the negative strand. ii) The determination of genes flanking the detected regions-of-interest. For every detected region-of-interest, the genes with the closest distance to the transcription start site are determined (upstream and downstream, on both the forward and reverse DNA strands) and indicated in the (graphical) output.
To include these sequences-of-interest and flanking genes in the HAT analysis, the public genome sequence (available for different model systems) can be obtained from the UCSC genome browser.
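Both additional properties described above can be sketched with the standard library. The motif scan reports every (possibly overlapping) occurrence of a sequence-of-interest such as 'CCAAT' on the forward strand and, via its reverse complement, on the reverse strand. The flanking-gene search keeps the closest transcription start site on each strand to the left and right of the region midpoint; mapping HAT's "upstream/downstream" to left/right of the midpoint is an assumption, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """(0-based start on the forward strand, strand) for every occurrence.

    A '-' hit means the motif reads 5'->3' on the reverse strand, i.e. its
    reverse complement occurs at that forward-strand position.
    """
    seq, motif = seq.upper(), motif.upper()
    # A zero-width lookahead lets overlapping occurrences all be reported.
    hits = [(m.start(), "+") for m in re.finditer(f"(?={re.escape(motif)})", seq)]
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be reported twice
        hits += [(m.start(), "-") for m in re.finditer(f"(?={re.escape(rc)})", seq)]
    return sorted(hits)


def flanking_genes(
    region_start: int, region_end: int, genes: List[Tuple[str, int, str]]
) -> Dict[Tuple[str, str], Tuple[str, int]]:
    """Closest TSS on each strand, to the left and right of the region midpoint.

    genes: (name, tss_bp, strand). Returns {(strand, side): (name, distance_bp)}.
    """
    mid = (region_start + region_end) // 2
    best: Dict[Tuple[str, str], Tuple[str, int]] = {}
    for name, tss, strand in genes:
        key = (strand, "left" if tss < mid else "right")
        dist = abs(tss - mid)
        if key not in best or dist < best[key][1]:
            best[key] = (name, dist)
    return best
```

In practice, `seq` would be the genomic sequence of a detected region (plus flanks) and `genes` a TSS annotation for the same assembly; both are inputs the sketch does not fetch.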
Availability and requirements
HAT is implemented in Matlab R2009b and has been tested on Unix and MS-Windows. It is available at http://www.erasmusmc.nl/hematologie/. The run time depends on the number of threshold cut-offs used, as the computational complexity increases linearly with the number of probes used for the detection of regions-of-interest. In addition, the run time depends on the different steps in the method (Figure 1B-F). On average, for the Cebpa study, 28 minutes were needed per sample for the detection of regions-of-interest, whereas MAT required on average 23 minutes per sample. Note, however, that our algorithm analyzed the data using a multitude of window sizes and thresholds. A more detailed overview of the run time for each step in the method can be found in Additional file 2: Supplemental Figure S1.
The authors thank Erik van den Akker, Martin van Vliet and Mathijs Sanders for the discussions. This research is supported by the Center for Translational Molecular Medicine (CTMM), the Netherlands Genomics Initiative (NGI) and the Dutch Cancer Society (KWF Kankerbestrijding).
- Aparicio O, Geisberg JV, Struhl K: Chromatin immunoprecipitation for determining the association of proteins with specific genomic sequences in vivo. Curr Protoc Cell Biol 2004, Chapter 17: Unit 17.7.
- Liu XS: Getting started in tiling microarray analysis. PLoS Comput Biol 2007, 3(10):1842–1844. doi:10.1371/journal.pcbi.0030183
- Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schübeler D: Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 2005, 37(8):853–862. doi:10.1038/ng1598
- Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M: Global identification of human transcribed sequences with genome tiling arrays. Science 2004, 306(5705):2242–2246. doi:10.1126/science.1103388
- Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS: DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods 2006, 3(7):503–509. doi:10.1038/nmeth888
- Heidenblad M, Lindgren D, Jonson T, Liedberg F, Veerla S, Chebil G, Gudjonsson S, Borg A, Månsson W, Höglund M: Tiling resolution array CGH and high density expression profiling of urothelial carcinomas delineate genomic amplicons and candidate target genes specific for advanced tumors. BMC Med Genomics 2008, 1: 3. doi:10.1186/1755-8794-1-3
- Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc V, Weissman S, Snyder M, Gerstein M: Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 2005, 21(8):466–475. doi:10.1016/j.tig.2005.06.007
- Keleş S, van der Laan MJ, Dudoit S, Cawley SE: Multiple testing methods for ChIP-Chip high density oligonucleotide array data. J Comput Biol 2006, 13(3):579–613. doi:10.1089/cmb.2006.13.579
- Li W, Meyer CA, Liu XS: A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics 2005, 21(Suppl 1):i274–i282. doi:10.1093/bioinformatics/bti1046
- Ji H, Wong WH: TileMap: create chromosomal map of tiling array hybridizations. Bioinformatics 2005, 21(18):3629–3636. doi:10.1093/bioinformatics/bti593
- Johnson WE, Li W, Meyer CA, Gottardo R, Carroll JS, Brown M, Liu XS: Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci USA 2006, 103(33):12457–12462. doi:10.1073/pnas.0601180103
- Sun W, Buck MJ, Patel M, Davis IJ: Improved ChIP-chip analysis by a mixture model approach. BMC Bioinformatics 2009, 10: 173. doi:10.1186/1471-2105-10-173
- Kuan PF, Chun H, Keleş S: CMARRT: a tool for the analysis of ChIP-chip data from tiling arrays by incorporating the correlation structure. Pac Symp Biocomput 2008, 515–526.
- Zacher B, Kuan PF, Tresch A: Starr: Simple Tiling ARRay analysis of Affymetrix ChIP-chip data. BMC Bioinformatics 2010, 11: 194. doi:10.1186/1471-2105-11-194
- Toedling J, Sklyar O, Krueger T, Fischer JJ, Sperling S, Huber W: Ringo: an R/Bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics 2007, 8: 221. doi:10.1186/1471-2105-8-221
- Ji X, Li W, Song J, Wei L, Liu XS: CEAS: cis-regulatory element annotation system. Nucleic Acids Res 2006, 34(Web Server issue):W551–W554. doi:10.1093/nar/gkl322
- Tinel M, Berson A, Elkahwaji J, Cresteil T, Beaune P, Pessayre D: Downregulation of cytochromes P450 in growth-stimulated rat hepatocytes: role of c-Myc induction and impaired C/EBP binding to DNA. J Hepatol 2003, 39(2):171–178. doi:10.1016/S0168-8278(03)00238-1
- Ramji DP, Foka P: CCAAT/enhancer-binding proteins: structure, function and regulation. Biochem J 2002, 365(Pt 3):561–575.
- Wang W, Wang X, Ward AC, Touw IP, Friedman AD: C/EBPalpha and G-CSF receptor signals cooperate to induce the myeloperoxidase and neutrophil elastase genes. Leukemia 2001, 15(5):779–786. doi:10.1038/sj.leu.2402094
- Zhang P, Iwama A, Datta MW, Darlington GJ, Link DC, Tenen DG: Upregulation of interleukin 6 and granulocyte colony-stimulating factor receptors by transcription factor CCAAT enhancer binding protein alpha (C/EBP alpha) is critical for granulopoiesis. J Exp Med 1998, 188(6):1173–1184. doi:10.1084/jem.188.6.1173
- Erkeland SJ, Valkhof M, Heijmans-Antonissen C, van Hoven-Beijen A, Delwel R, Hermans MHA, Touw IP: Large-scale identification of disease genes involved in acute myeloid leukemia. J Virol 2004, 78(4):1971–1980. doi:10.1128/JVI.78.4.1971-1980.2004
- Touw IP, Erkeland SJ: Retroviral insertion mutagenesis in mice as a comparative oncogenomics tool to identify disease genes in human leukemia. Mol Ther 2007, 15: 13–19. doi:10.1038/sj.mt.6300040
- Theodorou V, Kimm MA, Boer M, Wessels L, Theelen W, Jonkers J, Hilkens J: MMTV insertional mutagenesis identifies genes, gene families and pathways involved in mammary cancer. Nat Genet 2007, 39(6):759–769. doi:10.1038/ng2034
- Suzuki T, Shen H, Akagi K, Morse HC, Malley JD, Naiman DQ, Jenkins NA, Copeland NG: New genes involved in cancer identified by retroviral tagging. Nat Genet 2002, 32: 166–174. doi:10.1038/ng949
- Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR: Applications of DNA tiling arrays for whole-genome analysis. Genomics 2005, 85: 1–15. doi:10.1016/j.ygeno.2004.10.005
- Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19(2):185–193. doi:10.1093/bioinformatics/19.2.185
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.