Analysis of DNA strand-specific differential expression with high density tiling microarrays
© Quintales et al; licensee BioMed Central Ltd. 2010
Received: 19 November 2009
Accepted: 17 March 2010
Published: 17 March 2010
DNA microarray technology allows the analysis of genome structure and dynamics at genome-wide scale. Expression microarrays (EMA) contain probes for annotated open reading frames (ORF) and are widely used for the analysis of differential gene expression. By contrast, tiling microarrays (TMA) have a much higher probe density and provide unbiased genome-wide coverage. The purpose of this study was to develop a protocol to exploit the high resolution of TMAs for quantitative measurement of DNA strand-specific differential expression of annotated and non-annotated transcripts.
We extensively filtered probes present in Affymetrix GeneChip Yeast Genome 2.0 expression and GeneChip S. pombe 1.0FR tiling microarrays to generate custom Chip Description Files (CDF) in order to compare their efficiency. We experimentally tested the potential of our approach by measuring the differential expression of 4904 genes in the yeast Schizosaccharomyces pombe growing under conditions of oxidative stress. The results showed a Pearson correlation coefficient of 0.943 between both platforms, indicating that TMAs are as reliable as EMAs for quantitative expression analysis. A significant advantage of TMAs over EMAs is the possibility of detecting non-annotated transcripts generated only under specific physiological conditions. To take full advantage of this property, we have used a target-labelling protocol that preserves the original polarity of the transcripts and, therefore, allows the strand-specific differential expression of non-annotated transcripts to be determined. By using a segmentation algorithm prior to generating the corresponding custom CDFs, we identified and quantitatively measured the expression of 510 transcripts longer than 180 nucleotides and not overlapping previously annotated ORFs that were differentially expressed at least 2-fold under oxidative stress.
We show that the information derived from TMA hybridization can be processed simultaneously for high-resolution qualitative and quantitative analysis of the differential expression of well-characterized genes and of previously non-annotated and antisense transcripts. The consistency of the performance of TMAs, their genome-wide coverage and adaptability to updated genome annotations, and the possibility of measuring strand-specific differential expression make them a tool of choice for the analysis of gene expression in any organism for which TMA platforms are available.
The introduction of gene expression DNA microarrays (EMAs) about 15 years ago opened a whole new range of possibilities for studying genome dynamics by making possible the simultaneous analysis of the transcription of all the genes in a genome. Genes are represented in EMAs either by a reduced number of oligonucleotides (around 11) or by PCR-synthesized fragments spanning a fraction of their length. The advent of genomic tiling microarrays (TMAs) expanded the possibilities of EMAs by increasing the number of probes so that complete genome coverage could be reached. TMAs are widely used for structural and functional genome analyses, which include the localization of protein-DNA interactions by chromatin immunoprecipitation followed by microarray hybridization (ChIP on chip), the mapping of DNA methylation and histone modifications, nucleosome positioning, DNase hypersensitive regions and the assessment of copy number variation, among other applications (reviewed in ).
The generation of high-resolution transcription maps by hybridizing total RNA to TMAs has uncovered the existence of a large variety of RNAs, many of which are non-coding, in a range of organisms that include Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila, human  and Arabidopsis. This unprecedented view of the transcriptional landscape of the genome derives mainly from a qualitative interpretation of TMA analysis, and raises the challenge of establishing the putative biological role of non-annotated transcriptionally active regions. A step towards assigning functions to these transcripts is their quantitative analysis to facilitate comparisons between different physiological conditions. In principle, the much higher density of the probes of TMAs and the possibility of providing unbiased information about transcription directionality and antisense transcription should offer several advantages over EMAs for measuring differential gene expression. One disadvantage in the use of TMAs for expression analyses, however, is the requirement of more sophisticated bioinformatic tools to process the hybridization signal from several million probes that have not been classified as genic or intergenic. In contrast, the number of probes in EMAs is at least one order of magnitude lower; they are unambiguously ascribed to specific genes, and the processing and summarization of their hybridization signal is relatively straightforward.
Here we report a probe-filtering protocol to generate custom Chip Description Files (CDF) to process the hybridization signals of TMAs from each DNA strand in a quantitative manner to measure differential transcriptional expression. CDFs can be generated from any genome annotation or any set of probes in a microarray and they allow direct use with the same tools as those used for the analysis of differential expression with EMAs. We experimentally compared the performance of the Affymetrix TMA and EMA platforms hybridized with identical RNA samples from the yeast Schizosaccharomyces pombe to measure differential gene expression under conditions of oxidative stress. We also compared our results with those from a previous study using custom-made microarrays based on PCR amplified probes representing over 4500 S. pombe genes [10, 11]. Our results show that TMAs are as reliable as EMAs for measuring the differential expression of protein coding genes. In addition, by combining the high resolution of TMAs with a labelling protocol that preserves the polarity of RNA, we show that they allow the quantitative analysis of previously unidentified strand-specific non-annotated and sense/antisense transcripts.
Schizosaccharomyces pombe culture growth, oxidative stress conditions, and RNA isolation
Cultures of S. pombe wild-type strain 972 h- were grown under identical conditions to those described by Chen et al. in 100 ml yeast extract (YE) medium at 30°C and 170 rpm up to OD595 = 0.2 (4 × 10⁶ cells/ml). Two separate cultures developed from independent single colonies were processed in parallel throughout the entire experiment (biological duplicates). Cells from a 30 ml volume were collected by centrifugation at 2000 rpm for 2 minutes and the pellet was immediately frozen in liquid nitrogen. Hydrogen peroxide (SIGMA, H-1009) was added to the rest of the culture at a final concentration of 0.5 mM and incubation was allowed to proceed for 30 minutes, after which 30 ml of culture were processed as above.
Total RNA was prepared by resuspending the cell pellets in 20 μl extraction buffer (100 mM EDTA, pH 8.0, 100 mM NaCl, 50 mM Tris-HCl, pH 8.0), 20 μl phenol/chloroform, 2 μl 10% SDS, 200 μl glass beads (425-600 μm, SIGMA G-8772). Cells were mechanically disrupted in a Fast-Prep device (Savant BIO 101) and the cell lysate was extracted with phenol, phenol/chloroform and chloroform/isoamyl alcohol before precipitation with 0.3 M sodium acetate and ethanol. RNA was resuspended in 50 μl of sterile water with diethyl pyrocarbonate (SIGMA D-5758) and was further purified with the RNeasy mini kit (Qiagen) following the supplier's specifications.
Target labelling and microarray hybridization
To hybridize the Affymetrix GeneChip Yeast Genome 2.0 expression microarrays (EMA), 7 μg of total RNA was used for cDNA synthesis. Target labelling was performed following the instructions of the Affymetrix GeneChip whole transcript double-stranded target-labelling assay manual. To hybridize the Affymetrix GeneChip S. pombe 1.0FR tiling microarray (TMA), 300 ng of total RNA without rRNA reduction was used for cDNA synthesis. Target labelling preserving the original polarity of RNAs was performed following the instructions of the GeneChip whole transcript sense target labelling assay manual from Affymetrix. Biological duplicates from cells treated and not treated with 0.5 mM hydrogen peroxide were used to hybridize TMAs and EMAs. The Pearson correlation coefficients of the probe hybridization signals between TMA duplicates hybridized with RNA from untreated and hydrogen peroxide-treated samples were 0.997 and 0.996, respectively. In the case of EMAs, the Pearson correlation coefficients were 0.998 and 0.998, indicating minimal variability between duplicates. The complete set of microarray hybridization results is available at the GEO database under accession number GSE19020.
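The reproducibility check between biological duplicates can be illustrated with a minimal sketch of the Pearson correlation coefficient. The function name `pearson` and the probe signals below are invented for illustration and are not taken from the deposited data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in r).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 probe signals from two replicate arrays (invented values).
replicate_a = [7.1, 8.4, 9.9, 6.2, 11.3]
replicate_b = [7.0, 8.6, 9.7, 6.4, 11.1]
r = pearson(replicate_a, replicate_b)
print(f"r = {r:.3f}")
```

In practice the two input vectors would hold the signals of the same probes on the two duplicate arrays, in the same order.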
Differential expression analyses
For differential expression analyses, microarray probe intensities were processed using the Robust Multiarray Average (RMA) procedure, which includes RMA background adjustment, quantile normalization, and median polish summarization.
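Two of the three RMA steps can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the implementation used in the study: the background adjustment step is omitted, ties in quantile normalization are broken by position, and the function names `quantile_normalize` and `median_polish` are our own.

```python
from statistics import median


def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of equal-length lists, one per microarray.
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(matrix, n_iter=10):
    """Tukey median polish on a probes x arrays matrix of log2 signals.

    Returns one summarized value per array (overall effect + array effect).
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep out row (probe) medians.
        for r in range(n_rows):
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out column (array) medians.
        for c in range(n_cols):
            m = median(resid[r][c] for r in range(n_rows))
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        m = median(row_eff)
        overall += m
        row_eff = [e - m for e in row_eff]
    return [overall + c for c in col_eff]


# Toy probeset: three probes measured on two arrays (invented log2 values).
probeset = [[5, 7], [6, 8], [7, 9]]
summary = median_polish(probeset)
```

Median polish is used for summarization because a single outlying probe shifts the median much less than it would shift a mean.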
Segmentation algorithm for non-annotated differentially transcribed regions (dTRs)
The segmentation algorithm used to define the boundaries of non-annotated differentially transcribed regions (dTRs) included only probes displaying a difference in the hybridization signal above 0.8 (log2 scale). Probes less than 60 nucleotides apart (approximately 3 tiling probes) were clustered in a single region. Only regions larger than 180 nucleotides with an average hybridization signal difference of all the probes included above 0.8 (log2 scale) were selected. Regions meeting these criteria were fused if the distance between them was shorter than 120 nucleotides.
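The four rules above can be sketched as follows. This is a minimal reading of the description, not the authors' Perl code. Several details are assumptions: probes are given as (start coordinate, log2 signal difference) pairs for one strand of one chromosome, distances are measured between probe start coordinates, a region extends to the end of its last 25-nucleotide probe, and the region average is taken over every probe lying within the region, including those below the threshold. Only induced regions are handled; repressed regions could be found by negating the differences.

```python
def segment_dtrs(probes, min_diff=0.8, max_gap=60, min_len=180,
                 fuse_gap=120, probe_len=25):
    """Return (start, end) intervals of candidate dTRs.

    probes: iterable of (start, log2_diff) for one strand of one chromosome.
    """
    probes = sorted(probes)

    # Rule 1: keep only probes whose signal difference exceeds the threshold.
    hits = [pos for pos, diff in probes if diff > min_diff]

    # Rule 2: cluster retained probes that are less than max_gap apart.
    clusters = []
    for pos in hits:
        if clusters and pos - clusters[-1][1] < max_gap:
            clusters[-1][1] = pos
        else:
            clusters.append([pos, pos])

    # Rule 3: keep regions longer than min_len whose mean difference,
    # over all probes inside the region, exceeds the threshold.
    regions = []
    for first, last in clusters:
        start, end = first, last + probe_len
        inside = [d for p, d in probes if first <= p <= last]
        mean_diff = sum(inside) / len(inside)
        if end - start > min_len and mean_diff > min_diff:
            regions.append([start, end])

    # Rule 4: fuse accepted regions separated by less than fuse_gap.
    fused = []
    for start, end in regions:
        if fused and start - fused[-1][1] < fuse_gap:
            fused[-1][1] = end
        else:
            fused.append([start, end])
    return [tuple(r) for r in fused]


# Invented example: probes every 20 nt; one weak probe inside a strong run,
# followed by a short run that is too small to be reported.
toy = [(p, 1.5) for p in range(0, 201, 20) if p != 100]
toy += [(100, 0.2)]
toy += [(p, 0.0) for p in range(220, 300, 20)]
toy += [(p, 1.2) for p in range(300, 400, 20)]
dtrs = segment_dtrs(toy)
```

In the toy data the weak probe at position 100 does not split the first region, because its neighbours at 80 and 120 are only 40 nucleotides apart.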
Results and Discussion
Generation of custom Chip Description Files (CDF) for expression analyses
Table: Number of probes and probesets during generation of CDFs. Rows: total probes in MA; unique probes in MA (Step 1); probes in genome (Step 2); unique probes in genome (Step 3); intragenic probes (Step 4); probe sets (genes) in CDF (Step 5).
CDF "Sp_TMA". This included probes from the Affymetrix GeneChip S. pombe 1.0FR tiling array filtered as described above to generate 4972 probesets.
CDF "Sp_EMA". This included probes from Affymetrix GeneChip Yeast Genome 2.0 filtered to generate 4904 probesets. We used the same genome annotation as in the Sp_TMA CDF to make the results comparable between both platforms.
CDF "Sp_PCR_TMA". To compare results from both Affymetrix platforms with those from the custom-designed microarrays developed at the Sanger Centre, we used as a reference in step 4 (Figure 1) the sequences of the amplicons used as probes in the Sanger microarray. As a result, 4574 probesets were generated from the Affymetrix TMA 1.0FR probes matching sequences in the Sanger amplicons. We have called the original Sanger custom microarray "Sp_PCR_EMA".
The Perl software used and the custom CDFs generated for expression analysis can be downloaded from our web site http://genomics.usal.es/TMADE.
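The five filtering steps named in the table of probe counts can be illustrated with a short Python sketch. It is not the downloadable Perl software, and several points are assumptions: "unique" is read as "occurring exactly once", matching is exact and uses Python's built-in string search, gene coordinates are 0-based and half-open, and a probe is assigned to a gene only when the strand of its match equals the gene strand (which strand of a probe detects which transcript depends on the labelling protocol). All names and sequences are hypothetical.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Start offsets of every (possibly overlapping) exact match."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def build_probesets(probes, genome, genes):
    """probes: {probe_id: sequence}; genome: {chromosome: sequence};
    genes: list of (name, chromosome, start, end, strand)."""
    # Step 1: keep probes whose sequence occurs once on the microarray.
    counts = Counter(probes.values())
    unique_on_array = {pid: s for pid, s in probes.items() if counts[s] == 1}

    probesets = defaultdict(list)
    for pid, seq in unique_on_array.items():
        # Step 2: locate the probe on either strand of the genome.
        rc = reverse_complement(seq)
        matches = []
        for chrom, dna in genome.items():
            matches += [(chrom, p, "+") for p in find_all(dna, seq)]
            if rc != seq:  # avoid counting a palindromic match twice
                matches += [(chrom, p, "-") for p in find_all(dna, rc)]
        # Step 3: keep probes with exactly one match in the genome.
        if len(matches) != 1:
            continue
        chrom, pos, strand = matches[0]
        # Steps 4 and 5: keep intragenic probes and group them by gene.
        for name, g_chrom, g_start, g_end, g_strand in genes:
            if (g_chrom == chrom and g_strand == strand
                    and g_start <= pos and pos + len(seq) <= g_end):
                probesets[name].append(pid)
    return dict(probesets)


# Invented toy data; real probes are 25-mers.
genome = {"chrI": "GGGGGGGGGGGATTACAGGGGGGGGGGTGTTCAGGGGGGGGGG"}
genes = [("geneA", "chrI", 5, 20, "+"), ("geneB", "chrI", 25, 40, "-")]
probes = {
    "p1": "GATTACA",   # unique, inside geneA
    "p2": "TGAACA",    # matches the minus strand inside geneB
    "p3": "GGGGGG",    # many genomic matches: removed at step 3
    "p4": "GATTAC",    # duplicated on the array: removed at step 1
    "p5": "GATTAC",
    "p6": "AAAAAA",    # absent from the genome: removed at step 2
    "p7": "ATTACAGG",  # unique, inside geneA
    "p8": "GGTGTT",    # unique but on the strand opposite geneB
}
probesets = build_probesets(probes, genome, genes)
```

Replacing the gene list with another annotation, or with amplicon coordinates as for Sp_PCR_TMA, changes only the last step, which is why the same probe set can follow updated genome annotations.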
Probe density and number of genes analyzed using different platforms
Comparative analysis of differential gene expression
Taken together, these results show that the Sp_TMA and Sp_EMA Affymetrix platforms yielded virtually identical results, thus validating the use of TMAs for the analysis of differential expression of annotated genes. These results are consistent with those reported in a previous study carried out in Arabidopsis, in which a strong correlation between the performance of EMAs and TMAs for quantitative gene expression was also found. The fact that the correlation between the results from both platforms was higher in our study could be due to a more precise annotation of the S. pombe genome relative to Arabidopsis or to the fact that repetitive probes were not filtered out in that study.
Quantitative analysis of DNA strand-specific transcription and of non-annotated transcripts
The development of DNA microarrays and more recently of deep sequencing technologies has revealed that in addition to protein coding genes, a large fraction of eukaryotic genomes is transcribed. Detailed transcriptome maps in Saccharomyces cerevisiae have uncovered an unexpectedly large number of stable and unstable non-coding RNAs, a large fraction of which are transcribed bidirectionally from nucleosome-free regions [17, 18]. In order to assess the biological role of these transcripts, the approach described here should be useful to measure their differential expression under different physiological conditions. It could also be adapted to the analysis of the allele-specific expression that has been recently reported in S. cerevisiae. The possibility of assigning polarity to non-annotated dTRs is essential for predicting possible RNA secondary structures that could be relevant to their function. This is particularly well illustrated by the human HAR1F non-coding transcript, which derives from one of the most divergent regions between humans and chimpanzees and is one of several candidate genes that could contribute to establishing differences between both species.
We have shown that information derived from TMA hybridization can be simultaneously processed for high-resolution qualitative and quantitative analysis of differentially transcribed regions. The consistency of the performance of TMAs, their genome-wide coverage, and their adaptability to updated genome annotations, together with the possibility of quantitative measurement of the differential expression of non-annotated and antisense transcripts, make them a tool of choice for the analysis of genome dynamics in any organism for which TMA platforms are available.
We thank Dr. Encarnación Fermiñán for advice and excellent technical support with microarray hybridization. This work was funded by grants BFU2008-01919BMC and Consolider-Ingenio CSD2007-00015 from the Spanish Ministerio de Ciencia e Innovación.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270(5235):467–470. 10.1126/science.270.5235.467
- Liu XS: Getting started in tiling microarray analysis. PLoS Comput Biol 2007, 3(10):1842–1844. 10.1371/journal.pcbi.0030183
- Rasmussen S, Nielsen HB, Jarmer H: The transcriptionally active regions in the genome of Bacillus subtilis. Mol Microbiol 2009, 73(6):1043–1057. 10.1111/j.1365-2958.2009.06830.x
- David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW, Steinmetz LM: A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 2006, 103(14):5320–5325. 10.1073/pnas.0601091103
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bähler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008, 453(7199):1239–1243. 10.1038/nature07002
- He H, Wang J, Liu T, Liu XS, Li T, Wang Y, Qian Z, Zheng H, Zhu X, Wu T, Shi B, Deng W, Zhou W, Skogerbø G, Chen R: Mapping the C. elegans noncoding transcriptome with a whole-genome tiling microarray. Genome Res 2007, 17(10):1471–1477. 10.1101/gr.6611807
- Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE, Bussemaker HJ, White KP: A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 2004, 306(5696):655–660. 10.1126/science.1101312
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 2005, 308(5725):1149–1154. 10.1126/science.1108625
- Laubinger S, Zeller G, Henz SR, Sachsenberg T, Widmer CK, Naouar N, Vuylsteke M, Schölkopf B, Rätsch G, Weigel D: At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana. Genome Biol 2008, 9(7):R112. 10.1186/gb-2008-9-7-r112
- Lyne R, Burns G, Mata J, Penkett CJ, Rustici G, Chen D, Langford C, Vetrie D, Bähler J: Whole-genome microarrays of fission yeast: characteristics, accuracy, reproducibility, and processing of array data. BMC Genomics 2003, 4: 27. 10.1186/1471-2164-4-27
- Chen D, Wilkinson CRM, Watt S, Penkett CJ, Toone WM, Jones N, Bähler J: Multiple pathways differentially regulate global oxidative stress responses in fission yeast. Mol Biol Cell 2008, 19: 308–317. 10.1091/mbc.E07-08-0735
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4(2):249–264. 10.1093/biostatistics/4.2.249
- Karp RM, Rabin MO: Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 1987, 31(2).
- Lu J, Lee JC, Salit ML, Cam MC: Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics 2007, 8: 108. 10.1186/1471-2105-8-108
- Ioannidis JPA, Allison DB, Ball CA, Coulibaly I, Cui X, Culhane AC, Falchi M, Furlanello C, Game L, Jurman G, Mangion J, Mehta T, Nitzberg M, Page GP, Petretto E, van Noort V: Repeatability of published microarray gene expression analyses. Nat Genet 2009, 41(2):149–155. 10.1038/ng.295
- Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc V, Weissman S, Snyder M, Gerstein M: Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 2005, 21(8):466–475. 10.1016/j.tig.2005.06.007
- Xu Z, Wei W, Gagneur J, Perocchi F, Clauder-Münster S, Camblong J, Guffanti E, Stutz F, Huber W, Steinmetz LM: Bidirectional promoters generate pervasive transcription in yeast. Nature 2009, 457(7232):1033–1037. 10.1038/nature07728
- Neil H, Malabat C, d'Aubenton Carafa Y, Xu Z, Steinmetz LM, Jacquier A: Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature 2009, 457(7232):1038–1042. 10.1038/nature07747
- Gagneur J, Sinha H, Perocchi F, Bourgon R, Huber W, Steinmetz LM: Genome-wide allele- and strand-specific expression profiling. Mol Syst Biol 2009, 5: 274. 10.1038/msb.2009.31
- Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R, Rosenbloom KR, Kent J, Haussler D: Forces shaping the fastest evolving regions in the human genome. PLoS Genet 2006, 2(10):e168. 10.1371/journal.pgen.0020168
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.