The development of a comparison approach for Illumina bead chips unravels unexpected challenges applying newest generation microarrays
© Eggle et al; licensee BioMed Central Ltd. 2009
Received: 08 December 2008
Accepted: 18 June 2009
Published: 18 June 2009
The MAQC project demonstrated that microarrays with comparable content show inter- and intra-platform reproducibility. However, since the content of gene databases is still increasing, the development of new generations of microarrays covering new content is mandatory. To better understand the potential challenges that updated microarray content might pose for clinical and biological projects, we developed a methodology consisting of in silico analyses combined with performance analysis using real biological samples.
Here we clearly demonstrate that not only oligonucleotide design but also database content and annotation strongly influence comparability and performance of subsequent generations of microarrays. Additionally, using human blood samples and purified T lymphocyte subsets as two independent examples, we show that a performance analysis using biological samples is crucial for the assessment of consistency and differences.
This study provides an important resource assisting investigators in comparing microarrays of updated content especially when working in a clinical or regulatory setting.
The ability to assess genome-wide transcriptional profiles of cells, tissues or even whole organs is a cornerstone of the advances genomics has brought to the life and medical sciences [1, 2]. DNA microarrays are the major technology used for this purpose [3]. Both in biology and medicine, important new findings have been revealed by this technology [4–6]. More recently, the MicroArray Quality Control (MAQC) project, a community-wide effort initiated and led by FDA (US Food and Drug Administration) scientists, has made a significant contribution to assuring the reliability and consistency of DNA microarray technology [7–12] at a time when concerns about repeatability, reproducibility and comparability of microarray results were raised [13–15]. The major message from the MAQC project is that microarrays with comparable content show inter- and intra-platform reproducibility of gene expression measurements. Major regulatory agencies such as the FDA or the European Medicines Agency (EMEA) have recognized genomic technologies, particularly gene expression profiling by DNA microarrays, as opportunities in advancing personalized medicine [16, 17]. Therefore, the results established by MAQC are very promising for the use of DNA microarrays in drug development, medical diagnostics and risk assessment, and the use of these technologies has been encouraged by the regulatory agencies.
However, as already outlined by the MAQC project, an important aspect of DNA microarray technology needs further attention. Advances in array technology as well as improvements of genomic database content will lead to the development of new generations of microarrays in upcoming years [18, 19]. The currently available annotation of transcripts represented on DNA microarrays (microarray content) is still incomplete. In fact, our knowledge about gene expression is far from complete, which is reflected by a continuous increase in the content of gene databases such as RefSeq [20]. Therefore there have been advances in updating the annotation of microarray probes to the most up-to-date annotation available by providing either new annotation files or software tools for re-annotating existing microarray formats [21–25]. So far, using the most recent DNA microarray technology has always been seen as an advantage, especially when searching for novel transcripts [26]. However, this might be different in the context of drug development, medical diagnostics or risk assessment, where signatures rather than single genes are of highest relevance. Here, unaltered gene annotation and probe sequence content are needed for long-term applications. The potential impact of advances in technology and database content on successfully established diagnostic gene signatures (e.g. the 70-gene signature established by van't Veer et al. for predicting therapy outcome in breast cancer patients [27, 28]) has not been fully appreciated. It is therefore mandatory to develop approaches and methods that allow fast and decisive assessment of the global impact of database improvements, content changes of microarrays and technical advances.
Significant dynamics of gene sequence content of current genome databases
Content and annotation of microarrays depends on the repository database
Consistency of consecutive array versions strictly depends on database content and annotation
Altogether, comparability of consecutive array versions even on a single platform is a function of oligonucleotide design, database content and annotation available at the time of array design. Unexpectedly, optimal comparability is not achieved with the newest annotation of the RefSeq database but rather with the annotation available at the time of design of the newest array version. As long as the database content is not yet finalized, updates in array design are mandatory to correctly reflect genomic content.
Selection of representative data sets for best investigation of performance issues
I-huBC-V2 outperforms I-huBC-V1 concerning sensitivity, signal-to-noise-ratio and dynamic range
For further analyses concerning performance issues of two different array versions we cross-annotated the re-blasted probes from the I-huBC-V1 and the I-huBC-V2 arrays (see Additional file 2). To quickly assess improvement of performance by newer generation technology, we assessed 4 parameters describing important quality aspects: (1) the percentage of detected transcripts reflecting sensitivity, (2) the dynamic range of signal intensities, (3) the values of background/noise signals reflecting signal-to-noise ratio and (4) technical replication reflecting reproducibility. In the Treg data set, on average 23.9% of all probes were called present on I-huBC-V1 and 31.0% on I-huBC-V2. Similarly, in the whole blood data set, we obtained mean percentages of present calls of 23.2% for I-huBC-V1 and 30.7% for I-huBC-V2 samples (see Additional file 5). Additionally, probes with low signal intensities on both arrays were generally more often called present on I-huBC-V2 in comparison to I-huBC-V1, suggesting that I-huBC-V2 has a higher detection sensitivity (see Additional file 2, see Additional file 6). Boxplots were used to compare the dynamic range of signals between I-huBC-V1 and I-huBC-V2. When plotting the signals of the 8,299 probes that were identical on both versions, we observed an enlargement of the dynamic range as well as a decrease in median signal intensities on I-huBC-V2 for both data sets (Figure 4B, C), which was due to reduced overall background values on I-huBC-V2 (for cross-annotated probes see Additional file 7A for the Treg data set, see Additional file 7B for the whole blood data set). Analysis of identical oligonucleotides represented on both versions, in conjunction with the use of the same cRNA samples, can be used to assess the performance of both arrays concerning technical replication.
When comparing raw signal intensities of such technical replicates we observed increased signal intensities for moderate to highly expressed transcripts on I-huBC-V2 (see Additional file 7C). For visualization we used pairwise scatterplots, principal components analysis (PCA) and hierarchical clustering on normalized data. Samples of the Treg data set showed a mean correlation of 0.97 ± 0.005 (see Additional file 8 for a table of all correlations and Additional file 9 for scatterplots) and samples of the whole blood data set a mean correlation of 0.91 ± 0.17 (see Additional file 10 for a table of all correlations and Additional file 11 for scatterplots). These results were confirmed when performing PCA using the 100 most variable probes out of the 8,299 identical oligonucleotides (Figure 4D, E). Additionally, when performing hierarchical clustering on these samples, almost all technical replicates clearly clustered next to each other (see Additional file 12).
Rank correlation metric reveals significant differences between subsequent microarray versions
Most recently, validity and comparability of transcriptional profiling using different microarray platforms has been very elegantly demonstrated by the MAQC consortium. Proving consistency of these technologies when introducing technological advances was suggested by MAQC as a major issue for future development. Here we have addressed the overall impact of improvements of genomic database content and annotation over time and the impact of technology optimization on major performance issues of a typical microarray analysis. Unexpectedly, database content and annotation, as exemplified for the RefSeq database, still remain highly dynamic, which by itself has a significant impact on microarray probe annotation. Using an in silico approach based on BLAST analysis combined with categorization of probes and respective cross-annotation approaches, we demonstrate that content changes on a given microarray platform are also influenced by database dynamics. Moreover, we conducted a performance analysis combining common quality control measures with a rank correlation metric and show that the inclusion of real biological experiments is mandatory to estimate the overall impact of technology improvements on data consistency. Using the Illumina BeadChip platform as an example, we demonstrate that a large change of probe content between subsequent array versions results in incompatible data in addition to unexpected challenges, such as significant introduction of non-functional probes. This has high impact on biological screening experiments, when signals for known marker genes are lost (as exemplified for FOXP3). Even higher impact can be expected for experiments within a diagnostic setting, where content and technology changes will lead to incompatible diagnostic signatures. Up to now, using the most recent DNA microarray format has always been seen as an advantage, since the most recent version is usually an improvement of the old version.
However, this might only be true for the technical performance of an array.
It should be noted that we chose the Illumina BeadChip over the Agilent arrays as an example, since the number of changes between subsequent array generations was significantly higher for this platform. Also, we have only used ~20,000 cross-annotated probes for performance analysis, which is less than 50% of the content. The reason for this strictness was, in part, based on a recent publication by Lee et al. demonstrating high signal disagreement for probes targeting genes susceptible to alternative splicing [37]. We therefore limited our analysis to probes with identical targets.
As already outlined by the MAQC project, high throughput technologies including microarrays for transcriptional profiling require significantly more attention to quality control and comparability than any test measuring only a single data point. The MAQC project clearly demonstrated that comparability of microarray technology is already high 1) when restricting the analysis to a comparable set of data points (genes) and 2) when comparing high throughput technologies developed at approximately the same time. Here we clearly show that an important next step in genomic sciences will be to quickly introduce standardized general impact analyses to assess newer generation technologies. It would be desirable to introduce the presented approach as a starting point for further projects within the MAQC consortium. Next steps could be to test the presented approach in the larger consortium and to perform such impact analyses on a larger scale whenever new technologies become available.
In summary, standardized methods and approaches are critically needed to quickly address the impact of introducing upgrades of high throughput technologies on project content.
Retrieving database releases and statistics
Human sequences for RefSeq releases 1 through 24 (September 2007) were obtained in two steps. First, the human RefSeq entries for each release were extracted from the release catalog, which can be obtained from ftp://ftp.ncbi.nih.gov/refseq/release/release-catalog. Second, using GI numbers and the E-utilities provided by NCBI, FASTA sequences for each entry were downloaded. All FASTA sequences for a release were stored in a separate file. Human sequences for Ensembl releases 21–52 (April 2004 to December 2008) were obtained as FASTA sequences from ftp://ftp.ensembl.org/pub/.
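The first of the two retrieval steps, filtering a release catalog for human entries, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the column order (taxonomy ID, species name, accession, GI number) is an assumption about the catalog layout, the rows are toy values, and the subsequent sequence download via E-utilities is not shown.

```python
import csv
import io

HUMAN_TAX_ID = "9606"  # NCBI taxonomy ID for Homo sapiens


def human_entries(catalog_lines):
    """Yield (accession, gi) pairs for the human rows of a tab-delimited catalog.

    Assumed (hypothetical) column order: tax_id, species, accession, gi.
    Rows with fewer than four columns are skipped.
    """
    for row in csv.reader(catalog_lines, delimiter="\t"):
        if len(row) >= 4 and row[0] == HUMAN_TAX_ID:
            yield row[2], row[3]


# Toy catalog with made-up GI numbers, for illustration only.
sample_catalog = io.StringIO(
    "9606\tHomo sapiens\tNM_000014.4\t111\n"
    "10090\tMus musculus\tNM_007376.4\t222\n"
    "9606\tHomo sapiens\tNM_000015.2\t333\n"
)
entries = list(human_entries(sample_catalog))
print(entries)
```

Keeping one such list per release makes it straightforward to write the downloaded sequences for each release to a separate file, as described above.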
BLAST analysis of probes
For performing the BLAST analyses we used the Standalone BLAST tool (v2.2.16) distributed by NCBI (ftp://ftp.ncbi.nih.gov/blast/). Probe sequences for the different array versions were extracted from the annotation files provided by the manufacturers and fasta files were generated from them. For blasting probe sequences we used the blastn program. The output file (tab-delimited) was imported into R for further analysis. Three different classes of hits to the databases can be retrieved for each probe: (1) a hit was called 'perfect' if the alignment length was equal to the probe length and returned a 100% identity, (2) a hit was called 'imperfect' if the alignment length was equal to the probe length and returned an identity which was 90% < identity < 100% and (3) a hit was called 'unspecific' if the alignment length was shorter than the probe length.
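The three hit classes can be expressed as a small classification function. The sketch below is written in Python rather than the R used in the study, and it assumes the common 12-column tabular BLAST layout (query ID, subject ID, percent identity, alignment length, ...); the function names and the example line are hypothetical.

```python
def classify_hit(probe_length, alignment_length, percent_identity):
    """Classify one BLAST hit as 'perfect', 'imperfect' or 'unspecific'.

    Returns None for hits outside the three classes defined in the text
    (for example a full-length alignment with identity of 90% or less).
    """
    if alignment_length < probe_length:
        return "unspecific"
    if alignment_length == probe_length:
        if percent_identity == 100.0:
            return "perfect"
        if 90.0 < percent_identity < 100.0:
            return "imperfect"
    return None


def parse_tabular_line(line):
    """Extract the fields needed for classification from one tabular line.

    Assumes columns 1-4 are query ID, subject ID, % identity, alignment length.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        "probe_id": fields[0],
        "subject_id": fields[1],
        "identity": float(fields[2]),
        "alignment_length": int(fields[3]),
    }


# Made-up example line for a 50-mer probe.
hit = parse_tabular_line(
    "probe_1\tNM_000014.4\t100.00\t50\t0\t0\t1\t50\t120\t169\t1e-20\t99.6\n"
)
hit_class = classify_hit(50, hit["alignment_length"], hit["identity"])
print(hit["probe_id"], hit_class)
```

Returning None for unclassified hits, rather than forcing them into a class, keeps the three categories exactly as defined in the text.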
Cross-annotation of probes
By BLAST analysis a set of probes was identified with perfect hits to RefSeq. For cross-annotation purposes three types of probes with perfect hits have to be considered: (1) probes showing a single perfect hit to one RefSeq ID, (2) probes with hits to more than one RefSeq ID, all of which are splice variants of the same gene and (3) probes showing hits to more than one RefSeq ID comprising different genes. To ensure cross-annotation of probes only within one probe type we chose the following cross-annotation approach:
Let list(X_A) and list(Y_B) be the lists of RefSeq IDs with a perfect hit for probe X on array A and for probe Y on array B, respectively. Then X and Y will be cross-annotated if list(X_A) = list(Y_B). This approach ensures cross-annotation of probes within one probe type.
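The matching rule can be sketched as a lookup keyed on the set of RefSeq IDs with perfect hits. This is an illustrative implementation under stated assumptions: the inputs are hypothetical dictionaries mapping probe IDs to RefSeq IDs, the order of IDs within a list is treated as irrelevant, and probes without any perfect hit are never matched.

```python
def cross_annotate(hits_a, hits_b):
    """Pair probes from two arrays whose perfect-hit RefSeq ID sets are equal.

    hits_a, hits_b: dict mapping probe ID -> iterable of RefSeq IDs.
    Returns a sorted list of (probe_on_A, probe_on_B) pairs.
    """
    # Index array B by the (order-independent) set of RefSeq IDs.
    by_id_set = {}
    for probe_b, ids in hits_b.items():
        by_id_set.setdefault(frozenset(ids), []).append(probe_b)

    pairs = []
    for probe_a, ids in sorted(hits_a.items()):
        key = frozenset(ids)
        if not key:
            continue  # no perfect hit: nothing to cross-annotate
        for probe_b in sorted(by_id_set.get(key, [])):
            pairs.append((probe_a, probe_b))
    return pairs


# Toy example with made-up probe and RefSeq identifiers.
array_a = {"A1": ["NM_1"], "A2": ["NM_2", "NM_3"], "A3": ["NM_4"], "A4": []}
array_b = {"B1": ["NM_1"], "B2": ["NM_3", "NM_2"], "B3": ["NM_2"], "B4": []}
pairs = cross_annotate(array_a, array_b)
print(pairs)
```

Because equality is required on the whole set, a probe hitting two splice variants (A2) is not paired with a probe hitting only one of them (B3), which is what keeps cross-annotation within one probe type.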
Determination of absent or present status of individual genes
For comparing the absent or present status of transcripts on the I-huBC-V1 and the I-huBC-V2 array, respectively, the following criteria were used: A probe was called present on a single array, if the detection p-value < 0.05. A probe was called present within a sub-group, if it was called present in at least 2/3 of the samples within this sub-group. Otherwise it was called absent.
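These two criteria translate directly into code. The sketch below uses exact fractions so that the 2/3 threshold is not affected by floating-point rounding; the function names and the example p-values are hypothetical.

```python
from fractions import Fraction


def present_on_array(detection_p, alpha=0.05):
    """A probe is present on a single array if its detection p-value < alpha."""
    return detection_p < alpha


def present_in_subgroup(detection_ps, alpha=0.05, min_fraction=Fraction(2, 3)):
    """A probe is present in a sub-group if it is present in at least
    min_fraction of the sub-group's samples; otherwise it is absent."""
    if not detection_ps:
        raise ValueError("sub-group must contain at least one sample")
    n_present = sum(present_on_array(p, alpha) for p in detection_ps)
    return Fraction(n_present, len(detection_ps)) >= min_fraction


# Made-up detection p-values for one probe across three samples.
two_of_three = present_in_subgroup([0.01, 0.02, 0.50])
one_of_three = present_in_subgroup([0.01, 0.20, 0.50])
print(two_of_three, one_of_three)
```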
Raw data collection for Illumina BeadChip arrays was performed using Illumina BeadStudio software. All data analysis was performed using the R statistical language [38] and packages from the Bioconductor project [39]. Data sets were normalized using the quantile normalization method implemented in the 'affy' package. Hierarchical clustering was performed using the 'hcluster' package with average linkage as the linkage method and Pearson's correlation as the distance measure. Principal components analysis was performed using the 'pcurve' package. Pairwise scatterplots for investigating technical replication were generated from normalized data. When performing an analysis based on the 8,299 identical probes, data from I-huBC-V1 and I-huBC-V2 were limited to these 8,299 probes and then normalized together using quantile normalization. For all other analyses based on cross-annotated probes, data were normalized individually within each array version, since a combined normalization across cross-annotated probes (in contrast to identical probes) could potentially alter the results.
Differentially expressed genes were identified using Student's t-test with the following criteria: fold change > 1.75, p-value < 0.05 and difference of mean group signals > 100. Variation of probes across a data set was determined using the coefficient of variation for each probe (stdev/mean) across all samples. The 100 most variable probes were then used for further analysis.
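For readers unfamiliar with quantile normalization, the basic idea can be sketched in a few lines. This is a simplified illustration and not the 'affy' implementation: ties are broken by position rather than averaged, missing values are not handled, and the input layout (one list of signals per array) is an assumption.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length signal lists (one per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank, so all arrays end up with the same distribution.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must contain the same number of probes")

    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each.
normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(normalized)
```

Normalizing the two array versions together, as done for the identical probes, corresponds to passing the samples of both versions in one call; normalizing within each version corresponds to one call per version.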
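The filtering criteria and the variability ranking can be sketched as follows. Several points are assumptions made for illustration: the p-value is taken as already computed by a t-test elsewhere, the fold change is taken as the ratio of the larger to the smaller group mean on positive signals, and the coefficient of variation is computed in the conventional way as standard deviation divided by mean.

```python
import statistics


def is_differential(group_a, group_b, p_value,
                    min_fold=1.75, max_p=0.05, min_diff=100.0):
    """Apply the three thresholds to one probe.

    group_a, group_b: signal intensities of the two sample groups.
    p_value: result of a t-test computed elsewhere (not recomputed here).
    """
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    high, low = max(mean_a, mean_b), min(mean_a, mean_b)
    fold = high / low if low > 0 else float("inf")
    return fold > min_fold and p_value < max_p and (high - low) > min_diff


def coefficient_of_variation(signals):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(signals) / statistics.mean(signals)


def most_variable(probe_signals, n=100):
    """Return the IDs of the n probes with the highest coefficient of variation."""
    ranked = sorted(
        probe_signals,
        key=lambda probe_id: coefficient_of_variation(probe_signals[probe_id]),
        reverse=True,
    )
    return ranked[:n]


# Toy values for illustration only.
strong = is_differential([400, 420], [200, 210], p_value=0.01)
small_diff = is_differential([150, 160], [60, 70], p_value=0.01)
top = most_variable({"p1": [100, 100, 101], "p2": [100, 200, 300]}, n=1)
print(strong, small_diff, top)
```

The second toy probe shows why the absolute difference threshold matters: its fold change exceeds 1.75, but the difference of group means stays below 100, so it is not called differential.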
Rank correlation metric
To examine the comparability of results from two different array versions we performed a rank correlation comparison. Cross-annotated probes that were moderately to highly expressed (signal intensity > 500) or present in one of the sub-groups on either array type were used for analysis. Probes were ranked according to the following criteria: (1) log fold change, (2) p-value and (3) difference of means. Rank correlations were calculated using Pearson's correlation coefficient implemented in R.
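A rank correlation for one ranking criterion can be sketched as below, in Python rather than R. The sketch assumes that the correlation is computed on the ranks of cross-annotated probes from the two array versions; applying Pearson's coefficient to ranks in this way is equivalent to Spearman's rank correlation. The helper names and the toy values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(values_v1, values_v2):
    """Correlate the ranks of matched probes from two array versions."""
    return pearson(average_ranks(values_v1), average_ranks(values_v2))


# Toy log fold changes for four cross-annotated probes on each version.
same_order = rank_correlation([0.1, 0.5, 1.2, 2.0], [0.2, 0.4, 0.9, 3.1])
reversed_order = rank_correlation([0.1, 0.5, 1.2, 2.0], [3.1, 0.9, 0.4, 0.2])
print(same_order, reversed_order)
```

The same function would be applied once per criterion (log fold change, p-value, difference of means), with the two input lists ordered so that position i refers to the same cross-annotated probe pair.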
Sample collection and preparation
Blood samples from patients with systemic sclerosis or bacteremia, respectively, were collected in PAXgene blood RNA tubes (BD Biosciences, Heidelberg, Germany) after written informed consent had been obtained and following approval by the institutional review board. CD4+ CD127low CD25+ (Treg) and CD4+ CD127+ CD25- (Tconv) T cells were stained with CD4, CD25 and CD127 mAb (all from BD Pharmingen) and sorted on a FACSDiva cell sorter. Cell purity after isolation was assessed by intracellular staining for FOXP3 (e-bioscience) and routinely showed purities >95%.
RNA preparation and microarray hybridization
RNA from Treg and Tconv cells lysed in TRIzol (Invitrogen, Karlsruhe, Germany) was isolated according to the manufacturer's protocol with subsequent column purification using the RNeasy MinElute Cleanup Kit (Qiagen, Hilden, Germany). Total RNA from PAXgene samples was prepared according to the manufacturer's recommendations including an optional DNAse digestion step. cDNA and biotin-labeled cRNA were synthesized from 100 ng total RNA using the Illumina® TotalPrep™ RNA Amplification Kit (Applied Biosystems, Darmstadt, Germany). cRNA (1.5 μg) was hybridized to Human-6 Expression BeadChips V1 and V2 (Illumina, San Diego, CA) and scanned on an Illumina BeadStation 500X. All microarray data have been submitted to Gene Expression Omnibus (GSE16031).
J. L. Schultze was supported by the Alexander von Humboldt Foundation via a Sofia-Kovalevskaja Award. The work was supported by a grant from the Bundesministerium für Bildung und Forschung NGFN N1K3-S24T27, a Köln Fortune grant, a grant from the Deutsche Krebshilfe (S. Debey-Pascher) and a grant from the Wilhelm-Sander Foundation (M. Beyer). The authors wish to thank Nico Hunzelmann for access to systemic sclerosis patient material and Harald Seifert for access to bacteremia patient material. We also wish to thank Kay Nieselt and Jürgen Bayorath for their invaluable comments on our manuscript.
- Pennacchio LA, Rubin EM: Genomic strategies to identify mammalian regulatory sequences. Nat Rev Genet 2001, 2(2):100–109. 10.1038/35052548
- Reinke V, White KP: Developmental genomic approaches in model organisms. Annu Rev Genomics Hum Genet 2002, 3: 153–178. 10.1146/annurev.genom.3.031302.100922
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270(5235):467–470. 10.1126/science.270.5235.467
- Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, Olson JA Jr, Marks JR, Dressman HK, West M, Nevins JR: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006, 439(7074):353–357. 10.1038/nature04296
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531–537. 10.1126/science.286.5439.531
- Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, Castle J, Bartel DP, Linsley PS, Johnson JM: Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 2005, 433(7027):769–773. 10.1038/nature03315
- Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang W, Zhang L, Goodsaid FM: Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 2006, 24(9):1115–1122. 10.1038/nbt1236
- Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP, Shi L: Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol 2006, 24(9):1162–1169. 10.1038/nbt1238
- Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD: Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol 2006, 24(9):1140–1150. 10.1038/nbt1242
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JM, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Slikker W Jr, Shi L, Reid LH: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24(9):1151–1161. 10.1038/nbt1239
- Shippy R, Fulmer-Smentek S, Jensen RV, Jones WD, Wolber PK, Johnson CD, Pine PS, Boysen C, Guo X, Chudin E, Sun YA, Willey JC, Thierry-Mieg J, Thierry-Mieg D, Setterquist RA, Wilson M, Lucas AB, Novoradovskaya N, Papallo A, Turpaz Y, Baker SC, Warrington JA, Shi L, Herman D: Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat Biotechnol 2006, 24(9):1123–1131. 10.1038/nbt1241
- Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu TM, Guo X, Collins PJ, Sun YA, Wang SJ, Bao W, Wolfinger RD, Shchegrova S, Guo L, Warrington JA, Shi L: Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol 2006, 24(9):1132–1139. 10.1038/nbt1237
- Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, Griffin C, Hilmer SC, Hoffman E, Jedlicka AE, Kawasaki E, Martinez-Murillo F, Morsberger L, Lee H, Petersen D, Quackenbush J, Scott A, Wilson M, Yang Y, Ye SQ, Yu W: Multiple-laboratory comparison of microarray platforms. Nat Methods 2005, 2(5):345–350. 10.1038/nmeth756
- Kuo WP, Liu F, Trimarchi J, Punzo C, Lombardi M, Sarang J, Whipple ME, Maysuria M, Serikawa K, Lee SY, McCrann D, Kang J, Shearstone JR, Burke J, Park DJ, Wang X, Rector TL, Ricciardi-Castagnoli P, Perrin S, Choi S, Bumgarner R, Kim JH, Short GF 3rd, Freeman MW, Seed B, Jensen R, Church GM, Hovig E, Cepko CL, Park P, Ohno-Machado L, Jenssen TK: A sequence-oriented comparison of gene expression measurements across different hybridization-based technologies. Nat Biotechnol 2006, 24(7):832–840. 10.1038/nbt1217
- Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J: Independence and reproducibility across microarray platforms. Nat Methods 2005, 2(5):337–344. 10.1038/nmeth757
- Frueh FW: Impact of microarray data quality on genomic data submissions to the FDA. Nat Biotechnol 2006, 24(9):1105–1107. 10.1038/nbt0906-1105
- Lesko LJ, Woodcock J: Translation of pharmacogenomics and pharmacogenetics: a regulatory perspective. Nat Rev Drug Discov 2004, 3(9):763–769. 10.1038/nrd1499
- Hardiman G: Microarrays Technologies 2006: an overview. Pharmacogenomics 2006, 7(8):1153–1158. 10.2217/146224184.108.40.2063
- Hoheisel JD: Microarray technology: beyond transcript profiling and genotype analysis. Nat Rev Genet 2006, 7(3):200–210. 10.1038/nrg1809
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007, (35 Database):D61–65. 10.1093/nar/gkl842
- Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, Watson SJ, Meng F: Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005, 33(20):e175. 10.1093/nar/gni179
- de Leeuw WC, Rauwerda H, Jonker MJ, Breit TM: Salvaging Affymetrix probes after probe-level re-annotation. BMC Res Notes 2008, 1: 66. 10.1186/1756-0500-1-66
- Ferrari F, Bortoluzzi S, Coppe A, Sirota A, Safran M, Shmoish M, Ferrari S, Lancet D, Danieli GA, Bicciato S: Novel definition files for human GeneChips based on GeneAnnot. BMC Bioinformatics 2007, 8: 446. 10.1186/1471-2105-8-446
- Harbig J, Sprinkle R, Enkemann SA: A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res 2005, 33(3):e31. 10.1093/nar/gni027
- Berg BH, Konieczka JH, McCarthy FM, Burgess SC: ArrayIDer: automated structural re-annotation pipeline for DNA microarrays. BMC Bioinformatics 2009, 10: 30. 10.1186/1471-2105-10-30
- Classen S, Zander T, Eggle D, Chemnitz JM, Brors B, Buchmann I, Popov A, Beyer M, Eils R, Debey S, Schultze JL: Human resting CD4+ T cells are constitutively inhibited by TGF beta under steady-state conditions. J Immunol 2007, 178(11):6931–6940.
- Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R: A gene-expression signature as a predictor of survival in breast cancer. The New England journal of medicine 2002, 347(25):1999–2009. 10.1056/NEJMoa021967
- van't Veer LJ, Dai H, Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530–536. 10.1038/415530a
- Pontius J, Wagner L, Schuler G: UniGene: a unified view of the transcriptome. In The NCBI Handbook. Bethesda, MD: National Center for Biotechnology Information; 2003.
- Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJ, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S: Ensembl 2008. Nucleic acids research 2008, (36 Database):D707–14.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic acids research 2006, (34 Database):D16–20. 10.1093/nar/gkj157
- Avery OT, MacLeod CM, McCarty M: Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J Exp Med 1979, 149(2):297–326. 10.1084/jem.149.2.297
- Kronick MN: Creation of the whole human genome microarray. Expert review of proteomics 2004, 1(1):19–28. 10.1586/147894220.127.116.11
- Kuhn K, Baker SC, Chudin E, Lieu MH, Oeser S, Bennett H, Rigault P, Barker D, McDaniel TK, Chee MS: A novel, high-performance random array platform for quantitative gene expression profiling. Genome Res 2004, 14(11):2347–2356. 10.1101/gr.2739104
- Liu W, Putnam AL, Xu-Yu Z, Szot GL, Lee MR, Zhu S, Gottlieb PA, Kapranov P, Gingeras TR, Fazekas de St Groth B, Clayberger C, Soper DM, Ziegler SF, Bluestone JA: CD127 expression inversely correlates with FoxP3 and suppressive function of human CD4+ T reg cells. J Exp Med 2006, 203(7):1701–1711. 10.1084/jem.20060772
- Seddiki N, Santner-Nanan B, Martinson J, Zaunders J, Sasson S, Landay A, Solomon M, Selby W, Alexander SI, Nanan R, Kelleher A, Fazekas de St Groth B: Expression of interleukin (IL)-2 and IL-7 receptors discriminates between human regulatory and activated T cells. J Exp Med 2006, 203(7):1693–1700. 10.1084/jem.20060468
- Lee JC, Stiles D, Lu J, Cam MC: A detailed transcript-level probe annotation reveals alternative splicing based microarray platform differences. BMC Genomics 2007, 8: 284. 10.1186/1471-2164-8-284
- R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JY, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80. 10.1186/gb-2004-5-10-r80
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.