A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat
© Alberts et al; licensee BioMed Central Ltd. 2007
Received: 12 September 2006
Accepted: 20 April 2007
Published: 20 April 2007
The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol that, for mouse arrays, incorporates the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases.
Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), we found that at least 85% and 95% of the probes, respectively, were sequence-verified with a perfect match, and that all probes were sequence-verified for 74% and 85% of the probe sets. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and further to 84% and 97% when one or two mismatches between probe and target gene were also allowed. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data.
The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used.
Microarrays are widely used to study genome-wide gene expression levels. A frequently used type of microarray is the Affymetrix GeneChip [1]. This technology uses multiple probes per gene (a probe set) to measure the amount of mRNA present (the target). For reasons of specificity, probes are chosen to be complementary to a unique part of the target sequence. Although all probes in a single probe set should measure the same amount of mRNA, the hybridization signals of individual probes for a given mRNA molecule may vary widely. This is believed to be caused by differences in the molecular characteristics of the probe sequences, such as GC content and secondary structure, and corrections have been proposed to calculate true expression levels averaged over probe signals [2, 3]. However, the variation in signal between probes could also be caused by misdesigned probes that either do not match the target RNA or can hybridize with other, non-target RNA molecules. For correct interpretation of Affymetrix GeneChip hybridizations, it is important to know which probes may cause variation in hybridization and for what reason. For example, in our large-scale genetical genomics applications [4–6], individual probe hybridizations are used to map regulatory regions in a genome. In such applications, it is important to be able to rule out false positive results caused by misdesigned probes.
An earlier analysis of the probe sequences of the Affymetrix mouse genome U74Av2 array against the RefSeq database showed that for only 51% of the probe sets on the array all probes could be 'entirely verified', that is, corresponded without any mismatch to a RefSeq mRNA sequence. A recent analysis at the individual probe level verified 73% of the individual probe sequences of the MG-U74Av2 array against mRNA sequences from Entrez. Affymetrix supplies regular updates of probe set verifications using new releases of the RefSeq, GenBank and Ensembl databases [9, 10]. In the July 2006 release, 70% of the probe sets of the MG-U74Av2 GeneChip were 'entirely verified'. These surprisingly low verification percentages suggest that a major part of the hybridization results of such an array should be regarded with caution. Little information is available on whether individual mouse probes can hybridize with non-target RNA molecules. Here we present an extensive and generalized protocol for the verification of probe sequences on Affymetrix arrays.
The protocol uses four databases: NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome. Incorporating these databases into the verification protocol considerably increases the number of sequence-verified probes on the Affymetrix mouse arrays. The same protocol applied to other mouse arrays, or a similar protocol (based on RefSeq, UniGene Unique and UCSC genome) for human and rat arrays, yielded similar results. Refined chip definition files (CDF files), which include only verified probes, are provided online.
We conclude that, with the corrections proposed previously [2, 3], the accuracy and reliability of Affymetrix arrays are considerably higher than reported so far. Our new data on probe verification and cross-hybridization are important for assessing unexpected behaviour of individual probes in a given experiment and will contribute to a more accurate assessment of expression data from Affymetrix arrays.
Quality of sequence databases
[Table 1: Comparison of sequence databases. Columns: no. of mismatches, no. of gaps, no. of nucleotides.]
The verification protocol
In the protocol for mouse arrays, we use the BLAST program to verify all probe sequences against the three messenger databases (see Methods). Using the terminology of Mecham et al. [7], we determine for each probe set and each database whether the probe set is
'entirely verified', meaning that all probes were identical to a messenger sequence;
'partially verified', meaning that only a subset of probes was identical to a messenger sequence;
'entirely unverified', meaning that none of the probes was identical to a messenger sequence.
Only probe sets that could not be classified as 'entirely verified' against any of the three messenger databases were verified against the genome (see Methods). Each probe set is assigned a verification score, which is the best score over all databases, where 'entirely verified' ranks above 'partially verified', and 'partially verified' ranks above 'entirely unverified'. For the final verification score, the order of the databases does not matter, since each probe set is assigned the best possible score.
We included all four databases in the protocol to obtain the greatest coverage. Because the genome sequence database is much larger than the messenger databases, verification against the genome takes much longer; we therefore put it in the last position, which improves computational efficiency. The verification is not hampered by the lower accuracy of the messenger databases compared with the genome: only 0.60% (0.34%) of the probe sets of the MG-U74Av2 (430 2.0) array were 'entirely verified' against one of the messenger databases but 'entirely unverified' against the genome. We examined some of the probe sets that were 'entirely unverified' against the genome in more detail. These seem to represent contaminating non-mouse sequences, or the tiny fraction of genes that are still missing from the assembled genomes. Because there are no major quality differences between the messenger databases, their order is in principle arbitrary. However, we put RefSeq in the first position because it contains the most intensively curated transcript sequence information, and probe sets that are 'entirely verified' against this database exit the protocol with RefSeq gene identifiers (supplementary material).
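The per-database classification and the best-over-databases score can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the `Verification` enum and the functions `classify` and `best_score` are hypothetical names, and each probe is assumed to have been reduced beforehand to a single True/False flag per database (identical sequence found or not).

```python
from enum import IntEnum
from typing import Dict, List


class Verification(IntEnum):
    """Verification categories; a higher value means a better score."""
    ENTIRELY_UNVERIFIED = 0
    PARTIALLY_VERIFIED = 1
    ENTIRELY_VERIFIED = 2


def classify(probe_hits: List[bool]) -> Verification:
    """Classify one probe set against one database.

    probe_hits[i] is True if probe i is identical to a sequence in that database.
    """
    if probe_hits and all(probe_hits):
        return Verification.ENTIRELY_VERIFIED
    if any(probe_hits):
        return Verification.PARTIALLY_VERIFIED
    return Verification.ENTIRELY_UNVERIFIED


def best_score(hits_per_db: Dict[str, List[bool]]) -> Verification:
    """Final score: the best classification over all databases (order-independent)."""
    return max((classify(hits) for hits in hits_per_db.values()),
               default=Verification.ENTIRELY_UNVERIFIED)


# Example: a 4-probe set, partially verified by RefSeq and entirely by the genome.
example = {
    "RefSeq": [True, True, False, True],
    "UCSC genome": [True, True, True, True],
}
print(best_score(example).name)  # ENTIRELY_VERIFIED
```

Because the final score is a maximum, listing the databases in a different order gives the same result.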
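The efficiency argument for putting the genome last can be illustrated with a cascade that queries databases in a fixed order and stops as soon as a probe set is 'entirely verified'. In this sketch, which is not the authors' implementation, scores are encoded as the integers 0 (entirely unverified), 1 (partially verified) and 2 (entirely verified). The per-database verifiers are hypothetical callables, here backed by toy lookup tables. Because 2 is the highest possible score, stopping early cannot change the final best score. It only avoids querying the slower genome stage.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Score encoding assumed for this sketch: higher is better.
ENTIRELY_UNVERIFIED, PARTIALLY_VERIFIED, ENTIRELY_VERIFIED = 0, 1, 2

# A verifier scores one probe set (by identifier) against one database.
Verifier = Callable[[str], int]


def run_cascade(probe_set_id: str,
                stages: Sequence[Tuple[str, Verifier]]) -> Tuple[int, List[str]]:
    """Query databases in order, stopping once the probe set is entirely verified.

    Returns the best score seen and the names of the databases that were queried.
    """
    best = ENTIRELY_UNVERIFIED
    queried: List[str] = []
    for name, verify in stages:
        queried.append(name)
        best = max(best, verify(probe_set_id))
        if best == ENTIRELY_VERIFIED:
            break  # later (and costlier) stages cannot improve on this score
    return best, queried


def table_verifier(table: Dict[str, int]) -> Verifier:
    """Toy verifier backed by a precomputed lookup table (hypothetical data)."""
    return lambda probe_set_id: table.get(probe_set_id, ENTIRELY_UNVERIFIED)


# Messenger databases first, genome last, as in the protocol described above.
stages = [
    ("RefSeq", table_verifier({"ps1": ENTIRELY_VERIFIED})),
    ("UniGene Unique", table_verifier({"ps2": PARTIALLY_VERIFIED})),
    ("NIA Mouse Gene Index", table_verifier({})),
    ("UCSC genome", table_verifier({"ps2": ENTIRELY_VERIFIED})),
]

if __name__ == "__main__":
    for ps in ("ps1", "ps2", "ps3"):
        print(ps, run_cascade(ps, stages))
```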
Verification of the U74 and 430 arrays
Megablasting all probe sequences of the U74 array against the mouse NCBI RefSeq database 'entirely verified' only 53% of all probe sets, which confirms the 51% reported earlier with an older version of the RefSeq database. For the 430 array, only 46% of all probe sets could be 'entirely verified' (Figure 1). Next, by including the UniGene Unique database, we 'entirely verified' 59% and 56% of all probe sets in U74 and 430, respectively. Including the NIA Mouse Gene Index raised these percentages to 69% and 74%, respectively. Finally, we verified the remaining probe sets that were not yet 'entirely verified' against the UCSC mouse genome database. In this way, we 'entirely verified' 74% and 85% of all probe sets in U74 and 430, respectively. More detailed numbers on the contribution of each database to the final verification are given in additional file 1: 'Verification scores for the Affymetrix U74 array' and additional file 2: 'Verification scores for the Affymetrix 430 array'.
Most 'partially verified' probe sets contain at most two bad probes
Laboratory experience has shown that hybridization conditions often do not allow a distinction between a perfect match and a mismatch probe. In this context, it could be argued that a perfect match is not required for probe sequence verification, especially when only PM signals are used for estimating expression levels, as is the case for most modern probe summarization methods (RMA, GCRMA). Moreover, messenger databases contain sequencing errors. For these two reasons, we repeated the verification protocol established above while allowing either one or two mismatches per probe sequence; 26% and 47% of the unverified probes had one or two mismatches between probe and target for U74 and 430, respectively. Figure 1 shows that the percentage of 'entirely verified' probe sets increases considerably, to 77% for U74 and 91% for 430 when one mismatch is allowed, and to 79% for U74 and 93% for 430 when two mismatches are allowed. If we restrict ourselves to probe sets with the plain Affymetrix suffix "_at" (that is, excluding, for example, "_s_at" and "_x_at" probe sets), then 85% of the probe sets are 'entirely verified' for U74 and 92% for 430 with one mismatch, and 87% for U74 and 93% for 430 with two mismatches. If we allow two mismatches and also drop one or two unverifiable probes, then 84% and 97% of all probe sets of U74 and 430 are 'entirely verified'. The hybridization conditions in the individual laboratory will determine which verification scheme is most appropriate and which probes or probe sets need to be scrutinized with more care.
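The relaxed criteria (allowing mismatches and discarding a few unverifiable probes) can be expressed as two parameters of a per-probe-set check. The sketch below is illustrative only. It assumes that, for every probe, the smallest mismatch count over all its database hits has already been determined (None meaning no hit). The function names are hypothetical.

```python
from typing import Optional, Sequence


def probe_verified(best_mismatches: Optional[int], max_mismatches: int) -> bool:
    """A probe is verified if its best hit has at most `max_mismatches` mismatches.

    best_mismatches is None when the probe has no hit in any database.
    """
    return best_mismatches is not None and best_mismatches <= max_mismatches


def entirely_verified(best_mismatches_per_probe: Sequence[Optional[int]],
                      max_mismatches: int = 0, max_dropped: int = 0) -> bool:
    """'Entirely verified' after discarding at most `max_dropped` unverifiable probes."""
    n_bad = sum(not probe_verified(m, max_mismatches) for m in best_mismatches_per_probe)
    n_good = len(best_mismatches_per_probe) - n_bad
    return n_good > 0 and n_bad <= max_dropped


def percent_entirely_verified(probe_sets, **criteria) -> float:
    """Percentage of probe sets passing `entirely_verified` under the given criteria."""
    flags = [entirely_verified(ps, **criteria) for ps in probe_sets]
    return 100.0 * sum(flags) / len(flags)


# One hypothetical probe set: best mismatch count per probe (None = no hit).
probe_set = [0, 0, 1, 2, None]
print(entirely_verified(probe_set))                                   # False
print(entirely_verified(probe_set, max_mismatches=2, max_dropped=1))  # True
```

With the defaults (no mismatches, no dropped probes), the check reduces to the strict 'entirely verified' definition used earlier.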
Verification of all available human, mouse and rat arrays confirms high probe accuracy
[Table: Percentage of verified probe sets for all Affymetrix human, mouse and rat arrays analyzed, including HG-U133 Plus 2.0, Mouse 430 2.0, Mouse 430A 2.0 and Rat 230 2.0.]
The impact of updated probe set definitions on expression data
Microarrays are often used to find genes that are differentially expressed. To assess the impact of the updated probe set definitions on the assessment of differential gene expression, we reanalyzed an example dataset, the Clinical Prostate Cancer Behavior dataset (see Methods), consisting of 52 prostate tumor RNA samples and 50 non-diseased RNA samples hybridized to the human HG-U95Av2 array. Using RankProducts (see Methods), we calculated lists of differentially expressed genes with both the original Affymetrix CDF file and the new CDF file. In total, 943 upregulated probe sets were detected with both CDF files, 32 probe sets only with the new CDF file, and 41 probe sets only with the original CDF file (at a significance level of p < 0.05, Bonferroni adjusted; similar numbers were found for the downregulated genes).
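To make the statistic concrete, here is a minimal, unoptimized sketch of the rank product idea of Breitling et al. for upregulation in one class versus another. Genes are ranked by log fold change in every pairwise comparison between a sample of one class and a sample of the other, and each gene's rank product is the geometric mean of its ranks. This illustration rests on stated assumptions (log2-scale input, one value per gene per sample, no special tie handling, no permutation-based p-values) and is not the RankProducts software used in the study. The hypothetical helper `compare_lists` shows how overlap counts like those above follow from two result lists, assuming probe set identifiers are comparable between the two CDF files.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def rank_products(class_a: Sequence[Sequence[float]],
                  class_b: Sequence[Sequence[float]]) -> List[float]:
    """Rank product per gene for upregulation in class_a relative to class_b.

    Each sample is a sequence of log2 expression values in a fixed gene order.
    Small values indicate genes ranked near the top in most pairwise comparisons.
    """
    n_genes = len(class_a[0])
    log_rank_sum = [0.0] * n_genes
    n_pairs = 0
    for a, b in product(class_a, class_b):
        fold = [a[g] - b[g] for g in range(n_genes)]  # log fold change
        order = sorted(range(n_genes), key=lambda g: fold[g], reverse=True)
        for rank, g in enumerate(order, start=1):
            log_rank_sum[g] += math.log(rank)
        n_pairs += 1
    # Geometric mean of ranks, computed in log space to avoid overflow.
    return [math.exp(s / n_pairs) for s in log_rank_sum]


def compare_lists(original: Sequence[str], refined: Sequence[str]) -> Tuple[int, int, int]:
    """(found with both CDFs, only with the refined CDF, only with the original CDF)."""
    orig, new = set(original), set(refined)
    return len(orig & new), len(new - orig), len(orig - new)


tumor = [[5.0, 2.0, 1.0], [6.0, 2.5, 1.0]]
normal = [[1.0, 1.0, 1.0]]
print([round(x, 3) for x in rank_products(tumor, normal)])  # [1.0, 2.0, 3.0]
print(compare_lists(["a", "b", "c"], ["b", "c", "d"]))      # (2, 1, 1)
```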
Comparison of lists of differentially expressed genes created with original and new CDF files.
To verify that this improvement is consistent across other datasets and platforms, we repeated the evaluation for a dataset of 34 smoker vs. 23 non-smoker samples of intrapulmonary airway epithelial cells hybridized to HG-U133A arrays, and for a dataset of 4 male vs. 4 female spleen samples from BWF1 lupus-prone mice hybridized to MG-U74Av2 arrays. We saw the same clear improvement, with high statistical significance (Table 3). As outlined above, we expect random changes in the probe set definition to affect equal numbers of genes in either direction. We therefore calculated, for different values of n, the difference between the observed number of genes with a higher rank under the new CDF and the expected number (n/2). We used the maximum excess as an estimate of the number of probe sets that are significantly improved by refining the CDF files. Depending on the array, these numbers range from 321 to 658. Although these numbers are small compared with the total number of genes on the array, they comprise a large fraction of the genes that are typically found to be differentially expressed in a microarray experiment.
Several studies [7, 8, 15, 16] have verified Affymetrix probe sequences against mRNA databases. In all of these studies, only one mRNA database was considered. Gautier et al. [15] and Zhang et al. [16] verified human Affymetrix arrays against mRNA sequences from Entrez and RefSeq. Elo et al. [8] investigated the reproducibility of probe signals across different generations of Affymetrix arrays. They compared the correlations of probe signals for original Affymetrix probe sets and for verified probe sets, which they defined as the subset of probes of the original probe sets that match only the target transcript for which the probes were originally designed by Affymetrix. They found that probe verification improved the correlations between generations of Affymetrix arrays and also improved the consistency of the measurements within an array. Mecham et al. [7] showed that probe verification results in increased precision in technical replicates, increased accuracy across complementary microarray platforms, increased accuracy when translating data from oligonucleotide arrays to cDNA microarrays, and increased diagnostic power of microarray technology.
A problem with the RefSeq and UniGene Unique databases is that 3' UTRs are often truncated because of the way the sequences are assembled [17, 18], whereas Affymetrix selects the probes from the 600 bases most proximal to the 3' end of each transcript. We overcame this problem by incorporating the genome, in which all 3' UTRs are present, into the verification protocol.
The Fantom 3 project (Functional Annotation of the Mouse [20]) provides an extensive characterization of the mouse transcriptome. We also tested the verification protocol with the Fantom 3 transcripts included. Because this did not increase our verification scores (data not shown), we did not include this database in our protocol.
The mRNA and genome databases currently available are mainly based on the C57BL/6 mouse strain, as are the probes on the Affymetrix arrays. When samples from C57BL/6 mice are hybridized to the arrays, their transcripts are expected to match the probes perfectly. However, mice from genetically different strains or from recombinant inbred pedigrees, as in our genetical genomics applications [4, 6], may carry allelic SNPs relative to the C57BL/6 genome. Probes carrying allelic SNPs may hamper data interpretation, as putative differential mRNA expression can be confounded with differential hybridization. When sequences of other mouse strains become available, the verification protocol developed here should be repeated for these strains to identify, and if desired eliminate, probes carrying allelic SNPs.
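The excess statistic can be sketched as follows. The sketch relies on assumptions not stated in the text. The genes whose rank changed between the two CDF files are taken to be considered in a fixed order (for example, by significance), and each is flagged True if it ranks higher with the refined CDF. The excess after the first n genes is then the number of True flags minus n/2. The one-sided sign test is one plausible way to attach a p-value to such an excess. The text does not specify which test underlies Table 3, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def max_excess(improved: Sequence[bool]) -> Tuple[float, int]:
    """Largest excess of 'improved' genes over the n/2 expected by chance.

    improved[i] is True if the i-th gene (in the assumed order) ranks higher
    with the refined CDF. Returns (maximum excess, n at which it is reached).
    """
    best, best_n, count = 0.0, 0, 0
    for n, flag in enumerate(improved, start=1):
        count += flag
        excess = count - n / 2
        if excess > best:
            best, best_n = excess, n
    return best, best_n


def sign_test_p(k: int, n: int) -> float:
    """One-sided sign test: P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


flags = [True, True, False, True, False, False]
print(max_excess(flags))   # (1.0, 2)
print(sign_test_p(3, 3))   # 0.125
```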
The use of refined probe set definitions, that exclude unverified probes, will improve the interpretation of expression data, as non-hybridizing and mis-hybridizing probes add only noise to the data. Our evaluation of expression data from the public domain shows that this is indeed the case.
By combining the various verifications described above, we show that 74% of the U74 probe sets and 85% of the 430 probe sets can be considered 'entirely verified' when based on perfect matches. When two mismatches are allowed, the percentages increase to 79% for U74 and 93% for 430. At the level of individual probes, 85% and 95% of the probes were verified for U74 and 430, respectively, and even 89% and 97% when two mismatches were allowed. Our extensive analyses of probe sequence data, which include several databases such as the genome sequence, indicate that the arrays are much more accurate than previously reported. Existing data can be reanalyzed with our verified probe sets (using the online CDF files). We show that such a refined probe set definition has clear effects on the detection of differentially expressed genes, and we demonstrate for various experiments that the results are systematically improved by discarding unverified probes.
The U74Av2 array is based on the mouse UniGene database, release 74. It contains 196,670 oligomers of length 25, divided into 12,422 probe sets, most of which contain 16 oligomers. Probe sets of the newer 430 2.0 array were selected from sequences derived from dbEST (NCBI, June 2002), GenBank (NCBI, Release 129, April 2002), and RefSeq (NCBI, June 2002). It contains 495,374 oligomers of length 25, divided into 45,037 probe sets, generally consisting of 11 oligomers.
RefSeq is a curated, non-redundant collection of naturally occurring DNA, RNA and protein sequences. It is based on the sequences and annotations supplied to GenBank by the original researchers. For mouse, we used 55,810 messenger sequences from RefSeq.
UniGene is a processed and curated collection of millions of ESTs (Expressed Sequence Tags), which are relatively inaccurate (around 2% error). To assign ESTs to genes, the ESTs are clustered, and the cluster consensus sequences are stored in UniGene Unique. The mouse UniGene Unique release contains 43,104 sequences.
The NIA Mouse Gene Index (developed by the National Institute on Aging) is currently the most comprehensive collection of alternative transcription/splicing sequences. Patterns of alternative transcription/splicing are obtained by aligning a complete and non-redundant transcriptome assembly from expressed sequences (obtained from RefSeq, GenBank, dbEST, Ensembl and NIA) to the mouse genome [22]. The NIA Mouse Gene Index contains 186,405 sequences.
The UCSC mouse genome (maintained by the University of California, Santa Cruz) reports about 90% of the genome in finished form (error rate of less than 1 in 10,000 bases). We used build mm7 (corresponding to NCBI build 35.1; August 2005).
For the mouse protocol, we used two NCBI databases: RefSeq mRNAs (NCBI, Feb. 3, 2006) and UniGene Unique (NCBI, build 151, Oct. 20, 2005). In addition, we used all mouse mRNA sequences from the National Institute on Aging (NIA Mouse Gene Index 5, June 2005) and the UCSC mouse genome (mm7, Aug. 2005). For the human protocol, we used RefSeq mRNAs (NCBI, Feb. 16, 2006), UniGene Unique (NCBI, build 188, Dec. 30, 2005) and the UCSC human genome (hg17, May 2004). For the rat protocol, we used RefSeq mRNAs (NCBI, Feb. 16, 2006), UniGene Unique (NCBI, build 149, Jan. 25, 2006) and the UCSC rat genome (rn3, June 2003).
Assessment of the quality of the sequence databases
To assess the quality of the sequence databases, we took the UCSC genome sequence as a reference and compared the sequences of 1000 randomly selected genes, each occurring in all three messenger databases, with the genome sequence. Because the genome contains introns and the messenger databases do not, we extracted the exon sequences from the genome using the exon coordinates of RefSeq genes and concatenated them. Then, for each of the 1000 genes, we compared the three messenger sequences with the reconstructed genomic sequence and counted the numbers of mismatches and gaps (Table 1).
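The comparison can be sketched as exon concatenation followed by a global alignment. This illustration is not the authors' pipeline, and it rests on several assumptions. Exon coordinates are taken to be 0-based, half-open (start, end) pairs as in UCSC annotation tables, and minus-strand genes are reverse-complemented. A plain Needleman-Wunsch alignment with arbitrary example scores (match +1, mismatch -1, gap -2) stands in for whatever aligner produced Table 1. Gaps are counted here as gap positions (alignment columns), which is one of several possible conventions. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def exon_sequence(chrom_seq: str, exons, strand: str = "+") -> str:
    """Concatenate exons given as 0-based, half-open (start, end) pairs."""
    seq = "".join(chrom_seq[start:end] for start, end in sorted(exons))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    return seq.upper()


def count_mismatches_and_gaps(a: str, b: str, match: int = 1,
                              mismatch: int = -1, gap: int = -2):
    """Globally align a and b; return (mismatch columns, gap columns)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back one optimal alignment, counting mismatch and gap columns.
    i, j, mismatches, gaps = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            mismatches += a[i - 1] != b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            gaps += 1
            i -= 1
        else:
            gaps += 1
            j -= 1
    return mismatches, gaps


genomic = exon_sequence("AAACCCGGGTTT", [(0, 3), (6, 9)])  # "AAAGGG"
print(count_mismatches_and_gaps(genomic, "AAGGG"))          # (0, 1)
```

The full dynamic-programming table grows quadratically with sequence length. That is acceptable for a sketch, but full-length transcripts would call for a banded or dedicated aligner.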
Sequence alignment algorithms
Individual probes were aligned against the messenger databases with Megablast [24] (version 2.2.6, word size 12) to find 'short nearly exact matches'. Hits were classified by whether they had zero, one or two mismatches with the probe sequence.
Because aligning all individual probe sequences against the mouse genome gives too many non-exon hits (data not shown), we used the probe selection region (PSR) of each probe set as input for BLAT (standalone BLAT version 32x1, default settings). The PSR is defined as the unique part of the messenger sequence from which Affymetrix selected the probes. We masked all nucleotides not represented in probe sequences. Within the resulting BLAT hits of the masked PSRs, we re-identified the position of each probe to count the number of mismatches per probe.
All analyses were performed on a Linux cluster of 200 nodes, each with dual 2 GHz Opteron processors and 1 GB of memory. The average computation time per array was 4 hours on one node.
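The seed-and-extend idea behind such a search can be sketched as below. This is a simplified stand-in, not Megablast. It indexes every 12-mer of the target sequences and uses exact 12-mer hits of a 25-mer probe to place the probe on a target. Because strand conventions are treated as unknown here, the probe's reverse complement is searched as well. Mismatches are then counted over an ungapped placement. Function names and toy sequences are hypothetical. As with any word-based heuristic, a probe with two mismatches is missed if those mismatches leave no exact 12-base stretch, for example mismatches at positions 8 and 16 of a 25-mer. With at most one mismatch, an exact 12-mer always remains.

```python
from collections import defaultdict

WORD = 12  # seed length (the word size quoted above)
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def build_index(targets: dict, k: int = WORD) -> dict:
    """Map every k-mer to the (target name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in targets.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def best_hit(probe: str, targets: dict, index: dict, k: int = WORD,
             max_mismatches: int = 2):
    """Return (mismatches, target, start) of the best ungapped placement, or None."""
    best = None
    for query in (probe, revcomp(probe)):
        for q_off in range(len(query) - k + 1):
            for name, t_off in index.get(query[q_off:q_off + k], ()):
                start = t_off - q_off  # implied start of the whole probe
                target = targets[name]
                if start < 0 or start + len(query) > len(target):
                    continue
                window = target[start:start + len(query)]
                mm = sum(x != y for x, y in zip(query, window))
                if mm <= max_mismatches and (best is None or mm < best[0]):
                    best = (mm, name, start)
    return best


probe = "ACGTTGCATGCAAGTCCGATAGCTA"
targets = {"t1": "TTTTT" + probe + "TTTTT"}
print(best_hit(probe, targets, build_index(targets)))  # (0, 't1', 5)
```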
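The masking and re-identification steps can be sketched as follows, under simplifying assumptions that are ours rather than the paper's. Probes are assumed to be given in the same orientation as their PSR, and each is assumed to occur in it exactly once. The genome alignment of the masked PSR is represented by ungapped blocks of (PSR start, genome start, length), analogous to the qStarts, tStarts and blockSizes columns of BLAT's PSL output on the forward strand. Probe bases that fall outside every block are counted as mismatches. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Block = Tuple[int, int, int]  # (PSR start, genome start, length), forward strand


def mask_psr(psr: str, probes: Sequence[str]) -> Tuple[str, List[int]]:
    """Replace every PSR base not covered by a probe with 'N'; return probe starts."""
    covered = [False] * len(psr)
    starts = []
    for probe in probes:
        start = psr.find(probe)
        if start < 0:
            raise ValueError(f"probe {probe!r} not found in PSR")
        starts.append(start)
        for i in range(start, start + len(probe)):
            covered[i] = True
    masked = "".join(base if c else "N" for base, c in zip(psr, covered))
    return masked, starts


def project(q_start: int, length: int, blocks: Sequence[Block]) -> List[Optional[int]]:
    """Genome position of each of `length` PSR positions from q_start (None if unaligned)."""
    positions: List[Optional[int]] = []
    for q in range(q_start, q_start + length):
        hit = None
        for b_q, b_t, size in blocks:
            if b_q <= q < b_q + size:
                hit = b_t + (q - b_q)
                break
        positions.append(hit)
    return positions


def probe_mismatches(probe: str, q_start: int, blocks: Sequence[Block], genome: str) -> int:
    """Count probe bases that differ from, or are not aligned to, the genome."""
    positions = project(q_start, len(probe), blocks)
    return sum(p is None or genome[p] != base for base, p in zip(probe, positions))


psr = "AAAACCCCGGGGTTTT"
print(mask_psr(psr, ["CCCC", "GGTT"]))  # ('NNNNCCCCNNGGTTNN', [4, 10])
```

Projecting each probe through the alignment blocks also handles probes that span an exon-exon junction, where the PSR aligns to the genome in two pieces.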
Datasets and methods for determining the impact of updated probe set definitions on expression data
The Clinical Prostate Cancer Behavior dataset was downloaded from the Cancer Program Data Sets website [25]. The smoker vs. non-smoker dataset was downloaded from the Gene Expression Omnibus (GEO) and has accession number GSE994. The male vs. female BWF1 lupus-prone mice dataset was also downloaded from GEO (accession number GSE2336). In all cases, we used RMA [26] to generate probe set-level data. Using RankProducts [27], we calculated ranked lists of differentially expressed genes with Affymetrix's original CDF file and with our refined CDF file, treating up- and downregulated genes separately.
RA was supported by Biomolecular Informatics grant 050-50-203 from the Netherlands Organisation for Scientific Research (NWO).
LVB was supported by the Dutch Cancer Society and by the US National Institutes of Health.
- Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996, 14: 1675–1680. doi:10.1038/nbt1296-1675
- Zhang L, Miles MF, Aldape KD: A model of molecular interactions on short oligonucleotide microarrays. Nat Biotechnol 2003, 21: 818–821. doi:10.1038/nbt836
- Wu Z, Irizarry RA: Preprocessing of oligonucleotide array data. Nat Biotechnol 2004, 22: 656–658. doi:10.1038/nbt0604-656b
- Alberts R, Terpstra P, Bystrykh LV, de Haan G, Jansen RC: A statistical multiprobe model for analyzing cis and trans genes in genetical genomics experiments with short-oligonucleotide arrays. Genetics 2005, 171: 1437–1439. doi:10.1534/genetics.105.045930
- Jansen RC, Nap JP: Genetical genomics: the added value from segregation. Trends Genet 2001, 17: 388–391. doi:10.1016/S0168-9525(01)02310-1
- Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang J, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke MP, de Haan G: Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'. Nat Genet 2005, 37: 225–232. doi:10.1038/ng1497
- Mecham BH, Wetmore DZ, Szallasi Z, Sadovsky Y, Kohane I, Mariani TJ: Increased measurement accuracy for sequence-verified microarray probes. Physiol Genomics 2004, 18: 308–315. doi:10.1152/physiolgenomics.00066.2004
- Elo LL, Lahti L, Skottman H, Kylaniemi M, Lahesmaa R, Aittokallio T: Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Res 2005, 33: e193. doi:10.1093/nar/gni193
- NetAffx Analysis Center: Affymetrix. 2006. [http://www.affymetrix.com/analysis/index.affx]
- Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA: NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 2003, 31: 82–86. doi:10.1093/nar/gkg121
- UCSC Genome Browser. 2006. [http://genome.ucsc.edu]
- Genome Glossary. 2006. [http://www.ncbi.nlm.nih.gov/genome/guide/glossary.htm]
- Naef F, Magnasco MO: Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays. Phys Rev E Stat Nonlin Soft Matter Phys 2003, 68: 011906.
- GBiC supplementary data. 2006. [http://gbic.biol.rug.nl/supplementary/2006/probeverification]
- Gautier L, Moller M, Friis-Hansen L, Knudsen S: Alternative mapping of probes to genes for Affymetrix chips. BMC Bioinformatics 2004, 5: 111. doi:10.1186/1471-2105-5-111
- Zhang J, Finney RP, Clifford RJ, Derr LK, Buetow KH: Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics 2005, 85: 297–308. doi:10.1016/j.ygeno.2004.11.004
- Pruitt K, Tatusova T, Ostell J: The Reference Sequence (RefSeq) Project. In The NCBI Handbook. Edited by: McEntyre J and Ostell J. Bethesda (MD), National Library of Medicine; 2002:18–1-18–20.
- Pontius JU, Wagner L, Schuler GD: UniGene: A Unified View of the Transcriptome. In The NCBI Handbook. Edited by: McEntyre J and Ostell J. Bethesda (MD), National Library of Medicine; 2002:21–1-21–12.
- Affymetrix Technical Note. 2006. [http://www.affymetrix.com/support/technical/technotes/mouse430_technote.pdf]
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, Chalk AM, Chiu KP, Choudhary V, Christoffels A, Clutterbuck DR, Crowe ML, Dalla E, Dalrymple BP, de BB, Della GG, di BD, Down T, Engstrom P, Fagiolini M, Faulkner G, Fletcher CF, Fukushima T, Furuno M, Futaki S, Gariboldi M, Georgii-Hemming P, Gingeras TR, Gojobori T, Green RE, Gustincich S, Harbers M, Hayashi Y, Hensch TK, Hirokawa N, Hill D, Huminiecki L, Iacono M, Ikeo K, Iwama A, Ishikawa T, Jakt M, Kanapin A, Katoh M, Kawasawa Y, Kelso J, Kitamura H, Kitano H, Kollias G, Krishnan SP, Kruger A, Kummerfeld SK, Kurochkin IV, Lareau LF, Lazarevic D, Lipovich L, Liu J, Liuni S, McWilliam S, Madan BM, Madera M, Marchionni L, Matsuda H, Matsuzawa S, Miki H, Mignone F, Miyake S, Morris K, Mottagui-Tabar S, Mulder N, Nakano N, Nakauchi H, Ng P, Nilsson R, Nishiguchi S, Nishikawa S, Nori F, Ohara O, Okazaki Y, Orlando V, Pang KC, Pavan WJ, Pavesi G, Pesole G, Petrovsky N, Piazza S, Reed J, Reid JF, Ring BZ, Ringwald M, Rost B, Ruan Y, Salzberg SL, Sandelin A, Schneider C, Schonbach C, Sekiguchi K, Semple CA, Seno S, Sessa L, Sheng Y, Shibata Y, Shimada H, Shimada K, Silva D, Sinclair B, Sperling S, Stupka E, Sugiura K, Sultana R, Takenaka Y, Taki K, Tammoja K, Tan SL, Tang S, Taylor MS, Tegner J, Teichmann SA, Ueda HR, van NE, Verardo R, Wei CL, Yagi K, Yamanishi H, Zabarovsky E, Zhu S, Zimmer A, Hide W, Bult C, Grimmond SM, Teasdale RD, Liu ET, Brusic V, Quackenbush J, Wahlestedt C, Mattick JS, Hume DA, Kai C, Sasaki D, Tomaru Y, Fukuda S, Kanamori-Katayama M, Suzuki M, Aoki J, Arakawa T, Iida J, Imamura K, Itoh M, Kato T, Kawaji H, Kawagashira N, Kawashima T, Kojima M, Kondo S, Konno H, Nakano K, Ninomiya N, Nishio T, Okada M, Plessy C, Shibata K, Shiraki T, Suzuki S, Tagami M, Waki K, Watahiki A, Okamura-Oho Y, Suzuki H, Kawai J, Hayashizaki Y: The transcriptional landscape of the mammalian genome. Science 2005, 309: 1559–1563. doi:10.1126/science.1112014
- Affymetrix Data Sheet. [http://www.affymetrix.com/support/technical/datasheets/mogarrays_datasheet.pdf]
- Sharov AA, Dudekula DB, Ko MS: Genome-wide assembly and analysis of alternative transcripts in mouse. Genome Res 2005, 15: 748–754. doi:10.1101/gr.3269805
- NCBI HomePage. 2006. [http://www.ncbi.nlm.nih.gov]
- Zhang Z, Schwartz S, Wagner L, Miller W: A greedy algorithm for aligning DNA sequences. J Comput Biol 2000, 7: 203–214. doi:10.1089/10665270050081478
- Cancer Program Data Sets. 2006. [http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi]
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003, 31: e15. doi:10.1093/nar/gng015
- Breitling R, Armengaud P, Amtmann A, Herzyk P: Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 2004, 573: 83–92. doi:10.1016/j.febslet.2004.07.055
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.