Combining gene expression data from different generations of oligonucleotide arrays
© Hwang et al; licensee BioMed Central Ltd. 2004
Received: 06 July 2004
Accepted: 25 October 2004
Published: 25 October 2004
One of the important challenges in microarray analysis is to take full advantage of previously accumulated data, both from one's own laboratory and from public repositories. Through a comparative analysis on a variety of datasets, a more comprehensive view of the underlying mechanism or structure can be obtained. However, as we discover in this work, continual changes in genomic sequence annotations and probe design criteria make it difficult to compare gene expression data even from different generations of the same microarray platform.
We first describe the extent of discordance between the results derived from two generations of Affymetrix oligonucleotide arrays, as revealed in cluster analysis and in identification of differentially expressed genes. We then propose a method for increasing comparability. The dataset we use consists of a set of 14 human muscle biopsy samples from patients with inflammatory myopathies that were hybridized on both HG-U95Av2 and HG-U133A human arrays. We find that the use of the probe set matching table for comparative analysis provided by Affymetrix produces better results than matching by UniGene or LocusLink identifiers but still remains inadequate. Rescaling of expression values for each gene across samples and data filtering by expression values enhance comparability, but only for a few specific analyses. As a generic method for improving comparability, we select a subset of probes with overlapping sequence segments in the two array types and recalculate expression values based only on the selected probes. We show that this filtering of probes significantly improves the comparability while retaining a sufficient number of probe sets for further analysis.
Compatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With a careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively.
By providing a genome-wide view of gene expression, microarrays have become a common exploratory tool in many areas of biological and clinical studies [1–3]. While there are several different microarray platforms, photolithographically synthesized oligonucleotide arrays from Affymetrix have become one of the principal technologies. These arrays feature multiple 25-mer probes (a "probe set") for each gene, with their measurements summarized into a single number for the estimated expression level of that gene. Because of the important role played by this technology, many methodological studies have focused on improving the extraction of information from these arrays, from image analysis and the proper role of perfect and mismatch probes to distributional properties of the measurements and optimal statistical tests for differential expression [4, 5].
Large-scale gene expression data often contain a large amount of noise from various experimental factors. Fortunately, in most cases, the technical variability is relatively small compared to the biological one and its effect can be minimized by using a sufficient number of replicates [6–8]. However, the high cost of microarray experiments often prevents gathering of enough samples for a reliable analysis in a single laboratory. In such cases, employing existing microarray datasets from other studies can be an efficient way of improving the reliability of a study. Moreover, as the number of publicly available datasets grows rapidly in public data repositories (e.g., Gene Expression Omnibus; Stanford Microarray Database; ArrayExpress at EBI), it is clear that these datasets should be combined to generate a more comprehensive understanding of the underlying biology.
Several issues have made this process difficult so far. First, different datasets have been processed using different procedures due to a lack of uniform standards, e.g., for background correction, normalization, and calculation of expression values. This makes it difficult to compare them directly. Raw data files are generally unavailable and, even if they are, reprocessing them requires substantial effort. Second, we have lacked datasets with enough controls and replicates, performed under a proper experimental design and with adequate annotations, in order to make proper comparisons. Third, and possibly most troublesome, the experiments have been performed on many different platforms, with significant differences among them. Even within a single platform, technological and algorithmic advances as well as the evolving annotations of the genomes have resulted in succeeding generations of arrays with substantial modifications from one generation to the next. Several studies have found varying degrees of disagreement between platforms, sometimes with large discrepancies that call into question the reliability of certain conclusions reached in microarray studies [12–19]. A comparison of two Affymetrix arrays, HuGeneFL and HG-U95A, was made previously, but only with the conclusion that the reproducibility is high when the two probe sets share many exact probes and that it is low when they do not.
Table 1. Comparison of the methods for probe set matching. In the case of Best Match, the relation of probe sets between U95Av2 and U133A is many-to-one. The Pearson correlation coefficients of array pairs from the same biopsies were calculated and averaged for the assessment of comparability. The main reason for the high comparability of Best Match is the selection of the most appropriate probe set from the multiple matches using sequence information.
Rows: No. of matched probe sets (U95Av2); No. of matched probe sets (U133A); No. of unique IDs shared between U95Av2 and U133A; Mean correlation coefficient of array pairs.
Mean correlation coefficient of array pairs: 0.832 ± 0.017 and 0.831 ± 0.017 (UniGene and LocusLink matching); 0.870 ± 0.016 (Best Match).
As a simple way to assess comparability, the Pearson correlation coefficient between each array pair from the same sample was calculated and the 14 correlation coefficients were averaged. The results are summarized in Table 1. UniGene and LocusLink matching give practically identical results. Best Match, on the other hand, shows somewhat higher reproducibility than the other matching methods (0.870 vs 0.831–0.832). The main reason for the higher reproducibility in Best Match is most likely that more comparable probes are chosen among multiple matches by considering the sequence information. The overall reproducibility, however, is surprisingly low. It has been observed in many replicate studies that expression values from Affymetrix arrays show high reproducibility, with correlation coefficients typically above 0.98 [20, 22, 23]. The low correlation coefficient is already an indication that the cross-generation comparison may not be simple. We use Best Match in the following sections; UniGene or LocusLink matching performs similarly or slightly worse than Best Match.
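The comparability measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy numbers are hypothetical, the source does not say whether correlations were computed on raw or log-transformed values, and the spread is reported here as a sample standard deviation, which may or may not be what the ± values in Table 1 represent.

```python
import math
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_pair_correlation(u95, u133):
    """u95[i] and u133[i] hold the expression values of the same matched
    probe sets (in the same order) for biopsy i on the two array types."""
    r = [pearson(a, b) for a, b in zip(u95, u133)]
    return mean(r), stdev(r)

# Toy data (made-up numbers): 3 biopsies, 4 matched probe sets each.
u95 = [[10, 200, 35, 80], [12, 180, 40, 75], [9, 220, 30, 90]]
u133 = [[15, 150, 50, 60], [20, 140, 45, 70], [11, 170, 42, 66]]
avg_r, sd_r = mean_pair_correlation(u95, u133)
print(f"mean r = {avg_r:.3f}, spread = {sd_r:.3f}")
```

With the real data, each inner list would hold one value per matched probe set and there would be 14 array pairs.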
Exactly matched probes between array generations are highly reproducible
There was a possibility that the lack of high correlation between the two versions was caused by a true inconsistency present in the data, perhaps due to RNA degradation between the times when the hybridizations on the two platforms were performed. To make sure that this was not the case, we investigated the quality of our data by examining the subset of probes that have exactly the same sequences in the two array generations.
When we examined the roughly 5% of probes that have the same sequence on U95Av2 and U133A, the mean correlation coefficient of array pairs, calculated from PM intensity, was 0.967 ± 0.007. (A calculation using PM-MM values gives a very similar result.) This agrees with the previously reported conclusion that probe sets with exactly the same set of probes have a very high correlation. The high correlation in our dataset confirms that the samples and other experimental factors were nearly identical between the two hybridizations and that any discordant result in comparative analysis is therefore most likely due to the differences in the probe design of the two arrays. When we compare the expression values between Best Match and the exactly matched probes, we can easily see the lack of reproducibility for the Best Match case (see Figure 2 in Additional File 1). It is clear that the probe-level sequence information has a large impact on the relationship between the abundance of transcript and the reported intensity, and that the use of probe sequences would be necessary in order to choose a subset of relatively consistent probes between U95Av2 and U133A for enhanced reproducibility.
Standard probe set matching produces discordant results in analyses
To determine the extent to which the analysis results from the two versions of the arrays agree, we employ the two most frequently used tools for exploratory analysis: cluster analysis and identification of differentially expressed genes. To evaluate compatibility in terms of cluster analysis, we combined the datasets from U95Av2 and U133A by Best Match. Then, the 28 samples were clustered by an agglomerative hierarchical clustering method with the Pearson correlation coefficient as the distance measure. Figure 2(a) shows the dendrogram of the 28 samples. Unexpectedly, instead of each array pair from the same biopsy specimen clustering together, the two array types form the two main clusters. In other words, the most distinguishing feature of the data is the array version, rather than the actual characteristics of the samples. To examine the reason for this incongruent result, correlation coefficients of all the possible sample pairings of the combined dataset were calculated. Figure 2(b) shows the correlation coefficients as a color map. The two red parts of the map (upper left and lower right) represent the high correlation coefficients among samples from the same array version. Compared to these, the correlation coefficients across U95Av2 and U133A are relatively low (lower left and upper right parts of the map).
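The clustering behavior described above can be reproduced on a toy example. The sketch below is not the software used in the study: it assumes average linkage (the source does not state the linkage rule), uses 1 − r as the distance, and the sample names and values are invented so that a per-gene platform effect dominates a small biopsy effect.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(samples):
    """Agglomerative clustering with distance 1 - r and average linkage
    (the linkage rule is an assumption). Returns the merge history as a
    list of (members_a, members_b, distance) tuples."""
    names = list(samples)
    dist = {(a, b): 1.0 - pearson(samples[a], samples[b])
            for a in names for b in names if a != b}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all cross-cluster sample pairs.
                d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Toy data (made-up): two biopsies, A and B, each measured on both array types.
samples = {
    "A_U95": [105, 20, 45, 10],
    "B_U95": [95, 20, 55, 10],
    "A_U133": [25, 100, 5, 50],
    "B_U133": [15, 100, 15, 50],
}
merges = average_linkage(samples)
for left, right, d in merges:
    print(left, "+", right, f"at distance {d:.3f}")
```

In this toy case the arrays of the same type merge first and the last merge joins the two array types, which mirrors the pattern reported for Figure 2(a).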
Gene scaling and data filtering can enhance comparability in specific situations
To understand the reason for the discordance observed in Figure 2(a), we have examined a large number of probes. The underlying problem, we have discovered, is due to a large number of probe sets that exhibit similar relative expression patterns but at different absolute levels. As an illustration, we plot the expression pattern of one such probe set pair, 35828_at of U95Av2 and 208978_at of U133A, in Figure 3(a). Clearly, although the expression patterns of these genes are similar in terms of a correlation coefficient, their scales are very different. This behavior is not simple to explain, but we believe it may be related to a large amount of cross-hybridization by a subset of badly designed probes in a probe set, especially for U95Av2. That would have the effect of amplifying the overall expression values.
A simple solution to this problem is to scale expression values for each gene across samples, for instance, by setting the mean to 0 and the standard deviation to 1. The effect of this gene scaling on the gene pair from Figure 3(a) is illustrated in Figure 3(b). The similarity in the expression pattern is more clearly visible and the measurements for this gene are now more comparable. While the Pearson correlations for the genes are not affected by this linear scaling, the correlations do change for the arrays. Figures 2(c) and 2(d) show the effect of gene scaling on the clustering result and the correlation coefficient of sample pairs, respectively. In Figure 2(c), the arrays from each platform corresponding to the same sample are now clustered together in every case. In Figure 2(d), the high correlation among the arrays of the same type (shown by red colors in Figure 2(b)) is diminished and the correlation between specimen samples across array types is highlighted (shown by dark red diagonal lines in upper right and lower left areas). For comparing datasets in a cluster analysis, gene scaling appears to work very well.
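A minimal sketch of gene scaling follows. It assumes that the scaling is applied separately within each array type (which is what removes the difference in absolute level), uses the sample standard deviation, and works on invented values for one probe set pair whose two profiles differ only by a constant factor.

```python
from statistics import mean, stdev

def scale_genes(matrix):
    """Scale each gene (row) across samples (columns) to mean 0 and
    standard deviation 1. Rows with zero variance are set to 0."""
    scaled = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        scaled.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return scaled

# Toy data (made-up): one matched probe set measured in four biopsies.
# The U133A profile has the same shape as the U95Av2 one at a quarter of the level.
u95_gene = [1200.0, 1500.0, 900.0, 2100.0]
u133_gene = [300.0, 375.0, 225.0, 525.0]

# Scaling is done within each array type, an assumption of this sketch.
scaled_u95 = scale_genes([u95_gene])[0]
scaled_u133 = scale_genes([u133_gene])[0]
print([round(v, 3) for v in scaled_u95])
print([round(v, 3) for v in scaled_u133])
```

After scaling, the two profiles coincide, so arrays from the same biopsy are no longer pulled apart by gene-specific level differences between the array types.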
Probe filtering by overlapping length highly improves reproducibility with enough probe sets for comparison
We now describe a more general method for improving comparability by filtering at the probe level, instead of at the probe set level. We have already observed that the probes with exactly the same sequences on the two generations give highly reproducible values (Additional File 1, Figure 2) but that the probe sets do not. This implies that specific probe sequences within the same target region can produce strikingly different results, and suggests that comparability would improve if we select only those probes that have sequence similarities on the two arrays. To carry this out, we mapped the location of all probes using BLAT, as described in Methods. When we select a subset of probes, we mask the rest in the raw data (cel files) and then recompute the expression values using the same algorithm used in MAS 5.0.
To emphasize the improvement, we again show in Figure 7(a) the increase in the mean correlation coefficient of array pairs, without any criterion on the fraction of used probes per probe set. As a baseline, the mean correlation coefficient of array pairs using Best Match is also shown (dashed line). Enhancement in the mean correlation coefficient of array pairs is roughly proportional to the minimum overlapping length. It appears that the mean correlation coefficient can be worse than in the case of Best Match when the minimum overlapping length is less than 10 bp. This is possibly because such a small overlap leaves enough dissimilarity that there is no functional relationship between the probes, while other good probes that do not have overlaps are thrown away. Based on Figures 5 and 7(a), we suggest that a minimum overlapping length of more than 18 bp is necessary for obtaining significantly improved results in terms of the correlation coefficient of array pairs (>0.9).
Deviation from the original expression profile after probe filtering can be controlled by criterion on the overlapping length
Comparative analysis of different microarray types has the potential to generate more comprehensive and reliable results by fully exploiting available data. Understanding and resolving both inter-platform and inter-generation discrepancies remains an important and challenging practical issue. So far, attempts at such comparisons have been few, and many were limited to simple observations of low correlations in expression values. In this work, we provided a more quantitative and comprehensive description of the issues and inconsistencies through the analysis of a unique dataset consisting of HG-U95Av2 and HG-U133A hybridizations for each of the sample biopsies, and then we described a general method for resolving some of the problems.
We first observed in cluster analysis that with a standard matching of genes, the dominant feature of the dataset is not the sample characteristics but the array type. But we found that for clustering, this problem can be mitigated by rescaling each gene. We note, however, that this method is effective under certain assumptions, e.g., that there are enough samples for each array type and that each dataset does not contain unrelated experiments. If two groups of patients under study are measured on two different arrays, for example, a gene scaling will simply make the samples more homogeneous and reduce the differences between the groups. We also examined the inconsistencies in the list of differentially expressed genes obtained in the two cases. The overlap was very low, indicating that such a list may be platform-dependent and must be interpreted with caution. Some data filtering steps, either by selecting a subset of genes that are empirically shown to be well-correlated between platforms or by focusing only on highly-expressed genes, can be helpful at times, but they do not resolve the underlying problem.
Our approach based on the probe-level sequence information resulted in a significant improvement in the reproducibility in terms of correlation coefficients and selection of differentially expressed genes. As the probes aligned to multiple regions in the genome are eliminated and the probes that share larger segments are selected, the expression values become more consistent. This result is promising because it does not use data-dependent information such as the empirical correlation for each gene between different versions of arrays, which can only be obtained through special datasets such as ours. We examined the effect of the minimal sequence overlap length and the minimum number of probes per gene on the reproducibility, and found that, when the parameters are chosen properly, higher correlation can be attained while retaining a large number of probes for further analysis. We also examined the deviation from the original data when new expression values are calculated after probe filtering. In general, we recommend a minimum overlapping length of 18–20 bp and that at least 10–20% of probes in a probe set be present in the filtering step for a comparative analysis between U95Av2 and U133A.
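The two recommended criteria can be expressed as a simple filter. The sketch below is hypothetical in its details: the function names are invented, the overlap matrix is made up, and applying the minimum fraction to both arrays of a matched probe set pair is an assumption, since the text does not spell out on which array the fraction is evaluated.

```python
def select_probes(overlaps, min_overlap=18):
    """overlaps[i][j] is the overlap in bp between probe i of one array
    and probe j of the other, within one matched probe set pair.
    Returns the indices of probes on each array that have at least one
    partner with an overlap of at least min_overlap."""
    keep_a = sorted({i for i, row in enumerate(overlaps)
                     for v in row if v >= min_overlap})
    keep_b = sorted({j for row in overlaps
                     for j, v in enumerate(row) if v >= min_overlap})
    return keep_a, keep_b

def keep_probe_set(overlaps, min_overlap=18, min_fraction=0.10):
    """Keep the probe set pair only if enough probes survive on both
    arrays (applying the fraction to both arrays is an assumption)."""
    keep_a, keep_b = select_probes(overlaps, min_overlap)
    n_a, n_b = len(overlaps), len(overlaps[0])
    return (len(keep_a) / n_a >= min_fraction
            and len(keep_b) / n_b >= min_fraction)

# Toy overlap matrix (made-up): 4 probes on one array, 3 on the other.
overlaps = [
    [25, 0, 0],
    [0, 19, 0],
    [0, 0, 5],
    [0, 0, 0],
]
print(select_probes(overlaps, min_overlap=18))
print(keep_probe_set(overlaps, min_overlap=20, min_fraction=0.5))
```

For a real probe set pair the matrix would have 16 rows and 11 columns, and the surviving probes would be the ones left unmasked when expression values are recomputed.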
Combining data across multiple platforms remains a formidable challenge. As a first step, we have studied the issues associated with combining data from multiple generations of a single platform and proposed one method. From our analysis, it is clear that technological issues can have a significant effect and that one should be aware of the potential pitfalls in studies involving more than a single array type. In principle, the approach of selecting probes with sequence overlaps can be applied to other array types as well as to different versions of oligonucleotide arrays. For example, to study expression profiles of conserved regions across species using a different array for each species, more accurate results may be obtained by using only a subset of probes with sequence similarity. In each case, appropriate criteria for the length of overlap and the number of probes needed for a robust estimate of a probe set value need to be investigated for different contexts, but the results we provide in this work can serve as a guide.
Muscle tissue samples of 14 patients with inflammatory myopathies were collected. Among the 14 patients, 5 had dermatomyositis (DM) and 9 had other inflammatory myopathies including necrotizing myopathy, inclusion body myositis, granulomatous myositis, and polymyositis. Because the molecular profile of DM is sufficiently different from those of the rest, we can think of the DMs as one group and the rest as the other group in a two-group comparison. Total RNA was extracted from muscle biopsy tissues and labeled. A portion was hybridized to HG-U95Av2 arrays; the remaining supply was frozen and then later hybridized to HG-U133A arrays at the same facility.
Matching probe sets between U95Av2 and U133A
Although they belong to the same oligonucleotide array platform, the changes from the older version (U95Av2) to the newer one (U133A) were substantial: 1) the main source of the probe selection region is different (UniGene Build 95 and 133; for the U133 set, other sequence databases such as dbEST were extensively used for choosing the probe selection region); 2) the number of probe pairs for a single gene was reduced from 16 to 11; and 3) the probe selection method was improved. The annotation for each probe set in U95Av2 and U133A was obtained from the NetAffx Analysis Center (NetAffx annotation files, annotation date: 12/10/2003). According to the annotation information, U95Av2 has 12,625 probe sets, which are annotated by 9,091 UniGene and 8,672 LocusLink identifiers. The newer version U133A consists of 22,283 probe sets annotated by 13,624 UniGene and 12,769 LocusLink identifiers. Here, the UniGene identifier was assigned by matching the representative sequence of each probe set to the UniGene database at the time of annotation. The LocusLink identifier was derived from the matched UniGene record (Annotation Methodology, Affymetrix web site).
To account for variations in the probe sets for the same transcript between different array versions, Affymetrix provides probe set matching tables for comparative analysis. These matching tables were constructed based on the sequence information of probe sets as follows. First, all possible probe set pairs between two array generations were checked by their similarity in the representative sequence for selection. Among the selected probe set pairs, "Good Match" pairs were chosen by the following criteria: 1) percent identity between the representative sequences >90%; 2) length of the representative sequence >100 base pairs (bp); 3) at least one perfect match (PM) probe of one array generation should be perfectly aligned to the probe selection region of the other array generation. In addition, "Best Match" is a subset of Good Match selected by more stringent criteria on the similarity of probe set pairs. Best Match is used in the rest of the paper as it performs better than Good Match in all instances. When there is more than one probe set matching on either or both arrays, we take the average of the measurements.
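The three Good Match criteria listed above translate directly into a predicate. This is a sketch only: the record type and its field names are hypothetical and do not correspond to columns of the actual Affymetrix comparison spreadsheets, and the stricter Best Match criteria are not reproduced because the text does not specify them.

```python
from dataclasses import dataclass

@dataclass
class ProbeSetPair:
    """Hypothetical record for one candidate probe set pair."""
    percent_identity: float    # identity between representative sequences, in %
    rep_seq_length: int        # length of the representative sequence, in bp
    perfectly_aligned_pm: int  # PM probes of one generation aligning perfectly
                               # to the probe selection region of the other

def is_good_match(pair):
    """Apply the three Good Match criteria as stated in the text."""
    return (pair.percent_identity > 90
            and pair.rep_seq_length > 100
            and pair.perfectly_aligned_pm >= 1)

# Made-up candidates for illustration.
candidates = [
    ProbeSetPair(98.5, 450, 3),
    ProbeSetPair(88.0, 450, 3),   # fails the identity criterion
    ProbeSetPair(99.0, 80, 2),    # fails the length criterion
    ProbeSetPair(97.0, 300, 0),   # no perfectly aligned PM probe
]
good = [is_good_match(p) for p in candidates]
print(good)
```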
BLAT for the alignment of probes
To improve compatibility between U95Av2 and U133A, those probes whose sequence overlapped with any of the probes for the same gene on the other platform were selected. The extent of overlap necessary is described in the Results section. First, all the perfect match (PM) probes were aligned to the coding regions of the genome. Of commonly used short sequence alignment tools such as SIM4, SPIDEY, and BLAT, we used BLAT (build version 26, available at http://www.soe.ucsc.edu/~kent/exe/ as a stand-alone program) because it appears to be more accurate and faster than the others for matching short sequences with high sequence identity (more than 90%). BLAT has been used previously for annotating the probe sets of HG-U95Av2 in the GeneAnnot system from the Weizmann Institute of Science. The alignment was done on the human chromosome sequence Build 34 (July 2003 freeze), available at UCSC Genome Bioinformatics (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg16/chromosomes/). We ran BLAT with its default options (-tileSize=11 -minMatch=2 -minScore=30 -minIdentity=90 -maxGap=2), without the overused tile file to avoid missing any matches. From the BLAT search result, only the 25-mer perfect alignments were considered for further analysis. All probes aligned to more than two regions in genomic DNA were discarded because of the possibility of cross-hybridization. In each matched probe set pair, the overlapping lengths between all the possible PM probe pairings (16 × 11) were calculated.
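Given genomic alignment positions, the overlapping length of two probes is a short calculation. The sketch below assumes that both probes are ungapped 25-mer alignments on the same chromosome and strand and are described by their start coordinates; the function names and coordinates are invented for illustration and are not taken from BLAT output.

```python
PROBE_LEN = 25  # length of an Affymetrix probe in bases

def overlap_length(start_a, start_b, length=PROBE_LEN):
    """Number of shared bases between two equal-length, ungapped probes
    on the same chromosome and strand, given their start coordinates."""
    return max(0, length - abs(start_a - start_b))

def pairwise_overlaps(starts_a, starts_b):
    """Overlap matrix for all probe pairings of a matched probe set pair
    (16 x 11 for a full U95Av2 / U133A pair)."""
    return [[overlap_length(a, b) for b in starts_b] for a in starts_a]

# Made-up start coordinates for a few probes of one matched probe set pair.
starts_u95 = [1000, 1040, 1100]
starts_u133 = [1000, 1047, 1300]
matrix = pairwise_overlaps(starts_u95, starts_u133)
for row in matrix:
    print(row)
```

An overlap of 25 corresponds to an exactly matched probe, and 0 means the two probes share no bases.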
Filtering probes by overlapping length
The length of the overlap between probe sequences (1–25 bp) was used as a criterion for choosing probes for comparative analysis. The expression values were recomputed each time using only the selected probes by masking out the other probes from the raw (.cel) files. The values were calculated by the Statistical Expression Analysis Algorithm using Microarray Suite version 5.0 (MAS 5.0) (Affymetrix, Santa Clara, CA) without linear scaling to target intensity. MAS 5.0 provides a robust estimator of the expression index based on a one-step biweight estimation algorithm, considering both perfect match (PM) and mismatch (MM) probes. This algorithm alleviates, to some extent, the problem of unstable expression values when a fraction of the probes is eliminated in our analysis.
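To illustrate why a one-step biweight estimate stays stable when some probes are removed or behave badly, here is a minimal sketch of a one-step Tukey biweight mean. It is not the MAS 5.0 implementation: the tuning constants c = 5 and eps = 0.0001 are assumptions of this sketch, the input is taken to be already background-adjusted, log-scale probe values, and the probe values are made up.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean: values far from the median, measured
    in units of the median absolute deviation, get little or no weight."""
    m = median(values)
    s = median([abs(v - m) for v in values])  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * s + eps)
        w = (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0
        num += w * v
        den += w
    return num / den

# Made-up log-scale probe values for one probe set; the last is an outlier.
probe_values = [8.0, 8.1, 7.9, 8.0, 20.0]
estimate = tukey_biweight(probe_values)
plain_mean = sum(probe_values) / len(probe_values)
print(f"biweight = {estimate:.3f}, plain mean = {plain_mean:.3f}")
```

The outlying probe receives zero weight, so the estimate stays near the bulk of the probe values, whereas the plain mean is pulled toward the outlier.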
KBH was supported by the Korea Science and Engineering Foundation (KOSEF) and by the Korea Ministry of Science and Technology under the NRL Project; SWK was supported by 5U01HL066582-04 from NIH; PJP was supported by K25-GM67825 from NIH.
- Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN: A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000, 24(3):236–244. 10.1038/73439
- Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Raffeld M, Yakhini Z, Ben-Dor A, Dougherty E, Kononen J, Bubendorf L, Fehrle W, Pittaluga S, Gruvberger S, Loman N, Johannsson O, Olsson H, Wilfond B, Sauter G, Kallioniemi OP, Borg A, Trent J: Gene-Expression Profiles in Hereditary Breast Cancer. N Engl J Med 2001, 344(8):539–548. 10.1056/NEJM200102223440801
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530–536. 10.1038/415530a
- Parmigiani G, Garrett E, Irizarry R, Zeger S, (Eds): The Analysis of Gene Expression Data. New York, NY: Springer Verlag; 2003.
- Speed TP, (Ed): Statistical Analysis of Gene Expression Microarray Data. Boca Raton, FL: Chapman & Hall/CRC Press; 2003.
- Hartemink AJ, Gifford DK, Jaakkola TS, Young RA: Maximum likelihood estimation of optimal scaling factors for expression array normalizations. In Proceedings of SPIE BiOS 2001. 2001.
- Rocke DM, Durbin B: A Model for Measurement Error for Gene Expression Arrays. J Comput Biol 2001, 8(6):557–569. 10.1089/106652701753307485
- Zien A, Fluck J, Zimmer R, Lengauer T: Microarrays: how Many Do You Need? J Comput Biol 2003, 10(3):653–667. 10.1089/10665270360688246
- Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30: 207–210. 10.1093/nar/30.1.207
- Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G: The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003, 31: 94–96. 10.1093/nar/gkg078
- Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone SA: ArrayExpress – a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 2003, 31: 68–71. 10.1093/nar/gkg091
- Kuo WP, Jenssen TK, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of Matched mRNA Measurements from Two Different Microarray Technologies. Bioinformatics 2002, 18(3):405–412. 10.1093/bioinformatics/18.3.405
- Li J, Pankratz M, Johnson JA: Differential gene expression patterns revealed by oligonucleotide versus long cDNA arrays. Toxicol Sci 2002, 69(2):383–390. 10.1093/toxsci/69.2.383
- Kothapalli R, Yoder SJ, Mane S, Loughran TP Jr: Microarray Results: how Accurate are They? BMC Bioinformatics 2002, 3: 22. 10.1186/1471-2105-3-22
- Huminiecki L, Lloyd AT, Wolfe KH: Congruence of Tissue Expression Profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases. BMC Genomics 2003, 4: 31. 10.1186/1471-2164-4-31
- Barczak A, Rodriguez MW, Hanspers K, Koth LL, Tai YC, Bolstad BM, Speed TP, Erle DJ: Spotted Long Oligonucleotide Arrays for Human Gene Expression Analysis. Genome Res 2003, 13(7):1775–1785. 10.1101/gr.1048803
- Lee JK, Bussey KJ, Gwadry FG, Reinhold W, Riddick G, Pelletier SL, Nishizuka S, Szakacs G, Annereau JP, Shankavaram U, Lababidi S, Smith LH, Gottesman MM, Weinstein JN: Comparing cDNA and Oligonucleotide Array Data: concordance of Gene Expression Across Platforms for the NCI-60 Cancer Cells. Genome Biology 2003, 4: R82. 10.1186/gb-2003-4-12-r82
- Tan PK, Downey TJ, Spitznagel EL Jr, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM, Cam MC: Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res 2003, 31(19):5676–5684. 10.1093/nar/gkg763
- Mah N, Thelin A, Lu T, Nikolaus S, Kühbacher T, Gurbuz Y, Eickhoff H, Klöppel G, Lehrach H, Mellgård B, Costello CM, Schreiber S: A comparison of oligonucleotide and cDNA-based microarray systems. Physiol Genomics 2004, 16(3):361–370. 10.1152/physiolgenomics.00080.2003
- Nimgaonkar A, Sanoudou D, Butte AJ, Haslett JN, Kunkel LM, Beggs AH, Kohane IS: Reproducibility of Gene Expression across Generations of Affymetrix Microarrays. BMC Bioinformatics 2003, 4: 27. 10.1186/1471-2105-4-27
- Affymetrix: User's guide to product comparison spreadsheets. 2003. [http://www.affymetrix.com/support/technical/manual/comparison_spreadsheets_manual.pdf]
- Baugh L, Hill A, Brown E, Hunter C: Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res 2001, 29(5):e29. 10.1093/nar/29.5.e29
- Costigan M, Befort K, Karchewski L, Griffin RS, D'Urso D, Allchorne A, Sitarski J, Mannion JW, Pratt RE, Woolf CJ: Replicate high-density rat genome oligonucleotide microarrays reveal hundreds of regulated genes in the dorsal root ganglion after peripheral nerve injury. BMC Neuroscience 2002, 3: 16. 10.1186/1471-2202-3-16
- Hennig L, Menges M, Murray JAH, Gruissem W: Arabidopsis transcript profiling on Affymetrix GeneChip arrays. Plant Mol Biol 2003, 53(4):457–465. 10.1023/B:PLAN.0000019069.23317.97
- Mei R, Hubbell E, Bekiranov S, Mittmann M, Christians FC, Shen MM, Lu G, Fang J, Liu WM, Ryder T, Kaplan P, Kulp D, Webster TA: Probe selection for high-density oligonucleotide arrays. Proc Natl Acad Sci U S A 2003, 100(20):11237–11242. 10.1073/pnas.1534744100
- Greenberg SA, Sanoudou D, Haslett JN, Kohane IS, Kunkel LM, Beggs AH, Amato AA: Molecular profiles of inflammatory myopathies. Neurology 2002, 59: 1170–1182.
- Liu G, Loraine AE, Shigeta R, Cline M, Cheng J, Valmeekam V, Sun S, Kulp D, Siani-Rose MA: NetAffx: Affymetrix probesets and annotations. Nucleic Acids Res 2003, 31: 82–86. 10.1093/nar/gkg121
- Florea L, Hartzell G, Zhang Z, Rubin GM, Miller W: A Computer Program for Aligning a cDNA Sequence with a Genomic DNA Sequence. Genome Res 1998, 8(9):967–974.
- Wheelan SJ, Church DM, Ostell JM: Spidey: a Tool for mRNA-to-Genomic Alignments. Genome Res 2001, 11(11):1952–1957.
- Kent WJ: BLAT-The BLAST-Like Alignment Tool. Genome Res 2002, 12(4):656–664. 10.1101/gr.229202. Article published online before March 2002
- Chalifa-Caspi V, Shmueli O, Benjamin-Rodrig H, Rosen N, Shmoish M, Yanai I, Ophir R, Kats P, Safran M, Lancet D: GeneAnnot: interfacing GeneCards with high-throughput gene expression compendia. Briefings in Bioinformatics 2003, 4(4):349–360.
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic Acids Res 2003, 31: 51–54. 10.1093/nar/gkg129
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.