Microarray results: how accurate are they?
© Kothapalli et al; licensee BioMed Central Ltd. 2002
Received: 6 May 2002
Accepted: 23 August 2002
Published: 23 August 2002
DNA microarray technology is a powerful technique that was recently developed in order to analyze thousands of genes in a short time. Presently, microarrays, or chips, of the cDNA type and oligonucleotide type are available from several sources. The number of publications in this area is increasing exponentially.
In this study, microarray data obtained from two different commercially available systems were critically evaluated. Our analysis revealed several inconsistencies in the data obtained from the two different microarrays. Problems encountered included inconsistent sequence fidelity of the spotted microarrays, variability of differential expression, low specificity of cDNA microarray probes, discrepancy in fold-change calculation and lack of probe specificity for different isoforms of a gene.
In view of these pitfalls, data from microarray analysis need to be interpreted cautiously.
Traditionally, techniques for studying gene expression were limited in both breadth and efficiency, since they typically allowed investigators to study only one or a few genes at a time. However, the recently developed DNA microarray technique is a powerful method that gives researchers the opportunity to analyze the expression patterns of tens of thousands of genes in a short time. Presently, several vendors offer these microarray systems, also known as chips, with a variety of technologies available. Currently, DNA microarrays are manufactured using either cDNA or oligonucleotides as gene probes. cDNA microarrays are created by spotting amplified cDNA fragments in a high-density pattern onto a solid substrate such as a glass slide [1, 2]. Oligonucleotide arrays are either spotted or constructed by chemically synthesizing approximately 25-mer oligonucleotide probes directly onto a glass or silicon surface using photolithographic technology.
Due to the powerful nature of microarrays, the number of relevant publications in this burgeoning field is increasing exponentially. During the years 1995–1997, fewer than ten reports featured microarray data. In 2001 alone, however, approximately 800 publications featured data generated by microarray studies (according to a PubMed search).
Microarray technology certainly has the potential to greatly enhance our knowledge about gene expression, but there are drawbacks that need to be considered. As Knight cautioned, it is possible that errors could be incorporated during the manufacture of the chips. Consequently, the fidelity of the DNA fragments immobilized to the microarray surface may be compromised. However, there are few studies where the majority of the gene sequences spotted on the microarrays were verified. Kuo et al. (2002) compared the data from two high-throughput DNA microarray technologies, cDNA microarray (Stanford type) and oligonucleotide microarray (from Affymetrix), and found very little correlation between these two platforms. Unfortunately, many investigators are reporting microarray data without confirming their results by other traditional gene expression techniques such as PCR, Northern blot analysis and RNase protection assay. Raw microarray data obtained from questionable nucleotide sequences are then often manipulated using cluster and statistical analysis software and subsequently reported in scientific journals. In addition, the quality of the probe sequences and the location of the probes selected for incorporation into the array are also very important. For example, if probes are selected only from the 3' end of a given gene, then there is a strong possibility that different splice variants of that gene will not be identified if the alternative splicing occurs at the 5' region of the gene.
The development of a single chip containing the complete gene set for a given tissue or for a complex organism (30,000 to 60,000 genes) is likely in the near future, so it is paramount that chip manufacturers avoid these problems. In this report, we demonstrate that microarray technology continues to be a dynamic and developing process and highlight potential pitfalls that must be addressed when interpreting data.
Inconsistent sequence fidelity of spotted cDNA microarrays
Table 1. Verification of genes spotted on cDNA microarray (columns: size in kb; balanced differential expression; sequence correct / incorrect; Northern blots positive / negative)
Variable reliability of differential expression data
The cDNA fragments corresponding to differentially expressed genes spotted on the microarrays were excised from the plasmid DNA and used as probes in Northern blots. Of the seventeen genes tested, only eight (47%) gave results consistent with the microarray data. Although all the sequences for the down-regulated genes were correct, Northern blot analysis with these probes did not show any differential expression of the genes. This is in contrast to the microarray data, which suggested that they were down-regulated (Table 1).
Low specificity of cDNA microarray probes
Discrepancy in fold change calculation for a given gene
Lack of probe specificity for gene isoforms
Mismatch probe sets mask the perfect match signals in oligonucleotide array (Affymetrix)
In order to identify the differentially expressed genes in large granular lymphocytic (LGL) leukemia, we performed microarray analysis using the UniGEM-V microarray from IncyteGenomics and the HU6800 oligonucleotide array from Affymetrix. In the course of our analysis, we discovered several problems that we feel could occur in other studies that might lead to false conclusions.
Approximately 80 up-regulated genes and 12 down-regulated genes were identified by cDNA microarray analysis in leukemic LGL cells. Since microarray technology was a new tool at that time, we decided to verify the sequences of all the genes that were differentially expressed. To that end, we purchased approximately 20 clones representing the differentially expressed genes and verified the sequences. We found that only approximately 70% of the genes spotted on the microarray matched the correct sequence of the clones. Other groups reported similar observations. For example, IMAGE mouse cDNA clones (approximately 1200) were purchased from Research Genetics (Huntsville, Alabama) and sequences were verified by Halgren et al. This group found that only 62% were definitely identified as a pure sample of the correct clones. In another study, PCR amplification products (previously sequence-verified cDNA clones) were re-sequenced and only 79% of the clones matched the original database. In a different study, it was estimated that only 80% of the genes in a set of microarray experiments were correctly identified. Therefore, we advise that when preparing cDNA microarrays (commercial or homemade), each clone be sequence-verified at the final stage before printing the microarray. Mistakes made at this stage cannot be corrected later, even with the most sophisticated analytical tools.
We used cDNA microarray analysis to compare the gene expression profile of leukemic LGL cells obtained from a patient with the expression profile of PBMC obtained from a normal healthy individual as a control. We decided to verify the microarray results using samples from more patients and other methods such as PCR, Northern blot and RNase protection assay. To our surprise, none of the three down-regulated genes studied exhibited differential expression in Northern blots when the cDNA fragments of these genes were used as probes. Among the up-regulated genes, only 47% supported the results from the microarray data. The rest either displayed no signal, were not detectable in any sample or failed to reveal any differential expression whatsoever. Although some genes such as PAC-1 and A20 showed differential expression in LGL leukemia patients, no product amplification was obtained using RT-PCR with gene-specific primers.
By microarray analysis, it is very difficult to distinguish between two similar genes. The best example in our case is the comparison of granzyme B and granzyme H. These two genes share approximately 80% similarity at the DNA level but have different enzymatic activities [13, 14]. Using either one of the genes as a probe, both cDNA microarray and Northern blot analysis indicated over-expression of both genes indiscriminately (Fig. 1). However, using gene-specific probes in an RNase protection assay, we were able to distinctly identify the over-expression of both granzyme B and H in leukemic LGL cells (Fig. 1d and 1e). In normal PBMC only trace amounts of both genes were identified, but after activation by PHA and IL-2 only granzyme B was up-regulated. It is very difficult to get this information by microarray analysis alone. Therefore, caution in presenting microarray data without verification and confirmation is advised.
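To make the notion of percent similarity concrete, the following is a minimal sketch of a percent-identity calculation over two sequences that have already been aligned to equal length. The function name and the two short sequences are hypothetical and invented for illustration; they are not granzyme sequences, and the sketch is not the comparison method used in this study.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns in which both sequences carry the same base.

    Assumes the two sequences were aligned beforehand and have equal length.
    Columns containing a gap character ('-') are counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy sequences that differ at 2 of 10 positions.
toy_a = "ATGCCAGTTA"
toy_b = "ATGCCTGTAA"
print(percent_identity(toy_a, toy_b))
```

Two transcripts with identity in this range can share long matching stretches, which is consistent with a long cDNA probe hybridizing to both.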
When the results from two different microarray technologies (cDNA and oligonucleotide arrays) were compared, the differential expression of some genes agreed between the two platforms, but a large variation in expression profiles between the two microarrays was clearly evident. Such systematic differences between the two technologies have been reported previously. For example, perforin showed a 103-fold change in the Affymetrix array, whereas the cDNA microarray showed a balanced differential expression of only 3.8-fold. Northern blot results indicate that the gene was over-expressed, but the actual value lies between the values from the two microarrays. This problem may be due to an inaccurate fold change calculation caused by the inclusion of mismatch (MM) values in the formula. We observed that many over-expressed genes were not properly identified at times. This may be the result of the introduction of mismatch values in the Affymetrix system. For example, genes for human autoantigen and human carboxyl ester lipase-like protein would be considered up-regulated in the microarray (according to perfect match (PM) hybridization) if the MM hybridization values were ignored in the fold change calculation.
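The effect of mismatch values on a fold change can be illustrated with a small sketch. It assumes a simplified signal defined as the average of PM minus MM over the probe pairs of a probe set, which is an assumption made for illustration and not the vendor's exact algorithm. All function names and intensity values below are hypothetical.

```python
def pm_only_signal(pm):
    """Average perfect match (PM) intensity for one probe set."""
    return sum(pm) / len(pm)


def average_difference(pm, mm):
    """Average of PM minus MM over the probe pairs of one probe set."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM lists must have the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def fold_change(test_signal, control_signal, floor=1.0):
    """Ratio of test to control signal.

    Both signals are clamped at `floor` (an assumed safeguard) so that zero
    or negative values cannot break the division.
    """
    return max(test_signal, floor) / max(control_signal, floor)


# Hypothetical intensities, invented for illustration only.
leukemic_pm, leukemic_mm = [900.0, 1100.0, 1000.0], [850.0, 1000.0, 900.0]
normal_pm, normal_mm = [300.0, 350.0, 250.0], [100.0, 150.0, 50.0]

fc_pm_only = fold_change(pm_only_signal(leukemic_pm), pm_only_signal(normal_pm))
fc_with_mm = fold_change(
    average_difference(leukemic_pm, leukemic_mm),
    average_difference(normal_pm, normal_mm),
)
print(f"PM only: {fc_pm_only:.2f}-fold; PM minus MM: {fc_with_mm:.2f}-fold")
```

With these invented numbers the gene looks about 3.3-fold up-regulated from PM values alone, but appears down-regulated (about 0.42-fold) once high MM hybridization in the leukemic sample is subtracted.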
DNA microarray analysis can be a powerful technique to identify differentially expressed genes, but differentiating between splice variants can be problematic. For example, although the differential expression of several genes such as PAC-1 and A20 was confirmed by Northern blot analysis, we were unable to see any expression of protein corresponding to these genes by Western blot analysis. We were also unable to amplify those genes using gene-specific primers by RT-PCR. After screening the LGL library, we obtained several full-length genes that differed from PAC-1 at both the 5' and 3' ends. Similarly, we screened an LGL leukemia library and obtained several 1.5 kb cDNA fragments using the A20 cDNA as a probe. The deduced amino acid sequences of these genes revealed different proteins.
We found an up-regulation of NKG2C with a balanced differential expression of 5.8 in cDNA microarray (Fig. 4a). When Northern blot analysis was performed using NKG2C cDNA as a probe, we identified multiple transcripts. Screening the LGL leukemia library resulted in the identification of several other members of the NKG2 family such as NKG2A, D, E, and F. Therefore, it can be very difficult to distinguish different forms of genes if they are similar in certain sequence regions.
At the time of writing this report there were approximately 1150 articles published describing microarray results (PubMed). There is no doubt that these results will provide an overall idea of gene expression and contribute to understanding the molecular mechanisms involved in various processes. However, as demonstrated by our findings, the development of a standardized microarray system is needed to obtain more meaningful data from these experiments. The introduction of more uniform systems, combined with consideration of the pitfalls and alternatives described above, will allow better utilization of this powerful technique in an expanding collection of scientific endeavors. It will be very helpful for the scientific community if the verified data are deposited in a public database.
Isolation of PBMC and RNA
PBMC were isolated from whole blood using Ficoll-Hypaque density gradient centrifugation. These cells were suspended in Trizol reagent (GIBCO-BRL, Rockville, MD) and total RNA was isolated immediately according to the manufacturer's instructions. Poly A+ RNA was isolated from total RNA by using Oligo-Tex mini mRNA kit (Qiagen, Valencia, CA) according to the manufacturer's recommendations.
Activation of PBMC
Normal PBMC were cultured in vitro and activated with PHA (Sigma Chemical Co., St. Louis, MO; 1 μg/ml, 2 days) and interleukin-2 (IL-2; 100 U/ml, 10 days), after which total RNA was isolated.
cDNA microarray analysis
Microarray probing and analysis were performed by IncyteGenomics. Briefly, one μg of poly(A)+ RNA isolated from PBMC of an LGL leukemia patient and of a healthy individual was reverse transcribed to generate Cy3 and Cy5 fluorescently labeled cDNA probes. cDNA probes were competitively hybridized to a human UniGEM-V cDNA microarray containing approximately 7075 immobilized cDNA fragments (4107 for known genes and 2968 for ESTs). Microarrays were scanned in both Cy3 and Cy5 channels with an Axon GenePix scanner (Foster City, CA) at 10 μm resolution. P1 and P2 signals are the intensity readings obtained by the scanner for the Cy3 and Cy5 channels. The balanced differential expression was calculated as the ratio between the P1 signal (intensity reading for probe 1) and the balanced P2 signal (intensity reading for probe 2 adjusted using the balanced coefficient).
Incyte GEMtools software (Incyte Pharmaceuticals, Inc., Palo Alto, CA) was used for image analysis. A gridding and region detection algorithm determined the elements. The area surrounding each element image was used to calculate a local background and was subtracted from the total element signal. Background subtracted element signals were used to calculate Cy3:Cy5 ratio. The average of the resulting total Cy3 and Cy5 signal gave a ratio that was used to balance or normalize the signals.
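The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GEMtools implementation: the balance coefficient is assumed to be the ratio of the average background-subtracted P1 signal to the average background-subtracted P2 signal, elements with a zero P2 signal are assumed to have been filtered out beforehand, and all function names and intensity values are hypothetical.

```python
def subtract_background(signal, local_background):
    """Background-subtracted element signal, clamped at zero."""
    return max(signal - local_background, 0.0)


def balance_coefficient(p1_net, p2_net):
    """Assumed balance coefficient: average P1 signal over average P2 signal."""
    mean_p1 = sum(p1_net) / len(p1_net)
    mean_p2 = sum(p2_net) / len(p2_net)
    return mean_p1 / mean_p2


def balanced_differential_expression(p1_net, p2_net):
    """Per-element ratio of the P1 signal to the balanced P2 signal.

    Assumes every P2 signal is greater than zero.
    """
    coeff = balance_coefficient(p1_net, p2_net)
    return [p1 / (p2 * coeff) for p1, p2 in zip(p1_net, p2_net)]


# Hypothetical raw readings and local backgrounds for three elements.
p1_raw, p1_bg = [900.0, 300.0, 300.0], [100.0, 100.0, 100.0]
p2_raw, p2_bg = [150.0, 150.0, 450.0], [50.0, 50.0, 50.0]

p1_net = [subtract_background(s, b) for s, b in zip(p1_raw, p1_bg)]
p2_net = [subtract_background(s, b) for s, b in zip(p2_raw, p2_bg)]
print(balanced_differential_expression(p1_net, p2_net))
```

In this toy case the P1 channel is twice as bright overall, so the P2 signals are doubled before the ratio is taken; an element that looks 8-fold higher in the raw net signals is reported as 4-fold after balancing.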
Oligonucleotide microarray analysis
The HU6800 microarray was obtained from Affymetrix (Santa Clara, CA). Briefly, total RNA isolated from normal PBMC and leukemic LGL was DNase-treated and purified with a Qiagen kit (Valencia, CA). Approximately 10 μg of purified RNA was used to prepare double-stranded cDNA (SuperScript; GIBCO-BRL, Rockville, MD) using a T7 (dT)24 primer containing a T7 RNA polymerase promoter binding site. Biotinylated complementary RNA was prepared from 10 μg of cDNA and then fragmented to approximately 50 to 100 nucleotides. In vitro transcribed transcripts were hybridized to the HU6800 microarray for 16 h at 45°C with constant rotation at 60 rpm. Chips were washed and stained using the Affymetrix fluidics station. Fluorescence intensity was measured for each chip and normalized to the fluorescence intensity for the entire chip.
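Normalizing each chip to its overall fluorescence can be sketched as a global scaling step. The sketch below assumes that every chip is rescaled so that its mean intensity equals a common target value; the target of 100.0, the function name and the intensities are assumptions made for illustration and are not taken from the vendor's software.

```python
def scale_to_target(intensities, target=100.0):
    """Rescale one chip so that its mean intensity equals `target`."""
    chip_mean = sum(intensities) / len(intensities)
    if chip_mean <= 0:
        raise ValueError("chip mean intensity must be positive")
    factor = target / chip_mean
    return [value * factor for value in intensities]


# Two hypothetical chips with different overall brightness.
normal_chip = [50.0, 150.0, 100.0]
leukemic_chip = [100.0, 300.0, 200.0]

print(scale_to_target(normal_chip))
print(scale_to_target(leukemic_chip))
```

After scaling, the two hypothetical chips give identical values, which shows that a uniform difference in brightness between chips is removed before individual genes are compared.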
Verification of the clones
GEM cDNA clones (supplied as a bacterial stab) were purchased from IncyteGenomics and streaked on LB agar plates containing the appropriate antibiotic. Individual colonies were picked and grown in LB medium. Plasmid DNA was isolated and sequenced in order to verify the sequence identity.
Northern blot analysis
Northern blotting was performed as described. Briefly, 10 μg of total RNA from each sample was denatured at 65°C in RNA loading buffer, electrophoresed in a 1% agarose gel containing 2.2 M formaldehyde, then blotted onto a Nytran membrane (Schleicher & Schuell, Inc., Keene, NH). The RNA was fixed to the membrane by UV cross-linking. cDNA was labeled with [32P] and purified using Nick columns (Amersham Pharmacia Biotech AB, Piscataway, NJ). Hybridization and washing of the blots were performed as described by Engler-Blum et al.
RNase protection assay (RPA)
RPAs were performed using the RNA isolated from leukemic LGL, normal PBMC and normal PBMC activated by IL-2 and PHA. Five μg of total RNA was hybridized to the in vitro transcribed hAPO-4 probe set (PharMingen, San Diego, CA), and the RPA was performed according to the manufacturer's protocol. After the assay, the samples were resolved on a 5% polyacrylamide gel. The gel was dried and exposed to X-ray film. After developing the film, the bands were quantitated using the ImageQuant program and normalized to the housekeeping gene, L32.
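Normalization of band intensities to the housekeeping gene can be sketched as a per-lane ratio. This is a minimal illustration, assuming that each band intensity is divided by the L32 intensity measured in the same lane; the function name, lane labels and intensity values are hypothetical and are not measurements from this study.

```python
def normalize_to_housekeeping(band, housekeeping):
    """Divide each lane's band intensity by that lane's housekeeping intensity."""
    normalized = {}
    for lane, intensity in band.items():
        reference = housekeeping[lane]
        if reference <= 0:
            raise ValueError(f"housekeeping intensity for {lane} must be positive")
        normalized[lane] = intensity / reference
    return normalized


# Hypothetical band intensities for one gene and for L32 in the same lanes.
gene_band = {"leukemic LGL": 900.0, "normal PBMC": 100.0}
l32_band = {"leukemic LGL": 300.0, "normal PBMC": 200.0}

print(normalize_to_housekeeping(gene_band, l32_band))
```

Dividing by the housekeeping signal corrects for unequal RNA loading: in this invented example the raw bands differ 9-fold, but the loading-corrected values differ 6-fold.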
Western immunoblot analysis
Cells were lysed in a buffer containing 50 mM Tris-HCl (pH 7.6), 5 mM EDTA, 150 mM NaCl, 0.5% NP-40, and 0.5% Triton X-100 containing 1 μg/ml leupeptin, aprotinin and antipain; 1 mM sodium orthovanadate; and 0.5 mM PMSF (all reagents were obtained from Sigma Chemical Co.). Twenty-five μg of total protein from each sample was subjected to 10% SDS-PAGE. The proteins were then transferred to a membrane, and Western blotting was performed using monoclonal antibodies for PAC-1 and A20, followed by the ECL technique as recommended by the manufacturer (Amersham Biosciences, Piscataway, NJ).
This investigation was supported by grants from the Veterans Administration Merit Review, National Cancer Institute, Hisamitsu Pharmaceutical Co, Inc., (CA83947, CA90633, G60203). We thank Susan Nyland and Steven Enkemann for critical reading of the manuscript and helpful suggestions.
- Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270: 467–470.
- Hedge P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J: A concise guide to cDNA microarray analysis. Biotechniques 2000, 29: 548–562.
- Lipshutz RJ, Morris D, Chee M, Hubbell E, Kozal MJ, Shah N, Shen N, Yang R, Fodor SPA: Using oligonucleotide probe arrays to access genetic diversity. Biotechniques 1995, 19: 442–447.
- Knight J: When the chips are down. Nature 2001, 410: 860–861. doi:10.1038/35073680
- Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, et al.: Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics 2000, 24: 227–234. doi:10.1038/73432
- Kuo WP, Jenssen T, Butte AJ, Ohno-Machado L, Kohane IS: Analysis of matched mRNA measurements from two different microarray technologies. Bioinformatics 2002, 18: 405–412. doi:10.1093/bioinformatics/18.3.405
- Jain KK: Biochips for gene spotting. Science 2001, 294: 621–623. doi:10.1126/science.294.5542.621
- Liu CC, Walsh CM, Young JD-E: Perforin: structure and function. Immunol Today 1995, 16: 194–201. doi:10.1016/0167-5699(95)80121-9
- Rohan PJ, Davis P, Moskaluk CA, Kearns M, Krutzsch H, Siebenlist U, Kelly K: PAC-1 A mitogen-induced nuclear protein tyrosine phosphatase. Science 1993, 259: 1763–1766.
- Glienke J, Sobanov Y, Brostjan C, Steffens C, Nguyen C, Lehrach HE, Hofer E, Francies F: The genomic organization of NKG2C, E, F, and D receptor genes in the human natural killer gene complex. Immunogenetics 1998, 48: 163–173. doi:10.1007/s002510050420
- Halgren RG, Fielden MR, Fong CJ, Zacharewski TR: Assessment of clone identity and sequence fidelity for 1189 IMAGE cDNA clones. Nucleic Acids Res 2001, 29: 582–588. doi:10.1093/nar/29.2.582
- Taylor E, Cogdell D, Coombes K, Hu L, Ramdas L, Tabor A, Hamilton S, Zhang W: Sequence verification as quality-control step for production of cDNA microarrays. Biotechniques 2001, 31: 62–65.
- Poe M, Blake JT, Boulton DA, Gammon M, Sigal NH, Wu JK, Zweerink HJ: Human cytotoxic lymphocyte granzyme B. Its purification from granules and the characterization of substrate and inhibitor specificity. J Biol Chem 1991, 266: 98–103.
- Edwards EM, Kam CM, Powers JC, Trapani JA: The human cytotoxic T cell granule serine protease granzyme H has chymotrypsin-like (Chymase) activity and is taken up into cytoplasmic vesicles reminiscent of granzyme B-containing endosomes. J Biol Chem 1999, 274: 30468–30473. doi:10.1074/jbc.274.43.30468
- Engler-Blum G, Meier M, Frank J, Muller GA: Reduction of background problems in non-radioactive Northern and Southern blot analyses enables higher sensitivity than 32P-based hybridizations. Anal Biochem 1993, 210: 235–244. doi:10.1006/abio.1993.1189
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.