Sources of variability and effect of experimental approach on expression profiling data interpretation

Background We provide a systematic study of the sources of variability in expression profiling data using 56 RNAs isolated from human muscle biopsies (34 Affymetrix MuscleChip arrays), and 36 murine cell culture and tissue RNAs (42 Affymetrix U74Av2 arrays). Results We studied muscle biopsies from 28 human subjects as well as murine myogenic cell cultures, muscle, and spleens. Human MuscleChip arrays (4,601 probe sets) and murine U74Av2 Affymetrix microarrays were used for expression profiling. RNAs were profiled both singly, and as mixed groups. Variables studied included tissue heterogeneity, cRNA probe production, patient diagnosis, and GeneChip hybridizations. We found that the greatest source of variability was often different regions of the same patient muscle biopsy, reflecting variation in cell type content even in a relatively homogeneous tissue such as muscle. Inter-patient variation was also very high (SNP noise). Experimental variation (RNA, cDNA, cRNA, or GeneChip) was minor. Pre-profile mixing of patient cRNA samples effectively normalized both intra- and inter-patient sources of variation, while retaining a high degree of specificity of the individual profiles (86% of statistically significant differences detected by absolute analysis; and 85% by a 4-pairwise comparison survival method). Conclusions Using unsupervised cluster analysis and correlation coefficients of 92 RNA samples on 76 oligonucleotide microarrays, we found that experimental error was not a significant source of unwanted variability in expression profiling experiments. Major sources of variability were from use of small tissue biopsies, particularly in humans where there is substantial inter-patient variability (SNP noise).

probes (features) designed against each gene tested by the array, with paired perfect-match and mismatch probes, with standardized factory synthesis of arrays [5,6]. The uniform nature of the arrays permits databasing of individual profiles, which facilitates comparison of data generated by different laboratories.
Expression profiling has led to dramatic advances in understanding of yeast biology, where homogeneous cultures can be grown and exposed to timed environmental variables [7][8][9][10][11][12]. Such studies have led to the rapid assignment of function to a large number of anonymous gene sequences. Large-scale expression profiling studies of tissues from higher vertebrates are more challenging, due to the higher complexity of the genome, larger related gene families, and incomplete genomic resources. Nevertheless, DNA microarrays have been successfully applied in the analysis of aging and caloric restriction [13] and pulmonary fibrosis [14]. And many publications, particularly on cancer, have appeared [14][15][16][17][18][19]. Affymetrix has recently announced the availability of the U133 GeneChip series with 33,000 well-characterized human genes mined from genomic sequence. The nearly complete ascertainment of genes in the human genome should make expression-profiling studies of human tissues particularly powerful. However, identification of the sources of experimental variability, and knowledge of the relative contribution of variation from each source, is critical for appropriate experimental design in expression profiling experiments.
Mills and Gordon recently studied the relative contribution of experimental variability of probe production on the reproducibility of microarray results using mixed murine tissue RNA on Affymetrix Mu11K GeneChips [20]. In their study, the same RNA preparation was used as a template for distinct cDNA/cRNA amplifications and hybridizations. An additional variable studied was the effect of different laboratories processing the same RNAs. The authors found relatively poor concordance between duplicate arrays, with an average of 12% increase/decrease calls between the same RNA processed in parallel and hybridized to two Mu11K-A microarrays. The authors concluded that there was substantial experimental variability in the experimental procedure, necessitating extensive filtering and large numbers of arrays to detect accurate gene expression changes (LUT: look-up tables) [20]. In our laboratory, we have processed over 1,200 Affymetrix arrays, and have found significantly higher experimental reproducibility (R 2 = 0.979 for new generation U74A version 2 murine arrays or human U95 series, see Result and Discussion). In addition, a recent publication of a single human patient, where RNA was prepared from two distinct breast tumors, and placed on duplicate U95A GeneChips (four chips total) found a very low degree of experimental variability between microarrays (R 2 = 0.995), and be-tween the two tumors (R 2 = 0.987) [21]. The marked differences in experimental variability between laboratories could be due to different quality control protocols (see [http://microarray.cnmcresearch.org] ), newer more robust Affymetrix arrays now available (murine Mu11K versus U74A version 2 and new generation human U95 series), use of more recent algorithms for data interpretation, or due to more consistent processing of RNA, cDNA, and cRNA in the same laboratory.
The previous studies did not systematically address the reproducibility of GeneChip hybridization (e.g. the same biotinylated cRNA on two different microarrays). In addition to lingering questions concerning variability due to specific experimental procedures, there are other possible sources of variability that have not yet been investigated, specifically tissue heterogeneity and inter-individual variation. The latter two sources of variability are particularly important in human expression profiling studies. The study of human tissues often involves the use of tissue biopsies, where a relatively limited region of an organ is sampled. Tissue heterogeneity and sampling error might be expected to introduce significant variability in expression profiles. Second, tissues may derive from individuals from different ethnic backgrounds; humans are highly outbred, leading to the potential of significant polymorphic noise (herein called "SNP noise") between individuals unrelated to the disease or variable under study. SNP noise also exists between different inbred mouse strains, and some experiments have normalized this effect by breeding the same mutation on different strains, and profiling each individually [22]. Knowledge of the relative effect of each experimental, tissue, and patient variable on expression profiling results in humans is important, so that appropriate experimental designs can be employed.
We recently reported the design and production of a highly redundant oligonucleotide microarray for analysis of human muscle biopsies (Borup et al. submitted). This Mus-cleChip contains 4,601 probe sets corresponding to 3,369 distinct genes and ESTs expressed in human muscle. Each probe set contains between 16 to 40 oligonucleotides, such that the number of specific oligonucleotide probes on the array was 138,000.
Here, we utilize this MuscleChip to investigate the relative significance of variables affecting expression profiling data and interpretation. Specifically, we studied the correlation coefficients of profiles considering the following variables: 1. variation due to probe production (same RNA); 2. variation due to the microarray itself (same cRNA on different GeneChips); 3. tissue heterogeneity (different regions of the same muscle biopsy); 4. inter-patient variability (SNP noise); 5. diagnosis (underlying pathological variable); and 6. patient age. We have recently reported generation of expression profiling results using mixed patient samples [23]. Our hypothesis was that mixing of RNA samples from multiple regions of muscle biopsies, and from multiple patients matched for most variables (disease, age, sex), would effectively normalize both intra-patient variability (tissue heterogeneity), and inter-patient variability (SNP noise; e.g. normal human polymorphic variation unrelated to the primary defect). Here, we test this hypothesis directly, and show that sample mixing does indeed result in relatively high sensitivity and specificity for gene expression changes that would be detected by many individual expression profiles. Thus, sample mixing appears to be an appropriate first-pass method to obtain the most significant expression changes, while using small numbers of arrays.

Results and discussion
Fifty six (56) different RNA samples were prepared from different regions of muscle biopsies from 28 individuals (15 Duchenne muscular dystrophy (DMD) patients, 13 normal controls). The profiles of five of the DMD patients and the five controls have been previously reported using the Affymetrix HuFL microarray [23]; however, we re-tested these same samples on the custom MuscleChip (Borup et al. submitted) for comparison to the other patients here. All RNAs were converted to double-stranded cDNA, and then to biotinylated cRNA. The cRNAs were then hybridized to the MuscleChip either singly, in mixed groups, or both, as described below. In total, 34 hybridizations were performed, scanned, and the data statistically analyzed using Affymetrix Microarray Suite and Excel. Quality control criteria were as described on our web site ( [http://microarray.cnmcresearch.org] , link to "programs in genomic applications"), and included sufficient cRNA amplification, and adequate post-hybridization scaling factors. Scaling factors (normalization needed to reach a common target intensity) ranged from 0.46 to 3.28 (Table 1). All raw image files, processed image files, and difference analyses are posted on a web-queried SQL database interface to our Affymetrix LIMS Oracle warehouse (see [http:// microarray.cnmcresearch.org] : link to "programs in genomic applications", "data", "human").
Among the 4,601 probe sets on the Affymetrix custom muscle microarray, we found a consistent percentage of "present" calls for each of the 34 cRNA samples tested (Duchenne dystrophy, 28 arrays, 48.2% ± 6.1%; controls 6 arrays, 53.3% ± 1.4%). To test for inter-array variability, two different hybridization solutions were applied to duplicate arrays, and correlation coefficients determined. A high correlation coefficient was found in this analysis, suggesting that inter-array variability of the MuscleChip used was a relatively minor variable (Patient 3 a and 3aduplicate R 2 = 0.96 and percent shared [No Change (NC)] calls by Microarray Suite software was 99%; Patient 3b and 3b-duplicate R 2 = 0.98 and percent NC was 98%; Table 1). The high reproducibility of Affymetrix array results is consistent with other data in our laboratory, and from previously published data [6,21,23,24], and shows that experimental variability associated with hybridization and scanning of highly redundant oligonucleotide Gene-Chips is not a major source of experimental variability.
Given the previous report suggesting that the conversion of RNA to biotinylated cRNA probe was a major source of variability in murine array experiments [20], we tested a series of murine RNA from different sources, using the

Figure 1
Unsupervised hierarchical clustering of 42 murine U74Av2 Affymetrix arrays. Unsupervised hierarchical clustering of 42 murine U74Av2 Affymetrix arrays shows that probe synthesis and hybridization is not a major source of experimental variability. Expression profiles shown were from three different experimental groups; one using cultured murine myogenic cells (VSM samples), one using mouse spleens (KNagaraju samples), and one group from mouse skeletal muscle from normal and mdx mouse strains (FBooth samples). For the KNagaraju samples, the same spleen RNA was split prior to cDNA synthesis to create duplicate cDNA-cRNA-profile results; these duplicates show a very high correlation coefficient, and close relationship by Unsupervised clustering (low branches on dendrogram). The VSM cultured samples were each derived from different culture plates, and the FBooth samples from different murine muscles. The duplicate murine muscle samples are more closely related (high correlation coefficient) than the parallel cultures (VSM). Additional variables, such as male versus female (KNagaraju samples), and time after TSA treatment (VSM samples) are indicated, but are not relevant for this manuscript, and will be discussed in more detail elsewhere.
newer generation U74Av2 GeneChips. One series of samples was from murine spleens, where spleens from multiple animals for each variable under study were mixed, RNA isolated, RNA samples split, and duplicate cDNA, cR-NA, and hybridizations processed in parallel for each RNA (Fig. 1, "KNagaraju" samples). We also compared RNAs processed from parallel murine myogenic cell cultures (Fig. 1, "VSM" samples), where each profile was from a different cell culture. Finally, we used a series of murine muscle tissues from normal and dystrophin-deficient mice, where each profile was from a different series of complete gastrocnemius muscles (Fig. 1, "FBooth" samples). The data from these 42 murine U74Av2 profiles were then analyzed by unsupervised clustering [25] to determine which profiles were most closely related to each other (Fig. 1). This analysis shows that the different sources of RNA cluster together, as expected. Importantly, the same RNA used as a template for two distinct cDNA/cRNA preparations and hybridizations showed a high correlation coefficient (R 2 = 0.99 for five of the six samples, with average R 2 = 0.978) (Fig. 1). The large muscle group profiles (FBooth samples) showed excellent correlation, both with respect to diagnosis; however here there was no sampling error as the entire muscle group was used rather than isolated biopsies. Finally, the parallel tissue culture experiments (VSM samples) showed greater variability between duplicates, suggesting that tissue culture conditions may be more subject to variability than in vivo tissues (Fig. 1). This murine data shows that variability from different cDNA-cRNA reactions is very low (R 2 = 0.978).
To analyze the impact of intra-patient variability (tissue heterogeneity), inter-patient variability (polymorphic noise in outbred populations), and the effect of sample mixing on the sensitivity of detection of gene expression differences between patient groups, we conducted a series of individual and mixed profiling (Table 1). Muscle biopsies from five 4-6 yr old DMD patients, and five 10-12 yr old patients were selected, each biopsy split into two parts, and RNA isolated independently from each of the 20 biopsy fragments. For these ten DMD patients, the two different regions of the same biopsy were expression profiled both individually (20 profiles), and also mixed into four pools where each pool originated from distinct RNA samples (Table 1). The resulting profiles were also compared to previously reported mixed 6-9 yr old DMD patient cRNAs, and mixed 6-9 yr old control cRNAs [23], as mentioned above.
As an initial statistical analysis, we used Affymetrix software to define genes that showed expression changes (Increased, Decreased or Marginal) in expression levels between pairs of profiles (difference analyses). This method of data interpretation showed that some muscle biopsies showed very little variance between different regions of the same biopsy, while other patient biopsies showed considerable variability (see Fig. 2 for representative scatter graphs). Expressing this variance as a percentage of "Diff Calls" between the two regions of the same biopsy, as determined by Affymetrix default algorithms, we found considerable variability in the similarity of profiles, with values ranging from 1.5% to 18% of the 4,601 probe sets studied (4.99% ± 4.94%). This data suggests that tissue heterogeneity (intra-patient variability) can be a major source of variation in expression profiling experiments, even when using relatively large pieces (50 mg) of relatively homogeneous tissues (such as muscle).
The most common strategy for interpreting Affymetrix microarray data is to use two profile comparisons, with an arbitrary threshold for "significant fold-change" in expression levels. Typically, multiple arrays are compared, with those gene expression changes showing the most consistent fold changes prioritized, although other methods have been reported [13,22,26,27]. To study inter-patient variability, we defined the gene expression changes surviving four pairwise comparisons with mixed control samples, as we have previously described [23]. Briefly, four comparisons were done by Affymetrix software (eg. DMD 1a versus control 1a; DMD 1a versus control 1b; DMD1b versus control 1a; DMD1b versus control 1b). The four data sets were then compared, with only those gene expression changes that showed >2-fold change in all four comparisons (four comparison survival method). The number of surviving diff calls by this method ranged from 250 to 463 (355 ± 80) ( Table 1). Interestingly, those patients showing considerable variation between different regions of the same biopsy did not show a corresponding decrease in the number of gene expression changes surviving the iterative comparisons to controls (Table 1). This suggests (but does not prove) the most significant changes might be shared, independent of tissue variability (see below).
A different statistical method to determine the effect of the different variables under study is to perform hierarchical cluster analysis using nearest neighbor statistical methods [25]. Here, we subjected all profiles to unsupervised cluster analysis, as a means of determining which variables had the greatest effect (e.g. intra-patient variability [different regions of biopsy], versus diagnosis [DMD vs control], versus inter-patient variability [DMD patients in same age group], versus age of patient). For this analysis, we used the fluorescence intensity of each probe set (Average difference), after data scrubbing to remove genes that showed expression levels near background ("Absent" Calls) for all profiles (Fig. 3). This analysis shows that duplicate profiles of the same cRNA hybridization solution are the most highly related (Patient 3 a and duplicate (3ad); 3b and duplicate (3b-d)), consistent with the high correlation found by the comparisons using Affymetrix When comparing two different regions of the same biopsy [intra-patient variability], we found widely varying results, depending on the patient studied (Fig. 3). For example, some individual patients showed very closely related profiles that approached the similarity of duplicate arrays on the same cRNA ( Fig. 2; profiles 6a, 6b; 10a, 10b). On the other hand, some patients showed very distantly related profiles for two regions of the same biopsy ( Fig. 3; profiles 1a, 1b; 4a, 4b; 9a, 9b). Importantly, the variation caused by intra-patient tissue variation often overshadowed all other variables. For example, a profile from DMD patient 9 (9a) clustered with the normal controls, rather than with the other DMD patients (Fig. 3). The histopathology of this patient was noted as being unusually variable in severity prior to expression profiling. Also, unsupervised clustering was unable to group patients of similar ages, despite DMD showing a progressive clinical course. We conclude that intra-patient tissue heterogeneity is a major source of experimental variability in expression profiling, and must be considered in experimental design.
The above findings suggested that both intra-patient variability (tissue heterogeneity) and inter-patient variability (polymorphic noise) had major effects on the expression profiles. One method to control for these sources of noise is to analyze large numbers of profiles, both on multiple patients, and on multiple regions of tissue from each patient. This would allow determinations of p values and statistical significance for a single controlled variable under study (e.g. DMD vs controls). An alternative method is to experimentally normalize these variables through mixing of samples from patient groups; such mixing would be expected to average out both intra-and inter-patient variation. The expectation is that the most significant and dramatic gene expression changes would still be identified, while using many less profiles (and thus a substantial reduction in cost of the analyses).
To test for the relative sensitivity of interpretation of sample mixing versus individual profiles, we mixed together the 10 cRNAs for the two different age groups of DMD patients (samples 1a -5b; samples 6a -10b). For this analysis, we also generated expression profiles for two additional groups of control individuals. One was a second set of five normal male biopsies ages 5-12 yrs (controls 2a, 2b), and the third control set was three normal  Importantly, the mixed DMD profiles cluster more closely with mixed normal controls than with individual DMD patient profiles. This data suggests that intra-patient (tissue heterogeneity) and inter-patient (SNP noise) can be significant sources of experimental variability.
Control (normal) muscles 5-6 years old DMD muscles 10-12 years old DMD muscles 6-9 years old DMD muscles age-matched female biopsies ages 4-13 yrs (controls 3a, 3b) (Fig. 4). As with the original male control group (control 1a, 1b), two different regions of each biopsy were processed independently through the biotinylated cRNA step, and then equimolar amounts of cRNA mixed for hybridization to the MuscleChip.
All 34 profiles (both individual and mixed samples) were again analyzed by unsupervised hierarchical clustering (Fig. 4) [25]. As described above, we scrubbed the profiles to eliminate all genes showing expression levels consistently at or below background hybridization intensities by requiring each gene to show a "Present Call" in one or more of the 34 profiles.
As above, duplicate profiles using the same cRNA hybridization solution on different arrays, whether mixed or individual samples, showed very highly correlated results (very low branch on dendrogram) ( Fig. 4; mix 5-6 yrs, mix 10-12 yrs; patient 3a/3a-d; patient 3b/3b-d). As above, this indicates that experimental variability from laboratory procedures or different arrays is a relatively minor factor in interpretation of results. Mixed samples from different regions of the same biopsies showed the same, or only slightly more variation (mixed controls c1, c2, and c3, mixed DMD 6-9 yrs). This showed that sample mixing does indeed average out tissue heterogeneity (intra-patient variability), as well as inter-patient variability. We noted that all of the controls (both male and female) clustered in the same branch of the dendrogram, while the four of the six mixed DMD profiles clustered just one level away from the controls, separately from the other DMD profiles. This analysis suggests that there is considerable variability in the progressive tissue pathology induced by dystrophin deficiency, both within a patient, and between patients.
To test the sensitivity and specificity of sample mixing versus individual profiling, we defined differentially expressed genes using a two group t-test (GeneSpring [28,29]), comparing all 6 mixed control profiles and the 10 individual 5-6 yr old DMD profiles. Genes were retained that met specific p value thresholds between the two sets of profiles. In parallel, we compared the two corresponding mixed 5-6 yr old DMD profiles to the same 6 mixed control profiles.
Comparison of 10 individual 5-6 yr Duchenne dystrophy profiles to 6 mixed controls revealed 1,498 genes showing differential expression with p < 0.05 (Fig. 5). Comparison of the two mixed Duchenne dystrophy profiles to the 6 mixed controls showed 1,350 genes with p < 0.05 (Fig.  5A). Comparison of the two gene lists showed that 61% of differentially regulated genes detected by the 10 individual profiles were also detected by the two mixed profiles. This suggests that the sensitivity and specificity of using mixed samples is approximately half that of individual profiles. However, there was a rapid shift in specificity and sensitivity as stringency of the analysis was increased. Raising the statistical threshold to p < 0.0001 for individual profiles, while keeping the threshold for mixed profiles at p < 0.05 as required by the small number of data points (Fig. 5B), resulted in a sensitivity of 86% for mixed samples (351 of 408 genes p < 0.0001 detected). In conclusion, mixing detected about two thirds of statistically significant changes (p < 0.05). Mixing was a relatively sensitive method of detecting the most highly significant changes (p < 0.0001) (86% of changes detected), however it was not very specific; as many as one third of gene ex-

Figure 5
Comparison of individual profiles to mixed profiles by t-test statistics. Shown in green are differentially expressed genes for 2 mixed DMD 5-6 y profiles versus 6 mixed controls. Shown in red are differentially expressed genes for 10 individual DMD 5-6 y profiles and 6 mixed controls. P-value thresholds used to generate gene lists are indicated. The p-value for mixed profiles is held at p < 0.05, as the low sample number (2 versus 2) precludes obtaining more significant values. This analysis suggests that the use of t-test statistics for small number of mixed samples is relatively sensitive, but not highly specific. Use of t-test measurements is expected to contain significant amounts of noise, due to the very large number of comparisons involved in array studies; a value of p = 0.05 means that as many as 5% of gene expression changes are expected to be identified by "chance", and thereby not reflect true differences between samples. We have previously reported a very simple, yet potentially more stringent method for data analysis of small numbers of expression profiles, using duplicate profiles for control and experimental samples, and then identifying those genes that show consistent changes >2-fold in the four possible pair-wise data comparisons (four comparison survival method) [23]. A similar pair-wise comparison method, using a less stringent average fold-change analysis, was recently reported for muscle from aging and calorie-restricted mouse muscle [13].
To investigate the validity of this approach we compared the sensitivity and specificity of t-test detection of expression changes versus the four-pairwise survival method. Two sample t-test of the 10 individual Duchenne dystrophy profiles compared to the 6 mixed control profiles revealed 1,498 genes showing p < 0.05 as above. In parallel, the mixed DMD duplicate profiles were compared to a single pair of mixed control sample profiles (c1a, c1b), using the pairwise comparison survival method [23]. Briefly, four comparisons were done (DMD 1a versus control 1a; DMD1b versus control 1a; DMD 1a versus control 1b; DMD1b versus control 1b), and only those genes retained which showed >2-fold change in all four comparisons. This method was indeed considerably more specific in identifying significant (p < 0.05) gene expression changes ( Fig. 6A) with 85% of gene expression changes in the mixed profiles verified by individual profiles (p < 0.05). The sensitivity of this method depended on the p-value threshold for the individual profiles, but only reached a maximum of 49% sensitivity at p < 0.0001 (Fig. 6B).
The results above suggested that analysis of mixed samples using t-test methods was relatively sensitive but nonspecific, while analysis of the same mixed profiles by 2fold survival method was relatively specific but insensitive. To confirm this conclusion, we directly compared the sensitivity and specificity of the four pairwise comparison method to more standard t-test methods (Fig. 7A). We found that the pairwise survival method was indeed highly specific, with 97% of changes identified by this method also detected by t-test. However, as predicted, it was not very sensitive, with only 30% of the expression changes with p < 0.05 identified by t-test being detected by the pairwise survival method. Comparison of all three analysis methods showed that many (349) genes expression changes were detected by all three methods (Fig. 7B).

Conclusions
Microarray data analyses have been criticized as being "quite elusive about measurement reproducibility" [30]. This is largely the consequence of the large number of uncontrolled or unknown variables, and the prohibitive cost of isolating and investigating each variable. Here, we report the systematic isolation and study of most variables in microarray experiments using Affymetrix oligonucleotide arrays and human tissue biopsies. We found that all sources of experimental variability were quite minor (microarray R 2 = 0.98-0.99; probe synthesis + microarray R 2 = 0.98-0.99). On the other hand, tissue heterogeneity

Figure 6
Use of >2-fold survival method provides relatively specific, but insensitive detection of significant gene changes. Shown in green is a representation of number of genes surviving four pair-wise comparisons to two mixed control profiles, with retention of only those genes showing fold changes > 2-fold in the four pair-wise comparisons. Shown in red are differentially expressed genes for 10 individual DMD 5-6 y profiles versus 6 mixed controls at the indicated p-value thresholds. This fold-change survival method shows good specificity at p < 0.05 for individual profiles (85%), however it is relatively insensitive.  ), and inter-individual variation often is very large. We have shown that mixing of patient samples effectively normalizes much of the intraand inter-patient noise, while still identifying the majority of the most significant gene expression changes that would have been detected by larger numbers of individual patient profiles. Our results suggest that stringent yet robust data can be generated by mixing a small number of individuals with a defined condition (n = 5), preferably using different regions of tissue for duplicate arrays. Controls should be similarly processed. The resulting four arrays (2 controls, 2 experimental datasets) should then be subjected to the >2-fold survival method, as previously described [23]. This will yield a stringent set of expression changes that are likely to be verified by larger studies with individual arrays, but at low cost as only four arrays are employed. The preliminary data from just four mixed profiles (two experimental and two control) can then be used to generate functional clusters and pathophysiological models. These preliminary models can then direct more hypothesis-driven experiments, or more extensive expression profiling studies.

Expression profiling
Human muscle biopsy samples were diagnostic specimens flash-frozen immediately after surgery in isopentane cooled in liquid nitrogen, with storage in small, airtight, humidified tubes at -80°C until RNA isolation. Duchenne muscular dystrophy patient samples were all shown to have complete lack of dystrophin by immunostaining and/or immunoblot analysis, and were shown to have excellent morphology and preservation of tissue. Controls included groups of males and female (age described in text) that showed no histopathological abnormality, normal dystrophy proteins, and normal serum creatine kinase levels. Biopsy sizes ranged from 50 mg to 2 grams, with approximately 20-30 mg used for RNA isolation (~10-15 micrograms of total RNA). As described in the text, all biopsies had two different regions of the same biopsy expression profiled separately.
Details concerning the murine profiles will be published elsewhere. In this report, we used the murine profiles simply to test the sources of variation during sample preparation prior to hybridization to oligonucleotides.
RNA isolation (Trizol, Gibco BRL), RNA purification (RNAeasy, Qiagen), cDNA synthesis and biotinylated cRNA were all done as per standard protocols provided by Affymetrix Inc. Quality control methods are described on our web site ( [http://microarray.cnmcresearch.org/ pga.htm] ), with cRNA amplifications of between 5-and 13-fold for each of the samples. Ten micrograms of gelverified fragmented biotinylated cRNA were hybridized to each MuscleChip or U74A v2 array, and scanning done after biotin/avidin/phycoerythrin amplification. Details on the specific patients studied, and details for each Gene-Chip (scaling factors, number of present calls, percentage difference calls between each duplicate sample, number

Bio-informatic methods
Absolute analysis (average difference determinations for each probe set) was done using Affymetrix default parameters. As described in the text, data was analyzed using a variety of methods, including unsupervised nearest-neighbor hierarchical clustering analyses (GeneSpring [28,29] [Silicon Genetics], and Cluster [25] [Stanford University]), t-test (GeneSpring) and four-comparison survival method [23]. The Cluster and Tree View software were download from [http://rana.lbl.gov] and installed on an NT workstation.