The PowerAtlas: a power and sample size atlas for microarray experimental design and research
 Grier P Page^{1},
 Jode W Edwards^{1, 2},
 Gary L Gadbury^{3},
 Prashanth Yelisetti^{1},
 Jelai Wang^{1},
 Prinal Trivedi^{1} and
 David B Allison^{1}
DOI: 10.1186/1471-2105-7-84
© Page et al; licensee BioMed Central Ltd. 2006
Received: 26 July 2005
Accepted: 22 February 2006
Published: 22 February 2006
Abstract
Background
Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies.
Results
To address this challenge, we have developed a Microarray PowerAtlas [1]. The atlas enables investigators to estimate statistical power and to plan studies appropriately by building upon previous studies that have similar experimental characteristics. Currently, the atlas contains sample size and power estimates based on 632 experiments from the Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and from other databases such as The Nottingham Arabidopsis Stock Center (NASC).
Conclusion
This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.
Background
Planning microarray studies poses unique challenges for investigators with respect to estimating power and the sample size required for a study. The questions proposed may be quite general and exploratory, such as "which genes are differentially expressed in response to a given treatment?" A microarray study should have a high probability of answering, at least in part, the questions and hypotheses being proposed [loosely speaking, its power or, in our case, its Expected Discovery Rate (EDR)]. It should also have a high probability that the genes declared significant are truly differentially expressed (i.e. the 'True Positive' probability should be high). Sample size is a critical determinant of statistical power and expected error rates.
In traditional biomedical studies, investigators test one or at most a few hypotheses. This is not the case in microarray studies. Each treatment or group comparison involves testing every gene on the chip, and a chip may carry tens of thousands of genes. Some microarray experiments involve multiple groups, so the total number of hypotheses tested in a single experiment can run into the hundreds of thousands or more. In addition, the effect size and variance may differ from one hypothesis to the next, resulting in a different power for every gene-by-treatment comparison.
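The arithmetic behind these counts can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that every pair of groups is compared, so an experiment with k groups has k(k - 1)/2 comparisons per gene; the function names are hypothetical. It also shows why the count matters: if every tested hypothesis were a true null, a per-test level α would be expected to produce α × m false positives among m tests.

```python
from math import comb


def total_tests(n_genes: int, n_groups: int) -> int:
    """Gene-level hypotheses when every pair of groups is compared (an assumption)."""
    return n_genes * comb(n_groups, 2)


def expected_false_positives(n_null_tests: int, alpha: float) -> float:
    """Expected number of true-null tests declared significant at per-test level alpha."""
    return alpha * n_null_tests


if __name__ == "__main__":
    m = total_tests(12_500, 4)  # 12,500 genes, 4 groups -> 6 pairwise comparisons
    print(m, expected_false_positives(m, 0.05))
```

Under the same assumption, 20,000 genes and 10 groups give 45 comparisons and 900,000 tests, in the range described above.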
Some investigators have proposed approaches to estimating the required sample size for microarray research [2–4], but most of these methods calculate power by assuming that an arbitrary level of change is biologically relevant and constant across all genes. These methods neither take into account the variability of each gene nor specify a hypothesized distribution of effect sizes, and they do not incorporate some of the recently developed approaches for handling multiple testing in high-dimensional biology (HDB) [5].
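To see why per-gene variability matters, consider the classical single-hypothesis calculation that constant-change approaches build on. The Python sketch below uses the textbook normal approximation for a two-sided, two-sample comparison, n per group = 2(z_{1-α/2} + z_{1-β})² σ²/Δ², via the standard library's `statistics.NormalDist`. It illustrates that formula only; it is not the PowerAtlas method, and the normal approximation slightly understates n for very small samples. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided two-sample test.

    delta: difference in means to detect (e.g. on the log2 scale)
    sigma: the gene's within-group standard deviation
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)
```

Holding Δ = 1 on the log2 scale (a two-fold change) at α = 0.05 and 80% power, genes with σ = 0.5, 1.0 and 1.5 need about 4, 16 and 36 arrays per group, respectively, so a single constant-change calculation cannot suit all genes at once.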
Quantities of interest in microarray experiments
Genes for which there is not a real effect  Genes for which there is a real effect

Genes not declared significant at designated threshold  A  B
Genes declared significant at designated threshold  C  D
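Reading this table in line with the descriptions in the text (EDR as the proportion of genes with a real effect that are declared significant, PTP as the proportion of declared genes that truly have an effect, and PTN as the proportion of non-declared genes that truly have none), the three quantities are EDR = D/(B + D), PTP = D/(C + D) and PTN = A/(A + B). The minimal Python sketch below computes them from known cell counts; the function names are hypothetical, and in a real experiment the counts are never observed directly.

```python
from typing import Tuple


def _ratio(num: float, den: float) -> float:
    # The text reports PTP as 0 when no gene is declared significant;
    # this sketch applies the same convention to every ratio.
    return num / den if den > 0 else 0.0


def rates_from_counts(a: float, b: float, c: float, d: float) -> Tuple[float, float, float]:
    """Return (EDR, PTP, PTN) from the cells of the table.

    a: no real effect, not declared significant
    b: real effect, not declared significant
    c: no real effect, declared significant
    d: real effect, declared significant
    """
    edr = _ratio(d, b + d)  # share of genes with a real effect that are declared
    ptp = _ratio(d, c + d)  # share of declared genes that have a real effect
    ptn = _ratio(a, a + b)  # share of non-declared genes that have no real effect
    return edr, ptp, ptn
```

For example, declaring 400 of 1,000 genes with a real effect alongside 100 false positives gives EDR = 0.4 and PTP = 0.8.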
The PowerAtlas works in two ways. First, investigators may upload their own pilot data and extrapolate the EDR, PTN, and PTP to a variety of combinations of sample size and α (type I error rate) level. Second, many investigators do not have the opportunity to conduct their own pilot microarray study, but this need not stop them. Because many journals now require authors to deposit microarray data in public databases [8] before publication, investigators may draw upon these public data as pilot information. We developed the PowerAtlas to help investigators use these public data to estimate the sample sizes required for well-powered studies. We downloaded all data from the Gene Expression Omnibus [9], reanalyzed them, and processed them with the methods developed in Gadbury et al [5]. We then placed the power and sample size calculations for many of the datasets into a readily accessible and searchable database [1]. It should be stressed that no single study is a perfect replicate of the study an investigator wishes to conduct, but similar studies can give a sense of the plausible range of sample sizes. We recommend that investigators examine several related experiments to get a sense of the sample size required for a robust EDR and a high PTP.
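To make the idea of extrapolating to other sample sizes and α levels concrete, the Python sketch below computes approximate EDR and PTP values analytically. It is a simplified stand-in, not the procedure of Gadbury et al [5] used by the PowerAtlas. It assumes that the standardized effect sizes of the genes with a real effect and the number of null genes are known (both are hypothetical inputs here), uses a two-sided z-test approximation, and reports PTP as a ratio of expected counts.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def expected_rates(effect_sizes: Sequence[float], m0: int,
                   n_per_group: int, alpha: float) -> Tuple[float, float]:
    """Approximate (EDR, PTP) for a two-group comparison at a given n and alpha.

    effect_sizes: assumed standardized differences (delta / sigma), one per gene
                  with a real effect (hypothetical input).
    m0: assumed number of genes with no real effect.
    """
    if not effect_sizes:
        raise ValueError("need at least one gene with a real effect")
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    expected_tp = 0.0
    for d in effect_sizes:
        ncp = abs(d) * (n_per_group / 2) ** 0.5  # mean of the z statistic
        expected_tp += z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)  # power for this gene
    expected_fp = alpha * m0  # each null gene is declared with probability alpha
    edr = expected_tp / len(effect_sizes)
    declared = expected_tp + expected_fp
    ptp = expected_tp / declared if declared > 0 else 0.0  # ratio of expected counts
    return edr, ptp


if __name__ == "__main__":
    effects = [1.0] * 1_000  # hypothetical: 1,000 genes with delta / sigma = 1
    for alpha in (0.05, 0.001):
        for n in (3, 7, 15):
            edr, ptp = expected_rates(effects, 11_500, n, alpha)
            print(f"alpha={alpha} n={n} EDR={edr:.3f} PTP={ptp:.3f}")
```

With these hypothetical inputs, raising n increases the EDR, while lowering α trades EDR for a higher PTP.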
Designing a microarray study with an appropriate number of replicates is cost-efficient. The PowerAtlas can help investigators avoid both using more samples per group than necessary, which wastes money, and spending money on experiments with too few replicates to have sufficient power to yield good results.
Usage of the PowerAtlas
No registration is required to use the PowerAtlas, and no programs or applets are pushed to the investigator's computer. An investigator simply accesses the PowerAtlas [1] and selects the appropriate link to use either public data or the investigator's own study-specific data.
The data in the PowerAtlas are taken directly from GEO; datasets that meet the requirements outlined in 'Using existing public data' are included. However, the data in GEO can be quite variable for many reasons, including, but not limited to, differences in the image-processing algorithm, normalization, and inferential statistical procedure used in the analysis. Thus, when using public data as a basis for planning future studies, an investigator should consider the results from several datasets, consult the primary sources (GEO GDS files and journal publications of the data), and have a reasonable understanding of the idiosyncrasies of each dataset and its applicability to the proposed experiment before using it. In addition, because each laboratory handles and processes samples and runs microarrays slightly differently, power estimates should, when possible, be based on the investigator's own pilot data, which will predict power in the investigator's future experiments more accurately than extrapolations from other investigators' data.
Using the investigators' own data
To use the PowerAtlas with an investigator's own data, a list of p-values generated with a valid statistical method must be available. Currently, the PowerAtlas generates sample sizes only for two-group comparisons; the p-values may come from any valid statistical test for such a comparison [10]. The steps are as follows:

The investigator must have, or generate, a tab-delimited text file containing one p-value per gene/feature for the main hypothesis of interest, with each p-value on its own line. The file should contain no gene identifiers. The p-values for all genes on the chip/array should be included.

The p-value file is uploaded to the website.

The investigator then enters the sample sizes (N1 and N2) of the two groups used to calculate the p-values.

The investigator may then use either the default or custom settings for the sample sizes, significance (α) thresholds, and number of bootstrap iterations used to estimate power.

The investigator selects Submit. As an indication of runtime, for an initial set of 12,500 p-values with EDR, PTP, and PTN calculated for 14 sample sizes and six thresholds, the analysis takes 3–10 minutes.

The investigator then obtains a series of figures that illustrate the EDR, PTP, and PTN for a variety of sample sizes and significance (α) thresholds (examples are shown in figures 1, 2, 3 for an Affymetrix dataset [11] and figures 4, 5, 6 for a cDNA experiment [12]). The investigator may then choose the combination of sample size and α level that achieves the desired EDR, PTP, and PTN.
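Preparing the upload file is the step most prone to error. The Python sketch below writes and re-reads a single-column p-value file as described in the first step. The function and file names are hypothetical, and the sketch assumes that Python's `repr` formatting of a float (which uses scientific notation for very small values, e.g. `1e-10`) is acceptable to the upload form, which the text does not specify.

```python
from pathlib import Path
from typing import Iterable, List


def write_pvalue_file(pvalues: Iterable[float], path: str) -> int:
    """Write one p-value per line with no gene identifiers; return the count.

    Raises ValueError for missing (NaN) or out-of-range values, so problems are
    caught before upload.
    """
    lines: List[str] = []
    for i, p in enumerate(pvalues):
        p = float(p)
        if not 0.0 <= p <= 1.0:  # also rejects NaN, since NaN comparisons are False
            raise ValueError(f"entry {i} is not a valid p-value: {p!r}")
        lines.append(repr(p))
    if not lines:
        raise ValueError("no p-values supplied")
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


def read_pvalue_file(path: str) -> List[float]:
    """Read the file back, e.g. to check the count against the number of features."""
    return [float(tok) for tok in Path(path).read_text().split()]
```

After writing, comparing `len(read_pvalue_file(path))` with the number of features on the array is a quick check that no gene was dropped.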
Using existing public data
To use the public data for estimating sample size:

From the PowerAtlas web page an investigator selects the option of using existing public data.

The investigator selects the desired chip type (one- or two-channel) and the species of interest. At most one chip type and one species may be selected; alternatively, the investigator may select only a chip type or only a species.

A list of all experiments that meet the selection criteria appears. The number of datasets ranges from 0 (most bacteria on single-channel chips) to more than 200 (human and mouse on single-channel chips; see table 2).
Table 2. Number of datasets by chip type, species, and availability of power estimates. NA means that power estimation is not available. See section "Using existing public data" for possible explanations of why a dataset may be listed as NA.
Species  Dual channel: available  Dual channel: NA  Single channel: available  Single channel: NA

Arabidopsis thaliana  1  10  22  0 
Aspergillus parasiticus  0  1  0  0 
Bacillus anthracis  0  2  0  0 
Bos taurus  0  3  2  0 
Caenorhabditis elegans  1  2  1  0 
Campylobacter jejuni  1  1  0  0 
Canis familiaris  0  0  1  1 
Capra hircus  0  0  1  0 
Chlamydomonas reinhardtii  1  0  0  0 
Cricetulus griseus  0  1  0  0 
Drosophila melanogaster  3  9  15  0 
Drosophila simulans  2  0  0  0 
Drosophila yakuba  0  2  0  0 
Escherichia coli  5  1  2  0 
Escherichia coli K12  0  0  1  0 
Fundulus heteroclitus  0  0  1  0 
Homo sapiens  44  34  178  35 
Marmota monax  0  0  1  1 
Mastomys natalensis  0  0  0  1 
Mus musculus  63  14  175  58 
Mycobacterium tuberculosis  1  0  0  0 
Oncorhynchus mykiss  0  4  0  0 
Oryza sativa  0  2  0  0 
Pinus contorta  0  0  1  0 
Rattus norvegicus  6  8  68  13 
Rhodobacter sphaeroides  0  0  2  2 
Saccharomyces cerevisiae  12  41  18  1 
Saccharomyces pastorianus  0  0  0  1 
Saccharum sp.  0  0  0  1 
Salmo salar  0  2  0  0 
Salmonella enterica  1  0  0  0 
Sus scrofa  0  1  1  0 
Viruses  0  0  0  1 
Zea mays  1  0  0  0 
TOTAL  142  138  490  115 

The investigator can read a brief description, taken directly from GEO, of all the experiments and find those that are most similar to their proposed experiment(s).

The investigator then selects the checkbox to the left of each desired dataset and presses Submit to get additional information.

The investigator receives a report with a link to the GEO description should additional information be needed.

There are also links to a printable HTML report that includes a description of the dataset, the EDR, PTP, and PTN (figures 1 and 4) at an α level of 0.05 as well as a description of how to interpret the results.

In datasets with more than two groups, the HTML report presents the comparison between the two groups with the largest sample sizes, and a link leads to JPEG images for the other intra-GDS comparisons in the dataset. In addition, a link to a downloadable zip file provides graphs illustrating the EDR (figures 2 and 5), PTP (figures 3 and 6), and PTN for a variety of α levels and sample sizes, organized in one directory per two-group comparison. An Excel file containing the numbers underlying the figures is also provided.
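The selection rules in the steps above (at most one chip type and one species, either of which may be omitted) can be mirrored in a short Python sketch over a few rows of Table 2. The data structure and function are hypothetical and are not the site's code. The sketch counts both available and NA datasets, an assumption consistent with the "more than 200" figure for human and mouse single-channel chips (178 + 35 and 175 + 58).

```python
from typing import Dict, Optional, Tuple

# Excerpt of Table 2 (not the whole table):
# species -> chip type -> (available, NA) dataset counts.
TABLE2_EXCERPT: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Homo sapiens": {"dual": (44, 34), "single": (178, 35)},
    "Mus musculus": {"dual": (63, 14), "single": (175, 58)},
    "Rattus norvegicus": {"dual": (6, 8), "single": (68, 13)},
    "Saccharomyces cerevisiae": {"dual": (12, 41), "single": (18, 1)},
    "Bacillus anthracis": {"dual": (0, 2), "single": (0, 0)},
}


def count_listed(species: Optional[str] = None, chip: Optional[str] = None,
                 table: Dict[str, Dict[str, Tuple[int, int]]] = TABLE2_EXCERPT) -> int:
    """Number of datasets listed for a selection; None means 'not selected'.

    At most one species and one chip type can be given, and either may be
    omitted. Both 'available' and 'NA' datasets are counted.
    """
    total = 0
    for sp, by_chip in table.items():
        if species is not None and sp != species:
            continue
        for ch, (available, na) in by_chip.items():
            if chip is not None and ch != chip:
                continue
            total += available + na
    return total
```

For example, `count_listed("Homo sapiens", "single")` counts the 213 human single-channel datasets, and `count_listed("Bacillus anthracis", "single")` returns 0.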
Illustrative example of the accuracy and utility of the PowerAtlas
Estimated EDR and PTP at a sample size of 7 per group and α levels of 0.05 and 0.001, extrapolated from the PKD pilot data with 3 per group (pilot row), compared with the EDR and PTP at the same α levels estimated from the follow-up study of 7 mice per group (experiment row).
Estimated EDR (n = 7, α = 0.05)  Estimated PTP (n = 7, α = 0.05)  Estimated EDR (n = 7, α = 0.001)  Estimated PTP (n = 7, α = 0.001)

Pilot of 3 per group  0.415009  0.809616  0.119791  0.985287 
Experiment of 7 per group  0.538771  0.772419  0.133711  0.976347 
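A simple way to read this table is to subtract the pilot extrapolation from the follow-up estimate for each quantity, as in the Python sketch below (the names are illustrative). Both rows are estimates rather than known true values, so the differences describe agreement between the two, not the error of either.

```python
# Rows of the table: pilot extrapolation (3 per group -> 7 per group)
# and the estimates from the follow-up study of 7 per group.
PILOT_EXTRAPOLATION = {"EDR, alpha=0.05": 0.415009, "PTP, alpha=0.05": 0.809616,
                       "EDR, alpha=0.001": 0.119791, "PTP, alpha=0.001": 0.985287}
FOLLOW_UP = {"EDR, alpha=0.05": 0.538771, "PTP, alpha=0.05": 0.772419,
             "EDR, alpha=0.001": 0.133711, "PTP, alpha=0.001": 0.976347}


def differences(predicted, observed):
    """Follow-up estimate minus pilot extrapolation, for each quantity."""
    return {key: observed[key] - predicted[key] for key in predicted}


if __name__ == "__main__":
    for key, diff in differences(PILOT_EXTRAPOLATION, FOLLOW_UP).items():
        print(f"{key}: {diff:+.4f}")
```

With these values, the extrapolation from 3 per group was about 0.12 below the follow-up EDR and about 0.04 above the follow-up PTP at α = 0.05; at α = 0.001 both differences are under 0.015.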
Conclusion
The PowerAtlas provides investigators with the option of using their own pilot data or drawing on public-domain microarray datasets to calculate sample sizes and statistical power for a proposed study. The overall goal is to estimate the sample size required to answer the hypothesis of interest with a high EDR and a high PTP without using too many chips. Once the most appropriate graphs and tables have been identified (which may involve examining several datasets), the investigator must decide on the sample size to pursue. Unlike single-hypothesis-driven research, a single microarray experiment often involves a large number of differentially expressed genes, and a study may yield many (often thousands of) significant genes. It is generally difficult for a single laboratory to follow up on more than a few genes. Thus, while an EDR of 80% or more may be in line with traditional power studies, investigators may not want, or have the laboratory resources, to deal with large-scale, high-powered gene expression experiments in which thousands of genes are identified as differentially expressed. It may be more appropriate to obtain a short list of genes in which the investigator has high confidence that the genes identified as differentially expressed truly are. In that case, modest EDRs (10–40%) may be acceptable when conservative α levels are chosen to generate high PTPs (80% or more). On the other hand, when the investigator wants a complete picture of the effects of the experimental manipulation, it may be more appropriate to use a liberal α level (such as 0.1) to obtain a high EDR, at the cost of a lower PTP. Investigators should carefully consider what error rate (the proportion of genes selected for further study that are false positives) is acceptable, how many genes they can realistically study further, and how important it is to have a complete list of differentially expressed genes.
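This trade-off can be put in terms of expected list sizes. Reading EDR as the fraction of truly differentially expressed genes that are declared significant, and PTP as the fraction of declared genes that are truly differentially expressed, a hypothetical count G of truly differentially expressed genes gives about EDR × G true positives, EDR × G / PTP declared genes, and the difference as false positives. The Python sketch below does this arithmetic; G is an assumption the investigator must supply, the function name is hypothetical, and treating the rates as exact fractions ignores sampling variability.

```python
from typing import Tuple


def expected_lists(n_truly_de: float, edr: float, ptp: float) -> Tuple[float, float, float]:
    """Return (true positives, genes declared significant, false positives).

    n_truly_de is a hypothetical number of truly differentially expressed genes.
    """
    if not 0.0 < ptp <= 1.0:
        raise ValueError("PTP must be in (0, 1] to infer the size of the list")
    true_pos = edr * n_truly_de   # EDR = true positives / truly DE genes
    declared = true_pos / ptp     # PTP = true positives / declared genes
    return true_pos, declared, declared - true_pos
```

With G = 1,000 (hypothetical) and the pilot-based extrapolations for 7 per group from the illustrative example, α = 0.05 gives roughly 513 declared genes of which about 98 are expected false positives, whereas α = 0.001 gives roughly 122 declared genes with about 2 false positives.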
A few other issues should be considered when choosing the sample size for an experiment. First, one should not rely on any single study to determine the sample size; an investigator should view several datasets to get an idea of the range of plausible sample sizes. Second, we have analyzed all the data in the PowerAtlas as if each comparison involved two groups in a fully randomized design. This may not be the case: experiments may have two- or three-way designs with multiple levels. If the actual experiment used such a design, the calculated sample size may be an overestimate, because the methods in the PowerAtlas do not yet use information from other groups to estimate the variances, as ANOVA and linear models do. In addition, the comparison shown in the main graph may not be the primary hypothesis of interest in a study; it simply involves the two groups with the largest sample sizes, so the other comparisons should be reviewed as well. Investigators should consult the primary literature to verify the true experimental design. We also assume that the experiments were conducted rigorously and were not confounded by non-biological sources of error, which may adversely affect power. Nor does choosing a good sample size obviate the need for good experimental design and conduct of the experiment [16].
Future directions
There are several areas in which the functions of the PowerAtlas will be expanded. First, we will augment the database by updating the data from GEO every six months, and we will add data from additional sources such as the Nottingham Arabidopsis Stock Center (NASC). Second, in the PTP graph at low n and small α, the PTP lines sometimes cross. This happens because, under these conditions, very few or even zero genes are declared significant; since the number of genes declared significant is the denominator of PTP, the PTP is reported as 0 when no genes are declared significant. Until the sample size is large enough for genes to be declared differentially expressed at the chosen (small) threshold, the PTP lines may cross each other. We are working to eliminate this issue from our method. Third, only point estimates of the EDR, PTP and PTN are currently generated; future work will produce confidence intervals for these estimates. Finally, we are extending the power estimation procedures to handle ANOVA and linear models, which will allow power estimation for loop designs, correct analysis of datasets with multiple groups, and time-series data. When these methods have been developed, they will be incorporated into the PowerAtlas.
Availability and requirements

Project name: The PowerAtlas

Project home page: http://www.powerAtlas.org (also http://www.poweratlas.net)

Operating system(s): Web-based application

Programming language: Java

Other requirements: Web browser; unzip utility

License: None

Any restrictions to use by non-academics: None
Abbreviations
ANOVA: Analysis of Variance
EDR: Expected discovery rate
GDS: Gene Expression Omnibus Dataset
GEO: Gene Expression Omnibus
HDB: High-Dimensional Biology
PTP: Probability of a True Positive
PTN: Probability of a True Negative
Declarations
Acknowledgements
We acknowledge the contributions of the many investigators who have selflessly deposited their microarray data into public databases such as GEO, without which the PowerAtlas would not be possible. This work was supported by NSF grants 0217651 and 0306596 and by NIH grant P50AT00477.
References
1. The PowerAtlas [http://www.poweratlas.org]. 2006.
2. Pan W, Lin J, Le CT: How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biology 2002, 3: RESEARCH0022.
3. Lee ML, Whitmore GA: Power and sample size for DNA microarray studies. Stat Med 2002, 21: 3543–3570. 10.1002/sim.1335
4. Wang SJ, Chen JJ: Sample size for identifying differentially expressed genes in microarray experiments. J Comput Biol 2004, 11: 714–726. 10.1089/cmb.2004.11.714
5. Gadbury GL, Page GP, Edwards J, Kayo T, Weindruch R, Permana PA, Mountz J, Allison DB: Power Analysis and Sample Size Estimation in the Age of High Dimensional Biology. Stat Meth Med Res 2004, 13: 325–338.
6. Mehta T, Tanik M, Allison DB: Towards sound epistemological foundations of statistical methods for high-dimensional biology. Nat Genet 2004, 36: 943–947. 10.1038/ng1422
7. Donoho D: Mathematical Challenges of the 21st Century. High-Dimensional Data Analysis: The Blessings and Curses of Dimensionality. 2000. [http://www-stat.stanford.edu/~donoho/Lectures/AMS2000/MathChallengeSlides2*2.pdf]
8. Ball CA, Sherlock G, Parkinson H, Rocca-Sera P, Brooksbank C, Causton HC, Cavalieri D, Gaasterland T, Hingamp P, Holstege F, Ringwald M, Spellman P, Stoeckert CJJ, Stewart JE, Taylor R, Brazma A, Quackenbush J: Standards for microarray data. Science 2002, 298: 539. 10.1126/science.298.5593.539b
9. Edgar R, Domrachev M, Lash AE: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002, 30: 207–210. 10.1093/nar/30.1.207
10. Long AD, Mangalam HJ, Chan BY, Tolleri L, Hatfield GW, Baldi P: Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12. J Biol Chem 2001, 276: 19937–19944. 10.1074/jbc.M010192200
11. Zhang HG, Hyde K, Page GP, Brand JP, Zhou J, Yu S, Allison DB, Hsu HC, Mountz JD: Novel tumor necrosis factor alpha-regulated genes in rheumatoid arthritis. Arthritis Rheum 2004, 50: 420–431. 10.1002/art.20037
12. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson JJ, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403: 503–511. 10.1038/35000501
13. Kerr MK, Martin M, Churchill GA: Analysis of variance for gene expression microarray data. J Comput Biol 2000, 7: 819–837. 10.1089/10665270050514954
14. Kerr MK, Churchill GA: Statistical design and the analysis of gene expression microarray data. Genetic Research 2001, 77: 123–128. 10.1017/S0016672301005055
15. Smyth GK, Yang YH, Speed TP: Statistical issues in cDNA microarray data analysis. In Functional Genomics: Methods and Protocols. 1st edition. Edited by: Brownstein MJ and Khodursky A. Totowa, NJ: Humana Press; 2002:100–106.
16. Page GP, Edwards JW, Barnes S, Weindruch R, Allison DB: A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition 2003, 19: 997–1000. 10.1016/j.nut.2003.08.001
17. Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee CK, Prolla TA, Weindruch R: A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis 2002, 39: 1–20. 10.1016/S0167-9473(01)00046-9
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.