Sample size calculation for microarray experiments with blocked one-way design
 Sin-Ho Jung^{1},
 Insuk Sohn^{1},
 Stephen L George^{1},
 Liping Feng^{2} and
 Phyllis C Leppert^{2}
DOI: 10.1186/1471-2105-10-164
© Jung et al; licensee BioMed Central Ltd. 2009
Received: 28 August 2008
Accepted: 28 May 2009
Published: 28 May 2009
Abstract
Background
One of the main objectives of microarray analysis is to identify differentially expressed genes for different types of cells or treatments. Many statistical methods have been proposed to assess the treatment effects in microarray experiments.
Results
In this paper, we consider discovery of the genes that are differentially expressed among K (> 2) treatments when each set of K arrays constitutes a block. In this case, the array data among the K treatments tend to be correlated because of the block effect. We propose to use the blocked one-way ANOVA F-statistic to test if each gene is differentially expressed among K treatments. The marginal p-values are calculated using a permutation method accounting for the block effect, and the multiplicity of the testing procedure is adjusted for by controlling the false discovery rate (FDR). We propose a sample size calculation method for microarray experiments with a blocked one-way design. With the FDR level and effect sizes of genes specified, our formula provides a sample size for a given number of true discoveries.
Conclusion
The calculated sample size is shown via simulations to provide an accurate number of true discoveries while controlling the FDR at the desired level.
Background
Clinical and translational medicine have benefited from genome-wide expression profiling across two or more independent samples, such as various diseased tissues compared to normal tissue. The DNA microarray is a high-throughput biotechnology designed to measure simultaneously the expression levels of tens of thousands of genes in cells. Microarray studies provide the means to understand the mechanisms of disease. However, various sources of error can influence microarray results [1]. Microarrays also present unique statistical problems because the data are high dimensional and are insufficiently replicated in many instances. Methods of adjustment for multiple testing therefore become extremely important. Multiple testing methods controlling the false discovery rate (FDR) [2] have been widely used because they are easy to calculate and less strict in controlling false positives than methods controlling the family-wise error rate (FWER) [3].
Numerous sample size calculation methods have been proposed for comparing independent groups while controlling the FDR in designing microarray studies. Lee and Whitmore [4] considered comparing multiple groups using ANOVA models and derived the relationship between the effect sizes and the FDR using a Bayesian approach. Their power analysis does not address the multiple testing issue. Muller et al. [5] chose a pair of testing errors, including FDR, and minimized one while controlling the other at a specified level using a Bayesian decision rule. Jung [6] proposed a closed-form sample size formula for a specified number of true rejections while controlling the FDR at a desired level. Pounds and Cheng [7] and Liu and Hwang [8] proposed similar sample size formulas which can be used for comparison of K independent samples. These methods are for FDR-control methods based on an independence or weak dependency assumption among test statistics. Recently, Shao and Tseng [9] introduced an approach for calculating sample sizes for multiple comparisons accounting for dependency among test statistics.
In some studies, specimens for K treatments are collected from the same subject and means are compared across treatment groups. In this case, the gene expression data for the K treatments may be dependent since they share the same physiological conditions. For example, Feng et al. [10] conducted a study to discover the genes differentially expressed between center (C) and edge (E) of the uterine fibroid and the matched adjacent myometrium (M). In this study, specimens are taken from the three sites for each patient. The patients are blocks and the three sites (K = 3), C, E and M, are treatments (or groups) to be compared.
Since a set of K specimens is collected from each patient, we require a much smaller number of patients than a regular unblocked design. Furthermore, the observations within each block tend to be positively correlated, so that a blocked design requires a smaller number of arrays than the corresponding unblocked design, just as a paired two-sample design with a positive pairwise correlation requires a smaller number of observations than a two independent sample design. The more heterogeneous the blocks are, the greater the savings in number of arrays for the blocked design.
In this paper, we consider a nonparametric blocked F-test statistic to compare the gene expression level among K dependent groups. We adjust for multiple testing and control the FDR by employing a permutation method. We propose a sample size calculation method for a specified number of true rejections while controlling the FDR at a specified level. Through simulations, we show that the blocked F-test accurately controls the FDR using the permutation resampling method and that the calculated sample size provides an accurate number of true rejections while controlling the FDR at the desired level. For illustration, the proposed methods are applied to the fibroid study [10] mentioned above.
Methods
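Nonparametric block F-test statistic
Suppose that there are n blocks, each contributing one array for each of K treatments, and that m genes are measured on every array. Let y_{ijk} denote the expression level of gene i (= 1,..., m) for treatment k (= 1,..., K) in block j (= 1,..., n). We consider the blocked one-way ANOVA model
y_{ijk} = μ_{i} + δ_{ik} + γ_{ij} + ε_{ijk},   (1)
where, for gene i, μ_{i} is the population mean, δ_{ik} is a fixed treatment effect and the primary interest, γ_{ij} is a random block effect, and ε_{ijk} is a random error term. We assume that Σ_{k} δ_{ik} = 0, γ_{i1},..., γ_{in} are independent and identically distributed (IID) with mean 0 and variance v_{i}, (ε_{ijk}, 1 ≤ j ≤ n, 1 ≤ k ≤ K) are IID with mean 0 and variance σ_{i}^{2}, and error terms and block effects are independent. The standard ANOVA theory using parametric F distributions to test the treatment effect assumes a normal distribution for ε_{ijk}. However, in this paper, we avoid the normality assumption by using a permutation resampling method in testing and a large-sample approximation in sample size calculation.
The hypothesis that gene i is not differentially expressed, H_{i}: δ_{i1} = ··· = δ_{iK} = 0, is tested against the alternative that δ_{ik} ≠ 0 for some k using the F-statistic
F_{i} = {n Σ_{k} (ȳ_{i·k} − ȳ_{i··})^{2}/(K − 1)} / {Σ_{j} Σ_{k} (y_{ijk} − ȳ_{ij·} − ȳ_{i·k} + ȳ_{i··})^{2}/((K − 1)(n − 1))},
where ȳ_{i·k} = n^{−1} Σ_{j} y_{ijk}, ȳ_{ij·} = K^{−1} Σ_{k} y_{ijk} and ȳ_{i··} = (nK)^{−1} Σ_{j} Σ_{k} y_{ijk}. If the error terms are normally distributed, F_{i} marginally has the F_{K−1, (K−1)(n−1)} distribution under H_{i}. The normality assumption can be relaxed if n is large.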
Without the normality assumption, the joint null distribution of the statistics can be approximated using a block permutation method, in which the array data sets for the K treatments are randomly shuffled within each block: within block j, the array for treatment k is replaced by the array for treatment π_{j}(k), where (π_{j}(1),..., π_{j}(K)) is a random permutation of (1,..., K) drawn independently for each block. Note that there are (K!)^{n} different permutations, among which (K!)^{n−1} give different F-statistic values. The R language package multtest [11] can be used to implement the permutation-based multiple testing procedure for blocked microarray data. We consider adjusting for the multiplicity of the testing procedure by controlling the FDR [12, 13].
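As a minimal sketch of the statistic and the block permutation described above, the following Python code computes the blocked one-way F-statistic for a single gene and shuffles the treatment labels within each block. The function names and the toy data are illustrative assumptions, and the standard randomized-block ANOVA formula is assumed for the statistic; for real data the multtest package [11] mentioned above is the documented route.

```python
import random

def blocked_f_statistic(data):
    """Blocked one-way ANOVA F-statistic for one gene.

    data[j][k] is the expression level in block j under treatment k.
    """
    n, K = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * K)
    block_means = [sum(row) / K for row in data]
    treat_means = [sum(data[j][k] for j in range(n)) / n for k in range(K)]
    # Between-treatment sum of squares, K - 1 degrees of freedom.
    ss_treat = n * sum((t - grand) ** 2 for t in treat_means)
    # Residual sum of squares after removing block and treatment means.
    ss_error = sum(
        (data[j][k] - block_means[j] - treat_means[k] + grand) ** 2
        for j in range(n) for k in range(K)
    )
    return (ss_treat / (K - 1)) / (ss_error / ((K - 1) * (n - 1)))

def permute_within_blocks(data, rng):
    """Shuffle the K treatment labels independently inside each block."""
    return [rng.sample(row, len(row)) for row in data]

# Toy data (hypothetical): n = 4 blocks, K = 3 treatments.
rng = random.Random(0)
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 5.0], [0.0, 2.0, 3.0], [1.0, 3.0, 4.0]]
f_observed = blocked_f_statistic(toy)
f_permuted = blocked_f_statistic(permute_within_blocks(toy, rng))
```

Because block means are subtracted in the residual term, adding a constant to every observation in a block leaves the statistic unchanged, which is why only within-block shuffles are needed to generate the null distribution.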
Permutation-based multiple testing for FDR-control
 (i) Compute the F-test statistics from the original data, (f_{1},..., f_{m}).
 (ii) From the bth permutation data (b = 1,..., B), compute the F-test statistics (f_{1}^{(b)},..., f_{m}^{(b)}).
 (iii) For gene i, estimate the marginal p-value p_{i} by the proportion of the B permutations for which f_{i}^{(b)} ≥ f_{i}.
 (iv) For a chosen constant λ ∈ (0, 1), estimate the q-value q_{i} from the p-values (p_{1},..., p_{m}) as in Storey [12].
 (v) For a specified FDR level q*, discover gene i (or reject H_{i}) if q_{i} < q*.
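Steps (iii) to (v) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the p-value is taken as the proportion of permuted statistics at least as large as the observed one, and the q-value uses the usual form of Storey's estimator [12] with tuning constant λ, including the step that makes q-values monotone in the p-values. All function names and the toy inputs are hypothetical.

```python
def permutation_p_values(f_obs, f_perm):
    """f_obs[i] is the observed statistic of gene i; f_perm[b][i] the b-th permuted one."""
    B = len(f_perm)
    return [sum(fb[i] >= f for fb in f_perm) / B for i, f in enumerate(f_obs)]

def storey_q_values(p, lam):
    """q-values with the null count estimated as #{p_i > lam} / (1 - lam)."""
    m = len(p)
    m0_hat = sum(pi > lam for pi in p) / (1.0 - lam)
    order = sorted(range(m), key=lambda i: p[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease with p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m0_hat * p[i] / rank)
        q[i] = running_min
    return q

# Toy inputs (hypothetical): 2 genes with B = 4 permutations, then 5 p-values.
p_toy = permutation_p_values([5.0, 1.0],
                             [[1.0, 2.0], [6.0, 0.5], [2.0, 3.0], [0.1, 0.2]])
q_toy = storey_q_values([0.01, 0.02, 0.03, 0.6, 0.8], lam=0.5)
discoveries = [i for i, qi in enumerate(q_toy) if qi < 0.05]  # step (v) with q* = 0.05
```

With B permutations the smallest attainable nonzero p-value is 1/B, so B has to be large relative to m for small q-values to be reachable.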
Sample size calculation
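Let ℳ_{0} and ℳ_{1} denote the sets of indices of genes that are equally and differentially expressed, respectively, in K treatments, and {δ_{ik}/σ_{i}, i ∈ ℳ_{1}, 1 ≤ k ≤ K} denote the standardized effect sizes for the differentially expressed genes. Let m_{0} and m_{1} = m − m_{0} denote the cardinalities of ℳ_{0} and ℳ_{1}, respectively.
Under independence or weak dependence among the test statistics and for large m, the FDR of the procedure that rejects H_{i} when p_{i} < α is approximated by
FDR(α) = m_{0}α/{m_{0}α + r_{1}(α)},
where β_{i}(α) = P(p_{i} ≤ α) is the marginal power of a single α-test applied to gene i ∈ ℳ_{1} and r_{1}(α) = Σ_{i ∈ ℳ_{1}} β_{i}(α) denotes the expected number of true rejections when we reject H_{i} for p_{i} < α, see Jung [6].
Under model (1) with normally distributed error terms, F_{i} follows the F_{K−1,(K−1)(n−1)}(η_{i}) distribution with η_{i} = n Σ_{k} δ_{ik}^{2}/σ_{i}^{2}, where F_{ν_{1},ν_{2}}(η) is the noncentral F-distribution with ν_{1} and ν_{2} degrees of freedom, and noncentrality parameter η. Note that, for i ∈ ℳ_{0}, η_{i} = 0 and F_{i} ~ F_{K−1,(K−1)(n−1)}(0) = F_{K−1,(K−1)(n−1)}, the central F-distribution.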
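For reference, the per-gene significance level α* used in the sample size calculation can be derived in one step. The derivation assumes the large-m approximation of Jung [6], in which the FDR for rejecting H_{i} when p_{i} < α is m_{0}α/{m_{0}α + r_{1}(α)}. Setting the FDR to q* and the expected number of true rejections to r_{1} gives:

```latex
\begin{align*}
q^* = \frac{m_0\,\alpha^*}{m_0\,\alpha^* + r_1}
  &\;\Longrightarrow\; q^*\,r_1 = m_0\,\alpha^*\,(1 - q^*) \\
  &\;\Longrightarrow\; \alpha^* = \frac{r_1\,q^*}{m_0\,(1 - q^*)} .
\end{align*}
```

For example, with m = 4000, m_{1} = 40, r_{1} = 12 and q* = 0.01, we have m_{0} = 3960 and α* = 0.12/3920.4 ≈ 3.06 × 10^{−5}.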
In summary, the sample size (i.e., number of blocks) n for r_{1} (≤ m_{1}) true rejections is calculated as follows, assuming that the error terms in model (1) are normally distributed.
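Sample size calculation based on the noncentral F-distribution
 (i) Specify the input variables:
  - K = number of treatments;
  - m = total number of genes for testing;
  - m_{1} = number of genes differentially expressed in K treatments (m_{0} = m − m_{1});
  - {δ_{ik}/σ_{i}, i ∈ ℳ_{1}, 1 ≤ k ≤ K} = standardized effect sizes for prognostic genes;
  - q* = FDR level;
  - r_{1} = number of true rejections.
 (ii) Using the bisection method, solve r_{1} = Σ_{i ∈ ℳ_{1}} β_{i}(α*) with respect to n, where α* = r_{1}q*/{m_{0}(1 − q*)} and β_{i}(α*) is the power of the level-α* F-test for gene i, computed from the noncentral F-distribution with K − 1 and (K − 1)(n − 1) degrees of freedom and noncentrality parameter n Σ_{k} δ_{ik}^{2}/σ_{i}^{2}.
 (iii) The required sample size is n blocks, or nK array chips.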
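If n is large, (K − 1)F_{i} is approximately distributed as a noncentral χ^{2} distribution with K − 1 degrees of freedom and noncentrality parameter n Σ_{k} δ_{ik}^{2}/σ_{i}^{2} (see Appendix), and the sample size is obtained by solving
r_{1} = Σ_{i ∈ ℳ_{1}} P{χ^{2}_{K−1}(n Σ_{k} δ_{ik}^{2}/σ_{i}^{2}) > c(α*)},   (8)
with respect to n, where c(α*) is the upper α* quantile of the central χ^{2}_{K−1} distribution and α* = r_{1}q*/{m_{0}(1 − q*)}. In this equation, n appears only in the noncentrality parameter of the χ^{2} distributions.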
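A minimal Python sketch of this chi-square-based calculation is given below. It is not the authors' R program: the function names are hypothetical, the noncentral chi-square tail is computed as a Poisson mixture of central chi-square tails using only the standard library, the noncentrality parameter is assumed to be n Σ_{k} δ_{ik}^{2}/σ_{i}^{2}, and an integer bisection stands in for whatever root-finder the original program uses. The example effect vector (−1/4, 0, 1/4) is an assumption chosen so that the effects sum to zero over treatments.

```python
import functools
import math

def chi2_sf(x, df):
    """P(X > x) for a central chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        v, sf = 2, math.exp(-h)              # df = 2: exp(-x/2)
    else:
        v, sf = 1, math.erfc(math.sqrt(h))   # df = 1: 2 * (1 - Phi(sqrt(x)))
    while v < df:
        # SF_{v+2}(x) = SF_v(x) + h^(v/2) * exp(-h) / Gamma(v/2 + 1)
        sf += math.exp(0.5 * v * math.log(h) - h - math.lgamma(0.5 * v + 1.0))
        v += 2
    return sf

@functools.lru_cache(maxsize=None)
def ncx2_sf(x, df, nc):
    """P(X > x) for a noncentral chi-square, as a Poisson(nc/2) mixture."""
    if nc <= 0.0:
        return chi2_sf(x, df)
    lam = nc / 2.0
    terms = int(lam + 10.0 * math.sqrt(lam) + 30.0)  # truncation of the Poisson sum
    total = 0.0
    for j in range(terms):
        log_w = -lam + j * math.log(lam) - math.lgamma(j + 1.0)
        total += math.exp(log_w) * chi2_sf(x, df + 2 * j)
    return total

def chi2_upper_quantile(alpha, df):
    """Value c with P(chi-square_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

def blocks_needed(K, m, std_effects, r1, q_star):
    """Smallest n whose expected number of true rejections reaches r1.

    std_effects holds one tuple (delta_i1/sigma_i, ..., delta_iK/sigma_i)
    per differentially expressed gene.
    """
    if not 0 < r1 < len(std_effects):
        raise ValueError("r1 must lie strictly between 0 and m1")
    m0 = m - len(std_effects)
    alpha_star = r1 * q_star / (m0 * (1.0 - q_star))
    crit = chi2_upper_quantile(alpha_star, K - 1)

    def expected_true_rejections(n):
        return sum(ncx2_sf(crit, K - 1, n * sum(d * d for d in eff))
                   for eff in std_effects)

    hi = 2
    while expected_true_rejections(hi) < r1:   # bracket the solution by doubling
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:                          # integer bisection
        mid = (lo + hi) // 2
        if expected_true_rejections(mid) >= r1:
            hi = mid
        else:
            lo = mid
    return hi

# Assumed example setting: 40 differentially expressed genes among m = 4000.
effects = [(-0.25, 0.0, 0.25)] * 40
n_blocks = blocks_needed(K=3, m=4000, std_effects=effects, r1=12, q_star=0.01)
```

For m = 4000, m_{1} = 40, r_{1} = 12 and q* = 1%, the simulation table in the Results section reports n = 123 under the chi-square approximation, which gives a reference point for checking the sketch.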
For the corresponding unblocked one-way design with n subjects per treatment group, the block effect becomes part of the error term, and the sample size is obtained by solving
r_{1} = Σ_{i ∈ ℳ_{1}} P{χ^{2}_{K−1}(n Σ_{k} δ_{ik}^{2}/(σ_{i}^{2} + v_{i})) > c(α*)},   (9)
with respect to n, where c(α*) is the upper α* quantile of the central χ^{2}_{K−1} distribution and α* = r_{1}q*/{m_{0}(1 − q*)}. The only difference between (8) and (9) is the standardized effect sizes, δ_{ik}/σ_{i} and δ_{ik}/(σ_{i}^{2} + v_{i})^{1/2}. The latter is always smaller than the former in absolute value because of the variance among blocks, v_{i}. If v_{i} is large compared to the variance of experimental errors, σ_{i}^{2}, then a blocked one-way design requires a much smaller number of arrays than an unblocked one-way design. Let n_{u} and n_{b} denote the sample sizes n calculated under an unblocked and a blocked design, respectively. If the ratios v_{i}/σ_{i}^{2} are equal to a constant f among the prognostic genes, then from (8) and (9), we have n_{u} = (1 + f)n_{b}.
As an example, consider the design of the fibroid study discussed in the Background Section and suppose that the variance of the block effects is half of that of measurement errors for the prognostic genes, i.e. f = 0.5. In this case, if a blocked design requires n_{b} = 100 patients and 3n_{b} = 300 array chips, then the corresponding unblocked design with a balanced allocation requires n_{u} = 150 patients per group, or a total of 450 patients. For an unblocked design, the number of array chips is identical to that of patients, so compared to the blocked design, the unblocked design requires 1.5 times as many chips and 4.5 times as many patients.
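The arithmetic of this comparison can be checked with a few lines of Python. The helper name and the dictionary keys are hypothetical; the only relation used is n_{u} = (1 + f)n_{b}, with f assumed constant across the prognostic genes.

```python
import math

def compare_designs(n_blocked, f, K):
    """Patients and chips for a blocked design versus the matching unblocked design.

    f is the ratio of block-effect variance to error variance, so that the
    unblocked design needs n_u = (1 + f) * n_b patients per treatment group.
    """
    n_unblocked = math.ceil(round((1.0 + f) * n_blocked, 9))  # patients per group
    return {
        "blocked_patients": n_blocked,
        "blocked_chips": K * n_blocked,
        "unblocked_patients": K * n_unblocked,
        "unblocked_chips": K * n_unblocked,
    }

result = compare_designs(n_blocked=100, f=0.5, K=3)
chip_ratio = result["unblocked_chips"] / result["blocked_chips"]
patient_ratio = result["unblocked_patients"] / result["blocked_patients"]
```

The chip ratio depends only on f, whereas the patient ratio also scales with K because each blocked patient contributes K arrays.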
Results and discussion
Simulations
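Empirical FDR from N = 1000 simulations with B = 1000 permutations for each simulation data set

| m_{1} | treatment effects | q* | n = 10 | n = 30 | n = 50 |
|---|---|---|---|---|---|
| 40 | (1, 0, 1) | 0.05 | 0.1766 | 0.0921 | 0.0925 |
| 40 | (1, 0, 1) | 0.1 | 0.1819 | 0.1647 | 0.1705 |
| 40 | (1, 0, 1) | 0.2 | 0.2736 | 0.2462 | 0.2506 |
| 40 | (1, 0, 1) | 0.3 | 0.3636 | 0.3478 | 0.3512 |
| 40 | (1, 0, 1) | 0.4 | 0.4546 | 0.4449 | 0.4431 |
| 40 | (1, 0, 1) | 0.5 | 0.5435 | 0.5389 | 0.5399 |
| 40 | (1, 2, 1) | 0.05 | 0.0936 | 0.0899 | 0.0915 |
| 40 | (1, 2, 1) | 0.1 | 0.1619 | 0.1663 | 0.1665 |
| 40 | (1, 2, 1) | 0.2 | 0.2402 | 0.2498 | 0.2421 |
| 40 | (1, 2, 1) | 0.3 | 0.3373 | 0.3469 | 0.3461 |
| 40 | (1, 2, 1) | 0.4 | 0.4347 | 0.4481 | 0.4421 |
| 40 | (1, 2, 1) | 0.5 | 0.5318 | 0.5446 | 0.5340 |
| 200 | (1, 0, 1) | 0.05 | 0.0653 | 0.0573 | 0.0603 |
| 200 | (1, 0, 1) | 0.1 | 0.1120 | 0.1093 | 0.1130 |
| 200 | (1, 0, 1) | 0.2 | 0.2076 | 0.2105 | 0.2146 |
| 200 | (1, 0, 1) | 0.3 | 0.3079 | 0.3086 | 0.3176 |
| 200 | (1, 0, 1) | 0.4 | 0.4070 | 0.4056 | 0.4171 |
| 200 | (1, 0, 1) | 0.5 | 0.5051 | 0.5013 | 0.5162 |
| 200 | (1, 2, 1) | 0.05 | 0.0567 | 0.0554 | 0.0591 |
| 200 | (1, 2, 1) | 0.1 | 0.1108 | 0.1079 | 0.1111 |
| 200 | (1, 2, 1) | 0.2 | 0.2142 | 0.2061 | 0.2116 |
| 200 | (1, 2, 1) | 0.3 | 0.3120 | 0.3052 | 0.3113 |
| 200 | (1, 2, 1) | 0.4 | 0.4124 | 0.4049 | 0.4148 |
| 200 | (1, 2, 1) | 0.5 | 0.5141 | 0.5010 | 0.5162 |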
For the simulations on sample size calculation, we set m = 4000; m_{1} = 40 or 200; number of treatments K = 3; treatment effects (δ_{i1}, δ_{i2}, δ_{i3}) = (1/4, 0, 1/4) or (1/4, 1/2, 1/4) for i ∈ ℳ_{1}; γ_{ij} ~ N(0, 0.5^{2}) and ε_{ijk} ~ N(0, 1). We want the number of true rejections r_{1} to be 30%, 60% or 90% of m_{1} while controlling the FDR level at q* = 1%, 5% or 10%. For each design setting, we first calculate the sample size n based on the F-distribution or the chi-square approximation, and then generate N = 1000 samples of size n under the same setting. From each simulation sample, the number of true rejections is counted while controlling the FDR at the specified level using λ = 0.95. The first, second and third quartiles, Q_{1}, Q_{2} and Q_{3}, of the observed number of true rejections are estimated from the 1000 simulation samples.
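The data-generating step of such a simulation can be sketched for a single differentially expressed gene as follows. The function names, the seed and the choice of 20000 blocks are assumptions made only so that the within-block correlation implied by the model, v/(v + σ^{2}) = 0.25/1.25 = 0.2, is visible empirically; the effect vector (−1/4, 0, 1/4) is likewise an assumed sign pattern that sums to zero.

```python
import math
import random

def simulate_gene(n, effects, block_sd, error_sd, rng):
    """Return n blocks of K observations: treatment effect + block effect + noise."""
    data = []
    for _ in range(n):
        gamma = rng.gauss(0.0, block_sd)   # block effect shared by the K arrays
        data.append([d + gamma + rng.gauss(0.0, error_sd) for d in effects])
    return data

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2009)
data = simulate_gene(20000, (-0.25, 0.0, 0.25), block_sd=0.5, error_sd=1.0, rng=rng)
first = [row[0] for row in data]
second = [row[1] for row in data]
empirical_corr = pearson(first, second)
theoretical_corr = 0.5 ** 2 / (0.5 ** 2 + 1.0 ** 2)   # v / (v + sigma^2)
```

Under this setting the variance ratio is f = 0.25, so the relation n_{u} = (1 + f)n_{b} implies that an unblocked design would need 25% more subjects per group than the number of blocks.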
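Q_{2} (Q_{1}, Q_{3})/n, where n is the sample size and Q_{k} (k = 1, 2, 3) is the kth quartile of the empirical true rejections from N = 1000 simulations

Based on the chi-square approximation

| treatment effects | m_{1} | r_{1} | q* = 1% | q* = 5% | q* = 10% |
|---|---|---|---|---|---|
| (1/4, 0, 1/4) | 40 | 12 | 11 (9, 13)/123 | 10 (8, 13)/100 | 11 (8, 14)/90 |
| (1/4, 0, 1/4) | 40 | 24 | 23 (20, 26)/166 | 23 (21, 26)/138 | 23 (21, 25)/125 |
| (1/4, 0, 1/4) | 40 | 36 | 36 (34, 37)/242 | 36 (34, 37)/207 | 36 (35, 37)/191 |
| (1/4, 0, 1/4) | 200 | 60 | 56 (49, 61)/100 | 55 (47, 61)/77 | 55 (49, 61)/67 |
| (1/4, 0, 1/4) | 200 | 120 | 115 (109, 120)/138 | 118 (112, 124)/110 | 117 (110, 122)/96 |
| (1/4, 0, 1/4) | 200 | 180 | 179 (176, 182)/207 | 178 (176, 182)/171 | 179 (175, 182)/154 |
| (1/4, 1/2, 1/4) | 40 | 12 | 8 (6, 10)/41 | 8 (5, 10)/34 | 7 (5, 10)/30 |
| (1/4, 1/2, 1/4) | 40 | 24 | 21 (19, 23)/56 | 21 (18, 24)/46 | 21 (19, 24)/42 |
| (1/4, 1/2, 1/4) | 40 | 36 | 35 (33, 37)/81 | 35 (34, 36)/70 | 36 (34, 37)/64 |
| (1/4, 1/2, 1/4) | 200 | 60 | 42 (36, 48)/34 | 41 (35, 47)/26 | 44 (36, 52)/23 |
| (1/4, 1/2, 1/4) | 200 | 120 | 103 (98, 109)/46 | 108 (101, 114)/37 | 104 (98, 111)/32 |
| (1/4, 1/2, 1/4) | 200 | 180 | 176 (173, 180)/70 | 177 (173, 180)/57 | 178 (174, 180)/52 |

Based on the F-distribution

| treatment effects | m_{1} | r_{1} | q* = 1% | q* = 5% | q* = 10% |
|---|---|---|---|---|---|
| (1/4, 0, 1/4) | 40 | 12 | 12 (10, 15)/129 | 12 (10, 14)/104 | 12 (10, 15)/94 |
| (1/4, 0, 1/4) | 40 | 24 | 24 (21, 27)/171 | 25 (23, 27)/142 | 24 (22, 26)/129 |
| (1/4, 0, 1/4) | 40 | 36 | 36 (35, 37)/246 | 36 (35, 38)/211 | 36 (35, 38)/194 |
| (1/4, 0, 1/4) | 200 | 60 | 60 (55, 66)/104 | 61 (54, 66)/80 | 62 (54, 70)/70 |
| (1/4, 0, 1/4) | 200 | 120 | 123 (117, 128)/142 | 122 (118, 128)/113 | 120 (114, 126)/99 |
| (1/4, 0, 1/4) | 200 | 180 | 179 (177, 184)/211 | 180 (177, 183)/174 | 181 (178, 184)/157 |
| (1/4, 1/2, 1/4) | 40 | 12 | 13 (10, 15)/47 | 13 (10, 15)/38 | 13 (10, 16)/34 |
| (1/4, 1/2, 1/4) | 40 | 24 | 23 (21, 26)/60 | 25 (23, 27)/50 | 25 (23, 27)/46 |
| (1/4, 1/2, 1/4) | 40 | 36 | 36 (35, 37)/86 | 35 (35, 38)/73 | 36 (34, 37)/67 |
| (1/4, 1/2, 1/4) | 200 | 60 | 61 (55, 67)/38 | 66 (60, 72)/30 | 66 (59, 71)/26 |
| (1/4, 1/2, 1/4) | 200 | 120 | 121 (116, 127)/50 | 123 (116, 128)/40 | 121 (116, 126)/35 |
| (1/4, 1/2, 1/4) | 200 | 180 | 180 (177, 183)/73 | 181 (177, 184)/60 | 182 (178, 185)/55 |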
Example
We applied the permutation-based blocked one-way ANOVA and the sample size calculation method to the fibroid study discussed in the Background Section. From each patient, specimens were taken from two sites of fibroid tissue, center (C) and edge (E), and one site of normal myometrium (M). Five patients were accrued to the study. We regard the three sites as treatments (K = 3) and the patients as blocks (n = 5). mRNA was amplified and hybridized onto HG-U133 GeneChips according to the protocols recommended by Affymetrix (Santa Clara, CA), and m = 54675 probe sets on the array were analyzed. Expression values were calculated using the Robust Multichip Average (RMA) method [14]. RMA estimates are based upon a robust average of background-corrected PM intensities. Normalization was done using quantile normalization [15]. We filtered out all "AFFX" genes and genes for which there were 4 or fewer present calls (based on Affymetrix's present/marginal/absent (PMA) calls using mismatch probe intensity, the ratio of PM to MM). That is, a gene is included only if there are at least 3 present calls among the 15 PMA calls. Filtering yielded 30711 genes to be used in the subsequent analyses.
Results of the uterine fibroid tissue and adjacent myometrium microarray experiment

| probe_set_id | Gene_Descriptor | parametric p-value | parametric q-value | nonparametric p-value | nonparametric q-value |
|---|---|---|---|---|---|
| 220273_at | interleukin 17B | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 213479_at | neuronal pentraxin II | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 210255_at | RAD51-like 1 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 205833_s_at | prostate androgen-regulated transcript 1 | 0.0000 | 0.0000 | 0.0077 | 0.0219 |
| 229160_at | melanoma associated antigen (mutated) 1-like 1 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 1561122_a_at | RAD51-like 1 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0046 | 0.0189 |
| 210817_s_at | calcium binding and coiled-coil domain 2 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 1553194_at | neuronal growth regulator 1 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 202965_s_at | calpain 6 | 0.0000 | 0.0000 | 0.0108 | 0.0239 |
| 204620_s_at | chondroitin sulfate proteoglycan 2 (versican) | 0.0000 | 0.0000 | 0.0054 | 0.0196 |
| 217287_s_at | transient receptor potential cation channel, subfamily C, member 6 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 227875_at | kelch-like 13 (Drosophila) | 0.0000 | 0.0000 | 0.0023 | 0.0156 |
| 205286_at | transcription factor AP-2 gamma (activating enhancer binding protein 2 gamma) | 0.0000 | 0.0000 | 0.0046 | 0.0189 |
| 242737_at | RAD51-like 1 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0062 | 0.0206 |
| 209965_s_at | RAD51-like 3 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 202007_at | nidogen 1 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 221731_x_at | chondroitin sulfate proteoglycan 2 (versican) | 0.0000 | 0.0000 | 0.0077 | 0.0219 |
| 244813_at | RAD51-like 1 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 201310_s_at | chromosome 5 open reading frame 13 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 210258_at | regulator of G-protein signalling 13 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 202589_at | thymidylate synthetase | 0.0000 | 0.0000 | 0.0054 | 0.0196 |
| 228766_at | gb:AW299226 | 0.0000 | 0.0000 | 0.0054 | 0.0196 |
| 218380_at | NLR family, pyrin domain containing 1 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 201417_at | SRY (sex determining region Y)-box 4 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 215972_at | Prostate androgen-regulated transcript 1 | 0.0000 | 0.0000 | 0.0093 | 0.0231 |
| 212942_s_at | KIAA1199 | 0.0000 | 0.0000 | 0.0046 | 0.0189 |
| 202966_at | calpain 6 | 0.0000 | 0.0000 | 0.0108 | 0.0239 |
| 205943_at | tryptophan 2,3-dioxygenase | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 213668_s_at | SRY (sex determining region Y)-box 4 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 219454_at | EGF-like-domain, multiple 6 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 235503_at | ankyrin repeat and SOCS box-containing 5 | 0.0000 | 0.0000 | 0.0069 | 0.0212 |
| 222834_s_at | guanine nucleotide binding protein (G protein), gamma 12 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 210198_s_at | proteolipid protein 1 (Pelizaeus-Merzbacher disease, spastic paraplegia 2, uncomplicated) | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 220565_at | chemokine (C-C motif) receptor 10 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 237671_at | RAD51-like 1 (S. cerevisiae) | 0.0000 | 0.0000 | 0.0093 | 0.0231 |
| 201220_x_at | C-terminal binding protein 2 | 0.0000 | 0.0000 | 0.0039 | 0.0180 |
| 217771_at | golgi phosphoprotein 2 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 224002_s_at | FK506 binding protein 7 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 213170_at | glutathione peroxidase 7 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 211980_at | collagen, type IV, alpha 1 | 0.0000 | 0.0000 | 0.0031 | 0.0167 |
| 211981_at | collagen, type IV, alpha 1 | 0.0000 | 0.0000 | 0.0031 | 0.0167 |
| 212282_at | transmembrane protein 97 | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 2013090_x_at | chromosome 5 open reading frame 13 | 0.0000 | 0.0000 | 0.0015 | 0.0144 |
| 211917_s_at | prolactin receptor///prolactin receptor | 0.0000 | 0.0000 | 0.0008 | 0.0131 |
| 212281_s_at | transmembrane protein 97 | 0.0000 | 0.0001 | 0.0008 | 0.0131 |
| 231930_at | ELMO/CED-12 domain containing 1 | 0.0000 | 0.0001 | 0.0123 | 0.0248 |
| 205347_s_at | thymosin-like 8 | 0.0000 | 0.0001 | 0.0015 | 0.0144 |
| 223571_at | C1q and tumor necrosis factor related protein 6 | 0.0000 | 0.0001 | 0.0015 | 0.0144 |
| 204619_s_at | chondroitin sulfate proteoglycan 2 (versican) | 0.0000 | 0.0001 | 0.0046 | 0.0189 |
| 231741_at | endothelial differentiation, sphingolipid G-protein-coupled receptor, 3 | 0.0000 | 0.0001 | 0.0054 | 0.0196 |
The results of our analysis of the two sites of fibroid tissue, center and edge, compared to the normal myometrium using a blocked one-way design suggest that controlling the FDR provides an enhanced approach to clinical microarray studies. Our findings are consistent with previously reported genome-wide profiling studies [17, 18]. We believe that these results support the hypothesis that uterine fibroids develop through altered wound healing signaling pathways leading to tissue fibrosis [19, 20]. Using the method described in this paper, genes differentially overexpressed in the fibroid tissue compared to myometrium are related to extracellular matrix (ECM) and ECM regulation, such as collagen, type IV, alpha 1; versican (chondroitin sulfate proteoglycan 2); and IL-17β [21]. IL-17β, a cell-cell signaling transducer, has been reported to enhance MMP secretion and to rapidly induce phosphorylation of the extracellular signal-regulated kinases (ERK) 1/2 and p38 MAPK in colonic myofibroblasts, and has been shown to stimulate MMP-1 expression in cardiac fibroblasts through ERK 1/2 and p38 MAPK [22, 23]. Thus IL-17β is important in remodelling of the extracellular matrix. According to our analysis, RAD51-like 1, a recombinational repair gene, is also overexpressed in fibroids, which is consistent with a report that RAD51B is the preferential translocation partner of the high mobility group protein gene (HMGIC) in uterine leiomyomas [24]. HMGIC codes for a protein that is a non-histone DNA binding factor that is expressed during development in embryonic tissue and is an important regulator of cell growth, differentiation and transformation as well as apoptosis [25]. Arrest of apoptosis appears to be a hallmark of uterine fibroids, a finding that is characteristic of altered wound healing as well [19]. HMGIC appears to play a role in the development of uterine fibroids [19, 26, 27].
Suppose that we want to design a new fibroid study using the data analyzed above as pilot data. In the sample size calculation, we set m = 30,000. We assume that the m_{1} = 50 genes which were selected as the top 50 genes in terms of parametric p-value are differentially expressed in the three sites (K = 3). From the pilot data, we estimate the standardized treatment effect δ_{ik}. For illustration, the effect sizes of these m_{1} = 50 genes are taken to be δ_{ik} = 0.1. We need n = 15 patients (blocks) to discover 90% of the prognostic genes, i.e. r_{1} = [0.9 × 50] = 45, while controlling the FDR at the q* = 5% level. In a simulation study, we generated N = 1000 microarray data sets of size n = 15 under this design setting. With q* = 0.05, we observed the quartiles Q_{2}(Q_{1}, Q_{3}) = 46(45, 47) from the empirical distribution of the observed true rejections.
Conclusion
We have considered studies where microarray data for K treatment groups are collected from the same subjects (blocks). We discover the genes differentially expressed among K groups using nonparametric F-statistics for blocked one-way ANOVA while controlling the FDR. We employ a permutation method to generate the null distribution of the F-statistics without a normal distribution assumption for the gene expression data. The permutation-based multiple testing procedure can be easily modified for controlling the family-wise error rate, see e.g. Westfall and Young [28] and Jung et al. [29].
We propose a simple sample size calculation method to estimate the required number of subjects (blocks) given the total number of genes m, the number of differentially expressed genes m_{1}, their standardized effect sizes (δ_{ik}/σ_{i}, 1 ≤ i ≤ m_{1}, 1 ≤ k ≤ K) and the number of true rejections r_{1} at a specified FDR level q*. Through simulations and analysis of a real data set, we found that the permutation-based analysis method controls the FDR accurately and the sample size formula performs accurately. While we specify the individual effect sizes for the prognostic genes, some investigators [30, 31] use a mixture model for the marginal p-values by specifying a distribution for the effect sizes among m genes.
Glueck et al. [32] propose an exact calculation of average power for the Benjamini-Hochberg [2] procedure for controlling the FDR. Their formula may be useful for deriving sample sizes when the test statistics are independent and the number of hypotheses m is small. However, it is not appropriate for designing a microarray study with a large number of dependent test statistics.
A sample size calculation program in R is available from http://www.duke.edu/~is29/BlockANOVA/.
Appendix
We want to prove that F_{i} converges to in distribution regardless of the normal distribution assumption on ε_{ijk} and γ_{ij}. We only assume that . The following is one of the key lemmas used to derive the distribution of the F-statistics in the standard ANOVA theory, see e.g. Section 3b.4 of Rao [33].
Lemma: Suppose that, for k = 1,..., K, z_{k} are independent N(μ_{k}, 1) random variables and A is an idempotent K × K matrix with rank ν. Let z = (z_{1},..., z_{K})^{T} and μ = (μ_{1},..., μ_{K})^{T}. Then, z^{T}Az follows the noncentral χ^{2} distribution with ν degrees of freedom and noncentrality parameter μ^{T}Aμ.
where and . By the strong law of large numbers, we have , and almost surely (a.s.).
Let and . Then, z_{1},..., z_{K} are independent and, by the central limit theorem, z_{k} is approximately . Let I be the K × K identity matrix, 1 = (1,..., 1)^{T} the K × 1 vector with components 1, z = (z_{1},..., z_{K})^{T} and A = I − K^{−1}11^{T}. Note that A is an idempotent matrix with rank K − 1 and , where . Then, is approximately distributed as by the lemma. Since , is approximately distributed as . By combining this result with (A.1) using Slutsky's theorem, we complete the proof.
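The two properties of A = I − K^{−1}11^{T} used in this argument, idempotence and rank K − 1, follow from 1^{T}1 = K and from the fact that the rank of an idempotent matrix equals its trace. A short verification, written here in LaTeX as a supplementary step rather than as part of the original proof:

```latex
\begin{align*}
A^2 &= \left(I - K^{-1}\mathbf{1}\mathbf{1}^T\right)\left(I - K^{-1}\mathbf{1}\mathbf{1}^T\right)
     = I - 2K^{-1}\mathbf{1}\mathbf{1}^T + K^{-2}\,\mathbf{1}\left(\mathbf{1}^T\mathbf{1}\right)\mathbf{1}^T \\
    &= I - 2K^{-1}\mathbf{1}\mathbf{1}^T + K^{-1}\mathbf{1}\mathbf{1}^T = A, \\
\operatorname{rank}(A) &= \operatorname{tr}(A) = K - K^{-1}\,\mathbf{1}^T\mathbf{1} = K - 1, \\
\mathbf{z}^T A\,\mathbf{z} &= \sum_{k=1}^{K} \left(z_k - \bar{z}\right)^2,
\qquad \bar{z} = K^{-1}\sum_{k=1}^{K} z_k .
\end{align*}
```

The last line shows why A appears: the quadratic form is the sum of squared deviations of the z_{k} from their mean, which is the form taken by the between-treatment sum of squares.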
Declarations
Acknowledgements
We are grateful to Holly Dressler and the staff of the Duke Microarray Facility for their assistance in the conduct of the genome-wide expression profiling. Funding was provided in part by NIH 1UL1RR024128 and the Department of Obstetrics and Gynecology, Duke University.
Authors’ Affiliations
References
1. Catherino WH, Leppert PC, Segars JH: The promise and perils of microarray analysis. Am J Obstet Gynecol 2006, 195: 389–393. 10.1016/j.ajog.2006.02.035
2. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Statist Soc B 1995, 57: 289–300.
3. Westfall PH, Wolfinger RD: Multiple tests with discrete distributions. American Statistician 1997, 51: 3–8. 10.2307/2684683
4. Lee MLT, Whitmore GA: Power and sample size for DNA microarray studies. Stat Med 2002, 22: 3543–3570. 10.1002/sim.1335
5. Muller P, Parmigiani G, Robert C, Rousseau J: Optimal sample size for multiple testing: the case of gene expression microarrays. J Am Stat Assoc 2004, 99: 990–1001. 10.1198/016214504000001646
6. Jung SH: Sample size for FDR-control in microarray data analysis. Bioinformatics 2005, 21: 3097–3103. 10.1093/bioinformatics/bti456
7. Pounds S, Cheng C: Sample size determination for the false discovery rate. Bioinformatics 2005, 21: 4263–4271. 10.1093/bioinformatics/bti699
8. Liu P, Hwang JTG: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 2007, 23: 739–746. 10.1093/bioinformatics/btl664
9. Shao Y, Tseng CH: Sample size calculation with dependence adjustment for FDR-control in microarray studies. Stat Med 2007, 26: 4219–4237. 10.1002/sim.2862
10. Feng L, Walls M, Sohn I, Behera M, Jung SH, Leppert PC: Novel approach to the analysis of genome-wide expression profiling. Reproductive Sciences 2008, 15: 298A.
11. Ge Y, Dudoit S, Speed TP: Resampling-based multiple testing for microarray data hypothesis. Test 2003, 12: 1–44. 10.1007/BF02595811
12. Storey JD: A direct approach to false discovery rates. J of Roy Stat Soc Ser B 2002, 64: 479–498. 10.1111/1467-9868.00346
13. Jung SH, Jang W: How accurately can we control the FDR in analyzing microarray data? Bioinformatics 2006, 22: 1730–1736. 10.1093/bioinformatics/btl161
14. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 2003, 4: 249–264. 10.1093/biostatistics/4.2.249
15. Bolstad BM, Irizarry RA, Astrand M, Speed TP: A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Bias and Variance. Bioinformatics 2003, 19: 185–193. 10.1093/bioinformatics/19.2.185
16. Hartigan JA: Clustering Algorithms. Wiley: New York; 1975.
17. Catherino WH, Prupas C, Tsibris JC, Leppert PC, Payson M, Nieman LK, Segars JH: Strategy for elucidating differentially expressed genes in leiomyomata identified by microarray technology. Fert Steril 2003, 80: 282–290. 10.1016/S0015-0282(03)00953-1
18. Skubitz K, Skubitz AP: Differential gene expression in uterine leiomyoma. Journal of Laboratory and Clinical Medicine 2003, 141: 279–308. 10.1016/S0022-2143(03)00007-6
19. Leppert PC, Catherino WH, Segars JH: A new hypothesis about the origin of uterine fibroids based on gene expression profiling with microarrays. Am J Obstet Gynecol 2006, 195: 415–420. 10.1016/j.ajog.2005.12.059
20. Rogers R, Norian J, Malki M, Christman G, Abu-Asab M, Chen F, Korecki C, Iatridis J, Catherino WH, Dhillon N, Leppert P, Segars JH: Mechanical homeostasis is altered in uterine leiomyoma. Am J Obstet Gynecol 2008, 198: 474.e1–11. 10.1016/j.ajog.2007.11.057
21. Malik M, Webb J, Catherino WH: Retinoic acid treatment of human leiomyoma cells transformed the cell phenotype to one strongly resembling myometrial cells. Clin Endocrinol (Oxf) 2008, in press.
22. Yagi Y, Andoh A, Inatomi O, Tsujikawa T, Fujiyama Y: Inflammatory responses induced by interleukin-17 family members in human colonic subepithelial myofibroblasts. J Gastroenterol 2007, 42: 746–753. 10.1007/s00535-007-2091-3
23. Cortez DM, Feldman MD, Mummidi S, Valente AJ, Steffenses B, Vincenti M, Barnes J, Chandrasekar B: IL-17 stimulates MMP-1 expression in primary human cardiac fibroblasts via p38 MAPK and ERK 1/2-dependent C/EBP, NF-kB, and AP-1 activation. Am J Physiol Heart Circ Physiol 2007, 293: 3356–3365. 10.1152/ajpheart.00928.2007
24. Schoenmakers EFPM, Huysmanns C, Ven WJM: Allelic knockout of novel splice variants of human recombination repair gene RAD51B in t(12;14) uterine leiomyomas. Cancer Res 1999, 59: 19–23.
25. Reeves R: Molecular biology of HMGA proteins: hubs of nuclear function. Gene 2001, 277: 63–81. 10.1016/S0378-1119(01)00689-8
26. Gatas GJ, Quade BJ, Nowak RA, Morton CC: HMGIC expression in human adult and fetal tissues and in uterine leiomyomata. Genes Chromosomes and Cancer 1999, 25: 316–322. 10.1002/(SICI)1098-2264(199908)25:4<316::AID-GCC2>3.0.CO;2-0
27. Peng Y, Laser J, Shi G, Mittal K, Melamed J, Lee P, Wei JJ: Antiproliferative effects by Let-7 repression of high-mobility group A2 in uterine leiomyoma. Mol Cancer Res 2008, 6: 663–673. 10.1158/1541-7786.MCR-07-0370
28. Westfall PH, Young SS: Resampling-based Multiple Testing: Examples and Methods for P-value Adjustment. Wiley: New York; 1993.
29. Jung SH, Bang H, Young S: Sample size calculation for multiple testing in microarray data analysis. Biostatistics 2005, 6: 157–169. 10.1093/biostatistics/kxh026
30. Hu J, Zou F, Wright FA: Practical FDR-based sample size calculations in microarray experiments. Bioinformatics 2005, 21: 3264–3272. 10.1093/bioinformatics/bti519
31. Jørstad TS, Midelfart H, Bones AM: A mixture model approach to sample size estimation in two-sample comparative microarray experiments. BMC Bioinformatics 2008, 9: 117. 10.1186/1471-2105-9-117
32. Glueck DH, Mandel J, Karimpour-Fard A, Hunter L, Muller K: Exact calculations of average power for the Benjamini-Hochberg procedure. International Journal of Biostatistics 2008, 4: Article 11.
33. Rao CR: Linear Statistical Inference and Its Applications. Wiley: New York; 1965.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.