# MetSizeR: selecting the optimal sample size for metabolomic studies using an analysis based approach

Gift Nyamundanda^{1}, Isobel Claire Gormley^{1} (Email author), Yue Fan^{2}, William M Gallagher^{2} and Lorraine Brennan^{3}

*BMC Bioinformatics* 2013, **14**:338

**DOI:** 10.1186/1471-2105-14-338

© Nyamundanda et al.; licensee BioMed Central Ltd. 2013

**Received:** 7 June 2013

**Accepted:** 13 November 2013

**Published:** 21 November 2013

## Abstract

### Background

Determining sample sizes for metabolomic experiments is important but, owing to the complexity of these experiments, there are currently no standard methods for sample size estimation in metabolomics. Since pilot studies are rarely done in metabolomics, existing sample size estimation approaches that rely on pilot data cannot be applied.

### Results

In this article, an analysis based approach called MetSizeR is developed to estimate sample size for metabolomic experiments even when experimental pilot data are not available. The key motivation for MetSizeR is that it considers the type of analysis the researcher intends to use for data analysis when estimating sample size. MetSizeR uses information about the data analysis technique and prior expert knowledge of the metabolomic experiment to simulate pilot data from a statistical model. Permutation based techniques are then applied to the simulated pilot data to estimate the required sample size.

### Conclusions

The MetSizeR methodology, and a publicly available software package which implements the approach, are illustrated through real metabolomic applications. Sample size estimates, informed by the intended statistical analysis technique, and the associated uncertainty are provided.

## Background

In many metabolomic experiments, one of the most important objectives is to discover the set of metabolites that plays a significant role in distinguishing samples from two different groups or populations, and thus to identify novel biomarkers [1]. As in any experiment, careful design is critical if reliable, statistically significant metabolites are to be obtained. Since metabolomic experiments are expensive, it is crucial to determine the optimal sample size $\widehat{n}$ that attains the desired power to identify discriminating metabolites without wasting resources or sampling unnecessarily many subjects. However, metabolomic data are typically high dimensional and correlated, meaning that sample size estimation using classical statistical methods is not straightforward.

Currently, there is no standard method in the metabolomics literature for determining sample size when designing a metabolomic experiment. Several methods exist for sample size selection in the high-dimensional data setting [2-5]. However, none of these methods is suitable for metabolomic experiments, since many assume either that variables have equal variance or that they are independent. More importantly, these methods rely on the existence of experimental pilot data on which the sample size selection is based, and they do not take into account the method to be used to analyze the data. In metabolomic studies, experimental pilot data are rarely available, making such sample size selection approaches inapplicable.

Intuitively, the MetSizeR method can be extended naturally to other analysis approaches, provided they are based on a statistical model rather than being non-parametric in nature.

MetSizeR draws on two currently existing methods (see [2] and [3]) for sample size calculation in high-dimensional data settings. While the approach in [3] is based on the existence of an experimental pilot data set, the approach detailed in [2] simulates pilot data from a statistical model. Further, while independence in the data is assumed in [2], the approach in [3] uses permutation methods to account for the correlation in the experimental pilot data. MetSizeR combines these ideas of prior simulation and permutation based techniques to estimate the sample size for metabolomic experiments. The main advantage of the developed approach is its ability to determine sample size without experimental pilot data and without assuming variable independence.

A graphical user interface (GUI) software package called MetSizeR was developed in R [9] to implement this approach to estimating sample sizes. Effort was focused on designing the interface of MetSizeR to encourage its wide use in the metabolomics community regardless of previous knowledge of R. The software is available through the **R** statistical software environment (http://www.r-project.org).

## Methods

Metabolomic data sets are typically acquired using analytical technologies such as nuclear magnetic resonance spectroscopy (NMR) [10] and mass spectrometry (MS) [11]. The spectrum resulting from NMR spectroscopy is usually divided into spectral bins (representing variables) and the signal intensities within the bins are related to the relative abundances of metabolites. MS is typically used for targeted metabolomics in which a specified list of metabolites is measured [12]. The following section describes how the number of samples required for either an NMR or an MS metabolomic experiment can be determined under the MetSizeR approach.

### Sample size estimation

Let $\bar{x}_{jg}$ be the estimate of the average signal intensity $\mu_{jg}$ for metabolite *j* in samples from the treatment group *g*, which has corresponding sample size $n_g$, where *g* = 1, 2. Often in metabolomics, the goal of discovering a set of metabolites that discriminates between samples from two treatment groups is achieved by testing the hypothesis $H_{oj}: \mu_{j1} - \mu_{j2} = 0$ on each metabolite *j*, where *j* = 1,…, *p*. The aim of discovering discriminating metabolites can be framed as a multiple testing problem, as there are *p* hypotheses to be tested and the probability of falsely declaring a metabolite as significant increases with *p*. It is therefore important to estimate sample size while controlling an error rate to improve the power of the test for identifying significant metabolites. MetSizeR focuses on controlling the false discovery rate (FDR, [13]). Here, the FDR is the expected number of metabolites incorrectly deemed to be significantly different between the two treatment groups, as a proportion of the total number of metabolites declared to be significant.
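
In symbols, and using notation that does not appear in the original text (the letters *V* and *R* are assumptions introduced here for illustration only), if *R* metabolites are declared significant and *V* of them are false discoveries, the FDR of [13] can be written as

$$\text{FDR} = E\left[\frac{V}{\max(R, 1)}\right].$$

For example, declaring 40 metabolites significant when 2 of them do not truly differ between the groups corresponds to a realised false discovery proportion of 2/40 = 5%.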

#### The test statistic and its distribution

The *t*-statistic $TS_j$ is evaluated for all metabolites, *j* = 1,…, *p*, under the assumption that the null hypothesis of no difference $\mu_{j1} = \mu_{j2}$ is true:

$$TS_j = \frac{\bar{x}_{j1} - \bar{x}_{j2}}{S_j + cf},$$

where $S_j$ is the estimate of the pooled standard error for metabolite *j*. The corresponding within treatment variability estimate is $s_{jg}^{2} = (n_g - 1)^{-1}\sum_{i=1}^{n_g}(x_{(jg)i} - \bar{x}_{jg})^{2}$ for *g* = 1, 2, where $x_{(jg)i}$ denotes the signal intensity for metabolite *j* in sample *i* from the treatment group *g*. A correction factor *cf* is a small positive value added to the standard error of each metabolite to prevent metabolites with signal intensity near zero from having a large test statistic $TS_j$; without it, such a metabolite may have $TS_j \approx 0/0$.
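
As a rough illustration of the statistic described above, the following Python sketch computes a pooled-variance two-sample statistic with a correction factor added to the standard error. The function name `moderated_t` and the exact pooled form of the standard error are assumptions made for illustration; they are not taken from the MetSizeR source code.

```python
import math
from statistics import mean, variance


def moderated_t(x1, x2, cf=0.0):
    """Two-sample t-type statistic for a single metabolite.

    x1, x2 : signal intensities in treatment groups 1 and 2
    cf     : correction factor added to the pooled standard error
    """
    n1, n2 = len(x1), len(x2)
    # Within-group variances s_{j1}^2 and s_{j2}^2 (divisor n_g - 1).
    s1_sq, s2_sq = variance(x1), variance(x2)
    # Pooled variance, then the standard error of the mean difference
    # (assumed form of S_j).
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(x1) - mean(x2)) / (se + cf)
```

With `cf > 0`, a metabolite whose intensities are all zero in both groups yields a statistic of 0 instead of the undefined ratio 0/0.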

The typical assumption about the null distribution (i.e. the distribution under the null hypotheses) of the test statistic $TS_j$ is the *t*-distribution with $n_1 + n_2 - 2$ degrees of freedom. However, when the data violate such an assumption, misleading sample size estimates would result. Hence, as in [3], MetSizeR estimates the null distribution of $TS_j$ using a permutation technique. This is a non-parametric method based on the assumption that, under the null hypothesis of no difference, the distribution of the test statistic does not change no matter how the group labels of the pilot data are permuted. The data generated using this approach maintain the between subject variability and the amount of noise in the data. The null distribution of the test statistic *TS* is estimated by randomly permuting the group labels of the pilot data and calculating the test statistic for each metabolite, $TS_{jt}$, for each of the permutations *t* = 1,…, *T*.
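
The permutation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MetSizeR implementation: the helper names `t_stat` and `permutation_null`, the row-per-sample data layout and the pooled form of the standard error are all choices made here for the example.

```python
import math
import random
from statistics import mean, variance


def t_stat(x1, x2, cf=0.0):
    """Pooled two-sample statistic with correction factor cf (assumed form)."""
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / (math.sqrt(pooled * (1 / n1 + 1 / n2)) + cf)


def permutation_null(data, labels, T=20, cf=0.0, seed=0):
    """Return a T x p table of statistics computed on label-permuted data.

    data   : list of n samples, each a list of p intensities
    labels : list of n group labels (1 or 2)
    """
    rng = random.Random(seed)
    p = len(data[0])
    null_stats = []
    for _ in range(T):
        perm = labels[:]      # copy, so the caller's labels are untouched
        rng.shuffle(perm)     # random reassignment of group labels
        row = []
        for j in range(p):
            g1 = [x[j] for x, g in zip(data, perm) if g == 1]
            g2 = [x[j] for x, g in zip(data, perm) if g == 2]
            row.append(t_stat(g1, g2, cf))
        null_stats.append(row)
    return null_stats
```

Because only the labels are shuffled, each sample's full spectrum stays intact, which is how the correlation between metabolites is carried into the estimated null distribution.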

#### Analysis based data simulation

Pilot data are simulated from the marginal model

$$p(\mathbf{x}) = \int p(\mathbf{x}|\mathbf{u}, \theta)\, p(\mathbf{u}, \theta)\, d\mathbf{u}\, d\theta,$$

where **x** is the *n* × *p* data matrix, **u** denotes the latent variables, and *θ* is a vector of unknown model parameters. Simulating from the marginal model is achieved by first generating values of the parameters and the latent variables from the prior distribution *p*(**u**, *θ*), and then simulating the data from the assumed model *p*(**x**|**u**, *θ*) given the simulated values of **u** and *θ*.

The practitioner specifies the model *p*(**x**|**u**, *θ*) they intend to use to analyse the data from their metabolomic experiment: either the PPCA, PPCCA or DPPCA model. Simulation of the parameters of these models is based on the model assumptions and on prior expert knowledge of metabolomic data properties. As PPCA is equivalent to the widely used Principal Components Analysis (PCA) method, simulating from the PPCA model is discussed here; details of the simulation of pilot data from the closely related PPCCA and DPPCA models are provided in the Additional file 1. Specifically, PPCA is a probabilistic formulation of PCA based on a Gaussian latent variable model [6, 7]. PPCA models the high dimensional spectrum $\underline{x}_{i}^{T} = (x_{i1}, \dots, x_{ip})$ of subject *i* (*i* = 1,…, *n*, where $n = n_1 + n_2$) as a linear function of the corresponding low dimensional latent variable $\underline{u}_{i}^{T} = (u_{i1}, \dots, u_{iq})$, where *q* ≪ *p*. The PPCA model can be expressed as follows:

$$\underline{x}_{i} = \mathbf{W}\underline{u}_{i} + \underline{\mu} + \underline{\epsilon}_{i},$$

where **W** is a *p* × *q* loadings matrix, $\underline{\mu}$ is a mean vector and $\underline{\epsilon}_{i}$ is multivariate Gaussian noise for observation *i*, i.e. $p(\underline{\epsilon}_{i}) = \text{MVN}_{p}(\underline{0}, \sigma^{2}\mathbf{I})$, where **I** denotes the identity matrix. The latent variable $\underline{u}_{i}$ is also multivariate Gaussian distributed, $p(\underline{u}_{i}) = \text{MVN}_{q}(\underline{0}, \mathbf{I})$. The maximum likelihood estimates of the loadings matrix **W** and the latent variable **u** in the PPCA model are equivalent to the traditional PCA loadings matrix and principal component scores. For a given sample size *n*, pilot data **x** can be simulated from the PPCA model as follows:

1. Generate parameter values from their prior distributions:

   $$\begin{aligned} p(\underline{u}_{i}) &= \text{MVN}_{q}(\underline{0}, \mathbf{I}) \quad \text{for } i = 1, \dots, n,\\ p(\underline{w}_{j}) &= \text{MVN}_{q}(\underline{\mu}_{W}, \mathbf{\Sigma}_{W}) \quad \text{for } j = 1, \dots, p,\\ p(\sigma^{2}) &= \text{IG}(\alpha_{1}, \alpha_{2}). \end{aligned}$$

2. Given the generated model parameters and latent variables, the pilot data **x** are then simulated from the PPCA model:

   $$p(\underline{x}_{i}|\underline{u}_{i}, \mathbf{W}, \sigma^{2}) = \text{MVN}_{p}(\mathbf{W}\underline{u}_{i}, \sigma^{2}\mathbf{I}) \quad \text{for } i = 1, \dots, n.$$
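
The two simulation steps above can be sketched in plain Python. This is an illustrative sketch only: the function name `simulate_ppca`, the choice *q* = 2, and the prior settings $\underline{\mu}_W = \underline{0}$, $\mathbf{\Sigma}_W = \mathbf{I}$ are assumptions made for the example, and the mean vector is taken to be zero, as in the simulation step.

```python
import random


def simulate_ppca(n, p, q=2, alpha1=3.0, alpha2=4.0, seed=0):
    """Simulate an n x p pilot data set from a PPCA-type model.

    Assumes mu_W = 0 and Sigma_W = I for the loadings prior and a
    zero mean vector; q is a hypothetical latent dimension.
    """
    rng = random.Random(seed)
    # Step 1: draw latent scores, loadings and noise variance from priors.
    U = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(n)]
    W = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(p)]
    # If G ~ Gamma(shape=alpha1, scale=1/alpha2), then 1/G ~ IG(alpha1, alpha2).
    sigma2 = 1.0 / rng.gammavariate(alpha1, 1.0 / alpha2)
    sigma = sigma2 ** 0.5
    # Step 2: x_i ~ MVN_p(W u_i, sigma2 * I), simulated coordinate-wise
    # because the noise covariance is diagonal.
    X = [
        [sum(W[j][k] * U[i][k] for k in range(q)) + rng.gauss(0, sigma)
         for j in range(p)]
        for i in range(n)
    ]
    return X, U, W, sigma2
```

For instance, `simulate_ppca(n=10, p=200)` would return a 10 × 200 table of simulated intensities together with the scores, loadings and noise variance that generated it.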

Estimating sample size based on pilot data simulated in this way ensures the estimated sample size is firmly dependent on the type of model being used to analyse the real experimental metabolomic data. Hence, MetSizeR represents an analysis based approach to sample size estimation for metabolomic studies. The specific steps involved in the MetSizeR algorithm are detailed in the next section.

#### The MetSizeR algorithm

The MetSizeR procedure for sample size estimation starts with a number *ntry* of different sample sizes and a user-specified FDR (denoted by *target.fdr*). It then searches for the optimal sample size $\widehat{n}$ by estimating the FDR for each of the *ntry* sample sizes. In order to estimate the FDR for each sample size, the null distribution of the test statistics of all metabolites is estimated and then a shift constant is added to the test statistics of some $p_o$ metabolites to allow them to be truly significant. The null distribution is estimated by calculating the test statistics of the permuted pilot data. After obtaining the critical values of the null distribution, the FDR is estimated. The optimal sample size $\widehat{n}$ is then set to be the sample size with FDR equal to *target.fdr*.

1. Specify the input parameters, which include the desired level of FDR (*target.fdr*), the expected proportion *m* of significant metabolites and the model to be used when analyzing the observed metabolomic data.
2. Simulate pilot data of sample size $n_k$ from the assumed analysis model, where *k* = 1,…, *ntry*. Pilot data simulation from the PPCA model is detailed in the previous Section; the Additional file 1 details pilot data simulation from the PPCCA and DPPCA models.
3. Estimate the null distribution for all metabolites by randomly permuting the group labels of the simulated pilot data and computing the test statistic $TS_{jt}$ for each metabolite *j* and each permuted data set *t*, for *T* permutations.
4. Estimate the FDR for each permuted data set *t* = 1,…, *T*:
   - (a) Consider the corresponding *p*-vector of the test statistics $\underline{TS}_{t} = (TS_{1t}, TS_{2t}, \dots, TS_{pt})$ for all metabolites on permutation *t*.
   - (b) Randomly sample $p_o = m \times p$ of the test statistics $\underline{TS}_{t}$ and add $\frac{\delta}{\varrho_{jt}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ to them. This allows $p_o$ metabolites to be truly significant. Here, *δ* is the effect size, and $\varrho_{jt}$ is the true within group standard deviation, estimated by $\frac{S_{jt}}{\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$.
   - (c) A cut off point *crit* is set to be the $p_o^{\text{th}}$ largest absolute value of the test statistics $\underline{TS}_{t}$. All metabolites with $|TS_{jt}| >$ *crit* are declared as significant. The FDR for permutation *t* can then be calculated.
5. Estimate the FDR for data simulation *s* by taking the 50^{th} percentile of the FDR values of the *T* permutations.
6. Repeat steps 2 to 5 for *s* = 1,…, *SIM* simulations and report the 10^{th}, 50^{th} and 90^{th} percentiles of the FDR values for sample size $n_k$.
7. Repeat steps 2 to 6 for *k* = 1,…, *ntry* different sample sizes and select the optimal sample size $\widehat{n}$ as the $n_k$ with FDR = *target.fdr*.
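
Steps 4 and 5 can be sketched in Python as below. The function names, the argument layout (a *T* × *p* table of permuted statistics and a matching table of standard errors $S_{jt}$) and the convention of returning 0 when nothing is declared significant are assumptions made for this illustration. Note that, by the estimate of $\varrho_{jt}$ given in step 4, the shift added to a selected statistic simplifies to $\delta / S_{jt}$.

```python
import random
from statistics import median


def fdr_one_permutation(ts, se, m, delta, rng):
    """Estimate the FDR for one permutation (steps 4a to 4c)."""
    p = len(ts)
    p0 = max(1, round(m * p))                    # number made truly significant
    truly = set(rng.sample(range(p), p0))        # step 4b: random selection
    shifted = [t + delta / se[j] if j in truly else t
               for j, t in enumerate(ts)]
    # Step 4c: cut-off is the p0-th largest absolute statistic.
    crit = sorted((abs(v) for v in shifted), reverse=True)[p0 - 1]
    declared = [j for j, v in enumerate(shifted) if abs(v) > crit]
    if not declared:                             # assumed convention: FDR = 0
        return 0.0
    false_pos = sum(1 for j in declared if j not in truly)
    return false_pos / len(declared)


def fdr_one_simulation(null_stats, std_errs, m, delta, seed=0):
    """Step 5: median FDR over the T permutations of one simulated data set."""
    rng = random.Random(seed)
    return median(
        fdr_one_permutation(ts, se, m, delta, rng)
        for ts, se in zip(null_stats, std_errs)
    )
```

Repeating `fdr_one_simulation` over *SIM* simulated data sets and over the *ntry* candidate sample sizes would give the percentile curves described in steps 6 and 7.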

The total number of permutations *T* used to estimate the sampling distribution of the test statistics *TS* was chosen to be twenty. The samr R package [3] also uses 20 permutations to estimate the null distribution, and these give accurate estimates of the FDR. Here, the value of the effect size *δ* is chosen based on the variance of the underlying model. The optimal sample size $\widehat{n}$ is estimated by predicting the sample size at *target.fdr* using a simple linear regression model on values of FDR above and below the *target.fdr* with their corresponding sample sizes $n_k$. The sample size estimated by MetSizeR ensures that the power or the confidence level in statistical tests reaches (1 - *target.fdr*).
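
The final prediction step can be illustrated with a short Python sketch. It assumes, for simplicity, that the regression uses only the two candidate sample sizes whose FDR values bracket the target, in which case the fitted line reduces to linear interpolation; the function name and this two-point simplification are assumptions, not details confirmed by the text.

```python
def interpolate_sample_size(sizes, fdrs, target_fdr):
    """Predict the sample size at which the FDR crosses target_fdr.

    sizes : candidate sample sizes n_k
    fdrs  : estimated (median) FDR at each n_k
    Returns a float, or None if no adjacent pair brackets the target.
    """
    pairs = sorted(zip(sizes, fdrs))
    for (n_lo, f_lo), (n_hi, f_hi) in zip(pairs, pairs[1:]):
        # Look for an FDR just above and just below the target.
        if f_lo >= target_fdr >= f_hi:
            if f_lo == f_hi:
                return float(n_lo)
            slope = (n_hi - n_lo) / (f_lo - f_hi)
            return n_lo + (f_lo - target_fdr) * slope
    return None
```

For example, with FDR estimates of 0.10 at *n* = 12 and 0.04 at *n* = 16, a target of 0.05 gives a predicted sample size of about 15.3, which in practice would be rounded up to a whole number of subjects.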

#### Parameter specification: details and guidelines

The MetSizeR algorithm requires the specification of several parameters; some are parameters relevant to the intended analysis model, and some are parameters relevant to the sample size estimation procedure itself.

In the MetSizeR GUI, the user is asked to specify the parameters specific to the sample size estimation procedure, i.e. the number of bins in the NMR or MS spectrum, the expected proportion of significant bins, the target FDR and the minimum sample size they wish to be considered. The default settings of these parameters are 200 spectral bins, 20% significantly different bins, a target FDR of 5% and a minimum sample size of 4. The choice of the number and proportion of significantly different spectral bins will naturally be informed by the metabolomic practitioner’s knowledge, as will the minimum sample size choice. The target FDR again depends on the conservatism of the metabolomic practitioner and/or the research question of interest, but an FDR of 5% is indicative of typical statistical practice. The user can easily re-run the MetSizeR algorithm for different settings of these parameters to ascertain the influence of their particular specifications. In addition, within the MetSizeR GUI the user has the option of requesting plots of the expected proportion of significant bins versus the FDR, over different sample sizes, giving insight into the effect of this particularly influential parameter on sample size estimation.

Regarding the specification of parameters relevant to the intended analysis model, in the MetSizeR GUI the user is only required to specify the intended analysis model (PPCA, PPCCA or DPPCA) and, in the case of PPCCA, the number of covariates to be included. Both of these decisions are again practitioner informed, depending on the particular experiment under consideration. The MetSizeR manual, available through the developed MetSizeR GUI, guides the user through these parameter specification steps using a number of illustrative examples.

The remaining parameters in the MetSizeR algorithm have been fixed within the **R** code underlying the MetSizeR GUI but, given the open source nature of **R**, these can be changed by the user if desired. In the context of the PPCA model discussed above, the hyperparameters of the prior distributions of the loadings matrix **W** and the variance $\sigma^{2}$ are based on previous estimates of **W** and $\sigma^{2}$ from applications of PPCA to metabolomic data (e.g. [7, 8]). Each row of the loadings matrix **W** is simulated from a standard multivariate Gaussian distribution $\text{MVN}_{q}(\underline{0}, \mathbf{I})$ and the noise variance $\sigma^{2}$ is simulated from an inverse gamma distribution with shape parameter $\alpha_{1} = 3$ and scale parameter $\alpha_{2} = 4$. Hyperparameter settings for the PPCCA and DPPCA models are detailed in the Additional file 1. Within the MetSizeR algorithm four final parameters are specified: the effect size *δ* (fixed at 2.3, the 99th percentile of the assumed prior distribution of the loadings), the correction factor *cf* (fixed as the fifth percentile of the estimated standard errors of all metabolites), the number of permutations *T* (set to 20) and the number of simulations *SIM* (set at 20). These specifications are based on the choices in [3, 5, 14] in similar sample size estimation settings.
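
Two of these fixed settings can be checked numerically with a few lines of Python. This is only a sanity-check sketch: the helper names `effect_size` and `draw_noise_variance` are hypothetical, and the inverse gamma draw relies on the standard fact that the reciprocal of a gamma variate with shape $\alpha_1$ and rate $\alpha_2$ is inverse gamma with shape $\alpha_1$ and scale $\alpha_2$.

```python
import random
from statistics import NormalDist


def effect_size(quantile=0.99):
    """Quantile of the standard Gaussian prior placed on each loading."""
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(quantile)


def draw_noise_variance(alpha1=3.0, alpha2=4.0, rng=random):
    """Draw sigma^2 from an inverse gamma prior IG(alpha1, alpha2).

    random.gammavariate takes (shape, scale), so scale = 1 / alpha2
    corresponds to rate alpha2.
    """
    return 1.0 / rng.gammavariate(alpha1, 1.0 / alpha2)
```

Here `effect_size()` evaluates to roughly 2.33, in line with the value of 2.3 quoted above, and the prior mean of $\sigma^{2}$ under these settings is $\alpha_2 / (\alpha_1 - 1) = 2$.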

## Results

This section illustrates the application of MetSizeR to different metabolomic experimental settings. In the first section, MetSizeR is employed to estimate sample size in the setting where experimental pilot data are not available; the second section considers the case where experimental pilot data are available.

### Sample size estimation using simulated pilot data

$n_1 = 5$ and $n_2 = 5$). All other MetSizeR parameters are set at their default values, as detailed in the previous section. The MetSizeR method was then applied, and the 10^{th}, 50^{th} and 90^{th} percentiles of the FDR were calculated across a range of sample sizes and are shown in Figure 1. The sample size at which the target FDR of 5% was achieved was estimated to be 30, with 15 in each treatment group, as shown in Figure 1(A).

The expected proportion of significant spectral bins specified by the user affects the estimated number of samples required for the metabolomic experiment. Figures 1(B), 1(C) and 1(D) demonstrate the effect on FDR of varying the expected proportion of significant spectral bins for three different sample sizes. The figures show that increasing the expected proportion of significant spectral bins reduces the FDR.

Figure 2(B) illustrates a third example of the setting where no experimental pilot data are available and the practitioner aims to conduct a longitudinal metabolomic experiment. The pilot data for this example are simulated from the DPPCA model; the data are simulated by only focusing on the first time point of the experiment as it is expected that the same number of subjects will be followed over time and that, while there may be dropouts, the largest number of subjects will be present at the first time point. Figure 2(B) shows that the expected number of samples required for a longitudinal study of 300 spectral bins with 20% significant bins and a target FDR of 5%, is 24 with 12 samples from each treatment group.

### Sample size estimation with experimental pilot data

In a situation where experimental pilot data are available, parameter estimates used for simulations are based on fitting the underlying model to the experimental pilot data. Here, the application of MetSizeR is illustrated using real metabolomic data sets as experimental pilot data.

The approach developed here for sample size estimation is not limited to NMR data. The method has been developed to accept data from targeted metabolomic analysis using MS, thus ensuring its applicability across the metabolomics community. Setting MetSizeR specifications as in the previous examples, the PPCA model was fitted to a targeted metabolomic MS pilot data set; the sample size estimated under the MetSizeR algorithm is shown in Figure 3(B).

## Discussion and conclusions

Determining sample sizes in metabolomics is important but, owing to the complexity of these experiments, there are currently no standard methods for sample size estimation in metabolomics. Moreover, since pilot studies are rarely done in metabolomics, sample size estimation approaches for high dimensional data studies that require experimental pilot data cannot be applied.

The method presented in this article is a straightforward approach for determining sample sizes for metabolomic experiments whilst controlling the FDR. The main advantage of the developed approach is its ability to determine sample size even when experimental pilot data are not available. Another key advantage is that it takes into consideration the type of analysis the researcher intends to use when estimating sample size, and this can improve the power of the study. Also, since MetSizeR employs permutation techniques to estimate sample size, it accounts for correlation between metabolites and effect size variability. The method has been developed to accept both NMR and targeted MS data, which will ensure wide applicability in the metabolomics community. Further, a software package facilitates easy implementation of the MetSizeR approach.

Areas of future work are multiple and varied. MetSizeR is currently designed to estimate the number of samples required for metabolomic experiments which involve two groups; modifications to the MetSizeR approach are possible to accommodate different metabolomic experimental designs. Alternatives to the permutation approach employed in MetSizeR could be examined; bootstrap sampling would provide an interesting alternative. Proof of concept metabolomic experiments are currently underway to validate the MetSizeR approach.

## Availability and requirements

The package MetSizeR has been developed for the **R** statistical environment (http://www.r-project.org) and is freely available at http://cran.r-project.org. The package is accompanied by documentation files to facilitate its use.

**Project name:** MetSizeR

**Project home page:** http://cran.r-project.org/web/packages/MetSizeR/

**Operating system(s):** Platform independent.

**Programming language:** **R** platform.

**Other requirements:** None.

**License:** GPL (≥ 2)

**Any restrictions to use:** It is available for free download.

## Authors’ information

Gift Nyamundanda is a PhD candidate in the PhD in Bioinformatics and Systems Biology programme in University College Dublin. Dr. Isobel Claire Gormley is a lecturer in Statistics in the School of Mathematical Sciences, University College Dublin. Dr. Yue Fan is a Postdoctoral Researcher in the School of Biomolecular and Biomedical Science in University College Dublin. Prof. William Gallagher is an Associate Professor in the School of Biomolecular and Biomedical Science in University College Dublin. Dr. Lorraine Brennan is a lecturer in Nutritional Biochemistry in the School of Agriculture and Food Science, Conway Institute, University College Dublin.

## Declarations

### Acknowledgements

This research was supported by the Irish Research Council for Science Engineering and Technology (IRCSET) funded PhD Programme in Bioinformatics and Systems Biology in University College Dublin, Ireland (http://bioinformatics.ucd.ie/PhD/) and by HRB Ireland (RP/2006/117).


## References

1. Berk M, Ebbels T, Montana G: A statistical framework for biomarker discovery in metabolomic time course data. Bioinformatics. 2011, 27 (14): 1979-1985. 10.1093/bioinformatics/btr289.
2. Muller P, Parmigiani G, Robert C, Rousseau J: Optimal sample size for multiple testing. J Am Stat Assoc. 2004, 99 (468): 990-100. 10.1198/016214504000001646.
3. Tibshirani R: A simple method for assessing sample sizes in microarray experiments. BMC Bioinformatics. 2006, 7: 106. 10.1186/1471-2105-7-106.
4. Liu P, Hwang JTG: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics. 2007, 23 (6): 739-746. 10.1093/bioinformatics/btl664.
5. Lin WJ, Hsueh HM, Chen JJ: Power and sample size estimation in microarray studies. BMC Bioinformatics. 2010, 11: 48. 10.1186/1471-2105-11-48.
6. Tipping ME, Bishop CM: Probabilistic principal component analysis. J R Stat Soc: Series B (Stat Method). 1999, 61 (3): 611-622. 10.1111/1467-9868.00196.
7. Nyamundanda G, Gormley IC, Brennan L: Probabilistic principal component analysis for metabolomic data. BMC Bioinformatics. 2010, 11: 571. 10.1186/1471-2105-11-571.
8. Nyamundanda G, Gormley IC, Brennan L: A dynamic probabilistic principal components model for the analysis of longitudinal metabolomics data. J R Stat Soc Series C (Appl Stat). (To Appear)
9. R Development Core Team: R: a language and environment for statistical computing. 2009, Vienna, Austria: R Foundation for Statistical Computing, http://www.R-project.org.
10. Reo NV: Metabonomics based on NMR spectroscopy. Drug and Chem Toxicol. 2002, 25 (4): 375-382. 10.1081/DCT-120014789.
11. Dettmer K, Aronov PA, Hammock BD: Mass spectrometry-based metabolomics. Mass Spectrometry Rev. 2007, 26: 51-78. 10.1002/mas.20108.
12. Patti GJ, Yanes O, Siuzdak G: Innovation: Metabolomics: the apogee of the omics trilogy. Nat Rev Mol Cell Biol. 2012, 13 (4): 263-269. 10.1038/nrm3314.
13. Benjamini Y, Hochberg Y: Controlling false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995, 57: 289-300.
14. Hwang D, Schmitt WA, Stephanopoulos G, Stephanopoulos G: Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics. 2002, 18: 1184-1193. 10.1093/bioinformatics/18.9.1184.
15. Carmody S, Brennan L: Effects of pentylenetetrazole-induced seizures on metabolomic profiles of rat brain. Neurochem Int. 2010, 56 (2): 340-344. 10.1016/j.neuint.2009.11.004.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.