A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNAseq experiments
 Mikel Esnaola^{1, 6},
 Pedro Puig^{2},
 David Gonzalez^{3},
 Robert Castelo^{4, 5}Email author and
 Juan R Gonzalez^{1, 2, 5, 6}Email author
DOI: 10.1186/1471210514254
© Esnaola et al.; licensee BioMed Central Ltd. 2013
Received: 1 June 2013
Accepted: 14 August 2013
Published: 21 August 2013
Abstract
Background
Highthroughput RNA sequencing (RNAseq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNAseq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two wellknown count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNAseq data sets is not trivial for the nonexpert bioinformatics user.
Results
Here we show that expression profiles produced by extensivelyreplicated RNAseq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as PoissonInverse Gaussian or PólyaAeppli, which can be captured by a more general family of count data distributions called the PoissonTweedie. The flexibility of the PoissonTweedie family enables a direct fitting of emerging features of large expression profiles, such as heavytails or zeroinflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the PoissonTweedie family. Using simulations on synthetic and real RNAseq data we show that tweeDEseq yields Pvalues that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sexspecific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNAseq data set with improved performance and reproducibility over the previously compared methodologies. Finally, we compared the results with those obtained from microarrays in order to check for reproducibility.
Conclusions
RNAseq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNAseq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.org.
Background
Highthroughput gene expression profiling across samples constitutes one of the primary tools for characterizing phenotypes at molecular level. One of the main advantages of the rapidly evolving massive scale cDNA sequencing assay for this purpose (RNAseq [1]), over the hybridizationbased microarray technology, is a much larger dynamic range of detection. However, the extent to which this feature is fully exploited depends entirely on the way the resulting data is analyzed when addressing a particular biological question. For instance, in the identification of genes that significantly change their expression levels between groups of samples, also known as differential expression (DE).
For DE analysis, after some preprocessing steps that include the alignment of the sequenced reads to a reference genome and their summarization into features of interest (e.g., genes), raw RNAseq data is transformed into an initial table of counts. This table should then be normalized [24] in order to adjust for both technical variability and the expression properties of the samples, such that the estimated normalization factors and offsets applied to the RNAseq count data describe as accurately as possible the relative number of copies of each feature throughout every sample. As opposed to the continuous nature of logscale fluorescence units in microarray data, RNAseq expression levels are defined by discrete count data, and therefore, require specific DE analysis techniques.
In this paper we propose to approach this problem by using other count data distributions that fit expression profiles better than the NB without the need to alter configuration parameters. The rest of the paper is organized as follows. Using a large RNAseq data set of HapMap lymphoblastoid cell lines (LCLs) derived from n=69 unrelated Nigerian (YRI) individuals [12], we start by assessing the goodness of fit of extensively replicated expression profiles to the NB distribution, showing a lack of fit for an important fraction of genes. We illustrate how a more flexible family of countdata probability distributions, called the PoissonTweedie, provides a better fit to these expression profiles. We provide data supporting the hypothesis that the lack of fit to NB distributions may be related to the dynamics of gene expression unveiled by RNAseq technology. We then introduce a new test for differential expression analysis in RNAseq data based on the PoissonTweedie family of distributions. We demonstrate with simulations on synthetic and real RNAseq data how a single run of our approach provides Pvalues that are equally or more accurate than NBbased competing methods calibrated with a variety of configuration parameters. Finally, by surveying the tiny fraction of sexspecific gene expression changes in LCL samples, we approach the problem of assessing accuracy in DE analysis with real RNAseq data and show that, in the context of extensively replicated RNAseq experiments, tweeDEseq yields better performance than competing NBbased methods without the need to make an informed decision on the configuration of parameters. This improvement is reported in terms of precision and recall of DE genes and reproducibility of the significance levels with respect to matching microarray experiments.
Results and discussion
The statistical methods proposed in this paper are implemented in a package for the statistical software R, called tweeDEseq which forms part of the Bioconductor project [13] at http://www.bioconductor.org. We have also created an experimental data package, called tweeDEseqCountData, which contains the previously described data set and is also available at the same URL. All results presented in the paper were obtained using these and other packages from R version 2.15.1 and Bioconductor version 2.11, and can be reproduced through the scripts available as Additional file 1 to this article.
Review of competing methods
There is currently a large body of literature on DE analysis methods for RNAseq data [511, 14], nearly all of them based on the NB distribution and developed to deliver their best performance with few replicates. Anders et al. (2010) [7] argued that for large number of individuals “... questions of data distribution could be avoided by using nonparametric methods, such as rankbased permutation tests”. However, rankbased methods require similar count data distributions between sample groups. Due to the large variability across groups [15] captured by RNAseq data, this assumption will most likely be broken in this context. For example, panels ef in Figure 1 illustrate the case of gene NLGN4Y (ENSG00000165246), a gene located in the malespecific region of chromosome Y and reported to have sexspecific expression, which shows remarkably different count data distributions between male and female samples. Permutation tests are also underpowered since distribution tails are not well estimated (due to the large dynamic range), which is important when correcting for multiple testing.
Methods and parameter configurations compared in this paper
Key  Software  Configuration parameters 

DESeqPgO  DESEq  method=~pooled~, sharingMode=~percondition~ 
DESeqPmax  DESEq  method=~pooled~, sharingMode=~maximum~ 
DESeqCgO  DESEq  method=~percondition~, sharingMode=~percondition~ 
DESeqCmax  DESEq  method=~percondition~, sharingMode=~maximum~ 
edgeRdf20  edgeR  common/trended/tagwise moderation regime with prior.df=20 (default) 
edgeRdf1  edgeR  common/trended/tagwise moderation regime with prior.df=1 
edgeRqlfDf20  edgeR  common/trended/tagwise moderation regime with prior.df=20 (default) and quasilikelihood Ftests 
edgeRqlfDf1  edgeR  common/trended/tagwise moderation regime with prior.df=1 and quasilikelihood Ftests 
Different gene expression dynamics require different distributional assumptions on count data
This result suggests that NB distributions may be too restrictive for an important fraction of expression profiles in large RNAseq data sets. Among the possible causes underlying the lack of fit of those genes to an NB distribution, a clear candidate is that the presence of many samples can potentially introduce features such as zeroinflation or heavytails (see Figure 1). So far, extensive biological replication in RNAseq experiments has been the exception rather than the rule. However, it is becoming increasingly clear [15] that in the coming years larger RNAseq data sets will be required to justify scientific conclusions and provide reproducible results. Therefore, we can expect to see more often gene expression profiles with emerging features, such as zeroinflation and heavy tails, that challenge RNAseq methods based on the NB distribution.
We propose to address this problem by adopting the PoissonTweedie (PT) family of distributions [16] to model RNAseq count data directly. PT distributions are described by a mean (μ), a dispersion (ϕ) and a shape (a) parameter (see Methods) and include Poisson and NB distributions, among others, as particular cases [16]. An important feature of this family is that, while the NB distribution only allows a quadratic meanvariance relationship, the PT distributions generalizes this relationship to any order [17]. We have implemented a maximum likelihood procedure for the estimation and simulation of these parameters from count data. These procedures are available in the tweeDEseq package through the functions mlePoissonTweedie(), dPT() and rPT().
Figure 1 illustrates the flexibility of the PT distribution to accurately fit different gene expression profiles obtained from the unnormalized LCL RNAseq data set. Left and right panels correspond to female and male samples, respectively and each row corresponds to a different gene: EEF1A2 (ENSG00000101210), SCT (ENSG00000070031) and NLGN4Y (ENSG00000165246), respectively. Among these three genes, only NLGN4Y has been reported in the literature to have sex specific expresssion, while the other two are likely to lack such property since EEF1A2 is a housekeeping gene and SCT is an endocrine hormone peptide in chromosome 11 that controls secretions in the duodenum. Each plot shows the empirical cumulative distribution of observed counts as well as the parametric cumulative distributions obtained through the estimation of parameters of the methods compared in this paper under different configurations. Note that the estimated dispersion parameter ϕ is identical between the two sample groups for edgeR and DESeq (pooled) as these approaches estimate ϕ irrespective from the sample groups. The Pvalue for testing whether the data follow an NB distribution (H_{0}:a=0), indicated above the legend, reveals that in several sample groups (panels ac, e) this hypothesis is rejected (P<0.05). In those cases, methods based on the NB distribution produce dispersion parameters that do not fit the data as accurately as the PT distribution. More concretely, heavytails present in panels a,c severely hamper the estimation of the pure NB and the common dispersion. These can be improved using a parameter configuration more suited to large sample sizes. However, this results in a poor estimate of zeroinflation in panels ce.
In fact, Fisher’s exact tests for enrichment of nonNB genes among human housekeeping genes are significant (P<1.24^{−6}) for every normalization method (see Additional file 2: Table S1). These observations suggest that genes with different expression dynamics can produce different count data distributions, and underscore the flexibility of the PT statistical model to capture these dynamics revealed by extensivelyreplicated RNAseq experiments.
Accurately testing differential expression
For the purpose of a DE analysis between two groups of samples, we have developed a twosample PTtest for differences in means (see Methods) implemented through the function tweeDE() in the tweeDEseq package. We will assess the accuracy of this PTbased test using the LCL data as well as synthetic count data from two different simulation studies. The first simulation study with synthetic data provides an assessment of the typeI error rate under four different scenarios involving distinct count data distributions between sample groups (see Additional file 2: Table S2 for a description of them). Here we compare tweeDEseq with the configurations of edgeR and DESeq which are closer to a straightforward NB model. Additional file 2: Figures S2 to S5 show that tweeDEseq properly controls the nominal probability of a typeI error while edgeR, DESeq and nonparametric tests (U MannWithney and permutation) fail to do so when data are not simulated from NB distributions. As expected, these methods perform correctly when data are generated under an NB model (see Additional file 2: Figure S5) as expected. Additional file 2: Figure S6 also shows that in the calculation of very low Pvalues, tweeDEseq clearly outperforms permutations tests. In order to provide a practical recommendation on the minimum sample size required by tweeDEseq to yield accurate results we have estimated the probability of a typeI error across different sample sizes. Additional file 2: Figure S7 indicates that 15 samples per group should be sufficient for tweeDEseq to correctly control for a nominal significance level α=0.05.
In the second simulation study we have first assessed the accuracy of the Pvalue distribution under the null hypothesis of no differential expression with real RNAseq data by making repeatedly twosample group comparisons within males and within females samples such that we recreate the null hypothesis of no DE with real RNAseq data and no DE gene should be expected to be found. The raw Pvalue distributions from such analysis should ideally be uniform.
The method introduced in this paper, tweeDEseq, is consistently closer to the diagonal than every other method throughout the two male and female comparisons and the two normalization methods. More informally, the visual inspection of the histograms of raw Pvalues given in Additional file 2: Figure S8 also reveals that tweeDEseq provides Pvalue distributions closer to the uniform under the null hypothesis of no DE simulated from extensively replicated real RNAseq data.
As other authors have shown, in the context of analysis of RNAseq data with very limited sample size [8], small deviations from uniformity of Pvalues under the null hypothesis can substantially affect FDR estimates of DE genes. We have also assessed the calibration of Pvalues and false discovery rates (FDR) with synthetic count data of similar dimensions to the RNAseq LCL data set, concretely with p=20,000 genes and n=70 samples. Working with this type of data allows to assess FDR estimates for a known subset of DE genes under a variety of simulated scenarios, which we defined by considering the combination of three different amounts of DE genes (100, 1000 and 2000) and three different magnitudes of foldchange (1.5, 2 and 4fold). Similarly to [8], data were simulated from a hierarchical gammaPoisson model with and without simulated library factors (see Methods).
Mean squared error of false discovery rates under constant library factors
#DE  Rank  1.5fold change  2fold change  4fold change  

MSE  Method  MSE  Method  MSE  Method  
1  0.050  DESeq  Pmax  0.050  DESeq  Pmax  0.031  tweeDEseq  
2  0.122  tweeDEseq  0.053  tweeDEseq  0.037  DESeq  Cmax  
3  0.155  DESeq  Cmax  0.065  DESeq  Cmax  0.070  DESeq  Pmax  
4  0.257  DESeq  PgOn  0.104  DESeq  PgOn  0.072  DESeq  PgOn  
100  5  0.306  edgeR  QLF Df1  0.138  edgeR  QLF Df1  0.430  edgeR  QLF Df1 
6  0.313  edgeR  QLF (def)  0.177  edgeR  QLF (def)  0.452  edgeR  QLF (def)  
7  0.558  edgeR  (def)  0.336  edgeR  (def)  0.790  edgeR  (def)  
8  0.755  edgeR  Df1  0.431  edgeR  Df1  0.957  edgeR  Df1  
9  9.688  DESeq  CgOn  6.232  DESeq  CgOn  5.133  DESeq  CgOn  
1  0.008  tweeDEseq  0.004  tweeDEseq  0.004  tweeDEseq  
2  0.008  DESeq  Cmax  0.008  DESeq  PgOn  0.007  DESeq  PgOn  
3  0.016  DESeq  PgOn  0.015  DESeq  Cmax  0.014  DESeq  Cmax  
4  0.043  edgeR  QLF Df1  0.087  DESeq  Pmax  0.081  DESeq  Pmax  
1000  5  0.045  edgeR  QLF (def)  0.413  edgeR  QLF (def)  0.429  DESeq  CgOn 
6  0.082  DESeq  Pmax  0.459  edgeR  QLF Df1  21.358  edgeR  QLF (def)  
7  0.105  edgeR  (def)  0.532  DESeq  CgOn  22.208  edgeR  (def)  
8  0.155  edgeR  Df1  0.639  edgeR  (def)  23.401  edgeR  QLF Df1  
9  0.735  DESeq  CgOn  0.835  edgeR  Df1  25.004  edgeR  Df1  
1  0.002  DESeq  PgOn  0.001  DESeq  PgOn  0.000  DESeq  PgOn  
2  0.002  tweeDEseq  0.001  tweeDEseq  0.001  tweeDEseq  
3  0.025  DESeq  Cmax  0.031  DESeq  Cmax  0.036  DESeq  Cmax  
4  0.053  edgeR  QLF (def)  0.090  DESeq  Pmax  0.093  DESeq  Pmax  
2000  5  0.056  edgeR  QLF Df1  0.183  DESeq  CgOn  0.140  DESeq  CgOn 
6  0.093  DESeq  Pmax  1.444  edgeR  QLF (def)  34.551  edgeR  QLF (def)  
7  0.113  edgeR  (def)  1.702  edgeR  QLF Df1  35.365  edgeR  (def)  
8  0.169  edgeR  Df1  1.724  edgeR  (def)  35.468  edgeR  QLF Df1  
9  0.271  DESeq  CgOn  2.219  edgeR  Df1  36.929  edgeR  Df1 
Mean squared error of false discovery rates under variable library factors
#DE  Rank  1.5fold change  2fold change  4fold change  

MSE  Method  MSE  Method  MSE  Method  
1  0.030  DESeq  Pmax  0.059  DESeq  Cmax  0.046  tweeDEseq  
2  0.099  tweeDEseq  0.064  tweeDEseq  0.055  DESeq  Pmax  
3  0.189  DESeq  Cmax  0.072  DESeq  Pmax  0.057  DESeq  Cmax  
4  0.194  DESeq  PgOn  0.116  DESeq  PgOn  0.106  DESeq  PgOn  
100  5  0.258  edgeR  QLF Df1  0.129  edgeR  QLF Df1  0.124  edgeR  QLF Df1 
6  0.348  edgeR  QLF (def)  0.129  edgeR  QLF (def)  0.153  edgeR  QLF (def)  
7  0.581  edgeR  (def)  0.273  edgeR  (def)  0.290  edgeR  (def)  
8  0.667  edgeR  Df1  0.420  edgeR  Df1  0.380  edgeR  Df1  
9  8.882  DESeq  CgOn  6.429  DESeq  CgOn  5.217  DESeq  CgOn  
1  0.005  tweeDEseq  0.006  tweeDEseq  0.008  tweeDEseq  
2  0.009  DESeq  Cmax  0.012  DESeq  Cmax  0.010  DESeq  Cmax  
3  0.013  DESeq  PgOn  0.012  DESeq  PgOn  0.011  edgeR  QLF Df1  
4  0.016  edgeR  QLF Df1  0.016  edgeR  QLF Df1  0.013  edgeR  QLF (def)  
1000  5  0.019  edgeR  QLF (def)  0.017  edgeR  QLF (def)  0.024  DESeq  PgOn 
6  0.054  edgeR  (def)  0.051  edgeR  (def)  0.045  edgeR (def)  
7  0.083  DESeq  Pmax  0.082  DESeq  Pmax  0.067  DESeq  Pmax  
8  0.087  edgeR  Df1  0.083  edgeR  Df1  0.077  edgeR  Df1  
9  0.700  DESeq  CgOn  0.545  DESeq  CgOn  0.529  DESeq  CgOn  
1  0.003  tweeDEseq  0.004  tweeDEseq  0.005  DESeq  Cmax  
2  0.003  DESeq  PgOn  0.006  edgeR  QLF Df1  0.017  edgeR  QLF (def)  
3  0.006  edgeR  QLF Df1  0.006  edgeR  QLF (def)  0.018  edgeR  QLF Df1  
4  0.007  edgeR  QLF (def)  0.007  DESeq  PgOn  0.023  tweeDEseq  
2000  5  0.025  DESeq  Cmax  0.024  DESeq  Cmax  0.029  DESeq  Pmax 
6  0.028  edgeR  (def)  0.026  edgeR  (def)  0.053  edgeR  (def)  
7  0.049  edgeR  Df1  0.047  edgeR  Df1  0.088  edgeR  Df1  
8  0.091  DESeq  Pmax  0.077  DESeq  Pmax  0.092  DESeq  PgOn  
9  0.267  DESeq  CgOn  0.238  DESeq  CgOn  0.465  DESeq  CgOn 
Rankings of methods by the mean squared error of false discovery rates under constant library factors
Method  #DE = 100  #DE = 1000  #DE = 2000  

1.5 FC  2 FC  4 FC  1.5 FC  2 FC  4 FC  1.5 FC  2 FC  4 FC  
tweeDEseq  2  2  1  1  1  1  2  2  2 
DESeq  PgOn  4  4  4  3  2  2  1  1  1 
DESeq  Pmax  1  1  3  6  4  4  6  4  4 
DESeq  CgOn  9  9  9  9  7  5  9  5  5 
DESeq  Cmax  3  3  2  2  3  3  3  3  3 
edgeR  (def)  7  7  7  7  8  7  7  8  7 
edgeR  QLF (def)  6  6  6  5  5  6  4  6  6 
edgeR  Df1  8  8  8  8  9  9  8  9  9 
edgeR  QLF Df1  5  5  5  4  6  8  5  7  8 
Rankings of methods by the mean squared error of false discovery rates under variable library factors
Method  #DE = 100  #DE = 1000  #DE = 2000  

1.5 FC  2 FC  4 FC  1.5 FC  2 FC  4 FC  1.5 FC  2 FC  4 FC  
tweeDEseq  2  2  1  1  1  1  1  1  4 
DESeq  PgOn  4  4  4  3  3  5  2  4  8 
DESeq  Pmax  1  3  2  7  7  7  8  8  5 
DESeq  CgOn  9  9  9  9  9  9  9  9  9 
DESeq  Cmax  3  1  3  2  2  2  5  5  1 
edgeR  (def)  7  7  7  6  6  6  6  6  6 
edgeR  QLF (def)  6  6  6  5  5  4  4  3  2 
edgeR  Df1  8  8  8  8  8  8  7  7  7 
edgeR  QLF Df1  5  5  5  4  4  3  3  2  3 
Identification of sexspecific gene expression in lymphoblastoid cell lines
Assessing performance of DE analysis methods without using simulated data is a challenging problem due to the difficulty of knowing or ensuring the exact differential concentration of RNA molecules in the analysed samples. In this respect, sexspecific expression constitutes a useful system to assess the accuracy of DE detection methods due to the vast literature on genes contributing to genderspecific traits. For this reason, in order to illustrate the accuracy of tweeDEseq with real RNAseq data, we have searched for genes changing significantly their expression between female and male individuals of the RNAseq experiments on LCLs analyzed in this paper. Again, we have compared different normalization procedures and parameter configurations of edgeR and DESeq. Next to considering the raw unnormalized data and the data normalized with cqn, TMM normalization was used for edgeR and tweeDEseq, while DESeq was used with its own normalization method. We have used a single significance cutoff of FDR <0.1 at which genes were called DE. Since LCLs come from a nonsexually dimorphic tissue and are outside their original biological context, the fraction of sexspecific expression changes we could expect should be rather small.
In an attempt to verify the accuracy of these lists of DE genes between female and male individuals, we searched for genes reported in the literature to be potential contributors to sexually dimorphic traits. This list of genes with documented sexspecific expression was obtained from genes in chromosome X that escape Xinactivation [22] and from genes in the malespecific region of the Y chromosome [23] (see Methods). This resulted in a goldstandard set of 95 genes mapping to Ensembl Gene Identifiers (release 63), which we shall denote by XiE and MSY genes, depending on their origin. For every predicted set of DE genes by each combination of DE detection method and normalized data set, we calculated precision and recall with respect to the goldstandard, and the Fmeasure which summarizes the tradeoff between these two diagnostics.
In Additional file 2: Table S3 we report the complete list of 55 DE genes detected by tweeDEseq from the data normalized with cqn, which is when it yields the best precisionrecall tradeoff. More than a half of genes in this list (32) are located in either the X or Y chromosomes and where the first 10 with largest foldchange contain 7 from the goldstandard set of MSY and XiE genes. Among the other 3, we find TTTY15, a testisspecific noncoding transcript from the Y chromosome and the other two lack functional annotation in Ensembl release 63.
Reproducibility with respect to microarray data
The YRI LCL samples we have analyzed have been previously assayed using microarray chips [24] and this enables a comparison between the gene expression read out of both technologies. In particular, we wanted to assess the degree of reproducibility of the significance levels of DE. While there may be many aspects from both technologies that can potentially bound the extent to which we can reproduce rankings of DE, we postulate that more accurate Pvalues in DE genes should lead to higher reproducibility of significance levels of DE genes.
With this purpose, we applied limma[25] on the microarray data and called genes DE at 10% FDR, just as we did with RNAseq data, and then compared the − log10 units of the raw Pvalues from DE genes called in RNAseq by each DE detection method to the − log10Pvalue units from genes called DE by limma. In Additional file 2: Figure S12 we show this comparison for every gene that is called DE either by limma in microarray data or by the other compared method in RNAseq data. Although we can observe a significant linear relationship between Pvalues in every compared method, the low fraction of variability explained by the fitted linear model (R^{2}<0.25) in every of those comparisons indicates a rather poor level of reproducibility for every method.
A closer look to genes in that figure indicates that the lack of reproducibility mostly comes from genes called DE by one method and technology but not by the other (dots close to zero in either the x or the yaxis). There may be many reasons, unrelated to the DE detection method itself, why a gene is not called simultaneously DE in two completely independent RNAseq and microarray experiments on the same biological material, such as different isoforms being probed in the microarray and summarized in RNAseq or differences in sample preparation. Therefore, for our current goal of assessing reproducibility of DE detection methods, we believe it makes sense to restrict this comparison to those genes that are called DE by both, limma in microrray data and the corresponding method in RNAseq data.
Conclusions
The increased amount of biological variability revealed by extensive replication in RNAseq experiments brings new challenges to the task of identifying genes whose change in expression is both, biologically and statistically significant. In microarray data, large foldchanges derived from large data sets were nearly synonymous of statistical significance. The volcano plots in Figure 12 and the examples of specific genes in Figure 1 illustrate why this is not true anymore with RNAseq count data. Those figures unveil that one of these new challenges is to distinguish statistically significant changes among those that are already large in magnitude. In this paper we provide an approach to this problem by using the PT family of distributions, showing that it captures a much richer diversity of expression dynamics in RNAseq count data than the statistical models based in the NB distributions alone (see Figures 4 and 5). We have implemented a twosample PTtest in a software package for R, called tweeDEseq, for detecting DE genes and demonstrated with simulations that produces more accurate Pvalue distributions that lead to better calibrated qvalues and FDR estimates.
We have made an attempt to assess DE detection accuracy with real RNAseq data by comparing male and female LCL samples normalized with three different methods and comparing the results to a goldstandard set of genes with documented sexspecific expression. This assessment also shows that tweeDEseq provides a better precisionrecall tradeoff than the compared NBbased methods (see Figure 10 and Additional file 2: Figure S11). We have also made a comparison with matching samples hybridised on microarray chips which allowed us to verify that tweeDEseq yields a higher degree of reproducibility of significance levels with respect to microrray data.
All these different comparative assessments have been performed against two of the most widely currently used methods for DE analysis of RNAseq data, edgeR and DESeq, under four different parameter configurations each, since their default parametrisation is tailored towards very limited sample size. Making an informed decision on what is the most appropriate setup is not trivial for the nonexpert user and, for this reason, it is important to underscore that tweeDEseq is competitive with all of the methodologies that follow from the different configurations of edgeR and DESeq without the need to set a single parameter.
The fact that the volcano plots from tweeDEseq and limma, derived from RNAseq and microarray data, reveal that limma is able to find a larger number of DE genes from the goldstandard, suggests a long way still ahead of us to fully exploit the RNAseq technology for DE. Not only regarding experimental aspects, but also statistical ones such as properly detecting and adjusting for unwanted sources of nonbiological variability, for which there is currently no wellestablished available techniques, as in the case of microarray data.
Other applications of highthroughput sequencing technology that output counts of molecules, like in Copy Number Variation (CNV) analysis, could potentially benefit of models based on the PTdistribution. It is our perception that richer count data models of this kind will become increasingly necessary to draw accurate conclusions from data as technology brings us closer the actual biology of the cell.
Methods
Preprocessing of RNAseq data
We have analyzed data from Pickrell et al. (2010) [12] that sequenced RNA from LCLs in 69 Nigerian (YRI) [12] individuals. Raw reads were downloaded from http://eqtl.uchicago.edu/RNA_Seq_data/unmapped_reads and preprocessed using the GRAPE pipeline [27]. This pipeline consists of first mapping the reads to the human genome version hg19 using the GEM mapper software [28]. Second, mapped reads were summarized into genelevel counts according to the GENCODE annotation version 3c [29] with Ensembl release 63 gene identifiers, by selecting those reads that mapped either completely within an exon or spanning a junction. This resulted in an initial table of counts of 38,415 Ensembl genes. This table of counts form part of the experimental data package tweeDEseqCountData available at http://www.bioconductor.org under the name pickrell1.
The table of counts was filtered to discard lowly expressed genes by keeping only those with an average of more than 0.1 counts per million (CPM) throughout the samples. The results shown in Additional file 2: Figure S11 were obtained by applying a more stringent minimum cutoff of 0.5 CPM. When we applied a normalization method that adjusted for gene length and G+C content (see below), genes without this information were also discarded. When the minimum CPM was 0.1, then 31,226 genes were kept when no normalization method or edgeRTMM was applied and when cqn was applied then 27,438 were kept (see pg. 5 and 6 from Additional file 1). When the minimum CPM was 0.5 then these numbers decreased to 19,166 and 18,009 genes, respectively.
Three approaches to normalizing the table of counts from the LCL data have been considered. The first one is to work with the initial table of raw counts without any kind of normalization, the second one is to apply TMM [2] normalization as implemented in the edgeR[30] package, the third one is to use the methodology implemented in the cqn[4] Bioconductor package which adjusts for samplespecific effects of gene length and G+C content of every gene. When using the DESeq method for DE analysis in the LCL samples, the TMM normalization procedure was replaced by its own normalization procedure.
Raw counts were transformed into filtered and normalized counts for the purpose of producing MAplots (Figure 2), assessing goodness of fit to the NB distribution (Figure 3), examining the relationship between mean expression level and the shape parameter of the PT distribution (Figure 4) and doing DE analysis with tweeDEseq. In the case of DESeq raw counts were transformed into normalized counts only when used with the cqn normalization method.
In the case of edgeRTMM normalization, counts were transformed following the steps that the function exactTest() in edgeR takes: calculate normalization factors with the TMM method (calcNormFactors()), estimate effective library sizes and adjust counts to effective library sizes obtaining noninteger normalized pseudocounts (equalizeLibSizes()) which were subtracted by 0.5 and then raised to the smallest integers not less than these pseudocounts (ceiling()). These steps are written together in the function normalizeCounts() from the tweeDEseq package.
In the case of cqn, normalization offsets are calculated by the function cqn() as log2 RPMs, which are added to original raw log2 RPMs. These are rolled back to absolute numbers and “unlogged” obtaining noninteger normalized pseudocounts which, analogously to the edgeRTMM case, were subtracted by 0.5 and then raised to the smallest integers not less than these pseudocounts (ceiling()). The rationale behind subtracting 0.5 to the pseudocounts instead of directly truncating or raising to the next integer value, is to try to approach as much as possible the correct proportion of zero counts in the normalized data.
However, when performing DE analysis with edgeR, or with DESeq and its own normalization procedure, the specific recommendations made by the corresponding software authors were followed. More concretely, raw counts were not transformed in order to preserve their sampling properties and normalization adjustments entered the DE analysis through the corresponding normalization factors and offsets arguments within the functions that test for DE (see scripts for details in Additional file 1).
Preprocessing of microarray data
The microarray LCL data from [24] was processed from the raw CEL files available at http://www.ncbi.nlm.nih.gov/geounder accession GSE7792. Firstly, we only considered YRI samples. Secondly, data was processed using the Bioconductor oligo package. Quality assessment was performed by calculating NUSE and RLE diagnostics (Bolstad et al., 2005) and discarding those samples that either of the two reported diagnostics was considered below a minimum quality threshold. Third, using the RMA algorithm (Irizarry et al., 2003) implemented in the rma() function from the oligo package with argument target=~core~, expression values were background corrected, normalized and summarized into Affymetrix transcript clusters. Fourth, most samples formed part of family trios and only samples belonging to father or mother were kept. Fifth, using the getNetAffx() function from the oligo package, Ensembl Transcript identifiers well obtained for each Affymetrix transcript cluster identifier. Sixth, using the bioconductor package biomaRt, Ensembl Transcript identifiers were translated into Ensembl Gene identifiers, resolving multiple assignments by keeping the Ensembl Gene identifier that had a match in the Ensembl Gene identifiers forming the table of counts of the [12] RNAseq data, or choosing one arbitrarily, otherwise. Seven, duplicated assignments of the same Ensembl Gene identifier to multiple Affymetrix transcript cluster identifiers were resolved by keeping the transcript cluster with largest expression variability measured by its interquartile range (IQR).
At this point an expression data matrix of 16,323 Ensembl Genes by 74 samples was obtained and using the scanning date of each CEL file, samples were grouped into 5 batches, out of which one containing only three male samples was discarded leaving a total of 71 samples distributed into 4 balanced batches across gender. Batch effect was removed by using the QRdecomposition method implemented in the removeBatchEffect() function from the Bioconductor package limma[25] while keeping the sexspecific expression effect by setting the gender sample indicator variable within the design matrix argument. Finally, samples and genes were further filtered to match those from the RNAseq table of counts.
Matching RNAseq and microarray expression data matrices
To perform the analyses summarized in Figure 11 and Additional file 2: Figure S12 we further filtered the previously preprocessed RNAseq and microarray gene expression matrices to match both Ensembl Gene identifiers and individual HapMap identifiers. This resulted in two gene expression data matrices of equal dimension with 15,194 genes and 36 samples. We only considered the RNAseq data normalized with the cqn package.
To perform the analyses summarized in Figure 12 we built two other gene expression data matrices where, as before, samples were restricted to those 36 that matched between RNAseq and microarray data but genes were not, leading to a RNAseq and microarray gene expression data matrices of 27,438 and 16,323 Ensembl Genes by 36 samples, respectively. Genes were not matched since the purpose of these analyses was to gather insight into the differences and challenges in detecting DE genes using RNAseq with respect to microarray gene expression data with many replicates.
Functional annotations
Functional annotations for Ensembl genes forming the tables of counts, were retrieved from http://jun2011.archive.ensembl.org with R and the biomaRt Bioconductor package. Gene length and G+C content annotations, used with the cqn normalization method, were obtained by downloading all human cDNAs from http://ftp.ensembl.org/pub/release63/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.63.cdna.all.fa.gz and calculating the length and G+C content of the longest cDNA for each Ensembl gene.
The goldstandard list of genes with sexspecific expression was built with genes reported in the literature that, in one hand, escape chromosome X inactivation [22] and, on the other hand, belong to the malespecific region of chromosome Y [23]. In both cases, gene symbols were first translated into Ensembl gene identifiers and then further filtered to keep only those included in the set of Ensembl gene identifiers release 63. This resulted in a goldstandard list of 95 genes with sexspecific expression.
The list of housekeeping genes was retrieved from the literature [19] and mapped to Ensembl genes release 63, resulting in a final set of 669 housekeeping genes. The expression breadth reported in Figure 5 was obtained through the Barcode Gene Expression catalog [18] which uses information from 18,656 publicly available microarray samples from 131 tissue types, of the HGU133 Plus 2.0 Affymetrix chip, to estimate the proportion of tissue types in which a given probeset is expressed in more than half the samples. After discarding unreliable probes (annotated with highentropy in the catalog), we use these values as surrogates for expression breadth by mapping Affymetrix probeset identifiers to the genes in our table of counts through the hgu133plus2.db Bioconductor annotation package, leading to 16,292 genes with expression breadth values. When two or more probesets mapped to the same gene, the maximum value was taken for that gene.
All these functional data are included in the experimental data package tweeDEseqCountData available at http://www.bioconductor.org under the keywords annotEnsembl63, genderGenes and hkGenes.
PoissonTweedie distributions
PoissonTweedie (PT) distributions have been studied by several authors [3134] and unify several overdispersed count data distributions (see Figure one in [34]). This family of distributions can be defined by a probability generating function and mass probabilities have to be computed using a recursive algorithm [31, 34]. ElShaarawi et al. (2011) [34] compared different recursions and parameterizations of this family providing an algorithm to compute the PT probability distribution function. In the R package tweeDEseq we have developed a fast implementation, written in the C programming language, of this recursive algorithm. We briefly describe here the PT family of distributions as well as how we have used it to analyze RNAseq count data in the context of a differential expression (DE) analysis.
and p_{ i } denotes the probability of observing i counts.
The PT model includes not only Poisson (a=1) and negative binomial (NB) (a=0) but also other distributions that have been used to analyze count data such as PoissonInverse Gaussian (PIG) ($a=\frac{1}{2}$), PólyaAeppli (PA) (a=−1) or Neyman type A (a→−∞). Therefore, the PT distribution family unifies several diverse count data distributions, including different overdispersed distributions such as NB or PIG. These distributions can model different scenarios as, for instance, a RNAseq expression profile with a wide dynamic range leading to a heavy tail in the distribution. In such a case, PIG has a heavier tail than NB and this would make it more appropriate for such a gene. Note that an extremely heavy tail implies overdispersion, but the converse does not hold; hence the NB distribution is not adequate to model RNAseq expression profiles of genes with a wide dynamic range due to their intrinsic biological variability [15].
where p is the shape parameter of that specific parameterization. It follows that, whereas the NB distribution is only able to capture a quadratic meanvariance relationship, the PT family is able to generalize this relationship to any order. As a result, it is more convenient to use the PT model when dealing with count data which presents variable overdispersion.
Parameter estimation for PoissonTweedie distributions
In practice, we do not know the parameters θ_{ g }=μ_{ g },ϕ_{ g },a_{ g }, but we can estimate them from data by maximum likelihood when the sample size is sufficiently large so that it guarantees the desirable large sample properties of unbiasedness and minimum variance of the maximum likelihood estimate (MLE). In the Additional file 2: Supplementary Information we provide a simulation study in order to estimate the minimum number of samples per group that approximately meets this requirement (see Additional file 2: Figure S7).
We obtained the MLE $\widehat{\theta}$ using a quasiNewton method with constraints. We have implemented such a procedure using the optim function in R. In order to guarantee good convergence, we consider as initial parameters the moment estimates of μ_{ g } and ϕ_{ g }, and a_{ g }=0. We choose this value for a_{ g } because it corresponds to an NB model that is the natural cutpoint of PT’s parameter space.
Goodnessoffit to a negative binomial distribution
where numerator and denominator correspond to the likelihood functions for the PT and NB models, respectively. Since the PT model has just one parameter more than the NB model, the quantity $2logT\sim {\chi}_{1}^{2}$ under the null hypothesis, as n grows large, and it can be used to decide whether count data follow a NB distribution by means of a QQ plot (see Additional file 2: Figure S2) or by calculating the corresponding Pvalue.
Test to determine differentially expressed genes
The PTstatitic, T, follows a standard normal distribution under the null hypothesis. Therefore, the (1−α)% percentile of a N(0,1) distribution is used to determine whether the observed differences between the two groups are statistically significant or not by providing a corresponding Pvalue that can be later on corrected for multiple testing using, for instance, BenjaminiHochberg’s FDR [35].
Simulation of RNAseq data
The results shown in Figure 6 recreating the null hypothesis of no DE with real RNAseq data were performed by dividing the LCL data into two separate data sets of male and female samples. From each data set we bootstrapped 100 times two groups of 20 samples uniformly at random, thus obtaining on the one hand, group pairs of female samples and, on the other hand, group pairs of male samples. On each bootstrapped data set we performed the twosample test for DE detection of every method between the groups of female versus female samples and male versus male samples. We also considered two versions of the data, one with the raw unnormalized counts and the other with the counts normalized with the cqn package [4]. In principle, there are no DE genes to be discovered from these comparisons, and therefore, under the null hypothesis of no DE, the Pvalue distribution for any given gene throughout the 100 bootstrapped data sets should be uniform.
The simulations shown in Figures 7, 8 and 9 contained synthetic RNAseq data generated from a gammaPoisson mixture model in a similar way to other published studies [8]. Under this model, we first draw dispersion parameters ϕ_{ g } for every gene g at random from a gamma distribution Gamma(k=2,θ=0.7) and means according to three different foldchanges (1.5, 2 and 4) where half of the genes were upregulated and the other half downregulated. The λ_{ gi } Poisson parameter for every gene g and sample i was drawn at random from a gamma distribution Gamma(k=a,θ=1/(ϕ−1)) with a=f μ_{ gk }/(ϕ−1) and f≈N(0,σ) corresponding to library factor which was either constant (σ=0) or variable (σ=0.5). Counts were simulated for each gene g from the resulting mixture gammaPoisson distribution with parameters λ_{ gi } for each sample i. Note that the resulting marginal distribution from the gammaPoisson is a negativebinomial.
Software availability

Project name: tweeDEseq

Project home page: http://www.bioconductor.org/packages/release/bioc/html/tweeDEseq.html

Operating system(s): Platform independent

Programming language: R and C

Other requirements: R 3.0.0

Licence: GNU GPL

Any restrictions tu use by nonacademics: no restrictions
Declarations
Acknowledgements
This work was supported by grants from the ‘Ministerio de Ciencia e Innovación  MICINN’ (MTM201126515 to JRG and ME, MTM201009526E to JRG, MTM200910893 to PP and TIN201122826 to RC), from European Reseach Council, Breathe project, ERCAdG, GA num. 268479 to ME.
Authors’ Affiliations
References
 Mortazavi1 A, Williams B, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNASeq. Nat Methods. 2008, 5: 621628. 10.1038/nmeth.1226.View ArticlePubMedGoogle Scholar
 Robinson M, Oshlack A: A scaling normalization method for differential expression analysis of RNAseq data. Genome Biol. 2010, 11: R2510.1186/gb2010113r25.PubMed CentralView ArticlePubMedGoogle Scholar
 Risso D, Schwartz K, Sherlock G, Dudoit S: GCcontent normalization for RNASeq data. BMC Bioinformatics. 2011, 12: 48010.1186/1471210512480.PubMed CentralView ArticlePubMedGoogle Scholar
 Hansen KD, Irizarry RA, Wu Z: Removing technical variability in RNAseq data using conditional quantile normalization. Biostatistics. 2012, 13 (2): 204216. 10.1093/biostatistics/kxr054.PubMed CentralView ArticlePubMedGoogle Scholar
 Marioni J, Mason C, Mane S, Stephens M, Gilad Y: RNAseq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 15091517. 10.1101/gr.079558.108.PubMed CentralView ArticlePubMedGoogle Scholar
 Robinson MD, Smyth GK: Smallsample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008, 9 (2): 321332.View ArticlePubMedGoogle Scholar
 Anders S, Huber W: Differential expression analysis for sequence count data. Genome Biol. 2010, 11 (10): R10610.1186/gb20101110r106.PubMed CentralView ArticlePubMedGoogle Scholar
 Lund SP, Nettleton D, McCarthy DJ, Smyth GK: Detecting differential expression in RNAsequence data using quasiLikelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012, 11 (5): doi:10.1093/biostatistics/kxs033.Google Scholar
 Hardcastle TJ, Kelly KA: baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010, 11: 42210.1186/1471210511422.PubMed CentralView ArticlePubMedGoogle Scholar
 McCarthy DJ, Chen Y, Smyth GK: Differential expression analysis of multifactor RNASeq experiments with respect to biological variation. Nucleic Acids Res. 2012, 40 (10): 42884297. 10.1093/nar/gks042.PubMed CentralView ArticlePubMedGoogle Scholar
 Wu H, Wang C, Wu Z: A new shrinkage estimator for dispersion improves differential expression detection in RNAseq data. Biostatistics. 2012, doi:10.1093/biostatistics/kxs033.Google Scholar
 Pickrell J, Marioni J, Pai A, Degner J, Engelhardt B, Nkadori E, Veyrieras J, Stephens M, Gilad Y, Pritchard J: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464: 768772. 10.1038/nature08872.PubMed CentralView ArticlePubMedGoogle Scholar
 Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5 (10): R8010.1186/gb2004510r80.PubMed CentralView ArticlePubMedGoogle Scholar
 Van De Wiel MA, Leday GGR, Pardo L, Rue H, Van Der Vaart AW, Van Wieringen WN: Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics. 2012, doi:10.1093/biostatistics/kxs031.Google Scholar
 Hansen K, Wu Z, Irizarry R, Leek J: Sequencing technology does not eliminate biological variability. Nat Biotech. 2011, 29: 572573. 10.1038/nbt.1910.View ArticleGoogle Scholar
 Jorgensen B: The Theory of Dispersion Models. 1997, New York: Chapman and HallGoogle Scholar
 Kokonendji C, DossouGbété S, Demétrio C: Some discrete exponencial dispersion models: PoissonTweedie and HindeDemétrio classes. SORT. 2004, 28 (2): 201214.Google Scholar
 McCall M, Uppal K, Jaffee H, Zilliox R M J Irizarry: The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res. 2011, 39: D1011D1015. 10.1093/nar/gkq1259.PubMed CentralView ArticlePubMedGoogle Scholar
 Eisenberg E, Levanon EY: Human housekeeping genes are compact. Trends Genet. 2003, 19 (7): 362365. 10.1016/S01689525(03)001409.View ArticlePubMedGoogle Scholar
 Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007, 3 (9): 17241735.View ArticlePubMedGoogle Scholar
 Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003, 100 (16): 94409445. 10.1073/pnas.1530509100.PubMed CentralView ArticlePubMedGoogle Scholar
 Carrel L, HF W: Xinactivation profile reveals extensive variability in Xlinked gene expression in females. Nature. 2005, 434: 400404. 10.1038/nature03479.View ArticlePubMedGoogle Scholar
 Skaletsky H, KurodaKawaguchi T, Minx P, Cordum H, Hillier L, Brown L, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, TinWollam A, Yang SP, Waterston R, Wilson R, Rozen S, Page D: The malespecific region of the human Y chromosome is a mosic of discrete sequence classes. Nature. 2003, 423: 825837. 10.1038/nature01722.View ArticlePubMedGoogle Scholar
 Huang RS, Duan S, Bleibel WK, Kistner EO, Zhang W, Clark TA, Chen TX, Schweitzer AC, Blume JE, Cox NJ, Dolan ME: A genomewide approach to identify genetic variants that contribute to etoposideinduced cytotoxicity. Proc Natl Acad Sci U S A. 2007, 104 (23): 97589563. 10.1073/pnas.0703736104.PubMed CentralView ArticlePubMedGoogle Scholar
 Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: doi:10.2202/15446115.1027.Google Scholar
 Nguyen DK, Disteche CM: Dosage compensation of the active X chromosome in mammals. Nat Genet. 2006, 38: 4753. 10.1038/ng1705.View ArticlePubMedGoogle Scholar
 Knowles DG, Röder M, Merkel A, Guigó R: Grape RNASeq analysis pipeline environment. Bioinformatics. 2013, 29 (5): 614621. 10.1093/bioinformatics/btt016.PubMed CentralView ArticlePubMedGoogle Scholar
 MarcoSola S, Sammeth M, Guigó R, Ribeca P: The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods. 2012, 9 (12): 11851188. 10.1038/nmeth.2221.View ArticlePubMedGoogle Scholar
 Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JGR, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R: Genome Biol. 2006, 7 (Suppl 1): S4.1S4.9.View ArticleGoogle Scholar
 Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139140. 10.1093/bioinformatics/btp616.PubMed CentralView ArticlePubMedGoogle Scholar
 Hougaard P, Lee ML, Whitmore G: Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes. Biometrics. 1997, 53: 12251238. 10.2307/2533492.View ArticlePubMedGoogle Scholar
 Gupta R, Ong S: A new generalization of the negative binomial distribution. Compu Stat Data An. 2004, 45: 287300. 10.1016/S01679473(02)003018.View ArticleGoogle Scholar
 Puig P, Valero J: Count Data Distributions: Some Characterizations With Applications. J Am Stat Assoc. 2006, 101: 332340. 10.1198/016214505000000718.View ArticleGoogle Scholar
 ElShaarawi A, Zhu R, Joe H: Modelling species abundance using the PoissonTweedie family. Environmetrics. 2011, 22: 152164. 10.1002/env.1036.View ArticleGoogle Scholar
 Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B. 1995, 57: 289300.Google Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.