ShrinkBayes: a versatile R-package for analysis of count-based sequencing data in complex study designs

Background: Complex designs are common in (observational) clinical studies. Sequencing data for such studies are produced increasingly often, which poses challenges for the analysis, such as an excess of zeros, the presence of random effects and multi-parameter inference. Moreover, when sample sizes are small, inference in a Bayesian setting is likely to be too liberal when an inappropriate prior is applied, or to lack power when information is not carefully borrowed across features.
Results: We show on microRNA sequencing data from a clinical cancer study how our software ShrinkBayes tackles the aforementioned challenges. In addition, we illustrate its comparatively good performance on multi-parameter inference for groups using a data-based simulation. Finally, in the small sample size setting, we demonstrate its high power and improved FDR estimation through the use of Gaussian mixture priors that include a point mass.
Conclusion: ShrinkBayes is a versatile software package for the analysis of count-based sequencing data, which is particularly useful for studies with small sample sizes or complex designs.

The first three criteria focus on inference performance, the last one on estimation. Table 1 displays the results for FDR estimation at target FDR = 0.1. SpGG generally performs well across all distributions; SpG and SpNP are close, but somewhat inferior for the Gamma distribution and (not surprisingly) the Gaussian mixture. SpSlab is a conservative alternative, which is not surprising given the wide slab, which represents the non-null features. Table 2 shows the number of detections at estimated BFDR(t, Y) ≈ 0.1. From this table we conclude that the conservative behavior of SpSlab comes at a price: the number of detections is much lower than for the other three methods.
Areas under the curve (AUC) were computed for the ROC curves; see Table 3. Results are fairly similar for most distributions, indicating that the ranking of features (which determines the ROC curve) does not strongly depend on the prior. SpGG seems to outperform the other three for the Gamma distribution. Note that we also computed AUC for partial ROC curves, restricted to a false positive rate ≤ 0.2. Since those results are in the same spirit as the ones presented here, we do not show them.
Finally, results on the accuracy of the estimates are given in Table 4. Here, one may expect SpNP to perform best, given the flexibility of the smooth non-parametric component. We observe, however, that SpGG is a very good competitor, which indicates that the mixture is flexible enough for estimation purposes.

The case p0 = 1
We also evaluated the performance of the four priors in case the proportion of true null-hypotheses, p0, equals 1. Then, of course, the estimate of p0 should be close to 1. Also, the (B)FDR estimate BFDR(t, Y), for threshold t equal to the smallest posterior null-probability P(H0i | Y), i = 1, ..., p, should be high, so that the procedure will not declare significant findings at any reasonable threshold. Table 5 displays the results for N = 10, 20. Although none of the priors renders any significant result at BFDR < 0.1, we observe that the results are best for the SpGG prior.
2 Changes with respect to ShrinkBayes version 1.6

List of changes
ShrinkBayes versions 2.3 and higher contain several additional functionalities and changes with respect to ShrinkBayes version 1.6, which corresponds to Van de Wiel et al. (2012):
(a) Inference for nested models that differ by more than one variable is now feasible.
(d) Parametric mixture priors are allowed to contain non-equal mixture proportions for negative and positive effects.
(e) Use of mixture priors is easier due to automatic refinement of parameter search.
(f) Comparison of the full model M and the null-model M0 can now be performed in two ways: either exactly (recommended), by computing a Bayes Factor from fits under both models, or by applying the Savage-Dickey approximation of the Bayes Factor (see below). The latter is computationally faster, because only model M needs to be fitted. In addition, it is very convenient for contrasts, for which a null-model cannot always be defined without the use of constraints. However, it is sometimes somewhat less accurate, because it may depend on the parametrization (Wetzels et al., 2010).
(g) The Gaussian approximation to the marginal likelihood is used rather than the integrated version (Rue et al., 2009), because the former performs better when comparing a null-model with a full model.
(h) Convergence is assessed by the total marginal likelihood: if the change with respect to the previous value is less than 0.1%, the iteration is stopped (see the sketch after this list). This is faster than assessing convergence by Kolmogorov-Smirnov distances per parameter. In addition, it prevents 'walking on a ridge' of the likelihood landscape. The latter is particularly important for p0 estimation.
(i) The default number of decimal digits has changed: more digits were needed for accurate marginal likelihood computation.
(j) Function SummaryTable is added, which supplies summary statistics (lfdr, BFDR, posterior mean) for (significant) features in a data frame format.
(k) Function plotPoster is added, which plots posterior densities.
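As a minimal sketch of the stopping rule in (h), the check below compares two successive values of the total marginal likelihood; the function and object names are illustrative and not part of the package:

converged <- function(ml.new, ml.old) {
  # stop when the relative change in total marginal likelihood is below 0.1%
  abs(ml.new - ml.old) < 0.001 * abs(ml.old)
}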

2.1 Savage-Dickey approximation
The Savage-Dickey approximation of the Bayes Factor

BF = ML(Y; M0)/ML(Y; M),

the ratio of marginal likelihoods, is obtained by a conditioning argument (assuming the null model M0 is the same as the full model M, except for excluding parameter β):

BF ≈ π(β = 0 | Y, M)/π(β = 0 | M),

which is the ratio of the posterior and prior density at 0. The accuracy of the approximation depends on the validity of the conditioning argument (Wetzels et al., 2010).
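As a numeric illustration of this density ratio (not part of the package), assume, purely for illustration, a standard Gaussian prior and a Gaussian posterior for β:

# Savage-Dickey density ratio, illustrated with made-up Gaussian densities
prior.at.0 <- dnorm(0, mean = 0, sd = 1)      # prior density of beta at 0
post.at.0 <- dnorm(0, mean = 0.8, sd = 0.25)  # posterior density of beta at 0
BF <- post.at.0 / prior.at.0                  # approximates ML(Y; M0)/ML(Y; M)
BF                                            # about 0.02: evidence against M0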

Preprocessing of miRseq colon tumor plus metastasis data
The miRseq data consist of 2060 features representing 3p- and 5p-variants of novel and known human microRNAs for 55 resections. We did not apply any count-based filtering, because this may introduce biases in a multiple testing setting (van Iterson et al., 2010) and ShrinkBayes is able to process features with many low counts. We normalized the counts using edgeR's function calcNormFactors, with method = "TMM" (Oshlack et al., 2010).
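A sketch of this normalization step, assuming the raw counts are in a matrix counts (features in rows, samples in columns); the object names are illustrative:

library(edgeR)
d <- DGEList(counts = counts)            # wrap the raw count matrix
d <- calcNormFactors(d, method = "TMM")  # compute TMM normalization factors
d$samples$norm.factors                   # per-sample normalization factors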

R code
The code below is also available on http://www.few.vu.nl/~mavdwiel/ShrinkBayes.html, as a "ready-to-run" R-script.
First, we load the library, the miRseq data and the design; then we set the number of cores to be used for parallel computing.
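A minimal sketch of these steps; the data file and object names are placeholders, not taken from the original script:

library(ShrinkBayes)
load("mirseqdata.Rdata")  # placeholder file; provides counts and design
nc <- 6                   # number of cores for parallel computing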
Then, retrieve the covariates. Specify the full model M and the null-model M0; INLA notation f() is used for random effects.

form = y ~ 1 + PM + timepos + chemo + organ1 + organ2 + organ3 + f(indiv)
form0 = y ~ 1 + PM + timepos + chemo + f(indiv)

Start simultaneous shrinkage for the parameters specified under shrinkfixed (which contains the parameter of primary interest, here PM) and under shrinkaddfixed; excludefornull specifies the effects left out of the null-model (form0) for the purpose of multi-parameter inference. The default likelihood is the zero-inflated negative binomial.
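A hedged reconstruction of this call: the argument names follow the description above, but the contents of shrinkaddfixed and the data object name are assumptions, and the exact call in the original script may differ.

shrinksimul <- ShrinkSeq(form = form, dat = mirseqdata, shrinkfixed = "PM",
                         shrinkaddfixed = c("timepos", "chemo"),
                         excludefornull = c("organ1", "organ2", "organ3"),
                         ncpus = nc)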
lcv <- AllComp(c("organ1", "organ2", "organ3"))

Fit models M and M0 for all features. The contrasts are only relevant under model M. We set finalprior=TRUE, because the priors resulting from ShrinkSeq are the final ones (we do not intend to update these; see the remarks at the end of this section).
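A hedged sketch of these two fits; apart from finalprior, the arguments (in particular how the contrasts in lcv are passed) are assumptions:

fitall <- FitAllShrink(form, dat = mirseqdata, shrinksimul,
                       finalprior = TRUE, ncpus = nc, lincomb = lcv)
fitall0 <- FitAllShrink(form0, dat = mirseqdata, shrinksimul,
                        finalprior = TRUE, ncpus = nc)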

plotPoster(539, postercombined)
Further remarks on the code

The precision parameter of indiv is not shrunken, because (a) the non-shrunken estimate is likely to be accurate given the number of individuals (26) in the study, and (b) modeling indiv as a random effect already shrinks the effect of each individual.
When one aims to further update the priors (using either of the functions NonparaUpdatePrior or MixtureUpdatePrior; see the example in the next section), it is better to disperse the priors returned by ShrinkSeq when using FitAllShrink (for numerical stability; Van de Wiel et al., 2012). This is achieved by using the default finalprior=FALSE. We built an extra safeguard into the function BFUpdatePosterior: it will not run when the prior was (mistakenly) dispersed, that is, when finalprior=FALSE was used.
About direction="equal" in the function SummaryTable: in a multi-parameter setting this tests the joint null-hypothesis of all parameters equalling zero. Alternatively, one may opt for direction="two-sided". In that case, a multiple-comparisons approach is taken (see Van de Wiel et al., 2012): a BFDR is computed for each parameter (and/or contrast), and a feature is selected if the minimum BFDR ≤ 0.10, as sketched below. The latter approach is slightly more conservative (e.g. for this example, 37 instead of 43 miRs would be selected).
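A minimal sketch of this selection rule, assuming bfdrmat is a features-by-contrasts matrix of BFDR values (an illustrative object, not produced under this name by the package):

selected <- apply(bfdrmat, 1, min) <= 0.10  # min BFDR across contrasts per feature
sum(selected)                               # number of selected features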

Inference for (organ-specific) P-M contrast
Load the data and the design, set the number of cores, and retrieve the covariates as in the previous section. Since the overall P-M effect should reflect the difference of P with respect to M on average across organs, we redefine the effect of the baseline organ 0 as minus the sum of the effects of the other organs, which implies a sum-to-zero constraint for the organ effects.
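In symbols, writing (illustratively) γ1, γ2, γ3 for the effects of organs 1 to 3, the redefinition amounts to

γ0 = −(γ1 + γ2 + γ3), hence γ0 + γ1 + γ2 + γ3 = 0,

so each organ effect is a deviation from the across-organ average, and the P-M effect is interpreted as an average over the four organs.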
plotPoster(333, mixtpostshr, plotlabel="P-M")

The inference for the overall P-M contrast does not render significant miRNAs at BFDR = 0.1. This is most likely caused by the small sample sizes for organs 1 to 3 (5, 3, and 5, respectively) and the differences between the four organs. We now focus on the contrast between P and organ 0 (sample size 15). For this we can conveniently use the fitall object directly, so it is not necessary to also fit a new null-model: the Bayes factor of the full model and the null-model is computed by the Savage-Dickey approximation (see Section 2.1). Hence, we immediately compute a mixture prior for the contrast of interest, where fitall0=NULL by default, and update the posteriors:

mixtprior0 <- MixtureUpdatePrior(fitall, shrinklc="PminMorgan0", ncpus = nc)
mixtpostshr0 <- MixtureUpdatePosterior(fitall, mixtprior0, ncpus = nc)

Again, we compute a summary table, with BFDR threshold 0.1:
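A hedged sketch of this call; apart from direction, the arguments (including how the 0.1 BFDR threshold enters) are assumptions:

sumtab0 <- SummaryTable(mixtpostshr0, direction = "two-sided")
head(sumtab0)  # lfdr, BFDR and posterior mean per feature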

Computing time
Computing time depends on the number of samples, the number of features, the complexity of the design and the number of CPUs available for parallel computing. The examples above both run in approximately 30 minutes on 6 CPUs of a Linux cluster. The functions ShrinkSeq, MixtureUpdatePrior and FitAllShrink demand most of the time, but only the latter takes more time with more features. For example, for 100,000 features the computing time would be approximately 500 minutes. Extremely large data sets can first be screened with p-value-based methods using the ShrinkBayes functions FastScreenP and ScreenData, which potentially reduces computing time by a large factor.