Table 3 Summary of the parameters used to generate the synthetic data sets

From: A comparison of methods for differential expression analysis of RNA-seq data

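| Sim. study | $\lvert G_{DE}^{up} \rvert$ | $\lvert G_{DE}^{down} \rvert$ | $\lvert\{g;\ \phi_g = 0\}\rvert$ | ‘Single’ outlier fraction | ‘Random’ outlier fraction |
|---|---|---|---|---|---|
| $B_0^0$ | 0 | 0 | 0 | 0 | 0 |
| $B_0^{1250}$ | 1,250 | 0 | 0 | 0 | 0 |
| $B_{625}^{625}$ | 625 | 625 | 0 | 0 | 0 |
| $B_0^{4000}$ | 4,000 | 0 | 0 | 0 | 0 |
| $B_{2000}^{2000}$ | 2,000 | 2,000 | 0 | 0 | 0 |
| $P_0^0$ | 0 | 0 | 6,250 | 0 | 0 |
| $P_{625}^{625}$ | 625 | 625 | 6,250 | 0 | 0 |
| $S_0^0$ | 0 | 0 | 0 | 10% | 0 |
| $S_{625}^{625}$ | 625 | 625 | 0 | 10% | 0 |
| $R_0^0$ | 0 | 0 | 0 | 0 | 5% |
| $R_{625}^{625}$ | 625 | 625 | 0 | 0 | 5% |
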
  1. In all synthetic data sets, the observations were distributed between two conditions (denoted S1 and S2), with the same number of observations (2, 5 or 10) in each condition. We let $\lvert G_{DE}^{up} \rvert$ and $\lvert G_{DE}^{down} \rvert$ denote, respectively, the number of genes that were up- and downregulated in condition S2 compared to S1. The number of genes whose counts were drawn from a Poisson distribution (i.e., with the dispersion parameter equal to zero) is given by $\lvert\{g;\ \phi_g = 0\}\rvert$. The ‘single’ outlier fraction denotes the fraction of the genes for which we selected a single sample and multiplied the corresponding count by a factor between 5 and 10. The ‘random’ outlier fraction denotes the fraction of counts that were selected randomly (among all counts) and multiplied by a factor between 5 and 10. The notation for the simulation studies (leftmost column) summarizes the type of simulation (B - ‘baseline’, P - ‘Poisson’, S - ‘single outlier’, R - ‘random outlier’), the number of DE genes that are upregulated in S2 (i.e., $\lvert G_{DE}^{up} \rvert$, in the superscript) and the number of DE genes that are downregulated in S2 (i.e., $\lvert G_{DE}^{down} \rvert$, in the subscript).
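
The two outlier schemes described in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the list-of-lists count matrix (rows are genes, columns are samples), the uniform draw of the factor, and the rounding back to integers are all assumptions made for illustration; the footnote states only that the factor lies between 5 and 10.

```python
import random


def add_single_outliers(counts, fraction, rng, low=5.0, high=10.0):
    """'Single' scheme: for a fraction of the genes, inflate one sample's count.

    Assumption: the factor is drawn uniformly from [low, high] and the
    result is rounded to an integer; the source gives only the range.
    """
    out = [row[:] for row in counts]           # leave the input untouched
    n_genes = len(out)
    n_outlier_genes = round(fraction * n_genes)
    for g in rng.sample(range(n_genes), n_outlier_genes):
        s = rng.randrange(len(out[g]))         # one sample for this gene
        out[g][s] = round(out[g][s] * rng.uniform(low, high))
    return out


def add_random_outliers(counts, fraction, rng, low=5.0, high=10.0):
    """'Random' scheme: inflate a fraction of all counts, chosen among all cells.

    Same assumptions as above regarding the factor and the rounding.
    """
    out = [row[:] for row in counts]
    cells = [(g, s) for g in range(len(out)) for s in range(len(out[g]))]
    n_outliers = round(fraction * len(cells))
    for g, s in rng.sample(cells, n_outliers):
        out[g][s] = round(out[g][s] * rng.uniform(low, high))
    return out


if __name__ == "__main__":
    rng = random.Random(1)                     # fixed seed for repeatability
    toy = [[10, 12, 9, 11] for _ in range(20)] # hypothetical 20 x 4 count matrix
    print(add_single_outliers(toy, 0.10, rng)[:3])
    print(add_random_outliers(toy, 0.05, rng)[:3])
```

The difference between the schemes is the sampling unit. The ‘single’ scheme samples genes and touches exactly one count per selected gene, so no gene gets more than one outlier. The ‘random’ scheme samples individual counts, so a gene may receive several outliers or none.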