A comparison of methods for differential expression analysis of RNA-seq data

BMC Bioinformatics

Table 3 Summary of the parameters used to generate the synthetic data sets

Sim. study	$|G_{DE}^{up}|$	$|G_{DE}^{down}|$	|{g; ϕ_g = 0}|	‘Single’ outlier fraction	‘Random’ outlier fraction
$B_{0}^{0}$	0	0	0	0	0
$B_{0}^{1250}$	1,250	0	0	0	0
$B_{625}^{625}$	625	625	0	0	0
$B_{0}^{4000}$	4,000	0	0	0	0
$B_{2000}^{2000}$	2,000	2,000	0	0	0
$P_{0}^{0}$	0	0	6,250	0	0
$P_{625}^{625}$	625	625	6,250	0	0
$S_{0}^{0}$	0	0	0	10%	0
$S_{625}^{625}$	625	625	0	10%	0
$R_{0}^{0}$	0	0	0	0	5%
$R_{625}^{625}$	625	625	0	0	5%

In all synthetic data sets, the observations were distributed between two conditions (denoted S₁ and S₂), with the same number of observations (2, 5 or 10) in each condition. We let $|G_{DE}^{up}|$ and $|G_{DE}^{down}|$ denote, respectively, the number of genes that were up- and downregulated in condition S₂ compared to S₁. The number of genes whose counts were drawn from a Poisson distribution (i.e., with the dispersion parameter equal to zero) is given by |{g; ϕ_g = 0}|. The ‘single’ outlier fraction denotes the fraction of the genes for which we selected a single sample and multiplied the corresponding count by a factor between 5 and 10. The ‘random’ outlier fraction denotes the fraction of counts that were selected randomly (among all counts) and multiplied by a factor between 5 and 10. The notation for the simulation studies (leftmost column) summarizes the type of simulation (B - ‘baseline’, P - ‘Poisson’, S - ‘single outlier’, R - ‘random outlier’), the number of DE genes that are upregulated in S₂ (i.e., $|G_{DE}^{up}|$, in the superscript) and the number of DE genes that are downregulated in S₂ (i.e., $|G_{DE}^{down}|$, in the subscript).
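The two outlier schemes described in the caption can be illustrated with a minimal Python sketch. It is not the code used to generate the published data sets. The function names are hypothetical, and three details are assumptions that the caption does not specify: the factor is drawn uniformly from [5, 10], genes and counts are selected without replacement, and inflated counts are rounded to integers.

```python
import random


def add_single_outliers(counts, fraction, rng, low=5.0, high=10.0):
    """'Single' outliers: for a fraction of the genes, pick one sample
    and multiply its count by a factor between low and high.

    counts is a list of rows (one row per gene, one column per sample).
    Returns a modified copy and the sorted indices of the affected genes.
    """
    out = [row[:] for row in counts]
    n_genes = len(out)
    # Assumption: affected genes are chosen without replacement.
    genes = rng.sample(range(n_genes), round(fraction * n_genes))
    for g in genes:
        s = rng.randrange(len(out[g]))
        # Assumption: uniform factor, result rounded to an integer count.
        out[g][s] = round(out[g][s] * rng.uniform(low, high))
    return out, sorted(genes)


def add_random_outliers(counts, fraction, rng, low=5.0, high=10.0):
    """'Random' outliers: pick a fraction of all counts, regardless of
    gene, and multiply each by a factor between low and high.

    Returns a modified copy and the sorted flat indices of affected cells.
    """
    out = [row[:] for row in counts]
    n_genes, n_samples = len(out), len(out[0])
    n_cells = n_genes * n_samples
    # Assumption: affected cells are chosen without replacement.
    cells = rng.sample(range(n_cells), round(fraction * n_cells))
    for c in cells:
        g, s = divmod(c, n_samples)
        out[g][s] = round(out[g][s] * rng.uniform(low, high))
    return out, sorted(cells)


# Toy input: 200 genes, 5 samples per condition, every count equal to 100.
rng = random.Random(0)
toy = [[100] * 10 for _ in range(200)]
single, outlier_genes = add_single_outliers(toy, 0.10, rng)   # S-type studies
randomized, outlier_cells = add_random_outliers(toy, 0.05, rng)  # R-type studies
```

The practical difference is that the ‘single’ scheme guarantees at most one inflated count per gene, whereas the ‘random’ scheme may leave a gene untouched or hit it several times.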

ISSN: 1471-2105