Fig. 2 | BMC Bioinformatics

Fig. 2

From: Internal and external normalization of nascent RNA sequencing run-on experiments

Spike-ins have unusual behavior at the extremes. To assess where our model diverges in behavior from linear regression, we ran the VSI model on data from a number of published experiments [19,20,21,22,23,24,25,26,27,28,29,30,31,32]. Within each experiment, samples were grouped by condition and analyzed within those groups. All samples had Drosophila spike-ins, so annotated Drosophila genes were selected as the invariant set to count over. A Comparison of regression factors inferred by linear regression (x-axis) to those inferred by the Bayesian VSI model (y-axis). Estimates are shown along with an error bound of \(\pm \sigma\). Notably, the regression estimate (x-axis) and the VSI estimate (y-axis) diverge most when the absolute value of the normalization factor is large. B Plotting the depth of coverage of the spike-in (x-axis) against the VSI error estimate (y-axis) shows that samples with less than \(10\times\) spike-in transcriptome coverage are less consistent than those above this threshold (dotted red line). Of note, error estimates range between 0.8 and 1.0, but when applied to the data they must be converted out of log2 space and multiplied by the normalization factor. Hence the impact of the error scales with the size of the normalization factor. In a biological context, this is desirable: samples with large normalization factors carry less confidence, indicating poorer experimental efficiency and reproducibility.
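The scaling described in the caption can be illustrated with a small sketch. It assumes one plausible reading of the conversion: the normalization factor and its error are both in log2 space, so the linear-scale bounds are the factor multiplied by \(2^{\pm \sigma}\). The function names and the example values are hypothetical and are not taken from the article or from the VSI software.

```python
def linear_interval(log2_factor, log2_sigma):
    """Convert a log2 normalization factor and its log2 error to linear scale.

    Assumption: the error bound is applied as log2_factor +/- log2_sigma,
    which becomes a multiplicative bound after exponentiation.
    """
    factor = 2.0 ** log2_factor
    lower = factor * 2.0 ** (-log2_sigma)
    upper = factor * 2.0 ** log2_sigma
    return factor, lower, upper


def interval_width(log2_factor, log2_sigma):
    """Width of the linear-scale error interval around the factor."""
    _, lower, upper = linear_interval(log2_factor, log2_sigma)
    return upper - lower


if __name__ == "__main__":
    # Same log2 error, increasingly large normalization factors.
    for log2_factor in (0.0, 1.0, 2.0, 3.0):
        factor, lower, upper = linear_interval(log2_factor, 1.0)
        print(f"factor={factor:.2f}  bounds=({lower:.2f}, {upper:.2f})  "
              f"width={upper - lower:.2f}")
```

Under this reading, a fixed log2 error produces a linear-scale interval whose width grows in proportion to the normalization factor, which is the sense in which the impact of the error scales with factor size.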
