Table 1 RNA-seq alignment statistics for different combinations of the sequencing data processing steps

From: Detrimental effects of duplicate reads and low complexity regions on RNA- and ChIP-seq data

| Trim | Dup | RS | Total reads | Properly paired (%) | Singletons (%) | With mate mapped to a different chr (%) | Number of DEGs | KEGG: ECM-receptor interaction | GO: multicellular organismal process | Reactome: Transmembrane transport of small molecules | R² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| - | - | - | 41,505,942 | 68.98 ± 3.71 | 16.03 ± 3.20 | 4.01 ± 3.48 | 2189 | 1.86E-07 | 8.86E-16 | 6.38E-13 | 0.6687 |
| + | - | - | 51,984,539 | 60.10 ± 3.68 | 10.98 ± 2.44 | 17.92 ± 3.89 | 2139 | 3.29E-08 | 2.86E-13 | 9.85E-11 | 0.6614 |
| - | + | - | 15,429,501 | 61.72 ± 5.45 | 12.49 ± 2.37 | 10.22 ± 4.68 | 2487 | 1.34E-07 | 2.50E-22 | 1.85E-11 | 0.6672 |
| + | + | - | 25,738,167 | 43.14 ± 5.77 | 7.51 ± 1.33 | 36.18 ± 5.70 | 2391 | 1.97E-07 | 4.93E-17 | 3.54E-09 | 0.6575 |
| - | - | 75 | 28,283,010 | 70.55 ± 3.17 | 16.74 ± 3.85 | 0.69 ± 0.09 | 2100 | 7.62E-08 | 8.85E-17 | 5.71E-12 | 0.6708 |
| - | - | 50 | 26,450,592 | 70.05 ± 3.24 | 17.22 ± 3.95 | 0.63 ± 0.08 | 2068 | 7.18E-08 | 1.31E-16 | 6.85E-14 | 0.6712 |
| - | - | 25 | 24,703,408 | 69.63 ± 3.31 | 17.66 ± 4.05 | 0.62 ± 0.08 | 2021 | 8.92E-09 | 6.33E-19 | 2.94E-14 | 0.6705 |
| - | - | 0 | 21,413,178 | 69.39 ± 3.47 | 18.20 ± 4.26 | 0.61 ± 0.09 | 2087 | 1.02E-08 | 3.13E-19 | 3.98E-15 | 0.6643 |
| + | - | 75 | 32,589,028 | 64.70 ± 3.88 | 12.29 ± 2.88 | 10.46 ± 1.82 | 2116 | 4.88E-07 | 1.12E-13 | 2.89E-12 | 0.6637 |
| + | - | 50 | 30,174,345 | 64.93 ± 3.92 | 12.70 ± 2.97 | 9.71 ± 1.82 | 2066 | 3.49E-07 | 2.96E-14 | 2.44E-12 | 0.6642 |
| + | - | 25 | 28,231,486 | 64.55 ± 4.00 | 13.04 ± 3.03 | 9.76 ± 1.87 | 2004 | 3.15E-07 | 5.05E-16 | 1.13E-13 | 0.6636 |
| + | - | 0 | 24,546,936 | 63.85 ± 4.25 | 13.42 ± 3.12 | 10.26 ± 2.06 | 2028 | 2.65E-07 | 3.42E-16 | 3.55E-14 | 0.6583 |
| - | + | 75 | 9,681,047 | 68.59 ± 4.15 | 12.22 ± 2.96 | 1.45 ± 0.25 | 2302 | 1.21E-07 | 4.18E-23 | 5.71E-12 | 0.6695 |
| - | + | 50 | 8,987,150 | 68.34 ± 4.18 | 12.53 ± 3.06 | 1.221 ± 0.20 | 2256 | 1.48E-07 | 4.80E-22 | 4.07E-14 | 0.6700 |
| - | + | 25 | 8,346,861 | 68.09 ± 4.22 | 12.83 ± 3.16 | 1.21 ± 0.19 | 2245 | 1.48E-07 | 2.70E-21 | 1.44E-14 | 0.6694 |
| - | + | 0 | 7,151,500 | 68.13 ± 4.26 | 12.99 ± 3.12 | 1.19 ± 0.20 | 2326 | 1.69E-07 | 2.81E-24 | 8.28E-16 | 0.6628 |
| + | + | 75 | 14,251,402 | 52.88 ± 6.50 | 7.54 ± 1.45 | 23.48 ± 5.56 | 2210 | 1.18E-06 | 4.34E-20 | 3.95E-09 | 0.6598 |
| + | + | 50 | 12,985,873 | 53.93 ± 6.45 | 7.65 ± 1.48 | 22.01 ± 5.48 | 2180 | 7.69E-07 | 5.94E-19 | 3.02E-11 | 0.6604 |
| + | + | 25 | 12,125,100 | 53.69 ± 6.50 | 7.81 ± 1.51 | 22.11 ± 5.58 | 2124 | 4.40E-06 | 2.98E-19 | 6.28E-13 | 0.6599 |
| + | + | 0 | 10,416,970 | 52.34 ± 6.98 | 7.74 ± 1.30 | 23.54 ± 6.17 | 2176 | 4.22E-06 | 3.54E-18 | 3.00E-13 | 0.6539 |

  1. "Total reads": average number of reads; "properly paired (%)": average percent of properly paired reads; "singletons (%)": average percent of single-end reads; "with mate mapped to a different chr (%)": average percent of reads whose mate mapped to a different chromosome; "Number of DEGs": number of differentially expressed genes. To allow direct comparison of p-values among the processing steps, the "ECM-receptor interaction" KEGG pathway, the "multicellular organismal process" GO term, and the "Transmembrane transport of small molecules" Reactome pathway were selected as the most representative and most enriched functional categories in each processing step; the full enrichment analysis results are shown in Additional Files 4 and 5. "+" and "-" indicate whether a step (Trim: adapter trimming; Dup: duplicate removal; RS: filtering out low-complexity regions with RepeatSoaker) was applied or not applied, respectively. The number in the RS column is the threshold for removing reads that overlap low-complexity regions; e.g., 75 indicates that reads overlapping a low-complexity region by 75% or more were removed.
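
The overlap threshold described in the footnote can be illustrated with a minimal Python sketch. This is not RepeatSoaker's implementation; the function names, the half-open interval representation, and the treatment of a threshold of 0 as "remove reads with any overlap" are all assumptions made for illustration. Regions are assumed to lie on the same chromosome as the read and not to overlap one another.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (illustrative representation).
Interval = Tuple[int, int]


def overlapped_bases(read: Interval, regions: List[Interval]) -> int:
    """Count read bases covered by low-complexity regions.

    Assumes the regions do not overlap each other, so bases are not double-counted.
    """
    start, end = read
    return sum(
        max(0, min(end, r_end) - max(start, r_start))
        for r_start, r_end in regions
    )


def keep_read(read: Interval, regions: List[Interval], threshold_pct: int) -> bool:
    """Return True if the read survives filtering at the given threshold.

    A read is removed when at least threshold_pct percent of its bases
    overlap low-complexity regions. Reads with no overlap are always kept,
    so a threshold of 0 removes reads with any overlap (an assumption).
    """
    start, end = read
    length = end - start
    covered = overlapped_bases(read, regions)
    if length <= 0 or covered == 0:
        return True
    # Integer comparison avoids floating-point rounding at the boundary.
    return covered * 100 < threshold_pct * length


# Hypothetical 100-base read and a single low-complexity region.
read = (0, 100)
print(keep_read(read, [(50, 150)], 75))  # 50% overlap, below the 75% threshold
print(keep_read(read, [(20, 150)], 75))  # 80% overlap, at or above the threshold
```

Under these assumptions, lowering the threshold removes more reads, which is consistent with the decreasing "Total reads" values from RS = 75 to RS = 0 in the table.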