
Table 4 Filter and assembly statistics for Bignorm with Q0 = 20, Diginorm, and the raw datasets (Part I)

From: An improved filtering algorithm for big read datasets and its application to single-cell assembly

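| Dataset | Algorithm | Reads kept (%) | Mean phred score | Contigs ≥10 000 | Filter time (s) | SPAdes time (s) |
|---|---|---|---|---|---|---|
| Aceto | Bignorm | 3.16 | 37.33 | 1 | 906 | 1708 |
| | Diginorm | 3.95 | 27.28 | 1 | 3290 | 4363 |
| | Raw | | 36.52 | 3 | | 47,813 |
| Alphaproteo | Bignorm | 3.13 | 34.65 | 18 | 623 | 420 |
| | Diginorm | 7.81 | 28.73 | 17 | 1629 | 11,844 |
| | Raw | | 33.64 | 17 | | 29,057 |
| Arco | Bignorm | 2.20 | 33.77 | 4 | 429 | 207 |
| | Diginorm | 8.76 | 21.39 | 6 | 1410 | 1385 |
| | Raw | | 32.27 | 6 | | 15,776 |
| Arma | Bignorm | 7.90 | 28.21 | 44 | 240 | 135 |
| | Diginorm | 29.30 | 21.19 | 50 | 588 | 1743 |
| | Raw | | 26.96 | 44 | | 5371 |
| ASZN2 | Bignorm | 5.66 | 37.66 | 118 | 1224 | 1537 |
| | Diginorm | 12.62 | 32.73 | 130 | 5125 | 21,626 |
| | Raw | | 36.85 | 112 | | 47,859 |
| Bacteroides | Bignorm | 2.85 | 37.47 | 6 | 653 | 3217 |
| | Diginorm | 4.94 | 27.64 | 5 | 2124 | 3668 |
| | Raw | | 37.25 | 9 | | 32,409 |
| Caldi | Bignorm | 3.97 | 37.82 | 41 | 842 | 455 |
| | Diginorm | 5.61 | 30.67 | 36 | 1838 | 793 |
| | Raw | | 37.37 | 38 | | 7563 |
| Caulo | Bignorm | 2.40 | 36.95 | 10 | 679 | 712 |
| | Diginorm | 4.70 | 25.16 | 9 | 2584 | 765 |
| | Raw | | 36.01 | 13 | | 18,497 |
| Chloroflexi | Bignorm | 1.40 | 31.91 | 32 | 694 | 134 |
| | Diginorm | 9.70 | 18.91 | 33 | 2304 | 1852 |
| | Raw | | 30.50 | 34 | | 15,108 |
| Crenarch | Bignorm | 1.46 | 33.18 | 19 | 1107 | 790 |
| | Diginorm | 9.72 | 19.80 | 18 | 2931 | 3754 |
| | Raw | | 31.49 | 26 | | 20,590 |
| Cyanobact | Bignorm | 1.65 | 30.45 | 12 | 679 | 450 |
| | Diginorm | 11.30 | 17.58 | 13 | 1487 | 1343 |
| | Raw | | 28.49 | 13 | | 9417 |
| E. coli | Bignorm | 1.91 | 26.14 | 67 | 2279 | 598 |
| | Diginorm | 17.03 | 19.34 | 63 | 9105 | 3995 |
| | Raw | | 24.34 | 64 | | 16,706 |
| SAR324 | Bignorm | 4.34 | 33.05 | 55 | 1222 | 708 |
| | Diginorm | 4.69 | 23.58 | 52 | 3706 | 3085 |
| | Raw | | 32.52 | 51 | | 26,237 |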
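
One way to read the two timing columns together is to add filter time and SPAdes time for each algorithm and compare the sum with the SPAdes time on the raw dataset. The following is a minimal sketch of that arithmetic for three of the datasets above. The helper names `total_time` and `speedup_over_raw` are illustrative and not taken from the article, and treating the blank filter time of the raw rows as 0 seconds is an assumption made here.

```python
# (filter time, SPAdes time) in seconds, copied from the table above.
# Assumption: the raw datasets have no filter step, so their blank
# filter-time cell is represented as 0.
TIMES = {
    "Aceto": {"Bignorm": (906, 1708), "Diginorm": (3290, 4363), "Raw": (0, 47813)},
    "Arco": {"Bignorm": (429, 207), "Diginorm": (1410, 1385), "Raw": (0, 15776)},
    "E. coli": {"Bignorm": (2279, 598), "Diginorm": (9105, 3995), "Raw": (0, 16706)},
}


def total_time(dataset: str, algorithm: str) -> int:
    """Return filter time plus SPAdes time in seconds."""
    filter_sec, spades_sec = TIMES[dataset][algorithm]
    return filter_sec + spades_sec


def speedup_over_raw(dataset: str, algorithm: str) -> float:
    """Return how many times faster the filtered pipeline is than raw assembly."""
    return total_time(dataset, "Raw") / total_time(dataset, algorithm)


if __name__ == "__main__":
    for dataset in TIMES:
        for algorithm in ("Bignorm", "Diginorm"):
            total = total_time(dataset, algorithm)
            factor = speedup_over_raw(dataset, algorithm)
            print(f"{dataset:8s} {algorithm:9s} total {total:6d} s, {factor:.1f}x vs raw")
```

For Aceto, for example, the Bignorm total is 906 + 1708 = 2614 seconds, compared with 47,813 seconds for SPAdes on the raw reads. This comparison only covers running time; the contig and phred columns need to be read alongside it.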