Table 1 Analysis of the influence of different threshold values on reference genome selection after taxonomy identification and on compression ratios

From: MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression

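| Data | Original (MB) | Comp. (MB) | Processing time | Align. % | No. files | Comp. (MB) | Processing time | Align. % | No. files |
|---|---|---|---|---|---|---|---|---|---|
| ERR321482 | 1429 | 191 | 299 m 20 s | 26.99 | 211 | 193 | 239 m 28 s | 24.22 | 29 |
| | | | 422 m 21 s | 3.57 | 1480 | | 398 m 3 s | 6.5 | 1567 |
| | | | 12 m 24 s | | | | 8 m 13 s | | |
| SRR359032 | 3981 | 319 | 127 m 34 s | 57.72 | 26 | 320 | 93 m 60 s | 57.71 | 7 |
| | | | 245 m 53 s | 9.7 | 30 | | 206 m 18 s | 9.71 | 32 |
| | | | 8 m 37 s | | | | 7 m 27 s | | |
| ERR532393 | 8230 | 948 | 639 m 55 s | 45.78 | 267 | 963 | 522 m | 42.45 | 39 |
| | | | 1061 m 50 s | 1.98 | 1456 | | 1067 m 49 s | 7.16 | 1639 |
| | | | 73 m 59 s | | | | 28 m 13 s | | |
| SRR1450398 | 5399 | 703 | 440 m 4 s | 7.14 | 190 | 703 | 364 m 34 s | 6.82 | 26 |
| | | | 866 m 56 s | 0.6 | 793 | | 790 m 52 s | 0.91 | 818 |
| | | | 21 m 2 s | | | | 17 m 38 s | | |
| SRR062462 | 6478 | 137 | 217 m 21 s | 2.55 | 278 | 139 | 197 m 15 s | 2.13 | 50 |
| | | | 254 m 26 s | 0.13 | 570 | | 241 m 2 s | 0.51 | 656 |
| | | | 15 m 45 s | | | | 19 m 31 s | | |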
  1. Columns in bold correspond to a threshold of 75 species; columns not in bold correspond to a cutoff of 10 species. Results are shown for MetaCRAM-Huffman. “Align. %” gives the alignment rates for the first and second rounds, and “No. files” gives the number of reference genome files selected in the first and second iterations. Processing times are listed row by row as real, user, and system time, in that order.
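
The table reports sizes and times but not the derived ratios. The following is a minimal sketch of how a reader might compute them from the values above. It assumes the compression ratio is defined as original size divided by compressed size (the table does not state a definition), and it assumes a time with no seconds field, such as "522 m", means zero seconds. The names `SIZES_MB`, `compression_ratio`, and `to_seconds` are hypothetical helpers, not part of MetaCRAM.

```python
import re

# Sizes in MB copied from the table:
# (original, compressed in the first column group, compressed in the second column group).
SIZES_MB = {
    "ERR321482": (1429, 191, 193),
    "SRR359032": (3981, 319, 320),
    "ERR532393": (8230, 948, 963),
    "SRR1450398": (5399, 703, 703),
    "SRR062462": (6478, 137, 139),
}


def compression_ratio(original_mb, compressed_mb):
    """Assumed definition: original size divided by compressed size."""
    return original_mb / compressed_mb


def to_seconds(text):
    """Convert a time such as '299 m 20 s' or '8 m 37s' to seconds.

    Assumption: a missing seconds field (e.g. '522 m') counts as 0 s.
    """
    match = re.fullmatch(r"\s*(\d+)\s*m(?:\s*(\d+)\s*s)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1))
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


if __name__ == "__main__":
    for name, (original, first, second) in SIZES_MB.items():
        ratio_first = compression_ratio(original, first)
        ratio_second = compression_ratio(original, second)
        print(f"{name}: {ratio_first:.2f} vs {ratio_second:.2f}")
```

Under this definition the two column groups give nearly identical ratios for every dataset, so any comparison between the thresholds rests mainly on the processing times, which `to_seconds` makes directly comparable.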