
Table 1 Analysis of the influence of different threshold values on reference genome selection after taxonomy identification and compression ratios

From: MetaCRAM: an integrated pipeline for metagenomic taxonomy identification and compression

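| Data | Original (MB) | Comp. (MB), 75 species | Processing time, 75 species | Align. %, 75 species | No. files, 75 species | Comp. (MB), 10 species | Processing time, 10 species | Align. %, 10 species | No. files, 10 species |
|---|---|---|---|---|---|---|---|---|---|
| ERR321482 | 1429 | 191 | 299 m 20 s | 26.99 | 211 | 193 | 239 m 28 s | 24.22 | 29 |
| | | | 422 m 21 s | 3.57 | 1480 | | 398 m 3 s | 6.5 | 1567 |
| | | | 12 m 24 s | | | | 8 m 13 s | | |
| SRR359032 | 3981 | 319 | 127 m 34 s | 57.72 | 26 | 320 | 93 m 60 s | 57.71 | 7 |
| | | | 245 m 53 s | 9.7 | 30 | | 206 m 18 s | 9.71 | 32 |
| | | | 8 m 37 s | | | | 7 m 27 s | | |
| ERR532393 | 8230 | 948 | 639 m 55 s | 45.78 | 267 | 963 | 522 m | 42.45 | 39 |
| | | | 1061 m 50 s | 1.98 | 1456 | | 1067 m 49 s | 7.16 | 1639 |
| | | | 73 m 59 s | | | | 28 m 13 s | | |
| SRR1450398 | 5399 | 703 | 440 m 4 s | 7.14 | 190 | 703 | 364 m 34 s | 6.82 | 26 |
| | | | 866 m 56 s | 0.6 | 793 | | 790 m 52 s | 0.91 | 818 |
| | | | 21 m 2 s | | | | 17 m 38 s | | |
| SRR062462 | 6478 | 137 | 217 m 21 s | 2.55 | 278 | 139 | 197 m 15 s | 2.13 | 50 |
| | | | 254 m 26 s | 0.13 | 570 | | 241 m 2 s | 0.51 | 656 |
| | | | 15 m 45 s | | | | 19 m 31 s | | |

  1. Columns labeled "75 species" (set in bold in the original table) use a threshold of 75 species, while the columns labeled "10 species" correspond to a cutoff of 10 species. The results are shown for MetaCRAM-Huffman. "Align. %" refers to the alignment rates for the first and second round, and "No. files" refers to the number of reference genome files selected in the first and second iteration; these appear on the first and second row of each dataset. Processing times are recorded row by row, denoting real, user, and system time in that order.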
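The caption refers to compression ratios, but the table lists only absolute sizes. The sketch below derives ratios from the sizes in Table 1, assuming the ratio is defined as original size divided by compressed size; the source does not state its definition here, so treat that choice, along with the names `SIZES_MB` and `compression_ratio`, as illustrative assumptions.

```python
# Sizes in MB copied from Table 1 (MetaCRAM-Huffman).
# Tuple layout: (original, compressed at 75 species, compressed at 10 species).
SIZES_MB = {
    "ERR321482": (1429, 191, 193),
    "SRR359032": (3981, 319, 320),
    "ERR532393": (8230, 948, 963),
    "SRR1450398": (5399, 703, 703),
    "SRR062462": (6478, 137, 139),
}


def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Assumed definition: original size divided by compressed size."""
    return original_mb / compressed_mb


# Ratio per dataset for each threshold: (75 species, 10 species).
ratios = {
    name: (compression_ratio(orig, c75), compression_ratio(orig, c10))
    for name, (orig, c75, c10) in SIZES_MB.items()
}

for name, (r75, r10) in ratios.items():
    print(f"{name}: {r75:.2f} (75 species) vs {r10:.2f} (10 species)")
```

Under this definition the two thresholds give nearly the same ratio for every dataset, which matches the small differences in the compressed sizes listed in the table.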