
Table 2. k-mer counting results for the Homo sapiens NA19238 individual (one 353 GB FASTQ file, or 463 gzipped FASTQ files with a total size of 116.6 GB)

From: Disk-based k-mer counting on a PC

 

                    k=22              k=25              k=28              k=31
Algorithm        Space     Time    Space     Time    Space     Time    Space     Time

Classic counters

32-core server
BFCounter         46/0  114,083     41/0   99,468   Failed            Failed
Jellyfish         50/0    2,303     64/0    2,258     75/0    2,208     87/0    2,107
Jellyfish^1      27/39    2,964    33/36    2,769    21/27    2,673    24/22    2,511
DSK              6/200    6,490    6/340    6,020    6/280    5,115    6/221    4,215
DSK^gz           6/200    9,076    6/340    8,029    6/280    7,157    6/221    6,424
KMC             32/104    1,405   32/130    1,488   32/133    1,522   32/121    1,471
KMC             16/107    1,548   16/131    1,657   16/141    1,684   16/128    1,568
KMC^gz          32/104    1,040   32/129    1,066   32/132    1,055   32/120      989
KMC^gz          16/107    1,278   16/130    1,631   16/141    1,662   16/125    1,307

6-core PC
DSK              6/200   21,468    6/340   18,774    6/280   15,384    6/221   11,857
DSK^gz           6/200   18,939    6/340   16,818    6/280   14,694    6/221   12,070
KMC             11/107    3,482   11/128    3,442   11/138    3,584   11/127    3,515
KMC^gz          11/107    2,198   11/128    2,206   11/138    2,303   11/127    2,365

Quake-compatible counters

32-core server
BFCounter         70/0  171,888     72/0  180,861   Failed            Failed
Jellyfish        100/0    4,339   57/230    2,891   64/192    3,246   70/175    3,161
KMC             32/311    2,585   32/302    2,467   32/282    2,347   32/237    2,106
KMC             16/315    2,615   16/305    2,730   16/283    2,592   16/245    2,284
KMC^gz          32/310    2,071   32/302    1,995   32/282    1,880   32/237    1,690
KMC^gz          16/318    2,273   16/304    2,611   16/283    2,015   16/244    2,707

6-core PC
KMC             11/316    5,538   11/298    5,533   11/277    5,184   11/242    5,016
KMC^gz          11/316    5,370   11/298    5,060   11/277    4,708   11/243    4,643

RAM and disk space (the first and second values in each "Space" column, respectively) are in GB (1 GB = 2^30 B). Time is in seconds. The test machines are a 32-core server and a 6-core PC (see the text for details). Superscripts denote: 1, RAM limited to 36 GB; gz, input data in gzipped files. Each program was run with the number of threads adjusted to the number of cores to achieve maximum speed. Asterisks (for Jellyfish) indicate that Jellyfish built two separate databases because of the machine's memory limit (128 GB RAM) and reported that merging them would need more RAM, so these times are underestimates.
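
For readers who want to work with these figures programmatically, the unit conventions in the note can be applied as in the following minimal Python sketch. The helper names `parse_space` and `parse_time` are hypothetical and not part of any of the tools compared; the sketch assumes only that a Space cell has the form RAM/disk in GB (1 GB = 2^30 B) and that a Time cell is a number of seconds with comma thousands separators.

```python
GIB = 2 ** 30  # the table's "GB" is 2^30 bytes, per the table note


def parse_space(cell: str) -> tuple[int, int]:
    """Split a Space cell such as '32/104' into (RAM bytes, disk bytes)."""
    ram_gb, disk_gb = (int(part.strip()) for part in cell.split("/"))
    return ram_gb * GIB, disk_gb * GIB


def parse_time(cell: str) -> int:
    """Convert a Time cell such as '1,405' into whole seconds."""
    return int(cell.replace(",", ""))


# Example: the KMC row for k=22 on the 32-core server (32 GB RAM variant).
ram_bytes, disk_bytes = parse_space("32/104")
seconds = parse_time("1,405")
print(ram_bytes, disk_bytes, round(seconds / 60, 1))  # bytes, bytes, minutes
```

Cells marked "Failed" carry no numeric value and would need to be handled separately before calling these helpers.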