
Table 1 Choosing k and K

From: FMLRC: Hybrid long read error correction using an FM-index

 

| k | K=− | K=49 | K=59 | K=69 | K=79 | K=89 |
|---|---|---|---|---|---|---|
| E. coli - Matching Bases | | | | | | |
| 17 | 404736174 | **405150086** | 404809579 | 404608992 | 404361044 | 403888297 |
| 19 | 403117580 | 404580392 | 404571352 | 404418084 | 404297325 | 403826761 |
| 21 | 403002237 | 404365615 | 404367089 | 404255830 | 404131272 | 403841312 |
| 23 | 403516577 | 404381062 | 404378242 | 404240202 | 404107504 | 404041491 |
| 25 | 403819785 | 404461301 | 404480970 | 404453527 | 404363292 | 404236385 |
| E. coli - Gain | | | | | | |
| 17 | 0.1011 | 0.388 | 0.4709 | 0.5258 | 0.5468 | 0.521 |
| 19 | 0.3823 | 0.5887 | 0.612 | 0.6245 | 0.6279 | 0.6172 |
| 21 | 0.4879 | 0.634 | 0.6429 | 0.6459 | 0.6442 | 0.6345 |
| 23 | 0.5137 | 0.641 | 0.6474 | **0.6487** | 0.6457 | 0.6361 |
| 25 | 0.523 | 0.6396 | 0.6453 | 0.6461 | 0.6422 | 0.6318 |
| S. cerevisiae - Matching Bases | | | | | | |
| 17 | 1250679980 | **1253590990** | 1252340540 | 1251288299 | 1250445441 | 1249925285 |
| 19 | 1250052124 | 1252517259 | 1252462544 | 1252139063 | 1251853858 | 1251785285 |
| 21 | 1248322270 | 1251887685 | 1251963458 | 1251672602 | 1251758116 | 1251744201 |
| 23 | 1248801294 | 1252245368 | 1252387319 | 1252408890 | 1252545735 | 1252558864 |
| 25 | 1249574404 | 1252269051 | 1252478532 | 1252557840 | 1252778626 | 1252739127 |
| S. cerevisiae - Gain | | | | | | |
| 17 | 0.0264 | 0.224 | 0.3159 | 0.3946 | 0.452 | 0.4871 |
| 19 | 0.1172 | 0.3903 | 0.443 | 0.4822 | 0.5096 | 0.5273 |
| 21 | 0.2527 | 0.4938 | 0.5129 | 0.527 | 0.5367 | 0.5434 |
| 23 | 0.3319 | 0.5153 | 0.5251 | 0.5332 | 0.5388 | **0.5435** |
| 25 | 0.3728 | 0.5155 | 0.5226 | 0.5287 | 0.5334 | 0.5372 |

  1. This table shows the results of running FMLRC with a range of values for k and K on the E. coli and S. cerevisiae datasets.
  2. Test cases with K=− had no second correction pass with the long K-mer, so they use only a single short k-mer pass. After correcting the reads, we aligned the results using BLASR [22] and gathered statistics on the alignments. Matching bases is the number of matching bases across all mappings. Gain is defined as (TP − FP)/(TP + FN) (see “Correction accuracy” section). For each statistic, the best result is bolded in the above table. To summarize, increasing values of k and K tend to increase the gain but decrease the total matching bases, a tradeoff between sensitivity and specificity. Additionally, every tested value of K for a long K-mer pass improves the results over a single k-mer pass.
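
The gain statistic defined in the note above can be computed directly from the three counts. The following is a minimal sketch, not FMLRC's own code. The function name `gain` is hypothetical, and it assumes the usual error-correction reading of the counts (TP: erroneous bases that were corrected; FP: correct bases that were changed into errors; FN: erroneous bases left uncorrected), since the “Correction accuracy” section is not reproduced here.

```python
def gain(tp: int, fp: int, fn: int) -> float:
    """Return (TP - FP) / (TP + FN).

    Assumed meaning of the counts (not stated in this table's notes):
      tp: erroneous bases that the corrector fixed
      fp: correct bases that the corrector turned into errors
      fn: erroneous bases that the corrector left unchanged
    """
    errors_before = tp + fn  # every error present before correction
    if errors_before == 0:
        raise ValueError("gain is undefined when TP + FN == 0")
    return (tp - fp) / errors_before


# Hypothetical counts for illustration only; they are not taken from the table.
print(gain(tp=80, fp=10, fn=20))  # 0.7
```

Under this definition the gain is 1 only when every error is fixed and no new errors are introduced, and it becomes negative when a corrector introduces more errors than it removes.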