Table 1 Dataset characteristics. The name of each synthetic dataset contains the name of the read simulator

From: IMOS: improved Meta-aligner and Minimap2 On Spark

| Dataset name | Dataset size (MB) | Read length (bp) | Read length range (bp) | Mismatch (%) | Indel (%) | Number of reads |
|---|---|---|---|---|---|---|
| Wgsim-S0 | 59 | 300 | FIX | 0.9 | 0.1 | 100000 |
| Wgsim-S1 | 193 | 1000 | FIX | 0.9 | 0.1 | 100000 |
| Wgsim-S2 | 193 | 1000 | FIX | 0 | 10 | 100000 |
| Wgsim-S3 | 193 | 1000 | FIX | 9 | 1 | 100000 |
| Wgsim-L0 | 232 | 7000 | FIX | 1 | 16 | 20000 |
| Wgsim-L1 | 232 | 7000 | FIX | 0.9 | 0.1 | 20000 |
| Wgsim-L2 | 458 | 12000 | FIX | 1 | 16 | 20000 |
| SimLoRD | 315 | 8182 | 500-34687 | 1 | 16 | 20000 |
| PBSim | 343 | 7596 | 181-24998 | 1 | 16 | 22556 |
| SRX533609 | 2589 | 6890 | 500-39445 | 1 | 16 | 174537 |
| ERX1366175 | 800 | 12997 | 503-55908 | 1 | 16 | 30713 |

  1. The Wgsim datasets come in two classes of synthetic data, the S-class (short reads) and the L-class (long reads), marked by an S or an L after the simulator name (e.g., Wgsim-S0 and Wgsim-L0). The last two datasets are real and are identified by their accession numbers.
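
The naming convention described in the footnote can be checked mechanically. The following is a minimal sketch, not code from IMOS: the helper name `wgsim_read_class` and the regular expression are assumptions inferred only from the dataset names listed in Table 1.

```python
import re

# Hypothetical pattern inferred from the names in Table 1 (e.g. "Wgsim-S0");
# it is an assumption, not a format documented by the IMOS authors.
_WGSIM_NAME = re.compile(r"^Wgsim-([SL])(\d+)$")


def wgsim_read_class(name):
    """Return 'short' or 'long' for a Wgsim dataset name, otherwise None."""
    match = _WGSIM_NAME.match(name)
    if match is None:
        # Not a Wgsim dataset (e.g. SimLoRD, PBSim, or an accession number).
        return None
    return "short" if match.group(1) == "S" else "long"


if __name__ == "__main__":
    for name in ["Wgsim-S0", "Wgsim-L2", "PBSim", "SRX533609"]:
        print(name, wgsim_read_class(name))
```

Names that do not follow the Wgsim pattern return `None` here because the footnote assigns the S- and L-classes only to the Wgsim datasets.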