
Table 1 Dataset characteristics. The names of the synthetic datasets contain the name of the read simulator

From: IMOS: improved Meta-aligner and Minimap2 On Spark

| Dataset name | Dataset size (MB) | Read length (bp) | Read length range (bp) | Mismatch (%) | Indel (%) | Number of reads |
| --- | --- | --- | --- | --- | --- | --- |
| Wgsim-S0 | 59 | 300 | FIX | 0.9 | 0.1 | 100000 |
| Wgsim-S1 | 193 | 1000 | FIX | 0.9 | 0.1 | 100000 |
| Wgsim-S2 | 193 | 1000 | FIX | 0 | 10 | 100000 |
| Wgsim-S3 | 193 | 1000 | FIX | 9 | 1 | 100000 |
| Wgsim-L0 | 232 | 7000 | FIX | 1 | 16 | 20000 |
| Wgsim-L1 | 232 | 7000 | FIX | 0.9 | 0.1 | 20000 |
| Wgsim-L2 | 458 | 12000 | FIX | 1 | 16 | 20000 |
| SimLoRD | 315 | 8182 | 500-34687 | 1 | 16 | 20000 |
| PBSim | 343 | 7596 | 181-24998 | 1 | 16 | 22556 |
| SRX533609 | 2589 | 6890 | 500-39445 | 1 | 16 | 174537 |
| ERX1366175 | 800 | 12997 | 503-55908 | 1 | 16 | 30713 |
  1. The Wgsim datasets come in two types of synthetic dataset, S-class and L-class, which refer to short and long reads; the part of the name after "Wgsim-" starts with S or L, respectively. The last two datasets are real and are identified by their accession numbers.
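
The naming convention in the footnote can be checked mechanically. The following is a minimal sketch, not part of the original article: the helper name `read_class` and the regular expression are assumptions inferred from the names listed in Table 1, and any name that does not follow the Wgsim pattern is simply labelled "other".

```python
import re

# Hypothetical pattern, inferred from the Wgsim names in Table 1:
# "Wgsim-" followed by S or L and a numeric index.
_WGSIM_NAME = re.compile(r"^Wgsim-([SL])\d+$")


def read_class(name: str) -> str:
    """Return 'short' or 'long' for Wgsim names, 'other' for anything else."""
    match = _WGSIM_NAME.match(name)
    if match is None:
        # Covers other simulators (SimLoRD, PBSim) and accession numbers.
        return "other"
    return "short" if match.group(1) == "S" else "long"


names = ["Wgsim-S0", "Wgsim-L2", "SimLoRD", "SRX533609"]
classes = {name: read_class(name) for name in names}
print(classes)
```

Names that do not match the Wgsim pattern are deliberately not classified, since the footnote defines the S/L convention only for the Wgsim datasets.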