
Table 1 Experimental data for performance evaluation

From: BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data

| Data set | Tailored data size | # of reads | Description |
|---|---|---|---|
| Simulation data | 122 MB | 1,000,000 | Simulation set with 0% error |
| | 122 MB | 1,000,000 | Simulation set with 1% error |
| | 122 MB | 1,000,000 | Simulation set with 2% error |
| GEO WGBS data (GSE80911) | 1.6 GB | 10,000,000 | 10 million reads real data set |
| | 7.9 GB | 50,000,000 | 50 million reads real data set |
| | 16 GB | 100,000,000 | 100 million reads real data set |
| | 32 GB | 200,000,000 | 200 million reads real data set |
| Reference genome | Build 37, hg19 | | |
  1. Simulation data sets were generated by Sherman [26] with error rates of 0%, 1% and 2%, where the error rate is the mean error rate per bp and the error curve follows an exponential decay model. Each test data set is tailored from the original WGBS data based on the number of reads.
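
The source does not say how the real data sets were tailored to a given read count. As a minimal sketch of one possible approach, assuming uncompressed FASTQ input with exactly four lines per read and assuming the first N reads are kept, the step could look like the following. The function name `take_reads` and the demo records are hypothetical and do not come from the BiSpark paper.

```python
from itertools import islice
from typing import Iterable, Iterator


def take_reads(fastq_lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Yield the lines of the first n_reads FASTQ records.

    Assumption: every record occupies exactly four lines
    (header, sequence, '+', qualities), so n_reads records
    correspond to the first 4 * n_reads lines.
    """
    return islice(fastq_lines, 4 * n_reads)


# Tiny in-memory stand-in for a FASTQ file (three hypothetical reads).
demo = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "IIII\n",
    "@read3\n", "GGCA\n", "+\n", "IIII\n",
]

# Keep only the first two reads.
subset = list(take_reads(demo, 2))
```

With a real file, the same function can be applied to an open file handle, for example `take_reads(open(path), 10_000_000)`, writing the yielded lines to a new file. Because `islice` is lazy, the input is streamed rather than loaded into memory.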