Table 4 Detailed segmentation indicators for each dataset

From: ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark

| Dataset | D1 | D2 | D3 | D4 |
| --- | --- | --- | --- | --- |
| BAM file size | 67.8 GB | 128.5 GB | 59.3 GB | 250.15 GB |
| Default number of data blocks | 543 | 1028 | 475 | 2002 |
| Target number of segmentations (5%) | 28 | 52 | 24 | 102 |
| Actual number of segmentations | 84 | 147 | 76 | 303 |
| Actual proportion of segmenting | 15.47% | 14.3% | 16% | 15.13% |
| Number of matching blocks | 27 | 52 | 24 | 101 |
| Segmenting Precision | 32.14% | 35.37% | 31.58% | 33.33% |
| Segmenting Recall | 96.43% | 100% | 100% | 99.02% |
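
The percentage rows appear to follow directly from the count rows. The sketch below shows one way to reproduce them. The formulas are an assumption inferred from the numbers in the table, not definitions quoted from the paper: proportion = actual segmentations / default blocks, precision = matching blocks / actual segmentations, and recall = matching blocks / target segmentations. The function name `segmentation_metrics` and its parameter names are hypothetical.

```python
def segmentation_metrics(default_blocks, target, actual, matching):
    """Derive the percentage indicators from the raw counts.

    Assumed definitions (inferred from the table values, not taken from the paper):
      proportion = actual / default_blocks
      precision  = matching / actual
      recall     = matching / target
    All results are percentages rounded to two decimal places.
    """
    return {
        "proportion": round(100 * actual / default_blocks, 2),
        "precision": round(100 * matching / actual, 2),
        "recall": round(100 * matching / target, 2),
    }


# Count rows of the table, keyed by dataset:
# (default blocks, target segmentations, actual segmentations, matching blocks)
counts = {
    "D1": (543, 28, 84, 27),
    "D2": (1028, 52, 147, 52),
    "D3": (475, 24, 76, 24),
    "D4": (2002, 102, 303, 101),
}

metrics = {name: segmentation_metrics(*row) for name, row in counts.items()}

for name, m in metrics.items():
    print(name, m)
```

Under these assumed definitions, the computed values agree with the percentage rows for all four datasets. Precision stays near one third because roughly three times as many blocks are segmented as the 5% target calls for, while recall stays near 100% because almost every targeted block is among those segmented.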