ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark

Table 4. Segmentation indicators for each dataset

Dataset	D1	D2	D3	D4
BAM file size	67.8 GB	128.5 GB	59.3 GB	250.15 GB
Default number of data blocks	543	1028	475	2002
Target number of segmentations (5%)	28	52	24	102
Actual number of segmentations	84	147	76	303
Actual segmentation proportion	15.47%	14.30%	16.00%	15.13%
Number of matching blocks	27	52	24	101
Segmentation precision	32.14%	35.37%	31.58%	33.33%
Segmentation recall	96.43%	100.00%	100.00%	99.02%
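This excerpt does not define the derived rows, but the reported values are consistent with the following ratios: segmentation proportion = actual segmentations / default blocks, precision = matching blocks / actual segmentations, and recall = matching blocks / target segmentations. The sketch below recomputes them from the table's raw counts under that assumption. The function name `segmentation_metrics` is hypothetical and is not taken from the ADS-HCSpark code.

```python
# Raw counts copied from Table 4. The metric definitions are inferred from the
# reported percentages; they are an assumption, not quoted from the paper.
datasets = {
    # name: (default_blocks, target, actual, matching)
    "D1": (543, 28, 84, 27),
    "D2": (1028, 52, 147, 52),
    "D3": (475, 24, 76, 24),
    "D4": (2002, 102, 303, 101),
}


def segmentation_metrics(default_blocks, target, actual, matching):
    """Hypothetical helper: return (proportion, precision, recall) in percent,
    rounded to two decimals."""
    proportion = 100.0 * actual / default_blocks  # share of blocks that were segmented
    precision = 100.0 * matching / actual         # segmented blocks that were targets
    recall = 100.0 * matching / target            # target blocks that were segmented
    return round(proportion, 2), round(precision, 2), round(recall, 2)


for name, row in datasets.items():
    p, prec, rec = segmentation_metrics(*row)
    print(f"{name}: proportion={p:.2f}% precision={prec:.2f}% recall={rec:.2f}%")
```

Under these definitions, recall near 100% together with precision of roughly 32 to 35% means nearly all target blocks are captured, at the cost of segmenting about three times as many blocks as the 5% target.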

ISSN: 1471-2105