Table 5 Factors in the variant calling case

From: doepipeline: a systematic approach to optimizing multi-level and multi-step data processing workflows

| Step | Parameter | Abbr. | Type | Min | Max | Default | Optimized |
|---|---|---|---|---|---|---|---|
| Variant calling | Global assumed mismapping rate for reads (globalMAPQ) | GMQ | Ordinal | 20 | 55 | 45 | 46 |
| Variant calling | Minimum base quality for calling (mbq) | MBQ | Ordinal | 5 | 25 | 10 | 10 |
| Variant calling | Minimum reads per alignment start (minReadsPerAlignment) | RAS | Ordinal | 5 | 25 | 10 | 20 |
| Variant calling | Minimum confidence threshold for calling (stand_call_conf) | SCC | Quantitative | 5 | 25 | 10 | 5 |
| Variant filtering | Quality by depth (QD) | QD | Quantitative | 0 | 10 | 2 | 0.41 |
| Variant filtering | Read position rank sum test (ReadPosRankSum) | RPRS | Quantitative | −40 | 0 | −20 | −37.5 |
| Variant filtering | Fisher test for strand bias (FS) | FS | Quantitative | 0 | 250 | 200 | 62.5 |
| Variant filtering | Strand odds ratio (SOR) | SOR | Quantitative | 0 | 20 | 10 | 8.16 |

  1. The factors investigated in the variant calling case are described above. The optimization was carried out sequentially for the two main steps, variant calling and variant filtering, and the step to which each factor belongs is indicated. For the variant calling step, the factor’s corresponding command line flag is given in parentheses after the parameter name. For the variant filtering step, the parentheses instead give the corresponding information tag annotated in the VCF file. The min and max values define the design space. The default values for all factors are also indicated: for the calling step they are the built-in default values of the HaplotypeCaller tool, while for the filtering step they are the values recommended by the GATK team. The optimized values are those that, in combination, produced the best outcome found by doepipeline.
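
As a minimal sketch of how the design space in this table could be held in code, the Python snippet below records each factor with its bounds, default, and optimized value, then checks that every value lies inside the design space and lists which factors moved away from their defaults. The names `Factor`, `FACTORS`, `out_of_range`, and `changed` are illustrative assumptions and are not part of doepipeline or GATK; only the numbers are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One row of the table (hypothetical container, not a doepipeline type)."""
    step: str        # "calling" or "filtering"
    abbr: str        # abbreviation used in the table
    kind: str        # "ordinal" or "quantitative"
    low: float       # Min column: lower bound of the design space
    high: float      # Max column: upper bound of the design space
    default: float   # Default column
    optimized: float # Optimized column

    def contains(self, value: float) -> bool:
        """True if value lies inside the design space [low, high]."""
        return self.low <= value <= self.high


# Values copied from the table.
FACTORS = [
    Factor("calling", "GMQ", "ordinal", 20, 55, 45, 46),
    Factor("calling", "MBQ", "ordinal", 5, 25, 10, 10),
    Factor("calling", "RAS", "ordinal", 5, 25, 10, 20),
    Factor("calling", "SCC", "quantitative", 5, 25, 10, 5),
    Factor("filtering", "QD", "quantitative", 0, 10, 2, 0.41),
    Factor("filtering", "RPRS", "quantitative", -40, 0, -20, -37.5),
    Factor("filtering", "FS", "quantitative", 0, 250, 200, 62.5),
    Factor("filtering", "SOR", "quantitative", 0, 20, 10, 8.16),
]


def out_of_range(factors):
    """Abbreviations of factors whose default or optimized value falls outside [low, high]."""
    return [
        f.abbr
        for f in factors
        if not (f.contains(f.default) and f.contains(f.optimized))
    ]


def changed(factors):
    """Map abbreviation -> (default, optimized) for factors whose optimum differs from the default."""
    return {
        f.abbr: (f.default, f.optimized)
        for f in factors
        if f.default != f.optimized
    }


if __name__ == "__main__":
    print("Outside design space:", out_of_range(FACTORS))
    for abbr, (default, optimized) in changed(FACTORS).items():
        print(f"{abbr}: default {default} -> optimized {optimized}")
```

Run on the values in the table, this sketch reports no factor outside its design space and shows that MBQ is the only factor whose optimized value equals its default.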