
Table 5 Factors in the variant calling case

From: doepipeline: a systematic approach to optimizing multi-level and multi-step data processing workflows

| Step | Parameter | Abbr. | Type | Min | Max | Default | Optimized |
|---|---|---|---|---|---|---|---|
| Variant calling | Global assumed mismapping rate for reads (globalMAPQ) | GMQ | Ordinal | 20 | 55 | 45 | 46 |
| Variant calling | Minimum base quality for calling (mbq) | MBQ | Ordinal | 5 | 25 | 10 | 10 |
| Variant calling | Minimum reads per alignment start (minReadsPerAlignment) | RAS | Ordinal | 5 | 25 | 10 | 20 |
| Variant calling | Minimum confidence threshold for calling (stand_call_conf) | SCC | Quantitative | 5 | 25 | 10 | 5 |
| Variant filtering | Quality by depth (QD) | QD | Quantitative | 0 | 10 | 2 | 0.41 |
| Variant filtering | Read position rank sum test (ReadPosRankSum) | RPRS | Quantitative | −40 | 0 | −20 | −37.5 |
| Variant filtering | Fisher test for strand bias (FS) | FS | Quantitative | 0 | 250 | 200 | 62.5 |
| Variant filtering | Strand odds ratio (SOR) | SOR | Quantitative | 0 | 20 | 10 | 8.16 |
The factors investigated in the variant calling case are described above. Optimization was carried out sequentially for the two main steps, variant calling and variant filtering, and the step to which each factor belongs is indicated. For the variant calling step, the command-line flag corresponding to each factor is given in parentheses after the parameter name; for the variant filtering step, the parentheses give the corresponding INFO tag annotated in the VCF file. The Min and Max values define the design space. Default values are also given: for the calling step these are the built-in defaults of the HaplotypeCaller tool, and for the filtering step they are the values recommended by the GATK team. The optimized values are those that, in combination, produced the best outcome as found by doepipeline.
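To make the design space concrete, the following minimal Python sketch transcribes the table and maps each value onto the coded scale commonly used in design of experiments, where Min becomes −1, Max becomes +1 and the centre of the range becomes 0. It also assembles the filtering thresholds into a single hard-filter expression string. The `Factor` class and the `coded` and `filter_expression` helpers are hypothetical illustrations, not part of doepipeline or GATK. The comparison direction for each filter tag (flag a record when QD or ReadPosRankSum falls below its threshold, or when FS or SOR rises above it) is an assumption based on GATK's usual hard-filtering convention; the table itself does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    abbr: str
    step: str          # "calling" or "filtering"
    kind: str          # "Ordinal" or "Quantitative", as in the table
    low: float         # Min: lower bound of the design space
    high: float        # Max: upper bound of the design space
    default: float
    optimized: float


# Values transcribed from the table above.
FACTORS = [
    Factor("GMQ", "calling", "Ordinal", 20, 55, 45, 46),
    Factor("MBQ", "calling", "Ordinal", 5, 25, 10, 10),
    Factor("RAS", "calling", "Ordinal", 5, 25, 10, 20),
    Factor("SCC", "calling", "Quantitative", 5, 25, 10, 5),
    Factor("QD", "filtering", "Quantitative", 0, 10, 2, 0.41),
    Factor("RPRS", "filtering", "Quantitative", -40, 0, -20, -37.5),
    Factor("FS", "filtering", "Quantitative", 0, 250, 200, 62.5),
    Factor("SOR", "filtering", "Quantitative", 0, 20, 10, 8.16),
]


def coded(f: Factor, value: float) -> float:
    """Map a value in [low, high] onto the coded DoE scale [-1, +1]."""
    if not f.low <= value <= f.high:
        raise ValueError(f"{f.abbr}={value} lies outside the design space")
    center = (f.high + f.low) / 2
    half_range = (f.high - f.low) / 2
    return (value - center) / half_range


# ASSUMPTION: comparison direction per VCF INFO tag, following GATK's
# usual hard-filter convention (not stated in the table).
FILTER_TAGS = {
    "QD": ("QD", "<"),
    "RPRS": ("ReadPosRankSum", "<"),
    "FS": ("FS", ">"),
    "SOR": ("SOR", ">"),
}


def filter_expression(settings: dict) -> str:
    """Join per-tag thresholds into one OR-ed hard-filter expression."""
    parts = [f"{tag} {op} {settings[abbr]}"
             for abbr, (tag, op) in FILTER_TAGS.items()]
    return " || ".join(parts)


if __name__ == "__main__":
    for f in FACTORS:
        print(f"{f.abbr:5s} default={coded(f, f.default):+.2f} "
              f"optimized={coded(f, f.optimized):+.2f}")
    optimized = {f.abbr: f.optimized for f in FACTORS if f.step == "filtering"}
    print(filter_expression(optimized))
```

Coding the factors this way puts parameters with very different units (quality scores, read counts, test statistics) on a common scale, which is how the optimized settings can be compared with the defaults. For example, SCC was driven to the lower edge of its range (coded −1.0), and RPRS ended up near the lower edge (coded −0.875).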