Fig. 1 | BMC Bioinformatics

Fig. 1

From: ToTem: a tool for variant calling pipeline optimization


a Once the pipeline is set up for optimization, all configurations are run in parallel on the raw input data. In this particular example, the emphasis is on optimizing the variant calling filters; however, the pipeline design depends on the user’s needs. In the GIAB approach, the benchmarking step is part of the pipeline and is performed by RTG Tools and hap.py. The pipeline results, in the form of stratified performance reports (csv) produced by hap.py, are imported into ToTem’s internal database and filtered using ToTem’s filtering tool. This allows the best-performing pipeline to be selected based on the chosen quality metrics, variant type and genomic region. b As in the previous diagram, the optimization focuses on tuning the variant filtering. In contrast to the previous case, Little Profet requires the pipeline results to be represented as tables of normalized variants with mandatory headers (CHROM, POS, REF, ALT). These data are imported into ToTem’s internal database for pipeline benchmarking by the Little Profet method. Benchmarking is done by comparing the results of each pipeline with the ground truth reference variant calls in the given regions of interest and by estimating the numbers of true positives (TP), false positives (FP) and false negatives (FN), together with the quality metrics derived from them: precision, recall and F-measure. To prevent overfitting of the pipelines, Little Profet also calculates the reproducibility of each quality metric over different data subsets. The results are provided in the form of interactive graphs and tables.
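
The metrics named in panel b can be illustrated with a minimal Python sketch. This is not ToTem's or Little Profet's actual implementation: the function name `benchmark`, the `Metrics` type and the example variants are hypothetical, and the sketch assumes that both call sets are already normalized and restricted to the regions of interest, so that variants can be matched exactly on (CHROM, POS, REF, ALT).

```python
from typing import Iterable, NamedTuple, Tuple

# A normalized variant keyed by the mandatory headers (CHROM, POS, REF, ALT).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f_measure: float


def benchmark(called: Iterable[Variant], truth: Iterable[Variant]) -> Metrics:
    """Compare one pipeline's calls with ground truth calls (hypothetical helper).

    Assumes both inputs are already normalized and limited to the regions
    of interest, so exact tuple equality is a valid match criterion.
    """
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)   # called and present in the truth set
    fp = len(called_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - called_set)   # in the truth set but not called
    # Guard against division by zero when a set is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return Metrics(tp, fp, fn, precision, recall, f_measure)


if __name__ == "__main__":
    # Made-up example variants, for illustration only.
    truth = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")]
    called = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")]
    print(benchmark(called, truth))
```

Running the same comparison on several data subsets and inspecting how much each metric varies would be one way to approximate the reproducibility check mentioned in the caption; the exact procedure used by Little Profet is not specified here.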
