Fig. 1 | BMC Bioinformatics

Fig. 1

From: HELLO: improved neural network architectures and methodologies for small variant calling

Excess errors in DeepVariant and GATK variant calls compared to HELLO for HG003 Illumina Whole Genome Sequencing (WGS) data. Positive values indicate better performance by HELLO than by the competing tool. Bars reference the left y-axis and indicate the differences in FP and FN counts between each tool (GATK or DeepVariant) and HELLO. Formally, let \(N_{\text{FP}}^{T}\), \(N_{\text{FN}}^{T}\), and \(\epsilon_{T} = N_{\text{FP}}^{T} + N_{\text{FN}}^{T}\) represent, respectively, the false positive count, false negative count, and total error count for tool \(T\) at a given coverage point. Then the value of the bar marked “FP” for tool \(T\) at that coverage point is \(N_{\text{FP}}^{T} - N_{\text{FP}}^{\text{HELLO}}\), and the value of the bar marked “FN” is \(N_{\text{FN}}^{T} - N_{\text{FN}}^{\text{HELLO}}\). The line plots reference the right y-axis and indicate the difference in total error count as a fraction of the total errors of the worse-performing method at each coverage point. Formally, for tool \(T\), the value of a point in the line plot is \(\left(\epsilon_{T} - \epsilon_{\text{HELLO}}\right)/\max\left(\epsilon_{T}, \epsilon_{\text{HELLO}}\right)\). This scheme is followed for Figs. 1 through 9.
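
The quantities defined in the caption can be computed as in the following minimal Python sketch. The names `ErrorCounts` and `excess_errors` are hypothetical and used only for illustration, and the counts in the usage note are made up rather than taken from the paper. The caption does not say what happens when both tools make zero errors, so returning 0.0 in that case is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCounts:
    """FP and FN counts for one tool at one coverage point (hypothetical container)."""
    fp: int
    fn: int

    @property
    def total(self) -> int:
        # Total error count: epsilon_T = N_FP^T + N_FN^T
        return self.fp + self.fn


def excess_errors(tool: ErrorCounts, hello: ErrorCounts) -> tuple[int, int, float]:
    """Return (FP bar, FN bar, line-plot value) for `tool` relative to HELLO."""
    fp_bar = tool.fp - hello.fp  # N_FP^T - N_FP^HELLO
    fn_bar = tool.fn - hello.fn  # N_FN^T - N_FN^HELLO
    worse = max(tool.total, hello.total)
    # Assumption: define the relative difference as 0.0 when neither tool makes errors.
    line = (tool.total - hello.total) / worse if worse > 0 else 0.0
    return fp_bar, fn_bar, line
```

For example, with illustrative counts of 120 FP and 80 FN for a competing tool and 90 FP and 60 FN for HELLO, `excess_errors(ErrorCounts(120, 80), ErrorCounts(90, 60))` gives bars of 30 and 20 and a line-plot value of 0.25. Because the denominator is the larger of the two totals, the line-plot value always lies between -1 and 1.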
