Table 1 Accuracy of heterozygous variant calling by Pilon

From: HaplotypeTools: a toolkit for accurately identifying recombination and recombinant genotypes

Test                                 1           2           3           4           5           6
Introduced HET               2,313,700   2,313,700     231,370     231,370      23,137      23,137
Introduced HET (/kb)               100         100          10          10           1           1
Read Length (nt, paired)           100      10,000         100      10,000         100      10,000
SNP                             86,808     189,498       7,752      16,583       1,075       1,633
HET                          2,090,044   1,781,654     230,380     191,558      37,153      22,365
INS                                 43          66           0           0           0           0
DEL                                 48          69           2           1           2           1
AMB                                  2          92           2          59           2          67
TP                           2,027,722   1,745,567     213,362     187,637      21,429      18,807
TN                          20,811,068  19,741,846  22,843,007  21,930,208  23,049,406  22,049,041
FP                              62,322      36,087      17,018       3,921      15,724       3,558
FN                              98,679     165,971       9,073      16,227         904       1,572
FP (other)                      86,901     189,725       7,756      16,643       1,079       1,701
TP (%)                           87.64       75.44       92.22       81.10       92.62       81.29
TN (%)                           89.94       85.32       98.73       94.78       99.62       95.29
FP (%)                            2.98        2.03        7.39        2.05       42.32       15.91
FN (%)                            4.26        7.17        3.92        7.01        3.91        6.79
Sensitivity                       0.95        0.91        0.96        0.92        0.96        0.92
Specificity                       0.99        0.99        1.00        1.00        1.00        1.00
Accuracy                          0.99        0.98        1.00        1.00        1.00        1.00
  1. Paired reads (100 nt or 10 kb) were simulated at 20X depth from the reference Bd JEL423 genome, which was duplicated to create an in silico diploid. In silico mutations were then randomly introduced throughout (1/kb, 10/kb or 100/kb). Reads were aligned to the original reference sequence (the non-duplicated, non-mutated version), and diploid variants were called by Pilon. Counts of variants are shown, including single nucleotide polymorphisms (SNP), heterozygous positions (HET), insertions (INS), deletions (DEL) and ambiguous calls (AMB). Accuracy was assessed using the Comparison of FDR tool [28], which calculated TN = true negatives (correct reference bases), TP = true positives (correct HET), FN = false negatives (incorrect reference bases) and FP = false positives (incorrect HET). FP (other) is a count of all additional (non-heterozygous) incorrect bases, including SNPs, INS, DEL and AMB; >99% of FP (other) were SNPs. TP (%) and FN (%) are percentages of Introduced HET, TN (%) is a percentage of assembly length, and FP (%) is a percentage of HETs called. Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP + FP (other)), Accuracy = (TN + TP)/(TN + TP + FN + FP + FP (other)).
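The derived rows of the table can be recomputed from the raw counts using the formulas in the footnote. The following is a minimal sketch, not part of HaplotypeTools or the Comparison of FDR tool: the function name `het_calling_metrics` and its parameter names are hypothetical, and the inputs are the Test 1 counts from the table. TN (%) is left out because the exact assembly length is not listed in the table.

```python
def het_calling_metrics(tp, tn, fp, fn, fp_other, introduced_het, het_called):
    """Recompute the derived rows of Table 1 from raw counts.

    Hypothetical helper for illustration; formulas follow the table footnote.
    """
    return {
        # TP (%) and FN (%) are expressed relative to the introduced HET sites.
        "tp_pct": 100 * tp / introduced_het,
        "fn_pct": 100 * fn / introduced_het,
        # FP (%) is expressed relative to the HET sites that were called.
        "fp_pct": 100 * fp / het_called,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp + fp_other),
        "accuracy": (tn + tp) / (tn + tp + fn + fp + fp_other),
    }


# Test 1 column of the table (100 HET/kb, 100 nt paired reads).
test1 = het_calling_metrics(
    tp=2_027_722,
    tn=20_811_068,
    fp=62_322,
    fn=98_679,
    fp_other=86_901,
    introduced_het=2_313_700,
    het_called=2_090_044,
)

for name, value in test1.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimal places, these values agree with the Test 1 column (87.64, 4.26, 2.98, 0.95, 0.99 and 0.99). The same call with another column's counts gives a quick consistency check on that column.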