
Table 1 Accuracy of heterozygous variant calling by Pilon was assessed

From: HaplotypeTools: a toolkit for accurately identifying recombination and recombinant genotypes

| Test | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Introduced HET | 2,313,700 | 2,313,700 | 231,370 | 231,370 | 23,137 | 23,137 |
| Introduced HET (/kb) | 100 | 100 | 10 | 10 | 1 | 1 |
| Read Length (nt, paired) | 100 | 10,000 | 100 | 10,000 | 100 | 10,000 |
| SNP | 86,808 | 189,498 | 7,752 | 16,583 | 1,075 | 1,633 |
| HET | 2,090,044 | 1,781,654 | 230,380 | 191,558 | 37,153 | 22,365 |
| INS | 43 | 66 | 0 | 0 | 0 | 0 |
| DEL | 48 | 69 | 2 | 1 | 2 | 1 |
| AMB | 2 | 92 | 2 | 59 | 2 | 67 |
| TP | 2,027,722 | 1,745,567 | 213,362 | 187,637 | 21,429 | 18,807 |
| TN | 20,811,068 | 19,741,846 | 22,843,007 | 21,930,208 | 23,049,406 | 22,049,041 |
| FP | 62,322 | 36,087 | 17,018 | 3,921 | 15,724 | 3,558 |
| FN | 98,679 | 165,971 | 9,073 | 16,227 | 904 | 1,572 |
| FP (other) | 86,901 | 189,725 | 7,756 | 16,643 | 1,079 | 1,701 |
| TP (%) | 87.64 | 75.44 | 92.22 | 81.10 | 92.62 | 81.29 |
| TN (%) | 89.94 | 85.32 | 98.73 | 94.78 | 99.62 | 95.29 |
| FP (%) | 2.98 | 2.03 | 7.39 | 2.05 | 42.32 | 15.91 |
| FN (%) | 4.26 | 7.17 | 3.92 | 7.01 | 3.91 | 6.79 |
| Sensitivity | 0.95 | 0.91 | 0.96 | 0.92 | 0.96 | 0.92 |
| Specificity | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 |
| Accuracy | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 |

  1. Paired reads (100 nt or 10 kb) were simulated at 20X depth from the reference Bd JEL423 genome, which was duplicated to create an in silico diploid. In silico mutations were then randomly introduced throughout (1/kb, 10/kb or 100/kb). Reads were aligned to the original reference sequence (the non-duplicated, non-mutated version), and diploid variants were called by Pilon. Counts of variants are shown, including single nucleotide polymorphisms (SNP), heterozygous positions (HET), insertions (INS), deletions (DEL) and ambiguous positions (AMB). Accuracy was assessed with the Comparison of FDR tool [28], which calculated TN = true negatives (correct reference bases), TP = true positives (correct HET), FN = false negatives (incorrect reference bases) and FP = false positives (incorrect HET). FP (other) is a count of all additional (non-heterozygous) incorrect bases, including SNPs, INS, DEL and AMB. More than 99% of FP (other) were SNPs. TP (%) and FN (%) are percentages of Introduced HET, TN (%) is a percentage of assembly length, and FP (%) is a percentage of HETs called. Sensitivity = TP/(TP + FN), Specificity = TN/(TN + FP + FP (other)), Accuracy = (TN + TP)/(TN + TP + FN + FP + FP (other))
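
The summary rows can be recomputed from the raw counts using the formulas in the footnote. The following is a minimal Python sketch, not the Comparison of FDR tool itself: the `Counts` class and the function names are hypothetical, introduced only for illustration, and the input values are the Test 1 column of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw counts for one test column (hypothetical container, for illustration)."""
    tp: int              # correct HET calls
    tn: int              # correct reference bases
    fp: int              # incorrect HET calls
    fn: int              # incorrect reference bases
    fp_other: int        # other incorrect bases (SNP, INS, DEL, AMB)
    introduced_het: int  # heterozygous positions introduced in silico
    het_called: int      # heterozygous positions called by Pilon


def sensitivity(c: Counts) -> float:
    # Sensitivity = TP / (TP + FN)
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    # Specificity = TN / (TN + FP + FP (other))
    return c.tn / (c.tn + c.fp + c.fp_other)


def accuracy(c: Counts) -> float:
    # Accuracy = (TN + TP) / (TN + TP + FN + FP + FP (other))
    return (c.tn + c.tp) / (c.tn + c.tp + c.fn + c.fp + c.fp_other)


def tp_percent(c: Counts) -> float:
    # TP (%) is expressed relative to the number of introduced HET positions
    return 100 * c.tp / c.introduced_het


def fp_percent(c: Counts) -> float:
    # FP (%) is expressed relative to the number of HET positions called
    return 100 * c.fp / c.het_called


# Test 1 column: 100/kb introduced HET, 100 nt paired reads
test1 = Counts(
    tp=2_027_722,
    tn=20_811_068,
    fp=62_322,
    fn=98_679,
    fp_other=86_901,
    introduced_het=2_313_700,
    het_called=2_090_044,
)

if __name__ == "__main__":
    print(f"Sensitivity: {sensitivity(test1):.2f}")
    print(f"Specificity: {specificity(test1):.2f}")
    print(f"Accuracy:    {accuracy(test1):.2f}")
    print(f"TP (%):      {tp_percent(test1):.2f}")
    print(f"FP (%):      {fp_percent(test1):.2f}")
```

Rounded to two decimal places, these reproduce the Test 1 entries in the table (0.95, 0.99, 0.99, 87.64 and 2.98). The same counts also satisfy HET = TP + FP, which is a quick consistency check on any column.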