Table 3 Classification Performance with Reference Genomes of genus Arabidopsis

From: Machine learning on alignment features for parent-of-origin classification of simulated hybrid RNA-seq

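| Arabidopsis DNA | A: HiSat2 | B: Hi_AS | C: Hi_RF | D: STAR | E: St_AS | F: St_RF | G: bwa |
|-----------------|-----------|----------|----------|---------|----------|----------|--------|
| Accuracy        | 73.3%     | 82.8%    | 94.5%    | 72.9%   | 81.4%    | 88.6%    | 76.4%  |
| Sensitivity     | 50.1%     | 72.1%    | 91.2%    | 49.6%   | 69.9%    | 87.6%    | 56.1%  |
| Specificity     | 96.4%     | 93.5%    | 98.3%    | 96.2%   | 93.0%    | 89.7%    | 96.7%  |
| Precision       | 93.4%     | 91.7%    | 98.1%    | 92.9%   | 90.9%    | 89.5%    | 94.4%  |
| F1-score        | 65.2%     | 80.7%    | 94.5%    | 64.6%   | 79.0%    | 88.5%    | 70.4%  |
| MCC             | 0.525     | 0.671    | 0.897    | 0.518   | 0.646    | 0.773    | 0.578  |
| AUPRC           | –         | –        | 99.3%    | –       | –        | 96.5%    | –      |
| AUROC           | –         | –        | 99.2%    | –       | –        | 96.2%    | –      |
| Pos Pref        | 26.8%     | 39.3%    | 46.5%    | 26.7%   | 38.4%    | 48.9%    | 29.7%  |
| Ties            | –         | 13.3%    | –        | –       | 14.7%    | –        | –      |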

  1. Performance metrics for parent-of-origin classification in Arabidopsis. In all six approaches, RNA-seq read pairs were assigned to either of two reference genomes. Whether used with HiSat2 or STAR, the random forest led to superior accuracy, F1, and MCC. For the sake of directional statistics like sensitivity, species A. lyrata and A. halleri were designated as the negative and positive classes, respectively. (A) Parent chosen by the HiSat2 aligner. (B) Parent chosen by comparing HiSat2 alignment scores. (C) Parent chosen by the random forest classifier using HiSat2 alignment features. (D, E, F) Similar to columns A, B, and C but using the STAR aligner, configured for splicing. (G) Parent chosen by bwa.
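
For readers who want to relate the rows of the table to one another, the following is a minimal sketch of the standard textbook definitions of the threshold-based metrics (accuracy, sensitivity, specificity, precision, F1, MCC), computed from a binary confusion matrix. It is not the authors' code. The function names `classification_metrics` and `f1_from_rates` and the example counts are hypothetical, chosen only for illustration. The Pos Pref and Ties rows are not covered because their definitions are not given in this table.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn count positive-class items (here, by the caption's convention,
    A. halleri); tn/fp count negative-class items (A. lyrata).
    """
    sensitivity = tp / (tp + fn)              # recall on the positive class
    specificity = tn / (tn + fp)              # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


def f1_from_rates(precision, sensitivity):
    """F1 as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    for name, value in classification_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {value:.3f}")

    # Consistency check using the reported precision and sensitivity
    # of column C (Hi_RF); the table reports an F1-score of 94.5%.
    print(f"F1 recomputed for column C: {f1_from_rates(0.981, 0.912):.3f}")
```

Because the table's percentages are rounded to one decimal place, F1 values recomputed this way can differ from the reported ones by about 0.1 percentage point.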