
Table 5 Comparison of AUC scores of RF (random forest) using word frequency vectors with those based on the Manhattan distance and the \(d_{2}^{*}\) statistic when k=6

From: Prediction of virus-host infectious association by supervised learning methods

 

| Host | Manhattan | \(d_{2}^{*}\) (i.i.d.) | \(d_{2}^{*}\) (1st−m c) | \(d_{2}^{*}\) (2nd−m c) | RF-feat-1 |
|---|---|---|---|---|---|
| Bacillus | 0.829 | 0.752 | 0.873 | 0.851 | 0.863 |
| Escherichia | 0.880 | 0.833 | 0.958 | 0.945 | 0.856 |
| Lactococcus | 0.767 | 0.775 | 0.828 | 0.750 | 1.000 |
| Mycobacterium | 0.976 | 0.977 | 0.966 | 0.984 | 0.985 |
| Pseudomonas | 0.951 | 0.934 | 0.974 | 0.970 | 0.981 |
| Salmonella | 0.837 | 0.818 | 0.900 | 0.900 | 0.896 |
| Staphylococcus | 0.964 | 0.941 | 0.947 | 0.974 | 0.987 |
| Synechococcus | 0.929 | 0.906 | 0.994 | 0.993 | 0.978 |
| Vibrio | 0.841 | 0.733 | 0.854 | 0.817 | 0.940 |

  1. For the background model of the \(d_{2}^{*}\) statistic, we considered the independent identically distributed (i.i.d.) model and first- and second-order Markov chains.
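
As a rough illustration of the Manhattan baseline named in the caption, the sketch below builds a word frequency vector for k=6 and computes the Manhattan distance between two such vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kmer_frequencies` and `manhattan` are hypothetical, words are counted on a single strand with overlap, and counts are normalised to relative frequencies. The paper's exact counting and normalisation choices are not given in this table.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=6):
    """Relative frequency of every DNA word of length k.

    Assumption: overlapping words on one strand; words containing
    characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed lexicographic word order so that vectors are comparable.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    # Toy sequences, not real virus or host genomes.
    virus = "ACGTACGTTAGCTAGCTAGGATCCA"
    host = "ACGTACGTTAGCTAGCAAGGATCCT"
    print(manhattan(kmer_frequencies(virus), kmer_frequencies(host)))
```

With relative frequencies, the distance lies between 0 (identical word usage) and 2 (no shared words), so a smaller value suggests a more similar word composition between virus and candidate host.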