Table 1 Description of the training and testing data

From: Prediction of virus-host infectious association by supervised learning methods

| Bacterial genus | Cutting year | # of viruses before cutting year | # of viruses after the cutting year | # of non-infectious viruses |
|---|---|---|---|---|
| Bacillus | 2012 | 31 | 31 | 1364 |
| Escherichia | 2012 | 141 | 32 | 1253 |
| Lactococcus | 2013 | 49 | 6 | 1371 |
| Mycobacterium | 2013 | 172 | 46 | 1208 |
| Pseudomonas | 2013 | 68 | 28 | 1330 |
| Salmonella | 2012 | 32 | 22 | 1372 |
| Staphylococcus | 2012 | 43 | 20 | 1363 |
| Synechococcus | 2012 | 30 | 17 | 1379 |
| Vibrio | 2012 | 39 | 29 | 1358 |

  1. For a given cutting year, the positive training data set contains the viruses infecting the corresponding host that were identified before that year, and the positive testing data set contains the viruses infecting that host that were discovered after it. The negative training data and the negative testing data were chosen randomly, without overlap, from the viruses that were not identified to infect the host.
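
The split described in the footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Virus` record and its fields are hypothetical, viruses identified in the cutting year itself are assigned to the testing set (the footnote does not say which side they fall on), and the number of negatives drawn is set equal to the number of positives in each set (the table does not state the negative sample sizes).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Virus:
    """Hypothetical record; field names are illustrative, not from the paper."""
    name: str
    year: int          # year the virus was identified
    hosts: frozenset   # host genera the virus is known to infect


def split_by_cutting_year(viruses, host, cutting_year, seed=0):
    """Return (pos_train, pos_test, neg_train, neg_test) for one host genus."""
    positives = [v for v in viruses if host in v.hosts]
    non_infectious = [v for v in viruses if host not in v.hosts]

    # Assumption: a virus identified in the cutting year goes to the testing set.
    pos_train = [v for v in positives if v.year < cutting_year]
    pos_test = [v for v in positives if v.year >= cutting_year]

    # Assumption: draw as many negatives as there are positives in each set.
    n_train, n_test = len(pos_train), len(pos_test)
    if n_train + n_test > len(non_infectious):
        raise ValueError("not enough non-infectious viruses to sample from")

    # One draw without replacement, then cut in two, so the negative
    # training and testing sets cannot overlap.
    rng = random.Random(seed)
    sampled = rng.sample(non_infectious, n_train + n_test)
    return pos_train, pos_test, sampled[:n_train], sampled[n_train:]
```

Calling `split_by_cutting_year(viruses, "Bacillus", 2012)` on a list of such records would give the four sets for the Bacillus row. Sampling the negatives in a single draw and then slicing is one simple way to guarantee the "without overlaps" condition.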