
Table 1 The top HP PGFams that serve as features of WSPC according to their average MDI rank, along with the number of HP and NHP genomes in the training set that contain the respective PGFams

From: Predicting the pathogenicity of bacterial genomes using widely spread protein families

| Rank | PGFam ID | Gene function | General function | MDI (SD\(^1\)) | HPs | NHPs | P-ratio\(^2\) | # Genera\(^3\) |
|---|---|---|---|---|---|---|---|---|
| 1 | 04139053 | Uroporphyrinogen III decarboxylase | Energy production | 0.038 (0.012) | 362 | 27 | 6.47 | 109 |
| 2 | 01915472 | Dihydrolipoamide acetyltransferase component of PDC\(^{*}\) | Aerobic respiration | 0.035 (0.01) | 385 | 48 | 3.93 | 120 |
| 3 | 07629184 | Cytosol aminopeptidase PepA | Regulation | 0.03 (0.009) | 366 | 39 | 4.58 | 115 |
| 4 | 07157721 | Heme O synthase, protoheme IX farnesyltransferase, COX10-CtaB | Aerobic respiration | 0.022 (0.007) | 312 | 14 | 10.41 | 89 |
| 5 | 00022550 | Molybdopterin synthase catalytic subunit MoaE | Respiration, energy conversion | 0.013 (0.005) | 303 | 17 | 8.42 | 89 |
| 6 | 01033770 | Class 2 Dihydroorotate dehydrogenase (DHODase) | Amino acid biosynthesis | 0.011 (0.005) | 333 | 35 | 4.63 | 99 |
| 7 | 00006100 | tRNA-modifying protein YgfZ | Synthesis and repair | 0.011 (0.006) | 305 | 17 | 8.48 | 93 |
| 8 | 07941512 | 23S rRNA methyltransferase | Methylation | 0.01 (0.003) | 324 | 37 | 4.27 | 89 |
| 9 | 00405499 | YpfJ protein, zinc metalloprotease superfamily | Protein cleavage | 0.009 (0.003) | 273 | 13 | 9.76 | 87 |
| 10 | 06757295 | Threonine dehydratase | Amino acid biosynthesis | 0.008 (0.004) | 323 | 34 | 4.62 | 95 |
| 11 | 08199696 | Glutathione reductase | Stress tolerance | 0.008 (0.002) | 220 | 12 | 8.48 | 66 |
| 12 | 03081665 | Cell division integral membrane protein, YggT | Stress tolerance | 0.007 (0.003) | 352 | 56 | 3.09 | 109 |
| 13 | 07854425 | Superoxide dismutase [Cu–Zn] precursor | Stress tolerance | 0.006 (0.001) | 252 | 21 | 5.74 | 75 |
| 14 | 01668012 | Sulfur carrier protein FdhD | Stress tolerance | 0.006 (0.003) | 300 | 26 | 5.56 | 98 |
| 15 | 01147190 | Deoxyribodipyrimidine photolyase | DNA repair | 0.006 (0.003) | 281 | 17 | 7.82 | 88 |

PGFams are ranked by average MDI (mean decrease in impurity), i.e., the mean of the feature's MDI values computed by 100 random forest classifiers trained on the training set with different random seeds. \(^1\)Standard deviation. \(^2\)Ratio of the proportion of HP genomes that contain the PGFam to the proportion of NHP genomes that contain it; add-one smoothing was applied to avoid division by zero. \(^3\)Number of distinct genera to which the genomes containing the PGFam belong. \(^{*}\)PDC: pyruvate dehydrogenase complex.
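The two derived quantities defined in the table notes (average MDI with its SD, and the smoothed P-ratio) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `p_ratio` and `mean_mdi` and the inputs `X` (genome-by-PGFam presence/absence matrix) and `y` (HP/NHP labels) are hypothetical; the exact smoothing form (adding 1 to each count and 2 to each class total), the use of scikit-learn's `RandomForestClassifier` with default hyperparameters, seeds 0 to `n_models - 1`, and the population SD are all assumptions.

```python
"""Sketch of the table's derived columns (hypothetical helper names)."""


def p_ratio(hp_with: int, n_hp: int, nhp_with: int, n_nhp: int) -> float:
    """Proportion of HP genomes containing a PGFam divided by the
    proportion of NHP genomes containing it.

    Assumed add-one (Laplace) smoothing: +1 to each count and +2 to each
    class total, so neither proportion can be zero.
    """
    p_hp = (hp_with + 1) / (n_hp + 2)
    p_nhp = (nhp_with + 1) / (n_nhp + 2)
    return p_hp / p_nhp


def mean_mdi(X, y, n_models: int = 100):
    """Mean and SD of per-feature MDI over n_models random forests.

    X: genomes x PGFams presence/absence matrix; y: HP/NHP labels.
    Assumes scikit-learn defaults and seeds 0..n_models-1.
    """
    # Imported here so p_ratio can be used without scikit-learn installed.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # feature_importances_ is scikit-learn's impurity-based (MDI) importance.
    mdi = np.array([
        RandomForestClassifier(random_state=seed).fit(X, y).feature_importances_
        for seed in range(n_models)
    ])
    mean, sd = mdi.mean(axis=0), mdi.std(axis=0)  # std uses ddof=0 (assumption)
    # Feature indices ordered from highest to lowest average MDI.
    ranking = np.argsort(-mean)
    return mean, sd, ranking
```

For example, `p_ratio(8, 18, 0, 8)` evaluates to about 4.5: a PGFam absent from every NHP genome still yields a finite ratio, which is the purpose of the smoothing.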