Table 1. Cluster analysis of the Salmonella dataset using topic model-derived clustering, in which each isolate is assigned to its highest-probability topic.

From: Topic modeling for cluster analysis of large biological and medical datasets

| Most dominant serotype | Number of isolates in cluster | Topic ID | % of cluster belonging to most dominant serotype |
|---|---|---|---|
| Enteritidis | 1046 | T11 | 99.71% |
| Saintpaul | 989 | T12 | 99.60% |
| Paratyphi B | 850 | T26 | 99.41% |
| Enteritidis | 1236 | T2 | 99.35% |
| Saintpaul | 709 | T29 | 99.29% |
| Hadar | 1837 | T18 | 99.18% |
| Poona | 1216 | T22 | 98.68% |
| Oranienburg | 1847 | T27 | 98.65% |
| Poona | 504 | T16 | 98.41% |
| Newport | 1179 | T15 | 98.39% |
| Braenderup | 852 | T14 | 98.12% |
| Heidelberg | 2125 | T23 | 96.80% |
| Typhi | 1845 | T19 | 95.88% |
| Braenderup | 1135 | T9 | 95.51% |
| Javiana | 2002 | T1 | 94.36% |
| Agona | 1846 | T13 | 91.87% |
| Infantis | 2130 | T25 | 89.48% |
| Thompson | 2195 | T7 | 89.25% |
| 4, 5, 12:i- | 1024 | T28 | 86.82% |
| Paratyphi B | 1041 | T10 | 85.40% |
| Typhimurium var. 5- | 288 | T5 | 84.03% |
| Montevideo | 2240 | T17 | 80.31% |
| 4, 5, 12:i- | 854 | T3 | 79.39% |
| Mississippi | 1860 | T4 | 78.60% |
| Typhimurium var. 5- | 1201 | T20 | 66.36% |
| Typhimurium | 1217 | T21 | 54.97% |
| Typhimurium | 738 | T0 | 51.63% |
| Typhimurium var. 5- | 417 | T6 | 48.68% |
| Typhimurium | 815 | T24 | 38.16% |
| Muenchen | 3994 | T8 | 36.60% |
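The following is a minimal sketch of how a summary like Table 1 could be produced. It assumes that a topic model (for example, LDA) has already been fitted and has produced a document-topic probability matrix with one row per isolate. Each isolate is assigned to its highest-probability topic. Then, for each topic cluster, the sketch reports the most frequent serotype, the cluster size, and the percentage of the cluster made up of that serotype. The function names `cluster_by_top_topic` and `summarize_clusters`, along with the toy matrix and serotype labels, are hypothetical illustrations. They are not the authors' code or data.

```python
import numpy as np
from collections import Counter


def cluster_by_top_topic(doc_topic: np.ndarray) -> np.ndarray:
    """Hypothetical helper: assign each isolate (row) to its highest-probability topic."""
    return np.argmax(doc_topic, axis=1)


def summarize_clusters(assignments, serotypes):
    """Hypothetical helper: for each topic cluster, return
    (most dominant serotype, cluster size, topic ID, % of cluster with that serotype)."""
    rows = []
    for topic in np.unique(assignments):
        members = [s for s, a in zip(serotypes, assignments) if a == topic]
        serotype, count = Counter(members).most_common(1)[0]
        rows.append((serotype, len(members), f"T{topic}", 100.0 * count / len(members)))
    # Order rows as in Table 1: by descending percentage of the dominant serotype
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Toy document-topic matrix (assumed output of an already-fitted topic model):
# 6 isolates x 3 topics, each row summing to 1.
doc_topic = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
])
serotypes = ["Enteritidis", "Enteritidis", "Hadar", "Hadar", "Poona", "Typhi"]

rows = summarize_clusters(cluster_by_top_topic(doc_topic), serotypes)
for row in rows:
    print(row)
```

In this toy example, topic T1 holds two Hadar isolates and one Poona isolate, so its dominant serotype is Hadar at about 66.67%. A low percentage like this plays the same role as the Typhimurium and Muenchen rows in Table 1: it flags a topic cluster that mixes several serotypes.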