Table 1. Cluster analysis of the Salmonella dataset using topic model-derived clustering based on highest probable topic assignment.

From: Topic modeling for cluster analysis of large biological and medical datasets

Most dominant serotype  Number of isolates  Topic ID  % of most dominant serotype
Enteritidis             1046                T11       99.71%
Saintpaul               989                 T12       99.60%
Paratyphi B             850                 T26       99.41%
Enteritidis             1236                T2        99.35%
Saintpaul               709                 T29       99.29%
Hadar                   1837                T18       99.18%
Poona                   1216                T22       98.68%
Oranienburg             1847                T27       98.65%
Poona                   504                 T16       98.41%
Newport                 1179                T15       98.39%
Braenderup              852                 T14       98.12%
Heidelberg              2125                T23       96.80%
Typhi                   1845                T19       95.88%
Braenderup              1135                T9        95.51%
Javiana                 2002                T1        94.36%
Agona                   1846                T13       91.87%
Infantis                2130                T25       89.48%
Thompson                2195                T7        89.25%
4, 5, 12:i-             1024                T28       86.82%
Paratyphi B             1041                T10       85.40%
Typhimurium var. 5-     288                 T5        84.03%
Montevideo              2240                T17       80.31%
4, 5, 12:i-             854                 T3        79.39%
Mississippi             1860                T4        78.60%
Typhimurium var. 5-     1201                T20       66.36%
Typhimurium             1217                T21       54.97%
Typhimurium             738                 T0        51.63%
Typhimurium var. 5-     417                 T6        48.68%
Typhimurium             815                 T24       38.16%
Muenchen                3994                T8        36.60%
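
The columns of Table 1 could be derived along the following lines. This is only a minimal sketch, not the authors' implementation: it assumes a document-topic probability matrix is already available from a fitted topic model, that "Number of isolates" means the size of the cluster, and that the percentage is the share of the cluster belonging to its most frequent serotype. The function names (`highest_probable_topic`, `summarize_clusters`) and the toy data are hypothetical and are not taken from the Salmonella dataset.

```python
from collections import Counter, defaultdict


def highest_probable_topic(topic_probs):
    """Return the index of the topic with the largest probability."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def summarize_clusters(doc_topic, serotypes):
    """Cluster isolates by their highest probable topic and summarize each cluster.

    doc_topic: one list of topic probabilities per isolate (assumed input).
    serotypes: the known serotype label of each isolate, in the same order.
    """
    clusters = defaultdict(list)
    for topic_probs, serotype in zip(doc_topic, serotypes):
        clusters[highest_probable_topic(topic_probs)].append(serotype)

    summary = []
    for topic, members in clusters.items():
        dominant, count = Counter(members).most_common(1)[0]
        summary.append({
            "dominant_serotype": dominant,
            "n_isolates": len(members),  # assumed to be the cluster size
            "topic": f"T{topic}",
            "pct_dominant": 100.0 * count / len(members),
        })

    # Table 1 lists clusters by decreasing share of the dominant serotype.
    summary.sort(key=lambda row: row["pct_dominant"], reverse=True)
    return summary


# Toy data for illustration only: five isolates, two topics.
doc_topic = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
]
serotypes = ["Enteritidis", "Enteritidis", "Hadar", "Hadar", "Hadar"]

rows = summarize_clusters(doc_topic, serotypes)
for row in rows:
    print(row["dominant_serotype"], row["n_isolates"], row["topic"],
          f"{row['pct_dominant']:.2f}%")
```

Under these assumptions, a cluster with a low percentage (for example T8 or T24 in Table 1) is one in which the highest probable topic groups isolates from several serotypes together.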