Table 1 Clustering accuracy for the GroEL networks

From: A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen

| Graph (812 nodes) | Th. | Edges | C | \|C1\| | \|C2\| | \|C>2\| | Genus prec. | Genus recall | Species prec. | Species recall |
|---|---|---|---|---|---|---|---|---|---|---|
| Similarity Score | 5% | 2886 | 371 | 246 | 122 | 444 | 43.0% | 21.6% | 34.9% | 43.0% |
| | 15% | 4668 | 275 | 175 | 90 | 547 | 38.9% | 22.2% | 30.3% | 40.0% |
| | 25% | 8222 | 182 | 122 | 44 | 646 | 33.0% | 31.8% | 22.4% | 36.9% |
| | 35% | 12,491 | 81 | 55 | 22 | 735 | 26.5% | 18.6% | 17.1% | 28.0% |
| Bitscore (from max) | 50 | 544 | 623 | 552 | 86 | 174 | 30.5% | 21.7% | 24.7% | 51.3% |
| | 100 | 2895 | 367 | 243 | 122 | 447 | 42.1% | 20.2% | 34.0% | 42.0% |
| | 200 | 4576 | 275 | 175 | 86 | 551 | 38.7% | 21.7% | 29.9% | 42.1% |
| | 300 | 9271 | 183 | 126 | 40 | 646 | 31.8% | 26.2% | 22.4% | 33.8% |
| Edit Distance Threshold | 8 | 2139 | 456 | 345 | 128 | 339 | 97.3% | 33.4% | 77.9% | 58.3% |
| | 16 | 2904 | 391 | 268 | 126 | 418 | 95.9% | 35.3% | 72.7% | 59.1% |
| | 30 | 4254 | 304 | 188 | 118 | 506 | 90.3% | 47.3% | 60.6% | 63.2% |
| | 42 | 5023 | 256 | 154 | 90 | 568 | 85.0% | 51.9% | 56.5% | 64.4% |
| | 54 | 6582 | 206 | 114 | 76 | 622 | 81.5% | 58.3% | 50.8% | 66.6% |
| | 60 | 7196 | 190 | 99 | 80 | 633 | 80.6% | 62.6% | 49.3% | 66.8% |
| Needleman-Wunsch (from max score) | 100 | 1780 | 482 | 386 | 110 | 316 | 42.2% | 4.7% | 30.8% | 5.8% |
| | 200 | 4691 | 280 | 175 | 98 | 539 | 87.0% | 49.7% | 59.4% | 63.5% |
| | 300 | 7733 | 183 | 96 | 80 | 636 | 79.1% | 62.5% | 48.0% | 66.8% |
| DiWANN | NA | 1055 | 180 | 0 | 118 | 694 | 80.4% | 43.9% | 59.5% | 61.8% |
This table summarizes clustering accuracy for the various GroEL networks. Th. gives the threshold used for a given network, expressed as a number of edits, a distance from the maximum similarity score (for bitscore and Needleman-Wunsch), or a percent similarity score. C gives the total number of clusters, |C1| gives the number of nodes in clusters of size 1 (singletons), |C2| gives the number of nodes in clusters of size 2, and |C>2| gives the number of nodes in clusters of size 3 and above. For calculating precision and recall, we assume clusters should correspond to the genus and species labels for a given GroEL sequence. Each GroEL sequence is between roughly 550 and 600 amino acids long.
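To make the column definitions concrete, the following is a minimal Python sketch of how C, |C1|, |C2| and |C>2| could be derived from a list of clusters, together with one common way of scoring clusters against labels. The note does not state the exact precision and recall formula, so the pair-counting definition used here is an assumption, and the function names and toy data are hypothetical illustrations rather than the authors' code.

```python
from itertools import combinations


def cluster_size_summary(clusters):
    """Return (C, |C1|, |C2|, |C>2|) for a list of clusters.

    The last three values count nodes, not clusters, as in the table note.
    """
    sizes = [len(c) for c in clusters]
    n1 = sum(s for s in sizes if s == 1)      # nodes in singleton clusters
    n2 = sum(s for s in sizes if s == 2)      # nodes in clusters of size 2
    n_gt2 = sum(s for s in sizes if s > 2)    # nodes in clusters of size 3+
    return len(clusters), n1, n2, n_gt2


def pairwise_precision_recall(clusters, labels):
    """Pair-counting precision and recall (an ASSUMED definition).

    precision = pairs sharing a cluster and a label / pairs sharing a cluster
    recall    = pairs sharing a cluster and a label / pairs sharing a label
    """
    cluster_of = {node: i for i, c in enumerate(clusters) for node in c}
    same_cluster = same_label = both = 0
    for a, b in combinations(sorted(cluster_of), 2):
        in_cluster = cluster_of[a] == cluster_of[b]
        in_label = labels[a] == labels[b]
        same_cluster += in_cluster
        same_label += in_label
        both += in_cluster and in_label
    precision = both / same_cluster if same_cluster else 0.0
    recall = both / same_label if same_label else 0.0
    return precision, recall


# Hypothetical toy data: six sequences, three clusters, three genus labels.
toy_clusters = [["a", "b", "c"], ["d", "e"], ["f"]]
toy_genus = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Y", "f": "Z"}

print(cluster_size_summary(toy_clusters))
print(pairwise_precision_recall(toy_clusters, toy_genus))
```

Because the three node counts partition the node set, they sum to the graph size; every row of the table satisfies |C1| + |C2| + |C>2| = 812.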