
Table 1 Top 20 gene cluster predictions.

From: Statistics for approximate gene clusters

| ID | G | GN | distance to ref. (min) | distance to ref. (max) | distance to ref. (avg) | p-score | corr. p-score | description |
|---|---|---|---|---|---|---|---|---|
| 1 | 9 | 108 | 2 | 5 | 2.8 | 1314.43 | 1307.49 | 30S/50S ribosomal subunit |
| 2 | 7 | 114 | 0 | 3 | 1.6 | 1258.12 | 1251.18 | 30S/50S, rpoA, infA |
| 3 | 6 | 91 | 0 | 2 | 0.7 | 1031.47 | 1024.83 | ATP synthase |
| 4 | 9 | 57 | 0 | 5 | 1.4 | 896.31 | 890.57 | NADH dehydrogenase |
| 5 | 8 | 108 | 3 | 5 | 4.1 | 716.68 | 711.29 | 30S/50S ribosomal subunit |
| 6 | 8 | 88 | 0 | 5 | 4.2 | 569.88 | 564.63 | phosphate ABC transporter |
| 7 | 8 | 93 | 0 | 5 | 4.1 | 486.80 | 481.67 | infB, rfbA, nusA, hypothetical protein |
| 8 | 8 | 79 | 3 | 5 | 4.6 | 367.33 | 362.27 | putative/peptide ABC transporter |
| 9 | 8 | 62 | 3 | 5 | 4.4 | 294.41 | 289.40 | sugar ABC transporter |
| 10 | 8 | 65 | 2 | 5 | 4.1 | 290.24 | 285.24 | N-acetylmuramoyl, cell division |
| 11 | 4 | 33 | 0 | 0 | 0.0 | 272.99 | 267.55 | succinate dehydrogenase |
| 12 | 8 | 51 | 3 | 5 | 4.9 | 221.73 | 216.79 | pdhA/B/C |
| 13 | 8 | 48 | 2 | 5 | 4.9 | 216.54 | 211.62 | ATP-dependent (Clp) protease, trigger factor |
| 14 | 8 | 58 | 0 | 5 | 4.2 | 216.12 | 211.20 | 50S L31, prfA, thrA/B/C, rho, hemK |
| 15 | 8 | 50 | 4 | 5 | 4.9 | 213.61 | 208.70 | hisA/C/F/H |
| 16 | 6 | 32 | 0 | 2 | 1.7 | 200.11 | 194.80 | dnaA/N, gyrA/B, recF |
| 17 | 6 | 27 | 1 | 2 | 1.7 | 194.39 | 189.10 | carA/B, pyrC/B/R |
| 18 | 8 | 67 | 4 | 5 | 5.0 | 192.56 | 187.69 | elongation factor Tu, G; 30S S7 |
| 19 | 8 | 29 | 4 | 5 | 4.5 | 190.62 | 185.75 | sulfate ABC transporter |
| 20 | 8 | 44 | 2 | 5 | 4.3 | 181.13 | 176.28 | argB/C/D/G/H/F/R |

  1. The first 20 gene clusters found when searching Mycobacterium tuberculosis CDC1551 against 118 bacterial genomes. Clusters are sorted by p-value, computed using the "individual distance bounds" method. "G" is the number of different genes in the reference gene cluster; "GN" is the number of genomes in which the reference gene cluster is found; "distance to ref." gives the minimum, maximum, and average observed distance between the reference gene cluster and its occurrences. The "p-score" is the negative log10 of the p-value; "corr. p-score" is the same quantity after FDR correction.
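
The p-score definition in the note above can be illustrated with a short Python sketch. The function names are hypothetical and not from the article. The second function assumes that the FDR correction acts as a multiplicative factor on the p-value, which the note does not state; under that assumption, the difference between the two score columns is the log10 of that factor. Scores are kept in log space because p-values this small cannot be represented as double-precision floats.

```python
import math


def p_score(p_value: float) -> float:
    """Negative log10 of a p-value, as defined in the table note."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


def log10_correction_factor(score: float, corrected_score: float) -> float:
    """log10 of the factor by which the p-value grew under correction.

    Assumption (not stated in the source): the correction multiplies the
    p-value by some factor, so the two scores differ by log10(factor).
    """
    return score - corrected_score


# Row 1 of the table: p-score 1314.43, corrected p-score 1307.49.
row1_factor = round(log10_correction_factor(1314.43, 1307.49), 2)

# Converting such a score back to a plain float underflows to zero,
# which is why the table reports scores rather than raw p-values.
underflowed = 10.0 ** -1314.43
```

For row 1 the implied correction is about 6.94 orders of magnitude, whatever procedure produced it.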