
Table 1. Sizes of gene set collections built from the NCBI gene2go table¹

From: GO2MSIG, an automated GO based multi-species gene set generator for gene set enrichment analysis

   

The last two columns give the number of gene sets in the collection, with the average number of genes per set in parentheses.

| Taxon ID | Organism | Number of genes with GO annotation | All evidence codes | High quality evidence codes |
|---|---|---|---|---|
| 234826 | Anaplasma marginale str. St. Maries | 196 | 48 (40) | |
| 212042 | Anaplasma phagocytophilum str. HZ | 1288 | 218 (55) | 221 (60) |
| 3702 | Arabidopsis thaliana | 27942 | 2032 (129) | 1951 (85) |
| 227321 | Aspergillus nidulans FGSC A4 | 7326 | 1152 (69) | 35 (31) |
| 198094 | Bacillus anthracis str. Ames | 5097 | 465 (81) | 466 (81) |
| 9913 | Bos taurus | 5567 | 2634 (67) | 1285 (58) |
| 6239 | Caenorhabditis elegans | 12642 | 1505 (84) | 1098 (81) |
| 195099 | Campylobacter jejuni RM1221 | 1826 | 315 (62) | 316 (63) |
| 246194 | Carboxydothermus hydrogenoformans Z-2901 | 2609 | 363 (64) | 362 (65) |
| 227377 | Coxiella burnetii RSA 493 | 1798 | 271 (67) | 272 (67) |
| 214684 | Cryptococcus neoformans var. neoformans JEC21 | 3427 | 969 (68) | |
| 7955 | Danio rerio | 16957 | 2201 (83) | 1342 (68) |
| 243164 | Dehalococcoides ethenogenes 195 | 1583 | 265 (72) | 265 (71) |
| 352472 | Dictyostelium discoideum AX4 | 7694 | 1184 (86) | 801 (72) |
| 7227 | Drosophila melanogaster | 12560 | 2750 (83) | 2459 (78) |
| 205920 | Ehrlichia chaffeensis str. Arkansas | 1090 | 221 (56) | 223 (59) |
| 511145 | Escherichia coli str. K-12 substr. MG1655 | 2518 | 198 (112) | |
| 9031 | Gallus gallus | 2104 | 1460 (64) | 643 (52) |
| 243231 | Geobacter sulfurreducens PCA | 3269 | 347 (82) | 348 (82) |
| 9606 | Homo sapiens | 18106 | 5808 (82) | 4403 (81) |
| 265669 | Listeria monocytogenes serotype 4b str. F2365 | 2811 | 384 (79) | 385 (79) |
| 243233 | Methylococcus capsulatus str. Bath | 2902 | 377 (72) | 378 (72) |
| 10090 | Mus musculus | 24667 | 5615 (79) | 3643 (74) |
| 222891 | Neorickettsia sennetsu str. Miyayama | 928 | 204 (54) | 206 (56) |
| 39947 | Oryza sativa Japonica Group | 4266 | 30 (18) | 2 (14) |
| 36329 | Plasmodium falciparum 3D7 | 1770 | 212 (65) | 219 (67) |
| 223283 | Pseudomonas syringae pv. tomato str. DC3000 | 3950 | 436 (73) | 439 (77) |
| 10116 | Rattus norvegicus | 18599 | 5746 (79) | 3081 (75) |
| 246200 | Ruegeria pomeroyi DSS-3 | 4250 | 497 (85) | 496 (86) |
| 559292 | Saccharomyces cerevisiae S288c | 6244 | 2005 (75) | 1849 (74) |
| 284812 | Schizosaccharomyces pombe 972 h- | 5276 | 1627 (82) | 1118 (67) |
| 211586 | Shewanella oneidensis MR-1 | 4272 | 418 (79) | 419 (79) |
| 999953 | Trypanosoma brucei brucei strain 927/4 GUTat10.1 | 1073 | 157 (74) | 147 (80) |
| 9606 | Homo sapiens (MSigDB collection) | 18106 | | 1422 (69)² |
| 9606 | Homo sapiens (From Affymetrix annotation file) | 18106 | 5383 (80) | |

Gene sets were built from the NCBI gene2go annotation table and the GO ontology downloaded on 13th September 2013. Default settings were used, which filter out gene sets containing fewer than 10 or more than 700 genes. Organisms were omitted when their largest collection contained fewer than 30 sets. Where using all evidence codes yields fewer gene sets than using high quality codes only, this is due to maximum set size filtering.

¹ For comparison, the currently available MSigDB GO-based human collection and a human set built from the annotation file for the Affymetrix HG-U133 Plus 2.0 array are also shown.

² Set number and sizes were calculated for the MSigDB collection with filtering as above (the full collection contains 1454 gene sets).
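
The size filtering described in the table note can be illustrated with a minimal Python sketch. This is not the GO2MSIG implementation: the function names, the `(gene_id, go_id, evidence)` tuple layout, and the `HIGH_QUALITY` code list are assumptions made for illustration, and the sketch does not use the GO ontology file at all. Only the thresholds (10, 700 and 30) come from the note.

```python
from collections import defaultdict
from statistics import mean

MIN_GENES = 10   # sets with fewer genes are dropped (from the table note)
MAX_GENES = 700  # sets with more genes are dropped (from the table note)
MIN_SETS = 30    # organisms whose largest collection is smaller are omitted

# Hypothetical list of "high quality" codes; the actual list is not given here.
HIGH_QUALITY = {"EXP", "IDA"}


def build_gene_sets(annotations, allowed_evidence=None):
    """Group gene IDs by GO term, optionally keeping only some evidence codes."""
    gene_sets = defaultdict(set)
    for gene_id, go_id, evidence in annotations:
        if allowed_evidence is None or evidence in allowed_evidence:
            gene_sets[go_id].add(gene_id)
    return dict(gene_sets)


def filter_by_size(gene_sets, min_genes=MIN_GENES, max_genes=MAX_GENES):
    """Keep only sets whose size lies within [min_genes, max_genes]."""
    return {
        go_id: genes
        for go_id, genes in gene_sets.items()
        if min_genes <= len(genes) <= max_genes
    }


def summarize(gene_sets):
    """Return (number of sets, rounded average set size), as in the table cells."""
    if not gene_sets:
        return 0, 0
    return len(gene_sets), round(mean(len(g) for g in gene_sets.values()))


def keep_organism(collections, min_sets=MIN_SETS):
    """An organism is kept if its largest collection has at least min_sets sets."""
    return max(len(c) for c in collections) >= min_sets


# Toy data with tiny thresholds (2..3 genes) so the effect is visible.
toy = [
    ("g1", "GO:A", "IDA"), ("g2", "GO:A", "IDA"), ("g3", "GO:A", "IDA"),
    ("g4", "GO:A", "IEA"),  # the extra electronic annotation pushes GO:A over the maximum
    ("g5", "GO:B", "IEA"),
]
all_codes = filter_by_size(build_gene_sets(toy), 2, 3)
high_quality = filter_by_size(build_gene_sets(toy, HIGH_QUALITY), 2, 3)
print(summarize(all_codes), summarize(high_quality))
```

In the toy data, the term `GO:A` passes the filter with high quality codes only but exceeds the maximum once all evidence codes are included. This is the situation the note describes for rows where the all-codes collection is smaller than the high quality one.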