Table 1. Statistics for the KOG, CEGMA, and ECK databases. The Eukaryotic Orthologous Groups (KOG) database contains sequences from the seven eukaryotic genomes that were available when it was created in 2003. The Conserved Eukaryotic Genes Mapping Approach (CEGMA) database is a subset of 458 KOGs that each contain at least one sequence from each of the six free-living KOG organisms, and from which inparalogs were then removed. These inparalogs were restored in the Expanded CEGMA KOGs (ECK) clusters, and sequences from four additional taxa annotated by the CEGMA developers were added. A dash (-) marks an organism that is not represented in that database.

From: Evaluation of BLAST-based edge-weighting metrics used for homology inference with the Markov Clustering algorithm

 

Organism                     KOG Seqs    KOGs  CEGMA Seqs    CEGs  ECK Seqs    ECKs
Homo sapiens                   19,039   4,597         458     458     1,350     458
Arabidopsis thaliana           13,744   3,285         458     458     1,175     458
Caenorhabditis elegans         10,581   4,235         458     458       635     458
Drosophila melanogaster         8,445   4,351         458     458       611     458
Saccharomyces cerevisiae        4,003   2,668         458     458       606     458
Schizosaccharomyces pombe       3,728   2,762         458     458       557     458
Encephalitozoon cuniculi        1,218   1,073           -       -       311     291
Anopheles gambiae                   -       -           -       -       453     453
Ciona intestinalis                  -       -           -       -       432     432
Chlamydomonas reinhardtii           -       -           -       -       407     407
Toxoplasma gondii                   -       -           -       -       303     303
Database Totals                60,758   4,852       2,748     458     6,840     458
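
The selection rule in the caption (keep only KOGs with at least one sequence from each of the six free-living organisms) can be illustrated with a minimal Python sketch. This is not the CEGMA developers' code: the function name `select_core_kogs`, the input layout, and the toy clusters are hypothetical, and the later inparalog-removal step is not modeled.

```python
# The six organisms that have CEGMA entries in Table 1.
FREE_LIVING = frozenset({
    "Homo sapiens",
    "Arabidopsis thaliana",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
})


def select_core_kogs(kogs):
    """Return the sorted IDs of clusters covering every free-living organism.

    `kogs` maps a cluster ID to a list of (organism, sequence_id) pairs.
    This layout is an assumption made for illustration only.
    """
    return sorted(
        kog_id
        for kog_id, members in kogs.items()
        # Keep the cluster only if its organism set contains all six.
        if FREE_LIVING <= {organism for organism, _ in members}
    )


# Hypothetical toy clusters; the IDs and sequence names are invented.
toy_kogs = {
    # Covers all six organisms, plus one extra human sequence.
    "KOG_A": [(org, org[:2] + "_1") for org in FREE_LIVING]
             + [("Homo sapiens", "Ho_2")],
    # Covers only two organisms, so it is filtered out.
    "KOG_B": [("Homo sapiens", "Ho_3"), ("Drosophila melanogaster", "Dr_3")],
}

print(select_core_kogs(toy_kogs))  # ['KOG_A']
```

Under this rule, every retained cluster contains each of the six organisms, which is consistent with the CEGs column showing 458 for all six.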