Table 2 Cluster size distribution and the distribution of sequences in these clusters

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

| Cluster size | #clusters | #sequences | #non-redundant sequences |
|---|---|---|---|
| 2–4 | 208,096 | 794,592 | 521,898 |
| 5–9 | 43,453 | 428,469 | 273,694 |
| 10–19 | 15,584 | 346,415 | 206,188 |
| 20–49 | 4,053 | 234,338 | 143,438 |
| 50–99 | 4,641 | 547,862 | 331,773 |
| 100–199 | 3,546 | 870,406 | 491,229 |
| 200–499 | 2,600 | 1,381,135 | 806,560 |
| 500–999 | 961 | 1,133,749 | 669,420 |
| 1,000–1,999 | 698 | 1,768,532 | 1,002,815 |
| ≥2,000 | 665 | 5,220,484 | 2,909,845 |
| Total | 284,297 | 12,725,982 | 7,356,860 |

  1. The size of a cluster is defined as the number of non-redundant sequences in it.
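
As an illustration of how a table like this could be tabulated, the following is a minimal sketch, not the authors' code. It assumes a hypothetical input of one `(n_sequences, n_non_redundant)` pair per cluster. It bins each cluster by its non-redundant count, following the footnote's definition of cluster size. The function name `size_distribution`, the ASCII bin labels, and the choice to skip clusters with fewer than two non-redundant sequences are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of the size bins used in the table; the last bin is open-ended.
BIN_STARTS = [2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000]
BIN_LABELS = ["2-4", "5-9", "10-19", "20-49", "50-99", "100-199",
              "200-499", "500-999", "1,000-1,999", ">=2,000"]


def size_distribution(clusters):
    """Tabulate clusters by size bin.

    `clusters` is a hypothetical input: an iterable of
    (n_sequences, n_non_redundant) pairs, one per cluster.
    Cluster size is the non-redundant count, per the table footnote.
    Returns {bin_label: [n_clusters, n_sequences, n_non_redundant]}.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for n_seq, n_nr in clusters:
        if n_nr < BIN_STARTS[0]:
            # Assumption: clusters below size 2 are left out, as in the table.
            continue
        # bisect_right finds the last bin whose lower bound is <= n_nr.
        label = BIN_LABELS[bisect_right(BIN_STARTS, n_nr) - 1]
        row = table[label]
        row[0] += 1       # one more cluster in this bin
        row[1] += n_seq   # all sequences, redundant ones included
        row[2] += n_nr    # non-redundant sequences only
    return dict(table)
```

Summing each column of the returned rows gives the counterpart of the "Total" row for whatever input is supplied.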