Table 2 Cluster size distribution and the distribution of sequences in these clusters

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

| Cluster size | #clusters | #sequences | #non-redundant sequences |
|---|---|---|---|
| 2–4 | 208,096 | 794,592 | 521,898 |
| 5–9 | 43,453 | 428,469 | 273,694 |
| 10–19 | 15,584 | 346,415 | 206,188 |
| 20–49 | 4,053 | 234,338 | 143,438 |
| 50–99 | 4,641 | 547,862 | 331,773 |
| 100–199 | 3,546 | 870,406 | 491,229 |
| 200–499 | 2,600 | 1,381,135 | 806,560 |
| 500–999 | 961 | 1,133,749 | 669,420 |
| 1,000–1,999 | 698 | 1,768,532 | 1,002,815 |
| ≥2,000 | 665 | 5,220,484 | 2,909,845 |
| Total | 284,297 | 12,725,982 | 7,356,860 |
  1. The size of a cluster is defined as the number of non-redundant sequences in it.
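
The arithmetic behind the table can be checked with a short script. The following is a minimal sketch, not part of the original paper: the names `ROWS`, `REPORTED_TOTAL`, `column_totals`, and `mean_cluster_size` are hypothetical, and the figures are copied from Table 2. It sums each column for comparison with the Total row and, using the footnote's definition of cluster size, computes the mean size per bin as non-redundant sequences divided by clusters.

```python
# Rows copied from Table 2:
# (cluster-size bin, #clusters, #sequences, #non-redundant sequences)
ROWS = [
    ("2-4", 208_096, 794_592, 521_898),
    ("5-9", 43_453, 428_469, 273_694),
    ("10-19", 15_584, 346_415, 206_188),
    ("20-49", 4_053, 234_338, 143_438),
    ("50-99", 4_641, 547_862, 331_773),
    ("100-199", 3_546, 870_406, 491_229),
    ("200-499", 2_600, 1_381_135, 806_560),
    ("500-999", 961, 1_133_749, 669_420),
    ("1,000-1,999", 698, 1_768_532, 1_002_815),
    (">=2,000", 665, 5_220_484, 2_909_845),
]

# The "Total" row as reported in the table.
REPORTED_TOTAL = (284_297, 12_725_982, 7_356_860)


def column_totals(rows):
    """Sum the three numeric columns across all size bins."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


def mean_cluster_size(row):
    """Mean cluster size in a bin, where size counts non-redundant sequences."""
    _, clusters, _, non_redundant = row
    return non_redundant / clusters


if __name__ == "__main__":
    print("columns match Total row:", column_totals(ROWS) == REPORTED_TOTAL)
    for row in ROWS:
        print(f"{row[0]:>12}  mean size = {mean_cluster_size(row):.1f}")
    # Share of all sequences that fall in the largest bin.
    print(f"share in >=2,000 bin: {ROWS[-1][2] / REPORTED_TOTAL[1]:.1%}")
```

Run this way, the column sums agree with the Total row, and the largest bin (clusters with at least 2,000 non-redundant sequences) accounts for roughly 41% of all sequences even though it contains only 665 of the 284,297 clusters.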