Table 4. Clusters recruiting the largest number of HOT/ALOHA sequences

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Cluster ID    No. of sequences  Process, protein family
CAM_CL_49     562               Metabolism, short chain dehydrogenase
CAM_CL_399    368               Metabolism, Sulfatase
CAM_CL_26     338               electron transport, Acyl-CoA dehydrogenase
CAM_CL_1239   314               metabolism, AMP-binding enzyme
CAM_CL_2568   312               transport, ABC transporter
CAM_CL_1581   274               bioluminescence, methanogenesis, Luciferase-like monooxygenase
CAM_CL_4294   240               nucleotide-sugar metabolism, NAD dependent epimerase/dehydratase family
CAM_CL_1593   235               metabolism, CoA-transferase family III
CAM_CL_357    227               Tetratricopeptide repeat
CAM_CL_333    225               lignin biosynthesis, Zinc-binding dehydrogenase