Table 4. Clusters recruiting the largest number of HOT/ALOHA sequences

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Cluster ID    # sequences   Process, Protein Family
CAM_CL_49     562           Metabolism, short chain dehydrogenase
CAM_CL_399    368           Metabolism, Sulfatase
CAM_CL_26     338           electron transport, Acyl-CoA dehydrogenase
CAM_CL_1239   314           metabolism, AMP-binding enzyme
CAM_CL_2568   312           transport, ABC transporter
CAM_CL_1581   274           bioluminescence, methanogenesis, Luciferase-like monooxygenase
CAM_CL_4294   240           nucleotide-sugar metabolism, NAD dependent epimerase/dehydratase family
CAM_CL_1593   235           metabolism, CoA-transferase family III
CAM_CL_357    227           Tetratricopeptide repeat
CAM_CL_333    225           lignin biosynthesis, Zinc-binding dehydrogenase