Table 3 Clusters recruiting largest number of PANDA sequences

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

| Cluster ID | #sequences | #non-redundant sequences | Description |
|---|---|---|---|
| CAM_CL_2057 | 20,508 | 24 | Reverse transcriptase (HIV) |
| CAM_CL_1132 | 18,882 | 1,406 | Cytochrome c oxidase subunit I |
| CAM_CL_2568 | 15,405 | 6,091 | ABC transporter |
| CAM_CL_4367 | 15,228 | 771 | Cytochrome b |
| CAM_CL_49 | 14,751 | 7,389 | Short-chain dehydrogenase |
| CAM_CL_3510 | 13,255 | 5,173 | Immunoglobulin |
| CAM_CL_2630 | 13,140 | 3,297 | Envelope glycoprotein |
| CAM_CL_160 | 13,054 | 3,897 | Kinases |
| CAM_CL_4556 | 12,403 | 6,345 | Response regulator |
| CAM_CL_481 | 12,078 | 5,477 | Transcription regulator |
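  1. Column 3 (#non-redundant sequences) hints at the extent of redundancy in the PANDA set: the smaller it is relative to column 2, the more redundant the recruited sequences.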
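
To make the footnote concrete, the redundancy of each cluster can be estimated from the two count columns of Table 3. The sketch below is illustrative only: the metric (one minus the ratio of non-redundant to total sequences) and the names `rows`, `redundancy`, and `ranked` are assumptions introduced here, not definitions from the paper. The counts are copied from the table.

```python
# Rows copied from Table 3: (cluster ID, #sequences, #non-redundant sequences).
rows = [
    ("CAM_CL_2057", 20508, 24),
    ("CAM_CL_1132", 18882, 1406),
    ("CAM_CL_2568", 15405, 6091),
    ("CAM_CL_4367", 15228, 771),
    ("CAM_CL_49", 14751, 7389),
    ("CAM_CL_3510", 13255, 5173),
    ("CAM_CL_2630", 13140, 3297),
    ("CAM_CL_160", 13054, 3897),
    ("CAM_CL_4556", 12403, 6345),
    ("CAM_CL_481", 12078, 5477),
]


def redundancy(total: int, non_redundant: int) -> float:
    """Fraction of recruited sequences that duplicate another one.

    This is an assumed, illustrative metric, not one defined in the paper.
    """
    return 1 - non_redundant / total


# Rank clusters from most to least redundant.
ranked = sorted(rows, key=lambda r: redundancy(r[1], r[2]), reverse=True)

for cluster_id, total, non_redundant in ranked:
    print(f"{cluster_id}: {redundancy(total, non_redundant):.1%} redundant")
```

Under this assumed metric, the reverse transcriptase cluster (CAM_CL_2057) is by far the most redundant, since only 24 of its 20,508 recruited sequences are distinct, while roughly half of the sequences in the response regulator cluster (CAM_CL_4556) are non-redundant.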