Table 3 Clusters recruiting largest number of PANDA sequences

From: Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

| Cluster ID  | #sequences | #non-redundant sequences | Description                    |
|-------------|-----------:|-------------------------:|--------------------------------|
| CAM_CL_2057 |     20,508 |                       24 | Reverse transcriptase (HIV)    |
| CAM_CL_1132 |     18,882 |                    1,406 | Cytochrome c oxidase subunit I |
| CAM_CL_2568 |     15,405 |                    6,091 | ABC transporter                |
| CAM_CL_4367 |     15,228 |                      771 | Cytochrome b                   |
| CAM_CL_49   |     14,751 |                    7,389 | Short-chain dehydrogenase      |
| CAM_CL_3510 |     13,255 |                    5,173 | Immunoglobulin                 |
| CAM_CL_2630 |     13,140 |                    3,297 | Envelope glycoprotein          |
| CAM_CL_160  |     13,054 |                    3,897 | Kinases                        |
| CAM_CL_4556 |     12,403 |                    6,345 | Response regulator             |
| CAM_CL_481  |     12,078 |                    5,477 | Transcription regulator        |

  1. Column 3 (#non-redundant sequences), read against column 2 (#sequences), hints at the extent of redundancy in the PANDA set.
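
One way to make the redundancy in the footnote concrete is to divide column 3 by column 2 for each cluster. The sketch below is illustrative only: the counts are copied from Table 3, while the names `TABLE_3`, `non_redundant_fraction`, and `ranked` are hypothetical and not part of the original work, and the ratio is just one simple reading of "redundancy", not a measure defined by the authors.

```python
# Counts copied from Table 3: (cluster ID, #sequences, #non-redundant sequences).
TABLE_3 = [
    ("CAM_CL_2057", 20508, 24),
    ("CAM_CL_1132", 18882, 1406),
    ("CAM_CL_2568", 15405, 6091),
    ("CAM_CL_4367", 15228, 771),
    ("CAM_CL_49", 14751, 7389),
    ("CAM_CL_3510", 13255, 5173),
    ("CAM_CL_2630", 13140, 3297),
    ("CAM_CL_160", 13054, 3897),
    ("CAM_CL_4556", 12403, 6345),
    ("CAM_CL_481", 12078, 5477),
]


def non_redundant_fraction(total, non_redundant):
    """Share of a cluster's PANDA sequences that are non-redundant (assumed metric)."""
    return non_redundant / total


# Sort ascending by fraction, so the most redundant cluster comes first.
ranked = sorted(
    ((cid, non_redundant_fraction(total, nr)) for cid, total, nr in TABLE_3),
    key=lambda item: item[1],
)

for cid, frac in ranked:
    print(f"{cid:<12} {frac:7.2%}")
```

Under this ratio, the reverse transcriptase cluster (CAM_CL_2057) sorts first as the most redundant, since only 24 of its 20,508 recruited sequences are non-redundant.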