Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

BMC Bioinformatics

Table 3 Clusters recruiting the largest number of PANDA sequences

Cluster ID	Number of sequences	Number of non-redundant sequences	Description
CAM_CL_2057	20,508	24	Reverse transcriptase (HIV)
CAM_CL_1132	18,882	1,406	Cytochrome c oxidase subunit I
CAM_CL_2568	15,405	6,091	ABC transporter
CAM_CL_4367	15,228	771	Cytochrome b
CAM_CL_49	14,751	7,389	Short-chain dehydrogenase
CAM_CL_3510	13,255	5,173	Immunoglobulin
CAM_CL_2630	13,140	3,297	Envelope glycoprotein
CAM_CL_160	13,054	3,897	Kinases
CAM_CL_4556	12,403	6,345	Response regulator
CAM_CL_481	12,078	5,477	Transcription regulator

ISSN: 1471-2105