Fig. 12 | BMC Bioinformatics

Fig. 12

From: Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation

Examples of primary clustering for two proteins in PUA_UR50, namely A0A142XZI2 (a) and Q5BH58 (b). Thick, black lines represent the query sequences. Red lines show regions of the query that have been aligned by BLAST to other sequences in the PUA_UR50 dataset. The bottom part of each panel shows a comparison between Pfam annotation and MC clustering of the query sequences. According to Pfam, both A0A142XZI2 and Q5BH58 contain a LON_substr_bdg domain (a member of the PUA clan), the position of which is highlighted by a yellow frame. Protein Q5BH58 additionally contains an AAA domain and a Lon_C domain, colored green and blue, respectively. Purple lines show the primary clusters we obtained automatically from the red-line alignments at the top of each panel. Primary clusters are sorted from top to bottom by decreasing value of their \(\gamma\) parameter (see “Methods” section), so that the top ones are the most likely to be cluster centers. Some of the primary clusters overlap remarkably well with Pfam-annotated families, while others either cover more than one family or overlap with only a fraction of a family. Note also that in Q5BH58 no MC captures the LON_substr_bdg domain. In this particular case, we found that this region of Q5BH58 is a quite divergent member of the Pfam family, with both BLAST and phmmer finding fewer than 10 pairwise alignments when that portion of the protein is used as a query.
