Figure 2 | BMC Bioinformatics

Figure 2

From: Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

Figure 2 shows the distribution of InterPro family size: the InterPro families used in the benchmarking dataset, grouped by the number of members in each family. There are 102 singleton InterPro families, and the largest family is the Rhodopsin-like GPCR superfamily, which has 1058 protein sequences in the benchmarking dataset.
