Fig. 2 | BMC Bioinformatics

Fig. 2

From: In silico approach to designing rational metagenomic libraries for functional studies

Definition of a representative. a-c Schematic overview. The representative of a family is determined from the distances in a phylogenetic tree. a The phylogenetic distance between sequences A1 and A2 is 3 units. b A1 and A3 are separated by 4 units. c The distance between A2 and A3 is 5 units, so the summed distances from each protein to all other proteins are A1: 3 + 4 = 7 units, A2: 3 + 5 = 8 units, and A3: 4 + 5 = 9 units. Because A1 has the smallest summed distance to all other proteins in the family, it is considered the representative protein. d To account for differences in the automatically generated phylogenetic tree, random subsets containing 90% of the sequences of a family were drawn 100 times. The protein selected most often as the representative across these subsets was defined as the representative of the family. The majority of representatives were selected more than 80 times. Black bars represent HMM-based families, grey bars MCL-based families.
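
The selection rule in panels a-c and the resampling in panel d can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names `pick_representative` and `resample_representative`, the dictionary layout of the distances, and the fixed random seed are all assumptions made for the example. It also assumes that pairwise distances are looked up from one precomputed table for every subset; the caption does not say whether the tree was rebuilt for each subset.

```python
import random
from collections import Counter


def pick_representative(names, dist):
    """Return the sequence with the smallest summed distance to all others.

    `dist` maps frozenset({a, b}) to the phylogenetic distance between a and b
    (hypothetical layout chosen for this sketch). Ties go to the first name.
    """
    def total(n):
        return sum(dist[frozenset((n, m))] for m in names if m != n)

    return min(names, key=total)


def resample_representative(names, dist, fraction=0.9, rounds=100, seed=0):
    """Count how often each sequence is picked across random subsets.

    Assumption: distances are reused from `dist` rather than recomputed
    from a new tree for each subset.
    """
    rng = random.Random(seed)
    # Subset size: 90% of the family by default, never fewer than 2 sequences.
    size = min(len(names), max(2, round(fraction * len(names))))
    return Counter(
        pick_representative(rng.sample(names, size), dist)
        for _ in range(rounds)
    )


# Distances from panels a-c of the figure.
names = ["A1", "A2", "A3"]
dist = {
    frozenset(("A1", "A2")): 3,
    frozenset(("A1", "A3")): 4,
    frozenset(("A2", "A3")): 5,
}

print(pick_representative(names, dist))  # A1 (7 units vs. 8 and 9)
```

Calling `resample_representative(names, dist).most_common(1)` would then give the most frequently selected protein together with its selection count, which corresponds to the quantity plotted in panel d. With only three sequences, a 90% subset rounds to the full family, so the toy example is informative only for the selection rule itself.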
