Figure 1 | BMC Bioinformatics

Figure 1

From: Stratification of co-evolving genomic groups using ranked phylogenetic profiles

The rank-BLAST classification procedure. The colored circles and squares represent proteins; different shapes and colors represent different taxonomic origins. Protein sequences lacking taxonomic annotations (retrieved, for example, from metagenomic samples that include partial or complete genome sequences from an assortment of species) are subjected to a BLAST search. For each protein, the results of the BLAST search are converted into a vector describing the rank order of the species in which it recognizes homologues. Each species is ranked once, according to its first appearance. All possible protein-pair combinations are compared to determine whether the positions of species in the vectors are correlated. Two vectors are considered correlated (green squares) when their Kendall tau correlation coefficient is higher than a threshold (see Methods). The correlation matrix is transformed into a probability matrix, estimating the significance of the similarity between the correlation profiles of each protein pair. Green boxes represent protein pairs where the P value is lower than a threshold (see Methods). In the final stage, proteins are clustered according to the similarity of their probability vectors.
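
The first steps of the procedure (building rank vectors and comparing them with Kendall tau) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the example hit lists, and the threshold of 0.5 are hypothetical, and restricting each comparison to species present in both vectors is an assumption, since the caption does not say how species missing from one vector are handled. The probability matrix and the clustering stage are not covered.

```python
from itertools import combinations


def rank_vector(hit_species):
    """Rank each species by its first appearance in an ordered BLAST hit list."""
    ranks = {}
    for species in hit_species:
        if species not in ranks:          # each species is ranked only once
            ranks[species] = len(ranks) + 1
    return ranks


def kendall_tau(ranks_a, ranks_b):
    """Kendall tau over species ranked in both vectors (assumption: shared species only)."""
    shared = sorted(set(ranks_a) & set(ranks_b))
    if len(shared) < 2:
        return 0.0                        # too few shared species to compare
    concordant = discordant = 0
    for s, t in combinations(shared, 2):
        product = (ranks_a[s] - ranks_a[t]) * (ranks_b[s] - ranks_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


def correlated_pairs(hits_by_protein, threshold):
    """Return (protein, protein, tau) for every pair whose tau exceeds the threshold."""
    vectors = {p: rank_vector(h) for p, h in hits_by_protein.items()}
    pairs = []
    for p, q in combinations(sorted(vectors), 2):
        tau = kendall_tau(vectors[p], vectors[q])
        if tau > threshold:
            pairs.append((p, q, tau))
    return pairs


# Hypothetical BLAST results: species of each hit, best hit first.
example_hits = {
    "protA": ["E. coli", "S. enterica", "E. coli", "B. subtilis", "H. sapiens"],
    "protB": ["E. coli", "S. enterica", "B. subtilis", "H. sapiens"],
    "protC": ["H. sapiens", "B. subtilis", "S. enterica", "E. coli"],
}

# 0.5 is an illustrative threshold, not the value used in the paper's Methods.
example_pairs = correlated_pairs(example_hits, threshold=0.5)
```

In this toy input, protA and protB rank the four species in the same order, while protC ranks them in reverse, so only the protA/protB pair passes the threshold.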
