Skip to main content
Fig. 1 | BMC Bioinformatics

Fig. 1

From: PanDelos: a dictionary-based method for pan-genome content discovery

Fig. 1

Overview of PanDelos Pan-genome computation of three genomes (represented as blue, red, and violet). Genomes are taken in input as list of genetic sequences (represented as colored rectangles). The homology detection schema is divided into 5 steps. PanDelos, at first, chooses an optimal word length that is used to compare dictionaries of genetic sequences. The 1-vs-1 genome comparisons are performed. An initial candidate gene pairs selection is obtained by applying a minimum percentage threshold on the dictionary intersection. Then, PanDelos computes generalized Jaccard similarities among genes (shown in the bottom left matrix). Only pairs of genes that passed the threshold applied on dictionary percentages are taken in consideration for the similarity computation. Pairs that did not pass the threshold are represented by gray tiles. Next, PanDelos computes bidirectional best hits (BBH), here represented with green borders. On the bottom right, a similarity network, made of reciprocal best hits is shown. Border colors represent the genomes to which genes belong. A final computational step discards edges in inconsistent components of the network and returns the final list of gene families. A component is inconsistent if it contains two genes belonging to the same genome that are not accounted as paralogs. A family may contains orthologous as well as paralogous genes, such as the yellow/brown ones. Families are finally classified as singletons, dispensable or core depending on their presence among genomes (borders of the rectangles represent the genomes the genes belong to)

Back to article page