Fig. 3 | BMC Bioinformatics

Fig. 3

From: Primary orthologs from local sequence context

Secondary orthologous and paralogous regions tend to be nested in primary orthologous regions. Blue and red bars in each subfigure indicate regions of homology between genomes A and B, and dashed arrows indicate the direction of segmental duplications, which may be either recent or ancient.

(a) When the parent copy of a duplication is well conserved, our approach can be applied directly to exact matches. When compared with an outgroup sequence Bseq2 in a different genome, the exact match between the parent copy Aseq2 and Bseq2 can usually be extended into the flanking regions until the extended match (Aseq1, Bseq1) is maximal. In contrast, the exact match between the daughter copy Aseq2’ and Bseq2 is already maximal and cannot be extended. As a result, the latter exact match (which is secondary orthologous or paralogous) is nested in the former (which is primary orthologous).

(b) When the parent copy of a duplication has been degraded by mutations after the duplication, a method based on exact matches alone may misclassify primary orthologs as other homologs, or vice versa. The shaded region in subfigure (b) represents a mutated region of length d2 that breaks the primary orthologous region (Aseq1, Bseq1) into two shorter exact matches, em2 and em3. To compensate for such mutations, we concatenate neighboring exact matches (em1 to em4 in subfigure (b)) that are separated by mismatched regions whose lengths are the same in both genomes into a longer matched region, which we call an “equidistant match” (EDM). With EDMs, the secondary orthologous (or paralogous) region (Aseq1’, Bseq1) is still nested in the EDM region, which is primary orthologous.
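
The concatenation of exact matches into EDMs described in the caption can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each exact match is given as a hypothetical tuple `(a_start, b_start, length)` on the forward strand, and it reads "nested" as containment of the genome B interval, which is an interpretation of the figure. The function names and the optional `max_gap` limit are also assumptions introduced for illustration.

```python
def merge_equidistant(matches, max_gap=None):
    """Chain neighboring exact matches whose separating gaps have the
    same length in genome A and genome B (hypothetical helper).

    matches: iterable of (a_start, b_start, length) tuples.
    max_gap: optional upper bound on the gap length (an assumption;
             the caption does not state such a limit).
    Returns a list of (a_start, b_start, span) tuples, where span
    covers the chained matches and the gaps between them.
    """
    merged = []
    for a, b, n in sorted(matches):
        if merged:
            pa, pb, pn = merged[-1]
            gap_a = a - (pa + pn)  # mismatched stretch in genome A
            gap_b = b - (pb + pn)  # mismatched stretch in genome B
            within_limit = max_gap is None or gap_a <= max_gap
            if gap_a == gap_b and gap_a > 0 and within_limit:
                # Equal gaps: extend the previous region over this match.
                merged[-1] = (pa, pb, a + n - pa)
                continue
        merged.append((a, b, n))
    return merged


def is_nested_in_b(inner, outer):
    """True if the genome B interval of `inner` lies inside that of `outer`."""
    _, ib, il = inner
    _, ob, ol = outer
    return ob <= ib and ib + il <= ob + ol


# Three matches separated by equal-length gaps, plus one with unequal gaps.
exact = [(0, 100, 10), (12, 112, 8), (25, 125, 5), (40, 150, 6)]
edms = merge_equidistant(exact)

# A match from a daughter copy elsewhere in genome A, hitting the same B region.
daughter = (500, 112, 8)
nested = is_nested_in_b(daughter, edms[0])
```

In this toy input the first three matches are chained into one region because each gap has the same length in both genomes, while the fourth match stays separate. The daughter-copy match then falls inside the chained region on the genome B side, mirroring the nesting shown in subfigure (b).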
