Fig. 2
From: Improvements in viral gene annotation using large language models and soft alignments

The similarity matrix is traversed to remove spurious mutual matches. The soft alignment approach assumes that true matches lie near one another, on diagonals close to the diagonal representing the alignment. (A) In the first step, the matrix is traversed to identify and discard top-left to bottom-right diagonals that contain fewer than five mutual matches and are positioned five insertions or deletions away from a diagonal with more than five mutual matches (red cells). (B) In the second step, single gaps along the main diagonal that represent the same amino acid are classified as reciprocal matches (green cells), since such a gap indicates a false negative.
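
The two filtering steps in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `filter_spurious` and `fill_single_gaps`, the representation of mutual matches as a set of `(i, j)` cells, and the parameter names are all hypothetical. Three readings of the caption are assumed: "five insertions or deletions away" is taken to mean at least five diagonal offsets from every dense diagonal; when no dense diagonal exists, nothing is discarded; and a "single gap" is an unmatched cell whose two neighbours on the same diagonal are both matches.

```python
from collections import Counter


def filter_spurious(matches, min_count=5, max_indel=5):
    """Step A (sketch): drop mutual matches on sparse, isolated diagonals.

    matches: set of (i, j) cells flagged as mutual matches.
    A diagonal is identified by its offset j - i.
    """
    per_diag = Counter(j - i for i, j in matches)
    # "More than five mutual matches" is read as a strict inequality.
    dense = [d for d, n in per_diag.items() if n > min_count]
    if not dense:
        # Assumption: with no dense diagonal to anchor on, keep everything.
        return set(matches)
    kept = set()
    for i, j in matches:
        d = j - i
        sparse = per_diag[d] < min_count  # "fewer than five"
        # Assumption: "away" means >= max_indel offsets from every dense diagonal.
        far = all(abs(d - e) >= max_indel for e in dense)
        if not (sparse and far):
            kept.add((i, j))
    return kept


def fill_single_gaps(matches, seq_a, seq_b):
    """Step B (sketch): reclassify single gaps with identical residues."""
    filled = set(matches)
    for i in range(1, len(seq_a) - 1):
        for j in range(1, len(seq_b) - 1):
            if (i, j) in matches:
                continue
            # A single gap is flanked by matches on the same diagonal.
            flanked = (i - 1, j - 1) in matches and (i + 1, j + 1) in matches
            if flanked and seq_a[i] == seq_b[j]:
                filled.add((i, j))
    return filled


# Toy example: two identical sequences, one missing match on the main
# diagonal at (4, 4) and one isolated off-diagonal match at (0, 8).
seq_a = seq_b = "MKTAYIAKQR"
matches = {(i, i) for i in range(10) if i != 4} | {(0, 8)}
kept = filter_spurious(matches)
filled = fill_single_gaps(kept, seq_a, seq_b)
```

In this toy input, the isolated cell `(0, 8)` plays the role of a red cell and the gap at `(4, 4)` the role of a green cell. Diagonals holding exactly five matches are neither sparse nor dense under the caption's wording, so the sketch leaves them untouched.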