Fig. 2 | BMC Bioinformatics

Fig. 2

From: Rapid protein sequence evolution via compensatory frameshift is widespread in RNA virus genomes

Higher estimation of protein evolutionary distance caused by the compensatory frameshift.
a A compensatory frameshift case was identified near the C-terminus of the IAV HA protein. The frameshifted segment is marked in red. The dataset comprised 129 reference form and 13 compensatory frameshift form sequences.
b Part of the multiple alignment of representative IAV HA CDS sequences is shown. The three 1-nt insertion positions in the compensatory frameshift form are marked by filled triangles. The frameshifted segment is in red. The alignment of all 142 sequences is presented in Additional file 3: Fig. S2.
c Comparison of protein sequences revealed that the 12-aa sequence in the reference form was replaced with a 13-aa sequence (in red) in the compensatory frameshift form.
d, e Phylogenetic trees inferred from multiply aligned IAV HA nucleotide (d) and protein (e) sequences are presented. In both trees, all compensatory frameshift form sequences (in red) were grouped into a single clade, suggesting that the indel event occurred once, in the ancestral branch (red arrowhead) of all the compensatory frameshift form sequences. Note that the relative length of this ancestral branch is longer in the protein tree than in the nucleotide tree. Branches with bootstrap support values of ≥ 95% are marked by black or red circles on the nodes.
f, g Phylogenetic trees inferred from multiply aligned IAV HA nucleotide (f) and protein (g) sequences are presented. Note that the ancestral branch (red arrowhead) of the compensatory frameshift form sequences is significantly shorter in the protein tree (g) than in the original protein tree (e). High-resolution images of the phylogenetic trees (d–g) are presented in Additional file 3: Fig. S3.
h Protein distance values (Dp; vertical axis) between the reference sequence and all other sequences, deduced from the protein tree, were plotted against the corresponding nucleotide distance values (Dn; horizontal axis), deduced from the nucleotide tree. Black and red dots indicate the reference form and compensatory frameshift form sequences, respectively. The dotted linear regression line was calculated only from the data of the reference form sequences. Note that the compensatory frameshift form sequences have higher Dp values than reference form sequences with similar Dn values.
i When the frameshifted segments were removed, the Dp values of the compensatory frameshift form sequences were similar to those of the reference form sequences.
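The comparison in panel h can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the (Dn, Dp) pairs are made-up placeholder values rather than data from the figure, and the helper names `fit_line` and `residuals` are hypothetical. As in the caption, the regression line is fitted to the reference form sequences only, and each compensatory frameshift form sequence is then measured against that line.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (Dn, Dp) for one sequence


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit Dp = slope * Dn + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(points: List[Point], slope: float, intercept: float) -> List[float]:
    """Vertical offset of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in points]


# Placeholder values for illustration only; NOT taken from the paper.
reference = [(0.01, 0.005), (0.02, 0.010), (0.03, 0.015), (0.04, 0.020)]
frameshift = [(0.02, 0.030), (0.03, 0.036)]

# Fit on reference form sequences only, as described for the dotted line.
slope, intercept = fit_line(reference)

# Positive residuals mean Dp is higher than expected for a similar Dn.
excess = residuals(frameshift, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.4f}")
print("excess Dp of frameshift form sequences:", [round(e, 4) for e in excess])
```

Fitting on the reference form alone keeps the frameshift sequences from pulling the line toward themselves, so their residuals reflect the excess protein distance directly.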
