
Table 1 Classification of Sequences with Gaps for Which Local Alignment Program (LAP) and NucAmino Yielded Different Results

From: NucAmino: a nucleotide to amino acid alignment optimized for virus gene sequences

Classification

Number of Discordances

Explanation

Example

LAP output

NucAmino output

A. Insertions and deletions in RT β3–β4 loop region (codons 62 to 72)

128

Of 233 insertions in this region, NucAmino placed all but 3 at codon 69. In contrast, LAP placed 113 insertions at codon 69 and 111 at codons 62 to 70. Of 99 deletions in this region, NucAmino and LAP differed on 3, all of which fall under classification D. (Example: AF315241)

LAP output:
66  67         68  69  70
Lys Asp        Ser Thr Lys
::: ... ++++++ ::: ... :::
AAA GAA AGTTCT AGC GGT AAA

NucAmino output:
66  67  68  69         70
Lys Asp Ser Thr        Lys
::: ... ::: ... ++++++ :::
AAA GAA AGT TCT AGCGGT AAA

B. Insertions in PR codons 33/41 loop region

31

Of 151 insertions in this region, LAP and NucAmino placed 31 at different positions. (Example: HQ657812)

LAP output:
32  33      34  35  36  37
Val Leu     Glu Glu Met Asn
::: ::: +++ ::: ::: ... :::
GTA TTA GAR GAA GAA ATA AAT

NucAmino output:
32  33  34  35      36  37
Val Leu Glu Glu     Met Asn
::: ::: ::: ::: +++ ... :::
GTA TTA GAR GAA GAA ATA AAT

C. Different placement of indels and/or frameshifts (not in classification A or B)

213

For 213 sequences with indels and/or frameshifts outside the RT β3–β4 loop region and the PR codon 33/41 loop region, LAP and NucAmino placed gaps at slightly different positions. (Example: EF071939)

LAP output:
306 307 308 309 310 311 312
Asn Arg Glu Ile Leu Lys Glu
::: ... ... --- ... ::: :::
AAC AAG AAT     TTT AAG GAG

NucAmino output:
306 307 308 309 310 311 312
Asn Arg Glu Ile Leu Lys Glu
::: ... ... ... --- ::: :::
AAC AAG AAT TTT     AAG GAG

D. Codon alignment corrections (not in classification A, B or C)

108

Overall, there were 218 sequences with 110 insertions and 133 deletions for which LAP aligned 3 nucleotides across more than one codon, whereas NucAmino aligned the nucleotides to a single codon. Of these, 114 were not in the RT β3–β4 loop region or in the PR codon 33/41 loop region. (Example: HM569289)

LAP output:
200 201    202 203
Ile Val    Asp Ile
::: .+++.. ::: :::
ATA GGAATA GGC ATA

NucAmino output:
200     201 202 203
Ile     Val Asp Ile
::: +++ ... ::: :::
ATA GGA ATA GAC ATA

E. Large gaps

32

Thirty-two sequences had large gaps, presumably because the contributor excluded unsequenced regions from the GenBank submission or inserted large stretches of N’s. LAP accurately reported a large deletion encompassing the missing region for 21 of these, and NucAmino for 11.

(Examples in the online dataset: HQ685003, AY090840)

Insufficient space to provide an example.

Abbreviations: RT, reverse transcriptase; PR, protease
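
The marker rows in the example alignments can be reproduced with a short script. The Python sketch below assumes, based only on the examples in this table, that ":::" marks a codon that translates to the reference amino acid, "..." marks a mismatch, "+" marks an inserted nucleotide, and "---" marks a deleted codon. The function name marker_line and its input layout are hypothetical and are not part of NucAmino or LAP; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def marker_line(columns):
    """Build a marker row for a codon-aligned alignment.

    columns is a list of (ref_aa, nucleotides) pairs (hypothetical layout):
      ref_aa is a one-letter amino acid, or None for an insertion;
      nucleotides is the aligned query text, or "" for a deleted codon.
    Marker meanings are an assumption inferred from the table's examples.
    """
    marks = []
    for ref_aa, nucs in columns:
        if ref_aa is None:
            marks.append("+" * len(nucs))   # one "+" per inserted nucleotide
        elif not nucs:
            marks.append("---")             # reference codon absent from query
        elif CODON_TABLE.get(nucs) == ref_aa:
            marks.append(":::")             # codon translates to reference residue
        else:
            marks.append("...")             # mismatch (or ambiguous codon)
    return " ".join(marks)


# NucAmino output for classification D (RT codons 200 to 203: Ile Val Asp Ile).
example_d = [("I", "ATA"), (None, "GGA"), ("V", "ATA"), ("D", "GAC"), ("I", "ATA")]
print(marker_line(example_d))  # ::: +++ ... ::: :::
```

Because the sketch treats each column as either a whole codon or a whole insertion, it can only express codon-aligned output such as NucAmino's. A marker group such as ".+++..", where an insertion falls inside a codon as in the LAP output for classification D, would need a per-nucleotide representation instead.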