
Table 1 Classification of Sequences with Gaps for Which Local Alignment Program (LAP) and NucAmino Yielded Different Results

From: NucAmino: a nucleotide to amino acid alignment optimized for virus gene sequences

Columns: Classification; Number of Discordances; Explanation; Example (LAP output, then NucAmino output)
A. Insertions and deletions in RT β3–β4 loop region (codons 62 to 72)
Number of discordances: 128
Explanation: For 233 insertions in this region, NucAmino placed all but 3 at codon 69. In contrast, LAP placed 113 insertions at codon 69, whereas 111 were placed at codons 62 to 70. For 99 deletions in this region, there were 3 differences between NucAmino and LAP, all of which fall under classification D. (Example: AF315241)
LAP output:
66  67         68  69  70
Lys Asp        Ser Thr Lys
::: ... ++++++ ::: ... :::
AAA GAA AGTTCT AGC GGT AAA
NucAmino output:
66  67  68  69         70
Lys Asp Ser Thr        Lys
::: ... ::: ... ++++++ :::
AAA GAA AGT TCT AGCGGT AAA
B. Insertions in PR codons 33/41 loop region
Number of discordances: 31
Explanation: Of 151 insertions in this region, LAP and NucAmino placed 31 at different positions. (Example: HQ657812)
LAP output:
32  33      34  35  36  37
Val Leu     Glu Glu Met Asn
::: ::: +++ ::: ::: ... :::
GTA TTA GAR GAA GAA ATA AAT
NucAmino output:
32  33  34  35      36  37
Val Leu Glu Glu     Met Asn
::: ::: ::: ::: +++ ... :::
GTA TTA GAR GAA GAA ATA AAT
C. Different placement of indels and/or frameshifts (not in classification A or B)
Number of discordances: 213
Explanation: For 213 sequences with indels and/or frameshifts outside of the RT β3–β4 loop region and the PR codons 33/41 loop region, gaps were placed at slightly different positions. (Example: EF071939)
LAP output:
306 307 308 309 310 311 312
Asn Arg Glu Ile Leu Lys Glu
::: ... ... --- ... ::: :::
AAC AAG AAT     TTT AAG GAG
NucAmino output:
306 307 308 309 310 311 312
Asn Arg Glu Ile Leu Lys Glu
::: ... ... ... --- ::: :::
AAC AAG AAT TTT     AAG GAG
D. Codon alignment corrections (not in classification A, B or C)
Number of discordances: 108
Explanation: Overall, there were 218 sequences with 110 insertions and 133 deletions for which LAP aligned 3 nucleotides across more than one codon whereas NucAmino aligned the nucleotides to a single codon. (Example: HM569289) Of these, 114 were not in the RT β3–β4 loop or in the PR codons 33/41 loop region.
LAP output:
200 201    202 203
Ile Val    Asp Ile
::: .+++.. ::: :::
ATA GGAATA GGC ATA
NucAmino output:
200     201 202 203
Ile     Val Asp Ile
::: +++ ... ::: :::
ATA GGA ATA GAC ATA
E. Large gaps
Number of discordances: 32
Explanation: 32 sequences had large gaps, presumably because the contributor excluded unsequenced regions from the GenBank submission or inserted large stretches of N’s. LAP accurately reported a large deletion encompassing the missing region for 21 of these regions, and NucAmino did so for 11. (Examples in the online dataset: HQ685003, AY090840)
Example: Insufficient space to provide an example.
Abbreviations: RT, reverse transcriptase; PR, protease
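
The control lines in the examples above can be read mechanically to see where each program places a gap. The following is a minimal sketch, not part of LAP or NucAmino. It assumes, from the examples only, that `:` and `.` mark aligned nucleotides (match and mismatch), `+` marks an inserted nucleotide, and `-` marks a deleted one. The function name `indel_events` is hypothetical, and so is the convention of attributing an insertion to the last reference codon begun before it. That convention was chosen so that the NucAmino example in row A reports codon 69.

```python
from itertools import groupby


def indel_events(first_codon, control):
    """List (kind, codon, length_nt, codon_aligned) for each gap in a control line.

    first_codon: number of the first reference codon shown in the example.
    control: control line such as "::: ... ++++++ ::: ... :::".
    codon_aligned is True when the gap starts on a codon boundary.
    """
    events = []
    ref_nt = 0  # reference nucleotides consumed so far
    symbols = control.replace(" ", "")
    for symbol, run in groupby(symbols):
        length = len(list(run))
        if symbol == "+":
            # Assumed convention: an insertion belongs to the last reference
            # codon that was begun before the inserted bases.
            codon = first_codon + (ref_nt - 1) // 3
            events.append(("insertion", codon, length, ref_nt % 3 == 0))
        elif symbol == "-":
            # A deletion removes reference positions, so it advances ref_nt.
            codon = first_codon + ref_nt // 3
            events.append(("deletion", codon, length, ref_nt % 3 == 0))
            ref_nt += length
        elif symbol in ":.":
            ref_nt += length
        else:
            raise ValueError(f"unexpected control symbol: {symbol!r}")
    return events


# Row A (AF315241): LAP vs NucAmino placement of the 6-nucleotide insertion.
lap_a = indel_events(66, "::: ... ++++++ ::: ... :::")
nucamino_a = indel_events(66, "::: ... ::: ... ++++++ :::")

# Row D (HM569289): LAP splits a codon, NucAmino keeps the insertion in frame.
lap_d = indel_events(200, "::: .+++.. ::: :::")
nucamino_d = indel_events(200, "::: +++ ... ::: :::")
```

Under these assumptions, comparing the two event lists for a sequence shows whether the programs disagree on position (rows A to C) or on codon alignment (row D, where the `codon_aligned` flag differs).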