Table 3 Lerna evaluated on 7 Illumina short-read datasets (listed in Table 2)

From: Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

For each tool, the entry gives the selected k and the resulting alignment rate (%).

| Dataset | Read length (bp) | Exhaustive search: selected k | Exhaustive search: alignment rate (%) | Lerna (char) | Lerna (word) | Athena (RNN) |
|---|---|---|---|---|---|---|
| D1 | 36 | 17 | 98.95 | Same as exhaustive search | Same as exhaustive search | Same as exhaustive search |
| D2 | 47 | 15 | 61.42 | Same as exhaustive search | k=19, 61.27% | k=17, 61.15% |
| D3 | 36 | 15 | 80.44 | k=17, 80.39% | Same as exhaustive search | k=17, 80.39% |
| D4 | 75 | 17 | 93.95 | k=15, 93.72% | Same as exhaustive search | Same as exhaustive search |
| D5 | 100 | 17 | 92.15 | Same as exhaustive search | Same as exhaustive search | k=25, 92.09% |
| D6 | 250 | 25 | 86.16 | Same as exhaustive search | Same as exhaustive search | k=17, 85.63% |
| D7 | 101 | 17 | 40.53 | k=15, 40.24% | k=15, 40.24% | Same as exhaustive search |

  1. We tested both the Transformer character-level and word-level LMs. The best performance was attained by the Transformer word-level LM with word length 4 (|w| = 4). With this model, Lerna finds either the optimal value of k or a value whose alignment rate is within 0.31% of the theoretical best, consistent with the results reported for Lighter (Figure 5 in [18]). These slightly sub-optimal configurations are an artefact of sub-sampling.
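
One way to read the table is as the per-dataset gap between each tool's alignment rate and the rate found by exhaustive search. The following is a minimal sketch of that arithmetic, using values transcribed from the rows above. The names `EXHAUSTIVE`, `TOOLS` and `alignment_gaps` are illustrative and do not come from the paper, and measuring the gap in absolute percentage points is an assumption about how the margin in the footnote is defined.

```python
# Alignment rates (%) transcribed from Table 3.
EXHAUSTIVE = {
    "D1": 98.95, "D2": 61.42, "D3": 80.44, "D4": 93.95,
    "D5": 92.15, "D6": 86.16, "D7": 40.53,
}

# Only datasets where a tool differs from exhaustive search are listed;
# a missing dataset means "same as exhaustive search".
TOOLS = {
    "Lerna (char)": {"D3": 80.39, "D4": 93.72, "D7": 40.24},
    "Lerna (word)": {"D2": 61.27, "D7": 40.24},
    "Athena (RNN)": {"D2": 61.15, "D3": 80.39, "D5": 92.09, "D6": 85.63},
}


def alignment_gaps(tool_rates):
    """Return the gap to exhaustive search per dataset, in percentage points.

    Assumption: the gap is the absolute difference of the two rates.
    """
    return {
        dataset: round(best - tool_rates.get(dataset, best), 2)
        for dataset, best in EXHAUSTIVE.items()
    }


if __name__ == "__main__":
    for tool, rates in TOOLS.items():
        gaps = alignment_gaps(rates)
        matched = sum(1 for gap in gaps.values() if gap == 0)
        print(f"{tool}: matched exhaustive search on {matched} of {len(gaps)} "
              f"datasets, largest gap {max(gaps.values()):.2f} points")
```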