
Table 3 Lerna evaluated on 7 Illumina short-read datasets (Table 2)

From: Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

| Dataset | Read length (bp) | Exhaustive search: selected k | Exhaustive search: alignment rate (%) | Lerna (char): selected k | Lerna (char): alignment rate (%) | Lerna (word): selected k | Lerna (word): alignment rate (%) | Athena (RNN): selected k | Athena (RNN): alignment rate (%) |
|---|---|---|---|---|---|---|---|---|---|
| D1 | 36 | 17 | 98.95 | same | same | same | same | same | same |
| D2 | 47 | 15 | 61.42 | same | same | 19 | 61.27 | 17 | 61.15 |
| D3 | 36 | 15 | 80.44 | 17 | 80.39 | same | same | 17 | 80.39 |
| D4 | 75 | 17 | 93.95 | 15 | 93.72 | same | same | same | same |
| D5 | 100 | 17 | 92.15 | same | same | same | same | 25 | 92.09 |
| D6 | 250 | 25 | 86.16 | same | same | same | same | 17 | 85.63 |
| D7 | 101 | 17 | 40.53 | 15 | 40.24 | 15 | 40.24 | same | same |

"same": same selected k and alignment rate as exhaustive search.

  1. We tested both the Transformer character-level and word-level LMs. The best performance was attained by the Transformer word-level LM with word length 4 (|w| = 4). With this model, Lerna finds either the optimal value of k or a value whose alignment rate is within 0.31% of the theoretical best, consistent with the results reported for Lighter (Figure 5 in [18]). These slightly sub-optimal configurations are an artefact of sub-sampling.
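To make the comparison in the table concrete, the following Python sketch transcribes the alignment rates from Table 3 and computes, for each method, how far its selected k falls short of the exhaustive-search optimum. Reading the 0.31% bound as an absolute difference in percentage points is an assumption; the dictionary names and the `gaps` helper are illustrative only and are not part of Lerna or Athena.

```python
# Alignment rates (%) transcribed from Table 3.
# Exhaustive search gives the best achievable rate per dataset.
EXHAUSTIVE = {"D1": 98.95, "D2": 61.42, "D3": 80.44, "D4": 93.95,
              "D5": 92.15, "D6": 86.16, "D7": 40.53}

# Only datasets where a method picked a different k are listed;
# every other dataset is "same as exhaustive search".
METHODS = {
    "Lerna (char)": {"D3": 80.39, "D4": 93.72, "D7": 40.24},
    "Lerna (word)": {"D2": 61.27, "D7": 40.24},
    "Athena (RNN)": {"D2": 61.15, "D3": 80.39, "D5": 92.09, "D6": 85.63},
}


def gaps(overrides):
    """Shortfall versus exhaustive search, in percentage points, per dataset.

    Datasets missing from `overrides` matched the exhaustive-search k,
    so their gap is 0.
    """
    return {d: round(best - overrides.get(d, best), 2)
            for d, best in EXHAUSTIVE.items()}


if __name__ == "__main__":
    for name, overrides in METHODS.items():
        g = gaps(overrides)
        exact = sum(1 for v in g.values() if v == 0)
        print(f"{name}: max gap {max(g.values()):.2f} pp, "
              f"optimal k on {exact}/{len(g)} datasets")
```

Running it reports a maximum gap of 0.29 percentage points for both Lerna variants and 0.53 for Athena (RNN). The word-level model matches the exhaustive-search k on 5 of 7 datasets, compared with 4 for the character-level model and 3 for Athena, which is in line with the note's conclusion that the word-level LM performs best.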