Table 4 A comparison between Athena and Lerna on short reads

From: Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

| Dataset | Read length (bp) | Coverage | Correlation Athena | Correlation Lerna | NG50 without correction | NG50 with Lerna | NG50 with Athena |
|---|---|---|---|---|---|---|---|
| D1 | 36 | 80\(\times\) | −0.93 | −0.94 | 3019 | 6827 | 6827 |
| D2 | 47 | 71\(\times\) | −0.97 | −0.96 | 47 | 2254 | 2164 |
| D3 | 36 | 173\(\times\) | −0.92 | −0.93 | 1042 | 4873 | 4164 |
| D4 | 75 | 62\(\times\) | −0.86 | −0.97 | 118 | 858 | 858 |
| D5 | 100 | 166\(\times\) | −0.96 | −0.98 | 186 | 3524 | 2799 |
| D6 | 250 | 70\(\times\) | +0.95 | −0.84 | 1098 | 1344 | 1237 |
| D7 | 101 | 67\(\times\) | −0.72 | −0.82 | 723 | 739 | 754 |

  1. The datasets are described in Table 2. For Lerna and its closest competitor, Athena, we show the correlation between the perplexity metric and the alignment rate of the data after correction. A strongly negative correlation is desired. On dataset D6, which has the greatest read length (250 bp), Athena fails and yields a positive correlation of +0.95 (highlighted in the table). This is consistent with the limited ability of RNNs to model longer sequences. We also show the improvement in assembly quality (NG50) after tuning the EC tool with Lerna, compared with using the uncorrected reads. The NG50 with Lerna is always higher than or equal to that with Athena, except for a small drop on dataset D7. All superior values are shown in bold.
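
The two claims in the note can be checked directly against the table values. The sketch below is illustrative only and is not code from the paper: the `Row` type, the field names, and the `ng50` helper are hypothetical names introduced here. The `ng50` function follows the commonly used definition of NG50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size), which is assumed to match the metric reported in the table.

```python
from typing import NamedTuple, Sequence


class Row(NamedTuple):
    # Hypothetical record type; field names are chosen here for illustration.
    dataset: str
    read_length_bp: int
    coverage: int
    corr_athena: float
    corr_lerna: float
    ng50_uncorrected: int
    ng50_lerna: int
    ng50_athena: int


# Values transcribed from Table 4.
TABLE_4 = [
    Row("D1", 36, 80, -0.93, -0.94, 3019, 6827, 6827),
    Row("D2", 47, 71, -0.97, -0.96, 47, 2254, 2164),
    Row("D3", 36, 173, -0.92, -0.93, 1042, 4873, 4164),
    Row("D4", 75, 62, -0.86, -0.97, 118, 858, 858),
    Row("D5", 100, 166, -0.96, -0.98, 186, 3524, 2799),
    Row("D6", 250, 70, 0.95, -0.84, 1098, 1344, 1237),
    Row("D7", 101, 67, -0.72, -0.82, 723, 739, 754),
]


def ng50(contig_lengths: Sequence[int], genome_size: int) -> int:
    """Return NG50 under the usual definition (assumed, not taken from the paper).

    Contigs are added from longest to shortest until their cumulative length
    reaches half of the genome size; the length of the last contig added is
    the NG50. Returns 0 if the assembly never covers half of the genome.
    """
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0


# Datasets where tuning with Lerna gives a lower NG50 than tuning with Athena.
lerna_below_athena = [r.dataset for r in TABLE_4 if r.ng50_lerna < r.ng50_athena]

# Datasets where Athena's correlation has the wrong (positive) sign.
athena_positive = [r.dataset for r in TABLE_4 if r.corr_athena > 0]

# Fold change in NG50 from uncorrected reads to Lerna-tuned correction.
lerna_fold_change = {r.dataset: r.ng50_lerna / r.ng50_uncorrected for r in TABLE_4}

if __name__ == "__main__":
    print("Lerna NG50 below Athena:", lerna_below_athena)
    print("Athena correlation positive:", athena_positive)
    for name, fold in lerna_fold_change.items():
        print(f"{name}: {fold:.2f}x NG50 relative to uncorrected reads")
```

Keeping the rows as named tuples makes each comparison a one-line comprehension and keeps the transcribed numbers in one place, so they can be compared against the table at a glance.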