
Table 4 A comparison between Athena and Lerna on short reads

From: Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

| Dataset | Read length (bp) | Coverage | Correlation (Athena) | Correlation (Lerna) | NG50 without correction | NG50 with Lerna | NG50 with Athena |
|---|---|---|---|---|---|---|---|
| D1 | 36 | 80× | −0.93 | −0.94 | 3019 | 6827 | 6827 |
| D2 | 47 | 71× | −0.97 | −0.96 | 47 | 2254 | 2164 |
| D3 | 36 | 173× | −0.92 | −0.93 | 1042 | 4873 | 4164 |
| D4 | 75 | 62× | −0.86 | −0.97 | 118 | 858 | 858 |
| D5 | 100 | 166× | −0.96 | −0.98 | 186 | 3524 | 2799 |
| D6 | 250 | 70× | +0.95 | 0.84 | 1098 | 1344 | 1237 |
| D7 | 101 | 67× | −0.72 | −0.82 | 723 | 739 | 754 |
  1. The datasets are described in Table 2. We show the correlation between the perplexity metric and the alignment rate of the data after correction for Lerna vis-à-vis its closest competitor, Athena. On dataset D6 (the greatest read length, 250 bp), Athena fails: it yields a positive correlation of +0.95 instead of the desired negative correlation (highlighted in the table). This is in line with the fact that RNNs are unable to model longer sequences. We also show the improvement in assembly quality (in NG50) after tuning the EC tool with Lerna versus using the uncorrected reads. The NG50 with Lerna is always higher than or equal to that with Athena, except for a small drop on dataset D7. All superior values are bolded.
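
The claims in this note can be checked against the table values directly. The following is a minimal sketch, not part of the original work: the rows are transcribed as printed in Table 4, and the names `ROWS`, `lerna_below_athena`, `positive_athena_correlation`, and `ng50_gain_over_uncorrected` are hypothetical helpers introduced only for illustration.

```python
# Rows transcribed as printed in Table 4:
# (dataset, corr_athena, corr_lerna, ng50_uncorrected, ng50_lerna, ng50_athena)
ROWS = [
    ("D1", -0.93, -0.94, 3019, 6827, 6827),
    ("D2", -0.97, -0.96, 47, 2254, 2164),
    ("D3", -0.92, -0.93, 1042, 4873, 4164),
    ("D4", -0.86, -0.97, 118, 858, 858),
    ("D5", -0.96, -0.98, 186, 3524, 2799),
    ("D6", 0.95, 0.84, 1098, 1344, 1237),
    ("D7", -0.72, -0.82, 723, 739, 754),
]


def lerna_below_athena(rows):
    """Datasets where the NG50 with Lerna is strictly lower than with Athena."""
    return [name for name, _, _, _, lerna, athena in rows if lerna < athena]


def positive_athena_correlation(rows):
    """Datasets where Athena's correlation is positive (a negative one is desired)."""
    return [name for name, corr_athena, *_ in rows if corr_athena > 0]


def ng50_gain_over_uncorrected(rows):
    """Ratio of NG50 with Lerna to NG50 without correction, per dataset."""
    return {name: lerna / base for name, _, _, base, lerna, _ in rows}


if __name__ == "__main__":
    print("Lerna NG50 below Athena:", lerna_below_athena(ROWS))
    print("Athena correlation positive:", positive_athena_correlation(ROWS))
    for name, gain in ng50_gain_over_uncorrected(ROWS).items():
        print(f"{name}: NG50 gain with Lerna = {gain:.2f}x")
```

With the values as transcribed, D7 is the only dataset where Lerna's NG50 falls below Athena's, and D6 is the only one where Athena's correlation is positive, which matches the note above.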