Dataset | Read length (bp) | Coverage | Correlation (Athena) | Correlation (Lerna) | NG50 without correction | NG50 with Lerna | NG50 with Athena |
---|---|---|---|---|---|---|---|
D1 | 36 | 80\(\times\) | −0.93 | −0.94 | 3019 | 6827 | 6827 |
D2 | 47 | 71\(\times\) | −0.97 | −0.96 | 47 | 2254 | 2164 |
D3 | 36 | 173\(\times\) | −0.92 | −0.93 | 1042 | 4873 | 4164 |
D4 | 75 | 62\(\times\) | −0.86 | −0.97 | 118 | 858 | 858 |
D5 | 100 | 166\(\times\) | −0.96 | −0.98 | 186 | 3524 | 2799 |
D6 | 250 | 70\(\times\) | +0.95 | −0.84 | 1098 | 1344 | 1237 |
D7 | 101 | 67\(\times\) | −0.72 | −0.82 | 723 | 739 | 754 |
- The datasets are described in Table 2. We show the correlation between the perplexity metric and the alignment rate of the data after correction for Lerna vis-à-vis its closest competitor, Athena. On dataset D6 (the greatest read length, 250 bp), Athena fails, yielding a positive correlation of +0.95 instead of the desired negative correlation (highlighted in the table). This is in line with the fact that RNNs are unable to model longer sequences. We also show the improvement in assembly quality (in NG50) after tuning the EC tool with Lerna versus using the uncorrected reads. The NG50 with Lerna is always higher than, or equal to, that with Athena, other than a small drop for dataset D7. All the superior values are bolded.
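
The two quantities reported in the table can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the usual definition of NG50 (the largest contig length L such that contigs of length at least L together cover half of the reference genome size), and it assumes a Pearson coefficient for the correlation between perplexity and alignment rate, which the caption does not specify. The function names `ng50` and `pearson` and the toy inputs are hypothetical.

```python
from math import sqrt


def ng50(contig_lengths, genome_size):
    """Return NG50 for a set of contigs (assumed standard definition).

    Contigs are visited from longest to shortest; NG50 is the length of the
    contig at which the running total first reaches half of genome_size.
    Returns 0 if the assembly never covers half of the genome.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumed metric)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy inputs, hypothetical and unrelated to datasets D1-D7.
print(ng50([100, 80, 60, 40, 20], genome_size=300))  # prints 80

# One (perplexity, alignment rate) pair per candidate EC configuration.
perplexities = [12.0, 15.5, 19.0, 24.5]
alignment_rates = [0.97, 0.94, 0.90, 0.83]
print(pearson(perplexities, alignment_rates))  # negative: lower perplexity, higher alignment
```

A strongly negative coefficient is the desired outcome here: it means that a configuration with lower perplexity also yields a higher alignment rate, so perplexity can stand in for alignment rate when tuning the EC tool.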