Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing

Table 10 Runtime of Lerna and Athena on 7 Illumina short-read datasets

Dataset	Coverage	Genome	Read length (bp)	#Reads	Athena	Lerna	Speedup
D1	80\(\times\)	E. coli str. K-12 substr	136	20.8M	98s	5.5s	17.8\(\times\)
D2	71\(\times\)	E. coli str. K-12 substr	47	7.1M	49s	3s	16.3\(\times\)
D3	173\(\times\)	Acinetobacter sp. ADP1	36	18.1M	69s	4s	17.25\(\times\)
D4	62\(\times\)	B. subtilis	75	3.5M	52s	3s	17.3\(\times\)
D5	166\(\times\)	L. interrogans C sp. ADP1	100	7.1M	100s	5.5s	18.2\(\times\)
D6	70\(\times\)	A. thaliana	250	33.6M	400s	23s	17.4\(\times\)
D7	67\(\times\)	Homo sapiens	101	202M	960s	63s	15.2\(\times\)

The reported times are for calculating perplexity; read lengths and read counts are listed for each dataset. On average, our pipeline is 18\(\times\) faster than Athena, which translates to a speedup of 80\(\times\) to 275\(\times\) over estimating the alignment rate with Bowtie2.
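The Speedup column is the ratio of the Athena time to the Lerna time, and it can be rechecked from the table. The following is a minimal sketch rather than part of the Lerna pipeline: the variable and function names are illustrative, and the timings are the rounded values from Table 10, assumed to be in seconds.

```python
from statistics import mean

# (dataset, Athena time in s, Lerna time in s), copied from Table 10.
# These are the rounded values as printed, not raw measurements.
TIMINGS = [
    ("D1", 98, 5.5),
    ("D2", 49, 3),
    ("D3", 69, 4),
    ("D4", 52, 3),
    ("D5", 100, 5.5),
    ("D6", 400, 23),
    ("D7", 960, 63),
]


def per_dataset_speedup(rows):
    """Return {dataset: Athena time / Lerna time} for each row."""
    return {name: athena / lerna for name, athena, lerna in rows}


speedups = per_dataset_speedup(TIMINGS)

# Unweighted mean of the per-dataset ratios.
mean_speedup = mean(speedups.values())

# Ratio of total runtimes, which weights large datasets (e.g. D7) more heavily.
total_ratio = sum(a for _, a, _ in TIMINGS) / sum(l for _, _, l in TIMINGS)

for name, value in speedups.items():
    print(f"{name}: {value:.1f}x")
print(f"mean of ratios: {mean_speedup:.1f}x, ratio of totals: {total_ratio:.1f}x")
```

With the rounded timings shown in the table, the unweighted mean of the per-dataset ratios is about 17.1\(\times\), and the ratio of total runtimes is about 16.1\(\times\), so the table values land slightly below the reported 18\(\times\) average.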

ISSN: 1471-2105