Table 3 Evaluation of QA pipeline

From: CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice

| Evaluation metric | Top@1 | Top@5 | Top@10 | Top@20 |
| --- | --- | --- | --- | --- |
| **Retriever** | | | | |
| Recall (single document) | 0.495 | 0.711 | 0.720 | 0.836 |
| Recall (multiple documents) | 0.494 | 0.716 | 0.720 | 0.836 |
| Mean reciprocal rank (MRR) | 0.495 | 0.572 | 0.582 | 0.775 |
| Precision | 0.495 | 0.344 | 0.342 | 0.304 |
| Mean average precision (MAP) | 0.494 | 0.672 | 0.690 | 0.697 |
| **Reader** | | | | |
| F1-Score | 0.504 | 0.636 | 0.636 | 0.771 |
| Exact match (EM) | 0.539 | 0.549 | 0.698 | 0.775 |
| Semantic answer similarity (SAS) | 0.503 | 0.623 | 0.687 | 0.785 |
| Accuracy | 0.895 (same for all top@k) | | | |

  1. In the original table, bold marks the best result
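The retriever metrics above (recall@k, precision@k, MRR@k) are standard ranking measures. A minimal sketch of how they are typically computed is below; the data layout (a list of per-query pairs of ranked document IDs and the set of relevant IDs) is an assumption for illustration, not the paper's actual evaluation code.

```python
def recall_at_k(results, k):
    """Fraction of queries with at least one relevant document in the top-k.

    results: list of (ranked_doc_ids, relevant_id_set) pairs, one per query.
    """
    hits = sum(1 for ranked, relevant in results if relevant & set(ranked[:k]))
    return hits / len(results)


def precision_at_k(results, k):
    """Mean fraction of the top-k results that are relevant."""
    return sum(len(relevant & set(ranked[:k])) / k
               for ranked, relevant in results) / len(results)


def mrr_at_k(results, k):
    """Mean reciprocal rank of the first relevant document within the top-k.

    Queries with no relevant document in the top-k contribute 0.
    """
    total = 0.0
    for ranked, relevant in results:
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


# Toy example with two queries (hypothetical document IDs):
results = [
    (["a", "b", "c"], {"b"}),  # relevant doc at rank 2
    (["x", "y", "z"], {"z"}),  # relevant doc at rank 3
]
print(recall_at_k(results, 1))   # 0.0 — no relevant doc at rank 1
print(recall_at_k(results, 3))   # 1.0 — both queries hit within top-3
print(mrr_at_k(results, 3))      # (1/2 + 1/3) / 2 ≈ 0.417
```

This pattern explains the table's trends: recall and MRR can only grow as k increases, while precision tends to fall because each extra retrieved document dilutes the top-k set.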