Table 3 Evaluation of QA pipeline

From: CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice

| Evaluation metric | Top@1 | Top@5 | Top@10 | Top@20 |
| --- | --- | --- | --- | --- |
| **Retriever** | | | | |
| Recall (single document) | 0.495 | 0.711 | 0.720 | 0.836 |
| Recall (multiple documents) | 0.494 | 0.716 | 0.720 | 0.836 |
| Mean reciprocal rank (MRR) | 0.495 | 0.572 | 0.582 | 0.775 |
| Precision | 0.495 | 0.344 | 0.342 | 0.304 |
| Mean average precision (MAP) | 0.494 | 0.672 | 0.690 | 0.697 |
| **Reader** | | | | |
| F1-Score | 0.504 | 0.636 | 0.636 | 0.771 |
| Exact match (EM) | 0.539 | 0.549 | 0.698 | 0.775 |
| Semantic answer similarity (SAS) | 0.503 | 0.623 | 0.687 | 0.785 |
| Accuracy | 0.895 (same for all top@k) | | | |

  1. In the original table, bold marks the best result
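The retriever metrics above (recall@k, precision@k, MRR@k) are standard ranking measures. A minimal sketch of how they are typically computed is below; the data layout (a list of per-query pairs of ranked document IDs and the set of relevant IDs) is an assumption for illustration, not the paper's actual evaluation code.

```python
def recall_at_k(results, k):
    """Fraction of queries with at least one relevant document in the top-k.

    results: list of (ranked_doc_ids, relevant_id_set) pairs, one per query.
    """
    hits = sum(1 for ranked, relevant in results if relevant & set(ranked[:k]))
    return hits / len(results)


def precision_at_k(results, k):
    """Mean fraction of the top-k results that are relevant."""
    return sum(len(relevant & set(ranked[:k])) / k
               for ranked, relevant in results) / len(results)


def mrr_at_k(results, k):
    """Mean reciprocal rank of the first relevant document within the top-k.

    Queries with no relevant document in the top-k contribute 0.
    """
    total = 0.0
    for ranked, relevant in results:
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


# Toy example with two queries (hypothetical document IDs):
results = [
    (["a", "b", "c"], {"b"}),  # relevant doc at rank 2
    (["x", "y", "z"], {"z"}),  # relevant doc at rank 3
]
print(recall_at_k(results, 1))   # 0.0 — no relevant doc at rank 1
print(recall_at_k(results, 3))   # 1.0 — both queries hit within top-3
print(mrr_at_k(results, 3))      # (1/2 + 1/3) / 2 ≈ 0.417
```

This pattern explains the table's trends: recall and MRR can only grow as k increases, while precision tends to fall because each extra retrieved document dilutes the top-k set.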