HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

BMC Bioinformatics

Table 9 Average speed in sentence pairs per second (sent/s) and CUI pairs per second (CUIs/s) for the evaluation of the UBSM [39] sentence similarity measure combined with three representative ontology-based similarity measures based on MeSH (Nov, 2019), on 30 sentence pairs extracted from the MedSTS [135] sentence similarity dataset and on 1 million sentence pairs extracted from the BioC corpus [136]

Pairwise sentence comparison based on MeSH	UMLS::Sim (30 pairs)		SML (30 pairs)		HESML (30 pairs)		HESML (\(10^6\) pairs)
Similarity measure	Avg. speed (sent/s)	Avg. speed (CUIs/s)	Avg. speed (sent/s)	Avg. speed (CUIs/s)	Avg. speed (sent/s)	Avg. speed (CUIs/s)	Avg. speed (sent/s)	Avg. speed (CUIs/s)
Rada et al. [71]	0.441	36.63	0.126	10.478	2830.189	235000	7982.222	337843.826
AncSPL-Rada (this work)	–	–	–	–	2542.373	211101.695	7958.742	336850.041
Lin-Seco [87, 110]	0.782	64.956	2586.207	214741.379	3125	259479.167	8166.185	345629.98
Wu-Palmer\(_{fast}\) [72]	0.181	15.067	–	–	3125	259479.167	7892.959	334065.805

Speeds are also reported in CUI pairs per second, a normalized unit that allows a fair and unbiased comparison of the results obtained for 30 and 1 million sentence pairs. The dataset with 30 sentence pairs requires 2491 pairwise CUI comparisons, whilst the dataset with 1 million sentence pairs requires 42324534 pairwise CUI comparisons. Best performing values are shown in bold. A dash (–) denotes a method that is not implemented in the corresponding library.
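The relationship between the two speed columns can be checked with a short calculation. The sketch below assumes that the CUIs/s figure is the sent/s figure multiplied by the average number of CUI comparisons per sentence pair. This assumption is consistent with the reported values but is not stated explicitly in the table. The function name `cui_pairs_per_second` is illustrative and is not part of any of the libraries compared.

```python
def cui_pairs_per_second(sentences_per_second, n_sentence_pairs, n_cui_comparisons):
    """Convert a speed in sentence pairs/s into CUI pairs/s.

    Assumption: the normalized speed equals the sentence-level speed
    multiplied by the average number of CUI comparisons per sentence pair.
    """
    return sentences_per_second * n_cui_comparisons / n_sentence_pairs


# HESML, Rada et al. measure, 30 sentence pairs (2491 CUI comparisons)
small = cui_pairs_per_second(2830.189, 30, 2491)

# HESML, Rada et al. measure, 1 million sentence pairs (42324534 CUI comparisons)
large = cui_pairs_per_second(7982.222, 1_000_000, 42_324_534)

print(round(small, 3), round(large, 3))
```

Under this assumption, both results agree with the CUIs/s values reported for the Rada et al. measure to within rounding of the published sent/s figures.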

ISSN: 1471-2105