Table 1 Highest correlation coefficients obtained with different methods

From: Neural sentence embedding models for semantic similarity estimation in the biomedical domain

| Method | r |
| --- | --- |
| String-based methods | |
| Jaccard | 0.751 |
| Q-gram (q = 3) | 0.723 |
| Unsupervised | |
| fastText (skip-gram, max pooling) | 0.766 |
| fastText (CBOW, max pooling) | 0.253 |
| Sent2vec | 0.798 |
| Skip-thoughts | 0.485 |
| Paragraph vector (PV-DM) | 0.819 |
| Paragraph vector (PV-DBOW) | 0.804 |
| Unsupervised combination of several methods (mean) | |
| Jaccard, q-gram, Paragraph vector (PV-DBOW) and sent2vec | 0.846 |
| Supervised combination of several methods | |
| Supervised linear regression (Combination of Jaccard, Q-gram, sent2vec, Paragraph vector DM, skip-thoughts, fastText) | 0.871 |

r: Pearson correlation; CBOW: Continuous Bag of Words; PV-DM: Paragraph Vector Distributed Memory; PV-DBOW: Paragraph Vector Distributed Bag of Words
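
For readers who want to see how the string-based scores and the reported r could be computed, here is a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization, the lowercasing, the choice to score q-grams as a Jaccard overlap of character q-gram sets, and all function names (`jaccard`, `qgram_similarity`, `pearson_r`) are assumptions made for illustration.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens (assumed tokenization)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty sentences are treated as identical
    return len(sa & sb) / len(sa | sb)


def qgrams(s: str, q: int = 3) -> set:
    """Set of character q-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Overlap of character q-gram sets, scored as a Jaccard ratio (an assumption)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def pearson_r(xs, ys) -> float:
    """Pearson correlation between two equal-length sequences of scores.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In this sketch, a method's r would be obtained by scoring every sentence pair with one of the similarity functions and passing those scores, together with the corresponding human similarity judgments, to `pearson_r`.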