
Table 1 Highest correlation coefficients obtained with different methods

From: Neural sentence embedding models for semantic similarity estimation in the biomedical domain

Method                                                      r
String-based methods
 Jaccard                                                    0.751
 Q-gram (q = 3)                                             0.723
Unsupervised
 fastText (skip-gram, max pooling)                          0.766
 fastText (CBOW, max pooling)                               0.253
 Sent2vec                                                   0.798
 Skip-thoughts                                              0.485
 Paragraph vector (PV-DM)                                   0.819
 Paragraph vector (PV-DBOW)                                 0.804
Unsupervised combination of several methods (mean)
 Jaccard, q-gram, Paragraph vector (PV-DBOW) and sent2vec   0.846
Supervised combination of several methods
 Supervised linear regression (combination of Jaccard,
  q-gram, sent2vec, Paragraph vector DM, skip-thoughts,
  fastText)                                                 0.871
Abbreviations: r, Pearson correlation; CBOW, Continuous Bag of Words; PV-DM, Paragraph Vector Distributed Memory; PV-DBOW, Paragraph Vector Distributed Bag of Words
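
As a rough illustration of how the string-based rows, the mean combination, and the r column could be computed, here is a minimal Python sketch. It is not the authors' implementation. The lowercase whitespace tokenization, the use of Jaccard overlap on character q-gram sets as the q-gram similarity, and all function names are assumptions made for this example.

```python
from math import sqrt


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over sets of lowercase whitespace tokens (assumed tokenization)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # convention chosen here: two empty sentences count as identical
    return len(sa & sb) / len(sa | sb)


def qgrams(s: str, q: int = 3) -> set:
    """Set of character q-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Assumed q-gram similarity: Jaccard overlap of the two q-gram sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def mean_combination(scores_per_method):
    """Average several methods' scores pair by pair (one inner list per method)."""
    return [sum(column) / len(column) for column in zip(*scores_per_method)]


def pearson_r(x, y) -> float:
    """Pearson correlation between predicted similarities and reference scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (spread_x * spread_y)
```

In this sketch, each method would score every sentence pair, `mean_combination` would average the chosen methods' scores per pair, and `pearson_r` would compare the result with the human similarity ratings. Any rescaling of scores before averaging is left out here.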