Skip to main content

Table 1 \(|\mathcal {V}|\) denotes vocabulary size. For train, validation and test datasets, the number of abstracts followed and the number of sentences (in parentheses) are shown

From: Fast and scalable neural embedding models for biomedical sentence classification

Dataset

\(|\mathcal {V}|\)

Train

Validation

Test

PubMed 20k

68k

15k (180k)

2.5k (30k)

2.5k (30k)

PubMed 200k

331k

190k (2.2M)

2.5k (29k)

2.5k (29k)

Extended corpus

451k

872k (10,3M)

2.5k (29k)

2.5k (29k)