From: Fast and scalable neural embedding models for biomedical sentence classification
Dataset | Vocabulary size \(\lvert\mathcal{V}\rvert\) | Train abstracts (sentences) | Validation abstracts (sentences) | Test abstracts (sentences) |
---|---|---|---|---|
PubMed 20k | 68k | 15k (180k) | 2.5k (30k) | 2.5k (30k) |
PubMed 200k | 331k | 190k (2.2M) | 2.5k (29k) | 2.5k (29k) |
Extended corpus | 451k | 872k (10.3M) | 2.5k (29k) | 2.5k (29k) |