From: Neural sentence embedding models for semantic similarity estimation in the biomedical domain
Statistic | Value |
File size | 45 GB |
Number of articles | > 1,700,000 |
Total number of tokens | 8,126,457,106 |
Number of unique words | 31,974,798 |
Number of sentences | 277,809,416 |
Average line length before post-processing (characters) | 162 |
Longest line length before post-processing (characters) | 111,562 |
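
Statistics of this kind can be gathered in a single streaming pass over the corpus. The following is a minimal sketch, not the authors' procedure: it assumes the corpus is stored as plain text with one sentence per line and that tokens are separated by whitespace. The function name `corpus_stats` and the sample sentences are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable, Union


def corpus_stats(lines: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Compute simple corpus statistics in one pass.

    Assumptions (not taken from the source): one sentence per line,
    whitespace tokenization, case-sensitive vocabulary.
    """
    vocab = set()          # distinct tokens seen so far
    n_sentences = 0        # one sentence per line (assumption)
    n_tokens = 0
    total_chars = 0
    longest_line = 0

    for raw in lines:
        line = raw.rstrip("\n")
        tokens = line.split()  # whitespace tokenization (assumption)
        n_sentences += 1
        n_tokens += len(tokens)
        vocab.update(tokens)
        total_chars += len(line)
        longest_line = max(longest_line, len(line))

    avg_line = total_chars / n_sentences if n_sentences else 0.0
    return {
        "sentences": n_sentences,
        "tokens": n_tokens,
        "unique_words": len(vocab),
        "avg_line_length": avg_line,
        "longest_line_length": longest_line,
    }


# Hypothetical two-sentence sample; a real corpus would be streamed from disk,
# e.g. `with open(path, encoding="utf-8") as fh: corpus_stats(fh)`.
sample = [
    "Aspirin inhibits platelet aggregation.",
    "Aspirin reduces fever.",
]
stats = corpus_stats(sample)
print(stats)
```

Because the function consumes any iterable of lines, a file handle can be passed directly, so the text itself never needs to be held in memory; only the vocabulary set grows with the corpus. A different tokenizer or sentence splitter would give different counts, so figures produced this way are comparable only under the same pre-processing.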