Table 1 NCBI disease corpus and our plant corpus

From: A method for named entity normalization in biomedical articles: application to diseases and plants

| Data set | Abstracts | Total disease mentions | Unique disease mentions | Unique concept IDs |
| --- | --- | --- | --- | --- |
| Disease training set | 592 | 5145 | 1170 | 670 |
| Disease development set | 100 | 787 | 368 | 176 |
| Disease test set | 100 | 960 | 427 | 203 |
| Total | 792 | 6892 | 2136 | 790 |
| Plant training set | 128 | 2647 | 1543 | 1143 |
| Plant development set | 40 | 709 | 400 | 329 |
| Plant test set | 40 | 629 | 427 | 298 |
| Total | 208 | 3985 | 2370 | 1770 |