Table 1 NCBI disease corpus and our plant corpus

From: A method for named entity normalization in biomedical articles: application to diseases and plants

| Data set | Abstracts | Total disease mentions | Unique disease mentions | Unique concept IDs |
|---|---|---|---|---|
| Disease training set | 592 | 5145 | 1170 | 670 |
| Disease development set | 100 | 787 | 368 | 176 |
| Disease test set | 100 | 960 | 427 | 203 |
| Total | 792 | 6892 | 2136 | 790 |
| Plant training set | 128 | 2647 | 1543 | 1143 |
| Plant development set | 40 | 709 | 400 | 329 |
| Plant test set | 40 | 629 | 427 | 298 |
| Total | 208 | 3985 | 2370 | 1770 |