Corpus | Devel F-score (%) | Test F-score (%) |
---|---|---|
GE'09 task 1 | 56.27 | 53.15 |
GE'09 task 2 | 54.25 | 50.68 |
GE task 1 | 55.78 | 53.30 |
GE task 2 | 53.39 | 51.97 |
GE task 3 | 38.34 | 26.86 |
EPI | 56.41 | 53.33 |
ID | 44.92 | 42.57 |
BB | 27.01 | 26 |
BI | 77.24 | 77 |
CO | 36.22 | 23.77 |
REL | 65.99 | 57.7 |
REN | 84.62 | 87.0 |
The performance of our new system on the BioNLP'09 ST GENIA dataset is shown for reference; task 3 is omitted because its evaluation metric has changed. For the GE tasks, the Approximate Span & Recursive matching criterion is used. In many tasks the development and test set results differ considerably. This may be partly explained by noise that went undetected because no cross-validation was performed, and by the event distribution not being stratified across the sets.
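To make the development/test discrepancy concrete, the small script below computes the gap (devel F minus test F, in percentage points) for each corpus using only the values in the table, and ranks corpora by the size of that gap. It is an illustrative sketch, not part of the described system; the names `RESULTS` and `devel_test_gaps` are hypothetical.

```python
# Devel and test F-scores (%) copied from the results table.
# RESULTS and devel_test_gaps are illustrative names, not from the original system.
RESULTS = {
    "GE'09 task 1": (56.27, 53.15),
    "GE'09 task 2": (54.25, 50.68),
    "GE task 1": (55.78, 53.30),
    "GE task 2": (53.39, 51.97),
    "GE task 3": (38.34, 26.86),
    "EPI": (56.41, 53.33),
    "ID": (44.92, 42.57),
    "BB": (27.01, 26.0),
    "BI": (77.24, 77.0),
    "CO": (36.22, 23.77),
    "REL": (65.99, 57.7),
    "REN": (84.62, 87.0),
}


def devel_test_gaps(results):
    """Return (corpus, devel - test) pairs, largest absolute gap first.

    A positive gap means the test score is lower than the devel score.
    """
    gaps = [(name, round(dev - test, 2)) for name, (dev, test) in results.items()]
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for name, gap in devel_test_gaps(RESULTS):
        print(f"{name:14s} {gap:+6.2f}")
```

Ranked this way, CO (12.45 points) and GE task 3 (11.48 points) show the largest drops from development to test, followed by REL (8.29 points), while REN is the only corpus whose test score exceeds its development score.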