Table 1 Overall results

From: Combining joint models for biomedical event extraction

            UMass                  Stanford               FAUST
            Recall Precision F1    Recall Precision F1    Recall Precision F1
GE Task 1   48.5   64.1      55.2  42.4   61.1      50.0  49.4   64.8      56.0
GE Task 2   43.9   60.9      51.0  --     --        --    46.7   63.8      53.9
EPI FULL    28.1   41.6      33.5  26.6   37.9      31.2  28.9   44.5      35.0
EPI CORE    57.0   73.3      64.2  56.9   70.2      62.8  59.9   80.3      68.6
ID FULL     46.9   62.0      53.4  46.3   55.9      50.6  48.0   66.0      55.6
ID CORE     49.7   62.4      55.3  49.2   56.4      52.5  50.8   66.4      57.6

            FAUST (without novel)
            Recall Precision F1
GE Task 1   47.6   69.7      56.6
Results for three models on the test sets of all tasks we submitted to. We list recall, precision, and F1 using the standard BioNLP approximate recursive metric. For the GE and ID datasets, the Stanford model used all four decoders with the reranker. For EPI, the Stanford model used only the 1N decoder with the reranker. In all three domains, the stacked UMass←Stanford model (FAUST) used all four decoders from the Stanford model as inputs. The "FAUST (without novel)" row is created by removing from the FAUST output all events that occur in neither the UMass nor the Stanford output (i.e., events that are novel to the stacked output).
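
The "without novel" filter and the F1 column can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `f1_score` and `remove_novel`, the tuple representation of an event, and the toy events are all hypothetical, and events are compared by exact equality rather than by the approximate recursive matching used in the official evaluation. F1 is taken to be the usual harmonic mean of recall and precision.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (same scale as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def remove_novel(stacked, umass, stanford):
    """Keep only stacked-model events that also appear in a base model's output.

    Assumption: events are hashable and compared by exact equality.
    """
    seen = set(umass) | set(stanford)
    return [event for event in stacked if event in seen]


# Hypothetical toy events: (event type, trigger word, argument) tuples.
umass = [("Phosphorylation", "phosphorylates", "TRAF2")]
stanford = [("Binding", "binds", "CD40")]
stacked = umass + stanford + [("Regulation", "induces", "IL-2")]

# The third event is novel to the stacked output, so it is dropped.
print(remove_novel(stacked, umass, stanford))

# Recompute F1 for UMass on GE Task 1 from the rounded table values.
print(round(f1_score(48.5, 64.1), 1))
```

Because the table reports recall and precision rounded to one decimal place, F1 recomputed this way can differ from the printed F1 by about 0.1 in some cells.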