
Table 1 Overall results

From: Combining joint models for biomedical event extraction

   

               UMass                     Stanford                  FAUST
               Recall  Precision  F1     Recall  Precision  F1     Recall  Precision  F1
GE   Task 1    48.5    64.1       55.2   42.4    61.1       50.0   49.4    64.8       56.0
GE   Task 2    43.9    60.9       51.0   --      --         --     46.7    63.8       53.9
EPI  FULL      28.1    41.6       33.5   26.6    37.9       31.2   28.9    44.5       35.0
EPI  CORE      57.0    73.3       64.2   56.9    70.2       62.8   59.9    80.3       68.6
ID   FULL      46.9    62.0       53.4   46.3    55.9       50.6   48.0    66.0       55.6
ID   CORE      49.7    62.4       55.3   49.2    56.4       52.5   50.8    66.4       57.6

  

               FAUST (without novel)
               Recall  Precision  F1
GE   Task 1    47.6    69.7       56.6
      
Results on the test sets of all tasks we submitted to, for three models. We list recall, precision, and F1 using the standard BioNLP approximate recursive metric. For the GE and ID datasets, the Stanford model used all four decoders with the reranker. For EPI, the Stanford model used only the 1N decoder with the reranker. In all three domains, the stacked UMass←Stanford model (FAUST) used all four decoders from the Stanford model as inputs. The "FAUST (without novel)" result is obtained by removing from the stacked output all events that occur in neither the UMass nor the Stanford output (i.e., events that are novel to the stacked output).
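The "without novel" filtering and the F1 column can be illustrated with a minimal Python sketch. It assumes each predicted event can be represented as a hashable tuple and that F1 is the usual harmonic mean of recall and precision. The function names `f1_score` and `remove_novel_events` and the toy events are hypothetical, and this is not the authors' evaluation code. Because the table reports rounded values, F1 recomputed from the listed recall and precision may differ from the listed F1 in the last digit.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def remove_novel_events(stacked, umass, stanford):
    """Keep only stacked-output events that also occur in the UMass
    or the Stanford output; events in neither are 'novel' and dropped."""
    seen = set(umass) | set(stanford)
    return [event for event in stacked if event in seen]


# Hypothetical toy events: (event type, trigger word, argument).
umass = [("Phosphorylation", "phosphorylates", "TRAF2")]
stanford = [("Binding", "binds", "CD40")]
stacked = [
    ("Phosphorylation", "phosphorylates", "TRAF2"),  # also in UMass
    ("Binding", "binds", "CD40"),                    # also in Stanford
    ("Regulation", "induces", "IL-2"),               # novel to the stack
]

filtered = remove_novel_events(stacked, umass, stanford)
print(filtered)

# Recompute F1 for the UMass GE Task 1 row from its recall and precision.
print(round(f1_score(48.5, 64.1), 1))
```

On the toy input the filter keeps the first two events and drops the third, which appears in neither base model's output.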