
Table 5 Stacking and other model combination techniques

From: Combining joint models for biomedical event extraction

Model                        F1
UMass                        54.7
UMass←Stanford               55.8

Model                        Alone   Intersection with UMass   Union with UMass
Stanford (1N)                49.9    49.0                      54.7
Stanford (1P)                49.0    48.3                      54.6
Stanford (2N)                46.5    45.4                      54.8
Stanford (2P)                49.5    49.1                      54.4
Stanford (all)               --      42.4                      53.0
Stanford (1N, reranked)      50.2    49.7                      54.4
Stanford (1P, reranked)      49.4    50.2                      53.2
Stanford (2N, reranked)      47.8    46.9                      54.6
Stanford (2P, reranked)      50.4    50.0                      54.4
Stanford (all, reranked)     50.7    50.0                      54.7

Model                                Intersection              Union
Stanford (all)                       43.9                      50.2
Stacking and reranking outperform the intersection and union model combination baselines. The first section of the table summarizes the results from the UMass and stacked models. The second section gives the performance of each Stanford model alone and when combined with the pure UMass model via the intersection and union methods. In the last section, we evaluate the intersection and union baselines using only the four Stanford models as inputs. The "Stanford (all)" line represents using all four individual decoders without model combination; its Alone entry in the second section is therefore left empty, since the four decoders do not form a single set of outputs that could be evaluated. In "Stanford (all, reranked)", the reranker was used to combine the four decoders into a single output before that output was intersected or unioned with the UMass output. All results are on the development set for the Genia track.
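
The intersection and union baselines described above can be illustrated with a minimal Python sketch. It assumes each model's output can be represented as a set of hashable event tuples and that an event counts as correct only on an exact match with a gold event. Both are simplifications made for illustration; the matching criteria used in the actual evaluation are not reproduced here. The function names, the tuple layout, and the toy events are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event representation: (event type, trigger word, theme).
Event = Tuple[str, str, str]


def combine_intersection(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep only the events predicted by every model."""
    sets = [set(o) for o in outputs]
    return set.intersection(*sets) if sets else set()


def combine_union(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep every event predicted by at least one model."""
    combined: Set[Event] = set()
    for o in outputs:
        combined |= set(o)
    return combined


def f1(predicted: Set[Event], gold: Set[Event]) -> float:
    """Exact-match F1 (an assumption; not the official scoring scheme)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
A = ("Phosphorylation", "phosphorylates", "TRAF2")
B = ("Binding", "binds", "CD40")
C = ("Gene_expression", "expression", "IL-2")
D = ("Localization", "translocation", "NF-kB")
E = ("Binding", "interacts", "TRAF2")  # not in gold
F = ("Transcription", "transcribed", "IL-2")  # not in gold

gold = {A, B, C, D}
model_a = {A, B, E}
model_b = {A, C, E, F}

intersection = combine_intersection([model_a, model_b])
union = combine_union([model_a, model_b])

print(f"model_a alone: {f1(model_a, gold):.3f}")
print(f"intersection:  {f1(intersection, gold):.3f}")
print(f"union:         {f1(union, gold):.3f}")
```

In this toy case the intersection gives up recall while the union gives up precision, which is the usual trade-off between the two baselines. The sketch does not model stacking or reranking.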