Model | F1
---|---
UMass | 54.7
UMass←Stanford | 55.8

Model | Alone | Intersection with UMass | Union with UMass
---|---|---|---
Stanford (1N) | 49.9 | 49.0 | 54.7
Stanford (1P) | 49.0 | 48.3 | 54.6
Stanford (2N) | 46.5 | 45.4 | 54.8
Stanford (2P) | 49.5 | 49.1 | 54.4
Stanford (all) | -- | 42.4 | 53.0
Stanford (1N, reranked) | 50.2 | 49.7 | 54.4
Stanford (1P, reranked) | 49.4 | 50.2 | 53.2
Stanford (2N, reranked) | 47.8 | 46.9 | 54.6
Stanford (2P, reranked) | 50.4 | 50.0 | 54.4
Stanford (all, reranked) | 50.7 | 50.0 | 54.7

Model | Intersection | Union
---|---|---
Stanford (all) | 43.9 | 50.2
- Stacking and reranking outperform the intersection and union model combination baselines. The first section of the table summarizes the results of the UMass model and the stacked model (UMass←Stanford). The second section gives the performance of each Stanford model alone and when combined with the pure UMass model via the intersection and union methods. In the last section, we evaluate the intersection and union baselines using only the four Stanford models as inputs. The "Stanford (all)" line represents using all four individual decoders without model combination; its Alone entry in the second section is therefore left empty, since four separate decoders do not form a single set of outputs that could be evaluated. In "Stanford (all, reranked)", the reranker was used to combine the four decoders into a single output before that output was intersected or unioned with the UMass output. All results are on the development set for the Genia track.
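
The intersection and union baselines can be illustrated with a minimal sketch. It assumes that each model's output is a set of hashable event tuples and that an event counts as correct only when it exactly matches a gold event; the actual shared-task scorer uses its own matching criteria, so this is not the evaluation behind the numbers above. The function names and the toy events are hypothetical.

```python
from typing import Hashable, Iterable, Set

Event = Hashable  # assumption: one event is a hashable tuple, e.g. (type, argument)


def combine_intersection(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep only the events predicted by every model (favors precision)."""
    outputs = [set(o) for o in outputs]
    if not outputs:
        return set()
    return set.intersection(*outputs)


def combine_union(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep every event predicted by at least one model (favors recall)."""
    return set().union(*outputs)


def f1_score(predicted: Set[Event], gold: Set[Event]) -> float:
    """Exact-match F1 between a predicted and a gold event set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
gold = {("Phosphorylation", "STAT3"), ("Binding", "TRAF2"), ("Expression", "IL-2")}
model_a = {("Phosphorylation", "STAT3"), ("Binding", "TRAF2"), ("Expression", "CD4")}
model_b = {("Phosphorylation", "STAT3"), ("Expression", "IL-2")}

intersected = combine_intersection([model_a, model_b])
unioned = combine_union([model_a, model_b])

f1_intersection = f1_score(intersected, gold)
f1_union = f1_score(unioned, gold)
```

In this toy case the intersection keeps a single correct event (high precision, low recall), while the union recovers every gold event at the cost of one false positive. This is the same trade-off that the Intersection and Union columns probe.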