Model | P (%) | R (%) | F (%) |
---|---|---|---|
Our best (BiLSTM-CRF) | 90.71 | **89.99** | **90.35** |
w/o label re-correction | **91.34** | 80.76 | 85.73** |
w/o CDRC | 90.48 | 89.14 | 89.81* |
w/o CDRA | 90.17 | 89.55 | 89.86** |
- The highest scores are highlighted in bold
- w/o label re-correction: we train the teachers on the two weakly labeled datasets CDWC and CDWA rather than CDRC and CDRA
- w/o CDRC: we train a single teacher without CDRC (i.e. only with CDRA)
- w/o CDRA: we train a single teacher without CDRA (i.e. only with CDRC)
- The markers * and ** denote P value < 0.05 and P value < 0.01, respectively, under a paired t-test against our best (BiLSTM-CRF). The t statistic is computed as

  $$t = \frac{\sum d}{\sqrt{\dfrac{n \sum d^{2} - \left(\sum d\right)^{2}}{n - 1}}}$$

  where d is the difference within each pair and n is the number of pairs. In this paper we use a two-tailed test, in which the critical region lies in both tails of the distribution, so the test checks whether the mean difference is significantly greater than or less than zero.
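
As an illustration of the t statistic described in the last note, here is a minimal Python sketch. The function name `paired_t_statistic` and the input scores are hypothetical (toy numbers chosen for easy arithmetic, not results from the paper), and the sketch stops at the statistic itself rather than reproducing the authors' exact testing procedure.

```python
import math


def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for two paired samples.

    Implements t = sum(d) / sqrt((n * sum(d^2) - sum(d)^2) / (n - 1)),
    where d is the per-pair difference and n is the number of pairs.
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")

    diffs = [x - y for x, y in zip(xs, ys)]
    sum_d = sum(diffs)                      # sum of the differences
    sum_d_sq = sum(d * d for d in diffs)    # sum of the squared differences

    variance_term = (n * sum_d_sq - sum_d ** 2) / (n - 1)
    if variance_term <= 0:
        raise ValueError("differences have zero variance; t is undefined")

    return sum_d / math.sqrt(variance_term), n - 1


# Toy paired scores (hypothetical, NOT taken from the table above).
full_model = [5, 7, 9, 6]
ablated_model = [3, 4, 8, 2]

t, df = paired_t_statistic(full_model, ablated_model)
print(f"t = {t:.3f}, df = {df}")  # t = 3.873, df = 3
```

For a two-tailed test, the absolute value of t is compared against the critical value of the t distribution with n − 1 degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel(full_model, ablated_model)` should return the same statistic together with a two-sided P value.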