Performance of our system over the development and test sets, varying the likelihood threshold. The blue curve shows the precision and recall of our system over the development set as the likelihood threshold varies from 0.5 to 0.005; threshold values are labeled at major intervals along the curve. We chose a threshold of 0.05 because it appeared to yield the highest recall without unduly sacrificing precision on the development set. The blue isolated point marks the system's performance on the development set at this chosen threshold, and the red isolated point marks its performance on the arthritis corpus test set at the same threshold. The red curve shows the corresponding performance over the arthritis test set. Note that this curve follows a trajectory similar to that of the BC development set curve, and that its 0.05-threshold point yields a precision/recall trade-off similar to the corresponding point on the development curve.
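The threshold sweep behind these curves can be sketched as follows. This is a minimal illustration, not the system's actual evaluation code. It assumes each candidate carries a likelihood score and a gold-standard label, and that a candidate is accepted when its score is at or above the threshold. The function names, the toy data, the intermediate threshold values between 0.5 and 0.005, and the selection rule (highest recall subject to a hypothetical minimum-precision floor, standing in for "without unduly sacrificing precision") are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[bool],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every candidate scoring >= threshold is accepted."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    # Convention (assumption): precision is 1.0 when nothing is accepted.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep(scores: List[float], labels: List[bool],
          thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """One (threshold, precision, recall) point per threshold: the PR curve."""
    return [(t, *precision_recall_at(scores, labels, t)) for t in thresholds]


def pick_threshold(curve: List[Tuple[float, float, float]],
                   min_precision: float) -> Optional[float]:
    """Hypothetical rule: highest recall among points with precision >= min_precision."""
    feasible = [row for row in curve if row[1] >= min_precision]
    if not feasible:
        return None
    # Ties on recall are broken by the higher precision.
    return max(feasible, key=lambda row: (row[2], row[1]))[0]


# Toy development data (hypothetical): likelihood scores and gold labels.
scores = [0.9, 0.6, 0.3, 0.08, 0.04, 0.02, 0.01]
labels = [True, True, False, True, True, False, False]
thresholds = [0.5, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005]

curve = sweep(scores, labels, thresholds)
for t, p, r in curve:
    print(f"threshold={t:<6} precision={p:.3f} recall={r:.3f}")
print("chosen threshold:", pick_threshold(curve, min_precision=0.75))
```

Applying the threshold chosen on the development data, unchanged, to a held-out test set gives the single isolated test point; re-running `sweep` on the test data gives the full test curve for comparison.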