From: Improving classification in protein structure databases using text mining

Conditions: Test Set, Reference Set, Lucene Analyser. Performance: AUC, MCC, F (superfamily classification, N = 352, 5 classes).

No. | Test Set | Reference Set | Lucene Analyser | AUC | MCC | F |
---|---|---|---|---|---|---|
1 | Abstract | Dp20 – DX33 – Ann | Stop | 0.75 | 0.51 | 0.56 |
2 | Annotations | Dp20 – DX33 – Ann | Stop | 0.77 | 0.53 | 0.58 |
3 | Abstract | Dp20 – Ann | Stop | 0.74 | 0.50 | 0.53 |
4 | Abstract | Dp20 – Ann | Standard | 0.70 | 0.33 | 0.44 |
5 | Abstract | Dp20 | Stop | 0.74 | 0.49 | 0.52 |
6 | Abstract | Dp1 | Stop | 0.64 | 0.31 | 0.40 |
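
For readers unfamiliar with the MCC and F columns, the following is a minimal sketch of how the Matthews correlation coefficient and the F-measure are commonly computed from confusion counts for a single class treated as positive against all others. It assumes the balanced F1 form of the F-measure; the function names are illustrative, and the table does not state how the scores were combined across the 5 classes, so this is not necessarily the exact procedure behind the reported values.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the F column is the balanced F1 variant.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for one class versus the rest."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # By convention, return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical confusion counts, not taken from the study.
    tp, tn, fp, fn = 40, 262, 30, 20
    print(f"F = {f_measure(tp, fp, fn):.2f}, MCC = {mcc(tp, tn, fp, fn):.2f}")
```

Unlike the F-measure, MCC also takes true negatives into account, which is one reason the two columns can rank conditions slightly differently.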