Table 1 Task 2 dataset description in numbers. The table gives the basic counts for the task 2 training and test datasets. The full-text articles of the training set came from the Journal of Biological Chemistry (JBC), Nature Medicine, Nature Genetics and Oncogene, while the test set articles were all from JBC.

From: Evaluation of BioCreAtIvE assessment of task 2

| Data set description | Training set | Test set 2.1 | Test set 2.2 | Data type |
| --- | --- | --- | --- | --- |
| Full text articles | 803 | 113 | 99 | free text |
| Total GO annotations | 2317 | 1076 | 1227 | annotations |
| Number of proteins in the GO annotations | 939 | 138 | 138 | proteins |
| Number of GO terms used in the GO annotations | 776 | 580 | 544 | GO terms |
| Average number of annotations per protein | 2.467 | 7.797 | 8.891 | annotations |
| Annotations of Molecular Function GO terms | 709 | 330 | 356 | annotations |
| Annotations of Biological Process GO terms | 1061 | 544 | 701 | annotations |
| Annotations of Cellular Component GO terms | 547 | 182 | 170 | annotations |
| Molecular Function terms in the annotations | 343 | 173 | 179 | GO terms |
| Biological Process terms in the annotations | 339 | 334 | 314 | GO terms |
| Cellular Component terms in the annotations | 94 | 57 | 51 | GO terms |
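
The derived row "Average number of annotations per protein" can be recomputed from the annotation and protein counts, and the per-category rows can be compared against the listed totals. The following is a minimal sketch of that arithmetic, not part of the original evaluation; the dictionary layout and the function names `average_annotations_per_protein` and `category_gap` are assumptions introduced here for illustration, and the numbers are simply transcribed from the table.

```python
# Counts transcribed from Table 1.
# Tuple order: (training set, test set 2.1, test set 2.2).
SETS = ("training", "test 2.1", "test 2.2")

TOTALS = {
    "annotations": (2317, 1076, 1227),
    "proteins": (939, 138, 138),
    "go_terms": (776, 580, 544),
}

# Per-category breakdowns (hypothetical layout chosen for this sketch).
BY_CATEGORY = {
    "annotations": {
        "molecular_function": (709, 330, 356),
        "biological_process": (1061, 544, 701),
        "cellular_component": (547, 182, 170),
    },
    "go_terms": {
        "molecular_function": (343, 173, 179),
        "biological_process": (339, 334, 314),
        "cellular_component": (94, 57, 51),
    },
}


def average_annotations_per_protein():
    """Return {set name: total annotations / number of proteins}."""
    return {
        name: annotations / proteins
        for name, annotations, proteins in zip(
            SETS, TOTALS["annotations"], TOTALS["proteins"]
        )
    }


def category_gap(key):
    """Return {set name: listed total minus the sum of the three categories}."""
    gaps = {}
    for i, name in enumerate(SETS):
        category_sum = sum(counts[i] for counts in BY_CATEGORY[key].values())
        gaps[name] = TOTALS[key][i] - category_sum
    return gaps


if __name__ == "__main__":
    for name, value in average_annotations_per_protein().items():
        print(f"{name}: {value:.4f} annotations per protein")
    print("annotation gaps:", category_gap("annotations"))
    print("GO term gaps:", category_gap("go_terms"))
```

The recomputed averages agree with the listed values to within 0.001. As transcribed here, the category rows add up to the listed totals for the training set and test set 2.2, while for test set 2.1 they fall short of the totals (1056 against 1076 annotations, 564 against 580 GO terms); the sketch only reports such gaps and does not attempt to explain them.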