Figure 1 | BMC Bioinformatics

Figure 1

From: Investigating heterogeneous protein annotations toward cross-corpora utilization

Learning curve from the preliminary experiments with the AIMed and GENIA corpora. The numbers are F-scores under the exact matching criterion. Red indicates the results of training with AIMed, and blue indicates the results of adding GENIA. An F-score of 60.9% was obtained when about 10% of the AIMed abstracts were used for training. AIMed abstracts were then added to the training material in further increments of 10%. The best result (an F-score of 77.68%) was achieved when all of the AIMed training portions (181 abstracts) were used. When 20 more GENIA sentences were added to the training material, the F-score dropped to 75.26%. Performance continued to degrade as GENIA abstracts were added in increasing proportions, and when the entire GENIA corpus (1,999 abstracts) was included, the F-score fell to 66.16%.
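
As an illustration of how an F-score under an exact matching criterion can be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes that each annotation is represented as a (document id, start offset, end offset) tuple and that a prediction counts as correct only when all three values match a gold annotation; the function name and the toy spans are hypothetical.

```python
def exact_match_f_score(gold, predicted):
    """Return the F-score of predicted spans against gold spans.

    Both arguments are iterables of (doc_id, start, end) tuples.
    This representation is an assumption made for illustration.
    """
    gold_set, pred_set = set(gold), set(predicted)
    # A true positive requires identical document id and boundaries.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): the second prediction is off by one character
# at its right boundary, so it does not count under exact matching.
gold = [("d1", 0, 5), ("d1", 10, 18), ("d2", 3, 9)]
pred = [("d1", 0, 5), ("d1", 10, 17), ("d2", 3, 9)]
f_score = exact_match_f_score(gold, pred)
```

In this toy case two of the three predictions match exactly, so precision and recall are both 2/3. A near miss on a boundary is penalized twice, once as a false positive and once as a false negative, which is why differing boundary conventions between corpora can lower an exact-match F-score.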
