Table 2 Results for the ML-Normalization evaluated with the test corpora

From: Moara: a Java library for extracting and normalizing gene and protein mentions

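| Organism | Best results (BioCreative and GNAT): Recall | Best results: Precision | Best results: F-Measure | Moara, exact matching: Recall | Moara, exact matching: Precision | Moara, exact matching: F-Measure | Moara, machine learning matching: Recall | Moara, machine learning matching: Precision | Moara, machine learning matching: F-Measure |
|---|---|---|---|---|---|---|---|---|---|
| Yeast | 89.4 | 95.0 | 92.1 | 83.52 | 95.17 | 88.97 | 84.34 | 81.67 | 82.99 |
| Mouse | 91.6 | 72.6 | 81.0 | 77.57 | 65.83 | 71.22 | 79.60 | 32.90 | 46.56 |
| Fly | 80.0 | 83.1 | 81.5 | 69.76 | 59.12 | 63.58 | 69.00 | 55.22 | 61.35 |
| Human | 90.1 | 81.1 | 85.4 | 83.31 | 55.00 | 66.26 | 85.99 | 29.13 | 43.52 |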

  1. Best results by organism for the gene/protein normalization task, evaluated with the test corpora of BioCreative 1 task 1B (yeast, mouse and fly) and the BioCreative 2 Gene Normalization task (human). These corpora consist of 250 PubMed abstracts each for yeast, mouse and fly, and 262 documents for human. The results were produced using a mix of Abner, Banner and CBR-Tagger (CbrBC2ymf), flexible matching, and single disambiguation by cosine similarity multiplied by the number of common words. The machine learning configuration is the one that performs reasonably well for all the organisms examined here. It uses Support Vector Machines as the main algorithm, the F2 set of features (trigram similarity, bigram similarity, number and string similarity), pairs of synonyms selected by 0.9 trigram and bigram similarity, and Smith-Waterman for the string similarity feature. The best results for each organism in both competitions are shown.
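
As a reading aid for the F-Measure columns, the following minimal Java sketch computes the balanced F-measure (the harmonic mean of precision and recall). That the table uses this balanced form is an assumption, not something stated in the caption. It is consistent with, for example, the yeast "Best results" entry (recall 89.4, precision 95.0, F-measure 92.1) and the mouse machine learning entry (recall 79.60, precision 32.90, F-measure 46.56). The class and method names are hypothetical and are not part of the Moara API.

```java
public class FMeasureCheck {

    /**
     * Balanced F-measure: harmonic mean of recall and precision.
     * Inputs and result are percentages, as in the table.
     * Assumption: the table reports this balanced (F1) form.
     */
    public static double fMeasure(double recall, double precision) {
        if (recall + precision == 0.0) {
            return 0.0; // avoid division by zero when both scores are zero
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Recall and precision values copied from the table.
        String[] labels = {"Yeast, best results", "Mouse, machine learning matching"};
        double[][] recallPrecision = {{89.4, 95.0}, {79.60, 32.90}};

        for (int i = 0; i < labels.length; i++) {
            double f = fMeasure(recallPrecision[i][0], recallPrecision[i][1]);
            System.out.printf("%s: F-measure = %.2f%n", labels[i], f);
        }
    }
}
```

Because the tabulated recall and precision values are themselves rounded, recomputed F-measures can differ from the published ones in the last digit.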