Table 1 Sensitivity of gene name extraction by PSE. Five datasets were used for the evaluation. Sets 1A and 1B each contain forty abstracts retrieved with randomly generated PubMed IDs. Sets 2A and 2B each contain forty abstracts randomly selected from the 4,548 abstracts retrieved with the keywords "gene AND disease AND activation". The GENIA dataset is the GENIA corpus, which contains 2,000 abstracts. In this manuscript, the results for Sets 1A and 1B, for Sets 2A and 2B, and for GENIA are referred to as Ev1, Ev2 and Ev3, respectively. TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.

From: PSE: A tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords

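| Dataset | TP | FP | FN | Precision | Recall | F-measure |
|---------|--------|-------|-------|-----------|--------|-----------|
| Set 1A | 50 | 15 | 61 | 76.9% | 45.0% | 56.8% |
| Set 1B | 40 | 11 | 21 | 78.4% | 65.6% | 71.4% |
| Set 2A | 287 | 23 | 157 | 92.6% | 64.6% | 76.1% |
| Set 2B | 291 | 49 | 210 | 85.6% | 58.1% | 69.2% |
| GENIA | 12,842 | 7,752 | 6,912 | 62.4% | 65.0% | 63.7% |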
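
The percentage columns can be recomputed from the TP, FP and FN counts. The caption does not say how the F-measure was calculated, so the sketch below assumes the standard balanced F-measure (the harmonic mean of precision and recall). The function name `prf` and the `COUNTS` dictionary are illustrative names, not part of PSE.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F-measure) as fractions between 0 and 1.

    Assumes the balanced F-measure (harmonic mean of precision and recall);
    the table caption does not state which variant was used.
    """
    precision = tp / (tp + fp)   # share of extracted names that are correct
    recall = tp / (tp + fn)      # share of true names that were extracted
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# (TP, FP, FN) counts copied from Table 1.
COUNTS = {
    "Set 1A": (50, 15, 61),
    "Set 1B": (40, 11, 21),
    "Set 2A": (287, 23, 157),
    "Set 2B": (291, 49, 210),
    "GENIA": (12842, 7752, 6912),
}

for name, (tp, fp, fn) in COUNTS.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{name}: precision={p:.1%} recall={r:.1%} F-measure={f:.1%}")
```

Under that assumption, the computed values agree with the reported percentages to one decimal place for all five datasets.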