
Table 1 Sensitivity of gene name extraction by PSE. Five datasets were used for the evaluation. Each of the first two datasets, Set 1A and Set 1B, contains forty abstracts retrieved using randomly generated PubMed IDs. Each of the next two datasets, Set 2A and Set 2B, contains forty abstracts randomly selected from the 4,548 abstracts retrieved with "gene AND disease AND activation" as keywords. The last dataset, GENIA, is the GENIA corpus of 2,000 abstracts. In this manuscript, the results for Sets 1A and 1B, Sets 2A and 2B, and GENIA are referred to as Ev1, Ev2 and Ev3, respectively. TP, FP and FN denote true positives, false positives and false negatives, respectively.

From: PSE: A tool for browsing a large amount of MEDLINE/PubMed abstracts with gene names and common words as the keywords

Dataset    TP      FP     FN     Precision  Recall  F-measure
Set 1A     50      15     61     76.9%      45.0%   56.8%
Set 1B     40      11     21     78.4%      65.6%   71.4%
Set 2A     287     23     157    92.6%      64.6%   76.1%
Set 2B     291     49     210    85.6%      58.1%   69.2%
GENIA      12,842  7,752  6,912  62.4%      65.0%   63.7%
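The table does not spell out how the three percentages are derived from the counts. The standard definitions, precision = TP/(TP+FP), recall = TP/(TP+FN), and F-measure as the harmonic mean of precision and recall, reproduce every row. The short Python sketch below recomputes them from the TP, FP and FN columns. The function name `prf` and the dictionary `TABLE_1` are illustrative names, not part of PSE.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from raw counts."""
    precision = tp / (tp + fp)  # fraction of extracted names that are correct
    recall = tp / (tp + fn)     # fraction of true gene names that were found
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# (TP, FP, FN) counts copied from Table 1
TABLE_1 = {
    "Set 1A": (50, 15, 61),
    "Set 1B": (40, 11, 21),
    "Set 2A": (287, 23, 157),
    "Set 2B": (291, 49, 210),
    "GENIA": (12842, 7752, 6912),
}

for name, (tp, fp, fn) in TABLE_1.items():
    p, r, f = prf(tp, fp, fn)
    # Format as percentages with one decimal place, as in the table
    print(f"{name}: precision={p:.1%} recall={r:.1%} F-measure={f:.1%}")
```

For example, the first line printed is "Set 1A: precision=76.9% recall=45.0% F-measure=56.8%", matching the first row of the table.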