Table 1 Algorithms for Word Sense Disambiguation.

From: Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

| Category | Ref. | Data | Background knowledge | Approach | Experiment | Accuracy |
|---|---|---|---|---|---|---|
| Established Knowledge | [12] | gene definition & abstract vector | 5 human gene databases & MeSH | cosine similarity | 52,529 Medline abstracts, 690 human gene symbols | 92.7% |
| | [13] | free text | UMLS, Journal Descriptors | Journal Descriptor Indexing (JDI) | 45 ambiguous UMLS terms (NLM WSD Collection) | 78.7% |
| | [14] | Medline abstracts | BioCreative-2 GN lexicon & text, EntrezGene, UniProt, GOA | motifs from multiple sequence alignments | BioCreative-2 GN challenge | 81% |
| | [15] | Medline abstracts | list of gene senses, EntrezGene | inverse co-author graph | BioCreative GN challenge | 97%P |
| Supervised | [8] | XML-tagged abstracts, positional info, PoS | - | naive Bayes, decision trees, inductive rule training | protein/gene/mRNA assignment: 9 million words (molecular biology journals) | 85% |
| | [49] | text | - | word count, word co-occurrence | - | 86.5% |
| | [9, 50] | Medline abstracts | UMLS terms | UMLS term co-occurrence | 35 biomedical abbreviations | 93%P |
| | [10] | abbreviations in Medline abstracts | - | SVM | build dictionary; use it for abbreviations occurring with their long forms | 98.5% |
| | [11] | gene symbol context (±n words) | - | SVM | - | 85% |
| Unsupervised | [19, 20] | document | - | LSA/LSI, 2nd-order co-occurrence | 170,000 documents, 1013 terms (TREC-1, Wall Street Journal) | ↑ 7–14% |
| | [51] | word co-occurrence, PoS tags | WordNet | average-link clustering | 13 words, ACL/DCI | 73.4% |
| | [21] | | | | Wall Street Journal corpus | |
| | [22] | - | - | 1st- and 2nd-order context vectors (co-occurrences within 5 positions) | 24 Senseval-2 words; Line, Hard, Serve corpora | 44% |
| | [23] | text | a small amount of tagged data, WordNet | co-training, collocations | 12 common English words × 4000 instances | 96.5% |
| | [25] | - | - | co-training & majority voting | Senseval-2 generic English | ↑ 9.8% |
| | [24] | - | WordNet | noun co-occurrences, Markov clustering | - | - |
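The knowledge-based approach listed under [12] compares a gene-definition vector with an abstract vector using cosine similarity. The sketch below illustrates only that general idea and makes several assumptions: plain bag-of-words counts, a simple regex tokenizer, and hypothetical sense profiles and example abstract that are not data from [12]. Each candidate sense of an ambiguous symbol gets a profile text, and the abstract is assigned to the sense whose profile vector is closest.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a deliberately simple tokenizer (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing keys, so non-shared tokens add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(abstract: str, sense_profiles: dict) -> str:
    """Return the sense whose profile text is most similar to the abstract."""
    doc_vec = bag_of_words(abstract)
    return max(
        sense_profiles,
        key=lambda sense: cosine(doc_vec, bag_of_words(sense_profiles[sense])),
    )


if __name__ == "__main__":
    # Hypothetical profiles for the ambiguous symbol "CAT".
    profiles = {
        "catalase": "catalase enzyme decomposing hydrogen peroxide during oxidative stress",
        "chloramphenicol acetyltransferase": (
            "chloramphenicol acetyltransferase reporter gene assay "
            "for promoter activity after transfection"
        ),
    }
    text = "Promoter activity was measured with a CAT reporter gene assay after transfection."
    print(disambiguate(text, profiles))
```

Raw counts keep the sketch short. Term weighting such as tf-idf and stop-word removal are common refinements, but the table does not say which weighting [12] used.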
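Among the supervised methods, [8] lists naive Bayes as one of its classifiers for protein/gene/mRNA assignment. The sketch below is a generic multinomial naive Bayes over context words with add-one (Laplace) smoothing. It is not the feature set of [8], which also used XML tags, positional information, and PoS. The class name `NaiveBayesWSD` and the toy training contexts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Hypothetical multinomial naive Bayes sense classifier over context words."""

    def __init__(self) -> None:
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # word frequencies per sense
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (context_words, sense) pairs."""
        for context, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        return self

    def predict(self, context):
        """Return the sense maximising log P(sense) + sum of log P(word | sense)."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            score = math.log(n / total)
            # Add-one smoothing: every vocabulary word gets one pseudo-count.
            denom = sum(self.word_counts[sense].values()) + vocab_size
            for word in context:
                score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


if __name__ == "__main__":
    # Toy, hand-made training contexts (not data from [8]).
    train = [
        (["purified", "binds", "domain", "kinase"], "protein"),
        (["antibody", "purified", "phosphorylated"], "protein"),
        (["transcript", "levels", "northern", "blot"], "mRNA"),
        (["transcript", "spliced", "northern"], "mRNA"),
        (["promoter", "mutation", "locus", "cloned"], "gene"),
        (["knockout", "locus", "allele"], "gene"),
    ]
    model = NaiveBayesWSD().fit(train)
    print(model.predict(["northern", "blot", "transcript"]))
```

Scores are summed in log space so that long contexts do not underflow to zero. Unseen context words receive the same smoothed probability under every sense. Because the toy senses here have equal total word counts, such words leave the ranking unchanged.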