From: Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy
| Category | Publ. | Data | Background knowledge | Approach | Experiment | Accuracy |
|---|---|---|---|---|---|---|
| Established Knowledge | [12] | gene definition & abstract vector | 5 human gen. dbs & MeSH | cosine similarity | 52,529 Medline abstracts, 690 human gene symbols | 92.7% |
| | [13] | free text | UMLS, Journal Descriptors | Journal Descriptor Indexing (JDI) | 45 ambiguous UMLS terms (NLM WSD Collection) | 78.7% |
| | [14] | Medline abstracts | BioCreative-2 GN lexicon & text, EntrezGene, UniProt, GOA | motifs from multiple sequence alignments | BioCreative-2 GN challenge | 81% |
| | [15] | Medline abstracts | list of gene senses, EntrezGene | inverse co-author graph | BioCreative GN challenge | 97%P |
| Supervised | [8] | XML tagged abstracts, positional info, PoS | - | naive Bayes, decision trees, inductive rule training | protein/gene/mRNA assignment: 9 million words (mol. biol. journals) | 85% |
| | [49] | text | - | word count, word cooc | - | 86.5% |
| | | Medline abstracts | UMLS terms | UMLS term cooc | 35 biomedical abbreviations | 93%P |
| | [10] | abbreviations in Medline abstracts | - | SVM | build dictionary, use for abbreviations occurring with their long forms | 98.5% |
| | [11] | gene symbol context (n words +/-) | - | SVM | - | 85% |
| Unsupervised | | document | - | LSA/LSI, 2nd-order cooc | 170,000 documents, 1013 terms (TREC-1) (Wall Street Journal) | ↑ 7–14% |
| | [51] | word cooc, PoS tags | WordNet | average link clustering | 13 words, ACL/DCI | 73.4% |
| | [21] | Wall Street Journal Corpus | | | | |
| | [22] | - | - | 1st- and 2nd-order context vectors (coocs within 5 positions) | 24 Senseval-2 words, Line, Hard, Serve corpora | 44% |
| | [23] | text | few tagged data, WordNet | co-training, collocations | 12 common Engl. words × 4000 instances | 96.5% |
| | [25] | - | - | co-training & majority voting | Senseval-2 generic English | ↑ 9.8% |
| | [24] | - | WordNet | noun coocs, Markov clustering | - | - |
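
The first row of the table pairs a gene definition vector with an abstract vector and compares them by cosine similarity. As a rough illustration of that general idea only, and not the method of [12], the following minimal sketch picks the sense whose definition is most similar to the context. It assumes plain whitespace tokenization and raw term counts; the function names, the example senses, and the example sentence are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count terms (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def disambiguate(context, sense_definitions):
    """Return the sense label whose definition best matches the context."""
    ctx = bag_of_words(context)
    return max(
        sense_definitions,
        key=lambda sense: cosine_similarity(ctx, bag_of_words(sense_definitions[sense])),
    )


# Hypothetical senses for an ambiguous symbol, for illustration only.
example_senses = {
    "enzyme": "catalase enzyme that breaks down hydrogen peroxide",
    "animal": "domestic cat feline animal pet",
}
example_context = "the enzyme breaks down hydrogen peroxide in liver cells"
best_sense = disambiguate(example_context, example_senses)
```

A real system would likely weight terms (for example with tf-idf) and remove stop words before comparing vectors; raw counts are used here only to keep the sketch short.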