Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

BMC Bioinformatics

Table 7 Overall accuracy on the data set

Data set	NB	AEC	JDI	MRD	2-MRD
Abbreviation Set	0.9716	0.9090	-	0.8759	0.8501
Abbreviation Subset	0.9760	0.9218	0.6725	0.8838	0.8725
Term Set	0.8980	0.7462	-	0.7148	0.6773
Term Subset	0.8991	0.7448	0.6209	0.7132	0.6609
Term/Abbreviation Set	0.9384	0.8879	-	0.8801	0.9356
Term/Abbreviation Subset	0.9360	0.9026	0.6899	0.8715	0.9350
Overall MSH WSD Set	0.9386	0.8383	-	0.8070	0.7799
Overall MSH WSD Subset	0.9413	0.8448	0.6551	0.8118	0.7837
NLM WSD	0.8830	0.6836	-	0.6389	0.5500
NLM WSD Subset	0.9063	0.6932	0.7475	0.6526	0.5800

NB stands for Naïve Bayes, AEC for Automatic Extracted Corpus, MRD for Machine Readable Dictionary, 2-MRD for 2nd Order Co-occurrence MRD, and JDI for Journal Descriptor Indexing. A "Set" row covers all the ambiguous words in the category, while a "Subset" row covers only the words to which the JDI method can be applied; JDI accuracy is therefore reported only for the subset rows. Results on the NLM WSD set are included as well.

ISSN: 1471-2105