
Table 3 Results for the best-performing features (unigrams, bigrams, concept names and CUIs, and first-level taxonomy), keeping the source of tokens (title or abstract), using SVM-perf with a binary feature representation

From: Feature engineering for MEDLINE citation categorization with MeSH

 

Method                               Precision  Recall  F-measure
SVM-perf unigram                     0.395      0.654   0.492
SVM-perf bigram                      0.414      0.675   0.513*
SVM-perf concepts                    0.404      0.646   0.497*
SVM-perf CUIs                        0.404      0.643   0.496*
SVM-perf first-level taxonomy        0.351      0.653   0.456
SVM-perf TIAB unigram                0.398      0.659   0.496*
SVM-perf TIAB bigram                 0.408      0.685   0.512*
SVM-perf TIAB concepts               0.405      0.656   0.501*
SVM-perf TIAB CUIs                   0.407      0.655   0.502*
SVM-perf TIAB first-level taxonomy   0.376      0.610   0.465

* Results significantly better than the unigram baseline (p < 0.05).
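As a sanity check on the table, the F-measure column is consistent with the standard F1 definition, the harmonic mean of precision and recall. This is a minimal sketch assuming the reported values are computed that way (small rounding differences are possible if the paper averages per-class scores instead):

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# SVM-perf unigram row from Table 3: P = 0.395, R = 0.654, reported F = 0.492
f1 = f_measure(0.395, 0.654)
print(round(f1, 3))  # close to the reported 0.492
```

The bigram row (P = 0.414, R = 0.675) similarly yields roughly 0.513, matching the best F-measure in the table.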