Figure 1 | BMC Bioinformatics

Figure 1

From: BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs


The workflow of BICEPP and the evaluation procedure. A. The procedures of feature derivation and feature selection. The features supplied to the machine learning classifiers are the CDFs of the 20 most predictive tokens. The CDF of a token, given a drug, is defined as the proportion of abstracts containing the token among the abstracts retrieved by searching MEDLINE with the drug name as the query. B. Cross-validation was performed to estimate the generalisation performance of BICEPP. The feature selection described in (A) was performed on the training set (which contains k-1 folds of data), and machine learning models were built to predict the test set data. This figure illustrates the 5 × stratified up-to-10-fold cross-validation procedure used throughout the evaluation experiments in this paper. Abbreviations: AMH: Australian Medicines Handbook; AWT: abstract with title; AUC: area under ROC curve; CDF: conditional document frequency (using the drug name as the query to search MEDLINE); ROC: receiver operating characteristic.
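
The CDF definition in panel A can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the function names, and the toy abstracts are all hypothetical, and the MEDLINE retrieval step is assumed to have already produced the list of abstracts for a drug.

```python
from typing import Dict, Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer (an assumption; the caption does not specify one)."""
    return set(text.lower().split())


def conditional_document_frequency(
    abstracts: List[str], tokens: Iterable[str]
) -> Dict[str, float]:
    """For each token, return the proportion of abstracts that contain it.

    `abstracts` stands in for the abstracts retrieved for one drug name.
    """
    token_sets = [tokenize(a) for a in abstracts]
    n = len(token_sets)
    if n == 0:
        # No abstracts retrieved: define every CDF as 0.0 (an assumption).
        return {t: 0.0 for t in tokens}
    return {t: sum(t in s for s in token_sets) / n for t in tokens}


# Toy abstracts, invented purely for illustration.
abstracts = [
    "Warfarin increases bleeding risk",
    "Warfarin dosing in elderly patients",
    "Bleeding events during anticoagulant therapy",
    "Pharmacokinetics of warfarin",
]
cdf = conditional_document_frequency(abstracts, ["bleeding", "warfarin", "rash"])
print(cdf)
```

In the workflow described in panel B, a vector of such values would be computed per drug, and the choice of which tokens to keep would be made using the training folds only, so that the test fold does not influence feature selection.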
