Volume 14 Supplement 17
Feature selection and prediction with a Markov blanket structure learning algorithm
© Tan and Liu; licensee BioMed Central Ltd. 2013
Published: 22 October 2013
Classification and prediction are common tasks in machine learning. For example, many studies have attempted to predict gene expression from information such as DNA sequence, the expression of other genes, or epigenetic modifications. Many existing methods, such as neural networks and support vector machines, have been used to make these predictions. Unfortunately, these black-box techniques offer little insight into the reasoning behind the predictions. In many cases, relatively few attributes contribute to the classification accuracy. Bayesian networks explicitly encode the relationships among attributes that are used to make predictions. In a Bayesian network, the Markov blanket (MB) of the class variable contains all of the information necessary to predict its value. In this work, we propose an algorithm that learns only the MB of the class variable; all other attributes are removed. Our algorithm therefore combines classification and feature selection. Results on benchmark machine learning datasets indicate that our feature selection technique reduces the size of some datasets by more than 80%. Accuracy results suggest that the classification ability of our algorithm is competitive with existing state-of-the-art techniques.
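As a rough illustration of the Markov blanket idea (not the authors' algorithm), the sketch below computes the MB of a class node in a toy Bayesian network structure: its parents, its children, and its children's other parents. By the Markov condition, keeping only these attributes is sufficient for predicting the class variable. The toy network and variable names are hypothetical.

```python
# Minimal sketch: the Markov blanket of a class node in a known DAG structure.
# The toy network and variable names are hypothetical, for illustration only.

# DAG encoded as node -> list of its parents
parents = {
    "C":  ["A1"],          # the class variable C has parent A1
    "A2": ["C", "A3"],     # A2 is a child of C; A3 is a co-parent (spouse)
    "A3": [],
    "A4": ["A2"],          # A4 lies outside the Markov blanket of C
    "A1": [],
}

def markov_blanket(node, parents):
    """Return the parents, children, and children's other parents of `node`."""
    mb = set(parents.get(node, []))                       # parents
    children = [v for v, ps in parents.items() if node in ps]
    mb.update(children)                                   # children
    for ch in children:
        mb.update(p for p in parents[ch] if p != node)    # spouses
    return mb

print(markov_blanket("C", parents))   # {'A1', 'A2', 'A3'}; A4 is discarded
```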
Materials and methods
In a classification problem, we are given a dataset consisting of a set of attributes A and a class variable C. The dataset is split into a training set D_tr and a testing set D_te. The goal is to learn a classifier from D_tr that correctly predicts C in D_te. In this study, we compared the performance of our Markov blanket structure learning algorithm with other classical classifiers: C4.5, the optimal Bayesian network, the Tree Augmented Naïve Bayes network, and Markov Blanket Hill Climbing. A brief introduction to these classifiers follows.
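The snippet below is a minimal sketch of this train/test evaluation protocol, assuming scikit-learn is available. The dataset and classifier are stand-ins: DecisionTreeClassifier implements CART rather than C4.5, and the breast cancer data is simply a convenient UCI-derived benchmark.

```python
# Sketch of the train/test evaluation protocol described above (assumes
# scikit-learn). DecisionTreeClassifier (CART) is only a rough stand-in
# for C4.5, not the same algorithm.
from sklearn.datasets import load_breast_cancer      # a UCI benchmark dataset
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)           # attributes A and class C

# Split into a training set D_tr and a testing set D_te
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Learn a classifier from D_tr ...
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# ... and measure how well it predicts C in D_te
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```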
Markov blanket feature selection algorithm
Discussion and conclusions
The compression ratio decreases as the number of variables in the dataset increases. This suggests that, even as dataset sizes increase, only a few attributes are helpful in predicting the class variable. The compression ratio is unaffected by the number of records in the dataset. This suggests that, even when given many records, our algorithm does not select many attributes in an attempt to overfit the dataset. Ignoring unimportant attributes does not significantly affect classification accuracy: despite compressing the data by more than 70% on average, the classification accuracy is rarely more than 5% below that of the best classifier. Identifying MB variables could significantly reduce the cost of diagnostic lab tests by focusing attention on only the most relevant attributes.
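This excerpt does not give the exact formula for the compression figures quoted above. The snippet below shows one plausible reading, in which compression is the fraction of attributes removed once only the Markov blanket of the class variable is kept; the numbers used are hypothetical.

```python
# Hypothetical illustration of a compression figure: the fraction of
# attributes removed when only the Markov blanket of the class is kept.
# (This definition is an assumption; the paper's exact formula is not
# reproduced in this excerpt.)
def compression(total_attributes: int, mb_attributes: int) -> float:
    return 1.0 - mb_attributes / total_attributes

# e.g. a dataset with 60 attributes whose class has a 12-variable MB
print(f"{compression(60, 12):.0%} of the attributes are removed")  # 80%
```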
- Quinlan JR: C4.5: programs for machine learning. Machine Learning. 1994, 16 (3): 235-240.
- Malone B, Yuan C, Hansen E, Bridges S: Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search. Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence. Edited by: Cozman FG, Pfeffer A. 2011, Barcelona: AUAI Press, 479-488.
- Friedman N, Geiger D, Goldszmidt M: Bayesian network classifiers. Machine Learning. 1997, 29: 131-163. 10.1023/A:1007465528199.
- Tsamardinos I, Brown LE, Aliferis CF: The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning. 2006, 65 (1): 31-78. 10.1007/s10994-006-6889-7.
- Bache K, Lichman M: UCI Machine Learning Repository. [http://archive.ics.uci.edu/ml]
- Liu Z, Malone B, Yuan C: Empirical evaluation of scoring functions for Bayesian network model selection. BMC Bioinformatics. 2012, 13 (Suppl 15): S14-10.1186/1471-2105-13-S15-S14.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.