Empirical evaluation of scoring functions for Bayesian network model selection

BMC Bioinformatics

Table 1 Summary of gold standard networks

Dataset	Domain	Instances	Nodes	Edges	Average In-degree
Statlog (Australian Credit Approval)	Industry	690	15	33	2.20
Breast Cancer	Biology	699	10	20	2.00
Car Evaluation	Industry	1,728	7	9	1.29
Cleveland Heart Disease	Biology	303	14	22	1.57
Credit Approval	Industry	690	16	35	2.19
Diabetes	Biology	768	9	13	1.44
Glass Identification	Industry	214	10	17	1.70
Statlog (Heart)	Biology	270	14	21	1.50
Hepatitis	Biology	155	20	36	1.80
Iris	Biology	150	5	8	1.60
Nursery	Industry	12,960	9	14	1.56
Statlog (Vehicle Silhouettes)	Industry	846	19	40	2.11
Congressional Voting Records	Political	436	17	46	2.71

This table describes all of the datasets we used in this study. Dataset gives the name of the dataset in the UCI machine learning repository. Domain gives a rough indication of the subject area of the dataset. Instances gives the number of instances in the original dataset. Nodes gives the number of variables in the dataset (and therefore the number of nodes in the corresponding Bayesian network). Edges gives the number of edges in the optimal Bayesian network learned from the original dataset; this network is the gold standard used throughout the rest of the evaluation. Average In-degree gives the average number of parents of each variable in the learned Bayesian network, that is, Edges divided by Nodes.
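
The Average In-degree column can be cross-checked from the Nodes and Edges columns: every directed edge contributes exactly one parent, so the mean number of parents per node is Edges / Nodes. The following is a minimal sketch of that check, not code from the study; the names `TABLE_1`, `average_in_degree`, and `mismatches` are illustrative assumptions, and the rows are copied from Table 1.

```python
# Rows copied from Table 1: (dataset, nodes, edges, reported average in-degree).
TABLE_1 = [
    ("Statlog (Australian Credit Approval)", 15, 33, 2.20),
    ("Breast Cancer", 10, 20, 2.00),
    ("Car Evaluation", 7, 9, 1.29),
    ("Cleveland Heart Disease", 14, 22, 1.57),
    ("Credit Approval", 16, 35, 2.19),
    ("Diabetes", 9, 13, 1.44),
    ("Glass Identification", 10, 17, 1.70),
    ("Statlog (Heart)", 14, 21, 1.50),
    ("Hepatitis", 20, 36, 1.80),
    ("Iris", 5, 8, 1.60),
    ("Nursery", 9, 14, 1.56),
    ("Statlog (Vehicle Silhouettes)", 19, 40, 2.11),
    ("Congressional Voting Records", 17, 46, 2.71),
]


def average_in_degree(nodes, edges):
    """Mean number of parents per node in a directed graph.

    Each edge adds one parent to exactly one node, so the total number
    of parents equals the number of edges.
    """
    return edges / nodes


def mismatches(rows, decimals=2):
    """Return the datasets whose reported value differs from Edges / Nodes
    after rounding to the given number of decimals."""
    bad = []
    for name, nodes, edges, reported in rows:
        computed = round(average_in_degree(nodes, edges), decimals)
        if abs(computed - reported) > 1e-9:
            bad.append((name, computed, reported))
    return bad


if __name__ == "__main__":
    for name, nodes, edges, reported in TABLE_1:
        print(f"{name}: {average_in_degree(nodes, edges):.2f} (reported {reported:.2f})")
```

For the rows as listed in Table 1, `mismatches(TABLE_1)` returns an empty list, so the reported values agree with Edges / Nodes to two decimals.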

ISSN: 1471-2105