Data set | Description | Number of training cases/controls | Number of test cases/controls | Number of features | Citations |
---|---|---|---|---|---|
Lung Cancer | Clinical data, X-ray data, etc., used to predict three pathological types of lung cancer. The 32 instances are divided into three classes of 9, 10, and 13 observations. For the purposes here, the first two classes are aggregated into a single class. | 8/8 | 11/5 | 54 integer clinical features | [66] |
SPECT | Instances of normal and abnormal diagnoses derived from cardiac single-photon emission computed tomography (SPECT) images. | 40/40 | 172/15 | 22 binary features indicating partial diagnoses | |
Parkinsons | 195 biomedical voice measurements from 31 people, including 23 with Parkinson’s disease. | 72/25 | 75/23 | 22 real-valued features | [69] |
Arcene | Mass-spectrometric data that can be used to distinguish patients with cancer from healthy subjects. | 44/56 | 44/56 | The data set contains 10,000 integer features; a Kolmogorov-Smirnov test [61] was used to select the 268 most discriminating features for classification. | [70] |
Arrhythmia | Normal and “abnormal” instances of demographic and electrocardiogram features. | 127/99 | 118/108 | The data set contains 278 categorical, integer, and real-valued demographic and electrocardiogram features; a Kolmogorov-Smirnov test [61] was used to select the 32 most discriminating features for classification. | [71] |
Breast Cancer | This data set contains features computed from digitized images of fine needle aspirates (FNA) of breast masses; the features describe characteristics of the cell nuclei present in the images. Each instance is labeled benign or malignant and is described by real-valued features. | 130/219 | 111/239 | 8 | |
Contraception | This data set is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey, which sampled married women who were either not pregnant or did not know whether they were pregnant at the time of the interview. The subset contains information on 1473 women, who are subdivided by contraceptive use: no use (629), long-term methods (333), or short-term methods (511). The binary classifier constructed in this work predicts whether or not a woman uses contraception from her categorical and integer-valued demographic and socio-economic characteristics. | 423/313 | 421/316 | 8 | [74] |
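The Kolmogorov-Smirnov feature selection used for the Arcene and Arrhythmia data sets can be sketched as follows. This is a minimal sketch under stated assumptions: each feature is scored by the two-sample KS statistic D (the largest gap between the empirical CDFs of its case and control values), scoring is done on the training split only, and the k features with the largest D are kept. The source does not say whether D or the p-value was used for ranking, or which split was scored. The function names `ks_statistic` and `select_top_k_features` are hypothetical, and the statistic is computed with the standard library instead of a library routine such as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)|.

    Both empirical CDFs are right-continuous step functions that only
    change at observed values, so checking every pooled observation is
    enough to find the supremum.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / n  # fraction of a that is <= x
        fb = bisect_right(b, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def select_top_k_features(X, y, k):
    """Hypothetical helper: return indices of the k features with largest D.

    X is a list of rows (training data only, by assumption), y holds
    labels with 1 = case and 0 = control.
    """
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        d = ks_statistic([r[j] for r in cases], [r[j] for r in controls])
        scores.append((d, j))
    # Sort by decreasing D; ties are broken by the lower feature index.
    ranked = sorted(scores, key=lambda t: (-t[0], t[1]))
    return [j for _, j in ranked[:k]]


if __name__ == "__main__":
    X = [[0, 5, 1], [1, 5, 2], [0, 6, 9], [1, 6, 8]]
    y = [0, 0, 1, 1]
    print(select_top_k_features(X, y, 2))  # features 1 and 2 separate the classes perfectly
```

Under these assumptions, the Arcene selection would correspond to calling `select_top_k_features` with k = 268 and the Arrhythmia selection to k = 32. Scoring only the training rows keeps the test set out of the feature-selection step.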