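Table 1 Description of the datasets: size of the dataset (n), number of variables (p), number of minority-class samples (n_min) and number of majority-class samples (n_maj). The n_min (%) column gives the minority class as a percentage of n, and the last column names the minority class.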

From: Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models

| Name | n | p | n_min | n_maj | n_min (%) | Minority class |
|---|---|---|---|---|---|---|
| Indian | 768 | 8 | 268 | 500 | 34.9 | Positive |
| Parkinson | 195 | 22 | 48 | 147 | 24.6 | Healthy |
| Hepatitis | 155 | 19 | 32 | 123 | 20.6 | Dead |
| Abalone | 4,177 | 8 | 1,307 | 2,870 | 31.3 | Female |
| Letter | 17,307 | 16 | 689 | 16,618 | 3.4 | A |
| Lung | 32 | 56 | 9 | 23 | 28.1 | 1 |
| Tae | 151 | 5 | 49 | 102 | 32.4 | Low |
| Breast | 106 | 9 | 22 | 84 | 20.8 | Adi |
| Sonar | 208 | 60 | 97 | 111 | 46.6 | Rock |
| Ozone | 2,536 | 72 | 73 | 2,463 | 2.9 | Ozone day |
| Sotiriou:er | 99 | 7,650 | 34 | 65 | 34.3 | ER- |
| Sotiriou:grade | 99 | 7,650 | 45 | 54 | 45.5 | Grade 3 |
| Ivshina:er | 245 | 22,283 | 34 | 211 | 13.9 | ER- |
| Ivshina:grade | 245 | 22,283 | 55 | 234 | 22.4 | Grade 3 |
| Wang:er | 286 | 22,283 | 77 | 209 | 26.9 | ER- |
| Wang:relapse | 286 | 22,283 | 107 | 179 | 37.4 | Relapse |
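The n_min (%) column can be reproduced from the class counts as 100 · n_min / n, where n = n_min + n_maj. The short sketch below recomputes it for a few rows of the table. The helper names `minority_percentage` and `imbalance_ratio` are hypothetical, and the majority-to-minority ratio is an extra summary that the table itself does not report.

```python
# Sketch: recompute the minority-class percentage from the class counts.
# minority_percentage and imbalance_ratio are illustrative helpers, not from the article.

def minority_percentage(n_min: int, n_maj: int) -> float:
    """Minority-class share of the dataset in percent, rounded to one decimal."""
    n = n_min + n_maj  # total sample size
    return round(100 * n_min / n, 1)


def imbalance_ratio(n_min: int, n_maj: int) -> float:
    """Number of majority-class samples per minority-class sample (not reported in Table 1)."""
    return n_maj / n_min


# (n_min, n_maj) for a few rows of Table 1
rows = {
    "Indian": (268, 500),
    "Sonar": (97, 111),
    "Ozone": (73, 2463),
}

for name, (n_min, n_maj) in rows.items():
    print(
        f"{name}: n = {n_min + n_maj}, "
        f"minority = {minority_percentage(n_min, n_maj)}%, "
        f"majority/minority = {imbalance_ratio(n_min, n_maj):.1f}"
    )
```

For these rows the script reproduces the published percentages (34.9, 46.6 and 2.9). The ratio shows how much more severe the imbalance is for Ozone (about 34 majority samples per minority sample) than for Sonar (about 1.1).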