Table 2 shows the machine-learning models (MLMs) and the hyper-parameters that can be tuned in the internal loop using optimization approaches.

From: Prediction of diabetes disease using an ensemble of machine learning multi-classifier models

MLMs and their tunable hyper-parameters

K-NN

The number of neighbors (k) to inspect in k-NN

Algorithm for computing nearest neighbors

  Ball Tree: each node defines a D-dimensional hypersphere (ball) enclosing a subset of the points

  KD Tree: each leaf node stores D-dimensional points; internal nodes split the space along one dimension

  Brute: exhaustive brute-force search over all points

The leaf size for the Ball Tree or KD Tree (its best value depends on the nature of the problem)

The distance metric to use for the tree [Manhattan (\(L_1\) norm) or Euclidean (\(L_2\) norm)]
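The entries above correspond one-to-one with scikit-learn's KNeighborsClassifier arguments (n_neighbors, algorithm, leaf_size, metric). Assuming that implementation, a minimal sketch of the internal-loop search could look as follows; the synthetic dataset and grid values are illustrative, not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative stand-in for the diabetes dataset used in the paper.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

param_grid = {
    "n_neighbors": [3, 5, 7, 9],                # number of neighbors k
    "algorithm": ["ball_tree", "kd_tree", "brute"],
    "leaf_size": [20, 30, 40],                  # used by the tree algorithms only
    "metric": ["manhattan", "euclidean"],       # L1 or L2 norm
}

# The "internal loop": cross-validated search over the hyper-parameter grid.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```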

SVM

The type of kernel function (linear, polynomial, RBF, sigmoid)

C: penalty parameter (controls how strongly the model is penalized for each misclassified point; larger C enforces a harder margin)

Gamma: kernel coefficient (for the RBF, polynomial, and sigmoid kernels; controls the radius of influence of a single training point)

Decision_function_shape, i.e., the multi-classification approach (OVA or OVO)
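Assuming scikit-learn's SVC, whose kernel, C, gamma, and decision_function_shape arguments match the entries above, a comparable illustrative grid (all values are placeholders, not the paper's settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10],                  # penalty for misclassified points
    "gamma": ["scale", 0.01, 0.1],      # influence radius of one training point
    "decision_function_shape": ["ovr", "ovo"],  # OVA / OVO multi-class strategy
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```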

DT

Criterion function: Gini (Gini impurity) or entropy (information gain)

The method for selecting the split at each node

The tree's maximum depth

The minimum number of samples required to split an internal node

The minimum number of samples required at each leaf node

The minimum weighted fraction of the total sample weights required at a leaf node

The number of features to consider when searching for the best split
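These hyper-parameters are exposed by scikit-learn's DecisionTreeClassifier as criterion, splitter, max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, and max_features. A sketch assuming that implementation, with illustrative grid values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

param_grid = {
    "criterion": ["gini", "entropy"],        # Gini impurity or information gain
    "splitter": ["best", "random"],          # how the split is chosen at each node
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 5],             # min samples to split an internal node
    "min_samples_leaf": [1, 5],              # min samples at each leaf
    "min_weight_fraction_leaf": [0.0, 0.01],
    "max_features": [None, "sqrt", "log2"],
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```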

RF

The number of decision trees in the forest

The criterion to split on at each node of the trees (Gini or entropy for classification)

The maximum depth of the individual trees

The minimum number of samples required to split an internal node

The maximum number of leaf nodes

The number of random features considered at each split

The size of the bootstrapped dataset
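Assuming scikit-learn's RandomForestClassifier, these correspond to n_estimators, criterion, max_depth, min_samples_split, max_leaf_nodes, max_features, and max_samples (the fraction of the dataset drawn for each bootstrap sample). An illustrative sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

param_grid = {
    "n_estimators": [100, 300],           # number of trees in the forest
    "criterion": ["gini", "entropy"],
    "max_depth": [5, None],
    "min_samples_split": [2, 5],
    "max_leaf_nodes": [None, 50],
    "max_features": ["sqrt", "log2"],     # random features tried per split
    "max_samples": [None, 0.8],           # size of each bootstrapped sample
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```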

AdaBoost

The boosting algorithm

  Real boosting

  Discrete boosting

The learning rate, which shrinks the contribution of each classifier

The maximum number of estimators at which boosting is terminated
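In scikit-learn's AdaBoostClassifier, discrete and real boosting correspond to algorithm="SAMME" and algorithm="SAMME.R"; SAMME.R has been removed in recent releases, so this sketch assumes an older version (< 1.6) where both variants are available. Grid values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

param_grid = {
    "algorithm": ["SAMME", "SAMME.R"],   # discrete vs. real boosting
    "learning_rate": [0.1, 0.5, 1.0],    # shrinks each classifier's contribution
    "n_estimators": [50, 100, 200],      # upper bound before boosting terminates
}

search = GridSearchCV(AdaBoostClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```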

GNB

Variance smoothing (the portion of the largest variance of all features that is added to the variances for calculation stability)
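Assuming scikit-learn's GaussianNB, where this is the var_smoothing argument, a minimal sketch of the corresponding search (the log-spaced range is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# var_smoothing: fraction of the largest feature variance added for stability.
param_grid = {"var_smoothing": np.logspace(-9, -5, 5)}

search = GridSearchCV(GaussianNB(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```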