From: Prediction of diabetes disease using an ensemble of machine learning multi-classifier models
MLMs | Hyper-parameters |
---|---|
K-NN | The number of neighbors to inspect in the k-NN algorithm<br>The algorithm for computing the nearest neighbors:<br>- Ball Tree: a D-dimensional hypersphere, or ball, is defined by each node<br>- KD Tree: a D-dimensional point is stored at each leaf node<br>- Brute: exhaustive brute-force search<br>The leaf size for the Ball Tree or KD Tree, determined by the nature of the problem<br>The distance metric to use for the tree [Manhattan (\({L}_{1}\) norm) or Euclidean (\({L}_{2}\) norm)] |
SVM | The type of kernel function (linear, polynomial, RBF, sigmoid)<br>C: penalty parameter (controls how heavily the model is penalized for each misclassified point for a given decision boundary)<br>Gamma: kernel coefficient (for the RBF, polynomial, and sigmoid kernels; controls the distance of influence of a single training point)<br>decision_function_shape: the multi-class strategy (OVA or OVO) |
DT | Criterion function: Gini (Gini impurity) or entropy (information gain)<br>The strategy used to choose the split at each node<br>The tree's maximum depth<br>The minimum number of samples required to split an internal node<br>The minimum number of samples required at each leaf node<br>The minimum weighted fraction of the total sample weights required at a leaf node<br>The number of features to consider when searching for the best split |
RF | The number of decision trees in the forest<br>The criterion on which to split at each node of the trees (Gini or entropy for classification)<br>The maximum depth of the individual trees<br>The minimum number of samples required to split an internal node<br>The maximum number of leaf nodes<br>The number of random features considered at each split<br>The size of the bootstrapped dataset |
AdaBoost | The boosting algorithm:<br>- Real boosting<br>- Discrete boosting<br>The learning rate, which shrinks the contribution of each classifier<br>The maximum number of estimators at which boosting is terminated |
GNB | Variance smoothing (the portion of the largest variance of all features that is added to the variances for calculation stability) |
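
As a minimal sketch, assuming the scikit-learn estimators these parameter names suggest (the paper does not state its exact configuration), the snippet below shows how each hyper-parameter listed in the table maps onto the corresponding estimator constructor. The values are illustrative placeholders, not the settings tuned in the study.

```python
# A minimal sketch, assuming scikit-learn; values are illustrative, not tuned.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

classifiers = {
    # K-NN: number of neighbors, neighbor-search algorithm ("ball_tree",
    # "kd_tree", or "brute"), leaf size for the tree structures, and the
    # distance metric (p=1 -> Manhattan/L1, p=2 -> Euclidean/L2).
    "K-NN": KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree",
                                 leaf_size=30, p=2),
    # SVM: kernel type, penalty parameter C, kernel coefficient gamma, and
    # the multi-class strategy ("ovr" = one-vs-all, "ovo" = one-vs-one).
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale",
               decision_function_shape="ovr"),
    # DT: split criterion, split-selection strategy, maximum depth, minimum
    # samples to split / per leaf, minimum weighted fraction of the total
    # weights, and number of features considered per split.
    "DT": DecisionTreeClassifier(criterion="gini", splitter="best",
                                 max_depth=None, min_samples_split=2,
                                 min_samples_leaf=1,
                                 min_weight_fraction_leaf=0.0,
                                 max_features=None),
    # RF: number of trees, split criterion, tree depth, minimum samples to
    # split, maximum leaf nodes, random features per split, and the size of
    # the bootstrapped dataset.
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 max_depth=None, min_samples_split=2,
                                 max_leaf_nodes=None, max_features="sqrt",
                                 max_samples=None),
    # AdaBoost: learning rate and maximum number of estimators. The
    # real/discrete boosting choice corresponds to the `algorithm` option
    # ("SAMME.R" vs "SAMME") in scikit-learn releases before 1.6; it is
    # omitted here for cross-version compatibility.
    "AdaBoost": AdaBoostClassifier(n_estimators=50, learning_rate=1.0),
    # GNB: variance smoothing as a fraction of the largest feature variance.
    "GNB": GaussianNB(var_smoothing=1e-9),
}
```

Each entry can then be fitted and evaluated in the usual way, e.g. `classifiers["RF"].fit(X_train, y_train)`, and the hyper-parameters above form a natural search space for a tuner such as `GridSearchCV`.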