Table 2 Description of developed machine learning algorithms

From: AUD-DSS: a decision support system for early detection of patients with alcohol use disorder

SVM [56]

It is a statistical model that performs classification using a maximum margin. SVM classifies data by computing a hyperplane that separates points in an N-dimensional space (N features) while maintaining the maximum possible margin between the classes. To perform classification, the algorithm searches for the separating hyperplane whose distance to the nearest points of each class, the support vectors, is as large as possible
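
The paper does not include its training code, so the following is a minimal sketch of a maximum-margin SVM, assuming scikit-learn's SVC and synthetic make_classification data in place of the study's AUD features; all parameter values are illustrative.

```python
# Hedged sketch: scikit-learn SVM with synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel searches for the separating hyperplane directly;
# C trades margin width against training error (value is illustrative).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))
```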

KNN [57]

As a non-parametric classifier, KNN classifies an unknown instance based on its neighbors' classes: it examines the class labels of the k nearest points in the feature space and assigns the target the most common class among those k neighbors
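
A minimal sketch of the neighbor-voting idea, assuming scikit-learn's KNeighborsClassifier on the same kind of synthetic data; k = 5 is an arbitrary illustration, not the paper's setting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each test point receives the majority class of its k = 5 nearest
# training points in feature space (Euclidean distance by default).
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```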

DT [58]

It is a recursive, greedy algorithm that builds a tree data structure in which internal nodes test features, branches represent the outcomes of those tests, and leaves carry the target classes. The first node is the root node, and all other nodes split from it; the path from the root to a leaf determines the class assigned to a target. The DT algorithm first grows a tree to its maximum depth, so that each leaf node is pure, and then prunes upward to balance the classification error against the number of terminal nodes in the tree
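
A sketch of the grow-then-prune procedure described above, assuming scikit-learn's DecisionTreeClassifier; the ccp_alpha value enabling cost-complexity pruning is illustrative, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the tree fully, then prune upward: ccp_alpha > 0 trades
# classification error against the number of terminal nodes.
clf = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01)
clf.fit(X_train, y_train)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("test accuracy:", clf.score(X_test, y_test))
```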

RF [59]

It is a bagging ensemble algorithm that is very popular in health-related studies. In general, an RF is a set of decision-tree classifiers built from two separate sources of randomization. First, each individual tree is trained on a bootstrap sample: a sample of the same size as the supplied training set, drawn from it with replacement. Because of the replacement, roughly 37% of the instances in each bootstrap sample are expected to be duplicates; equivalently, about 37% of the original instances are left out of any given sample. Second, each split within a tree considers only a random subset of the features
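
A sketch showing both randomization sources, assuming scikit-learn's RandomForestClassifier; the out-of-bag score is computed on the roughly 37% of instances each tree never sees. Parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True: every tree is fit on a with-replacement sample;
# max_features="sqrt": every split sees only a random feature subset.
clf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                             max_features="sqrt", oob_score=True,
                             random_state=0)
clf.fit(X_train, y_train)
print("out-of-bag score:", clf.oob_score_)  # from the left-out ~37%
print("test accuracy:", clf.score(X_test, y_test))
```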

XGBoost [60]

It is a DT ensemble based on the gradient boosting algorithm that is adaptable, portable, and efficient. XGBoost approximates the loss using its second-order derivative and provides additional hyperparameters. Training starts from an initial predicted value; prediction accuracy is then improved by fitting each additional tree to the residuals of the ensemble built so far. After each tree is trained, its contribution to the final model is scaled by a learning rate
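
A sketch assuming the xgboost package's scikit-learn-style XGBClassifier; n_estimators, learning_rate, and max_depth are illustrative defaults rather than the tuned values from the study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree is fit to the residuals of the ensemble so far;
# learning_rate scales every tree's contribution to the final model.
clf = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```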

LR [61]

The LR algorithm is a common classification approach in clinical research, since the dependent event is discrete, such as positive/negative, and it is often incorporated into ensemble frameworks. In our work, LR classifies by estimating the probability of a discrete binary class, such as AUD-Positive/AUD-Negative. LR extends linear regression by passing the linear model's output through a sigmoid (logistic) function, which maps any real-valued input to a probability between 0 and 1 and thereby links predictions to probabilities. The cost function expresses the optimization objective: training minimizes it to reduce the prediction error, and gradient descent is used to drive the cost value down
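
A sketch that also reproduces the sigmoid link by hand, assuming scikit-learn's LogisticRegression; the manual probability computation should match predict_proba exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# The sigmoid maps the linear score w.x + b onto (0, 1), turning the
# linear model's output into a class probability.
z = X_test @ clf.coef_.ravel() + clf.intercept_
manual_p = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(manual_p, clf.predict_proba(X_test)[:, 1]))  # True
print("test accuracy:", clf.score(X_test, y_test))
```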

SE [62]

The stacking method is a popular heterogeneous ensemble learning technique that uses a meta-model to merge various base classifiers and generate predictions with a higher degree of accuracy. The main benefit of SE is its ability to combine several effective models, each with its own strengths, to produce more accurate predictions. Unlike other ensemble learning algorithms such as RF, SE trains its base classifiers on the entire training set and then employs a meta-estimator to learn how to combine them. SE can evaluate the error of each base classifier individually in the base-learning step and then reduce the residual errors in the meta-learning step
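
A sketch of a stacked ensemble, assuming scikit-learn's StackingClassifier; the base-classifier lineup and the logistic-regression meta-estimator are illustrative, and the paper's actual combination may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous base classifiers; a meta-estimator learns how to
# combine their cross-validated predictions into the final output.
base = [("svm", SVC(probability=True)),
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0))]
clf = StackingClassifier(estimators=base,
                         final_estimator=LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```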

SVM Support vector machine, KNN K-nearest neighbor, DT Decision tree, RF Random forest, XGBoost Extreme gradient boosting, LR Logistic regression, SE Stacking ensemble