
Table 1 Default models and the possible parameter choices for each model under different conditions

From: LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data

| Classifier | Non-tunable parameters | Number of training features (≥ 4 features) | Parameters (if number of samples > 6) | Parameters (if number of samples ≤ 6) |
| --- | --- | --- | --- | --- |
| Logistic regression (LBFGS solver) | max_iter = 2000; penalty = "l2" | Randomly selected (user-defined) | A grid search using fivefold stratified cross-validation is used to choose C from logarithmically spaced values in the range of 10⁻⁴ to 10⁴ | C parameter set to 1.0 |
| Logistic regression (Liblinear solver) | max_iter = 2000; penalty = "l1" | Full | A grid search using fivefold stratified cross-validation is used to choose C from logarithmically spaced values in the range of 10⁻⁴ to 10⁴ | C parameter set to 1.0 |
| Linear SVC | max_iter = 2000 | Randomly selected (user-defined) | A grid search using fivefold stratified cross-validation is used to choose alpha (for SGD classifiers) or C (for the linear SVC). The possible choices for these parameters are 0.001, 0.01, 0.1, 1.0, 10, 100. In the case of the SGD classifier, the loss function (hinge or modified Huber) is also chosen using fivefold cross-validation | C parameter set to 1.0 |
| Stochastic gradient descent classifier (L2 penalty) | max_iter = 2000 | Randomly selected (user-defined) | A grid search using fivefold stratified cross-validation is used to choose alpha (for SGD classifiers) or C (for the linear SVC). The possible choices for these parameters are 0.001, 0.01, 0.1, 1.0, 10, 100. In the case of the SGD classifier, the loss function (hinge or modified Huber) is also chosen using fivefold cross-validation | Alpha parameter set to 1.0; loss function (hinge or modified Huber) is randomly chosen |
| Stochastic gradient descent classifier (elastic-net penalty) | max_iter = 2000 | Full | A grid search using fivefold stratified cross-validation is used to choose alpha (for SGD classifiers) or C (for the linear SVC). The possible choices for these parameters are 0.001, 0.01, 0.1, 1.0, 10, 100. In the case of the SGD classifier, the loss function (hinge or modified Huber) is also chosen using fivefold cross-validation | Alpha parameter set to 1.0; loss function (hinge or modified Huber) is randomly chosen |
| Ridge regression | NA | Randomly selected (user-defined) | Alpha chosen from logarithmically spaced values in the range of 10⁻³ to 10⁴ using generalized cross-validation | NA |
| Neural network | batch_size = 32; epochs = 300; validation_split = 0.10; min_delta = 0.0001; patience = 40; see text for architecture details | Randomly selected (user-defined) | NA | NA |
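
The sample-size rule in Table 1 can be illustrated with a minimal scikit-learn sketch. This is not the LANDMark implementation: the helper name `build_models`, the number of grid points (9 for C, 8 for the ridge alpha), the use of `RidgeClassifierCV` for the ridge row, and the seeding are all assumptions made for illustration. Only four of the seven rows are covered, and the random selection of training features is not modeled.

```python
import random

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifierCV, SGDClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import LinearSVC

# Candidate values taken from Table 1; the number of log-spaced points is assumed.
C_GRID = np.logspace(-4, 4, 9)
ALPHA_OR_C = [0.001, 0.01, 0.1, 1.0, 10, 100]
RIDGE_ALPHAS = np.logspace(-3, 4, 8)
LOSSES = ["hinge", "modified_huber"]


def build_models(n_samples, seed=0):
    """Hypothetical helper: return unfitted estimators following Table 1."""
    # L2 is scikit-learn's default penalty for the LBFGS solver.
    logit = LogisticRegression(solver="lbfgs", max_iter=2000)
    svc = LinearSVC(max_iter=2000)

    if n_samples > 6:
        # Enough samples: tune with fivefold stratified cross-validation.
        cv = StratifiedKFold(n_splits=5)
        sgd = SGDClassifier(penalty="l2", max_iter=2000, random_state=seed)
        return {
            "logistic_lbfgs": GridSearchCV(logit, {"C": C_GRID}, cv=cv),
            "linear_svc": GridSearchCV(svc, {"C": ALPHA_OR_C}, cv=cv),
            "sgd_l2": GridSearchCV(sgd, {"alpha": ALPHA_OR_C, "loss": LOSSES}, cv=cv),
            # With cv=None, RidgeClassifierCV uses efficient leave-one-out CV.
            "ridge": RidgeClassifierCV(alphas=RIDGE_ALPHAS),
        }

    # Six or fewer samples: fixed parameters, SGD loss drawn at random.
    # Table 1 lists NA for ridge in this case, so it is left out here.
    loss = random.Random(seed).choice(LOSSES)
    return {
        "logistic_lbfgs": logit.set_params(C=1.0),
        "linear_svc": svc.set_params(C=1.0),
        "sgd_l2": SGDClassifier(penalty="l2", max_iter=2000, alpha=1.0,
                                loss=loss, random_state=seed),
    }


# Usage on synthetic data (a stand-in, not sequencing data).
X, y = make_classification(n_samples=60, n_features=20, random_state=0)
models = build_models(n_samples=X.shape[0])
search = models["logistic_lbfgs"].fit(X, y)
print(search.best_params_)
```

Returning unfitted estimators keeps the branch on sample count separate from fitting, so the same call works whether the caller gets a `GridSearchCV` wrapper or a plain classifier.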