N | Algorithms | Pre-processing | Parameter values
---|---|---|---
1 | Spark Random Forest<sup>a</sup> | ROS/RUS | NumTrees: 100 (by default); MaxBins: 1000 (by default); Impurity: gini/entropy; MaxDepth: 5 (by default); Number of maps: 20; MinInstancesPerNode: 2; MinInfoGain: 0; FeatureSubsetStrategy: auto; Resampling size: 100%/130%
2 | Spark Decision Trees<sup>b</sup> | ROS/RUS | MaxBins (number of bins used when discretizing continuous features): 100 (by default); Impurity (impurity measure): gini (by default); MaxDepth (maximum depth of each tree): 5 (by default); MinInstancesPerNode: 2; MinInfoGain: 0; FeatureSubsetStrategy: auto; Resampling size: 100%/130%
3 | Spark Support Vector Machines<sup>c</sup> | ROS | Regularization parameter: 1.0/0.5/0.0; Number of iterations: 100 (by default); StepSize: 1.0 (by default); MiniBatchFraction: 1.0; Resampling size: 100%/130%
4 | Spark Logistic Regression<sup>d</sup> | ROS | Number of iterations: 100 (by default); StepSize (stochastic gradient descent parameter): 1.0 (by default); MiniBatchFraction (fraction of the dataset sampled and used in each iteration): 1.0 (by default, i.e. 100%); Resampling size: 100%/130%
5 | Spark Naive Bayes<sup>e</sup> | ROS | Additive smoothing Lambda: 1.0 (by default); Resampling size: 100%/130%
6 | MapReduce Random Forests<sup>f</sup> | ROS | Number of trees: 100; Randomly selected attributes per node: 3; Number of maps: 20; Resampling size: 100%/130%
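
To make the "Resampling size" entries concrete, the following is a minimal sketch of random oversampling (ROS) in plain Python. It assumes, and the table itself does not confirm this, that a resampling size of 100% means the minority class is replicated until it matches the majority class, and 130% means until it reaches 1.3 times the majority count. The function name `random_oversample` and its `ratio` parameter are hypothetical, introduced only for illustration; this is not the distributed Spark or MapReduce implementation used in the experiments.

```python
import random
from collections import Counter


def random_oversample(rows, labels, ratio=1.0, seed=0):
    """Random oversampling (ROS) for a two-class dataset.

    ASSUMPTION: `ratio` models the table's "Resampling size" as the target
    minority count divided by the majority count (1.0 = 100%, 1.3 = 130%).
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # most_common orders classes by frequency: majority first, minority second.
    (majority, n_major), (minority, n_minor) = counts.most_common(2)

    # Number of extra minority instances needed to reach the target size.
    target = round(ratio * n_major)
    extra = max(0, target - n_minor)

    # Draw the extra instances with replacement from the minority class.
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    sampled = [rng.choice(minority_rows) for _ in range(extra)]

    return list(rows) + sampled, list(labels) + [minority] * extra
```

Under this reading, a dataset with 10 majority and 2 minority instances would grow to 10 minority instances at `ratio=1.0` and to 13 at `ratio=1.3`. Random undersampling (RUS) would be the mirror image, discarding majority instances instead of replicating minority ones.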