Table 1 Big data supervised algorithms, imbalance management pre-processing methods and parameter values considered in this paper

From: Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers

| N | Algorithm | Pre-processing | Parameter values |
|---|-----------|----------------|------------------|
| 1 | Spark Random Forest (a) | ROS/RUS | NumTrees: 100 (by default); MaxBins: 1000 (by default); Impurity: gini/entropy; MaxDepth: 5 (by default); Number of maps: 20; MinInstancesPerNode: 2; MinInfoGain: 0; FeatureSubsetStrategy: auto; Resampling size: 100%/130% |
| 2 | Spark Decision Trees (b) | ROS/RUS | MaxBins (number of bins used when discretizing continuous features): 100 (by default); Impurity (impurity measure): gini (by default); MaxDepth (maximum depth of each tree): 5 (by default); MinInstancesPerNode: 2; MinInfoGain: 0; FeatureSubsetStrategy: auto; Resampling size: 100%/130% |
| 3 | Spark Support Vector Machines (c) | ROS | Regularization parameter: 1.0/0.5/0.0; Number of iterations: 100 (by default); StepSize: 1.0 (by default); MiniBatchFraction: 1.0; Resampling size: 100%/130% |
| 4 | Spark Logistic Regression (d) | ROS | Number of iterations: 100 (by default); StepSize (stochastic gradient descent parameter): 1.0 (by default); MiniBatchFraction (fraction of the dataset sampled and used in each iteration): 1.0 (by default, i.e. 100%); Resampling size: 100%/130% |
| 5 | Spark Naive Bayes (e) | ROS | Additive smoothing Lambda: 1.0 (by default); Resampling size: 100%/130% |
| 6 | MapReduce Random Forest (f) | ROS | Number of trees: 100; Randomly selected attributes per node: 3; Number of maps: 20; Resampling size: 100%/130% |
  1. ROS: Random Oversampling, RUS: Random Undersampling
  2. a https://spark.apache.org/docs/latest/mllib-ensembles.html
  3. b https://spark.apache.org/docs/latest/mllib-decision-tree.html
  4. c https://spark.apache.org/docs/latest/mllib-linear-methods.html#linear-support-vector-machines-svms
  5. d https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
  6. e https://spark.apache.org/docs/latest/mllib-naive-bayes.html
  7. f Random Forest implementation available at https://mahout.apache.org/
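
The two pre-processing methods named in the table, ROS and RUS, can be illustrated with a minimal single-machine sketch in plain Python. It is not the distributed implementation used in the paper. The function names, the `label_of` accessor and the `size` argument are hypothetical, and the sketch assumes that "Resampling size" is the ratio between the resampled minority class and the majority class (1.0 for 100%, 1.3 for 130%), which the table itself does not state.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, minority, size=1.0, seed=0):
    """ROS sketch: duplicate random minority rows until the minority class
    holds round(size * majority_count) rows.

    ASSUMPTION: `size` models the table's "Resampling size"
    (1.0 for 100%, 1.3 for 130%); the source does not define it.
    """
    rng = random.Random(seed)
    minority_rows = [r for r in rows if label_of(r) == minority]
    majority_rows = [r for r in rows if label_of(r) != minority]
    if not minority_rows:
        raise ValueError("no rows carry the minority label")
    target = round(size * len(majority_rows))
    extra = max(0, target - len(minority_rows))
    # Sampling with replacement, so the same row may be copied several times.
    return majority_rows + minority_rows + rng.choices(minority_rows, k=extra)


def random_undersample(rows, label_of, minority, seed=0):
    """RUS sketch: keep every minority row plus an equally sized random
    subset of the majority rows (sampled without replacement)."""
    rng = random.Random(seed)
    minority_rows = [r for r in rows if label_of(r) == minority]
    majority_rows = [r for r in rows if label_of(r) != minority]
    k = min(len(majority_rows), len(minority_rows))
    return rng.sample(majority_rows, k=k) + minority_rows


# Toy imbalanced data set: 90 negative pairs (label 0), 10 positive pairs (label 1).
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]
label = lambda row: row[1]

print(Counter(label(r) for r in random_oversample(data, label, minority=1, size=1.0)))
print(Counter(label(r) for r in random_oversample(data, label, minority=1, size=1.3)))
print(Counter(label(r) for r in random_undersample(data, label, minority=1)))
```

Under this assumed reading, a resampling size of 130% leaves the oversampled minority class larger than the majority class, whereas RUS shrinks the majority class to the minority class size.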