
Table 4 The AUC and G-Mean values of all the algorithms (supervised and unsupervised) in testing datasets

From: Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers

Algorithm/Dataset Alignment-based Features Alignment-free Features Alignment-based + Alignment-free Features
G-Mean AUC G-Mean AUC G-Mean AUC
Scer-Cgla Cgla-Klac Klac-Kwal Scer-Cgla Cgla-Klac Klac-Kwal Scer-Cgla Cgla-Klac Klac-Kwal Scer-Cgla Cgla-Klac Klac-Kwal Scer-Cgla Cgla-Klac Klac-Kwal Scer-Cgla Cgla-Klac Klac-Kwal
Supervised Algorithms
 Spark Random Forest MLlib 1.6 (Impurity: gini)
  Normal 0.3853 0.3119 0.3421 0.5742 0.5486 0.5585 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.6647 0.1009 0.6104 0.7209 0.5051 0.6863
  ROS-100 0.9962 0.9941 0.9966 0.9962 0.9941 0.9966 0.9375 0.9139 0.9186 0.9375 0.9148 0.9189 0.9972 0.9917 0.9950 0.9972 0.9917 0.9950
  ROS-130 0.9977 0.9956 0.9974 0.9977 0.9956 0.9974 0.9313 0.9162 0.9166 0.9315 0.9166 0.9166 0.9958 0.9929 0.9945 0.9958 0.9930 0.9945
  RUS 0.9974 0.9953 0.9977 0.9974 0.9953 0.9977 0.9325 0.8917 0.9152 0.9325 0.8941 0.9153 0.9973 0.9950 0.9973 0.9973 0.9950 0.9973
 Spark Random Forest MLlib 1.6 (Impurity: entropy)
  Normal 0.7457 0.0365 0.3809 0.7780 0.1192 0.5725 0.0000 0.0000 0.0000 0.5000 0.0858 0.5000 0.6001 0.0064 0.3195 0.6801 0.0064 0.5510
  ROS-100 0.9971 0.9948 0.9969 0.9971 0.9948 0.9969 0.9333 0.9169 0.9097 0.9333 0.9180 0.9106 0.9971 0.9947 0.9965 0.9971 0.9947 0.9965
  ROS-130 0.9974 0.9950 0.9967 0.9974 0.9950 0.9967 0.9267 0.9101 0.9087 0.9267 0.9108 0.9088 0.9975 0.9955 0.9945 0.9975 0.9955 0.9945
  RUS 0.9977 0.9949 0.9976 0.9977 0.9949 0.9976 0.9396 0.9081 0.9202 0.9397 0.9097 0.9207 0.9974 0.9948 0.9975 0.9974 0.9948 0.9975
 Spark Decision Trees MLlib 1.6
  Normal 0.3751 0.2983 0.3301 0.5703 0.5445 0.5545 0.3848 0.0252 0.3548 0.5740 0.5003 0.5629 0.6505 0.5017 0.6107 0.7115 0.6259 0.6865
  ROS-100 0.9973 0.9941 0.9960 0.9973 0.9941 0.9960 0.9496 0.9153 0.9258 0.9496 0.9157 0.9262 0.9977 0.9483 0.9954 0.9977 0.9495 0.9954
  ROS-130 0.9957 0.9906 0.9961 0.9957 0.9906 0.9961 0.9464 0.8993 0.9293 0.9465 0.9002 0.9293 0.9972 0.9449 0.9965 0.9972 0.9463 0.9965
  RUS 0.9970 0.9936 0.9975 0.9970 0.9936 0.9975 0.9473 0.9156 0.9317 0.9473 0.9158 0.9317 0.9971 0.9720 0.9966 0.9971 0.9723 0.9966
 Spark Support Vector Machines MLlib 1.6
  Normal (0.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  Normal (0.5) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  Normal (1.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-100 (0.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.8486 0.8467 0.8482 0.8517 0.8482 0.8496 0.9682 0.9581 0.9677 0.9684 0.9585 0.9679
  ROS-100 (0.5) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-100 (1.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-130 (0.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.7719 0.7786 0.7779 0.7929 0.7950 0.7961 0.9708 0.9612 0.9683 0.9709 0.9615 0.9685
  ROS-130 (0.5) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-130 (1.0) 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
 Spark Logistic Regression MLlib 1.6
  Normal 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-100 0.3994 0.3663 0.3943 0.5012 0.4848 0.4981 0.2861 0.2867 0.2725 0.5028 0.5032 0.4989 0.0815 0.0665 0.0677 0.5007 0.4995 0.4996
  ROS-130 0.4056 0.3925 0.4060 0.5006 0.5089 0.5003 0.3008 0.3091 0.2954 0.5027 0.5054 0.5012 0.1416 0.1173 0.1274 0.5018 0.4987 0.4999
 Spark Naive Bayes MLlib 1.6
  Normal 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000 0.0000 0.0000 0.0000 0.5000 0.5000 0.5000
  ROS-100 0.4070 0.3943 0.4002 0.4990 0.4981 0.4949 0.4182 0.4371 0.4164 0.5009 0.5113 0.4999 0.1365 0.1498 0.1180 0.4996 0.5016 0.4972
  ROS-130 0.0171 0.4060 0.0172 0.5001 0.5003 0.5001 0.4823 0.4991 0.4825 0.4997 0.5202 0.4985 0.2067 0.2163 0.1953 0.5003 0.5024 0.4979
 MapReduce Random Forest Mahout 0.9
  Normal 0.7178 0.6652 0.6864 0.7576 0.7212 0.7356             
  ROS-100 0.9903 0.9786 0.9859 0.9903 0.9789 0.9860             
  ROS-130 0.9905 0.9783 0.9846 0.9905 0.9785 0.9847             
Unsupervised Algorithms
 RBH 0.8069 0.8052 0.8491 0.8255 0.8242 0.8605             
 RSD (0.2, 1e-20) 0.9309 0.9038 0.9654 0.9333 0.9092 0.9660
 RSD (0.5, 1e-10) 0.9426 0.9277 0.9818 0.9442 0.9294 0.9819
 RSD (0.8, 1e-05) 0.9472 0.9373 0.9876 0.9486 0.9374 0.9877
 OMA 0.7311 0.7264 0.9388 0.7673 0.9163 0.9407             
  1. Supervised algorithm performance is reported for the alignment-based, alignment-free, and alignment-based + alignment-free feature combinations. The best results in each dataset are in boldface and the overall best results are underlined. Random Oversampling (ROS) pre-processing is labeled with the corresponding resampling size. RUS denotes Random Undersampling. RSD parameter values are the divergence and E-value thresholds. Support Vector Machines are labeled with their regularization parameter values.
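
Many rows in the table pair a G-Mean of 0.0000 with an AUC of 0.5000. The sketch below shows how that pairing can arise. It is not taken from the article: it assumes G-Mean is the geometric mean of the true positive and true negative rates, and that AUC is the single-point approximation (1 + TPR - FPR) / 2. The function names and the confusion-matrix counts are hypothetical.

```python
import math


def rates(tp, fn, tn, fp):
    """Return (TPR, TNR) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


def g_mean(tpr, tnr):
    # Geometric mean of the two per-class accuracies.
    return math.sqrt(tpr * tnr)


def single_point_auc(tpr, tnr):
    # Assumption: area under a one-point ROC curve,
    # (1 + TPR - FPR) / 2, which equals (TPR + TNR) / 2.
    return (tpr + tnr) / 2.0


# Hypothetical counts: every pair is labeled as the majority
# (non-ortholog) class, so no positive pair is ever recovered.
tpr, tnr = rates(tp=0, fn=50, tn=950, fp=0)
print(g_mean(tpr, tnr), single_point_auc(tpr, tnr))  # 0.0 0.5
```

Under these assumptions, a classifier that never predicts the minority class scores a G-Mean of 0 and an AUC of 0.5 regardless of how imbalanced the data are, which matches the pattern in the "Normal" rows. When the two rates are nearly equal, as in most ROS and RUS rows, the two metrics come out nearly identical.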