Table 4. The AUC and G-Mean values of all the algorithms (supervised and unsupervised) in the testing datasets

From: Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers

| Algorithm/Dataset | Alignment-based, G-Mean, Scer-Cgla | Alignment-based, G-Mean, Cgla-Klac | Alignment-based, G-Mean, Klac-Kwal | Alignment-based, AUC, Scer-Cgla | Alignment-based, AUC, Cgla-Klac | Alignment-based, AUC, Klac-Kwal | Alignment-free, G-Mean, Scer-Cgla | Alignment-free, G-Mean, Cgla-Klac | Alignment-free, G-Mean, Klac-Kwal | Alignment-free, AUC, Scer-Cgla | Alignment-free, AUC, Cgla-Klac | Alignment-free, AUC, Klac-Kwal | Alignment-based + Alignment-free, G-Mean, Scer-Cgla | Alignment-based + Alignment-free, G-Mean, Cgla-Klac | Alignment-based + Alignment-free, G-Mean, Klac-Kwal | Alignment-based + Alignment-free, AUC, Scer-Cgla | Alignment-based + Alignment-free, AUC, Cgla-Klac | Alignment-based + Alignment-free, AUC, Klac-Kwal |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Supervised Algorithms | | | | | | | | | | | | | | | | | | |
| Spark Random Forest MLlib 1.6 (Impurity: gini) | | | | | | | | | | | | | | | | | | |
| Normal | 0.3853 | 0.3119 | 0.3421 | 0.5742 | 0.5486 | 0.5585 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.6647 | 0.1009 | 0.6104 | 0.7209 | 0.5051 | 0.6863 |
| ROS-100 | 0.9962 | 0.9941 | 0.9966 | 0.9962 | 0.9941 | 0.9966 | 0.9375 | 0.9139 | 0.9186 | 0.9375 | 0.9148 | 0.9189 | 0.9972 | 0.9917 | 0.9950 | 0.9972 | 0.9917 | 0.9950 |
| ROS-130 | 0.9977 | 0.9956 | 0.9974 | 0.9977 | 0.9956 | 0.9974 | 0.9313 | 0.9162 | 0.9166 | 0.9315 | 0.9166 | 0.9166 | 0.9958 | 0.9929 | 0.9945 | 0.9958 | 0.9930 | 0.9945 |
| RUS | 0.9974 | 0.9953 | 0.9977 | 0.9974 | 0.9953 | 0.9977 | 0.9325 | 0.8917 | 0.9152 | 0.9325 | 0.8941 | 0.9153 | 0.9973 | 0.9950 | 0.9973 | 0.9973 | 0.9950 | 0.9973 |
| Spark Random Forest MLlib 1.6 (Impurity: entropy) | | | | | | | | | | | | | | | | | | |
| Normal | 0.7457 | 0.0365 | 0.3809 | 0.7780 | 0.1192 | 0.5725 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.0858 | 0.5000 | 0.6001 | 0.0064 | 0.3195 | 0.6801 | 0.0064 | 0.5510 |
| ROS-100 | 0.9971 | 0.9948 | 0.9969 | 0.9971 | 0.9948 | 0.9969 | 0.9333 | 0.9169 | 0.9097 | 0.9333 | 0.9180 | 0.9106 | 0.9971 | 0.9947 | 0.9965 | 0.9971 | 0.9947 | 0.9965 |
| ROS-130 | 0.9974 | 0.9950 | 0.9967 | 0.9974 | 0.9950 | 0.9967 | 0.9267 | 0.9101 | 0.9087 | 0.9267 | 0.9108 | 0.9088 | 0.9975 | 0.9955 | 0.9945 | 0.9975 | 0.9955 | 0.9945 |
| RUS | 0.9977 | 0.9949 | 0.9976 | 0.9977 | 0.9949 | 0.9976 | 0.9396 | 0.9081 | 0.9202 | 0.9397 | 0.9097 | 0.9207 | 0.9974 | 0.9948 | 0.9975 | 0.9974 | 0.9948 | 0.9975 |
| Spark Decision Trees MLlib 1.6 | | | | | | | | | | | | | | | | | | |
| Normal | 0.3751 | 0.2983 | 0.3301 | 0.5703 | 0.5445 | 0.5545 | 0.3848 | 0.0252 | 0.3548 | 0.5740 | 0.5003 | 0.5629 | 0.6505 | 0.5017 | 0.6107 | 0.7115 | 0.6259 | 0.6865 |
| ROS-100 | 0.9973 | 0.9941 | 0.9960 | 0.9973 | 0.9941 | 0.9960 | 0.9496 | 0.9153 | 0.9258 | 0.9496 | 0.9157 | 0.9262 | 0.9977 | 0.9483 | 0.9954 | 0.9977 | 0.9495 | 0.9954 |
| ROS-130 | 0.9957 | 0.9906 | 0.9961 | 0.9957 | 0.9906 | 0.9961 | 0.9464 | 0.8993 | 0.9293 | 0.9465 | 0.9002 | 0.9293 | 0.9972 | 0.9449 | 0.9965 | 0.9972 | 0.9463 | 0.9965 |
| RUS | 0.9970 | 0.9936 | 0.9975 | 0.9970 | 0.9936 | 0.9975 | 0.9473 | 0.9156 | 0.9317 | 0.9473 | 0.9158 | 0.9317 | 0.9971 | 0.9720 | 0.9966 | 0.9971 | 0.9723 | 0.9966 |
| Spark Support Vector Machines MLlib 1.6 | | | | | | | | | | | | | | | | | | |
| Normal (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Normal (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Normal (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.8486 | 0.8467 | 0.8482 | 0.8517 | 0.8482 | 0.8496 | 0.9682 | 0.9581 | 0.9677 | 0.9684 | 0.9585 | 0.9679 |
| ROS-100 (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-130 (0.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.7719 | 0.7786 | 0.7779 | 0.7929 | 0.7950 | 0.7961 | 0.9708 | 0.9612 | 0.9683 | 0.9709 | 0.9615 | 0.9685 |
| ROS-130 (0.5) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-130 (1.0) | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| Spark Logistic Regression MLlib 1.6 | | | | | | | | | | | | | | | | | | |
| Normal | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 | 0.3994 | 0.3663 | 0.3943 | 0.5012 | 0.4848 | 0.4981 | 0.2861 | 0.2867 | 0.2725 | 0.5028 | 0.5032 | 0.4989 | 0.0815 | 0.0665 | 0.0677 | 0.5007 | 0.4995 | 0.4996 |
| ROS-130 | 0.4056 | 0.3925 | 0.4060 | 0.5006 | 0.5089 | 0.5003 | 0.3008 | 0.3091 | 0.2954 | 0.5027 | 0.5054 | 0.5012 | 0.1416 | 0.1173 | 0.1274 | 0.5018 | 0.4987 | 0.4999 |
| Spark Naive Bayes MLlib 1.6 | | | | | | | | | | | | | | | | | | |
| Normal | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 | 0.0000 | 0.0000 | 0.0000 | 0.5000 | 0.5000 | 0.5000 |
| ROS-100 | 0.4070 | 0.3943 | 0.4002 | 0.4990 | 0.4981 | 0.4949 | 0.4182 | 0.4371 | 0.4164 | 0.5009 | 0.5113 | 0.4999 | 0.1365 | 0.1498 | 0.1180 | 0.4996 | 0.5016 | 0.4972 |
| ROS-130 | 0.0171 | 0.4060 | 0.0172 | 0.5001 | 0.5003 | 0.5001 | 0.4823 | 0.4991 | 0.4825 | 0.4997 | 0.5202 | 0.4985 | 0.2067 | 0.2163 | 0.1953 | 0.5003 | 0.5024 | 0.4979 |
| MapReduce Random Forest Mahout 0.9 | | | | | | | | | | | | | | | | | | |
| Normal | 0.7178 | 0.6652 | 0.6864 | 0.7576 | 0.7212 | 0.7356 | | | | | | | | | | | | |
| ROS-100 | 0.9903 | 0.9786 | 0.9859 | 0.9903 | 0.9789 | 0.9860 | | | | | | | | | | | | |
| ROS-130 | 0.9905 | 0.9783 | 0.9846 | 0.9905 | 0.9785 | 0.9847 | | | | | | | | | | | | |
| Unsupervised Algorithms | | | | | | | | | | | | | | | | | | |
| RBH | 0.8069 | 0.8052 | 0.8491 | 0.8255 | 0.8242 | 0.8605 | | | | | | | | | | | | |
| RSD 0.2 1e-20 | 0.9309 | 0.9038 | 0.9654 | 0.9333 | 0.9092 | 0.966 | | | | | | | | | | | | |
| RSD 0.5 1e-10 | 0.9426 | 0.9277 | 0.9818 | 0.9442 | 0.9294 | 0.9819 | | | | | | | | | | | | |
| RSD 0.8 1e-05 | 0.9472 | 0.9373 | 0.9876 | 0.9486 | 0.9374 | 0.9877 | | | | | | | | | | | | |
| OMA | 0.7311 | 0.7264 | 0.9388 | 0.7673 | 0.9163 | 0.9407 | | | | | | | | | | | | |
            
  1. Supervised algorithm performance is presented for the alignment-based, alignment-free, and alignment-based + alignment-free feature combinations. The best results in each dataset are in boldface and the overall best results are underlined. The Random Oversampling (ROS) pre-processing is accompanied by the corresponding resampling size value. RSD parameter values are the divergence and E-value thresholds. Support Vector Machines are listed with their regularization parameter values.
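
Many rows in the table pair a G-Mean of 0.0000 with an AUC of 0.5000. The short sketch below illustrates why that combination arises. It assumes the usual definitions, which the table itself does not state: G-Mean as the geometric mean of sensitivity and specificity, and AUC for a hard (single operating point) classifier as their arithmetic mean. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return tpr, tnr


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return math.sqrt(tpr * tnr)


def single_point_auc(tp, fn, tn, fp):
    """AUC of a classifier with one operating point: (TPR + TNR) / 2."""
    tpr, tnr = rates(tp, fn, tn, fp)
    return (tpr + tnr) / 2.0


# Hypothetical imbalanced test set: 50 ortholog pairs, 5000 non-ortholog pairs.
# A classifier that labels every pair as non-ortholog finds no positives.
majority_only = dict(tp=0, fn=50, tn=5000, fp=0)
print(g_mean(**majority_only))            # 0.0
print(single_point_auc(**majority_only))  # 0.5
```

Under these assumptions, a model that never predicts the minority (ortholog) class scores 0 on G-Mean and 0.5 on AUC, which is consistent with the pattern seen in the rows without resampling.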