
Table 4 Prediction accuracy in the 20 disease gene expression data sets

From: Random generalized linear model: a highly accurate and interpretable ensemble predictor

| Data set | RGLM | RF | RFbigmtry | Rpart | LDA | DLDA | KNN | SVM | SC |
|---|---|---|---|---|---|---|---|---|---|
| adenocarcinoma | 0.842 | 0.842 | 0.842 | 0.737 | 0.842 | 0.744 | 0.842 | 0.842 | 0.803 |
| brain | 0.881 | 0.810 | 0.833 | 0.762 | 0.810 | 0.929 | 0.881 | 0.786 | 0.929 |
| breast2 | 0.623 | 0.610 | 0.636 | 0.584 | 0.610 | 0.636 | 0.584 | 0.558 | 0.636 |
| breast3 | 0.705 | 0.695 | 0.716 | 0.611 | 0.695 | 0.705 | 0.669 | 0.674 | 0.700 |
| colon | 0.855 | 0.823 | 0.823 | 0.726 | 0.855 | 0.839 | 0.774 | 0.774 | 0.871 |
| leukemia | 0.921 | 0.895 | 0.921 | 0.816 | 0.868 | 0.974 | 0.974 | 0.763 | 0.974 |
| lymphoma | 0.968 | 1.000 | 1.000 | 0.903 | 0.960 | 0.984 | 0.984 | 1.000 | 0.984 |
| NCI60 | 0.902 | 0.869 | 0.869 | 0.738 | 0.885 | 0.902 | 0.852 | 0.869 | 0.918 |
| prostate | 0.931 | 0.892 | 0.902 | 0.853 | 0.873 | 0.627 | 0.804 | 0.853 | 0.912 |
| srbct | 1.000 | 0.944 | 0.984 | 0.921 | 0.857 | 0.905 | 0.952 | 0.873 | 1.000 |
| BrainTumor2 | 0.760 | 0.750 | 0.740 | 0.620 | 0.760 | 0.700 | 0.700 | 0.660 | 0.720 |
| DLBCL | 0.909 | 0.851 | 0.883 | 0.831 | 0.922 | 0.779 | 0.870 | 0.792 | 0.857 |
| lung1 | 0.931 | 0.931 | 0.931 | 0.828 | 0.914 | 0.931 | 0.931 | 0.897 | 0.914 |
| lung2 | 0.935 | 0.935 | 0.935 | 0.826 | 0.957 | 0.978 | 0.935 | 0.848 | 0.978 |
| lung3 | 0.901 | 0.901 | 0.887 | 0.803 | 0.873 | 0.859 | 0.831 | 0.859 | 0.887 |
| psoriasis1 | 0.989 | 0.994 | 0.989 | 0.978 | 0.994 | 0.989 | 0.989 | 0.983 | 0.989 |
| psoriasis2 | 0.963 | 0.988 | 0.976 | 0.963 | 0.976 | 0.963 | 0.963 | 0.963 | 0.963 |
| MSstage1 | 0.846 | 0.846 | 0.846 | 0.423 | 0.769 | 0.769 | 0.808 | 0.769 | 0.769 |
| MSdiagnosis1 | 0.963 | 0.926 | 0.926 | 0.556 | 0.889 | 0.889 | 0.963 | 0.926 | 0.926 |
| MSdiagnosis2 | 0.591 | 0.614 | 0.614 | 0.568 | 0.545 | 0.568 | 0.568 | 0.568 | 0.523 |
| MeanAccuracy | 0.871 | 0.856 | 0.863 | 0.752 | 0.843 | 0.833 | 0.844 | 0.813 | 0.863 |
| Rank | 1 | 4 | 2.5 | 9 | 6 | 7 | 5 | 8 | 2.5 |
| Pvalue | NA | 0.029 | 0.079 | 0.00014 | 0.0075 | 0.05 | 0.014 | 0.00042 | 0.37 |

  1. For each data set, prediction accuracy was estimated using 3-fold cross validation across 100 random partitions of the data into 3 folds. Mean accuracies across the 20 data sets and the resulting ranks are summarized at the bottom. The two-sided paired Wilcoxon test p-values in the last row can be used to determine whether the accuracy of RGLM differs significantly from that of each other predictor. Note that RGLM yields the highest mean accuracy.
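
The Rank row can be reproduced from the MeanAccuracy row if tied methods are assumed to share the average of the ranks they span, which is consistent with the value 2.5 reported for both RFbigmtry and SC. The following is a minimal Python sketch under that assumption; the helper name `rank_descending` is hypothetical and is not taken from the paper or its software.

```python
# Mean accuracies copied from the MeanAccuracy row of the table.
mean_accuracy = {
    "RGLM": 0.871, "RF": 0.856, "RFbigmtry": 0.863,
    "Rpart": 0.752, "LDA": 0.843, "DLDA": 0.833,
    "KNN": 0.844, "SVM": 0.813, "SC": 0.863,
}


def rank_descending(scores):
    """Rank keys from best (1) to worst; tied scores share the average rank.

    Assumption: ties are resolved by averaging, as the 2.5 entries suggest.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every method tied with the one at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        # Positions i..j (0-based) correspond to ranks i+1..j+1.
        average_rank = ((i + 1) + (j + 1)) / 2
        for name in ordered[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks


ranks = rank_descending(mean_accuracy)
for name in mean_accuracy:
    print(f"{name:10s} mean={mean_accuracy[name]:.3f} rank={ranks[name]}")
```

Ranking on the rounded means is enough here because the table reports them to three decimals; with unrounded accuracies the tie between RFbigmtry and SC might resolve differently.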