
Table 4 Performance comparison with other ensemble approaches

From: Comprehensive ensemble in QSAR prediction for drug discovery

Column groups: limited ensemble, method ensemble (PubChem, ECFP, MACCS); limited ensemble, representation ensemble (RF, SVM, GBM, NN, NN (+SMILES) ∗); comprehensive ensemble (average, meta-learning).

| BioAssay | PubChem | ECFP | MACCS | RF | SVM | GBM | NN | NN (+SMILES) ∗ | average | meta-learning |
|---|---|---|---|---|---|---|---|---|---|---|
| 1851_1a2 | 0.921 | 0.922 | 0.910 | 0.931 | 0.920 | 0.907 | 0.937 | 0.941 | 0.934 | 0.943 |
| 1851_2c19 | 0.875 | 0.889 | 0.879 | 0.893 | 0.887 | 0.869 | 0.902 | 0.905 | 0.900 | 0.908 |
| 1851_2c9 | 0.878 | 0.885 | 0.866 | 0.888 | 0.882 | 0.865 | 0.899 | 0.905 | 0.898 | 0.908 |
| 1851_2d6 | 0.870 | 0.869 | 0.853 | 0.880 | 0.869 | 0.852 | 0.884 | 0.886 | 0.884 | 0.892 |
| 1851_3a4 | 0.890 | 0.902 | 0.874 | 0.898 | 0.901 | 0.881 | 0.913 | 0.919 | 0.914 | 0.920 |
| 1915 | 0.729 | 0.721 | 0.750 | 0.766 | 0.728 | 0.739 | 0.747 | 0.750 | 0.755 | 0.764 |
| 2358 | 0.758 | 0.781 | 0.780 | 0.805 | 0.780 | 0.772 | 0.805 | 0.803 | 0.803 | 0.807 |
| 463213 | 0.669 | 0.672 | 0.669 | 0.689 | 0.671 | 0.666 | 0.682 | 0.684 | 0.689 | 0.694 |
| 463215 | 0.604 | 0.603 | 0.639 | 0.636 | 0.604 | 0.623 | 0.623 | 0.624 | 0.627 | 0.634 |
| 488912 | 0.674 | 0.682 | 0.676 | 0.698 | 0.668 | 0.667 | 0.695 | 0.698 | 0.698 | 0.700 |
| 488915 | 0.720 | 0.719 | 0.699 | 0.731 | 0.711 | 0.700 | 0.732 | 0.737 | 0.735 | 0.739 |
| 488917 | 0.811 | 0.815 | 0.785 | 0.824 | 0.808 | 0.782 | 0.832 | 0.838 | 0.834 | 0.841 |
| 488918 | 0.777 | 0.783 | 0.743 | 0.780 | 0.782 | 0.752 | 0.793 | 0.799 | 0.799 | 0.801 |
| 492992 | 0.820 | 0.829 | 0.795 | 0.854 | 0.818 | 0.812 | 0.836 | 0.845 | 0.845 | 0.862 |
| 504607 | 0.710 | 0.687 | 0.682 | 0.708 | 0.701 | 0.703 | 0.698 | 0.706 | 0.721 | 0.726 |
| 624504 | 0.879 | 0.875 | 0.867 | 0.896 | 0.880 | 0.878 | 0.892 | 0.900 | 0.897 | 0.904 |
| 651739 | 0.795 | 0.806 | 0.774 | 0.800 | 0.776 | 0.783 | 0.803 | 0.807 | 0.804 | 0.809 |
| 651744 | 0.892 | 0.902 | 0.868 | 0.890 | 0.882 | 0.879 | 0.899 | 0.905 | 0.901 | 0.909 |
| 652065 | 0.795 | 0.791 | 0.784 | 0.807 | 0.804 | 0.803 | 0.813 | 0.822 | 0.826 | 0.832 |
| average | 0.793 | 0.796 | 0.784 | 0.809 | 0.793 | 0.786 | 0.810 | 0.814 | 0.814 | 0.821 |

  1. All AUC values except those in the last two columns are based on limited subject ensembles, while the AUC values in the last two columns are from the comprehensive ensemble. The first three columns are method ensembles that consider various methods by fixing them to a target molecular fingerprint. The next five columns are representation ensembles that consider various chemical compound representations by fixing them to a learning method. Except for the final meta-learning approach, combining is based on uniform averaging. Each value is the averaged AUC from five repeated experiments (bold: top 3)
  2. ∗NN (+SMILES) is a representation ensemble that combines a set of NN-based models trained on diversified input representations: the three fingerprints (PubChem, ECFP, MACCS) and SMILES
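
The uniform averaging and AUC scoring described in the notes above can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names (`uniform_average`, `auc`), the two toy models and their predicted probabilities are all hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
from statistics import mean


def uniform_average(prob_lists):
    """Combine per-model predicted probabilities by unweighted averaging."""
    return [mean(ps) for ps in zip(*prob_lists)]


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active, 0 = inactive.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.35, 0.2, 0.8, 0.6]  # assumed output of one model
model_b = [0.6, 0.3, 0.7, 0.5, 0.4, 0.1]   # assumed output of another model

ensemble = uniform_average([model_a, model_b])

auc_a = auc(labels, model_a)
auc_b = auc(labels, model_b)
auc_ensemble = auc(labels, ensemble)

# A table entry would then be the mean AUC over repeated runs, e.g.
# mean([auc_run1, ..., auc_run5]) for five repetitions.
```

On this toy data the averaged scores rank every active above every inactive, although neither individual model does. The meta-learning column replaces the unweighted mean with a second-level learner trained on the individual models' predictions; that step is not sketched here.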