Table 4 Performance comparison with other ensemble approaches

From: Comprehensive ensemble in QSAR prediction for drug discovery

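Column groups: limited ensembles, split into method ensembles (PubChem, ECFP, MACCS) and representation ensembles (RF, SVM, GBM, NN, NN(+SMILES)); then the comprehensive ensemble (average, meta-learning).
BioAssay PubChem ECFP MACCS RF SVM GBM NN NN(+SMILES) average meta-learning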
1851_1a2 0.921 0.922 0.910 0.931 0.920 0.907 0.937 0.941 0.934 0.943
1851_2c19 0.875 0.889 0.879 0.893 0.887 0.869 0.902 0.905 0.900 0.908
1851_2c9 0.878 0.885 0.866 0.888 0.882 0.865 0.899 0.905 0.898 0.908
1851_2d6 0.870 0.869 0.853 0.880 0.869 0.852 0.884 0.886 0.884 0.892
1851_3a4 0.890 0.902 0.874 0.898 0.901 0.881 0.913 0.919 0.914 0.920
1915 0.729 0.721 0.750 0.766 0.728 0.739 0.747 0.750 0.755 0.764
2358 0.758 0.781 0.780 0.805 0.780 0.772 0.805 0.803 0.803 0.807
463213 0.669 0.672 0.669 0.689 0.671 0.666 0.682 0.684 0.689 0.694
463215 0.604 0.603 0.639 0.636 0.604 0.623 0.623 0.624 0.627 0.634
488912 0.674 0.682 0.676 0.698 0.668 0.667 0.695 0.698 0.698 0.700
488915 0.720 0.719 0.699 0.731 0.711 0.700 0.732 0.737 0.735 0.739
488917 0.811 0.815 0.785 0.824 0.808 0.782 0.832 0.838 0.834 0.841
488918 0.777 0.783 0.743 0.780 0.782 0.752 0.793 0.799 0.799 0.801
492992 0.820 0.829 0.795 0.854 0.818 0.812 0.836 0.845 0.845 0.862
504607 0.710 0.687 0.682 0.708 0.701 0.703 0.698 0.706 0.721 0.726
624504 0.879 0.875 0.867 0.896 0.880 0.878 0.892 0.900 0.897 0.904
651739 0.795 0.806 0.774 0.800 0.776 0.783 0.803 0.807 0.804 0.809
651744 0.892 0.902 0.868 0.890 0.882 0.879 0.899 0.905 0.901 0.909
652065 0.795 0.791 0.784 0.807 0.804 0.803 0.813 0.822 0.826 0.832
average 0.793 0.796 0.784 0.809 0.793 0.786 0.810 0.814 0.814 0.821
  1. All AUC values except those in the last two columns come from limited ensembles; the last two columns come from the comprehensive ensemble. The first three columns are method ensembles: each fixes the molecular fingerprint named in the header and combines the different learning methods. The next five columns are representation ensembles: each fixes the learning method named in the header and combines the different chemical compound representations. All ensembles except the final meta-learning column combine their models by uniform averaging. Each value is the AUC averaged over five repeated experiments. In the original table, the top three values are marked in bold.
  2. NN(+SMILES) is a representation ensemble of NN-based models trained on diversified input representations: the three fingerprints (PubChem, ECFP, MACCS) plus SMILES strings.
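As a consistency check on the "average" row, this short Python sketch recomputes each column mean from the 19 per-assay AUCs in Table 4 and counts the assays on which the meta-learning column holds the highest value. The variable names (AUC, COLUMNS, REPORTED_AVERAGE, meta_best) are illustrative; only the numbers are taken from the table.

```python
# Per-assay AUCs transcribed from Table 4, in header column order.
COLUMNS = ["PubChem", "ECFP", "MACCS", "RF", "SVM", "GBM", "NN",
           "NN(+SMILES)", "average", "meta-learning"]
AUC = {
    "1851_1a2":  [0.921, 0.922, 0.910, 0.931, 0.920, 0.907, 0.937, 0.941, 0.934, 0.943],
    "1851_2c19": [0.875, 0.889, 0.879, 0.893, 0.887, 0.869, 0.902, 0.905, 0.900, 0.908],
    "1851_2c9":  [0.878, 0.885, 0.866, 0.888, 0.882, 0.865, 0.899, 0.905, 0.898, 0.908],
    "1851_2d6":  [0.870, 0.869, 0.853, 0.880, 0.869, 0.852, 0.884, 0.886, 0.884, 0.892],
    "1851_3a4":  [0.890, 0.902, 0.874, 0.898, 0.901, 0.881, 0.913, 0.919, 0.914, 0.920],
    "1915":      [0.729, 0.721, 0.750, 0.766, 0.728, 0.739, 0.747, 0.750, 0.755, 0.764],
    "2358":      [0.758, 0.781, 0.780, 0.805, 0.780, 0.772, 0.805, 0.803, 0.803, 0.807],
    "463213":    [0.669, 0.672, 0.669, 0.689, 0.671, 0.666, 0.682, 0.684, 0.689, 0.694],
    "463215":    [0.604, 0.603, 0.639, 0.636, 0.604, 0.623, 0.623, 0.624, 0.627, 0.634],
    "488912":    [0.674, 0.682, 0.676, 0.698, 0.668, 0.667, 0.695, 0.698, 0.698, 0.700],
    "488915":    [0.720, 0.719, 0.699, 0.731, 0.711, 0.700, 0.732, 0.737, 0.735, 0.739],
    "488917":    [0.811, 0.815, 0.785, 0.824, 0.808, 0.782, 0.832, 0.838, 0.834, 0.841],
    "488918":    [0.777, 0.783, 0.743, 0.780, 0.782, 0.752, 0.793, 0.799, 0.799, 0.801],
    "492992":    [0.820, 0.829, 0.795, 0.854, 0.818, 0.812, 0.836, 0.845, 0.845, 0.862],
    "504607":    [0.710, 0.687, 0.682, 0.708, 0.701, 0.703, 0.698, 0.706, 0.721, 0.726],
    "624504":    [0.879, 0.875, 0.867, 0.896, 0.880, 0.878, 0.892, 0.900, 0.897, 0.904],
    "651739":    [0.795, 0.806, 0.774, 0.800, 0.776, 0.783, 0.803, 0.807, 0.804, 0.809],
    "651744":    [0.892, 0.902, 0.868, 0.890, 0.882, 0.879, 0.899, 0.905, 0.901, 0.909],
    "652065":    [0.795, 0.791, 0.784, 0.807, 0.804, 0.803, 0.813, 0.822, 0.826, 0.832],
}
REPORTED_AVERAGE = [0.793, 0.796, 0.784, 0.809, 0.793, 0.786, 0.810, 0.814, 0.814, 0.821]

# Column means over all assays, rounded to three decimals like the table.
means = [round(sum(row[j] for row in AUC.values()) / len(AUC), 3)
         for j in range(len(COLUMNS))]

# Assays whose highest AUC is the meta-learning (last) column.
meta_best = [aid for aid, row in AUC.items() if row[-1] == max(row)]

for name, mean, reported in zip(COLUMNS, means, REPORTED_AVERAGE):
    print(f"{name:>13}: recomputed {mean:.3f}, reported {reported:.3f}")
print(f"meta-learning is highest on {len(meta_best)} of {len(AUC)} assays")
# -> meta-learning is highest on 17 of 19 assays
```

Every recomputed mean matches the reported "average" row. The two assays where meta-learning is not the top value are 1915 (RF is higher) and 463215 (MACCS and RF are higher).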
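Footnote 1 separates the ensembles by which axis varies: a method ensemble fixes the fingerprint and varies the learner, a representation ensemble fixes the learner and varies the input, and the comprehensive average pools every model with equal weight. This minimal Python sketch shows that bookkeeping. It assumes each trained model's scores on a shared test set sit in a dict keyed by (method, fingerprint). The dict layout, the function names and the toy scores are all hypothetical and are not the authors' code. AUC is computed with the pairwise (Mann-Whitney) formulation.

```python
from itertools import product

# Hypothetical model pool: learners and fingerprints taken from Table 4's headers.
METHODS = ["RF", "SVM", "GBM", "NN"]
FINGERPRINTS = ["PubChem", "ECFP", "MACCS"]


def uniform_average(score_lists):
    """Element-wise mean of several per-compound score lists (equal weights)."""
    k = len(score_lists)
    return [sum(scores) / k for scores in zip(*score_lists)]


def method_ensemble(preds, fingerprint):
    """Fix one fingerprint and average over the learning methods."""
    return uniform_average([preds[(m, fingerprint)] for m in METHODS])


def representation_ensemble(preds, method, extra=()):
    """Fix one learning method and average over the fingerprints.

    `extra` holds further score lists from the same method, e.g. a
    SMILES-based NN for an NN(+SMILES)-style ensemble.
    """
    return uniform_average([preds[(method, fp)] for fp in FINGERPRINTS] + list(extra))


def comprehensive_average(preds, extra=()):
    """Average every model in the pool, whatever its method or representation."""
    return uniform_average(list(preds.values()) + list(extra))


def auc(labels, scores):
    """ROC AUC as P(random active scores above random inactive); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 4 compounds (first and third are active), 12 hypothetical models.
labels = [1, 0, 1, 0]
preds = {key: [0.9, 0.2, 0.6, 0.4] for key in product(METHODS, FINGERPRINTS)}
preds[("SVM", "MACCS")] = [0.3, 0.8, 0.7, 0.1]  # one deliberately weak model

print(auc(labels, preds[("SVM", "MACCS")]))          # 0.5
print(auc(labels, method_ensemble(preds, "MACCS")))  # 1.0
print(auc(labels, comprehensive_average(preds)))     # 1.0
```

In this toy run the weak SVM/MACCS model scores an AUC of 0.5 on its own. Even so, the uniformly averaged MACCS method ensemble ranks every active above every inactive, which is the basic intuition for combining diverse models.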
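The meta-learning column replaces uniform averaging with a second-level model that learns how much to trust each first-level prediction. This table does not say which meta-learner was used, so the Python sketch below substitutes plain logistic regression fitted by batch gradient descent. That choice is an assumption and not the authors' method, and fit_stacker, predict_stacker and the toy scores are hypothetical. In practice the first-level scores passed to the meta-learner should be out-of-fold predictions. Otherwise the meta-learner tends to favor models that simply memorized their training compounds.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(meta_features, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    meta_features[i][j] is first-level model j's score for compound i; these
    are assumed to be out-of-fold predictions.
    """
    n, k = len(meta_features), len(meta_features[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(meta_features, labels):
            # Gradient of the mean log-loss: (predicted - label) * feature.
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_stacker(weights, bias, meta_features):
    """Combine first-level scores with the learned (non-uniform) weights."""
    return [sigmoid(bias + sum(w * v for w, v in zip(weights, x))) for x in meta_features]


# Toy out-of-fold scores from two hypothetical first-level models:
# column 0 separates actives from inactives, column 1 is uninformative.
X = [[0.9, 0.5], [0.8, 0.4], [0.7, 0.6], [0.2, 0.5], [0.3, 0.6], [0.1, 0.4]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = fit_stacker(X, y)
stacked = predict_stacker(weights, bias, X)
print(weights)  # the informative model receives the larger weight
```

On this toy data the learned weight on the informative model is clearly larger in magnitude than the weight on the uninformative one. A uniform average cannot express that preference, because it weights every model equally.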