Table 5 Overlap between top genes and gene sets for different classifiers

From: Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context

Classifier # MSigDB set p-value matches set size
CC 1 GNF2_MKI67 < 1.00 × 10^-40 31 47
  2 GNF2_TTK < 1.00 × 10^-40 29 57
  3 GNF2_CCNA2 < 1.00 × 10^-40 48 99
  4 GNF2_HMMR < 1.00 × 10^-40 42 78
  5 GNF2_SMC2L1 < 1.00 × 10^-40 26 51
  6 GNF2_CDC20 < 1.00 × 10^-40 46 91
  7 GNF2_ESPL1 < 1.00 × 10^-40 27 58
  8 GNF2_H2AFX < 1.00 × 10^-40 24 54
  9 GNF2_RRM2 < 1.00 × 10^-40 32 68
  10 chr1q11 2.32 × 10^-6 2 4
SVM 1 chr7q12 6.23 × 10^-4 1 1
  2 chr3q11 1.00 0 8
  3 chrxq 1.00 0 2
  4 BYSTRYKH_RUNX1_TARGETS_GLOCUS 8.06 × 10^-3 1 13
  5 TESTIS_EXPRESSED_GENES 7.28 × 10^-7 4 107
  6 chr22q 1.00 0 6
  7 REGULATION_OF_G_PROTEIN_COUPLED_RECEPTOR_PROTEIN_SIGNALING_PATHWAY 4.28 × 10^-4 2 48
  8 chr11p14 1.00 0 20
  9 TERCPATHWAY 1.00 0 15
  10 chr1q41 2.02 × 10^-4 2 33
LR 1 chr5q11 1.00 0 8
  2 chr22q 1.00 0 6
  3 TERCPATHWAY 1.00 0 15
  4 chrxq 1.00 0 2
  5 BYSTRYKH_RUNX1_TARGETS_GLOCUS 8.06 × 10^-3 1 13
  6 HSA00130_UBIQUINONE_BIOSYNTHESIS 1.00 0 8
  7 chr20p 1.00 0 2
  8 chr1q41 1.29 × 10^-6 3 33
  9 chr3q12 1.00 0 23
  10 BETA_TUBULIN_BINDING 1.00 0 12
  1. Top 10 gene sets ranked by the set centroid statistic for each classifier, with the p-value for the number of top genes belonging to each set (Fisher's exact test, one-sided). "Matches" is the number of top genes found in the set. CC is centroid classifier, SVM is support vector machine, LR is logistic regression.
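
The one-sided Fisher's exact test in the footnote can be illustrated with a short sketch. For an overlap table like this one, the enrichment p-value is the upper tail of a hypergeometric distribution: the probability of seeing at least the observed number of matches if the top genes were drawn at random. The function name `overlap_p_value` and the parameters `n_top` (number of top genes) and `n_universe` (total number of genes considered) are assumptions made for illustration; the table does not report those two counts, so the numbers in the usage note are hypothetical and do not reproduce the published p-values.

```python
from math import comb


def overlap_p_value(matches, set_size, n_top, n_universe):
    """One-sided (enrichment) Fisher's exact test for a gene-set overlap.

    Returns P(X >= matches), where X is the number of gene-set members
    found among n_top genes drawn without replacement from n_universe
    genes, of which set_size belong to the gene set.
    """
    upper = min(set_size, n_top)  # largest overlap that is possible
    total = comb(n_universe, n_top)  # all ways to choose the top genes
    tail = sum(
        comb(set_size, k) * comb(n_universe - set_size, n_top - k)
        for k in range(matches, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 genes in total, 4 top genes, a set of
    # 3 genes, 2 of which are among the top genes.
    print(overlap_p_value(2, 3, 4, 10))
    # Zero matches always gives a p-value of 1.
    print(overlap_p_value(0, 8, 100, 10000))
```

With zero matches the tail covers every possible outcome, which is why the rows with 0 matches report a p-value of 1.00 regardless of set size.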