Table 5 Overlap between top genes and gene sets for different classifiers

From: Prediction of breast cancer prognosis using gene set statistics provides signature stability and biological context

| Classifier | # | MSigDB set | p-value | matches | set size |
|---|---|---|---|---|---|
| CC | 1 | GNF2_MKI67 | < 1.00 × 10^-40 | 31 | 47 |
| | 2 | GNF2_TTK | < 1.00 × 10^-40 | 29 | 57 |
| | 3 | GNF2_CCNA2 | < 1.00 × 10^-40 | 48 | 99 |
| | 4 | GNF2_HMMR | < 1.00 × 10^-40 | 42 | 78 |
| | 5 | GNF2_SMC2L1 | < 1.00 × 10^-40 | 26 | 51 |
| | 6 | GNF2_CDC20 | < 1.00 × 10^-40 | 46 | 91 |
| | 7 | GNF2_ESPL1 | < 1.00 × 10^-40 | 27 | 58 |
| | 8 | GNF2_H2AFX | < 1.00 × 10^-40 | 24 | 54 |
| | 9 | GNF2_RRM2 | < 1.00 × 10^-40 | 32 | 68 |
| | 10 | chr1q11 | 2.32 × 10^-6 | 2 | 4 |
| SVM | 1 | chr7q12 | 6.23 × 10^-4 | 1 | 1 |
| | 2 | chr3q11 | 1.00 | 0 | 8 |
| | 3 | chrxq | 1.00 | 0 | 2 |
| | 4 | BYSTRYKH_RUNX1_TARGETS_GLOCUS | 8.06 × 10^-3 | 1 | 13 |
| | 5 | TESTIS_EXPRESSED_GENES | 7.28 × 10^-7 | 4 | 107 |
| | 6 | chr22q | 1.00 | 0 | 6 |
| | 7 | REGULATION_OF_G_PROTEIN_COUPLED_RECEPTOR_PROTEIN_SIGNALING_PATHWAY | 4.28 × 10^-4 | 2 | 48 |
| | 8 | chr11p14 | 1.00 | 0 | 20 |
| | 9 | TERCPATHWAY | 1.00 | 0 | 15 |
| | 10 | chr1q41 | 2.02 × 10^-4 | 2 | 33 |
| LR | 1 | chr3q11 | 1.00 | 0 | 8 |
| | 2 | chr22q | 1.00 | 0 | 6 |
| | 3 | TERCPATHWAY | 1.00 | 0 | 15 |
| | 4 | chrxq | 1.00 | 0 | 2 |
| | 5 | BYSTRYKH_RUNX1_TARGETS_GLOCUS | 8.06 × 10^-3 | 1 | 13 |
| | 6 | HSA00130_UBIQUINONE_BIOSYNTHESIS | 1.00 | 0 | 8 |
| | 7 | chr20p | 1.00 | 0 | 2 |
| | 8 | chr1q41 | 1.29 × 10^-6 | 3 | 33 |
| | 9 | chr3q12 | 1.00 | 0 | 23 |
| | 10 | BETA_TUBULIN_BINDING | 1.00 | 0 | 12 |

  1. Top 10 gene sets ranked by the set centroid statistic for each classifier, with the p-value for the number of top genes belonging to each set (Fisher's exact test, one-sided). CC is the centroid classifier, SVM is the support vector machine, and LR is logistic regression.
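
The one-sided Fisher's exact test named in the note can be sketched as an upper-tail hypergeometric probability: the chance of seeing at least the observed number of matches if the top genes were drawn at random from all genes. This is a minimal illustration, not the authors' code. The function name `overlap_p_value` and the parameters `n_top` (number of top genes) and `n_total` (number of genes in the background) are hypothetical, and the values 100 and 20000 below are illustrative assumptions that do not come from the table or the paper.

```python
from math import comb


def overlap_p_value(matches, set_size, n_top, n_total):
    """One-sided Fisher's exact test for gene/set overlap.

    Returns P(X >= matches), where X is the number of top genes that fall
    in a gene set of size `set_size` when `n_top` genes are drawn without
    replacement from `n_total` genes (hypergeometric null).
    """
    max_matches = min(set_size, n_top)
    if not 0 <= matches <= max_matches:
        raise ValueError("matches must lie between 0 and min(set_size, n_top)")
    if set_size > n_total or n_top > n_total:
        raise ValueError("set_size and n_top cannot exceed n_total")

    # Exact integer arithmetic: count the draws with k matches, for k >= matches.
    tail = sum(
        comb(set_size, k) * comb(n_total - set_size, n_top - k)
        for k in range(matches, max_matches + 1)
    )
    return tail / comb(n_total, n_top)


if __name__ == "__main__":
    # Hypothetical background: 100 top genes out of 20000 genes in total.
    N_TOP, N_TOTAL = 100, 20000
    for matches, set_size in [(0, 8), (1, 13), (4, 107)]:
        p = overlap_p_value(matches, set_size, N_TOP, N_TOTAL)
        print(f"matches={matches:>2} set_size={set_size:>3} p={p:.3g}")
```

With zero matches the upper tail covers every possible outcome, so the function returns 1.0, which is why rows with 0 matches report a p-value of 1.00. The sum uses exact integers and divides only once at the end, so very small p-values are not lost to floating-point underflow in intermediate terms.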