Figure 1 | BMC Bioinformatics

Figure 1

From: Modelling p-value distributions to improve theme-driven survival analysis of cancer transcriptome datasets

Association between geneset size and significance for breast cancer overall survival (OS). For each type of geneset (Biocarta, GO, and KEGG), 10,000 random genesets were generated at each of the sizes 20, 75, 150, 300, and 500 genes. Size 500 was omitted for Biocarta because the pool of genes was too small. For each geneset, hierarchical clustering was used to split the samples into two groups. A log-rank test was then performed to assess the difference in prognosis between the two groups, and the resulting p-value₁ was recorded. The negative base-10 logarithms of the p-values₁ are plotted against geneset size. Biocarta-like genesets appear to be more significant at around 150 genes (A). GO-like genesets show no clear relationship between significance and geneset size (B). KEGG-like genesets appear to be more significant as geneset size decreases (C).
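The procedure in the caption can be sketched as follows. For each random geneset, the samples are clustered into two groups, the groups are compared with a log-rank test, and −log10(p) is recorded. This is a minimal, dependency-free illustration and not the authors' implementation.

Several details are assumptions because the caption does not specify them:

- Clustering uses average linkage on Euclidean distance over unstandardized expression values.
- The log-rank p-value is taken from a chi-square distribution with 1 degree of freedom.
- All function names (`logrank_p`, `two_group_average_linkage`, `random_geneset_pvalue`, `simulate`) are hypothetical.
- The expression and survival data in the usage section are synthetic.

```python
import math
import random
from typing import Dict, List, Sequence


def logrank_p(times: Sequence[float], events: Sequence[int], groups: Sequence[int]) -> float:
    """Two-group log-rank test; returns the chi-square (1 df) p-value.

    times: follow-up time per sample; events: 1 = death observed, 0 = censored;
    groups: 0/1 group label per sample.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0        # summed hypergeometric variance of group-1 deaths
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def two_group_average_linkage(profiles: List[List[float]]) -> List[int]:
    """Agglomerative clustering (ASSUMED: average linkage, Euclidean) cut at 2 clusters."""
    n = len(profiles)
    dist = [[math.dist(profiles[a], profiles[b]) for b in range(n)] for a in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                avg = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair of clusters
        del clusters[j]
    labels = [0] * n
    for idx in clusters[1]:
        labels[idx] = 1
    return labels


def random_geneset_pvalue(expr: Dict[str, List[float]], times, events, size, rng) -> float:
    """Draw `size` random genes, cluster samples on them, return the log-rank p-value."""
    genes = rng.sample(sorted(expr), size)
    profiles = [[expr[g][s] for g in genes] for s in range(len(times))]
    return logrank_p(times, events, two_group_average_linkage(profiles))


def simulate(expr, times, events, sizes, n_sets, rng) -> Dict[int, List[float]]:
    """Return {size: [-log10 p, ...]} for n_sets random genesets per size."""
    return {
        size: [-math.log10(max(random_geneset_pvalue(expr, times, events, size, rng), 1e-300))
               for _ in range(n_sets)]
        for size in sizes
    }


if __name__ == "__main__":
    rng = random.Random(0)
    n_samples = 30
    # Synthetic data for illustration only
    expr = {f"gene{k}": [rng.gauss(0, 1) for _ in range(n_samples)] for k in range(200)}
    times = [rng.expovariate(0.1) for _ in range(n_samples)]
    events = [1 if rng.random() < 0.7 else 0 for _ in range(n_samples)]
    for size, scores in simulate(expr, times, events, (5, 20), 10, rng).items():
        print(size, round(sum(scores) / len(scores), 3))
```

The figure corresponds to running such a loop with 10,000 genesets per size and plotting the −log10(p) scores against size. On real data, a library implementation of linkage clustering and the log-rank test would normally replace these hand-written versions.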
