Yang Et Al, Supporting Text 1

1 1.1) Simulation study to find the required sample size and the optimized the parameter α 2 Theoretical Simulation – We performed two simulation studies to evaluate gene-set analysis methods in 3 the context of statistical power and type-I error, similar to a prior study [1]. The purpose of the first study 4 was to find a minimal sample size to guarantee statistical power for gene-set signature identification, 5 whereas the second was to decide the optimal parameter α for the FAIME algorithm and reveal the effect 6 of gene-set size when detecting gene-set signatures. We used four different simulation scenarios for each 7 study, considering strong and weak signal-to-noise ratios (1 and 0.5 respectively) and altered fractions of 8 differentially expressed (DE) genes (50% and 80% respectively). 9 In the first simulation study, we created mimic datasets with increased sample sizes and two gene


1.1) Simulation study to find the required sample size and the optimized the parameter α
Theoretical Simulation -We performed two simulation studies to evaluate gene-set analysis methods in the context of statistical power and type-I error, similar to a prior study [1].The purpose of the first study was to find a minimal sample size to guarantee statistical power for gene-set signature identification, whereas the second was to decide the optimal parameter α for the FAIME algorithm and reveal the effect of gene-set size when detecting gene-set signatures.We used four different simulation scenarios for each study, considering strong and weak signal-to-noise ratios (1 and 0.5 respectively) and altered fractions of differentially expressed (DE) genes (50% and 80% respectively).
In the first simulation study, we created mimic datasets with increased sample sizes and two genesets (GSs) each of a fixed GS size (30 out of 5000 genes), where one GS was differentially expressed and the other was not.Larger sample size noticeably increased statistical power but not type-I error.
Using 20 samples per group, we could identify GS signatures in strong signal-to-noise-ratio scenarios (Fig. S1-C3, D3, 60-95% statistical power), using either FAIME.α,GSEA, or GSVA.Additionally, all methods provided effective control of the type-I error rate when GS having more than 30 gene members.
GSEA and GSVA slightly outperformed FAIME.1 in scenarios with weak signal-to-noise-ratio (Fig. S1-A,     B), while all three algorithms performed equally in the scenarios with strong signal-to-noise-ratio (Fig. S1-C, D).We used the Bioconductor packages PGSEA and GSVA to apply the GSEA and GSVA algorithms to all gene-sets with five or more genes from the MSigDB.
In the second simulation study, we repeated all of the aforementioned steps but tested on simulated GSs of different gene sizes (x=10, 20, 80, and 100).For small GSs (10-20 genes), FAIME showed statistical advantages with consistently low type-I error (Fig. S1-a1, b1, b2).For other GSs (>=30 genes), the larger a GS size the lower a statistical power derived from the same sample size in all simulation scenarios, suggests that 40-60 samples per group are required to test larger GS (>100 genes).
Alternatively, applying a larger parameter α to FAIME will benefit the analysis of small GS via weighting more sharply towards the leading expression ranks (Equation 1).In both simulation studies, FAIME with α =1 by default worked more accurately than α =5 or 10, whereas the latter effectively controlled the type-I error for small GSs.

1.2) Summary of published LSC-associated gene signatures
Table S1 summarizes nine multiple-gene studies pertaining to LSCs, three of which the original authors checked prognosis in primary AML samples.These gene signatures can be divided into the three categories below.For the comparison between AML LSC+ and normal HSC+, we excluded those "HSC" samples from Table S1 with detectable expression of only CD34+ (neither the mature blood lineages nor their committed progenitors markers) because normal CD34+ mononuclear cells contain hematogones (B lymphocyte precursors) and CD34+ megakaryocytes.
Stemness -We obtained an union set of three published multi-gene signatures, which we refer to as "LSCvLPC" (Fig. S2).Three studies respectively identified a gene-list that significantly distinguishes LSC+ populations (CD34 + CD38 -) from more mature fraction of the leukemia clone, the leukemia progenitor cell (LPC) populations (CD34 + CD38 + ), purified from the same AML samples [2][3][4].Ishikawa et al demonstrated that such LSC+ exclusively recapitulates AML and retains self-renewal capacity in vivo [2].Gentles et al showed that expression of their gene-signature in bulk primary AML tumor samples was associated with clinical outcomes in four independent patient cohorts (n = 1047) [4].Unfortunately, few overlaps exist among these three signatures.Only the cell surface marker CD38 was expressed lower in LSC than in LPC (Fig. S2A1).Additionally, a potential non-stem-cell signature was involved, as both studies ignored the fact that the CD34 + CD38 + subpopulation may also resides LSCs [5][6][7][8].However, these three studies are more biologically sensitive by comparing pairwise cell sub-populations purified from the same patients.Therefore by joining these three published gene signatures, we could generally characterize leukemia stemness.
An improved study by Eppert et al verified four divergent LSC+ fractions using xenograft models [5].
They identified multiple LSC+ fractions in AML samples using a sensitive xenograft assay and then identified a LSC-specific signature more highly expressed in LSC+ than in LSC-.They showed this signature to be a significant and independent predictor of patient survival (n=445).They also identified a signature specific to normal HSC+ fractions but not normal mature fractions and showed its prognosis in primary AML [5].+ compartment [11].Based on the summed expression level of three out of the 50 genes, they suggested that a high transcript level of CD34+ cells was associated with significant unfavorable overall survivals in two independent cohorts (n=381) of normal karyotype AML.However, the statistical significance is mild as it can be achieved after trichotomizing bulk samples rather than dichotomizing.

2.1) Data process
Gene expression data.We downloaded the normalized gene expression values and transformed the values to a logarithmic scale (log2) when required.One RNA-seq dataset of normal HSC samples was provided by the authors [12].The measurements were scaled in Reads Per Kilobase of exon model per Million mapped reads (RPKM) format [13], and genes were annotated by Ensemble IDs.

2.2) Hypergeometric probability analysis
To estimate the probability of observing n=6 overlapping genes among three instances of random sampling (Table S2, three DNM GSs using FAIME.5 profiles), we did the following modeling: out of a space of N=22000 genes, we randomly picked a group of A=11 genes, recorded them and put them back, then repeated for a group of B=6 genes and C=19 genes.We wished to compute the probability that out of the 36=A+B+C recorded genes, exactly n=6 genes appear twice and exactly A+B+C-2n=24 genes appear once.
Note that for a gene to appear exactly twice, it must appear in two of the three groups and not in the third.In particular, any number 0 ≤ n-k ≤ n of the repeated genes may have one of its groups be C, which means the remaining k genes must be in both A and B. We can arbitrarily pick A, and given those 11 distinct genes, there are a total of ways to pick B and C. We can pick the k genes from A that will be repeated in B in ways, and we can pick the remaining B-k genes in B in There are a total of A+B-k genes in the union of A and B, A+B-2k=17-2k of which only appeared once.Since we didn't observe any gene appearing in all three of the groups but we needed to have n-k more genes repeated for a total of n repeated genes, we chose the n-k genes from the non-repeated pool of 17-2k genes in A and B to also appear in C.This could be done in ways, and we could Summing over all k's, we have our desired probability that out of the (A+B+C)=36 recorded genes, exactly (A+B+C-2n)=24 genes appear once and exactly n=6 genes appear twice is:

2.3) Gene Ontology sematic similarity evaluation
The Gene Ontology (GO) sematic similarity between pair-wise gene members of the identified 25-or 30gene signature was estimated using Lin's method [15].Given two genes x and y, Lin's method assigns a sematic similarity score, ) , GOSim to run the calculation (similarity="funSimMax", similarityTerm="relevance", normalization=TRUE) [16], respecting the GO biological process and GO molecular function respectively.To estimate the empirical p-value of an observed similarity score, we ran the same calculation for 1000 pairs of randomly selected genes.

2.4) Enrichment analysis (EA) on the published gene lists
For LSC stemness (LSC+ compared to LSC-), we focused on the gene-sets that were identified by two out of the three gene lists: the joint gene-set LSCvLPC, AML stemness, and normal stemness in Eppert study (Fig. S2A2, yellow circles).For LSC malignancy (AML LSC+ compared to normal HSC+), we focused on the gene-sets identified by both the jointed LSCvNHSC list (Fig. S2A2, dash-lined pink circles) and LSC highly expressed gene-sets in De Jonge's study (Fig. S2A2, blue circles).To interrogate the LSC stemness, we reported the gene-sets that were significantly enriched in two out of the three genelists (Fig. S2A2, yellow circles).Specifically for enriched gene-set in Eppert study (GES30377), we merged our EA identification with three additional author-reported LSC-associated gene-sets (BENPORATH PRC2 TARGETS, PARK HSC VS MULTIPOTENT PROGENITORS UP, and IVANOVA HEMATOPOIESIS EARLY PROGENITOR).We did so because these three gene-sets are LSC-specific (LSC p<0.05 but HSC p>0.05, see the Table S14 in the original publication) and can be mapped into the newest MsigDB v4.1 version.To interrogate the malignancy, we reported the gene-sets that were significantly enriched in both two gene-lists (Fig. S2A2, blue and pink circles).Note that there are no overlaps between stemness and malignancy for the LSC+ population, suggesting their distinct properties.

Limitation
Our study has some limitations.First, FAIME is designed to identify up-regulated or down-regulated genesets.However, genes in the same pathway are not always differentially expressed in the same direction.
Some disease condition associated pathways may contain both up-and down-expressed genes caused by feedback loops, such as the p53 pathway [18].Whereas GSVA and GSAA [19] can identify this type of concerted gene-set expression using a non-parametric KS-test.Second, based on an arbitrary cutoff of significance at the gene-set level, identifications that have borderline differential activities or modest effect size may be lost when applying inter-group comparison for both FAIME and GSVA methods.On the other hand, some false positive identifications met the criteria of significance, suggesting that we should apply FAIME/GSEA on the data with symmetric differential expression background at gene level.Third, the 'inter-dataset' normalization is a straightforward use of Z-scores.This type of standardization has been successfully applied to integrate gene-set scores of differentially expressed genes and of trait-associated genetic markers [19].Even so, a standard deviation parameter for the normalization of all gene-sets, including over-and under-represented gene-sets, may reduce the over-representation of some gene-sets while increasing the under-representation of others.In such cases, distinct standard deviation parameters for the over-and under-represented gene-sets are suggested for future discussion.Finally, gene-sets only reflect an approximation of biological functions or pathways and are pre-defined.Only a subset of genes within a set may contribute to a gene-set expression signature.Different gene-sets may have similar signatures pertaining to the same phenotype, owing to either an overlap between the gene-sets or co-regulation of non-overlapping gene-sets.Grouping correlated gene-sets and extending the interaction of their gene members, perhaps by modeling on additional information, is a potentially promising approach to define new functional gene-sets.
MSigDB employs only the official gene symbols and Entrez IDs, leading us to use the Bioconductor package biomaRt (version 2.16.0) to map all probes on a microarray to Entrez gene IDs as well as all Ensemble IDs of RNAseq to official gene symbols.To get the best coverage for custom microarray platforms, we also incorporated the corresponding custom annotation files downloaded from GEO.Functional gene-sets -We downloaded three categories of previously defined gene-sets from Molecular Signature Database (MSigDB, version 4.0)[14]: canonical representations of biological processes compiled by domain experts (from BIOCARTA, KEGG, and REACTOME) (N=1320), gene-sets representing expression signatures of genetic and chemical perturbations (CGP, N=3402), and transcription factor or microRNA targets based on conserved cis-regulatory motifs from a comparative analysis of the human, mouse, rat, and dog genomes (N=836).The average sizes of gene-members in these three gene-set categories vary from 29 to 233.CGP gene-sets have on average 45 gene-members per set (range of 5 to 1972).