A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis

Background: Cancer arises through the accumulation of somatically acquired genetic mutations. An important question is the temporal order of somatic mutations during carcinogenesis. Answering it contributes to a better understanding of cancer biology and facilitates the identification of new therapeutic targets. A number of statistical and computational methods have been proposed to estimate the temporal order of mutations. However, they do not account for differences in the functional impact of mutations, so their estimates are likely to be obscured by passenger mutations that do not contribute to cancer progression. In addition, many methods infer the order of mutations at the gene level. This limits their power, because most genes have low mutation rates.

Results: In this paper, we develop a Probabilistic Approach for estimating the Temporal Order of Pathway mutations by leveraging functional Annotations of mutations (PATOPA). PATOPA infers the order of mutations at the pathway level. It uses a probabilistic model to characterize the likelihood that mutational events in different pathways occur in a given order. The functional impact of each mutation is incorporated so that mutations more integral to tumor development receive greater weight. A maximum likelihood method is used to estimate the model parameters and to infer the probability that one pathway is mutated before another. We evaluated PATOPA in simulation studies and on whole exome sequencing data from The Cancer Genome Atlas (TCGA). These analyses show that PATOPA accurately estimates the temporal order of pathway mutations, and they provide new biological insights into the carcinogenesis of colorectal and lung cancers.

Conclusions: PATOPA provides a useful tool for estimating the temporal order of mutations at the pathway level while leveraging functional annotations of mutations.
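The idea of a weighted likelihood for one pathway being mutated before another can be illustrated with a deliberately simplified model. This sketch is not PATOPA's likelihood. It assumes a hypothetical setting with two pathways, A and B. Each tumor mutated in exactly one of them counts as evidence that the mutated pathway came first, and that evidence is weighted by a functional-impact score assumed to lie in [0, 1]. Under this assumption the weighted log-likelihood is w_A log θ + w_B log(1 − θ), which is maximized at θ̂ = w_A / (w_A + w_B). The code checks this closed form against a grid search. The record fields (`a`, `b`, `score`) and all function names are hypothetical.

```python
import math

# Hypothetical toy model, NOT the PATOPA likelihood: theta is the probability
# that pathway A is mutated before pathway B. Only tumors mutated in exactly
# one of the two pathways are informative; each contributes its
# functional-impact score (assumed to lie in [0, 1]) as a weight.

def informative_weights(tumors):
    """Return (w_a, w_b): summed scores of A-only and B-only tumors."""
    w_a = sum(t["score"] for t in tumors if t["a"] and not t["b"])
    w_b = sum(t["score"] for t in tumors if t["b"] and not t["a"])
    return w_a, w_b

def weighted_log_likelihood(theta, tumors):
    """Weighted log-likelihood w_a*log(theta) + w_b*log(1 - theta)."""
    w_a, w_b = informative_weights(tumors)
    return w_a * math.log(theta) + w_b * math.log(1.0 - theta)

def weighted_order_mle(tumors):
    """Closed-form maximizer of the weighted log-likelihood."""
    w_a, w_b = informative_weights(tumors)
    if w_a + w_b == 0:
        raise ValueError("no informative tumors")
    return w_a / (w_a + w_b)

# Toy input, made up for illustration: pathway status per tumor and the
# functional-impact score of its informative mutation.
example_tumors = [
    {"a": True, "b": False, "score": 0.5},
    {"a": True, "b": False, "score": 0.25},
    {"a": True, "b": True, "score": 1.0},    # both mutated: ignored here
    {"a": False, "b": True, "score": 0.25},
    {"a": False, "b": False, "score": 0.0},  # neither mutated: ignored here
]

theta_hat = weighted_order_mle(example_tumors)

# Numerical check: a grid search over theta should land on the closed form.
grid = [i / 100 for i in range(1, 100)]
theta_grid = max(grid, key=lambda th: weighted_log_likelihood(th, example_tumors))

print(theta_hat, theta_grid)  # 0.75 0.75
```

In this toy model, a low-impact (likely passenger) mutation moves θ̂ less than a high-impact one. This is the intuition behind weighting by functional annotation, though the real model is richer.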

FIG. S2: The MAPK signaling pathway. Based on the KEGG MAPK signaling pathway (hsa04010), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].
FIG. S3: The PI3K signaling pathway. Based on the KEGG PI3K signaling pathway (hsa04151), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].

FIG. S4: The TGF-beta signaling pathway. Based on the KEGG TGF-beta signaling pathway (hsa04350), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].
FIG. S5: The p53 signaling pathway. Based on the KEGG p53 signaling pathway (hsa04115), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].
FIG. S6: The apoptosis signaling pathway. Based on the KEGG apoptosis signaling pathway (hsa04210), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].

FIG. S7: The cell cycle signaling pathway. Based on the KEGG cell cycle signaling pathway (hsa04110), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].
FIG. S8: The adherens junction signaling pathway. Based on the KEGG adherens junction signaling pathway (hsa04520), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].

FIG. S9: The VEGF signaling pathway. Based on the KEGG VEGF signaling pathway (hsa04370), we further selected "core" pathway genes as described in the pathway definition subsection of the main text. Our analysis only used the "core" pathway genes, which are highlighted in green. The figure was generated based on Pathview [4].

We focus on the Wnt, MAPK, PI3K, TGF-beta, and p53 signaling pathways because these are the pathways presented in the literature [3]. The figure compares the temporal orders of these pathways a) reported in the literature for combined colorectal tumors; b) inferred by PATOPA using TCGA rectal cancer data; and c) inferred by PATOPA using TCGA colon cancer data. For rectal cancer, PATOPA inferred the order Wnt → MAPK → PI3K → p53. For colon cancer, it inferred Wnt → MAPK → PI3K → TGF-beta. Both orders agree with the known sequence of biological events in colorectal cancer. The analyses differ from the literature in only two respects. In the rectal cancer analysis, the TGF-beta pathway was placed before the MAPK pathway. In the colon cancer analysis, the p53 signaling pathway was placed before the PI3K and TGF-beta signaling pathways.

The PolyPhen-2 scores were categorized as probably damaging (predicted with high confidence to affect protein function or structure), possibly damaging (predicted to affect protein function or structure), or benign (most likely lacking any phenotypic effect) [1]. The categorized data were downloaded from TCGA. For each pathway, we calculated the frequency of its mutations in each category. Left panel: rectal cancer; right panel: colon cancer.
FIG. S13: Order of pathway mutations for colorectal cancer inferred by H-CBN. To compare methods, we applied H-CBN [2] to the TCGA rectal (a) and colon (b) cancer mutation data. We ran H-CBN with default parameter settings to obtain estimated progression networks, which indicate the dependency orders of pathway mutations. We performed 100 bootstrap replicates for each cancer type. The fraction of replicates in which an order was recovered is shown beside the corresponding arrow.
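Bootstrap support values like those shown beside the arrows can be computed with a generic resampling loop. The sketch below uses a hypothetical `infer_edges` callable to stand in for the order-inference step. In practice this could be a wrapper that runs H-CBN on a resampled data set and returns its edges as (earlier, later) pathway pairs. The sketch does not call H-CBN itself, and resampling tumors with replacement is an assumption about how the replicates were formed. The toy inference function and data are purely illustrative.

```python
import random

def bootstrap_support(tumors, infer_edges, n_boot=100, seed=0):
    """Fraction of bootstrap replicates in which each (earlier, later) edge is inferred.

    `infer_edges` is a hypothetical placeholder for the order-inference method;
    it takes a list of tumors and returns a set of (earlier, later) pairs.
    """
    rng = random.Random(seed)
    n = len(tumors)
    counts = {}
    for _ in range(n_boot):
        # Resample tumors with replacement to form one bootstrap data set.
        sample = [tumors[rng.randrange(n)] for _ in range(n)]
        for edge in infer_edges(sample):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / n_boot for edge, c in counts.items()}

def toy_infer(sample):
    """Illustrative stand-in: call A earlier if A-only tumors outnumber B-only ones."""
    a_only = sum(1 for t in sample if t["a"] and not t["b"])
    b_only = sum(1 for t in sample if t["b"] and not t["a"])
    if a_only > b_only:
        return {("A", "B")}
    if b_only > a_only:
        return {("B", "A")}
    return set()

# Toy data, made up for illustration: 8 A-only tumors and 2 B-only tumors.
toy_tumors = [{"a": True, "b": False}] * 8 + [{"a": False, "b": True}] * 2
support = bootstrap_support(toy_tumors, toy_infer, n_boot=100, seed=1)
print(support)
```

Fixing the seed makes the support values reproducible across runs. Each value is the proportion of the `n_boot` replicates in which that edge appeared.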