To demonstrate the utility and performance of gcMECM, we analyzed the missense, start_lost, stop_gained, and stop_lost mutations from TCGA breast invasive carcinoma (TCGA-BRCA) and lung adenocarcinoma (TCGA-LUAD) to identify modules with mutually exclusive mutation patterns [17, 18]. In the TCGA-BRCA data, we identified 9451 genes mutated in 985 samples. Of these, 3784 genes showed negative correlations (Fisher’s exact test p value < 0.001). A total of 6 modules were detected, ranging in size from 155 to 1106 genes. Similarly, in the TCGA-LUAD data, we found 12,683 genes mutated in 565 samples, of which 4440 genes showed negative correlations (Fisher’s exact test p value < 0.001). A total of 7 modules were identified, ranging in size from 85 to 1114 genes.
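The negative-correlation filter described above can be illustrated with a minimal sketch. It assumes a one-sided (lower-tail) Fisher's exact test on the 2x2 table of two genes' mutation calls, which is one common way to test for fewer co-mutations than expected; the text does not state which alternative hypothesis was used. The function name `mutual_exclusivity_pvalue` and the toy vectors are hypothetical and are not part of gcMECM.

```python
from math import comb


def mutual_exclusivity_pvalue(mut_a, mut_b):
    """One-sided Fisher's exact test p value for fewer co-mutations than expected.

    mut_a, mut_b: equal-length sequences of 0/1 mutation calls, one per sample.
    (Hypothetical helper for illustration; not the gcMECM implementation.)
    """
    if len(mut_a) != len(mut_b):
        raise ValueError("mutation vectors must cover the same samples")
    n = len(mut_a)                      # total samples
    n_a = sum(1 for a in mut_a if a)    # samples with gene A mutated
    n_b = sum(1 for b in mut_b if b)    # samples with gene B mutated
    both = sum(1 for a, b in zip(mut_a, mut_b) if a and b)
    # With fixed margins, the co-mutation count is hypergeometric.
    # The lower tail P(X <= both) is small when the genes rarely co-occur.
    lowest = max(0, n_a + n_b - n)
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k)
               for k in range(lowest, both + 1))
    return tail / comb(n, n_b)


# Toy data: two genes mutated in disjoint halves of 10 samples.
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p_value = mutual_exclusivity_pvalue(gene_a, gene_b)
print(f"p = {p_value:.6f}")
```

In a real analysis, each gene pair would be tested in this way and pairs passing the chosen threshold (p < 0.001 in the text) would be retained.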
We next mapped the modules from TCGA-LUAD and TCGA-BRCA onto the RAS pathway to identify subnetworks, which reduces complexity and can be used to detect biologically relevant patterns. This pathway is critical in carcinogenesis and includes genes involved in oncogenic signaling, cell cycle, DNA replication, and DNA repair. Genes in this pathway are frequently altered in different cancers, including AKT1, EGFR, KRAS, and STK11 in lung cancer and AKT1, BRCA2, ERBB2, and PIK3CA in breast cancer [19].
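As a rough illustration of the mapping step, the sketch below restricts each module to the genes it shares with a pathway gene list. This is an assumption: the text does not specify how the mapping was performed, and a pathway-aware method could also use pathway edges. The function `map_modules_to_pathway`, the `min_genes` parameter, and the toy module membership are all hypothetical.

```python
def map_modules_to_pathway(modules, pathway_genes, min_genes=2):
    """Restrict each module to pathway genes and keep non-trivial overlaps.

    modules: dict mapping module name -> iterable of gene symbols.
    pathway_genes: iterable of gene symbols in the pathway.
    min_genes: hypothetical cutoff; overlaps smaller than this are dropped.
    """
    pathway = set(pathway_genes)
    subnetworks = {}
    for name, genes in modules.items():
        shared = sorted(set(genes) & pathway)
        if len(shared) >= min_genes:
            subnetworks[name] = shared
    return subnetworks


# Toy input for illustration only; this is not gcMECM output.
toy_modules = {
    "module_1": ["KRAS", "EGFR", "GENE_A", "GENE_B"],
    "module_2": ["STK11", "GENE_C"],
}
toy_pathway = ["AKT1", "EGFR", "KRAS", "STK11"]

subnetworks = map_modules_to_pathway(toy_modules, toy_pathway)
print(subnetworks)
```

Here `module_2` is dropped because it shares only one gene with the toy pathway list.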
Two subnetworks in TCGA-LUAD, KRAS-SHC3 and BRCA2-FANCA, were found to have mutually exclusive mutations, and more than half of their genes are present in the COSMIC Cancer Gene Census [20] (Fig. 2A). Most genes have a low mutation rate; only KRAS and PDGFRA have a mutation frequency greater than 5%. The KRAS-SHC3 subnetwork captures the upstream signaling component of the RAS pathway, which involves the ERBB signaling pathway, the VEGF-PDGFR signaling pathway, and the MAPK signaling pathway, as shown by Gene Ontology and KEGG analyses with g:Profiler [21]. The BRCA2-FANCA subnetwork is related to meiotic cell cycle process, cell signal transduction by p53 class mediator, DNA replication, and homologous recombination. The distinct biological functions of these two subnetworks suggest that mutual exclusivity among functionally related genes could be used to identify cancer-relevant genes, especially when the subnetworks also include well-established cancer genes.
Three subnetworks, ERBB2-FGFR2, BRAF-SCRIB, and BRCA2-FANCA, were identified in TCGA-BRCA after mapping to the RAS pathway (Fig. 2B). The ERBB2-FGFR2 subnetwork is related to the ERBB signaling pathway, and the BRCA2-FANCA subnetwork is linked to DNA repair and meiotic cell cycle process; these resemble the KRAS-SHC3 and BRCA2-FANCA subnetworks in TCGA-LUAD, respectively. The BRAF-SCRIB subnetwork is found to regulate MAP kinase activity and ErbB signaling pathways, which are linked to many cancers, including melanoma and lung, ovarian, breast, and prostate cancers [22]. All genes in these three subnetworks have mutation rates below 5%. Each subnetwork consists of genes with similar functions, so its genes can be used together to group samples into mutated and non-mutated categories for survival analysis. In the ERBB2-FGFR2 subnetwork, the survival difference between the two groups is not statistically significant at the individual gene level. However, when samples are divided into two groups based on the mutation status of all genes in the ERBB2-FGFR2 subnetwork, the mutated group exhibits significantly lower survival (FDR < 0.05). These results demonstrate that integrating subnetworks of mutually exclusive mutations with pathways and clinical features can aid in interpreting the biological functions of the subnetworks.
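The subnetwork-level grouping can be sketched as follows. A sample is labeled mutated if it carries a mutation in any gene of the subnetwork, and the two groups are then compared with a two-group log-rank test. The log-rank test is an assumption, since the text does not name the survival test; the multiple-testing (FDR) adjustment is not shown. The function names and the survival times are hypothetical toy values, not TCGA data.

```python
from math import erfc, sqrt


def subnetwork_mutated(sample_mutations, subnetwork_genes):
    """True if the sample carries a mutation in any subnetwork gene."""
    return bool(set(sample_mutations) & set(subnetwork_genes))


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value).

    times_*: follow-up times; events_*: 1 for death, 0 for censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)          # deaths at t
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Toy example: group samples by subnetwork mutation status, then compare.
subnetwork = ["ERBB2", "FGFR2"]
samples = {
    "s1": (["ERBB2"], 1, 1), "s2": (["FGFR2", "TP53"], 2, 1),
    "s3": (["ERBB2"], 3, 1), "s4": (["FGFR2"], 4, 1), "s5": (["ERBB2"], 5, 1),
    "s6": (["TP53"], 10, 1), "s7": ([], 11, 1), "s8": ([], 12, 1),
    "s9": (["TP53"], 13, 1), "s10": ([], 14, 1),
}
mutated = [(t, e) for genes, t, e in samples.values()
           if subnetwork_mutated(genes, subnetwork)]
wild_type = [(t, e) for genes, t, e in samples.values()
             if not subnetwork_mutated(genes, subnetwork)]
chi2, p_value = logrank_test([t for t, _ in mutated], [e for _, e in mutated],
                             [t for t, _ in wild_type], [e for _, e in wild_type])
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Pooling rarely mutated genes in this way enlarges the mutated group, which is one reason a subnetwork-level comparison can reach significance when individual genes do not.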