Identification of genes and pathways involved in kidney renal clear cell carcinoma
© Yang et al.; licensee BioMed Central Ltd. 2014
Published: 16 December 2014
Kidney Renal Clear Cell Carcinoma (KIRC) is one of the most fatal genitourinary diseases and accounts for most malignant kidney tumours. KIRC has been shown to be resistant to radiotherapy and chemotherapy. As with many other types of cancer, there is no curative treatment for metastatic KIRC. Using advanced sequencing technologies, The Cancer Genome Atlas (TCGA) project of NIH/NCI-NHGRI has produced large-scale sequencing data, which provide unprecedented opportunities to reveal new molecular mechanisms of cancer. We combined differentially expressed gene, pathway and network analyses to gain new insights into the molecular mechanisms underlying the development of the disease.
Following an experimental design for identifying significant genes and pathways, we performed a comprehensive analysis of the sequencing data of 537 KIRC patients provided by TCGA. Differentially expressed genes were obtained from the RNA-Seq data, and pathway and network analyses were then performed. We identified 186 differentially expressed genes with significant p-values and large fold changes (P < 0.01, |log(FC)| > 5). The study not only confirmed a number of differentially expressed genes previously reported in the literature, but also provided new findings. We performed hierarchical clustering analysis using both genome-wide gene expression and the differentially expressed genes identified in this study. We revealed distinct groups of differentially expressed genes that can aid in the identification of subtypes of the cancer, and the hierarchical clustering analysis based on the gene expression profile and the differentially expressed genes suggested four subtypes. We found distinct Gene Ontology (GO) terms enriched in these groups of genes. Based on these findings, we built a support vector machine (SVM) based supervised-learning classifier to predict unknown samples, and the classifier achieved high accuracy and robust classification results. In addition, we identified a number of pathways (P < 0.04) that were significantly influenced by the disease. Some of the identified pathways have been implicated in cancers in the literature, while others have not previously been reported in the cancer. The network analysis led to the identification of significantly disrupted pathways and associated genes involved in the development of the disease. Furthermore, this study can provide a viable alternative approach for identifying effective drug targets.
Our study identified a set of differentially expressed genes and pathways in kidney renal clear cell carcinoma, and represents a comprehensive computational approach to analysing large-scale next-generation sequencing data. The pathway and network analyses suggested that information from differentially expressed genes can be used to identify aberrant upstream regulators. Identification of differentially expressed genes and altered pathways is important for effective biomarker identification, early cancer diagnosis and treatment planning. Combining differentially expressed genes with pathway and network analyses using intelligent computational approaches provides an unprecedented opportunity to identify upstream disease-causal genes and effective drug targets.
Cancer is not only complex, in that many genetic variations can contribute to malignant transformation, but also highly heterogeneous, in that genetic mechanisms can vary between patients of the same pathological type. Kidney Renal Clear Cell Carcinoma (KIRC) is the eighth most common cancer and is known to be the most lethal of all genitourinary tumours. In the United States, it accounts for approximately 65,000 new cases and approximately 13,000 deaths each year. The disease is known to be resistant to radiotherapy and chemotherapy, and very few cases have been reported to respond to immunotherapy. If KIRC is detected at a very early stage, it is potentially curable by surgical resection, while adjuvant therapies have not been proven beneficial. Although the recurrence rate is not very high, recurrence is still not uncommon. Moreover, there is no curative treatment for late-stage KIRC: the 2-year survival rate of patients with metastatic KIRC is less than 20% [4, 5]. Therefore, further investigation of the genomic alterations and underlying molecular mechanisms is essential for early diagnosis and treatment. As cancer is a consequence of the accumulation of genetic alterations and the dysregulation of pathways, identification of differentially expressed genes and pathways is important. We aimed to develop integrative approaches that identify differentially expressed genes and pathways, in combination with gene network analysis, in order to find effective early cancer biomarkers and drug targets.
In this study, we designed computational approaches to identify differentially expressed genes from the RNA-seq data provided by the TCGA data portal. We further performed Gene Ontology (GO) analysis and categorized expression patterns. The categorization of differentially expressed genes and expression patterns suggested distinct disease subtypes associated with distinct biological processes. Many studies have indicated that the same type of cancer can have different subtypes with different genetic mechanisms and treatment responses. Brannon et al. discovered two distinct KIRC subtypes using gene microarray expression data, whereas four stable subtypes of KIRC were detected using both mRNA and miRNA expression data sets. Despite these discoveries of differentially expressed genes and genetic mutations, knowledge of the biological pathways involved in the disease is limited. To facilitate effective biomarker identification, we therefore further analysed the pathways and gene networks related to the differentially expressed genes.
KEGG (Kyoto Encyclopaedia of Genes and Genomes, http://www.genome.jp/kegg/) pathway analysis revealed that the differentially expressed genes are significantly enriched in a number of biological pathways known to be involved in cancer, as well as in previously unreported pathways. This study provides new insights into the regulatory mechanisms of KIRC through comprehensive differential gene expression, pathway and network analyses.
Differentially expressed genes in KIRC
Biological functions associated with distinct differential gene groups.
GO term (P < 0.01, Fisher's exact test with Benjamini multiple-testing correction)
Response to environmental stimulus
Organismal physiological process
Signal transmission across a synapse
Organismal physiological process
Transmembrane transporter activity
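The table header mentions a Benjamini multiple-testing correction but does not say which variant. The most common one is the Benjamini-Hochberg false discovery rate procedure. Assuming that variant, the minimal pure-Python sketch below shows how raw Fisher's exact test p-values for GO terms could be converted into adjusted p-values. The p-values in the example are invented for illustration and are not values from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the 'Benjamini' correction in the table is the
    Benjamini-Hochberg false discovery rate procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # adj_(k) = min over j >= k of (m * p_(j) / j), capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = min(1.0, p_values[i] * m / rank)
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.2]
print(benjamini_hochberg(raw))
```

Under this sketch, a GO term would be kept when its adjusted value falls below the threshold given in the table header.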
Biological pathways associated with KIRC
KEGG pathways that are significantly enriched for differential genes.
P-value (hypergeometric test)
Taurine and hypotaurine metabolism
Neuroactive ligand-receptor interaction
Glycosaminoglycan biosynthesis - heparan sulfate
Peroxisome proliferator-activated receptor (PPAR) signalling pathway
Gastric acid secretion
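The hypergeometric test in this table asks how surprising it would be to see at least k differentially expressed genes in a pathway of K genes. The test assumes that n differentially expressed genes were drawn from a universe of N annotated genes. The one-sided Fisher's exact test for over-representation gives the same upper-tail probability. A minimal sketch follows. The gene universe size and the pathway size in the example are hypothetical, because the text does not state them. Only the figure of 186 differentially expressed genes comes from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the universe, K: genes annotated to the pathway,
    n: differentially expressed (DE) genes, k: DE genes in the pathway.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # Sum exact hypergeometric probabilities from k up to the largest possible overlap.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return numerator / comb(N, n)


# Hypothetical counts: 20,000-gene universe, 80-gene pathway, 186 DE genes, 5 in the pathway.
print(hypergeom_upper_tail(k=5, N=20000, K=80, n=186))
```

Exact integer arithmetic with `math.comb` avoids overflow for realistic gene counts, at the cost of speed on very large universes.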
Construction of a classifier for KIRC prediction
Performance of the classifier
Area under ROC
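The text does not describe the SVM's kernel, regularisation or training procedure. As a hedged illustration only, the sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the regularised hinge loss. Each sample's features are the expression values of differentially expressed genes, and labels are +1 (tumour) or -1 (normal). The learning rate, regularisation strength, epoch count and toy feature values are all assumptions. A real analysis would more likely rely on an established implementation such as LIBSVM or scikit-learn's `SVC`.

```python
from typing import List, Tuple

Vector = List[float]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[Vector, float]:
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))).

    y_i must be +1 (tumour) or -1 (normal); hyperparameters are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return +1 (tumour) if the decision score is non-negative, else -1 (normal)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, centred expression features for two hypothetical DE genes (not TCGA data).
X = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

A linear kernel is used here only because it keeps the sketch short. The paper does not say which kernel was used.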
For the identification of differentially expressed genes, we used 68 paired tumour and normal kidney tissue samples. In each pair, one sample came from KIRC tumour tissue and the other from pathologically normal kidney tissue of the same patient. Using paired sequencing data from the same patient can reduce false positives caused by genetic differences or other confounding factors that are not necessarily related to the cancer. This is particularly advantageous for avoiding the effects of natural genetic polymorphisms in the population. Only the matched samples were used to identify differentially expressed genes. Nevertheless, clustering all samples on the expression levels of these genes, including 469 unmatched KIRC and 4 normal kidney tissue samples, yielded highly homogeneous clusters. Moreover, the SVM-based classifier using the expression levels of the differentially expressed genes was shown to predict unknown tissue samples with high accuracy, suggesting that the differentially expressed genes can serve as expression signatures of the disease. Analyses of gene expression profiles and differentially expressed genes suggested four different subtypes of the cancer. This was further supported by the pathway and network analyses, which revealed differentially disturbed pathways and networks associated with the subtypes of the disease. Analyses of cancer-related pathways and networks not only confirmed some discoveries already reported in the literature, but also provided new findings. Amongst the significant networks that we identified, NF-κB and UBC are examples of central nodes in the molecular transport, hereditary disorder and metabolic disease network and in the renal and urological disease network.
Interestingly, the expression levels of NF-κB and UBC were not significantly altered in KIRC. Instead, both genes are highly interconnected with other genes in the network structure, suggesting important regulatory roles in the cancer. The differentially expressed genes can be considered markers of the disease. Beyond that, the gene set should be used to infer upstream disease-causal genes by combining pathway and network analyses with systems biology approaches. This research was part of an investigation of integrative systems biology approaches to identify disrupted pathways in disease development (http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/invited_talk_yang). Integrating disease-associated networks with differentially expressed genes and pathways can lead to the identification of useful biomarkers and effective drug targets.
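The observation that NF-κB and UBC are highly connected without being differentially expressed can be illustrated with simple degree centrality. The study used Ingenuity Pathway Analysis, whose network construction and scoring are not described in the text. The sketch below therefore only counts the distinct interaction partners of each gene in an undirected edge list. It then reports well-connected genes that are absent from the differentially expressed set. The gene names, edges and `min_degree` cut-off are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct neighbours per gene in an undirected network."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:  # ignore self-loops
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


def non_de_hubs(edges: Iterable[Tuple[str, str]], de_genes: Set[str],
                min_degree: int) -> List[Tuple[str, int]]:
    """Genes with degree >= min_degree that are NOT differentially expressed,
    sorted by decreasing degree, then by name."""
    degrees = node_degrees(edges)
    hubs = [(g, d) for g, d in degrees.items() if d >= min_degree and g not in de_genes]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Toy network: HUB links four DE genes but is not itself differentially expressed.
toy_edges = [("HUB", "G1"), ("HUB", "G2"), ("HUB", "G3"), ("HUB", "G4"), ("G1", "G2")]
print(non_de_hubs(toy_edges, de_genes={"G1", "G2", "G3", "G4"}, min_degree=3))
```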
Our differential gene expression and pathway analyses used large-scale RNA-seq data and have provided new insights into the molecular mechanisms of the cancer. In this study, the expression levels of the differentially expressed genes detected in the paired tumour samples were used as input features for the machine learning classifier. We were able to identify a set of genes associated with molecular perturbations in the development of the disease, and to obtain highly homogeneous clusters. The intelligent machine built for this study achieved high accuracy in clustering and classifying the tumour samples. Pathway and network analyses based on the significant genes not only confirmed pathways implicated in the disease, but also identified new roles of previously unreported pathways in the cancer. Combining differentially expressed genes with pathway and network analyses revealed subtypes of the disease. It also gave a better understanding of the molecular mechanisms underlying cancer development.
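The text does not state which distance metric or linkage was used for hierarchical clustering. The sketch below assumes average linkage on 1 minus the Pearson correlation between samples, a common choice for expression profiles. It clusters samples by their expression vectors in pure Python. The toy profiles are hypothetical. For real data, a library routine such as `scipy.cluster.hierarchy.linkage(..., method='average', metric='correlation')` would be the practical option.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Profile = List[float]


def pearson(a: Profile, b: Profile) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def average_linkage_clusters(profiles: List[Profile], n_clusters: int) -> List[List[int]]:
    """Merge samples until n_clusters remain (average linkage, 1 - r distance)."""
    dist: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(profiles)), 2):
        dist[i, j] = dist[j, i] = 1.0 - pearson(profiles[i], profiles[j])
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best: Optional[Tuple[float, int, int]] = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise sample distances between the two clusters.
            d = sum(dist[i, j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a (a < b)
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles (rows = samples, columns = genes).
samples = [
    [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 5.0],
    [4.0, 3.0, 2.0, 1.0], [5.0, 4.0, 3.0, 2.0], [5.0, 3.0, 2.0, 1.0],
]
print(average_linkage_clusters(samples, n_clusters=2))
```

Correlation distance groups samples by the shape of their expression pattern rather than by absolute expression level. This is why the rising and falling toy profiles separate cleanly.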
Differentially expressed genes in KIRC
The RNA-seq data and metadata of KIRC were downloaded from the TCGA Data Portal Bulk Download (https://tcga-data.nci.nih.gov/tcga). Data from more than 500 cancer patients were available, which was an advantage in training the high-performance classifier. Tumour purity and data quality were assessed. All samples contained at least 60% tumour nuclei by pathological determination. Although sample impurity is a common concern in cancer genomics analysis, next-generation sequencing data from cancerous tissue samples of this quality were considered sufficient. RNA-seq version 2 data were provided by the University of North Carolina Genome Centre and were generated on the Illumina HiSeq platform following the RNA-seq data protocol. Normal tissue samples were defined as those with no cancerous nuclei or micronuclei on pathological examination. The normal tissue samples comprised 4 tissues from healthy (cancer-free) human kidneys and 68 tissues from KIRC patients, each paired with a tumour sample. Each of these paired tissues was taken either from a pathologically normal portion of the diseased kidney or from the contralateral cancer-free kidney. The cancerous tissue samples comprised 68 matched and 469 unmatched tumour tissue samples, all of which contained about two thirds or more tumour cell nuclei by pathological examination. The edgeR package [8, 9] was used to identify differentially expressed genes, and a Support Vector Machine (SVM) was designed to classify KIRC samples.
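The thresholds reported for the 186 differentially expressed genes (P < 0.01, |log(FC)| > 5) could be applied to edgeR's results table as sketched below. edgeR's `topTags` output includes `logFC` (log2 fold change) and `PValue` columns. Here each row is represented as a small Python record with those field names, which is an assumption about how the table was exported. The log base is assumed to be 2 because the text writes only log(FC). Whether raw or FDR-adjusted p-values were thresholded is not stated, so raw `PValue` is assumed. The example rows and gene names are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DEResult:
    gene: str
    logFC: float   # log2 fold change, tumour vs normal (field name mirrors edgeR)
    PValue: float  # unadjusted p-value (field name mirrors edgeR)


def select_de_genes(results: List[DEResult], p_cutoff: float = 0.01,
                    min_abs_logfc: float = 5.0) -> Tuple[List[str], List[str]]:
    """Split genes passing both thresholds into up- and down-regulated lists."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        if r.PValue < p_cutoff and abs(r.logFC) > min_abs_logfc:
            (up if r.logFC > 0 else down).append(r.gene)
    return up, down


# Invented rows for illustration only.
results = [
    DEResult("GENE_A", 6.2, 1e-8),   # passes, up-regulated in tumour
    DEResult("GENE_B", -5.5, 3e-4),  # passes, down-regulated in tumour
    DEResult("GENE_C", 7.0, 0.02),   # fails the p-value cut-off
    DEResult("GENE_D", 3.1, 1e-10),  # fails the fold-change cut-off
]
up, down = select_de_genes(results)
print(up, down)
```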
Gene ontology and pathway analyses
We performed Gene Ontology (GO) analysis on the differentially expressed genes in KIRC. Fisher's exact test with multiple-testing correction was used to obtain significant GO terms associated with the differentially expressed genes. We also searched for enrichment of differentially expressed genes in KEGG pathways, applying the hypergeometric test to identify significant KEGG pathways. GO and pathway analyses were conducted using Bioconductor packages. In addition, Ingenuity Pathway Analysis (http://www.ingenuity.com) was applied to identify differential networks in KIRC.
Machine learning classifier
The performance of the classifier was measured by its sensitivity and specificity:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

The four counts are defined as follows:
- TP (true positives) is the number of cancer samples correctly classified as cancer.
- TN (true negatives) is the number of non-cancer (normal tissue) samples correctly classified as normal.
- FP (false positives) is the number of non-cancer tissue samples misclassified as cancer.
- FN (false negatives) is the number of cancer tissue samples misclassified as normal (non-cancer) tissue.

Since there is always a trade-off between sensitivity and specificity, the area under the ROC curve was used to evaluate the overall performance of the classifier. A perfect classifier reaches an area under the ROC curve of 1.0 (100%), whereas a random classifier scores about 0.5 (50%). The SVM-based classifier achieved an average sensitivity of 96.5%, specificity of 97.0% and area under the ROC curve of 98.7%, showing that it can effectively identify cancer samples.
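Sensitivity, specificity and the area under the ROC curve can be computed from a classifier's outputs as sketched below. Here the AUC is obtained through its equivalence with the Mann-Whitney statistic. This is the probability that a randomly chosen cancer sample scores higher than a randomly chosen normal sample, with ties counted as one half. The labels (1 = cancer, 0 = normal), the scores and the 0.5 decision threshold are hypothetical. The text does not say how scores or validation folds were produced.

```python
from typing import List, Tuple


def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Labels: 1 = cancer (positive), 0 = normal (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score_cancer > score_normal), ties counted as 0.5 (O(n*m) pairs)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical classifier output for five samples.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(sensitivity_specificity(labels, preds), roc_auc(labels, scores))
```

Because the AUC uses only the ranking of scores, it summarises performance across all decision thresholds. Sensitivity and specificity describe a single threshold.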
The research was supported by the National Institutes of Health (NIH), Arkansas Science and Technology Authority (ASTA), and National Science Foundation (NSF). Specifically, MQY was supported by NIH/NIGMS 5P20GM10342913 and ASTA Award # 15-B-23. XQ was supported by NIH/NHGRI 5U54HG003273-11. KY and WY were supported by NSF Award # 1359323 and NSF Award # 1429160.
The funding for publication of the article has come from the MidSouth Bioinformatics Centre, and the Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences with NIH/NIGMS 5P20GM10342913 and ASTA award # 15-B-23.
This article has been published as part of BMC Bioinformatics Volume 15 Supplement 17, 2014: Selected articles from the 2014 International Conference on Bioinformatics and Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S17.
- Siegel R, Naishadham D, Jemal A: Cancer statistics, 2013. CA: A Cancer Journal for Clinicians. 2013, 63 (1): 11-30. 10.3322/caac.21166.
- Linehan WM, Walther MM, Zbar B: The genetic basis of cancer of the kidney. The Journal of urology. 2003, 170 (6 Pt 1): 2163-2172.
- Kitamura H, Honma I, Torigoe T, Asanuma H, Sato N, Tsukamoto T: Down-regulation of HLA class I antigen is an independent prognostic factor for clear cell renal cell carcinoma. The Journal of urology. 2007, 177 (4): 1269-1272, discussion 1272. 10.1016/j.juro.2006.11.082.
- Mickisch GH: Principles of nephrectomy for malignant disease. BJU international. 2002, 89 (5): 488-495. 10.1046/j.1464-410X.2002.02654.x.
- Janzen NK, Kim HL, Figlin RA, Belldegrun AS: Surveillance after radical or partial nephrectomy for localized renal cell carcinoma and management of recurrent disease. The Urologic clinics of North America. 2003, 30 (4): 843-852. 10.1016/S0094-0143(03)00056-9.
- Brannon AR, Reddy A, Seiler M, Arreola A, Moore DT, Pruthi RS, Wallen EM, Nielsen ME, Liu H, Nathanson KL: Molecular Stratification of Clear Cell Renal Cell Carcinoma by Consensus Clustering Reveals Distinct Subtypes and Survival Patterns. Genes & cancer. 2010, 1 (2): 152-163. 10.1177/1947601909359929.
- Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 2013, 499 (7456): 43-49. 10.1038/nature12222.
- Robinson MD, Smyth GK: Moderated statistical tests for assessing differences in tag abundance. Bioinformatics (Oxford, England). 2007, 23 (21): 2881-2887. 10.1093/bioinformatics/btm453.
- Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England). 2010, 26 (1): 139-140. 10.1093/bioinformatics/btp616.
- Jacques P, Elewaut D: Tumor necrosis factor alpha-induced proteins: natural brakes on inflammation. Arthritis and rheumatism. 2012, 64 (12): 3831-3834. 10.1002/art.34664.
- Teng YC, Lee CF, Li YS, Chen YR, Hsiao PW, Chan MY, Lin FM, Huang HD, Chen YT, Jeng YM: Histone demethylase RBP2 promotes lung tumorigenesis and cancer metastasis. Cancer research. 2013, 73 (15): 4711-4721. 10.1158/0008-5472.CAN-12-3165.
- Cao J, Liu Z, Cheung WK, Zhao M, Chen SY, Chan SW, Booth CJ, Nguyen DX, Yan Q: Histone demethylase RBP2 is critical for breast cancer progression and metastasis. Cell reports. 2014, 6 (5): 868-877. 10.1016/j.celrep.2014.02.004.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000, 25 (1): 25-29. 10.1038/75556.
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000, 28 (1): 27-30. 10.1093/nar/28.1.27.
- Andrew Gross MC, Shen John, James Michael Randall, Trey Ideker, Matan Hofree: Association of methylation of genes in the taurine/hypotaurine pathway with worse prognosis in renal cell carcinoma. J Clin Oncol. 2013, 31:
- Gordon SC, Moonka D, Brown KA, Rogers C, Huang MA, Bhatt N, Lamerato L: Risk for renal cell carcinoma in chronic hepatitis C infection. Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2010, 19 (4): 1066-1073. 10.1158/1055-9965.EPI-09-1275.
- Hofmann JN, Torner A, Chow WH, Ye W, Purdue MP, Duberg AS: Risk of kidney cancer and chronic kidney disease in relation to hepatitis C virus infection: a nationwide register-based cohort study in Sweden. European journal of cancer prevention : the official journal of the European Cancer Prevention Organisation (ECP). 2011, 20 (4): 326-330. 10.1097/CEJ.0b013e32834572fa.
- Wiwanitkit V: Renal cell carcinoma and hepatitis C virus infection: is there any cause-outcome relationship?. Journal of cancer research and therapeutics. 2011, 7 (2): 226-227. 10.4103/0973-1482.82931.
- Lou S, Ren L, Xiao J, Ding Q, Zhang W: Expression profiling based graph-clustering approach to determine renal carcinoma related pathway in response to kidney cancer. European review for medical and pharmacological sciences. 2012, 16 (6): 775-780.
- Oya M, Ohtsubo M, Takayanagi A, Tachibana M, Shimizu N, Murai M: Constitutive activation of nuclear factor-kappaB prevents TRAIL-induced apoptosis in renal cancer cells. Oncogene. 2001, 20 (29): 3888-3896. 10.1038/sj.onc.1204525.
- Oya M, Takayanagi A, Horiguchi A, Mizuno R, Ohtsubo M, Marumo K, Shimizu N, Murai M: Increased nuclear factor-kappa B activation is related to the tumor development of renal cell carcinoma. Carcinogenesis. 2003, 24 (3): 377-384. 10.1093/carcin/24.3.377.
- Wu X, Zhang W, Font-Burgada J, Palmer T, Hamil AS, Biswas SK, Poidinger M, Borcherding N, Xie Q, Ellies LG: Ubiquitin-conjugating enzyme Ubc13 controls breast cancer metastasis through a TAK1-p38 MAP kinase cascade. Proceedings of the National Academy of Sciences of the United States of America. 2014.
- Falcon S, Gentleman R: Using GOstats to test gene lists for GO term association. Bioinformatics (Oxford, England). 2007, 23 (2): 257-258. 10.1093/bioinformatics/btl567.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.