Analysis of AML genes in dysregulated molecular networks
© Lee et al; licensee BioMed Central Ltd. 2009
Published: 17 September 2009
Identifying disease-causing genes and understanding their molecular mechanisms are essential to developing effective therapeutics. Several computational methods have therefore been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. Recently, molecular interaction networks have been incorporated to predict disease genes, but most of those methods do not use the disease-specific information available in mRNA expression profiles of patient samples.
Through the integration of protein-protein interaction networks and gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins dysregulated in AML and characterized known mutation genes, causally implicated in AML, that are embedded in the subnetworks. The analysis shows that the set of extracted subnetworks is a reservoir rich in AML genes and reflects key leukemogenic processes such as myeloid differentiation.
We showed that an integrative approach utilizing both gene expression profiles and molecular networks could identify AML-causing genes, most of which were not detectable by gene expression analysis alone because of only minor changes in their mRNA levels.
Mining disease-causing genes and elucidating their pathogenic molecular mechanisms are of great importance for developing effective diagnostics and therapeutics [1–5]. Along with many genetic and genomic studies aimed at identification of disease genes (e.g. linkage analysis, cytogenetic studies, microarray experiments, proteomic studies), several computational methods have been proposed to prioritize candidate genes based on various information including sequence similarity, literature annotation, and molecular pathways [6–11]. Given a set of genes known to be involved in disease, these methods typically score similarities between candidate genes and known disease genes in terms of various genomic features.
Recently, accumulated knowledge about molecular interaction networks in human cells, such as protein-protein and protein-DNA interactions, has been utilized to predict disease genes [6–8, 10, 12–14]. Previous studies have incorporated topological characteristics of known disease genes such as their degrees in networks, the overlap between interaction partners of candidate genes and those of known disease genes, the probability that candidate genes participate in the same protein complexes as known disease-causing genes, or the distribution of distances from candidate genes to known disease genes.
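The topological features listed above can be illustrated with a minimal sketch on a toy network. The gene names, the edge list, and the function names are hypothetical, and the feature definitions are simplified readings of the descriptions above rather than the exact formulations used in the cited studies.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map {gene: set of partners}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj, gene):
    """Number of interaction partners of a gene."""
    return len(adj.get(gene, ()))

def neighbor_overlap(adj, candidate, known_genes):
    """Count partners of the candidate that are also partners of any known disease gene."""
    known_partners = set()
    for g in known_genes:
        known_partners |= adj.get(g, set())
    return len(adj.get(candidate, set()) & known_partners)

def shortest_distance(adj, source, targets):
    """Breadth-first distance from source to the nearest target; None if unreachable."""
    targets = set(targets)
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb in seen:
                continue
            if nb in targets:
                return dist + 1
            seen.add(nb)
            queue.append((nb, dist + 1))
    return None

# Hypothetical toy network; "D" plays the role of a known disease gene.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("E", "F")]
toy_adj = build_adjacency(toy_edges)
known = {"D"}

deg_c = degree(toy_adj, "C")
overlap_a = neighbor_overlap(toy_adj, "A", known)
dist_a = shortest_distance(toy_adj, "A", known)
dist_e = shortest_distance(toy_adj, "E", known)
```

Features of this kind depend only on network topology, so they take the same values for a gene regardless of the disease context in which it is evaluated.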
Despite their generally successful performance, these methods do not perform satisfactorily for some specific diseases of interest to us, such as acute myeloid leukemia (AML) (AUC = 0.55 by Radivojac et al.). We hypothesized that integrating molecular networks with mRNA expression profiles from patients might help delineate molecular subnetworks that are dysregulated in a disease-specific manner and contain disease-causing mutation genes. Chuang et al. supported this hypothesis by showing that the identified subnetworks were significantly enriched for known breast cancer mutation genes. Mani et al. proposed another method that predicts oncogenes in B-cell lymphomas by integrating both molecular interactions and mRNA expression.
Here, we identified molecular subnetworks dysregulated in AML patients that were associated with key leukemogenic processes such as myeloid differentiation. We also evaluated the enrichment of known AML-causing mutation genes within the subnetworks, and found that the subnetworks contain a significant fraction of known AML genes (mostly non-differentially expressed) embedded among the interconnections of differentially expressed genes. In addition, we report several characteristics of AML genes in the subnetworks, which can be utilized to build prediction models for unknown AML genes.
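One common way to quantify enrichment of a known gene set within a selected set of genes is a one-sided hypergeometric test. The sketch below is an illustration under that assumption; the text above does not state which test was used. The function name is hypothetical, and the values assigned to `n_subnetwork_genes` and `n_aml_in_subnetworks` are made-up placeholders, not results from this study.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all genes in the network
    successes:  known disease genes in the network
    draws:      genes contained in the selected subnetworks
    k:          known disease genes observed among the draws
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Network size and number of census genes are taken from the Methods;
# the two subnetwork counts are hypothetical placeholders.
n_network_genes = 9142
n_known_aml_genes = 62
n_subnetwork_genes = 500      # hypothetical
n_aml_in_subnetworks = 12     # hypothetical

p_enrichment = hypergeom_upper_tail(
    n_aml_in_subnetworks, n_network_genes, n_known_aml_genes, n_subnetwork_genes
)
```

A small tail probability would indicate that the selected genes contain more known disease genes than expected from random sampling of the network.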
Results and discussion
Identification of subnetworks perturbed in AML
AML subnetworks associated with key leukemogenic processes
AML subnetworks enriched for known AML causing genes
Characteristics of AML genes in the subnetworks
AML mutation genes in subnetworks
Number of Subnetworks
Finally, we investigated the differential expression of AML genes at the mRNA level (Figure 4b). There was no significant difference among the groups of genes, and none of the known AML genes, including those found in subnetworks, showed mRNA-level aberrations except FLT3 and JAK3. This result shows that gene expression alone does not provide enough information to predict unknown AML-causing mutation genes. However, our integrative approach could capture non-differentially expressed AML genes in subnetworks if they were entangled with differentially expressed neighbour proteins, yielding subnetworks with high perturbation scores.
We have demonstrated that integrating condition-independent molecular networks, assembled from various types of cells and experiments under different conditions, with disease-specific mRNA expression profiles of AML patients enables the dissection of pathogenic modules of interacting proteins that reflect key leukemogenic processes. In addition, the dissected modules are enriched for AML-causing mutation genes, most of which are not detectable by gene expression analysis alone because of minor changes in their mRNA levels. Identification of subnetworks perturbed in AML patients can provide novel molecular hypotheses about AML etiology, and the characteristics of known AML genes appearing in the subnetworks can be exploited to predict unknown AML-causing genes.
Protein-protein interaction networks
We downloaded the PPI network from the PhenoPred website by Radivojac et al. It consists of 41456 physical interactions among 9142 proteins assembled from the Human Protein Reference Database (HPRD), the Online Predicted Human Interaction Database (OPHID), and studies by Rual et al. and Stelzl et al. [22, 23].
mRNA expression profiles of AML patients
Gene expression profiles of 65 peripheral-blood samples and 54 bone marrow specimens from 116 adult patients with AML were downloaded from Gene Expression Omnibus (GSE425). The expression values are log ratios (base 2) of the mean intensities of patient samples vs. common reference mRNA. Gene identifiers of the three cDNA microarray platforms (GPL317, GPL318, GPL319) were mapped to gene symbols using the accompanying gene annotation files from GEO, yielding 6987 gene symbols with expression levels in at least one of the three platforms.
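The preprocessing described above can be sketched as follows, assuming the annotation files have already been parsed into simple probe-to-symbol dictionaries. All probe identifiers, the placeholder symbol `GENE_X`, and the function names are hypothetical; parsing of the actual GEO files is not shown.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """Base-2 log ratio of a patient-sample intensity vs. the common reference."""
    return math.log2(sample_intensity / reference_intensity)

def genes_measured(platform_probes, probe_to_symbol):
    """Map each gene symbol to the set of platforms on which it is measured.

    platform_probes: {platform_id: iterable of probe ids with expression values}
    probe_to_symbol: {platform_id: {probe id: gene symbol}}
    Probes without an annotated gene symbol are skipped.
    """
    coverage = {}
    for platform, probes in platform_probes.items():
        mapping = probe_to_symbol.get(platform, {})
        for probe_id in probes:
            symbol = mapping.get(probe_id)
            if symbol:
                coverage.setdefault(symbol, set()).add(platform)
    return coverage

# Hypothetical toy annotation for the three platforms.
toy_probes = {
    "GPL317": ["p1", "p2"],
    "GPL318": ["q1"],
    "GPL319": ["r1", "r2"],   # "r2" has no annotated symbol
}
toy_annotation = {
    "GPL317": {"p1": "FLT3", "p2": "JAK3"},
    "GPL318": {"q1": "FLT3"},
    "GPL319": {"r1": "GENE_X"},
}

coverage = genes_measured(toy_probes, toy_annotation)
# Gene symbols with expression levels in at least one platform.
symbols_in_any_platform = sorted(coverage)
example_ratio = log2_ratio(8.0, 2.0)
```

Keeping the per-platform coverage, rather than only the union of symbols, makes it easy to see which genes are measured on only a subset of the arrays.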
Mutation genes in AML patients
We compiled two sets of AML-associated genes: 14 genes downloaded from the PhenoPred website, originally collected from OMIM, Swiss-Prot, and HPRD by Radivojac et al. (Disease Ontology ID: 9119), and 62 genes whose somatic and germline mutations are causally implicated in AML patients, downloaded from the Sanger Cancer Gene Census and also appearing in our PPI network.
Significance evaluation of subnetworks
To evaluate the significance of the identified subnetworks, we performed the same search procedure over 1000 random trials in which the expression vectors of individual genes were randomly permuted over the network. The p-value of each real subnetwork was calculated as the fraction of all random subnetworks that had higher PS scores than that real subnetwork. We considered subnetworks with P < 0.05 significant in this work.
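The permutation scheme and the empirical p-value described above can be sketched as follows. The subnetwork search procedure and the PS scoring function are not reproduced here, so the scores are toy numbers; the function names and the seed are assumptions made for illustration.

```python
import random

def permute_expression(expression, rng):
    """Randomly reassign whole expression vectors among the genes of the network.

    expression: {gene: list of expression values across samples}
    Each gene keeps a complete vector, but not necessarily its own.
    """
    genes = list(expression)
    vectors = [expression[g] for g in genes]
    rng.shuffle(vectors)
    return dict(zip(genes, vectors))

def empirical_p_value(real_score, random_scores):
    """Fraction of random subnetworks scoring strictly higher than the real one."""
    higher = sum(1 for s in random_scores if s > real_score)
    return higher / len(random_scores)

# Hypothetical toy data: three genes, two samples each.
toy_expression = {"G1": [0.1, 0.2], "G2": [1.5, 1.7], "G3": [-0.8, -0.9]}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
shuffled = permute_expression(toy_expression, rng)

# Toy scores standing in for PS scores of random subnetworks.
toy_random_scores = [1.0, 3.0, 2.0, 0.5]
p_value = empirical_p_value(2.0, toy_random_scores)
is_significant = p_value < 0.05
```

In a full analysis, each of the 1000 trials would rerun the search on a permuted network and add the scores of the resulting random subnetworks to the pool used for `random_scores`.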
This work was supported by the Samsung Biomedical Research Institute Grant (C-A7-101-3). DL was additionally supported by the National Research Laboratory Program (R0A-2005-000-10094-0) and the Korean Systems Biology Program (M10309020000-03B5002-00000) from the Ministry of Education, Science and Technology. PR was supported by NSF award DBI-0644017.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 9, 2009: Proceedings of the 2009 AMIA Summit on Translational Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S9.
- Dalkilic MM, Costello JC, Clark WT, Radivojac P: From protein-disease associations to disease informatics. Front Biosci 2008, 13: 3391–3407. 10.2741/2934
- Ideker T, Sharan R: Protein networks in disease. Genome Res 2008, 18(4):644–652. 10.1101/gr.071852.107
- Kann MG: Protein interactions and disease: computational approaches to uncover the etiology of diseases. Brief Bioinform 2007, 8(5):333–346. 10.1093/bib/bbm031
- Lussier YA, Liu Y: Computational approaches to phenotyping: high-throughput phenomics. Proc Am Thorac Soc 2007, 4(1):18–25. 10.1513/pats.200607-142JG
- Oti M, Brunner HG: The modular nature of genetic diseases. Clin Genet 2007, 71(1):1–11. 10.1111/j.1399-0004.2006.00708.x
- Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, Tranchevent LC, De Moor B, Marynen P, Hassan B, et al.: Gene prioritization through genomic data fusion. Nat Biotechnol 2006, 24(5):537–544. 10.1038/nbt1203
- Bergholdt R, Storling ZM, Lage K, Karlberg EO, Olason PI, Aalund M, Nerup J, Brunak S, Workman CT, Pociot F: Integrative analysis for finding genes and networks involved in diabetes and other complex diseases. Genome Biol 2007, 8(11):R253. 10.1186/gb-2007-8-11-r253
- George RA, Liu JY, Feng LL, Bryson-Richardson RJ, Fatkin D, Wouters MA: Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res 2006, 34(19):e130. 10.1093/nar/gkl707
- Tiffin N, Adie E, Turner F, Brunner HG, van Driel MA, Oti M, Lopez-Bigas N, Ouzounis C, Perez-Iratxeta C, Andrade-Navarro MA, et al.: Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes. Nucleic Acids Res 2006, 34(10):3067–3081. 10.1093/nar/gkl381
- Lage K, Karlberg EO, Storling ZM, Olason PI, Pedersen AG, Rigina O, Hinsby AM, Tumer Z, Pociot F, Tommerup N, et al.: A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol 2007, 25(3):309–316. 10.1038/nbt1295
- Butte AJ, Kohane IS: Creation and implications of a phenome-genome network. Nat Biotechnol 2006, 24(1):55–62. 10.1038/nbt1150
- Oti M, Snel B, Huynen MA, Brunner HG: Predicting disease genes using protein-protein interactions. J Med Genet 2006, 43(8):691–698. 10.1136/jmg.2006.041376
- Radivojac P, Peng K, Clark WT, Peters BJ, Mohan A, Boyle SM, Mooney SD: An integrated approach to inferring gene-disease associations in humans. Proteins 2008, 72(3):1030–1037. 10.1002/prot.21989
- Xu J, Li Y: Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics 2006, 22(22):2800–2805. 10.1093/bioinformatics/btl467
- Chuang HY, Lee E, Liu YT, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol 2007, 3: 140. 10.1038/msb4100180
- Mani KM, Lefebvre C, Wang K, Lim WK, Basso K, Dalla-Favera R, Califano A: A systems biology approach to prediction of oncogenes and molecular perturbation targets in B-cell lymphomas. Mol Syst Biol 2008, 4: 169. 10.1038/msb.2008.2
- Walters DK, Mercher T, Gu TL, O'Hare T, Tyner JW, Loriaux M, Goss VL, Lee KA, Eide CA, Wong MJ, et al.: Activating alleles of JAK3 in acute megakaryoblastic leukemia. Cancer Cell 2006, 10(1):65–75. 10.1016/j.ccr.2006.06.002
- Tomasson MH, Xiang Z, Walgren R, Zhao Y, Kasai Y, Miner T, Ries RE, Lubman O, Fremont DH, McLellan MD, et al.: Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia. Blood 2008, 111(9):4797–4808. 10.1182/blood-2007-09-113027
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al.: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363–2371. 10.1101/gr.1680803
- Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics 2005, 21(9):2076–2082. 10.1093/bioinformatics/bti273
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al.: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062):1173–1178. 10.1038/nature04209
- Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, et al.: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6):957–968. 10.1016/j.cell.2005.08.029
- Bullinger L, Dohner K, Bair E, Frohling S, Schlenk RF, Tibshirani R, Dohner H, Pollack JR: Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. N Engl J Med 2004, 350(16):1605–1616. 10.1056/NEJMoa031046
- Hamosh A, Scott AF, Amberger J, Valle D, McKusick VA: Online Mendelian Inheritance in Man (OMIM). Hum Mutat 2000, 15(1):57–61. 10.1002/(SICI)1098-1004(200001)15:1<57::AID-HUMU12>3.0.CO;2-G
- Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al.: The Universal Protein Resource (UniProt). Nucleic Acids Res 2005, (33 Database):D154–159.
- Sanger Cancer Gene Census [http://www.sanger.ac.uk/genetics/CGP/Census/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.