Kiwi: a tool for integration and visualization of network topology and gene-set analysis
© Väremo et al.; licensee BioMed Central. 2014
Received: 1 September 2014
Accepted: 3 December 2014
Published: 11 December 2014
The analysis of high-throughput data in biology is aided by integrative approaches such as gene-set analysis. Gene-sets can represent well-defined biological entities (e.g. metabolites) that interact in networks (e.g. metabolic networks), to exert their function within the cell. Data interpretation can benefit from incorporating the underlying network, but there are currently no optimal methods that link gene-set analysis and network structures.
Here we present Kiwi, a new tool that processes output data from gene-set analysis and integrates them with a network structure such that the inherent connectivity between gene-sets, i.e. not simply the gene overlap, becomes apparent. In two case studies, we demonstrate that standard gene-set analysis points at metabolites regulated in the interrogated condition. Nevertheless, only the integration of the interactions between these metabolites provides an extra layer of information that highlights how they are tightly connected in the metabolic network.
Kiwi is a tool that enhances interpretability of high-throughput data. It allows the users not only to discover a list of significant entities or processes as in gene-set analysis, but also to visualize whether these entities or processes are isolated or connected by means of their biological interaction. Kiwi is available as a Python package at http://www.sysbio.se/kiwi and an online tool in the BioMet Toolbox at http://www.biomet-toolbox.org.
Gene-set analysis (GSA) is a widely used category of bioinformatics methods, and many tools are available to perform it. In GSA, genes known to contribute to a certain function, or to share a relevant biological feature, are collected into sets. If these gene-sets are enriched by transcriptome or other high-throughput data, GSA directly highlights the most prominent among these sets, and thereby the underlying functions that are implicated by the data. Networks stand at the basis of complex biological systems, and in many cases gene-sets represent elements that are connected, not simply because of gene overlap, but rather to exert a coordinated function through their interactions (the gene-set interaction network). Examples of elements that can be used as gene-sets and where an interaction network can be defined include: transcription factors in a gene regulatory network; the hierarchical network of Gene Ontology terms; and metabolite gene-sets in a metabolic network. The last example is particularly useful, since metabolite gene-sets (the genes associated with reactions in which the metabolite takes part) are connected through reaction pathways but will usually not share any genes (unless they participate in the same reaction). Thus, when several metabolite gene-sets in a pathway are significant, their important biological connection will be lost unless the gene-set interaction network is taken into account.
With this in mind, interpretation and visualization of the results from a GSA currently suffer from several limitations. Typically, the results are presented as a list of the most significant gene-sets, or visualized in a heatmap where gene-sets are clustered according to either the pattern of significance across several conditions or their direction of regulation. In both cases, the biologically relevant connections between gene-sets, defined by their interaction network, are ignored. Multiple connected significant gene-sets will likely represent an important biological process, but with the current visualization approaches these connections are lost and are tedious to elucidate manually.
On the other hand, it is not unusual to see GSA results presented as networks, with nodes representing the most significant gene-sets. However, in these cases edges between nodes simply represent gene overlap. This can help to reduce the bias from redundant gene-sets by clustering gene-sets with overlapping gene content together. Nevertheless, a network visualization approach where the edges represent gene-set interactions is advantageous in the context of biological interpretation. Indeed, different tools can be used to visualize data on gene-set interaction networks, although some of them are not specifically made for that purpose. Unfortunately, these tools suffer from one or several of the following drawbacks:
The tool is not made specifically to handle GSA data, which requires the user to tweak the input (e.g. common identifiers and color-coding scheme) in the best way possible to fit the framework of that tool.
The tool is only made for a specific type of network (e.g. KEGG pathways or GO-terms), constraining the user to only one single gene-set type.
The tool does not effectively reduce the network to highlight the significant results, but instead simply overlays the data on the original, and potentially huge, gene-set interaction network.
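The limitation of overlap-based edges noted above can be illustrated with a minimal sketch. The gene-set names, gene names, and the Jaccard threshold below are hypothetical and chosen only for illustration; they do not come from any of the cited tools.

```python
from itertools import combinations

# Hypothetical toy gene-sets: two metabolites in consecutive reactions that
# are catalysed by different genes, plus a largely redundant copy of one set.
gene_sets = {
    "metabolite_A": {"GENE1", "GENE2"},
    "metabolite_B": {"GENE3"},
    "metabolite_A_copy": {"GENE1", "GENE2", "GENE4"},
}


def jaccard(a, b):
    """Jaccard index: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_edges(sets, min_overlap=0.5):
    """Return gene-set pairs whose Jaccard index reaches min_overlap."""
    edges = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(sets.items()), 2):
        score = jaccard(genes_a, genes_b)
        if score >= min_overlap:
            edges[(name_a, name_b)] = score
    return edges


edges = overlap_edges(gene_sets)
```

In this toy example only the two redundant sets are linked. The biologically adjacent pair, `metabolite_A` and `metabolite_B`, shares no genes and therefore receives no edge, which is the information an overlap-based network cannot show.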
Here we address the current limitations by developing a new network-based visualization approach and implementing it in the software tool Kiwi. Contrary to other available tools, Kiwi explicitly embraces the paradigm that gene-sets can be biological entities that interact. It therefore aims at visualizing GSA results in the context of the gene-set interaction network in such a way that the biological connections between all significant gene-sets become apparent. This is done by taking into account both the directionality and significance of the gene-sets and by removing non-interesting gene-sets from the visualized network. Furthermore, Kiwi is made as general as possible, in the sense that it accepts input from any GSA tool and any gene-set interaction network defined by the user. Finally, since the biological measurements behind the data are made at the gene level, Kiwi enables the user to go from the visualization network of significant gene-sets back to the gene-level data, in order to detect driver genes behind the regulated biological elements that the gene-sets represent.
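One plausible way to reduce a gene-set interaction network is sketched below. This is an illustration under stated assumptions, not Kiwi's actual implementation: the function names, the p-value cutoff, and the maximum path length are hypothetical, and directionality is ignored for brevity. Significant gene-sets are kept, and two of them are linked when they lie within a few steps of each other in the full network.

```python
from collections import deque


def shortest_path_lengths(adjacency, source, max_distance):
    """Breadth-first search returning hop counts up to max_distance."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def reduce_network(adjacency, p_values, p_cutoff=0.05, max_distance=2):
    """Keep significant gene-sets and link those that lie close together."""
    significant = sorted(gs for gs, p in p_values.items() if p < p_cutoff)
    edges = []
    for gs in significant:
        reachable = shortest_path_lengths(adjacency, gs, max_distance)
        for other in significant:
            if gs < other and other in reachable:
                edges.append((gs, other, reachable[other]))
    return significant, edges


# Hypothetical linear pathway m1 - m2 - ... - m7 with made-up p-values.
chain = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]
adjacency = {}
for a, b in zip(chain, chain[1:]):
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

p_values = {"m1": 0.001, "m2": 0.6, "m3": 0.01, "m4": 0.8,
            "m5": 0.7, "m6": 0.9, "m7": 0.002}

kept, links = reduce_network(adjacency, p_values)
```

Here `m1` and `m3` remain connected because they are two steps apart, while `m7` is kept as an isolated node. The non-significant intermediates are dropped from the view but still determine which significant gene-sets are linked.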
The input to Kiwi is at minimum the gene-set interaction network and a table of p-values for the gene-sets, which can be collected from the output of any GSA tool. Apart from this, it is recommended to also supply the gene members of the gene-sets as well as the gene-level statistics (e.g. p-values and fold-changes) that were used as input to the GSA. Full details and required format for the input files can be found in the online Kiwi reference manual.
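The two minimal inputs can be pictured as an edge list and a table of gene-set p-values. The sketch below uses a made-up tab-separated layout and hypothetical column names (`name`, `p_value`) purely for illustration; the actual required format is the one given in the Kiwi reference manual. It also shows a simple consistency check between the two inputs.

```python
import csv
import io

# Hypothetical file contents; the real formats are defined in the Kiwi manual.
network_text = "m1\tm2\nm2\tm3\n"
stats_text = "name\tp_value\nm1\t0.001\nm2\t0.40\nm3\t0.02\n"


def read_edges(handle):
    """Read one gene-set pair per line from a tab-separated edge list."""
    return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


def read_p_values(handle):
    """Read gene-set p-values from a tab-separated table with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: float(row["p_value"]) for row in reader}


edges = read_edges(io.StringIO(network_text))
p_values = read_p_values(io.StringIO(stats_text))

# Gene-sets with statistics but no node in the network cannot be placed.
nodes = {node for edge in edges for node in edge}
missing = sorted(set(p_values) - nodes)
```

A check of this kind is useful when the GSA results and the network come from different sources and may use different identifiers.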
Kiwi produces two figures: a network and a heatmap. The network presents an uncluttered view where the most important features are highlighted. The node sizes and color-codes are adjusted according to the gene-set significance and general direction of change. The heatmap serves as a complement to the network by displaying the gene-level statistics for each gene-set in the network. The rows (gene-sets) and columns (genes) are hierarchically clustered, which enables the identification of (i) gene-sets with similar gene content and (ii) the significant genes that are driving the observed changes. Both figures can be fine-tuned by the user through several parameters and the network can also be saved in graphML format and imported into Cytoscape for further customization.
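The data behind such a heatmap, and the step from gene-sets back to driver genes, can be sketched as follows. All names and values are invented toy data, and the helper functions and the cutoff are assumptions rather than part of Kiwi; clustering of rows and columns is omitted.

```python
def heatmap_matrix(gene_sets, gene_stats):
    """Build rows (gene-sets) by columns (genes); None marks non-members."""
    genes = sorted({g for members in gene_sets.values() for g in members})
    rows = {}
    for name, members in gene_sets.items():
        rows[name] = [gene_stats.get(g) if g in members else None for g in genes]
    return genes, rows


def driver_genes(gene_sets, gene_p_values, cutoff=0.05):
    """List, per gene-set, the member genes that are themselves significant."""
    return {
        name: sorted(g for g in members if gene_p_values.get(g, 1.0) < cutoff)
        for name, members in gene_sets.items()
    }


# Toy gene-sets without gene overlap, with made-up gene-level statistics.
gene_sets = {"set1": {"g1", "g2"}, "set2": {"g3", "g4"}}
fold_changes = {"g1": 1.8, "g2": 0.1, "g3": 2.2, "g4": -0.2}
gene_p_values = {"g1": 0.001, "g2": 0.7, "g3": 0.004, "g4": 0.5}

genes, rows = heatmap_matrix(gene_sets, fold_changes)
drivers = driver_genes(gene_sets, gene_p_values)
```

In the toy data each gene-set is driven by a single significant gene, and the block structure of the matrix makes the absence of shared genes between the two sets visible at a glance.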
To illustrate the advantages of Kiwi, we use two case studies. The first one is based on a differential gene expression dataset from lung adenocarcinoma vs. normal lung tissue. Metabolites from a human genome-scale metabolic model were used as gene-sets and the GSA was carried out using the Bioconductor R-package piano.
For the second case study we used gene expression data from a study on Kras conditional activation in mouse xenograft tumors. Metabolites from a mouse genome-scale metabolic model, derived from the human genome-scale metabolic model used in case study 1 using gene homology as described previously, were used as gene-sets. The GSA was carried out using the Bioconductor R-package piano.
Results and discussion
In order to show the advantages, in terms of biological interpretation, of using Kiwi to visualize GSA results in the context of a gene-set interaction network, we performed two case studies. In both cases we used a genome-scale metabolic model to define a metabolite-metabolite network (connecting metabolites if they are substrates or products of the same reaction). A metabolite gene-set is defined as the group of genes that are associated with reactions in which the metabolite participates.
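The two definitions above can be made concrete with a minimal sketch. The reaction dictionary is a hypothetical toy model, not an excerpt from a genome-scale reconstruction, and the function names are introduced here only for illustration.

```python
from itertools import combinations

# Hypothetical toy model: each reaction lists its metabolites and its genes.
reactions = {
    "R1": {"metabolites": {"A", "B"}, "genes": {"g1"}},
    "R2": {"metabolites": {"B", "C"}, "genes": {"g2", "g3"}},
    "R3": {"metabolites": {"C", "D"}, "genes": {"g4"}},
}


def metabolite_network(reactions):
    """Connect two metabolites if they take part in the same reaction."""
    edges = set()
    for reaction in reactions.values():
        for a, b in combinations(sorted(reaction["metabolites"]), 2):
            edges.add((a, b))
    return sorted(edges)


def metabolite_gene_sets(reactions):
    """Collect, per metabolite, the genes of all reactions it takes part in."""
    gene_sets = {}
    for reaction in reactions.values():
        for metabolite in reaction["metabolites"]:
            gene_sets.setdefault(metabolite, set()).update(reaction["genes"])
    return gene_sets


edges = metabolite_network(reactions)
gene_sets = metabolite_gene_sets(reactions)
```

In this toy pathway, metabolites A and C are two steps apart in the network yet their gene-sets are disjoint, whereas A and B share a gene only because they take part in the same reaction.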
Metabolic changes associated with lung adenocarcinoma transformation
To illustrate the benefits of exploiting the gene-set interaction network, compared to only considering the gene overlap, we re-analysed a differential gene expression dataset from lung adenocarcinoma vs. normal lung tissue. Metabolites from the human genome-scale metabolic model HMR2 were used as gene-sets (i.e. genes associated with reactions in which a specific metabolite participates) and the GSA was carried out using the Bioconductor R-package piano, which produces files that can be directly imported by Kiwi. The Kiwi network (Figure 2a) clearly identifies significant gene-sets composing two metabolically connected pathways. For example, 5-phosphoribosylamine and 1-pyrroline-5-carboxylate both participate in pyrimidine biosynthesis, but their relatedness becomes apparent only if the underlying metabolic network that measures the mutual distance is considered. These important connections are lost when the results are presented as a traditional heatmap (Figure 2b) or a network based on overlap of gene members of the different gene-sets (Figure 2c). The Kiwi heatmap (Figure 2d) shows the gene-level transcriptional changes for each gene-set, enabling the identification of interacting gene-sets without gene overlap, and their driver genes. For example, 5-phosphoribosylamine is a significant gene-set because of GART and PPAT up-regulation, while 1-pyrroline-5-carboxylate is significant due to LEFTY1 and PYCR up-regulation. The heatmap also simplifies the detection of similar gene-sets, e.g. nLc6Cer[c] and paragloboside[c].
Metabolic changes associated with activation of oncogenic Kras in mouse tumor xenografts
Using a second case study we sought to test if Kiwi is able to reproduce networks known to be informative in a certain condition. To this end, we re-analyzed gene expression data from a study where the oncoprotein Kras was conditionally activated in mouse xenograft tumors. The authors showed that activation of oncogenic Kras entails extensive metabolic reprogramming, in particular up-regulation of steroid biosynthesis. We therefore performed GSA in the context of a mouse genome-scale metabolic network (Figure 3a) and tested if Kiwi could capture the relevant network of gene-sets upon Kras activation. In line with the results in the aforementioned study, we observe the emergence of the steroid biosynthetic pathway, which is overexpressed in different steps (Figure 3b). Indeed, despite the fact that isopentenyl-pPP, 14-demethyllanosterol, squalene, and lanosterol are not overlapping gene-sets (as shown by the heatmap in Figure 3c), Kiwi relates the metabolites given their vicinity in the underlying mouse metabolic network. Notably, contrary to the gene-set enrichment analysis used by the authors, Kiwi also identifies which pathway among the different branches of steroid biosynthesis is truly up-regulated by Kras activation, namely lanosterol synthesis.
Kiwi is a new tool tailored for the visualization of GSA results in a gene-set interaction network context. As opposed to available tools, Kiwi starts from the premise that gene-sets can be precise biological entities that achieve a certain function by means of their interactions, such as metabolites in a pathway. This paradigm significantly improves the interpretation of the effect of transcriptional regulation in a certain context, such as metabolism, because it adds an extra layer of information to the GSA results. As exemplified in the two case studies, such addition is fundamental to capture certain transcriptionally regulated processes. In the case of the transformation to lung adenocarcinoma, we observe that the up-regulation of pyrimidine biosynthesis is mediated by the connection provided by choloyl-CoA. In the case of oncogenic Kras activation in mouse tumors, not only do we reproduce the up-regulation of the steroid biosynthetic process, but we also report that this is ascribed mainly to the synthesis of lanosterol. In neither case could such results be highlighted by connecting gene-sets using gene overlap (see Figure 2c) or by overlaying the GSA results on the corresponding gene-set interaction network (see Figure 3a). In favour of a clean layout for enhanced interpretation, Kiwi reduces the gene-set interaction network while maintaining and highlighting the important gene-set connections. It works with the output from any GSA tool and any collection of gene-sets that can be described as a network. For full usability, from raw data to final figure, it integrates seamlessly with the Bioconductor R-package piano (for GSA) and Cytoscape (for advanced layout and customization). Kiwi is available as a Python package at http://www.sysbio.se/kiwi and an online tool in the BioMet Toolbox at http://www.biomet-toolbox.org .
Availability and requirements
Project name: Kiwi
Project home page: www.sysbio.se/kiwi
Operating system(s): Platform independent
Programming language: Python
Other requirements: Kiwi depends on the following Python packages: numpy >= 1.8.0; matplotlib >= 1.3.1; networkx >= 1.8.1; mygene >= 2.1.0; pandas >= 0.13.1; scipy >= 0.13.3.
Any restrictions to use by non-academics: None
The authors would like to thank Adil Mardinoglu for reconstructing the mouse genome-scale metabolic model and Subazini Thankaswamy for including Kiwi in the BioMet Toolbox. This work was funded by the Knut and Alice Wallenberg Foundation and the Chalmers Foundation.
- Väremo L, Nielsen J, Nookaew I: Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res. 2013, 41 (8): 4378-4391. 10.1093/nar/gkt111.
- Hung JH, Yang TH, Hu Z, Weng Z, Delisi C: Gene set enrichment analysis: performance evaluation and usage guidelines. Briefings Bioinform. 2012, 13 (3): 281-291. 10.1093/bib/bbr049.
- Barabasi A-L, Oltvai ZN: Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004, 5 (2): 101-113. 10.1038/nrg1272.
- Oliveira AP, Patil KR, Nielsen J: Architecture of transcriptional regulatory circuits is knitted over the topology of bio-molecular interaction networks. BMC Syst Biol. 2008, 2: 17. 10.1186/1752-0509-2-17.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25 (1): 25. 10.1038/75556.
- Patil KR, Nielsen J: Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci U S A. 2005, 102 (8): 2685-2689. 10.1073/pnas.0406811102.
- Chen E, Tan C, Kou Y, Duan Q, Wang Z, Meirelles G, Clark N, Ma’ayan A: Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinf. 2013, 14 (1): 128. 10.1186/1471-2105-14-128.
- Merico D, Isserlin R, Stueker O, Emili A, Bader GD: Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PLoS One. 2010, 5 (11): e13984. 10.1371/journal.pone.0013984.
- Wang X, Terfve C, Rose JC, Markowetz F: HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens. Bioinformatics. 2011, 27 (6): 879-880. 10.1093/bioinformatics/btr028.
- Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z: GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinf. 2009, 10 (1): 48. 10.1186/1471-2105-10-48.
- Yamada T, Letunic I, Okuda S, Kanehisa M, Bork P: iPath2.0: interactive pathway explorer. Nucleic Acids Res. 2011, 39 (suppl 2): W412-W415. 10.1093/nar/gkr313.
- Luo W, Brouwer C: Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics. 2013, 29 (14): 1830-1831. 10.1093/bioinformatics/btt285.
- Al-Shahrour F, Minguez P, Tárraga J, Montaner D, Alloza E, Vaquerizas JM, Conde L, Blaschke C, Vera J, Dopazo J: BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res. 2006, 34 (suppl 2): W472-W476. 10.1093/nar/gkl172.
- Bates JT, Chivian D, Arkin AP: GLAMM: Genome-Linked Application for Metabolic Maps. Nucleic Acids Res. 2011, 39 (suppl 2): W400-W405. 10.1093/nar/gkr433.
- Gatto F, Nookaew I, Nielsen J: Chromosome 3p loss of heterozygosity is associated with a unique metabolic network in clear cell renal carcinoma. Proc Natl Acad Sci U S A. 2014, 111 (9): E866-E875. 10.1073/pnas.1319196111.
- Mardinoglu A, Agren R, Kampf C, Asplund A, Uhlen M, Nielsen J: Genome-scale metabolic modelling of hepatocytes reveals serine deficiency in patients with non-alcoholic fatty liver disease. Nat Commun. 2014, 5: 3083. 10.1038/ncomms4083.
- Ying H, Kimmelman Alec C, Lyssiotis Costas A, Hua S, Chu Gerald C, Fletcher-Sananikone E, Locasale Jason W, Son J, Zhang H, Coloff Jonathan L, Yan H, Wang W, Chen S, Viale A, Zheng H, J-h P, Lim C, Guimaraes Alexander R, Martin Eric S, Chang J, Hezel Aram F, Perry Samuel R, Hu J, Gan B, Xiao Y, Asara John M, Weissleder R, Wang YA, Chin L, Cantley Lewis C, et al: Oncogenic Kras Maintains Pancreatic Tumors through Regulation of Anabolic Glucose Metabolism. Cell. 2012, 149 (3): 656-670. 10.1016/j.cell.2012.01.058.
- Sigurdsson M, Jamshidi N, Steingrimsson E, Thiele I, Palsson B: A detailed genome-wide reconstruction of mouse metabolism based on human Recon 1. BMC Syst Biol. 2010, 4 (1): 140. 10.1186/1752-0509-4-140.
- Garcia-Albornoz M, Thankaswamy-Kosalai S, Nilsson A, Väremo L, Nookaew I, Nielsen J: BioMet Toolbox 2.0: genome-wide analysis of metabolism and omics data. Nucleic Acids Res. 2014, 42 (Web Server issue): W175-W181. 10.1093/nar/gku371.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.