The KUPNetViz: a biological network viewer for multiple -omics datasets in kidney diseases
© Moulos et al.; licensee BioMed Central Ltd. 2013
Received: 12 July 2012
Accepted: 21 July 2013
Published: 24 July 2013
Constant technological advances have allowed biologists to move from conventional single-omics to multi-omics experimental approaches, challenging bioinformatics to bridge this multi-tiered information. Ongoing research in renal biology is no exception. The results of large-scale and/or high-throughput experiments, which present a wealth of information on kidney disease, are scattered across the web. To tackle this problem, we recently presented the KUPKB, a multi-omics data repository for renal diseases.
In this article, we describe KUPNetViz, a biological graph exploration tool that allows the exploration of KUPKB data through the visualization of biomolecule interactions. KUPNetViz enables the integration of multi-layered experimental data across different species, renal locations and renal diseases with protein-protein interaction networks, and allows association with biological functions, biochemical pathways and other functional elements such as miRNAs. KUPNetViz focuses on simplicity of use and clarity of the resulting networks by reducing and/or automating advanced functionalities present in other biological network visualization packages. In addition, it allows the extrapolation of biomolecule interactions across different species, leading to the formulation of new plausible hypotheses, adequate experimental design and the suggestion of novel biological mechanisms. We demonstrate the value of KUPNetViz through two usage examples: the integration of calreticulin as a key player in a larger interaction network in renal graft rejection, and the novel observation of a strong association of interleukin-6 with polycystic kidney disease.
The KUPNetViz is an interactive and flexible biological network visualization and exploration tool. It provides renal biologists with biological network snapshots of the complex integrated data of the KUPKB, allowing the formulation of new hypotheses in a user-friendly manner.
During the past decade, major advances in biological research, mainly in the field of high-throughput analysis (e.g. -omics), have led to an exponential increase in available experimental data, produced through a variety of techniques including DNA, miRNA and antibody arrays, next-generation sequencing technologies and mass spectrometry. This shift in the life sciences towards multi-omics approaches has created a gap in the provision of bioinformatics tools capable of combining these data. More importantly, there appears to be a shortage of efficient tools that would help the bench biologist to: i) simplify and categorize the results of a multi-omics approach, which usually come in the form of long lists; ii) visualize the different information layers that can be mined from multi-omics approaches and aggregate useful pieces into biochemical pathways and/or biological function groups; iii) combine steps (i) and (ii) in a repeatable and reusable fashion so as to extract meaningful outcomes regarding the biological system under investigation; and iv) combine all the aforementioned steps to formulate plausible hypotheses, possibly applicable to similar systems (e.g. systems functioning in the same tissue/organ or a similar disease situation), and eventually design new experiments for hypothesis validation.
The renal biology field faces similar problems. Although kidney diseases have been extensively studied in different species (e.g. human, mouse, rat), this wealth of information remains hidden across several layers of public data and/or literature repositories. Considerable effort has been devoted to aggregating data from several resources [5–7], but the resulting databases lack a multi-omics scope and remain dispersed across the web. To address this problem, we developed the Kidney and Urinary Pathway Knowledge Base (KUPKB), a publicly available repository that organizes a substantial amount of existing knowledge regarding renal tissue, cell and disease categorization using Semantic Web technologies. The KUPKB can be queried through the user-friendly iKUP browser, accessible at http://www.kupkb.org.
The iKUP is a powerful tool in terms of speed, selectivity and descriptive power. Nevertheless, its nature restricts the user to viewing query results in tabular format, which only skims the surface of the rich interconnected data available in the KUPKB. In addition, although it succeeds in displaying findings hidden in scattered repositories, such as the expression of a set of genes under a very specific combination of kidney tissue, cell type and disease, it cannot map this information onto the interaction networks available as background knowledge, nor across multiple species.
The value of biological network representations has been extensively analyzed in the bioinformatics and systems biology literature [10, 11]. Important aspects include the ability to capture fixed snapshots of cellular states otherwise hidden in tables, to infer functional associations, to reduce complexity by combining protein interaction, gene expression and metabolic profiles in a single image, and to perform pattern recognition in a network snapshot.
In this article, we describe KUPNetViz, an interactive biological network querying and visualization application. Its main purpose is to assist renal scientists in extending their research by providing an alternative view of KUPKB data that depicts interactions among the queried molecules and their neighbors, coupled with functional and biochemical pathway annotation, in both a species-dependent and a species-independent manner. The main tasks supported and promoted by KUPNetViz are: i) the display, exploration and manipulation of general protein-protein interaction networks as well as functional and biochemical pathway association networks for a number of widely studied mammalian species; ii) the transformation of these general networks into kidney-specific networks through their association with related gene/protein/miRNA expression datasets; iii) the mining of possibly hidden relationships among co-regulated and/or directly interacting ("neighboring") entities under several combinations of kidney anatomies and/or disease models; and iv) the extrapolation of both expression data and network interactions across different diseases, anatomies and species, facilitating the quick formulation and screening of biological hypotheses and the design of new experiments.
The visual representations of molecules and interactions in KUPNetViz were inspired by previous work in the field, with several additions and significant modifications to suit our purpose. To demonstrate its value and necessity in kidney research and its ability to complete the aforementioned tasks, we present two usage examples showing that KUPNetViz clearly extends the functionality of the iKUP. Specifically, we found substantial additional evidence for a role of calreticulin in renal disease in humans by showing that calreticulin is involved in a larger interaction network in renal graft rejection. In addition, we propose a novel association of the inflammatory axis interleukin-6 (IL6)/IL6 receptor with the progression of polycystic kidney disease.
The KUPNetViz is a web application developed using PHP for server-side programming and jQuery (http://www.jquery.org) with static HTML for the client side. The backend is a MySQL database hosting the background knowledge used to build the mappings among entities, the network interactions, the annotation data and the properly parsed experimental data, which make up the most important part of the KUPKB. The backend database is easy to update and maintain, as it can be rebuilt automatically with a single command: a wrapper script, driven by a simple YAML (http://www.yaml.org) configuration file, runs a series of Perl scripts that download, parse and import the KUPKB data and the background knowledge resources into the schema. The interaction networks resulting from user queries, along with the experimental data mappings (gene, protein and miRNA expression and statistical significance), are rendered, visualized and controlled using the Cytoscape Web graph visualization library.
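The config-driven, one-command rebuild described above can be illustrated with a minimal Python sketch. It is not the KUPNetViz code (which uses Perl, YAML and MySQL): here a plain dictionary stands in for the YAML file, the standard-library `sqlite3` module stands in for MySQL, and the source names, table names and columns are hypothetical. Downloading is replaced by an injected `fetch` function so that the sketch runs offline.

```python
import sqlite3

# Hypothetical stand-in for the YAML configuration: one entry per data source.
CONFIG = {
    "database": ":memory:",
    "sources": [
        {"name": "string_links", "table": "interactions",
         "columns": ["protein1", "protein2", "score"]},
        {"name": "gene_info", "table": "genes",
         "columns": ["entrez_id", "symbol"]},
    ],
}


def parse_records(text):
    """Split whitespace-separated records, skipping the header line."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    return [tuple(ln.split()) for ln in lines[1:]]


def rebuild(config, fetch):
    """Drop and recreate every table listed in the config, then reload it.

    `fetch(name)` returns the raw text of a source; a real pipeline would
    download the file instead. Table names come from a trusted config,
    which is why they are interpolated directly into the SQL.
    """
    conn = sqlite3.connect(config["database"])
    counts = {}
    for src in config["sources"]:
        table, cols = src["table"], src["columns"]
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
        rows = parse_records(fetch(src["name"]))
        placeholders = ", ".join("?" for _ in cols)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        counts[src["name"]] = len(rows)
    conn.commit()
    return conn, counts


# Toy raw files standing in for downloaded resources.
RAW = {
    "string_links": "protein1 protein2 score\nA B 900\nA C 250\n",
    "gene_info": "entrez_id symbol\n811 CALR\n",
}

conn, counts = rebuild(CONFIG, RAW.__getitem__)
print(counts)  # {'string_links': 2, 'gene_info': 1}
```

Keeping the list of sources in configuration rather than in code is what makes a full rebuild a single command: adding a resource means adding one config entry and, if needed, one parser.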
The background knowledge integrated into KUPNetViz from the KUPKB has been extensively described elsewhere [8, 18]. Briefly, gene, protein and miRNA annotations are derived from NCBI Gene, UniProt and Ensembl, and Microcosm, respectively. For the mappings among the various biological entities and accession numbers, we used the mapping files provided by NCBI (ftp.ncbi.nlm.nih.gov) as well as the BioMart web services. NCBI files were also used to map genes to their respective GO terms. Molecule interactions are extracted from the STRING protein-protein interaction database by parsing its publicly available flat files, and miRNA-to-gene interactions are extracted from the Microcosm miRNA target files (http://www.ebi.ac.uk/enright-srv/microcosm/htdocs/targets/v5/). Biochemical pathway information was derived from KEGG, using its freely available web service API to download and construct mappings between genes and their respective pathways.
The experimental data integrated in KUPNetViz, which are used to map gene/protein/miRNA abundance onto the networks constructed by querying the application, have been extensively described elsewhere. Briefly, the knowledge base currently contains over 220 experiments spanning several biological layers (gene, protein, miRNA and metabolite abundances), derived from published articles and public repositories (e.g. GEO). In some cases the data are reported as extracted from the respective publication (with statistical significance where available), while in other cases (mostly microarray data) they had to be reanalyzed before integration. All experimental data are manually curated and derived from studies related to several kidney diseases and anatomies.
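As a rough illustration of the STRING flat-file parsing step, the sketch below reads link records and keeps those above an interaction score threshold. It assumes the STRING "protein.links" layout (a header line followed by `protein1 protein2 combined_score`, identifiers of the form `<NCBI taxon>.<protein id>` and scores on a 0 to 1000 scale, so the 0.2 threshold used in the text corresponds to 200). It also assumes that each undirected pair may appear in both orientations and deduplicates accordingly. The protein identifiers in the example are made up.

```python
def parse_string_links(lines, min_score=0.2):
    """Yield (taxon, protein_a, protein_b, score) for links with score >= min_score.

    Assumes STRING-style lines 'protein1 protein2 combined_score' after a
    header, '<taxon>.<id>' identifiers and integer scores in 0..1000.
    """
    it = iter(lines)
    next(it, None)  # skip the header line
    seen = set()
    for line in it:
        if not line.strip():
            continue
        p1, p2, raw = line.split()
        score = int(raw) / 1000  # rescale to 0..1
        if score < min_score:
            continue
        taxon, a = p1.split(".", 1)
        _, b = p2.split(".", 1)
        a, b = sorted((a, b))  # treat interactions as undirected
        if (taxon, a, b) in seen:
            continue  # the same pair listed in the opposite orientation
        seen.add((taxon, a, b))
        yield taxon, a, b, score


SAMPLE = """protein1 protein2 combined_score
9606.P1 9606.P2 950
9606.P2 9606.P1 950
9606.P1 9606.P3 150
10090.P4 10090.P5 400""".splitlines()

edges = list(parse_string_links(SAMPLE))
print(edges)  # [('9606', 'P1', 'P2', 0.95), ('10090', 'P4', 'P5', 0.4)]
```

The taxon prefix is kept in each record so that per-species networks (human 9606, mouse 10090, and so on) can be built from a single parsing pass.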
The user can query KUPNetViz using a variety of molecule identifier types, including HUGO gene symbols (e.g. "CALR"; the search is case-insensitive), Entrez Gene IDs (e.g. 811), UniProt accessions (e.g. P27797), Ensembl gene and protein IDs (e.g. ENSG00000179218 or ENSP00000320866), miRBase identifiers (e.g. hsa-miR-362-3p) or simple free text (e.g. "calreticulin"). Apart from the miRNA identifier, all of the above accessions and search terms refer to calreticulin, and different identifier types can be mixed in a single query. Four mammalian species are currently supported, namely human, mouse, rat and dog. A strong feature of KUPNetViz, rare among applications of this kind, is the ability to query multiple species simultaneously for protein-protein network reconstruction.
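Resolving mixed, case-insensitive identifiers to a single entity can be sketched with a synonym table keyed on lower-cased terms. This is a hypothetical illustration: the table below is hand-written from the examples in the text, whereas a real synonym table would be built from the NCBI, UniProt, Ensembl and miRBase mapping files. Genes are keyed here by their Entrez Gene ID, and the splitting on whitespace and commas means multi-word free-text queries are not handled.

```python
# Hypothetical synonym table built from the examples in the text.
SYNONYMS = {
    "811": ("gene", "811"),
    "calr": ("gene", "811"),
    "calreticulin": ("gene", "811"),
    "p27797": ("gene", "811"),
    "ensg00000179218": ("gene", "811"),
    "ensp00000320866": ("gene", "811"),
    "hsa-mir-362-3p": ("mirna", "hsa-miR-362-3p"),
}


def resolve(query):
    """Map a comma/space-separated query of mixed identifiers to entities.

    Returns ({(entity_type, canonical_id): [matched terms]}, [unknown terms]).
    Matching is case-insensitive because keys are stored lower-cased.
    """
    resolved, unknown = {}, []
    for term in query.replace(",", " ").split():
        hit = SYNONYMS.get(term.lower())
        if hit is None:
            unknown.append(term)
        else:
            resolved.setdefault(hit, []).append(term)
    return resolved, unknown


res, unk = resolve("CALR, p27797 ENSG00000179218 hsa-miR-362-3p FOO")
print(res)  # {('gene', '811'): ['CALR', 'p27797', 'ENSG00000179218'], ('mirna', 'hsa-miR-362-3p'): ['hsa-miR-362-3p']}
print(unk)  # ['FOO']
```

Collapsing every synonym onto one canonical key before building the network ensures that a query mixing "CALR", "P27797" and "811" yields a single calreticulin node rather than three.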
The extrapolation procedure followed by KUPNetViz is fairly simple: it assumes that a protein-protein interaction recorded experimentally or predicted in, e.g., human but not in mouse has some probability of also being present in mouse (or in any other mammalian species supported by the application). Thus, instead of aligning networks from different organisms (of similar taxonomy), the tool creates a meta-network of "super-nodes" and "super-edges" in which protein-protein interactions are shared among species. Although this assumption carries a certain risk regarding its biological validity, it allows hypotheses about these protein-protein interactions to be formulated quickly and screened quickly by the experimentalist. The extrapolation to super-nodes and super-edges is also motivated by a certain "bias" in the number of biological studies per organism, as there is a natural and understandable trend towards the study of human biomaterial (cell lines, healthy or diseased tissues) compared to other organisms. For example, at the time of the search, a PubMed query for the term "homo sapiens" returned roughly an order of magnitude more studies (12,547,599) than a query for "mus musculus" (1,159,658). This is also reflected in the number of protein-protein interactions: in a calreticulin-centred first-level network (calreticulin and its first neighbors), 365 protein-protein interactions are found for human (interaction score threshold: 0.2), whereas only 8 are found for mouse at the same threshold.
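The super-node/super-edge idea can be sketched as a union of per-species edge lists after mapping each protein to a cross-species group. The text does not state how super-nodes are defined, so the `ortholog_group` mapping below is an assumption (for example, orthologous genes sharing a symbol), and the gene names are toy placeholders rather than real STRING data. Each super-edge records which species support it; a super-edge lacking a given species is exactly an interaction "extrapolated" to that species.

```python
from collections import defaultdict


def build_meta_network(edges_by_species, ortholog_group):
    """Merge per-species PPI edges into super-edges between super-nodes.

    edges_by_species: {species: [(protein_a, protein_b), ...]}
    ortholog_group:   {(species, protein): super_node_id}  (assumed mapping)
    Returns {frozenset({super_a, super_b}): set of supporting species}.
    """
    meta = defaultdict(set)
    for species, edges in edges_by_species.items():
        for a, b in edges:
            sa = ortholog_group.get((species, a))
            sb = ortholog_group.get((species, b))
            if sa is None or sb is None or sa == sb:
                continue  # unmapped protein or self-loop after merging
            meta[frozenset((sa, sb))].add(species)
    return dict(meta)


def extrapolated(meta, species):
    """Super-edges supported only by other species (candidate hypotheses)."""
    return [edge for edge, support in meta.items() if species not in support]


# Toy example: human has two interactions, mouse only one.
edges = {"human": [("A", "B"), ("A", "C")], "mouse": [("a", "c")]}
groups = {("human", "A"): "A", ("human", "B"): "B", ("human", "C"): "C",
          ("mouse", "a"): "A", ("mouse", "c"): "C"}

meta = build_meta_network(edges, groups)
print(extrapolated(meta, "mouse"))  # [frozenset({'A', 'B'})]
```

Because no alignment score is computed, this union is cheap to build on the fly, which matches the quick-screening purpose described above; the price is that every extrapolated edge is a hypothesis rather than evidence.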
The running time of KUPNetViz depends on two factors: i) the performance of the network layout algorithm and of the visualization itself (e.g. the supported network size), which depends on the Cytoscape Web library; and ii) the query performance of the application when mapping renal -omics datasets onto the network, which depends on the number of queried molecules. Typical queries with a few molecule names complete within a few seconds. The only notably time-consuming operations, in terms of both querying and visualization, are the search for second-level neighbors of selected nodes and queries with many molecules in multi-species mode. Finally, the speed of the network layout depends on the user's machine, as the network rendering takes place in the user's browser.
Network export is provided through the corresponding functions of the Cytoscape Web library. After querying and processing, the user can export the resulting network in a variety of text formats (SIF, GraphML, XGMML and Arena3D format) as well as high-quality image formats (PDF, PNG and SVG). The text formats allow the resulting networks to be imported into other graph analysis applications (such as Cytoscape) for further exploration and network property analysis, and they are sufficient for compatibility with most current biological network visualization tools.
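Why second-level neighbor searches are the expensive case can be seen from a breadth-first expansion: each extra level multiplies the node set by roughly the average degree. The sketch below is a generic illustration on a toy graph, not KUPNetViz code; the node names are placeholders.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighborhood(adj, seeds, depth=1):
    """Nodes within `depth` hops of the seeds (breadth-first search)."""
    dist = {s: 0 for s in seeds if s in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand beyond the requested level
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def induced_edges(adj, nodes):
    """Edges whose two endpoints both lie in `nodes`."""
    return {frozenset((a, b)) for a in nodes for b in adj.get(a, ()) if b in nodes}


adj = build_adjacency([("S", "a1"), ("S", "a2"), ("S", "a3"), ("a1", "b1"),
                       ("a1", "b2"), ("a2", "b3"), ("a3", "b4"), ("b1", "c1")])
for level in (1, 2):
    nodes = neighborhood(adj, ["S"], level)
    print(level, len(nodes), len(induced_edges(adj, nodes)))
# 1 4 3
# 2 8 7
```

In a real protein-protein interaction network, where hub proteins can have hundreds of partners (365 first-level interactions for human calreticulin at a 0.2 threshold), the second level grows far faster than in this toy graph, and every extra node must also be mapped to expression data and laid out in the browser.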
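SIF, the simplest of the text formats listed above, writes one interaction per line as `source relation target`, plus lines holding a single node name for nodes with no edges. A minimal writer is sketched below; in KUPNetViz itself the export is handled by Cytoscape Web, so this only illustrates the format. The relation label `pp` (protein-protein) is the conventional Cytoscape label and is assumed here; the edges are taken from the calreticulin example in the text.

```python
def to_sif(edges, relation="pp", isolated=()):
    """Serialize edges as tab-separated SIF lines: source, relation, target.

    Nodes without edges are written as single-name lines, which SIF allows.
    """
    lines = [f"{a}\t{relation}\t{b}" for a, b in sorted(edges)]
    lines += [str(node) for node in sorted(isolated)]
    return "\n".join(lines) + "\n"


sif = to_sif([("CALR", "LAMB1"), ("CALR", "FN1")], isolated=["IL6"])
print(sif, end="")
# CALR	pp	FN1
# CALR	pp	LAMB1
# IL6
```

Sorting the output makes exports deterministic, so two exports of the same network can be compared with a plain text diff.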
To demonstrate the added value of KUPNetViz, we present two case studies revealing insights that would require significant effort to mine using the iKUP alone. Specifically, we explored (i) the role of calreticulin, a protein involved in renal disease in animals, within a larger interaction network in renal graft rejection (interstitial fibrosis and tubular atrophy) and its association with other functional and pathway elements, and (ii) the association of the inflammatory axis interleukin-6 (IL6)/IL6 receptor with the progression of polycystic kidney disease.
Calreticulin is a protein involved in renal disease in animals. Using the iKUP browser to explore calreticulin entries in the KUPKB, we demonstrated for the first time that calreticulin expression was induced in human renal graft rejection, an in silico hypothesis that was subsequently confirmed experimentally. To better understand the role of calreticulin in renal graft rejection, we sought to answer two questions: 1) was calreticulin acting as an independent player or as part of a larger protein network, and 2) which processes were associated with calreticulin dysregulation? These questions could not be answered using the iKUP browser alone, as they required knowledge of protein-protein interactions and of the annotation of proteins with biological processes.
Overall, we observed that calreticulin was at the centre of a network in which many genes and proteins were also altered in this pathology, and that most of these biomolecules were part of the basement membrane and involved in cell-to-extracellular matrix (ECM) interactions. Specifically, the ECM-receptor interaction pathway is mostly upregulated, as shown by the orange-to-red colour scale of the nodes connected to calreticulin (Figure 2).
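The colour scale used to read regulation from the network can be illustrated by a piecewise-linear mapping from a log2 fold change to an RGB colour. The text does not give the palette or breakpoints used by KUPNetViz (Cytoscape Web performs the mapping itself), so the anchor values and colours below are hypothetical, chosen only so that up-regulation runs from orange to red as in Figure 2.

```python
# Hypothetical anchors: (log2 fold change, RGB colour), sorted by value.
ANCHORS = [(-2.0, (0, 128, 0)),     # strongly down: green
           (0.0, (255, 255, 0)),    # unchanged: yellow
           (1.0, (255, 165, 0)),    # moderately up: orange
           (2.0, (255, 0, 0))]      # strongly up: red


def fold_change_colour(log2fc, anchors=ANCHORS):
    """Linearly interpolate a hex colour for a log2 fold change.

    Values outside the anchor range are clamped to the end colours.
    """
    x = min(max(log2fc, anchors[0][0]), anchors[-1][0])
    for (x0, c0), (x1, c1) in zip(anchors, anchors[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            rgb = tuple(round(a + t * (b - a)) for a, b in zip(c0, c1))
            return "#{:02x}{:02x}{:02x}".format(*rgb)
    raise ValueError("anchors must be sorted by value")


for fc in (-10, 0, 1, 2, 5):
    print(fc, fold_change_colour(fc))
# -10 #008000
# 0 #ffff00
# 1 #ffa500
# 2 #ff0000
# 5 #ff0000
```

Clamping keeps extreme fold changes from dominating the scale, so a cluster of moderately upregulated neighbors remains visibly orange-to-red rather than washed out by a single outlier.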
ECM and basement membrane are known to be significantly altered in renal graft rejection, and the analysis performed with our tool links calreticulin to these pathological processes. Furthermore, basement membrane proteins identified as interacting with calreticulin, such as laminins and fibronectin (LAMA2, LAMA4, LAMA5, LAMB1, LAMB2, LAMC1 and FN1 in Figure 2), have been reported to be upregulated in renal graft rejection [28, 29]. Altogether, KUPNetViz provides additional, pathway-based evidence for calreticulin as a valuable target in renal graft rejection. Although it would not have been impossible to find these links otherwise, the tool presented in this article significantly accelerated this discovery. For example, a PubMed search for "laminin fibronectin calreticulin" did not return any results. Furthermore, classic network visualization tools, such as the one provided on the STRING database website, lack background knowledge on kidney diseases.
IL6 is a pro-inflammatory cytokine that has previously been described as a pro-fibrotic mediator in liver, lung and skin models of fibrosis. However, current evidence for a role of IL6 in the development of renal fibrosis is limited. The only available evidence for this link in the kidney was published recently, in a study showing that mice with genetic blockade of IL6 were protected against the development of renal fibrosis. The authors also observed increased IL6 expression levels in kidney biopsies of chronic kidney disease patients compared with age-matched control biopsies. IL6 is a peculiar cytokine: although it is expressed by a large variety of cells, it can target only a small number of them, owing to the very limited expression of its receptor (IL6R).
The wealth of biological information regarding kidney disease that can be extracted from the current version of the KUPKB is presented in tabular format. Given the potentially long lists of molecules returned, the nature of the knowledge base and the fact that it contains hierarchical and protein interaction data, we sought to visualize these interactions through a gene network visualization module. Before developing it, we tried to map experimental data onto some of the numerous existing software packages developed for biological network visualization and exploration that at least partially suited our needs (mapping of multi-omics data, integrated background databases, customizable data sources). These tools included EGAN and Biological Networks. In the case of EGAN, user feedback highlighted the excessive amount of displayed information, the view complexity resulting from the multiple data sources, the inability to display expression data from multiple experiments simultaneously and the moderate performance of the application on large networks. Another major issue with EGAN is the lack of directionality in gene-to-gene edges. In the case of Biological Networks, users were overwhelmed mostly by the number of functionalities, the complex interface and the multiple background knowledge data sources, features that might fascinate a bioinformatician but often frustrate a bench biologist seeking clear-cut, specific answers rather than heavy and complex multi-functionality. In addition, Biological Networks requires a local installation, which is often a disadvantage, mostly in terms of maintenance.
The above tools are only a small sample of the large pool of tools now available for the combined view of protein-protein interaction networks and abundance data. Leaving aside the domain-specific nature of KUPNetViz, most of the packages listed in the first table of the cited work meet the criterion of combined views of interaction and abundance data (e.g. VANTED and GENeVis) and are equipped with high-level functionalities (e.g. network clustering algorithms, dimensionality reduction methods, visualization of extremely large networks and network module detection). However, although these tools excel in network-related functionality and visualization quality, none of them comes with a complete yet simple set of bundled background databases (protein-protein interaction, element annotation, functional and pathway associations) in one place, which is essential for the researcher. Other solutions, including Pajek and yEd (http://www.yworks.com/en/products_yed_about.html), are also excellent network analysis tools but are very generic and not focused on biological problems, which may also confuse the biologist.
Another drawback of many current applications is that they are desktop-based, whereas there is a current trend in software development to move from desktop to web-based applications for a variety of reasons, including flexibility, maintenance and platform independence. Although there are certain web-based tools, based on Java™ Web Start technology (VANTED) or related technologies (VisANT), they share limitations with some of the desktop applications described above, mainly the lack of a small but adequate set of bundled background databases and of a relatively simple interface that does not require prior training. The KUPNetViz application, although simple in its basic concepts, aspires to initiate a set of domain-specific integrative visualization tools that focus on comprehensible interfaces. These interfaces could be used to answer biological questions from the very first use, without requiring the user to go through long manuals, perform time-consuming database and literature searches to annotate the various entities in the network, or understand basic mathematical concepts of graph theory.
Even though the developers of several biological network visualization and data integration packages have put substantial effort into the simplicity and clarity of the outcome, some of these packages can only be fully exploited by trained bioinformaticians, while others require prior training in the form of workshops or seminars to be used to their full potential. The development of KUPNetViz was a continuous interaction between computer scientists and renal biologists, in which several features were added, removed or re-implemented on the basis that the application should remain simple to use and self-explanatory, and should provide simple biological network snapshots. As a result, the tool keeps the network display and its functionality as simple as possible by hiding and/or automating several technical aspects and mathematical properties of the biological graph. At the same time, it does not prevent more advanced users from exploring other mathematical properties of the resulting networks, as it allows their export in text formats that can be imported into numerous existing graph analysis packages for further analysis. Additionally, in line with a recent trend in bioinformatics tool development towards user-centred design methods, KUPNetViz is a good example of such a strategy, given the time required to complete the case studies and the feedback from the biologists who used the application.
To our knowledge, one of the novelties of KUPNetViz, rare among biological network visualization tools, is the ability to visualize gene-to-gene relationships in a multi-species manner while maintaining a simple, minimalistic network display. The majority of current packages focus either on the alignment of known biochemical networks among different species (e.g. Osprey or VANLO), allowing the user to visualize but not automatically extrapolate interactions that may apply across species, or on the manual building of such inter-species interaction networks from multiple available data sources (e.g. Biological Networks). In addition, multiple network alignment or manual network building usually comes at a cost in application and visualization simplicity, often discouraging the casual or hurried user.
The KUPNetViz goes one step further by allowing the extrapolation of biomolecular relationships from one species to another while maintaining its straightforward visual interface and graph views (Additional file 1). This creates the unique possibility to combine -omics data from different animal models and/or human disease and quickly screen for reasonable hypotheses. As the organisms supported by KUPNetViz are not equally well annotated in the literature (e.g. mouse is better annotated than human for developmental functions), the multi-species functionality allows the extrapolation of gene-to-gene relationships and gene/protein expression data from one organism to another. While some relationships between molecules may differ between species, most relationships and biological pathways can be considered evolutionarily conserved, which allows the easy formulation of extrapolated hypotheses regarding both biomolecular relationships and gene/protein expression under certain pathological conditions across species. The researcher can then judge the fundamental validity of these hypotheses and either reject a hypothesis and formulate another, or compile a list of plausible hypotheses that can ultimately be verified in the lab.
The reasoning behind building a MySQL backend database to support the application, instead of using the KUPKB semantic web repository, rests on three main arguments. Firstly, during development we sought to maintain software modularity and reusability, even though the current version of the visualization application is tailored to the kidney. We therefore considered that a relational database (RDB) model, especially for the background knowledge, can be adapted more easily to similar experimental data sources (e.g. a different organ) or to data sources of a different structure (e.g. multi-omics data from different types of cancer), requiring only minor modifications to the user interface. Secondly, not all the background knowledge sources used by KUPNetViz were initially incorporated into the semantic web model (e.g. the protein-protein interactions or the reference pathways). This does not indicate a problem in the KUPKB model design, but rather reflects the model's different initial purpose, which was not network visualization but the detailed knowledge representation of kidney anatomies and disease models, and the mapping and annotation of experimental data to this representation. Finally, during the design process we realized that not all of the knowledge stored in the semantic web repository (e.g. the highly detailed localization and hierarchy of cell types and kidney tissues) was required for the minimalistic approach promoted by KUPNetViz. We therefore used an RDB model to reduce query complexity and running time. In addition, building the backend database, including downloading and parsing the background knowledge files and re-parsing the experimental data, takes less than three hours, whereas rebuilding the semantic web repository takes over a day (both times measured on the same computer). This makes KUPNetViz easier to maintain and allows more frequent synchronization of the backend database with the latest releases of the incorporated background knowledge sources.
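The kind of flat relational query that motivates the RDB choice can be sketched as follows: fetch the first neighbors of a seed gene above a score threshold and attach expression values for one disease in a single statement. The schema, table names, disease labels and data below are hypothetical, and the standard-library `sqlite3` module stands in for MySQL; the rows are toy values, not KUPKB data.

```python
import sqlite3

# Hypothetical minimal schema (not the actual KUPNetViz schema).
SCHEMA = """
CREATE TABLE interactions (gene_a TEXT, gene_b TEXT, score REAL);
CREATE TABLE expression (gene TEXT, disease TEXT, log2fc REAL, pvalue REAL);
"""

# First neighbors in either edge orientation, joined to disease-specific
# expression; genes without data for that disease get NULLs.
NEIGHBORS_WITH_EXPRESSION = """
SELECT n.gene, n.score, e.log2fc, e.pvalue
FROM (
    SELECT gene_b AS gene, score FROM interactions WHERE gene_a = :seed
    UNION
    SELECT gene_a AS gene, score FROM interactions WHERE gene_b = :seed
) AS n
LEFT JOIN expression AS e ON e.gene = n.gene AND e.disease = :disease
WHERE n.score >= :min_score
ORDER BY n.gene
"""


def first_neighbors(conn, seed, disease, min_score=0.2):
    params = {"seed": seed, "disease": disease, "min_score": min_score}
    return conn.execute(NEIGHBORS_WITH_EXPRESSION, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO interactions VALUES (?, ?, ?)",
                 [("CALR", "GENE1", 0.9), ("GENE2", "CALR", 0.5),
                  ("CALR", "GENE3", 0.1)])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)",
                 [("GENE1", "graft rejection", 1.8, 0.001),
                  ("GENE2", "PKD", -0.5, 0.2)])

print(first_neighbors(conn, "CALR", "graft rejection"))
# [('GENE1', 0.9, 1.8, 0.001), ('GENE2', 0.5, None, None)]
```

Only the two tables needed for the network view are queried, which illustrates the point made above: leaving out the detailed anatomical hierarchy of the semantic web model keeps each request to a simple join.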
KUPNetViz is a user-friendly biological network visualization tool dedicated to renal research. It uses data gathered from multiple resources on several renal pathological states, together with background knowledge on biomolecular interactions and on functional and biochemical pathway associations, to create biological network snapshots that can shed light on mechanisms involved in renal disease. It provides renal biologists with an alternative network representation of the data, complementary to the iKUP exploration tool, at a small cost in display complexity. Perhaps its most important feature is the extrapolation of biomolecular interactions across species, which allows the researcher to quickly formulate and screen several hypotheses in a simple manner. We have demonstrated the value of KUPNetViz in two case studies, further investigating the role of calreticulin as a key player in a gene network mostly upregulated in renal graft rejection, and newly investigating the potential involvement of IL6 and the IL6 receptor in the progression of polycystic kidney disease.
The value of KUPNetViz in kidney research will increase with the number of related public multi-omics datasets made available in the KUPKB. The renal biology expertise of the group hosting KUPNetViz guarantees its continued curation and the addition of related datasets as they are published. This will potentially increase confidence in the observed networks, as additional biological evidence accumulates from the newly added datasets. Adding datasets is easy and straightforward, since biologists enter the data in ready-to-fill Excel sheets that are then incorporated with a single command. We aim to include the latest -omics datasets in the KUPKB at least once a year, and KUPNetViz will benefit directly from these updates. Future extensions include the incorporation of additional biomolecule annotation and interaction resources (KEGG compounds) and the addition of a module that will identify highly enriched and over-represented biological functions and biochemical pathways using the StRAnGER algorithm.
Operating system: KUPNetViz is a web-based application and is therefore platform independent.
Requirements: KUPNetViz is best used and viewed with Internet Explorer 8 or higher, Mozilla Firefox, Google Chrome, Apple Safari or Opera. Internet Explorer 7 and lower are not recommended.
License: KUPNetViz is free for academic use but requires a license from the authors for any commercial purposes. The software is available without user registration.
Further information: A detailed user's guide with usage examples is available at the application's homepage and as supplementary material online (Additional file 2).
KUP: Kidney and urinary pathways
KUPKB: Kidney and urinary pathway knowledge base
KUPNetViz: Kidney and urinary pathway network visualizer
KEGG: Kyoto encyclopedia of genes and genomes
PKD: Polycystic kidney disease
CKD: Chronic kidney disease
IL6R: Interleukin 6 receptor
The authors thank Dr Aristotle Chatziioannou for useful input at the early stages of the application design. This work is funded by the EU/FP7/ICT-2007.4.4 e-LICO project (http://www.e-lico.eu) and by the FP7-IAPP program "Protoclin" (GA 251368, http://www.protoclin.org).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.