iCTNet: A Cytoscape plugin to produce and analyze integrative complex traits networks
© Wang et al; licensee BioMed Central Ltd. 2011
Received: 5 May 2011
Accepted: 26 September 2011
Published: 26 September 2011
The speed at which biological datasets are being accumulated stands in contrast to our ability to integrate them meaningfully. Large-scale biological databases containing datasets of genes, proteins, cells, organs, and diseases are being created but they are not connected. Integration of these vast but heterogeneous sources of information will allow the systematic and comprehensive analysis of molecular and clinical datasets, spanning hundreds of dimensions and thousands of individuals. This integration is essential to capitalize on the value of current and future molecular- and cellular-level data on humans to gain novel insights about health and disease.
We describe a new open-source Cytoscape plugin named iCTNet (integrated Complex Traits Networks). iCTNet integrates several data sources to allow automated and systematic creation of networks with up to five layers of omics information: phenotype-SNP association, protein-protein interaction, disease-tissue, tissue-gene, and drug-gene relationships. It facilitates the generation of general or specific network views with diverse options for more than 200 diseases. Built-in tools are provided to prioritize candidate genes and create modules of specific phenotypes.
iCTNet provides a user-friendly interface to search, integrate, visualize, and analyze genome-scale biological networks for human complex traits. We argue this tool is a key instrument that facilitates systematic integration of disparate large-scale data through network visualization, ultimately allowing the identification of disease similarities and the design of novel therapeutic approaches.
The online database and Cytoscape plugin are freely available for academic use at: http://www.cs.queensu.ca/ictnet
In recent years, the availability of high-throughput datasets from a variety of biological sources has prompted the creation of a multitude of databases that significantly facilitate biomedical research. In parallel, network biology has emerged as a powerful paradigm to visualize and analyze large data ensembles in novel ways with unparalleled flexibility. More recent applications of this approach have enabled a detailed look at the genetic landscape of complex human phenotypes. In 2007, Goh et al. reported the first human disease network and provided a novel view of the genetic relationship among diseases. Subsequently, more complex approaches that included the integration of quantitative trait loci, gene expression, and clinical phenotypic data were used to construct disease similarity networks [4, 5]. Another pioneering study summarized the application of protein networks for network-based classification of diseases, and the integration of drug targets and disease gene products led to the field of systems pharmacology [6, 7]. Overall, the availability of large-scale datasets has prompted efforts to integrate data with the ultimate goal of providing systematic insights into complex traits.
Recently, multiple databases were elegantly combined to explore gene-disease associations. While this is a useful tool to visualize relationships among phenotypes and study disease-related genes, the resulting networks are limited to this single type of interaction. Here we present iCTNet (integrated Complex Traits Networks), a tool to create and analyze human complex traits networks. It assembles and integrates information from genome-wide association studies, protein-protein interactions, tissue expression, and drug targets, with the goal of identifying novel relationships across several domains that may assist in elucidating a new classification, pathogenic mechanism, or treatment for common human traits. To the best of our knowledge, iCTNet constitutes the first effort to integrate multiple layers of information as multi-partite networks, thus enabling systematic analysis of human complex traits.
Data sources and distribution
Number of disease-gene associations at different GWAS cutoff values
More than 33,000 interactions among 12,000 proteins were downloaded from the Human Protein Reference Database (HPRD, R.8). In addition, transcription factor (TF)-DNA interactions were incorporated by extracting transcription factor binding site (TFBS) information from the UCSC Genome Browser v.18. Protein-DNA interactions were defined as directed edges linking TFs to their target genes. Edges were derived from high-confidence TFBS (Z-score > 4.0) located within 5,000 bp of the transcriptional start site of the nearest gene.
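The TFBS filtering rule described above can be sketched as follows. This is a minimal illustration, not the plugin's code: the `TfbsHit` record, its field names, and the placeholder identifiers are hypothetical, and treating the 5,000 bp limit as inclusive is an assumption.

```python
from typing import NamedTuple


class TfbsHit(NamedTuple):
    """Hypothetical record for one predicted binding site (field names are assumptions)."""
    tf: str            # transcription factor symbol
    target_gene: str   # gene nearest to the binding site
    z_score: float     # confidence score of the site
    dist_to_tss: int   # signed distance in bp from the site to the gene's TSS


Z_CUTOFF = 4.0           # "ZScore > 4.0" in the text
MAX_TSS_DISTANCE = 5000  # "within 5,000 bp"; inclusive bound is an assumption


def tf_target_edges(hits):
    """Return the set of directed (TF, target gene) edges that pass both filters."""
    edges = set()
    for hit in hits:
        if hit.z_score > Z_CUTOFF and abs(hit.dist_to_tss) <= MAX_TSS_DISTANCE:
            edges.add((hit.tf, hit.target_gene))
    return edges


# Placeholder data, not taken from any database.
hits = [
    TfbsHit("TF_A", "GENE_1", 4.7, -1200),
    TfbsHit("TF_A", "GENE_2", 3.9, 300),    # rejected: score too low
    TfbsHit("TF_B", "GENE_1", 5.2, 8000),   # rejected: too far from the TSS
    TfbsHit("TF_B", "GENE_3", 4.1, 5000),
]
print(sorted(tf_target_edges(hits)))
```

Storing the edges as ordered pairs keeps their direction (TF to target), which distinguishes them from the undirected protein-protein interactions.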
To refine phenotype-SNP association, we incorporated information on genes expressed in specific tissues. This information was downloaded from HPRD and included 13,337 genes, 579 tissues and 10,486 tissue-gene connections. We curated and pooled the tissues into 38 categories to standardize the naming conventions in various sources, and filtered "unknown" gene symbols, resulting in 7,260 genes and 43,206 tissue-gene edges. We also manually curated the disease-tissue association for the phenotypes present in iCTNet. The final disease-tissue subnetwork has 27 pooled tissues, 247 phenotypes and 281 edges.
Candidate gene prioritization
Both prioritization methods iterate the update $p^{t+1} = (1 - r) W p^{t} + r p^{0}$, which describes one transition of an iterative walker in the network. Here $p^{t+1}$ is the vector holding the scores of the nodes at time step $t + 1$, $W$ is the normalized adjacency matrix of the network, $p^{t}$ is the vector holding the scores of the nodes at the previous time step $t$, $p^{0}$ is the initial score vector, and $r$ is the restart rate, ranging from 0 to 1. In both methods, the walker begins at the starting nodes and moves to randomly selected neighbors in the network. The restart rate is the probability that the walker jumps back to the starting nodes at any given time step. In other words, a small restart rate lets the walker reach nodes farther from the starting nodes, whereas a restart rate of 1 traps the walker at the starting nodes.
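The iteration can be illustrated with a short sketch. It assumes the standard random walk with restart update, p(t+1) = (1 - r) W p(t) + r p(0), with W obtained by column-normalizing the adjacency matrix. The normalization choice, the convergence tolerance, and all function names are assumptions made for illustration, not details confirmed by the text.

```python
def normalize_columns(adj):
    """Column-normalize a square adjacency matrix (list of lists) so each column sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, r=0.5, tol=1e-10, max_steps=1000):
    """Iterate p <- (1 - r) * W * p + r * p0 until the L1 change drops below tol."""
    n = len(adj)
    w = normalize_columns(adj)
    # Equal probability on every starting node, zero elsewhere.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_steps):
        nxt = [
            (1 - r) * sum(w[i][j] * p[j] for j in range(n)) + r * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only starting node.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds={0}, r=0.5)
trapped = random_walk_with_restart(path, seeds={0}, r=1.0)
```

On this toy graph the scores fall off with distance from the starting node, and with r = 1 the walker never leaves it, matching the behavior described in the text.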
The main difference between PRINCE and the random walk method lies in the input data $p^{0}$ and the adjacency matrix $W$. In random walk with restarts, the initial vector $p^{0}$ is constructed such that equal probabilities are assigned to the starting nodes. Next, all genes with GWAS p-values are classified as either "associated" or "candidate" based on a user-selected threshold. This algorithm measures the closeness of potentially associated (candidate) genes to confirmed (associated) genes within the global protein network, and ranks candidate genes for further biological investigation. The original version of the PRINCE algorithm takes as input a disease similarity matrix (arbitrarily defined) and a protein interaction network. PRINCE then uses a network propagation-based algorithm to infer a strength-of-association scoring function and exploits the prior information on causal genes for the same disease or similar ones. This scoring is used in combination with a PPI network to infer protein complexes that are involved in a given disease. We modified the algorithm to work with unweighted protein-protein interactions and extended it to include all types of network interactions supported by iCTNet. In addition, instead of using an arbitrarily defined disease similarity matrix, our implementation of PRINCE uses true disease associations as defined by a user-selected p-value threshold. A genetic similarity network is then created from GWAS data. Finally, the association of candidate genes with a given phenotype is prioritized via network propagation as originally described. The complexity of both methods is $O(tn^2)$, where $n$ is the number of nodes in the network and $t$ is the number of time steps. The run times depend on the number of truly associated genes, their association strength (p-value), and the number of connections among their protein products in the network.
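The thresholding and seeding steps for the random walk variant can be sketched as follows. The function names, gene identifiers, p-values, and score vector are all hypothetical placeholders. The scores stand in for the output of a propagation step and are not results from iCTNet.

```python
def split_by_threshold(gene_pvalues, threshold):
    """Split genes into 'associated' (p below threshold) and 'candidate' (all others)."""
    associated = {g for g, p in gene_pvalues.items() if p < threshold}
    candidates = set(gene_pvalues) - associated
    return associated, candidates


def initial_vector(nodes, associated):
    """Build p0 with equal probability on every associated gene present in the network."""
    seeds = [n for n in nodes if n in associated]
    if not seeds:
        raise ValueError("no associated gene is present in the network")
    weight = 1.0 / len(seeds)
    return [weight if n in associated else 0.0 for n in nodes]


def rank_candidates(nodes, scores, candidates):
    """Return candidate genes sorted by decreasing propagation score."""
    ranked = [(n, s) for n, s in zip(nodes, scores) if n in candidates]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Placeholder genes and p-values, chosen only to exercise the code.
nodes = ["GENE_1", "GENE_2", "GENE_3", "GENE_4"]
gene_pvalues = {"GENE_1": 1e-9, "GENE_2": 3e-4, "GENE_3": 2e-8, "GENE_4": 0.02}

associated, candidates = split_by_threshold(gene_pvalues, threshold=5e-8)
p0 = initial_vector(nodes, associated)

# Hypothetical scores, as a propagation step might return them.
example_scores = [0.4, 0.1, 0.3, 0.2]
ranking = rank_candidates(nodes, example_scores, candidates)
```

Only candidate genes appear in the final ranking. The associated genes serve as seeds and are excluded from the list proposed for follow-up.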
Results and Discussion
A useful feature of iCTNet is the ability to create a similarity network on any existing network using any type of node. This function replaces the indirect connections in the bi-partite network (in this case, diseases connected through genes) with direct ones, thus creating a simpler display. For example, a similarity network of the autoimmunity graph shown in Figure 2A is shown in Figure 2B. This feature is particularly useful when handling very large networks. The color of each edge is proportional to the number of genes shared by the two diseases it connects (using a heatmap coloring scheme). Another key component of iCTNet is the availability of drug-target relationships. Figure 2C shows the same autoimmunity network in which proteins that are drug targets are linked to drugs (blue nodes) as described in DrugBank. A straightforward advantage of this multidimensional display is that it may identify drugs that are effectively used for one disease as a plausible alternative for another disease genetically associated with the same drug target.
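The projection behind a similarity network can be sketched in a few lines. This is an illustrative reading of the description above, assuming the edge weight is the count of shared linking nodes. The function name and the placeholder diseases and genes are hypothetical.

```python
from itertools import combinations


def similarity_network(links):
    """Project a bipartite network onto one node type.

    links maps each node of the kept type (e.g. a disease) to the set of
    nodes it is connected to (e.g. genes). Two kept nodes are joined when
    they share at least one neighbor; the weight is the number shared.
    """
    edges = {}
    for a, b in combinations(sorted(links), 2):
        shared = len(links[a] & links[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Placeholder disease-gene links, not taken from the GWAS catalog.
disease_genes = {
    "disease_A": {"GENE_1", "GENE_2", "GENE_3"},
    "disease_B": {"GENE_2", "GENE_3"},
    "disease_C": {"GENE_4"},
}
print(similarity_network(disease_genes))
```

The same function applies to any node type, for instance genes linked through shared diseases, by passing the inverted mapping.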
iCTNet also provides two candidate prioritization algorithms that take full advantage of the underlying protein interaction network. Both the random walk and PRINCE algorithms take a set of "associated" genes (genes with an association p-value below a user-selected threshold) and perform searches through the entire protein interaction network. This is a powerful way to identify candidate genes whose position in the protein ensemble makes them suitable for further follow-up even if their association p-value is modest. To the best of our knowledge, a head-to-head comparison of these algorithms under an extensive range of parameters has not been performed. Thus, we are unable to recommend the use of a particular algorithm, and instead encourage the user to test them under different experimental scenarios.
Previous studies that explored the relationship between genes and diseases on a large scale [3, 8] were based on manually curated databases such as the Online Mendelian Inheritance in Man (OMIM). While data from OMIM are readily accessible, the relationships between genes and diseases in its GeneMap do not strictly represent susceptibility loci, but in some cases also refer to progression or pharmacogenomic effects. In contrast, iCTNet incorporates phenotype-SNP associations from the genome-wide association studies (GWAS) Catalog database (http://www.genome.gov/gwastudies). When multiple sources of information implicate a given gene in a trait, the p-values from those studies are combined into a meta p-value. As a result, each disease-gene interaction (edge) in iCTNet has a quantitative value approximately equal to the -log10 of the association p-value (-log10(p)). This strategy also enables the iCTNet user to filter results based on a given significance threshold. Another distinctive feature of iCTNet is that multi-partite networks can be created by combining up to four classes of nodes (disease, gene, drug, and tissue) with up to five classes of edges (protein-protein, protein-DNA, disease-gene, drug-gene, and tissue-gene).
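The text does not say how the meta p-value is computed. The sketch below assumes Fisher's method for combining independent tests, which appears in the reference list, and derives the -log10 edge weight from it. The function names and the example p-values are hypothetical.

```python
import math


def _fisher_terms(pvalues):
    """Return (X/2, partial sum) for Fisher's statistic X = -2 * sum(ln p)."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in pvalues)
    # For 2k degrees of freedom, the chi-square survival function has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return half_stat, total


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (an assumed choice)."""
    half_stat, total = _fisher_terms(pvalues)
    return math.exp(-half_stat) * total


def edge_weight(pvalues):
    """-log10 of the combined p-value, computed in log space to avoid underflow."""
    half_stat, total = _fisher_terms(pvalues)
    return (half_stat - math.log(total)) / math.log(10)


# Placeholder p-values for one disease-gene pair reported by two studies.
meta_p = fisher_combined_pvalue([0.05, 0.05])
weight = edge_weight([0.05, 0.05])
```

With a single study the combined value equals the input p-value, so edges supported by one source keep their original significance. Filtering by a significance threshold then reduces to comparing `edge_weight` against -log10 of the chosen cutoff.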
In summary, here we present a database and Cytoscape plugin for the integration of different high-throughput datasets. iCTNet represents a new family of applications that are designed to integrate and analyze disparate data sources, a key pillar in the new paradigm of systems biology.
iCTNet is a powerful plugin for Cytoscape, built on a complex database that integrates interactions among human phenotypes, proteins, tissues, and drugs. It utilizes the power of multi-partite network analysis and visualization to uncover genetic similarities among multiple traits to suggest alternative therapeutic approaches and to prioritize disease-associated genes. iCTNet enables a point and click environment to load views for user-selected phenotypes, and provides two methods for evaluation or prioritization of disease-causing genes. To maintain iCTNet, monthly updates of GWAS catalog are planned. Integration of further data sources including quantitative omics data, miRNA targets, and advanced analysis are among future plans.
Availability and Requirements
Project name: iCTNet, integrated Complex Traits Networks.
Project home page: http://www.cs.queensu.ca/ictnet
Operating system: Platform independent
Programming language: Java, minimum requirement Java SE 1.5
Cytoscape version: iCTNet requires Cytoscape version 2.6 or later, and has been tested on version 2.8
Memory: minimum 2GB for large networks
License: BSD-style open source license
Any restrictions to use by non-academics: none other than those in the BSD license
List of abbreviations
iCTNet: integrated complex trait networks
SNP: single nucleotide polymorphism
GWAS: genome-wide association study
HPRD: Human Protein Reference Database
T1D: Type 1 diabetes
Acknowledgements and Funding
SEB is a Harry Weaver Neuroscience Scholar from the National Multiple Sclerosis Society. PM acknowledges the Ontario Ministry of Research and Innovation, Early Researcher Award and the Natural Sciences and Engineering Council of Canada.
1. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004, 5(2):101–113. doi:10.1038/nrg1272
2. Ideker T, Sharan R: Protein networks in disease. Genome Res 2008, 18(4):644–652. doi:10.1101/gr.071852.107
3. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL: The human disease network. Proc Natl Acad Sci USA 2007, 104(21):8685–8690. doi:10.1073/pnas.0701361104
4. Sieberts SK, Schadt EE: Moving toward a system genetics view of disease. Mamm Genome 2007, 18(6–7):389–401.
5. Schadt EE, Friend SH, Shaywitz DA: A network view of disease and compound screening. Nat Rev Drug Discov 2009, 8(4):286–295. doi:10.1038/nrd2826
6. Berger SI, Iyengar R: Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip Rev Syst Biol Med 2010, 3(2):129–135.
7. Yildirim MA, Goh KI, Cusick ME, Barabasi AL, Vidal M: Drug-target network. Nat Biotechnol 2007, 25(10):1119–1126. doi:10.1038/nbt1338
8. Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI: DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene-disease networks. Bioinformatics 2010, 26(22):2924–2926. doi:10.1093/bioinformatics/btq538
9. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13(11):2498–2504. doi:10.1101/gr.1239303
10. Johnson AD, O'Donnell CJ: An open access database of genome-wide association results. BMC Med Genet 2009, 10:6.
11. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 2009, 106(23):9362–9367. doi:10.1073/pnas.0903103106
12. Fisher RA: Combining independent tests of significance. American Statistician 1948, 2(5):30. doi:10.2307/2681650
13. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al.: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363–2371. doi:10.1101/gr.1680803
14. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res 2002, 12(6):996–1006.
15. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 2008, 36(Database issue):D901–906.
16. Kohler S, Bauer S, Horn D, Robinson PN: Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet 2008, 82(4):949–958. doi:10.1016/j.ajhg.2008.02.013
17. Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R: Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 2010, 6(1):e1000641. doi:10.1371/journal.pcbi.1000641
18. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA: Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 2005, 33(Database issue):D514–517.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.