GARNET – gene set analysis with exploration of annotation relations
- Kyoohyoung Rho†1, 4,
- Bumjin Kim†2, 3,
- Youngjun Jang†1, 4,
- Sanghyun Lee3,
- Taejeong Bae1, 4,
- Jihae Seo2, 3,
- Chaehwa Seo2, 3,
- Jihyun Lee1, 4,
- Hyunjung Kang2, 3,
- Ungsik Yu5,
- Sunghoon Kim1, 4,
- Sanghyuk Lee2, 3 and
- Wan Kyu Kim2, 3
© Rho et al; licensee BioMed Central Ltd. 2011
Published: 15 February 2011
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information.
GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieving genes from the annotation database, for statistical analysis and visualization of annotation relationships, and for managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interactions) are also included. The pair-wise relationship between annotation gene sets was calculated using the kappa statistic. GARNET consists of three modules (gene set manager, gene set analysis and gene set retrieval), which are tightly integrated to provide virtually automatic analysis of gene sets. A dedicated viewer for annotation networks has been developed to facilitate exploration of related annotations.
GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
Omics studies usually yield a number of gene lists, e.g. differentially expressed genes (DEGs). Typically, a statistical test of enrichment or depletion is performed for an a priori defined set of genes (usually from clustering of microarray data) or gene annotations. This approach has been successfully applied to diverse subjects including gene ontology (GO), signalling and metabolic pathways, and identification of regulatory elements such as transcription factors and microRNAs. However, biological interpretation of gene lists is still a challenge for many biologists because no 'gold standard' method has been established yet. Numerous annotation DBs and tools have been developed for biological interpretation of experimental gene lists, including but not limited to GSEA, DAVID, GAzer, FatiGO+, g:Profiler, WebGestalt, Lists2Networks and GOAL. A comprehensive list of 68 GSA web tools was recently reviewed by Huang et al., along with several important points to consider in using such tools. As Huang and colleagues suggest, each tool has its own strengths and limitations in terms of statistical method, coverage of gene annotation types and user interface.
One important issue in the field is the growing complexity of the annotation data themselves. The benefit of gene set analysis (GSA) mainly comes from the power to summarize hundreds or even thousands of genes into a smaller number of enriched biological themes, e.g. GO terms or pathways, allowing simplified interpretation of high-throughput experiments. However, the analytic complexity of GSA is beginning to outweigh its benefits because of the rapid increase in gene annotations: a few dozen genes can be enriched in a hundred or more annotation terms. The number of GO terms already exceeds the number of genes in the human genome, and the situation is becoming similar for other types of annotation such as pathways, the regulatory targets of TFs and miRNAs, disease-associated genes and chromosomal locations, even without considering their combinations. Increasingly, omics data continue to be sources of new annotations, e.g. cancer signature genes from microarrays and disease-associated genes from GWAS studies. The major difficulties towards meaningful biological interpretation are integrating diverse types of annotations and, at the same time, handling the complexity for efficient exploration of annotation relationships.
GARNET (Gene Annotation Relationship Network Tools) is an integrative platform for diverse types of gene set analysis, allowing convenient navigation of annotation networks. The utility of GARNET is two-fold. One is to facilitate the interpretation of gene sets from high-throughput experiments such as microarray, ChIP-chip (ChIP-Seq) and high-throughput screening. The other is to serve as a framework for meta-analysis of heterogeneous annotations and pre-existing knowledge, which often leads to novel insights undetectable by individual analyses. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations) are also included. To deal with the complexity arising from a large number of annotations in different categories, a dedicated annotation network viewer has been developed for the visualization of related annotations.
The GARNET system consists of three modules: the gene set retrieval tool, the gene set manager tool, and the gene set analysis tool. These are tightly integrated to allow access, manipulation and statistical analysis of pre-compiled gene annotations and user-defined gene lists. The relationship between annotation terms is calculated using Cohen's kappa statistic. Kappa is less sensitive to gene set size than P-value statistics such as the Chi-square, hypergeometric and binomial tests because it measures the difference between the observed and the expected agreement between two annotation terms.
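As a rough illustration of the kappa calculation described above, the following Python sketch treats each annotation term as a yes/no labelling of every gene in a background universe and computes kappa = (observed agreement - expected agreement) / (1 - expected agreement). The function name `cohen_kappa`, the toy gene identifiers and the choice of background universe are assumptions made for this example; the text does not describe GARNET's actual implementation.

```python
def cohen_kappa(set_a, set_b, universe):
    """Cohen's kappa for the agreement of two annotation gene sets.

    Each gene in `universe` is labelled "in" or "out" by each annotation
    term; kappa compares the observed agreement of the two labellings
    with the agreement expected by chance.
    """
    universe = set(universe)
    a = set(set_a) & universe
    b = set(set_b) & universe
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")

    both = len(a & b)                      # genes annotated by both terms
    neither = n - len(a | b)               # genes annotated by neither term
    observed = (both + neither) / n        # observed agreement

    # Chance agreement: both "in" by chance plus both "out" by chance.
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when chance agreement is perfect
        # (both sets empty or both equal to the universe); return 0.0 here.
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    genes = [f"g{i}" for i in range(10)]
    term_x = {"g0", "g1", "g2", "g3"}
    term_y = {"g2", "g3", "g4", "g5"}
    print(cohen_kappa(term_x, term_y, genes))
```

Identical gene sets give a kappa of 1, while sets that overlap no more than expected by chance give values near 0, so pairs of terms could be linked in an annotation network when kappa exceeds a chosen threshold (the threshold itself is not specified in the text).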
Construction and content
Table 1. Summary of the annotation categories and types in the GARNET system
Number of Annotations
Number of Unique Genes
Protein-Protein Interactions (PPI)
NCBI integrated DB
hg17, Build 35
hg17, Build 35
Gene Ontology (GO)
Disease & Drug
The GARNET system consists of three main tools: the manager, analysis, and retrieval tools. Figure 1(b) shows the workflow of a GARNET analysis. Users define gene sets using the manager tool, where set operations are available to combine two gene sets by union, intersection, or subtraction. The analysis tool performs an enrichment test for the user-supplied genes in the annotation categories of choice. Multiple testing correction is applied by default because of the large number of annotation terms tested. The result of the statistical analysis is given in table format, from which one can access the network view of annotation terms. In addition to the flat table view, GARNET also supports a tree view for hierarchical annotations such as GO and OMIM disease terms. The retrieval tool allows users to access the annotation database to extract the genes assigned to any annotation term. Importantly, users may expand their gene list using the molecular networks in the annotation database. This novel feature allows users to investigate the downstream effects of their genes of interest.
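The manager and analysis steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which enrichment test or which multiple testing correction GARNET applies, so the one-sided hypergeometric test and the Benjamini-Hochberg adjustment used here are stand-ins, and all function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(query, annotations, universe):
    """Return {term: (p_value, adjusted_p_value)} for a query gene set."""
    universe = set(universe)
    query = set(query) & universe
    terms = sorted(annotations)
    pvals = []
    for term in terms:
        members = set(annotations[term]) & universe
        overlap = len(query & members)
        pvals.append(
            hypergeom_pvalue(overlap, len(members), len(query), len(universe))
        )
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, qvals)}


if __name__ == "__main__":
    # Hypothetical gene identifiers and annotation terms.
    universe = [f"g{i}" for i in range(20)]
    list_1 = {"g0", "g1", "g2", "g3"}
    list_2 = {"g3", "g4", "g5"}

    # Manager-style set operations on two gene sets.
    print(sorted(list_1 | list_2))  # union
    print(sorted(list_1 & list_2))  # intersection
    print(sorted(list_1 - list_2))  # subtraction

    annotations = {
        "term_A": {"g0", "g1", "g2", "g3", "g4"},
        "term_B": {"g10", "g11", "g12", "g13", "g14"},
    }
    for term, (p, q) in enrich(list_1 | list_2, annotations, universe).items():
        print(term, p, q)
```

Exact integer binomial coefficients keep the sketch free of external dependencies; for genome-scale universes a statistics library would be the more practical choice.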
List of supported ID types in GARNET
1457, 2002, 1950
HGNC Gene Symbol
A1BG, A1CF, A2LD1
HGNC Gene ID
7, 8, 7645
S34755, A61235, I38947
Affymetrix probe ID
Illumina probe ID
A comprehensive set of gene annotation data is integrated in the GARNET system (Table 1). The annotations are grouped into four categories: molecular network, genome annotation, gene expression, and disease & drug. The molecular network category consists of pathways (KEGG, BioCarta), protein-protein interactions (PPI) from NCBI and four major miRNA target databases (miRBase, TarBase, TargetScan and PicTar). The genome annotation category contains information on gene function (Gene Ontology), protein domains (Pfam) and chromosomal location. Tissue-specific or cancer-related gene expression data are included in the gene expression category. We also collected gene-disease (OMIM, GAD) and gene-drug association (DrugBank) data from the relevant sources, which are deposited in the disease & drug category.
GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool.
Availability and requirements
This work was supported by "GIST Systems Biology Infrastructure Establishment Grant (2010) through Ewha Research Center for Systems Biology (ERCSB)", the Biogreen 21 Program of the Korean Rural Development Administration (20070401034010) and the Korea Science and Engineering Foundation (KOSEF) funded by the Korea government (MEST) (R01-2008-000-20818-0 and 2007-03983).
This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 2005, 102(43):15545–15550. doi:10.1073/pnas.0506580102
- Huang da W, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, et al.: DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic acids research 2007, 35(Web Server issue):W169–175. doi:10.1093/nar/gkm415
- Kim SB, Yang S, Kim SK, Kim SC, Woo HG, Volsky DJ, Kim SY, Chu IS: GAzer: gene set analyzer. Bioinformatics (Oxford, England) 2007, 23(13):1697–1699. doi:10.1093/bioinformatics/btm144
- Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, Montaner D, Dopazo J: FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic acids research 2007, 35(Web Server issue):W91–96. doi:10.1093/nar/gkm260
- Reimand J, Kull M, Peterson H, Hansen J, Vilo J: g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic acids research 2007, 35(Web Server issue):W193–200. doi:10.1093/nar/gkm226
- Zhang B, Kirov S, Snoddy J: WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic acids research 2005, 33(Web Server issue):W741–748. doi:10.1093/nar/gki475
- Lachmann A, Ma'ayan A: Lists2Networks: integrated analysis of gene/protein lists. BMC bioinformatics 11: 87. doi:10.1186/1471-2105-11-87
- Tchagang AB, Gawronski A, Berube H, Phan S, Famili F, Pan Y: GOAL: a software tool for assessing biological significance of genes groups. BMC bioinformatics 11: 229. doi:10.1186/1471-2105-11-229
- Huang da W, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic acids research 2009, 37(1):1–13. doi:10.1093/nar/gkn923
- Antonov AV, Schmidt T, Wang Y, Mewes HW: ProfCom: a web tool for profiling the complex functionality of gene groups identified from high-throughput data. Nucleic acids research 2008, 36(Web Server issue):W347–351. doi:10.1093/nar/gkn239
- Rhodes DR, Kalyana-Sundaram S, Mahavisno V, Varambally R, Yu J, Briggs BB, Barrette TR, Anstet MJ, Kincead-Beal C, Kulkarni P, et al.: Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia 2007, 9(2):166–180. doi:10.1593/neo.07112
- Becker KG, Barnes KC, Bright TJ, Wang SA: The genetic association database. Nature genetics 2004, 36(5):431–432. doi:10.1038/ng0504-431
- Zhang Y, De S, Garner JR, Smith K, Wang SA, Becker KG: Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information. BMC medical genomics 3: 1. doi:10.1186/1755-8794-3-1
- Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M: DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic acids research 2008, 36(Database issue):D901–906.
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic acids research 2006, 34(Database issue):D354–357. doi:10.1093/nar/gkj102
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic acids research 2008, 36(Database issue):D154–158.
- Papadopoulos GL, Reczko M, Simossis VA, Sethupathy P, Hatzigeorgiou AG: The database of experimentally supported targets: a functional update of TarBase. Nucleic acids research 2009, 37(Database issue):D155–158. doi:10.1093/nar/gkn809
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 2005, 120(1):15–20. doi:10.1016/j.cell.2004.12.035
- Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, et al.: Combinatorial microRNA target predictions. Nature genetics 2005, 37(5):495–500. doi:10.1038/ng1536
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al.: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 2000, 25(1):25–29. doi:10.1038/75556
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al.: The Pfam protein families database. Nucleic acids research 2008, 36(Database issue):D281–288.
- Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic acids research 2009, 37(Database issue):D793–796. doi:10.1093/nar/gkn665
- Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic acids research 2002, 30(7):1575–1584. doi:10.1093/nar/30.7.1575
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.