- Open Access
Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool
© Chen et al.; licensee BioMed Central Ltd. 2013
- Received: 17 January 2013
- Accepted: 28 March 2013
- Published: 15 April 2013
System-wide profiling of genes and proteins in mammalian cells produces lists of differentially expressed genes/proteins that need to be further analyzed for their collective functions in order to extract new knowledge. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries. While many enrichment analysis tools and gene-set library databases have been developed, there is still room for improvement.
Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool that provides various types of visualization summaries of the collective functions of gene lists. Enrichr is open source and freely available online at: http://amp.pharm.mssm.edu/Enrichr.
- Histone Modification
- Enrichment Analysis
- Fisher Exact Test
- Gene List
- Enrich Term
Recent improvements in our ability to perform genome-wide profiling of DNA, RNA, and protein at lower costs and more accurately further highlight the need for developing tools that can convert such an abundance of data into useful biological, biomedical, and pharmacological knowledge. One of the most powerful methods for analyzing such massive datasets is summarizing the results as lists of differentially expressed genes and then querying such gene lists against prior knowledge gene-set libraries [1, 2]. Differentially expressed gene lists can be extracted from RNA-seq or microarray studies; gene lists can be created from genes harboring mutations in cohorts of patients, or gene lists can be putative targets of transcription factors or histone modifications profiled by ChIP-seq. In fact, gene lists can be produced from any relevant experimental method that profiles the entire genome or the proteome. Once unbiased lists of genes or proteins are generated from such experiments, these lists are used as input for computing enrichment with existing lists created from prior knowledge organized into gene-set libraries.
Gene-set libraries are used to organize accumulated knowledge about the function of groups of genes. Each gene-set library is made of a set of related gene lists where each set of genes is associated with a functional term such as a pathway name or a transcription factor that regulates the genes. Creating such gene-set libraries can be achieved by assembling gene sets from diverse contexts. The original method that developed this approach is called gene set enrichment analysis (GSEA), first used to analyze microarray data collected from muscle biopsies of diabetic patients. The authors of this seminal publication developed a statistical test that is based on the Kolmogorov-Smirnov test and also developed a database of gene-set libraries called MSigDB. Many other gene set enrichment analysis tools have been developed in recent years following the original concept. However, many such enrichment analysis tools focus on performing enrichment using only the Gene Ontology resource. In addition, enrichment analysis tools most commonly use the Fisher exact test or similar variations of it to compute enrichment. This family of tests is somewhat biased by list size. Besides computing enrichment for input lists of genes, gene-set libraries can be used to build functional association networks [8, 9], predict novel functions for genes, and discover distal relationships between biological and pharmacological processes. While many gene-set libraries and tools for performing enrichment analysis already exist, there is a growing need for them and there are more ways to improve and validate gene set enrichment methods. For instance, many useful novel gene set libraries can be created; the performance of the enrichment computation can be improved; and visualization of enrichment results can be done in more intuitive and interactive ways.
Here, we present Enrichr, an integrative web-based and mobile software application that includes many new gene-set libraries, a new approach to rank enriched terms, and powerful interactive visualizations of the results. Enrichr is delivered as an HTML5 web-based application and also as a mobile app for the iPhone, Android and Blackberry. Users are provided with the ability to share the results with collaborators and export vector graphic figures that display the enrichment results in a publication-ready format. We evaluated the ability of Enrichr to rank terms from gene-set libraries by comparing the Fisher exact test to a method we developed which computes the deviation from the expected rank for terms. To evaluate various methods that rank enriched terms, we analyzed lists of differentially expressed genes from studies that measured gene expression after knockdown of transcription factors to see the ranking of the knocked down factors using a transcription-factor/target-gene library. We show that the deviation from the expected rank method ranks more relevant terms higher. We also applied Enrichr to analyze nine cancer cell lines by comparing their enrichment signature patterns to the enrichment signatures of matched normal tissues. Such analysis provides a global visualization of critical regulatory differences between normal tissues and cancer cell lines. In particular, we observed a common pattern of up-regulation of the PRC2 polycomb group target genes and enrichment for the histone mark H3K27me3 in many cancer cell lines. The global view of enrichment signature patterns also clearly reveals that Toll-like receptor signaling is turned off in K562 cells when compared to normal CD33+ myeloid cells, whereas interleukin signaling stays intact in both cell types. Overall, Enrichr is an easy-to-use, intuitive, web-based enrichment analysis tool that provides various types of visualization summaries of the collective functions of gene lists.
Creating the gene-set libraries
Enrichr contains 35 gene-set libraries where some libraries are borrowed from other tools while many other libraries are newly created and only available in Enrichr. The gene-set libraries provided by Enrichr are divided into six categories: transcription, pathways, ontologies, diseases/drugs, cell types and miscellaneous. The following is a description of each library and how it was created:
The ChIP-x Enrichment Analysis (ChEA) database is our own resource for storing putative targets for transcription factors extracted from publications that report experiments of profiling transcription factors binding to DNA in mammalian cells. The database is already formatted into a gene-set library where the functional terms are the transcription factors profiled in each study together with the PubMed identifier (PMID) of the paper used to extract the genes. The ChEA gene-set library used in Enrichr is an updated version of the originally published database and contains more than twice as many entries.
PWMs from TRANSFAC and JASPAR were used to scan the promoters of all human genes in the region from −2000 to +500 relative to the transcription start site (TSS). We retained only the 100% matches to the consensus sequences to call an interaction between a factor and target gene. This gene-set library was created for a tool we previously published called Expression2Kinases.
Transcription factor target genes inferred from PWMs for the human genome were downloaded from the UCSC Genome Browser FTP site which contains many resources for gene and sequence annotations. We converted this file into a gene set library and included it in Enrichr since it produces different results compared with the other method to identify transcription factor/target interactions from PWMs as described above.
The ENCODE transcription factor gene-set library is the fourth method to create a transcription factor/target gene set library. We processed the newly published data from the Encyclopedia of DNA Elements (ENCODE) project [14, 15]. Using the aligned files for all 646 experiments that profiled transcription factors in mammalian cells, we identified the peaks using the MACS software and then identified the genes targeted by the factors using our own custom processing. We sorted the peaks for each experiment by distance to the transcription start site (TSS) and retained the top 2000 target genes for each experiment.
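The final ranking step described above can be sketched as follows. This is a minimal illustration, not the custom processing used for Enrichr: it assumes peaks have already been assigned to genes, and the function name `top_target_genes`, the `(gene, distance)` input format, and the choice to keep the closest peak per gene are all hypothetical.

```python
def top_target_genes(peaks, max_genes=2000):
    """Rank genes by their closest peak to the TSS and keep the top ones.

    peaks: iterable of (gene_symbol, distance_to_tss) pairs, where the
    distance may be negative (upstream) or positive (downstream).
    Assumption: peak-to-gene assignment was done in an earlier step.
    """
    closest = {}
    for gene, distance in peaks:
        d = abs(distance)
        # Keep only the peak nearest to the TSS for each gene.
        if gene not in closest or d < closest[gene]:
            closest[gene] = d
    # Sort by distance; break ties alphabetically for reproducibility.
    ranked = sorted(closest, key=lambda g: (closest[g], g))
    return ranked[:max_genes]


example_peaks = [("A", 500), ("B", -100), ("A", 50), ("C", 3000)]
print(top_target_genes(example_peaks, max_genes=2))
```

With the library's setting of 2000 genes per experiment, each experiment would yield one gene set whose term is the profiled transcription factor.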
The Histone modification gene-set library was created by processing experiments from the NIH Roadmap Epigenomics project. Such experiments were conducted using various types of human cell lines with antibodies targeting over 30 different histone modification marks. ChIP-seq datasets from the Roadmap Epigenomics project deposited to the GEO database were analyzed and converted to gene sets with the use of the software, SICER. Previous studies have indicated that the use of a control sample substantially reduces DNA shearing biases and sequencing artifacts; therefore, for each experiment, an input control sample was matched according to the description in GEO. ChIP-seq experiments without matched control input were not included. The resulting gene-set library contains 27 types of histone modifications for 64 human cell lines from various tissue origins.
The pathways category includes gene-set libraries from well-known pathway databases such as WikiPathways, KEGG, BioCarta, and Reactome as well as five gene-set libraries we created from our own resources: kinase enrichment analysis (KEA) for kinases and their known substrates, protein-protein interaction hubs, CORUM, and complexes from a recent high-throughput IP-MS study as well as a manually assembled gene-set library created from extracting lists of phosphoproteins from SILAC phosphoproteomics publications.
The Kinase Enrichment Analysis (KEA) gene-set library contains human or mouse kinases and their known substrates collected from literature reports as provided by six kinase-substrate databases: HPRD, PhosphoSite, PhosphoPoint, Phospho.Elm, NetworKIN, and MINT.
The protein-protein interaction hubs gene-set library is made from an updated version of a human protein-protein interaction network that we are continually updating and originally published as part of the program, Expression2Kinases. From this network, we extracted the proteins with 120 or more interactions. These proteins are the terms in the library whereas their direct protein interactors are the genes in each gene set.
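As an illustration of how such a hub library can be derived from an interaction list, here is a minimal sketch. It assumes the network is available as undirected pairs of protein names; the function name `hub_gene_sets` and the input format are hypothetical and not taken from the Enrichr code base.

```python
from collections import defaultdict


def hub_gene_sets(interactions, min_degree=120):
    """Build a hub gene-set library from undirected protein pairs.

    Each protein with at least `min_degree` distinct interactors becomes
    a term; its direct interactors form the gene set for that term.
    """
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        protein: sorted(partners)
        for protein, partners in neighbors.items()
        if len(partners) >= min_degree
    }


toy_network = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")]
print(hub_gene_sets(toy_network, min_degree=3))
```

The toy call lowers the threshold to 3 only so that the small example produces a hub; the library described above uses 120.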
The next two gene-set libraries in the pathway category are protein complexes. The first library was created from a recent study that profiled nuclear complexes in human breast cancer cell lines after applying over 3000 immuno-precipitations followed by mass-spectrometry (IP-MS) experiments using over 1000 different antibodies. The second complexes gene-set library was created from the mammalian complexes database, CORUM.
The SILAC phosphoproteomics gene set library was created by processing tables from the supporting materials of SILAC phosphoproteomics studies. From each supporting table, we extracted lists of up and down proteins without applying any cutoffs. Protein IDs were converted to mammalian gene IDs when necessary using online gene symbol conversion tools. A total of 84 gene lists were extracted from such studies.
The ontology category contains gene-set libraries created from the three gene ontology trees and from the knockout mouse phenotypes ontology developed by the Jackson Lab from their MGI-MP browser. To create such gene-set libraries, we “cut” the tree at either the third or fourth level and created a gene set from the terms and their associated genes downstream of the cut. The details about creating the Gene Ontology gene-set libraries are provided in our previous publication, Lists2Networks.
The Connectivity Map (CMAP) database contains over 6,000 Affymetrix microarray gene expression experiments where human cancer cell lines were treated with over 1,300 drugs, many of them FDA approved, and changes in expression were measured after six hours. The drugs were always used as a single treatment but varied in concentrations. The CMAP database provides the results in a table where genes are listed in rank order based on their level of differential expression compared to the untreated state. From this table, we extracted the top 100 and bottom 100 differentially expressed genes to create two gene-set libraries, one for the up genes and one for the down genes for each condition. Each set is associated with a drug name and the four digit experiment number from CMAP. This four digit number can be used to locate the concentration, cell-type, and batch.
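The extraction of the up and down gene sets from one ranked CMAP experiment can be sketched as below. The function name `cmap_gene_sets`, the label format, and the assumption that the ranked list runs from most up-regulated to most down-regulated are illustrative choices, not details confirmed by the text.

```python
def cmap_gene_sets(ranked_genes, label, n=100):
    """Split one ranked experiment into an 'up' and a 'down' gene set.

    ranked_genes: genes ordered from most up-regulated (first) to most
    down-regulated (last), as assumed for this sketch.
    label: hypothetical term label, e.g. drug name plus experiment number.
    """
    if len(ranked_genes) < 2 * n:
        raise ValueError("ranked list is too short for non-overlapping sets")
    return {
        f"{label}-up": list(ranked_genes[:n]),
        f"{label}-down": list(ranked_genes[-n:]),
    }


ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
print(cmap_gene_sets(ranked, "drugX-1234", n=2))
```

In the libraries described above, `n` would be 100 and the up and down sets would go into two separate gene-set libraries.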
The GeneSigDB gene-set library was borrowed from the GeneSigDB database. The database contains gene lists extracted manually from the supporting tables of thousands of publications; most are from cancer related studies.
The OMIM gene-set library was created directly from the NCBI’s OMIM Morbid Map. We removed diseases with only a few genes and merged diseases with similar names because these are likely made of a few subtypes of the same disease. In addition, since most diseases have only a few genes, we used our tool, Genes2Networks, to create the OMIM expanded gene-set library. We entered the disease genes as the seed list and expanded the list by identifying proteins that directly interact with at least two of the disease gene products; in other words, we searched for paths that connect two disease gene products with one intermediate protein, resulting in a sub-network that connects the disease genes with additional proteins/genes. Each sub-network for each disease was converted to a gene set.
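The expansion rule described above (add any protein that directly interacts with at least two seed gene products) can be sketched as follows. This is not the Genes2Networks implementation; the function name `expand_seed_list` and the edge-list input are assumptions made for illustration.

```python
from collections import defaultdict


def expand_seed_list(seeds, interactions, min_seed_neighbors=2):
    """Expand a disease seed list with one-step intermediate proteins.

    A non-seed protein is added when it directly interacts with at least
    `min_seed_neighbors` distinct seed proteins, i.e. it lies on a path
    seed - intermediate - seed.
    """
    seeds = set(seeds)
    seed_partners = defaultdict(set)
    for a, b in interactions:
        if a in seeds and b not in seeds:
            seed_partners[b].add(a)
        if b in seeds and a not in seeds:
            seed_partners[a].add(b)
    intermediates = {
        protein
        for protein, partners in seed_partners.items()
        if len(partners) >= min_seed_neighbors
    }
    return seeds | intermediates


ppi = [("D1", "X"), ("X", "D2"), ("D1", "Y"), ("Y", "Z")]
print(sorted(expand_seed_list({"D1", "D2"}, ppi)))
```

Here `X` is added because it links the two seeds `D1` and `D2`, while `Y` touches only one seed and is left out.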
The VirusMINT gene-set library was created from the VirusMINT database, which is made of literature extracted protein-protein interactions between viral proteins and human proteins. Each term in the library represents a virus wherein the genes/proteins in each set are the host proteins that are known to directly interact with all the viral proteins for each virus.
The MSigDB computational and MSigDB oncogenic signature gene-set libraries were borrowed from the MSigDB database from categories C4 and C6. These gene-set libraries contain modules of genes differentially expressed in various cancers.
The cell type category is made of four gene-set libraries: genes highly expressed in human and mouse tissues extracted from the Mouse and Human Gene Atlases and genes highly expressed in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) and NCI-60. The gene-set libraries in this category were all created similarly. The Cancer Cell Line Encyclopedia (CCLE) dataset was derived from the gene-centric RMA-normalized mRNA expression data from the CCLE site. The Human Gene Atlas and Mouse Gene Atlas datasets were derived from averaged GCRMA-normalized mRNA expression data from the BioGPS site. Finally, the Human NCI60 Cell Lines dataset, while also downloaded from the BioGPS site, was raw and not normalized; hence, it was normalized using quantile normalization. The downloaded datasets were all of similar format such that the raw data was in a table with the rows being the genes and the columns being the expression values in the different cells. For each gene, the average and standard deviation of the expression values across all samples were computed. For each gene/term data point, a z-score was calculated based on the row’s average and standard deviation. Duplicate gene probes were merged by selecting the highest absolute z-score. Only genes with an absolute z-score of greater than 3 were selected to be part of a gene set for a particular cell which represents the term.
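The row-wise z-score selection described above can be sketched in a few lines. This is a simplified illustration: it assumes one row per gene (duplicate probes already merged), uses the sample standard deviation (the text does not say which estimator was used), and the function name `cell_type_gene_sets` and input layout are hypothetical.

```python
import statistics


def cell_type_gene_sets(expression, samples, z_cutoff=3.0):
    """Assign genes to cell types where their expression is an outlier.

    expression: dict mapping gene -> list of values, one per sample,
    in the same order as `samples`.
    Returns a dict mapping each sample (the term) to a set of genes
    whose absolute row-wise z-score exceeds `z_cutoff`.
    """
    gene_sets = {sample: set() for sample in samples}
    for gene, values in expression.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # assumption: sample standard deviation
        if sd == 0:
            continue  # flat rows carry no cell-type information
        for sample, value in zip(samples, values):
            z = (value - mean) / sd
            if abs(z) > z_cutoff:
                gene_sets[sample].add(gene)
    return gene_sets


cells = [f"cell{i}" for i in range(20)]
data = {"GENE1": [1.0] * 19 + [100.0], "GENE2": [5.0] * 20}
print(cell_type_gene_sets(data, cells)["cell19"])
```

Note that with few samples no value can reach an absolute z-score of 3, so a cutoff like this only makes sense for datasets with many cell types or tissues.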
The miscellaneous category has three gene-set libraries: chromosome location, metabolites, and structural domains. The chromosomal location library is made of human genes belonging to chromosomal segments of the human genome. It is derived from MSigDB. The metabolite library was created from HMDB, a database listing metabolites and the genes associated with them. Finally, the structural domains library was created from the PFAM and InterPro databases where the terms are structural domains and the genes/proteins are the genes containing the domains.
Where c is the combined score, p is the p-value computed using the Fisher exact test, and z is the z-score computed by assessing the deviation from the expected rank. Enrichr provides all three options for sorting enriched terms. In the results section, we show how we evaluated the quality of each of these three enrichment methods by examining how the methods rank terms that we know should be highly ranked.
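To make the three scores concrete, the sketch below computes a right-tailed Fisher exact test p-value for the overlap between an input list and one gene set, and then combines it with a z-score. Two things here are assumptions rather than statements from the text above: the combined score is taken to be the product of the natural log of the p-value and the z-score, and the z-score is passed in as a precomputed number because its calculation is not reproduced here. The function names are hypothetical.

```python
import math


def fisher_enrichment_p(overlap, list_size, set_size, universe):
    """Right-tailed Fisher exact test: P(overlap >= observed).

    overlap: genes shared by the input list and the gene set
    list_size: number of genes in the input list
    set_size: number of genes in the gene set
    universe: total number of genes considered
    """
    max_overlap = min(list_size, set_size)
    total = math.comb(universe, list_size)
    tail = sum(
        math.comb(set_size, k) * math.comb(universe - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def combined_score(p, z):
    """Assumed form of the combined score: c = ln(p) * z."""
    return math.log(p) * z


p = fisher_enrichment_p(overlap=8, list_size=50, set_size=100, universe=20000)
print(p, combined_score(p, z=-2.5))
```

Under this assumed form, a small p-value (large negative log) paired with a negative z-score (a term ranked higher than expected) gives a large positive combined score.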
Visualization of the results on a grid
Enrichr provides various ways to visualize the results from the enrichment analysis. One such method is the visualization of the enriched terms on a grid of squares. Here, all terms from a gene-set library are represented by squares on a grid which is organized based on the terms’ gene content similarity, where areas of high similarity are made brighter. To arrange terms on the grid, term-term similarity is first computed using our algorithm, Sets2Networks. For this, the gene-set library is transposed, making each gene the set label and the terms the sets for each gene. Sets2Networks then computes the probability for term-term similarity based on a co-occurrence probabilistic calculation. Once an adjacency distance matrix is computed for similarity between all pairs of terms, a simulated annealing process is used to arrange all terms on the dimensionless toroidal grid. A dimensionless toroidal grid is one whose edges are continuous and connected, forming a torus. The simulated annealing process attempts to maximize the global similarity of terms based on their computed similarity distances as determined by Sets2Networks. The annealing starts with a random arrangement of terms; then, using the Boltzmann distribution, we swap the locations of pairs of terms randomly and compute the global fitness of the swap. We run this annealing process until the arrangement converges to a fitness maximum. Once enrichment analysis is computed, the enriched terms are highlighted, with more significant (lower) p-values indicated by brighter squares. The grid can be clicked to toggle between two alternative views: the second view shows all terms on the grid, with the enriched terms highlighted by circles colored from bright white to gray based on their p-values.
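The annealing procedure can be illustrated with a small, self-contained sketch. Several choices below are assumptions made for the example and are not taken from the Enrichr implementation: fitness is defined as the summed similarity between each term and its right and lower neighbors (with wrap-around), swaps are accepted with the Metropolis rule based on the Boltzmann factor, and the temperature follows a geometric cooling schedule. The similarity matrix would come from Sets2Networks; here it is simply passed in.

```python
import math
import random


def fitness(grid, similarity, size):
    """Sum similarity between each term and its right/lower torus neighbors."""
    total = 0.0
    for r in range(size):
        for c in range(size):
            term = grid[r][c]
            total += similarity[term][grid[r][(c + 1) % size]]
            total += similarity[term][grid[(r + 1) % size][c]]
    return total


def anneal(similarity, size, steps=20000, t_start=1.0, t_end=0.01, seed=0):
    """Arrange size*size terms on a toroidal grid by simulated annealing."""
    rng = random.Random(seed)
    terms = list(range(size * size))
    rng.shuffle(terms)  # random initial arrangement
    grid = [terms[r * size:(r + 1) * size] for r in range(size)]
    current = fitness(grid, similarity, size)
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        r1, c1, r2, c2 = (rng.randrange(size) for _ in range(4))
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        candidate = fitness(grid, similarity, size)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
        else:
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]  # undo
    return grid, current


# Toy similarity: terms 0-2, 3-5 and 6-8 form three groups of similar terms.
sim = [[1.0 if i // 3 == j // 3 else 0.0 for j in range(9)] for i in range(9)]
layout, score = anneal(sim, size=3, steps=2000)
print(layout, score)
```

Recomputing the full fitness after every swap keeps the sketch short; a practical implementation would update only the terms affected by the swap.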
Computing the significance of clustering of terms on the grid
Once enrichment analysis on the grid is achieved, we compute an index that distinguishes between randomly distributed enriched terms on the grid and terms that significantly cluster. While the continuous case of computing such clustering has a foundation in the literature [50, 51], the discrete nature of the grids of terms used in Enrichr has an appreciable effect that makes the computation with the continuous assumption inaccurate. Hence, we implemented a numerical approach to compute such a clustering index with associated probabilities.
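One way to realize a numerical clustering index on a discrete toroidal grid is a Monte Carlo comparison against random placements. The sketch below is only an illustration of that general idea; the specific index (mean pairwise wrap-around Manhattan distance among enriched terms), the number of random draws, and all function names are assumptions and should not be read as the index implemented in Enrichr.

```python
import random
import statistics
from itertools import combinations


def torus_distance(a, b, size):
    """Manhattan distance between two cells on a size x size torus."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, size - dr) + min(dc, size - dc)


def mean_pairwise_distance(cells, size):
    pairs = list(combinations(cells, 2))
    return sum(torus_distance(a, b, size) for a, b in pairs) / len(pairs)


def clustering_significance(enriched_cells, size, n_random=2000, seed=0):
    """Return (z, p) comparing observed clustering with random placements.

    A negative z-score means the enriched terms sit closer together than
    expected by chance; p is the empirical one-sided probability.
    Requires at least two enriched cells.
    """
    rng = random.Random(seed)
    all_cells = [(r, c) for r in range(size) for c in range(size)]
    observed = mean_pairwise_distance(enriched_cells, size)
    null = [
        mean_pairwise_distance(rng.sample(all_cells, len(enriched_cells)), size)
        for _ in range(n_random)
    ]
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    p = (1 + sum(d <= observed for d in null)) / (n_random + 1)
    return z, p


block = [(0, 0), (0, 1), (1, 0), (1, 1)]  # four adjacent enriched terms
print(clustering_significance(block, size=10))
```

Because the null distribution is sampled on the same discrete grid, this kind of approach sidesteps the inaccuracy of the continuous approximation mentioned above.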
Visualization of the results as a network of terms
Implementation of the web and mobile applications
Adding Enrichr as a final step to RNA-seq pipelines
Enrichr's online help contains a Python script that takes as input the output from CuffDiff, which is a part of CuffLinks. CuffDiff is a common last step in the analysis of RNA-seq data which finds differentially expressed genes for various comparisons of RNA-seq data. However, the output from CuffDiff is not easy to handle. The Python script extracts all the up and down gene lists from the input file, and then, using the Python library Poster, generates links to Enrichr analyses.
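The list-extraction step of such a script might look like the sketch below. It is not the script from the Enrichr help pages. It assumes a tab-delimited CuffDiff gene-level differential expression file with the columns `gene`, `sample_1`, `sample_2`, `log2(fold_change)` and `significant`, keeps only rows flagged as significant (an assumption), and leaves out the upload step entirely.

```python
import csv
import io
import math


def up_down_gene_lists(diff_file):
    """Collect up and down gene sets per comparison from CuffDiff output.

    diff_file: an open text file (or file-like object) in tab-delimited
    format with a header row. Returns a dict keyed by
    (comparison, direction) with sets of gene symbols as values.
    """
    lists = {}
    for row in csv.DictReader(diff_file, delimiter="\t"):
        if row["significant"] != "yes":
            continue
        try:
            log2fc = float(row["log2(fold_change)"])
        except ValueError:
            continue  # skip rows with unparsable fold changes
        if math.isnan(log2fc) or log2fc == 0:
            continue
        comparison = f'{row["sample_1"]}_vs_{row["sample_2"]}'
        direction = "up" if log2fc > 0 else "down"
        for gene in row["gene"].split(","):
            if gene and gene != "-":
                lists.setdefault((comparison, direction), set()).add(gene)
    return lists


example = io.StringIO(
    "gene\tsample_1\tsample_2\tlog2(fold_change)\tsignificant\n"
    "TP53\tctrl\ttreated\t2.1\tyes\n"
    "MYC\tctrl\ttreated\t-1.7\tyes\n"
    "GAPDH\tctrl\ttreated\t0.1\tno\n"
)
print(up_down_gene_lists(example))
```

Each resulting set could then be submitted to Enrichr as a separate input list, one per comparison and direction.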
The user interface
To view the results in a tabular format, the user can switch to the table view tab. The results are presented in a sortable HTML table with columns showing the enriched terms and their various scores (Figure 1 and Additional file 3: Figure S3). Clicking on the headers allows the user to sort the different columns, and a search box is available for finding the scores for a particular term. Furthermore, the user can export the table to a tab-delimited file that can be opened with software tools such as Excel or any text editor. Within these files, the users can see all terms, their scores, and the genes of each term that overlap with the input genes. The overlapping genes can also be seen by hovering the mouse over the terms in the table. For most tables, the enriched terms are hyperlinked to external sources that provide more information about the term.
Enrichr also provides a unique visualization of the results on a grid of terms (Figure 1 and Additional file 4: Figure S4). Each spot on the grid is a term from a gene-set library, and the terms are arranged based on their gene content similarity. The enriched terms are highlighted on the grid and color coded based on their level of enrichment, where brighter spots signify more enrichment. Enrichr also provides a measure of clustering of the enriched terms on the grid. The clustering level z-scores and p-values are highlighted in red if the clustering is significant (p-value < 0.1) or displayed in gray if the clustering is not significant. This clustering indicator provides an additional assessment of how related the genes are to each other and how relevant the specific gene-set libraries are for the input list of genes. The observation of one or two clusters on the grid suggests that a gene-set library is relevant to the input list. It also indicates that the terms in the clusters are relevant to the input list. Similar to the bar graph tab, the grid can be customized with the color wheel and exported into the three image formats. Clicking on any spot on the grid toggles between a p-value view and a grid view. The p-value view only highlights the enriched terms, leaving all other spots black, while the grid view shows the similarity between terms as bright spots and the enriched terms as circles on top of the grid.
The final visualization option is a network view of the enriched terms (Figure 1 and Additional file 5: Figure S5). The network connects terms that are close to each other on the grid, giving a sense of how the enriched terms are related to each other. The nodes of the network are the enriched terms and they are arranged using a force-based layout. Users have the option to refine this arrangement by dragging the nodes to a desired place. These networks can also be color customized interactively and exported into one of the three image formats.
Enrichr makes it simple to share the analysis results with others. Users can click on the share icon to the right of the description box, resulting in a popup that provides the user with a link to the analysis results that they can copy and paste into an e-mail to send to a collaborator. Users can also create a user account where they can store and organize all their uploaded lists in one place. The user account will enable users to contribute their lists to the community-generated gene-set library. This will allow other users to query their input lists against user-contributed lists.
Enrichr also provides a mechanism to search for functions for specific genes with an auto-complete functionality. The results from the gene function search show all the terms for the gene from all gene-set libraries (Additional file 6: Figure S6). Enrichr is also mobile-friendly such that it supports touch gestures; for example, a simple swipe left and right on the main page switches between the tabs. On the results page, at the top level with no specific enrichment type selected, swipes left and right will navigate between the different enrichment categories. Once the user selects an enrichment type, swipes left and right will navigate between the different visualization types for the current enrichment type.
Statistics of the gene set libraries
Enrichr includes 35 gene-set libraries totaling 31,026 gene-sets that completely cover the human and mouse genome and proteome (Table 1). On average, each gene-set has ~350 genes and there are over six million connections between terms and genes. Further statistics and information on where the gene-set libraries were derived from can be found in the “Dataset Statistics” tab of the Enrichr main page. Histograms of gene frequencies for most gene-set libraries follow a power law, suggesting that some genes are much more common in gene-set libraries than others (Figure 2a). This has an implication for enrichment computations that we have not yet considered in Enrichr. Some genes are more likely than others to appear in various enrichment analyses; this tendency can stem from various sources, including the fact that some genes are better studied than others. This research focus bias is present in several of the libraries.
Evaluation of the enrichment scoring methods
Table 1. List of gene set libraries ranked by number of terms
- Human CoR Complexome
- Cancer Cell Line Encyclopedia
- GO Biological Process
- Genome Browser PWMs
- MGI Mammalian Phenotype Top 4
- Kinase Enrichment Analysis KEA
- ENCODE TF ChIP-seq
- GO Molecular Function
- PPI Hub Proteins
- Histone Modifications ChIP-seq
- Pfam InterPro Domains
- ChIP Enrichment Analysis ChEA
- GO Cellular Component
- MSigDB Oncogenic Signatures
- Mouse Gene Atlas
- NCI-60 Cancer Cell Lines
- Human Gene Atlas
- MGI Mammalian Phenotype Top 3
Rank of entries from the ChEA gene-set library using the three scoring methods implemented in Enrichr, given as input lists of up- or down-regulated genes identified from studies that profiled gene expression after knockdown or knockout of the same transcription factors
Application to obtain a global view of regulatory mechanisms in cancer cell lines and their matching normal tissues
An interesting signature pattern was also present in the WikiPathways grids that compared the enrichment signatures between CD33+ normal myeloid hematopoietic cells and K562 cells, a cell line often used to study a specific form of leukemia. The two cell types share a cluster of pathways associated with interleukin signaling (green circles in Figure 3), but only the normal tissue is enriched for the Toll-like receptor signaling cluster, potentially indicating an alteration of signaling in leukemia that shuts off this pathway. In addition, the highly expressed genes in the normal hematopoietic cells form a cluster in the MGI-MP grid whose terms are defects in the hematopoietic system observed when these genes are knocked out in mice (gray circle in Figure 3). Finally, HUTU80 cells, a human duodenum adenocarcinoma cell line, have a cluster in the PPI hubs grid made of the EGFR cell signaling components including EGFR, GRB2, PI3K, and PTPN11 as well as Src signaling including LCK, JAK1 and STAT1, strongly suggesting up-regulation of this pathway in this cancer. Many more interesting clusters and patterns can be extracted from such a global view of enrichment signatures and visualization of enriched terms on such grids.
In conclusion, Enrichr provides access to 35 gene-set libraries with many useful libraries such as those created from ENCODE, listing many targets for many transcription factors, as well as a gene-set library extracted from the NIH Roadmap Epigenomics Project for histone modifications. Other newly created libraries include genes highly expressed in different cell types and tissues; mouse phenotypes from MGI-MP; structural domains; protein-protein interaction hubs; protein complexes; kinase substrates; differentially phosphorylated proteins from SILAC experiments; differentially expressed genes after approved drug perturbations; and virus-host protein interactions. The results from Enrichr are reported in four different ways: table, bar graph, network of enriched terms, and a grid that displays all the terms of a gene-set library while highlighting the enriched terms. Each visual display is easily exportable to vector graphic figures to be incorporated in publications and presentations. Enrichr also has a potentially improved method to compute enrichment, and we demonstrated that this method might be better than the currently widely used Fisher exact test. In addition, we show how figures generated by Enrichr can be used to obtain a global view of cell regulation in cancer by comparing highly expressed genes in cancer cell lines with genes highly expressed in normal matching tissues. Overall, Enrichr is a state-of-the-art gene set enrichment analysis web application. Code snippets are provided to embed Enrichr in any web-site. Enrichr is also available as a mobile app for iPhone, Android and Blackberry.
Enrichr is freely available online at: http://amp.pharm.mssm.edu/Enrichr.
Enrichr requires a browser that supports SVG. Recent versions of Chrome, Firefox, and Opera for Android are recommended. Enrichr only works with Internet Explorer (IE) 9 or higher. In addition, since the stock browsers in Android 2.3.7 (Gingerbread) or below do not support SVG, Enrichr does not work using these browsers.
This work is supported in part by NIH grants 1R01GM098316-01A1, U54HG006097-02S1, R01DK088541-01A1, and P50GM071558 to AM.
- Huang DW, Sherman BT, Lempicki RA: Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009, 37: 1-13. 10.1093/nar/gkn923.PubMed CentralView ArticleGoogle Scholar
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.PubMed CentralView ArticlePubMedGoogle Scholar
- Subramanian A, Kuehn H, Gould J, Tamayo P, Mesirov JP: GSEA-P: a desktop application for Gene Set Enrichment Analysis. Bioinformatics. 2007, 23: 3251-3253. 10.1093/bioinformatics/btm369.View ArticlePubMedGoogle Scholar
- Smirnov N: Tables for estimating the goodness of fit of empirical distributions. Ann Math Stat. 1948, 19: 279-281. 10.1214/aoms/1177730256.View ArticleGoogle Scholar
- Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P: Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011, 27: 1739-1740. 10.1093/bioinformatics/btr260.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H: Gene ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-10.1038/75556.PubMed CentralView ArticlePubMedGoogle Scholar
- Fisher RA: On the interpretation of χ2 from contingency tables, and the calculation of P. J R Stat Soc. 1922, 85: 87-94. 10.2307/2340521.View ArticleGoogle Scholar
- Dannenfelser R, Clark N, Ma'ayan A: Genes2FANs: connecting genes through functional association networks. BMC Bioinforma. 2012, 13: 156-10.1186/1471-2105-13-156.View ArticleGoogle Scholar
- Clark N, Dannenfelser R, Tan C, Komosinski M, Ma'ayan A: Sets2Networks: network inference from repeated observations of sets. BMC Syst Biol. 2012, 6: 89-10.1186/1752-0509-6-89.PubMed CentralView ArticlePubMedGoogle Scholar
- Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR: ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics. 2010, 26: 2438-2444. 10.1093/bioinformatics/btq466.
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006, 34: D108-D110. 10.1093/nar/gkj143.
- Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X: JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010, 38: D105-D110. 10.1093/nar/gkp950.
- Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ: The UCSC genome browser database: update 2007. Nucleic Acids Res. 2007, 35: D668-D673. 10.1093/nar/gkl928.
- Rosenbloom KR, Dreszer TR, Long JC, Malladi VS, Sloan CA: ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res. 2012, 40: D912-D917. 10.1093/nar/gkr1012.
- The ENCODE Project Consortium: An integrated encyclopedia of DNA elements in the human genome. Nature. 2012, 489: 57-74. 10.1038/nature11247.
- Chadwick LH: The NIH roadmap epigenomics program data resource. Epigenomics. 2012, 4: 317-324. 10.2217/epi.12.18.
- Lewis BP, Burge CB, Bartel DP: Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are MicroRNA targets. Cell. 2005, 120: 15-20. 10.1016/j.cell.2004.12.035.
- Chen EY, Xu H, Gordonov S, Lim MP, Perkins MH: Expression2Kinases: mRNA profiling linked to multiple upstream regulatory layers. Bioinformatics. 2012, 28: 105-111. 10.1093/bioinformatics/btr625.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS: Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008, 9: R137. 10.1186/gb-2008-9-9-r137.
- Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A: The NIH roadmap epigenomics mapping consortium. Nat Biotechnol. 2010, 28: 1045-1048. 10.1038/nbt1010-1045.
- Zang C, Schones DE, Zeng C, Cui K, Zhao K: A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics. 2009, 25: 1952-1958. 10.1093/bioinformatics/btp340.
- Pepke S, Wold B, Mortazavi A: Computation for ChIP-seq and RNA-seq studies. Nat Methods. 2009, 6: S22-S32. 10.1038/nmeth.1371.
- Lewis BP, Shih I, Jones-Rhoades MW, Bartel DP, Burge CB: Prediction of mammalian microRNA targets. Cell. 2003, 115: 787-798. 10.1016/S0092-8674(03)01018-3.
- Lachmann A, Ma'ayan A: Lists2Networks: integrated analysis of gene/protein lists. BMC Bioinforma. 2010, 11: 87. 10.1186/1471-2105-11-87.
- Pico AR, Kelder T, Van Iersel MP, Hanspers K, Conklin BR: WikiPathways: pathway editing for the people. PLoS Biol. 2008, 6: e184. 10.1371/journal.pbio.0060184.
- Ogata H, Goto S, Fujibuchi W, Kanehisa M: Computation with the KEGG pathway database. Biosystems. 1998, 47: 119-128. 10.1016/S0303-2647(98)00017-3.
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E: Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 2005, 33: D428-D432.
- Lachmann A, Ma'ayan A: KEA: kinase enrichment analysis. Bioinformatics. 2009, 25: 684-686. 10.1093/bioinformatics/btp026.
- Ruepp A, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C: CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 2008, 36: D646-D650.
- Malovannaya A, Lanz RB, Jung SY, Bulynko Y, Le NT: Analysis of the human endogenous coregulator complexome. Cell. 2011, 145: 787-799. 10.1016/j.cell.2011.05.006.
- Graauw M, Pimienta G, Chaerkady R, Pandey A: SILAC for Global Phosphoproteomic Analysis. 2009, Phospho-Proteomics: Humana Press, 107-116.
- Prasad TSK, Goel R, Kandasamy K, Keerthikumar S, Kumar S: Human protein reference database--2009 update. Nucleic Acids Res. 2009, 37: D767-D772. 10.1093/nar/gkn892.
- Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B: PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics. 2004, 4: 1551-1561. 10.1002/pmic.200300772.
- Yang CY, Chang CH, Yu YL, Lin TCE, Lee SA: PhosphoPOINT: a comprehensive human kinase interactome and phospho-protein database. Bioinformatics. 2008, 24: i14-i20. 10.1093/bioinformatics/btn297.
- Diella F, Cameron S, Gemünd C, Linding R, Via A: Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinforma. 2004, 5: 79. 10.1186/1471-2105-5-79.
- Linding R, Jensen LJ, Pasculescu A, Olhovsky M, Colwill K: NetworKIN: a resource for exploring cellular phosphorylation networks. Nucleic Acids Res. 2008, 36: D695-D699.
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M: MINT: a Molecular INTeraction database. FEBS Lett. 2002, 513: 135-140. 10.1016/S0014-5793(01)03293-8.
- Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE: The mouse genome database genotypes: phenotypes. Nucleic Acids Res. 2009, 37: D712-D719. 10.1093/nar/gkn886.
- Lamb J, Crawford ED, Peck D, Modell JW, Blat IC: The connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006, 313: 1929-1935.
- Culhane AC, Schwarzl T, Sultana R, Picard KC, Picard SC: GeneSigDB--a curated database of gene expression signatures. Nucleic Acids Res. 2010, 38: D716-D725. 10.1093/nar/gkp1015.
- Hamosh A, Scott AF, Amberger J, Valle D, McKusick VA: Online Mendelian inheritance in man (OMIM). Hum Mutat. 1999, 15: 57-61.
- Chatr-aryamontri A, Ceol A, Peluso D, Nardozza A, Panni S: VirusMINT: a viral protein interaction database. Nucleic Acids Res. 2009, 37: D669-D673. 10.1093/nar/gkn739.
- Berger SI, Posner JM, Ma'ayan A: Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinforma. 2007, 8: 372. 10.1186/1471-2105-8-372.
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
- Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA: The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012, 483: 603-607. 10.1038/nature11003.
- Weinstein JN: Spotlight on molecular profiling: “integromic” analysis of the NCI-60 cancer cell lines. Mol Cancer Ther. 2006, 5: 2601-2605. 10.1158/1535-7163.MCT-06-0640.
- Wishart DS, Tzur D, Knox C, Eisner R, Guo AC: HMDB: the human metabolome database. Nucleic Acids Res. 2007, 35: D521-D526. 10.1093/nar/gkl923.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-D141. 10.1093/nar/gkh121.
- Apweiler R, Attwood TK, Bairoch A, Birney E, Biswas M: The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 2001, 29: 37-40. 10.1093/nar/29.1.37.
- Skellam J: Studies in statistical ecology: I Spatial pattern. Biometrika. 1952, 39: 346-362.
- Clark PJ, Evans FC: Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology. 1954, 35: 445-453. 10.2307/1931034.
- Bostock M, Ogievetsky V, Heer J: D3 Data-Driven Documents. IEEE T Vis Comput Gr. 2011, 17: 2301-2309.
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotech. 2010, 28: 511-515. 10.1038/nbt.1621.
- Cao R, Wang L, Wang H, Xia L, Erdjument-Bromage H: Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science. 2002, 298: 1039-1043. 10.1126/science.1076997.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.