Skip to main content

iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data

Abstract

Background

RNA-seq is widely used for transcriptomic profiling, but the bioinformatics analysis of resultant data can be time-consuming and challenging, especially for biologists. We aim to streamline the bioinformatic analyses of gene-level data by developing a user-friendly, interactive web application for exploratory data analysis, differential expression, and pathway analysis.

Results

iDEP (integrated Differential Expression and Pathway analysis) seamlessly connects 63 R/Bioconductor packages, 2 web services, and comprehensive annotation and pathway databases for 220 plant and animal species. The workflow can be reproduced by downloading customized R code and related pathway files. As an example, we analyzed an RNA-Seq dataset of lung fibroblasts with Hoxa1 knockdown and revealed the possible roles of SP1 and E2F1 and their target genes, including microRNAs, in blocking G1/S transition. In another example, our analysis shows that in mouse B cells without functional p53, ionizing radiation activates the MYC pathway and its downstream genes involved in cell proliferation, ribosome biogenesis, and non-coding RNA metabolism. In wildtype B cells, radiation induces p53-mediated apoptosis and DNA repair while suppressing the target genes of MYC and E2F1, and leads to growth and cell cycle arrest. iDEP helps unveil the multifaceted functions of p53 and the possible involvement of several microRNAs such as miR-92a, miR-504, and miR-30a. In both examples, we validated known molecular pathways and generated novel, testable hypotheses.

Conclusions

Combining comprehensive analytic functionalities with massive annotation databases, iDEP (http://ge-lab.org/idep/) enables biologists to easily translate transcriptomic and proteomic data into actionable insights.

Background

RNA sequencing (RNA-Seq) [1] has become a routine technique for genome-wide expression analysis. At increasingly reduced cost, library construction and sequencing can often be carried out following standard protocols. For many researchers, especially those without bioinformatics experience, the bottleneck to fully leverage the power of the technique is how to translate expression profiles into actionable insights. A typical analytic workflow involves many steps, each requiring different tools. It can be time-consuming to learn, tune and connect these tools correctly. Another hurdle is the scattered annotation databases with diverse types of gene IDs. To mitigate these issues, we aim to develop an application that can greatly reduce the time and effort required for researchers to analyze RNA-Seq data.

RNA-Seq data analysis often starts with quality control, pre-processing, mapping and summarizing of raw sequencing reads. We assume these steps were completed, using either the traditional Tuxedo Suite [2, 3] or alternatives such as the faster, alignment-free quantification methods [4, 5]. These tools can be used in stand-alone mood or through platforms like GenePattern [6], Galaxy [7], and CyVerse [8].

After read mapping, we often obtain a matrix of gene-level read counts or normalized expression levels (Fragments Per Kilobase Million, or FPKM). For such tabular data, like DNA microarray data, R is a powerful tool for visualization and statistical analysis. In addition, many dedicated R and Bioconductor [9] packages have been developed to identify differentially expressed genes (DEGs) and altered pathways. Some of the packages, such as DESeq2 [10], are developed specifically for the statistical modeling of read counts, and are widely used. But these packages can be time-consuming, or even out of reach for researchers without coding experience.

Several web applications have been developed to analyze summarized expression data (Table 1). START App (Shiny Transcriptome Analysis Resource Tool) is a Shiny app that performs hierarchical clustering, principal component analysis (PCA), gene-level boxplots, and differential gene expression [11]. Another similar tool, Degust [12] can perform differential expression analysis using EdgeR [13] or limma-voom [14] and interactively plot the results. Other tools include Sleuth [15] and ShinyNGS [16]. Non-Shiny applications were also developed to take advantage of the R code base. This includes DEIVA [17] and VisRseq [18]. Beyond differential expression, several tools incorporate some capacity of pathway analysis. For quantified expression data, ASAP (Automated Single-cell Analysis Pipeline) [19] can carry out normalization, filtering, clustering, and enrichment analysis based on Gene Ontology (GO) [20] and KEGG [21] databases. With EXPath Tool [22], users can perform pathway search, GO enrichment and co-expression analysis. Several other Shiny-based tools, such as IRIS [23], are also being developed. The development of these tools in the last few years facilitated the interpretation of RNA-Seq data.

Table 1 Comparison of applications for analyzing RNA-Seq

In this study, we seek to develop a web application with substantially enhanced functionality with (1) automatic gene ID conversion with broad coverage, (2) comprehensive gene annotation and pathway database for both plant and animals, (3) several methods for in-depth EDA and pathway analysis, (4) access to web services such as KEGG [21] and STRING-db [24] via application programming interface (API), and (5) improved reproducibility by generating R scripts for stand-alone analysis.

We used iDEP to analyze two example datasets and generate all the figures and tables in this paper except Table 1 and Fig. 1. We first extensively analyzed a simple RNA-Seq dataset involving small interfering RNA (siRNA)-mediated Hoxa1 knockdown in human lung fibroblasts [3]. We identified the down-regulation of cell-cycle genes, in agreement with previous studies. Our analyses also reveal the possible roles of E2F1 and its target genes, including microRNAs, in blocking G1/S transition, and the upregulation of genes related to cytokines, lysosome, and neuronal parts. The second dataset was derived from an experiment with a factorial design to study the effect of ionizing radiation (IR) on mouse B cells with and without functional p53 [25]. In addition to correctly identifying p53 pathway and the enrichment of p53 target genes, we also found the p53-independent effects, including the regulation of ribosome biogenesis and non-coding RNA metabolism, and activation of c-MYC. These examples show that users can gain insights into both molecular pathways and gene regulatory mechanisms.

Fig. 1
figure 1

iDEP workflow and functional modules

Results

We developed an easy-to-use web application for in-depth analysis of gene expression data. iDEP (integrated Differential Expression and Pathway analysis) encompasses many useful R and Bioconductor packages, vast annotation databases, and related web services. The input is a gene-level expression matrix obtained from RNA-seq, DNA microarray, or other platforms. Main functionalities include (1) pre-processing, (2) exploratory data analysis (EDA), (3) differential expression, and (4) pathway analysis and visualization.

Leveraging many existing R packages (see Fig. 1) and the power of the Shiny framework, we developed iDEP to enable users to easily formulate new hypotheses from transcriptomic datasets. We also batch downloaded a massive amount of gene annotation information for 220 species (See Additional file 1: Table S1) from Ensembl [26, 27] Ensembl Plants [28], and Ensembl Metazoa. In addition, comprehensive pathway databases for human (Table 2), mouse [29], and arabidopsis [30] were also compiled from many sources to support in-depth pathway analyses.

Table 2 Gene set databases collected for enrichment analysis in human. The last column gives the version of the database, secondary source, or the date of access

Our goal was to develop an intuitive, graphical, and robust tool so that researchers without bioinformatics experience can routinely and quickly translate expression data into novel hypotheses. We also wanted to make an open system where users can download intermediate results so that other tools can be used. Also, users can upload custom pathway databases for unannotated species. For experienced bioinformaticians, it can serve as a tool for preliminary analysis as it circumcises the need for many tedious tasks such as converting gene IDs and downloading software packages and annotations. These users can also download customized R scripts and related data files so that the analysis can be reproduced and extended.

Use case 1: A simple experiment on Hoxa1 knockdown

We first analyzed a simple dataset studying the effect of Hoxa1 knockdown by siRNA in human lung fibroblasts [3]. With 3 replicates for each of the two biological samples, this RNA-Seq dataset was used as example data for the Cuffdiff2 paper [3]. Available as Additional file 2, the read count data was previously used in a tutorial for pathway analysis [31]. A flowchart for the analysis can be found in Additional file 3: Figure S1.

Pre-processing and EDA

After uploading the read count data, iDEP correctly recognized Homo sapiens as the likely species, based on the number of matched genes IDs. After ID conversion and the default filter (0.5 counts per million, or CPM, in at least one sample), 13,819 genes left. A bar plot of total read counts per library is generated (Fig. 2a), showing some small variation in library sizes. We chose the regularized log (rlog) transformation implemented in the DESeq2 package, as it effectively reduces mean-dependent variance (Additional file 3: Figure S2). Distribution of the transformed data is shown in Fig. 2b-c. Variation among replicates is small (Fig. 2d).

Fig. 2
figure 2

Diagnostic plots for read-counts data. a Total read-counts per library. b Distribution of transformed data using a density plot. c Boxplot of transformed data. d Scatter plot of the first two samples

iDEP also enables users to examine the expression level of one or more genes. Using “Hoxa” as a keyword, we obtained Fig. 3a, which shows that Hoxa1 expression level is reduced, but not abolished, in response to siRNA-mediated knockdown of Hoxa1. Noticeably, expression of other family members, especially Hoxa2, 4, and 5, also decrease. As these genes have similar mRNA sequences, it is unclear whether this is caused by off-target effects of the siRNA or ambiguous mapping of RNA-Seq reads. Figure 3b, obtained by using “E2F” as a keyword, shows the down-regulation of E2F1.

Fig. 3
figure 3

Expression patterns of (a) Hoxa and (b) E2f gene families generated by iDEP

We rank genes by their standard deviation across all samples and use the top 1000 genes in hierarchical clustering. The result in Fig. 4a suggests that Hoxa1 knockdown in lung fibroblast cells induce a drastic change in the expression of hundreds of genes. Variations among technical replicates are minimal. These observations can also be confirmed by the correlation matrix (Additional file 3: Figure S3) and k-means clustering (Additional file 3: Figure S4).

Fig. 4
figure 4

Hierarchical clustering (a) and PCA analyses (b) indicate the substantial difference in thousands of genes induced by Hoxa1 knockdown. There is little variation among replicates

PCA plot using the first and second principal components is shown in Fig. 4b. There is a clear difference between the Hoxa1 knockdown and the control samples, along the first principal component that explains 93% of the variance. Plot using multidimensional scaling (MDS), and t-SNE [32] also show a similar distribution of the samples (Additional file 3: Figure S5). We can choose to conduct pathway analysis using PGSEA [33, 34] by treating the loadings of the principal components as expression values. As suggested by Additional file 3: Figure S6, the first two components are related to cell cycle regulation.

Differentially expressed genes (DEGs)

With the DESeq2 package, we identified 907 upregulated and 1097 downregulated genes (see Additional file 1: Table S3) using a threshold of false discovery rate (FDR) < 0.1 and fold-change > 2. The volcano plot (Fig. 5a) and the MA plot (Fig. 5b) show that Hoxa1 knockdown leads to a massive transcriptomic response. Plotly-based interactive versions of these plots are also available, where users can zoom in and mouse over to see individual genes (Fig. 5c). A quick scan at the top genes ranked by the absolute values of fold-change (FCs) tells us that Hoxa1 knockdown induces cytokines (IL1B, IL24).

Fig. 5
figure 5

Summary plots for differential expression analysis using DESeq2. a Volcano plot, b MA plot, and (c) Interactive MA plot

The up and down-regulated genes are then subjected to enrichment analysis based on the hypergeometric distribution. Many different types of genes sets listed in Table 2 can be used to test various hypotheses. The GO Biological Process terms enriched in DEGs are shown in Table 3. Upregulated genes are related to regulation of cell proliferation, locomotion, and response to endogenous stimuli. This is perhaps the cell’s response to injected siRNAs. The downregulated genes are significantly enriched with cell cycle-related genes (FDR < 2.6 × 10− 47). The effect of Hoxa1 knockdown on cell cycle was reported and experimentally confirmed in the original study [3]. Cell cycle analysis revealed that loss of Hoxa1 leads to a block in G1 phase [3].

Table 3 Enriched GO terms in up and down-regulated genes

As many GO terms are related or redundant (i.e., cell cycle and cell cycle process), we provide two plots to summarize such correlation [35]. We first measure the distance among the terms by the percentage of overlapped genes. Then this distance is used to construct a hierarchical clustering tree (Fig. 6a) and a network of GO terms (Fig. 6b). Both plots show that the enriched terms are distinct in the two gene lists. The down-regulated genes are overwhelmingly involved in cell cycle. The upregulated genes are related to 4 related themes: cell proliferation, signaling, response to organic substance, and cell migration, possibly in reaction to the injected siRNAs.

Fig. 6
figure 6

Visualization of the relationship among enriched GO terms using (a) hierarchical clustering tree and (b) network

Choosing GO cellular component, we find that Hoxa1 knockdown suppresses genes that code for the spindle, cytoskeleton and chromosomal parts (Additional file 3: Figure S7). As Hoxa1 knockdown blocks G1/S transition [3], a smaller number of cells are in the S (synthesis) phase, leading to the reduction of proteins related to the spindle and chromosomal parts. Hoxa1 knockdown also induces genes related to plasma membrane, neurons and synapses (Additional file 3: Figure S7). This unexpected result is consistent with Hoxa1’s role in neuronal differentiation [36, 37]. Polymorphisms of this gene are associated with cerebellar volume in humans [38]. Hoxa1 may have different functions in various organs across developmental stages.

Choosing KEGG pathway, we confirm the overrepresentation of cell cycle-related genes in downregulated genes (Additional file 3: Figure S8). For up-regulated genes, we detect cytokine-cytokine receptor interaction (CCRI) pathway (FDR < 1.3 × 10− 10). “MSigDB.Curated” gene sets contain pathways from various databases as well as published lists of DEGs from previous expression studies [39]. As shown in Additional file 3: Figure S9, the most significant are oligodendrocyte differentiation and several cell-cycle related gene sets. As suggested by a meta-analysis of published gene lists [40], cell-cycle related expression signature is frequently triggered by diverse cellular perturbations [41]. We uncovered similarity of our expression signature with previously published ones.

Using the STRINGdb package, iDEP can analyze the lists of DEGs via the STRING API [24] for enrichment analysis and the retrieval of PPI networks. The enrichment analysis led to similar results (Additional file 1: Table S4) to those obtained using the internal iDEP gene sets. In addition, STRING detected that the Helix-loop-helix DNA-binding domain is overrepresented in proteins coded by the 907 upregulated genes, while the Tubulin/FtsZ family, GTPase domain is enriched in the 1097 down-regulated genes (Additional file 1: Table S5). Figure 7 is the network of PPIs among the top 20 upregulated genes. The highly connected network includes chemokine ligands 1 and 3 (CXCL1 and CXCL3), as well as interleukin 24 (IL24), suggesting the immune response caused by injected siRNA. A link to an interactive, richly annotated version of this network on the STRING website is also available.

Fig. 7
figure 7

Protein-protein interactions (PPI) among top 20 up-regulated genes. This is retrieved via API access to the STRING database. There is also an enrichment of PPIs compared with background. An interactive, annotated version of this network is also available through a link to the STRING website

iDEP can also reveal gene regulatory mechanisms. Using the transcription factor (TF) target gene sets in enrichment analyses, we can obtain Table 4, which suggest that target genes of SP1 (FDR < 9.80 × 10− 23) and E2F factors (FDR < 1.1 × 10− 16) are overrepresented in down-regulated genes. E2F factors are regulators of cell cycle [42]. E2F1 promotes G1/S transition [43] by regulation many genes, including itself. SP1 binding sites were identified in cell-cycle related genes such as Cyclin D1 (CCD1) [44]. SP1 is a G1 phase specific TF [45]. The interaction of E2F1 and SP1 proteins mediate cell cycle regulation [46]. The upregulated genes are enriched with target genes of NF-κB (FDR < 4.9 × 10− 9) and FOXO3 (FDR < 4.9 × 10− 9), known regulators of the immune response [47, 48].

Table 4 Enriched transcription factor (TF) binding motifs

The Motif gene sets from MSigDB are derived from [49] and contain sets of genes sharing TF binding motifs in gene promoters and microRNA target motifs in 3′ untranslated regions (UTRs). Using this gene set, we again detect the enrichment of E2F motifs in promoters of downregulated genes (Additional file 1: Table S16). We also detected overrepresentation of a “GCACTTT” motif in 3’ UTRs of upregulated genes. This motif is targeted by several microRNAs, namely miR-17-5P, miR-20a, miR-106a. Cloonan et al. showed that miR-17-5P targets more than 20 genes involved in the G1/S transition [30]. Trompeter et al. provided evidence that miR-17, miR-20a, and miR-106b enhance the activities of E2F factors to influence G1/S transition [50]. miR-106b resides in the intron of Mcm7 along the sense direction. Mcm7 is an E2F1 target gene that is also downregulated by Hoxa1 knockdown (see Fig. 8a). Petrocca et al. showed that E2F1 regulates miR-106b, which can conversely control E2F1 expression [51]. Thus, it is possible that Hoxa1 knockdown reduces E2F1 expression (see Fig. 3b) and its target genes, including Mcm7, which hosts miR-106b. We can speculate that downregulated miR-106b, in turn, causes the increases in the expression of its target genes. Leveraging the comprehensive pathway databases, iDEP enables researchers to develop new hypotheses that could be further investigated.

Fig. 8
figure 8

Expression profiles of cell-cycle related genes visualized on an KEGG pathway diagram using the Pathview package. Red and green indicate genes induced or suppressed by Hoxa1 knockdown, respectively

For many species, predicted TF target genes are not available. We downloaded 300 bp and 600 bp promoter sequences from ENSEMBL and scanned them with a large collection of TF binding motifs [52]. As shown in Table 5, the promoters of DEGs are overrepresented with many G-rich motifs bound by E2F and other factors such as TCFL5 and SP2. We compared the best possible scores for each TF and promoter pair and run t-tests to compare these scores. Further study is needed to validate this approach.

Table 5 TF motifs enriched in gene promoters (300 bp) of up- or down-regulated genes

For human (Table 2), mouse [29] and Arabidopsis [53], we also include predicted target genes for many miRNAs from multiple sources. Using these gene sets, we detected significant enrichment (Table 6) of miRNA-193b, miR-192, and miR-215 target genes among the down-regulated genes. miR-193b was shown to suppress cell proliferation and down-regulate CCND1 [54]. Proposed as biomarkers of several cancers, miR-192 also inhibit proliferation and can cause cell cycle arrest when overexpressed [55]. miR-215 shares many target genes with miR-192 and are also downregulated in cancer tissues [56]. These miRNAs may play a role in the regulation of cell cycle upon Hoxa1 knockdown.

Table 6 Enriched miRNA target gene sets

Pathway analysis

Instead of using selected DEGs that are sensitive to arbitrary cutoffs, pathway analysis can use fold-change values of all genes to identify coherently altered pathways. We used the GAGE (generally applicable gene set enrichment) [57] as a method and KEGG as gene sets. The results (Additional file 1: Table S6) is similar to those from a previous analysis by Turner in an online tutorial [31] and also agrees with our enrichment analysis based on DEGs. For each of the significant KEGG pathways, we can view the fold-changes of related genes on a pathway diagram using the Pathview Bioconductor package [58]. Many cell cycle genes are marked as green in Fig. 8, indicating reduced expression in Hoxa1-knockdown samples. We also detected upregulation of genes related to CCRI, arthritis, and lysosome. Many CCRI related genes are up-regulated (Additional file 3: Figure S10). Not detected using DEGs, lysosome-related genes are mostly upregulated (Additional file 3: Figure S11). Injected siRNAs might be degraded in the lysosome.

By changing the gene sets database for pathway analysis, we can gain further insights. Using MSigDB.Motif gene sets, we can verify the enrichment of E2F binding motifs (Additional file 1: Table S7). For non-KEGG gene sets, heatmaps are created to show the expression of genes in significant gene sets. Figure 9a shows part of such a plot, highlighting genes that share the “SGCGSSAAA” motif bound by E2F1. Note that E2F1 gene itself is included in the figure, as it binds to its own promoter and forms a positive feedback loop [43]. The downloaded expression data indicate that E2F1 is downregulated by more than 3-fold in Hoxa1 knockdown samples (see Fig. 3b). Upon Hoxa1 knockdown, downregulation of E2F1 and downstream genes, including microRNAs, may be part of the transcription program that blocks G1/S transition.

Fig. 9
figure 9

Pathway analysis results using different options. a expression patterns of genes with E2F1 binding motifs. E2F1 gene itself is also downregulated in Hoxa1 knockdown. So is the Mcm7 gene, whose intron host miR-106b-25 clusters. b Results from running PGSEA on KEGG gene sets. c PGSEA applied on MSigDB.Motif gene sets

Users can use many combinations of methods and gene sets to conduct pathway analysis. For example, using PGSEA on KEGG pathways yielded Fig. 9a and b, again confirming previous results on suppressed cell cycle genes and induced lysosome and CCRI related genes. Using the MSigDB.Motif gene sets, we can also confirm the E2F1 binding motifs (Fig. 9). The most highly activated gene sets are related to miR-17-5p, miR-20a, miR106a,b and so on (Fig. 9c), which agrees with enrichment analysis using just gene lists.

Some pathways can be attenuated by upregulating some of its associated genes while downregulating others. To detect such pathways, we can use the absolute values of fold changes in pathway analysis. This is achieved by checking the box labeled “Use absolute values of fold changes for GSEA and GAGE.” Instead of detecting up or down-regulated pathways, the results show which pathways are more regulated. As shown in Additional file 1: Table S8, while the expression of ribosome related genes is less variable upon Hoxa1 knockdown, genes related to CCRI are highly regulated.

The expression of neighboring genes can be correlated due to mechanisms such as super-enhancers [59], 3D chromatin structure [60], or genomic gain or loss in cancer. To help users detect such correlation, we use ggplot2 [61] and Plotly to interactively visualize fold-changes on all the chromosomes (Fig. 10a). Based on regression analysis, the PREDA package [62] can detect statistically significant chromosomal regions with coherent expression change among neighboring genes. Figure 10b shows many such regions in response to Hoxa1 knockdown. Detailed information obtained from downloaded files (Additional file 1: Table S9) suggests, for example, a 4.3 Mbps region on Chr.1q31 contains 6 upregulated genes (PRG4, TPR, C1orf27, PTGS2, PLA2G4A, and BRINP3).

Fig. 10
figure 10

Visualizing expression profiles on chromosomes. a Zoom-in on Chr. 1 using the dynamic graphics, showing the upregulation of RUX3 gene. b Statistically significant genomic regions identified by PREDA

To further validate our parameterization of PREDA, we analyzed DNA microarray data (Additional file 4) of thymus tissues from patients with Down syndrome [63]. We detected large, upregulated regions on chromosome 21 (Additional file 3: Figure S12), as expected. Even though PREDA analysis is slow and has low-resolution due to the use of gene-level expression score, it might be useful in cancer studies where localized expression change on the chromosome can happen.

To improve reproducibility, iDEP generates custom R code and R Markdown code based on user data and choices of parameters (Additional files 5, 6 and 7). Users with some R coding experience should be able to re-run most analyses by downloading the related annotation and gene sets used by iDEP. An example is shown here [64].

Use case 2: p53’s role in response to ionizing radiation

Tonelli et al. [25] used RNA-Seq to study the effect of whole-body ionizing radiation (IR) on the mouse with or without p53. B cells and non-B cells were isolated from mouse spleen after treatment. We analyzed the B cell data involving two genotypes (p53 wildtype and p53 null) with mock or IR treatment, a typical 2 × 2 factorial design. The read count and experimental design files are available as Additional files 8 and 9. A converted, filtered version of this dataset is incorporated into iDEP as a demo data.

With this dataset, we demonstrate how users can easily generate hypothesis on molecular pathways and gene regulatory mechanisms through three steps: (1) enrichment analysis of k-means clusters, (2) enrichment analysis of the lists of DEGs, and (3) pathway analysis using fold-changes values of all genes.

Pre-process and EDA of p53 dataset

We noticed reduced total reads for wildtype samples treated with IR (Fig. 11a). While this may be caused by biology, but biased sequencing depth presents a confounding factor, that has not been discussed widely. To quantify such biases, iDEP routinely performs ANOVA analysis of total read counts across sample groups. For this example, uneven read counts are detected (P = 0.047) and a warning is produced.

Fig. 11
figure 11

a Total read counts are smaller in the WT samples treated with IR. b Distribution of standard deviations for all genes

EDA shows that IR treatment led to the changes in thousands of genes. Based on the distribution of variances (Fig. 11b), we choose the top 2500 genes for clustering analysis. Hierarchical clustering (Additional file 3: Figure S13) shows the substantial differences between treated and untreated samples. It also shows the patterns of different groups of genes and the variations among some replicates of treated wild-type cells (wt_IR).

We then used k-means clustering to divide the top 2500 genes into groups. Based on the within-group sum of squares plot (Additional file 3: Figure S14) as a reference, we chose a slightly larger k = 9. Figure 12 shows the 9 gene clusters and the enriched GO terms. Details are available in Additional file 1: Tables S10 and S11. Genes in clusters B and I show similar responses to IR across genotypes. Strongly enriched in genes related to the immune system (FDR < 3.65 × 10− 18), cluster B are downregulated by IR in both cell types. The immune-suppressive effects of radiation [65] are clearly p53-independent. Induced by IR in both wildtype and Trp53−/− cells, cluster I genes are enriched in ribosome biogenesis but with much lower level of significance (FDR < 2.25 × 10− 5).

Fig. 12
figure 12

K-means clustering and enrichment analysis of the ionization dataset

On the other hand, genes in clusters A, C, and D are specific to the wild-type cells. Cluster A contains 13 genes that code for histone proteins and are involved in nucleosome assembly (FDR < 1.66 × 10− 11). Genes in Clusters C and D are induced by IR only in B cells with p53, but the former is more strongly upregulated. As expected, cluster C is related to the p53 pathway (FDR < 1.38 × 10− 10) and apoptosis (FDR < 3.59 × 10− 6). It is enriched with 15 p53 target genes like Mdm2 (FDR < 3.53 × 10− 18). Cluster D genes are related to the regulation of cell proliferation and cell cycle arrest, representing further downstream of the transcriptional cascade of p53 signaling.

Genes in cluster H are more highly upregulated in Trp53−/− B cells than wildtype cells. It is overrepresented with non-coding RNA (ncRNA) processing (FDR < 3.25 × 10− 36), ribosome biogenesis (FDR < 5.53 × 10− 43), and protein folding (FDR < 2.23 × 10− 16). Many of these genes code for proteins in the nucleus and mitochondrion. Significant enrichment of 7 c-Myc target genes is observed (FDR < 5.09 × 10− 7). Many of these enrichment results will be further validated in enrichment analysis of DEGs and pathway analysis. Enrichment analysis of the k-Means clusters provides an opportunity to gain insight into the molecular pathways underlying different patterns of gene expression.

Identifying DEGs in the p53 dataset

To identify genes induced by IR in both cell types, users can use pair-wise comparisons among the 4 sample groups. Alternatively, we can construct linear models through the GUI. Here we use the following model:

  • Expression ~ p53 + Treatment + p53:Treatment,

where the last term represents the interaction between genotype and treatment, capturing the additional effects of p53 in IR response. It is important to set the reference levels for factors in a model. Here we set the null (Trp53−/−) as a reference level for the factor “p53” and mock for the factor “Treatment”. More details about statistical models is available [66].

With FDR < 0.01 and fold-change > 2 as cutoffs, we used DESeq2 to identify DEGs (Fig. 13a and b). Without treatment, the two cell types have similar transcription profiles, with few DEGs. But even in Trp53−/− cells, IR caused the upregulation of 1570 genes, 469 of which is also upregulated in p53 wildtype B cells (see Venn diagram in Fig. 13c). PPI networks for the top up- and down-regulated genes in wildtype cells are shown in Additional file 3: Figures S15 and S16, respectively.

Fig. 13
figure 13

Statistics of DEGs identified by DESeq2. a and b shows the numbers of differentially expressed genes for each comparison. c Venn Diagram shows the overlap between IR induced genes in WT and P53 null samples

To further understand the molecular pathways, we perform enrichment analysis of the 10 gene lists (Additional file 1: Table S12) associated with 5 comparisons. We focus on two comparisons (1) “IR-mock” representing the baseline response of IR in mutant cells without p53, and (2) “I:p53_wt-Treatment_IR”, the interaction term capturing the additional effect of p53 compared to the baseline response.

For the first comparison, Additional file 3: Figure S17 shows IR induced DEGs in mutant cells. The 1570 upregulated genes are related to non-coding RNA (ncRNA) metabolic process (FDR < 1.33 × 10− 79), ribosome biogenesis (FDR < 2.54 × 10− 67), and translation (FDR < 3.03 × 10− 32). This enrichment profile is similar to cluster H derived from the k-Means clustering, as the two lists capture the same group of genes. The upregulated genes are surprisingly coherent in function. For example, 219 (14%) can be found in the nucleus, 286 (18%) is related to the mitochondrion, and, most significantly, 407 (26%) is RNA-binding (FDR < 3.54 × 10− 138). The 1570 upregulated genes contain 7 MYC target genes (FDR < 4.22 × 10− 7), consistent with the fact that MYC is a direct regulator of ribosome biogenesis [67]. This agrees with reports of the involvement of MYC in radiation treatment [68, 69], suggesting MYC may trigger proliferation pathways upon genotoxic stress, in the absence of p53.

Genes downregulated by IR in Trp53−/− B cells are related to immune system (FDR < 4.22 × 10− 8), GTPase activity (FDR < 3.75 × 10− 6), and actin cytoskeleton (FDR < 2.06 × 10− 5). As shown in Additional file 1: Table S13, we can also detect the enrichment of the target genes of miR-124 (FDR < 4.56 × 10− 12), an important modulator of immunity [70]. Others associated miRNAs, including miR-6931-5p, Mir-4321, and miR-576-5p, may also be involved.

For the second comparison, the expression profiles of DEGs associated with the interaction term is shown in Fig. 14. This is the p53 mediated IR response, compared to the baseline response without p53. The 676 genes that are upregulated in wild-type B cells following IR, but not in Trp53−/− B cells. As expected, these genes are enriched in p53-mediated response to DNA damage (FDR < 1.43 × 10− 6), and apoptosis (FDR < 9.72 × 10− 6). As shown in Additional file 1: Table S13, these genes are overrepresented with 25 target genes of p53 (FDR < 1.34 × 10− 13) and 76 target genes of miR-92a (FDR < 2.79 × 10− 11). Part of the miR-17/92 cluster, miR-92a is related to tumorigenesis and is regulated by p53 [71, 72]. Another miRNA with overrepresented target genes is miR-504 (FDR < 3.25 × 10− 8), which has been shown to binds to 3’ UTR of Trp53 and negatively regulate its expression [73]. Located in the introns of the fibroblast growth factor 13 (FGF13) gene, miR-504 is transcriptionally suppressed by p53, forming a negative feedback loop [74]. Following radiation, the expression of both miR-92a and miR-504 in wild-type B cells may be reduced, leading to the upregulation of their target genes. Further study is needed to verify this hypothesis.

Fig. 14
figure 14

Additional effect of p53 in IR response. a Expression patterns of selected DEGs. b Upregulated genes are enriched with genes related to p53 mediated response to DNA damage, especially apoptosis, and negative regulation of cell cycle. These genes are only induced by IR in cells with wildtype p53. One the other hand, p53 caused the relative downregulation of genes related to ribosome biogenesis, tRNA and rRNA processing, DNA replication, and protein folding. These genes are only upregulated by IR in Trp53−/−

As shown in Fig. 14, the 584 genes downregulated according to the interaction term are those that are induced in the Trp53−/− B cells, but not in wild-type B cells. These genes are overrepresented with ncRNA processing, ribosome biogenesis, cell cycle, and RNA transport (Additional file 1: Table S14). Most (411) of the 584 genes are included in the genes upregulated by IR in Trp53−/− B cells, as suggested by the Venn diagram in Additional file 3: Figure S18. MYC target genes are also downregulated by p53 upon IR. In wildtype B cells, p53 suppresses the MYC oncogenic pathway compared to Trp53−/− B cells. The most significant shared TF binding motif is E2F1 (FDR < 7.73 × 10− 11). This agrees with the role of p53 in cell cycle arrest through p21-mediated control of E2F factors [75].

Pathway analysis of p53 data

Many of the above observations can be confirmed by using pathway analysis based on the fold-change values of all genes. The results of GSEA on the interaction term can be found in Additional file 1: Table S15. The PGSEA package offers a convenient way to visualize the activities of pathways across all samples. Additional file 3: Figure S19 clearly shows that p53 signaling pathway, apoptosis, and positive regulation of cell cycle arrest are uniquely activated by IR in wild-type B cells. This is again confirmed by TF target genes (Fig. 15). In addition, the p53-independent upregulation of MYC target genes can also be observed in Fig. 15. Several ETS transcription factors, including SFPI1, SPI1, and ETS1, are suppressed by IR in both cell types. These factors may underlie the suppression of immune response as suggested [76]. Applying PGSEA on miRNA target genes highlights miRNA-30a (Additional file 3: Figure S20), whose target genes are specifically activated by IR in wild-type B cells. miRNA-30a was shown to be involved in response to IR [77] and mutually regulate p53 [78]. Thus, the complex p53 signaling pathways are unveiled with remarkable accuracy.

Fig. 15
figure 15

Differentially regulated TF target gene sets across sample types, identified by PGSEA

The upregulated p53 target genes can be seen in the KEGG pathway diagram (Additional file 3: Figure S21). This pathway map shows multifaceted roles of p53 in the regulation of apoptosis, cell cycle, DNA damage repair, and growth arrest. Many of these functions were re-discovered in our analyses above. This shows the power of comprehensive pathway databases coupled with broad analytic functionalities accessible via an intuitive user interface. Without iDEP, it can take days or weeks to write code and collect data to conduct all the analyses above. With iDEP, biologists can complete such analyses in as little as 20 min.

Discussions

Taking advantage of the Shiny platform, we were able to pack many useful functionalities into iDEP, including high-quality graphics based on ggplot2 and interactive plots using Plotly. Compared with traditional web applications, Shiny has its drawbacks and limitations. The interface is not as flexible as those developed using JavaScript. Nevertheless, we believe an integrated web application like iDEP is a valuable tool to both bench scientists and bioinformaticians.

As an example, we extensively analyzed an RNA-Seq dataset involving Hoxa1 knockdown by siRNA in lung fibroblasts, and identified the down-regulation of cell-cycle genes, in agreement with previous analyses and experimental confirmation. Our analyses also show E2F and SP1 binding motifs are enriched in the promoters of downregulated genes, mediating the cell cycle arrest. Furthermore, we also find evidence that microRNAs (miR-17-5P, miR-20a, miR-106a, miR-192, miRNA-193b, and miR-215) might work with E2F factors to block the G1/S transition in response to reduced Hoxa1 expression. Interestingly, miR-106a is located in the intron of Mcm7, an E2F1 target gene. DEGs are also enriched with genes related to neuron parts, synapse, as well as neurodegenerative diseases. This is consistent with reports of Hoxa1’s role in neuron differentiation [36,37,38]. Hoxa1 knockdown induces expression of genes associated with the cytokine-cytokine interaction, lysosome, and cell migration, probably in response to the injected siRNAs. These genes are overrepresented with target genes of NF-κB, known to be involved in immune response. By combining both annotation dataset and analytic functionality, iDEP help biologists to quickly analyze their data to form new hypotheses (Fig. 16a).

Fig. 16
figure 16

Bioinformatics analysis using iDEP generates many hypotheses regarding the molecular pathways underlying (a) Hoxa1 knockdown, and (b) Ionizing radiation

In the second example, our analysis shows that in B cell without p53, radiation treatment upregulates MYC oncogenic pathway, triggering downstream genes with highly coherent functions such as cell proliferation, ribosome biogenesis, and ncRNA metabolism. Enriched with target genes of miR-124 and ETS domain transcription factors, genes downregulated by IR in p53 null B cells are associated with immune response, GTPase activity and actin cytoskeleton. In wildtype B cells, a p53-dependent transcriptional response to IR is evidently related to p53-mediated apoptosis and DNA repair, as expected. The target genes of MYC and E2F1 are suppressed by p53, leading to growth and cell cycle arrest (Fig. 16b). iDEP helps unveil the multifaceted functions of p53, and also highlight the potential involvement of several miRNAs (miR-92a, miR-504, and miR-30a).

Users should be cautious when interpreting results from pathway analysis, which can be obtained through the many combinations of methods and gene set databases. The biomedical literature is large and heterogeneous [79], making it easy to rationalize and make a story out of any gene. True pathways, like the effect of Hoxa1 knockdown on cell cycle, should be robustly identified across different methods and databases. Also, as demonstrated in the two examples, for each enrichment or pathway analysis, we tried to focus on the most significant gene sets.

Conclusions

By integrating many Bioconductor packages with comprehensive annotation databases, iDEP enables users to conduct in-depth bioinformatics analysis of transcriptomic data through a GUI. The two use cases demonstrated that it can help pinpoint molecular pathways from large genomic datasets, thus eliminating some barriers for modern biologists.

Besides RNA-Seq and DNA microarray data, users can also use iDEP to analyze fold-change and FDR values calculated by other methods such as cuffdiff [80]. For unannotated genomes, iDEP can be used for EDA and differential expression analysis. For single-cell RNA-Seq data [81], only smaller, pre-processed datasets with hundreds of cells can be analyzed, as iDEP is mostly designed to handle transcriptomic data derived from bulk tissues.

In addition to updating the annotation database from Ensembl every year, we plan to continue to compile pathway databases for model organisms, similar to MSigDB and GSKB. For unsupported species, we will consider ways to incorporate user-submitted gene annotation. Based on user request and feedback, we will also add more functions by including additional Bioconductor packages.

Methods

Figure 1 outlines the iDEP workflow. Expression matrix is first filtered, transformed and converted to Ensemble gene IDs, which are used internally to identify genes. The pre-processed data is then used for EDA, with methods such as K-means clustering, hierarchical clustering, principal component analysis (PCA), and t-SNE [32]. Gene clusters identified by K-means are analyzed by enrichment analysis based on a large gene annotation and pathway database. The identification of DEGs is done with either the limma [82] or DESeq2 [10] packages. This is also followed by enrichment analysis on the DEGs. The fold-change values are then used in pathway analysis using several methods.

To enable gene ID conversion, we downloaded all available gene ID mappings for 220 species from Ensembl [26, 27] (Additional file 1: Table S1), including 98 from Ensembl (vertebrates, release 91), 53 from Ensembl Plants (release 37) [28], and 69 from Ensembl Metazoa (release 37). The final mapping table for the current iDEP v0.72 release consists of 135,832,098 rows, mapping various gene IDs (Additional file 1: Table S2) into Ensembl. For example, 67 types of human gene IDs can be converted to Ensembl gene IDs. Besides common ID like gene symbol, Entrez, Refseq, UCSC, UniGene, and Interpro IDs, the 67 kinds of human gene IDs also include probe IDs for popular DNA microarray platforms, making it possible to re-analyze thousands of microarray datasets available at public repositories.

When multiple gene IDs are mapped to the same ENSEMBL gene, only the one with largest standard deviation is kept. Gene IDs not recognized by iDEP will be kept in the data using original gene IDs. Users can also avoid gene ID conversion by checking the “Do not convert gene IDs to Ensembl” checkbox in the “Pre-Process” page. This is useful when the user’s data is already Ensembl gene IDs, or the user just wants to conduct EDA and identify differentially expressed genes (DEGs).

In the pre-processing stage, gene IDs are first compared to all gene IDs in the database for 220 organisms. This enables automatic ID conversion and species identification. Genes expressed at very low levels are removed and data are transformed as needed using one of several methods. iDEP enforces log-transformation when a highly skewed distribution is detected. This type of mechanisms can help avoid issues in downstream analyses. The pre-processing stage also generates diagnostic and summary plots to guide users to make their choices.

EDA enables the users to explore variations and patterns in the dataset as a whole [83]. The main methods include hierarchical clustering with heatmap, k-means clustering, and PCA. Enrichment analysis of genes derived from k-means clustering is conducted to gain insights into the functions of co-expressed genes. Initial attempts of pathway analysis are carried out using the PCA loadings on each gene. This can tell us the biological processes underlying each direction of expression change defined by the principal components.

Differential expression analysis relies on two Bioconductor packages, limma [82] and DESeq2 [10]. These packages can meet the needs for most studies, including those involving multiple biological samples and factorial design. See [84] for detailed review of other methods and consideration of sample size and variance. Normalized expression data is analyzed using limma. Read counts data can be analyzed using three methods, namely limma-trend [14], limma-voom [14, 85], and DESeq2. Other methods such as edgeR [13] may be incorporated in the future.

For simple study designs, iDEP runs differential gene expression analysis on all pairs of sample groups, which are defined by parsing sample names. For complex studies, users can upload a file with experiment design information and then build statistical models that can involve up to 6 factors. This also enables users to control for batch effects or dealing with paired samples.

Fold-change values for all genes returned by limma or DESeq2 are used in pathway analysis using GSEA [86], PAGE [33, 34], GAGE [57] or ReactomePA [87]. Taking advantage of centralized annotation databases for 98 species at Ensembl (release 92), 53 in Ensembl Plants (release 40), and 69 in Ensembl Metazoa (release 40), we downloaded not only GO functional categorizations, but also promoter sequences for defining transcription factor (TF) binding motifs for most species. Metabolic pathways were downloaded directly from KEGG [21] for 131 species (Additional file 1: Table S1). Also, we incorporated Pathview package [58] to show gene expression on KEGG pathway diagrams downloaded via API. In addition, we also included many species-specific pathway knowledgebases, such as Reactome [87, 88], GeneSetDB [89] and MSigDB [39] for human, GSKB for mouse [29], and araPath for Arabidopsis [53]. These databases contain diverse types of gene sets, ranging from TF and microRNA target genes, protein-protein interactions, to manually curated lists of published DEGs. For the human genome, we collected 140, 438 gene sets (Table 2). Such large, diverse databases enable in-depth analysis of expression data from different perspectives. Table 2 contains databases that we deemed useful. For human pathways, many other databases and tools exists [90,91,92].

The PGSEA package [33] implements the Parametric Analysis of Gene Set Enrichment (PAGE) algorithm [34] to display the activities of pathways in individual samples in terms of Z scores, which characterize how much the mean of the fold-changes for genes in a certain pathway deviates from the mean observed in all the genes. We modified the PGSEA code by adding an analysis of variance (ANOVA) on the Z scores across sample groups. Also, after cutoff with FDR, pathways are ranked by the standard deviation. This modification yields meaningful, intuitive display of differentially regulated pathways across sample groups.

PCA enables us to project samples into two-dimensional space. We also treated the PCA loadings onto each of the genes as expression data to run pathway analysis with the PGSEA package. For each pathway, this runs the PAGE algorithm which performs one-sample t-test on each gene set.The adjusted P-values are used to rank the pathways for each of the first 5 principal components. The pathways are labeled with FDR first, followed by the principal components (PC1, PC2 and so on). Only 5 pathways for each principal component are shown, but duplicated ones are skipped.

iDEP also enables users to retrieve protein-protein interaction (PPI) networks among top DEGs via an API access to STRING [24]. These networks can be rendered both as static images and as richly annotated, interactive graphs on the STRING website. The API access also provides enrichment analysis (GO, KEGG, and protein domains) for 115 archaeal, 1678 bacterial, and 238 eukaryotic species, thus greatly expanding the species coverage of iDEP.

Based on their chromosomal location obtained from Ensembl, we visualize fold-changes of genes on all the chromosomes as an interactive graph based on Plotly. iDEP can also use the PREDA package [62] to detect chromosomal regions overrepresented with up- or down-regulated genes. This is useful for studies such as cancer that might involve chromosomal deletion or amplification.

For larger datasets, users can use bi-clustering algorithms to identify genes with correlated expression among a subset of samples, using the 8 methods implemented in 3 Bioconductor packages biclust [93], QUBIC [94], and runibic [95]. Gene co-expression networks can also be constructed with the WGCNA package [96]. Enrichment analysis is routinely conducted on gene clusters derived from these methods.

To identify enriched TF binding motifs, transcript annotation and promoter sequences are retrieved from Ensembl. For genes with multiple transcripts, the transcription start site (TSS) with multiple transcripts is used. If multiple TSS locations have the same number of transcripts, then the most upstream TSS is used. Promoters are pre-scanned using TF binding motifs in CIS-BP [52]. Instead of defining a binary outcome of binding or not binding, which depends on arbitrary cutoffs, we recorded the best score for each of the TFs in every promoter sequence. Then student’s t-test is used to compare the scores observed in a group of genes against the rest of genes. The P-values are corrected for multiple testing using false discovery rate (FDR).

To enhance reproducibility in research, we will make older versions of iDEP software and database for each significant upgrade. iDEP also produces an R and R Markdown file which captures users’ parameterization during the analysis. These files could be downloaded, alongside related database files, to reproduce their analysis.

The Shiny package by RStudio provides a powerful web framework for developing applications using R. We used docker containers to configure and manage the Shiny server. Containerization also enables us to easily deploy the service and scale up to take advantage of multiple cores. Load balanced with Nginx, our web server can handle hundreds of concurrent users by distributing jobs to dozens of R processes. The source code for iDEP and our server configuration files are available at our GitHub repository [97]. Detailed documentation of iDEP, including video tutorial and a full list of supported species, is available at [98].

Abbreviations

CCRI:

Cytokine-cytokine receptor interaction

DEG:

Differentially expressed gene

EDA:

Exploratory data analysis

FC:

Fold-change

FDR:

False discovery rate

FPKM:

Fragments Per Kilobase Million

GAGE:

Generally Applicable Gene-set Enrichment

GGO:

Gene Ontology

iDEP:

Integrated Differential Expression and Pathway analysis

IR:

Ionizing radiation

MDS:

multidimensional scaling

PAGE:

Parametric Analysis of Gene Set Enrichment

PCA:

Principal component analysis

Rlog:

Regularized log

RNA-seq:

RNA sequencing

siRNA:

Small interfering RNA

START App:

Shiny Transcriptome Analysis Resource Tool

TSS:

Transcription start site

References

  1. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.

    Article  CAS  PubMed  Google Scholar 

  2. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  3. Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31(1):46–53.

    Article  CAS  PubMed  Google Scholar 

  4. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525–7.

    Article  CAS  PubMed  Google Scholar 

  5. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nat Genet. 2006;38(5):500–1.

    Article  CAS  PubMed  Google Scholar 

  7. Afgan E, Baker D, van den Beek M, Blankenberg D, Bouvier D, Cech M, Chilton J, Clements D, Coraor N, Eberhard C, et al. The galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 2016;44(W1):W3–W10.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Merchant N, Lyons E, Goff S, Vaughn M, Ware D, Micklos D, Antin P. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016;14(1):e1002342.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  9. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015;12(2):115–21.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  11. Nelson JW, Sklenar J, Barnes AP, Minnier J. The START app: a web-based RNAseq analysis and visualization resource. Bioinformatics. 2016.

  12. Degust tool. http://degust.erc.monash.edu. Accessed 2 Dec 2018.

  13. Dai Z, Sheridan JM, Gearing LJ, Moore DL, Su S, Wormald S, Wilcox S, O'Connor L, Dickins RA, Blewitt ME, et al. edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens. F1000Res. 2014;3:95.

    PubMed  PubMed Central  Google Scholar 

  14. Law CW, Chen Y, Shi W, Smyth GK. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15(2):R29.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  15. Pimentel H, Bray N, Puente S, Melsted P, Pachter L. Differential analysis of RNA-Seq incorporating quantification uncertainty. In: BioRxiv; 2016. https://doi.org/10.1101/058164.

    Chapter  Google Scholar 

  16. ShinyNGS. https://github.com/pinin4fjords/shinyngs. Accessed 2 Dec 2018.

  17. Harshbarger J, Kratz A, Carninci P. DEIVA: a web application for interactive visual analysis of differential gene expression profiles. BMC Genomics. 2017;18(1):47.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  18. Younesy H, Moller T, Lorincz MC, Karimi MM, Jones SJ. VisRseq: R-based visual framework for analysis of sequencing data. BMC Bioinformatics. 2015;16(Suppl 11):S2.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Gardeux V, David F, Shajkofci A, Schwalie P, Deplancke B. ASAP: A web-based platform for the analysis and interactive visualization of single-cell RNA-seq data. In: bioRxiv, vol. 096222; 2016.

    Google Scholar 

  20. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45(D1):D353–61.

    Article  CAS  PubMed  Google Scholar 

  22. Zheng HQ, Wu NY, Chow CN, Tseng KC, Chien CH, Hung YC, Li GZ, Chang WC. EXPath tool-a system for comprehensively analyzing regulatory pathways and coexpression networks from high-throughput transcriptome data. DNA Res. 2017.

  23. Monier B, McDermaid A, Zhao J, Fennell A, Ma Q. IRIS-EDA: an integrated RNA-Seq interpretation system for gene expression data analysis. In: bioRxiv; 2018.

    Google Scholar 

  24. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447–52.

    Article  CAS  PubMed  Google Scholar 

  25. Tonelli C, Morelli MJ, Bianchi S, Rotta L, Capra T, Sabo A, Campaner S, Amati B. Genome-wide analysis of p53 transcriptional programs in B cells upon exposure to genotoxic stress in vivo. Oncotarget. 2015;6(28):24611–26.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, et al. Ensembl 2017. Nucleic Acids Res. 2017;45(D1):D635–42.

    Article  CAS  PubMed  Google Scholar 

  27. Aken BL, Ayling S, Barrell D, Clarke L, Curwen V, Fairley S, Fernandez Banet J, Billis K, Garcia Giron C, Hourlier T, et al. The Ensembl gene annotation system. Oxford: Database; 2016. p. 2016.

    Google Scholar 

  28. Bolser DM, Staines DM, Perry E, Kersey PJ. Ensembl plants: integrating tools for visualizing, mining, and Analyzing Plant Genomic Data. Methods Mol Biol. 2017;1533:1–31.

    Article  CAS  PubMed  Google Scholar 

  29. Lai EA. GSKB: A gene set database for pathway analysis in mouse. bioRxiv. 2016;0802511.

  30. Cloonan N, Brown MK, Steptoe AL, Wani S, Chan WL, Forrest AR, Kolle G, Gabrielli B, Grimmond SM. The miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition. Genome Biol. 2008;9(8):R127.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  31. Turner S: Tutorial: RNA-seq differential expression & pathway analysis with Sailfish, DESeq2, GAGE, and Pathview. http://www.gettinggeneticsdone.com/2015/12/tutorial-rna-seq-differential.html; 2015.

    Google Scholar 

  32. van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J Mach Learn Res. 2008;9(Nov):2579–605.

    Google Scholar 

  33. Furge K, Dykema K. PGSEA: Parametric Gene Set Enrichment Analysis. R package version 1480; 2012.

    Google Scholar 

  34. Kim SY, Volsky DJ. PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics. 2005;6:144.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  35. Jung D, Ge SX. PPInfer: a Bioconductor package for inferring functionally related proteins using protein interaction networks. F1000Research. 2018;6:1969.

    Article  Google Scholar 

  36. Paraguison RC, Higaki K, Yamamoto K, Matsumoto H, Sasaki T, Kato N, Nanba E. Enhanced autophagic cell death in expanded polyhistidine variants of HOXA1 reduces PBX1-coupled transcriptional activity and inhibits neuronal differentiation. J Neurosci Res. 2007;85(3):479–87.

    Article  CAS  PubMed  Google Scholar 

  37. Gavalas A, Ruhrberg C, Livet J, Henderson CE, Krumlauf R. Neuronal defects in the hindbrain of Hoxa1, Hoxb1 and Hoxb2 mutants reflect regulatory interactions among these Hox genes. Development. 2003;130(23):5663–79.

    Article  CAS  PubMed  Google Scholar 

  38. Canu E, Boccardi M, Ghidoni R, Benussi L, Duchesne S, Testa C, Binetti G, Frisoni GB. HOXA1 A218G polymorphism is associated with smaller cerebellar volume in healthy humans. J Neuroimaging. 2009;19(4):353–8.

    Article  PubMed  Google Scholar 

  39. Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Ge SX. Large-scale analysis of expression signatures reveals hidden links among diverse cellular processes. BMC Syst Biol. 2011;5:87.

    Article  PubMed  PubMed Central  Google Scholar 

  41. Vermeulen K, Van Bockstaele DR, Berneman ZN. The cell cycle: a review of regulation, deregulation and therapeutic targets in cancer. Cell Prolif. 2003;36(3):131–49.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Nahle Z, Polakoff J, Davuluri RV, McCurrach ME, Jacobson MD, Narita M, Zhang MQ, Lazebnik Y, Bar-Sagi D, Lowe SW. Direct coupling of the cell cycle and cell death machinery by E2F. Nat Cell Biol. 2002;4(11):859–64.

    Article  CAS  PubMed  Google Scholar 

  43. DeGregori J. The genetics of the E2F family of transcription factors: shared functions and unique roles. Biochim Biophys Acta. 2002;1602(2):131–50.

    CAS  PubMed  Google Scholar 

  44. Motokura T, Arnold A. PRAD1/cyclin D1 proto-oncogene: genomic organization, 5’ DNA sequence, and sequence of a tumor-specific rearrangement breakpoint. Genes Chromosomes Cancer. 1993;7(2):89–95.

    Article  CAS  PubMed  Google Scholar 

  45. Grinstein E, Jundt F, Weinert I, Wernet P, Royer HD. Sp1 as G1 cell cycle phase specific transcription factor in epithelial cells. Oncogene. 2002;21(10):1485–92.

    Article  CAS  PubMed  Google Scholar 

  46. Lin SY, Black AR, Kostic D, Pajovic S, Hoover CN, Azizkhan JC. Cell cycle-regulated association of E2F1 and Sp1 is related to their functional interaction. Mol Cell Biol. 1996;16(4):1668–75.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Baeuerle PA, Henkel T. Function and activation of NF-kappa B in the immune system. Annu Rev Immunol. 1994;12:141–79.

    Article  CAS  PubMed  Google Scholar 

  48. Dejean AS, Beisner DR, Ch'en IL, Kerdiles YM, Babour A, Arden KC, Castrillon DH, DePinho RA, Hedrick SM. Transcription factor Foxo3 controls the magnitude of T cell immune responses by modulating the function of dendritic cells. Nat Immunol. 2009;10(5):504–13.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M. Systematic discovery of regulatory motifs in human promoters and 3’ UTRs by comparison of several mammals. Nature. 2005;434(7031):338–45.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Trompeter HI, Abbad H, Iwaniuk KM, Hafner M, Renwick N, Tuschl T, Schira J, Muller HW, Wernet P. MicroRNAs MiR-17, MiR-20a, and MiR-106b act in concert to modulate E2F activity on cell cycle arrest during neuronal lineage differentiation of USSC. PLoS One. 2011;6(1):e16138.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Petrocca F, Visone R, Onelli MR, Shah MH, Nicoloso MS, de Martino I, Iliopoulos D, Pilozzi E, Liu CG, Negrini M, et al. E2F1-regulated microRNAs impair TGFbeta-dependent cell-cycle arrest and apoptosis in gastric cancer. Cancer Cell. 2008;13(3):272–86.

    Article  CAS  PubMed  Google Scholar 

  52. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158(6):1431–43.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Lai L, Liberzon A, Hennessey J, Jiang G, Qi J, Mesirov JP, Ge SX. AraPath: a knowledgebase for pathway analysis in Arabidopsis. Bioinformatics. 2012;28(17):2291–2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Chen J, Feilotter HE, Pare GC, Zhang X, Pemberton JG, Garady C, Lai D, Yang X, Tron VA. MicroRNA-193b represses cell proliferation and regulates cyclin D1 in melanoma. Am J Pathol. 2010;176(5):2520–9.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Song B, Wang Y, Kudo K, Gavin EJ, Xi Y, Ju J. miR-192 regulates dihydrofolate reductase and cellular proliferation through the p53-microRNA circuit. Clin Cancer Res. 2008;14(24):8080–6.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Khella HW, Bakhet M, Allo G, Jewett MA, Girgis AH, Latif A, Girgis H, Von Both I, Bjarnason GA, Yousef GM. miR-192, miR-194 and miR-215: a convergent microRNA network suppressing tumor progression in renal cell carcinoma. Carcinogenesis. 2013;34(10):2231–9.

    Article  CAS  PubMed  Google Scholar 

  57. Luo W, Friedman MS, Shedden K, Hankenson KD, Woolf PJ. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10:161.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  58. Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics. 2013;29(14):1830–1.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153(2):307–19.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Wickham H. Ggplot2 : elegant graphics for data analysis. New York: Springer; 2009.

    Book  Google Scholar 

  62. Ferrari F, Solari A, Battaglia C, Bicciato S. PREDA: an R-package to identify regional variations in genomic data. Bioinformatics. 2011;27(17):2446–7.

    Article  CAS  PubMed  Google Scholar 

  63. Moreira-Filho CA, Bando SY, Bertonha FB, Silva FN, Costa Lda F, Ferreira LR, Furlanetto G, Chacur P, Zerbini MC, Carneiro-Sampaio M. Modular transcriptional repertoire and MicroRNA target analyses characterize genomic dysregulation in the thymus of Down syndrome infants. Oncotarget. 2016;7(7):7497–533.

    Article  PubMed  PubMed Central  Google Scholar 

  64. Reproducing iDEP analyses with auto-generated R Markdown. https://gex.netlify.com/post/reproducing-idep-analyses-with-auto-generated-r-markdown/. Accessed 2 Dec 2018.

  65. Manda K, Glasow A, Paape D, Hildebrandt G. Effects of ionizing radiation on the immune system with special emphasis on the interaction of dendritic and T cells. Front Oncol. 2012;2:102.

    Article  PubMed  PubMed Central  Google Scholar 

  66. DESeq2 experimental design and interpretation. https://gex.netlify.com/post/deseq2-experimental-design-and-interpretation/. Accessed 2 Dec 2018.

  67. van Riggelen J, Yetil A, Felsher DW. MYC as a regulator of ribosome biogenesis and protein synthesis. Nat Rev Cancer. 2010;10(4):301–9.

    Article  CAS  PubMed  Google Scholar 

  68. Calaf GM, Hei TK. Ionizing radiation induces alterations in cellular proliferation and c-myc, c-Jun and c-fos protein expression in breast epithelial cells. Int J Oncol. 2004;25(6):1859–66.

    CAS  PubMed  Google Scholar 

  69. Watson NC, Di YM, Orr MS, Fornari FA Jr, Randolph JK, Magnet KJ, Jain PT, Gewirtz DA. Influence of ionizing radiation on proliferation, c-myc expression and the induction of apoptotic cell death in two breast tumour cell lines differing in p53 status. Int J Radiat Biol. 1997;72(5):547–59.

    Article  CAS  PubMed  Google Scholar 

  70. Qin Z, Wang PY, Su DF, Liu X. miRNA-124 in Immune System and Immune Disorders. Front Immunol. 2016;7:406.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  71. Li M, Guan X, Sun Y, Mi J, Shu X, Liu F, Li C. miR-92a family and their target genes in tumorigenesis and metastasis. Exp Cell Res. 2014;323(1):1–6.

    Article  CAS  PubMed  Google Scholar 

  72. Borkowski R, Du L, Zhao Z, McMillan E, Kosti A, Yang CR, Suraokar M, Wistuba II, Gazdar AF, Minna JD, et al. Genetic mutation of p53 and suppression of the miR-17 approximately 92 cluster are synthetic lethal in non-small cell lung cancer due to upregulation of vitamin D signaling. Cancer Res. 2015;75(4):666–75.

    Article  CAS  PubMed  Google Scholar 

  73. Hu W, Chan CS, Wu R, Zhang C, Sun Y, Song JS, Tang LH, Levine AJ, Feng Z. Negative regulation of tumor suppressor p53 by microRNA miR-504. Mol Cell. 2010;38(5):689–99.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Bublik DR, Bursac S, Sheffer M, Orsolic I, Shalit T, Tarcic O, Kotler E, Mouhadeb O, Hoffman Y, Fuchs G, et al. Regulatory module involving FGF13, miR-504, and p53 regulates ribosomal biogenesis and supports cancer cell survival. Proc Natl Acad Sci U S A. 2017;114(4):E496–505.

    Article  CAS  PubMed  Google Scholar 

  75. Parveen A, Akash MS, Rehman K, Kyunn WW. Dual role of p21 in the progression of Cancer and its treatment. Crit Rev Eukaryot Gene Expr. 2016;26(1):49–62.

    Article  PubMed  Google Scholar 

  76. Gallant S, Gilkeson G. ETS transcription factors and regulation of immunity. Arch Immunol Ther Exp. 2006;54(3):149–63.

    Article  CAS  Google Scholar 

  77. Fendler W, Malachowska B, Meghani K, Konstantinopoulos PA, Guha C, Singh VK, Chowdhury D. Evolutionarily conserved serum microRNAs predict radiation-induced fatality in nonhuman primates. Sci Transl Med. 2017;9(379):eaal2408.

  78. Park D, Kim H, Kim Y, Jeoung D. miR-30a regulates the expression of CAGE and p53 and regulates the response to anti-Cancer drugs. Mol Cells. 2016;39(4):299–309.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

    Article  PubMed  PubMed Central  Google Scholar 

  80. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and cufflinks. Nat Protoc. 2012;7(3):562–78.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, Cahill DP, Nahed BV, Curry WT, Martuza RL, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344(6190):1396–401.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  83. Ge SX. Exploratory bioinformatics investigation reveals importance of “junk” DNA in early embryo development. BMC Genomics. 2017;18(1):200.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  84. Bandyopadhyay S, Mallik S, Mukhopadhyay A. A survey and comparative study of statistical tests for identifying differential expression from microarray data. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(1):95–115.

    Article  PubMed  Google Scholar 

  85. Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat ML, Smyth GK, Ritchie ME. Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses. Nucleic Acids Res. 2015;43(15):e97.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  86. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Yu G, He QY. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol BioSyst. 2016;12(2):477–9.

    Article  CAS  PubMed  Google Scholar 

  88. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 2016;44(D1):D481–7.

    Article  CAS  PubMed  Google Scholar 

  89. Araki H, Knapp C, Tsai P, Print C. GeneSetDB: a comprehensive meta-database, statistical and visualisation framework for gene set analysis. FEBS Open Bio. 2012;2:76–82.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  90. Wang J, Vasaikar S, Shi Z, Greer M, Zhang B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 2017;45(W1):W130–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Wittkop T, TerAvest E, Evani US, Fleisch KM, Berman AE, Powell C, Shah NH, Mooney SD. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation. BMC Bioinformatics. 2013;14:53.

    Article  PubMed  PubMed Central  Google Scholar 

  92. Wang J, Huang Q, Liu ZP, Wang Y, Wu LY, Chen L, Zhang XS. NOA: a novel network ontology analysis method. Nucleic Acids Res. 2011;39(13):e87.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Kluger Y, Basri R, Chang JT, Gerstein M. Spectral biclustering of microarray data: coclustering genes and conditions. Genome Res. 2003;13(4):703–16.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Zhang Y, Xie J, Yang J, Fennell A, Zhang C, Ma Q. QUBIC: a bioconductor package for qualitative biclustering analysis of gene co-expression data. Bioinformatics. 2017;33(3):450–2.

    CAS  PubMed  Google Scholar 

  95. Orzechowski P, Panszczyk A, Huang XY, Moore JH. runibic: a Bioconductor package for parallel row-based biclustering of gene expression data. In: BioRxiv vol 2017; 2017. p. 210682.

    Google Scholar 

  96. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  97. iDEP code repository. https://github.com/iDEP-SDSU/idep. Accessed 2 Dec 2018.

  98. iDEP documentation site. https://idepsite.wordpress.com/. Accessed 2 Dec 2018.

  99. Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD, et al. XSEDE: accelerating scientific discovery. Computing in Science & Engineering. 2014;16(5):62–72.

    Article  CAS  Google Scholar 

  100. Schug TT, Nadadur SS, Johnson AF. Nano GO consortium--a team science approach to assess engineered nanomaterials: reliable assays and methods. Environ Health Perspect. 2013;121(6):A176–7.

    Article  PubMed  Google Scholar 

  101. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62.

    Article  CAS  PubMed  Google Scholar 

  102. Nishimura D. Biocarta, vol. 2; 2000.

    Google Scholar 

  103. Glez-Pena D, Gomez-Lopez G, Pisano DG, Fdez-Riverola F. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis. Nucleic Acids Res. 2009;37(Web Server):W329–34.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Ma H, Sorokin A, Mazein A, Selkov A, Selkov E, Demin O, Goryanin I. The Edinburgh human metabolic network reconstruction and its functional analysis. Mol Syst Biol. 2007;3:135.

    Article  PubMed  PubMed Central  Google Scholar 

  105. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, Thomas PD. PANTHER version 11: expanded annotation data from gene ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45(D1):D183–9.

    Article  CAS  PubMed  Google Scholar 

  106. Trupp M, Altman T, Fulcher CA, Caspi R, Krummenacker M, Paley S, Karp PD. Beyond the genome (BTG) is a (PGDB) pathway genome database: HumanCyc. Genome Biol. 2010;11(Suppl 1):O12.

  107. Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C. Pathway commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39(Database):D685–90.

    Article  CAS  PubMed  Google Scholar 

  108. Yamamoto S, Sakai N, Nakamura H, Fukagawa H, Fukuda K, Takagi T. INOH: ontology-based highly structured database of signal transduction pathways. Oxford: Database; 2011.

    Google Scholar 

  109. Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GS, Venugopal AK, Telikicherla D, Navarro JD, Mathivanan S, Pecquet C, et al. NetPath: a public resource of curated signal transduction pathways. Genome Biol. 2010;11(1):R3.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  110. Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. PID: the pathway interaction database. Nucleic Acids Res. 2009;37(Database issue):D674–9.

    Article  CAS  PubMed  Google Scholar 

  111. Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B. Phosphosite: a bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics. 2004;4(6):1551–61.

    Article  CAS  PubMed  Google Scholar 

  112. Brunk E, Sahoo S, Zielinski DC, Altunkaya A, Drager A, Mih N, Gatto F, Nilsson A, Gonzalez GAP, Aurich MK, et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol. 2018;36(3):272.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Haw R, Team R. Reactome knowledgebase - linking biological pathways, networks and disease. Febs J. 2014;281:42.

    Google Scholar 

  114. Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen EL, Bohler A, Melius J, Waagmeester A, Sinha SR, Miller R, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016;44(D1):D488–94.

    Article  CAS  PubMed  Google Scholar 

  115. Friard O, Re A, Taverna D, De Bortoli M, Cora D. CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse. BMC Bioinformatics. 2010;11:435.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  116. Consortium EP. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9(4):e1001046.

    Article  CAS  Google Scholar 

  117. Marbach D, Lamparter D, Quon G, Kellis M, Kutalik Z, Bergmann S. Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat Methods. 2016;13(4):366–70.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  118. Liu ZP, Wu C, Miao H, Wu H. RegNetwork: an integrated database of transcriptional and post-transcriptional regulatory networks in human and mouse. Oxford: Database; 2015. p. 2015.

    Google Scholar 

  119. Essaghir A, Toffalini F, Knoops L, Kallin A, van Helden J, Demoulin JB. Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data. Nucleic Acids Res. 2010;38(11):e120.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  120. Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, Zhu Y, Li Y. ITFP: an integrated platform of mammalian transcription factors. Bioinformatics. 2008;24(20):2416–7.

  121. Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell. 2012;150(6):1274–86.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Jiang C, Xuan Z, Zhao F, Zhang MQ. TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res. 2007;35(Database):D137–40.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Han H, Cho JW, Lee S, Yun A, Kim H, Bae D, Yang S, Kim CY, Lee M, Kim E, et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res. 2018;46(D1):D380–6.

    Article  CAS  PubMed  Google Scholar 

  124. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36(Database issue):D154–8.

    CAS  PubMed  Google Scholar 

  125. Wong N, Wang X. miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res. 2015;43(Database issue):D146–52.

    Article  CAS  PubMed  Google Scholar 

  126. Chou CH, Shrestha S, Yang CD, Chang NW, Lin YL, Liao KW, Huang WC, Sun TH, Tu SJ, Lee WH, et al. miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions. Nucleic Acids Res. 2018;46(D1):D296–302.

    Article  CAS  PubMed  Google Scholar 

  127. Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. Elife. eLife 2015;4:e05005.

  128. Brentani H, Caballero OL, Camargo AA, da Silva AM, da Silva WA Jr, Dias Neto E, Grivet M, Gruber A, Guimaraes PE, Hide W, et al. The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags. Proc Natl Acad Sci U S A. 2003;100(23):13418–23.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Godec J, Tan Y, Liberzon A, Tamayo P, Bhattacharya S, Butte AJ, Mesirov JP, Haining WN. Compendium of immune signatures identifies conserved and species-specific biology in response to inflammation. Immunity. 2016;44(1):194–206.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, O’Donnell L, Oster S, Theesfeld C, Sellam A, et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017;45(D1):D369–79.

    Article  CAS  PubMed  Google Scholar 

  131. Ruepp A, Waegele B, Lechner M, Brauner B, Dunger-Kaltenbach I, Fobo G, Frishman G, Montrone C, Mewes HW. CORUM: the comprehensive resource of mammalian protein complexes--2009. Nucleic Acids Res. 2010;38(Database issue):D497–501.

    Article  CAS  PubMed  Google Scholar 

  132. Bader GD, Betel D, Hogue CW. BIND: the biomolecular interaction network database. Nucleic Acids Res. 2003;31(1):248–50.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Salwinski L, Eisenberg D. The MiSink plugin: Cytoscape as a graphical interface to the database of interacting proteins. Bioinformatics. 2007;23(16):2193–5.

    Article  CAS  PubMed  Google Scholar 

  134. Liu BL, Bo H. HPRD: a high performance RDF database. Lect Notes Comput Sc. 2007;4672:364.

    Article  Google Scholar 

  135. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, et al. IntAct: an open source molecular interaction database. Nucleic Acids Res. 2004;32(Database issue):D452–5.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Gunther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Res. 2008;36(Database issue):D919–22.

    PubMed  Google Scholar 

  137. Kuhn M, Letunic I, Juhl Jensen L, Bork P. The SIDER database of drugs and side effects, vol 44; 2015.

    Google Scholar 

  138. Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016;44(D1):D380–4.

    Article  CAS  PubMed  Google Scholar 

  139. Wishart D, Arndt D, Pon A, Sajed T, Guo AC, Djoumbou Y, Knox C, Wilson M, Liang Y, Grant J, et al. T3DB: the toxic exposome database. Nucleic Acids Res. 2015;43(Database issue):D928–34.

    Article  CAS  PubMed  Google Scholar 

  140. Jewison T, Su Y, Disfany FM, Liang Y, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D, et al. SMPDB 2.0: big improvements to the small molecule pathway database. Nucleic Acids Res. 2014;42(Database issue):D478–84.

    Article  CAS  PubMed  Google Scholar 

  141. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The comparative Toxicogenomics database: update 2017. Nucleic Acids Res. 2017;45(D1):D972–8.

    Article  CAS  PubMed  Google Scholar 

  142. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2018;46(D1):D1074–82.

    Article  CAS  PubMed  Google Scholar 

  143. Higgins ME, Claremont M, Major JE, Sander C, Lash AE. CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res. 2007;35(Database):D721–6.

    Article  CAS  PubMed  Google Scholar 

  144. Lauss M, Visne I, Weinhaeusel A, Vierlinger K, Noehammer C, Kriegner A. MethCancerDB--aberrant DNA methylation in human cancer. Br J Cancer. 2008;98(4):816–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  145. He X, Chang S, Zhang J, Zhao Q, Xiang H, Kusonmano K, Yang L, Sun ZS, Yang H, Wang J. MethyCancer: the database of human DNA methylation and cancer. Nucleic Acids Res. 2008;36(Database issue):D836–41.

    CAS  PubMed  Google Scholar 

  146. Smith CL, Eppig JT. The mammalian phenotype ontology as a unifying standard for experimental and high-throughput phenotyping data. Mamm Genome. 2012;23(9–10):653–68.

    Article  PubMed  PubMed Central  Google Scholar 

  147. Kohler S, Doelken SC, Mungall CJ, Bauer S, Firth HV, Bailleul-Forestier I, Black GCM, Brown DL, Brudno M, Campbell J, et al. The human phenotype ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42(D1):D966–74.

    Article  PubMed  CAS  Google Scholar 

Download references

Acknowledgements

The authors thank Brian Moore and Kevin Brandt for technical support of the web server, Jianli Qi for collecting pathway databases, and Fuxi Ge for inspiration. iDEP is strengthened by constructive criticisms from several reviewers, and many suggestions from users. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) Bridges at the Pittsburgh Supercomputing Center through allocation TG-MCB180180.

Funding

This work was partially supported by National Institutes of Health (GM083226), National Science Foundation/EPSCoR (IIA-1355423) and by the State of South Dakota. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [99], which is supported by National Science Foundation grant number ACI-1548562.

Availability of data and materials

The iDEP software is available at http://ge-lab.org/idep/. Source code is available at https://github.com/iDEP-SDSU/idep/. Database files for running iDEP locally can be downloaded at https://doi.org/10.5281/zenodo.1451847.

Author information

Authors and Affiliations

Authors

Contributions

SXG conceived the project, developed the Shiny App, and wrote the paper. EWS setup and maintained all infrastructure including docker containers, load balancing, scaling up, web server and code repository. RY provided improvements to iDEP software. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Steven Xijin Ge.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Tables S1-S16. Table S1. contains list of 220 species covered by current version of iDEP. Table S2. include the 2196 types of gene IDs that can be recognized. Tables S3-S16. are results from the analyses of two example datasets. (XLSX 1970 kb)

Additional file 2:

Read count file for Hoxd1 knockdown example. This file is derived from short read archive (SRA) SRP012607 using Sailfish. (CSV 718 kb)

Additional file 3:

Figures S1-S21. Results from the two example datasets. (PDF 2973 kb)

Additional file 4:

DNA microarray data of thymic tissue of down syndrome infants. Data is from GSE69210 from NCBI. (CSV 4745 kb)

Additional file 5:

An example of customized R code generated by iDEP. This code is generated for the analysis of the Hoxa1 dataset. (R 11 kb)

Additional file 6:

An example of R Markdown file generated by iDEP. This code is generated for the analysis of the Hoxa1 dataset. (RMD 15 kb)

Additional file 7:

Core R functions in iDEP. This code is generated for the analysis of the Hoxa1 dataset. (R 172 kb)

Additional file 8:

Read count file for the mouse ionization/p53 dataset. This file was used in our analysis. (CSV 1513 kb)

Additional file 9:

Experiment design file for the mouse ionization/p53 dataset. This file was used in our analysis. (CSV 226 bytes)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ge, S.X., Son, E.W. & Yao, R. iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 19, 534 (2018). https://doi.org/10.1186/s12859-018-2486-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12859-018-2486-6

Keywords