The Drosophila Gene Expression Tool (DGET) for expression analyses
© The Author(s). 2017
Received: 12 May 2016
Accepted: 31 January 2017
Published: 10 February 2017
Next-generation sequencing technologies have greatly increased our ability to measure gene expression levels, including at specific developmental stages and in specific tissues. Gene expression data can help researchers understand the diverse functions of genes and gene networks. They can also help in the design of specific and efficient functional studies, for example by helping researchers choose the most appropriate tissue in which to study a group of genes or, conversely, by limiting a long list of candidate genes to the subset normally expressed at a given stage or in a given tissue.
We report DGET, a Drosophila Gene Expression Tool (www.flyrnai.org/tools/dget/web/), which stores and facilitates search of RNA-Seq based expression profiles available from the modENCODE consortium and other public data sets. Using DGET, researchers are able to look up gene expression profiles, filter results based on threshold expression values, and compare expression data across different developmental stages, tissues and treatments. In addition, at DGET a researcher can analyze tissue- or stage-specific enrichment for an input list of genes (e.g., ‘hits’ from a screen) and search for additional genes with similar expression patterns. We performed a number of analyses to demonstrate the quality and robustness of the resource. In particular, we show that evolutionarily conserved genes expressed at high or moderate levels in both fly and human tend to be expressed in similar tissues. Using DGET, we compared whole-tissue profiles with sub-region/cell-type specific datasets and identified a potential source of false positives in one dataset. We also demonstrated the usefulness of DGET for synexpression studies by querying genes with expression profiles similar to that of the mesodermal master regulator Twist.
Altogether, DGET provides a flexible tool for expression data retrieval and analysis with short or long lists of Drosophila genes, which can help scientists design stage- or tissue-specific in vivo studies and perform subsequent analyses.
Keywords: Drosophila, RNA-Seq, Expression profile, Synexpression
The application of next-generation sequencing technologies to RNA analysis has opened the door to relatively rapid, large-scale analyses of gene expression. ‘Standard’ RNA-seq analysis, for example, can provide a snapshot of gene expression in specific cell types or tissues, and related technologies such as Ribo-seq provide more refined views, such as a snapshot of which genes are actively translated in a given cell or tissue. For Drosophila, efforts such as the modENCODE project [1, 2, 7, 12] have provided a baseline overview of expression under standard laboratory conditions for various cultured cell types, developmental stages, and tissues, as well as treatment conditions. Moreover, studies such as those investigating expression in sub-regions of the fly gut [6, 10] are providing increasingly detailed views of the baseline expression levels of various genes in various tissues, cell types and sub-regions. Altogether, these RNA-seq data resources provide helpful starting points for analysis of other gene lists.
Resources such as FlyBase make it possible to quickly view modENCODE data for a given gene and make these data generally accessible to the community. The value of these data can be further increased by facilitating searches with lists of genes. For example, for gene lists originating from whole-animal or cultured cell studies, or for studies based on a list of orthologs of genes from another species, it can be very helpful to get a picture of which stages or tissues normally express those genes, as that will help focus stage- or tissue-specific in vivo studies and other subsequent analyses. We implemented DGET to help scientists retrieve modENCODE expression data in batch mode. DGET also hosts other relevant RNA-Seq datasets published in individual studies, such as profiles of specific sub-regions and cell types of the Drosophila gut [6, 10]. Here, we describe DGET and perform a number of analyses to demonstrate the quality and robustness of the resource.
Processed modENCODE data were retrieved from FlyBase (ftp://ftp.flybase.net/releases/FB2015_05/precomputed_files/genes/gene_rpkm_report_fb_2015_05.tsv.gz). Data published by Marianes and Spradling were retrieved from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE47780). Data published by Dutta et al. were retrieved from the flygut-seq website (http://flygutseq.buchonlab.com/resources). Retrieved data were mapped to FlyBase identifiers from release FB2015_05 and formatted for upload into the FlyRNAi database.
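As an illustration of the formatting step, the sketch below pivots a long-format expression report (one gene/sample pair per row) into a per-gene lookup. It is a minimal example and not the DGET loading code: the column names `FBgn`, `RNASource_name` and `RPKM_value` and the sample rows are assumptions made for the example and should be checked against the header of the downloaded file.

```python
import csv
import io
from collections import defaultdict

# Assumed column names (hypothetical); verify against the real file header.
GENE_COL = "FBgn"
SAMPLE_COL = "RNASource_name"
VALUE_COL = "RPKM_value"


def load_rpkm_matrix(handle):
    """Pivot long-format rows into {gene_id: {sample_name: rpkm}}."""
    matrix = defaultdict(dict)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        matrix[row[GENE_COL]][row[SAMPLE_COL]] = float(row[VALUE_COL])
    return dict(matrix)


# Tiny made-up sample standing in for the downloaded report.
SAMPLE_TSV = (
    "FBgn\tRNASource_name\tRPKM_value\n"
    "FBgn0000001\tsample_A\t12\n"
    "FBgn0000001\tsample_B\t0\n"
    "FBgn0000002\tsample_A\t3\n"
)

matrix = load_rpkm_matrix(io.StringIO(SAMPLE_TSV))
print(matrix["FBgn0000001"])
```

For a compressed file on disk, the same function can be given a handle opened with `gzip.open(path, "rt")`.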
Expression pattern analysis
Human protein expression data were retrieved from proteinatlas.org and tissue-specific genes were selected using the file “ProteinAtlas_Normal_tissue_vs14.” Proteins with high or medium expression levels and a reliability value of “supportive” were selected. Proteins expressed in a broad range of tissues (i.e., more than 5 tissues) were filtered out. DIOPT version 5 was used to map genes from human to Drosophila. The ‘ortholog pair rank’ was added in DIOPT release 5.2.1 (http://www.flyrnai.org/DRSC-ORH.html#versions). Drosophila genes with high or moderate rank were selected. The high/moderate rank mapping includes gene pairs that have the best score in either the forward or the reverse mapping (and a DIOPT score >1), as well as gene pairs with a DIOPT score >3 that do not have the best score in either direction.
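The high/moderate rank rule described above can be written as a small predicate. This is a sketch of the rule as stated in the text, not code taken from DIOPT; the function name, argument names and example values are hypothetical.

```python
def is_high_or_moderate_rank(diopt_score, best_forward, best_reverse):
    """Return True if an ortholog pair has high or moderate rank.

    Rule from the text: a pair qualifies if it has the best score in the
    forward or the reverse mapping and a DIOPT score > 1, or if it is not
    the best score in either direction but has a DIOPT score > 3.
    """
    if best_forward or best_reverse:
        return diopt_score > 1
    return diopt_score > 3


# Made-up example pairs: (score, best in forward, best in reverse).
examples = [(2, True, False), (1, True, True), (4, False, False), (3, False, False)]
for score, fwd, rev in examples:
    print(score, fwd, rev, is_high_or_moderate_rank(score, fwd, rev))
```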
Results and discussion
Database content and features of the user interface (UI)
At the “Search Gene Expression” tab, users can enter a list of genes or choose one of the predefined gene classes from GLAD, e.g., kinases, and then specify the datasets to be displayed. There are two search options, “look at expression” and “enrichment analysis.” The results page for “look at expression” displays expression values in a heatmap format. On this results page, users can download the relevant expression values, download the heatmap, and further filter the list by defining a cutoff and limiting it to specific dataset(s), for example by filtering out genes with an RPKM value of less than 1 in the carcass and/or digestive system of 1 day old adults. We used an RPKM cutoff of 1 because this is considered the cutoff for ‘no or extremely low expression’ at FlyBase. The results page for an enrichment analysis displays the distribution of genes at different expression levels using a bar graph and a heatmap. The cutoff values for the different levels are defined based on FlyBase guidelines (Fig. 1a).
Using the “Search Similar Genes” tab, users can enter a gene of interest and search for other genes with similar expression patterns based on the Pearson correlation score. Users have the option to download the list of genes with similar expression patterns, a heatmap, and a normalized heatmap. Using the “Build Network” tab, users can enter a list of genes and build a synexpression network based on the correlation of expression, using the dataset and Pearson correlation cutoff specified by the user (Fig. 1b).
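The cutoff-based filter can be illustrated with a few lines of code. This is a minimal sketch rather than the DGET implementation; the function name, the dataset labels and the expression values are made up for the example, and the `require_all` switch models the "and/or" choice between datasets.

```python
def filter_expressed(matrix, datasets, cutoff=1.0, require_all=False):
    """Return gene ids whose RPKM reaches `cutoff` in the chosen datasets.

    require_all=False keeps genes expressed in any chosen dataset ("or");
    require_all=True keeps genes expressed in every chosen dataset ("and").
    Missing values are treated as 0 RPKM.
    """
    combine = all if require_all else any
    return [
        gene
        for gene, values in matrix.items()
        if combine(values.get(d, 0.0) >= cutoff for d in datasets)
    ]


# Hypothetical dataset labels and values.
matrix = {
    "geneA": {"carcass": 0.5, "digestive_system": 2.0},
    "geneB": {"carcass": 0.2, "digestive_system": 0.4},
    "geneC": {"carcass": 5.0, "digestive_system": 5.0},
}
chosen = ["carcass", "digestive_system"]
either = filter_expressed(matrix, chosen, cutoff=1.0)
both = filter_expressed(matrix, chosen, cutoff=1.0, require_all=True)
print(either, both)
```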
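To make the two correlation-based searches concrete, the sketch below computes Pearson correlation between expression profiles, ranks genes by similarity to a query gene, and lists gene pairs that pass a correlation cutoff as network edges. It is an illustrative example under simple assumptions (profiles of equal length, flat profiles scored as 0), not the DGET code; the function names and the toy profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # flat profile: correlation is undefined, scored as 0 here
    return cov / math.sqrt(var_x * var_y)


def similar_genes(profiles, query, cutoff):
    """Genes whose profile correlates with the query at or above `cutoff`,
    sorted from most to least similar."""
    scores = [
        (gene, pearson(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    kept = [item for item in scores if item[1] >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def build_network(profiles, cutoff):
    """Edges (gene pairs) whose profiles correlate at or above `cutoff`."""
    genes = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if pearson(profiles[a], profiles[b]) >= cutoff
    ]


# Toy profiles across four samples.
profiles = {
    "query": [1, 2, 3, 4],
    "geneA": [2, 4, 6, 8],
    "geneB": [4, 3, 2, 1],
    "geneC": [1, 1, 1, 1],
}
hits = similar_genes(profiles, "query", cutoff=0.9)
edges = build_network(profiles, cutoff=0.9)
print(hits, edges)
```

Because Pearson correlation is insensitive to scale, geneA is recovered as similar to the query even though its absolute expression is twice as high.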
Expression pattern of Drosophila regulatory genes
Correlation of expression with confidence in an ortholog relationship
We next analyzed the 418 Drosophila essential genes identified by Spradling et al. using a large-scale single P-element insertion fly stock collection. The proportions of essential genes expressed at detectable levels in various tissues are very similar to those of genes with DIOPT scores of 7–10 (Fig. 3, light purple and dark purple bars), with a Pearson correlation coefficient of 0.92.
Expression patterns of Drosophila orthologs of human genes that are highly expressed in specific tissues
Mining information from distinct but related fly gut gene expression data sets
We next sought to compare the results of whole-gut profiling with results from profiling of specific sub-regions or cell types, with the goal of identifying genes expressed only in specific sub-populations. Our rationale was to determine the likelihood that genes expressed in a sub-population are missed in expression analysis of an entire organ. This type of false-negative analysis should provide helpful information for interpreting results of whole-organ or whole-tissue studies. Thus, we compared the whole-gut profiling data obtained by the modENCODE consortium for 20 day old adult flies with data generated by profiling sub-regions of the midgut in 16–20 day old adult flies. Whole-gut profiling indicates that 9109 genes are expressed in the gut of 20 day old adult flies (RPKM cutoff value of 1). Among the 4790 protein-coding genes not detected as expressed in the whole-gut study, 136 genes are detected in at least 3 sub-regions of the gut (RPKM ≥ 3). These genes are either false negatives in whole-gut profiling or false positives in sub-region profiling. A gene set enrichment analysis of these 136 genes showed that stress response genes, such as heat-shock genes (Hsp70Aa, Hsp70Ab, Hsp70Ba, Hsp70Bbb), are enriched (P value = 3.05E-07). This suggests that the sample used for sub-region profiling was associated with some level of stress. Comparing the list of 136 genes with the Drosophila essential gene list, we found only one overlapping gene. In addition, only 23 of the 136 genes have DIOPT scores of 7–10 when mapped to human genes. Thus, a small fraction of these genes might be false negatives with regard to whole-tissue profiling, whereas the majority are likely to be false positives that are not normally present in the gut under non-stress conditions.
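The selection of candidate genes in this comparison can be sketched as follows, using the thresholds given in the text (below 1 RPKM in the whole gut, at least 3 RPKM in at least 3 sub-regions). This is an illustrative reconstruction of the filtering logic, not the code used for the analysis; the function name, region labels and values are made up.

```python
def missed_in_whole_organ(whole, regions, whole_cutoff=1.0,
                          region_cutoff=3.0, min_regions=3):
    """Genes not detected in the whole organ but detected in several sub-regions.

    whole:   {gene: rpkm} for the whole-organ sample
    regions: {gene: {region_name: rpkm}} for the sub-region samples
    """
    hits = []
    for gene, by_region in regions.items():
        if whole.get(gene, 0.0) >= whole_cutoff:
            continue  # detected in the whole organ, so not a candidate
        detected = sum(1 for value in by_region.values() if value >= region_cutoff)
        if detected >= min_regions:
            hits.append(gene)
    return hits


# Made-up values for three genes across four hypothetical sub-regions.
whole = {"geneA": 0.2, "geneB": 5.0, "geneC": 0.0}
regions = {
    "geneA": {"R1": 4.0, "R2": 3.0, "R3": 10.0, "R4": 0.0},
    "geneB": {"R1": 10.0, "R2": 10.0, "R3": 10.0, "R4": 10.0},
    "geneC": {"R1": 4.0, "R2": 1.0, "R3": 0.0, "R4": 9.0},
}
candidates = missed_in_whole_organ(whole, regions)
print(candidates)
```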
Synexpression analysis for the transcription factor twist
Table: DGET similar gene search results for Twist with cell line data. Twist target status of the listed genes: “Yes, high confidence” for 8 genes and “Yes, low confidence” for 3 genes (gene names not preserved).
We observed a less significant enrichment with the developmental data (p-value of 5.00E-02 for all Twist target genes and p-value of 2.70E-03 for high-confidence targets), likely reflecting the diversity of cell types present in the developmental samples, in which too few cells express twist. Thus, DGET will be particularly powerful when applied to RNA-seq data sets from single cells or from homogeneous cell populations.
In summary, DGET makes it possible to retrieve and compare Drosophila gene expression patterns generated by various groups using RNA-Seq. The tool can help scientists design experiments based on expression and analyze experimental results. The backend database for DGET is designed to easily accommodate the addition of new high-quality RNA-Seq datasets as they become available. Finally, although the anatomies of human and Drosophila are quite different, by using DGET we demonstrate that the expression patterns of genes that are conserved and highly expressed are conserved between human and Drosophila in many matching tissues, underscoring the utility of the Drosophila model for understanding the roles of human genes with unknown functions.
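The text does not state which statistical test produced these enrichment p-values. One common choice for asking whether a gene list contains more known targets than expected by chance is the one-sided hypergeometric test, sketched below as an assumption rather than as the method used here; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  number of those genes that are annotated (e.g. known targets)
    draws:      size of the gene list being tested
    observed:   number of annotated genes found in the list
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 3 of them targets, a list of 3 genes, all 3 targets.
p_value = hypergeom_enrichment_p(population=10, successes=3, draws=3, observed=3)
print(p_value)
```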
DGET: The Drosophila gene expression tool
DIOPT: DRSC integrative ortholog prediction tool
DRSC: Drosophila RNAi screening center
modENCODE: Model organism ENCyclopedia of DNA elements
The authors would like to thank the members of the DRSC, Transgenic RNAi Project (TRiP), and Perrimon lab for helpful suggestions and discussions.
Work at the DRSC is supported by NIGMS R01 GM067761, NIGMS R01 GM084947, and ORIP/NCRR R24 RR032668. S.E.M. is additionally supported in part by NCI Cancer Center Support Grant NIH 5 P30 CA06516 (E. Benz, PI). N.P. is an Investigator of the Howard Hughes Medical Institute.
Availability of data and materials
YH designed and tested the application, implemented the back-end of the application, performed the analysis and drafted the manuscript. AC implemented the user interface and contributed to the back-end of the application. NP provided critical input on key features and the analysis as well as edited the manuscript. SEM provided oversight and critical input on key features and the analysis, and helped draft the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Boley N, Wan KH, Bickel PJ, Celniker SE. Navigating and mining modENCODE data. Methods. 2014;68(1):38–47.
2. Cherbas L, Willingham A, Zhang D, Yang L, Zou Y, Eads BD, Carlson JW, Landolin JM, Kapranov P, Dumais J, Samsonova A, Choi JH, Roberts J, Davis CA, Tang H, van Baren MJ, Ghosh S, Dobin A, Bell K, Lin W, Langton L, Duff MO, Tenney AE, Zaleski C, Brent MR, Hoskins RA, Kaufman TC, Andrews J, Graveley BR, Perrimon N, Celniker SE, Gingeras TR, Cherbas P. The transcriptional diversity of 25 Drosophila cell lines. Genome Res. 2011;21(2):301–14.
3. Clough E, Barrett T. The gene expression omnibus database. Methods Mol Biol. 2016;1418:93–110.
4. Dequeant ML, Fagegaltier D, Hu Y, Spirohn K, Simcox A, Hannon GJ, Perrimon N. Discovery of progenitor cell signatures by time-series synexpression analysis during Drosophila embryonic cell immortalization. Proc Natl Acad Sci U S A. 2015;112(42):12974–9.
5. dos Santos G, Schroeder AJ, Goodman JL, Strelets VB, Crosby MA, Thurmond J, Emmert DB, Gelbart WM, FlyBase Consortium. FlyBase: introduction of the Drosophila melanogaster release 6 reference genome assembly and large-scale migration of genome annotations. Nucleic Acids Res. 2015;43(Database issue):D690–7.
6. Dutta D, Dobson AJ, Houtz PL, Glasser C, Revah J, Korzelius J, Patel PH, Edgar BA, Buchon N. Regional cell-specific transcriptome mapping reveals regulatory complexity in the adult Drosophila midgut. Cell Rep. 2015;12(2):346–58.
7. Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, Artieri CG, van Baren MJ, Boley N, Booth BW, Brown JB, Cherbas L, Davis CA, Dobin A, Li R, Lin W, Malone JH, Mattiuzzo NR, Miller D, Sturgill D, Tuch BB, Zaleski C, Zhang D, Blanchette M, Dudoit S, Eads B, Green RE, Hammonds A, Jiang L, Kapranov P, Langton L, Perrimon N, Sandler JE, Wan KH, Willingham A, Zhang Y, Zou Y, Andrews J, Bickel PJ, Brenner SE, Brent MR, Cherbas P, Gingeras TR, Hoskins RA, Kaufman TC, Oliver B, Celniker SE. The developmental transcriptome of Drosophila melanogaster. Nature. 2011;471(7339):473–9.
8. Hu Y, Comjean A, Perkins LA, Perrimon N, Mohr SE. GLAD: an online database of gene list annotation for Drosophila. J Genomics. 2015;3:75–81.
9. Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, Mohr SE. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics. 2011;12:357.
10. Marianes A, Spradling AC. Physiological and stem cell compartmentalization within the Drosophila midgut. Elife. 2013;2:e00886.
11. Michel AM, Baranov PV. Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale. Wiley Interdiscip Rev RNA. 2013;4(5):473–90.
12. modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD, Candeias R, Carlson JW, Carr A, Jungreis I, Marbach D, Sealfon R, Tolstorukov MY, Will S, Alekseyenko AA, Artieri C, Booth BW, Brooks AN, Dai Q, Davis CA, Duff MO, Feng X, Gorchakov AA, Gu T, Henikoff JG, Kapranov P, Li R, MacAlpine HK, Malone J, Minoda A, Nordman J, Okamura K, Perry M, Powell SK, Riddle NC, Sakai A, Samsonova A, Sandler JE, Schwartz YB, Sher N, Spokony R, Sturgill D, van Baren M, Wan KH, Yang L, Yu C, Feingold E, Good P, Guyer M, Lowdon R, Ahmad K, Andrews J, Berger B, Brenner SE, Brent MR, Cherbas L, Elgin SC, Gingeras TR, Grossman R, Hoskins RA, Kaufman TC, Kent W, Kuroda MI, Orr-Weaver T, Perrimon N, Pirrotta V, Posakony JW, Ren B, Russell S, Cherbas P, Graveley BR, Lewis S, Micklem G, Oliver B, Park PJ, Celniker SE, Henikoff S, Karpen GH, Lai EC, MacAlpine DM, Stein LD, White KP, Kellis M. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010;330(6012):1787–97.
13. Perrimon N, Bonini NM, Dhillon P. Fruit flies on the front line: the translational impact of Drosophila. Dis Model Mech. 2016;9(3):229–31.
14. Sandmann T, Girardot C, Brehme M, Tongprasit W, Stolc V, Furlong EE. A core transcriptional network for early mesoderm development in Drosophila melanogaster. Genes Dev. 2007;21(4):436–49.
15. Spradling AC, Stern D, Beaton A, Rhem EJ, Laverty T, Mozden N, Misra S, Rubin GM. The Berkeley Drosophila Genome Project gene disruption project: single P-element insertions mutating 25% of vital Drosophila genes. Genetics. 1999;153(1):135–77.
16. Uhlen M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Szigyarto CA, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist PH, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Ponten F. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419.
17. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.