Integrating the interactome and the transcriptome of Drosophila
© Murali et al.; licensee BioMed Central Ltd. 2014
Received: 29 August 2013
Accepted: 28 May 2014
Published: 10 June 2014
Networks of interacting genes and gene products mediate most cellular and developmental processes. High throughput screening methods combined with literature curation are identifying many of the protein-protein interactions (PPI) and protein-DNA interactions (PDI) that constitute these networks. Most of the detection methods, however, fail to identify the in vivo spatial or temporal context of the interactions. Thus, the interaction data are a composite of the individual networks that may operate in specific tissues or developmental stages. Genome-wide expression data may be useful for filtering interaction data to identify the subnetworks that operate in specific spatial or temporal contexts. Here we take advantage of the extensive interaction and expression data available for Drosophila to analyze how interaction networks may be unique to specific tissues and developmental stages.
We ranked genes on a scale from ubiquitously expressed to tissue or stage specific and examined their interaction patterns. Interestingly, ubiquitously expressed genes have many more interactions among themselves than do non-ubiquitously expressed genes, in both the PPI and PDI networks. While the PDI network is enriched for interactions between tissue-specific transcription factors and their tissue-specific targets, a preponderance of the PDI interactions are between ubiquitously and non-ubiquitously expressed genes and proteins. In contrast to PDI, PPI networks are depleted for interactions among tissue- or stage-specific proteins, which instead interact primarily with widely expressed proteins. In light of these findings, we present an approach to filter interaction data based on gene expression levels normalized across tissues or developmental stages. We show that this filter (the percent maximum or pmax filter) can be used to identify subnetworks that function within individual tissues or developmental stages.
These observations suggest that protein networks are frequently organized into hubs of widely expressed proteins to which are attached various tissue- or stage-specific proteins. This is consistent with earlier analyses of human PPI data and suggests a similar organization of interaction networks across species. This organization implies that tissue or stage specific networks can be best identified from interactome data by using filters designed to include both ubiquitously expressed and specifically expressed genes and proteins.
The phenotypic identities of cells and tissues are governed in part by the particular regulatory networks that are active in them. Steady progress has been made to map the molecular interactions that constitute these networks, including the interactions among proteins and between transcription factors (TFs) and the genes that they regulate. Tens of thousands of protein-protein interactions (PPI) and TF-gene interactions have been identified for human and several model organisms, providing a foundation for identifying cell or tissue specific regulatory networks [1–4]. Most of the available interaction data, however, are noisy (i.e., include false positives and false negatives) and are derived from methods that are independent of the in vivo spatial or temporal context of the interactions. A majority of available PPI, for example, have come from two methods: the yeast two-hybrid system, which detects interactions between proteins expressed in a yeast nucleus [5–7], and protein complex determination [8, 9], which usually involves forced expression of a tagged bait protein in a cultured cell line (for example, see [10, 11]). The resulting data from these approaches can be used to build composite interactome networks representing many of the possible in vivo interactions. However, since only a fraction of these interactions may be active in a particular spatial or temporal context, filters are needed to identify the regulatory networks that are relevant to specific cells, tissues, or developmental time points.
To identify spatially or temporally relevant subnetworks, composite interactome networks can be filtered using gene expression or transcriptome data. It has been shown, for example, that interactions between proteins encoded by genes with similar or correlated expression patterns are more likely than those with dissimilar expression patterns to be genuine in vivo interactions [11–14]. This correlation can be used to predict new protein interactions [15, 16], to score experimentally detected PPI [17–19], and to characterize different types of hub proteins within composite networks [20, 21]. Correlated expression, however, is a relatively weak predictor of in vivo PPI and thus may not be useful for filtering interactome data to identify relevant subnetworks. An alternative approach would be to search the interactome data for subnetworks of genes that are specifically expressed in cells or tissues of interest. Recent studies on human PPI and transcriptome data, however, have suggested that tissue-specific proteins are involved in relatively few interactions, most of which are with house-keeping or ubiquitous proteins [22–24]. These studies suggested that tissue-specific proteins primarily interact with the more conserved ubiquitous proteins to modulate tissue-specific functions. If this were a general principle, filtering interaction data based on tissue-specific expression patterns would not be an effective method for identifying tissue-relevant subnetworks.
In this study we examined the relationship between the interactome and transcriptome of Drosophila. We took advantage of the extensive PPI and protein-DNA interaction (PDI) data available for Drosophila, and recent high quality transcriptome data for tissues and developmental stages. As suggested by the studies with human PPI data, we found that for Drosophila, tissue-specific proteins infrequently interacted among themselves but instead interacted primarily with widely expressed proteins. In addition, we show that stage-specific proteins have many more interactions with ubiquitously expressed proteins than with other stage-specific proteins. In contrast, we find that the Drosophila PDI network is enriched for interactions between tissue- and stage-specific TFs and their relevant tissue- and stage-specific targets, yet there is a preponderance of interactions between specifically expressed TFs and non-specifically expressed targets.
The specific interaction networks active in particular cells or tissues will be determined in part by the genes that are expressed in them. The problem of filtering an interactome network based on quantitative gene expression data, however, is particularly challenging because fully active genes can be expressed at widely different levels. In Drosophila, for example, mRNA abundances of different genes at their maximal level of expression can range over four orders of magnitude [26, 27]. This problem, along with the finding that specifically expressed genes frequently interact with genes that are not expressed specifically, led us to develop a normalization procedure for gene expression data that takes into account the levels of expression across samples. The normalized expression value for a gene in a tissue or developmental stage is represented as the percentage of its maximum expression level across all tissues or stages, respectively. In order to identify networks that operate in different contexts, composite interactome networks can be filtered using this intuitive and quantitative gene expression filter. We show that the subnetworks identified with this filter are enriched for genes with mutant phenotypes that are relevant to different stages and tissues.
To examine the interaction properties of Drosophila genes expressed in different patterns, we classified genes based on the specificity of their expression across tissues or developmental stages (Methods). We used tissue expression data from FlyAtlas, which covers 15 adult and 8 larval tissues, and developmental stage expression data from the modENCODE project, which covers 30 developmental time points from embryo to adult. We classified genes expressed predominantly in one tissue as tissue specific (2838 genes), or in one stage as stage specific (3566 genes). We classified genes expressed across all tissues or all stages as tissue ubiquitous (3960 genes) or stage ubiquitous (4972 genes), respectively; 3226 of these genes are both tissue ubiquitous and stage ubiquitous and we refer to these as the common ubiquitous genes (Additional file 1). Overall, we classified 31% of the Drosophila genes as tissue ubiquitous and 22% as tissue-specific in this study. This is comparable to a study of human genes in which 26% of the genes were classified as ubiquitous across 15 human tissues and cell lines and 13% were considered tissue-specific. As expected, the Drosophila common ubiquitous genes are enriched for genes involved in basic cellular processes, including protein synthesis, trafficking and degradation, RNA transcription and processing, and cytoskeleton organization (p values < 1 × 10^−8). The ubiquitously expressed genes are also enriched for intracellular proteins (p values < 1 × 10^−8) while the tissue- and stage-specific genes are enriched for extracellular proteins (p values < 5 × 10^−9) (Methods). This is in partial agreement with the results of a human study where the tissue-specific proteins were found to be enriched for extracellular and membrane proteins. The Drosophila ubiquitously expressed genes are also more evolutionarily conserved than the tissue- or stage-specific genes (Additional file 2).
For example, about 37% of the ubiquitously expressed genes have yeast orthologs while only 12% of the tissue-specific genes and 8% of the stage-specific genes have yeast orthologs. Clear human orthologs exist for 80% of the ubiquitously expressed genes and only 19% and 14% of the tissue- and stage-specific genes, respectively. Among the genes that have orthologs in yeast or metazoans, about 50% are ubiquitously expressed while only 9-12% are tissue-specific and only 11-15% are stage-specific. This is in agreement with several studies examining the conservation of ubiquitously expressed and tissue-specific proteins in other organisms [28–32].
[Table: Interactions involving ubiquitously and specifically expressed proteins, reported as the percent of interactions with each class of protein. Table contents were not recovered.]
It is perhaps not surprising to find few interactions among all of the tissue- or stage-specific proteins, since many of them are never expressed together in the same tissue or at the same developmental stage. To determine whether the tissue- or stage-specific proteins interacted with each other within each tissue or stage, we built networks of genes expressed in each of the 23 adult and larval tissues and performed the same comparisons. The tissue-specific proteins within every tissue except the ovary had many-fold fewer interactions among themselves than did random sets of other proteins expressed in the same tissue (Additional file 3A). Similarly, stage-specific proteins at each of the 30 developmental time points had relatively few interactions among themselves, with the exception of proteins specifically expressed in the one-day-old female and the 6–8 hour embryo (Additional file 3B). The tissue- or stage-specific proteins rarely interacted among themselves even indirectly through third non-specific proteins (Additional file 4). We did the same analyses with larval tissue networks and obtained similar results (Additional file 5). Because the tissue- or stage-specific proteins rarely interacted with each other directly or indirectly, we determined whether they primarily interacted with the ubiquitous proteins. Both the tissue- and stage-specific proteins interacted about five-fold more with the ubiquitous network than did random sets of proteins (Figure 1). Thus, tissue- and stage-specific proteins generally do not form well-connected subnetworks but frequently interact with widely expressed proteins.
Next we determined the interaction tendencies of the ubiquitous and specific genes in the PDI network. Not surprisingly, the ubiquitous TFs bound to ubiquitous targets more than expected by random chance (p-value 1.92 × 10^−5). In addition, approximately half of the regulatory interactions of ubiquitous TFs involve non-ubiquitous targets. The PDI network is more limited than the PPI network because of the relatively small number of TFs that have been experimentally shown to bind target genes. For each adult tissue there were only two or fewer tissue-specific TFs in the available PDI data. Thus it is difficult to determine whether the tissue-specific TFs preferentially interact with the corresponding tissue-specific targets. Nevertheless, it is clear that a large fraction of the targets of tissue-specific TFs are not expressed exclusively in the same tissue. For example, the two testis-specific TFs are involved in 562 interactions in the PDI network and only 14 (2.5%) of these are with testis-specific targets. Likewise, the single ovary-specific and heart-specific TFs in the PDI network have 279 and 282 targets, only 7 (2.5%) and 3 (1.1%) of which are ovary- or heart-specific, respectively. In each case, most of the targets for tissue-specific TFs are expressed ubiquitously or in several tissues. We also examined a TF regulatory network that was predicted based on the integration of physical and functional interaction data. Because one of the predictors for this network was developmental expression data, it would not be surprising to find enrichment for interactions between TFs and targets that are specifically expressed in the same patterns. Indeed, we found that the predicted network was 1- to 9-fold enriched for interactions between adult tissue- or stage-specific TFs and targets that are specifically expressed in the corresponding tissue or stage (data not shown).
Overall, tissue-specific TFs interact with their corresponding tissue-specific targets about 3-fold more than expected from random chance. Thus, in contrast to the PPI network, the PDI network is enriched for interactions between TFs and targets that are expressed exclusively in the same tissue or stage (Additional file 6). Nevertheless, like the PPI network, in the PDI network most of the tissue-specific TF interactions are with non-specific targets. In the predicted PDI network, the 65 tissue-specific TFs are involved in 11,963 interactions, only 615 (5.1%) of which are with the corresponding tissue-specific targets. In contrast, 4355 (36.4%) of the interactions are with ubiquitously expressed targets, and most of the rest are with targets that we classified as neither tissue-specific nor ubiquitous; i.e., expressed in a subset of all tissues. Since the ubiquitous targets are by definition ubiquitously expressed, their regulation by specific TFs suggests that their levels are modulated in a context-dependent manner.
Because the techniques used to collect much of the available interactome data are context independent, filters are needed to identify the subnetworks that operate in specific tissues or at specific developmental times. One type of filter that can be envisioned is one that retains only genes that are specifically expressed in a tissue or stage. However, as demonstrated above for Drosophila and elsewhere for human [22–24], proteins expressed in specific tissues or stages frequently interact with ubiquitously expressed proteins rather than with other specifically expressed proteins. In the PDI, ubiquitously expressed and specifically expressed genes are regulated by both types of TFs. Thus, a filter based on expression specificity is likely to remove many of the PPI or PDI interactions that are relevant in specific contexts. A second type of filter that can be envisioned is one that relies on absolute expression levels. For example, genes are frequently classified as “on” or “off” in a particular sample based on arbitrary expression thresholds. The problem with using filters based on expression thresholds, however, is that different genes may function at widely different expression levels; two genes, for example, may differ in their expression levels by several orders of magnitude even at their maximal levels.
We reasoned that a gene is more likely to be active when it is expressed at levels approaching its maximal level across all tissues or stages. For example, if a gene is maximally expressed in the ovary, it is likely to have a function in the ovary and in any other tissues where its level approaches the level in the ovary. To test this hypothesis we developed a scale to indicate the fraction of maximal expression for each gene in each tissue or developmental stage. For each tissue or stage we calculated a gene’s expression level as a percent of its level in the tissue or stage where it is maximally expressed. A gene, therefore, will have a percent maximum or “pmax” value for each tissue or stage.
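The pmax scale described above can be illustrated with a minimal Python sketch. The function name `pmax_values` and the abundance numbers are hypothetical and chosen only for illustration; the calculation simply expresses each sample's abundance as a percent of the gene's maximum across samples.

```python
def pmax_values(abundance):
    """Express each sample's abundance as a percent of the gene's maximum.

    `abundance` maps a tissue or stage name to a raw expression value
    for a single gene.
    """
    peak = max(abundance.values())
    if peak <= 0:
        # A gene with no detectable expression has no meaningful maximum.
        return {sample: 0.0 for sample in abundance}
    return {sample: 100.0 * value / peak for sample, value in abundance.items()}


# Hypothetical abundances for one gene (not taken from FlyAtlas or modENCODE).
example_gene = {"ovary": 200.0, "brain": 150.0, "testis": 20.0}
example_pmax = pmax_values(example_gene)
# The ovary is at 100, the brain at 75, and the testis at 10 on the pmax scale.
```

With these made-up numbers, a pmax >50 filter would retain the gene in the ovary and brain but not in the testis, regardless of how high or low its absolute abundance is compared with other genes.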
To evaluate the pmax scale, we first asked if we could obtain gene lists enriched for tissue-relevant functions by selecting genes expressed above different pmax thresholds (e.g., pmax >50 or >75) in specific tissues. We compared these filters to one that selects genes expressed above the average value for all genes in a given tissue. We chose a set of six tissues (brain, thoracic ganglion, eye, testis, ovary, and larval central nervous system or CNS) and applied the three different filters to genes expressed in those tissues; the pmax filtered gene lists had in some cases more and in some cases fewer genes than the corresponding lists of genes expressed above the average level (Additional file 7). We evaluated the expression filters by checking for enrichment of tissue-relevant mutant phenotype annotations in the respective gene lists. The results show that the pmax filtered gene lists are highly enriched for tissue-relevant phenotypes in contrast to the gene lists filtered on average expression levels, which show little or no enrichment for tissue-relevant phenotypes (Additional file 8). The >75 pmax filter performed better than the >50 pmax filter for the selected tissues, supporting the hypothesis that genes expressed closer to their maximum level in a tissue are more likely to be functional in that tissue.
To further examine the effectiveness of the expression filter we asked whether the collection of enriched phenotypes in networks from each tissue or stage correlated with the collection of enriched phenotypes in related tissues or stages. We clustered the pmax-filtered subnetworks based on the similarity of phenotype terms that are enriched in them. In the case of the tissue-relevant pmax-filtered subnetworks (Additional file 9), the ovary and larval central nervous system clustered together as they are both enriched in genes that share ‘cell cycle’ and ‘lethality’ phenotypes among others. This is perhaps not surprising as it has been shown that many cell cycle and maternal genes play major roles in the asymmetric cell divisions in nervous system development [38, 39]. Another example is the pmax PPI networks for brain and thoracic ganglion, which cluster together based on shared neuroanatomy and neurophysiology phenotypes among others. The testis subnetwork is an outgroup as it alone is enriched for the ‘male sterile’ phenotype, and similarly, the eye subnetwork is the only one enriched for phenotypes related to vision. Most larval tissue subnetworks cluster together due to their enrichment for lethality phenotypes. The early embryo subnetworks also cluster together, as do the late embryo subnetworks along with late pupal subnetworks (Additional file 10). The subnetwork at the mid-embryo stage from 10 to 12 hours forms an outgroup, consistent with studies [12, 40] showing that transcripts specific for the early-embryo are down-regulated at this stage while late embryo-specific transcripts are just beginning to be expressed. The embryo subnetworks at 14–16 hrs and 16–18 hrs also do not cluster with the other embryo subnetworks and are not significantly enriched for embryo relevant phenotypes, corresponding to the transition to late embryo stages. 
It has been shown that related tissues show similar expression profiles and that tissues in consecutive developmental stages cluster together based on their gene expression patterns. Here we show that pmax-filtered PPI subnetworks from related tissues, as well as those from consecutive stages, cluster together based on related gene functions, as indicated by their shared mutant phenotypes. This result further shows that the pmax filter can identify subnetworks with the appropriate context-relevant functions.
In this study we used transcriptome data to examine the PPI and PDI interactomes of Drosophila and arrived at several general conclusions. First, ubiquitously expressed proteins interact among themselves significantly more than with specifically expressed proteins. Second, tissue- and stage-specific proteins interact with core networks of ubiquitously expressed proteins, potentially modifying them for tissue- or stage-specific functions. Third, we show that tissue- and stage-specific proteins rarely interact amongst themselves directly or even indirectly through other non-specific proteins. These results for PPI are in agreement with previous studies with human proteins showing that tissue-specific proteins have few interactions among themselves, ubiquitous proteins frequently interact with each other, and tissue-specific proteins primarily interact with ubiquitous proteins [22, 23]. In addition, we have shown that these results hold true for developmental stage-specific and stage-ubiquitous proteins. The interactions of the tissue- or stage-specific proteins with the ubiquitous proteins may take place to recruit the ubiquitous network to perform context specific functions (some examples in [47–53]). In the PDI network, the tissue- and stage-specific TFs tend to regulate tissue- and stage-specific targets more than expected by random chance. A surprising finding in the PDI is that the tissue-specific TFs frequently regulate ubiquitous targets. The levels of expression of ubiquitous genes, therefore, may be regulated differently in specific tissues and stages, potentially rewiring the networks for specific functions.
These findings indicate that protein networks frequently consist of a core of evolutionarily conserved, widely-expressed proteins to which are attached a set of relatively new, tissue-specific proteins. Many of the functions of specific tissues may emerge via a modification of core non-specific cell networks. This is in contrast to the notion of cell- or tissue-specific systems that consist primarily of cell or tissue-specific proteins. The Drosophila phototransduction network (Figure 5) is a good example. Only 11 of the 351 genes in this network are specifically expressed in the eye. A large number of genes in the network that play a defined role in the eye, as indicated by their mutant phenotypes, are expressed in many different tissues in addition to the eye. An integral part of the phototransduction pathway, for example, is the calcium-signaling module, which is expressed and has a unique role in many different contexts [44, 54].
The identification and characterization of cellular pathways and other functional modules is a major challenge in post-genome research. Clues about the constituents of pathways can be obtained from the huge amount of protein interaction data that is becoming available from high throughput screens and collections of data from published low throughput experiments. Since most of the available data comes from context-independent assays, many of the available interactions may not occur in any given biological setting. Gene expression information could be used as a first pass filter to identify interacting proteins that are expressed in the same cell or tissue. The finding that specifically expressed proteins frequently interact with ubiquitously expressed proteins shows that a filter based on tissue-specific expression would not be effective. Another implication of this finding is that genes encoding pathway members are not necessarily coordinately regulated across tissues. Thus, while correlated expression has been useful for finding groups of genes that may function together, the expression patterns of many members of specific pathways are not correlated. Another approach that has been used effectively in some studies is to first find genes that are expressed at high levels in a particular tissue in order to enrich for genes belonging to pathways that are active in that tissue. Such an approach can generate false negatives, however, because some genes are active even when expressed at very low levels. As an example, the genes that are known to be required for Drosophila eye function (with eye-related mutant phenotypes) are expressed in the eye over a range of two orders of magnitude. Many of the genes that are expressed at very low levels are transcription factors that are effectors in signal transduction pathways. Such genes would not be identified as potential members of eye-relevant pathways if absolute expression levels were used to filter protein interaction data.
We set out to develop a method for using gene expression data to identify context-relevant networks from interactome data. We reasoned that a gene is likely to be active in a tissue where it is expressed at its maximal level relative to other tissues and that the closer a gene is to its maximal expression level the more likely it is to be active. The availability of quantitative gene expression data from a wide range of tissues or developmental stages has made it possible to test this idea. We used a normalization procedure that scaled (on a percent scale) each gene’s expression based on its maximum expression level in any tissue or stage. Each gene in each tissue or stage is expressed at a percent of its maximum value, or pmax. We showed that genes expressed at higher pmax values in a tissue or stage are more enriched for genes that function in that tissue. Moreover, filtering composite protein interaction networks using this scale generates biologically relevant subnetworks. Such a gene expression filter will be useful for generating hypotheses about the composition of pathways and other functional modules in cells. The pmax filter along with other gene expression filters should also be useful for understanding how the interactome changes from cell to cell and in different conditions (i.e., the dynamic interactome), particularly as additional expression data with better time and spatial resolution becomes available.
Protein-protein interaction (PPI) and protein-DNA interaction (PDI) data were downloaded from DroID [14, 25] version 2011_05, and include 93,544 experimentally detected PPI from yeast two-hybrid data from three large-scale studies [56–58] and an ongoing project, literature-curated PPI from other major databases [59–61], and recently added PPI data from two large co-AP complex studies for Drosophila [10, 62]. The PPI data also include 144,171 Drosophila interologs predicted from experimental data in yeast, worm and human [14, 25]. In total we analyzed 235,950 unique PPI involving proteins from 10,823 genes. The analysis in Figure 2A and B used all unique, non-self PPI from DroID version 2014_01. Individual data sets included data from yeast two-hybrid array screens [56, 58], yeast two-hybrid cDNA screens, interologs from yeast, C. elegans, or human, and protein interactions from co-affinity purification studies. The Drosophila PDI data include 158,508 unique regulatory PDI for 149 transcription factors (TF) and 12,441 of their target genes that were inferred using TF binding and correlated expression of targets [35, 63]. We separately analyzed ~300,000 computationally predicted PDI. Tissue-wide gene expression data were downloaded from Flyatlas.org. These data include the mRNA signal that correlates with mRNA abundance for about 13,000 genes in 15 adult and 8 larval tissues measured using Affymetrix Drosophila expression arrays. Data were included for probes that mapped to single FlyBase gene identifiers (FBgn). Probe data were included only where Affymetrix called the mRNA present for all four replicates. The mean signal from four replicates was used for all subsequent calculations. The stage-wide temporal gene expression data were obtained from Graveley et al. These data include the mRNA abundance determined by RNA-Seq for more than 14,000 genes spanning 30 developmental time points from embryo to adult.
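The expression specificity (Es) of a gene in tissue or stage i was calculated as

$$Es_i = \frac{\mathrm{abundance}_i}{\sum_{j} \mathrm{abundance}_j}$$

where abundance_i is the raw expression value of a gene in tissue or stage i and the sum runs over all tissues or stages. This results in Es_i values between 0 and 1 for each gene in each tissue or stage. The sum of all expression specificity values for each gene across all tissues or stages equals 1.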
We placed genes into three non-overlapping bins based on their specificity of expression across all 15 adult tissues: genes with Es_i values ≥0.8 in any of the adult tissues were labeled as tissue-specific (2838 genes); genes that were not tissue-specific and that had non-zero expression values across all adult tissues were labeled as ubiquitous (3960 genes); the remaining genes were labeled as tissue non-specific, non-ubiquitous (5830 genes). For the analyses in Additional file 5, we similarly classified genes using only the 8 larval tissues. In a separate classification, we placed genes into three non-overlapping bins based on their specificity of expression across 30 developmental stages: genes with developmental stage Es_i values ≥0.19 were labeled as stage-specific (3566 genes); genes that were not stage-specific and that had developmental stage Es_i values >0.005 across all time points were labeled as ubiquitous (4972 genes); the remaining genes were labeled as stage non-ubiquitous, non-specific (6064 genes). The stage-specific bin included genes that showed a transient and sharp change in abundance that spanned a maximum of four consecutive developmental time points in a majority of the cases. The overlap among genes classified by the two specificity scales (tissue specificity and stage specificity) is shown in Additional file 1.
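The adult tissue binning rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that the expression specificity of a gene in a sample is its abundance divided by its summed abundance across all samples, which is consistent with the statement that the values for each gene sum to 1. The function names and the example abundances are hypothetical.

```python
def expression_specificity(abundance):
    """Return each sample's share of a gene's total abundance (assumed Es)."""
    total = sum(abundance.values())
    if total == 0:
        return {sample: 0.0 for sample in abundance}
    return {sample: value / total for sample, value in abundance.items()}


def classify_tissue_gene(abundance, specific_cutoff=0.8):
    """Place a gene in one of three non-overlapping tissue bins."""
    es = expression_specificity(abundance)
    # Specific: Es at or above the cutoff in any one tissue.
    if any(value >= specific_cutoff for value in es.values()):
        return "tissue-specific"
    # Ubiquitous: not specific, and non-zero expression in every tissue.
    if all(value > 0 for value in abundance.values()):
        return "ubiquitous"
    return "non-specific, non-ubiquitous"


# Hypothetical genes measured in three tissues.
eye_gene = {"eye": 90.0, "brain": 5.0, "gut": 5.0}
broad_gene = {"eye": 10.0, "brain": 12.0, "gut": 9.0}
partial_gene = {"eye": 10.0, "brain": 10.0, "gut": 0.0}
labels = [classify_tissue_gene(g) for g in (eye_gene, broad_gene, partial_gene)]
```

The stage classification would follow the same pattern with the cutoffs given in the text (Es ≥0.19 for stage-specific, Es >0.005 at every time point for ubiquitous).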
The percent maximum (pmax) value of a gene in tissue or stage i was calculated as

$$pmax_i = \frac{\mathrm{abundance}_i}{\mathrm{abundance}_{max}} \times 100$$

where abundance_i is the raw expression value in tissue or stage i, and abundance_max is the raw expression value in the tissue or stage where the gene is maximally expressed. For the analysis in Figure 2, genes were similarly classified based only on adult tissue expression data. The pmax gene expression data and filters are available at the Drosophila interactions database, DroID (http://www.droidb.org). The database allows lists of PPI, PDI, and other interactions to be filtered based on user-defined pmax values. The pmax expression data can also be used to filter graphically displayed interaction networks using the interaction map browser tool (IM Browser), which is also available at DroID. All precalculated pmax values for tissue and stage expression data are available for download at DroID.
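As an illustration of how a user-defined pmax threshold might be applied to an interaction list, the following Python sketch keeps an interaction only when both partners exceed the threshold in the chosen tissue or stage. The rule that both partners must pass, the function name `filter_network`, and the gene names are assumptions made for this example; they do not describe how DroID or IM Browser implement their filters.

```python
def filter_network(edges, pmax, sample, threshold=50.0):
    """Keep interactions whose two partners both exceed `threshold` in `sample`.

    `edges` is a list of (gene_a, gene_b) pairs and `pmax` maps each gene
    to a dict of per-sample pmax values. Genes without a value for the
    sample are treated as not expressed (pmax of 0).
    """
    def level(gene):
        return pmax.get(gene, {}).get(sample, 0.0)

    return [(a, b) for a, b in edges if level(a) > threshold and level(b) > threshold]


# Hypothetical composite network and pmax table.
composite = [("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC")]
pmax_table = {
    "geneA": {"eye": 90.0},
    "geneB": {"eye": 80.0},
    "geneC": {"eye": 30.0},
}
eye_subnetwork = filter_network(composite, pmax_table, "eye", threshold=75.0)
```

Because the filter depends on each gene's own maximum, a widely expressed gene and an eye-specific gene can both pass in the eye, which is the property motivated in the text.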
Potential Drosophila orthologs of yeast, worm and human proteins were identified using data downloaded from the InParanoid version 7.0 database. InParanoid performs pairwise comparisons of proteomes and constructs orthology groups. An orthology group has at least one protein from each species (seed orthologs) that are more similar to each other than to any other sequence in the other proteome. The orthology group may have additional sequences that are closer to the seed orthologs than to any sequences in the other proteome. We merged the InParanoid groups keeping a Drosophila gene as a unique reference to each orthology group.
To compare the numbers of interactions among different groups of genes, we calculated the fold difference in interactions over random expectation. First, we counted the number of interactions among genes in the test set. We then picked the same number of genes at random from the PPI network, excluding the test set, and counted the interactions among them. The fold difference is the number of interactions among genes in the test set divided by the number of interactions among genes in the random set. We repeated this 5000 times for each test set.
To calculate a p-value for each test case, we performed 100,000 Monte Carlo simulations, each time picking a random gene set of the same size and counting the interactions among the genes in that set. We then counted how often the number of interactions in the random sets was lower or higher than in the test set, depending on the test case. From this count we computed a binomial confidence interval at 99.99% confidence (CI = 0.9999) using binom.confint in R [66, 67], and we used the upper bound of the interval as the p-value.
Tissue-relevant networks were derived from the composite network by including only the genes that had expression specificity values above zero in the respective tissues. Stage-relevant networks included only the genes that had expression specificity values above 0.005 in the respective stages.
The PDI network is a directed network in which a small number of TFs bind to numerous potentially regulated target genes. To determine whether tissue- and stage-specific TFs regulate tissue- and stage-specific targets more often than expected by chance, we built random networks for the specific TFs in each tissue or stage by assigning random targets while keeping the node degree constant. We built 100,000 such random networks for the specific TFs in each of the tissues and stages. We computed the p-values (binom.confint, CI = 0.95) by counting the number of times the specific interactions in the random networks were fewer than the specific interactions in the tissue and stage PDI networks.
Example networks were identified using IM Browser to search and filter the DroID database. Edge colors in Figures 5 and 6 depict the source of the protein interaction: dark green and yellow are human and C. elegans interologs, respectively; light blue and pink are from large co-complex studies [10, 62]; dark blue, green, and dark grey are from three separate large-scale two-hybrid studies [56–58]; orange are from literature databases [59–61]; red are from more than one source.
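The fold-difference and Monte Carlo comparison described above for gene sets in the PPI network can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's code: random sets are always drawn from the network minus the test set, the fold difference is reported against the mean random count (the text does not say how the repeated draws were aggregated), and the binomial confidence interval from R's binom.confint is not reproduced. All names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def count_internal_edges(edges: List[Edge], genes: Set[str]) -> int:
    """Count interactions whose two partners are both in `genes`."""
    return sum(1 for a, b in edges if a in genes and b in genes)


def compare_to_random(
    edges: List[Edge], test_set: Set[str], n_draws: int, seed: int = 0
) -> Dict[str, float]:
    """Compare interactions within `test_set` to same-sized random gene sets."""
    rng = random.Random(seed)
    nodes = sorted({gene for edge in edges for gene in edge})
    # Assumption: the random pool is the network minus the test set.
    pool = [g for g in nodes if g not in test_set]
    observed = count_internal_edges(edges, test_set)

    random_counts = []
    for _ in range(n_draws):
        sample = set(rng.sample(pool, len(test_set)))
        random_counts.append(count_internal_edges(edges, sample))

    mean_random = sum(random_counts) / n_draws
    # Assumption: fold difference is taken against the mean random count.
    fold = observed / mean_random if mean_random > 0 else float("inf")
    # Number of random sets with at least as many interactions as the test set
    # (the count relevant for an enrichment test).
    n_at_least = sum(1 for c in random_counts if c >= observed)
    return {"observed": observed, "fold": fold, "n_at_least": n_at_least}


# Hypothetical toy network: a fully connected test set of 4 genes
# plus 6 other genes that share a single interaction.
test_set = {"t1", "t2", "t3", "t4"}
members = sorted(test_set)
edges = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
edges += [("r1", "r2"), ("t1", "r3"), ("t2", "r4"), ("r5", "t3"), ("r6", "t4")]
result = compare_to_random(edges, test_set, n_draws=200, seed=1)
```

For a depletion test, the comparison in `n_at_least` would be reversed to count random sets with at most as many interactions as the test set. The PDI randomization follows the same pattern, except that targets are reassigned to each TF while its number of targets is held fixed.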
Gene Ontology enrichment analysis was performed using DAVID 6.7 with the Drosophila protein-coding genes as background. We used DroPhEA and BiNGO to perform phenotype enrichment analysis, with Drosophila phenotype controlled vocabulary terms from the FlyBase phenotype ontology. DroPhEA used the phenotypes resulting from perturbations of single gene loci, and the same phenotypes were used as background in the BiNGO analyses. About 4800 genes had phenotypes associated with them. The p-values were corrected by Bonferroni correction in DroPhEA and by Benjamini and Hochberg FDR correction in BiNGO and DAVID. The enrichment and depletion p-values obtained for the gene lists were negative log transformed and scaled (0–1) to create a distance matrix and then clustered hierarchically. The R heatmap.plus package was used to plot the scaled values. We used BiNGO to compare phenotype enrichment in variously filtered gene lists and also to compare phenotype enrichment between filtered gene lists and filtered networks. We used DroPhEA to compare enrichment and depletion of phenotypes in different tissue and stage networks filtered for genes expressed at pmax_i >75.
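For readers unfamiliar with the corrections and the transformation mentioned above, the following sketch shows the standard Bonferroni and Benjamini-Hochberg adjustments and a negative log transform followed by min-max scaling to the 0–1 range. It is a generic illustration, not the code used by DAVID, BiNGO, or DroPhEA; the base-10 logarithm and the min-max scaling are assumptions, and the function names are hypothetical.

```python
import math
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log_scaled(pvals: List[float]) -> List[float]:
    """-log10 transform, then rescale to [0, 1]. Requires all p-values > 0."""
    scores = [-math.log10(p) for p in pvals]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical p-values for four terms.
raw_p = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw_p)
bh = benjamini_hochberg(raw_p)
scaled = neg_log_scaled([1.0, 0.1, 0.01])
```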
All interaction data used in this study are available, along with the calculated gene expression pmax values, in DroID, the Drosophila Interactions Database (http://www.droidb.org).
pmax: Percent of maximum expression level.
We thank Dr. Gerardus Tromp for helpful discussions and advice. We also thank Dr. Murugesan Venkatapathi, Paul Albosta, Nermin Gerges, Dumrong Mairiang, and members of the Finley laboratory for helpful discussions and comments on drafts of the manuscript. We also thank Wayne State University Scientific Computing for grid computing resources. This work was supported in part by grant number HG001536 from the National Human Genome Research Institute and by the Wayne State University President’s Research Enhancement fund.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.