To identify successive layers of hierarchy in the local topological features of the E. coli transcriptional regulatory network, we first established its layout based on published data [3, 9]. Following the representation of Shen-Orr et al. [3], we associate the E. coli transcriptional regulatory network with a directed graph in which each node represents either a gene or operon encoding a transcription factor (TF) or a gene or operon regulated by a TF, while the links denote the TFs themselves. Note that many TFs are encoded within an operon; thus the directed links represent direct transcriptional modulation from a TF to an operon, or from a TF-containing operon to another operon (Fig. 1A). This representation allows us to distinguish between two different elementary links: 59 autoregulatory loops, in which a TF regulates its own expression, and 519 directed links, in which a TF regulates another TF or operon (Fig. 1A). Note that about half of the 116 TFs have an autoregulatory loop. The same trend is evident for those TFs that are encoded as single genes, while a significantly higher proportion of the TFs that are encoded as part of an operon possess autoregulatory loops (Fig. 1B, middle panel).
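The distinction between the two elementary link types can be illustrated with a minimal sketch. The edge list and node names below are hypothetical toy data, not the E. coli network, and the helper `split_links` is introduced here only for illustration.

```python
# Toy regulatory edge list: (regulator, target). All names are hypothetical.
edges = [
    ("tfA", "tfA"),  # autoregulatory loop: the TF regulates itself
    ("tfA", "opB"),
    ("tfA", "opC"),
    ("opB", "opC"),
    ("opB", "opB"),  # autoregulatory loop
]


def split_links(edges):
    """Separate autoregulatory loops from links between distinct nodes."""
    unique = set(edges)  # ignore duplicated entries
    loops = {(u, v) for u, v in unique if u == v}
    directed = unique - loops
    return loops, directed


loops, directed = split_links(edges)

# Every node with at least one outgoing link acts as a regulator.
regulators = {u for u, _ in edges}
autoregulated = {u for u, _ in loops}
fraction_autoregulated = len(autoregulated) / len(regulators)

print(len(loops), len(directed), fraction_autoregulated)
```

Applied to the real edge list, the same two counts would correspond to the 59 autoregulatory loops and 519 directed links reported above.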
First organizational level: motifs
Motifs can be explicitly identified and enumerated in various cellular networks [3–7]. Within the E. coli transcriptional regulatory network we detected the two previously described motifs with uniform topology [3, 7], i.e., the feed-forward and bi-fan motifs (Fig. 1B, top panel). Both motifs can be further classified by the functionality of their links (activating or inhibitory). In a coherent feed-forward or bi-fan motif all the directed links are activating (Fig. 1B, top panel, 1 and 3), while in incoherent motifs one of the links inhibits the activity of its target node (Fig. 1B, top panel, 2 and 4). We find that coherent motif types are significantly more common than incoherent ones both for feed-forward and bi-fan motifs (Fig. 1B, bottom panel). We can further group the detected motifs according to the number of autoregulatory loops they possess, finding that both motifs have predominantly one or two autoregulatory loops, while no motif has an autoregulatory loop associated with each of its nodes (Fig. 1B, bottom panel).
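As an illustration of how the two motif types could be enumerated and classified, the following is a minimal sketch on a hypothetical signed toy network. It is not the detection procedure of [3, 7]: it counts every subgraph that contains the required links (additional links among the same nodes are allowed), ignores autoregulatory loops, and uses the definition given above, under which a motif is coherent only if all of its links are activating.

```python
from itertools import combinations

# Hypothetical signed edges: (regulator, target) -> +1 activating, -1 inhibitory
signed_edges = {
    ("X", "Y"): +1,
    ("Y", "Z"): +1,
    ("X", "Z"): +1,
    ("X", "W"): -1,
    ("Y", "W"): +1,
}


def successors(signed_edges):
    """Map each regulator to its set of targets, ignoring autoregulatory loops."""
    succ = {}
    for u, v in signed_edges:
        if u != v:
            succ.setdefault(u, set()).add(v)
    return succ


def feed_forward_motifs(signed_edges):
    """Triples (x, y, z) with links x->y, y->z and x->z."""
    succ = successors(signed_edges)
    motifs = []
    for x, x_targets in succ.items():
        for y in x_targets:
            for z in succ.get(y, ()):
                if z != x and z in x_targets:
                    motifs.append((x, y, z))
    return sorted(motifs)


def bi_fan_motifs(signed_edges):
    """Tuples (x1, x2, y1, y2) in which x1 and x2 both regulate y1 and y2."""
    succ = successors(signed_edges)
    motifs = []
    for x1, x2 in combinations(sorted(succ), 2):
        shared = sorted((succ[x1] & succ[x2]) - {x1, x2})
        for y1, y2 in combinations(shared, 2):
            motifs.append((x1, x2, y1, y2))
    return motifs


def feed_forward_links(motif):
    x, y, z = motif
    return [(x, y), (y, z), (x, z)]


def bi_fan_links(motif):
    x1, x2, y1, y2 = motif
    return [(x1, y1), (x1, y2), (x2, y1), (x2, y2)]


def is_coherent(links, signed_edges):
    """Coherent here means that every link of the motif is activating."""
    return all(signed_edges[link] > 0 for link in links)


ffl = feed_forward_motifs(signed_edges)
bifans = bi_fan_motifs(signed_edges)
ffl_coherent = [is_coherent(feed_forward_links(m), signed_edges) for m in ffl]
bifan_coherent = [is_coherent(bi_fan_links(m), signed_edges) for m in bifans]
```

In this toy network the inhibitory link from X to W makes every motif that contains it incoherent, while the feed-forward motif (X, Y, Z) is coherent.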
Second organizational level: homologous motif clusters
While statistically significant motifs can be explicitly identified and enumerated, the nodes (i.e., the TFs and operons) that take part in such motifs do not exist in isolation but almost always have additional interactions with nodes outside the motif. To systematically identify such interactions, we first searched for feed-forward motifs that share at least one link and/or node with another feed-forward motif (Fig. 2A). We also performed a similar search for bi-fan motifs that interact with each other in this manner (Fig. 2B). We find that in the E. coli transcriptional regulatory network the vast majority of motifs overlap, generating distinct topological units that we refer to as homologous motif clusters (Fig. 2A, 2B).
Forty-one of the 42 individual feed-forward motifs coalesce into six feed-forward motif clusters (Fig. 2A). Of these six motif clusters, three have one highly shared link, while a shared node plays a critical role in establishing the other three motif clusters (Fig. 2A). Similarly, 208 of the 209 bi-fan motifs join together into just two bi-fan motif clusters in which most of the links are shared by at least two adjacent motifs, and in many cases by multiple motifs (Fig. 2B). The majority of links within the motif clusters are either activating or inhibitory (Figs. S1, S2, see Additional file: 1), suggesting that most of the network motifs do not function in isolation but are embedded into a multi-level hierarchy of regulatory interactions. This notion is further supported by the finding that in both cases many of the topological motif clusters overlap to a large extent with known biological functions. For example, one of the feed-forward motif clusters largely overlaps with the flagella motor module, while another contains a significant number of elements responsible for regulating the aerobic/anaerobic switch in E. coli (see Additional file: 1 for details). While some of the motif clusters are topologically highly similar, the number of links connecting them to other network constituents is uneven. For example, the cluster encompassing most elements of the flagella motor module is relatively isolated, yet the topologically highly similar cluster overlapping the aerobic/anaerobic switch is densely integrated with other motifs (Fig. 2C). This suggests that despite their highly similar topology, these clusters may display qualitatively different dynamical features.
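The grouping of overlapping motifs into clusters can be sketched as a connected-components computation over motifs. The example below is an illustration under stated assumptions rather than the procedure used here: motifs are given as tuples of hypothetical node names, and two motifs are joined whenever they share at least one node (motifs that share a link necessarily share both of its end nodes).

```python
def cluster_motifs(motifs):
    """Group motifs (tuples of node names) that share at least one node."""
    parent = list(range(len(motifs)))  # union-find forest over motif indices

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # node -> index of the first motif that contains it
    for i, motif in enumerate(motifs):
        for node in motif:
            if node in first_seen:
                parent[find(i)] = find(first_seen[node])  # merge the two groups
            else:
                first_seen[node] = i

    clusters = {}
    for i, motif in enumerate(motifs):
        clusters.setdefault(find(i), []).append(motif)
    return list(clusters.values())


# Hypothetical feed-forward motifs
motifs = [
    ("a", "b", "c"),
    ("a", "b", "d"),  # shares the link a->b with the first motif
    ("e", "f", "g"),
    ("g", "h", "i"),  # shares only the node g with the third motif
    ("x", "y", "z"),  # isolated motif
]
clusters = cluster_motifs(motifs)
cluster_sizes = sorted(len(c) for c in clusters)
print(cluster_sizes)
```

Running the same function on the combined list of feed-forward and bi-fan motifs would merge clusters of both types wherever they share nodes.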
Third organizational level: motif supercluster
The homologous motif clusters are not isolated either, but are embedded into the E. coli transcriptional regulatory network as a whole. To understand the topological relations between different homologous motif clusters, we merged all feed-forward and bi-fan homologous motif clusters, finding that they form a single large connected component (i.e., a motif supercluster) in which the previously identified feed-forward and bi-fan homologous motif clusters are no longer clearly separable. Indeed, we find only one feed-forward motif and one bi-fan motif to be isolated from the resulting supercluster (Fig. 2C). This integration is especially evident for the feed-forward motif clusters, the vast majority of which share the same links with the bi-fan motif clusters (Fig. 2C).
The relationship of organizational levels to the global network topology
When the full E. coli transcriptional regulatory network is considered as undirected, statistical analysis of its cumulative connectivity distribution demonstrates that it belongs to the class of scale-free networks [10], as previously described [3, 11] (Fig. 2E), with embedded topological hierarchy [12] (Fig. 2F), and that it has a single connected giant component (Fig. 1A, also see Fig. S3, Additional file: 1, for separate in- and out-degree distributions). To study the global relationship of motifs with the whole topological architecture of the network, we overlay the heterologous motif supercluster on the full network (Fig. 2D). It is visually evident that all the nodes from the single giant heterologous motif supercluster are part of the giant component of the full network, comprising 41.46% of its nodes and 53.53% of its links. In fact, it appears that the heterologous motif supercluster defines the core of the connected giant component, with most other nodes being connected to its nodes (Fig. 2D). By comparison with the heterologous motif supercluster, the feed-forward (FF) motif clusters comprise only 20.42% of the nodes and 21.84% of the links, while the bi-fan (BF) motifs comprise 30.48% of the nodes and 38.32% of the links.
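For readers who wish to reproduce this type of analysis, the cumulative connectivity distribution of an undirected network can be computed as in the minimal sketch below. The edge list is hypothetical, and the choice to drop autoregulatory loops and to merge reciprocal links when the network is made undirected is an assumption of this sketch.

```python
from collections import Counter


def undirected_degrees(edges):
    """Degree of each node after dropping direction, self-loops and duplicates."""
    links = {frozenset((u, v)) for u, v in edges if u != v}
    degree = Counter()
    for link in links:
        for node in link:
            degree[node] += 1
    return degree  # nodes without any remaining link are not included


def cumulative_distribution(degree):
    """Return {k: fraction of nodes whose degree is at least k}."""
    n = len(degree)
    counts = Counter(degree.values())
    result, running = {}, 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        result[k] = running / n
    return dict(sorted(result.items()))


# Hypothetical toy network: one hub and four lower-degree nodes
edges = [
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
    ("a", "b"), ("b", "a"),  # reciprocal links collapse into one undirected link
]
degree = undirected_degrees(edges)
cumulative = cumulative_distribution(degree)
print(cumulative)
```

For a scale-free network the resulting values, plotted against k on doubly logarithmic axes, are expected to fall approximately on a straight line.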
To test whether the heterologous motif supercluster in fact represents the backbone of the connected giant component, we examined the effect of removing all 250 links of the supercluster from the network, which mimics the lack of TF binding to a promoter region [13]. Removing these 250 links (out of a total of 467) fragmented the network into 29 small, isolated subgraphs (Fig. 3A). In contrast, while the removal of 250 randomly chosen links disconnected the network into 16 small subgraphs, a connected giant component was retained (Fig. 3B). To quantitatively characterize the two types of reduced networks, we compared the statistical features of the network following the removal of the 250 supercluster links (Fig. 3A) against 5,000 different realizations of randomized removal of the same number of links (of which one realization is shown in Fig. 3B). For networks perturbed by random link removal, the cumulative connectivity distribution (Fig. 3C) and the scaling of C(k) with k (Fig. 3D) were relatively unaltered, being reminiscent of those observed for the original network (Fig. 2E, 2F). However, for the network in which the links contributing to the supercluster were missing, the scaling of C(k) with k was completely absent (Fig. 3D). This observation quantitatively demonstrates the collapse of the network structure and its inherent topological hierarchy upon the targeted removal of the links of the motif supercluster.
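The comparison between targeted and random link removal can be sketched as follows. This is an illustration on a hypothetical toy network, not the analysis reported above: a small "core" of links stands in for the supercluster, 1,000 random realizations are used instead of 5,000, and only the size of the largest connected component is compared.

```python
import random


def components(nodes, links):
    """Connected components of an undirected graph, as a list of node sets."""
    adjacency = {n: set() for n in nodes}
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], {start}
        while stack:  # depth-first traversal from the start node
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in comp:
                    comp.add(neighbour)
                    stack.append(neighbour)
        seen |= comp
        result.append(comp)
    return result


def largest_after_removal(nodes, links, removed):
    """Size of the largest component once the links in `removed` are deleted."""
    kept = [link for link in links if link not in removed]
    return max(len(c) for c in components(nodes, kept))


# Hypothetical toy network: a triangular core with one peripheral node per corner
core = [("c1", "c2"), ("c2", "c3"), ("c1", "c3")]
periphery = [("c1", "p1"), ("c2", "p2"), ("c3", "p3")]
links = core + periphery
nodes = sorted({n for link in links for n in link})

# Targeted removal: delete every core link.
targeted = largest_after_removal(nodes, links, set(core))

# Random removal: delete the same number of links, chosen uniformly at random.
rng = random.Random(0)  # fixed seed so that the sketch is reproducible
random_sizes = [
    largest_after_removal(nodes, links, set(rng.sample(links, len(core))))
    for _ in range(1000)
]
mean_random = sum(random_sizes) / len(random_sizes)

print(targeted, mean_random)
```

In this toy case, removing the core leaves only isolated pairs of nodes, whereas random removal of the same number of links retains a larger connected component on average.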