Regulatory network operations in the Pathway Tools software
© Paley et al.; licensee BioMed Central Ltd. 2012
Received: 28 April 2012
Accepted: 31 August 2012
Published: 24 September 2012
Biologists are elucidating complex collections of genetic regulatory data for multiple organisms. Software is needed for such regulatory network data.
The Pathway Tools software supports storage and manipulation of regulatory information through a variety of strategies. The Pathway Tools regulation ontology captures transcriptional and translational regulation, substrate-level regulation of enzyme activity, post-translational modifications, and regulatory pathways. Regulatory visualizations include a novel diagram that summarizes all regulatory influences on a gene, a transcription-unit diagram, and an interactive visualization of a full transcriptional regulatory network that can be painted with gene expression data to probe correlations between gene expression and regulatory mechanisms. We introduce a novel type of enrichment analysis that asks whether a gene-expression dataset is over-represented for known regulators. We present algorithms for ranking the degree of regulatory influence of genes, and for computing the net positive and negative regulatory influences on a gene.
Pathway Tools provides a comprehensive environment for manipulating molecular regulatory interactions that integrates regulatory data with an organism’s genome and metabolic network. Curated collections of regulatory data authored using Pathway Tools are available for Escherichia coli, Bacillus subtilis, and Shewanella oneidensis.
Cells have evolved multiple molecular regulatory modalities. For example, in addition to having its activity regulated directly by a ligand, an enzyme can be regulated at the point of transcription, translation or degradation. It can be sequestered or covalently modified. And all of these processes can themselves be subject to regulation.
Here we report our progress in developing a comprehensive environment for capturing, interrogating, visualizing, and computing with individual regulatory interactions, and with regulatory networks. Currently this environment emphasizes prokaryotic rather than eukaryotic regulatory mechanisms. At the core of our efforts is a regulation ontology for capturing regulatory interactions in a declarative, computable fashion. A set of interactive editing tools allows curation of regulatory interactions and the molecules they regulate. We have also developed computational tools for interrogating and displaying individual regulatory interactions, and genome-scale regulatory networks.
These tools have been implemented in the Pathway Tools software, which is a comprehensive systems-biology software environment for management, analysis, and visualization of integrated collections of genome, pathway, and regulatory data. It supports creation, curation, dissemination and Web-publishing of organism-specific databases, called Pathway/Genome Databases (PGDBs), that integrate many types of data. It performs computational inferences, including prediction of metabolic pathways, prediction of metabolic pathway hole fillers, and prediction of operons. The software also supports the development of metabolic-flux models using flux-balance analysis.
Major new features
Regulation Summary Diagram (section The regulation summary diagram)
Specialized Signaling Pathway display and editing (section Pathway diagrams)
Port of Regulatory Overview (section Regulatory overview) to BioCyc website
Object Group Operations and Regulation Enrichment Analysis (section Object group operations and regulation-enrichment analysis)
Export of Regulatory Network to XGMML (Cytoscape) (section Export to cytoscape)
Ranking Genes According to Regulatory Influence (section Ranking genes according to regulatory influence)
Web services access to regulatory data
The implementation of the regulation operations within Pathway Tools follows the same implementation approach as described previously.
The Pathway Tools schema (ontology) organizes biological information in a structured fashion, so that data can be made readily accessible for computational analysis. The ontology is designed to enable high-fidelity representation of regulatory relationships. It is also designed to represent incomplete information (e.g., we might know that a given transcription factor controls all the genes within an operon without knowing the location of the promoter for that operon). Currently, the ontology is qualitative: it does not capture quantitative information about regulation.
The Regulation class defines several relationship slots that are inherited by all of its subclasses and instances. The slot Regulator specifies the regulator object in the regulatory interaction (such as a protein or a small molecule). The slot Regulated-Entity specifies the object whose activity is being regulated (such as a gene, a transcription unit (TU; see note a), a reaction, or a catalysis object). The slot Mode indicates whether the regulation is positive (activating), negative (inhibitory) or unknown. Subclasses of the Regulation class define additional slots specific to those types of regulatory interactions. A few of the major subclasses are described below.
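As a rough illustration of the three shared slots, the base class can be pictured as a small record type. This is only a sketch in Python, not the Pathway Tools frame representation (which is written in Common Lisp); the class layout and the example identifiers `TF-X` and `promoter-P1` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """Values of the Mode slot."""
    POSITIVE = "+"   # activating
    NEGATIVE = "-"   # inhibitory
    UNKNOWN = "?"


@dataclass(frozen=True)
class Regulation:
    """Slots shared by every regulatory interaction (illustrative only)."""
    regulator: str          # e.g. a protein or a small molecule
    regulated_entity: str   # e.g. a gene, TU, reaction, or catalysis object
    mode: Mode = Mode.UNKNOWN


# Hypothetical example: a transcription factor repressing a promoter.
interaction = Regulation(
    regulator="TF-X",
    regulated_entity="promoter-P1",
    mode=Mode.NEGATIVE,
)
print(interaction.mode.name)
```

Subclasses would extend this record with their own slots, such as a mechanism or a binding site.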
Regulation of Enzymatic Activity: This class defines substrate-level modulation of an enzyme. Its Mechanism slot indicates whether regulation is allosteric, competitive, etc. Because many purely in vitro activators and inhibitors are reported in the literature, an additional slot indicates whether or not the regulation is physiologically relevant in vivo.
Transcription Factor Binding: This class represents the binding of a regulator to a DNA binding site in order to regulate the binding of RNA polymerase to a promoter and subsequent transcription. The regulator is the transcription factor — when the ligand that activates or deactivates the transcription factor is known, that information is indicated by specifying as the regulator the database object representing the appropriate chemically modified form of the transcription factor. An additional slot, Associated-Binding-Site, provides a link to the binding site. The regulated entity here is the promoter object. The process of transcription is not explicitly represented in the Pathway Tools schema. Rather, regulation of a promoter implies regulation of all genes in the TU governed by that promoter (the promoter object indicates the sigma factor that recognizes that promoter).
Transcriptional Attenuation: Attenuation is the premature termination of transcription. In most cases, the presence or absence of a regulator determines whether the mRNA secondary structure of the attenuator region forms a terminator or anti-terminator structure. Only genes downstream from the potential terminator are regulated by attenuation. Thus for this class, we consider the terminator to be the regulated entity: it is implicit that regulation of a terminator affects all downstream genes in the same TU. The Regulation of Attenuation class contains six subclasses, each describing a different mechanism of attenuation. Some subclasses have additional slots for the genome coordinates of the anti-terminator (and anti-anti-terminator). The Ribosome-Mediated Attenuation subclass has a slot for the ribosome pause site, whereas other subclasses have a slot that identifies the mRNA binding site recognized by the regulator.
Regulation of Translation: As with transcription, neither the process of translating mRNA to protein nor mRNA itself is explicitly represented in the Pathway Tools schema. Rather, the Regulated-Entity for translational regulation is the TU (a collection of genes transcribed together, governed by a single promoter) or an individual gene. In general, regulation of translation occurs either by blocking or unblocking the binding of the ribosome (directly or indirectly), or by stabilizing or destabilizing the mRNA, thereby governing whether or not it can be translated before it is degraded. Because these two mechanisms often occur in concert, we did not define separate subclasses for them. Rather, the Mechanism slot indicates whether the regulation is by ribosome-blocking, mRNA-degradation or both. The Associated-Binding-Site slot links to the mRNA binding site for the regulator. Subclasses of the Regulation of Translation class allow for additional slots. For example, the RNA-Mediated Translation Regulation class includes slots that identify accessory proteins and associated RNases for a given interaction.
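The subclasses above differ in which object serves as the regulated entity, and therefore in which genes are implicitly affected: a promoter or a whole TU implies every gene in the TU, a terminator implies only the downstream genes, and a gene implies only itself. The following Python sketch shows one way to resolve that implication. The `TranscriptionUnit` record, its field names, and the example identifiers are assumptions made for illustration and are not part of the Pathway Tools schema.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionUnit:
    """Hypothetical, simplified view of a TU."""
    name: str
    promoter: str
    genes: tuple        # gene names in transcription order
    terminators: dict   # terminator name -> index of first downstream gene


def affected_genes(tu, regulated_entity):
    """Return the genes in `tu` implied by regulating `regulated_entity`."""
    if regulated_entity in (tu.promoter, tu.name):
        return list(tu.genes)                  # promoter or whole TU: every gene
    if regulated_entity in tu.terminators:
        start = tu.terminators[regulated_entity]
        return list(tu.genes[start:])          # attenuation: downstream genes only
    if regulated_entity in tu.genes:
        return [regulated_entity]              # regulation of a single gene
    return []


tu = TranscriptionUnit(
    name="TU-1",
    promoter="P1",
    genes=("geneA", "geneB", "geneC"),
    terminators={"T1": 1},   # T1 lies between geneA and geneB
)
print(affected_genes(tu, "P1"))   # all three genes
print(affected_genes(tu, "T1"))   # geneB and geneC
```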
Some proteins are regulated, not by any of the mechanisms described above, but by post-translational modifications, ligand-binding, or sequestering in various ways. Rather than try to model these phenomena using children of the Regulation class, we represent these interactions explicitly as individual reaction objects.
The use of the Regulation class can be considered a level of abstraction above that of using individual reactions. We could have chosen instead to explicitly represent each of the phenomena above as individual reactions; for example, explicitly modeling the binding of a transcription factor to its binding site or the conversion of one RNA secondary structure to another as reactions. How do we decide when it is preferable to represent a regulatory phenomenon as discrete reactions and when to simplify by using a Regulation frame? A Regulation frame represents a biological idiom: it is shorthand for a set of largely stereotyped interactions in which most of the details remain the same in each example, with only a few key differences (e.g., the identity of the regulator or regulated entity, the location of the binding site). By using such idioms, we not only reduce the complexity of the model, making it easier to understand and manipulate, but we also highlight commonalities that might not otherwise be obvious. For example, the complete set of interactions that would be needed to represent regulation of transcription would look very different from the set of interactions needed to represent regulation of translation. By modeling both as Regulation objects, however, it becomes clear that the most important aspect is that the presence or absence of the regulator affects whether or not a particular protein is produced.
However, the idiom becomes less useful when it is not appreciably simpler than the underlying interactions, or when it costs too much in the way of representational power. It is no simpler to say that one protein inhibits another by sequestering it in a complex than to represent the formation of the complex as a reaction; by representing the reaction explicitly, we can then incorporate it into a larger signaling pathway when appropriate. Thus, the boundaries are not always clear-cut; for example, competitive inhibition of an enzyme could be modeled relatively simply as competing reactions of the enzyme with its substrate or its inhibitor, but since enzyme modulation is a fairly well-understood idiom, and because we have chosen not to explicitly model the formation of the enzyme-substrate complex, we use a Regulation object for this kind of interaction. We use the regulation abstraction when we consider its value to outweigh its cost. The end result is that to determine the full set of regulatory influences on a given protein, we must consider the Regulation objects that affect it or its gene, and the reactions in which it participates.
Our representation of regulation is still a work in progress. For example, although we have created a class for regulation of protein degradation, it has not yet been fleshed out with the details a curator might wish to capture. Nor can we yet represent regulation in which the regulator is an environmental condition (e.g., temperature, pH) instead of a protein or small molecule. We expect to address these issues in the future.
Table: Selected statistics on regulation data content for several organisms (including E. coli K-12). Rows: Transcription factor binding; Allosteric regulation of RNA-polymerase; Genes with ≥ 1 transcriptional/translational regulator; Percent of genome; Transcriptional or translational regulators; Enzymes subject to modulation.
Because different users are interested in different aspects of regulation, we have found that there is no single visualization that best captures all regulation data. Some users may be primarily interested in the local effects of operon structure on transcription or translation, whereas others are interested in a wider view of all the regulatory effects on a protein, possibly including indirect regulators. Still others may be more interested in a pathway-based or an organism-wide view of regulation. Thus, we have developed a range of visualizations, each with a different focus and a different level of detail. All of our diagrams except for signaling pathway diagrams are computationally generated based on queries to the regulatory interactions in a given PGDB.
The metabolic pathway diagrams generated by Pathway Tools can incorporate regulatory information. Pathway enzymes that are subject to regulation (whether at the substrate level or at the expression level) have a small plus or minus sign inside a circle next to their names. If the user passes the mouse over the icon, a tooltip appears indicating the regulator and the type of regulation. If an enzyme is regulated by some substrate in the same pathway, such as in the case of feedback inhibition, an arrow is drawn from the substrate to the enzyme it regulates. An example pathway that includes this kind of regulatory information can be found at http://biocyc.org/ECOLI/new-image?object=ARG%2bPOLYAMINE-SYN.
Signaling pathways consist of sets of reactions that form a regulatory cascade. Pathway Tools has specialized editing and display tools for signaling pathways. Signaling pathway diagrams are constructed manually by a curator. A number of two-component response regulator systems in E. coli and B. subtilis are represented as signaling pathways; an example is available at http://biocyc.org/ECOLI/new-image?object=PWY0-1493.
The Regulatory Overview diagram presents a global picture of transcriptional regulation across the entire organism. Genes are clustered into groups on the basis of the set of transcription factors and sigma factors that regulate them, and arrows denoting regulatory interactions can be selectively drawn between genes of interest and the genes that regulate or are regulated by them (directly or indirectly, depending on the user’s preference).
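The grouping step described above can be pictured as bucketing genes by the exact set of factors that regulate them. The Python sketch below shows that idea only; whether the actual layout code groups genes in precisely this way is an assumption, and the gene and factor names are hypothetical.

```python
from collections import defaultdict


def cluster_by_regulators(regulators_of):
    """Group genes that share an identical set of regulators.

    regulators_of: dict mapping gene -> iterable of transcription factors
    and sigma factors that regulate it.
    """
    clusters = defaultdict(list)
    for gene, regulators in regulators_of.items():
        clusters[frozenset(regulators)].append(gene)
    return {key: sorted(genes) for key, genes in clusters.items()}


clusters = cluster_by_regulators({
    "g1": {"TF1", "sigma70"},
    "g2": {"sigma70", "TF1"},
    "g3": {"sigma70"},
})
for regulators, genes in clusters.items():
    print(sorted(regulators), genes)
```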
Users can color the Regulatory Overview diagram to show the results of a gene expression experiment in a regulatory context. This Omics Viewer mode can be used for all genes or for a subnetwork of genes to allow visual interrogation of the correlations among gene expression measurements and known regulatory interactions.
This section describes components of Pathway Tools that analyze a collection of regulatory data to generate new biological insights. In addition, the Pathway Tools Application Programmer Interface (API) provides a rich set of operations to enable users to develop their own computational analyses, interacting with PGDB data via Lisp, Perl, Java, or our Web services interface.
Currently, analysis capabilities are limited because our ontology does not include quantitative data, nor does it describe how the effects of multiple regulatory elements combine. For example, if a protein has both an activator and an inhibitor, what is the effect if both are present? Does one override the other? Our model does not provide answers to these questions. Nonetheless, a variety of interesting qualitative deductions can be made with data encoded using our regulatory-interaction ontology.
Suppose a scientist has identified a set of genes of interest (perhaps the genes behaved similarly in a gene-expression or other high-throughput experiment) and wants to find out more about how those genes are related. One reason a group of genes might behave similarly is that they are subject to similar regulatory influences. Thus, it is natural to ask: what is the set of regulatory influences on a group of genes? Which genes not in the original set are also subject to the same regulatory influences, and what are the differences among them? Performing this kind of analysis is straightforward using the Pathway Tools Groups facility. A user can create a group of genes, for example, by uploading an omics dataset and selecting all genes whose expression level exceeds a threshold. Once the group has been specified, the user can ask for the complete set of regulators (transcriptional or translational) that regulate any gene in the group. The user can choose to retrieve only direct regulators of the gene group, or both direct and indirect regulators; the user can also extend the original group to include all genes in the same operon as the original genes. The user can proceed in the opposite direction, too: given the resulting set of regulators (or some subset of them), retrieve the complete set of regulated genes (again, either directly only, or both directly and indirectly). Or, if only a handful of regulators seem to be relevant, the user can create groups of genes regulated by each, and then combine them by taking either the union or intersection. The user can then subtract the original group from the full set of regulated genes to obtain a comparison group. (Note that although all the base queries and transformations are available both through the BioCyc website and in the locally installed software, the abilities to combine and subtract groups are currently available only in the locally installed software.)
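The group transformations described in this paragraph map naturally onto set operations over a regulator-to-target edge list. The following Python sketch mimics them outside Pathway Tools; the function names, the edge-list input, and the gene names are all hypothetical and do not correspond to the Groups facility's actual interface.

```python
from collections import deque


def _reach(start, adjacency, indirect):
    """Collect neighbours of `start`; follow further steps when `indirect`."""
    found, queue = set(), deque(start)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in found:
                found.add(nxt)
                if indirect:
                    queue.append(nxt)
    return found


def regulators_of(genes, edges, indirect=False):
    """Regulators of any gene in `genes`; edges are (regulator, target) pairs."""
    incoming = {}
    for regulator, target in edges:
        incoming.setdefault(target, set()).add(regulator)
    return _reach(genes, incoming, indirect)


def regulated_by(regulators, edges, indirect=False):
    """Genes regulated by any member of `regulators`."""
    outgoing = {}
    for regulator, target in edges:
        outgoing.setdefault(regulator, set()).add(target)
    return _reach(regulators, outgoing, indirect)


edges = [("tfA", "tfB"), ("tfB", "g1"), ("tfB", "g2"),
         ("tfC", "g2"), ("tfC", "g3")]
group = {"g1", "g2"}

direct = regulators_of(group, edges)                    # tfB, tfC
everything = regulators_of(group, edges, indirect=True) # adds tfA
comparison = regulated_by(direct, edges) - group        # subtract original group
shared = regulated_by({"tfB"}, edges) & regulated_by({"tfC"}, edges)
print(sorted(direct), sorted(everything), sorted(comparison), sorted(shared))
```

Union, intersection and subtraction of groups are ordinary set operators here (`|`, `&`, `-`).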
Another way to identify the most relevant set of regulators of a group of genes is to run a novel type of enrichment analysis that we have developed called regulation enrichment analysis. This analysis determines if a gene group is statistically enriched for containing genes regulated by certain regulators, relative to the complete set of genes and their regulators in the organism. As described above, the user can specify whether to consider direct regulation only, or both direct and indirect regulation. The regulation enrichment analysis can be used in isolation, or in combination with enrichment analysis for metabolic pathways and Gene Ontology terms. The results of the enrichment analysis form a new group of regulator genes that can be further manipulated in the manner described in the previous paragraph.
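The text does not state which statistic the enrichment analysis uses. A common choice for this kind of question is a one-sided hypergeometric test, and the Python sketch below assumes that choice purely for illustration; it also applies no multiple-testing correction. The function names and the toy data are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are regulated."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def regulation_enrichment(group, targets_of, all_genes):
    """Rank regulators by how over-represented their targets are in `group`.

    targets_of: dict mapping regulator -> set of genes it regulates.
    Returns (regulator, overlap, p_value) tuples sorted by p_value.
    """
    universe = set(all_genes)
    group = set(group) & universe
    results = []
    for regulator, targets in targets_of.items():
        targets = set(targets) & universe
        overlap = len(group & targets)
        if overlap:
            p = hypergeom_upper_tail(overlap, len(universe), len(targets), len(group))
            results.append((regulator, overlap, p))
    return sorted(results, key=lambda row: row[2])


all_genes = {f"g{i}" for i in range(1, 11)}
targets_of = {
    "tfA": {"g1", "g2", "g3"},
    "tfB": {"g1", "g5", "g6", "g7", "g8"},
}
results = regulation_enrichment({"g1", "g2", "g3"}, targets_of, all_genes)
for regulator, overlap, p in results:
    print(regulator, overlap, round(p, 4))
```

To consider indirect regulation as well, `targets_of` would simply hold each regulator's direct and indirect targets.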
Additional group operations make it possible to determine the set of post-translational regulatory influences on a group of genes. A group of genes can be transformed to the group of proteins the genes code for. For that group of proteins, we can then ask for their substrate-level activators, inhibitors, cofactors, and ligands. And for a group of metabolites, we can ask for the set of proteins that bind the metabolites, or for the set of enzymes the metabolites activate or inhibit.
These group operations could not be performed without the regulation ontology.
Given a set of regulatory interactions, we can create a directed graph representing the global regulatory network for an organism. In this graph, an edge from gene A to gene B means that gene A regulates gene B in some way. We have written software to generate such a graph containing all transcriptional and translational influences of one gene on another. Edges are labeled as either activating, inhibiting, dual (both activating and inhibiting, depending on context), or unknown regulatory effect. The graph can be used for analyses such as the one described in section Ranking genes according to regulatory influence below, or can be exported to XGMML format, which can then be imported into Cytoscape. Cytoscape is a generic network visualization package popular for displaying, visually querying, and analyzing (via plug-ins) biological networks. By exporting the regulatory network to Cytoscape, users gain access to a number of third-party Cytoscape plug-ins (e.g. [10, 11]) that analyze regulatory networks in the context of experimental data such as expression data.
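To make the export step concrete, the Python sketch below serializes a small labeled edge list as a minimal XGMML document using only the standard library. It is not the Pathway Tools exporter; the attribute name `effect`, the graph label, and the gene names are assumptions, and a real export would likely carry more node and edge attributes.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def to_xgmml(edges, label="regulatory-network"):
    """Serialize (regulator, target, effect) triples as an XGMML string.

    effect is one of "activating", "inhibiting", "dual", "unknown".
    """
    edges = list(edges)
    graph = ET.Element("graph", {"label": label, "directed": "1", "xmlns": XGMML_NS})

    # Emit one node per gene, assigning sequential ids.
    ids = {}
    for regulator, target, _ in edges:
        for gene in (regulator, target):
            if gene not in ids:
                ids[gene] = str(len(ids) + 1)
                ET.SubElement(graph, "node", {"id": ids[gene], "label": gene})

    # Emit one edge per interaction, with the effect stored as an attribute.
    for regulator, target, effect in edges:
        edge = ET.SubElement(graph, "edge", {
            "source": ids[regulator],
            "target": ids[target],
            "label": f"{regulator} ({effect}) {target}",
        })
        ET.SubElement(edge, "att", {"name": "effect", "type": "string", "value": effect})

    return ET.tostring(graph, encoding="unicode")


xgmml_text = to_xgmml([
    ("geneA", "geneB", "activating"),
    ("geneB", "geneC", "inhibiting"),
])
print(xgmml_text)
```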
How can we assess the degree of regulatory influence of each gene within the regulatory network? Although we lack quantitative information, we posit that the influence of a gene is proportional to the number of genes it directly regulates and, to a lesser extent that falls off with the number of intervening steps, to the number of genes that it indirectly regulates. In addition, if a gene is the sole regulator of some other gene, its influence is likely to be greater than if it is just one of many regulators for that gene. We can compute an influence score for every gene in the regulatory network using a simplified version of the approach used in Google's PageRank algorithm.
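The exact scoring formula is not given in this text, so the following Python sketch is an assumption that merely satisfies the three stated properties: credit for each direct target, geometrically decaying credit for indirect targets, and credit divided among the co-regulators of a target. The damping value, the iteration count, and the gene names are hypothetical.

```python
def influence_scores(edges, damping=0.5, iterations=50):
    """Score genes by regulatory influence over (regulator, target) edges.

    Assumed update rule (not taken from the paper):
        score(g) = sum over targets t of g of
                   (1 + damping * score(t)) / number_of_regulators(t)
    """
    edges = set(edges)
    targets, n_regulators, genes = {}, {}, set()
    for regulator, target in edges:
        genes.update((regulator, target))
        targets.setdefault(regulator, set()).add(target)
        n_regulators[target] = n_regulators.get(target, 0) + 1

    scores = dict.fromkeys(genes, 0.0)
    # Repeated passes let credit flow upstream; damping < 1 keeps cycles bounded.
    for _ in range(iterations):
        scores = {
            g: sum((1.0 + damping * scores[t]) / n_regulators[t]
                   for t in targets.get(g, ()))
            for g in genes
        }
    return scores


scores = influence_scores([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

In this toy network A scores highest because it is the sole regulator of B and C and also gains indirect credit for D, while B and C split the credit for D.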
Table: The 20 genes in EcoCyc and BsubCyc with the highest influence scores. Gene product descriptions, in the order listed: cAMP receptor protein; SOS response transcriptional repressor; sporulation-specific sigma-29 factor; integration host factor, α-subunit; sporulation-specific sigma-K factor precursor (N-terminal half); integration host factor, β-subunit; sporulation-specific sigma-K factor precursor (C-terminal half); sporulation-specific transcriptional regulator; stress response regulator; regulator of transition state genes; factor for inversion stimulation; acid resistance system regulator; repressor of comK; ferric uptake regulator; competence transcription factor; anaerobic response regulator; two-component response regulator; histone-like nucleoid structuring protein; GTP and BCAA-dependent transcriptional regulator; aerobic to anaerobic transition regulator; post-exponential phase response regulator; chromosomal replication initiator protein; positive regulator of comK; extracellular protease production and sporulation regulator; Mrp family regulator; acid resistance system regulator; sporulation-specific sigma-G factor.
If we want to determine the overall effect of a regulator on a gene, we need to construct a different graph, one that considers the parity of the regulation (activating or inhibiting) and includes all possible ways by which one entity can regulate another. We have written such a program (see note b) that constructs the graph of all factors that unambiguously positively or negatively affect a given gene product, either directly or indirectly. Effects are considered to be multiplicative in terms of their parity: if A inhibits B, and C inhibits A, then we consider C an indirect activator of B. If A activates B by one path and inhibits B by another, then in the absence of any quantitative information, we cannot determine the overall effect (if any) of A on B; accordingly, we declare its regulatory effect as unknown, and further regulatory effects on A are ignored. Alternative hypotheses are that the overall effect of A on B is determined by the shortest path from A to B when paths are of different lengths, or by a majority-rules test when more activating than inhibiting paths exist or vice versa; we have not implemented any of these alternatives.
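The parity rule can be sketched as a memoized traversal over signed edges. This Python version is an illustration, not the prototype described above: it assumes the network is acyclic (back-edges are simply skipped), encodes activation as +1 and inhibition as -1, and uses hypothetical names throughout.

```python
def net_effects(edges, target):
    """Net effect of every upstream factor on `target`.

    edges: (regulator, regulated, sign) triples with sign +1 or -1.
    Returns {factor: +1 | -1 | 0}; 0 means unknown (conflicting paths).
    Factors that reach the target only through an unknown factor are omitted.
    """
    outgoing = {}
    for regulator, regulated, sign in edges:
        outgoing.setdefault(regulator, []).append((regulated, sign))

    memo, in_progress = {}, set()

    def effect(node):
        if node in memo:
            return memo[node]
        in_progress.add(node)
        signs = set()
        for nxt, sign in outgoing.get(node, ()):
            if nxt == target:
                signs.add(sign)                   # direct effect
            elif nxt not in in_progress:          # skip back-edges (cycles)
                downstream = effect(nxt)
                if downstream in (1, -1):         # ignore unknown or unconnected
                    signs.add(sign * downstream)  # parities multiply
        in_progress.discard(node)
        memo[node] = None if not signs else (signs.pop() if len(signs) == 1 else 0)
        return memo[node]

    result = {}
    for node in outgoing:
        if node != target:
            value = effect(node)
            if value is not None:
                result[node] = value
    return result


edges = [
    ("A", "B", -1),   # A inhibits B
    ("C", "A", -1),   # C inhibits A, so C indirectly activates B
    ("D", "B", +1),   # D activates B directly ...
    ("D", "A", +1),   # ... and inhibits it via A: unknown
    ("E", "D", +1),   # E acts only through D, so it is ignored
]
effects = net_effects(edges, "B")
print(effects)
```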
This algorithm takes a very broad view of regulation; in addition to transcription factors, translational regulators and substrate-level modulators, we also consider sigma factors and producing/consuming reactions to be regulating influences. If an entity is a product of a reaction, then both the reactants and the enzyme of the reaction are considered activators of the entity. If an entity is a reactant in a reaction, then the enzyme and any other reactants are considered inhibitors of the entity (because they promote its consumption). Reactions that can proceed in either direction are ignored for the purpose of this analysis. Reactions of substrates that participate in large numbers of reactions are also ignored because we consider them unlikely to be used for regulatory purposes, and omitting them simplifies the graph.
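The reaction rules in this paragraph can be turned into signed edges as sketched below in Python. Several details are assumptions: the `Reaction` record and its fields are hypothetical, the cutoff for a substrate that participates in "large numbers of reactions" is an arbitrary placeholder, and the sketch interprets the last rule as dropping such substrates from the edges rather than dropping whole reactions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reaction:
    """Hypothetical, simplified reaction record."""
    reactants: tuple
    products: tuple
    enzyme: Optional[str] = None
    reversible: bool = False


def reaction_influences(reactions, max_reactions_per_substrate=20):
    """Return signed (source, affected, sign) edges implied by reactions."""
    # Substrates used in many reactions are treated as non-regulatory hubs.
    usage = Counter()
    for rxn in reactions:
        usage.update(set(rxn.reactants) | set(rxn.products))
    hubs = {s for s, count in usage.items() if count > max_reactions_per_substrate}

    edges = set()
    for rxn in reactions:
        if rxn.reversible:
            continue                                  # direction is ambiguous
        reactants = [r for r in rxn.reactants if r not in hubs]
        products = [p for p in rxn.products if p not in hubs]
        drivers = reactants + ([rxn.enzyme] if rxn.enzyme else [])
        for product in products:
            # Reactants and enzyme promote production: activators.
            edges.update((d, product, +1) for d in drivers)
        for reactant in reactants:
            # Enzyme and other reactants promote consumption: inhibitors.
            edges.update((d, reactant, -1) for d in drivers if d != reactant)
    return edges


influence_edges = reaction_influences([
    Reaction(reactants=("S",), products=("P",), enzyme="E"),
    Reaction(reactants=("P",), products=("Q",), enzyme="F", reversible=True),
])
print(sorted(influence_edges))
```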
Most other databases containing information on biological regulation specialize in one or a few specific types of regulation; none attempt to cover the full range of regulatory interactions as Pathway Tools does. Other databases that contain regulatory network information include DBTBS, RegTransBase, TRANSFAC, CoryneRegNet, ProdoNet, TransmiR, and YEASTRACT. Most of these databases contain information on transcription-factor-based regulation only. RegulonDB contains transcription-factor-based data as well as RNA-based regulation such as information on riboswitches, attenuation and small RNA regulators. BRENDA contains extensive data on enzyme activators and inhibitors.
Most of the databases listed above offer visualizations similar to our transcription unit diagram. CoryneRegNet, ProdoNet and RegulonDB also include a network-based diagram, similar to a subset of our Regulatory Overview Diagram. CoryneRegNet provides the ability to display omics data on its regulatory network diagrams, as well as a plug-in, CoryneRegNetLoader, capable of importing its data into Cytoscape where the entire network can be visualized and analyzed. We know of no other software tool or database that is capable of generating anything similar to our Regulation Summary Diagram.
A variety of analytical tools exist for regulatory networks. BioQuali and COMA attempt to validate regulatory networks against gene expression datasets and point out inconsistencies or suggested changes. While Pathway Tools has no similar capability, BioQuali and COMA are both implemented as Cytoscape plug-ins, and therefore can accept a Pathway Tools-generated regulatory network as input. Other tools, such as DEGAS and KeyPathwayMiner, use omics data to mine protein interaction networks and attempt to infer regulatory sub-networks. While these tools can be considered somewhat analogous to our enrichment analysis, their approach is very different.
Pathway Tools provides the ability to represent and capture a wide range of regulatory data. It differs from the other software and database environments in the wider range of regulatory interactions that it supports, in the greater number of tools that it affords to manipulate those interactions, and in the value it adds by integrating different types of regulatory data with one another. Regulatory data can also be integrated with the reaction, pathway and genomic data that Pathway Tools provides. It is this integration that allows Pathway Tools to show regulatory information on pathway diagrams, and to build visualizations such as the Regulation Summary Diagram, which combines many disparate types of information.
Project name: Pathway Tools
Project home page: http://bioinformatics.ai.sri.com/ptools/
Operating system(s): MacOS, Windows, Linux
Programming language: Common Lisp
License: Free to academics; includes source code with limited rights to redistribute
Any restrictions to use by non-academics: Fee required
a. The term transcription unit refers to a set of one or more genes transcribed together from one promoter; the term operon implies more than one gene.
b. The tool described in this section is a prototype that is not currently a part of Pathway Tools.
This work was supported by award numbers GM077678 and GM075742 from the National Institute of General Medical Sciences of the National Institutes of Health. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.