In order to understand how biological systems behave, a branch of systems biology [1, 2] called "executable cell biology" [3] aims to construct computational models which mimic their behavior and which can be used for simulating, in a faithful and cost-effective way, their reactions to external stimuli. The computational model, which is built upon knowledge obtained by performing some in vitro experiments, should be complete (it should be able to reproduce all the experimental data) and correct (it should be possible to reproduce its behavior experimentally).
The correspondence between the in silico model and in vitro observed behaviors is verified by applying model checking techniques [4]. If the model is found to be inconsistent with the experimental data, it must be refined and experimentally validated again.
A notable side-effect of the model construction process is that the computational model may suggest new hypotheses about the behavior of the biological system which can then be verified by performing in vitro or in vivo experiments.
A widely studied class of biological systems comprises those that regulate the expression of genes in an organism. Their behavior is often represented by gene regulatory networks (GRNs), which describe the interactions among genes, proteins and other components at the intra-cellular level. GRNs have been successful among biologists because they are an easy-to-use and intuitive tool for representing the biological model under consideration. However, their lack of formal semantics prevents their direct use for performing reliable and consistent simulations and for model checking against experimental data.
There have been several attempts to define formal mathematical and computational frameworks for modeling GRNs. They can be classified into quantitative approaches, using differential equations or stochastic models [5], and qualitative approaches, mostly based on Boolean networks [6], Petri nets [7, 8], and Bayesian networks [9]. See [10] for a detailed survey of the modeling and analysis of GRNs. Motifs have been identified that are significantly overrepresented in biological networks [5, 11–14]. The same motifs have been found in organisms at different levels of complexity, ranging from bacteria to humans. The relationships between different types of motifs and their function have been explored in a number of simple cases, in silico and in vivo [15, 16].
Recently, Shin and Nourani [17] have used Statecharts (SCs) [18], a computational framework with a visual language and well-defined semantics, for modeling some small and recurring patterns of interactions in GRNs, called motifs [13].
Gene Regulatory Network motifs
GRN motifs are patterns of interconnections that occur in real GRNs with a frequency significantly higher than in randomly generated GRNs.
Their high frequency suggests that they play an important role in the GRN function and can, thus, be considered as its building blocks.
The functional role of most common GRN motifs has been extensively studied in some organisms, such as E. coli and other model organisms [19].
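The notion of overrepresentation used in the definition above can be illustrated with a minimal sketch. It counts feedforward loops in a directed graph given as a list of edges and compares the count with counts obtained from randomized networks by means of a z-score. The z-score is one common choice and is an assumption here, as are the function names `count_feedforward_loops` and `z_score`; the source does not prescribe a specific statistic.

```python
from statistics import mean, stdev

def count_feedforward_loops(edges):
    """Count triples (x, y, z) of distinct nodes with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    for x, y in edge_set:
        if x == y:
            continue  # self-loops cannot start a feedforward loop
        for z in nodes:
            if z not in (x, y) and (y, z) in edge_set and (x, z) in edge_set:
                count += 1
    return count

def z_score(real_count, random_counts):
    """Assumed statistic: distance of the real count from the random mean,
    measured in sample standard deviations of the random counts."""
    sigma = stdev(random_counts)
    if sigma == 0:
        raise ValueError("random counts have zero variance")
    return (real_count - mean(random_counts)) / sigma

# Hypothetical three-gene network containing exactly one feedforward loop.
example_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
example_count = count_feedforward_loops(example_edges)
```

In practice the random counts would come from applying `count_feedforward_loops` to an ensemble of randomized networks; how that ensemble is generated is left open here.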
The simple regulation motif
The simple regulation motif is one of the most basic interaction patterns. It is composed of two genes X, Y, where X regulates Y and the interaction is mediated by a signal S_X. The signal can act as an inducer molecule that binds X or can represent a modification of X which activates it. Since the regulation of X on Y is either activation or repression and S_X can mediate the regulation with either presence or absence, four possible types of motifs can be described.
A simple regulation motif is coherent if both the effects are of the same polarity, i.e. activation of Y in presence of S_X (s1 in Figure 1A) or repression of Y in absence of S_X (s2). It is incoherent if the effects are of different polarity, i.e. repression of Y in presence of S_X (s3 in Figure 1A) or activation of Y in absence of S_X (s4).
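The four cases can be enumerated with a short sketch. The function name `classify_simple_regulation` and the string encoding of effects and signal conditions are illustrative assumptions, not part of the source.

```python
from itertools import product

EFFECTS = ("activation", "repression")
SIGNAL_CONDITIONS = ("presence", "absence")

def classify_simple_regulation(effect, signal_condition):
    """Return "coherent" or "incoherent" for a simple regulation motif.

    effect: effect of X on Y, "activation" or "repression".
    signal_condition: whether X acts in "presence" or "absence" of the signal.
    """
    if effect not in EFFECTS:
        raise ValueError(f"unknown effect: {effect}")
    if signal_condition not in SIGNAL_CONDITIONS:
        raise ValueError(f"unknown signal condition: {signal_condition}")
    # Same polarity: activation with presence, or repression with absence.
    same_polarity = (effect == "activation") == (signal_condition == "presence")
    return "coherent" if same_polarity else "incoherent"

# All four motif types (s1 to s4 in the text) and their classification.
simple_motifs = {
    (effect, condition): classify_simple_regulation(effect, condition)
    for effect, condition in product(EFFECTS, SIGNAL_CONDITIONS)
}
```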
The feedback loop motif
The feedback loop motif is composed of two genes X and Y, which regulate each other, and their interactions are mediated by a signal S_X (for X regulating Y) and a signal S_Y (for Y regulating X). Since the reciprocal regulations between X and Y can be either activations or repressions we have different feedback loop motifs.
A feedback loop motif is double-positive if both the reciprocal regulations of the two genes X and Y are positive, that is, X and Y activate each other (Figure 1B, left). Similarly, a feedback loop motif is double-negative if X and Y repress each other (Figure 1B, middle). If the effects of the reciprocal regulations of the two genes X and Y are of different polarity, that is, X represses Y and Y activates X or vice versa, the feedback loop motif is said to be negative. Due to symmetry, we consider only the former negative feedback loop motif (see Figure 1B, right).
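The three kinds of feedback loop can be told apart from the signs of the two reciprocal regulations, as in the following sketch. Encoding activation as +1 and repression as -1 is an assumption made for illustration, as is the function name `classify_feedback_loop`.

```python
ACTIVATION = +1   # assumed encoding of an activating regulation
REPRESSION = -1   # assumed encoding of a repressing regulation

def classify_feedback_loop(x_on_y, y_on_x):
    """Classify a two-gene feedback loop from the signs of its regulations."""
    for sign in (x_on_y, y_on_x):
        if sign not in (ACTIVATION, REPRESSION):
            raise ValueError(f"sign must be +1 or -1, got {sign}")
    if x_on_y == ACTIVATION and y_on_x == ACTIVATION:
        return "double-positive"
    if x_on_y == REPRESSION and y_on_x == REPRESSION:
        return "double-negative"
    # Mixed polarity: one activation and one repression, in either order.
    return "negative"
```

Both mixed-sign orderings map to the same label, which mirrors the symmetry argument in the text.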
The feedforward loop motifs
The feedforward loop (FFL) motifs are commonly found in many GRNs of widely studied organisms like yeast and E. coli. They are composed of three genes X, Y, and Z, where X regulates Y and Z, and Y regulates Z. For simplicity, from now on we discuss only the motifs where the regulatory effect depends on the presence of the mediating signals, but our findings also apply to the cases of their absence. Each type of regulation can be either activation or repression. Here we use the term coherent (resp. incoherent) to denote the case where the sign of the direct regulation from X to Z is the same as (resp. the opposite of) the overall sign of the indirect regulation path through Y, as in the seminal paper of Mangan and Alon [20]. Out of the eight possible FFL motifs, the most frequently encountered ones [20] are the coherent type-1 FFL motif c1 and the incoherent type-1 FFL motif i1, both shown in Figure 1C.
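The coherence criterion amounts to comparing the sign of the direct edge with the product of the signs along the indirect path. A minimal sketch follows, assuming activation is encoded as +1 and repression as -1; the function name `classify_ffl` is hypothetical.

```python
from itertools import product

ACTIVATION = +1   # assumed encoding of an activating regulation
REPRESSION = -1   # assumed encoding of a repressing regulation

def classify_ffl(x_to_y, y_to_z, x_to_z):
    """Return "coherent" if the direct sign X->Z equals the overall sign
    of the indirect path X->Y->Z, and "incoherent" otherwise."""
    for sign in (x_to_y, y_to_z, x_to_z):
        if sign not in (ACTIVATION, REPRESSION):
            raise ValueError(f"sign must be +1 or -1, got {sign}")
    indirect_sign = x_to_y * y_to_z
    return "coherent" if indirect_sign == x_to_z else "incoherent"

# Enumerate the eight sign combinations (x_to_y, y_to_z, x_to_z).
all_ffls = {
    signs: classify_ffl(*signs)
    for signs in product((ACTIVATION, REPRESSION), repeat=3)
}
```

Under this encoding, three activations give a coherent loop, while replacing the Y to Z regulation with a repression gives an incoherent one; these correspond to the usual descriptions of c1 and i1.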
The combination of the regulations on gene Z by genes X and Y can be given different interpretations [20]. In the following we will assume that such regulations are combined using the AND logic function, as in the arabinose system of E. coli [21]. Although other functions seem to be more appropriate for use in other systems, the AND and OR functions are sufficient to explain the most distinctive properties of FFLs.
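One possible Boolean reading of the AND combination is sketched below. It assumes that an activator permits expression of Z when it is active and that a repressor permits it when it is inactive; this interpretation, the +1/-1 sign encoding, and the names `regulator_permits` and `z_expressed_and` are assumptions made for illustration.

```python
def regulator_permits(regulator_active, sign):
    """Assumed rule: an activator (+1) permits expression when active,
    a repressor (-1) permits expression when inactive."""
    if sign == +1:
        return regulator_active
    if sign == -1:
        return not regulator_active
    raise ValueError(f"sign must be +1 or -1, got {sign}")

def z_expressed_and(x_active, y_active, x_to_z, y_to_z):
    """Z is expressed only if both regulators permit it (AND logic)."""
    return regulator_permits(x_active, x_to_z) and regulator_permits(y_active, y_to_z)
```

With two activators, Z requires both X and Y to be active; when Y represses Z, Z requires X active and Y inactive.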
The autoregulation motifs
The characteristic element of an autoregulation motif is a gene regulating itself. The autoregulation motif is positive if Y activates itself (see par in Figure 1D) and is negative if Y represses itself (see nar in Figure 1D).
Statecharts
SCs extend state transition diagrams by adding concurrency (i.e., the capability of representing a state as made up by smaller components all active at the same time) and hierarchy (i.e., the possibility of representing a state with a set of more detailed substates). The hierarchical structuring capabilities of SCs allow one to model systems at different levels of detail, while concurrency is useful for modeling multiple, mostly independent, portions of a system. Moreover, SCs are compositional, that is, they can be defined in terms of other SCs, thus making the specifications more reusable.
These additional features, if correctly exploited, provide a solution to the scalability problems of other computational modeling techniques, such as those based on Boolean networks and Petri nets, whose effectiveness rapidly decreases when applied to larger systems [3].
We now summarize some of the SC features that we believe are essential to understand their potential. Please refer to [18] for more complete and detailed information.
An SC is composed of states and of transitions between states. A state is composite if it contains other states, and simple otherwise. A composite state is parallel if its sub-states are executed concurrently, and exclusive if exactly one of its sub-states is executed at a time. The overall state of an SC is given by all the simple (atomic) states currently under execution.
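The state hierarchy described above can be sketched as a small data structure. This is a minimal illustration rather than a full SC implementation; the `State` class, the `active_child` field and the organism sub-state names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class State:
    name: str
    kind: str = "simple"  # "simple", "exclusive" or "parallel"
    children: List["State"] = field(default_factory=list)
    active_child: Optional[str] = None  # used only by exclusive states

def active_simple_states(state):
    """Collect the names of the simple states currently under execution."""
    if state.kind == "simple":
        return [state.name]
    if state.kind == "parallel":
        # All sub-states of a parallel state are active at the same time.
        names = []
        for child in state.children:
            names.extend(active_simple_states(child))
        return names
    if state.kind == "exclusive":
        # Exactly one sub-state of an exclusive state is active.
        for child in state.children:
            if child.name == state.active_child:
                return active_simple_states(child)
        raise ValueError(f"{state.name}: no active child named {state.active_child}")
    raise ValueError(f"unknown state kind: {state.kind}")

# Hypothetical organism with two concurrent regions.
organism = State("Organism", "parallel", [
    State("Movement", "exclusive", [State("Still"), State("Moving")], "Moving"),
    State("Feeding", "exclusive", [State("Hungry"), State("Eating")], "Hungry"),
])
configuration = active_simple_states(organism)
```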
Transitions are used to specify how a system evolves changing its internal state according to the external stimuli. They can be labeled by events which trigger their activation and the consequent change of state of the system, conditions for their applicability, and actions to be performed during their execution.
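A transition labeled with an event, a condition and an action can be sketched as follows. The `Transition` class, the `step` function and the feeding example (state names, event name, `food` counter) are hypothetical and only illustrate the three kinds of label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, int]

def always(ctx):
    """Default condition: the transition is always applicable."""
    return True

def do_nothing(ctx):
    """Default action: no side effect."""
    return None

@dataclass
class Transition:
    source: str
    target: str
    event: str                                   # event that triggers the transition
    condition: Callable[[Context], bool] = always   # guard on applicability
    action: Callable[[Context], None] = do_nothing  # performed when it fires

def step(current, event, transitions, ctx):
    """Fire the first transition enabled in `current` by `event`, if any."""
    for t in transitions:
        if t.source == current and t.event == event and t.condition(ctx):
            t.action(ctx)
            return t.target
    return current  # no enabled transition: the state is unchanged

def consume_food(ctx):
    ctx["food"] -= 1

# Hypothetical feeding behavior: eat only if some food is available.
feeding = [
    Transition("Hungry", "Eating", "food_found",
               condition=lambda ctx: ctx["food"] > 0,
               action=consume_food),
    Transition("Eating", "Hungry", "meal_finished"),
]
```

For instance, calling `step("Hungry", "food_found", feeding, {"food": 1})` moves to `"Eating"` and decrements the counter, while the same call with `{"food": 0}` leaves the state unchanged because the condition fails.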
SCs have an intuitive graphical representation: Figure 2A shows an SC that models the movement and feeding of an organism by means of two concurrent substates.
SCs have very good software tool support [22–27], which can be used to generate source code (e.g. in Java) whose execution corresponds to the SC semantics, and to interactively simulate the system execution. SCs have been extensively studied in software and systems engineering, and have proven particularly well suited for modeling and designing reactive systems, that is, systems which evolve by reacting to internal or external events or to changed conditions. In the case of GRNs these events can be, for example, the introduction or removal of a protein or of another component.
SCs have also been successfully used to model pancreatic organogenesis in the embryonic mouse [28], cell fate specification during C. elegans vulval development [29], and T-cell development in the thymus [30].
Shin and Nourani have used SCs to model GRN motifs [17]. In their approach, each element (gene, protein, signal) can be in one of two states: "on", which means that the gene is expressed or that the protein is present and active, and "off", which means that the gene is not expressed or that the protein is absent or present in an inactive form.
Moreover, activating interactions in GRNs are translated to transitions from the "off" state to the "on" state for the gene being activated. Similarly, inhibiting interactions correspond to transitions from the "on" state to the "off" state.
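The translation of interactions into on/off transitions can be sketched as follows. This is not the authors' model: the function name `apply_interaction` is hypothetical, and the assumption that an interaction takes effect only while its regulator is "on" is made here for illustration.

```python
def apply_interaction(target_state, kind, regulator_on):
    """Return the new state ("on" or "off") of the regulated element.

    kind: "activation" (off -> on) or "inhibition" (on -> off).
    regulator_on: assumed trigger; the interaction applies only if True.
    """
    if target_state not in ("on", "off"):
        raise ValueError(f"unknown state: {target_state}")
    if kind not in ("activation", "inhibition"):
        raise ValueError(f"unknown interaction kind: {kind}")
    if not regulator_on:
        return target_state
    if kind == "activation" and target_state == "off":
        return "on"
    if kind == "inhibition" and target_state == "on":
        return "off"
    return target_state  # already in the state the interaction leads to
```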
Their SC model of the coherent simple regulation motifs s1 and s2 is shown in Figure 2B, which in their approach also represents the autoregulation motifs.