Incremental and unifying modelling formalism for biological interaction networks

Background: An appropriate choice of the modeling formalism from the broad range of existing ones may be crucial for efficiently describing and analyzing biological systems. Results: We propose a new unifying and incremental formalism for the representation and modeling of biological interaction networks. This formalism allows automated translations into other formalisms, thus enabling a thorough study of the dynamic properties of a biological system. As a first illustration, we propose a translation into R. Thomas' multivalued logical formalism, which provides a possible semantics; a methodology for constructing such models is presented on a classical benchmark: the λ phage genetic switch. We also show how to extract from our model a classical ODE description of the dynamics of a system. Conclusion: This approach provides an additional level of description between the biological and mathematical ones. It yields, on the one hand, an expression of knowledge in a form which is intuitive for biologists and, on the other hand, its representation in a formal and structured way.


Background
Often, modeling approaches in biology try to fit the data into the Procrustean bed of a particular modeling formalism [1][2][3][4][5]. However, if the area of interest changes, the modeling process has to be continued (or even restarted) using a different modeling language, better adapted to the new area. An appropriate choice of the modeling formalism may be crucial for efficiently describing biological systems: it avoids changing the description language and makes it possible to reuse previous work.
In this paper, we propose a modeling formalism for biologists that enables the expression of various types of biological knowledge in a formal manner and its translation into target formalisms for analysis or simulation. It aims at satisfying the following requirements:
• universality: the integration of various kinds of biological data available today;
• parsimony: the simplest possible representation of the data;
• incrementality: the construction of more complex models from simpler ones;
• precision: expression of relations in a non-ambiguous (mathematical) way;
• transposability: formal rules for the translation of the information contained in the model into commonly used (target) modeling formalisms.
In such a formalism, the model can be seen as a well-organised knowledge base of information about the biological system. Every unit of information inside the model (one which loses its biological sense when divided) can be called a datum. In this approach, we assume that there is neither contradictory nor "bad" data. In other words, every measurement and every observation may be true in some context.
Our approach, called Modular Interaction Network (MIN), is a formalism designed to represent biological data. It has a bipartite network structure and admits a graphical representation, although it is not focused on it. MIN enables the integration of microscopic (molecular interactions) and macroscopic (system states) data, and thus provides the desired level of abstraction. This abstraction avoids the rather common problem of explosion of the model complexity [6]. MIN has a limited number of node and edge types, which makes it possible to represent biological networks in a simple way, while more detailed information can still be stored and recovered. MIN is suited to the representation of genetic regulation as well as of metabolism with multi-molecular biological processes, in a natural and incremental manner. MIN also comes with algorithms for translation into two classical modeling formalisms: multilevel logical modeling [7] and differential equations. These translations can be performed at any stage of the modeling process.
The paper is structured as follows. After recalling the biology of the λ phage, which will be used as a running example, the formal MIN model is introduced. Next, the multilevel logical approach is recalled and then used as a semantics of MIN. In the Results section, the translation from MIN into the multilevel logical approach is presented and extensively illustrated on the λ phage example. A translation to ordinary differential equations is then sketched. Finally, a comparison with previous work, perspectives and some concluding remarks are presented.

Biology of λ phage
In order to illustrate our approach, we shall use as a running example a classical biological benchmark: the genetic switch of the λ phage, which will be presented first.
The λ phage is a virus which infects the Escherichia coli bacteria. It turns out that a lot of quantitative and qualitative information is now available on it, so that it has become a benchmark organism and plays a central role in modeling [8,1,5,9,3,4,10].
When a λ phage encounters a bacterium, it can attach itself to specific receptors on the bacterial membrane. At this moment, the virus genome enters the bacterium. Then, two alternative pathways are possible:
• lytic pathway: the virus uses the host machinery in order to replicate its genetic material and create new viruses. This phase takes about 45 minutes, then the bacterium is destroyed and about one hundred viruses are released into the external medium (Figure 1(a)).
• lysogenic pathway: the virus integrates its genetic material in the bacterial genome. There is no production of viruses. The bacterium is said to be lysogenised. The virus can stay indefinitely in the genome of its host. But there exists an escape mechanism: in some cases, the virus can extract itself from the bacterial genome and enter a lytic phase as a response to some stimuli (Figure 1(b)).
A small region of the viral genome controls the decision between the lytic and lysogenic pathways. This region is composed of two genes and their two promoters (sites of regulation of the gene expression) and is referred to as the genetic switch region (see Figure 1). The decision results from the competition between two major proteins:
• the first one is referred to as CRO, encoded by gene cro, and expressed during the lytic phase.
• the second one is called the λ repressor, referred to as CI. It is encoded by gene cI, and it can activate other genes, including itself, and repress others. cI is expressed during the lysogenic phase.
Note that the competition between CI and CRO is also influenced by the host environment. The host environment is captured through CI and CRO and their influence on the regulator region, i.e., the genetic switch.

Modular Interaction Network (MIN)
The Modular Interaction Network (MIN) formalism considers two types of entities: variables (chemical species and regulatory sites) and influences (IRCs and ICRs). Every model entity (site, species, influence) is characterised by its attributes, which can be any data concerning the biological object or interaction represented by this entity; for example:
• physical attributes: size and shape for a protein, position in DNA for a genomic sequence;
• localization in space (cell compartments: nucleus, cytosol);
• expression pattern (cell types, tissues etc.);
• observable values of the activity level for the biological object;
• velocity, force, speed, amplification factor, cooperativity increase, energy of the interaction.
From the very beginning, every bit of information added to the model should come with a link to its source (the set of references to papers, databases, etc.). This will be important in later steps of the modeling, for example in order to estimate the data quality. We assume that all the data in the model has a representation which allows it to be compared (it may be, for instance, a textual "string" representation).

Variables
Both species and regulatory sites may represent biological objects of some abstraction level (molecules or parts of them, complex processes like regulatory pathways, complex systems like sensors, or even an entire organism). As our knowledge about biological systems is based on observations and experiments, the observable level of activity of biological objects can change in different states of the biological system. These objects can influence the levels of activity of the other biological objects. So, every species and site in MIN will be assumed to have a set of observable values, corresponding to the observable levels of activity of the corresponding biological objects.
The formal definition of a MIN variable reflects the presence of various features (attributes) in biological objects. Also, in different sources a biological object can have different names (hence the name set of a variable). Moreover, the measurement methods used to observe the activity level of this object yield a set of possible values for the variable, usually (partially) ordered.

Definition 1
A variable V is an entity characterized by a tuple (N, W, P, L) where:
• N is a non-empty set of known names of the variable;
• W is a partially ordered (by ≤ V ) set of observable values representing the activity level of the biological object associated to the variable. We shall assume that this set has at least the default value undef, unordered with respect to the other values, and two defined values, meaning that the variable is not a constant;
• P is a set of attributes, each having a type, a value and the boolean field unique. unique = 1 indicates that this attribute cannot be present in P more than once. Otherwise, several attributes of the same type can have different values;
• L is a non-empty set of links to (bibliographic) sources of the information about the variable.
The set of attributes P will always include the kind of the variable (which is unique and can be either "regulatory site" or "chemical species").

Figure 1
The genetic switch of the λ phage. The cI and cro genes lie on opposite sides of the operator region, containing three operators (OR1, OR2, OR3). The two genes are transcribed in opposite directions from their respective promoters, which overlap in the middle operator, OR2.

Chemical species
A species represents a biological object with catalytic or binding capabilities, which influence one or more regulatory sites. These influences have a chemical nature: association/dissociation reactions, electron transfers, etc. A species may have one or more influence capabilities, that will be called affinities.
An affinity is the ability of a biological object to interact with (potentially) a set of other biological objects through a particular regulatory site. Thus, an affinity may correspond to a protein domain for a protein or a surface molecule (receptor) for a cell.

Definition 2
An affinity a is a tuple (l a , P a , L a ) where:
• l a is a label representing the affinity name (which is indeed the label of the binding regulatory site);
• P a is a set of attributes of the affinity, having a type and a value (not necessarily unique);
• L a is a non-empty set of links to sources of the information about this affinity (bibliographic references).
Now we are able to formally introduce chemical species:

Definition 3
A chemical species C is a variable (N C , W C , P C , L C ) whose set of attributes P C contains (Kind, "chemical species", 1) and one or more data (Affinity, a, 0), where different a's enumerate the influence abilities of the species C.
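Definitions 1 to 3 can be read as a small data structure. The following Python sketch is only one possible encoding, not the authors' implementation: the class and function names, the value names "absent" and "present" and the source key "ref" are hypothetical, and the partial order on W is left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import NamedTuple

UNDEF = "undef"  # default value that Definition 1 requires in every W


class Attribute(NamedTuple):
    type: str
    value: object
    unique: bool  # True: this type may occur only once in P


class Affinity(NamedTuple):
    label: str              # l_a, the label of the binding regulatory site
    attributes: tuple = ()  # P_a
    links: tuple = ()       # L_a


@dataclass
class Variable:
    names: set        # N
    values: set       # W, including UNDEF (ordering omitted in this sketch)
    attributes: list  # P, a list of Attribute
    links: set        # L

    def problems(self):
        """Return the violated constraints of Definition 1 (empty if none)."""
        found = []
        if not self.names:
            found.append("N is empty")
        if UNDEF not in self.values:
            found.append("W lacks undef")
        if len(self.values - {UNDEF}) < 2:
            found.append("W needs two defined values")
        if not self.links:
            found.append("L is empty")
        counts = Counter(a.type for a in self.attributes)
        for kind in sorted({a.type for a in self.attributes if a.unique}):
            if counts[kind] > 1:
                found.append(f"unique attribute {kind} repeated")
        return found


def chemical_species(names, values, affinities, links):
    """Build a variable of kind 'chemical species' (Definition 3)."""
    if not affinities:
        raise ValueError("a chemical species needs at least one affinity")
    attrs = [Attribute("Kind", "chemical species", True)]
    attrs += [Attribute("Affinity", a, False) for a in affinities]
    return Variable(set(names), set(values) | {UNDEF}, attrs, set(links))


# Hypothetical example: the CI protein with one affinity labeled OR.
ci = chemical_species({"CI"}, {"absent", "present"},
                      [Affinity("OR", links=("ref",))], {"ref"})
print(ci.problems())
```

Keeping the checks in a method that returns a list, rather than raising on the first error, makes it easy to report every problem of a hand-written model at once.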
Chemical species are graphically represented by rectangular boxes. Various affinities can be represented inside the species (by named triangles) omitting all the details except for their label. The nature of the interaction between two biological entities can be unknown. So, a wild-card affinity, labeled "*", may be defined for every species, standing for an unknown mechanism of regulation (see Figure 2 for an example of a chemical species).

Regulatory sites
A regulatory site regulates species activity in a manner which cannot be represented by a chemical reaction, like for example by three-dimensional conformation changes in a molecule or cooperativity effects. A regulatory site may represent a genome region or a protein domain that changes its state after a chemical reaction.
A regulatory site has a label which characterizes its capabilities of being influenced through affinities. If a regulatory site and an affinity of a species have the same label, it means that the interaction is possible between the biological objects corresponding to the site and the species. A regulatory site represents an "input" for a species and regulates its activity through integration of several influences on it.
Figure 2
Representation of a chemical species and of a regulatory site.

Definition 4
A regulatory site R is a variable (N R , W R , P R , L R ) with the attributes (Kind, "regulatory site", 1) and (Label, l R , 1) in the set P R , where l R is a label representing the site type.
Regulatory sites are graphically represented by ellipses containing the label l R inside a triangle. An example of a regulatory site is given in Figure 2. The presented site has two different states: free (OR1·) and regulated ((OR1·CI)). This means that the corresponding biological object can participate in binding with another object. The label of this site is OR, so it can be influenced by a species having an affinity labeled OR, like the one represented in Figure 2.
In the MIN representation, different biological objects are associated to different entities in the model. The attributes of sites and species may have types like "position", "size", "location" etc. expressing a knowledge about these biological objects. For example, if a gene has more than one regulatory site of the same type in its regulatory region, several sites will be present in the model, having the same label but with different positions (mentioned in the attribute set); clearly, in this case, the corresponding variables will not be compatible. All these sites will influence the species corresponding to the gene. However, several species with the same name may be present in MIN, if they have attributes with different values. So, we can represent a molecule of the same protein in free or dimerised state, or the same gene at its natural location and translocated in a different place in the genome.

Influences
Biological objects, represented by species and sites in MIN, may interact and play specific roles in these interactions. For example, they can take part in a chemical reaction, one object modifying, creating or destroying another one. We assume that every interaction happens through an affinity and a regulatory site. More formally, a chemical species C 1 having an affinity a with a label l a can influence a chemical species C 2 if there is a regulatory site R labeled by the same label (l R = l a ) which influences the species C 2 . An influence is defined between two MIN variables as follows:

Definition 5
An influence I between variables is a tuple (V, V', P, L) where: • V is the influencing variable; • V' is the variable influenced by V; • P is the set of influence attributes, having a type and a value (not necessarily unique); • L is the set of links to sources of the information about the influence.
The influence (ICR) of a species on a regulatory site of another species represents the chemical interaction between two biological objects in which the state of the regulatory site is modified by the species through an affinity. Symmetrically, a regulatory site can influence the value of a species, through the influence (IRC) of a regulatory site on a chemical species. In this case the interaction between corresponding biological objects cannot be represented by a chemical reaction, and there is no specific affinity associated to such an influence.
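The label-matching rule stated before Definition 5 (a species C 1 can influence a species C 2 if some regulatory site R with l R = l a influences C 2 ) can be sketched in a few lines of Python. The function name and the data layout (a dictionary of site labels and a set of IRC pairs) are assumptions made for this illustration, and the wild-card affinity "*" is not handled.

```python
def mediating_sites(affinity_labels, target, site_labels, ircs):
    """Return the regulatory sites through which a species may influence target.

    affinity_labels: labels l_a of the affinities of the influencing species
    target:          name of the influenced species C2
    site_labels:     dict mapping a site name to its label l_R
    ircs:            set of (site, species) pairs, one per IRC
    """
    return sorted(
        site
        for (site, species) in ircs
        if species == target and site_labels[site] in affinity_labels
    )


# The site OR1 has label OR and influences the species CI (see Figure 3).
site_labels = {"OR1": "OR"}
ircs = {("OR1", "CI")}

# CI carries an affinity labeled OR, so it can influence itself through OR1.
print(mediating_sites({"OR"}, "CI", site_labels, ircs))
# A hypothetical species whose only affinity is labeled Z cannot.
print(mediating_sites({"Z"}, "CI", site_labels, ircs))
```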

The network
Having presented the species, the regulatory sites and the influences between them, we can now give a formal definition of the MIN for the modeling of a biological system. The information about the possible connections between species of the system is already coded in the labels of the regulatory sites and affinities. We consider that the states of the model are expressed through observable values of species and sites, so that Ω C denotes the set of functions associating a value of its value set to each species of the model, Ω R is the same for the sites of the model, and Ω is the set of all possible observable states of the model. In the following, ω ∈ Ω stands for any given observable state of the system and ω(V) will stand for the value of the variable V in the state ω.
In general, in a single biological experiment (an observation), the values of only a subset of biological objects are measured. In this case, the observable values of non observed species and sites take the special value "undef" and the state of the system will be considered as "partly" defined.
Within the set Ω of observable system states, a subset of observed system states collects all the partly defined system states that were actually observed in biological experiments and described by biologists. This subset plays the role of a databank from which the parameters of the dynamics of the system interactions can be inferred. If some of these parameters (for example, kinetic rates for biochemical reactions) are known, i.e., were measured experimentally, they are stored directly in the attributes of the corresponding influences (for instance, an attribute such as (Kinetic_rate, 15) belonging to P ICR or P IRC ).

Definition 7 (MIN)
A Modular Interaction Network is a tuple of five components:
• a set of variables of the model, partitioned into a set of chemical species and a set of regulatory sites;
• a set of influences (ICRs) from chemical species to regulatory sites, each through an affinity of the species, with no more than one influence between such a pair of variables through the same affinity;
• a set of influences (IRCs) from regulatory sites to chemical species, with no more than one influence between such a pair of variables;
• a set of observed, partly defined states of the biological system, which is a subset of Ω;
• a set of links to sources of the information about those observations.
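To make the five components and their uniqueness constraints concrete, here is a minimal Python sketch of a MIN container. It is an illustration under stated assumptions, not the authors' implementation: the class layout, the method names and the source key "ref" are hypothetical, and variables are reduced to their names.

```python
from dataclasses import dataclass, field

UNDEF = "undef"  # value taken by a variable that was not measured


@dataclass
class MIN:
    species: set
    sites: set
    icrs: dict = field(default_factory=dict)      # (species, affinity, site) -> attributes
    ircs: dict = field(default_factory=dict)      # (site, species) -> attributes
    observed: list = field(default_factory=list)  # partly defined states
    links: set = field(default_factory=set)       # sources of the observations

    def add_icr(self, species, affinity, site, attributes=None):
        if species not in self.species or site not in self.sites:
            raise ValueError("unknown species or site")
        key = (species, affinity, site)
        if key in self.icrs:
            raise ValueError("at most one ICR per species, affinity and site")
        self.icrs[key] = dict(attributes or {})

    def add_irc(self, site, species, attributes=None):
        if species not in self.species or site not in self.sites:
            raise ValueError("unknown species or site")
        key = (site, species)
        if key in self.ircs:
            raise ValueError("at most one IRC per site and species")
        self.ircs[key] = dict(attributes or {})

    def observe(self, measured, source):
        """Record a partly defined state; unmeasured variables get UNDEF."""
        variables = self.species | self.sites
        unknown = set(measured) - variables
        if unknown:
            raise ValueError(f"unknown variables: {sorted(unknown)}")
        state = {name: UNDEF for name in variables}
        state.update(measured)
        self.observed.append(state)
        self.links.add(source)
        return state


# The small network of Figure 3: species CI and regulatory site OR1.
net = MIN(species={"CI"}, sites={"OR1"})
net.add_icr("CI", "OR", "OR1", {"Kinetic_rate": 15})
net.add_irc("OR1", "CI")
# An experiment that measured only the state of the site.
state = net.observe({"OR1": "OR1·CI"}, source="ref")
print(state)
```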
In figures, species will be represented by boxes, affinities by triangles inside the boxes of species, regulatory sites by ellipses, influences of a species on a regulatory site by plain arcs, and influences of a regulatory site on a species by dashed arcs. A small example of an interaction network is presented in Figure 3.
A MIN model with the highest level of detail has the property that each regulatory site corresponds to a (single) chemical reaction. We present an example of such a model in Figure 4. It illustrates the CI protein synthesis from the CI gene regulated by the OR1 regulatory site as a function of the presence of the CI protein dimer.
The corresponding chemical species are represented by chemical species of the MIN model. The biochemical reactions of this example are represented by regulatory sites, because a reaction is possible when all the substrates are present. This reaction regulates the level of activity of a chemical species by increasing or decreasing its quantity (concentration). Each reaction has an attribute "reversible" or "not reversible". For instance, if a reaction is reversible, this means that all the species connected to this reaction can be either products or substrates of the reaction. Another attribute of the regulatory site is a kinetic rate, which is in general a function of other measurable parameters of the system, such as the concentrations of species that catalyze the reaction, or of species that do not participate directly in the reaction but influence its kinetics. For example, such species can sequester one or more substrates or products, or catalyze intermediate reaction steps. Another natural parameter of the kinetic rate function is the temperature: biochemical reactions go faster when the temperature increases.

Figure 3
A small interaction network representing the chemical species CI and the (regulatory) site named OR1. Left. The influence ICR links the affinity labeled OR of species CI with the site OR1, and the influence IRC links the site OR1 and the species CI. In the λ switch, the regulatory site OR1 corresponds to the regulatory region in the DNA molecule coding for the protein CI. Thus, CI can influence the regulatory site OR1, and the activity of CI can be regulated through the regulatory site OR1. Right. The corresponding relation indicating the biologically observed states of the network.
On each influence adjacent to the regulatory site, an attribute corresponding to the stoichiometric coefficient is indicated. It may have 3 qualitatively different values:
• 0, which means that the corresponding species is an enzyme, i.e., it is not consumed or produced in this reaction, even if its presence is necessary for the reaction to take place;
• a numerical value, which corresponds to the number of molecules involved in the reaction, generally one or two;
• any other label, standing for a vector of coefficients saying how many molecules of each of the 20 types of amino acids (a 1 , a 2 ,...,a 20 ) or each of the 5 types of nucleotides (n 1 , n 2 , n 3 , n 4 , n 5 ) are needed to synthesize the macromolecular product of the reaction.
For example, the stoichiometric coefficients for Nucleotides and Aminoacids in Figure 4 are labels, and each label represents the composition of the corresponding macromolecule: CI RNA or CI protein. In general, the opposite reaction of the biochemical synthesis is degradation, and it liberates the same quantities of the corresponding substrate residuals. The stoichiometric coefficients for RNA_pol or Ribosome are 0, which means that these are enzymes in the reactions of CI RNA synthesis and of CI protein synthesis. The stoichiometric coefficient for CI is 2 for the reaction of the dimerisation of CI, meaning that two molecules of CI are needed to form a dimer.
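As a rough illustration of how numerical stoichiometric coefficients could feed a quantitative description, the Python sketch below computes the change of each species for a given reaction rate. This is an assumption-laden sketch: the function name, the split into substrates and products, the name "CI2" for the CI dimer and the numeric rates are all hypothetical, and coefficients given as labels (composition vectors) are not covered.

```python
def concentration_changes(substrates, products, rate):
    """Rate of change of each species for one reaction running at `rate`.

    substrates, products: dicts mapping a species name to its numerical
    stoichiometric coefficient (0 marks an enzyme, which is neither
    consumed nor produced).
    """
    changes = {}
    for name, coefficient in substrates.items():
        changes[name] = changes.get(name, 0.0) - coefficient * rate
    for name, coefficient in products.items():
        changes[name] = changes.get(name, 0.0) + coefficient * rate
    return changes


# Dimerisation of CI: two molecules of CI form one dimer.
dimerisation = concentration_changes({"CI": 2}, {"CI2": 1}, rate=0.5)
print(dimerisation)

# CI protein synthesis: the ribosome acts as an enzyme (coefficient 0).
synthesis = concentration_changes({"Ribosome": 0}, {"CI": 1}, rate=2.0)
print(synthesis)
```

With a coefficient of 0, the enzyme appears in the result with no net change, which matches the reading that its presence is required but its quantity is unaffected.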

Compression of MINs
In order to simplify MIN models, it may be useful to find the variables representing the same biological object and to combine them. So, the following definition introduces the syntactic compatibility and the union of variables.

Definition 8 (Compatibility and union of variables)
Let {V i | i = 1, 2,...,k} be the set of variables of the MIN , The variables in this set will be said to be compatible if they have the same names (∀V i , V j ), their unique attributes are compatible ((x, y, 1) In such a case, their union , with .
As the values of variables come from different biological experiments, in order to compare them we need to use the same approximations as generally accepted by biological science. This means that the "equality" of values w i = w j should be confirmed by a biologist when it is not obvious.
Notice also that chemical species may only be compatible with other chemical species, and similarly for regulatory sites.
This definition will sometimes make it possible to reduce the representation of a MIN, by replacing compatible sets of variables by their union. Moreover, the translation of a MIN representation into another formalism can allow further compression of variables, depending on the capability of that formalism to distinguish between different biological objects.
Thus, simplification is an operation on a MIN which produces a MIN ' as follows:
• first, the compatible variables of the MIN are combined;
• then, the ICRs (IRCs) of a variable V 1 on V 2 of the MIN are linked to the corresponding combined variables of MIN ', i.e., the one compatible with V 1 and the one compatible with V 2 ;
• finally, the relation of observed states is updated: the entries containing a pair of combined variables with different observed values are split into two entries where only one value at a time is listed for the combined variable.
The formal definition of MIN simplification is presented below. • each variable V' ∈ ' represents the union of compatible variables composing the set V' (V' = ⋃ V∈V' V);

Composition of MINs
One of the main characteristics of MINs is that they are modular and enable an incremental construction of models of biological systems. The operation of composition of two MINs includes establishing new, composed, sets of species, sites and influences. The species set of the resulting MIN is the union of the species of the composing MINs, and the new site set is the union of the regulatory site sets of the composing MINs. All the information about the interactions in the composing systems must also be preserved. This means that particular attention should be paid to the conversion of influences from the composing MINs to the resulting one. If the source MINs do not contain common species, there is no transformation to perform; the data from these MINs should simply be put together.

Definition 10 (Union of MINs)
This means that MIN models can be composed from parts that share the same species or are completely independent. This can be very useful at the first construction stages of biological regulatory networks where the data is incomplete and is not necessarily connected.
If equivalent regulatory sites or species are present in the resulting MIN, the union of these sites or species must replace them. In this case the influences between all sites and all the species which were influencing one another in the source MIN must be established (see Figure 5). If there are in the source MIN two different influences between the same affinity of a species and the same regulatory site, they must be replaced by only one influence carrying the union of all possible attributes of both connections. In the same way, if there are two different influences from a regulatory site on a given species, they must be replaced by one influence carrying the union of all possible data, using the previously defined operation of simplification of MIN.
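The replacement of duplicate influences by a single influence carrying the union of their attributes can be sketched as follows. The function name and the tuple layout are assumptions chosen for this illustration; the attribute ("Cooperativity", "yes") and the source keys are hypothetical, while (Kinetic_rate, 15) reuses the example attribute mentioned earlier.

```python
def merge_influences(influences):
    """Merge influences that share source, target and affinity.

    influences: iterable of (source, target, affinity, attributes, links),
    where affinity is None for an influence of a site on a species (IRC).
    Returns a dict mapping (source, target, affinity) to a pair
    (set of attributes, set of links) holding the unions.
    """
    merged = {}
    for source, target, affinity, attributes, links in influences:
        key = (source, target, affinity)
        attribute_union, link_union = merged.setdefault(key, (set(), set()))
        attribute_union.update(attributes)
        link_union.update(links)
    return merged


# Two ICRs from CI to OR1 through the same affinity OR, taken from two
# source networks, plus one IRC from OR1 to CI.
composed = merge_influences([
    ("CI", "OR1", "OR", {("Kinetic_rate", 15)}, {"source-A"}),
    ("CI", "OR1", "OR", {("Cooperativity", "yes")}, {"source-B"}),
    ("OR1", "CI", None, set(), {"source-A"}),
])
print(len(composed))
print(sorted(composed[("CI", "OR1", "OR")][0]))
```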

Multivalued logical formalism (MLM): basics
The multivalued logical approach is designed to express the interdependency between activity levels (often concentrations) of biological objects, e.g., proteins. It applies when this interdependency can be represented by a sigmoidal curve, which is approximated by a multivalued logical function. This function can distinguish between different levels of activity of a biological object, so it may be multivalued (see Figure 6). The multivalued logical model (MLM) consists of two parts: a directed graph of interactions and a table of dynamic parameters.
The goal of modeling genetic regulatory networks in the multivalued logical formalism [7] is to obtain a state graph representing the behaviour of a biological system from a qualitative point of view. This means that an observable sequence of states of a biological system is represented by a path in the state graph of the model.
The multivalued logical formalism, which has proved very useful for the study of genetic networks [11,12], is composed of a directed labeled regulatory graph and a table of dynamic parameters. The state of the regulatory graph, expressed through the labels of its vertices, can evolve according to dynamic parameters. The possible traces of this evolution can be represented in the form of a state graph. The nodes of the state graph represent the different states of the system and the arcs of the state graph represent the possible activity modifications of the biological objects.
For dynamic systems with saturation (like genetic regulatory networks) one can approximate the sigmoid curve, representing the level of the activity of a variable as a function of the level of another one, by a multivalued logical function. This approximation is called logical abstraction because it distinguishes between only two activity states of the system: below the threshold level and above it.

Figure 5
Union and compression of interaction networks. Three networks sharing species and regulatory sites can be combined into one by a composition and compressed by collapsing equivalent species and sites. All existing interactions are preserved.
The following definition describes an instance of MLM as introduced by R. Thomas. It is composed of a regulatory graph (U, E) and a table K of dynamic parameters (see Figure 7). Each node u of the graph corresponds to a variable with integer values between 0 and the boundary b u of the variable, which drives the topology of the corresponding state graph. The influences between variables in MLM can be positive (inducing) or negative (inhibiting).

Definition 11 (Instance of a Multivalued logical model)
An instance M of an MLM of a genetic regulatory network is a pair (G, K) where G = (U, E) is the labeled directed regulatory graph and K is the table of dynamic parameters:
- each vertex u ∈ U is called a variable of the genetic regulatory network, and is provided with a strictly positive integer b u called the boundary of u;
- each arc (u 1 , u 2 ) ∈ E is labeled by a pair (θ, ε) where θ, called the threshold, is an integer between 1 and the boundary of u 1 , and ε, called the sign, belongs to {+, -}. When ε = +, u 1 is called an inducer of u 2 . When ε = -, u 1 is called an inhibitor of u 2 . The set of predecessors of u 2 is denoted G -1 (u 2 ).
In order to unify the treatment of different influences between variables, the definition of resources of a variable is introduced in MLM. The variable u 1 influencing the variable u 2 is a resource in some state if u 1 helps the variable u 2 in that state, meaning that u 1 acts to increase the activity level of u 2 .

Figure 6
The multivalued logical approximation of the level of activity of biological objects.

Definition 12 (Resources of a Variable)
Given a state μ and a variable u ∈ U of an MLM M, the set of resources of u is the set ω u (μ) containing all the variables u' of M such that:
• u' ∈ G -1 (u) is a predecessor of u in the underlying directed graph G of M;
• the arc (u', u) is labeled by (θ, ε) and either ε = + and μ(u') ≥ θ, or ε = - and μ(u') < θ.
The set of variables ω u (μ) is consequently the subset of G -1 (u) whose members help u in the state μ: inducers that have reached their threshold and inhibitors that have not.
The dynamics of the MLM reflects the dynamics of a "continuous" biological process, so the model variables cannot "skip" values: going from "1" to "3", for example, without passing by the value "2". So, the multivalued logical function is introduced to describe the evolution of a variable level in a given system state.
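The computation of resources can be sketched in Python. The sketch assumes the usual reading of the definition in R. Thomas' formalism (an inducer counts as a resource once it has reached its threshold, an inhibitor as long as it has not); the function name, the arc encoding and the two-variable graph are hypothetical.

```python
def resources(u, state, arcs):
    """Return the set of resources of variable u in the given state.

    state: dict mapping each variable to its current integer level
    arcs:  dict mapping (source, target) to (threshold, sign),
           with sign "+" for an inducer and "-" for an inhibitor
    """
    found = set()
    for (source, target), (threshold, sign) in arcs.items():
        if target != u:
            continue
        if sign == "+" and state[source] >= threshold:
            found.add(source)  # inducer at or above its threshold
        elif sign == "-" and state[source] < threshold:
            found.add(source)  # inhibitor still below its threshold
    return found


# Hypothetical two-variable graph: x induces y, y inhibits x.
arcs = {("x", "y"): (1, "+"), ("y", "x"): (1, "-")}

print(resources("x", {"x": 0, "y": 0}, arcs))  # y is absent, so it helps x
print(resources("y", {"x": 0, "y": 0}, arcs))  # x has not reached its threshold
```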

Definition 13 (Multivalued Logical Function)
Given a state μ and a variable u of an instance M of MLM, the multivalued logical function κ u (μ) is defined as follows:
\[
\kappa_u(\mu) =
\begin{cases}
\mu(u) + 1 & \text{if } \mu(u) < K_{u,\omega_u(\mu)} \\
\mu(u) - 1 & \text{if } \mu(u) > K_{u,\omega_u(\mu)} \\
\mu(u) & \text{if } \mu(u) = K_{u,\omega_u(\mu)}
\end{cases}
\]
The function κ u represents a "step by step" evolution of the expression level of u from its current expression level μ(u) towards its dynamic parameter K u,ω u (μ) . The evolution of the model can then be represented as a state graph, in which the system moves between states according to its multivalued logical function. This state graph is often called asynchronous because only one variable can evolve at a time.

Definition 14 ("Asynchronous" State Graph)
The state graph of an MLM M is the directed graph whose vertices are all the possible states of M and such that there is an edge from μ to μ' if and only if there exists a variable u satisfying:
• μ'(u) = κ u (μ), where κ u is the multivalued logical function for u;
• for any variable u' ≠ u we have μ'(u') = μ(u').
An arc of the state graph from μ to μ' is usually denoted as (μ → μ') and is called a transition. This is illustrated in Figure 7(right).
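The construction of the asynchronous state graph can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reference implementation: the multivalued logical function is taken to move a variable one step towards the dynamic parameter selected by its current resources, transitions that leave the state unchanged are omitted, and all names, thresholds and parameter values are hypothetical.

```python
from itertools import product


def resources(u, state, arcs):
    """Inducers at or above threshold and inhibitors below threshold."""
    return frozenset(
        source
        for (source, target), (threshold, sign) in arcs.items()
        if target == u
        and ((sign == "+" and state[source] >= threshold)
             or (sign == "-" and state[source] < threshold))
    )


def kappa(u, state, arcs, params):
    """Move u one step towards the parameter chosen by its resources."""
    goal = params[(u, resources(u, state, arcs))]
    if state[u] < goal:
        return state[u] + 1
    if state[u] > goal:
        return state[u] - 1
    return state[u]


def state_graph(bounds, arcs, params):
    """Return the set of transitions (state, successor) as level tuples."""
    names = sorted(bounds)
    edges = set()
    for levels in product(*(range(bounds[n] + 1) for n in names)):
        state = dict(zip(names, levels))
        for u in names:  # asynchronous: one variable changes per transition
            new_level = kappa(u, state, arcs, params)
            if new_level != state[u]:
                successor = dict(state, **{u: new_level})
                edges.add((levels, tuple(successor[n] for n in names)))
    return edges


# Hypothetical negative feedback loop: x induces y, y inhibits x.
bounds = {"x": 1, "y": 1}
arcs = {("x", "y"): (1, "+"), ("y", "x"): (1, "-")}
params = {
    ("x", frozenset()): 0, ("x", frozenset({"y"})): 1,
    ("y", frozenset()): 0, ("y", frozenset({"x"})): 1,
}
edges = state_graph(bounds, arcs, params)
for edge in sorted(edges):
    print(edge)
```

In this example each state has exactly one successor, so the four states form a single cycle, which is the qualitative signature of an oscillation in a negative feedback loop.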

Translation of a MIN into an MLM
This section presents the algorithm translating a MIN into the MLM formalism. It is structured as follows. First, we note that multiple translations of a MIN model into the MLM formalism are possible, and discuss the impact this has on the translation algorithm. Then the translation itself is described, starting with the construction of the topology of the MLM regulatory graph and continuing with the determination of the dynamic parameters. The section ends with an example of the translation of a small MIN network into an MLM.
The obtained by translation MLM model will be called the translated network. As in many cases, the values of all parameters of the MLM model cannot be deduced precisely from the experimental data; the set of all possible parametrisations consistent with biological observations must be considered as a model which can be studied and later be refined by adding other information.
The biological information represented in a MIN is much richer than that of an MLM instance, so one MIN can have multiple semantics expressed through a set of MLM instances. In other words, an MLM may be identified with the set of its instances. The topology of the regulatory network, as well as the bounds, will be the same for all instances (deduced from those of the MIN). However, the dynamic parameters and the arc labels can differ, since an arc of an MLM regulatory graph may correspond to several arcs of a MIN (one per affinity). As the observable values of a variable of a MIN are partially ordered (see Definition 1), the different ways of enumerating the values of a variable (topological sorts) will be considered as yielding different instances of the MLM. In the following, we therefore consider every combination of possible parameters as one instance of the MLM, and the translation procedure of a MIN into an MLM gives all the possible parameters that can be deduced from the MIN data.
The arcs of the regulatory graph of the MLM are deduced from the MIN structure in the following way: there is an arc between the translated variables $u_1$ and $u_2$ if and only if there is a pair (ICR, IRC) in the MIN such that $R_{ICR} = R_{IRC}$, and $C_{ICR}$ and $C_{IRC}$ are the original species of the variables $u_1$ and $u_2$, respectively (see Figure 8).
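The arc construction rule can be sketched in Python as follows. The encoding of influences as plain pairs (affinities are ignored), the function name and the identity mapping from species to variables are assumptions made for illustration; the species and sites are those of the λ switch example treated later in the paper.

```python
def translated_arcs(icrs, ircs, variable_of):
    """Build MLM arcs from (ICR, IRC) pairs that share a regulatory site.

    icrs: list of (species, site) pairs, species influences site.
    ircs: list of (site, species) pairs, site influences species.
    variable_of: maps each original species to its translated variable.
    """
    arcs = {}
    for source_species, icr_site in icrs:
        for irc_site, target_species in ircs:
            if icr_site != irc_site:
                continue
            arc = (variable_of[source_species], variable_of[target_species])
            pair = ((source_species, icr_site), (irc_site, target_species))
            arcs.setdefault(arc, []).append(pair)
    return arcs


proteins = ("CI", "CRO")
sites = ("OR1", "OR2", "OR3")
icrs = [(p, s) for p in proteins for s in sites]  # both proteins bind every site
ircs = [(s, p) for s in sites for p in proteins]  # every site controls both genes
variable_of = {"CI": "CI", "CRO": "CRO"}

arcs = translated_arcs(icrs, ircs, variable_of)
```

Each of the four resulting arcs keeps the list of (ICR, IRC) pairs that justify it, one per shared regulatory site, which is the information later needed to label the arc.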
The MLM regulatory graph is not complete yet, as the arc labels remain to be found. These labels depend on the observed values of the MIN variables. The information on the possible combinations of observed values of variables is contained in the relation of observed states of the MIN. The same type of knowledge also enables us to determine the dynamic parameters of the MLM model. However, influences in a MIN are defined between chemical species and regulatory sites, whereas the MLM model absorbs the regulatory sites into the variables representing the species, as shown in the previous definition. Thus, the parameters of the influences of species on species have to be reconstructed from the relation of observed states and the MIN topology.
In order to find the arc labels of the translated regulatory graph and the corresponding dynamic parameters K, we introduce the relation $\Psi_{i,k}$ between the values of the species $C_i$ and the species $C_k$, called the interspecies regulation relation. In order to translate the information about the dynamics of the biological system contained in the relation of observed states, we first define the choice operation σ, which we will call a selection, as presented in the following definition. For each pair of variables $V_i$, $V_j$, the selection returns the observed system states in which the values of both variables were measured.

Definition 16 (Selection of observed states for a pair of MIN variables)
The selection of observed states of a biological system for a pair of variables $V_i$, $V_j$ is the subset of the observed states that contains a state ω if and only if $\omega(V_i)$ and $\omega(V_j)$ are both defined.
The selection will be used in the next definition in order to formally define the interspecies regulation relation $\Psi_{i,k}$, which links the values of the species i and k that could be observed experimentally at the same time. This relation lists the values coming from the lines where states were observed for the species i, the species k and the regulatory site R, which is influenced by i and influences k. In other words, the interaction between the species i and k is transmitted by the regulatory site R.

Definition 17 (Interspecies regulation relation)
An interspecies regulation relation is a relation between the values of the species $C_i$ and $C_k$ of a MIN, defined when the species $C_i$ regulates the species $C_k$ through a regulatory site R. Thus, the Ψ relation lists the pairs of values $(w_i, w_k)$ of the species $C_i$ and $C_k$ such that the value $w_i$ of the species $C_i$ and the value $w_k$ of the species $C_k$ were observed simultaneously or when the regulatory site linking them was in the same state (for an example see Figure 8).
The next definition uses the interspecies regulation relation to add the missing labels on the arcs of the MLM regulatory graph translated from the MIN. The observed values returned by the interspecies regulation relation are sorted by their first component, and the algorithm then tries to fit them to a sigmoid curve, either an ascending or a descending one. If such a fit is possible, the algorithm tries to determine the threshold of this sigmoid curve. The direction of the curve is translated by the sign, "+" or "-", in the arc label. The threshold value, when found, is also given on the corresponding arc.
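The fitting step can be approximated by the following Python sketch. It is a simplification and not the algorithm of the paper: values are assumed to be already encoded as integer ranks along one total order, and the "sigmoid" is reduced to a step function with exactly one jump. The function name and the sample pairs are hypothetical.

```python
def fit_label(pairs):
    """Return (threshold, sign) if the pairs follow a single step, else None.

    pairs: iterable of (w_i, w_k) integer ranks, w_i for the regulator
    and w_k for the regulated species.
    """
    ordered = sorted(set(pairs))  # sort by the first value, then the second
    jumps = []
    for (prev_i, prev_k), (cur_i, cur_k) in zip(ordered, ordered[1:]):
        if cur_k == prev_k:
            continue
        if cur_i == prev_i:
            return None  # two different target levels for one source level
        jumps.append((cur_i, "+" if cur_k > prev_k else "-"))
    # Exactly one jump gives one threshold; otherwise no label is produced.
    return jumps[0] if len(jumps) == 1 else None


# Hypothetical ranks: 0 = absent, 1 = low/present, 2 = high.
inhibition = [(0, 1), (1, 0), (2, 0)]
label = fit_label(inhibition)
```

Here the regulated level drops as soon as the regulator reaches rank 1, so the sketch returns the threshold 1 with the sign "-". A constant sequence or a sequence with several jumps yields no label, in which case the arc is left unlabeled.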

Definition 18 (Translated regulatory graph)
The translated regulatory graph of a MIN (representing a set of genetic regulatory graphs) is a directed graph where:
• U is the set of translated variables of the MIN;
• there is an arc $(u_1, u_2)$ between variables of U whenever $C_1$ and $C_2$ are the original species of $u_1$ and $u_2$ and there exist an ICR and an IRC of the MIN such that $C_{ICR} = C_1$, $R_{ICR} = R = R_{IRC}$ and $C_{IRC} = C_2$. For each pair (ICR, IRC) satisfying these conditions we will use the notation (ICR + IRC) ∈ $(u_1, u_2)$;
• the arc $(u_1, u_2)$ is labeled with a set of pairs (θ, ε), where the sign ε and the threshold θ are those obtained from the interspecies regulation relation by the fitting procedure described above.
The translated regulatory graph looks very much like an MLM model, but there are still some differences. It may contain several labels per arc, and these labels contain observed values, which are not necessarily numerical. The next definition therefore describes how to obtain a family of well-formed MLM models from the translated regulatory graph.

The family of labeled directed graphs compatible with the translated regulatory graph is the set of graphs G = (U, E) constructed in the following way:
• (u, u') ∈ E if and only if (u, u') is an arc of the translated regulatory graph, and it is labeled with at most one of the pairs (θ, ε) from the set labeling that arc, if any.
• For each node u of the graph so constructed, let us consider the set $\Theta_u$ of all thresholds occurring on the arcs originating from u. The bound $b_u$ associated to u is determined by $\Theta_u$.
• If (u, u') has an empty label in the translated regulatory graph, the arc (u, u') ∈ E should be labeled with (t, ε) such that $1 \le t < b_u$ and ε = + or -.
A state μ of such a graph G then associates with each node u a numerical value in $\{0, \ldots, b_u\}$ identifying an interval between two successive thresholds.
The MIN representation of biological systems is richer than that of the MLM, if only because the latter does not take into account the states of regulatory sites. Several states of the MIN may therefore be represented by a single state of the MLM. In order to establish the connection between the dynamic parameters of both systems, a correspondence between their states must be introduced: one MLM state corresponds to a domain of states in the MIN.

If G = (U, E) is one of the family of labeled directed graphs compatible with the translated regulatory graph of a MIN and μ is a state of G, the domain of μ is the set of states ω ∈ Ω such that, for every u ∈ U with original species C, either μ(u) = 0 and ω(C) lies below the first threshold $\theta_1$, or μ(u) > 0 and ω(C) lies in the interval between successive thresholds identified by μ(u). μ is called the translated state of its domain, and the domain is the set of original states of μ.
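The correspondence between MIN states and MLM states can be sketched in Python as follows. The sketch assumes that the species values and thresholds have already been encoded as integers along a total order, and that every MIN state gives a value to every species; the function names and the sample states are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def translate_state(omega, thresholds, original_species):
    """MLM level of each variable: number of thresholds reached by its species."""
    return tuple(
        bisect_right(thresholds[u], omega[original_species[u]])
        for u in sorted(thresholds)
    )


def domains(min_states, thresholds, original_species):
    """Group MIN states by the MLM state they translate to."""
    grouped = defaultdict(list)
    for omega in min_states:
        grouped[translate_state(omega, thresholds, original_species)].append(omega)
    return dict(grouped)


# Hypothetical encoding: 0 = absent, 1 = low/present, 2 = high.
thresholds = {"CI": [1, 2], "CRO": [1, 2]}
original_species = {"CI": "CI", "CRO": "CRO"}
min_states = [
    {"CI": 0, "CRO": 2, "OR1": "free"},
    {"CI": 0, "CRO": 2, "OR1": "CRO_bound"},
    {"CI": 2, "CRO": 0, "OR1": "CI_bound"},
]

by_mlm_state = domains(min_states, thresholds, original_species)
```

The first two MIN states differ only in the state of the regulatory site OR1, which the MLM does not represent, so they fall into the same domain, that of the MLM state (0, 2) for (CI, CRO).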
In order to obtain the MLM translation of a MIN, we still need to define the dynamic parameters K associated with the possible states of the graphs G compatible with the translated regulatory graph. The dynamic parameters of a variable are composed of the observed states found in the relation of observed states, at the lines determined by the possible values of the resources of this variable.

Definition 20 (MLM translation)
The MLM translation of a MIN is a family of instances M = (G, K) such that:
• G is one of the family of labeled directed graphs compatible with the translated regulatory graph of the MIN;
• K is the set of dynamic parameters associated with the states of G, obtained from the observed states as described above.
Numerical values are associated with the dynamic parameters using the partial order on the values of the original species or other information, preserving the order obtained after the threshold ordering.
Figure 9 illustrates the translation of the dynamic parameters from the MIN model presented in Figure 3.

Application to the λ phage genetic switch

Modeling the interacting entities
The chemical species of the model are associated with the chemically active molecules of the system: the proteins CI and CRO, which are able to bind the regulatory sites of the λ switch. Three regulatory sites, named OR1, OR2 and OR3, can be distinguished in the regulatory region of the λ switch. Both proteins can bind these regulatory sites. This binding capability will be represented by the affinity labeled OR, and the regulatory sites will carry the same label OR.
The corresponding regulatory DNA regions OR1, OR2 and OR3, controlling the expression of CI and CRO, are shared by two genes: cI and cro. It means that the same regulatory site is used to control both genes, and that its state determines the activity level of both proteins simultaneously. So, the influences of CI and CRO on regulatory sites OR1, OR2 and OR3, and of these sites on the proteins' activity can be added into the model.
The static information about the biological system includes the information about the observable values of variables. The observable states of the regulatory sites OR1, OR2 and OR3 are "CI_bound", "CRO_bound" or "free". Three different observable levels of activity (concentrations) of the proteins can be measured: "absent", "low", "high" for CI and "absent", "present", "high" for CRO.

Dynamics of the system
The dynamic description of the biological system in MIN is expressed through the attributes of the influences and in the relation of observed states (see Figure 8).
The statement that the "affinity of CI for OR1 is tenfold higher than for OR2 and OR3" [1] can be translated in our formalism by placing the entry (CI = low; OR1 = CI_bound, OR2 = free, OR3 = free) in the relation of observed states.
The cooperativity between interacting molecules, such as "CI bound to OR1 increases the affinity of OR2 for another tenfold", can also be represented in MIN.
The next type of information concerns the influence of regulatory sites on the protein activity level. The fact that the "Polymerase binding to the CRO promoter is disabled if CI is bound to OR1" can be translated in our formalism by the fact that the protein CRO is absent when the OR1 site is bound by CI, so we add the entry (OR1 = CI_bound; CRO = absent) to the relation of observed states.
In the same way, cooperativity can be represented in the expression of CI. Its promoter is naturally weak, but it can produce large quantities of CI if the site OR2 is bound by CI. The highest binding affinity of CRO is for OR3, so that CRO rapidly shuts off CI production by excluding the RNA polymerase from the CI promoter; another condition for CI production is therefore that OR3 remains vacant. This can be represented by the entries (OR3 = CRO_bound, CI = absent) and (OR3 = free, CI = present) in the relation of observed states.
Pr, the promoter of the CRO protein, is inherently strong, so as soon as the site OR1 is vacant, the CRO protein is produced. This is represented in MIN by the entries (OR1 = CI_bound, CRO = absent), (OR1 = CRO_bound, CRO = absent) and (OR1 = free, CRO = high) in the relation of observed states.
The resulting MIN is represented in Figure 10.
In order to transform the MIN representation of the λ switch into an MLM, we need to obtain the corresponding interaction graph and the dynamic parameters.

Translated interaction graph
The choice of variables of MLM is obvious: variables CRO and CI will represent the interacting molecular species of the MLM.
We can also follow in the MIN all the described interactions between these two variables: CI regulates its own expression and the expression of CRO through the sites OR1, OR2 and OR3. In the following, the notation $ICR_{i,a,j}$ denotes the ICR from the variable $V_i$ to the variable $V_j$ of the MIN through the affinity a, and $IRC_{i,j}$ denotes the IRC from the variable $V_i$ to the variable $V_j$.
CRO regulates its own expression and the expression of CI through the same regulatory sites.

Figure 10
A MIN representing the genetic switch of the λ phage. Species CRO and CI represent proteins which bind with the affinity OR to the regulatory sites OR1, OR2 and OR3. These sites are present in the regulatory regions of the genes encoding both proteins, so that they influence the corresponding species CI and CRO. The relation of observed states is the same as in Figure 8.

In order to obtain the labels of the arcs of the MLM model, the corresponding relations Ψ are calculated from the relation of observed states, as shown in Table 1.
Using Definition 18 of the translated regulatory graph, we can obtain the subsets of the relations Ψ in which the values of C i are fully ordered.
For Ψ CI,CRO and Ψ CRO,CRO two fully ordered subsets can be constructed (see Table 2).
For the relation Ψ CRO,CI , four fully ordered subsets can be constructed, as presented in Table 3.
Three of the four cases lead to the same threshold pair, and the fourth does not have one. So, the arc (CRO, CI) of the translated regulatory graph should be labeled with θ CRO,CI = low and ε CRO,CI = -.
For the relation Ψ CI,CI , 18 fully ordered subsets are possible; they are presented in Figure 9, together with the four labels of the arc (CI, CI).
Here we assume that the MLM cannot distinguish between the variable values "present" and "low", and we attribute the same numerical value to them. Replacing the MIN value "absent" by the MLM value 0 and the thresholds "low"/"present" and "high" by the numerical values 1 and 2, we obtain the family of interaction graphs of the translated MLM of the λ switch (see Figure 11).
Dynamic parameters for every instance of the obtained MLM can be derived from the relations Ψ according to the definition of the translated parameters.
Dynamic parameters for the variable CRO are the same in all three instances and are shown in Table 4.
Dynamic parameters for the variable CI can have different values according to the chosen MLM instance. The sets of possible values are shown in Table 5.
This example illustrates the construction of the MIN model from the biological data and shows that this model can be automatically translated into the MLM formalism. In the worst case, the interaction graph of the MLM is constructed from the MIN representation, but no constraint is found on the dynamic parameters (as for the parameters in networks C and D, Figure 11). In the best case, only one value is produced for each dynamic parameter (as for K CI,{CRO} ).

From MIN to ODEs
An important part of the biological knowledge comes from biochemistry. It covers information about the dynamics of chemical reactions, which are treated in the in silico models through the device of ordinary differential equations (ODEs).
Differential equations aim at expressing the concentration of a chemical species P as a function of time, knowing its production and degradation rates:
$$\frac{d[P]}{dt} = \sum_i k_i \prod_j [S_{ij}]^{\alpha_{ij}} - \sum_l k_l \prod_j [S_{lj}]^{\alpha_{lj}},$$
where $k_i$ is the reaction rate for the i-th P-production chemical reaction, $\alpha_{ij}$ is the stoichiometric coefficient of its j-th substrate $S_{ij}$, $[S_{ij}]$ is the concentration of the latter, and $k_l$, $\alpha_{lj}$, $[S_{lj}]$ denote the corresponding elements for the l-th P-degradation reaction and its co-substrates.
In order to translate the MIN model into ODEs, we need to write down the set of chemical reactions of the biological system, and to deduce (if possible) the reaction rates from the parameters of the influences of the MIN model. When the mechanism of a reaction is unknown, it may be written in Michaelis-Menten form, $S + E \rightleftharpoons SE \rightarrow P + E$, where E is an enzyme catalyzing the reaction but not consumed in it. The translation of this reaction into differential equations is well known.
A MIN model detailed enough to be directly translated into ODEs is presented in Figure 4. For each chemical species in Figure 4 we can write a differential equation summing its consumption and production in the chemical reactions in which the species participates (see Figure 12). If additional information is available and encoded in the MIN in attributes such as $k_i$ and $K_{aff}$, it is used in the translation into ODEs. If this information is not available, a free constant, denoted in a standard way, is generated.
The stoichiometric coefficients give the power coefficients $\alpha_i$ in the formula, and the reaction rates $k_j$ come from the corresponding reaction attributes.
For example, in the third equation, describing the production of the CI RNA from nucleotides, CI_RNA corresponds to the quantities of each of the four nucleotides composing the CI RNA: A, U, C and G (T being absent from RNAs). The RNA polymerase (RNA_pol in Figure 4) is the enzyme which catalyzes the CI RNA synthesis without being consumed in this reaction, so its concentration influences the reaction rate $k_{CI\_RNA\_synth}$ and is taken into account in the function f. OR1·CI₂ stands for the DNA information source for the CI RNA synthesis, and it also acts as a catalyst: without this species the CI RNA synthesis is impossible. One molecule of the CI_RNA species is produced from all the necessary nucleotides on the template OR1·CI₂ under the action of RNA_pol.
The first equation describes the concentration of the CI protein dimer CI₂. Its right-hand side represents the synthesis of one molecule of CI₂ from 2 molecules of CI (first term) minus the dissociation of the CI₂ species into 2 CI proteins (second term).
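The dimerisation equation described above can be reproduced with a small mass-action sketch in Python. The reaction encoding, the function names, the rate constants and the concentrations are all invented for illustration; catalysts such as RNA_pol, which the text folds into the function f, are not modelled here.

```python
def reaction_rate(k, substrates, conc):
    """Mass-action rate: k times each substrate concentration to its power."""
    rate = k
    for species, alpha in substrates.items():
        rate *= conc[species] ** alpha
    return rate


def derivatives(reactions, conc):
    """Right-hand sides d[X]/dt summed over all reactions."""
    d = {species: 0.0 for species in conc}
    for rx in reactions:
        v = reaction_rate(rx["k"], rx["substrates"], conc)
        for species, alpha in rx["substrates"].items():
            d[species] -= alpha * v  # consumed by the reaction
        for species, alpha in rx["products"].items():
            d[species] += alpha * v  # produced by the reaction
    return d


# Hypothetical constants for 2 CI -> CI2 and CI2 -> 2 CI.
reactions = [
    {"k": 2.0, "substrates": {"CI": 2}, "products": {"CI2": 1}},
    {"k": 0.5, "substrates": {"CI2": 1}, "products": {"CI": 2}},
]
conc = {"CI": 3.0, "CI2": 4.0}

d = derivatives(reactions, conc)
```

With these values the dimer derivative is 2.0·3.0² − 0.5·4.0 = 16.0, the two terms being the synthesis and the dissociation, and the monomer derivative is −32.0 because each reaction event involves two CI molecules.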
More generally, any MIN model can be translated into differential equations by an automated procedure, even if it was not explicitly constructed to represent a set of biochemical reactions. In some cases, it may be necessary to first demultiply the MIN regulatory sites in order to translate the model directly, as for the example in Figure 4.
While the states of a chemical species may characterize the degree of its activity, either through a discrete indication like "absent", "low", "high" or through quantitative information like the concentration, which leads quite directly to a representation in ODEs, the states of a regulatory site may be more difficult to interpret. In the simplest case a regulatory site represents a single chemical reaction. The regulatory sites corresponding to single chemical reactions, like "CI RNA synthesis", "CI protein synthesis" or "CI dimerisation" in Figure 4, are in this situation and are easy to translate into ODEs.
However, in a more complex case, a regulatory site may encompass, through its different states, a family of biochemical reactions, which makes a direct translation difficult. For a single chemical reaction, the concentrations of the participating species are sufficient to determine its rate, which is thus represented by a function. For a family of reactions, the reaction rate is not always a function (but a relation) of the concentrations of each species, and this is precisely the difficulty of the translation into ODEs.
Let us consider the example in Figure 13. The MIN model looks very much like the one in Figure 3, but the IRCs and ICRs are provided with additional properties such as $k_i$, $K_{aff}$ and production_rate, which reflect the kinetic properties of the corresponding biochemical reactions. If the regulatory site "OR1" in Figure 13 is in the state OR1, neither of the two reactions ("CI RNA synthesis" and "CI protein synthesis") takes place in the cell. When the same site is in the state OR1·CI, both "CI RNA synthesis" and "CI protein synthesis" take place. It is possible to reduce this complexity by demultiplying the regulatory sites as a first step of the translation of a MIN model into ODEs. The demultiplication of a regulatory site R replaces it by a set of (new) species associated with the states of R and a set of (new) regulatory sites associated with the chemical reactions. In other words, every state of R now gives a chemical species participating in a defined set of chemical reactions, represented by newly generated regulatory sites. After the demultiplication, each regulatory site represents a single chemical reaction, which means that the species connected to it may potentially be produced or consumed, and it may be automatically translated into ODEs. Some optimizations may be performed at this stage, for instance if one knows whether the species are consumed or produced, which may be indicated in the attributes (such as "stoichiometry", "production rate", "degradation rate" or "kinetic rate") of the corresponding influences (ICRs and IRCs).

Figure 12
Differential equations obtained by an automatic translation of the MIN model in Figure 4.

Discussion
The MIN representation offers a rich formal description of biological interaction networks. The methodology of modelling biological systems incrementally in the MIN representation is illustrated by a case study on the λ switch system. The formalisation of biological data is independent of any given modeling or simulation approach. The main goal of MIN is to hold as many different data about interacting entities as possible in order to make them accessible to any particular modeling approach. A translation into R. Thomas' formalism allows the modeler to obtain an MLM model from the available data, and this MLM is consistent with other models of the same system [11]. While the translation from MIN into MLM is rather complicated, it can be easily automated using the algorithm presented in this paper. However, without expert intervention, the number of MLM models can be high. The modeler can act on the data put into the MIN model, changing and refining them, and this change will have an impact on the translated MLM models produced. There is, however, no need for an expert to deeply understand the algorithm itself. The translation of MLM instances can be continued further into Petri nets, as studied in [2], which gives access to the available Petri net analysis tools. Each formalism has its advantages and fits the description of a certain data type; a complete and efficient description of biological systems is possible only by combining these tools. A formalism forces an interpretation of the available data in order to fit them into its framework. Some data which are incompatible with the chosen framework will inevitably be lost. Sometimes the same model represented in different formalisms can hardly be recognized [13,1,5,4].
A MIN variable may have a high number of observed (quantitative or qualitative) values. However, this is not necessarily a problem, as having many observations for the same variable means that the corresponding biological object plays an important role in the biological process being studied. In this case, every species regulated by this object through a regulatory site is supposed to generate a logical threshold of action. In addition, the fact that several quantitative values are not significantly different is additional information which, if available, may be encoded in the partial order on the variable values as an equivalence class of several values.
The representation of regulatory sites and affinities separately from chemical species helps to represent in a "formal" way large proteins with many functional domains, or a complex set of regulatory sites in a protein or in a gene. The specificity of the λ phage genetic switch is that the promoter region of two different genes is represented by the same biological object (DNA region). This fact is represented in our formalism by having only one set of regulatory sites of the λ switch which influence two different species: CI and CRO.
Figure 13
The same MIN model as the one used for genetic regulation modeling, enriched with complementary information allowing the translation into differential equations.
MIN enables an incremental model construction through the composition of MINs and the storage (in the species affinities and regulatory site labels) of the information about possible interaction capabilities of biological entities. Thus, MIN can help in the model construction by a rational choice of new variables to be added to the model: with compatible regulatory sites or affinities.
Experimental techniques in biology collect massive amounts of information on the behavior and interaction of thousands of genes and proteins across diverse conditions. These techniques are used to probe complex biological systems that use highly intricate regulatory mechanisms and control schemes. One cannot fully characterize such complex cellular systems by focusing on a single control mechanism, as measured by a single experimental technique. In MIN, the data coming from different experimental techniques are all stored in the relation of observed states. To gain a deeper understanding of the system, it is pertinent to analyze heterogeneous data sources in a truly integrated fashion and to shape the analysis results into one body of knowledge [14,15].
We proposed a new paradigm for the modeling of biological systems, in which all available experimental data are considered as a set of snapshots of the real system and stored in the relation of observed states without any interpretation. The information about the system is added and refined incrementally. The current state of knowledge in MIN can be automatically translated into a given formalism for the analysis of the dynamics of the system; it could also be used in the future by an inference system applying artificial intelligence techniques [16] to solve complex biological problems.
Over the last few years, some work has been carried out on the integration of biological and, in particular, biochemical data, which includes rich but informal visualisation conventions [17,18]. Even if MIN is not designed as a graphical model, it provides a quite simple visualisation convention with two types of nodes and two types of arcs. Recently, a method for representing and communicating biological networks in both human and machine readable form has been presented in [19]. The ambition of this work is to obtain a semantically and visually unambiguous diagram scheme, but this leads to a very low level representation of processes and to the use of many kinds of nodes and links. By comparison, MIN does not require an equivalent degree of detail and makes it possible to adjust the abstraction level of the model. Another approach, based on formal but not very expressive exchange formalisms like SBML [20], attempts to standardize the expression of ODE-based models of cellular systems, concentrating on chemical reactions. Existing SBML models can be wrapped in a MIN description. Within the same standardisation effort, more abstract and universal meta-modelling approaches [21][22][23][24] aim to create a general visual language for systems biology, similar to UML. For instance, BioUML [24] provides an abstract layer to present the structure of any biological system as a clustered graph. MIN should be expressed in this language in order to use the infrastructure based on BioUML, to access the biological databases and to automatically generate executable models.
Thus, the proposed new formalism, MIN, can play the role of an intermediate level between insufficiently formalized "natural language" and too specialized "mathematical descriptions" of biological systems. The MIN construction is a process of inference of the biological interaction networks from biological observations at the microscopic and macroscopic levels. Its underlying structure provides a skeleton for the understanding of the "first principles" of the organisation of biological systems. A computer analysis tool to study the properties of MIN models and to automatically perform their composition and translation into different formalisms is currently under development and should soon become available for download. The study of the relation between the information available in MIN and the best suited model is one of the perspectives of this project.

Conclusion
The description of a biological system is often obtained by constructing an interaction network. Intuitively, as biological interactions are considered to always rely on so called regulatory sites, the network construction starts by their identification. Every regulatory site has a set of regulating and regulated chemical species and their role is expressed by influences. Sometimes, and in particular when the abstraction level is high, the choice of representing a set of biochemical reactions by a species or by a regulatory site is rather arbitrary. However, at the base level the chemical reactions are represented by regulatory sites and chemical species by species of MIN. Furthermore, both species and regulatory sites are fully characterized by their levels of activity indicated (as string value) in the modeler's description of the states of a biological system. For the translation into other formalisms the values of the level of activity may be interpreted, if allowed by the target formalism, or ignored. As a consequence, regulatory sites and chemical species form the set of variables of the interaction network (see Table 6 for some examples of variables). Thus, two main classes of abstract entities are chosen to be components of interaction networks: variables and influences between them. We consider two kinds of influences between the variables of the model: Influences of Chemical species on Regulatory sites (ICR) and Influences of Regulatory sites on Chemical species (IRC). We also assume that there is no influence between variables of the same kind. The whole representation is called Modular Interaction Network (MIN).
Such models may be composed. The trivial case of a composition is the union of models having no common species or sites. The union of data contained in these models is the new, composed, model. In the case of models sharing common entities, the repeated nodes of the resulting network are collapsed. MIN being an abstract formalism, its semantics is not intended to be defined directly, but rather as a translation into a target model. In this paper, we first define a translation of MIN into the Multivalued Logical modeling formalism (MLM) [7].
The multivalued logical representation of genetic regulatory networks [7] is one of the closest to the biological intuition. The major problem of this formalism is that it is not incremental, which means that updating an existing model (by adding or removing nodes or edges in the regulatory graph, for instance) leads to the situation where the set of dynamic parameters changes in an unpredictable way, as well as the dynamics of the system. In order to cope with this problem, the idea is to describe the biological system in MIN and translate it automatically, when needed, at any modeling step, into the multivalued logical formalism. This translation should preserve as many as possible of the biological properties already expressed in MIN. The dynamics of the translated MIN is then based on the information available in the attributes of its influences. The interaction graph can be obtained more or less directly from the MIN presentation of a biological regulatory network. The variables of the MLM (nodes of the graph) are obtained from the species of the MIN. The influences of MLM (edges of the graph) are obtained from pairs of (ICR, IRC) present in the MIN and having a common regulatory site. The dynamic parameters of MIN indicated as attributes of its influences will serve to constrain possible dynamic parameters in the obtained multivalued logical model.
In order to further illustrate the flexibility of the MIN approach, we have also shown how to extract the dynamics of the associated chemical reactions in terms of ordinary differential equations, either directly or through a demultiplication of the regulatory sites which may represent various different reactions.