HMGB1 signaling pathway
Our HMGB1 signaling pathway model is illustrated in Fig. 1. It includes 31 molecular species (6 tumor suppressor proteins), 59 chemical reactions, and three different signaling pathways activated by HMGB1: the RAS-ERK, Rb-E2F and p53-MDM2 pathways. Since the interaction between HMGB1 and its receptors TLR and RAGE is not clear at the mechanistic level, RAGE is used to represent all the receptors in our model in order to reduce the number of unknown parameters. We now briefly discuss the three pathways and their crosstalk. We denote activation (or promotion) by →, while inhibition (or repression) is denoted by ⊣.
The p53-MDM2 pathway is regulated by a negative feedback loop [26]: PI3K → PIP3 → AKT → MDM2 ⊣ p53 → MDM2, and a positive feedback loop: p53 → PTEN ⊣ PIP3 → AKT → MDM2 ⊣ p53. The protein PI3K is activated by the toll-like receptors (TLR2/4) within several minutes after TLR2/4 activation by HMGB1 [27]. In turn, PI3K phosphorylates the phosphatidylinositol 4,5-bisphosphate (PIP2) to phosphatidylinositol (3,4,5)-trisphosphate (PIP3), leading to phosphorylation of AKT. The unphosphorylated oncoprotein MDM2, which is one of p53's transcription targets [28], resides in the cytoplasm, and cannot enter the nucleus until it is phosphorylated by activated AKT. The phosphorylated MDM2 translocates into the nucleus to bind with p53, inhibiting p53's transcription activity and initializing p53 polyubiquitination [29], which targets it for degradation. Also, p53 can regulate the transcription of PTEN [30], a tumor suppressor protein, which can hydrolyze PIP3 to PIP2, thereby inhibiting the activation of AKT and MDM2.
The RAS-ERK pathway is the activation sequence: RAS → RAF → MEK → ERK → CyclinD. Activation of RAGE by HMGB1 leads to RAS activation, which in turn activates its effector protein RAF. Activated RAF will phosphorylate the MEK proteins (mitogen-activated protein kinase kinases (MAPKK)), leading to the phosphorylation of ERK1/2 (also called MAPKs). Activated ERK can phosphorylate some transcription factors which activate the expression of the regulatory protein CyclinD and Myc, enabling progression of the cell cycle through the G1 phase. K-RAS, a member of the RAS protein family, is found to be mutated in over 90% of pancreatic cancers [31].
The Rb-E2F pathway is composed of the interactions: CyclinD ⊣ Rb ⊣ E2F → CyclinE ⊣ Rb. The Rb-E2F pathway regulates the G1-S phase transition in the cell cycle during cell proliferation. E2F is a transcription factor that can activate the transcription of many proteins involved in DNA replication and cell-cycle progression [32]. In quiescent cells, E2F is bound by unphosphorylated Rb - a tumor suppressor protein - forming an Rb-E2F complex which inhibits E2F's transcription activity. E2F will be activated and released when its inhibitor Rb is phosphorylated by some oncoproteins (CyclinD and Myc in Fig. 1), leading to the transcription of CyclinE and Cyclin-dependent protein kinase 2 (CDK2) which promote cell-cycle progression. CyclinE, in turn, continues to inhibit the activity of Rb, leading to a positive feedback loop [33–35]. Fig. 1 shows that the activity of CyclinD-CDK4/6 (only CyclinD is shown in Fig. 1) is inhibited by the tumor suppressor protein INK4A, which is inactivated in up to 90% pancreatic cancers [36].
The crosstalk between these pathways can influence the cell's fate since the three signaling pathways in HMGB1 signal transduction are not independent. As shown in Fig. 1, the oncoprotein RAS can also activate the PI3K-AKT signaling pathway; the tumor suppressor ARF protein induced by E2F can bind to MDM2 to promote its rapid degradation and stabilize p53. Furthermore, it has been experimentally observed [13] that the p53-dependent tumor suppressor proteins P21 and FBXW7 can inhibit the activity of cyclin dependent kinases (In Fig. 1, we use P21 to represent both P21 and FBXW7's contribution). Mutations of RAS, ARF, P21 and FBXW7 have been found in many cancers [31, 36, 37]. One of our aims is to investigate how these mutations might influence the cell's fate.
In the HMGB1 model, all substrates are expressed in the number of molecules; proteins with the subscript "a" or "p" correspond respectively to active or phosphorylated forms of the proteins. For example,
We denote the mRNA transcript of MDM2 by mdm 2. We assume that the total number of active and inactive forms of the RAGE, PI3K, PIP, AKT, RAS, RAF, MEK, and ERK molecules is constant. For example, AKT + AKT
p
= AKT
tot
, PIP2 + PIP3 = PIP
tot
. We sometimes use CD to stand for the CyclinD-CDK4/6 complex, CE for CyclinE, and RE for the Rb-E2F complex.
The p53-MDM2 and RAS-ERK pathways have been studied individually using deterministic ODE methods [16–19, 32]. We instead formulated a reaction model corresponding to the reactions illustrated in Fig. 1 in the form of rules specified in the BioNetGen language [38]. We used Hill functions to describe the rate laws governing protein synthesis, including PTEN, MDM2, CyclinD (CD), Myc, E2F and CyclinE (CE). Our choice was motivated by several studies [19, 39–41], which showed that transcription rates of these proteins are sigmoidal functions of transcription factor (TF) concentrations with positive cooperative Hill coefficients. We used mass action rules for other types of chemical reactions. Both ODEs and Gillespie's stochastic simulation algorithm (SSA) [42] are used to simulate the model with BioNetGen [38]. Stochastic simulation is important because when the number of molecules involved in the reactions is small, stochasticity and discretization effects become more prominent [43–45]. In the online Additional file 1, we list 23 ordinary differential equations which describe the deterministic HMGB1 model and all the input parameters. The BioNetGen code which implements SSA and ODE models is available at [46].
Since our understanding of many chemical reactions at the mechanistic level is not clear, a large number of parameters involved in these reactions are difficult to estimate based on existing data. We emphasize that in our HMGB1 model the values for some undetermined parameters were chosen in order to produce a qualitative agreement with previous experiments.
Model Checking
Model Checking [22, 47] is one of the leading techniques for the automated verification and analysis of hardware and software systems. Given a high-level behavior specification, a model checker verifies whether a system (or model) satisfies it. A specification might be satisfied by many different models. Thus, model checking is the process of determining whether or not a given system model satisfies (is a model of) a property describing the desired behavior of the system. Mathematically, system models take the form of state-transition diagrams, while some version of temporal logic [48] is used to describe the desired properties (specifications) of system executions. A typical property stated in temporal logic is G(grant_req→ F ack), meaning that it is always (G = globally) true that a grant request eventually (F = future) triggers an acknowledgment. One important aspect of Model Checking is that it can be performed algorithmically - user intervention is limited to providing a system model and a property to check.
The Probabilistic Model Checking problem (PMC) is to decide whether a stochastic model satisfies a temporal logic property with a probability greater than or equal to a certain threshold. To express temporal properties, we use a logic in which the temporal operators are equipped with bounds. For example, the property "CyclinD will always stay below 10 in the next fifty time units " is written as G50(CyclinD < 10). We now ask whether our stochastic system M satisfies that formula with a probability greater than or equal to a fixed threshold (say 0.9), and we write M |= Pr≥ 0.9[G50(CyclinD < 10)]. In the next section, we formally define the temporal logic used in this work, Bounded Linear Temporal Logic [23].
Bounded Linear Temporal Logic (BLTL)
Let SV be a finite set of real-valued variables, an atomic proposition AP be a boolean predicate of the form e1 ~ e2, where e1 and e2 are arithmethic expressions over variables in SV, and ~ is either ≥, ≤, <, >, or = . A BLTL property is built over atomic propositions using boolean connectives and bounded temporal operators. The syntax of the logic is the following:
The bounded until operator ϕ1 Ut ϕ2 requires that, within time t, ϕ2 will be true and ϕ1 will hold until then. Bounded versions of the F and G operators can be easily defined: Ft ϕ = true U t ϕ requires ϕ to hold true within time t;Gt ϕ = ¬Ft ¬ ϕ requires ϕ to hold true up to time t.
The semantics of BLTL is defined with respect to traces (or executions) of a system. In our case, a trace will be the output of a simulation of a BioNetGen stochastic model. Formally, a trace is a sequence of time-stamped state transitions of the form σ = (s0,t0), (s1,t1),..., which means that the system moved to state si+1after having sojourned for time t
i
in state s
i
. The fact that a trace σ satisfies the BLTL property ϕ is written as σ |= ϕ. We denote the trace suffix starting at step k by σk. We have the following semantics of BLTL:
-
σk⊨ AP if and only if AP holds true in state s
k
;
-
σk⊨ ϕ1 ∧ ϕ2 if and only if σk⊨ ϕ1 and σk⊨ ϕ2;
-
σk⊨ ϕ1 ∨ ϕ2 if and only if σk⊨ ϕ1 or σk⊨ ϕ2;
-
σk⊨ ¬ϕ1 if and only if σk⊨ ϕ1 does not hold;
-
σk⊨ ϕ1Utϕ2 if and only if there exists i ∈ N such that, (a)∑0 ≤ l <itk+l≤ t, (b) σk+i⊨ ϕ2 and (c) for each 0 ≤ j <i, σk+j⊨ ϕ1.
The semantics of BLTL are defined over infinite traces, but it can be shown that traces of an appropriate (finite) length are sufficient to decide BLTL properties [49].
Statistical Model Checking
We briefly explain Statistical Model Checking [50, 51], the technique we use for verifying BioNetGen models simulated by Gillespie's algorithm. Statistical Model Checking treats the Probabilistic Model Checking problem as a statistical inference problem, and solves it by randomized sampling of the traces (simulations) from the model. In particular, the PMC problem is naturally phrased as a hypothesis testing problem, i.e., deciding between two hypotheses - M ⊨ Pr≥θ[ϕ] versus M ⊨ Pr
< θ
[ϕ]. In other words, to determine whether a stochastic system M satisfies ϕ with a probability p ≥ θ, we test the hypothesis H0 : p ≥ θ against H1 : p < θ. Sampled traces are model checked individually to determine whether a given property ϕ holds, and the number of satisfying traces is used by a hypothesis testing procedure to decide between H0 and H1. Note that Statistical Model Checking cannot guarantee a correct answer to the PMC problem. However, the probability of giving a wrong answer can be made arbitrarily small.
We have introduced a Bayesian sequential hypothesis testing approach and applied it to the verification of rule-based models of signaling pathways and other stochastic systems [23, 49]. Sequential sampling means that the number of sampled traces is not fixed a priori, but is instead determined at "run-time ", depending on the evidence gathered by the samples seen so far. This often leads to a significantly smaller number of sampled traces.
Suppose that the stochastic system M satisfies the BLTL formula ϕ with some (unknown) probability p. The key idea behind statistical model checking [50] is that the behavior of M (with respect to property ϕ) can be modeled by a Bernoulli random variable with success parameter p. Such a random variable can be repeatedly evaluated via system simulation in the following way. Let σ be a trace of M, then the Bernoulli random variable X with (conditional) probability mass function:
(1)
denotes the outcome of σ ⊨ ϕ (i.e., model checking ϕ on σ). In other words, we have that:
(2)
Therefore, by running a system simulation (i.e., a BioNetGen stochastic simulation) and by checking ϕ on the resulting trace we can obtain a sample from random variable X. When a sample of X evaluates to 1 we call it a success, otherwise, a failure.
Recall that in hypothesis testing we decide between a null hypothesis H0 and an alternative hypothesis H1:
(3)
The Bayesian approach assumes that p is given by a random variable whose distribution is called the prior distribution. The prior is usually based on our previous experiences and knowledge about the system.
Since p is a probability, we need prior distributions defined over [0,1]. In particular, Beta priors are mathematically convenient to use. They are defined by the following probability density:
(4)
where the Beta function B(α, β) is defined as:
(5)
For later use, the Beta distribution function F(α;β)(u) of parameters α, β is defined as for all u ∈ [0, 1] as:
(6)
(7)
Let d = (x1,..., x
n
) denote n samples of the Bernoulli random variable X defined by (2). Let H0 and H1 be the hypotheses in (3), and suppose that the prior probabilities P (H0) and P (H1) are strictly positive and satisfy P (H0) + P (H1) = 1. By Bayes's theorem, the posterior probabilities of H0 and H1, with respect to data d, are:
for every d with P (d) > 0. In our case, P (d) is always non-zero (there are no impossible finite sequences of data). The hypothesis test method is based on the Bayes Factor, that is, the likelihood ratio of the sampled data with respect to the two hypotheses. The Bayes Factor ℬ of sample d and hypotheses H0 and H1 is
and by Bayes' theorem, we have that:
(8)
Therefore, B can be interpreted as a measure of evidence (given by the data d) in favor of H0. Now, fix a threshold T > 1. The algorithm iteratively draws independent and identically distributed (iid) sample traces in the form of BioNetGen stochastic simulations, and checks whether they satisfy ϕ (Note that BioNetGen ensures by construction that each simulation, or trace, is actually iid.) After each trace, the algorithm computes the Bayes Factor B to check if it has obtained conclusive evidence. The algorithm accepts H0 if B >T, and rejects H0 (accepting H1) if . Otherwise , it continues drawing iid samples. The statistical Model Checking algorithm is shown in Figure 2.
The following Proposition shows that, in our special case of Bernoulli samples, the computation of the Bayes Factor is straightforward.
Proposition 1. [49]The Bayes Factor of H0 : p ≥ θ vs. H
1
: p <θ with Bernoulli samples (x1,..., x
n
) and Beta prior of parameters α, β is:
where is the number of successes in(x1,..., x
n
) and F(s,t)(·) is the Beta distribution function of parameters s, t.
The Beta distribution function can be efficiently computed by standard software packages. Thus, no numerical integration is required for the evaluation of the Bayes Factor.
Finally, we must show that the error probability of our decision procedure, i.e., the probability that we reject (accept) the null hypothesis although it is true (false), can be bounded.
Theorem. [49]The error probability for the sequential Bayesian hypothesis testing algorithm is bounded above by where T is the Bayes Factor threshold given as input.