HMGB1 signaling pathway
Our HMGB1 signaling pathway model is illustrated in Fig. 1. It includes 31 molecular species (6 tumor suppressor proteins), 59 chemical reactions, and three different signaling pathways activated by HMGB1: the RASERK, RbE2F and p53MDM2 pathways. Since the interaction between HMGB1 and its receptors TLR and RAGE is not clear at the mechanistic level, RAGE is used to represent all the receptors in our model in order to reduce the number of unknown parameters. We now briefly discuss the three pathways and their crosstalk. We denote activation (or promotion) by →, while inhibition (or repression) is denoted by ⊣.
The p53MDM2 pathway is regulated by a negative feedback loop [26]: PI3K → PIP3 → AKT → MDM2 ⊣ p53 → MDM2, and a positive feedback loop: p53 → PTEN ⊣ PIP3 → AKT → MDM2 ⊣ p53. The protein PI3K is activated by the tolllike receptors (TLR2/4) within several minutes after TLR2/4 activation by HMGB1 [27]. In turn, PI3K phosphorylates the phosphatidylinositol 4,5bisphosphate (PIP2) to phosphatidylinositol (3,4,5)trisphosphate (PIP3), leading to phosphorylation of AKT. The unphosphorylated oncoprotein MDM2, which is one of p53's transcription targets [28], resides in the cytoplasm, and cannot enter the nucleus until it is phosphorylated by activated AKT. The phosphorylated MDM2 translocates into the nucleus to bind with p53, inhibiting p53's transcription activity and initializing p53 polyubiquitination [29], which targets it for degradation. Also, p53 can regulate the transcription of PTEN [30], a tumor suppressor protein, which can hydrolyze PIP3 to PIP2, thereby inhibiting the activation of AKT and MDM2.
The RASERK pathway is the activation sequence: RAS → RAF → MEK → ERK → CyclinD. Activation of RAGE by HMGB1 leads to RAS activation, which in turn activates its effector protein RAF. Activated RAF will phosphorylate the MEK proteins (mitogenactivated protein kinase kinases (MAPKK)), leading to the phosphorylation of ERK1/2 (also called MAPKs). Activated ERK can phosphorylate some transcription factors which activate the expression of the regulatory protein CyclinD and Myc, enabling progression of the cell cycle through the G1 phase. KRAS, a member of the RAS protein family, is found to be mutated in over 90% of pancreatic cancers [31].
The RbE2F pathway is composed of the interactions: CyclinD ⊣ Rb ⊣ E2F → CyclinE ⊣ Rb. The RbE2F pathway regulates the G1S phase transition in the cell cycle during cell proliferation. E2F is a transcription factor that can activate the transcription of many proteins involved in DNA replication and cellcycle progression [32]. In quiescent cells, E2F is bound by unphosphorylated Rb  a tumor suppressor protein  forming an RbE2F complex which inhibits E2F's transcription activity. E2F will be activated and released when its inhibitor Rb is phosphorylated by some oncoproteins (CyclinD and Myc in Fig. 1), leading to the transcription of CyclinE and Cyclindependent protein kinase 2 (CDK2) which promote cellcycle progression. CyclinE, in turn, continues to inhibit the activity of Rb, leading to a positive feedback loop [33–35]. Fig. 1 shows that the activity of CyclinDCDK4/6 (only CyclinD is shown in Fig. 1) is inhibited by the tumor suppressor protein INK4A, which is inactivated in up to 90% pancreatic cancers [36].
The crosstalk between these pathways can influence the cell's fate since the three signaling pathways in HMGB1 signal transduction are not independent. As shown in Fig. 1, the oncoprotein RAS can also activate the PI3KAKT signaling pathway; the tumor suppressor ARF protein induced by E2F can bind to MDM2 to promote its rapid degradation and stabilize p53. Furthermore, it has been experimentally observed [13] that the p53dependent tumor suppressor proteins P21 and FBXW7 can inhibit the activity of cyclin dependent kinases (In Fig. 1, we use P21 to represent both P21 and FBXW7's contribution). Mutations of RAS, ARF, P21 and FBXW7 have been found in many cancers [31, 36, 37]. One of our aims is to investigate how these mutations might influence the cell's fate.
In the HMGB1 model, all substrates are expressed in the number of molecules; proteins with the subscript "a" or "p" correspond respectively to active or phosphorylated forms of the proteins. For example,
We denote the mRNA transcript of MDM2 by mdm 2. We assume that the total number of active and inactive forms of the RAGE, PI3K, PIP, AKT, RAS, RAF, MEK, and ERK molecules is constant. For example, AKT + AKT_{
p
} = AKT_{
tot
}, PIP2 + PIP3 = PIP_{
tot
}. We sometimes use CD to stand for the CyclinDCDK4/6 complex, CE for CyclinE, and RE for the RbE2F complex.
The p53MDM2 and RASERK pathways have been studied individually using deterministic ODE methods [16–19, 32]. We instead formulated a reaction model corresponding to the reactions illustrated in Fig. 1 in the form of rules specified in the BioNetGen language [38]. We used Hill functions to describe the rate laws governing protein synthesis, including PTEN, MDM2, CyclinD (CD), Myc, E2F and CyclinE (CE). Our choice was motivated by several studies [19, 39–41], which showed that transcription rates of these proteins are sigmoidal functions of transcription factor (TF) concentrations with positive cooperative Hill coefficients. We used mass action rules for other types of chemical reactions. Both ODEs and Gillespie's stochastic simulation algorithm (SSA) [42] are used to simulate the model with BioNetGen [38]. Stochastic simulation is important because when the number of molecules involved in the reactions is small, stochasticity and discretization effects become more prominent [43–45]. In the online Additional file 1, we list 23 ordinary differential equations which describe the deterministic HMGB1 model and all the input parameters. The BioNetGen code which implements SSA and ODE models is available at [46].
Since our understanding of many chemical reactions at the mechanistic level is not clear, a large number of parameters involved in these reactions are difficult to estimate based on existing data. We emphasize that in our HMGB1 model the values for some undetermined parameters were chosen in order to produce a qualitative agreement with previous experiments.
Model Checking
Model Checking [22, 47] is one of the leading techniques for the automated verification and analysis of hardware and software systems. Given a highlevel behavior specification, a model checker verifies whether a system (or model) satisfies it. A specification might be satisfied by many different models. Thus, model checking is the process of determining whether or not a given system model satisfies (is a model of) a property describing the desired behavior of the system. Mathematically, system models take the form of statetransition diagrams, while some version of temporal logic [48] is used to describe the desired properties (specifications) of system executions. A typical property stated in temporal logic is G(grant_req→ F ack), meaning that it is always (G = globally) true that a grant request eventually (F = future) triggers an acknowledgment. One important aspect of Model Checking is that it can be performed algorithmically  user intervention is limited to providing a system model and a property to check.
The Probabilistic Model Checking problem (PMC) is to decide whether a stochastic model satisfies a temporal logic property with a probability greater than or equal to a certain threshold. To express temporal properties, we use a logic in which the temporal operators are equipped with bounds. For example, the property "CyclinD will always stay below 10 in the next fifty time units " is written as G^{50}(CyclinD < 10). We now ask whether our stochastic system M satisfies that formula with a probability greater than or equal to a fixed threshold (say 0.9), and we write M = Pr_{≥ 0.9}[G^{50}(CyclinD < 10)]. In the next section, we formally define the temporal logic used in this work, Bounded Linear Temporal Logic [23].
Bounded Linear Temporal Logic (BLTL)
Let SV be a finite set of realvalued variables, an atomic proposition AP be a boolean predicate of the form e_{1} ~ e_{2}, where e_{1} and e_{2} are arithmethic expressions over variables in SV, and ~ is either ≥, ≤, <, >, or = . A BLTL property is built over atomic propositions using boolean connectives and bounded temporal operators. The syntax of the logic is the following:
\varphi ::=AP{\varphi}_{1}\vee {\varphi}_{2}{\varphi}_{1}\wedge {\varphi}_{2}\left\neg {\varphi}_{1}\right{\varphi}_{1}{U}^{t}{\varphi}_{2}.
The bounded until operator ϕ_{1} U^{t} ϕ_{2} requires that, within time t, ϕ_{2} will be true and ϕ_{1} will hold until then. Bounded versions of the F and G operators can be easily defined: F^{t} ϕ = true U ^{t} ϕ requires ϕ to hold true within time t;G^{t} ϕ = ¬F^{t} ¬ ϕ requires ϕ to hold true up to time t.
The semantics of BLTL is defined with respect to traces (or executions) of a system. In our case, a trace will be the output of a simulation of a BioNetGen stochastic model. Formally, a trace is a sequence of timestamped state transitions of the form σ = (s_{0},t_{0}), (s_{1},t_{1}),..., which means that the system moved to state s_{i+1}after having sojourned for time t_{
i
}in state s_{
i
}. The fact that a trace σ satisfies the BLTL property ϕ is written as σ = ϕ. We denote the trace suffix starting at step k by σ^{k}. We have the following semantics of BLTL:

σ^{k}⊨ AP if and only if AP holds true in state s_{
k
};

σ^{k}⊨ ϕ_{1} ∧ ϕ_{2} if and only if σ^{k}⊨ ϕ_{1} and σ^{k}⊨ ϕ_{2};

σ^{k}⊨ ϕ_{1} ∨ ϕ_{2} if and only if σ^{k}⊨ ϕ_{1} or σ^{k}⊨ ϕ_{2};

σ^{k}⊨ ¬ϕ_{1} if and only if σ^{k}⊨ ϕ_{1} does not hold;

σ^{k}⊨ ϕ_{1}U^{t}ϕ_{2} if and only if there exists i ∈ N such that, (a)∑_{0 ≤ l <i}t_{k+l}≤ t, (b) σ^{k+i}⊨ ϕ_{2} and (c) for each 0 ≤ j <i, σ^{k+j}⊨ ϕ_{1}.
The semantics of BLTL are defined over infinite traces, but it can be shown that traces of an appropriate (finite) length are sufficient to decide BLTL properties [49].
Statistical Model Checking
We briefly explain Statistical Model Checking [50, 51], the technique we use for verifying BioNetGen models simulated by Gillespie's algorithm. Statistical Model Checking treats the Probabilistic Model Checking problem as a statistical inference problem, and solves it by randomized sampling of the traces (simulations) from the model. In particular, the PMC problem is naturally phrased as a hypothesis testing problem, i.e., deciding between two hypotheses  M ⊨ Pr_{≥θ}[ϕ] versus M ⊨ Pr_{
< θ
}[ϕ]. In other words, to determine whether a stochastic system M satisfies ϕ with a probability p ≥ θ, we test the hypothesis H_{0} : p ≥ θ against H_{1} : p < θ. Sampled traces are model checked individually to determine whether a given property ϕ holds, and the number of satisfying traces is used by a hypothesis testing procedure to decide between H_{0} and H_{1}. Note that Statistical Model Checking cannot guarantee a correct answer to the PMC problem. However, the probability of giving a wrong answer can be made arbitrarily small.
We have introduced a Bayesian sequential hypothesis testing approach and applied it to the verification of rulebased models of signaling pathways and other stochastic systems [23, 49]. Sequential sampling means that the number of sampled traces is not fixed a priori, but is instead determined at "runtime ", depending on the evidence gathered by the samples seen so far. This often leads to a significantly smaller number of sampled traces.
Suppose that the stochastic system M satisfies the BLTL formula ϕ with some (unknown) probability p. The key idea behind statistical model checking [50] is that the behavior of M (with respect to property ϕ) can be modeled by a Bernoulli random variable with success parameter p. Such a random variable can be repeatedly evaluated via system simulation in the following way. Let σ be a trace of M, then the Bernoulli random variable X with (conditional) probability mass function:
\begin{array}{cc}f(xp)={p}^{x}{(1p)}^{1x}& x\in \{0,1\}\end{array}
(1)
denotes the outcome of σ ⊨ ϕ (i.e., model checking ϕ on σ). In other words, we have that:
X=\{\begin{array}{ll}1\text{withprobability}p\hfill & (\sigma =\varphi ),\hfill \\ 0\text{withprobability}1p\hfill & (\sigma =\neg \varphi ).\hfill \end{array}
(2)
Therefore, by running a system simulation (i.e., a BioNetGen stochastic simulation) and by checking ϕ on the resulting trace we can obtain a sample from random variable X. When a sample of X evaluates to 1 we call it a success, otherwise, a failure.
Recall that in hypothesis testing we decide between a null hypothesis H_{0} and an alternative hypothesis H_{1}:
\begin{array}{cc}{H}_{0}:p\ge \theta & {H}_{\text{1}}:p<\theta .\end{array}
(3)
The Bayesian approach assumes that p is given by a random variable whose distribution is called the prior distribution. The prior is usually based on our previous experiences and knowledge about the system.
Since p is a probability, we need prior distributions defined over [0,1]. In particular, Beta priors are mathematically convenient to use. They are defined by the following probability density:
\begin{array}{cc}\forall u\in [0,1]& g(u,\alpha ,\beta )=\frac{1}{B(\alpha ,\beta )}{u}^{\alpha 1}{(1u)}^{\beta 1}\end{array}
(4)
where the Beta function B(α, β) is defined as:
B(\alpha ,\beta )={\displaystyle {\int}_{0}^{1}{t}^{\alpha 1}}{(1t)}^{\beta 1}dt.
(5)
For later use, the Beta distribution function F_{(α;β)}(u) of parameters α, β is defined as for all u ∈ [0, 1] as:
{F}_{(\alpha ,\beta )}(u)={\displaystyle {\int}_{0}^{u}g}(t,\alpha ,\beta )dt
(6)
=\frac{1}{B(\alpha ,\beta )}{\displaystyle {\int}_{0}^{u}{t}^{\alpha 1}}{(1t)}^{\beta 1}dt.
(7)
Let d = (x_{1},..., x_{
n
}) denote n samples of the Bernoulli random variable X defined by (2). Let H_{0} and H_{1} be the hypotheses in (3), and suppose that the prior probabilities P (H_{0}) and P (H_{1}) are strictly positive and satisfy P (H_{0}) + P (H_{1}) = 1. By Bayes's theorem, the posterior probabilities of H_{0} and H_{1}, with respect to data d, are:
\begin{array}{cc}P({H}_{i}d)=\frac{P(d{H}_{i})P({H}_{i})}{P(d)}& (i=0,1)\end{array}
for every d with P (d) > 0. In our case, P (d) is always nonzero (there are no impossible finite sequences of data). The hypothesis test method is based on the Bayes Factor, that is, the likelihood ratio of the sampled data with respect to the two hypotheses. The Bayes Factor ℬ of sample d and hypotheses H_{0} and H_{1} is
\mathcal{B}=\frac{P(d{H}_{0})}{P(d{H}_{1})}
and by Bayes' theorem, we have that:
\mathcal{B}=\frac{P({H}_{0}d)}{P({H}_{1}d)}\xb7\frac{P({H}_{1})}{P({H}_{0})}.
(8)
Therefore, B can be interpreted as a measure of evidence (given by the data d) in favor of H_{0}. Now, fix a threshold T > 1. The algorithm iteratively draws independent and identically distributed (iid) sample traces in the form of BioNetGen stochastic simulations, and checks whether they satisfy ϕ (Note that BioNetGen ensures by construction that each simulation, or trace, is actually iid.) After each trace, the algorithm computes the Bayes Factor B to check if it has obtained conclusive evidence. The algorithm accepts H_{0} if B >T, and rejects H_{0} (accepting H_{1}) if \mathcal{B}<\frac{1}{T}. Otherwise \left(\frac{1}{T}\le \mathcal{B}\le T\right), it continues drawing iid samples. The statistical Model Checking algorithm is shown in Figure 2.
The following Proposition shows that, in our special case of Bernoulli samples, the computation of the Bayes Factor is straightforward.
Proposition 1. [49]The Bayes Factor of H_{0} : p ≥ θ vs. H_{
1
} : p <θ with Bernoulli samples (x_{1},..., x_{
n
}) and Beta prior of parameters α, β is:
{\mathcal{B}}_{n}=\frac{P({H}_{1})}{P({H}_{0})}\xb7\left(\frac{1}{{F}_{(x+\alpha ,nx+\beta )}(\theta )}1\right)
where x={\displaystyle {\sum}_{i=1}^{n}{x}_{i}} is the number of successes in(x_{1},..., x_{
n
}) and F_{(s,t)}(·) is the Beta distribution function of parameters s, t.
The Beta distribution function can be efficiently computed by standard software packages. Thus, no numerical integration is required for the evaluation of the Bayes Factor.
Finally, we must show that the error probability of our decision procedure, i.e., the probability that we reject (accept) the null hypothesis although it is true (false), can be bounded.
Theorem. [49]The error probability for the sequential Bayesian hypothesis testing algorithm is bounded above by \frac{1}{T}where T is the Bayes Factor threshold given as input.