Our method draws on concepts from several different fields. We begin by briefly surveying the semantics of stochastic differential equations, a language for formally specifying dynamic behaviors, Girsanov's theorem on change of measure, and results on the consistency and concentration of Bayesian posteriors.
Stochastic differential equation models
A stochastic differential equation (SDE) [26, 27] is a differential equation in which some of the terms evolve according to Brownian motion [28]. A typical SDE is of the following form:

dX_{t} = b(X_{t}, t) dt + v(X_{t}, t) dW_{t},

where X is a system variable, b is a Riemann integrable function, v is an Itō integrable function, and W is Brownian motion. Brownian motion W is a continuous-time stochastic process satisfying the following three conditions:
1. W_{0} = 0.
2. W_{t} is continuous (almost surely).
3. W_{t} has independent, normally distributed increments: W_{t} − W_{s} and W_{t′} − W_{s′} are independent if 0 ≤ s < t ≤ s′ < t′, and W_{t} − W_{s} ~ N(0, t − s), where N(0, t − s) denotes the normal distribution with mean 0 and variance t − s. Note that the symbol ~ is used to indicate "is distributed as".
Consider the time between 0 and t as divided into m discrete steps 0, t_{1}, t_{2}, ..., t_{m} = t. The solution to a stochastic differential equation is the limit of the following discrete difference equation as m goes to infinity:

X_{t_{i+1}} − X_{t_{i}} = b(X_{t_{i}}, t_{i}) (t_{i+1} − t_{i}) + v(X_{t_{i}}, t_{i}) (W_{t_{i+1}} − W_{t_{i}}).
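The discrete difference equation is also how trace samples are generated in practice (the Euler–Maruyama scheme). A minimal sketch in Python; the drift and diffusion coefficients in the usage example are our own illustration, not from the text:

```python
import math
import random

def euler_maruyama(b, v, x0, t, m, seed=0):
    """Simulate one trajectory of dX = b(X, t) dt + v(X, t) dW on [0, t]
    using m discrete steps; returns the list of states X at t_0, ..., t_m."""
    rng = random.Random(seed)
    dt = t / m
    xs = [x0]
    x = x0
    for i in range(m):
        s = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + b(x, s) * dt + v(x, s) * dw
        xs.append(x)
    return xs

# Illustrative coefficients: geometric Brownian motion dX = 0.1 X dt + 0.2 X dW.
path = euler_maruyama(lambda x, s: 0.1 * x, lambda x, s: 0.2 * x, 1.0, 1.0, 1000)
```

Refining the step count m trades runtime for a closer approximation to the limiting solution.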
In what follows, ℳ will refer to a system of stochastic differential equations. We note that a system of stochastic differential equations comes equipped with an inherent probability space and a natural probability measure μ. Our algorithm repeatedly and randomly perturbs the probability measure of the Brownian motion in the model ℳ, which, in turn, changes the underlying measure in an effort to expose rare behaviors. These changes can be characterized using Girsanov's theorem.
Girsanov's theorem for perturbing stochastic differential equation models
Given a process {θ_{t} : 0 ≤ t ≤ T} satisfying the Novikov condition [29], such as an SDE, the following exponential martingale Z_{t} defines the change from the measure P to a new measure P̃:

Z_{t} = exp( ∫_{0}^{t} θ_{s} dW_{s} − (1/2) ∫_{0}^{t} θ_{s}^{2} ds ).

Here, Z_{t} is the Radon–Nikodym derivative of P̃ with respect to P for t < T. The Brownian motion W̃ under P̃ is given by W̃_{t} = W_{t} − ∫_{0}^{t} θ_{s} ds. The non-stochastic component of the stochastic differential equation is not affected by the change of measure. Thus, a change of measure for SDEs is a stochastic process (unlike importance sampling for explicit probability distributions).
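As a concrete illustration of sampling under a changed measure, the following sketch estimates a Brownian-motion tail probability by simulating under a constant drift shift θ and reweighting each sample by the discretized Radon–Nikodym derivative. The target event, the drift value, and all names and parameters are our own illustrative choices:

```python
import math
import random

def shifted_estimate(a, theta, t=1.0, m=200, n=4000, seed=1):
    """Estimate P(W_t > a) for Brownian motion W under the original measure P
    by sampling under a new measure in which W acquires constant drift theta,
    weighting each sample by dP/dP~ = exp(-theta * W~_t - theta^2 * t / 2)."""
    rng = random.Random(seed)
    dt = t / m
    total = 0.0
    for _ in range(n):
        wt_tilde = 0.0  # Brownian motion under the new measure
        for _ in range(m):
            wt_tilde += rng.gauss(0.0, math.sqrt(dt))
        w = wt_tilde + theta * t  # the original process, seen under the new measure
        weight = math.exp(-theta * wt_tilde - 0.5 * theta * theta * t)
        if w > a:
            total += weight
    return total / n

# Rare-ish event: P(W_1 > 2) is about 0.0228; sample under drift theta = 2
# so that the event is hit roughly half the time.
est = shifted_estimate(a=2.0, theta=2.0)
```

The drift concentrates samples on the rare event, while the weights keep the estimator unbiased for the original measure.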
Specifying dynamic behaviors
Next, we define a formalism for encoding the high-level behavioral specifications that our algorithm will test against ℳ.
Definition 1 (Adapted Finitely Monitorable). Let σ be a finite-length trace from the stochastic differential equation model ℳ. A specification ϕ is said to be adapted finitely monitorable (AFM) if it is possible to decide whether σ satisfies ϕ, denoted σ ⊨ ϕ.
Certain AFM specifications can be expressed as formulas in Bounded Linear Temporal Logic (BLTL) [30–32]. Informally, BLTL formulas can capture the ordering of events.
Definition 2 (Probabilistic Adapted Finitely Monitorable). A specification ϕ is said to be probabilistic adapted finitely monitorable (PAFM) if it is possible to (deterministically or probabilistically) decide whether ℳ satisfies ϕ with probability at least θ, denoted ℳ ⊨ P_{≥θ}(ϕ).
Some common examples of PAFM specifications include Probabilistic Bounded Linear Temporal Logic (PBLTL) (e.g., see [19]) and Continuous Specification Logic. Note that temporal logic is only one means for constructing specifications; other formalisms can also be used, like Statecharts [33, 34].
Semantics of bounded linear temporal logic (BLTL)
We define the semantics of BLTL with respect to the paths of ℳ. Let σ = (s_{0}, Δ_{0}), (s_{1}, Δ_{1}), ... be a sampled execution of the model along states s_{0}, s_{1}, ... with durations Δ_{0}, Δ_{1}, ... ∈ ℝ. We denote the path starting at state i by σ^{i} (in particular, σ^{0} denotes the original execution σ). The value of the state variable x in σ at the state i is denoted by V(σ, i, x). The semantics of BLTL is defined as follows:
1. σ^{k} ⊨ x ~ v if and only if V(σ, k, x) ~ v, where v ∈ ℝ and ~ ∈ {>, <, =}.
2. σ^{k} ⊨ ϕ_{1} ∨ ϕ_{2} if and only if σ^{k} ⊨ ϕ_{1} or σ^{k} ⊨ ϕ_{2}.
3. σ^{k} ⊨ ϕ_{1} ∧ ϕ_{2} if and only if σ^{k} ⊨ ϕ_{1} and σ^{k} ⊨ ϕ_{2}.
4. σ^{k} ⊨ ¬ϕ_{1} if and only if σ^{k} ⊨ ϕ_{1} does not hold.
5. σ^{k} ⊨ ϕ_{1} U^{t} ϕ_{2} if and only if there exists i ∈ ℕ such that: (a) 0 ≤ Σ_{0 ≤ l < i} Δ_{k+l} ≤ t; (b) σ^{k+i} ⊨ ϕ_{2}; and (c) for each 0 ≤ j < i, σ^{k+j} ⊨ ϕ_{1}.
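The clauses above translate directly into a recursive monitor over finite sampled traces. A minimal sketch in Python; the tuple encoding of formulas is our own illustrative choice, not from the text:

```python
def holds(sigma, k, phi):
    """Decide sigma^k |= phi for a trace sigma = [(state_dict, duration), ...].
    Formulas are nested tuples:
      ('atom', x, op, v)    state-variable comparison, op in {'>', '<', '='}
      ('or', p1, p2), ('and', p1, p2), ('not', p1)
      ('until', t, p1, p2)  bounded until U^t
    """
    tag = phi[0]
    if tag == 'atom':
        _, x, op, v = phi
        val = sigma[k][0][x]
        return {'>': val > v, '<': val < v, '=': val == v}[op]
    if tag == 'or':
        return holds(sigma, k, phi[1]) or holds(sigma, k, phi[2])
    if tag == 'and':
        return holds(sigma, k, phi[1]) and holds(sigma, k, phi[2])
    if tag == 'not':
        return not holds(sigma, k, phi[1])
    if tag == 'until':
        _, t, p1, p2 = phi
        elapsed = 0.0  # sum of durations Delta_{k}..Delta_{k+i-1}
        for i in range(len(sigma) - k):
            if elapsed > t:
                break
            if holds(sigma, k + i, p2):
                return True
            if not holds(sigma, k + i, p1):
                return False
            elapsed += sigma[k + i][1]
        return False
    raise ValueError(tag)

# Example: x stays below 5 until it exceeds 9, within 3 time units.
trace = [({'x': 1.0}, 1.0), ({'x': 2.0}, 1.0), ({'x': 10.0}, 1.0)]
ok = holds(trace, 0, ('until', 3.0, ('atom', 'x', '<', 5.0),
                      ('atom', 'x', '>', 9.0)))
```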
Statistical model validation
Our algorithm performs statistical model checking using Bayesian sequential hypothesis testing [23] on non-i.i.d. samples. The statistical model checking problem is to decide whether a model ℳ satisfies a probabilistic adapted finitely monitorable formula ϕ with probability at least θ; that is, whether ℳ ⊨ P_{≥θ}(ϕ), where θ ∈ (0, 1).
Sequential hypothesis testing
Let ρ be the (unknown but fixed) probability of the model satisfying ϕ. We can now restate the statistical model checking problem as deciding between the two composite hypotheses H_{0} : ρ ≥ θ and H_{1} : ρ < θ. Here, the null hypothesis H_{0} indicates that ℳ satisfies the AFM formula ϕ with probability at least θ, while the alternative hypothesis H_{1} indicates that ℳ satisfies the AFM formula ϕ with probability less than θ.
Definition 3 (Type I and II Errors). A Type I error is an error in which the hypothesis test asserts that the null hypothesis H_{0} is false, when in fact H_{0} is true. Conversely, a Type II error is an error in which the hypothesis test asserts that the null hypothesis H_{0} is true, when in fact H_{0} is false.
The basic idea behind any statistical model checking algorithm based on sequential hypothesis testing is to iteratively sample traces from the stochastic process. Each trace is then evaluated with a trace verifier [32], which determines whether the trace satisfies the specification, i.e., whether σ_{i} ⊨ ϕ. This is always feasible because the specifications used are adapted and finitely monitorable. Two accumulators keep track of the total number of traces sampled and the number of satisfying traces, respectively. The procedure continues until there is enough information to reject either H_{0} or H_{1}.
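The sampling loop just described can be sketched as follows. The trace sampler, trace verifier, and decision rule are stubs to be supplied by the caller; all names and the toy decision rule in the usage example are our own illustration, not the paper's algorithm:

```python
import random

def statistical_model_check(sample_trace, satisfies, decide, max_samples=10000):
    """Iteratively sample traces and verify each against the specification.
    decide(n, x) inspects the two accumulators (n traces sampled, x satisfying)
    and returns 'H0', 'H1', or None to request another sample."""
    n = 0  # total number of traces sampled
    x = 0  # number of satisfying traces
    while n < max_samples:
        sigma = sample_trace()
        n += 1
        if satisfies(sigma):
            x += 1
        verdict = decide(n, x)
        if verdict is not None:
            return verdict, n, x
    return None, n, x  # undecided within the sample budget

# Toy demo: "traces" are coin flips with success probability 0.9, the spec is
# "the flip succeeded", and the decision rule is a naive frequency threshold.
rng = random.Random(0)
verdict, n, x = statistical_model_check(
    sample_trace=lambda: rng.random() < 0.9,
    satisfies=lambda s: s,
    decide=lambda n, x: ('H0' if x / n >= 0.5 else 'H1') if n >= 200 else None)
```

Any Bayes-factor or SPRT-style stopping rule can be plugged in as `decide`.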
Bayesian sequential hypothesis testing
Recall that for any finite trace σ_{i} of the system and an Adapted Finitely Monitorable (AFM) formula ϕ, we can decide whether σ_{i} satisfies ϕ. Therefore, we can define a random variable X_{i} denoting the outcome of σ_{i} ⊨ ϕ. Thus, X_{i} is a Bernoulli random variable with probability mass function f(x_{i} | ρ) = ρ^{x_{i}} (1 − ρ)^{1 − x_{i}}, where x_{i} = 1 if and only if σ_{i} ⊨ ϕ, and x_{i} = 0 otherwise.
Bayesian statistics requires that a prior probability distribution be specified for the unknown quantity, which in our case is ρ. Thus, we will model the unknown quantity as a random variable u with prior density g(u). The prior probability distribution is usually based on our previous experiences with and beliefs about the system. A noninformative or objective prior probability distribution [35] can be used when nothing is known about the probability of the system satisfying the AFM formula.
Suppose we have a sequence of independent random variables X_{1}, ..., X_{n} defined as above, and let d = (x_{1}, ..., x_{n}) denote a sample of those variables.
Definition 4. The Bayes factor B of sample d and hypotheses H_{0} and H_{1} is B = P(d | H_{0}) / P(d | H_{1}).
The Bayes factor may be used as a measure of relative confidence in H_{0} vs. H_{1}, as proposed by Jeffreys [36]. The Bayes factor is the ratio of two likelihoods, each obtained by integrating the Bernoulli likelihood against the prior g over the region of [0, 1] corresponding to the respective hypothesis:

B = ( ∫_{θ}^{1} ∏_{i=1}^{n} f(x_{i} | u) g(u) du / Π_{0} ) / ( ∫_{0}^{θ} ∏_{i=1}^{n} f(x_{i} | u) g(u) du / Π_{1} ),

where Π_{0} = ∫_{θ}^{1} g(u) du and Π_{1} = 1 − Π_{0} are the prior probabilities of H_{0} and H_{1}.
We note that the Bayes factor depends on both the data d and the prior g, so it may be considered a measure of confidence in H_{0} vs. H_{1} provided by the data x_{1}, ..., x_{n} and "weighted" by the prior g.
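Under a uniform prior g, the two integrals defining the Bayes factor can be evaluated by elementary numerical quadrature. A minimal sketch; the prior, the grid resolution, and the example data are our own illustrative choices:

```python
def bayes_factor(n, x, theta, grid=10000):
    """Bayes factor for H0: rho >= theta vs. H1: rho < theta under a uniform
    prior, given n Bernoulli samples of which x are satisfying.
    P(d | Hi) integrates the likelihood u^x (1-u)^(n-x) over the region of
    [0, 1] for Hi, normalized by the prior mass of that region."""
    def integrate(lo, hi):
        # midpoint rule for int_lo^hi u^x (1-u)^(n-x) du
        h = (hi - lo) / grid
        total = 0.0
        for i in range(grid):
            u = lo + (i + 0.5) * h
            total += (u ** x) * ((1.0 - u) ** (n - x)) * h
        return total
    num = integrate(theta, 1.0) / (1.0 - theta)  # P(d | H0)
    den = integrate(0.0, theta) / theta          # P(d | H1)
    return num / den

# 18 of 20 satisfying traces: evidence in favor of rho >= 0.7.
b = bayes_factor(n=20, x=18, theta=0.7)
```

A sequential test stops once this quantity crosses a fixed evidence threshold in either direction.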
Non-i.i.d. Bayesian sequential hypothesis testing
Traditional methods for hypothesis testing, including those outlined in the previous two subsections, assume that the samples are drawn i.i.d. In this section we show that non-i.i.d. samples can also be used, provided that certain conditions hold. In particular, if one can bound the change in measure associated with the non-identical sampling, one can also bound the Type I and Type II errors under a change of measure. Our algorithm bounds the change of measure, and thus the error.
We begin by reviewing some fundamental concepts from Bayesian statistics, including KL divergence, KL support, affinity, and δ-separation, and then restate an important result on the concentration of Bayesian posteriors [35, 37].
Definition 5 (Kullback–Leibler (KL) Divergence). Given a parameterized family of probability distributions {f_{θ}}, the Kullback–Leibler (KL) divergence K(θ_{0}, θ) between the distributions corresponding to two parameters θ and θ_{0} is K(θ_{0}, θ) = E_{θ_{0}}[ log( f_{θ_{0}}(X) / f_{θ}(X) ) ]. Note that E_{θ_{0}} is the expectation computed under the probability measure f_{θ_{0}}.
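Since the samples here are Bernoulli random variables, the KL divergence between two members of the family has a simple closed form. A minimal sketch; the parameter values are illustrative:

```python
import math

def kl_bernoulli(t0, t):
    """K(theta0, theta) for Bernoulli(theta0) vs. Bernoulli(theta):
    the expectation E_{theta0}[log f_{theta0}(X) / f_{theta}(X)],
    which reduces to a two-term sum over the outcomes {0, 1}."""
    return (t0 * math.log(t0 / t)
            + (1.0 - t0) * math.log((1.0 - t0) / (1.0 - t)))

k_same = kl_bernoulli(0.3, 0.3)  # divergence of a distribution to itself is 0
k_diff = kl_bernoulli(0.3, 0.6)  # strictly positive for distinct parameters
```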
Definition 6 (KL Neighborhood). Given a parameterized family of probability distributions {f_{θ}}, the KL neighborhood K_{ε}(θ_{0}) of a parameter value θ_{0} is given by the set {θ : K(θ_{0}, θ) < ε}.
Definition 7 (KL Support). A point θ_{0} is said to be in the KL support of a prior Π if and only if for all ε > 0, Π(K_{ε}(θ_{0})) > 0.
Definition 8 (Affinity). The affinity Aff(f, g) between any two densities f and g is defined as Aff(f, g) = ∫ √(f(x) g(x)) dx.
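For a discrete family such as the Bernoulli distributions used here, the integral in the affinity becomes a sum over outcomes. A minimal sketch; the parameter values are illustrative:

```python
import math

def affinity_bernoulli(p, q):
    """Aff(f, g) for Bernoulli densities: the sum of sqrt(f * g) over
    the two outcomes {1, 0}. Equals 1 iff the densities coincide."""
    return math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))

a_same = affinity_bernoulli(0.4, 0.4)   # identical densities: affinity 1
a_far = affinity_bernoulli(0.05, 0.95)  # well-separated densities: affinity near 0
```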
Definition 9 (Strong δ-Separation). Let A ⊂ [0, 1] and δ > 0. The set A and the point θ_{0} are said to be strongly δ-separated if and only if for any proper probability distribution ν on A, Aff( f_{θ_{0}}, ∫_{A} f_{θ} ν(dθ) ) < δ.
Given these definitions, it can be shown that the Bayesian posterior concentrates exponentially under certain technical conditions [35, 37].
Bounding errors under a change of measure
Next, we develop the machinery needed to compute bounds on the Type I/Type II errors for a testing strategy based on non-i.i.d. samples.
A stochastic differential equation model ℳ is naturally associated with a probability measure μ. Our non-i.i.d. sampling strategy can be thought of as the assignment of a set of probability measures μ_{1}, μ_{2}, ... to ℳ. Each unique sample σ_{i} is associated with an implied probability measure μ_{i} and is generated from ℳ under μ_{i} in an i.i.d. manner. Our proofs require that all the implied probability measures be equivalent to μ. That is, an event is possible (resp. impossible) under each implied probability measure if and only if it is possible (resp. impossible) under the original probability measure μ.
We use the following result regarding change of measure. Suppose a given behavior, say ϕ, holds on the original model with an (unknown) probability ρ:

Pr(X_{i} = 1 | ρ = u) = ∫ 1_{σ ⊨ ϕ} dμ.

Here, X_{i} is a Bernoulli random variable denoting the event that the i-th sample satisfies the given behavior ϕ. Note that the X_{i}s must be independent of one another. Now, we can rewrite the above expression as:

Pr(X_{i} = 1 | ρ = u) = ∫ 1_{σ ⊨ ϕ} (dμ / dμ_{i}) dμ_{i}.
Note that the term ∫ 1_{σ ⊨ ϕ} dμ_{i} denotes the probability of observing the event X_{i} under the modified probability measure μ_{i} if the unknown probability ρ were u. In order to ensure the independence assumption, the new probability measures μ_{i} are chosen independently of one another. The ratio dμ/dμ_{i} is the implied Radon–Nikodym derivative for the change of measure between two equivalent probability measures. Suppose the testing strategy has made n observations X_{1}, X_{2}, ..., X_{n}. Then the likelihood of the n observations is the product of the corresponding integrals over i = 1, ..., n. A sampling algorithm can compute this product by drawing independent samples from a stochastic differential equation model under the new "modified" probability measures. We note that it is not easy to compute the change of measure dμ/dμ_{i} algebraically or numerically. However, our algorithm does not need to compute this quantity explicitly. It simply establishes bounds on it.
Consider the following expression, which is computable without knowing the implied Radon–Nikodym derivative or change of measure explicitly. Now, we can rewrite the above expression as:
Our result will exploit the fact that we do not allow our testing or sampling procedures to have arbitrary implied Radon–Nikodym derivatives. This is reasonable, as no statistical guarantees should be available for an intelligently designed but adversarial test procedure that (say) tries to avoid sampling the given behavior. Suppose that the implied Radon–Nikodym derivative always lies between a constant c and another constant 1/c; that is, the change of measure does not distort the probabilities of observable events by more than a factor of c. Then each per-sample likelihood in the posterior is distorted by at most a factor of c in both its numerator and its denominator. Thus, by allowing the sampling algorithm to change measures by at most c, we have changed the posterior probability of observing a behavior by at most c^{2}.
Example: Suppose the testing strategy has made n observations X_{1}, X_{2}, ..., X_{n}. Then,
Termination conditions for non-i.i.d. sampling
Traditional (i.e., i.i.d.) Bayesian sequential hypothesis testing is guaranteed to terminate; that is, only a finite number of samples are required before the test selects one of the hypotheses. We now consider the conditions under which a Bayesian sequential hypothesis testing procedure using non-i.i.d. samples will terminate. To do this, we first need to show that the posterior probability distribution concentrates on a particular value as we see more and more samples from the model.
To consider the conditions under which our algorithm will terminate after observing n samples, note that the factor c^{2n} introduced by the change of measure can outweigh the exponential concentration e^{−nb} of the posterior probability measure. This is not surprising, because our construction thus far does not prevent the test from intelligently biasing against a sample. That is, a maliciously designed testing procedure could simply avoid the error-prone regions of the design. To address this, we define the notion of a fair testing strategy that does not engage in such malicious sampling.
Definition 10 (η-Fair Testing Strategy). A testing strategy is η-fair (η ≥ 1) if and only if the geometric average of the implied Radon–Nikodym derivatives over a number of samples is within a constant factor η of unity, i.e.,

1/η ≤ ( ∏_{i=1}^{n} dμ/dμ_{i} )^{1/n} ≤ η.
Note that a fair testing strategy does not need to sample from the underlying distribution in an i.i.d. manner. However, it must guarantee that the probability of observing the given behavior over a large number of observations is not altered substantially by the non-i.i.d. sampling. Intuitively, we want to make sure that we bias for each sample as many times as we bias against it. Our main result shows that such long-term neutrality is sufficient to obtain statistical guarantees for an otherwise non-i.i.d. testing procedure.
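η-fairness can be checked empirically from the realized implied Radon–Nikodym derivatives of a run by computing their geometric average. A minimal sketch; the derivative values and the function name are our own illustration:

```python
import math

def is_eta_fair(derivatives, eta):
    """Check that the geometric average of the implied Radon-Nikodym
    derivatives lies within a factor eta of unity, i.e., in [1/eta, eta]."""
    n = len(derivatives)
    # geometric mean computed via logs for numerical stability
    geo = math.exp(sum(math.log(d) for d in derivatives) / n)
    return 1.0 / eta <= geo <= eta

# Biases for and against cancel out: geometric average is exactly 1.
fair = is_eta_fair([2.0, 0.5, 1.5, 1.0 / 1.5], eta=1.1)
# Persistent bias in one direction: geometric average is 2.
unfair = is_eta_fair([2.0, 2.0, 2.0, 2.0], eta=1.5)
```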
Definition 11. An η-fair test is said to be eventually fair if and only if 1 ≤ η^{4} < e^{b}, where b is the constant in the exponential posterior concentration theorem.
The notion of an eventually fair test corresponds to a testing strategy that is not malicious or adversarial, and that makes an honest attempt to sample all of the events in the long run.