#### Selective Monitoring

CONCUR
Radu Grigore, University of Kent, UK, https://orcid.org/0000-0003-1128-0311
Stefan Kiefer, University of Oxford, UK
We study selective monitors for labelled Markov chains. Monitors observe the outputs that are generated by a Markov chain during its run, with the goal of identifying runs as correct or faulty. A monitor is selective if it skips observations in order to reduce monitoring overhead. We are interested in monitors that minimize the expected number of observations. We establish an undecidability result for selectively monitoring general Markov chains. On the other hand, we show for non-hidden Markov chains (where any output identifies the state the Markov chain is in) that simple optimal monitors exist and can be computed efficiently, based on DFA language equivalence. These monitors do not depend on the precise transition probabilities in the Markov chain. We report on experiments where we compute these monitors for several open-source Java projects.
2012 ACM Subject Classification: Theory of computation → Randomness, geometry and discrete structures
Related Version: A full version of the paper is available at https://arxiv.org/abs/1806.06143.
Keywords and phrases: runtime monitoring; probabilistic systems; Markov chains; automata; language equivalence
1 Introduction
Consider an MC (Markov chain) whose transitions are labelled with letters, and a finite
automaton that accepts languages of infinite words. Computing the probability that the
random word emitted by the MC is accepted by the automaton is a classical problem at the
heart of probabilistic verification. A finite prefix may already determine whether the random
infinite word is accepted, and computing the probability that such a deciding finite prefix is
produced is a nontrivial diagnosability problem. The theoretical problem we study in this
paper is how to catch deciding prefixes without observing the whole prefix; i.e., we want to
minimize the expected number of observations and still catch all deciding prefixes.
Motivation. In runtime verification a program sends messages to a monitor, which decides
if the program run is faulty. Usually, runtime verification is turned off in production code
because monitoring overhead is prohibitive. QVM (quality virtual machine) and ARV
(adaptive runtime verification) are existing pragmatic solutions to the overhead problem,
which perform best-effort monitoring within a specified overhead budget [1, 3]. ARV relies
on RVSE (runtime verification with state estimation) to also compute a probability that
the program run is faulty [21, 15]. We take the opposite approach: we ask for the smallest
overhead achievable without compromising precision at all.
Previous Work. Before worrying about the performance of a monitor, one might want to
check if faults in a given system can be diagnosed at all. This problem has been studied
under the term diagnosability, first for non-stochastic finite discrete event systems [19],
which are labelled transition systems. It was shown in [14] that diagnosability can be
checked in polynomial time, although the associated monitors may have exponential size.
Later the notion of diagnosability was extended to stochastic discrete-event systems, which
are labelled Markov chains [22]. Several notions of diagnosability in stochastic systems
exist, and some of them have several names, see, e.g., [20, 4] and the references therein.
Bertrand et al. [4] also compare the notions. For instance, they show that for one variant
of the problem (referred to as A-diagnosability or SS-diagnosability or IF-diagnosability) a
previously proposed polynomial-time algorithm is incorrect, and prove that this notion of
diagnosability is PSPACE-complete. Indeed, most variants of diagnosability for stochastic
systems are PSPACE-complete [4], with the notable exception of AA-diagnosability (where
the monitor is allowed to diagnose wrongly with arbitrarily small probability), which can be
solved in polynomial time [5].
Selective Monitoring. In this paper, we seem to make the problem harder: since
observations by a monitor come with a performance overhead, we allow the monitor to skip
observations. In order to decide how many observations to skip, the monitor employs an
observation policy. Skipping observations might decrease the probability of deciding (whether
the current run of the system is faulty or correct). We do not study this tradeoff: we
require policies to be feasible, i.e., the probability of deciding must be as high as under the
policy that observes everything. We do not require the system to be diagnosable; i.e., the
probability of deciding may be less than 1. Checking whether the system is diagnosable is
PSPACE-complete ([4], Theorem 8).
The Cost of Decision in General Markov Chains. The cost (of decision) is the number
of observations that the policy makes during a run of the system. We are interested in
minimizing the expected cost among all feasible policies. We show that if the system is
diagnosable then there exists a policy with finite expected cost, i.e., the policy may stop
observing after finite expected time. (The converse is not true.) Whether the infimum cost
(among feasible policies) is finite is also PSPACE-complete (Theorem 14). Whether there
is a feasible policy whose expected cost is smaller than a given threshold is undecidable
(Theorem 15), even for diagnosable systems.
Non-Hidden Markov Chains. We identify a class of MCs, namely non-hidden MCs, where
the picture is much brighter. An MC is called non-hidden when each label identifies the state.
Non-hidden MCs are always diagnosable. Moreover, we show that maximally procrastinating
policies are (almost) optimal (Theorem 27). A policy is called maximally procrastinating
when it skips observations up to the point where one further skip would put a decision
on the current run in question. We also show that one can construct an (almost) optimal
maximally procrastinating policy in polynomial time. This policy does not depend on the
exact probabilities in the MC, although the expected cost under that policy does. That is, we
efficiently construct a policy that is (almost) optimal regardless of the transition probabilities
on the MC transitions. We also show that the infimum cost (among all feasible policies) can
be computed in polynomial time (Theorem 28). Underlying these results is a theory based
on automata, in particular, checking language equivalence of DFAs.
Experiments. We evaluated the algorithms presented in this paper by implementing them
in Facebook Infer, and trying them on 11 of the most forked Java projects on GitHub. We
found that, on average, selective monitoring can reduce the number of observations to a half.
2 Preliminaries
Let $S$ be a finite set. We view elements of $\mathbb{R}^S$ as vectors, more specifically as row vectors.
We write $\mathbf{1}$ for the all-1 vector, i.e., the element of $\{1\}^S$. For a vector $\mu \in \mathbb{R}^S$, we denote by
$\mu^T$ its transpose, a column vector. A vector $\mu \in [0,1]^S$ is a distribution over $S$ if $\mu \mathbf{1}^T = 1$.
For $s \in S$ we write $e_s$ for the (Dirac) distribution over $S$ with $e_s(s) = 1$ and $e_s(t) = 0$
for $t \in S \setminus \{s\}$. We view elements of $\mathbb{R}^{S \times S}$ as matrices. A matrix $M \in [0,1]^{S \times S}$ is called
stochastic if each row sums up to one, i.e., $M \mathbf{1}^T = \mathbf{1}^T$.
For a finite alphabet $\Sigma$, we write $\Sigma^*$ and $\Sigma^\omega$ for the finite and infinite words over $\Sigma$,
respectively. We write $\varepsilon$ for the empty word. We represent languages $L \subseteq \Sigma^\omega$ using
deterministic finite automata, and we represent probability measures $\Pr$ over $\Sigma^\omega$ using
Markov chains.
A (discrete-time, finite-state, labelled) Markov chain (MC) is a quadruple $(S, \Sigma, M, s_0)$
where $S$ is a finite set of states, $\Sigma$ a finite alphabet, $s_0$ an initial state, and $M : \Sigma \to [0,1]^{S \times S}$
specifies the transitions, such that $\sum_{a \in \Sigma} M(a)$ is a stochastic matrix. Intuitively, if the
MC is in state $s$, then with probability $M(a)(s, s')$ it emits $a$ and moves to state $s'$. For
the complexity results in this paper, we assume that all numbers in the matrices $M(a)$ for
$a \in \Sigma$ are rationals given as fractions of integers represented in binary. We extend $M$ to
the mapping $M : \Sigma^* \to [0,1]^{S \times S}$ with $M(a_1 \cdots a_k) = M(a_1) \cdots M(a_k)$ for $a_1, \ldots, a_k \in \Sigma$.
Intuitively, if the MC is in state $s$ then with probability $M(u)(s, s')$ it emits the word $u \in \Sigma^*$
and moves (in $|u|$ steps) to state $s'$. An MC is called non-hidden if for each $a \in \Sigma$ all non-zero
entries of $M(a)$ are in the same column. Intuitively, in a non-hidden MC, the emitted letter
identifies the next state. An MC $(S, \Sigma, M, s_0)$ defines the standard probability measure $\Pr$
over $\Sigma^\omega$, uniquely defined by assigning probabilities to cylinder sets $\{u\}\Sigma^\omega$, with $u \in \Sigma^*$, as
follows:
$$\Pr(\{u\}\Sigma^\omega) := e_{s_0} M(u) \mathbf{1}^T$$
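As a minimal sketch of the cylinder-set formula, the following Python snippet evaluates $e_{s_0} M(u) \mathbf{1}^T$ with exact rationals. The chain is an assumption: it is our reading of the MC of Example 2 ($s_0$ moves to $s_1$ or $s_2$ with probability 1/2 each while emitting a; $s_1$ loops on a; $s_2$ loops on a or b with probability 1/2 each). The names `M`, `STATES` and `cylinder_probability` are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
STATES = ["s0", "s1", "s2"]

# M[a][s][t] is the probability of emitting letter a while moving from s to t.
# Assumed chain (a reading of Example 2); missing entries are 0.
M = {
    "a": {"s0": {"s1": HALF, "s2": HALF},
          "s1": {"s1": Fraction(1)},
          "s2": {"s2": HALF}},
    "b": {"s2": {"s2": HALF}},
}

def cylinder_probability(u, initial="s0"):
    """Return Pr({u} Sigma^omega) = e_{s0} M(u) 1^T for a finite word u."""
    # Start from the Dirac distribution on the initial state (a row vector).
    mu = {s: Fraction(int(s == initial)) for s in STATES}
    for letter in u:
        nxt = {s: Fraction(0) for s in STATES}
        for s, mass in mu.items():
            for t, p in M[letter].get(s, {}).items():
                nxt[t] += mass * p
        mu = nxt
    # Multiplying by the all-ones column vector sums the remaining mass.
    return sum(mu.values())
```

Under this reading, `cylinder_probability("aa")` evaluates to 3/4, because the branch through $s_2$ emits a second a only with probability 1/2.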
A deterministic finite automaton (DFA) is a quintuple $(Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite
set of states, $\Sigma$ a finite alphabet, $\delta : Q \times \Sigma \to Q$ a transition function, $q_0$ an initial state, and
$F \subseteq Q$ a set of accepting states. We extend $\delta$ to $\delta : Q \times \Sigma^* \to Q$ as usual. A DFA defines a
language $L \subseteq \Sigma^\omega$ as follows:
$$L := \{\, w \in \Sigma^\omega \mid \delta(q_0, u) \in F \text{ for some prefix } u \text{ of } w \,\}$$
Note that we do not require accepting states to be visited infinitely often: just once suffices.
Therefore we can and will assume without loss of generality that there is $f$ with $F = \{f\}$
and $\delta(f, a) = f$ for all $a \in \Sigma$.
For the rest of the paper we fix an MC $\mathcal{M} = (S, \Sigma, M, s_0)$ and a DFA $\mathcal{A} = (Q, \Sigma, \delta, q_0, F)$.
We define their composition as the MC $\mathcal{M} \times \mathcal{A} := (S \times Q, \Sigma, M', (s_0, q_0))$ where
$M'(a)((s, q), (s', q'))$ equals $M(a)(s, s')$ if $q' = \delta(q, a)$ and 0 otherwise. Thus, $\mathcal{M}$ and $\mathcal{M} \times \mathcal{A}$ induce the
same probability measure $\Pr$.
An observation $o \in \Sigma_\bot$ is either a letter or the special symbol $\bot \notin \Sigma$, which stands for
“not seen”. An observation policy $\rho : \Sigma_\bot^* \to \{0, 1\}$ is a (not necessarily computable) function
that, given the observations made so far, says whether we should observe the next letter. An
observation policy $\rho$ determines a projection $\pi_\rho : \Sigma^\omega \to \Sigma_\bot^\omega$: we have $\pi_\rho(a_1 a_2 \ldots) = o_1 o_2 \ldots$
when
$$o_{n+1} = \begin{cases} a_{n+1} & \text{if } \rho(o_1 \ldots o_n) = 1 \\ \bot & \text{if } \rho(o_1 \ldots o_n) = 0 \end{cases} \qquad \text{for all } n \ge 0.$$
We denote the see-all policy by $\bullet$; thus, $\pi_\bullet(w) = w$.
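The projection $\pi_\rho$ can be sketched in Python on finite prefixes of a word. This is only an illustration: policies are modelled as functions from the tuple of past observations to 0 or 1, and the names `project`, `see_all` and `every_other` are hypothetical.

```python
BOTTOM = "⊥"  # stands for "not seen"

def project(policy, word):
    """Apply pi_rho to a finite prefix of a word, letter by letter."""
    observed = []
    for letter in word:
        # The policy sees only the observations made so far,
        # never the letters that were skipped.
        keep = policy(tuple(observed))
        observed.append(letter if keep else BOTTOM)
    return "".join(observed)

def see_all(prefix):
    """The see-all policy: always observe."""
    return 1

def every_other(prefix):
    """Hypothetical policy: observe at even positions, skip at odd ones."""
    return 1 if len(prefix) % 2 == 0 else 0
```

For example, `project(every_other, "abba")` yields `"a⊥b⊥"`, while `project(see_all, "abba")` returns the word unchanged.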
In the rest of the paper we reserve $a$ for letters, $o$ for observations, $u$ for finite words,
$w$ for infinite words, $\upsilon$ for finite observation prefixes, $s$ for states from an MC, and $q$ for
states from a DFA. We write $o_1 \sim o_2$ when $o_1$ and $o_2$ are the same or at least one of them
is $\bot$. We lift this relation to (finite and infinite) sequences of observations (of the same
length). We write $w \gtrsim \upsilon$ when $u \sim \upsilon$ holds for the length-$|\upsilon|$ prefix $u$ of $w$.
We say that $\upsilon$ is negatively deciding when $\Pr(\{w \gtrsim \upsilon \mid w \in L\}) = 0$. Intuitively, $\upsilon$ is
negatively deciding when $\upsilon$ is incompatible (up to a null set) with $L$. Similarly, we say that
$\upsilon$ is positively deciding when $\Pr(\{w \gtrsim \upsilon \mid w \notin L\}) = 0$. An observation prefix $\upsilon$ is deciding
when it is positively or negatively deciding. An observation policy $\rho$ decides $w$ when $\pi_\rho(w)$
has a deciding prefix. A monitor is an interactive algorithm that implements an observation
policy: it processes a stream of letters and, after each letter, it replies with one of “yes”,
“no”, or “skip $n$ letters”, where $n \in \mathbb{N} \cup \{\infty\}$.
▶ Lemma 1. For any $w$, if some policy decides $w$ then $\bullet$ decides $w$.
Proof. Let $\rho$ decide $w$. Then there is a deciding prefix $\upsilon$ of $\pi_\rho(w)$. Suppose $\upsilon$ is positively
deciding, i.e., $\Pr(\{w' \gtrsim \upsilon \mid w' \notin L\}) = 0$. Let $u$ be the length-$|\upsilon|$ prefix of $w$. Then
$\Pr(\{w' \gtrsim u \mid w' \notin L\}) = 0$, since $\upsilon$ can be obtained from $u$ by possibly replacing some letters
with $\bot$. Hence $u$ is also positively deciding. Since $u$ is a prefix of $w = \pi_\bullet(w)$, we have that $\bullet$
decides $w$. The case where $\upsilon$ is negatively deciding is similar. ◀
It follows that $\max_\rho \Pr(\{w \mid \rho \text{ decides } w\}) = \Pr(\{w \mid \bullet \text{ decides } w\})$. We say that a policy $\rho$
is feasible when it also attains the maximum, i.e., when
$$\Pr(\{w \mid \rho \text{ decides } w\}) = \Pr(\{w \mid \bullet \text{ decides } w\}).$$
Equivalently, $\rho$ is feasible when $\Pr(\{w \mid \bullet \text{ decides } w \text{ implies } \rho \text{ decides } w\}) = 1$, i.e., almost
all words that are decided by the see-all policy are also decided by $\rho$. If $\upsilon = o_1 o_2 \ldots$ is the
shortest prefix of $\pi_\rho(w)$ that is deciding, then the cost of decision $C_\rho(w)$ is $\sum_{k=0}^{|\upsilon|-1} \rho(o_1 \ldots o_k)$.
This paper is about finding feasible observation policies $\rho$ that minimize $\mathrm{Ex}(C_\rho)$, the
expectation of the cost of decision with respect to $\Pr$.
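To make the cost of decision concrete, here is a small Python sketch that counts the observations a policy makes until the first deciding observation prefix. It works on finite words only, and the deciding test is passed in as a predicate. The predicate `contains_b` is an assumption based on our reading of Example 2, where a prefix is deciding exactly when it contains an observed b. All names are illustrative.

```python
BOTTOM = "⊥"

def cost_of_decision(policy, word, is_deciding):
    """Number of observations made up to the shortest deciding prefix.

    Returns None when the given finite word is too short to reach a decision.
    """
    observed, cost = [], 0
    if is_deciding(observed):
        return cost
    for letter in word:
        keep = policy(observed)      # rho sees only the past observations
        cost += keep                 # each observation costs 1, each skip 0
        observed.append(letter if keep else BOTTOM)
        if is_deciding(observed):
            return cost
    return None

def contains_b(prefix):
    """Assumed deciding test for Example 2: some b has been observed."""
    return "b" in prefix

def see_all(prefix):
    return 1

def every_other(prefix):
    """Hypothetical policy: observe at even positions, skip at odd ones."""
    return 1 if len(prefix) % 2 == 0 else 0
```

On the word prefix `"aab"`, the see-all policy pays 3 observations and `every_other` pays 2. On `"ab"`, `every_other` skips the b and has not decided yet, so the function returns `None`.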
3 Qualitative Analysis of Observation Policies
In this section we study properties of observation policies that are qualitative, i.e., not
directly related to the cost of decision. We focus on properties of observation prefixes that a
policy may produce.
Observation Prefixes
We have already defined deciding observation prefixes. We now define several other types of
prefixes: enabled, confused, very confused, and finitary. A prefix $\upsilon$ is enabled if it occurs with
positive probability, $\Pr(\{w \gtrsim \upsilon\}) > 0$. Intuitively, the other types of prefixes $\upsilon$ are defined
in terms of what would happen if we were to observe all from now on: if it is not almost
sure that eventually a deciding prefix is reached, then we say $\upsilon$ is confused; if it is almost
sure that a deciding prefix will not be reached, then we say $\upsilon$ is very confused; if it is almost
sure that eventually a deciding or very confused prefix is reached, then we say $\upsilon$ is finitary.
To say this formally, let us make a few notational conventions: for an observation prefix $\upsilon$,
we write $\Pr(\upsilon)$ as a shorthand for $\Pr(\{uw \mid u \sim \upsilon\})$; for a set $\Upsilon$ of observation prefixes, we
write $\Pr(\Upsilon)$ as a shorthand for $\Pr\bigl(\bigcup_{\upsilon \in \Upsilon} \{uw \mid u \sim \upsilon\}\bigr)$. With these conventions, we define:
1. $\upsilon$ is confused when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding}\}) < \Pr(\upsilon)$
2. $\upsilon$ is very confused when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding}\}) = 0$
3. $\upsilon$ is finitary when $\Pr(\{\upsilon u \mid \upsilon u \text{ deciding or very confused}\}) = \Pr(\upsilon)$
Observe that (a) confused implies enabled, (b) deciding implies not confused, and (c) enabled
and very confused implies confused. The following are alternative equivalent definitions:
1. $\upsilon$ is confused when $\Pr(\{uw \mid u \sim \upsilon, \text{ no prefix of } \upsilon w \text{ is deciding}\}) > 0$
2. $\upsilon$ is very confused when $\upsilon u'$ is non-deciding for all enabled $\upsilon u'$
3. $\upsilon$ is finitary when $\Pr(\{uw \mid u \sim \upsilon, \text{ no prefix of } \upsilon w \text{ is deciding or very confused}\}) = 0$
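▶ Example 2. Consider the MC and the DFA depicted here:
[Figure: an MC with states $s_0, s_1, s_2$ and transitions $s_0 \xrightarrow{\frac12 a} s_1$, $s_0 \xrightarrow{\frac12 a} s_2$, $s_1 \xrightarrow{1 a} s_1$, $s_2 \xrightarrow{\frac12 a} s_2$, $s_2 \xrightarrow{\frac12 b} s_2$; a DFA with states $q_0, f$ and transitions $q_0 \xrightarrow{a} q_0$, $q_0 \xrightarrow{b} f$, $f \xrightarrow{a, b} f$.]
All observation prefixes that do not start with $b$ are enabled. The observation prefixes $ab$
and $\bot b$ and, in fact, all observation prefixes that contain $b$, are positively deciding. For all
$n \in \mathbb{N}$ we have $\Pr(\{w \gtrsim a^n \mid w \in L\}) > 0$ and $\Pr(\{w \gtrsim a^n \mid w \notin L\}) > 0$, so $a^n$ is not
deciding. If the MC takes the right transition first then almost surely it emits $b$ at some
point. Thus $\Pr(\{aaa \cdots\}) = \frac12$. Hence $\varepsilon$ is confused. In this example only non-enabled
observation prefixes are very confused. It follows that $\varepsilon$ is not finitary.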
Beliefs
For any $s$ we write $\Pr_s$ for the probability measure of the MC $\mathcal{M}_s$ obtained from $\mathcal{M}$ by
making $s$ the initial state. For any $q$ we write $L_q \subseteq \Sigma^\omega$ for the language of the DFA $\mathcal{A}_q$
obtained from $\mathcal{A}$ by making $q$ the initial state. We call a pair $(s, q)$ negatively deciding when
$\Pr_s(L_q) = 0$; similarly, we call $(s, q)$ positively deciding when $\Pr_s(L_q) = 1$. A subset of $S \times Q$
is called a belief. We call a belief negatively (positively, respectively) deciding when all its
elements are. We fix the notation $B_0 := \{(s_0, q_0)\}$ (for the initial belief) for the remainder of
the paper. Define the belief NFA as the NFA $\mathcal{B} = (S \times Q, \Sigma_\bot, \Delta, B_0, \emptyset)$ with:
$$\Delta((s, q), a) = \{(s', q') \mid M(a)(s, s') > 0,\ \delta(q, a) = q'\} \quad \text{for } a \in \Sigma$$
$$\Delta((s, q), \bot) = \bigcup_{a \in \Sigma} \Delta((s, q), a)$$
We extend the transition function $\Delta : (S \times Q) \times \Sigma_\bot \to 2^{S \times Q}$ to $\Delta : 2^{S \times Q} \times \Sigma_\bot^* \to 2^{S \times Q}$ in
the way that is usual for NFAs. Intuitively, if belief $B$ is the set of states where the product
$\mathcal{M} \times \mathcal{A}$ could be now, then $\Delta(B, \upsilon)$ is the belief adjusted by additionally observing $\upsilon$. To
reason about observation prefixes $\upsilon$ algorithmically, it will be convenient to reason about the
belief $\Delta(B_0, \upsilon)$.
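The belief update can be sketched in a few lines of Python. The transition data below is an assumption (our reading of the MC and DFA of Example 2); only the supports of the transitions matter for beliefs, so probabilities are omitted. The names `MC_EDGES`, `DFA`, `step` and `delta` are illustrative.

```python
BOTTOM = "⊥"
SIGMA = "ab"

# Support of the assumed MC: MC_EDGES[(s, a)] is the set of a-successors of s.
MC_EDGES = {("s0", "a"): {"s1", "s2"}, ("s1", "a"): {"s1"},
            ("s2", "a"): {"s2"}, ("s2", "b"): {"s2"}}
# Assumed DFA: the first b leads to the accepting sink f.
DFA = {("q0", "a"): "q0", ("q0", "b"): "f",
       ("f", "a"): "f", ("f", "b"): "f"}

def step(belief, o):
    """One step of the belief NFA; an unobserved letter may be any letter."""
    letters = SIGMA if o == BOTTOM else o
    return frozenset((t, DFA[(q, a)])
                     for (s, q) in belief
                     for a in letters
                     for t in MC_EDGES.get((s, a), ()))

def delta(belief, observations):
    """Extension of the transition function to observation prefixes."""
    for o in observations:
        belief = step(belief, o)
    return belief

B0 = frozenset({("s0", "q0")})
```

Here `delta(B0, "a⊥")` contains three pairs, because the skipped letter may have been an a (from $s_1$ or $s_2$) or a b (from $s_2$ only), and `delta(B0, "b")` is empty, because the chain cannot start with b.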
We define confused, very confused, and finitary beliefs as follows:
1. $B$ is confused when $\Pr_s(\{uw \mid \Delta(B, u) \text{ deciding}\}) < 1$ for some $(s, q) \in B$
2. $B$ is very confused when $\Delta(B, u)$ is empty or not deciding for all $u$
3. $B$ is finitary when $\Pr_s(\{uw \mid \Delta(B, u) \text{ deciding or very confused}\}) = 1$ for all $(s, q) \in B$
▶ Example 3. In Example 2 we have $B_0 = \{(s_0, q_0)\}$, and $\Delta(B_0, a^n) = \{(s_1, q_0), (s_2, q_0)\}$ for
all $n \ge 1$, and $\Delta(B_0, b) = \emptyset$, and $\Delta(B_0, a\bot) = \{(s_1, q_0), (s_2, q_0), (s_2, f)\}$, and $\Delta(B_0, \bot\upsilon) =
\{(s_2, f)\}$ for all $\upsilon$ that contain $b$. The latter belief $\{(s_2, f)\}$ is positively deciding. We have
$\Pr_{s_1}(\{uw \mid \Delta(\{(s_1, q_0)\}, u) \text{ is deciding}\}) = 0$, so any belief that contains $(s_1, q_0)$ is confused.
Also, $B_0$ is confused as $\Pr_{s_0}(\{uw \mid \Delta(\{(s_0, q_0)\}, u) \text{ is deciding}\}) = \frac12$.
Relation Between Observation Prefixes and Beliefs
By the following lemma, the corresponding properties of observation prefixes and beliefs are
closely related.
▶ Lemma 4. Let $\upsilon$ be an observation prefix.
1. $\upsilon$ is enabled if and only if $\Delta(B_0, \upsilon) \neq \emptyset$.
2. $\upsilon$ is negatively deciding if and only if $\Delta(B_0, \upsilon)$ is negatively deciding.
3. $\upsilon$ is positively deciding if and only if $\Delta(B_0, \upsilon)$ is positively deciding.
4. $\upsilon$ is confused if and only if $\Delta(B_0, \upsilon)$ is confused.
5. $\upsilon$ is very confused if and only if $\Delta(B_0, \upsilon)$ is very confused.
6. $\upsilon$ is finitary if and only if $\Delta(B_0, \upsilon)$ is finitary.
The following lemma gives complexity bounds for computing these properties.
▶ Lemma 5. Let υ be an observation prefix, and B a belief.
1. Whether υ is enabled can be decided in P.
2. Whether υ (or B) is negatively deciding can be decided in P.
3. Whether υ (or B) is positively deciding can be decided in P.
4. Whether υ (or B) is confused can be decided in PSPACE.
5. Whether υ (or B) is very confused can be decided in PSPACE.
6. Whether υ (or B) is finitary can be decided in PSPACE.
Proof sketch. The belief NFA B and the MC M × A can be computed in polynomial time
(even in deterministic logspace). For items 1–3, there are efficient graph algorithms that
search these product structures. For instance, to show that a given pair (s1, q1) is not
negatively deciding, it suffices to show that B has a path from (s1, q1) to a state (s2, f ) for
some s2. This can be checked in polynomial time (even in NL).
For items 4–6, one searches the (exponential-sized) product of M and the determinization
of B. This can be done in PSPACE. For instance, to show that a given belief B is confused, it
suffices to show that there are (s1, q1) ∈ B and u1 and s2 such that M has a u1-labelled path
from s1 to s2 such that there do not exist u2 and s3 such that M has a u2-labelled path from
s2 to s3 such that Δ(B, u1u2) is deciding. This can be checked in NPSPACE = PSPACE by
nondeterministically guessing paths in the product of M and the determinization of B. ◀
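The graph search behind items 2 and 3 can be sketched as follows. A pair $(s, q)$ is negatively deciding exactly when no pair $(s', f)$ is reachable from it in the product. For positive deciding we rely on the standard fact for finite MCs that $\Pr_s(L_q) = 1$ holds exactly when no reachable pair is negatively deciding. The transition data is again an assumption (our reading of Example 2), and all names are illustrative.

```python
SIGMA = "ab"
# Support of the assumed MC and the assumed DFA (a reading of Example 2).
MC_EDGES = {("s0", "a"): {"s1", "s2"}, ("s1", "a"): {"s1"},
            ("s2", "a"): {"s2"}, ("s2", "b"): {"s2"}}
DFA = {("q0", "a"): "q0", ("q0", "b"): "f",
       ("f", "a"): "f", ("f", "b"): "f"}

def reachable(start):
    """All product pairs reachable from start (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        s, q = stack.pop()
        for a in SIGMA:
            for t in MC_EDGES.get((s, a), ()):
                nxt = (t, DFA[(q, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen

def negatively_deciding(pair):
    """Pr_s(L_q) = 0: the accepting DFA state f is unreachable."""
    return all(q != "f" for (_, q) in reachable(pair))

def positively_deciding(pair):
    """Pr_s(L_q) = 1: every reachable pair can still reach f."""
    return all(not negatively_deciding(p) for p in reachable(pair))
```

Calling `positively_deciding` runs one search per reachable pair, which is quadratic but still polynomial; a single backward search from the accepting pairs would be the more economical choice.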
Diagnosability
We call a policy a diagnoser when it decides almost surely.
▶ Example 6. In Example 2 a diagnoser does not exist. Indeed, the policy $\bullet$ does not decide
when the MC takes the left transition, and decides (positively) almost surely when the MC
takes the right transition in the first step. Hence $\Pr(\{w \mid \bullet \text{ decides } w\}) = \Pr(\Sigma^*\{b\}\Sigma^\omega) = \frac12$.
So $\bullet$ is not a diagnoser. By Lemma 1, it follows that there is no diagnoser.
Diagnosability can be characterized by the notion of confusion:
▶ Proposition 7. There exists a diagnoser if and only if ε is not confused.
The following theorem shows that diagnosability is hard to check.
▶ Theorem 8 (cf. [4, Theorem 6]). Given an MC M and a DFA A, it is PSPACE-complete
to check if there exists a diagnoser.
Theorem 8 essentially follows from a result by Bertrand et al. [4]. They study several different
notions of diagnosability; one of them (FA-diagnosability) is very similar to our notion of
diagnosability. There are several small differences; e.g., their systems are not necessarily
products of an MC and a DFA. Therefore we give a self-contained proof of Theorem 8.
Proof sketch. By Proposition 7 it suffices to show PSPACE-completeness of checking whether
ε is confused. Membership in PSPACE follows from Lemma 5.4. For hardness we reduce
from the following problem: given an NFA U over Σ = {a, b} where all states are initial
and accepting, does U accept all (finite) words? This problem is PSPACE-complete [16,
Lemma 6]. ◀
Allowing Confusion
We say an observation policy allows confusion when, with positive probability, it produces
an observation prefix υ⊥ such that υ⊥ is confused but υ is not.
▶ Proposition 9. A feasible observation policy does not allow confusion.
Hence, in order to be feasible, a policy must observe when it would get confused otherwise.
In § 5 we show that in the non-hidden case there is almost a converse of Proposition 9; i.e.,
in order to be feasible, a policy need not do much more than not allow confusion.
4 Analyzing the Cost of Decision
In this section we study the computational complexity of finding feasible policies that
minimize the expected cost of decision. We focus on the decision version of the problem: Is
there a feasible policy whose expected cost is smaller than a given threshold? Define:
$$c_{\inf} := \inf_{\text{feasible } \rho} \mathrm{Ex}(C_\rho)$$
Since the see-all policy $\bullet$ never stops observing, we have $\Pr(C_\bullet = \infty) = 1$, so $\mathrm{Ex}(C_\bullet) = \infty$.
However, once an observation prefix $\upsilon$ is deciding or very confused, there is no point in
continuing observation. Hence, we define a light see-all policy $\circ$, which observes until the
observation prefix $\upsilon$ is deciding or very confused; formally, $\circ(\upsilon) = 0$ if and only if $\upsilon$ is deciding
or very confused. It follows from the definition of very confused that the policy $\circ$ is feasible.
Concerning the cost $C_\circ$ we have for all $w$
$$C_\circ(w) = \sum_{n=0}^{\infty} \bigl(1 - D_n(w)\bigr), \tag{1}$$
where $D_n(w) = 1$ if the length-$n$ prefix of $w$ is deciding or very confused, and $D_n(w) = 0$
otherwise. The following results are proved in the full version of the paper, on arXiv:
▶ Lemma 10. If $\varepsilon$ is finitary then $\mathrm{Ex}(C_\circ)$ is finite.
▶ Lemma 11. Let $\rho$ be a feasible observation policy. If $\Pr(C_\rho < \infty) = 1$ then $\varepsilon$ is finitary.
▶ Proposition 12. $c_{\inf}$ is finite if and only if $\varepsilon$ is finitary.
▶ Proposition 13. If a diagnoser exists then $c_{\inf}$ is finite.
▶ Theorem 14. It is PSPACE-complete to check if $c_{\inf} < \infty$.
Lemma 10 holds because, in M × A, a bottom strongly connected component is reached
in expected finite time. Lemma 11 says that a kind of converse holds for feasible policies.
Proposition 12 follows from Lemmas 10 and 11. Proposition 13 follows from Propositions 7
and 12. To show Theorem 14, we use Proposition 12 and adapt the proof of Theorem 8.
The main negative result of the paper is that one cannot compute $c_{\inf}$:
▶ Theorem 15. It is undecidable to check if $c_{\inf} < 3$, even when a diagnoser exists.
Proof sketch. By a reduction from the undecidable problem whether a given probabilistic
automaton accepts some word with probability $> \frac12$. The proof is somewhat complicated.
In fact, in the full version of the paper (arXiv) we give two versions of the proof: a short
incorrect one (with the correct main idea) and a long correct one. ◀
5 The Non-Hidden Case
Now we turn to positive results. In the rest of the paper we assume that the MC $\mathcal{M}$ is
non-hidden, i.e., there exists a function $\overrightarrow{\cdot} : \Sigma \to S$ such that $M(a)(s, s') > 0$ implies $s' = \overrightarrow{a}$.
We extend $\overrightarrow{\cdot}$ to finite words so that $\overrightarrow{ua} = \overrightarrow{a}$. We write $s \xrightarrow{u}$ to indicate that there is $s'$
with $M(u)(s, s') > 0$.
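▶ Example 16. Consider the following non-hidden MC and DFA:
[Figure: the non-hidden MC and the DFA of Example 16, followed by the definitions of the beliefs $B_0$ and $B_1$.]
$$B_2 := \Delta(B_0, \bot^2) = \{(\overrightarrow{b}, q_0), (\overrightarrow{a}, f)\}$$
$$B_3 := \Delta(B_0, \bot^2 b) = \{(\overrightarrow{b}, q_0), (\overrightarrow{b}, f)\}$$
$B_0$ is the initial belief. The beliefs $B_0$ and $B_1$ are not confused: indeed, $\Delta(B_1, b) = \{(\overrightarrow{b}, q_0)\}$
is negatively deciding, and $\Delta(B_1, a) = \{(\overrightarrow{a}, f)\}$ is positively deciding. The belief $B_2$ is
confused, as there is no $i \in \mathbb{N}$ for which $\Delta(B_2, b^i)$ is deciding. Finally, $B_3$ is very confused.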
We will show that in the non-hidden case there always exists a diagnoser (Lemma 23). It
follows that feasible policies need to decide almost surely and, by Proposition 13, that $c_{\inf}$ is
finite. We have seen in Proposition 9 that feasible policies do not allow confusion. In this
section we construct policies that procrastinate so much that they avoid confusion just barely.
We will see that such policies have an expected cost that comes arbitrarily close to $c_{\inf}$.
Language Equivalence
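We characterize confusion by language equivalence in a certain DFA. Consider the belief
NFA $\mathcal{B}$. In the non-hidden case, if we disallow $\bot$-transitions then $\mathcal{B}$ becomes a DFA $\mathcal{B}'$.
For $\mathcal{B}'$ we define a set of accepting states by $F_{\mathcal{B}'} := \{(s, q) \mid \Pr_s(L_q) = 1\}$.
▶ Example 17. For the previous example, a part of the DFA $\mathcal{B}'$ looks as follows:
[Figure: the part of $\mathcal{B}'$ reachable from $(\overrightarrow{a}, q_0)$, with states $(\overrightarrow{a}, q_0)$, $(\overrightarrow{b}, q_0)$, $(\overrightarrow{c}, f)$, $(\overrightarrow{a}, f)$, $(\overrightarrow{b}, f)$ and edges labelled $a$, $b$, $c$.]
States that are unreachable from $(\overrightarrow{a}, q_0)$ are not drawn here.
We associate with each $(s, q)$ the language $L_{s,q} \subseteq \Sigma^*$ that $\mathcal{B}'$ accepts starting from initial
state $(s, q)$. We call $(s, q), (s', q')$ language equivalent, denoted by $(s, q) \approx (s', q')$, when
$L_{s,q} = L_{s',q'}$.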
▶ Lemma 18. One can compute the relation $\approx$ in polynomial time.
Proof. For any $(s, q)$ one can use standard MC algorithms to check in polynomial time if
$\Pr_s(L_q) = 1$ (using a graph search in the composition $\mathcal{M} \times \mathcal{A}$, as in the proof of Lemma 5.3).
Language equivalence in the DFA $\mathcal{B}'$ can be computed in polynomial time by minimization. ◀
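One standard way to obtain the language equivalence classes is Moore-style partition refinement, sketched below in Python. This is an illustration under our own assumptions rather than the implementation used in the experiments: the DFA is given as a (possibly partial) dictionary, a fresh rejecting sink completes it, and the function name `language_classes` is hypothetical.

```python
SINK = object()  # fresh rejecting sink that completes a partial DFA

def language_classes(states, alphabet, delta, accepting):
    """Map each state to a class id; equal ids mean equal languages."""
    all_states = list(states) + [SINK]

    def succ(x, a):
        # Missing transitions, and all transitions of the sink, go to the sink.
        return delta.get((x, a), SINK)

    # Initial partition: accepting versus non-accepting states.
    block = {x: int(x in accepting) for x in all_states}
    while True:
        # Signature: own block plus the blocks of all successors.
        sig = {x: (block[x],) + tuple(block[succ(x, a)] for a in alphabet)
               for x in all_states}
        ids = {}
        refined = {x: ids.setdefault(sig[x], len(ids)) for x in all_states}
        # Blocks are only ever split, so an unchanged count means a fixpoint.
        if len(ids) == len(set(block.values())):
            return {x: refined[x] for x in states}
        block = refined
```

Each round takes time linear in the size of the transition table and there are at most as many rounds as states, so the sketch stays polynomial. Applied to $\mathcal{B}'$ with the accepting set $F_{\mathcal{B}'}$, two pairs receive the same class id exactly when they are related by $\approx$.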
We call a belief $B \subseteq S \times Q$ settled when all $(s, q) \in B$ are language equivalent.
▶ Lemma 19. A belief $B \subseteq S \times Q$ is confused if and only if there is $a \in \Sigma$ such that $\Delta(B, a)$
is not settled.
It follows that one can check in polynomial time whether a given belief is confused. We
generalize this fact in Lemma 22 below.
▶ Example 20. In Example 16 the belief $B_3$ is not settled. Indeed, from the DFA in
Example 17 we see that $L_{\overrightarrow{b}, q_0} = \emptyset \neq \{b\}^* = L_{\overrightarrow{b}, f}$. Since $B_3 = \Delta(B_2, b)$, by Lemma 19, the
belief $B_2$ is confused.
Procrastination
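For a belief $B \subseteq S \times Q$ and $k \in \mathbb{N}$, if $\Delta(B, \bot^k)$ is confused then so is $\Delta(B, \bot^{k+1})$. We define:
$$\mathrm{cras}(B) := \sup\{\, k \in \mathbb{N} \mid \Delta(B, \bot^k) \text{ is not confused} \,\} \in \mathbb{N} \cup \{-1, \infty\}$$
We set $\mathrm{cras}(B) := -1$ if $B$ is confused. We may write $\mathrm{cras}(s, q)$ for $\mathrm{cras}(\{(s, q)\})$.
▶ Example 21. In Example 16 we have $\mathrm{cras}(B_0) = \mathrm{cras}(\overrightarrow{a}, q_0) = 1$ and $\mathrm{cras}(B_1) = 0$ and
$\mathrm{cras}(B_2) = \mathrm{cras}(B_3) = -1$ and $\mathrm{cras}(\overrightarrow{b}, q_0) = \mathrm{cras}(\overrightarrow{a}, f) = \infty$.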
▶ Lemma 22. Given a belief $B$, one can compute $\mathrm{cras}(B)$ in polynomial time. Further, if
$\mathrm{cras}(B)$ is finite then $\mathrm{cras}(B) < |S|^2 \cdot |Q|^2$.
Proof. Let $k \in \mathbb{N}$. By Lemma 19, $\Delta(B, \bot^k)$ is confused if and only if:
$$\exists a.\ \exists (s, q), (t, r) \in \Delta(B, \bot^k) : s \xrightarrow{a},\ t \xrightarrow{a},\ (\overrightarrow{a}, \delta(q, a)) \not\approx (\overrightarrow{a}, \delta(r, a))$$
This holds if and only if there is $B_2 \subseteq B$ with $|B_2| \le 2$ such that:
$$\exists a.\ \exists (s, q), (t, r) \in \Delta(B_2, \bot^k) : s \xrightarrow{a},\ t \xrightarrow{a},\ (\overrightarrow{a}, \delta(q, a)) \not\approx (\overrightarrow{a}, \delta(r, a))$$
Let $G$ be the directed graph with nodes in $S \times Q \times S \times Q$ and edges
$$((s, q, t, r), (s', q', t', r')) \iff \Delta(\{(s, q), (t, r)\}, \bot) \supseteq \{(s', q'), (t', r')\}.$$
Also define the following set of nodes:
$$U := \{(s, q, t, r) \mid \exists a : s \xrightarrow{a},\ t \xrightarrow{a},\ (\overrightarrow{a}, \delta(q, a)) \not\approx (\overrightarrow{a}, \delta(r, a))\}$$
By Lemma 18 one can compute $U$ in polynomial time. It follows from the argument above
that $\Delta(B, \bot^k)$ is confused if and only if there are $(s, q), (t, r) \in B$ such that there is a length-$k$
path in $G$ from $(s, q, t, r)$ to a node in $U$. Let $k \le |S \times Q \times S \times Q|$ be the length of the
shortest such path, and set $k := \infty$ if no such path exists. Then $k$ can be computed in
polynomial time by a search of the graph $G$, and we have $\mathrm{cras}(B) = k - 1$. ◀
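A simple way to experiment with cras is sketched below. Instead of searching the pair graph $G$ from the proof, this variant iterates the beliefs $\Delta(B, \bot^k)$ directly and applies the test of Lemma 19; the bound of Lemma 22 tells it when to stop. The language equivalence classes are assumed to be precomputed (as in Lemma 18) and passed in as `cls`. The names `cras`, `mc_succ`, `dfa` and `cls` are illustrative.

```python
import math

def cras(belief, alphabet, mc_succ, dfa, cls, n_states, n_dfa_states):
    """Largest k such that Delta(B, bot^k) is not confused (-1 or inf possible).

    mc_succ[(s, a)] is the unique a-successor of s in the non-hidden MC,
    dfa[(q, a)] is the DFA transition, cls maps pairs to class ids.
    """
    def step(b, letters):
        return frozenset((mc_succ[(s, a)], dfa[(q, a)])
                         for (s, q) in b for a in letters
                         if (s, a) in mc_succ)

    def confused(b):
        # Lemma 19: some single letter leads to a belief that is not settled.
        return any(len({cls[p] for p in step(b, a)}) > 1 for a in alphabet)

    # Lemma 22: a finite cras value is below |S|^2 * |Q|^2.
    bound = n_states ** 2 * n_dfa_states ** 2
    for k in range(bound + 1):
        if confused(belief):
            return k - 1
        belief = step(belief, alphabet)   # one more unobserved letter
    return math.inf
```

As a usage example, take a hypothetical toy instance that is not from the paper: states $\overrightarrow{a}$ and $\overrightarrow{b}$, where $\overrightarrow{a}$ can emit a or b and $\overrightarrow{b}$ only b, and a DFA that accepts exactly when the first letter is b. Then the initial pair has cras 0, because the very first letter already has to be observed.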
The Procrastination Policy
For any belief $B$ and any observation prefix $\upsilon$, the language equivalence classes represented
in $\Delta(B, \upsilon)$ depend only on $\upsilon$ and the language equivalence classes in $B$. Therefore, when
tracking beliefs along observations, we may restrict $B$ to a single representative of each
equivalence class. We denote this operation by $B{\downarrow}$. A belief $B$ is settled if and only if
$|B{\downarrow}| \le 1$.
A procrastination policy $\rho_{\mathrm{pro}}(K)$ is parameterized with (a large) $K \in \mathbb{N}$. Define (and
precompute) $k(s, q) := \min\{K, \mathrm{cras}(s, q)\}$ for all $(s, q)$. We define $\rho_{\mathrm{pro}}(K)$ by the following
monitor that implements it:
1. $i := 0$
2. while $(s_i, q_i)$ is not deciding:
a. skip $k(s_i, q_i)$ observations, then observe a letter $a_i$
b. $\{(s_{i+1}, q_{i+1})\} := \Delta((s_i, q_i), \bot^{k(s_i, q_i)} a_i){\downarrow}$
c. $i := i + 1$
3. output yes/no decision
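The monitor loop above can be sketched in Python as follows. The concrete MC, DFA and deciding pairs are assumptions: they are our reading of the variant used in Example 26, with each MC state named after the letter that leads to it. The table `k` of precomputed skip counts and the class map `cls` are supplied by the caller, and all names are illustrative.

```python
SIGMA = "abc"
# Assumed non-hidden MC: SUCC[(s, x)] is the unique successor of s on letter x.
SUCC = {("a", "a"): "a", ("a", "b"): "b", ("a", "c"): "c",
        ("b", "b"): "b", ("c", "c"): "c"}

def dfa(q, x):
    # Assumed DFA: the first c leads to the accepting sink f.
    return "f" if q == "f" or x == "c" else "q0"

def step(belief, o):
    """Belief update Delta(B, o); o = None stands for an unobserved letter."""
    letters = SIGMA if o is None else o
    return {(SUCC[(s, x)], dfa(q, x))
            for (s, q) in belief for x in letters if (s, x) in SUCC}

def verdict(pair):
    # Deciding pairs of this example, hard-coded: from (b, q0) no c can follow.
    return {("b", "q0"): "no", ("c", "f"): "yes"}.get(pair)

def monitor(stream, start, k, cls):
    """Run the procrastination monitor; returns (decision, observations)."""
    pair, observations = start, 0
    while verdict(pair) is None:
        belief = {pair}
        for _ in range(k[pair]):
            next(stream)                  # skipped letter, never inspected
            belief = step(belief, None)
        letter = next(stream)             # the single observation of this round
        observations += 1
        belief = step(belief, letter)
        # B-down-arrow: all remaining pairs are language equivalent, keep one.
        assert len({cls[p] for p in belief}) == 1
        pair = min(belief)
    return verdict(pair), observations
```

With $K = 1$, the run `aaabbb...` is decided negatively after two observations (the second and the fourth letter), and the run `accc...` is decided positively after a single observation.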
It follows from the definition of cras and Lemma 19 that $\Delta((s_i, q_i), \upsilon_i){\downarrow}$ is indeed a singleton
for all $i$. We have:
▶ Lemma 23. For all $K \in \mathbb{N}$ the procrastination policy $\rho_{\mathrm{pro}}(K)$ is a diagnoser.
Proof. For a non-hidden MC $\mathcal{M}$ and a DFA $\mathcal{A}$, there is at most one successor for $(s, q)$
on letter $a$ in the belief NFA $\mathcal{B}$, for all $s, q, a$. Then, by Lemma 19, singleton beliefs are
not confused, and in particular the initial belief $B_0$ is not confused. By Lemma 4.4, $\varepsilon$ is
not confused, which means that $\Pr(\{u \mid u \text{ deciding}\}) = \Pr(\varepsilon) = 1$. Since almost surely a
deciding word $u$ is produced and since $\Delta(B_0, u) \subseteq \Delta(B_0, \upsilon)$ whenever $u \sim \upsilon$, it follows that
eventually an observation prefix $\upsilon$ is produced such that $\Delta(B_0, \upsilon)$ contains a deciding pair
$(s, q)$. But, as remarked above, $\Delta(B_0, \upsilon)$ is settled, so it is deciding. ◀
The Procrastination MC Mpro(K)
The policy $\rho_{\mathrm{pro}}(K)$ produces a (random, almost surely finite) word $a_1 a_2 \cdots a_n$ with $n =
C_{\rho_{\mathrm{pro}}(K)}$. Indeed, the observations that $\rho_{\mathrm{pro}}(K)$ makes can be described by an MC. Recall
that we have previously defined a composition MC $\mathcal{M} \times \mathcal{A} = (S \times Q, \Sigma, M', (s_0, q_0))$. Now
define an MC $\mathcal{M}_{\mathrm{pro}}(K) := (S \times Q, \Sigma \cup \{\$\}, M_{\mathrm{pro}}(K), (s_0, q_0))$ where $\$ \notin \Sigma$ is a fresh letter
and the transitions are as follows: when $(s, q)$ is deciding then $M_{\mathrm{pro}}(K)(\$)\bigl((s, q), (s, q)\bigr) := 1$,
and when $(s, q)$ is not deciding then
$$M_{\mathrm{pro}}(K)(a)\bigl((s, q), (\overrightarrow{a}, q')\bigr) := \bigl(M'(\bot)^{k(s,q)} M'(a)\bigr)\bigl((s, q), (\overrightarrow{a}, q')\bigr),$$
where the matrix $M'(\bot) := \sum_a M'(a)$ is powered by $k(s, q)$. The MC $\mathcal{M}_{\mathrm{pro}}(K)$ may not
be non-hidden, but could be made non-hidden by (i) collapsing all language equivalent
$(s, q_1), (s, q_2)$ in the natural way, and (ii) redirecting all $\$$-labelled transitions to a new
state $\overrightarrow{\$}$ that has a self-loop. In the understanding that $\$\$\$ \cdots$ indicates ‘decision made’,
the probability distribution defined by the MC $\mathcal{M}_{\mathrm{pro}}(K)$ coincides with the probability
distribution on sequences of non-$\bot$ observations made by $\rho_{\mathrm{pro}}(K)$.
▶ Example 24. For Example 16 the MC $\mathcal{M}_{\mathrm{pro}}(K)$ for $K \ge 1$ is as follows:
[Figure: the MC $\mathcal{M}_{\mathrm{pro}}(K)$ for Example 16; each state is annotated with its cras number, and deciding states carry a $\$$-labelled self-loop with probability 1.]
Here the lower number in a state indicates the cras number. The left state is negatively
deciding, and the right state is positively deciding. The policy $\rho_{\mathrm{pro}}(K)$ skips the first
observation and then observes either $b$ or $a$, each with probability $\frac12$, each leading to a
deciding belief.
Maximal Procrastination is Optimal
The following lemma states, loosely speaking, that when a belief $\{(s, q)\}$ with $\mathrm{cras}(s, q) = \infty$
is reached and $K$ is large, then a single further observation is expected to suffice for a decision.
▶ Lemma 25. Let $c(K, s, q)$ denote the expected cost of decision under $\rho_{\mathrm{pro}}(K)$ starting in
$(s, q)$. For each $\varepsilon > 0$ there exists $K \in \mathbb{N}$ such that for all $(s, q)$ with $\mathrm{cras}(s, q) = \infty$ we have
$c(K, s, q) \le 1 + \varepsilon$.
Proof sketch. The proof is a quantitative version of the proof of Lemma 23. The singleton
belief $\{(s, q)\}$ is not confused. Thus, if $K$ is large then with high probability the belief
$B := \Delta(\{(s, q)\}, \bot^K a)$ (for the observed next letter $a$) contains a deciding pair $(s', q')$. But
if $\mathrm{cras}(s, q) = \infty$ then, by Lemma 19, $B$ is settled, so if $B$ contains a deciding pair then $B$ is
deciding. ◀
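▶ Example 26. Consider the following variant of the previous example:
[Figure: a non-hidden MC with states $\overrightarrow{a}, \overrightarrow{b}, \overrightarrow{c}$ and transitions $\overrightarrow{a} \xrightarrow{\frac13 a} \overrightarrow{a}$, $\overrightarrow{a} \xrightarrow{\frac13 b} \overrightarrow{b}$, $\overrightarrow{a} \xrightarrow{\frac13 c} \overrightarrow{c}$, $\overrightarrow{b} \xrightarrow{1 b} \overrightarrow{b}$, $\overrightarrow{c} \xrightarrow{1 c} \overrightarrow{c}$; a DFA with states $q_0$ and $f$ and edge labels $a$, $b$, $c$, $\Sigma$.]
The MC $\mathcal{M}_{\mathrm{pro}}(K)$ for $K \ge 0$ is as follows:
[Figure: the MC $\mathcal{M}_{\mathrm{pro}}(K)$ for this example; deciding states carry a $\$$-labelled self-loop with probability 1.]
The left state is negatively deciding, and the right state is positively deciding. We have
$c(K, \overrightarrow{b}, q_0) = c(K, \overrightarrow{c}, f) = 0$ and $c(K, \overrightarrow{a}, q_0) = 1/(1 - (\frac13)^{K+1})$.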
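The closed form for $c(K, \overrightarrow{a}, q_0)$ can be checked with a short calculation, sketched here in Python with exact rationals. The reasoning is an assumption based on our reading of the example: in each round the policy skips $K$ letters and observes one, and it remains undecided only if all $K + 1$ letters are a, which happens with probability $(\frac13)^{K+1}$. The function name `expected_cost` is illustrative.

```python
from fractions import Fraction

def expected_cost(K):
    """Expected number of observations of rho_pro(K) started in (a, q0)."""
    # Probability that the K skipped letters and the observed one are all a,
    # i.e. that the round ends in (a, q0) again without a decision.
    stay = Fraction(1, 3) ** (K + 1)
    # Each round costs one observation: c = 1 + stay * c, so c = 1 / (1 - stay).
    return 1 / (1 - stay)
```

For $K = 0$ this gives 3/2 and for $K = 1$ it gives 9/8; the values decrease towards 1 as $K$ grows, in line with Lemma 25.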
Now we can prove the main positive result of the paper:
▶ Theorem 27. For any feasible policy $\rho$ there is $K \in \mathbb{N}$ such that:
$$\mathrm{Ex}(C_{\rho_{\mathrm{pro}}(K)}) \le \mathrm{Ex}(C_\rho)$$
Proof sketch. Let $\rho$ be a feasible policy. We choose $K > |S|^2 \cdot |Q|^2$, so, by Lemma 22,
$\rho_{\mathrm{pro}}(K)$ coincides with $\rho_{\mathrm{pro}}(\infty)$ until time, say, $n_\infty$ when $\rho_{\mathrm{pro}}(K)$ encounters a pair $(s, q)$
with $\mathrm{cras}(s, q) = \infty$. (The time $n_\infty$ may, with positive probability, never come.) Let us
compare $\rho_{\mathrm{pro}}(K)$ with $\rho$ up to time $n_\infty$. For $n \in \{0, \ldots, n_\infty\}$, define $\upsilon_{\mathrm{pro}}(n)$ and $\upsilon_\rho(n)$ as
the observation prefixes obtained by $\rho_{\mathrm{pro}}(K)$ and $\rho$, respectively, after $n$ steps. Write $\ell_{\mathrm{pro}}(n)$ and
$\ell_\rho(n)$ for the number of non-$\bot$ observations in $\upsilon_{\mathrm{pro}}(n)$ and $\upsilon_\rho(n)$, respectively. For beliefs
$B, B'$ we write $B \preceq B'$ when for all $(s, q) \in B$ there is $(s', q') \in B'$ with $(s, q) \approx (s', q')$. One
can show by induction that we have for all $n \in \{0, \ldots, n_\infty\}$:
$$\ell_{\mathrm{pro}}(n) \le \ell_\rho(n) \quad \text{and} \quad \bigl(\Delta(B_0, \upsilon_{\mathrm{pro}}(n)) \preceq \Delta(B_0, \upsilon_\rho(n)) \ \text{ or } \ \ell_{\mathrm{pro}}(n) < \ell_\rho(n)\bigr)$$
If time $n_\infty$ does not come then the inequality $\ell_{\mathrm{pro}}(n) \le \ell_\rho(n)$ from above suffices. Similarly, if
at time $n_\infty$ the pair $(s, q)$ is deciding, we are also done. If after time $n_\infty$ the procrastination
policy $\rho_{\mathrm{pro}}(K)$ observes at least one more letter then $\rho$ also observes at least one more
letter. By Lemma 25, one can choose $K$ large so that for $\rho_{\mathrm{pro}}(K)$ one additional observation
probably suffices. If it is the case that $\rho$ almost surely observes only one letter after $n_\infty$,
then $\rho_{\mathrm{pro}}(K)$ also needs only one more observation, since it has observed at time $n_\infty$. ◀
It follows that, in order to compute cinf, it suffices to analyze $\mathrm{Ex}(C_{\rho_{\mathrm{pro}}(K)})$ for large K.
This leads to the following theorem:
Theorem 28. Given a non-hidden MC M and a DFA A, one can compute cinf in
polynomial time.
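Proof. For each (s, q) define c(K, s, q) as in Lemma 25, and define
$c(s, q) := \lim_{K \to \infty} c(K, s, q)$. By Lemma 25, for each non-deciding (s, q) with cras(s, q) = ∞ we
have c(s, q) = 1. Hence the c(s, q) satisfy the following system of linear equations where
some coefficients come from the procrastination MC Mpro(∞):
$$c(s, q) = \begin{cases} 0 & \text{if } (s, q) \text{ is deciding} \\ 1 & \text{if } (s, q) \text{ is not deciding and } \mathit{cras}(s, q) = \infty \\ 1 + c'(s, q) & \text{otherwise} \end{cases}$$
$$c'(s, q) = \sum_{a} \sum_{q'} M_{\mathrm{pro}}(\infty)\big((s, q), (\vec{a}, q')\big) \cdot c(\vec{a}, q') \qquad \text{if } \mathit{cras}(s, q) < \infty$$
By solving the system one can compute $c(s_0, q_0)$ in polynomial time. We have:
$$c_{\mathit{inf}} = \lim_{K \to \infty} \mathrm{Ex}(C_{\rho_{\mathrm{pro}}(K)}) = \lim_{K \to \infty} c(K, s_0, q_0) = c(s_0, q_0)$$
where the first equality follows from Theorem 27. Hence one can compute cinf in polynomial time. ∎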
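As an illustration of how such a system can be solved, the following Python sketch sets up the equations for a small hypothetical instance and solves them by Gauss-Jordan elimination over exact rationals. The pair names, the transition probabilities, and the function names are assumptions made for the example; they are not taken from the paper or from its implementation, which uses Gurobi. The sketch also assumes the system has a unique solution.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (assumes A is non-singular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Raises StopIteration if no pivot exists, i.e. if A is singular.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def expected_costs(pairs, deciding, cras_infinite, P):
    """Build and solve the system for c(s, q).

    pairs         -- list of (hypothetical) names of pairs (s, q)
    deciding      -- set of deciding pairs, where c = 0
    cras_infinite -- set of non-deciding pairs with cras = infinity, where c = 1
    P             -- P[u][v] is the probability of moving from u to v
                     in the procrastination MC, for the remaining pairs u
    """
    idx = {p: i for i, p in enumerate(pairs)}
    n = len(pairs)
    A = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(0)] * n
    for p in pairs:
        i = idx[p]
        A[i][i] = Fraction(1)
        if p in deciding:
            b[i] = Fraction(0)
        elif p in cras_infinite:
            b[i] = Fraction(1)
        else:
            # c(p) = 1 + sum_v P[p][v] * c(v), moved to the left-hand side.
            b[i] = Fraction(1)
            for succ, prob in P[p].items():
                A[i][idx[succ]] -= prob
    return dict(zip(pairs, solve_linear(A, b)))


if __name__ == "__main__":
    P = {"u": {"u": Fraction(1, 4), "v": Fraction(1, 4), "d": Fraction(1, 2)}}
    print(expected_costs(["u", "v", "d"], {"d"}, {"v"}, P))
```

In this toy instance the only unknown is c(u), which satisfies c(u) = 1 + c(u)/4 + 1/4 and therefore equals 5/3.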
Empirical Evaluation of the Expected Optimal Cost
We have shown that maximal procrastination is optimal in the non-hidden case (Theorem 27).
However, we have not shown how much better the optimal policy is than the see-all baseline.
It appears difficult to answer this question analytically, so we address it empirically. We
implemented our algorithms in a fork of the Facebook Infer static analyzer [8], and applied
them to 11 open-source projects, totaling 80 thousand Java methods. We found that in
> 90% of cases the maximally procrastinating monitor is trivial and thus the optimal cost
is 0, because Infer decides statically if the property is violated. In the remaining cases, we
found that the optimal cost is roughly half of the see-all cost, but the variance is high.
Design. Our setting requires a DFA and an MC representing, respectively, a program
property and a program. For this empirical estimation of the expected optimal cost, the DFA
is fixed, the MC shape is the symbolic flowgraph of a real program, and the MC probabilities
are sampled from Dirichlet distributions.
The DFA represents the following property: ‘there are no two calls to next without an
intervening call to hasNext’. To understand how the MC shape is extracted from programs,
some background is needed. Infer [8, 9] is a static analyzer that, for each method, infers several
preconditions and, attached to each precondition, a symbolic path. For a simple example,
consider a method whose body is ‘if (b) x.next(); if (!b) x.next()’. Infer would generate two
preconditions for it, b and ¬b. In each of the two attached symbolic paths, we can see that
next is not called twice, which we would not notice with a control flowgraph. The symbolic
paths are inter-procedural. If a method f calls a method g, then the path of f will link to
a path of g and, moreover, it will pick one of the paths of g that corresponds to what is
currently known at the call site. For example, if g(b) is called from a state in which ¬b holds,
then Infer will select a path of g compatible with the condition ¬b.
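The iterator property stated at the start of this paragraph can be pictured as a three-state DFA. The Python sketch below is one plausible encoding, not the automaton used in the experiments: the state names are hypothetical, and it assumes that letters other than next and hasNext leave the state unchanged.

```python
# States of the assumed DFA; ERROR is a rejecting sink.
OK, PENDING, ERROR = "ok", "pending", "error"


def step(state: str, letter: str) -> str:
    """One DFA transition for 'no two calls to next without an intervening hasNext'."""
    if state == ERROR:
        return ERROR                 # a violation can never be undone
    if letter == "hasNext":
        return OK                    # hasNext resets the pending call to next
    if letter == "next":
        return ERROR if state == PENDING else PENDING
    return state                     # assumption: other letters are irrelevant


def violates(trace) -> bool:
    """Return True if the finite trace contains two next calls with no hasNext between them."""
    state = OK
    for letter in trace:
        state = step(state, letter)
    return state == ERROR


if __name__ == "__main__":
    # Each symbolic path of the method above calls next once.
    print(violates(["next"]))
    # A path through both branches, as a control flowgraph would allow.
    print(violates(["next", "next"]))
```

On the example method, each of the two symbolic paths yields the trace with a single next, which the DFA accepts, whereas the control flowgraph also admits the trace with two consecutive next calls, which it rejects.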
The symbolic paths are finite because abstraction is applied, including across mutually
recursive calls. But, still, multiple vertices of the symbolic path correspond to the same
vertex of the control flowgraph. For example, Infer may go around a for-loop five times before
noticing the invariant. By coalescing those vertices of the symbolic path that correspond to
the same vertex of the control flowgraph we obtain an SFG (symbolic flowgraph). We use such
SFGs as the skeleton of MCs. Intuitively, one can think of SFGs as inter-procedural control
flowgraphs restricted based on semantic information. Vertices correspond to locations in the
program text, and transitions correspond to method calls or returns. Transition probabilities
should then be interpreted as a form of static branch prediction. One could learn these
probabilities by observing many runs of the program on typical input data, for example by
using the Baum–Welch algorithm [17]. Instead, we opt to show that the improvement in
expected observation cost is robust over a wide range of possible transition probabilities,
which we do by drawing several samples from Dirichlet distributions. Besides, recall that the
(optimal) procrastination policy does not depend on transition probabilities.
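To make the construction concrete, the following Python sketch coalesces the vertices of a symbolic path by their control-flowgraph location and then draws transition probabilities from a Dirichlet distribution. It is only an illustration under stated assumptions: the function names, the toy path, and the symmetric concentration parameter alpha are hypothetical, edge labels (method calls and returns) are omitted, and the text does not state which Dirichlet parameters were used in the experiments. Sampling uses the standard fact that independent Gamma draws, normalised by their sum, are Dirichlet distributed.

```python
import random
from collections import defaultdict


def coalesce(path_edges, location_of):
    """Merge symbolic-path vertices that map to the same control-flowgraph vertex.

    path_edges  -- iterable of (u, v) edges between symbolic-path vertices
    location_of -- dict from symbolic-path vertex to control-flowgraph vertex
    Returns a dict from location to the sorted list of successor locations.
    """
    sfg = defaultdict(set)
    for u, v in path_edges:
        sfg[location_of[u]].add(location_of[v])
    return {loc: sorted(succs) for loc, succs in sfg.items()}


def sample_dirichlet(k, alpha=1.0, rng=random):
    """Draw one point from a symmetric Dirichlet(alpha, ..., alpha) of dimension k."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mc(sfg, alpha=1.0, rng=random):
    """Attach sampled transition probabilities to every vertex of the skeleton."""
    return {
        loc: dict(zip(succs, sample_dirichlet(len(succs), alpha, rng)))
        for loc, succs in sfg.items()
    }


if __name__ == "__main__":
    # Toy symbolic path that visits the loop head twice before exiting.
    edges = [("p0", "p1"), ("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
    location_of = {"p0": "entry", "p1": "loop", "p2": "body",
                   "p3": "loop", "p4": "exit"}
    sfg = coalesce(edges, location_of)
    print(sfg)
    print(sample_mc(sfg, rng=random.Random(0)))
```

Drawing several such samples for the same skeleton gives a family of MCs that differ only in their probabilities, which is the setting in which the robustness of the cost improvement is assessed.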
Once we have a DFA and an MC we compute their product. In some cases, it is clear
that the product is empty or universal. These are the cases in which we can give the verdict
right away, because no observation is necessary. We then focus on the non-trivial cases.
For non-trivial MC × DFA products, we compute the expected cost of the light see-all
policy Ex(C◦), which observes all letters until a decision is made and then stops. We can do
so by using standard algorithms [2, Chapter 10.5]. Then, we compute Mpro, which we use to
compute the expected observation cost cinf of the procrastination policy (Theorem 28). Recall
that in order to compute Mpro, one needs to compute the cras function, and also to find
language equivalence classes. Thus, computing Mpro entails computing all the information
necessary for implementing a procrastinating monitor.
Methodology. We selected 11 Java projects among those that are most forked on GitHub.
We ran Infer on each of these projects. From the inferred specifications, we built SFGs
and monitors that employ light see-all policies and maximal procrastination policies. From
these monitors, we computed the respective expected costs, solving the linear systems using
Gurobi [12]. Our implementation is in a fork of Infer, on GitHub.
Results. The results are given in Table 1. We first note that the number of monitors is
much smaller than the number of methods, by a factor of 10 or 100. This is because in
most cases we are able to determine the answer statically, by analyzing the symbolic paths
produced by Infer. The large factor should not be too surprising: we are considering a fixed
property about iterators, not all Java methods use iterators, and, when they do, it is usually
easy to tell that they do so correctly. Still, each project has a few hundred monitors, which
handle the cases that are not so obvious.
We note that $c_{\mathit{inf}} / \mathrm{Ex}(C_\circ) \approx 0.5$. The table supports this by presenting the median and the
geometric average, which are close to each other; the arithmetic average is also close. There
is, however, quite a bit of variation from monitor to monitor, as shown in Figure 1. We
conclude that selective monitoring has the potential to significantly reduce the overhead of
runtime monitoring.
Future Work
In this paper we required policies to be feasible, which means that our selective monitors
are as precise as non-selective monitors. One may relax this and study the tradeoff between
efficiency (skipping even more observations) and precision (probability of making a decision).
Further, one could replace the diagnosability notion of this paper by other notions from the
literature; one could investigate how to compute cinf for other classes of MCs, such as acyclic
MCs; one could study the sensitivity of cinf to changes in transition probabilities; and one
could identify classes of MCs for which selective monitoring helps and classes of MCs for
which selective monitoring does not help.
A nontrivial extension to the formal model would be to include some notion of data, which
is pervasive in practical specification languages used in runtime verification [13]. This would
entail replacing the DFA with a more expressive device, such as a nominal automaton [7], a
symbolic automaton [10], or a logic with data (e.g., [11]). Alternatively, one could side-step
the problem by using the slicing idea [18], which separates the concern of handling data at
the expense of a mild loss of expressive power. Finally, the monitors we computed could be
used in a runtime verifier, or even in session type monitoring where the setting is similar [6].
References
[1] Matthew Arnold, Martin T. Vechev, and Eran Yahav. QVM: an efficient runtime for detecting defects in deployed systems. In OOPSLA, 2008.
[2] C. Baier and J.-P. Katoen. Principles of model checking. MIT Press, 2008.
[3] Ezio Bartocci, Radu Grosu, Atul Karmarkar, Scott A. Smolka, Scott D. Stoller, Erez Zadok, and Justin Seyster. Adaptive runtime verification. In RV, 2012.
[4] N. Bertrand, S. Haddad, and E. Lefaucheux. Foundation of diagnosis and predictability in probabilistic systems. In Proceedings of FSTTCS, volume 29 of LIPIcs, pages 417–429, 2014.
[5] N. Bertrand, S. Haddad, and E. Lefaucheux. Accurate approximate diagnosability of stochastic systems. In Proceedings of LATA, pages 549–561. Springer, 2016.
[6] Monitoring networks through multiparty session types. TCS, 2017.
[7] LMCS, 2014.
[8] In NASA Formal Methods Symposium, 2015.
[9] C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. JACM, 2011.
[10] Loris D'Antoni and Margus Veanes. The power of symbolic automata and transducers. In CAV, 2017.
[11] TOCL, 2009.
[12] Gurobi Optimization, Inc. Gurobi optimizer reference manual. http://www.gurobi.com, 2017.
[13] Klaus Havelund, Martin Leucker, Giles Reger, and Volker Stolz. A Shared Challenge in Behavioural Specification (Dagstuhl Seminar 17462). Dagstuhl Reports, 2018. doi:10.4230/DagRep.7.11.59.
[14] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A polynomial algorithm for testing diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 46(8):1318–1321, 2001.
[15] K. Kalajdzic, E. Bartocci, S. A. Smolka, S. D. Stoller, and R. Grosu. Runtime verification with particle filtering. In RV, 2013.
[16] J.-Y. Kao, N. Rampersad, and J. Shallit. On NFAs where all states are final, initial, or both. Theoretical Computer Science, 410(47):5010–5021, 2009.
[17] Brian G. Leroux. Maximum-likelihood estimation for hidden Markov models. Stochastic Processes and Their Applications, 1992.
[18] Grigore Roşu and Feng Chen. Semantics and algorithms for parametric monitoring. LMCS, 2012.
[19] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis. Diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 40(9):1555–1575, 1995.
[20] A. Prasad Sistla, Miloš Žefran, and Yao Feng. Monitorability of stochastic dynamical systems. In Proceedings of CAV, pages 720–736. Springer, 2011.
[21] Runtime verification with state estimation. In RV, 2011.
[22] D. Thorsley and D. Teneketzis. Diagnosability of stochastic discrete-event systems. IEEE Transactions on Automatic Control, 50(4):476–492, 2005.