Selective Monitoring

LIPIcs, Leibniz International Proceedings in Informatics, Aug 2018

We study selective monitors for labelled Markov chains. Monitors observe the outputs that are generated by a Markov chain during its run, with the goal of identifying runs as correct or faulty. A monitor is selective if it skips observations in order to reduce monitoring overhead. We are interested in monitors that minimize the expected number of observations. We establish an undecidability result for selectively monitoring general Markov chains. On the other hand, we show for non-hidden Markov chains (where any output identifies the state the Markov chain is in) that simple optimal monitors exist and can be computed efficiently, based on DFA language equivalence. These monitors do not depend on the precise transition probabilities in the Markov chain. We report on experiments where we compute these monitors for several open-source Java projects.

Full text: http://drops.dagstuhl.de/opus/volltexte/2018/9558/pdf/LIPIcs-CONCUR-2018-20.pdf


CONCUR 2018

Radu Grigore, University of Kent, UK
Stefan Kiefer, University of Oxford, UK (https://orcid.org/0000-0003-1128-0311)

2012 ACM Subject Classification Theory of computation → Randomness, geometry and discrete structures

Keywords and phrases runtime monitoring; probabilistic systems; Markov chains; automata; language equivalence

Related Version A full version of the paper is available at https://arxiv.org/abs/1806.06143.

1 Introduction

Consider an MC (Markov chain) whose transitions are labelled with letters, and a finite automaton that accepts languages of infinite words. Computing the probability that the random word emitted by the MC is accepted by the automaton is a classical problem at the heart of probabilistic verification. A finite prefix may already determine whether the random infinite word is accepted, and computing the probability that such a deciding finite prefix is produced is a nontrivial diagnosability problem. The theoretical problem we study in this paper is how to catch deciding prefixes without observing the whole prefix; i.e., we want to minimize the expected number of observations and still catch all deciding prefixes.

Motivation. In runtime verification a program sends messages to a monitor, which decides if the program run is faulty. Usually, runtime verification is turned off in production code because monitoring overhead is prohibitive. QVM (quality virtual machine) and ARV (adaptive runtime verification) are existing pragmatic solutions to the overhead problem, which perform best-effort monitoring within a specified overhead budget [1, 3]. ARV relies on RVSE (runtime verification with state estimation) to also compute a probability that the program run is faulty [21, 15]. We take the opposite approach: we ask for the smallest overhead achievable without compromising precision at all.

Previous Work. Before worrying about the performance of a monitor, one might want to check if faults in a given system can be diagnosed at all. This problem has been studied under the term diagnosability, first for non-stochastic finite discrete event systems [19], which are labelled transition systems. It was shown in [14] that diagnosability can be checked in polynomial time, although the associated monitors may have exponential size. Later the notion of diagnosability was extended to stochastic discrete-event systems, which are labelled Markov chains [22]. Several notions of diagnosability in stochastic systems exist, and some of them have several names; see, e.g., [20, 4] and the references therein. Bertrand et al. [4] also compare the notions.
For instance, they show that for one variant of the problem (referred to as A-diagnosability or SS-diagnosability or IF-diagnosability) a previously proposed polynomial-time algorithm is incorrect, and prove that this notion of diagnosability is PSPACE-complete. Indeed, most variants of diagnosability for stochastic systems are PSPACE-complete [4], with the notable exception of AA-diagnosability (where the monitor is allowed to diagnose wrongly with arbitrarily small probability), which can be solved in polynomial time [5].

Selective Monitoring. In this paper, we seem to make the problem harder: since observations by a monitor come with a performance overhead, we allow the monitor to skip observations. In order to decide how many observations to skip, the monitor employs an observation policy. Skipping observations might decrease the probability of deciding (whether the current run of the system is faulty or correct). We do not study this tradeoff: we require policies to be feasible, i.e., the probability of deciding must be as high as under the policy that observes everything. We do not require the system to be diagnosable; i.e., the probability of deciding may be less than 1. Checking whether the system is diagnosable is PSPACE-complete ([4], Theorem 8).

The Cost of Decision in General Markov Chains. The cost (of decision) is the number of observations that the policy makes during a run of the system. We are interested in minimizing the expected cost among all feasible policies. We show that if the system is diagnosable then there exists a policy with finite expected cost, i.e., the policy may stop observing after finite expected time. (The converse is not true.) Whether the infimum cost (among feasible policies) is finite is also PSPACE-complete (Theorem 14). Whether there is a feasible policy whose expected cost is smaller than a given threshold is undecidable (Theorem 15), even for diagnosable systems.

Non-Hidden Markov Chains. We identify a class of MCs, namely non-hidden MCs, where the picture is much brighter. An MC is called non-hidden when each label identifies the state. Non-hidden MCs are always diagnosable. Moreover, we show that maximally procrastinating policies are (almost) optimal (Theorem 27). A policy is called maximally procrastinating when it skips observations up to the point where one further skip would put a decision on the current run in question. We also show that one can construct an (almost) optimal maximally procrastinating policy in polynomial time. This policy does not depend on the exact probabilities in the MC, although the expected cost under that policy does. That is, we efficiently construct a policy that is (almost) optimal regardless of the transition probabilities on the MC transitions. We also show that the infimum cost (among all feasible policies) can be computed in polynomial time (Theorem 28). Underlying these results is a theory based on automata, in particular, checking language equivalence of DFAs.

Experiments. We evaluated the algorithms presented in this paper by implementing them in Facebook Infer, and trying them on 11 of the most forked Java projects on GitHub. We found that, on average, selective monitoring can reduce the number of observations to a half.

2 Preliminaries

Let S be a finite set. We view elements of R^S as vectors, more specifically as row vectors. We write 1 for the all-1 vector, i.e., the element of {1}^S. For a vector μ ∈ R^S, we denote by μ^T its transpose, a column vector.
A vector μ ∈ [0, 1]^S is a distribution over S if μ1^T = 1. For s ∈ S we write e_s for the (Dirac) distribution over S with e_s(s) = 1 and e_s(t) = 0 for t ∈ S \ {s}. We view elements of R^{S×S} as matrices. A matrix M ∈ [0, 1]^{S×S} is called stochastic if each row sums up to one, i.e., M1^T = 1^T.

For a finite alphabet Σ, we write Σ∗ and Σω for the finite and infinite words over Σ, respectively. We write ε for the empty word. We represent languages L ⊆ Σω using deterministic finite automata, and we represent probability measures Pr over Σω using Markov chains.

A (discrete-time, finite-state, labelled) Markov chain (MC) is a quadruple (S, Σ, M, s0) where S is a finite set of states, Σ a finite alphabet, s0 an initial state, and M : Σ → [0, 1]^{S×S} specifies the transitions, such that ∑_{a∈Σ} M(a) is a stochastic matrix. Intuitively, if the MC is in state s, then with probability M(a)(s, s′) it emits a and moves to state s′. For the complexity results in this paper, we assume that all numbers in the matrices M(a) for a ∈ Σ are rationals given as fractions of integers represented in binary. We extend M to the mapping M : Σ∗ → [0, 1]^{S×S} with M(a1 ··· ak) = M(a1) ··· M(ak) for a1, ..., ak ∈ Σ. Intuitively, if the MC is in state s then with probability M(u)(s, s′) it emits the word u ∈ Σ∗ and moves (in |u| steps) to state s′. An MC is called non-hidden if for each a ∈ Σ all non-zero entries of M(a) are in the same column. Intuitively, in a non-hidden MC, the emitted letter identifies the next state. An MC (S, Σ, M, s0) defines the standard probability measure Pr over Σω, uniquely defined by assigning probabilities to cylinder sets {u}Σω, with u ∈ Σ∗, as follows:

  Pr({u}Σω) := e_{s0} M(u) 1^T

A deterministic finite automaton (DFA) is a quintuple (Q, Σ, δ, q0, F) where Q is a finite set of states, Σ a finite alphabet, δ : Q × Σ → Q a transition function, q0 an initial state, and F ⊆ Q a set of accepting states. We extend δ to δ : Q × Σ∗ → Q as usual. A DFA defines a language L ⊆ Σω as follows:

  L := { w ∈ Σω | δ(q0, u) ∈ F for some prefix u of w }

Note that we do not require accepting states to be visited infinitely often: just once suffices. Therefore we can and will assume without loss of generality that there is f with F = {f} and δ(f, a) = f for all a ∈ Σ.

For the rest of the paper we fix an MC M = (S, Σ, M, s0) and a DFA A = (Q, Σ, δ, q0, F). We define their composition as the MC M × A := (S × Q, Σ, M′, (s0, q0)) where M′(a)((s, q), (s′, q′)) equals M(a)(s, s′) if q′ = δ(q, a) and 0 otherwise. Thus, M and M × A induce the same probability measure Pr.

An observation o ∈ Σ⊥ is either a letter or the special symbol ⊥ ∉ Σ, which stands for "not seen". An observation policy ρ : Σ⊥∗ → {0, 1} is a (not necessarily computable) function that, given the observations made so far, says whether we should observe the next letter. An observation policy ρ determines a projection π_ρ : Σω → Σ⊥ω: we have π_ρ(a1 a2 ...) = o1 o2 ... where, for all n ≥ 0,

  o_{n+1} = a_{n+1} if ρ(o1 ... on) = 1, and o_{n+1} = ⊥ if ρ(o1 ... on) = 0.

We denote the see-all policy by •; thus, π_•(w) = w. In the rest of the paper we reserve a for letters, o for observations, u for finite words, w for infinite words, υ for finite observation prefixes, s for states from an MC, and q for states from a DFA. We write o1 ∼ o2 when o1 and o2 are the same or at least one of them is ⊥. We lift this relation to (finite and infinite) sequences of observations (of the same length). We write w & υ when u ∼ υ holds for the length-|υ| prefix u of w.
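To make the MC conventions concrete, here is a small sketch (ours, not part of the paper) of a labelled MC as one matrix per letter, together with the cylinder probability Pr({u}Σω) = e_{s0} M(u) 1^T and the non-hidden test; the states and numbers are hypothetical.

```python
# Sketch: a labelled MC as one stochastic-summing matrix per letter.
import numpy as np

M = {
    "a": np.array([[0.0, 0.5, 0.5],    # s0 emits a and moves to s1 or s2
                   [0.0, 1.0, 0.0],    # s1 loops on a
                   [0.0, 0.0, 0.5]]),  # s2 stays put on a with prob 1/2
    "b": np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.5]]),  # s2 emits b with prob 1/2
}
s0 = 0  # initial state

def cylinder_prob(M, s0, u):
    """Pr({u}Sigma^omega) = e_{s0} M(u) 1^T: start from the Dirac
    distribution, multiply the letter matrices, then sum the entries."""
    mu = np.zeros(len(next(iter(M.values())))); mu[s0] = 1.0
    for a in u:
        mu = mu @ M[a]
    return mu.sum()

def is_non_hidden(M):
    """Non-hidden: every non-zero entry of M(a) lies in a single column,
    i.e., the emitted letter identifies the successor state."""
    return all(len(set(np.nonzero(Ma)[1])) <= 1 for Ma in M.values())

print(cylinder_prob(M, s0, "ab"))  # 0.25
print(is_non_hidden(M))            # False: letter a can lead to s1 or s2
```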
We say that υ is negatively deciding when Pr({w & υ | w ∈ L}) = 0. Intuitively, υ is negatively deciding when υ is incompatible (up to a null set) with L. Similarly, we say that υ is positively deciding when Pr({w & υ | w ∉ L}) = 0. An observation prefix υ is deciding when it is positively or negatively deciding. An observation policy ρ decides w when π_ρ(w) has a deciding prefix. A monitor is an interactive algorithm that implements an observation policy: it processes a stream of letters and, after each letter, it replies with one of "yes", "no", or "skip n letters", where n ∈ N ∪ {∞}.

▶ Lemma 1. For any w, if some policy decides w then • decides w.

Proof. Let ρ decide w. Then there is a deciding prefix υ of π_ρ(w). Suppose υ is positively deciding, i.e., Pr({w′ & υ | w′ ∉ L}) = 0. Let u be the length-|υ| prefix of w. Then Pr({w′ & u | w′ ∉ L}) = 0, since υ can be obtained from u by possibly replacing some letters with ⊥. Hence u is also positively deciding. Since u is a prefix of w = π_•(w), we have that • decides w. The case where υ is negatively deciding is similar. ◀

It follows that max_ρ Pr({w | ρ decides w}) = Pr({w | • decides w}). We say that a policy ρ is feasible when it attains this maximum, i.e., when

  Pr({w | ρ decides w}) = Pr({w | • decides w}).

Equivalently, ρ is feasible when Pr({w | • decides w implies ρ decides w}) = 1, i.e., almost all words that are decided by the see-all policy are also decided by ρ. If υ = o1 o2 ... is the shortest prefix of π_ρ(w) that is deciding, then the cost of decision is

  C_ρ(w) = ∑_{k=0}^{|υ|−1} ρ(o1 ... ok).

This paper is about finding feasible observation policies ρ that minimize Ex(C_ρ), the expectation of the cost of decision with respect to Pr.

3 Qualitative Analysis of Observation Policies

In this section we study properties of observation policies that are qualitative, i.e., not directly related to the cost of decision. We focus on properties of observation prefixes that a policy may produce.

Observation Prefixes

We have already defined deciding observation prefixes. We now define several other types of prefixes: enabled, confused, very confused, and finitary. A prefix υ is enabled if it occurs with positive probability, Pr({w & υ}) > 0. Intuitively, the other types of prefixes υ are defined in terms of what would happen if we were to observe all from now on: if it is not almost sure that eventually a deciding prefix is reached, then we say υ is confused; if it is almost sure that a deciding prefix will not be reached, then we say υ is very confused; if it is almost sure that eventually a deciding or very confused prefix is reached, then we say υ is finitary. To say this formally, let us make a few notational conventions: for an observation prefix υ, we write Pr(υ) as a shorthand for Pr({ uw | u ∼ υ }); for a set Υ of observation prefixes, we write Pr(Υ) as a shorthand for Pr(⋃_{υ∈Υ} { uw | u ∼ υ }). With these conventions, we define:

1. υ is confused when Pr({ υu | υu deciding }) < Pr(υ)
2. υ is very confused when Pr({ υu | υu deciding }) = 0
3. υ is finitary when Pr({ υu | υu deciding or very confused }) = Pr(υ)

Observe that (a) confused implies enabled, (b) deciding implies not confused, and (c) enabled and very confused implies confused. The following are alternative equivalent definitions:

1. υ is confused when Pr({ uw | u ∼ υ, no prefix of υw is deciding }) > 0
2. υ is very confused when υu′ is non-deciding for all enabled υu′
3. υ is finitary when Pr({ uw | u ∼ υ, no prefix of υw is deciding or very confused }) = 0
▶ Example 2. Consider the MC and the DFA depicted here:

[Figure: an MC with states s0, s1, s2, where s0 emits a and moves to s1 or to s2 with probability 1/2 each, s1 loops on a with probability 1, and s2 loops on a or emits b with probability 1/2 each; and a DFA with states q0 and f, where a loops on q0, b leads from q0 to f, and f is an accepting sink.]

All observation prefixes that do not start with b are enabled. The observation prefixes ab and ⊥b and, in fact, all observation prefixes that contain b, are positively deciding. For all n ∈ N we have Pr({w & a^n | w ∈ L}) > 0 and Pr({w & a^n | w ∉ L}) > 0, so a^n is not deciding. If the MC takes the right transition first then almost surely it emits b at some point. Thus Pr({aaa···}) = 1/2. Hence ε is confused. In this example only non-enabled observation prefixes are very confused. It follows that ε is not finitary.

Beliefs

For any s we write Pr_s for the probability measure of the MC M_s obtained from M by making s the initial state. For any q we write L_q ⊆ Σω for the language of the DFA A_q obtained from A by making q the initial state. We call a pair (s, q) negatively deciding when Pr_s(L_q) = 0; similarly, we call (s, q) positively deciding when Pr_s(L_q) = 1. A subset of S × Q is called a belief. We call a belief negatively (positively, respectively) deciding when all its elements are. We fix the notation B0 := {(s0, q0)} (for the initial belief) for the remainder of the paper. Define the belief NFA as the NFA B = (S × Q, Σ⊥, Δ, B0, ∅) with:

  Δ((s, q), a) = {(s′, q′) | M(a)(s, s′) > 0, δ(q, a) = q′} for a ∈ Σ
  Δ((s, q), ⊥) = ⋃_{a∈Σ} Δ((s, q), a)

We extend the transition function Δ : (S × Q) × Σ⊥ → 2^{S×Q} to Δ : 2^{S×Q} × Σ⊥∗ → 2^{S×Q} in the way that is usual for NFAs. Intuitively, if belief B is the set of states where the product M × A could be now, then Δ(B, υ) is the belief adjusted by additionally observing υ. To reason about observation prefixes υ algorithmically, it will be convenient to reason about the belief Δ(B0, υ). We define confused, very confused, and finitary beliefs as follows:

1. B is confused when Pr_s({ uw | Δ(B, u) deciding }) < 1 for some (s, q) ∈ B
2. B is very confused when Δ(B, u) is empty or not deciding for all u
3. B is finitary when Pr_s({ uw | Δ(B, u) deciding or very confused }) = 1 for all (s, q) ∈ B

▶ Example 3. In Example 2 we have B0 = {(s0, q0)}, and Δ(B0, a^n) = {(s1, q0), (s2, q0)} for all n ≥ 1, and Δ(B0, b) = ∅, and Δ(B0, a⊥) = {(s1, q0), (s2, q0), (s2, f)}, and Δ(B0, ⊥υ) = {(s2, f)} for all υ that contain b. The latter belief {(s2, f)} is positively deciding. We have Pr_{s1}({uw | Δ({(s1, q0)}, u) is deciding}) = 0, so any belief that contains (s1, q0) is confused. Also, B0 is confused as Pr_{s0}({uw | Δ({(s0, q0)}, u) is deciding}) = 1/2.

Relation Between Observation Prefixes and Beliefs

By the following lemma, the corresponding properties of observation prefixes and beliefs are closely related.

▶ Lemma 4. Let υ be an observation prefix.
1. υ is enabled if and only if Δ(B0, υ) ≠ ∅.
2. υ is negatively deciding if and only if Δ(B0, υ) is negatively deciding.
3. υ is positively deciding if and only if Δ(B0, υ) is positively deciding.
4. υ is confused if and only if Δ(B0, υ) is confused.
5. υ is very confused if and only if Δ(B0, υ) is very confused.
6. υ is finitary if and only if Δ(B0, υ) is finitary.
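As an illustration of the belief machinery, beliefs and the transition function Δ are straightforward to track in code. Below is a minimal sketch in which skipping (⊥) unions the updates over all letters; the encoding of the MC support and the DFA (which here mirrors Example 2) is ours.

```python
# Sketch: tracking beliefs with the belief NFA's transition function Delta.
BOT = None  # the skip observation (the paper's "bottom" symbol)

# Hypothetical encoding of the product of Example 2: supp[a] lists the MC
# edges (s, s') with M(a)(s, s') > 0; delta is the (total) DFA function.
supp = {"a": {(0, 1), (0, 2), (1, 1), (2, 2)}, "b": {(2, 2)}}
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1}  # 0 = q0, 1 = f

def step(belief, o):
    """Delta(B, o): follow letter o, or every letter if o is BOT."""
    letters = supp.keys() if o is BOT else [o]
    return {(s2, delta[(q, a)])
            for a in letters
            for (s, q) in belief
            for (s1, s2) in supp[a] if s1 == s}

def Delta(belief, obs):
    for o in obs:
        belief = step(belief, o)
    return belief

B0 = {(0, 0)}
print(Delta(B0, ["a", BOT]))  # {(1, 0), (2, 0), (2, 1)}, as in Example 3
```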
The following lemma gives complexity bounds for computing these properties.

▶ Lemma 5. Let υ be an observation prefix, and B a belief.
1. Whether υ is enabled can be decided in P.
2. Whether υ (or B) is negatively deciding can be decided in P.
3. Whether υ (or B) is positively deciding can be decided in P.
4. Whether υ (or B) is confused can be decided in PSPACE.
5. Whether υ (or B) is very confused can be decided in PSPACE.
6. Whether υ (or B) is finitary can be decided in PSPACE.

Proof sketch. The belief NFA B and the MC M × A can be computed in polynomial time (even in deterministic logspace). For items 1–3, there are efficient graph algorithms that search these product structures. For instance, to show that a given pair (s1, q1) is not negatively deciding, it suffices to show that B has a path from (s1, q1) to a state (s2, f) for some s2. This can be checked in polynomial time (even in NL). For items 4–6, one searches the (exponential-sized) product of M and the determinization of B. This can be done in PSPACE. For instance, to show that a given belief B is confused, it suffices to show that there are (s1, q1) ∈ B and u1 and s2 such that M has a u1-labelled path from s1 to s2 such that there do not exist u2 and s3 such that M has a u2-labelled path from s2 to s3 such that Δ(B, u1u2) is deciding. This can be checked in NPSPACE = PSPACE by nondeterministically guessing paths in the product of M and the determinization of B. ◀

Diagnosability

We call a policy a diagnoser when it decides almost surely.

▶ Example 6. In Example 2 a diagnoser does not exist. Indeed, the policy • does not decide when the MC takes the left transition, and decides (positively) almost surely when the MC takes the right transition in the first step. Hence Pr({w | • decides w}) = Pr(Σ∗{b}Σω) = 1/2. So • is not a diagnoser. By Lemma 1, it follows that there is no diagnoser.

Diagnosability can be characterized by the notion of confusion:

▶ Proposition 7. There exists a diagnoser if and only if ε is not confused.

The following theorem shows that diagnosability is hard to check.

▶ Theorem 8 (cf. [4, Theorem 6]). Given an MC M and a DFA A, it is PSPACE-complete to check if there exists a diagnoser.

Theorem 8 essentially follows from a result by Bertrand et al. [4]. They study several different notions of diagnosability; one of them (FA-diagnosability) is very similar to our notion of diagnosability. There are several small differences; e.g., their systems are not necessarily products of an MC and a DFA. Therefore we give a self-contained proof of Theorem 8.

Proof sketch. By Proposition 7 it suffices to show PSPACE-completeness of checking whether ε is confused. Membership in PSPACE follows from Lemma 5.4. For hardness we reduce from the following problem: given an NFA U over Σ = {a, b} where all states are initial and accepting, does U accept all (finite) words? This problem is PSPACE-complete [16, Lemma 6]. ◀

Allowing Confusion

We say an observation policy allows confusion when, with positive probability, it produces an observation prefix υ⊥ such that υ⊥ is confused but υ is not.

▶ Proposition 9. A feasible observation policy does not allow confusion.

Hence, in order to be feasible, a policy must observe when it would get confused otherwise. In § 5 we show that in the non-hidden case there is almost a converse of Proposition 9; i.e., in order to be feasible, a policy need not do much more than not allow confusion.

4 Analyzing the Cost of Decision

In this section we study the computational complexity of finding feasible policies that minimize the expected cost of decision. We focus on the decision version of the problem: Is there a feasible policy whose expected cost is smaller than a given threshold? Define:

  cinf := inf { Ex(C_ρ) | ρ is a feasible observation policy }

Since the see-all policy • never stops observing, we have Pr(C_• = ∞) = 1, so Ex(C_•) = ∞. However, once an observation prefix υ is deciding or very confused, there is no point in continuing observation. Hence, we define a light see-all policy ◦, which observes until the observation prefix υ is deciding or very confused; formally, ◦(υ) = 0 if and only if υ is deciding or very confused. It follows from the definition of very confused that the policy ◦ is feasible.
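As a sketch, the light see-all policy ◦ can be phrased as the following monitor loop; the belief-update function and the two predicates are parameters here (Lemma 5 bounds their complexity), and the toy instantiation is ours, not the paper's.

```python
# Sketch: the light see-all policy as a monitor loop.
def light_see_all(stream, step, B0, deciding, very_confused):
    """Observe every letter until the current belief is deciding or very
    confused; return the outcome and the number of observations made."""
    B, cost = B0, 0
    for a in stream:
        if deciding(B) or very_confused(B):
            break
        B = step(B, a)  # observe the letter and update the belief
        cost += 1
    return ("decided" if deciding(B) else "undecided", cost)

# Toy instantiation (purely illustrative): the belief collapses to {"f"}
# once a b is observed, which we treat as deciding.
step = lambda B, a: {"f"} if a == "b" else B
deciding = lambda B: B == {"f"}
very_confused = lambda B: False
print(light_see_all("aab", step, {"q0"}, deciding, very_confused))  # ('decided', 3)
```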
Concerning the cost C◦ we have, for all w,

  C◦(w) = ∑_{n=0}^{∞} (1 − D_n(w)),   (1)

where D_n(w) = 1 if the length-n prefix of w is deciding or very confused, and D_n(w) = 0 otherwise. The following results are proved in the full version of the paper, on arXiv:

▶ Lemma 10. If ε is finitary then Ex(C◦) is finite.

▶ Lemma 11. Let ρ be a feasible observation policy. If Pr(C_ρ < ∞) = 1 then ε is finitary.

▶ Proposition 12. cinf is finite if and only if ε is finitary.

▶ Proposition 13. If a diagnoser exists then cinf is finite.

▶ Theorem 14. It is PSPACE-complete to check if cinf < ∞.

Lemma 10 holds because, in M × A, a bottom strongly connected component is reached in expected finite time. Lemma 11 says that a kind of converse holds for feasible policies. Proposition 12 follows from Lemmas 10 and 11. Proposition 13 follows from Propositions 7 and 12. To show Theorem 14, we use Proposition 12 and adapt the proof of Theorem 8.

The main negative result of the paper is that one cannot compute cinf:

▶ Theorem 15. It is undecidable to check if cinf < 3, even when a diagnoser exists.

Proof sketch. By a reduction from the undecidable problem whether a given probabilistic automaton accepts some word with probability > 1/2. The proof is somewhat complicated. In fact, in the full version of the paper (arXiv) we give two versions of the proof: a short incorrect one (with the correct main idea) and a long correct one. ◀

5 The Non-Hidden Case

Now we turn to positive results. In the rest of the paper we assume that the MC M is non-hidden, i.e., there exists a function →(·) : Σ → S such that M(a)(s, s′) > 0 implies s′ = →a. We extend →(·) to finite words so that →(ua) = →a. We write s −u→ to indicate that there is s′ with M(u)(s, s′) > 0.

▶ Example 16. Consider the following non-hidden MC and DFA:

[Figure: a non-hidden MC and a DFA (not reproduced here). B0 is the initial belief; the example further considers beliefs B1, B2 := Δ(B0, ⊥²) = {(→b, q0), (→a, f)}, and B3 := Δ(B0, ⊥²b) = {(→b, q0), (→b, f)}.]

The beliefs B0 and B1 are not confused: indeed, Δ(B1, b) = {(→b, q0)} is negatively deciding, and Δ(B1, a) = {(→a, f)} is positively deciding. The belief B2 is confused, as there is no i ∈ N for which Δ(B2, b^i) is deciding. Finally, B3 is very confused.

We will show that in the non-hidden case there always exists a diagnoser (Lemma 23). It follows that feasible policies need to decide almost surely and, by Proposition 13, that cinf is finite. We have seen in Proposition 9 that feasible policies do not allow confusion. In this section we construct policies that procrastinate so much that they avoid confusion just barely. We will see that such policies have an expected cost that comes arbitrarily close to cinf.

Language Equivalence

We characterize confusion by language equivalence in a certain DFA. Consider the belief NFA B. In the non-hidden case, if we disallow ⊥-transitions then B becomes a DFA B′. For B′ we define a set of accepting states by F_{B′} := {(s, q) | Pr_s(L_q) = 1}.

▶ Example 17. For the previous example, a part of the DFA B′ looks as follows:

[Figure: a fragment of the DFA B′ with states (→b, q0), (→a, q0), (→c, f), (→a, f), (→b, f) and transitions labelled a, b, c. States that are unreachable from (→a, q0) are not drawn.]

We associate with each (s, q) the language L_{s,q} ⊆ Σ∗ that B′ accepts starting from initial state (s, q). We call (s, q), (s′, q′) language equivalent, denoted by (s, q) ≈ (s′, q′), when L_{s,q} = L_{s′,q′}.
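Lemma 18 below shows that ≈ can be computed in polynomial time via minimization. As an illustration, here is a minimal Moore-style partition-refinement sketch over a total DFA given as a transition dictionary; the encoding and the toy automaton are ours.

```python
# Sketch: language equivalence on a DFA by partition refinement.
def lang_equiv_classes(states, alphabet, trans, accepting):
    """Return a dict mapping each state to a block id; two states are
    language equivalent iff they get the same block id. trans must be
    total: trans[(state, letter)] = successor state."""
    block = {s: int(s in accepting) for s in states}
    n_blocks = len(set(block.values()))
    while True:
        # A state's signature: its block plus the blocks of its successors.
        sig = {s: (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        block = {s: ids[sig[s]] for s in states}
        if len(set(block.values())) == n_blocks:  # refinement stabilized
            return block
        n_blocks = len(set(block.values()))

# Toy DFA: x and z accept the same language, y does not.
trans = {("x", "a"): "y", ("x", "b"): "x",
         ("y", "a"): "y", ("y", "b"): "x",
         ("z", "a"): "y", ("z", "b"): "x"}
print(lang_equiv_classes("xyz", "ab", trans, {"y"}))  # x and z share a block
```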
▶ Lemma 18. One can compute the relation ≈ in polynomial time.

Proof. For any (s, q) one can use standard MC algorithms to check in polynomial time if Pr_s(L_q) = 1 (using a graph search in the composition M × A, as in the proof of Lemma 5.3). Language equivalence in the DFA B′ can be computed in polynomial time by minimization. ◀

We call a belief B ⊆ S × Q settled when all (s, q) ∈ B are language equivalent.

▶ Lemma 19. A belief B ⊆ S × Q is confused if and only if there is a ∈ Σ such that Δ(B, a) is not settled.

It follows that one can check in polynomial time whether a given belief is confused. We generalize this fact in Lemma 22 below.

▶ Example 20. In Example 16 the belief B3 is not settled. Indeed, from the DFA in Example 17 we see that L_{→b,q0} = ∅ ≠ {b}∗ = L_{→b,f}. Since B3 = Δ(B2, b), by Lemma 19, the belief B2 is confused.

Procrastination

For a belief B ⊆ S × Q and k ∈ N, if Δ(B, ⊥^k) is confused then so is Δ(B, ⊥^{k+1}). We define:

  cras(B) := sup{ k ∈ N | Δ(B, ⊥^k) is not confused } ∈ N ∪ {−1, ∞}

We set cras(B) := −1 if B is confused. We may write cras(s, q) for cras({(s, q)}).

▶ Example 21. In Example 16 we have cras(B0) = cras(→a, q0) = 1 and cras(B1) = 0 and cras(B2) = cras(B3) = −1 and cras(→b, q0) = cras(→a, f) = ∞.

▶ Lemma 22. Given a belief B, one can compute cras(B) in polynomial time. Further, if cras(B) is finite then cras(B) < |S|² · |Q|².

Proof. Let k ∈ N. By Lemma 19, Δ(B, ⊥^k) is confused if and only if:

  ∃a. ∃(s, q), (t, r) ∈ Δ(B, ⊥^k) : s −a→, t −a→, (→a, δ(q, a)) ≉ (→a, δ(r, a))

This holds if and only if there is B2 ⊆ B with |B2| ≤ 2 such that:

  ∃a. ∃(s, q), (t, r) ∈ Δ(B2, ⊥^k) : s −a→, t −a→, (→a, δ(q, a)) ≉ (→a, δ(r, a))

Let G be the directed graph with nodes in S × Q × S × Q and edges

  ((s, q, t, r), (s′, q′, t′, r′)) ⟺ Δ({(s, q), (t, r)}, ⊥) ⊇ {(s′, q′), (t′, r′)}.

Also define the following set of nodes:

  U := {(s, q, t, r) | ∃a : s −a→, t −a→, (→a, δ(q, a)) ≉ (→a, δ(r, a))}

By Lemma 18 one can compute U in polynomial time. It follows from the argument above that Δ(B, ⊥^k) is confused if and only if there are (s, q), (t, r) ∈ B such that there is a length-k path in G from (s, q, t, r) to a node in U. Let k ≤ |S × Q × S × Q| be the length of the shortest such path, and set k := ∞ if no such path exists. Then k can be computed in polynomial time by a search of the graph G, and we have cras(B) = k − 1. ◀
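The proof of Lemma 22 is effectively a breadth-first search over pairs of product states. The following sketch renders it in code, reusing the hypothetical supp/delta encoding of the earlier belief sketch and taking the language-equivalence predicate as input; all encodings are ours, and a real implementation would precompute U as in the proof.

```python
# Sketch: computing cras(B) by BFS over pairs of product states.
def cras(B, supp, delta, equiv):
    def letters_from(s):   # letters a with s -a->
        return [a for a, es in supp.items() if any(s1 == s for (s1, _) in es)]
    def target(a):         # non-hidden: letter a determines the next state
        return next(s2 for (_, s2) in supp[a])
    def in_U(s, q, t, r):  # one more letter can split into inequivalent pairs
        return any(not equiv((target(a), delta[(q, a)]),
                             (target(a), delta[(r, a)]))
                   for a in set(letters_from(s)) & set(letters_from(t)))
    frontier = {(s, q, t, r) for (s, q) in B for (t, r) in B}
    seen, k = set(), 0
    while frontier:
        if any(in_U(*node) for node in frontier):
            return k - 1   # Delta(B, BOT^k) is confused, so cras(B) = k - 1
        seen |= frontier
        frontier = {(target(a), delta[(q, a)], target(b), delta[(r, b)])
                    for (s, q, t, r) in frontier
                    for a in letters_from(s) for b in letters_from(t)} - seen
        k += 1
    return float("inf")    # no confusing pair is ever reachable
```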
The Procrastination Policy

For any belief B and any observation prefix υ, the language equivalence classes represented in Δ(B, υ) depend only on υ and the language equivalence classes in B. Therefore, when tracking beliefs along observations, we may restrict B to a single representative of each equivalence class. We denote this operation by B↓. A belief B is settled if and only if |B↓| ≤ 1.

A procrastination policy ρpro(K) is parameterized with (a large) K ∈ N. Define (and precompute) k(s, q) := min{K, cras(s, q)} for all (s, q). We define ρpro(K) by the following monitor that implements it:

1. i := 0
2. while (si, qi) is not deciding:
   a. skip k(si, qi) observations, then observe a letter ai
   b. {(si+1, qi+1)} := Δ((si, qi), ⊥^{k(si,qi)} ai)↓
   c. i := i + 1
3. output yes/no decision

It follows from the definition of cras and Lemma 19 that Δ((si, qi), ⊥^{k(si,qi)} ai)↓ is indeed a singleton for all i.
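The monitor loop above can be rendered directly in code. In the sketch below the pieces are parameters: k is the precomputed min{K, cras(s, q)}, step returns the unique representative of Δ((s, q), ⊥^{k(s,q)} a)↓, and the stubs at the end are purely illustrative stand-ins for a real product, not the paper's construction.

```python
# Sketch: the procrastination monitor.
def procrastinate(stream, sq0, k, step, deciding, verdict):
    it = iter(stream)
    (s, q), cost = sq0, 0
    while not deciding(s, q):
        for _ in range(k(s, q)):  # procrastinate: skip k(s, q) letters
            next(it)
        a = next(it)              # then observe one letter
        cost += 1
        s, q = step(s, q, a)      # collapsed singleton belief update
    return verdict(s, q), cost    # the yes/no decision and the cost C_rho(w)

# Toy stubs (hypothetical):
k = lambda s, q: 1
step = lambda s, q, a: (a, "f" if a == "c" else q)
deciding = lambda s, q: q == "f" or s == "b"
verdict = lambda s, q: "yes" if q == "f" else "no"
print(procrastinate("aacc", ("a", "q0"), k, step, deciding, verdict))  # ('yes', 2)
```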
We have:

▶ Lemma 23. For all K ∈ N the procrastination policy ρpro(K) is a diagnoser.

Proof. For a non-hidden MC M and a DFA A, there is at most one successor for (s, q) on letter a in the belief NFA B, for all s, q, a. Then, by Lemma 19, singleton beliefs are not confused, and in particular the initial belief B0 is not confused. By Lemma 4.4, ε is not confused, which means that Pr({ u | u deciding }) = Pr(ε) = 1. Since almost surely a deciding word u is produced and since Δ(B0, u) ⊆ Δ(B0, υ) whenever u ∼ υ, it follows that eventually an observation prefix υ is produced such that Δ(B0, υ) contains a deciding pair (s, q). But, as remarked above, Δ(B0, υ) is settled, so it is deciding. ◀

The Procrastination MC Mpro(K)

The policy ρpro(K) produces a (random, almost surely finite) word a1 a2 ··· an with n = C_{ρpro(K)}. Indeed, the observations that ρpro(K) makes can be described by an MC. Recall that we have previously defined a composition MC M × A = (S × Q, Σ, M′, (s0, q0)). Now define an MC Mpro(K) := (S × Q, Σ ∪ {$}, Mpro(K), (s0, q0)) where $ ∉ Σ is a fresh letter and the transitions are as follows: when (s, q) is deciding then Mpro(K)($)((s, q), (s, q)) := 1, and when (s, q) is not deciding then

  Mpro(K)(a)((s, q), (→a, q′)) := (M′(⊥)^{k(s,q)} M′(a))((s, q), (→a, q′)),

where the matrix M′(⊥) := ∑_a M′(a) is raised to the power k(s, q). The MC Mpro(K) may not be non-hidden, but could be made non-hidden by (i) collapsing all language equivalent (s, q1), (s, q2) in the natural way, and (ii) redirecting all $-labelled transitions to a new state →$ that has a self-loop. In the understanding that $$$··· indicates 'decision made', the probability distribution defined by the MC Mpro(K) coincides with the probability distribution on sequences of non-⊥ observations made by ρpro(K).

▶ Example 24. For Example 16 the MC Mpro(K) for K ≥ 1 is as follows:

[Figure: the two-state MC Mpro(K); the lower number in each state indicates the cras number.]

The left state is negatively deciding, and the right state is positively deciding. The policy ρpro(K) skips the first observation and then observes either b or a, each with probability 1/2, each leading to a deciding belief.

Maximal Procrastination is Optimal

The following lemma states, loosely speaking, that when a belief {(s, q)} with cras(s, q) = ∞ is reached and K is large, then a single further observation is expected to suffice for a decision.

▶ Lemma 25. Let c(K, s, q) denote the expected cost of decision under ρpro(K) starting in (s, q). For each ε > 0 there exists K ∈ N such that for all (s, q) with cras(s, q) = ∞ we have c(K, s, q) ≤ 1 + ε.

Proof sketch. The proof is a quantitative version of the proof of Lemma 23. The singleton belief {(s, q)} is not confused. Thus, if K is large then with high probability the belief B := Δ({(s, q)}, ⊥^K a) (for the observed next letter a) contains a deciding pair (s′, q′). But if cras(s, q) = ∞ then, by Lemma 19, B is settled, so if B contains a deciding pair then B is deciding. ◀

▶ Example 26. Consider the following variant of the previous example:

[Figure: a non-hidden MC over {a, b, c}, where →a moves to →b, →a, →c with probability 1/3 each while →b and →c have self-loops, and a DFA where a and b loop on q0 and c leads to the accepting sink f. The resulting MC Mpro(K) for K ≥ 0 has two states.]

The left state is negatively deciding, and the right state is positively deciding. We have c(K, →b, q0) = c(K, →c, f) = 0 and c(K, →a, q0) = 1/(1 − (1/3)^{K+1}).

Now we can prove the main positive result of the paper:

▶ Theorem 27. For any feasible policy ρ there is K ∈ N such that Ex(C_{ρpro(K)}) ≤ Ex(C_ρ).

Proof sketch. Let ρ be a feasible policy. We choose K > |S|² · |Q|², so, by Lemma 22, ρpro(K) coincides with ρpro(∞) until the time, say n∞, when ρpro(K) encounters a pair (s, q) with cras(s, q) = ∞. (The time n∞ may, with positive probability, never come.) Let us compare ρpro(K) with ρ up to time n∞. For n ∈ {0, ..., n∞}, define υpro(n) and υρ(n) as the observation prefixes obtained by ρpro and ρ, respectively, after n steps. Write ℓpro(n) and ℓρ(n) for the number of non-⊥ observations in υpro(n) and υρ(n), respectively. For beliefs B, B′ we write B ⊑ B′ when for all (s, q) ∈ B there is (s′, q′) ∈ B′ with (s, q) ≈ (s′, q′). One can show by induction that we have for all n ∈ {0, ..., n∞}:

  ℓpro(n) ≤ ℓρ(n) and Δ(B0, υpro(n)) ⊑ Δ(B0, υρ(n)), or ℓpro(n) < ℓρ(n).

If time n∞ does not come then the inequality ℓpro(n) ≤ ℓρ(n) from above suffices. Similarly, if at time n∞ the pair (s, q) is deciding, we are also done. If after time n∞ the procrastination policy ρpro(K) observes at least one more letter then ρ also observes at least one more letter. By Lemma 25, one can choose K large so that for ρpro(K) one additional observation probably suffices. If ρ almost surely observes only one letter after n∞ then ρpro(K) also needs only one more observation, since it has observed at time n∞. ◀

It follows that, in order to compute cinf, it suffices to analyze Ex(C_{ρpro(K)}) for large K. This leads to the following theorem:

▶ Theorem 28. Given a non-hidden MC M and a DFA A, one can compute cinf in polynomial time.

Proof. For each (s, q) define c(K, s, q) as in Lemma 25, and define c(s, q) := lim_{K→∞} c(K, s, q). By Lemma 25, for each non-deciding (s, q) with cras(s, q) = ∞ we have c(s, q) = 1. Hence the c(s, q) satisfy the following system of linear equations, where some coefficients come from the procrastination MC Mpro(∞):

  c(s, q) = 0                       if (s, q) is deciding
  c(s, q) = 1                       if (s, q) is not deciding and cras(s, q) = ∞
  c(s, q) = 1 + ∑_a ∑_{q′} Mpro(∞)(a)((s, q), (→a, q′)) · c(→a, q′)   if cras(s, q) < ∞

By solving the system one can compute c(s0, q0) in polynomial time. We have cinf = c(s0, q0). Hence one can compute cinf in polynomial time. ◀
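For illustration, the linear system of Theorem 28 can be set up and solved numerically. Below is a sketch with a hypothetical three-state classification and made-up transition data; a real implementation would solve the system exactly over the rationals.

```python
# Sketch: solving Theorem 28's equations for c(s) with numpy. kind[s] says
# which case applies; P[s][t] are M_pro(infinity) probabilities out of the
# states with finite cras. All names and numbers here are hypothetical.
import numpy as np

states = ["dec", "inf", "fin"]          # deciding / cras = oo / cras < oo
kind = {"dec": "deciding", "inf": "inf", "fin": "fin"}
P = {"fin": {"dec": 0.5, "inf": 0.5}}   # outgoing rows for "fin" states

n = len(states)
A, b = np.eye(n), np.zeros(n)
for i, s in enumerate(states):
    if kind[s] == "deciding":
        b[i] = 0.0                       # c(s) = 0
    elif kind[s] == "inf":
        b[i] = 1.0                       # c(s) = 1
    else:                                # c(s) = 1 + sum_t P[s][t] * c(t)
        b[i] = 1.0
        for t, p in P[s].items():
            A[i, states.index(t)] -= p

c = np.linalg.solve(A, b)
print(dict(zip(states, c)))              # cinf = c at the initial state
```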
6 Empirical Evaluation of the Expected Optimal Cost

We have shown that maximal procrastination is optimal in the non-hidden case (Theorem 27). However, we have not shown how much better the optimal policy is than the see-all baseline. It appears difficult to answer this question analytically, so we address it empirically. We implemented our algorithms in a fork of the Facebook Infer static analyzer [8], and applied them to 11 open-source projects, totaling 80 thousand Java methods. We found that in > 90% of cases the maximally procrastinating monitor is trivial and thus the optimal cost is 0, because Infer decides statically if the property is violated. In the remaining cases, we found that the optimal cost is roughly half of the see-all cost, but the variance is high.

Design. Our setting requires a DFA and an MC representing, respectively, a program property and a program. For this empirical estimation of the expected optimal cost, the DFA is fixed, the MC shape is the symbolic flowgraph of a real program, and the MC probabilities are sampled from Dirichlet distributions. The DFA represents the following property: 'there are no two calls to next without an intervening call to hasNext'. To understand how the MC shape is extracted from programs, some background is needed. Infer [8, 9] is a static analyzer that, for each method, infers several preconditions and, attached to each precondition, a symbolic path. For a simple example, consider a method whose body is 'if (b) x.next(); if (!b) x.next()'. Infer would generate two preconditions for it, b and ¬b. In each of the two attached symbolic paths, we can see that next is not called twice, which we would not notice with a control flowgraph. The symbolic paths are inter-procedural. If a method f calls a method g, then the path of f will link to a path of g and, moreover, it will pick one of the paths of g that corresponds to what is currently known at the call site. For example, if g(b) is called from a state in which ¬b holds, then Infer will select a path of g compatible with the condition ¬b. The symbolic paths are finite because abstraction is applied, including across mutually recursive calls. But, still, multiple vertices of the symbolic path correspond to the same vertex of the control flowgraph. For example, Infer may go around a for-loop five times before noticing the invariant. By coalescing those vertices of the symbolic path that correspond to the same vertex of the control flowgraph we obtain an SFG (symbolic flowgraph). We use such SFGs as the skeleton of MCs. Intuitively, one can think of SFGs as inter-procedural control flowgraphs restricted based on semantic information. Vertices correspond to locations in the program text, and transitions correspond to method calls or returns. Transition probabilities should then be interpreted as a form of static branch prediction. One could learn these probabilities by observing many runs of the program on typical input data, for example by using the Baum–Welch algorithm [17]. Instead, we opt to show that the improvement in expected observation cost is robust over a wide range of possible transition probabilities, which we do by drawing several samples from Dirichlet distributions, as sketched below. Besides, recall that the (optimal) procrastination policy does not depend on transition probabilities.

Once we have a DFA and an MC we compute their product. In some cases, it is clear that the product is empty or universal. These are the cases in which we can give the verdict right away, because no observation is necessary. We then focus on the non-trivial cases. For non-trivial MC × DFA products, we compute the expected cost of the light see-all policy Ex(C◦), which observes all letters until a decision is made and then stops. We can do so by using standard algorithms [2, Chapter 10.5]. Then, we compute Mpro, which we use to compute the expected observation cost cinf of the procrastination policy (Theorem 28). Recall that in order to compute Mpro, one needs to compute the cras function, and also to find language equivalence classes. Thus, computing Mpro entails computing all the information necessary for implementing a procrastinating monitor.

Methodology. We selected 11 Java projects among those that are most forked on GitHub. We ran Infer on each of these projects. From the inferred specifications, we built SFGs and monitors that employ light see-all policies and maximal procrastination policies. From these monitors, we computed the respective expected costs, solving the linear systems using Gurobi [12]. Our implementation is in a fork of Infer, on GitHub.
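The Dirichlet sampling step mentioned under Design is simple to sketch: for each SFG vertex, the outgoing-transition probabilities are drawn from a symmetric Dirichlet distribution. The tiny SFG below is a hypothetical stand-in, not taken from the evaluated projects.

```python
# Sketch: sampling MC transition probabilities for an SFG skeleton.
import numpy as np

rng = np.random.default_rng(0)
out_edges = {0: [1, 2], 1: [1, 2], 2: []}  # vertex -> successor vertices

def sample_probs(out_edges, alpha=1.0):
    """One sampled MC: for each vertex with successors, draw the outgoing
    distribution from Dirichlet(alpha, ..., alpha)."""
    return {v: dict(zip(succs, rng.dirichlet([alpha] * len(succs))))
            for v, succs in out_edges.items() if succs}

print(sample_probs(out_edges))  # repeat to test robustness of the cost ratio
```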
Results. The results are given in Table 1. We first note that the number of monitors is much smaller than the number of methods, by a factor of 10 or 100. This is because in most cases we are able to determine the answer statically, by analyzing the symbolic paths produced by Infer. The large factor should not be too surprising: we are considering a fixed property about iterators, not all Java methods use iterators, and, when they do, it is usually easy to tell that they do so correctly. Still, each project has a few hundred monitors, which handle the cases that are not so obvious. We note that cinf / Ex(C◦) ≈ 0.5. The table supports this by presenting the median and the geometric average, which are close to each other; the arithmetic average is also close. There is, however, quite a bit of variation from monitor to monitor, as shown in Figure 1. We conclude that selective monitoring has the potential to significantly reduce the overhead of runtime monitoring.

7 Future Work

In this paper we required policies to be feasible, which means that our selective monitors are as precise as non-selective monitors. One may relax this and study the tradeoff between efficiency (skipping even more observations) and precision (probability of making a decision). Further, one could replace the diagnosability notion of this paper by other notions from the literature; one could investigate how to compute cinf for other classes of MCs, such as acyclic MCs; one could study the sensitivity of cinf to changes in transition probabilities; and one could identify classes of MCs for which selective monitoring helps and classes of MCs for which selective monitoring does not help. A nontrivial extension to the formal model would be to include some notion of data, which is pervasive in practical specification languages used in runtime verification [13]. This would entail replacing the DFA with a more expressive device, such as a nominal automaton [7], a symbolic automaton [10], or a logic with data (e.g., [11]). Alternatively, one could side-step the problem by using the slicing idea [18], which separates the concern of handling data at the expense of a mild loss of expressive power. Finally, the monitors we computed could be used in a runtime verifier, or even in session type monitoring where the setting is similar [6].

References

[1] Matthew Arnold, Martin T. Vechev, and Eran Yahav. QVM: an efficient runtime for detecting defects in deployed systems. In OOPSLA, 2008.
[2] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, 2008.
[3] Ezio Bartocci, Radu Grosu, Atul Karmarkar, Scott A. Smolka, Scott D. Stoller, Erez Zadok, and Justin Seyster. Adaptive runtime verification. In RV, 2012.
[4] N. Bertrand, S. Haddad, and E. Lefaucheux. Foundation of diagnosis and predictability in probabilistic systems. In Proceedings of FSTTCS, volume 29 of LIPIcs, pages 417–429, 2014.
[5] N. Bertrand, S. Haddad, and E. Lefaucheux. Accurate approximate diagnosability of stochastic systems. In Proceedings of LATA, pages 549–561. Springer, 2016.
[6] Monitoring networks through multiparty session types. TCS, 2017.
[7] LMCS, 2014.
[8] In NASA Formal Methods Symposium, 2015.
[9] C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. JACM, 2011.
[10] Loris D'Antoni and Margus Veanes. The power of symbolic automata and transducers. In CAV, 2017.
[11] TOCL, 2009.
[12] Gurobi Optimization, Inc. Gurobi optimizer reference manual. http://www.gurobi.com, 2017.
[13] Klaus Havelund, Martin Leucker, Giles Reger, and Volker Stolz. A shared challenge in behavioural specification (Dagstuhl Seminar 17462). Dagstuhl Reports, 2018. doi:10.4230/DagRep.7.11.59.
[14] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A polynomial algorithm for testing diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 46(8):1318–1321, 2001.
[15] K. Kalajdzic, E. Bartocci, S. A. Smolka, S. D. Stoller, and R. Grosu. Runtime verification with particle filtering. In RV, 2013.
[16] J.-Y. Kao, N. Rampersad, and J. Shallit. On NFAs where all states are final, initial, or both. Theoretical Computer Science, 410(47):5010–5021, 2009.
[17] Brian G. Leroux. Maximum-likelihood estimation for hidden Markov models. Stochastic Processes and Their Applications, 1992.
[18] Grigore Roşu and Feng Chen. Semantics and algorithms for parametric monitoring. LMCS, 2012.
[19] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis. Diagnosability of discrete-event systems. IEEE Transactions on Automatic Control, 40(9):1555–1575, 1995.
[20] A. Prasad Sistla, Miloš Žefran, and Yao Feng. Monitorability of stochastic dynamical systems. In Proceedings of CAV, pages 720–736. Springer, 2011.
[21] Runtime verification with state estimation. In RV, 2011.
[22] D. Thorsley and D. Teneketzis. Diagnosability of stochastic discrete-event systems. IEEE Transactions on Automatic Control, 50(4):476–492, 2005.



Radu Grigore, Stefan Kiefer. Selective Monitoring. LIPIcs, Leibniz International Proceedings in Informatics, 2018, 20:1–20:16. DOI: 10.4230/LIPIcs.CONCUR.2018.20