Learning deterministic probabilistic automata from a model checking perspective

Machine Learning, May 2016

Probabilistic automata models play an important role in the formal design and analysis of hardware and software systems. In this application area, one is often interested in formal model-checking procedures for verifying critical system properties. Since adequate system models are often difficult to design manually, we are interested in learning models from observed system behaviors. To this end we adopt techniques for learning finite probabilistic automata, notably the Alergia algorithm. In this paper we show how to extend the basic algorithm to also learn automata models for both reactive and timed systems. A key question of our investigation is to what extent one can expect a learned model to be a good approximation for the kind of probabilistic properties one wants to verify by model checking. We establish theoretical convergence properties for the learning algorithm as well as for probability estimates of system properties expressed in linear time temporal logic and linear continuous stochastic logic. We empirically compare the learning algorithm with statistical model checking and demonstrate the feasibility of the approach for practical system verification.

Hua Mao, Yingke Chen, Manfred Jaeger, Thomas D. Nielsen, Kim G. Larsen, Brian Nielsen

Department of Computer Science, Aalborg University, 9220 Aalborg East, Denmark
College of Computer Science, Sichuan University, Chengdu 610065, China

Editors: Jeffrey Heinz, C. de la Higuera and Tim Oates.

Keywords: Probabilistic model checking; Probabilistic automata learning; Linear time temporal logic

1 Introduction

Grammatical inference (GI) (Higuera 2010), also known as grammar induction or grammar learning, is concerned with learning language specifications in the form of grammars or automata from data consisting of strings over some alphabet. Starting with Angluin's seminal work (Angluin 1987), methods have been developed for learning deterministic, nondeterministic and probabilistic grammars and automata. The learning techniques in GI have been applied in many areas, such as speech recognition, software development, pattern recognition, and computational biology. In this paper we adapt learning techniques from the GI area to learn models for model checking.

Model checking is a verification technique for determining whether a system model complies with a specification provided in a formal language (Baier and Katoen 2008). In the simplest case, system models are given by finite non-deterministic or probabilistic automata, but model-checking techniques have also been developed for more sophisticated system models, e.g., timed automata (Laroussinie et al. 1995; Bouyer et al. 2008, 2011). Powerful software tools available for model checking include UPPAAL (Behrmann et al. 2011) and PRISM (Kwiatkowska et al. 2011).

Traditionally, the models used in model checking are constructed manually, either in the development phase as system designs, or for existing hardware or software systems from known specifications and documentation. This procedure can be both time-consuming and error-prone, especially for systems lacking updated and detailed documentation, such as legacy software, third-party components, and black-box systems.
These difficulties are generally considered a hindrance for adopting otherwise powerful model checking techniques, and have led to an increased interest in methods for data-driven model learning (or specification mining) for formal verification (Ammons et al. 2002; Sen et al. 2004a; Mao et al. 2011, 2012).

In this paper we investigate methods for learning deterministic probabilistic finite automata (DPFA) from data consisting of previously observed system behaviors, i.e., sample executions. The probabilistic models considered in this paper include labeled Markov decision processes (MDPs) and continuous-time labeled Markov chains (CTMCs), where the former model class also covers labeled Markov chains (LMCs) as a special case. Labeled Markov decision processes can be used to model reactive systems, where input actions are chosen non-deterministically and the resulting output for a given input action is determined probabilistically. Nondeterminism can model the free and unpredictable choices of an environment or the concurrency between components in a system. MDPs, and by extension LMCs, are discrete-time models, where each transition takes a universal discrete time unit. CTMCs, on the other hand, are real-time models, where the time delays between transitions are determined probabilistically.

We show how methods for learning DPFA (Carrasco and Oncina 1994, 1999; Higuera 2010) can be adapted for learning the above three model classes and pose the results within a model checking context. We give consistency results for the learning algorithms, and we analyze both t (...truncated)
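At the core of Alergia-style DPFA learning is a state-merging loop over a frequency prefix tree built from the observed strings: two states are merged only if their empirical successor statistics are statistically compatible. The following is a minimal sketch of the Hoeffding-based compatibility test underlying that decision, in the spirit of Carrasco and Oncina (1994); the function name and parameters are assumptions introduced here for illustration and do not reproduce the paper's extensions to reactive and timed models.

from math import log, sqrt

def hoeffding_compatible(f1, n1, f2, n2, epsilon):
    # Hoeffding-bound test used in Alergia-style state merging: decide whether
    # the empirical frequencies f1/n1 and f2/n2 (counts of one symbol out of the
    # totals observed at two candidate states) are close enough to plausibly
    # come from the same underlying transition probability.
    if n1 == 0 or n2 == 0:
        return True  # no evidence against merging
    bound = (sqrt(1.0 / n1) + sqrt(1.0 / n2)) * sqrt(0.5 * log(2.0 / epsilon))
    return abs(f1 / n1 - f2 / n2) < bound

# Two states of the frequency prefix tree are merged only if this test succeeds
# for every symbol (and for termination), recursively for their successor states.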
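On the model-checking side, the following complementary sketch (again purely illustrative: the toy chain, state names, and function names are assumptions, not the paper's) shows a labeled Markov chain as a transition table together with a Monte Carlo estimate of a simple bounded-reachability property, roughly the kind of query one would pose both to a learned model and, in statistical model checking, directly to sampled system runs.

import random

# Illustrative labeled Markov chain (assumed toy example): each state has an
# output label and a probability distribution over successor states.
LABELS = {"s0": "init", "s1": "busy", "s2": "done"}
TRANSITIONS = {
    "s0": [("s1", 0.7), ("s2", 0.3)],
    "s1": [("s1", 0.5), ("s2", 0.5)],
    "s2": [("s2", 1.0)],
}

def sample_run(start, length):
    # Sample one run of the chain and return the sequence of emitted labels.
    state, run = start, [LABELS[start]]
    for _ in range(length):
        successors, probs = zip(*TRANSITIONS[state])
        state = random.choices(successors, weights=probs)[0]
        run.append(LABELS[state])
    return run

def estimate_bounded_eventually(label, bound, n_samples=10000):
    # Monte Carlo estimate of the probability that a run emits `label` within
    # `bound` steps, i.e. a bounded-reachability query on the chain.
    hits = sum(label in sample_run("s0", bound) for _ in range(n_samples))
    return hits / n_samples

print(estimate_bounded_eventually("done", bound=5))

In an MDP the successor distribution would additionally depend on a non-deterministically chosen input action, and in a CTMC each transition would carry a probabilistically determined delay; estimates of such properties on a learned model can then be compared against estimates obtained directly from system runs.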



Hua Mao, Yingke Chen, Manfred Jaeger, Thomas D. Nielsen, Kim G. Larsen, Brian Nielsen. Learning deterministic probabilistic automata from a model checking perspective. Machine Learning 105(2):255-299, 2016. DOI: 10.1007/s10994-016-5565-9