Learning and statistical model checking of system response times

Software Quality Journal (2019) 27:757–795
https://doi.org/10.1007/s11219-018-9432-8

Bernhard K. Aichernig¹ · Priska Bauerstätter² · Elisabeth Jöbstl³ · Severin Kann³ · Robert Korošec³ · Willibald Krenn² · Cristinel Mateis² · Rupert Schlick² · Richard Schumi¹

Published online: 3 January 2019
© The Author(s) 2019

Abstract
As computers have become increasingly powerful, users are less willing to accept slow system responses. Hence, performance testing is important for interactive systems. However, it is still challenging to test whether a system provides acceptable performance or can satisfy certain response-time limits, especially for different usage scenarios. On the one hand, there are performance-testing techniques that require numerous costly tests of the system. On the other hand, model-based performance analysis methods suffer from doubtful model quality. Hence, we propose a combined method to mitigate these issues. We learn response-time distributions from test data in order to augment existing behavioral models with timing aspects. Then, we perform statistical model checking with the resulting model for a performance prediction. Finally, we test the accuracy of our prediction with hypothesis testing on the real system. Our method is implemented with a property-based testing tool with integrated statistical model checking algorithms. We demonstrate the feasibility of our techniques in an industrial case study with a web-service application.

Keywords Statistical model checking · Property-based testing · Model-based testing · FsCheck · User profiles · Response time · Cost learning · Performance testing

1 Introduction

Performance testing is important, especially for critical systems. It is usually done with sophisticated load-testing techniques that are computationally expensive and even infeasible when various user populations are to be analyzed.
Alternatively, the performance may be analyzed by simulating a model of the system. Simulation allows faster analysis and requires fewer computing resources, but the quality of the model is often questionable. We present a simulation method based on statistical model checking (SMC) that enables fast probability estimation with a model as well as verification of the resulting probabilities on the real system.

Richard Schumi
Extended author information available on the last page of the article.

SMC is a simulation method that can answer both quantitative and qualitative questions. The questions are expressed as properties of a stochastic model, which are checked by analyzing simulations of this model. Depending on the SMC algorithm, either a fixed number of samples or a stopping criterion is needed.

We implement our method with the help of a property-based test-case generator that was originally intended for functional testing. Property-based testing (PBT) is a random testing technique that tries to falsify a given property, which describes the expected behavior of a function-under-test. In order to test such a property, a PBT tool generates inputs for the function and checks whether the expected behavior is observed. PBT tools were originally designed for testing algebraic properties of functional programs, but nowadays they also support model-based testing.

In previous work (Aichernig and Schumi 2017a, b), we have demonstrated how SMC can be integrated into a PBT tool in order to evaluate properties of stochastic models as well as stochastic implementations. Based on this previous work, we present a simulation method for stochastic user profiles in order to answer questions about the expected response time of a system-under-test (SUT). Figure 1 illustrates this process.
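The distinction between a fixed number of samples and a stopping criterion can be made concrete with a small sketch. It is an illustration under stated assumptions, not the implementation used in this work: the fixed sample count is taken from the Chernoff-Hoeffding bound, a commonly used choice that the text above does not name, and the stopping criterion is Wald's sequential probability ratio test. All function and parameter names (`chernoff_sample_count`, `monte_carlo_estimate`, `sprt`, `sample`) are hypothetical.

```python
import math


def chernoff_sample_count(epsilon: float, delta: float) -> int:
    """Samples needed so that the estimate deviates from the true probability
    by less than epsilon with probability at least 1 - delta
    (Chernoff-Hoeffding bound; an assumed choice for this sketch)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def monte_carlo_estimate(sample, n: int) -> float:
    """Fixed-sample variant: fraction of n simulations satisfying the property.

    `sample` is any zero-argument callable that runs one simulation and
    returns True if the property held.
    """
    return sum(1 for _ in range(n) if sample()) / n


def sprt(sample, p0: float, p1: float, alpha: float, beta: float,
         max_samples: int = 100_000):
    """Sequential variant: Wald's test of H0 (p >= p0) against H1 (p <= p1),
    with p1 < p0, type I error alpha and type II error beta.

    Returns a pair (decision, samples_used), where decision is
    "H0", "H1", or "undecided" if max_samples was exhausted.
    """
    accept_h1 = math.log((1.0 - beta) / alpha)  # upper stopping boundary
    accept_h0 = math.log(beta / (1.0 - alpha))  # lower stopping boundary
    llr = 0.0  # log-likelihood ratio of H1 versus H0
    for used in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= accept_h1:
            return "H1", used
        if llr <= accept_h0:
            return "H0", used
    return "undecided", max_samples
```

The fixed-sample estimator needs its full sample budget regardless of the outcome, whereas the sequential test stops as soon as the accumulated evidence crosses one of its two boundaries. In the process described next, the fixed-sample variant corresponds to the Monte Carlo simulation of the model, and the sequential variant to the hypothesis test on the SUT.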
(1) First, we apply a PBT tool to run model-based testing (MBT) with a functional model concurrently in several threads in order to obtain log files that include the response times of the tested web-service requests. Since the model serves as an oracle, we also test for conformance violations in this phase. This functional aspect was discussed in earlier work (Aichernig and Schumi 2016a); here the focus is on timing.

(2) Next, we derive response-time distributions per type of service request via linear regression, which was a suitable learning method for our logs. Since the response time is influenced by the parallel activity on the server, the distributions are parametrized by the number of active users.

(3) These cost distributions are added to the transitions in the functional model, resulting in so-called cost models. These models have the semantics of stochastic timed automata (STA) (Ballarini et al. 2013). The name cost model is meant to emphasize that our method may be generalized to other types of cost indicators, e.g., energy consumption. We also combine these models with user profiles, containing probabilities for transitions and input durations, in order to simulate realistic user behavior and the expected response time.

(4) These combined models can be utilized for SMC in order to evaluate response-time properties, like “What is the probability that the response time of each user within a user population is under a certain threshold?” or “Is this probability above or below a specific limit?”. We use them in a Monte Carlo simulation in order to estimate the probability of such properties.

(5) Additionally, we can check such properties directly on the SUT, e.g., to verify the results of the model simulation.

In principle, it is also possible to skip the model simulation and (statistically) test response-time properties directly on the SUT.
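Steps (2) to (4) can be sketched in a few lines. This is a minimal illustration, not the tooling described in this article. The log entries, the request types `Login` and `Search`, and the user profile are invented for the example. The sketch further assumes normally distributed response times whose mean grows linearly with the number of active users, and it simplifies the simulation by treating all users of the population as active at once and by omitting input durations.

```python
import math
import random

# Hypothetical log entries: (request type, active users, response time in ms).
LOG = [
    ("Login", 1, 110.0), ("Login", 5, 150.0),
    ("Login", 10, 205.0), ("Login", 20, 310.0),
    ("Search", 1, 60.0), ("Search", 5, 80.0),
    ("Search", 10, 95.0), ("Search", 20, 145.0),
]

# Hypothetical user profile: request -> [(next request, probability)].
# None as the next request ends the session.
PROFILE = {
    "Login": [("Search", 1.0)],
    "Search": [("Search", 0.6), (None, 0.4)],
}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, sigma), where sigma is the residual
    standard deviation used as the spread of the cost distribution.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / max(len(xs) - 2, 1))
    return intercept, slope, sigma


def fit_cost_model(log):
    """Step (2): one regression per request type, keyed by request name."""
    model = {}
    for request in {entry[0] for entry in log}:
        xs = [users for name, users, _ in log if name == request]
        ys = [time for name, _, time in log if name == request]
        model[request] = fit_line(xs, ys)
    return model


def simulate_population(model, users, threshold_ms, rng):
    """Steps (3) and (4): one sample of the property 'every response of
    every user in the population stays under the threshold'."""
    for _ in range(users):
        request = "Login"
        while request is not None:
            intercept, slope, sigma = model[request]
            # Sampled cost of this transition; negative draws are clipped to 0.
            response = max(0.0, rng.gauss(intercept + slope * users, sigma))
            if response >= threshold_ms:
                return False
            targets, weights = zip(*PROFILE[request])
            request = rng.choices(targets, weights=weights)[0]
    return True


def estimate(model, users, threshold_ms, samples, seed=0):
    """Monte Carlo estimate of the probability that the property holds."""
    rng = random.Random(seed)
    hits = sum(simulate_population(model, users, threshold_ms, rng)
               for _ in range(samples))
    return hits / samples
```

A call such as `estimate(fit_cost_model(LOG), users=10, threshold_ms=400.0, samples=1000)` would then approximate the probability for a population of ten users. Replacing `simulate_population` with a function that drives real requests against the SUT would yield the direct statistical test mentioned above.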
However, running a realistic user population on the SUT is time-consuming and might not be feasible due to very long waiting times. A simulation on the model is much faster. Therefore, properties that require a larger number of samples can also be checked, e.g., using Monte Carlo simulation. We run the SUT only with a limited number of samples in order to check whether the simulation results of the model are satisfied by the SUT. For this purpose, we test the SUT with the sequential probability ratio test (Wald 1973), a form of hypothesis testing, as this allows us to stop testing as soon as we have sufficient evidence.

Fig. 1 Overview of the steps for cost-model learning and response-time checking

Related work

A number of related approaches in the area of PBT are concerned with testing concurrent software. For example, Claessen et al. (2009) presented a testing method that can find race conditions in Erlang with QuickCheck and a user-level scheduler called PULSE. A similar approach was shown by Norell et al. (2013). They demonstrated an automated way to test blocking operations, i.e., operations that have to wait until a certain condition is met. Another concurrent PBT approach by Hughes et al. (2016) showed how PBT can be applied to test distributed file- (...truncated)