Learning and statistical model checking of system response times
Software Quality Journal (2019) 27:757–795
https://doi.org/10.1007/s11219-018-9432-8
Bernhard K. Aichernig · Priska Bauerstätter · Elisabeth Jöbstl · Severin Kann ·
Robert Korošec · Willibald Krenn · Cristinel Mateis · Rupert Schlick ·
Richard Schumi
Published online: 3 January 2019
© The Author(s) 2019
Abstract
Since computers have become increasingly powerful, users are less willing to accept
slow responses of systems. Hence, performance testing is important for interactive systems.
However, it is still challenging to test if a system provides acceptable performance or can
satisfy certain response-time limits, especially for different usage scenarios. On the one
hand, there are performance-testing techniques that require numerous costly tests of the
system. On the other hand, model-based performance analysis methods often rely on models of
doubtful quality. Hence, we propose a combined method to mitigate these issues. We learn
response-time distributions from test data in order to augment existing behavioral models
with timing aspects. Then, we perform statistical model checking with the resulting model
for a performance prediction. Finally, we test the accuracy of our prediction with hypothesis
testing on the real system. Our method is implemented with a property-based testing tool
with integrated statistical model checking algorithms. We demonstrate the feasibility of our
techniques in an industrial case study with a web-service application.
Keywords Statistical model checking · Property-based testing · Model-based testing ·
FsCheck · User profiles · Response time · Cost learning · Performance testing
1 Introduction
Performance testing is important, especially for critical systems. It is usually done with
sophisticated load techniques that are computationally expensive and even infeasible when
various user populations have to be analyzed. Alternatively, the performance may be analyzed by simulating a model of the system. Simulation allows faster analysis and requires
less computing resources, but the quality of the model is often questionable. We present a
simulation method based on statistical model checking (SMC) that enables a fast probability estimation with a model and also a verification of the resulting probabilities on the real
system.
SMC is a simulation method that can answer both quantitative and qualitative questions.
The questions are expressed as properties of a stochastic model which are checked by analyzing simulations of this model. Depending on the SMC algorithm, either a fixed number
of samples or a stopping criterion is needed.
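For the fixed-sample case, a minimal sketch may help. It assumes that the number of samples is chosen with the Chernoff-Hoeffding bound, a common choice in SMC that the text above does not prescribe. The toy model, its exponentially distributed response time, the 0.5 s threshold, and all function names are hypothetical.

```python
import math
import random


def required_samples(epsilon: float, delta: float) -> int:
    """Chernoff-Hoeffding bound: n >= ln(2/delta) / (2 * epsilon^2).

    With n samples, the estimate deviates from the true probability by
    more than epsilon with probability at most delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_probability(simulate, epsilon, delta, rng):
    """Monte Carlo estimate of the probability that a property holds.

    `simulate(rng)` runs one simulation of the stochastic model and
    returns True if the property held on that run.
    """
    n = required_samples(epsilon, delta)
    successes = sum(1 for _ in range(n) if simulate(rng))
    return successes / n, n


def toy_model(rng):
    """Hypothetical model: response time is exponential with mean 0.2 s;
    the property is 'response time below 0.5 s'."""
    return rng.expovariate(1.0 / 0.2) < 0.5


if __name__ == "__main__":
    p, n = estimate_probability(toy_model, epsilon=0.05, delta=0.01,
                                rng=random.Random(42))
    print(f"estimated probability {p:.3f} from {n} samples")
```

A stopping-criterion algorithm would instead decide after each sample whether enough evidence has been collected, which typically needs fewer samples when the true probability is far from the bound under test.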
We implement our method with the help of a property-based test-case generator that was
originally intended for functional testing. Property-based testing (PBT) is a random testing technique that tries to falsify a given property, which describes the expected behavior
of a function-under-test. In order to test such a property, a PBT tool generates inputs for
the function and checks if the expected behavior is observed. PBT tools were originally
designed for testing algebraic properties of functional programs, but nowadays, they also
support model-based testing.
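The core loop of PBT can be illustrated with a small sketch. It is a standard-library stand-in written for illustration and does not reproduce the API of FsCheck or any other PBT tool; the generator, the two properties, and all names are hypothetical.

```python
import random


def check_property(prop, gen, num_tests=100, seed=0):
    """Try to falsify `prop` on randomly generated inputs.

    Returns the first counterexample found, or None if all tests pass.
    """
    rng = random.Random(seed)
    for _ in range(num_tests):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def gen_int_list(rng):
    """Generator: a random list of up to 10 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def prop_reverse_twice(xs):
    """Algebraic property: reversing a list twice yields the original."""
    return list(reversed(list(reversed(xs)))) == xs


def prop_always_sorted(xs):
    """Deliberately false property: every list is already sorted."""
    return sorted(xs) == xs


if __name__ == "__main__":
    print(check_property(prop_reverse_twice, gen_int_list))
    print(check_property(prop_always_sorted, gen_int_list))
```

Real PBT tools add features that this sketch omits, such as shrinking a counterexample to a minimal failing input and generating sequences of commands for model-based testing.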
In previous work (Aichernig and Schumi 2017a, b), we have demonstrated how SMC
can be integrated into a PBT tool in order to evaluate properties of stochastic models as
well as stochastic implementations. Based on this previous work, we present a simulation
method for stochastic user profiles in order to answer questions about the expected response
time of a system-under-test (SUT). Figure 1 illustrates this process.
(1) First, we apply a PBT tool to run model-based testing (MBT) with a functional model
concurrently in several threads in order to obtain log-files that include the response
times of the tested web-service requests. Since the model serves as an oracle, we also
test for conformance violations in this phase. This functional aspect was discussed in
earlier work (Aichernig and Schumi 2016a); here, the focus is on timing.
(2) Next, we derive response-time distributions per type of service request via linear
regression, which was a suitable learning method for our logs. Since the response time
is influenced by the parallel activity on the server, the distributions are parametrized
by the number of active users.
(3) These cost distributions are added to the transitions in the functional model, resulting in so-called cost models. These models have the semantics of stochastic timed
automata (STA) (Ballarini et al. 2013). The name cost model emphasizes that our
method may be generalized to other types of cost indicators, e.g., energy consumption.
We also combine these models with user profiles, containing probabilities for transitions and input durations, in order to simulate realistic user behavior and the expected
response time.
(4) These combined models can be utilized for SMC, in order to evaluate response-time
properties, like “What is the probability that the response time of each user within a
user population is under a certain threshold?” or “Is this probability above or below a
specific limit?” We apply them in a Monte Carlo simulation in order to estimate the
probability of such properties.
(5) Additionally, we can check such properties directly on the SUT, e.g., to verify the results of
the model simulation. In principle, it is also possible to skip the model simulation and
(statistically) test response-time properties directly on the SUT. However, running a
realistic user population on the SUT is time-consuming and might not be feasible due
to very long waiting times. A simulation on the model is much faster, so properties that require a larger number of samples can also be checked, e.g., using Monte Carlo
simulation. We run the SUT only with a limited number of samples in order to check
whether the simulation results of the model are satisfied by the SUT. To this end, we test
the SUT with the sequential probability ratio test (Wald 1973), a form of hypothesis
testing, as this allows us to stop testing as soon as we have sufficient evidence.
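The sequential probability ratio test named in step (5) can be sketched as follows. This is a textbook formulation of Wald's test for a Bernoulli parameter, given under stated assumptions and not taken from the authors' implementation; the hypotheses, the error bounds alpha and beta, and the sampler standing in for a test run on the SUT are hypothetical.

```python
import math
import random


def sprt(sample, p0, p1, alpha, beta, max_samples=100_000):
    """Wald's sequential probability ratio test for a Bernoulli parameter.

    Tests H0: p >= p0 against H1: p <= p1, where p1 < p0.
    `sample()` performs one test run and returns True if the property held.
    alpha and beta bound the type I and type II error probabilities.
    Returns the accepted hypothesis and the number of samples used.
    """
    accept_h1 = math.log((1.0 - beta) / alpha)
    accept_h0 = math.log(beta / (1.0 - alpha))
    llr = 0.0  # log-likelihood ratio of H1 over H0
    for n in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= accept_h1:
            return "H1", n
        if llr <= accept_h0:
            return "H0", n
    return "undecided", max_samples


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for the SUT: the property holds in 95% of runs.
    decision, used = sprt(lambda: rng.random() < 0.95,
                          p0=0.85, p1=0.75, alpha=0.01, beta=0.01)
    print(decision, used)
```

The test stops as soon as the accumulated log-likelihood ratio crosses one of the two thresholds, which is why it usually needs far fewer runs on the SUT than a fixed-sample estimate of comparable confidence.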
Fig. 1 Overview of the steps for cost-model learning and response-time checking
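Step (2) of this overview, learning a response-time distribution that is parametrized by the number of active users, might look like the following sketch. It assumes a simple least-squares line for the mean and normally distributed residuals; the source states only that linear regression was used, so the normal-noise model, the log values, and all names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Also returns the residual standard deviation, used here as the
    spread of the learned distribution (an assumption of this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept, slope, sigma


def sample_response_time(model, active_users, rng):
    """Draw one response time for the given number of active users."""
    intercept, slope, sigma = model
    mean = intercept + slope * active_users
    return max(0.0, rng.gauss(mean, sigma))  # response times are non-negative


if __name__ == "__main__":
    # Hypothetical log entries for one request type:
    # (number of active users, response time in ms)
    log = [(1, 112.0), (2, 128.0), (3, 151.0), (4, 169.0), (5, 192.0)]
    users, times = zip(*log)
    model = fit_line(users, times)
    print(model)
    print(sample_response_time(model, active_users=10, rng=random.Random(0)))
```

In a cost model, one such distribution per request type would be attached to the corresponding transition and sampled during simulation.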
Related work A number of related approaches in the area of PBT are concerned with testing concurrent software. For example, Claessen et al. (2009) presented a testing method
that can find race conditions in Erlang with QuickCheck and a user-level scheduler called
PULSE. A similar approach was shown by Norell et al. (2013). They demonstrated an
automated way to test blocking operations, i.e., operations that have to wait until a certain condition is met. Another concurrent PBT approach by Hughes et al. (2016) showed
how PBT can be applied to test distributed file- (...truncated)