Fast test suite-driven model-based fault localisation with application to pinpointing defects in student programs
Geoff Birch
Bernd Fischer
Michael Poppleton
Communicated by Prof. Alfonso Pierantonio, Jasmin Blanchette, Francis Bordeleau, Nikolai Kosmatov, Gabi Taentzer, and Manuel Wimmer
Stellenbosch University, Matieland 7602, South Africa
University of Southampton, Southampton SO17 1BJ, UK
Fault localisation, i.e. the identification of program locations that cause errors, takes significant effort and cost. We describe a fast model-based fault localisation algorithm that, given a test suite, uses symbolic execution methods to fully automatically identify a small subset of program locations where genuine program repairs exist. Our algorithm iterates over failing test cases and collects locations where an assignment change can repair exhibited faulty behaviour. Our main contribution is an improved search through the test suite, reducing the effort for the symbolic execution of the models and leading to speed-ups of more than two orders of magnitude over the previously published implementation by Griesmayer et al. We implemented our algorithm for C programs, using the KLEE symbolic execution engine, and demonstrate its effectiveness on the Siemens TCAS variants. Its performance is in line with recent alternative model-based fault localisation techniques, but it narrows the location set further without rejecting any genuine repair locations where faults can be fixed by changing a single assignment. We also show how our tool can be used in an educational context to improve self-guided learning and accelerate assessment. We apply our algorithm to a large selection of actual student coursework submissions, providing precise localisation within a sub-second response time. We show this using small test suites, already provided in the coursework management system, and on expanded test suites, demonstrating the scalability of our approach. We also show that compliance with test suites does not reliably grade a class of “almost-correct” submissions, which our tool highlights as being close to the correct answer. Finally, we show an extension of our tool that carries our fast localisation results over to a selection of student submissions that contain two faults.
Keywords: Automated debugging; Model-based fault localisation; Symbolic execution; Automated assessment
1 Introduction
Fault localisation, i.e. the identification of program locations
that can cause erroneous state transitions that eventually lead
to observed program failures, is a critical component of the
debugging cycle. Since it puts a significant time [47,50]
and expertise burden [1,66] on programmers, a variety of
different automated fault localisation methods have been
proposed [12,14,23,25,26,34,55,56,58]. We describe a fast
model-based fault localisation algorithm that, given a test
suite, uses symbolic execution methods to fully
automatically identify a small subset of program locations within
which (under a single-fault assumption) a genuine program
repair exists. Our main contribution is an improved search
through the test suite that drastically reduces the effort for
the symbolic execution of the models.
Model-based fault localisation [54] (sometimes also
called model-based debugging [15]) is the application of
model-based diagnosis methods [18] to programs. It involves three
main steps: (i) the construction of a logical model from the
original program; (ii) the symbolic analysis of this model;
and (iii) mapping any faults found in the model back to
program locations. One popular approach to model-based fault
localisation is to transform the program so that a symbolic
program verification tool can be used for all three steps.
For example, Griesmayer et al. describe a method [23]
in which the model (in the form of a logical satisfiability
problem) is derived by running the CBMC model checker
over a transformed program and then analysed by means
of the model checker’s integrated SAT solver. The
transformation “inverts” the program’s specification (cf. Sect. 2,
producing failures where the original program would
complete and blocking paths where the original program would
fail), and replaces each original assignment by a conditional
assignment with either the original value or an unconstrained
symbolic value, depending on the value of a toggle variable.
The actual localisation can then be reduced to extracting the
possible values of the toggle variable from the satisfying
assignments that the SAT solver returns.
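
To make the toggle-based transformation concrete, the following is a minimal sketch in C under stated assumptions. The function instrumented_max, its seeded fault, the location numbering and the value range are all hypothetical and chosen for illustration only. In place of a model checker and SAT solver, the sketch simply enumerates toggle values and a small range of replacement values, so it shows the shape of the transformed program rather than the implementation of Griesmayer et al. or of our tool.

```c
#define NUM_LOCATIONS 3
#define NO_TOGGLE (-1)   /* selects no location: the program runs unmodified */
#define VALUE_MIN (-8)   /* assumed small search range, standing in for an   */
#define VALUE_MAX 8      /* unconstrained symbolic value                     */

/* Hypothetical faulty program, intended to return max(a, b).
 * Every assignment is made conditional: if `toggle` selects its
 * location, the assigned value is replaced by the free value `nondet`. */
static int instrumented_max(int a, int b, int toggle, int nondet) {
    int flag, big;
    flag = (toggle == 0) ? nondet : (a > b);     /* location 0 */
    big  = (toggle == 1) ? nondet : a;           /* location 1 */
    if (!flag) {
        /* location 2, seeded fault: the original value should be b */
        big = (toggle == 2) ? nondet : b - 1;
    }
    return big;
}

/* Returns a bitmask with bit k set if some replacement value at
 * location k makes the test (a, b) yield `expected`.
 * A model checker would reach the same answer by asserting
 * result != expected (the inverted specification) and reading the
 * value of `toggle` out of each counterexample; here the search is
 * done by brute-force enumeration instead. */
static unsigned repair_candidates(int a, int b, int expected) {
    unsigned mask = 0;
    for (int toggle = 0; toggle < NUM_LOCATIONS; toggle++) {
        for (int v = VALUE_MIN; v <= VALUE_MAX; v++) {
            if (instrumented_max(a, b, toggle, v) == expected) {
                mask |= 1u << toggle;   /* location `toggle` can repair this test */
                break;
            }
        }
    }
    return mask;
}
```

For the failing test a = 1, b = 3 with expected result 3, the unmodified program returns 2, and repair_candidates(1, 3, 3) sets only the bit for location 2: no replacement value for flag or for the first assignment to big changes the value that location 2 finally writes. With a weaker expectation (for instance, only requiring a positive result) more locations would qualify, which is the loss of precision discussed next.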
However, the technique initially described by Griesmayer
et al. requires detailed specifications to achieve acceptable
precision—the weaker the specification, the more program
locations are flagged as potential faults. Such detailed
specifications rarely exist in practice. What do commonly exist,
though, are extensive unit test suites, in particular in the
(...truncated)