Understanding with Toy Surrogate Models in Machine Learning
Minds and Machines
(2024) 34:45
https://doi.org/10.1007/s11023-024-09700-1
Andrés Páez1
Received: 8 June 2023 / Accepted: 6 October 2024
© The Author(s) 2024
Abstract
In the natural and social sciences, it is common to use toy models—extremely
simple and highly idealized representations—to understand complex phenomena.
Some of the simple surrogate models used to understand opaque machine learning
(ML) models, such as rule lists and sparse decision trees, bear some resemblance
to scientific toy models. They allow non-experts to understand how an opaque ML
model works globally via a much simpler model that highlights the most relevant
features of the input space and their effect on the output. The obvious difference
is that the common target of a toy and a full-scale model in the sciences is some
phenomenon in the world, while the target of a surrogate model is another model.
This essential difference makes toy surrogate models (TSMs) a new object of study
for theories of understanding, one that is not easily accommodated under current
analyses. This paper provides an account of what it means to understand an opaque
ML model globally with the aid of such simple models.
Keywords Toy models · Surrogate models · Machine learning · Understanding ·
Idealization
1 Introduction
In the natural and social sciences, it is common to use extremely simple and highly
idealized models to understand complex phenomena. Unlike regular models, these
very simple models—often referred to as toy models—are not required to be linked
to the real world through structural similarity or resemblance relations. They are not
Andrés Páez
1
Department of Philosophy and Center for Research and Formation in Artificial Intelligence
(CinfonIA), Universidad de los Andes, Carrera 1 No. 18A-12 (G-533), Bogotá, DC
111711, Colombia
meant to be approximations of the target world system, and in some cases, they are
not even required to be representational. In semantic terms, they do not accurately
map onto their targets. Despite these limitations, they are still useful in understanding
theoretical concepts and possible configurations of the target system. Paradigmatic
examples of toy models include Boyle’s law and the Ising model in physics, the
Lotka–Volterra model in population ecology, and the Schelling model in the social
sciences (Weisberg, 2013).
In recent years, philosophers of science have become interested in toy models
(Grüne-Yanoff, 2009; Luczak, 2017; Reutlinger et al., 2018; Frigg & Nguyen, 2017;
Nguyen, 2020). The main purpose of this literature is to explore the nature of these
models and examine how they perform their epistemic function. Despite lacking the
regular descriptive and predictive features of full-scale scientific models, they often
offer an elementary understanding of a phenomenon. These authors differ in their
definitions of “toy model” and in their assessment of the importance of representation
in modelling generally, but they all agree that toy models play an important epistemic
role in scientific research, exploration, and pedagogy.
Prima facie, some of the proxy, interpretative, approximate, or surrogate models¹
used in explainable AI (XAI) to make sense of black box machine learning (ML)
systems play an analogous role to toy models in the sciences.² In both cases, the
models fulfill what Frigg and Nguyen (2020, p. 3), following Swoyer (1991), call
the surrogative reasoning condition for representation: models represent in a way
that allows scientists or users to make inferences about the models’ target systems;
they can generate claims about target systems by investigating models that represent
them. Although many surrogate models used by developers in ML are black boxes,³
the simplest of them—e.g., rule lists and sparse decision trees—allow non-experts to
understand how an opaque ML model works globally via a much simpler model that
highlights the most relevant features of the input space and their effect on the output. Toy surrogate models (TSMs), as I will call them, only work when the system’s
features can be interpreted semantically, that is, when they represent recognizable
elements of the user’s environment. It is well known that many ML systems use non-interpretable features that would impede the extraction of a TSM. The examples used
in this paper therefore assume that the features are human-interpretable. The ultimate
goal of TSMs is to provide the end users of an AI system with valuable understanding that will result in informed decisions and/or actionable changes. TSMs can be a
valuable instrument for complying, for example, with Article 13 of the GDPR (Regulation (EU) 2016/679), which requires the data controller to provide the data subject with
“meaningful information about the logic involved” whenever automated decision-making tools are used.
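To make the idea of a global surrogate concrete, the following is a minimal sketch, not a description of any particular XAI tool. It assumes a hypothetical opaque classifier (`black_box`) over three invented, human-interpretable features, and it extracts the simplest possible sparse decision tree, a single threshold rule, by querying the opaque model and searching for the rule that best reproduces its outputs. All names, coefficients, and data are illustrative assumptions.

```python
import random

FEATURES = ["income", "debt", "age"]  # hypothetical, human-interpretable features


def black_box(x):
    """Stand-in for an opaque model: we only query it, never inspect it."""
    income, debt, age = x
    score = 0.6 * income - 0.8 * debt + 0.1 * age + 0.05 * income * age
    return 1 if score > 0.2 else 0


def majority(labels):
    """Most frequent label in a list (0 on ties or for an empty list)."""
    return 1 if sum(labels) * 2 > len(labels) else 0


def fit_stump(X, y):
    """Search every feature and threshold for the one rule that best reproduces y."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] < t]
            right = [yi for x, yi in zip(X, y) if x[f] >= t]
            lo, hi = majority(left), majority(right)
            agree = sum(yi == lo for yi in left) + sum(yi == hi for yi in right)
            if best is None or agree > best[0]:
                best = (agree, f, t, lo, hi)
    agree, f, t, lo, hi = best
    return {"feature": f, "threshold": t, "below": lo, "at_or_above": hi,
            "fidelity": agree / len(X)}


rng = random.Random(0)
X = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]
y_model = [black_box(x) for x in X]  # labels come from the model, not from the world

stump = fit_stump(X, y_model)
print(f"IF {FEATURES[stump['feature']]} >= {stump['threshold']:.2f} "
      f"THEN {stump['at_or_above']} ELSE {stump['below']} "
      f"(agreement with the opaque model: {stump['fidelity']:.2f})")
```

The printed rule is the kind of object a non-expert can read directly. Whether a single rule agrees with the opaque model often enough to be informative depends entirely on the model being approximated.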
1 I will refer to these models as “surrogate models,” but some papers use the other terms to refer to models
that perform the same epistemic function.
2 In this paper, I will assume that the reader is familiar with the problem of opacity in machine learning and with the literature on interpretability and XAI. For an introduction to the topic and some of the
controversies involved, see Beisbart and Räz (2022), Humphreys (forthcoming), Krishnan (2020), and
Lipton (2018).
3 For example, Xu et al. (2018) build a surrogate model by compressing an existing DNN model to a shallow DNN model, but the latter is still a black box.
Despite having similar epistemic roles, the relation between TSMs and opaque
ML models is different from the relation between their counterparts in the sciences.
Toy models and complex models in the sciences share a common target: some social
or physical phenomenon that can be understood either in highly idealized and simple terms through the toy model, or in a more complex and detailed fashion—often
involving causation and lawlike generalizations—via the main model. In contrast,
the most common use of discriminative ML models is to perform a prediction or
classification task that is not necessarily causally grounded in the world or reflective
of lawlike relations between inputs and outputs. In other words, most ML models
do not have the same representational and epistemic function as the models used in
the natural and social sciences. They do not aim at uncovering complex real-world
causal or lawlike structures that are responsible for the properties of a phenomenon,
but rather at detecting useful correlations that optimize the predictive or classificatory
task at hand.⁴ Toy surrogate models in ML, in turn, focus on the statistical correlations in the main model, which they aim to approximate and present in simpler and
understandable terms. They are models of models, i.e., metamodels (Alaa & van der
Schaar, 2019).
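The difference in targets can be illustrated with a small sketch. It assumes three hypothetical functions that are not drawn from any real system: a noisy outcome in the world (`world`), an opaque predictor of that outcome (`black_box`), and a one-rule toy surrogate of the predictor (`surrogate`). The sketch measures the surrogate's agreement with the model, which the XAI literature usually calls fidelity, separately from its agreement with the world, which is accuracy in the ordinary sense.

```python
import random


def world(x, rng):
    """Hypothetical ground truth: a noisy outcome in the world."""
    return int(x[0] + 0.5 * x[1] + rng.gauss(0, 0.3) > 0.8)


def black_box(x):
    """Hypothetical opaque predictor, assumed to have been trained on the outcome."""
    return int(x[0] + 0.4 * x[1] + 0.2 * x[0] * x[1] > 0.8)


def surrogate(x):
    """Hypothetical toy surrogate: one readable rule meant to mimic black_box."""
    return int(x[0] >= 0.6)


def agreement(a, b):
    """Fraction of positions at which two label sequences coincide."""
    return sum(ai == bi for ai, bi in zip(a, b)) / len(a)


rng = random.Random(1)
X = [(rng.random(), rng.random()) for _ in range(1000)]
y_world = [world(x, rng) for x in X]
y_model = [black_box(x) for x in X]
y_toy = [surrogate(x) for x in X]

fidelity = agreement(y_toy, y_model)        # surrogate vs. its target, the model
accuracy = agreement(y_toy, y_world)        # surrogate vs. the phenomenon
model_accuracy = agreement(y_model, y_world)

print(f"surrogate fidelity to the model: {fidelity:.2f}")
print(f"surrogate accuracy on the world: {accuracy:.2f}")
print(f"model accuracy on the world:     {model_accuracy:.2f}")
```

On this way of setting things up, a surrogate is assessed first by how well it tracks the model. How well it happens to track the world is a separate quantity and is not what the surrogate is built to optimize.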
As I will be discussing models of (...truncated)