A Decision-Theoretic Approach to Model Choice
Annals of Data Science
https://doi.org/10.1007/s40745-025-00589-w
ORIGINAL ARTICLE
Markku Karhunen
Received: 21 January 2024 / Revised: 2 January 2025 / Accepted: 17 January 2025
© The Author(s) 2025
Abstract
Model choice algorithms are usually compared based on their accuracy, i.e. ability
to find true models. However, conservative algorithms (such as BIC minimisation)
are accurate when no true effects exist, while more liberal algorithms (such as Lasso)
are accurate when there are plenty of true effects. There is ambiguity, then, regarding the correct algorithm. The purpose of this paper is to show how expected utility
maximisation and Monte Carlo simulations can be used to compare model choice
algorithms. Two loss functions are derived from the expected utility function of the
researcher. Both loss functions turn out to be linear combinations of specificity and
one or two kinds of sensitivity which are discussed in this paper. Subsequently, this
paper experiments with four parametrisations of these loss functions, and then uses
these parametrised versions to compare nine algorithms within the contexts of both
logistic and Gaussian regression. The results demonstrate that researchers who avoid
false positives should either use BIC or BICc for model choice or report nothing at
all. AIC does not seem to be the optimal method for the range of parameters covered
in this study.
Keywords Model choice · Gaussian · Logistic · Expected utility · Loss function
1 Introduction
Model choice can have a number of meanings in statistics. For example, model choice
could mean a choice between multiplicative and additive models. However, the term
usually refers to the choice of covariates within a regression model. The information
criteria established by Akaike [1] and Schwarz [2] are the classic tools in this domain.
Another prominent method is the Lasso [3], which includes all covariates within the
model and attempts to force some coefficients to zero by using a penalty function,
thereby producing a parsimonious model.
Author affiliation: Markku Karhunen, Built Environment Solutions Unit, Finnish Environment Institute (Syke), Jyväskylän Toimipaikka, Survontie 9A, 40500 Jyväskylä, Finland
Both AIC and BIC have been specifically adapted for small sample sizes [4, 5],
yielding formulas that asymptotically converge towards the original versions of these
statistics. On the other hand, the basic principle of the Lasso has been applied to various
other problems, such as precision matrix estimation [6], multi-response regression
[7], and multilevel medical data [8]. However, the focus of this paper is the simple
regression problem: how to choose the right covariates for a scalar response variable.
Karhunen recently compared the performance of nine model choice methods within
the context of logistic regression [9]. This comparison was based on a loss function
that was defined as a linear combination of sensitivity and specificity. While intuitively
appealing, this loss function was not justified by any derivation or proof. Here, however, this loss function and its generalisation are derived from a utility-maximisation
problem. This method is also applied to linear model choice tasks.
The theoretical framework of this paper is expected utility maximisation, also
known as von Neumann-Morgenstern utility [10]. To summarise, the idea is that a
rational agent should account for all possible world states and weigh them according
to their probability. Utility maximisation is a widely accepted paradigm in microeconomics and decision theory, and the term ‘expected utility’ is used when there is
uncertainty regarding the potential outcome of any or all actions. Expected utility maximisation has been applied to problems as diverse as traffic behaviour [11], strategic
management [12], oncology [13], and strategic deterrence [14].
Perhaps the closest point of comparison to the present study is a framework where
expected utility maximisation was used to determine the optimal threshold of a clinical
test [15]. With this kind of application, there is a trade-off between sensitivity and specificity. In clinical testing, sensitivity means the probability of correctly labelling affected
individuals, while specificity means the probability of correctly labelling unaffected
individuals. In model choice, sensitivity means the power to detect true covariates,
and specificity means the power to avoid false covariates in the model equation. There
is a trade-off between sensitivity and specificity in this domain as well [9]. Intuitively,
the choice of method depends on whether the researchers require results (thus preferring high sensitivity), or whether they want to avoid false results (thus preferring
high specificity). The innovation of this paper is to formalise this trade-off
in terms of expected utility maximisation.
In the next section, the two loss functions are derived, followed by the introduction
of two practical applications for logistic and Gaussian regression models. The results
are presented in Sect. 3, followed by analysis and conclusions in Sect. 4.
2 Material and Methods
Let us assume that there is a true effect in the data with probability π, and let us assume
that it can be detected with probability p. Let us also assume that noise effects are
incorrectly included in the model with probability q.
One may calculate the probability of detecting the true effect and nothing but the true
effect, i.e. sensitivity-1 (Sens1), the probability of detecting the true effect, i.e. sensitivity-2 (Sens2), and the probability of avoiding false covariates, i.e. specificity (Spec) [9]. From
these definitions, it follows that these quantities are:
\[ \mathrm{Sens1} = p(1 - q), \tag{1} \]
\[ \mathrm{Sens2} = p, \tag{2} \]
\[ \mathrm{Spec} = 1 - q. \tag{3} \]
For any data-generating process and model choice algorithm, these quantities may
be estimated from Monte Carlo simulations, but the researcher needs a utility function
to rank the different algorithms. Below, two different utility functions are introduced,
yielding loss functions 1 and 2 which are linear functions of Sens1, Sens2 and Spec.
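As a minimal sketch of how such Monte Carlo estimates could be obtained, the following Python snippet counts the three outcomes over simulated replicates. It is an illustration only and not the simulation design of this paper: the function name `estimate_rates` and the values of p and q are hypothetical, and the detection indicators are drawn directly and independently as a stand-in for fitting a model choice algorithm to each simulated data set.

```python
import random


def estimate_rates(p, q, n_sims=100_000, seed=1):
    """Estimate (Sens1, Sens2, Spec) from simulated selection outcomes.

    Assumption: detection of the true effect (probability p) and inclusion
    of noise effects (probability q) are drawn independently. In a real
    study, these indicators would be read off the model chosen by the
    algorithm in each simulated data set.
    """
    rng = random.Random(seed)
    n_sens1 = n_sens2 = n_spec = 0
    for _ in range(n_sims):
        true_found = rng.random() < p      # true effect is in the chosen model
        noise_included = rng.random() < q  # some noise effect is in the chosen model
        n_sens2 += true_found
        n_spec += not noise_included
        n_sens1 += true_found and not noise_included
    return n_sens1 / n_sims, n_sens2 / n_sims, n_spec / n_sims


# Hypothetical values of p and q, chosen for illustration only.
sens1, sens2, spec = estimate_rates(p=0.8, q=0.1)
print(f"Sens1 = {sens1:.3f}, Sens2 = {sens2:.3f}, Spec = {spec:.3f}")
```

Under the independence assumption, the estimate of Sens1 should be close to the product of the estimates of Sens2 and Spec, in agreement with Eqs. (1)-(3).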
2.1 Loss Function 1
Let us assume that the payoff for the researchers does not depend on the noise covariates
if they detect a true effect, but that they try to avoid reporting anything if no true
covariate exists. The expected utility of this type of researcher is given by
\[ U = \pi p u_1 + \pi (1 - p) u_2 + (1 - \pi) q u_3 + (1 - \pi)(1 - q) u_4, \tag{4} \]
where $u_1$ is the payoff in the case that they correctly detect a true effect, $u_2$ is the
payoff if they fail to detect a true effect, $u_3$ is the payoff if they detect a false effect,
and $u_4$ is the payoff if they do not detect anything and there is no effect in the data. It
can be assumed that $u_1 > u_2$ and $u_4 > u_3$.
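The expected utility of Eq. (4) can be evaluated directly once π, p, q and the four payoffs are fixed. The following Python sketch does this for two algorithms; the function name `expected_utility`, the payoff values and the (p, q) pairs are hypothetical and serve only to illustrate the trade-off, not to reproduce any result of this paper.

```python
def expected_utility(pi, p, q, u1, u2, u3, u4):
    """Expected utility of Eq. (4): probability-weighted sum of the four payoffs."""
    return (pi * p * u1                    # true effect exists and is detected
            + pi * (1 - p) * u2            # true effect exists but is missed
            + (1 - pi) * q * u3            # no true effect, a false effect is reported
            + (1 - pi) * (1 - q) * u4)     # no true effect, nothing is reported


# Hypothetical payoffs satisfying u1 > u2 and u4 > u3.
payoffs = dict(u1=1.0, u2=0.0, u3=-1.0, u4=0.0)

# Hypothetical algorithms: one conservative (low p, low q), one liberal (high p, high q).
conservative = expected_utility(pi=0.5, p=0.6, q=0.05, **payoffs)
liberal = expected_utility(pi=0.5, p=0.9, q=0.40, **payoffs)
print(conservative, liberal)
```

With these illustrative numbers, the conservative algorithm attains the higher expected utility, because the penalty for false effects outweighs the gain in detection probability. A different choice of π or of the payoffs can reverse the ranking.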
From Eq. (4), it follows that
\[ U = \pi p u_1 - \pi p u_2 + (1 - \pi) q u_3 - (1 - \pi) q u_4 + \pi u_2 + (1 - \pi) u_4. \tag{5} \]
Above, only p (...truncated)