Affine-Invariant Ensemble Transform Methods for Logistic Regression
Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-022-09550-2
Jakiw Pidstrigach¹ · Sebastian Reich¹
Received: 16 May 2021 / Accepted: 13 September 2021
© The Author(s) 2022
Abstract
We investigate the application of ensemble transform approaches to Bayesian inference
of logistic regression problems. Our approach relies on appropriate extensions of the
popular ensemble Kalman filter and the feedback particle filter to the cross entropy
loss function and is based on a well-established homotopy approach to Bayesian
inference. The arising finite particle evolution equations as well as their mean-field
limits are affine-invariant. Furthermore, the proposed methods can be implemented
in a gradient-free manner in case of nonlinear logistic regression and the data can
be randomly subsampled similar to mini-batching of stochastic gradient descent. We
also propose a closely related SDE-based sampling method which again is affine-invariant and can easily be made gradient-free. Numerical examples demonstrate the
appropriateness of the proposed methodologies.
Keywords Logistic regression · Bayesian inference · Interacting particle systems ·
Affine invariance · Ensemble Kalman filter · Langevin dynamics
Mathematics Subject Classification 62J02 · 65C05 · 62F15
Communicated by Teresa Krick and Hans Munthe-Kaas.
Invited paper based on the FoCM 2021 Online Seminar lecture Statistical inverse problems and
affine-invariant gradient flow structures in the space of probability measures presented by Sebastian
Reich in June 2021.
SR has been partially funded by Deutsche Forschungsgemeinschaft (DFG) - Project-ID 318763901 SFB1294.
✉ Sebastian Reich
Jakiw Pidstrigach
¹ Institut für Mathematik, Universität Potsdam, Karl-Liebknecht-Str. 24/25, 14476 Potsdam, Germany
1 Introduction
Statistical inference for logistic regression and classification problems has been studied
extensively from both the parameter optimisation and the Bayesian inference perspectives [5].
While many classical optimisation and Markov chain Monte Carlo (MCMC) methods
are applicable, efficient inference in the context of semi-parametric models still
poses computational challenges. With this paper, we address this challenge by investigating the extension of coupling-of-measures and ensemble transform methodologies
[29] from the squared error loss function to the cross entropy loss function typically
used for logistic regression and classification.
More specifically, previous work on ensemble methods for Bayesian inference,
such as the ensemble Kalman filter (EnKF) [8] and the feedback particle filter (FPF)
[35,38,39], has almost exclusively focused on the squared error loss function of the
form
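$$\Psi_{\rm data}(\theta) = \frac{1}{2} (g(\theta) - t)^{\rm T} \Gamma^{-1} (g(\theta) - t), \tag{1}$$
with $g : \mathbb{R}^D \to \mathbb{R}^N$ a forward map, $t \in \mathbb{R}^N$ the data, $\Gamma \in \mathbb{R}^{N \times N}$ the measurement error covariance matrix, and $\theta \in \mathbb{R}^D$ the parameters to be estimated. Notable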
exceptions include the application of ensemble Kalman inversion (EKI) [17] and the
modified EnKF formulation of [13] to the training of neural networks with a cross
entropy loss function. While these methods seek to minimise a regularised cross
entropy loss function, the more recent work [14] has also investigated ensemble-based
sampling methods for Bayesian inference in the context of logistic regression and classification. This work relies on the time-stepping of appropriate stochastic differential
equations (SDEs) and is in line with several other ensemble-based sampling methods
such as the ensemble preconditioned MCMC methods of [19], the ensemble Kalman
sampler (EKS) [9], and affine invariant Langevin dynamics (ALDI) [10]. Contrary
to such invariance-of-measures based SDE methodologies, the approach taken in this
paper is instead founded on the homotopy approach to Bayesian inference, as first
formulated in a time-continuous framework in [6,26], and which is close in spirit to
the iterative application of the EnKF [31] and parameter estimation methods based on
sequential Monte Carlo (SMC) methods [4].
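To make the contrast between the two loss functions concrete, the following Python sketch evaluates the squared error loss (1) and a cross entropy loss for linear logistic regression. The function names, the feature matrix `X`, the label convention $t_n \in \{0, 1\}$, and the linear model $X\theta$ are assumptions made for this illustration only. They are not taken from the formulation developed later in the paper.

```python
import numpy as np


def squared_error_loss(theta, g, t, Gamma):
    """Squared error loss (1): 0.5 * (g(theta) - t)^T Gamma^{-1} (g(theta) - t)."""
    r = g(theta) - t
    # Solve Gamma x = r instead of forming the inverse explicitly.
    return 0.5 * r @ np.linalg.solve(Gamma, r)


def cross_entropy_loss(theta, X, t):
    """Cross entropy loss for linear logistic regression.

    Assumes labels t_n in {0, 1} and class probabilities sigmoid(x_n^T theta),
    so that the loss equals sum_n [log(1 + exp(z_n)) - t_n * z_n], z = X theta.
    """
    z = X @ theta
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow.
    return np.sum(np.logaddexp(0.0, z) - t * z)


# Small usage example with an identity forward map (hypothetical data).
theta = np.array([1.0, 2.0])
psi_sq = squared_error_loss(theta, lambda th: th, np.zeros(2), np.eye(2))
psi_ce = cross_entropy_loss(np.zeros(3), np.ones((4, 3)), np.array([0.0, 1.0, 1.0, 0.0]))
```

Unlike (1), the cross entropy loss is not quadratic in the model output, which is why methods derived for the squared error loss need to be extended before they apply to logistic regression.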
In addition to expanding the homotopy-based approach to logistic regression, we
address two further important concepts, namely affine-invariance of the proposed
methods [10,11,19] and sub-sampling of data points, as widely used in stochastic
gradient descent [20]. Both concepts are used to improve the computational efficiency
of optimisation and sampling methods. Take, for example, a two-dimensional Gaussian
random variable with mean zero and covariance matrix
$$\Sigma = \begin{pmatrix} \epsilon^2 & 0 \\ 0 & 1 \end{pmatrix}.$$
A random walk Metropolis–Hastings algorithm will sample inefficiently whenever
$\epsilon^2$ is vastly different from one. An affine-invariant modification, on the other hand,
will sample this problem as efficiently as if $\epsilon$ were set to $\epsilon = 1$ [11]. Sub-sampling
replaces the exact gradient of a cost functional by a cheaper to evaluate stochastic
approximation, which agrees with the exact gradient in expectation.
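The unbiasedness of a sub-sampled gradient can be checked directly on a small example. The sketch below assumes a linear logistic regression model with labels in $\{0, 1\}$ and a mini-batch estimator rescaled by $N/|\text{batch}|$. The function names and the synthetic data are hypothetical and serve only to illustrate the principle. Averaging the estimator over all $N$ equally likely batches of size one recovers the exact gradient.

```python
import numpy as np


def grad_full(theta, X, t):
    """Exact gradient of the cross entropy loss for linear logistic regression."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - t)


def grad_minibatch(theta, X, t, batch):
    """Stochastic gradient from the index array `batch`, rescaled by N / |batch|."""
    N = X.shape[0]
    Xb, tb = X[batch], t[batch]
    p = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
    return (N / len(batch)) * (Xb.T @ (p - tb))


# Synthetic data (assumption: purely illustrative, not from the paper).
rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
t = (rng.random(N) < 0.5).astype(float)
theta = rng.normal(size=D)

exact = grad_full(theta, X, t)

# Expectation of the estimator over a uniformly drawn batch of size one,
# computed exactly by averaging over all N possible batches.
expected = np.mean(
    [grad_minibatch(theta, X, t, np.array([n])) for n in range(N)], axis=0
)
```

Each individual mini-batch gradient differs from `exact`, but `expected` agrees with it up to rounding, which is the property the sub-sampling argument relies on.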
We also discuss the possibility of derivative-free implementations [8–10,17], localisation [8,29] via dropouts [32], and efficient linearly implicit time-stepping methods
[2].
Our main contributions with regard to Bayesian homotopy methods for logistic
regression are:
– an extension of affine-invariant ensemble transform approaches, such as the EnKF,
to logistic regression,
– an affine-invariant generalisation of the FPF and its application to logistic regression,
– an extension of data sub-sampling (mini-batches) to Bayesian homotopy methods.
In a further step, we combine these homotopy approaches with SDE-based sampling
methods in order to derive
– a derivative-free and affine-invariant SDE-based sampling method for logistic
regression.
We demonstrate the appropriateness of the proposed methods by means of a set of
numerical experiments. Extensions to nonlinear and multi-class logistic regression [5]
are also discussed. We also briefly discuss the application to sigmoidal Cox processes
[1].
The layout of this paper is as follows. The required mathematical background
material on logistic regression is collected in Sect. 2. The homotopy approach to
Bayesian inference is summarised in Sect. 3. There we also present an affine-invariant
formulation of the homotopy approach and discuss and analyse data sub-sampling in
the spirit of [20]. Section 4 develops three different algorithmic approaches for the
implementation of homotopy-based Bayesian inference for logistic regression. More
specifically, we propose an affine-invariant modification of the FPF and two extensions
of the EnKF to logistic regression. We also discuss robust numerical implementations
combining dropouts [32] with localisation [8,29] and linearly implicit time-stepping
methods [2]. Section 5 combines SDE-based sampling methods with a homotopy-based drift term in order to derive a gradient-free and aff (...truncated)