Affine-Invariant Ensemble Transform Methods for Logistic Regression


https://link.springer.com/content/pdf/10.1007/s10208-022-09550-2.pdf


Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-022-09550-2

Affine-Invariant Ensemble Transform Methods for Logistic Regression

Jakiw Pidstrigach¹ · Sebastian Reich¹

Received: 16 May 2021 / Accepted: 13 September 2021
© The Author(s) 2022

Abstract
We investigate the application of ensemble transform approaches to Bayesian inference of logistic regression problems. Our approach relies on appropriate extensions of the popular ensemble Kalman filter and the feedback particle filter to the cross entropy loss function and is based on a well-established homotopy approach to Bayesian inference. The arising finite particle evolution equations as well as their mean-field limits are affine-invariant. Furthermore, the proposed methods can be implemented in a gradient-free manner in the case of nonlinear logistic regression, and the data can be randomly subsampled, similarly to mini-batching in stochastic gradient descent. We also propose a closely related SDE-based sampling method which again is affine-invariant and can easily be made gradient-free. Numerical examples demonstrate the appropriateness of the proposed methodologies.

Keywords Logistic regression · Bayesian inference · Interacting particle systems · Affine invariance · Ensemble Kalman filter · Langevin dynamics

Mathematics Subject Classification 62J02 · 65C05 · 62F15

Communicated by Teresa Krick and Hans Munthe-Kaas.

Invited paper based on the FoCM 2021 Online Seminar lecture "Statistical inverse problems and affine-invariant gradient flow structures in the space of probability measures" presented by Sebastian Reich in June 2021.

SR has been partially funded by Deutsche Forschungsgemeinschaft (DFG) - Project-ID 318763901 - SFB1294.

Sebastian Reich · Jakiw Pidstrigach
¹ Institut für Mathematik, Universität Potsdam, Karl-Liebknecht-Str. 24/25, 14476 Potsdam, Germany

1 Introduction

Statistical inference for logistic regression and classification problems is a well-studied problem from both parameter optimisation and Bayesian inference perspectives [5]. While many classical optimisation and Markov chain Monte Carlo (MCMC) methods are applicable, efficient inference in the context of semi-parametric models still poses computational challenges. With this paper, we address this challenge by investigating the extension of coupling-of-measures and ensemble transform methodologies [29] from the squared error loss function to the cross entropy loss function typically used for logistic regression and classification.

More specifically, previous work on ensemble methods for Bayesian inference, such as the ensemble Kalman filter (EnKF) [8] and the feedback particle filter (FPF) [35,38,39], has almost exclusively focused on the squared error loss function of the form

\[
\Psi_{\rm data}(\theta) = \frac{1}{2} (g(\theta) - t)^{\rm T} \Gamma^{-1} (g(\theta) - t), \tag{1}
\]

with $g : \mathbb{R}^D \to \mathbb{R}^N$ a forward map, $t \in \mathbb{R}^N$ the data, $\Gamma \in \mathbb{R}^{N \times N}$ the measurement error covariance matrix, and $\theta \in \mathbb{R}^D$ the parameters to be estimated. Notable exceptions include the application of ensemble Kalman inversion (EKI) [17] and the modified EnKF formulation of [13] to the training of neural networks with a cross entropy loss function. While these methods seek to minimise a regularised cross entropy loss function, the more recent work [14] has also investigated ensemble-based sampling methods for Bayesian inference in the context of logistic regression and classification. This work relies on the time-stepping of appropriate stochastic differential equations (SDEs) and is in line with several other ensemble-based sampling methods such as the ensemble preconditioned MCMC methods of [19], the ensemble Kalman sampler (EKS) [9], and affine invariant Langevin dynamics (ALDI) [10].
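To make the contrast between the two loss functions concrete, the following is a minimal Python sketch of the squared error loss (1) next to a cross entropy loss for linear logistic regression. The linear logistic model with labels in {0, 1}, the function names, and the toy arrays `A`, `X`, and `labels` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def squared_error_loss(theta, g, t, Gamma):
    """0.5 * (g(theta) - t)^T Gamma^{-1} (g(theta) - t), cf. Eq. (1)."""
    r = g(theta) - t
    # Solve Gamma x = r rather than forming the inverse explicitly.
    return 0.5 * float(r @ np.linalg.solve(Gamma, r))

def cross_entropy_loss(theta, X, labels):
    """Cross entropy for labels in {0, 1} under an assumed linear logistic model."""
    z = X @ theta
    # -sum[t*log(s) + (1-t)*log(1-s)] with s = sigmoid(z) equals
    # sum[log(1 + exp(z)) - t*z]; logaddexp evaluates log(1 + exp(z)) stably.
    return float(np.sum(np.logaddexp(0.0, z) - labels * z))

# Hypothetical toy problem, for illustration only.
A = np.array([[1.0, 0.0], [0.0, 2.0]])

def forward(theta):
    """Assumed linear forward map g(theta) = A theta."""
    return A @ theta

theta = np.zeros(2)
t = np.array([1.0, 2.0])
Gamma = np.diag([1.0, 4.0])
X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])
labels = np.array([1.0, 0.0, 1.0])

print(squared_error_loss(theta, forward, t, Gamma))
print(cross_entropy_loss(theta, X, labels))
```

Both functions return a scalar potential in the parameters, so either could in principle play the role of the data misfit; only the first has the quadratic structure that the EnKF and FPF formulations cited above exploit.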
Contrary to such invariance-of-measures based SDE methodologies, the approach taken in this paper is instead founded on the homotopy approach to Bayesian inference, as first formulated in a time-continuous framework in [6,26], and which is close in spirit to the iterative application of the EnKF [31] and parameter estimation methods based on sequential Monte Carlo (SMC) methods [4]. In addition to expanding the homotopy-based approach to logistic regression, we address two further important concepts, namely affine-invariance of the proposed methods [10,11,19] and sub-sampling of data points, as widely used in stochastic gradient descent [20]. Both concepts are used to improve the computational efficiency of optimisation and sampling methods. Take, for example, a two-dimensional Gaussian random variable with mean zero and covariance matrix

\[
\Sigma = \begin{pmatrix} \epsilon^2 & 0 \\ 0 & 1 \end{pmatrix}.
\]

A random walk Metropolis–Hastings algorithm will sample inefficiently whenever $\epsilon^2$ is vastly different from one. An affine-invariant modification, on the other hand, will sample this problem as efficiently as if $\epsilon$ were set to $\epsilon = 1$ [11]. Sub-sampling replaces the exact gradient of a cost functional by a cheaper-to-evaluate stochastic approximation, which agrees with the exact gradient in expectation. We also discuss the possibility for derivative-free implementations [8–10,17], localisation [8,29] via dropouts [32], and efficient linearly implicit time-stepping methods [2].

Our main contributions with regard to Bayesian homotopy methods for logistic regression are:
– an extension of affine-invariant ensemble transform approaches, such as the EnKF, to logistic regression,
– an affine-invariant generalisation of the FPF and its application to logistic regression,
– an extension of data sub-sampling (mini-batches) to Bayesian homotopy methods.
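The sub-sampling property stated above, that a mini-batch gradient agrees with the exact gradient in expectation, can be checked numerically. The sketch below is a minimal illustration and assumes a linear logistic model with labels in {0, 1} and synthetic data; the helper names and the sizes `N`, `D`, and `B` are hypothetical. Averaging the rescaled mini-batch gradients over the batches of one random partition of the data reproduces the full gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_cross_entropy(theta, X, labels):
    """Exact gradient of the cross entropy loss: X^T (sigmoid(X theta) - labels)."""
    return X.T @ (sigmoid(X @ theta) - labels)

def minibatch_grad(theta, X, labels, idx):
    """Gradient over the rows in idx, rescaled by N / |batch|."""
    scale = X.shape[0] / len(idx)
    return scale * grad_cross_entropy(theta, X[idx], labels[idx])

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
N, D, B = 12, 3, 4
X = rng.normal(size=(N, D))
labels = (rng.uniform(size=N) < 0.5).astype(float)
theta = rng.normal(size=D)

full = grad_cross_entropy(theta, X, labels)

# Split a random permutation of the data into N // B disjoint batches.
batches = rng.permutation(N).reshape(N // B, B)
avg = np.mean([minibatch_grad(theta, X, labels, b) for b in batches], axis=0)

print(np.max(np.abs(avg - full)))
```

Because every data point lies in exactly one batch, the average over the batches equals the expectation over a uniformly drawn batch, so the printed discrepancy should be at the level of floating-point round-off.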
In a further step, we combine these homotopy approaches with SDE-based sampling methods in order to derive
– a derivative-free and affine-invariant SDE-based sampling method for logistic regression.

We demonstrate the appropriateness of the proposed methods by means of a set of numerical experiments. Extensions to nonlinear and multi-class logistic regression [5] are also discussed. We also briefly discuss the application to sigmoidal Cox processes [1].

The layout of this paper is as follows. The required mathematical background material on logistic regression is collected in Sect. 2. The homotopy approach to Bayesian inference is summarised in Sect. 3. There we also present an affine-invariant formulation of the homotopy approach and discuss and analyse data sub-sampling in the spirit of [20]. Section 4 develops three different algorithmic approaches for the implementation of homotopy-based Bayesian inference for logistic regression. More specifically, we propose an affine-invariant modification of the FPF and two extensions of the EnKF to logistic regression. We also discuss robust numerical implementations combining dropouts [32] with localisation [8,29] and linearly implicit time-stepping methods [2]. Section 5 combines SDE-based sampling methods with a homotopy-based drift term in order to derive a gradient-free and aff (...truncated)