Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1007/s10208-021-09531-x.pdf

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

Foundations of Computational Mathematics (2022) 22:1513–1566 https://doi.org/10.1007/s10208-021-09531-x Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models Marco Mondelli1 · Christos Thrampoulidis2 · Ramji Venkataramanan3 Received: 8 August 2020 / Revised: 25 June 2021 / Accepted: 25 July 2021 / Published online: 17 August 2021 © The Author(s) 2021 Abstract We study the problem of recovering an unknown signal x given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator x̂ L and a spectral estimator x̂ s . The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine x̂ L and x̂ s . At the heart of our analysis is the exact characterization of the empirical joint distribution of (x, x̂ L , x̂ s ) in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of x̂ L and x̂ s , given the limiting distribution of the signal x. When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form θ x̂ L + x̂ s and we derive the optimal combination coefficient. In order to establish the limiting distribution of (x, x̂ L , x̂ s ), we design and analyze an approximate message passing algorithm whose iterates give x̂ L and approach x̂ s . Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately. Keywords Linear estimator · Spectral estimator · Generalized linear models · Bayes optimality · Approximate message passing · Weak recovery Mathematics Subject Classification 68Q32 · 68T05 · 62B10 · 62J12 Communicated by Afonso Bandeira. M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize. C. Thrampoulidis was partially supported by an NSF award CIF-2009030 and by an NSERC Discovery Grant. R. Venkataramanan was partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1. Extended author information available on the last page of the article 123 1514 Foundations of Computational Mathematics (2022) 22:1513–1566 1 Introduction In a generalized linear model (GLM) [36,39], we want to recover a d-dimensional signal x ∈ Rd given n i.i.d. measurements y = (y1 , . . . , yn ) of the form: yi ∼ p(y | x, ai ), i ∈ {1, . . . , n}, (1) where ·, · denotes the scalar product, {ai }1≤i≤n are known sensing vectors, and the (stochastic) output function p(· | x, ai ) is a known probability distribution. GLMs arise in several problems in statistical inference and signal processing. Examples include photon-limited imaging [53,58], compressed sensing [19], signal recovery from quantized measurements [7,46], phase retrieval [21,49], and neural networks with one hidden layer [30]. The problem of estimating x from y is, in general, non-convex, and semi-definite programming relaxations have been proposed [9,11,52,56]. However, the computational complexity and memory requirement of these approaches quickly grow with the dimension d. For this reason, several non-convex approaches have been developed, e.g., alternating minimization [40], approximate message passing (AMP) [15,44,48], Wirtinger Flow [10], Kaczmarz methods [57], and iterative convex-programming relaxations [1,7,14,25]. The Bayes-optimal estimation and generalization error have also been studied in [3]. When the output function p(· | x, ai ) is unknown, (1) is called the single-index model in the statistics literature, see e.g., [8,28,33]. The problem of recovering a structured signal (e.g., sparse, low-rank) from high-dimensional single-index measurements has been an active research topic over the past few years [22–24,41–43,51,52,59]. Throughout this paper, the performance of an estimator x̂ will be measured by its normalized correlation (or “overlap") with x: x, x̂ x2 x̂2 , (2) where · 2 denotes the Euclidean norm of a vector. Most of the existing techniques require an initial estimate of the signal, which can then be refined via a local algorithm. Here, we focus on two popular methods: a linear estimator and a spectral estimator. The linear estimator x̂ L has the form: n 1 T L (yi )ai , n (3) i=1 where T L denotes a given preprocessing function. The performance analysis of this estimator is quite simple, see e.g., Proposition 1 in [43] or Sect. 2.3 of this paper. The spectral estimator consists in the principal eigenvector x̂ s of a matrix of the form: n 1 Ts (yi )ai aiT , n i=1 123 (4) Foundations of Computational Mathematics (2022) 22:1513–1566 1515 where Ts is another preprocessing function. The idea of a spectral method first appeared in [32] and, for the special case of phase retrieval, a series of works has provided more and more refined performance bounds [11,12,40]. Recently, an exact high-dimensional analysis of the spectral method for generalized linear models with Gaussian sensing vectors has been carried out in [34,37]. These works consider a regime where both n and d grow large at a fixed proportional rate δ = n/d > 0. The choice of Ts which minimizes the value of δ (and, consequently, the amount of data) necessary to achieve a strictly positive scalar product (2) was obtained in [37]. Furthermore, the choice of Ts which maximizes the correlation between x and x̂ s for any given value of the sampling ratio δ was obtained in [35]. The case in which the sensing vectors are obtained by picking columns from a Haar matrix is tackled in [16]. In short, the performance of the linear estimate x̂ L and the spectral estimate x̂ s is well understood, and there is no clear winner between the two. In fact, the superiority of one method over the other depends on the output function p(· | x, ai ) and on the sampling ratio δ. For example, for phase retrieval (yi = |x, ai |), the spectral estimate provides positive correlation with the ground-truth signal as long as δ > 1/2 [37], while linear estimators of the form (3) are not effective for any δ > 0. On the contrary, for 1bit compressed sensing (yi = sign(x, ai )) the situation is the opposite: the spectral estimator is uncorrelated with the signal for any δ > 0, while the linear estimate works well. For many cases of practical interest, e.g., neural networks with ReLU activation function (yi = max(x, ai , 0)), both the linear and the spectral method give estimator with non-zero correlation. Thus, a natural question is the following: What is the optimal way to combine the linear estimator x̂ L and the spectral estimator x̂ s ? This paper closes the gap and answers the question above for Gaussian sensing vectors {ai }1≤i≤n . Our main technical contribution is to provide an exact high-dimensional characterization of the joint empirical di (...truncated)