Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models
Foundations of Computational Mathematics (2022) 22:1513–1566
https://doi.org/10.1007/s10208-021-09531-x
Optimal Combination of Linear and Spectral Estimators for
Generalized Linear Models
Marco Mondelli1 · Christos Thrampoulidis2 · Ramji Venkataramanan3
Received: 8 August 2020 / Revised: 25 June 2021 / Accepted: 25 July 2021 /
Published online: 17 August 2021
© The Author(s) 2021
Abstract
We study the problem of recovering an unknown signal x given measurements obtained
from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator x̂ L and a spectral estimator x̂ s . The former is
a data-dependent linear combination of the columns of the measurement matrix, and
its analysis is quite simple. The latter is the principal eigenvector of a data-dependent
matrix, and a recent line of work has studied its performance. In this paper, we show
how to optimally combine x̂ L and x̂ s . At the heart of our analysis is the exact characterization of the empirical joint distribution of (x, x̂ L , x̂ s ) in the high-dimensional limit.
This allows us to compute the Bayes-optimal combination of x̂ L and x̂ s , given the
limiting distribution of the signal x. When the distribution of the signal is Gaussian,
then the Bayes-optimal combination has the form θ x̂ L + x̂ s and we derive the optimal
combination coefficient. In order to establish the limiting distribution of (x, x̂ L , x̂ s ),
we design and analyze an approximate message passing algorithm whose iterates
give x̂ L and approach x̂ s . Numerical simulations demonstrate the improvement of the
proposed combination with respect to the two methods considered separately.
Keywords Linear estimator · Spectral estimator · Generalized linear models · Bayes
optimality · Approximate message passing · Weak recovery
Mathematics Subject Classification 68Q32 · 68T05 · 62B10 · 62J12
Communicated by Afonso Bandeira.
M. Mondelli was partially supported by the 2019 Lopez-Loreta Prize. C. Thrampoulidis was partially
supported by an NSF award CIF-2009030 and by an NSERC Discovery Grant. R. Venkataramanan was
partially supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1.
Extended author information available on the last page of the article
123
1514
Foundations of Computational Mathematics (2022) 22:1513–1566
1 Introduction
In a generalized linear model (GLM) [36,39], we want to recover a d-dimensional
signal x ∈ Rd given n i.i.d. measurements y = (y1 , . . . , yn ) of the form:
yi ∼ p(y | x, ai ),
i ∈ {1, . . . , n},
(1)
where ·, · denotes the scalar product, {ai }1≤i≤n are known sensing vectors, and the
(stochastic) output function p(· | x, ai ) is a known probability distribution. GLMs
arise in several problems in statistical inference and signal processing. Examples
include photon-limited imaging [53,58], compressed sensing [19], signal recovery
from quantized measurements [7,46], phase retrieval [21,49], and neural networks
with one hidden layer [30].
The problem of estimating x from y is, in general, non-convex, and semi-definite
programming relaxations have been proposed [9,11,52,56]. However, the computational complexity and memory requirement of these approaches quickly grow with
the dimension d. For this reason, several non-convex approaches have been developed,
e.g., alternating minimization [40], approximate message passing (AMP) [15,44,48],
Wirtinger Flow [10], Kaczmarz methods [57], and iterative convex-programming
relaxations [1,7,14,25]. The Bayes-optimal estimation and generalization error have
also been studied in [3]. When the output function p(· | x, ai ) is unknown, (1) is
called the single-index model in the statistics literature, see e.g., [8,28,33]. The problem of recovering a structured signal (e.g., sparse, low-rank) from high-dimensional
single-index measurements has been an active research topic over the past few years
[22–24,41–43,51,52,59].
Throughout this paper, the performance of an estimator x̂ will be measured by its
normalized correlation (or “overlap") with x:
x, x̂
x2 x̂2
,
(2)
where · 2 denotes the Euclidean norm of a vector.
Most of the existing techniques require an initial estimate of the signal, which can
then be refined via a local algorithm. Here, we focus on two popular methods: a linear
estimator and a spectral estimator. The linear estimator x̂ L has the form:
n
1
T L (yi )ai ,
n
(3)
i=1
where T L denotes a given preprocessing function. The performance analysis of this
estimator is quite simple, see e.g., Proposition 1 in [43] or Sect. 2.3 of this paper. The
spectral estimator consists in the principal eigenvector x̂ s of a matrix of the form:
n
1
Ts (yi )ai aiT ,
n
i=1
123
(4)
Foundations of Computational Mathematics (2022) 22:1513–1566
1515
where Ts is another preprocessing function. The idea of a spectral method first appeared
in [32] and, for the special case of phase retrieval, a series of works has provided more
and more refined performance bounds [11,12,40]. Recently, an exact high-dimensional
analysis of the spectral method for generalized linear models with Gaussian sensing
vectors has been carried out in [34,37]. These works consider a regime where both n
and d grow large at a fixed proportional rate δ = n/d > 0. The choice of Ts which
minimizes the value of δ (and, consequently, the amount of data) necessary to achieve
a strictly positive scalar product (2) was obtained in [37]. Furthermore, the choice of Ts
which maximizes the correlation between x and x̂ s for any given value of the sampling
ratio δ was obtained in [35]. The case in which the sensing vectors are obtained by
picking columns from a Haar matrix is tackled in [16].
In short, the performance of the linear estimate x̂ L and the spectral estimate x̂ s is well
understood, and there is no clear winner between the two. In fact, the superiority of one
method over the other depends on the output function p(· | x, ai ) and on the sampling
ratio δ. For example, for phase retrieval (yi = |x, ai |), the spectral estimate provides
positive correlation with the ground-truth signal as long as δ > 1/2 [37], while linear
estimators of the form (3) are not effective for any δ > 0. On the contrary, for 1bit compressed sensing (yi = sign(x, ai )) the situation is the opposite: the spectral
estimator is uncorrelated with the signal for any δ > 0, while the linear estimate works
well. For many cases of practical interest, e.g., neural networks with ReLU activation
function (yi = max(x, ai , 0)), both the linear and the spectral method give estimator
with non-zero correlation. Thus, a natural question is the following:
What is the optimal way to combine the linear estimator x̂ L and the spectral
estimator x̂ s ?
This paper closes the gap and answers the question above for Gaussian sensing vectors {ai }1≤i≤n . Our main technical contribution is to provide an exact high-dimensional
characterization of the joint empirical di (...truncated)