Newton-Type Optimal Thresholding Algorithms for Sparse Optimization Problems
Journal of the Operations Research Society of China
https://doi.org/10.1007/s40305-021-00370-9
Nan Meng1 · Yun-Bin Zhao2
Received: 6 April 2021 / Revised: 8 September 2021 / Accepted: 9 September 2021
© The Author(s) 2022
Abstract
Sparse signals can be reconstructed by algorithms that combine a traditional nonlinear optimization method with a thresholding technique. Unlike existing thresholding methods, the optimal k-thresholding technique recently proposed by Zhao (SIAM J Optim 30(1):31–55, 2020) simultaneously minimizes an error metric for the problem and thresholds the iterates generated by the classic gradient method. In this paper, we propose the Newton-type optimal k-thresholding (NTOT) algorithm, motivated by the appreciable performance of both Newton-type methods and the optimal k-thresholding technique for signal recovery. The guaranteed performance (including convergence) of the proposed algorithms is established under suitable choices of the algorithmic parameters and the restricted isometry property (RIP) of the sensing matrix, which has been widely used in the analysis of compressed sensing algorithms. Simulation results on synthetic signals indicate that the proposed algorithms are stable and efficient for signal recovery.
Keywords Compressed sensing · Sparse optimization · Newton-type methods ·
Optimal k-thresholding · Restricted isometry property
Mathematics Subject Classification 90C30 · 90C25 · 65F10 · 94A12 · 15A29
This paper is dedicated to the late Professor Duan Li in commemoration of his contributions to
optimization, financial engineering, and risk management.
The work was funded by the National Natural Science Foundation of China (No. 12071307).
Corresponding author: Yun-Bin Zhao
Nan Meng
1 School of Mathematics, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
2 Shenzhen Research Institute of Big Data, Chinese University of Hong Kong, Shenzhen 518172, Guangdong, China
1 Introduction
The sparse optimization problem arises naturally from a wide range of practical scenarios such as compressed sensing [1–4], signal and image processing [5–7], pattern
recognition [8], and wireless communications [9]. The typical problem of signal recovery via compressed sensing can be formulated as the following sparse optimization
problem:
$$\min_{x} \left\{ \|y - Ax\|_2^2 : \|x\|_0 \le k \right\}, \tag{1}$$
where $k$ is a given integer reflecting the sparsity level of the target signal $x^*$, $A \in \mathbb{R}^{m \times n}$ is a measurement matrix with $m \ll n$, $\|x\|_0$ is the so-called $\ell_0$-norm counting the nonzero entries of the vector $x$, and $y$ is the vector of acquired measurements of the signal $x^*$ to be recovered. The vector $y$ is usually represented as $y = Ax^* + \eta$, where $\eta$ denotes a noise vector.
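To make the ingredients of problem (1) concrete, the following minimal NumPy sketch builds a synthetic instance $y = Ax^* + \eta$ and evaluates the objective and the sparsity constraint. The helper names (`make_instance`, `objective`, `is_feasible`) and the Gaussian model for $A$, $x^*$ and $\eta$ are illustrative assumptions, not the experimental setup of this paper.

```python
import numpy as np

def make_instance(m, n, k, noise_std=0.0, seed=0):
    """Hypothetical helper: Gaussian A (m x n), k-sparse x_star, y = A x_star + eta."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n)) / np.sqrt(m)   # columns have roughly unit norm
    x_star = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)  # k distinct nonzero positions
    x_star[support] = rng.standard_normal(k)
    eta = noise_std * rng.standard_normal(m)        # noise vector (zero by default)
    y = A @ x_star + eta
    return A, x_star, y

def objective(A, y, x):
    """Objective of problem (1): the squared residual ||y - A x||_2^2."""
    r = y - A @ x
    return float(r @ r)

def is_feasible(x, k):
    """Constraint of problem (1): ||x||_0 <= k, i.e. at most k nonzero entries."""
    return np.count_nonzero(x) <= k

A, x_star, y = make_instance(m=20, n=50, k=4, seed=3)
print(objective(A, y, x_star), is_feasible(x_star, 4))
```

In the noiseless case the target $x^*$ is feasible and attains an objective value of (numerically) zero, so it solves (1).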
Developing effective algorithms for the model (1) is fundamentally important in
signal recovery. At present, the main algorithms for solving sparse optimization problems fall into several classes: convex optimization, heuristic algorithms, thresholding algorithms, and Bayesian methods. Typical convex optimization methods include $\ell_1$-minimization [10,11], reweighted $\ell_1$-minimization [12,13], and dual-density-based reweighted $\ell_1$-minimization [4,14,15].
The widely used heuristic algorithms include orthogonal matching pursuit (OMP)
[16,17], subspace pursuit (SP) [18], and compressive sampling matching pursuit
(CoSaMP) [19,20]. Depending on thresholding strategies, the thresholding methods
can be roughly classified as soft thresholding [21,22], hard thresholding (e.g., [23–27]),
and the so-called optimal thresholding methods [28,29].
Hard thresholding is the simplest thresholding approach for generating iterates that satisfy the constraint of problem (1). Throughout the paper, we use $H_k(\cdot)$ to denote the hard thresholding operator, which retains the $k$ largest-magnitude entries of a vector and zeroes out the others. The following iterative hard thresholding (IHT) scheme
$$x^{p+1} = H_k\left(x^p + \lambda A^\top (y - Ax^p)\right),$$
where $\lambda > 0$ is a stepsize, was first studied in [23,30]. Incorporating a pursuit step
(least-squares step) into IHT yields the hard thresholding pursuit (HTP) [26,31], and
when λ is replaced by an adaptive stepsize similar to the one used in traditional
conjugate methods, it leads to the so-called normalized iterative hard thresholding
(NIHT) algorithms in [24,32]. The theoretical performance of these algorithms can be
analyzed in terms of the restricted isometry property (RIP) (see, e.g., [3,23,30]).
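The operator $H_k$ and the IHT scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the names `hard_threshold` and `iht`, the zero starting point, and the fixed iteration count are hypothetical choices, and no stopping rule or stepsize safeguard from the cited works is reproduced.

```python
import numpy as np

def hard_threshold(v, k):
    """H_k(v): keep the k largest-magnitude entries of v and zero out the rest.

    Assumes 1 <= k <= v.size; ties are broken arbitrarily by argpartition.
    """
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out

def iht(A, y, k, lam, iters=300):
    """IHT: x^{p+1} = H_k(x^p + lam * A^T (y - A x^p)), started from x^0 = 0."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # gradient-type step followed by hard thresholding
        x = hard_threshold(x + lam * (A.T @ (y - A @ x)), k)
    return x

# Toy usage: with A = I and lam = 1, one step already returns H_k(y).
A = np.eye(5)
y = np.array([0.0, 2.0, 0.0, -3.0, 0.0])
x_hat = iht(A, y, k=2, lam=1.0)
print(x_hat)
```

For a general matrix, a conservative choice is a stepsize with $\lambda \|A\|_2^2 < 1$, e.g. `lam = 1.0 / np.linalg.norm(A, 2) ** 2`; the HTP and NIHT variants mentioned above would modify, respectively, the iterate after thresholding (a least-squares step on the selected support) and the stepsize rule.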
On the other hand, the search direction $A^\top (y - Ax^p)$ of the above-mentioned algorithm is, up to a factor of 2, the negative gradient of the objective function of problem (1). Such a search direction can be replaced by another direction provided that it is a descent direction of the objective function. Thus, a Newton-type direction was studied in [27,33,34]. The following iterative method, referred to as Newton-step-based iterative hard thresholding (NSIHT), was proposed in [27]:
$$x^{p+1} = H_k\left(x^p + \lambda \left(A^\top A + \epsilon I\right)^{-1} A^\top (y - Ax^p)\right), \tag{2}$$
where $\epsilon > 0$ is a parameter and $\lambda > 0$ is the stepsize.
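A minimal sketch of the NSIHT iteration (2) follows. It assumes the names `newton_direction` and `nsiht` (hypothetical, for illustration only) and evaluates the Newton-type direction through the push-through identity $(A^\top A + \epsilon I)^{-1} A^\top = A^\top (AA^\top + \epsilon I)^{-1}$, which holds for $\epsilon > 0$ and only requires an $m \times m$ linear solve; this implementation choice is ours and not necessarily the one used in [27].

```python
import numpy as np

def hard_threshold(v, k):
    """H_k(v): keep the k largest-magnitude entries (assumes 1 <= k <= v.size)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def newton_direction(A, r, eps):
    """Return (A^T A + eps I)^{-1} A^T r, computed as A^T (A A^T + eps I)^{-1} r.

    Both expressions agree for eps > 0; the second needs only an m x m solve,
    which is cheaper than an n x n solve when m << n.
    """
    m = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + eps * np.eye(m), r)

def nsiht(A, y, k, lam=1.0, eps=1.0, iters=100):
    """Iterate (2): x^{p+1} = H_k(x^p + lam * (A^T A + eps I)^{-1} A^T (y - A x^p)), x^0 = 0."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + lam * newton_direction(A, y - A @ x, eps), k)
    return x

# Toy usage: with A = I and eps = 1 the direction is (y - x) / 2, so lam = 2
# makes the very first step land on H_k(y).
x_hat = nsiht(np.eye(4), np.array([1.0, 0.0, 0.0, 2.0]), k=2, lam=2.0, eps=1.0)
print(x_hat)
```

Compared with the gradient step of IHT, the only change is the preconditioner $(A^\top A + \epsilon I)^{-1}$; as $\epsilon$ grows, the direction approaches a scaled gradient direction.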
However, as pointed out in [28,29], a weakness of the hard thresholding operator $H_k(\cdot)$ is that, when applied to a non-sparse iterate generated by the classic gradient method, it may increase the objective value of (1) at the thresholded vector compared with the objective value at its unthresholded counterpart. As a result, applying the hard thresholding operator directly to a non-sparse or non-compressible vector in the course of an algorithm may lead to significant numerical oscillation and divergence of the algorithm. To overcome this drawback of the hard thresholding operator, Zhao [28] proposed an optimal k-thresholding technique that makes it possible to perform thresholding and objective-value reduction simultaneously. The optimal k-thresholding iterative scheme in [28] can be stated as
$$x^{p+1} = Z_k^\#\left(x^p + \lambda A^\top (y - Ax^p)\right),$$
where $\lambda$ is again a stepsize and $Z_k^\#(\cdot)$ is the so-called optimal k-thresholding operator. Given a vector $u$, the thresholded vector is $Z_k^\#(u) = u \otimes w^*$ (the Hadamard product of two vectors), where $w^*$ is the optimal solution to the following quadratic 0-1 optimization problem:
$$w^* := \arg\min_{w} \left\{ \|y - A(u \otimes w)\|_2^2 : e^\top w = k,\ w \in \{0,1\}^n \right\},$$
where $e = (1, \dots, 1)^\top \in \mathbb{R}^n$ is the vector of ones and $\{0,1\}^n$ denotes the set of $n$-dimensional 0-1 vectors. To avoid solving such a binary optimization problem, an alternative approach is to solve its convex relaxation which, as pointed out in [28,29], is the tightest convex relaxation of the above problem:
$$w := \arg\min_{w} \left\{ \|y - A(u \otimes w)\|_2^2 : e^\top w = k,\ 0 \le w \le e \right\}. \tag{3}$$
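The sketch below contrasts $H_k$ with the optimal k-thresholding operator on a toy instance and shows one possible way to solve the relaxation (3). It is an illustration under stated assumptions: `optimal_k_threshold_bruteforce` enumerates all $\binom{n}{k}$ supports of the 0-1 problem (feasible only for tiny $n$), and `relaxed_weights` solves (3) by projected gradient with a bisection-based projection onto $\{w : e^\top w = k,\ 0 \le w \le e\}$. All names, the solver, and the iteration counts are our own hypothetical choices, not the algorithms developed in [28,29] or in this paper.

```python
import itertools
import numpy as np

def hard_threshold(v, k):
    """H_k(v): keep the k largest-magnitude entries (assumes 1 <= k <= v.size)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def residual_sq(A, y, x):
    """Objective of problem (1) at x: ||y - A x||_2^2."""
    r = y - A @ x
    return float(r @ r)

def optimal_k_threshold_bruteforce(A, y, u, k):
    """Z_k^#(u) = u * w^*, with w^* found by enumerating every 0-1 vector with e^T w = k."""
    n = u.size
    best_x, best_val = None, np.inf
    for support in itertools.combinations(range(n), k):
        w = np.zeros(n)
        w[list(support)] = 1.0
        val = residual_sq(A, y, u * w)
        if val < best_val:
            best_x, best_val = u * w, val
    return best_x

def project_capped_simplex(v, k, iters=100):
    """Euclidean projection onto {w : e^T w = k, 0 <= w <= e}: w = clip(v - tau, 0, 1).

    sum(clip(v - tau, 0, 1)) is nonincreasing in tau, equals n at tau = min(v) - 1
    and 0 at tau = max(v), so tau is located by bisection.
    """
    lo, hi = v.min() - 1.0, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, 1.0).sum() > k:
            lo = tau  # sum too large: shift further
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def relaxed_weights(A, y, u, k, iters=500):
    """Projected gradient for (3): min ||y - A(u * w)||^2 s.t. e^T w = k, 0 <= w <= e."""
    B = A * u                                   # equals A @ diag(u): column j scaled by u_j
    step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant of the gradient
    w = np.full(u.size, k / u.size)             # feasible starting point
    for _ in range(iters):
        grad = -2.0 * B.T @ (y - B @ w)
        w = project_capped_simplex(w - step * grad, k)
    return w

# Toy instance where hard thresholding raises the objective.
A = np.eye(2)
y = np.array([0.0, 1.0])
u = np.array([2.0, 1.0])
print(residual_sq(A, y, u))                                            # unthresholded u
print(residual_sq(A, y, hard_threshold(u, 1)))                         # H_1(u)
print(residual_sq(A, y, optimal_k_threshold_bruteforce(A, y, u, 1)))   # Z_1^#(u)
print(relaxed_weights(A, y, u, 1))
```

In this toy instance the unthresholded vector $u$ has objective value 4, $H_1(u) = (2, 0)^\top$ raises it to 5, while $Z_1^\#(u) = (0, 1)^\top$ attains 0, which is the weakness of hard thresholding described above. Here the relaxed solution of (3) happens to be the binary vector $(0, 1)^\top$; in general it need not be binary.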
Based on the convex relaxation of the operator Zk# (·), efficient algorithms calle (...truncated)