A Survey on Some Recent Developments of Alternating Direction Method of Multipliers
Journal of the Operations Research Society of China
https://doi.org/10.1007/s40305-021-00368-3
De-Ren Han
Received: 1 February 2020 / Revised: 8 September 2021 / Accepted: 9 September 2021
© The Author(s) 2021
Abstract
Recently, the alternating direction method of multipliers (ADMM) has attracted much attention from various fields, and many variants have been tailored to different
models. Moreover, theoretical studies of the method, such as its rate of convergence and extensions
to nonconvex problems, have also made much progress. In this paper, we give a survey
of some recent developments of ADMM and its variants.
Keywords Alternating direction method of multipliers · Global convergence · Rate
of convergence · Nonconvex optimization
Mathematics Subject Classification 90C30 · 90C33 · 65K05
1 Introduction
In this paper, we survey the developments of the alternating direction method of
multipliers (ADMM) and its variants for solving the minimization problem with linear
constraints and a separable objective function, which is the sum of many individual
functions without coupled variables:
$$
\min \left\{ \sum_{i=1}^{m} \theta_i(x_i) \;\middle|\; \sum_{i=1}^{m} A_i x_i = b,\ x_i \in X_i,\ i = 1, \cdots, m \right\}, \tag{1}
$$
This work is supported by the National Natural Science Foundation of China (Nos. 11625105 and
12131004).
De-Ren Han
LMIB of the Ministry of Education, School of Mathematical Sciences, Beihang University, Beijing 100191, China
where $\theta_i : \mathbb{R}^{n_i} \to \mathbb{R} \cup \{\infty\}$ are closed proper functions; $A_i \in \mathbb{R}^{l \times n_i}$; $X_i \subseteq \mathbb{R}^{n_i}$
are closed and convex nonempty sets; $b \in \mathbb{R}^{l}$; and $\sum_{i=1}^{m} n_i = n$. Although the linearly
constrained model (1) is special, it is rich enough to
characterize many optimization problems arising from various application fields, e.g.,
the image alignment problem in [1], the robust principal component analysis model
with noisy and incomplete data in [2], the latent variable Gaussian graphical model
selection in [3], the quadratic discriminant analysis model in [4] and the quadratic
conic programming in [5], to name just a few.
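To make the structure of (1) concrete, the following minimal sketch evaluates a separable objective and the residual of the linear constraint for given blocks. The helper names (`separable_objective`, `constraint_residual`) and the toy data are assumptions introduced here for illustration only; the sets $X_i$ are taken to be the whole space.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Sequence[float]) -> Vector:
    """Return the matrix-vector product of a dense row-major matrix with x."""
    return [sum(a_rk * x_k for a_rk, x_k in zip(row, x)) for row in a]


def separable_objective(thetas: Sequence[Callable[[Sequence[float]], float]],
                        blocks: Sequence[Sequence[float]]) -> float:
    """Evaluate sum_i theta_i(x_i); each theta_i sees only its own block x_i."""
    return sum(theta(x) for theta, x in zip(thetas, blocks))


def constraint_residual(mats: Sequence[Matrix],
                        blocks: Sequence[Sequence[float]],
                        b: Sequence[float]) -> Vector:
    """Return sum_i A_i x_i - b, which is zero when the linear constraint holds."""
    total = [0.0] * len(b)
    for a, x in zip(mats, blocks):
        total = [t + v for t, v in zip(total, matvec(a, x))]
    return [t - b_k for t, b_k in zip(total, b)]


# Hypothetical toy instance with m = 2, l = 1, n_1 = 2, n_2 = 1.
thetas = [lambda x: sum(v * v for v in x),    # theta_1(x_1) = ||x_1||^2
          lambda x: sum(abs(v) for v in x)]   # theta_2(x_2) = ||x_2||_1
mats = [[[1.0, 1.0]], [[-1.0]]]               # A_1 in R^{1x2}, A_2 in R^{1x1}
b = [0.0]
blocks = [[1.0, 2.0], [3.0]]                  # x_1 and x_2

objective = separable_objective(thetas, blocks)   # (1 + 4) + 3
residual = constraint_residual(mats, blocks, b)   # (1 + 2) - 3 - 0
```

The point of the sketch is that the objective never mixes the blocks; they interact only through the linear constraint.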
We now give some concrete application models.
$\ell_1$-norm minimization: In some applications such as statistics, machine learning,
and signal processing, one wants to find a 'sparse' solution from the given data.
Let $b \in \mathbb{R}^{l}$ denote the observed signal, which is known to come from a linear
transformation $A \in \mathbb{R}^{l \times n}$ with $l \ll n$. The task is to find the sparsest solution, i.e.,
the vector that contains as many zero elements as possible and satisfies the equation
$Ax = b$. Let $\|y\|_0$ denote the number of nonzero elements of the vector $y$; then we
can formulate the problem as
$$
\min_{x \in \mathbb{R}^n} \ \frac{1}{2}\|Ax - b\|^2 + \mu \|x\|_0, \tag{2}
$$
where $\mu > 0$ is a scalar regularization parameter that is usually chosen by cross-validation. Introducing a new variable, we can reformulate (2) as
$$
\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_0 \;\middle|\; x = y \right\}, \tag{3}
$$
which is a special case of (1) with m = 2.
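To spell out this correspondence, one possible identification with the data of (1) is sketched below. The labels $x_1$, $x_2$ and the identity matrix $I$ are notation introduced here for illustration; note also that the symbols $A$ and $b$ in (3) denote the measurement matrix and the observed signal, not the constraint data of (1).

```latex
\[
m = 2,\qquad x_1 = x,\qquad x_2 = y,\qquad
\theta_1(x_1) = \tfrac{1}{2}\|A x_1 - b\|^2,\qquad
\theta_2(x_2) = \mu \|x_2\|_0,
\]
\[
A_1 = I,\qquad A_2 = -I,\qquad
A_1 x_1 + A_2 x_2 = 0,\qquad
X_1 = X_2 = \mathbb{R}^n .
\]
```

With this choice the linear constraint of (1), taken with a zero right-hand side, reads $x - y = 0$, which is exactly the constraint $x = y$ in (3).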
Since the zero norm is discontinuous and nonconvex, researchers usually replace
it with its convex hull, the $\ell_1$-norm. Then, (2) and (3) can be relaxed to
$$
\min_{x \in \mathbb{R}^n} \ \frac{1}{2}\|Ax - b\|^2 + \mu \|x\|_1, \tag{4}
$$
and
$$
\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_1 \;\middle|\; x = y \right\}, \tag{5}
$$
respectively. The model (4) is just the well-known lasso [6].
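As a small numerical illustration of the objectives in (2) and (4), the sketch below evaluates both on two vectors that satisfy $Ax = b$ exactly. The data `A`, `b`, `mu` and the function names are hypothetical choices made for this example, not taken from the cited works.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product for a row-major matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def l0_count(x: Sequence[float]) -> int:
    """Number of nonzero entries of x, i.e. ||x||_0."""
    return sum(1 for v in x if v != 0.0)


def l1_norm(x: Sequence[float]) -> float:
    """Sum of the absolute values of the entries of x, i.e. ||x||_1."""
    return sum(abs(v) for v in x)


def data_fit(a: Matrix, x: Sequence[float], b: Sequence[float]) -> float:
    """Least-squares term (1/2) * ||A x - b||^2."""
    return 0.5 * sum((r - b_i) ** 2 for r, b_i in zip(matvec(a, x), b))


def objective_l0(a: Matrix, x: Sequence[float], b: Sequence[float], mu: float) -> float:
    """Objective of model (2): data fit plus mu times the zero norm."""
    return data_fit(a, x, b) + mu * l0_count(x)


def objective_l1(a: Matrix, x: Sequence[float], b: Sequence[float], mu: float) -> float:
    """Objective of model (4), the lasso: data fit plus mu times the l1-norm."""
    return data_fit(a, x, b) + mu * l1_norm(x)


# Hypothetical underdetermined system with l = 2 and n = 3.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
mu = 0.5

x_sparse = [0.0, 0.0, 1.0]  # A x = b with one nonzero entry
x_dense = [1.0, 1.0, 0.0]   # A x = b with two nonzero entries

values_l0 = (objective_l0(A, x_sparse, b, mu), objective_l0(A, x_dense, b, mu))
values_l1 = (objective_l1(A, x_sparse, b, mu), objective_l1(A, x_dense, b, mu))
```

In this toy instance both objectives rank the sparser vector lower, which is the behaviour the relaxation is meant to preserve; in general the two models need not agree.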
A generalization of the above model requires that a linear transformation of the solution,
rather than the solution itself, be sparse; the optimization model is
$$
\min_{x \in \mathbb{R}^n} \ \frac{1}{2}\|Ax - b\|^2 + \mu \|Fx\|_0, \tag{6}
$$
where $F$ is an arbitrary linear transformation. Again, after introducing an auxiliary
variable, we get a special case of (1) with $m = 2$,
$$
\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_0 \;\middle|\; Fx = y \right\};
$$
and replacing the zero norm with its convex hull, the $\ell_1$-norm, we obtain its relaxed
model
$$
\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_1 \;\middle|\; Fx = y \right\}, \tag{7}
$$
which is called generalized lasso. When $F$ is the difference matrix
$$
F_{ij} =
\begin{cases}
1, & j = i + 1, \\
-1, & j = i, \\
0, & \text{otherwise},
\end{cases}
$$
then $\|Fx\|_1$ is the total variation of $x$ [7], which finds wide applications in image
processing.
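The difference matrix can be checked on a short signal. The sketch below builds $F$ from the entrywise definition and compares $\|Fx\|_1$ with the sum of absolute consecutive differences. It assumes, as an illustration, that $F$ has $n - 1$ rows and $n$ columns, since the text does not state its size; the function names are likewise hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def difference_matrix(n: int) -> Matrix:
    """Build F with F[i][i] = -1 and F[i][i + 1] = 1, zero elsewhere.

    Assumption: F has n - 1 rows so that every row contains both entries.
    """
    rows = []
    for i in range(n - 1):
        row = [0.0] * n
        row[i] = -1.0
        row[i + 1] = 1.0
        rows.append(row)
    return rows


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product for a row-major matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def total_variation(x: Sequence[float]) -> float:
    """Compute ||F x||_1 with F the difference matrix of matching size."""
    return sum(abs(v) for v in matvec(difference_matrix(len(x)), x))


x = [1.0, 3.0, 2.0, 2.0]
tv_via_matrix = total_variation(x)
# Direct formula: sum of |x[i + 1] - x[i]| over consecutive entries.
tv_direct = sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
```

For this signal the consecutive differences are 2, -1 and 0, so both expressions give a total variation of 3.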
Other $\ell_1$-norm minimization models that can be reformulated into (1) include basis
pursuit [8], Huber function fitting [9], group lasso [10], etc.
Matrix completion: In some applications such as the movie ratings in the Netflix
problem, part of the data (elements of a matrix) is inaccessible, and the task is to fill
in the missing entries of a partially observed matrix. That is, given a ratings matrix in
which each entry $(i, j)$ represents the rating of movie $j$ by customer $i$ if customer $i$
has watched movie $j$ and is otherwise missing, we would like to predict the remaining
entries. A property that helps to accomplish the task is that the desired matrix is low
rank, or its rank is known a priori; otherwise the hidden entries could be assigned
arbitrary values.
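Let $M$ be the matrix to be recovered and let $\Omega$ be the set of locations corresponding
to the observed entries ($(i, j) \in \Omega$ if $M_{ij}$ is observed). The optimization model is [11]
$$
\min_{x \in \mathbb{R}^{l \times n}} \left\{ \operatorname{rank}(x) \;\middle|\; x_{ij} = M_{ij},\ \text{for } (i, j) \in \Omega \right\}. \tag{8}
$$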
As in the passage from (2) to (5), we can also relax and reformulate (8) as the convex separable problem
$$
\min_{x, y \in \mathbb{R}^{l \times n}} \left\{ \|x\|_* \;\middle|\; x = y,\ y_{ij} = M_{ij},\ \text{for } (i, j) \in \Omega \right\}, \tag{9}
$$
where $\|x\|_*$ denotes the nuclear norm of the matrix $x$, which is defined as the sum of
its singular values. Then, we obtain a special case of (1) for $m = 2$ and with matrix
variables.
Robust principal component analysis: Given a subset of the entries of a data matrix
that is the superposition of a low-rank matrix and a sparse matrix, robust principal component analysis (RPCA) aims to recover each component individually [12].
Moreover, the given data may be corrupted by noise. As in the matrix completion
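example, let $\Omega$ be the set of locations corresponding to the given entries, and let
$P_\Omega : \mathbb{R}^{l \times n} \to \mathbb{R}^{l \times n}$ be the orthogonal projection onto the span of matrices vanishing
outside of $\Omega$, i.e., the $ij$-th entry of $P_\Omega(x)$ is $x_{ij}$ if $(i, j) \in \Omega$ and zero otherwise. The
optimization model for the robust principal component analysis problem is
$$
\min_{x \in \mathbb{R}^{l \times n}} \left\{ \operatorname{rank}(x) + \tau_1 \|y\|_0 + \tau_2 \|P_\Omega(z)\|_F^2 \;\middle|\; x + y + z = M \right\}, \tag{10}
$$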
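where $M$ is the given data and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Relaxing
the rank and the zero norm with their convex hulls, we obtain a relaxation [2]
$$
\min_{x \in \mathbb{R}^{l \times n}} \left\{ \|x\|_* + \tau_1 \|y\|_1 + \tau_2 \|P_\Omega(z)\|_F^2 \;\middle|\; x + y + z = M \right\}. \tag{11}
$$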
Both (10) and (11) fall into the framework of (1) with m = 3.
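The projection onto the observed entries and the resulting penalty term in (10) and (11) can be sketched as follows. The function names, the zero-based index convention and the toy matrix are assumptions made for this illustration.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]
Index = Tuple[int, int]


def project_onto_observed(x: Matrix, omega: Set[Index]) -> Matrix:
    """Keep x[i][j] for (i, j) in omega and set every other entry to zero."""
    return [[x_ij if (i, j) in omega else 0.0 for j, x_ij in enumerate(row)]
            for i, row in enumerate(x)]


def frobenius_norm_squared(x: Matrix) -> float:
    """Sum of squared entries, i.e. the squared Frobenius norm of x."""
    return sum(x_ij * x_ij for row in x for x_ij in row)


# Hypothetical 2 x 2 example; indices are zero-based in this sketch.
z = [[1.0, -2.0],
     [3.0, 4.0]]
omega = {(0, 0), (1, 1)}

projected = project_onto_observed(z, omega)
penalty = frobenius_norm_squared(projected)          # 1^2 + 4^2
projected_twice = project_onto_observed(projected, omega)
```

Applying the operator a second time leaves the matrix unchanged, as expected of a projection, and the penalty counts only the entries of $z$ at observed locations.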
Note that the original application models (2), (6), (8) and (10) contain discrete
terms such as the $\ell_0$-norm and the rank function, and they are nonconvex optimization
problems. Solving these optimization problems is usually NP-hard (nondeterministic
polynomial hard). Researchers thus heuristically turn to solving their relaxation problems
(5), (7), (9) and (11). Fortunately, under suitable conditions, the relaxed problem and
the original one share the same solutions [11,13].
The relaxati (...truncated)