Managing randomization in the multi-block alternating direction method of multipliers for quadratic optimization
Mathematical Programming Computation
https://doi.org/10.1007/s12532-020-00192-5
FULL LENGTH PAPER
Krešimir Mihić1,2 · Mingxi Zhu3 · Yinyu Ye4
Received: 17 April 2019 / Accepted: 24 August 2020
© The Author(s) 2020
Abstract
The Alternating Direction Method of Multipliers (ADMM) has attracted considerable attention
for solving large-scale constrained optimization problems with separable objectives. However, the
two-block variable structure of the ADMM still limits the practical computational efficiency of the method, because one big matrix factorization is needed at least once even
for linear and convex quadratic programming. This drawback may be overcome by
enforcing a multi-block structure of the decision variables in the original optimization
problem. Unfortunately, the multi-block ADMM, with more than two blocks, is not
guaranteed to be convergent. On the other hand, two positive developments have been
made: first, if in each cyclic loop one randomly permutes the updating order of the multiple blocks, then the method converges in expectation for solving any system of linear
equations with any number of blocks. Secondly, such a randomly permuted ADMM
also works for equality-constrained convex quadratic programming even when the
objective function is not separable. The goal of this paper is twofold. First, we add
more randomness into the ADMM by developing a randomly assembled cyclic ADMM
(RAC-ADMM) where the decision variables in each block are randomly assembled.
We discuss the theoretical properties of RAC-ADMM and show when random assembling helps and when it hurts, and develop a criterion to guarantee that it converges
almost surely. Secondly, using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests on both randomly generated and large-scale
benchmark quadratic optimization problems, which include continuous problems, binary
graph-partition and quadratic assignment problems, and selected machine learning problems.
Our numerical tests show that RAC-ADMM, with a variable-grouping strategy,
could significantly improve computational efficiency on most quadratic optimization problems.
Keywords Quadratic Optimization · ADMM · Decomposition · Randomization ·
Machine learning applications
✉ Krešimir Mihić
Extended author information available on the last page of the article
Mathematics Subject Classification 90C20 · 65K05 · 90-04
1 Introduction
In this paper we consider the linearly constrained convex minimization model with
an objective function that is the sum of multiple separable functions and a coupled
quadratic function:
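$$
\begin{aligned}
\min_{x} \quad & f(x) = \tfrac{1}{2} x^T H x + c^T x \\
\text{s.t.} \quad & \sum_{i=1}^{p} A_i x_i = b, \\
& x \in \mathcal{X}
\end{aligned}
\tag{1}
$$

where $H \in \mathbb{R}^{n \times n}$ is a symmetric positive semidefinite matrix, $c \in \mathbb{R}^n$ is a vector, and the problem parameters are the matrix $A = [A_1, \dots, A_p]$, $A_i \in \mathbb{R}^{m \times d_i}$, $i = 1, 2, \dots, p$, with $\sum_{i=1}^{p} d_i = n$, and the vector $b \in \mathbb{R}^m$. The constraint set $\mathcal{X}$ is the Cartesian product of possibly non-convex real, closed, nonempty sets, $\mathcal{X} = \mathcal{X}_1 \times \dots \times \mathcal{X}_p$, where $x_i \in \mathcal{X}_i \subseteq \mathbb{R}^{d_i}$.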
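As a point of reference, consider the special case in which the set constraint is absent, i.e. the constraint set is all of R^n (an assumption made only for this illustration; problem (1) allows more general sets). The optimality conditions of (1) then form a linear system, H x + c - A^T y = 0 and A x = b, where y is the multiplier of the equality constraint. The sketch below solves that system with NumPy. The helper name `solve_kkt` and all numerical data are hypothetical and are not taken from the paper.

```python
import numpy as np

def solve_kkt(H, c, A, b):
    """Solve the KKT system of  min 0.5 x^T H x + c^T x  s.t.  A x = b.

    Hypothetical helper for illustration only. It assumes there is no set
    constraint and that the KKT matrix is nonsingular.
    """
    n, m = H.shape[0], A.shape[0]
    # Stationarity: H x - A^T y = -c.  Feasibility: A x = b.
    K = np.block([[H, -A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

# Made-up data: n = 4 variables in p = 2 blocks of size 2, m = 1 constraint.
H = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.2, 0.0],
              [0.0, 0.2, 1.5, 0.1],
              [0.0, 0.0, 0.1, 1.0]])   # symmetric, positive definite
c = np.array([1.0, -1.0, 0.5, 0.0])
A = np.array([[1.0, 1.0, 1.0, 1.0]])  # A = [A_1, A_2]
b = np.array([1.0])

x_star, y_star = solve_kkt(H, c, A, b)
```

A direct solve of this kind needs one factorization of the full (n + m) by (n + m) matrix, which is the cost that block-wise methods aim to avoid on large problems.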
Problem (1) naturally arises from applications such as machine and statistical
learning, image processing, portfolio management, tensor decomposition, matrix completion or decomposition, manifold optimization, data clustering and many other
problems of practical importance. To solve problem (1), we consider in particular
a randomly assembled multi-block and cyclic alternating direction method of multipliers (RAC-ADMM), a novel algorithm intended to mitigate
the slow convergence and divergence of the classical alternating direction method
of multipliers (ADMM) when applied to problems with cross-block coupled variables.
ADMM was originally proposed in the 1970s [31,32], and after a long period of little attention it has recently gained popularity for a broad spectrum of
applications [28,41,44,57,67]. Problems successfully solved by ADMM range from
classical linear programming (LP), semidefinite programming (SDP) and quadratically
constrained quadratic programming (QCQP) applied to partial differential equations,
mechanics, image processing, statistical learning, computer vision and similar problems (for examples see [10,39,45,53,58,70]) to emerging areas such as deep learning
[71], medical treatment [81] and social networking [2]. ADMM has been shown to be a good
choice for problems where high accuracy is not required but a “good enough”
solution must be found quickly.
Cyclic multi-block ADMM (C-ADMM) is an iterative algorithm that embeds
a Gauss-Seidel decomposition into each iteration of the augmented Lagrangian
method (ALM) [36,59]. It consists of a cyclic update of the blocks of primal variables,
$x_i \in \mathcal{X}_i$, $x = (x_1, \dots, x_p)$, and a dual-ascent-type update of the variable $y \in \mathbb{R}^m$, i.e.,
$$
\text{C-ADMM} := \begin{cases}
x_1^{k+1} = \arg\min_{x_1 \in \mathcal{X}_1} L_\beta(x_1, x_2^k, x_3^k, \dots, x_p^k; y^k), \\
\quad \vdots \\
x_p^{k+1} = \arg\min_{x_p \in \mathcal{X}_p} L_\beta(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, \dots, x_p; y^k), \\
y^{k+1} = y^k - \beta \left( \sum_{i=1}^{p} A_i x_i^{k+1} - b \right)
\end{cases}
\tag{2}
$$
where $\beta > 0$ is a penalty parameter of the augmented Lagrangian function $L_\beta$,

$$
L_\beta(x_1, \dots, x_p; y) := f(x) - y^T \left( \sum_{i=1}^{p} A_i x_i - b \right) + \frac{\beta}{2} \left\| \sum_{i=1}^{p} A_i x_i - b \right\|^2.
\tag{3}
$$
Note that the classical ADMM [31,32] admits only optimization problems that are
separable in blocks of variables and with p = 2.
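The cyclic update (2) can be sketched in a few lines for the special case where every block is unconstrained, so that each block minimization of the augmented Lagrangian (3) reduces to a small linear solve. This is a minimal sketch under that assumption, not the authors' implementation. The function name `c_admm_sweep`, the data, and the choice β = 1 are hypothetical, and each block matrix H_ii + β A_i^T A_i is assumed to be nonsingular.

```python
import numpy as np

def c_admm_sweep(H, c, A, b, blocks, x, y, beta):
    """One cyclic pass over the blocks as in (2), followed by the dual step.

    Assumes unconstrained blocks; `blocks` is a list of index arrays that
    partition range(n). Illustrative only.
    """
    x = x.copy()
    for idx in blocks:  # fixed cyclic order 1, ..., p
        rest = np.setdiff1d(np.arange(x.size), idx)
        Ai, Ar = A[:, idx], A[:, rest]
        # Zero gradient of L_beta with respect to x_i, other blocks fixed:
        # (H_ii + beta A_i^T A_i) x_i
        #     = -c_i - H_ir x_r + A_i^T y - beta A_i^T (A_r x_r - b)
        lhs = H[np.ix_(idx, idx)] + beta * Ai.T @ Ai
        rhs = (-c[idx] - H[np.ix_(idx, rest)] @ x[rest]
               + Ai.T @ y - beta * Ai.T @ (Ar @ x[rest] - b))
        x[idx] = np.linalg.solve(lhs, rhs)  # uses already-updated blocks
    y = y - beta * (A @ x - b)              # dual update
    return x, y

# Made-up data: n = 4, p = 2 blocks, m = 1 constraint, coupled objective.
H = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.2, 0.0],
              [0.0, 0.2, 1.5, 0.1],
              [0.0, 0.0, 0.1, 1.0]])
c = np.array([1.0, -1.0, 0.5, 0.0])
A = np.array([[1.0, 1.0, 1.0, 1.0]])
b = np.array([1.0])
blocks = [np.array([0, 1]), np.array([2, 3])]

x, y = np.zeros(4), np.zeros(1)
for _ in range(200):
    x, y = c_admm_sweep(H, c, A, b, blocks, x, y, beta=1.0)
```

Each sweep factorizes only d_i by d_i matrices rather than one n by n matrix. The iteration count here is arbitrary, and the sketch makes no claim about convergence for general data or for p > 2.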
Another variant of multi-block ADMM was suggested in [5], where the authors
introduce the distributed multi-block ADMM (D-ADMM) for separable problems.
The method creates a Dantzig-Wolfe-Benders decomposition structure and sequentially solves a “master” problem followed by solving distributed multi-block “slave”
problems. It converts the multi-block problem into an equivalent two-block problem
via variable splitting [6] and performs a separate augmented Lagrangian minimization
over $x_i$. The method assumes that the objective function is separable across blocks,
$f(x) = \sum_i f_i(x_i) + c^T x$, and has no proven guarantee for problems with
non-separable objective functions.
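$$
\text{D-ADMM} := \begin{cases}
\text{Update } x_i,\ i = 1, \dots, p: \\
\quad x_i^{k+1} = \arg\min_{x_i \in \mathcal{X}_i} f_i(x_i) - (y^k)^T (A_i x_i - \lambda_i^k) + \frac{\beta}{2} \left\| A_i x_i - \lambda_i^k \right\|^2 \\
\text{Update } \lambda_i,\ i = 1, \dots, p: \\
\quad \lambda_i^{k+1} = A_i x_i^{k+1} - \frac{1}{p} \left( \sum_{j=1}^{p} A_j x_j^{k+1} - b \right) \\
y^{k+1} = y^k - \frac{\beta}{p} \left( \sum_{i=1}^{p} A_i x_i^{k+1} - b \right).
\end{cases}
\tag{4}
$$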
Because of the variable splitting, the distributed ADMM approach based on (4)
increases the number of variables and constraints in the problem, which in turn makes
the algorithm not very efficient for large p in practice.
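For comparison, one iteration of (4) can be sketched as follows. The sketch assumes unconstrained blocks and quadratic block objectives f_i(x_i) = 0.5 x_i^T H_i x_i + c_i^T x_i, with the linear term folded into f_i; both are assumptions made for this illustration. The function name `d_admm_step`, the data, and β = 1 are hypothetical.

```python
import numpy as np

def d_admm_step(Hs, cs, As, b, lams, y, beta):
    """One iteration of (4) with quadratic f_i and unconstrained blocks.

    Illustrative only; assumes each H_i + beta A_i^T A_i is nonsingular.
    """
    p = len(As)
    # The x_i updates do not depend on each other and could run in parallel.
    # Zero gradient: (H_i + beta A_i^T A_i) x_i = -c_i + A_i^T y + beta A_i^T lam_i
    xs = [np.linalg.solve(Hi + beta * Ai.T @ Ai,
                          -ci + Ai.T @ y + beta * Ai.T @ lam)
          for Hi, ci, Ai, lam in zip(Hs, cs, As, lams)]
    Ax = [Ai @ xi for Ai, xi in zip(As, xs)]
    r = sum(Ax) - b                     # residual of the coupling constraint
    lams = [Axi - r / p for Axi in Ax]  # lambda_i update
    y = y - (beta / p) * r              # dual update
    return xs, lams, y

# Made-up separable data: p = 2 blocks of size 2, m = 1 constraint.
Hs = [np.array([[2.0, 0.5], [0.5, 1.0]]),
      np.array([[1.5, 0.1], [0.1, 1.0]])]
cs = [np.array([1.0, -1.0]), np.array([0.5, 0.0])]
As = [np.array([[1.0, 1.0]]), np.array([[1.0, 1.0]])]
b = np.array([1.0])

lams, y = [np.zeros(1), np.zeros(1)], np.zeros(1)
for _ in range(500):
    xs, lams, y = d_admm_step(Hs, cs, As, b, lams, y, beta=1.0)
```

The extra storage is visible in the sketch: every block carries its own m-dimensional vector λ_i, so the number of auxiliary variables grows with p.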
The classical two-block ADMM (Eq. 2 with p = 2) and its convergence have been
extensively studied in the literature (e.g. [20,22,31,35,54]). However, the two-block
variable structure of the ADMM still limits the practical computational efficiency of
the method, because one factoriza (...truncated)