Managing randomization in the multi-block alternating direction method of multipliers for quadratic optimization

Mathematical Programming Computation
https://doi.org/10.1007/s12532-020-00192-5

FULL LENGTH PAPER

Krešimir Mihić (1,2) · Mingxi Zhu (3) · Yinyu Ye (4)

Received: 17 April 2019 / Accepted: 24 August 2020
© The Author(s) 2020

Abstract

The Alternating Direction Method of Multipliers (ADMM) has gained a lot of attention for solving large-scale and objective-separable constrained optimization problems. However, the two-block variable structure of the ADMM still limits the practical computational efficiency of the method, because one big matrix factorization is needed at least once even for linear and convex quadratic programming. This drawback may be overcome by enforcing a multi-block structure of the decision variables in the original optimization problem. Unfortunately, the multi-block ADMM, with more than two blocks, is not guaranteed to converge. On the other hand, two positive developments have been made. First, if in each cyclic loop one randomly permutes the updating order of the multiple blocks, then the method converges in expectation for solving any system of linear equations with any number of blocks. Second, such a randomly permuted ADMM also works for equality-constrained convex quadratic programming even when the objective function is not separable. The goal of this paper is twofold. First, we add more randomness into the ADMM by developing a randomly assembled cyclic ADMM (RAC-ADMM) where the decision variables in each block are randomly assembled. We discuss the theoretical properties of RAC-ADMM, show when random assembling helps and when it hurts, and develop a criterion to guarantee that it converges almost surely.
Second, using the theoretical guidance on RAC-ADMM, we conduct multiple numerical tests on both randomly generated and large-scale benchmark quadratic optimization problems, including continuous problems, binary graph-partition and quadratic assignment problems, and selected machine learning problems. Our numerical tests show that RAC-ADMM, with a variable-grouping strategy, can significantly improve computational efficiency on most quadratic optimization problems.

Keywords Quadratic Optimization · ADMM · Decomposition · Randomization · Machine learning applications

Mathematics Subject Classification 90C20 · 65K05 · 90-04

Corresponding author: Krešimir Mihić. Extended author information available on the last page of the article.

1 Introduction

In this paper we consider the linearly constrained convex minimization model with an objective function that is the sum of multiple separable functions and a coupled quadratic function:

$$
\begin{aligned}
\min_{x} \quad & f(x) = \tfrac{1}{2} x^T H x + c^T x \\
\text{s.t.} \quad & \sum_{i=1}^{p} A_i x_i = b, \\
& x \in \mathcal{X},
\end{aligned}
\tag{1}
$$

where $H \in \mathbb{R}^{n \times n}$ is a symmetric positive semidefinite matrix, $c \in \mathbb{R}^n$ is a vector, and the problem parameters are the matrix $A = [A_1, \dots, A_p]$, $A_i \in \mathbb{R}^{m \times d_i}$, $i = 1, 2, \dots, p$, with $\sum_{i=1}^{p} d_i = n$, and the vector $b \in \mathbb{R}^m$. The constraint set $\mathcal{X}$ is the Cartesian product of possibly non-convex real, closed, nonempty sets, $\mathcal{X} = \mathcal{X}_1 \times \cdots \times \mathcal{X}_p$, where $x_i \in \mathcal{X}_i \subseteq \mathbb{R}^{d_i}$.

Problem (1) naturally arises from applications such as machine and statistical learning, image processing, portfolio management, tensor decomposition, matrix completion or decomposition, manifold optimization, data clustering and many other problems of practical importance.
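To make the block structure of problem (1) concrete, the following minimal sketch builds a tiny instance and solves it exactly. Everything in it is an illustrative assumption rather than material from the paper: the data `H`, `c`, `A_blocks`, `b`, the helper `solve_linear`, and the simplifying choice $\mathcal{X} = \mathbb{R}^n$, under which (with $H$ positive definite and $A$ of full row rank) the solution is characterized by the KKT system $Hx - A^T y = -c$, $Ax = b$ for a multiplier $y$. The sketch uses only the Python standard library and is not an ADMM implementation.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination in exact arithmetic."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * q for a, q in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

# Hypothetical data: n = 3 variables, p = 2 blocks of sizes d = (2, 1), m = 1.
H = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]            # symmetric positive definite
c = [-1, 0, 1]
A_blocks = [[[1, 1]],      # A_1 has shape m x d_1
            [[1]]]         # A_2 has shape m x d_2
b = [1]

n, m = len(c), len(b)
# A = [A_1, A_2]: concatenate the blocks column-wise.
A = [sum((blk[r] for blk in A_blocks), []) for r in range(m)]

# KKT system under the assumption X = R^n:  H x - A^T y = -c,  A x = b.
kkt = [H[i] + [-A[r][i] for r in range(m)] for i in range(n)]
kkt += [A[r] + [0] * m for r in range(m)]
z = solve_linear(kkt, [-ci for ci in c] + b)
x, y = z[:n], z[n:]

# Split x into blocks x_1, ..., x_p and evaluate the constraint as sum_i A_i x_i.
x_blocks, start = [], 0
for blk in A_blocks:
    d_i = len(blk[0])
    x_blocks.append(x[start:start + d_i])
    start += d_i
block_sum = [sum(sum(a * v for a, v in zip(blk[r], xb))
                 for blk, xb in zip(A_blocks, x_blocks)) for r in range(m)]

# Objective f(x) = 1/2 x^T H x + c^T x.
Hx = [sum(h * v for h, v in zip(row, x)) for row in H]
objective = sum(v * hv for v, hv in zip(x, Hx)) / 2 + sum(ci * v for ci, v in zip(c, x))

print("x =", [str(v) for v in x], "y =", [str(v) for v in y])
print("f(x) =", objective, " sum_i A_i x_i =", [str(v) for v in block_sum])
```

Solving the full KKT system requires factorizing one matrix of size $n + m$, which is only practical for small instances like this one; block-wise methods instead work with the smaller per-block subproblems.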
To solve problem (1), we consider in particular a randomly assembled multi-block and cyclic alternating direction method of multipliers (RAC-ADMM), a novel algorithm intended to mitigate the slow convergence and divergence of the classical alternating direction method of multipliers (ADMM) when applied to problems with cross-block coupled variables.

ADMM was originally proposed in the 1970s [31,32] and, after a long period of little attention, has recently gained popularity for a broad spectrum of applications [28,41,44,57,67]. Problems successfully solved by ADMM range from classical linear programming (LP), semidefinite programming (SDP) and quadratically constrained quadratic programming (QCQP) applied to partial differential equations, mechanics, image processing, statistical learning, computer vision and similar problems (for examples see [10,39,45,53,58,70]) to emerging areas such as deep learning [71], medical treatment [81] and social networking [2]. ADMM has been shown to be a good choice for problems where high accuracy is not required but a “good enough” solution must be found quickly.

Cyclic multi-block ADMM (C-ADMM) is an iterative algorithm that embeds a Gauss-Seidel decomposition into each iteration of the augmented Lagrangian method (ALM) [36,59]. It consists of a cyclic update of the blocks of primal variables, $x_i \in \mathcal{X}_i$, $x = (x_1, \dots, x_p)$, and a dual-ascent-type update of the variable $y \in \mathbb{R}^m$, i.e.,

$$
\text{C-ADMM} :=
\begin{cases}
x_1^{k+1} = \arg\min_{x_1 \in \mathcal{X}_1} \{ L_\beta(x_1, x_2^k, x_3^k, \dots, x_p^k; y^k) \}, \\
\quad \vdots \\
x_p^{k+1} = \arg\min_{x_p \in \mathcal{X}_p} \{ L_\beta(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, \dots, x_p; y^k) \}, \\
y^{k+1} = y^k - \beta \left( \sum_{i=1}^{p} A_i x_i^{k+1} - b \right),
\end{cases}
\tag{2}
$$

where $\beta > 0$ is a penalty parameter of the augmented Lagrangian function $L_\beta$,

$$
L_\beta(x_1, \dots, x_p; y^k) := f(x) - (y^k)^T \left( \sum_{i=1}^{p} A_i x_i - b \right) + \frac{\beta}{2} \left\| \sum_{i=1}^{p} A_i x_i - b \right\|^2.
\tag{3}
$$

Note that the classical ADMM [31,32] admits only optimization problems that are separable in blocks of variables and with $p = 2$.

Another variant of multi-block ADMM was suggested in [5], where the authors introduce the distributed multi-block ADMM (D-ADMM) for separable problems. The method creates a Dantzig-Wolfe-Benders decomposition structure and sequentially solves a “master” problem followed by distributed multi-block “slave” problems. It converts the multi-block problem into an equivalent two-block problem via variable splitting [6] and performs a separate augmented Lagrangian minimization over $x_i$. The method assumes that the objective function is separable across blocks, $f(x) = \sum_i f_i(x_i) + c^T x$, and is not proven to work for problems with non-separable objective functions. The D-ADMM updates are:

$$
\text{D-ADMM} :=
\begin{cases}
\text{Update } x_i,\ i = 1, \dots, p: \\
\quad x_i^{k+1} = \arg\min_{x_i \in \mathcal{X}_i} f_i(x_i) - (y^k)^T (A_i x_i - \lambda_i^k) + \frac{\beta}{2} \left\| A_i x_i - \lambda_i^k \right\|^2, \\
\text{Update } \lambda_i,\ i = 1, \dots, p: \\
\quad \lambda_i^{k+1} = A_i x_i^{k+1} - \frac{1}{p} \left( \sum_{j=1}^{p} A_j x_j^{k+1} - b \right), \\
y^{k+1} = y^k - \frac{\beta}{p} \left( \sum_{i=1}^{p} A_i x_i^{k+1} - b \right).
\end{cases}
\tag{4}
$$

Because of the variable splitting, the distributed ADMM approach based on (4) increases the number of variables and constraints in the problem, which makes the algorithm inefficient for large $p$ in practice.

The classical two-block ADMM (Eq. 2 with $p = 2$) and its convergence have been extensively studied in the literature (e.g. [20,22,31,35,54]). However, the two-block variable structure of the ADMM still limits the practical computational efficiency of the method, because one factoriza (...truncated)