A Survey on Some Recent Developments of Alternating Direction Method of Multipliers


Journal of the Operations Research Society of China
https://doi.org/10.1007/s40305-021-00368-3

De-Ren Han
LMIB of the Ministry of Education, School of Mathematical Sciences, Beihang University, Beijing 100191, China

Received: 1 February 2020 / Revised: 8 September 2021 / Accepted: 9 September 2021
© The Author(s) 2021

This work is supported by the National Natural Science Foundation of China (Nos. 11625105 and 12131004).

Abstract
Recently, the alternating direction method of multipliers (ADMM) has attracted much attention from various fields, and many variants have been tailored to different models. Moreover, theoretical studies of the method, such as its rate of convergence and its extension to nonconvex problems, have also made much progress. In this paper, we give a survey on some recent developments of ADMM and its variants.

Keywords Alternating direction method of multipliers · Global convergence · Rate of convergence · Nonconvex optimization

Mathematics Subject Classification 90C30 · 90C33 · 65K05

1 Introduction

In this paper, we survey the developments of the alternating direction method of multipliers (ADMM) and its variants for solving the minimization problem with linear constraints and a separable objective function, which is the sum of many individual functions without coupled variables:

$$\min \left\{ \sum_{i=1}^{m} \theta_i(x_i) \;\middle|\; \sum_{i=1}^{m} A_i x_i = b,\; x_i \in \mathcal{X}_i,\; i = 1, \cdots, m \right\}, \tag{1}$$

where $\theta_i : \mathbb{R}^{n_i} \to \mathbb{R} \cup \{\infty\}$ are closed proper functions; $A_i \in \mathbb{R}^{l \times n_i}$; $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ are closed and convex nonempty sets; $b \in \mathbb{R}^{l}$; and $\sum_{i=1}^{m} n_i = n$.

Although the model (1) is a special linearly constrained optimization problem, it is rich enough to characterize many optimization problems arising in various application fields, e.g., the image alignment problem in [1], the robust principal component analysis model with noisy and incomplete data in [2], the latent variable Gaussian graphical model selection in [3], the quadratic discriminant analysis model in [4] and the quadratic conic programming in [5], to list just a few. We now give some concrete application models.

$\ell_1$-norm minimization: In some applications such as statistics, machine learning, and signal processing, one wants to find a 'sparse' solution from the given data. Let $b \in \mathbb{R}^{l}$ denote the observed signal, which we know comes from a linear transformation $A \in \mathbb{R}^{l \times n}$ with $l \ll n$. The task is to find the sparsest solution, i.e., the vector that contains as many zero elements as possible and satisfies the equation $Ax = b$. Let $\|y\|_0$ denote the number of nonzero elements of the vector $y$; then we can formulate the problem as

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2 + \mu \|x\|_0, \tag{2}$$

where $\mu > 0$ is a scalar regularization parameter that is usually chosen by cross-validation. Introducing a new variable $y$, we can reformulate (2) as

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_0 \;\middle|\; x = y \right\}, \tag{3}$$

which is a special case of (1) with $m = 2$ (with $\theta_1(x) = \frac{1}{2}\|Ax - b\|^2$, $\theta_2(y) = \mu\|y\|_0$, constraint matrices $I$ and $-I$, and a zero right-hand side). Since the zero norm is discontinuous and nonconvex, researchers usually replace it with its convex hull, the $\ell_1$-norm. Then, (2) and (3) can be relaxed to

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2 + \mu \|x\|_1, \tag{4}$$

and

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_1 \;\middle|\; x = y \right\}, \tag{5}$$

respectively. The model (4) is just the well-known lasso [6].

A generalization of the above model requires that a linear transformation of the solution, rather than the solution itself, be sparse. The optimization model is

$$\min_{x \in \mathbb{R}^n} \; \frac{1}{2}\|Ax - b\|^2 + \mu \|Fx\|_0, \tag{6}$$

where $F$ is an arbitrary linear transformation.
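
As a small illustration of the penalized models (2) and (4) above, the following Python sketch evaluates both objectives on toy data. The function names, the matrix, and the two candidate vectors are hypothetical and chosen only for illustration; the sketch evaluates the objectives and does not solve either problem.

```python
# Minimal sketch: evaluate the l0-penalized objective (2) and the lasso objective (4).
# All names and data below are illustrative assumptions, not taken from the survey.

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def half_sq_residual(A, b, x):
    """Return 0.5 * ||Ax - b||^2."""
    return 0.5 * sum((r - t) ** 2 for r, t in zip(matvec(A, x), b))

def l0_objective(A, b, x, mu):
    """Objective of model (2): data fit plus mu times the number of nonzeros."""
    return half_sq_residual(A, b, x) + mu * sum(1 for v in x if v != 0)

def l1_objective(A, b, x, mu):
    """Objective of model (4): data fit plus mu times the l1-norm."""
    return half_sq_residual(A, b, x) + mu * sum(abs(v) for v in x)

# Toy underdetermined system (l = 2 < n = 3) with two exact solutions of Ax = b.
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 1.0]
mu = 0.5
x_sparse = [0.0, 0.0, 1.0]  # one nonzero entry
x_dense = [2.0, 1.0, 0.0]   # two nonzero entries

for name, x in [("sparse", x_sparse), ("dense", x_dense)]:
    print(name, l0_objective(A, b, x, mu), l1_objective(A, b, x, mu))
```

On this toy instance both vectors fit the data exactly, and both penalties rank the sparser vector lower. This agreement is a property of the example, not a general guarantee.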
Again, after introducing an auxiliary variable, we get a special case of (1) with $m = 2$,

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_0 \;\middle|\; Fx = y \right\};$$

and replacing the zero norm with its convex hull, the $\ell_1$-norm, we obtain its relaxed model

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2}\|Ax - b\|^2 + \mu \|y\|_1 \;\middle|\; Fx = y \right\}, \tag{7}$$

which is called the generalized lasso. When $F$ is the difference matrix

$$F_{ij} = \begin{cases} 1, & j = i + 1, \\ -1, & j = i, \\ 0, & \text{otherwise}, \end{cases}$$

then $\|Fx\|_1$ is the total variation of $x$ [7], which finds wide applications in image processing. Other $\ell_1$-norm minimization models that can be reformulated into (1) include basis pursuit [8], Huber function fitting [9], group lasso [10], etc.

Matrix completion: In some applications such as the movie ratings in the Netflix problem, part of the data (elements of a matrix) is inaccessible, and the task is to fill in the missing entries of a partially observed matrix. That is, given a ratings matrix in which each entry $(i, j)$ represents the rating of movie $j$ by customer $i$ if customer $i$ has watched movie $j$ and is otherwise missing, we would like to predict the remaining entries. A property that helps to accomplish the task is that the desired matrix is low rank, or that its rank is known a priori; otherwise, the hidden entries could be assigned arbitrary values. Let $M$ be the matrix to be recovered and let $\Omega$ be the set of locations corresponding to the observed entries ($(i, j) \in \Omega$ if $M_{ij}$ is observed). The optimization model is [11]

$$\min_{x \in \mathbb{R}^{l \times n}} \left\{ \operatorname{rank}(x) \;\middle|\; x_{ij} = M_{ij}, \ \text{for } (i, j) \in \Omega \right\}. \tag{8}$$

Just as (5) relaxes and reformulates (2), we can relax and reformulate (8) to the convex separable problem

$$\min_{x, y \in \mathbb{R}^{l \times n}} \left\{ \|x\|_* \;\middle|\; x = y, \ y_{ij} = M_{ij}, \ \text{for } (i, j) \in \Omega \right\}, \tag{9}$$

where $\|x\|_*$ denotes the nuclear norm of the matrix $x$, which is defined as the sum of its singular values. Then, we obtain a special case of (1) for $m = 2$ and with matrix variables.
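
Two ingredients of the models above can be made concrete in a few lines: the difference matrix $F$ behind the total variation term, and the observed-entry constraint of the matrix completion models. The Python sketch below is illustrative only. It assumes that $F$ has size $(n-1) \times n$, which the text does not state, and all function names and data are hypothetical.

```python
# Minimal sketch: total variation via the difference matrix, and the
# observed-entry constraint y_ij = M_ij on a set of index pairs.
# The (n-1) x n shape of F is an assumption; names and data are illustrative.

def difference_matrix(n):
    """Build F with F[i][i] = -1, F[i][i+1] = 1, and zeros elsewhere (0-based)."""
    F = [[0.0] * n for _ in range(n - 1)]
    for i in range(n - 1):
        F[i][i] = -1.0
        F[i][i + 1] = 1.0
    return F

def total_variation(x):
    """Return ||Fx||_1, the sum of absolute differences of consecutive entries."""
    F = difference_matrix(len(x))
    Fx = [sum(f * v for f, v in zip(row, x)) for row in F]
    return sum(abs(d) for d in Fx)

def agrees_on_observed(y, M, omega):
    """Check y[i][j] == M[i][j] for every index pair (i, j) in omega."""
    return all(y[i][j] == M[i][j] for (i, j) in omega)

# Piecewise-constant signal with jumps of size 3 and 2, so its total variation is 5.0.
x = [1.0, 1.0, 4.0, 4.0, 2.0]
print(total_variation(x))

# Partially observed 2 x 2 matrix; None marks a missing entry.
M = [[5.0, None], [None, 3.0]]
omega = {(0, 0), (1, 1)}
y = [[5.0, 4.0], [1.0, 3.0]]  # a candidate completion
print(agrees_on_observed(y, M, omega))
```

Any completion that matches $M$ on the observed set passes the check, which is why a further criterion such as low rank or a small nuclear norm is needed to single out a meaningful solution.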
Robust principal component analysis: Given part of the elements of a data matrix which is the superposition of a low rank matrix and a sparse matrix, robust principal component analysis (RPCA) aims to recover each component individually [12]. Moreover, the given data may be corrupted by noise. As in the matrix completion example, let $\Omega$ be the set of locations corresponding to the given entries, and let $P_\Omega : \mathbb{R}^{l \times n} \to \mathbb{R}^{l \times n}$ be the orthogonal projection onto the span of matrices vanishing outside of $\Omega$, i.e., the $ij$-th entry of $P_\Omega(x)$ is $x_{ij}$ if $(i, j) \in \Omega$ and zero otherwise. The optimization model for the robust principal component analysis problem is

$$\min_{x \in \mathbb{R}^{l \times n}} \left\{ \operatorname{rank}(x) + \tau_1 \|y\|_0 + \tau_2 \|P_\Omega(z)\|_F^2 \;\middle|\; x + y + z = M \right\}, \tag{10}$$

where $M$ is the given data and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Replacing the rank and the zero norm with their convex hulls, we obtain a relaxation [2]

$$\min_{x \in \mathbb{R}^{l \times n}} \left\{ \|x\|_* + \tau_1 \|y\|_1 + \tau_2 \|P_\Omega(z)\|_F^2 \;\middle|\; x + y + z = M \right\}. \tag{11}$$

Both (10) and (11) fall into the framework of (1) with $m = 3$.

Note that the original application models (2), (6), (8) and (10) contain discrete terms such as the $\ell_0$-norm and the rank function, and they are nonconvex optimization problems. Solving these optimization problems is usually NP-hard (nondeterministic polynomial hard). Researchers therefore turn heuristically to solving the relaxed problems (5), (7), (9) and (11). Fortunately, under suitable conditions, the relaxed problem and the original one share the same solutions [11,13]. The relaxati (...truncated)