Hybrid ADMM: a unifying and fast approach to decentralized optimization (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1186%2Fs13634-018-0589-x.pdf

Hybrid ADMM: a unifying and fast approach to decentralized optimization

Ma et al. EURASIP Journal on Advances in Signal (2018) 2018:73 Processing https://doi.org/10.1186/s13634-018-0589-x EURASIP Journal on Advances in Signal Processing RESEARCH Open Access Hybrid ADMM: a unifying and fast approach to decentralized optimization Meng Ma1 , Athanasios N. Nikolakopoulos2 and Georgios B. Giannakis1,2* Abstract The present work introduces the hybrid consensus alternating direction method of multipliers (H-CADMM), a novel framework for optimization over networks which unifies existing distributed optimization approaches, including the centralized and the decentralized consensus ADMM. H-CADMM provides a flexible tool that leverages the underlying graph topology in order to achieve a desirable sweet spot between node-to-node communication overhead and rate of convergence—thereby alleviating known limitations of both C-CADMM and D-CADMM. A rigorous analysis of the novel method establishes linear convergence rate and also guides the choice of parameters to optimize this rate. The novel hybrid update rules of H-CADMM lend themselves to “in-network acceleration” that is shown to effect considerable—and essentially “free-of-charge”—performance boost over the fully decentralized ADMM. Comprehensive numerical tests validate the analysis and showcase the potential of the method in tackling efficiently, widely useful learning tasks. Keywords: ADMM, Distributed optimization, Decentralized learning, Hybrid, Consensus 1 Introduction Recent advances in machine learning, signal processing, and data mining have led to important problems that can be formulated as distributed optimization over networks. Such problems entail parallel processing of data acquired by interconnected nodes and arise frequently in several applications, including data fusion and processing using sensor networks [1–4], vehicle coordination [5, 6], power state estimation [7], clustering [8], classification [9], regression [10], filtering [11], and demodulation [12, 13], to name a few. Among the candidate solvers for such problems, the alternating direction method of multipliers (ADMM) [14, 15] stands out as an efficient and easily implementable algorithm of choice that has attracted much interest in recent years [16–19], thanks to its simplicity, fast convergence, and easily decomposable structure. Many distributed optimization problems can be formulated in a consensus form and solved efficiently by ADMM [15, 20]. The solver involves two basic steps: *Correspondence: Department of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE, 55455 Minneapolis, USA 2 Digital Technology Center, University of Minnesota, 117 Pleasant ST, 55455 Minneapolis, USA 1 (i) a communication step for exchanging information with a central processing unit, the so-called fusion center (FC), and (ii) an update step for updating the local variables at each node. By alternating between the two, local iterates eventually converge to the global solution. This approach is referred to as centralized consensus ADMM (C-CADMM), and although it has been successfully applied in various settings, it may not always present the preferable solver. In large-scale systems for instance, the cost of connecting each node to the FC may become prohibitive as the overhead of communicating data to the FC may be overwhelming and the related storage requirement could surpass the capacity of a single FC. Furthermore, having one dedicated FC can lead to a single point of failure. In addition, there might be privacy-related issues that restrict access to private data. Decentralized optimization, on the other hand, forgoes with the FC by exchanging information only among single-hop neighbors. As long as the network is connected, local iterates can consent to the globally optimal decision variable, thanks to the aforementioned information exchange. This method—referred to as decentralized consensus ADMM (D-CADMM)—has attracted considerable interest; see e.g., [20] for a review of applications in communications and networking. In large-scale networks, © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Ma et al. EURASIP Journal on Advances in Signal Processing (2018) 2018:73 D-CADMM’s convergence slows down as the per-node information experiences large delays to reach remote destinations through multiple neighbor-to-neighbor communications. 1.1 Our contributions To address the aforementioned limitations, the present paper puts forth a novel decentralized framework, that we term hybrid consensus ADMM (H-CADMM), which unifies and markedly broadens C-CADMM and D-CADMM. Our contributions are in five directions: (i) H-CADMM features hybrid updates accommodating communications with both the FCs and single-hop neighbors, thus bridging centralized with fully decentralized updates. This makes H-CADMM appealing for large-scale networks with multiple local FCs—a situation none of the existing approached is designed to handle. (ii) A novel formulation of D-CADMM without duplicate constraints (dual variables commonly adopted by decentralized learning [7, 20, 21]) emerges simply by specializing the hybrid constraints to coincide with those arising from the purely neighborhood-based formulation. (iii) Linear convergence is established, along with a rate bound and specializes to C- CADMM and D-CADMM. The parameter setting to achieve the optimal bound is also provided. (iv) H-CADMM is flexible to deploy FCs as needed to maximize performance gains, thus striking a desirable trade-off between the number of FCs deployed and convergence gain sought. (v) The capability of handling hybrid constraints not only deals with mixed updates but also effects “in-network acceleration” in decentralized operation without incurring noticeable increase in the overall complexity. Page 2 of 17 programs is established in [35]; see also [18] where the cost is a sum of component costs. Global linear convergence of a more general form of ADMM is reported in [19], and linear convergence for a generalized formulation of consensus ADMM using the so-called “communication matrix” in [31]. Though D-CADMM has been applied to various problems [3, 4, 10, 12, 13, 36], its linear convergence remained open until recently [21] (see [37] for the weighted counterpart). A successive orthogonal projection approach for distributed learning over networked nodes is introduced in [38], where nodes cannot communicate, but each node can access only limited amounts of data, and agreement is enforced across nodes sharing the same data. A distributed ADMM algorithm that deals with node clusters was pr (...truncated)