Fast Optimistic Gradient Descent Ascent (OGDA) Method in Continuous and Discrete Time

Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-023-09636-5

Radu Ioan Boţ¹ · Ernö Robert Csetnek¹ · Dang-Khoa Nguyen¹˒²

Received: 25 March 2022 / Revised: 16 June 2023 / Accepted: 11 September 2023
© The Author(s) 2023

¹ Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria
² Faculty of Mathematics and Computer Science, University of Science, Vietnam National University, Ho Chi Minh City 700000, Vietnam

Communicated by Jérôme Bolte.
Radu Ioan Boţ: Research partially supported by FWF (Austrian Science Fund), Projects W 1260 and P 34922-N.
Ernö Robert Csetnek: Research partially supported by FWF (Austrian Science Fund), Project P 29809-N32.
Dang-Khoa Nguyen: Research supported by FWF (Austrian Science Fund), Project P 34922-N.

Abstract

In the framework of real Hilbert spaces, we study continuous in time dynamics as well as numerical algorithms for the problem of approaching the set of zeros of a single-valued monotone and continuous operator $V$. The starting point of our investigations is a second-order dynamical system that combines a vanishing damping term with the time derivative of $V$ along the trajectory, which can be seen as an analogue of the Hessian-driven damping in case the operator originates from a potential. Our method exhibits fast convergence rates of order $o\left(\frac{1}{t\beta(t)}\right)$ for $\|V(z(t))\|$, where $z(\cdot)$ denotes the generated trajectory and $\beta(\cdot)$ is a positive nondecreasing function satisfying a growth condition, and also for the restricted gap function, which is a measure of optimality for variational inequalities. We also prove the weak convergence of the trajectory to a zero of $V$.

Temporal discretizations of the dynamical system generate implicit and explicit numerical algorithms, which can both be seen as accelerated versions of the Optimistic Gradient Descent Ascent (OGDA) method for monotone operators, for which we prove that the generated sequence of iterates $(z^k)_{k \geq 0}$ shares the asymptotic features of the continuous dynamics. In particular, we show for the implicit numerical algorithm convergence rates of order $o\left(\frac{1}{k\beta_k}\right)$ for $\|V(z^k)\|$ and the restricted gap function, where $(\beta_k)_{k \geq 0}$ is a positive nondecreasing sequence satisfying a growth condition. For the explicit numerical algorithm, we show, by additionally assuming that the operator $V$ is Lipschitz continuous, convergence rates of order $o\left(\frac{1}{k}\right)$ for $\|V(z^k)\|$ and the restricted gap function. All convergence rate statements are last iterate convergence results; in addition to these, we prove for both algorithms the convergence of the iterates to a zero of $V$. To our knowledge, our study exhibits the best-known convergence rate results for monotone equations. Numerical experiments indicate the overwhelming superiority of our explicit numerical algorithm over other methods designed to solve monotone equations governed by monotone and Lipschitz continuous operators.

Keywords Monotone equation · Variational inequality · Optimistic Gradient Descent Ascent (OGDA) method · Extragradient method · Nesterov’s accelerated gradient method · Lyapunov analysis · Convergence rates · Convergence of trajectories · Convergence of iterates

Mathematics Subject Classification 47J20 · 47H05 · 65K10 · 65K15 · 65Y20 · 90C30 · 90C52

1 Introduction

Let $H$ be a real Hilbert space and $V : H \to H$ a monotone and continuous operator. We are interested in developing fast converging methods aimed to find a zero of $V$, or in other words, to solve the monotone equation

$$V(z) = 0, \tag{1}$$

for which we assume that it has a nonempty solution set $Z$.
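To make the setting of the monotone equation concrete, the following minimal sketch checks the defining inequality of monotonicity, $\langle V(z) - V(w), z - w \rangle \geq 0$, on random pairs of points. The operator used here is a hypothetical linear map on $\mathbb{R}^2$; the matrix `M`, the sampling range and the function names are assumptions chosen for illustration and are not taken from the paper.

```python
import random

# Hypothetical example operator on R^2 (an assumption, not from the paper):
# V(z) = M z with M = [[1, 2], [-2, 1]]. The symmetric part of M is the
# identity, so V is monotone, and its only zero is z = (0, 0).
M = ((1.0, 2.0), (-2.0, 1.0))


def V(z):
    """Apply the linear operator V(z) = M z to a point z in R^2."""
    return (M[0][0] * z[0] + M[0][1] * z[1],
            M[1][0] * z[0] + M[1][1] * z[1])


def monotonicity_gap(z, w):
    """Return <V(z) - V(w), z - w>; nonnegative for a monotone operator."""
    vz, vw = V(z), V(w)
    return (vz[0] - vw[0]) * (z[0] - w[0]) + (vz[1] - vw[1]) * (z[1] - w[1])


def min_gap_over_samples(n_samples=1000, seed=0):
    """Smallest monotonicity gap observed over random pairs (z, w)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_samples):
        z = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        w = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        gaps.append(monotonicity_gap(z, w))
    return min(gaps)


if __name__ == "__main__":
    print("smallest sampled gap:", min_gap_over_samples())
    print("V at the origin:", V((0.0, 0.0)))
```

A sampled check of this kind can only fail to find a violation; it does not prove monotonicity. For this particular `M` the gap equals $\|z - w\|^2$ analytically, which is why the check passes.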
The monotonicity and the continuity of $V$ imply that $z_*$ is a solution of (1) if and only if it is a solution of the following variational inequality

$$\langle z - z_*, V(z) \rangle \geq 0 \quad \forall z \in H. \tag{2}$$

One of the main motivations to study (1) comes from minimax problems. More precisely, consider the problem

$$\min_{x \in X} \max_{y \in Y} \Phi(x, y), \tag{3}$$

where $X$ and $Y$ are real Hilbert spaces and $\Phi : X \times Y \to \mathbb{R}$ is a continuously differentiable and convex–concave function, i.e., $\Phi(\cdot, y)$ is convex for every $y \in Y$ and $\Phi(x, \cdot)$ is concave for every $x \in X$. A solution of (3) is a saddle point $(x_*, y_*) \in X \times Y$ of $\Phi$, which means that it fulfills

$$\Phi(x_*, y) \leq \Phi(x_*, y_*) \leq \Phi(x, y_*) \quad \forall (x, y) \in X \times Y$$

or, equivalently,

$$\begin{cases} \nabla_x \Phi(x_*, y_*) = 0 \\ -\nabla_y \Phi(x_*, y_*) = 0. \end{cases} \tag{4}$$

Since the mapping

$$(x, y) \mapsto \big(\nabla_x \Phi(x, y), -\nabla_y \Phi(x, y)\big) \tag{5}$$

is monotone [43], the problem of finding a saddle point of $\Phi$ eventually brings us back to problem (1).

Both (1) and (3) are fundamental models in various fields such as optimization, economics, game theory and partial differential equations. They have recently regained significant attention, in particular in the machine learning and data science community, due to the fundamental role they play, for instance, in multi-agent reinforcement learning [37], robust adversarial learning [32] and generative adversarial networks (GANs) [18, 24].

In this paper, we develop fast continuous in time dynamics as well as numerical algorithms for solving (1) and investigate their asymptotic/convergence properties. First we formulate a second-order dynamical system that combines a vanishing damping term with the time derivative of $V$ along the trajectory, which can be seen as an analogue of the Hessian-driven damping in case the operator originates from a potential. A continuously differentiable and nondecreasing function $\beta : [t_0, +\infty) \to (0, +\infty)$, which appears in the system, plays an important role in the analysis.
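The reduction of a saddle point problem to a monotone equation described above can be illustrated with a small sketch. It assumes the bilinear objective `Phi(x, y) = x * y` on $\mathbb{R} \times \mathbb{R}$ (the name `Phi`, this choice of objective, the step size and the iteration count are all assumptions for illustration), whose associated operator is $V(x, y) = (y, -x)$ with unique zero $(0, 0)$. The code compares plain gradient descent ascent with the classical OGDA update $z^{k+1} = z^k - 2sV(z^k) + sV(z^{k-1})$; this is the standard non-accelerated OGDA step, not the accelerated scheme developed in the paper.

```python
import math


def V(z):
    """Operator of the assumed objective Phi(x, y) = x * y:
    (dPhi/dx, -dPhi/dy) = (y, -x)."""
    x, y = z
    return (y, -x)


def norm(z):
    """Euclidean norm of a point in R^2."""
    return math.hypot(z[0], z[1])


def gda(z0, step, iters):
    """Plain gradient descent ascent: z <- z - step * V(z)."""
    z = z0
    for _ in range(iters):
        v = V(z)
        z = (z[0] - step * v[0], z[1] - step * v[1])
    return z


def ogda(z0, step, iters):
    """Classical OGDA: z_next = z - 2*step*V(z) + step*V(z_prev).
    The previous iterate is initialised to z0 (an assumption)."""
    z_prev, z = z0, z0
    for _ in range(iters):
        v, v_prev = V(z), V(z_prev)
        z_next = (z[0] - 2 * step * v[0] + step * v_prev[0],
                  z[1] - 2 * step * v[1] + step * v_prev[1])
        z_prev, z = z, z_next
    return z


if __name__ == "__main__":
    z0 = (1.0, 1.0)
    print("GDA  distance to saddle point:", norm(gda(z0, 0.3, 500)))
    print("OGDA distance to saddle point:", norm(ogda(z0, 0.3, 500)))
```

On this example each plain GDA step multiplies the norm of the iterate by $\sqrt{1 + s^2}$, so the iterates spiral away from the saddle point, while the OGDA iterates approach it for the chosen step size. This is the standard motivation for the "optimistic" correction term.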
If $\beta$ satisfies a specific growth condition, which is for instance satisfied by polynomials, including constant functions, then the method exhibits convergence rates of order $o\left(\frac{1}{t\beta(t)}\right)$ for $\|V(z(t))\|$, where $z(t)$ denotes the generated trajectory, and for the restricted gap function associated with (2). In addition, $z(t)$ converges weakly to a solution of (1) as $t \to +\infty$.

By considering a temporal discretization of the dynamical system, we obtain an implicit numerical algorithm which exhibits convergence rates of order $o\left(\frac{1}{k\beta_k}\right)$ for $\|V(z^k)\|$ and the restricted gap function associated with (2), where $(\beta_k)_{k \geq 0}$ is a nondecreasing sequence and $(z^k)_{k \geq 0}$ is the generated sequence of iterates. For the latter, we also prove that it converges weakly to a solution of (1).

By a further, more involved discretization of the dynamical system, we obtain an explicit numerical algorithm, which, under the additional assumption that $V$ is Lipschitz continuous, exhibits convergence rates of order $o\left(\frac{1}{k}\right)$ for $\|V(z^k)\|$ and the restricted gap function associated with (2), where $(z^k)_{k \geq 0}$ is the generated sequence of iterates, which is also shown to converge weakly to a solution of (1). The resulting numerical schem (...truncated)