Continuum Limit of Lipschitz Learning on Graphs

Foundations of Computational Mathematics, Jan 2022

Tackling semi-supervised learning problems with graph-based methods has become a trend in recent years since graphs can represent all kinds of data and provide a suitable framework for studying continuum limits, for example, of differential operators. A popular strategy here is p-Laplacian learning, which poses a smoothness condition on the sought inference function on the set of unlabeled data. For $$p<\infty$$, continuum limits of this approach were studied using tools from $$\varGamma$$-convergence. For the case $$p=\infty$$, which is referred to as Lipschitz learning, continuum limits of the related infinity Laplacian equation were studied using the concept of viscosity solutions. In this work, we prove continuum limits of Lipschitz learning using $$\varGamma$$-convergence. In particular, we define a sequence of functionals which approximate the largest local Lipschitz constant of a graph function and prove $$\varGamma$$-convergence in the $$L^{\infty}$$-topology to the supremum norm of the gradient as the graph becomes denser. Furthermore, we show compactness of the functionals, which implies convergence of minimizers. In our analysis we allow a varying set of labeled data which converges to a general closed set in the Hausdorff distance. We apply our results to nonlinear ground states, i.e., minimizers with constrained $$L^p$$-norm, and, as a by-product, prove convergence of graph distance functions to geodesic distance functions.



https://doi.org/10.1007/s10208-022-09557-9

Tim Roith¹ · Leon Bungert²

Received: 14 January 2021 / Revised: 29 November 2021 / Accepted: 9 December 2021
© The Author(s) 2022

Keywords Lipschitz learning · Graph-based semi-supervised learning · Continuum limit · Gamma-convergence · Ground states · Distance functions

Communicated by Alan Edelman.
¹ Department Mathematik, Friedrich-Alexander Universität Erlangen-Nürnberg, Cauerstraße 11, 91058 Erlangen, Germany
² Hausdorff Center for Mathematics, Rheinische Friedrich-Wilhelms-Universität Bonn, Endenicher Allee 62, Villa Maria, 53115 Bonn, Germany

Mathematics Subject Classification 35J20 · 35R02 · 65N12 · 68T05

1 Introduction

Several works in mathematical data science and machine learning have proven the importance of semi-supervised learning as an essential tool for data analysis, see [17,38–41]. Many classification tasks and problems in image analysis (see, e.g., [17] for an overview) traditionally require an expert examining the data by hand, and this so-called labeling process is often a time-consuming and expensive task. In contrast, one typically faces an abundance of unlabeled data which one would also like to equip with suitable labels. This is the key goal of the semi-supervised learning problem, which mathematically can be formulated as the extension of a labeling function $$g : O \to \mathbb{R}$$ onto the whole data set $$\overline{V} := V \cup O$$, where $$O$$ denotes the set of labeled and $$V$$ the set of unlabeled data.

In most cases, the underlying data can be represented as a finite weighted graph $$(\overline{V}, \omega)$$, composed of vertices $$\overline{V}$$ and a weight function $$\omega$$ assigning similarity values to pairs of vertices, which provides a convenient mathematical framework. A popular method to generate a unique extension of the labeling function to the whole data set is so-called p-Laplacian regularization, which can be formulated as the minimization task

$$\min_{u:\overline{V}\to\mathbb{R}} \sum_{x,y\in\overline{V}} \omega(x,y)^p \, |u(x)-u(y)|^p \quad \text{subject to } u = g \text{ on } O, \qquad (1)$$

over all graph functions $$u : \overline{V} \to \mathbb{R}$$ subject to a constraint given by the labels on $$O$$, see, e.g., [2,22,35,38]. This method is equivalent to solving the p-Laplacian partial differential equation on graphs [19] and thereby introduces a certain amount of smoothness in the labeling function.
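To make problem (1) concrete, the following is a minimal sketch for the special case p = 2 on a tiny path graph. It is not the authors' implementation: the function names `p_energy` and `harmonic_extension`, the dense weight matrix, and the Gauss-Seidel solver are all assumptions chosen for illustration, and the weights are assumed symmetric.

```python
def p_energy(weights, u, p):
    """Energy from (1): sum over ordered pairs of w(x, y)^p * |u(x) - u(y)|^p."""
    n = len(u)
    return sum(
        weights[x][y] ** p * abs(u[x] - u[y]) ** p
        for x in range(n)
        for y in range(n)
    )


def harmonic_extension(weights, labels, sweeps=500):
    """Gauss-Seidel sweeps for the p = 2 case of (1), assuming symmetric weights.

    `labels` maps labeled vertex indices to their values g; these stay fixed.
    Each unlabeled value is replaced by the w^2-weighted mean of the other
    vertices, which is the first-order optimality condition of the energy.
    """
    n = len(weights)
    u = [labels.get(x, 0.0) for x in range(n)]
    for _ in range(sweeps):
        for x in range(n):
            if x in labels:
                continue
            total = sum(weights[x][y] ** 2 for y in range(n) if y != x)
            if total > 0:
                u[x] = sum(
                    weights[x][y] ** 2 * u[y] for y in range(n) if y != x
                ) / total
    return u


# Path graph 0 - 1 - 2 - 3 with unit weights; the two end points are labeled.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
g = {0: 0.0, 3: 1.0}
u = harmonic_extension(W, g)
print(u, p_energy(W, u, 2))
```

On this path graph the computed extension interpolates linearly between the two labels, which is the kind of smoothness the energy in (1) rewards.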
Furthermore, continuum limits of this model as the number of unlabeled data tends to infinity were studied using tools from Γ-convergence [22,35,36] and PDEs [14–16] (see Sect. 1.2 for more details). Still, p-Laplacian regularization comes with the drawback that it is ill-posed if $$p$$ is smaller than the ambient space dimension, in the sense that the obtained solutions tend to be an average of the label values rather than properly incorporating the information. Extensive studies of this problem were carried out in [35,36].

To overcome this degeneracy, there are several options: In [16] it was investigated at which rates the number of labeled data has to grow to obtain a well-posed problem for $$p = 2$$ in (1). In [15] it was suggested to replace the pointwise constraint $$u = g$$ on $$O$$ with measure-valued source terms for the graph Laplacian equation. In contrast, in [2] the authors propose to consider the p-Laplacian regularization for large $$p$$. In order to have well-posedness for general space dimensions, one therefore considers the limit $$p \to \infty$$, which leads to the so-called Lipschitz learning problem

$$\min_{u:\overline{V}\to\mathbb{R}} \max_{x,y\in\overline{V}} \omega(x,y) \, |u(x)-u(y)| \quad \text{subject to } u = g \text{ on } O. \qquad (2)$$

While in the case $$p < \infty$$ one has the unique existence of solutions and equivalence of the p-Laplacian PDE and the energy minimization task, these properties are lost in the case $$p = \infty$$. One distinguished continuum model (in the sense that it admits unique solutions) connected to this problem is absolutely minimizing Lipschitz extensions and the associated infinity Laplacian equation (see, e.g., [3–5,20,27,34]). Using the concept of viscosity solutions, in [14] a convergence result on continuum limits for the infinity Laplacian equation on the flat torus was established, see again Sect. 1.2 for more details.
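The loss of uniqueness for p = ∞ can be seen on a very small graph. The sketch below is only an illustration under assumptions: the helper names `lipschitz_energy` and `p_energy_root` and the four-vertex example graph are hypothetical and not taken from the paper. It evaluates the functional in (2) for two different extensions of the same labels and also prints the p-th root of the energy in (1) for growing p.

```python
def lipschitz_energy(weights, u):
    """Functional from (2): largest weighted difference over all vertex pairs."""
    n = len(u)
    return max(
        weights[x][y] * abs(u[x] - u[y]) for x in range(n) for y in range(n)
    )


def p_energy_root(weights, u, p):
    """p-th root of the energy in (1), to compare against the max in (2)."""
    n = len(u)
    total = sum(
        (weights[x][y] * abs(u[x] - u[y])) ** p
        for x in range(n)
        for y in range(n)
    )
    return total ** (1.0 / p)


# Vertices 0 and 2 are labeled (g = 0 and g = 1); vertex 1 links them,
# and vertex 3 is an extra unlabeled vertex attached only to vertex 1.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
u_a = [0.0, 0.5, 1.0, 0.5]  # one feasible extension of the labels
u_b = [0.0, 0.5, 1.0, 0.9]  # a different extension with the same labels

# Both extensions attain the same value of the functional in (2).
print(lipschitz_energy(W, u_a), lipschitz_energy(W, u_b))

# The p-th root of the p-energy decreases towards the max as p grows.
for p in (2, 8, 64):
    print(p, p_energy_root(W, u_a, p))
```

In this example any value of u at vertex 3 within distance 0.5 of u(1) gives the same maximum, so the functional in (2) alone does not single out one extension.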
Still, in [30] the authors suggest that other Lipschitz extensions (next to the absolutely minimizing ones) are indeed relevant for machine learning tasks, but a rigorous continuum limit for general Lipschitz extensions has been pending.

The main goal of this paper is to derive a continuum limit for the Lipschitz learning problem (2), to which end we prove Γ-convergence and compactness of the functional in (2). We investigate novel smoothness conditions on the underlying domain which are special for this $$L^\infty$$-variational problem and originate from the discrepancy between the maximum local Lipschitz constant and the global one. We apply our results to minimizers of a Rayleigh quotient involving the $$L^\infty$$-norm of the gradient as first examined in [13]. The concrete outline of this paper can be found in Sect. 1.3.

1.1 Assumptions and Main Result

Let $$\Omega \subset \mathbb{R}^d$$, $$d \in \mathbb{N}$$, be an open and bounded domain, and let $$\Omega_n \subset \Omega$$ for $$n \in \mathbb{N}$$ denote a sequence of finite subsets. For each $$n \in \mathbb{N}$$ we consider the finite weighted graph $$(\Omega_n, \omega_n)$$, where $$\omega_n : \Omega_n \times \Omega_n \to [0, \infty)$$ is a weighting function which in our context is given as

$$\omega_n(x, y) := \eta_{s_n}(|x - y|) := \eta(|x - y| / s_n).$$

Here $$\eta : [0, \infty) \to [0, \infty)$$ denotes the kernel and $$s_n > 0$$ the scaling parameter. The edge set of the graph is implicitly characterized via the weighting function, i.e., for x, y (...truncated)
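The weight construction ω_n(x, y) = η(|x − y| / s_n) can be sketched as follows. Since the source text is truncated before the assumptions on the kernel are stated, the indicator kernel used here, the grid point set, the value of the scaling parameter, and the names `kernel` and `build_weights` are all hypothetical choices made only for illustration.

```python
import math


def kernel(t):
    """Hypothetical kernel eta: indicator of [0, 1]. Illustration only."""
    return 1.0 if t <= 1.0 else 0.0


def build_weights(points, s, eta=kernel):
    """Dense weight matrix with entries omega_n(x, y) = eta(|x - y| / s)."""
    return [[eta(math.dist(x, y) / s) for y in points] for x in points]


# Omega_n: a 4 x 4 grid of cell centres inside the open unit square (0, 1)^2.
m = 4
points = [((i + 0.5) / m, (j + 0.5) / m) for i in range(m) for j in range(m)]
s_n = 0.3  # assumed scaling parameter, slightly above the grid spacing 0.25

W = build_weights(points, s_n)

# Edges are exactly the pairs of distinct points with positive weight.
# Diagonal entries W[a][a] are positive too, but they never contribute to
# the functionals because u(x) - u(x) = 0.
edges = [
    (a, b)
    for a in range(len(points))
    for b in range(a + 1, len(points))
    if W[a][b] > 0
]
print(len(points), "vertices,", len(edges), "edges")
```

With this kernel, shrinking `s_n` while adding points keeps the graph local, which is the regime in which a continuum limit is studied.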


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007/s10208-022-09557-9.pdf
Article home page: https://link.springer.com/article/10.1007/s10208-022-09557-9

Roith, Tim, Bungert, Leon. Continuum Limit of Lipschitz Learning on Graphs, Foundations of Computational Mathematics, 2022, pp. 1-39, DOI: 10.1007/s10208-022-09557-9