Continuum Limit of Lipschitz Learning on Graphs
Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-022-09557-9
Tim Roith¹ · Leon Bungert²
Received: 14 January 2021 / Revised: 29 November 2021 / Accepted: 9 December 2021
© The Author(s) 2022
Abstract
Tackling semi-supervised learning problems with graph-based methods has become a
trend in recent years since graphs can represent all kinds of data and provide a suitable
framework for studying continuum limits, for example, of differential operators. A
popular strategy here is p-Laplacian learning, which poses a smoothness condition
on the sought inference function on the set of unlabeled data. For p < ∞ continuum
limits of this approach were studied using tools from Γ-convergence. For the case
p = ∞, which is referred to as Lipschitz learning, continuum limits of the related
infinity Laplacian equation were studied using the concept of viscosity solutions. In
this work, we prove continuum limits of Lipschitz learning using Γ-convergence. In
particular, we define a sequence of functionals which approximate the largest local
Lipschitz constant of a graph function and prove Γ-convergence in the L^∞-topology
to the supremum norm of the gradient as the graph becomes denser. Furthermore,
we show compactness of the functionals which implies convergence of minimizers.
In our analysis we allow a varying set of labeled data which converges to a general
closed set in the Hausdorff distance. We apply our results to nonlinear ground states,
i.e., minimizers with constrained L^p-norm, and, as a by-product, prove convergence
of graph distance functions to geodesic distance functions.
Keywords Lipschitz learning · Graph-based semi-supervised learning · Continuum
limit · Gamma-convergence · Ground states · Distance functions
Communicated by Alan Edelman.
Leon Bungert (corresponding author)
Tim Roith
1 Department Mathematik, Friedrich-Alexander Universität Erlangen-Nürnberg, Cauerstraße 11,
  91058 Erlangen, Germany
2 Hausdorff Center for Mathematics, Rheinische Friedrich-Wilhelms-Universität Bonn,
  Endenicher Allee 62, Villa Maria, 53115 Bonn, Germany
Mathematics Subject Classification 35J20 · 35R02 · 65N12 · 68T05
1 Introduction
Several works in mathematical data science and machine learning have proven the
importance of semi-supervised learning as an essential tool for data analysis, see
[17,38–41]. Many classification tasks and problems in image analysis (see, e.g., [17]
for an overview) traditionally require an expert examining the data by hand, and this
so-called labeling process is often a time-consuming and expensive task. In contrast,
one typically faces an abundance of unlabeled data which one would also like to equip
with suitable labels. This is the key goal of the semi-supervised learning problem
which mathematically can be formulated as the extension of a labeling function
\[ g : O \to \mathbb{R} \]
onto the whole data set V̄ := V ∪ O, where O denotes the set of labeled and V the
set of unlabeled data. In most cases, the underlying data can be represented as a finite
weighted graph (V̄, ω), composed of vertices V̄ and a weight function ω assigning
similarity values to pairs of vertices, which provides a convenient mathematical
framework. A popular method to generate a unique extension of the labeling function
to the whole data set is so-called p-Laplacian regularization, which can be formulated
as the minimization task
\[ \min_{u : \overline{V} \to \mathbb{R}} \sum_{x, y \in \overline{V}} \omega(x, y)^p \, |u(x) - u(y)|^p \quad \text{subject to } u = g \text{ on } O, \tag{1} \]
over all graph functions u : V̄ → R subject to a constraint given by the labels on O,
see, e.g., [2,22,35,38]. This method is equivalent to solving the p-Laplacian partial
differential equation on graphs [19] and thereby imposes a certain amount of
smoothness on the labeling function. Furthermore, continuum limits of this model
as the number of unlabeled data tends to infinity were studied using tools from
Γ-convergence [22,35,36] and PDEs [14–16] (see Sect. 1.2 for more details).
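As an illustration of problem (1), the following minimal sketch treats only the case p = 2, where the constrained minimization reduces to a linear system for the graph Laplacian. The function name `laplace_learning`, the toy path graph, and the label values are assumptions made for this example and do not come from the paper; the weight matrix is assumed symmetric.

```python
import numpy as np

def laplace_learning(weights, labeled, g):
    """Minimize sum_{x,y} weights[x, y] * (u[x] - u[y])**2 subject to u = g on `labeled`.

    Assumes `weights` is symmetric and every unlabeled vertex is connected
    (through some path) to a labeled vertex, so the linear system is solvable.
    """
    n = weights.shape[0]
    labeled = np.asarray(labeled)
    g = np.asarray(g, dtype=float)
    unlabeled = np.setdiff1d(np.arange(n), labeled)

    # Graph Laplacian L = D - W; the energy equals 2 * u^T L u for symmetric W.
    laplacian = np.diag(weights.sum(axis=1)) - weights

    # Optimality on the unlabeled vertices: L_UU u_U + L_UL g = 0.
    system = laplacian[np.ix_(unlabeled, unlabeled)]
    rhs = -laplacian[np.ix_(unlabeled, labeled)] @ g

    u = np.zeros(n)
    u[labeled] = g
    u[unlabeled] = np.linalg.solve(system, rhs)
    return u

# Toy path graph 0 - 1 - 2 - 3 - 4 with unit weights (hypothetical data).
n = 5
omega = np.zeros((n, n))
for i in range(n - 1):
    omega[i, i + 1] = omega[i + 1, i] = 1.0

# For p = 2 the weights entering (1) are omega**2; labels sit on the two end points.
u = laplace_learning(omega ** 2, labeled=[0, 4], g=[0.0, 1.0])
print(u)
```

On this path graph the computed extension interpolates linearly between the two labels. For general p the problem is nonlinear and would need an iterative solver, which this sketch does not attempt.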
Still, p-Laplacian regularization comes with the drawback that it is ill-posed if p
is smaller than the ambient space dimension in the sense that the obtained solutions
tend to be an average of the label values rather than properly incorporating the information. Extensive studies of this problem were carried out in [35,36]. To overcome
this degeneracy, there are several options: In [16] it was investigated at which rates
the number of labeled data has to grow to obtain a well-posed problem for p = 2
in (1). In [15] it was suggested to replace the pointwise constraint u = g on O with
measure-valued source terms for the graph Laplacian equation. In contrast, in [2] the
authors propose to consider the p-Laplacian regularization for large p. In order to
have well-posedness for general space dimensions, one therefore considers the limit
p → ∞, which leads to the so-called Lipschitz learning problem
\[ \min_{u : \overline{V} \to \mathbb{R}} \max_{x, y \in \overline{V}} \omega(x, y) \, |u(x) - u(y)| \quad \text{subject to } u = g \text{ on } O. \tag{2} \]
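The passage from the sum in (1) to the maximum in (2) can be checked numerically: taking the p-th root of the p-energy does not change its minimizers, and for a fixed graph function this root approaches the largest weighted difference as p grows. The sketch below is only an illustration of that elementary fact; the function names and the three-vertex example are assumptions, not part of the paper.

```python
import numpy as np

def weighted_differences(omega, u):
    """Matrix with entries omega[x, y] * |u[x] - u[y]|."""
    u = np.asarray(u, dtype=float)
    return omega * np.abs(u[:, None] - u[None, :])

def p_energy_root(omega, u, p):
    """p-th root of the energy in (1), rescaled by the maximum to avoid overflow."""
    d = weighted_differences(omega, u)
    m = d.max()
    if m == 0.0:
        return 0.0
    return float(m * ((d / m) ** p).sum() ** (1.0 / p))

def lipschitz_energy(omega, u):
    """Objective of the Lipschitz learning problem (2)."""
    return float(weighted_differences(omega, u).max())

# Hypothetical complete graph on three vertices with unit weights.
omega = np.ones((3, 3)) - np.eye(3)
u = np.array([0.0, 0.5, 2.0])

roots = [p_energy_root(omega, u, p) for p in (2, 10, 200)]
limit = lipschitz_energy(omega, u)
print(roots, limit)
```

The printed roots decrease toward the value of the max functional as p increases.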
While in the case p < ∞ solutions exist uniquely and the p-Laplacian PDE is equivalent to
the energy minimization task, both properties are lost in the case p = ∞. One continuum
model connected to this problem, distinguished in the sense that it admits unique
solutions, is given by absolutely minimizing Lipschitz extensions
and the associated infinity Laplacian equation (see, e.g., [3–5,20,27,34]). Using the
concept of viscosity solutions, in [14] a convergence result on continuum limits for
the infinity Laplacian equation on the flat torus was established, see again Sect. 1.2
for more details. Still, in [30] the authors suggest that other Lipschitz extensions (besides
the absolutely minimizing ones) are indeed relevant for machine learning tasks, but a
rigorous continuum limit for general Lipschitz extensions has been pending.
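The loss of uniqueness for p = ∞ can be seen on a very small graph. The following sketch uses a hypothetical three-vertex example, not taken from the paper, in which several different extensions of the same labels attain the same value of the functional in (2).

```python
import numpy as np

def lipschitz_energy(omega, u):
    """Objective of (2): the largest weighted difference over all vertex pairs."""
    u = np.asarray(u, dtype=float)
    return float((omega * np.abs(u[:, None] - u[None, :])).max())

# Hypothetical graph: edges 0-1 and 0-2 with unit weight.
# Vertices 0 and 1 are labeled with g = 0 and g = 1; vertex 2 is unlabeled.
omega = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])

# The labeled edge already forces an energy of 1, so any value t in [-1, 1]
# at the unlabeled vertex gives an extension with the same energy.
candidates = [np.array([0.0, 1.0, t]) for t in (-1.0, 0.0, 0.5, 1.0)]
energies = [lipschitz_energy(omega, u) for u in candidates]
print(energies)
```

All four candidates are minimizers of (2) on this graph, which is why an additional selection principle is needed to single out one extension.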
The main goal of this paper is to derive a continuum limit for the Lipschitz learning
problem (2). To this end, we prove Γ-convergence and compactness of the functional
in (2). We investigate novel smoothness conditions on the underlying domain which are
specific to this L^∞-variational problem and originate from the discrepancy between
the maximum local Lipschitz constant and the global one. We apply our results to
minimizers of a Rayleigh quotient involving the L^∞-norm of the gradient as first
examined in [13]. The concrete outline of this paper can be found in Sect. 1.3.
1.1 Assumptions and Main Result
Let Ω ⊂ R^d, d ∈ N, be an open and bounded domain, and let Ω_n ⊂ Ω for n ∈ N
denote a sequence of finite subsets. For each n ∈ N we consider the finite weighted
graph (Ω_n, ω_n), where ω_n : Ω_n × Ω_n → [0, ∞) is a weighting function which in
our context is given as
\[ \omega_n(x, y) := \eta_{s_n}(|x - y|) := \eta(|x - y| / s_n). \]
Here η : [0, ∞) → [0, ∞) denotes the kernel and s_n > 0 the scaling parameter. The
edge set of the graph is implicitly characterized via the weighting function, i.e., for
x, y (...truncated)