Distributed Learning via Filtered Hyperinterpolation on Manifolds
Foundations of Computational Mathematics (2022) 22:1219–1271
https://doi.org/10.1007/s10208-021-09529-5
Guido Montúfar (1,2) · Yu Guang Wang (2,3,4)
Received: 16 July 2020 / Revised: 2 April 2021 / Accepted: 31 May 2021 /
Published online: 12 July 2021
© The Author(s) 2021
Abstract
Learning mappings of data on manifolds is an important topic in contemporary
machine learning, with applications in astrophysics, geophysics, statistical physics,
medical diagnosis, biochemistry, and 3D object analysis. This paper studies the problem of learning real-valued functions on manifolds through filtered hyperinterpolation
of input–output data pairs where the inputs may be sampled deterministically or at
random and the outputs may be clean or noisy. Motivated by the problem of handling
large data sets, it presents a parallel data processing approach which distributes the
data-fitting task among multiple servers and synthesizes the fitted sub-models into a
global estimator. We prove quantitative relations between the approximation quality of
the learned function over the entire manifold, the type of target function, the number
of servers, and the number and type of available samples. We obtain rates of convergence of the approximation error for both the distributed and the non-distributed approaches. In the
non-distributed case, the approximation order is optimal.
Keywords Distributed learning · Filtered hyperinterpolation · Approximation on
manifolds · Kernel methods · Numerical integration on manifolds · Quadrature rule ·
Random sampling · Gaussian white noise
Communicated by Frances Kuo.
Guido Montúfar and Yu Guang Wang acknowledge the support of funding from the European Research
Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant
Agreement No. 757983). Yu Guang Wang also acknowledges support from the Australian Research
Council under Discovery Project DP180100506. This material is based upon work supported by the
National Science Foundation under Grant No. DMS-1439786 while the authors were in residence at the
Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the
Collaborate@ICERM on “Geometry of Data and Networks.” Part of this research was performed while
the authors were at the Institute for Pure and Applied Mathematics (IPAM), which is supported by the
National Science Foundation (Grant No. DMS-1440415).
Extended author information available on the last page of the article
Mathematics Subject Classification 68W15 · 58C05 · 65D05 · 68Q32 · 42C15 ·
65T60 · 41A50
1 Introduction
Learning functions over manifolds has become an increasingly important topic in
machine learning. The performance of many machine learning algorithms depends
strongly on the geometry of the data. In real-world applications, one often has huge
data sets with noisy samples. In this paper, we propose distributed filtered hyperinterpolation on manifolds, which combines filtered hyperinterpolation and distributed
learning [22,23]. Filtered hyperinterpolation [31,38] provides a constructive approach
to modeling mappings between inputs and outputs in a way that can reduce the influence of noise. The distributed strategy assigns the learning task of the input–output
mapping to multiple local servers, enabling parallel computing for massive data sets.
Each server fits a small fraction of all data by filtered hyperinterpolation, and a central
processor then synthesizes the local estimators into a global estimator. We show the precise quantitative
relation between the approximation error of the distributed filtered hyperinterpolation,
the number of local servers, and the amount of data. The approximation error (over
the entire manifold) converges to zero provided the available amount of data increases
sufficiently fast with the number of servers.
Filtered hyperinterpolation was introduced by Sloan and Womersley [31] on the
two-sphere $\mathbb{S}^2$ as a filtered polynomial approximation method motivated by hyperinterpolation [29]. Hyperinterpolation uses a Fourier expansion in which
the integrals for the Fourier coefficients are approximated by numerical integration with
a quadrature rule. Filtered hyperinterpolation adopts a similar strategy but uses a filter to modify the Fourier expansion. The filter is a restriction
on the eigenvalues of the basis functions. Effectively, this restricts the capacity of the
approximation class and yields a reproducing property for polynomials of a certain
degree specified by the filter. It has some similarities to kernel methods. Filtering
improves the approximation accuracy of plain hyperinterpolation for noiseless data
that is sampled deterministically [18]. With an appropriate choice of filter, filtered
hyperinterpolation achieves the best approximation by polynomials of a given degree
depending on the amount of data (see Sect. 3.1). As shown in the left part of Fig. 1,
one aims to find the closest approximation of $f^*$ within the polynomial space $\Pi_n$
on the manifold $\mathcal{M}$, which is, however, difficult to achieve. The filtered hyperinterpolation is an approximator $V_{D,n}$ constructed from data $D = \{(x_i, y_i)\}_{i=1}^{N}$;
it lies in a slightly larger polynomial space $\Pi_{2n}$, and its distance to $f^*$ is very close to
the distance between $f^*$ and $\Pi_n$.
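To make the construction concrete, the following is a minimal NumPy sketch on the simplest compact manifold, the circle $\mathbb{S}^1$, rather than $\mathbb{S}^2$ or a general $\mathcal{M}$. It rests on assumptions not stated in this section: the normalized measure on $\mathbb{S}^1$ with orthonormal eigenfunctions $1, \sqrt{2}\cos k\theta, \sqrt{2}\sin k\theta$ indexed by the degree $k$; a hypothetical cosine-taper filter $h$ equal to 1 on $[0,1]$ and 0 on $[2,\infty)$; and $N$ equispaced nodes with equal weights $1/N$, which integrate trigonometric polynomials of degree at most $N-1$ exactly. The estimator is $V_{D,n}(x) = \sum_i w_i y_i K_n(x, x_i)$ with the filtered kernel $K_n(x,y) = \sum_\ell h(k_\ell/n)\,\phi_\ell(x)\phi_\ell(y)$, which has degree below $2n$. For $N \ge 3n$ the rule is therefore exact on products of the kernel with any element of $\Pi_n$, so the sketch should reproduce such polynomials, and its output lies in $\Pi_{2n}$. All function names are illustrative.

```python
import numpy as np


def filter_h(t):
    """Hypothetical filter: 1 on [0, 1], 0 on [2, inf), cosine taper on (1, 2)."""
    t = np.asarray(t, dtype=float)
    taper = np.cos(0.5 * np.pi * (t - 1.0)) ** 2
    return np.where(t <= 1.0, 1.0, np.where(t >= 2.0, 0.0, taper))


def filtered_kernel(x, y, n):
    """K_n(x, y) = 1 + 2 * sum_{k>=1} h(k/n) cos(k (x - y)) on S^1 (normalized measure)."""
    ks = np.arange(1, 2 * n)                      # h(k/n) = 0 for k >= 2n
    hk = filter_h(ks / n)
    diff = np.subtract.outer(np.atleast_1d(x), np.atleast_1d(y))  # (len(x), len(y))
    return 1.0 + 2.0 * np.cos(diff[..., None] * ks) @ hk


def filtered_hyperinterpolation(nodes, quad_w, values, n):
    """Return x -> V_{D,n}(x) = sum_i w_i y_i K_n(x, x_i)."""
    coef = quad_w * values
    return lambda x: filtered_kernel(x, nodes, n) @ coef


# Demo: reproduce a trigonometric polynomial of degree 2 <= n from N >= 3n samples.
N, n = 20, 4
nodes = 2 * np.pi * np.arange(N) / N              # equispaced points on S^1
quad_w = np.full(N, 1.0 / N)                      # exact for degree <= N - 1
f = lambda t: np.cos(2 * t) + 0.5 * np.sin(t)
V = filtered_hyperinterpolation(nodes, quad_w, f(nodes), n)
x_test = np.linspace(0.0, 2.0 * np.pi, 7)
print(np.max(np.abs(V(x_test) - f(x_test))))     # expected at round-off level
```

Replacing `f(nodes)` by noisy values $f^*(x_i) + \epsilon_i$ gives the noisy setting; the filter then damps the high-degree coefficients that plain hyperinterpolation would keep at full weight.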
Motivated by the problem of handling massive amounts of data, we propose a
distributed computational strategy based on filtered hyperinterpolation. As shown in
the right part of Fig. 1, we can split the estimation task of filtered hyperinterpolation among
multiple servers $j = 1, \ldots, m$, each of which computes a filtered hyperinterpolation
$V_{D_j,n}$ for a small subset $D_j$ of all the training data. Each local estimator is a filtered
expansion in terms of eigenfunctions of the manifold that best fits the corresponding
fraction of the training data set. The “best-fit” means that the local servers can achieve the
best approximation for noisy data $y_i = f^*(x_i) + \epsilon_i$, $i = 1, \ldots, N$, for any continuous
function $f^*: \mathcal{M} \to \mathbb{R}$ on the manifold and independent bounded noise $\epsilon_i$. The central
processor then takes a weighted average of the filtered hyperinterpolations obtained in
the local servers to synthesize a global estimator $V^{(m)}_{D,n}$. We call this global estimator
the distributed filtered hyperinterpolation.
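The distributed strategy can be sketched in the same hedged way, again on the circle $\mathbb{S}^1$ and self-contained. This section only says that the central processor takes a "weighted average", so the weights $\lambda_j = |D_j|/|D|$ used here are a hypothetical choice. The other assumptions are also illustrative: each server $j$ receives a shifted equispaced grid, so each $D_j$ is itself an exact equal-weight quadrature rule, and the noise $\epsilon_i$ is drawn uniformly from $[-0.1, 0.1]$. The function names `local_estimator` and `distributed_estimator` are made up for this example.

```python
import numpy as np


def filter_h(t):
    # Hypothetical filter: 1 on [0, 1], 0 on [2, inf), cosine taper on (1, 2).
    t = np.asarray(t, dtype=float)
    taper = np.cos(0.5 * np.pi * (t - 1.0)) ** 2
    return np.where(t <= 1.0, 1.0, np.where(t >= 2.0, 0.0, taper))


def local_estimator(nodes, quad_w, values, n):
    # Filtered hyperinterpolation V_{D_j,n} on S^1 (normalized measure).
    ks = np.arange(1, 2 * n)
    hk = filter_h(ks / n)
    coef = quad_w * values

    def V(x):
        diff = np.subtract.outer(np.atleast_1d(x), nodes)
        kernel = 1.0 + 2.0 * np.cos(diff[..., None] * ks) @ hk
        return kernel @ coef

    return V


def distributed_estimator(subsets, n):
    # subsets: list of (nodes, quad_w, values), one entry per server j = 1..m.
    # Assumption: the central processor weights server j by |D_j| / |D|.
    sizes = np.array([len(nodes) for nodes, _, _ in subsets], dtype=float)
    lam = sizes / sizes.sum()
    local_fits = [local_estimator(*s, n) for s in subsets]  # independent, parallelizable

    def V(x):
        return sum(l * Vj(x) for l, Vj in zip(lam, local_fits))

    return V


# Demo: m servers, each with N_j noisy samples of a degree-2 trigonometric polynomial.
rng = np.random.default_rng(0)
m, N_j, n = 4, 20, 4
f = lambda t: np.cos(2 * t) + 0.5 * np.sin(t)
subsets = []
for j in range(m):
    # Shifted equispaced grid: exact equal-weight rule for degree <= N_j - 1.
    nodes = 2 * np.pi * (np.arange(N_j) + j / m) / N_j
    noise = rng.uniform(-0.1, 0.1, size=N_j)  # bounded noise eps_i
    subsets.append((nodes, np.full(N_j, 1.0 / N_j), f(nodes) + noise))
V = distributed_estimator(subsets, n)
x_test = np.linspace(0.0, 2.0 * np.pi, 50)
print(np.max(np.abs(V(x_test) - f(x_test))))
```

The local fits do not depend on one another, so the comprehension over servers is the step that would run in parallel; in this sketch only each server's coefficient vector needs to reach the central processor.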
The remainder of the paper is organized as follows. In Sect. 2, we introduce the main
mathematical settings and notation. Then, we proceed with the study of non-distributed
and distributed filtered hyperinterpolation on manifolds, for which we derive upper
bounds on the error. Our bounds depend on (1) the dimension d of the manifold and
the smoothness r of the Sobolev space that contains the target function, (2) the degree
n of the approximating polynomials, which (...truncated)