Distributed Learning via Filtered Hyperinterpolation on Manifolds

Foundations of Computational Mathematics (2022) 22:1219–1271
https://doi.org/10.1007/s10208-021-09529-5

Guido Montúfar · Yu Guang Wang

Received: 16 July 2020 / Revised: 2 April 2021 / Accepted: 31 May 2021 / Published online: 12 July 2021
© The Author(s) 2021

Abstract

Learning mappings of data on manifolds is an important topic in contemporary machine learning, with applications in astrophysics, geophysics, statistical physics, medical diagnosis, biochemistry, and 3D object analysis. This paper studies the problem of learning real-valued functions on manifolds through filtered hyperinterpolation of input–output data pairs where the inputs may be sampled deterministically or at random and the outputs may be clean or noisy. Motivated by the problem of handling large data sets, it presents a parallel data processing approach which distributes the data-fitting task among multiple servers and synthesizes the fitted sub-models into a global estimator. We prove quantitative relations between the approximation quality of the learned function over the entire manifold, the type of target function, the number of servers, and the number and type of available samples. We obtain the approximation rates of convergence for distributed and non-distributed approaches. For the non-distributed case, the approximation order is optimal.

Keywords Distributed learning · Filtered hyperinterpolation · Approximation on manifolds · Kernel methods · Numerical integration on manifolds · Quadrature rule · Random sampling · Gaussian white noise

Communicated by Frances Kuo.

Guido Montúfar and Yu Guang Wang acknowledge the support of funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757983). Yu Guang Wang also acknowledges support from the Australian Research Council under Discovery Project DP180100506.
This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 while the authors were in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Collaborate@ICERM on “Geometry of Data and Networks.” Part of this research was performed while the authors were at the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415).

Extended author information available on the last page of the article

Mathematics Subject Classification 68W15 · 58C05 · 65D05 · 68Q32 · 42C15 · 65T60 · 41A50

1 Introduction

Learning functions over manifolds has become an increasingly important topic in machine learning. The performance of many machine learning algorithms depends strongly on the geometry of the data. In real-world applications, one often has huge data sets with noisy samples. In this paper, we propose distributed filtered hyperinterpolation on manifolds, which combines filtered hyperinterpolation and distributed learning [22,23]. Filtered hyperinterpolation [31,38] provides a constructive approach to modeling mappings between inputs and outputs in a way that can reduce the influence of noise. The distributed strategy assigns the learning task of the input–output mapping to multiple local servers, enabling parallel computing for massive data sets. Each server handles a small fraction of all data by filtered hyperinterpolation. A central processor then synthesizes the local estimators into a global estimator. We show the precise quantitative relation between the approximation error of the distributed filtered hyperinterpolation, the number of local servers, and the amount of data. The approximation error (over the entire manifold) converges to zero provided the available amount of data increases sufficiently fast with the number of servers.
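The workflow described above can be sketched on the simplest closed manifold, the unit circle. This is a minimal illustration under assumptions that are not stated in the text: the eigenfunctions are the trigonometric functions, the nodes are equispaced with equal quadrature weights, the filter is piecewise linear, the data are split among servers by interleaving, and server j enters the average with weight |D_j| / |D|. All function names are hypothetical.

```python
import numpy as np


def filter_h(t):
    """Assumed filter: 1 on [0, 1], linear decay on [1, 2], 0 beyond 2."""
    return np.clip(2.0 - t, 0.0, 1.0)


def filtered_kernel(theta, nodes, n):
    """Filtered kernel on the circle: 1 + 2 * sum_k H(k/n) cos(k (theta - node))."""
    k = np.arange(1, 2 * n)  # H(k/n) vanishes for k >= 2n
    diff = theta[:, None, None] - nodes[None, :, None]
    return 1.0 + 2.0 * np.sum(filter_h(k / n) * np.cos(k * diff), axis=2)


def local_estimator(theta, nodes, values, n):
    """Filtered hyperinterpolation on one server, equal weights 1/len(nodes)."""
    weights = np.full(len(nodes), 1.0 / len(nodes))
    return filtered_kernel(theta, nodes, n) @ (weights * values)


def distributed_estimator(theta, nodes, values, n, m):
    """Average m local estimators, weighting server j by |D_j| / |D| (assumed)."""
    total = np.zeros_like(theta)
    for j in range(m):
        idx = np.arange(j, len(nodes), m)  # interleaved subset, still equispaced
        local = local_estimator(theta, nodes[idx], values[idx], n)
        total += len(idx) / len(nodes) * local
    return total


# Usage: noisy samples of a smooth target, fitted with m = 3 servers.
rng = np.random.default_rng(0)
nodes = 2.0 * np.pi * np.arange(60) / 60
target = lambda t: np.exp(np.cos(t))
samples = target(nodes) + rng.uniform(-0.1, 0.1, size=nodes.size)
grid = np.linspace(0.0, 2.0 * np.pi, 200)
estimate = distributed_estimator(grid, nodes, samples, n=4, m=3)
print("max error:", np.max(np.abs(estimate - target(grid))))
```

In this sketch the interleaved split keeps every subset equispaced, so equal weights remain an exact quadrature rule on each server as long as each subset holds at least 3n points. With noise-free data the estimator then reproduces trigonometric polynomials of degree up to n. Other sampling schemes would need different quadrature weights.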
Filtered hyperinterpolation, a filtered polynomial approximation method motivated by hyperinterpolation [29], was introduced by Sloan and Womersley [31] on the two-sphere $\mathbb{S}^2$. Hyperinterpolation uses a Fourier expansion where the integral for the Fourier coefficients is approximated by numerical integration with a quadrature rule. Filtered hyperinterpolation adopts a similar strategy but uses a filter to modify the Fourier expansion. The filter is a restriction on the eigenvalues of the basis functions. Effectively this restricts the capacity of the approximation class and yields a reproducing property for polynomials of a certain degree specified by the filter. It has some similarities to kernel methods. Filtering improves the approximation accuracy of plain hyperinterpolation for noiseless data that is sampled deterministically [18]. With an appropriate choice of filter, filtered hyperinterpolation achieves the best approximation by polynomials of a given degree depending on the amount of data (see Sect. 3.1). As shown in the left part of Fig. 1, one aims at finding the closest approximation of $f^*$ within the polynomial space $\Pi_n$ on the manifold $\mathcal{M}$, which, nevertheless, is difficult to achieve. The filtered hyperinterpolation is an approximator $V_{D,n}$ constructed from data $D = (x_i, y_i)_{i=1}^N$ which lies in a slightly larger polynomial space $\Pi_{2n}$ and whose distance to $f^*$ is very close to the distance between $f^*$ and $\Pi_n$.

Motivated by the problem of handling massive amounts of data, we propose a distributed computational strategy based on filtered hyperinterpolation. As shown in the right part of Fig. 1, we can split the estimation task of filtered hyperinterpolation across multiple servers $j = 1, \dots, m$, each of which computes a filtered hyperinterpolation $V_{D_j,n}$ for a small subset $D_j$ of all the training data.
It consists of creating a filtered expansion in terms of eigenfunctions of the manifold to best-fit the corresponding fraction of the training data set. The “best-fit” means that the local servers can achieve best approximation for noisy data $y_i = f^*(x_i) + \epsilon_i$, $i = 1, \dots, N$, for any continuous function $f^* : \mathcal{M} \to \mathbb{R}$ on the manifold and independent bounded noise $\epsilon_i$. The central processor then takes a weighted average of the filtered hyperinterpolations obtained in the local servers to synthesize a global estimator $V_{D,n}^{(m)}$. We call the global estimator the distributed filtered hyperinterpolation.

The remainder of the paper is organized as follows. In Sect. 2, we introduce the main mathematical settings and notation. Then, we proceed with the study of non-distributed and distributed filtered hyperinterpolation on manifolds, for which we derive upper bounds on the error. Our bounds depend on (1) the dimension $d$ of the manifold and the smoothness $r$ of the Sobolev space that contains the target function, (2) the degree $n$ of the approximating polynomials, which (...truncated)