Learning Elliptic Partial Differential Equations with Randomized Linear Algebra (pdf)

Article PDF cannot be displayed. You can download it here:

https://link.springer.com/content/pdf/10.1007/s10208-022-09556-w.pdf

Learning Elliptic Partial Differential Equations with Randomized Linear Algebra

Foundations of Computational Mathematics https://doi.org/10.1007/s10208-022-09556-w Learning Elliptic Partial Differential Equations with Randomized Linear Algebra Nicolas Boullé1 · Alex Townsend2 Received: 31 January 2021 / Revised: 18 November 2021 / Accepted: 20 November 2021 © The Author(s) 2022 Abstract Given input–output pairs of an elliptic partial differential equation (PDE) in three dimensions, we derive the first theoretically rigorous scheme for learning the associated Green’s function G. By exploiting the hierarchical low-rank structure of G, we show that one can construct an approximant to G that converges almost surely −1/2 log3 (1/)) using at most O( −6 log4 (1/)) and achieves a relative error of O(Γ input–output training pairs with high probability, for any 0 < < 1. The quantity 0 < Γ ≤ 1 characterizes the quality of the training dataset. Along the way, we extend the randomized singular value decomposition algorithm for learning matrices to Hilbert–Schmidt operators and characterize the quality of covariance kernels for PDE learning. Keywords Data-driven discovery of PDEs · Randomized SVD · Green’s function · Hilbert–Schmidt operators · Low-rank approximation Mathematics Subject Classification 65N80 · 35J08 · 35R30 · 60G15 · 65F55 Communicated by Arieh Iserles. This work is supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical Modelling (EP/L015803/1) in collaboration with Simula Research Laboratory and by the National Science Foundation Grants DMS-1818757, DMS-1952757, and DMS-2045646. B Nicolas Boullé Alex Townsend 1 Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK 2 Department of Mathematics, Cornell University, Ithaca, NY 14853, USA 123 Foundations of Computational Mathematics 1 Introduction Can one learn a differential operator from pairs of solutions and righthand sides? If so, how many pairs are required? These two questions have received significant research attention [17,31,34,43]. From data, one hopes to eventually learn physical laws of nature or conservation laws that elude scientists in the biological sciences [63], computational fluid dynamics [49], and computational physics [45]. The literature contains many highly successful practical schemes based on deep learning techniques [38,48]. However, the challenge remains to understand when and why deep learning is effective theoretically. This paper describes the first theoretically justified scheme for discovering scalar-valued elliptic partial differential equations (PDEs) in three variables from input–output data and provides a rigorous learning rate. While our novelties are mainly theoretical, we hope to motivate future practical choices in PDE learning. We suppose that there is an unknown second-order uniformly elliptic linear PDE operator1 L : H2 (D) ∩ H01 (D) → L 2 (D) with a bounded domain D ⊂ R3 with Lipschitz smooth boundary [16], which takes the form (Lu(x) = −∇ · (A(x)∇u) + c(x) · ∇u + d(x)u, x ∈ D, u|∂ D = 0. (1) Here, for every x ∈ D, we have that A(x) ∈ R3×3 is a symmetric positive definite matrix with bounded coefficient functions so that2 Ai j ∈ L ∞ (D), c ∈ L r (D) with r ≥ 3, d ∈ L s (D) for s ≥ 3/2, and d(x) ≥ 0 [28]. We emphasize that the regularity requirements on the variable coefficients are quite weak. The goal of PDE learning is to discover the operator L from N ≥ 1 input–output pairs, i.e., {( f j , u j )} Nj=1 , where Lu j = f j and u j |∂ D = 0 for 1 ≤ j ≤ N . There are two main types of PDE learning tasks: (1) Experimentally determined input– output pairs, where one must do the best one can with the predetermined information and (2) algorithmically determined input–output pairs, where the data-driven learning algorithm can select f 1 , . . . , f N for itself. In this paper, we focus on the PDE learning task where we have algorithmically determined input–output pairs. In particular, we suppose that the functions f 1 , . . . , f N are generated at random and are drawn from a Gaussian process (GP) (see Sect. 2.3). To keep our theoretical statements manageable, we restrict our attention to PDEs of the form: Lu = −∇ · (A(x)∇u) , x ∈ D, u|∂ D = 0. (2) Lower-order terms in Eq. (1) should cause few theoretical problems [3], though our algorithm and our bounds get far more complicated. 1 Here, L 2 (D) is the space of square-integrable functions defined on D, Hk (D) is the space of k times weakly differentiable functions in the L 2 -sense, and H01 (D) is the closure of Cc∞ (D) in H1 (D). Here, Cc∞ (D) is the space of infinitely differentiable compactly supported functions on D. Roughly speaking, H01 (D) are the functions in H1 (D) that are zero on the boundary of D. 2 For 1 ≤ r ≤ ∞, we denote by L r (D) the space of functions defined on the domain D with finite L r norm, where f r = ( D | f |r dx)1/r if r < ∞, and f ∞ = inf{C > 0 : | f (x)| ≤ C for almost every x ∈ D}. 123 Foundations of Computational Mathematics The approach that dominates the PDE learning literature is to directly learn L by either (1) learning parameters in the PDE [4,64], (2) using neural networks to approximate the action of the PDE on functions [45–49], or (3) deriving a model by composing a library of operators with sparsity considerations [9,35,52,53,59,60]. Instead of trying to learn the unbounded, closed operator L directly, we follow [6,17,18] and discover the Green’s function associated with L. That is, we attempt to learn the function G : D × D → R+ ∪ {∞} such that [16] u j (x) = G(x, y) f j (y)dy, x ∈ D, 1 ≤ j ≤ N. (3) D Seeking G, as opposed to L, has several theoretical benefits: 1. The integral operator in Eq. (3) is compact [15], while L is only closed [14]. This allows G to be rigorously learned by input–output pairs {( f j , u j )} Nj=1 , as its range can be approximated by finite-dimensional spaces (see Theorem 3). 2. It is known that G has a hierarchical low-rank structure k [3, Theorem 2.8]: for 0 < < 1, there exists a function G k (x, y) = j=1 g j (x)h j (y) with k = 4 O(log (1/)) such that [3, Theorem 2.8] G − G k L 2 (X ×Y ) ≤ G L 2 (X ×Ŷ ) , where X , Y ⊆ D are sufficiently separated domains, and Y ⊆ Ŷ ⊆ D denotes a larger domain than Y (see Theorem 4 for the definition). The further apart X and Y , the faster the singular values of G decay. Moreover, G also has an off-diagonal decay property [19,25]: G(x, y) ≤ c G L 2 (D×D) , x−y 2 x = y ∈ D, where c is a constant independent of x and y. Exploiting these structures of G leads to a rigorous algorithm for constructing a global approximant to G (see Sect. 4). 3. The function G is smooth away from its diagonal, allowing one to efficiently approximate it [19]. Once a global approximation G̃ has been constructed for G using input–output pairs, given a new righthand side f one can directly compute the integral in Eq. (3) to obtain the corresponding solution u to Eq. (1). Usually, numerically computing the integral in Eq. (3) must be (...truncated)