Learning Elliptic Partial Differential Equations with Randomized Linear Algebra
Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-022-09556-w
Learning Elliptic Partial Differential Equations
with Randomized Linear Algebra
Nicolas Boullé1 · Alex Townsend2
Received: 31 January 2021 / Revised: 18 November 2021 / Accepted: 20 November 2021
© The Author(s) 2022
Abstract
Given input–output pairs of an elliptic partial differential equation (PDE) in three
dimensions, we derive the first theoretically rigorous scheme for learning the associated Green’s function G. By exploiting the hierarchical low-rank structure of G,
we show that one can construct an approximant to G that converges almost surely
−1/2
log3 (1/)) using at most O( −6 log4 (1/))
and achieves a relative error of O(Γ
input–output training pairs with high probability, for any 0 < < 1. The quantity
0 < Γ ≤ 1 characterizes the quality of the training dataset. Along the way, we
extend the randomized singular value decomposition algorithm for learning matrices
to Hilbert–Schmidt operators and characterize the quality of covariance kernels for
PDE learning.
Keywords Data-driven discovery of PDEs · Randomized SVD · Green’s function ·
Hilbert–Schmidt operators · Low-rank approximation
Mathematics Subject Classification 65N80 · 35J08 · 35R30 · 60G15 · 65F55
Communicated by Arieh Iserles.
This work is supported by the EPSRC Centre for Doctoral Training in Industrially Focused Mathematical
Modelling (EP/L015803/1) in collaboration with Simula Research Laboratory and by the National
Science Foundation Grants DMS-1818757, DMS-1952757, and DMS-2045646.
B Nicolas Boullé
Alex Townsend
1
Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
2
Department of Mathematics, Cornell University, Ithaca, NY 14853, USA
123
Foundations of Computational Mathematics
1 Introduction
Can one learn a differential operator from pairs of solutions and righthand sides?
If so, how many pairs are required? These two questions have received significant
research attention [17,31,34,43]. From data, one hopes to eventually learn physical
laws of nature or conservation laws that elude scientists in the biological sciences [63],
computational fluid dynamics [49], and computational physics [45]. The literature
contains many highly successful practical schemes based on deep learning techniques
[38,48]. However, the challenge remains to understand when and why deep learning
is effective theoretically. This paper describes the first theoretically justified scheme
for discovering scalar-valued elliptic partial differential equations (PDEs) in three
variables from input–output data and provides a rigorous learning rate. While our
novelties are mainly theoretical, we hope to motivate future practical choices in PDE
learning.
We suppose that there is an unknown second-order uniformly elliptic linear PDE
operator1 L : H2 (D) ∩ H01 (D) → L 2 (D) with a bounded domain D ⊂ R3 with
Lipschitz smooth boundary [16], which takes the form
(Lu(x) = −∇ · (A(x)∇u) + c(x) · ∇u + d(x)u, x ∈ D, u|∂ D = 0.
(1)
Here, for every x ∈ D, we have that A(x) ∈ R3×3 is a symmetric positive definite
matrix with bounded coefficient functions so that2 Ai j ∈ L ∞ (D), c ∈ L r (D) with
r ≥ 3, d ∈ L s (D) for s ≥ 3/2, and d(x) ≥ 0 [28]. We emphasize that the regularity
requirements on the variable coefficients are quite weak.
The goal of PDE learning is to discover the operator L from N ≥ 1 input–output
pairs, i.e., {( f j , u j )} Nj=1 , where Lu j = f j and u j |∂ D = 0 for 1 ≤ j ≤ N . There
are two main types of PDE learning tasks: (1) Experimentally determined input–
output pairs, where one must do the best one can with the predetermined information
and (2) algorithmically determined input–output pairs, where the data-driven learning
algorithm can select f 1 , . . . , f N for itself. In this paper, we focus on the PDE learning
task where we have algorithmically determined input–output pairs. In particular, we
suppose that the functions f 1 , . . . , f N are generated at random and are drawn from a
Gaussian process (GP) (see Sect. 2.3). To keep our theoretical statements manageable,
we restrict our attention to PDEs of the form:
Lu = −∇ · (A(x)∇u) , x ∈ D, u|∂ D = 0.
(2)
Lower-order terms in Eq. (1) should cause few theoretical problems [3], though our
algorithm and our bounds get far more complicated.
1 Here, L 2 (D) is the space of square-integrable functions defined on D, Hk (D) is the space of k times
weakly differentiable functions in the L 2 -sense, and H01 (D) is the closure of Cc∞ (D) in H1 (D). Here,
Cc∞ (D) is the space of infinitely differentiable compactly supported functions on D. Roughly speaking,
H01 (D) are the functions in H1 (D) that are zero on the boundary of D.
2 For 1 ≤ r ≤ ∞, we denote by L r (D) the space of functions defined on the domain D with finite L r norm,
where f r = ( D | f |r dx)1/r if r < ∞, and f ∞ = inf{C > 0 : | f (x)| ≤ C for almost every x ∈ D}.
123
Foundations of Computational Mathematics
The approach that dominates the PDE learning literature is to directly learn L
by either (1) learning parameters in the PDE [4,64], (2) using neural networks to
approximate the action of the PDE on functions [45–49], or (3) deriving a model
by composing a library of operators with sparsity considerations [9,35,52,53,59,60].
Instead of trying to learn the unbounded, closed operator L directly, we follow [6,17,18]
and discover the Green’s function associated with L. That is, we attempt to learn the
function G : D × D → R+ ∪ {∞} such that [16]
u j (x) =
G(x, y) f j (y)dy,
x ∈ D,
1 ≤ j ≤ N.
(3)
D
Seeking G, as opposed to L, has several theoretical benefits:
1. The integral operator in Eq. (3) is compact [15], while L is only closed [14]. This
allows G to be rigorously learned by input–output pairs {( f j , u j )} Nj=1 , as its range
can be approximated by finite-dimensional spaces (see Theorem 3).
2. It is known that G has a hierarchical low-rank structure
k [3, Theorem 2.8]: for
0 < < 1, there exists a function G k (x, y) =
j=1 g j (x)h j (y) with k =
4
O(log (1/)) such that [3, Theorem 2.8]
G − G k L 2 (X ×Y ) ≤ G L 2 (X ×Ŷ ) ,
where X , Y ⊆ D are sufficiently separated domains, and Y ⊆ Ŷ ⊆ D denotes a
larger domain than Y (see Theorem 4 for the definition). The further apart X and
Y , the faster the singular values of G decay. Moreover, G also has an off-diagonal
decay property [19,25]:
G(x, y) ≤
c
G L 2 (D×D) ,
x−y 2
x = y ∈ D,
where c is a constant independent of x and y. Exploiting these structures of G
leads to a rigorous algorithm for constructing a global approximant to G (see Sect.
4).
3. The function G is smooth away from its diagonal, allowing one to efficiently
approximate it [19].
Once a global approximation G̃ has been constructed for G using input–output pairs,
given a new righthand side f one can directly compute the integral in Eq. (3) to obtain
the corresponding solution u to Eq. (1). Usually, numerically computing the integral
in Eq. (3) must be (...truncated)