Learning Interaction Kernels in Stochastic Systems of Interacting Particles from Multiple Trajectories
Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-021-09521-z
Fei Lu1 · Mauro Maggioni2 · Sui Tang3
Received: 2 August 2020 / Revised: 16 April 2021 / Accepted: 19 April 2021
© The Author(s) 2021
Abstract
We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel, which only depends on pairwise distances. We study
the problem of inferring this interaction kernel from observations of the positions of
the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to this inverse problem,
based on a regularized maximum likelihood estimator constrained to suitable hypothesis spaces adaptive to data. We show that a coercivity condition enables us to control
the condition number of this problem and prove the consistency of our estimator, and
that in fact it converges at a near-optimal learning rate, equal to the min–max rate
of one-dimensional nonparametric regression. In particular, this rate is independent
of the dimension of the state space, which is typically very high. We also analyze
the discretization error in the case of discrete-time observations, showing that it is
of order 1/2 in terms of the time spacing between observations. This term, when
large, dominates the sampling error and the approximation error, preventing convergence of the estimator. Finally, we exhibit an efficient parallel algorithm to construct
the estimator from data, and we demonstrate the effectiveness of our algorithm with
numerical tests on prototype systems including stochastic opinion dynamics and a
Lennard-Jones model.
Keywords Inverse problems · Interacting particle systems · Statistical and machine
learning
Mathematics Subject Classification 70F17 · 62G05 · 62M05
Communicated by Hans Munthe Kaas.
Extended author information available on the last page of the article
1 Introduction
We consider a system of particles or agents interacting in a random environment, with
their motion described by a first-order stochastic differential equation in the form
\[
dx_{i,t} = \frac{1}{N} \sum_{j=1}^{N} \phi(\|x_{j,t} - x_{i,t}\|)\,(x_{j,t} - x_{i,t})\,dt + \sigma\, dB_{i,t}, \quad \text{for } i = 1, \dots, N, \tag{1.1}
\]
where $x_{i,t} \in \mathbb{R}^d$ represents the position of particle $i$ at time $t$, $\phi : \mathbb{R}^+ \to \mathbb{R}$ is an
interaction kernel dependent on the pairwise distance between particles, and $B_t$ is a
standard Brownian motion in $\mathbb{R}^{Nd}$, with $\sigma > 0$ representing the scale of the random
noise. This is a gradient system, with the energy potential $V_\phi : \mathbb{R}^{Nd} \to \mathbb{R}$
\[
V_\phi(X_t) = \frac{1}{2N} \sum_{i,i'} \Phi(\|x_{i,t} - x_{i',t}\|), \quad \text{with } \Phi'(r) = \phi(r)\,r, \tag{1.2}
\]
where $X_t = (x_{i,t})_{i=1,\dots,N} \in \mathbb{R}^{dN}$ is the state of the system. Letting
\[
f_\phi := -\nabla V_\phi, \tag{1.3}
\]
we can write Eq. (1.1) in vector format as
\[
dX_t = f_\phi(X_t)\,dt + \sigma\, dB_t. \tag{1.4}
\]
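As a quick check that the definition (1.3) reproduces the drift in (1.1), one can differentiate the potential (1.2) with respect to a single particle position. The sketch below assumes that Φ is differentiable with Φ'(r) = φ(r) r and that the arguments of φ and Φ are Euclidean distances. The factor 2 appears because x_i enters both the (i, i') and the (i', i) terms of the double sum.

```latex
\begin{align*}
-\nabla_{x_i} V_\phi(X)
  &= -\frac{1}{2N} \cdot 2 \sum_{i' \neq i} \Phi'(\|x_i - x_{i'}\|)\,
     \frac{x_i - x_{i'}}{\|x_i - x_{i'}\|} \\
  &= -\frac{1}{N} \sum_{i' \neq i} \phi(\|x_i - x_{i'}\|)\,(x_i - x_{i'}) \\
  &= \frac{1}{N} \sum_{j=1}^{N} \phi(\|x_j - x_i\|)\,(x_j - x_i),
\end{align*}
```

The last line renames the summation index to j and includes the vanishing term j = i, which gives the i-th block of the drift in (1.1).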
The particles interact through their pairwise distances: the drift dissipates the total
energy and drives the system toward a stable point of the energy potential,
while the random noise injects energy into the system.
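To make the dynamics (1.1) and (1.4) concrete, the following is a minimal simulation sketch using the Euler-Maruyama scheme. The scheme, the function names `drift` and `simulate`, the cutoff kernel, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def drift(X, phi):
    """Drift f_phi(X) of Eq. (1.1) for a state X of shape (N, d)."""
    N = X.shape[0]
    diff = X[None, :, :] - X[:, None, :]        # diff[i, j] = x_j - x_i
    # Shift the diagonal to 1 so that phi is never evaluated at distance 0;
    # the j = i terms still vanish because diff[i, i] = 0.
    r = np.linalg.norm(diff, axis=-1) + np.eye(N)
    w = phi(r)                                   # kernel values, shape (N, N)
    return (w[:, :, None] * diff).sum(axis=1) / N

def simulate(phi, X0, sigma, dt, L, rng):
    """Euler-Maruyama path of Eq. (1.4); returns an array of shape (L + 1, N, d)."""
    path = np.empty((L + 1,) + X0.shape)
    path[0] = X0
    for l in range(L):
        noise = rng.standard_normal(X0.shape)
        path[l + 1] = path[l] + drift(path[l], phi) * dt + sigma * np.sqrt(dt) * noise
    return path

# Hypothetical example: a cutoff kernel (agents interact only within distance 1),
# M = 3 independent trajectories of N = 5 particles in dimension d = 2.
rng = np.random.default_rng(0)
phi = lambda r: (r < 1.0).astype(float)
paths = np.stack([
    simulate(phi, rng.standard_normal((5, 2)), sigma=0.1, dt=0.01, L=100, rng=rng)
    for _ in range(3)
])
```

Because the pairwise forces are antisymmetric, the drift leaves the mean position of the particles unchanged, which gives a simple sanity check when the noise is switched off.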
Such systems of interacting particles arise in a wide variety of disciplines, including interacting physical particles [22,49] or granular media [1–3,8,12,13] in Physics,
opinion aggregation on interacting networks in Social Science [24,43,46], and Monte
Carlo sampling [36,39], to name just a few.
Motivated by these applications, the inference of such systems from data has gained
increasing attention. For deterministic multi-particle systems, various types of learning techniques have been developed (see, e.g., [9,14,40,41,50,55] and the references
therein). When it comes to stochastic multi-particle systems, only a few efforts have
been made, e.g., learning reduced Langevin equations on manifolds in [19] (without, however, assuming or exploiting the structure of pairwise interactions), learning
parametric potential functions in [10,15] from single trajectory data, and estimating the
diffusion parameter in [26].
Our goal is to estimate the interaction kernel $\phi$ given discrete-time observation data
from trajectories $\{X^{(m)}_{t_0:t_L}\}_{m=1}^{M}$, where the initial conditions $\{X^{(m)}_{t_0}\}_{m=1}^{M}$ are independent
samples drawn from a distribution $\mu_0$ on $\mathbb{R}^{dN}$, and $t_0 : t_L$ indicates times $0 = t_0 <
t_1 < \cdots < t_l < \cdots < t_L = T$, with $t_l = l\Delta t$.
Since, in general, little information about the analytical form of the kernel is available, we infer it in a nonparametric fashion (e.g., [6,20,23]). We note that the problem
we consider is to learn a latent function in the drift term given observations from
multiple trajectories, which is different from the ample literature on the inference
of stochastic differential equations (see, e.g., [29,34]), focusing either on parameter
estimation or on inference for ergodic systems. In particular, our learning approach is
close in spirit to the nonparametric regression of the drift studied in [44] for ergodic
systems and in [17] from i.i.d. paths. However, for systems of interacting particles one
faces the curse of dimensionality when learning the high-dimensional drift directly as
a general function on the high-dimensional state space $\mathbb{R}^{dN}$. Instead, we will exploit
the structure of the system and learn the latent interaction kernel in the drift, which
only depends on pairwise distances, and show that the curse of dimensionality may
be avoided when this inverse problem is well-conditioned.
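To illustrate what is at stake, one can compare the classical minimax rates for nonparametric regression of an s-Hölder function, stated here up to logarithmic factors. These rates are a standard fact from the regression literature, not a result quoted from this section, and the values s = 2, d = 2, N = 10 are purely illustrative.

```latex
\[
\underbrace{M^{-\frac{s}{2s + dN}}}_{\text{general drift on } \mathbb{R}^{dN}}
\qquad \text{versus} \qquad
\underbrace{M^{-\frac{s}{2s + 1}}}_{\text{kernel } \phi \text{ on } \mathbb{R}^{+}}
\]
\[
s = 2,\; d = 2,\; N = 10: \qquad
M^{-2/24} = M^{-1/12}
\qquad \text{versus} \qquad
M^{-2/5}.
\]
```

In this illustrative setting, halving the error requires roughly 2^{12} = 4096 times more trajectories at the first rate, but only about 2^{5/2} ≈ 5.7 times more at the second.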
We introduce a maximum likelihood estimator (MLE), along with an efficient algorithm that can be implemented in parallel over trajectories, with a hypothesis space
adaptive to data to reach optimal accuracy. Under a coercivity condition, we prove that
the MLE is consistent and converges at the min–max rate for one-dimensional nonparametric regression. We also analyze the discretization errors due to discrete-time
observations: we show that they lead to an error in the estimator that is of order $\Delta t^{1/2}$ (with
$\Delta t = T/L = t_{l+1} - t_l$), and as a result, they prevent us from obtaining the min–max
learning rate in sample size. We demonstrate the effectiveness of our algorithm by
numerical tests on prototype systems including opinion dynamics and a stochastic
Lennard-Jones model (see Sect. 5). Numerical results verify our learning theory in
the sense that the min–max rate of convergence is achieved, and the bias due to the
numerical error is close to the order $\Delta t^{1/2}$.
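The estimator described above can be sketched as a least-squares problem. The sketch assumes an Euler-Maruyama approximation of the likelihood, whose negative logarithm is, up to terms independent of the candidate kernel, proportional to the sum over observations of ||f(X_l)||² Δt − 2⟨f(X_l), X_{l+1} − X_l⟩. It also assumes a piecewise-constant hypothesis space on fixed bins. Because the drift is linear in the kernel, this functional is quadratic in the coefficients, and a constant σ drops out of the minimizer. The names `basis_drifts`, `fit_kernel`, and `edges` are hypothetical, and regularization and the data-adaptive choice of the hypothesis space are omitted.

```python
import numpy as np

def basis_drifts(X, edges):
    """Drifts f_{psi_k}(X) for indicator basis functions psi_k on [edges[k], edges[k+1]).

    X has shape (N, d); the result has shape (n, N, d) with n = len(edges) - 1.
    """
    N = X.shape[0]
    n = len(edges) - 1
    diff = X[None, :, :] - X[:, None, :]               # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)
    k = np.searchsorted(edges, r, side="right") - 1    # bin index of each pair distance
    off_diag = ~np.eye(N, dtype=bool)                  # exclude the j = i terms
    out = np.zeros((n,) + X.shape)
    for b in range(n):
        mask = (k == b) & off_diag
        out[b] = (mask[:, :, None] * diff).sum(axis=1) / N
    return out

def fit_kernel(paths, dt, edges):
    """Minimize the approximate negative log-likelihood over span{psi_k}.

    paths has shape (M, L + 1, N, d). Returns the coefficient vector of shape (n,).
    """
    n = len(edges) - 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    for path in paths:                                 # each trajectory contributes independently
        for l in range(path.shape[0] - 1):
            F = basis_drifts(path[l], edges).reshape(n, -1)
            dX = (path[l + 1] - path[l]).ravel()
            A += F @ F.T * dt                          # quadratic term
            b += F @ dX                                # linear term
    # Least-squares solve of A a = b; tolerates bins that no pair distance visits.
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

Since `A` and `b` are sums over trajectories, the per-trajectory contributions can be computed in parallel and added afterwards. The size of the linear system is the number of basis functions, not the dimension dN of the state space.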
1.1 Overview of the Main Results
We consider an approximate maximum likelihood estimator (MLE), which is the
maximizer of the approximate likelihood of the observed trajectories, over a suitable
hypothesis space H:
\[
\widehat{\phi}_{L,T,M,\mathcal{H}} = \operatorname*{arg\,min}_{\varphi \in \mathcal{H}} \mathcal{E}_{L,T,M}(\varphi),
\]
where $\mathcal{E}_{L,T,M}(\varphi)$ is an approximation of the negative log-likelihood of the discrete data
$\{X^{(m)}_{t_0:t_L}\}_{m=1}^{M}$
. Using t (...truncated)