Learning Interaction Kernels in Stochastic Systems of Interacting Particles from Multiple Trajectories

Foundations of Computational Mathematics
https://doi.org/10.1007/s10208-021-09521-z

Fei Lu · Mauro Maggioni · Sui Tang

Received: 2 August 2020 / Revised: 16 April 2021 / Accepted: 19 April 2021
© The Author(s) 2021

Abstract

We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel that depends only on pairwise distances. We study the problem of inferring this interaction kernel from observations of the positions of the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to this inverse problem, based on a regularized maximum likelihood estimator constrained to suitable hypothesis spaces adaptive to data. We show that a coercivity condition enables us to control the condition number of this problem and to prove the consistency of our estimator, and that the estimator in fact converges at a near-optimal learning rate, equal to the min–max rate of one-dimensional nonparametric regression. In particular, this rate is independent of the dimension of the state space, which is typically very high. We also analyze the discretization error in the case of discrete-time observations, showing that it is of order 1/2 in the time spacing between observations. This term, when large, dominates the sampling error and the approximation error, preventing convergence of the estimator. Finally, we exhibit an efficient parallel algorithm to construct the estimator from data, and we demonstrate the effectiveness of our algorithm with numerical tests on prototype systems including stochastic opinion dynamics and a Lennard-Jones model.

Keywords Inverse problems · Interacting particle systems · Statistical and machine learning

Mathematics Subject Classification 70F17 · 62G05 · 62M05

Communicated by Hans Munthe Kaas.
1 Introduction

We consider a system of particles or agents interacting in a random environment, with their motion described by a first-order stochastic differential equation of the form

$$dx_{i,t} = \frac{1}{N} \sum_{j=1}^{N} \phi(\|x_{j,t} - x_{i,t}\|)\,(x_{j,t} - x_{i,t})\,dt + \sigma\, dB_{i,t}, \quad \text{for } i = 1, \dots, N, \tag{1.1}$$

where $x_{i,t} \in \mathbb{R}^d$ represents the position of particle $i$ at time $t$, $\phi : \mathbb{R}^+ \to \mathbb{R}$ is an interaction kernel dependent on the pairwise distance between particles, and $B_t$ is a standard Brownian motion in $\mathbb{R}^{Nd}$, with $\sigma > 0$ representing the scale of the random noise. This is a gradient system, with the energy potential $V_\phi : \mathbb{R}^{Nd} \to \mathbb{R}$ given by

$$V_\phi(X_t) = \frac{1}{2N} \sum_{i,i'} \Phi(\|x_{i,t} - x_{i',t}\|), \quad \text{with } \Phi'(r) = \phi(r)\,r, \tag{1.2}$$

where $X_t = (x_{i,t})_{i=1,\dots,N} \in \mathbb{R}^{dN}$ is the state of the system. Letting

$$f_\phi := -\nabla V_\phi, \tag{1.3}$$

we can write Eq. (1.1) in vector form as

$$dX_t = f_\phi(X_t)\,dt + \sigma\, dB_t. \tag{1.4}$$

The particles interact through their pairwise distances: the drift dissipates the total energy and drives the system toward a stable point of the energy potential, while the random noise injects energy into the system.

Such systems of interacting particles arise in a wide variety of disciplines, including interacting physical particles [22,49] or granular media [1–3,8,12,13] in Physics, opinion aggregation on interacting networks in Social Science [24,43,46], and Monte Carlo sampling [36,39], to name just a few. Motivated by these applications, the inference of such systems from data has gained increasing attention. For deterministic multi-particle systems, various types of learning techniques have been developed (see, e.g., [9,14,40,41,50,55] and the references therein).
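For readers who want to experiment with Eq. (1.1), the following is a minimal sketch of an Euler–Maruyama simulation in Python with NumPy. It is an illustration under stated assumptions, not the authors' code: the function names `drift` and `euler_maruyama`, the example kernel, and all parameter values are hypothetical.

```python
import numpy as np

def drift(X, phi):
    """Drift f_phi(X) of Eq. (1.1) for a state X of shape (N, d)."""
    N = X.shape[0]
    diff = X[None, :, :] - X[:, None, :]   # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)      # pairwise distances, shape (N, N)
    w = phi(r)                             # kernel evaluated at every distance
    np.fill_diagonal(w, 0.0)               # the self-term vanishes since x_i - x_i = 0
    return (w[:, :, None] * diff).sum(axis=1) / N

def euler_maruyama(X0, phi, sigma, dt, n_steps, rng):
    """Simulate one trajectory; returns an array of shape (n_steps + 1, N, d)."""
    traj = np.empty((n_steps + 1,) + X0.shape)
    traj[0] = X0
    for l in range(n_steps):
        noise = rng.standard_normal(X0.shape)  # Brownian increment is sqrt(dt) * noise
        traj[l + 1] = traj[l] + drift(traj[l], phi) * dt + sigma * np.sqrt(dt) * noise
    return traj

# Hypothetical example: N = 10 particles in d = 2 with a bounded, decaying kernel.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((10, 2))
traj = euler_maruyama(X0, lambda r: np.exp(-r), sigma=0.1, dt=0.01, n_steps=100, rng=rng)
print(traj.shape)  # (101, 10, 2)
```

Because the summand in Eq. (1.1) is antisymmetric in the pair of particles, the drift sums to zero over the particles, which gives a quick sanity check on any implementation.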
When it comes to stochastic multi-particle systems, only a few efforts have been made, e.g., learning reduced Langevin equations on manifolds in [19] (without, however, assuming or exploiting the structure of pairwise interactions), learning parametric potential functions in [10,15] from single-trajectory data, estimating the diffusion parameter in [26], and estimating effective Langevin equations on manifolds in [19].

Our goal is to estimate the interaction kernel $\phi$ given discrete-time observation data from trajectories $\{X^{(m)}_{t_0:t_L}\}_{m=1}^{M}$, where the initial conditions $\{X^{(m)}_{t_0}\}_{m=1}^{M}$ are independent samples drawn from a distribution $\mu_0$ on $\mathbb{R}^{dN}$, and $t_0 : t_L$ indicates times $0 = t_0 < t_1 < \cdots < t_l < \cdots < t_L = T$, with $t_l = l\Delta t$. Since, in general, little information about the analytical form of the kernel is available, we infer it in a nonparametric fashion (e.g., [6,20,23]).

We note that the problem we consider is to learn a latent function in the drift term given observations from multiple trajectories, which is different from the ample literature on the inference of stochastic differential equations (see, e.g., [29,34]), which focuses either on parameter estimation or on inference for ergodic systems. In particular, our learning approach is close in spirit to the nonparametric regression of the drift studied in [44] for ergodic systems and in [17] from i.i.d. paths. However, for systems of interacting particles one faces the curse of dimensionality when learning the high-dimensional drift directly as a general function on the high-dimensional state space $\mathbb{R}^{dN}$. Instead, we will exploit the structure of the system and learn the latent interaction kernel in the drift, which depends only on pairwise distances, and show that the curse of dimensionality may be avoided when the inverse problem is well-conditioned.
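To make the role of the pairwise structure concrete, here is a hedged sketch in Python with NumPy. It assumes the kernel is approximated by a piecewise-constant function on bins of the distance axis, so that the drift in Eq. (1.1) is linear in the bin coefficients, and it fits those coefficients by least squares on the observed increments. The basis choice, the least-squares objective, and the names `basis_drifts` and `fit_kernel` are illustrative assumptions, not the estimator or code of the paper.

```python
import numpy as np

def basis_drifts(X, edges):
    """Drift fields generated by the indicator kernels of the bins [edges[k], edges[k+1]).

    X has shape (N, d); the result has shape (n, N, d) with n = len(edges) - 1.
    """
    N = X.shape[0]
    n = len(edges) - 1
    diff = X[None, :, :] - X[:, None, :]   # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)      # pairwise distances
    bins = np.digitize(r, edges) - 1       # bin index of each distance (n if out of range)
    out = np.zeros((n,) + X.shape)
    for k in range(n):
        mask = (bins == k).astype(float)   # pairs whose distance falls in bin k
        out[k] = (mask[:, :, None] * diff).sum(axis=1) / N
    return out

def fit_kernel(trajs, dt, edges):
    """Least-squares coefficients of a piecewise-constant kernel (assumed model).

    trajs has shape (M, L + 1, N, d). Minimizes the sum over trajectories and
    time steps of |X_{l+1} - X_l - dt * sum_k c_k F_k(X_l)|^2 over c.
    """
    n = len(edges) - 1
    A = np.zeros((n, n))
    b = np.zeros(n)
    for traj in trajs:                     # each trajectory contributes independently
        for l in range(traj.shape[0] - 1):
            F = basis_drifts(traj[l], edges).reshape(n, -1)
            dX = (traj[l + 1] - traj[l]).ravel()
            A += (F @ F.T) * dt            # normal matrix
            b += F @ dX                    # right-hand side
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

The unknown is a vector of n coefficients of a function of one variable, whatever the value of dN. The contributions to `A` and `b` are sums over trajectories, so in this sketch they could be accumulated independently per trajectory and then added.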
We introduce a maximum likelihood estimator (MLE), along with an efficient algorithm that can be implemented in parallel over trajectories, with a hypothesis space adaptive to data to reach optimal accuracy. Under a coercivity condition, we prove that the MLE is consistent and converges at the min–max rate for one-dimensional nonparametric regression. We also analyze the discretization error due to discrete-time observations: we show that it leads to an error in the estimator of order $\Delta t^{1/2}$ (with $\Delta t = T/L = t_{l+1} - t_l$), and, as a result, it prevents us from obtaining the min–max learning rate in sample size. We demonstrate the effectiveness of our algorithm by numerical tests on prototype systems including opinion dynamics and a stochastic Lennard-Jones model (see Sect. 5). Numerical results verify our learning theory in the sense that the min–max rate of convergence is achieved, and the bias due to the numerical error is close to the order $\Delta t^{1/2}$.

1.1 Overview of the Main Results

We consider an approximate maximum likelihood estimator (MLE), which is the maximizer of the approximate likelihood of the observed trajectories over a suitable hypothesis space $\mathcal{H}$:

$$\hat{\phi}_{L,T,M,\mathcal{H}} = \arg\min_{\varphi \in \mathcal{H}} \mathcal{E}_{L,T,M}(\varphi),$$

where $\mathcal{E}_{L,T,M}(\varphi)$ is an approximation of the negative log-likelihood of the discrete data $\{X^{(m)}_{t_0:t_L}\}_{m=1}^{M}$. Using t (...truncated)