Wasserstein Riemannian Geometry on Statistical Manifold
INTERNATIONAL ELECTRONIC JOURNAL OF GEOMETRY
VOLUME 13 NO. 2 PAGE 144–151 (2020)
DOI: https://doi.org/10.36890/IEJG.689702
Carlos Ogouyandjou* and Nestor Wadagni
(Communicated by Murat Tosun)
ABSTRACT
In this paper, we study some geometric properties of a statistical manifold equipped with the
Riemannian Otto metric, which is related to the L²-Wasserstein distance of optimal mass transport.
We construct some α-connections on such a manifold and prove that the proposed connections are
torsion-free and coincide with the Levi-Civita connection when α = 0. In addition, the exponential
families and the mixture families are shown to be respectively (1)-flat and (−1)-flat.
Keywords: Statistical manifold; Riemannian metric; Otto metric; α-connections; Wasserstein Riemannian space; flatness.
AMS Subject Classification (2020): 15B48; 53C23; 53C25; 60D05.
1. Introduction
Information geometry started as the investigation of the differential geometric structure of a set of
probability distributions which constitutes a statistical manifold. Since the seminal work of Rao [11], where
the Fisher information is viewed as a Riemannian metric on a space of probability distributions, it
has become clear that, as a differentiable manifold, a space of probability distributions can be equipped with
a multitude of Riemannian metrics that are not necessarily the Fisher metric. Considering the Riemannian
structure obtained from the Fisher information on a statistical manifold, Amari [2] defines a one-parameter
family of affine connections called α-connections. Since then, α-connections have become key tools in information
geometry and have been widely investigated by several authors such as Gbaguidi et al. [7], who constructed a
family of α-connections on a Hilbert bundle of a generalized statistical manifold.
In this paper we are interested in a statistical manifold equipped with the Wasserstein metric, which is
related to optimal transport. Kantorovich and Rubinstein [8] stated that the Wasserstein metric can be taken
as a reasonable distance on spaces of random variables or of probability distributions. However, explicit
calculations based on that metric seem to be somewhat difficult to perform. Lott [9] showed that the
Riemannian Otto metric related to the Wasserstein metric makes calculations on Wasserstein space easier. We
make use of the Otto metric to investigate the Wasserstein Riemannian geometry on a statistical manifold.
Let $\mathcal{M}$ be a set of probability densities endowed with the Otto Riemannian metric. We construct on $\mathcal{M}$ a
family $\nabla^{(\alpha)}$ of torsion-free α-connections that is exactly the Levi-Civita connection on $\mathcal{M}$ when α = 0. We also
show that the exponential families and the mixture families are respectively (1)-flat and (−1)-flat. The rest of
the paper is organized as follows: we recall some preliminaries on α-connections in Section 2, and we present
useful results on the Otto metric and the Wasserstein metric in Section 3. Finally, the main results are given in
Section 4.
2. Preliminary remarks on α-connections
For some integer $d \ge 1$, let $\mathcal{X}$ be a non-empty subset of $\mathbb{R}^d$ and $\mathcal{M}$ be a family of probability distributions on
$\mathcal{X}$. Each element of $\mathcal{M}$ can be identified with a parameter $\theta = (\theta^1, \dots, \theta^n) \in \Theta$, where $\Theta$ is a subset of $\mathbb{R}^n$, and the mapping $\theta \mapsto p_\theta$ is
injective. $\mathcal{M}$ is a $C^\infty$ differentiable manifold.
Received: 15-February-2020, Accepted: 16-August-2020
* Corresponding author
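Example 2.1. $\mathcal{X} = \mathbb{R}$, $n = 2$, $\theta = (\mu, \sigma)$, $\Theta = \{(\mu, \sigma) : \mu \in \mathbb{R},\ \sigma \in \mathbb{R}_+^*\}$,
\[
  p(x, \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\]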
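Put $\ell(\cdot\,; \theta) = \log p(\cdot\,; \theta)$. The functions $\frac{\partial \ell(\cdot\,;\theta)}{\partial \theta^i}$, $i = 1, \dots, n$, are the score functions.
The tangent space $T_\theta(\mathcal{M})$ can be identified with $\tilde{T}_\theta(\mathcal{M})$, the vector space spanned by the functions $\frac{\partial \ell(x;\theta)}{\partial \theta^i}$ and endowed with
the inner product $\langle \tilde{X}, \tilde{Y} \rangle_\theta = E_\theta[\tilde{X}\tilde{Y}]$. The mapping
\[
  \sum_i a^i \frac{\partial}{\partial \theta^i} \mapsto \sum_i a^i \frac{\partial \ell(x;\theta)}{\partial \theta^i}
\]
defines an isometry between $T_\theta(\mathcal{M})$ and $\tilde{T}_\theta(\mathcal{M})$ (see [12]).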
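Definition 2.1. The Fisher information metric
The Fisher information matrix of $\mathcal{M}$ at $\theta$ is the $n \times n$ matrix $G(\theta) = (\tilde{g}_{ij}(\theta))$ defined by
\[
  \tilde{g}_{ij}(\theta) := E_\theta[\partial_i \ell(X, \theta)\,\partial_j \ell(X, \theta)]
  = \int_{\mathcal{X}} \partial_i \ell(x, \theta)\,\partial_j \ell(x, \theta)\, p(x; \theta)\,dx,
\]
where $\partial_i := \frac{\partial}{\partial \theta^i}$ and $\ell(x, \theta) = \log p(x; \theta)$. In particular, when $n = 1$, we call this the Fisher information.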
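The inner product of the natural basis of the coordinate system $(\theta^1, \dots, \theta^n)$,
\[
  \langle \partial_i, \partial_j \rangle = \tilde{g}_{ij},
\]
uniquely determines a Riemannian metric $\tilde{g} = \langle \cdot, \cdot \rangle$ such that, for all $\theta \in \Theta$ and all $X, Y \in T_\theta(\mathcal{M})$, $\tilde{g}_\theta(X, Y) =
\langle X, Y \rangle_\theta = E_\theta[(X\ell)(Y\ell)]$. The metric $\tilde{g}$ is called the Fisher metric or, alternatively, the information metric.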
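As a quick numerical illustration of Definition 2.1, the sketch below approximates the Fisher information matrix of the Gaussian family of Example 2.1 by a Riemann sum of the defining integral. The function name `gaussian_fisher_matrix`, the grid width, and the number of grid points are illustrative assumptions, not part of the paper; the score functions are differentiated by hand from the density of Example 2.1.

```python
import numpy as np

def gaussian_fisher_matrix(mu, sigma, half_width=12.0, num_points=200001):
    """Approximate the Fisher matrix of N(mu, sigma^2) in coordinates (mu, sigma).

    Hypothetical helper: a plain Riemann sum on a truncated grid.
    """
    # Grid covering mu +/- half_width * sigma; the tails beyond it are negligible.
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, num_points)
    dx = x[1] - x[0]
    z = (x - mu) / sigma
    density = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

    # Score functions: d l / d mu = (x - mu) / sigma^2,
    #                  d l / d sigma = -1 / sigma + (x - mu)^2 / sigma^3.
    scores = np.stack([z / sigma, (z**2 - 1.0) / sigma])

    # g_ij = integral of score_i * score_j * p over the sample space.
    return (scores * density) @ scores.T * dx

G = gaussian_fisher_matrix(mu=0.5, sigma=2.0)
print(np.round(G, 6))
```

For this family the result should be close to the standard closed form diag(1/σ², 2/σ²), which gives a simple sanity check on the quadrature.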
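Definition 2.2. An affine connection $\nabla$ on a differentiable manifold $\mathcal{M}$ is a mapping
\[
  \nabla : \mathfrak{X}(\mathcal{M}) \times \mathfrak{X}(\mathcal{M}) \to \mathfrak{X}(\mathcal{M}),
\]
denoted by $(X, Y) \mapsto \nabla_X Y$, which satisfies the following properties:
• $\nabla_{fX+gY} Z = f\nabla_X Z + g\nabla_Y Z$
• $\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z$
• $\nabla_X (fY) = f\nabla_X Y + X(f)Y$
for all $X, Y, Z \in \mathfrak{X}(\mathcal{M})$ and $f, g \in C^\infty(\mathcal{M})$.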
Theorem 2.1. [6] Given a Riemannian manifold $(\mathcal{M}, g)$, there exists a unique affine connection $\nabla$ on $\mathcal{M}$ satisfying the
conditions:
• ∇ is symmetric.
• ∇ is compatible with the Riemannian metric g.
This affine connection is the Levi-Civita connection on the manifold (M, g).
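In a coordinate system $(U, \theta)$, the functions $\overset{\circ}{\Gamma}{}^{k}_{ij}$ defined on $U$ by $\nabla_{\partial_i} \partial_j = \sum_k \overset{\circ}{\Gamma}{}^{k}_{ij}\, \partial_k$ are called the Christoffel
symbols of the Levi-Civita connection and we have
\[
  \overset{\circ}{\Gamma}{}^{k}_{ij} = \frac{1}{2} \left( \frac{\partial g_{jm}}{\partial \theta^i} + \frac{\partial g_{mi}}{\partial \theta^j} - \frac{\partial g_{ij}}{\partial \theta^m} \right) g^{mk},
  \tag{2.1}
\]
where summation over the repeated index $m$ is understood.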
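As a worked instance of (2.1), one may take the Gaussian family of Example 2.1 with the Fisher metric. The closed form of that metric used below is a standard fact assumed here, not something derived in the text, and the computation is only meant as a sanity check on the index placement in (2.1).

```latex
% Fisher metric of Example 2.1 in the coordinates (\theta^1, \theta^2) = (\mu, \sigma).
% Assumption: the standard closed form diag(1/\sigma^2, 2/\sigma^2).
\[
  (\tilde g_{ij}) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix},
  \qquad
  (\tilde g^{ij}) = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2/2 \end{pmatrix}.
\]
% The only nonzero derivatives are
% \partial_2 \tilde g_{11} = -2/\sigma^3 and \partial_2 \tilde g_{22} = -4/\sigma^3,
% so (2.1) with g = \tilde g gives
\begin{align*}
  \overset{\circ}{\Gamma}{}^{1}_{12} = \overset{\circ}{\Gamma}{}^{1}_{21}
    &= \tfrac12\, \tilde g^{11}\, \partial_2 \tilde g_{11} = -\frac{1}{\sigma}, \\
  \overset{\circ}{\Gamma}{}^{2}_{11}
    &= -\tfrac12\, \tilde g^{22}\, \partial_2 \tilde g_{11} = \frac{1}{2\sigma}, \\
  \overset{\circ}{\Gamma}{}^{2}_{22}
    &= \tfrac12\, \tilde g^{22}\, \partial_2 \tilde g_{22} = -\frac{1}{\sigma}, \\
  \overset{\circ}{\Gamma}{}^{1}_{11} = \overset{\circ}{\Gamma}{}^{1}_{22}
    = \overset{\circ}{\Gamma}{}^{2}_{12} = \overset{\circ}{\Gamma}{}^{2}_{21} &= 0.
\end{align*}
```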
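Amari [2] considers the functions $\Gamma^{(\alpha)}_{ij,k}$ which map each point $\theta$ to the following value:
\[
  \left( \Gamma^{(\alpha)}_{ij,k} \right)_\theta := E_\theta\left[ \left( \partial_i \partial_j \ell(X, \theta) + \frac{1-\alpha}{2}\, \partial_i \ell(X, \theta)\, \partial_j \ell(X, \theta) \right) \left( \partial_k \ell(X, \theta) \right) \right],
\]
where $\alpha$ is some arbitrary real number. The α-connection $\nabla^{(\alpha)}$, which is an affine connection, is defined by
\[
  \langle \nabla^{(\alpha)}_{\partial_i} \partial_j, \partial_k \rangle = \Gamma^{(\alpha)}_{ij,k},
\]
where $g = \langle \cdot, \cdot \rangle$ is the Fisher metric and $\nabla^{(\alpha)}_{\partial_i} \partial_j$ is the α-covariant derivative of $\partial_j$ in the direction of $\partial_i$.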
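To make the definition of Amari's coefficients concrete, here is a minimal numerical sketch that evaluates $\Gamma^{(\alpha)}_{ij,k}$ for the Gaussian family of Example 2.1 by a Riemann sum. The function name `gaussian_alpha_coefficients` and the grid parameters are hypothetical choices made for illustration, and the first and second derivatives of the log-density are worked out by hand from Example 2.1 rather than taken from the paper.

```python
import numpy as np

def gaussian_alpha_coefficients(mu, sigma, alpha, half_width=12.0, num_points=200001):
    """Approximate Gamma^(alpha)_{ij,k} for N(mu, sigma^2) in coordinates (mu, sigma).

    Hypothetical helper; returns an array gamma with gamma[i, j, k] ~ Gamma^(alpha)_{ij,k},
    where index 0 stands for mu and index 1 for sigma.
    """
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, num_points)
    dx = x[1] - x[0]
    z = (x - mu) / sigma
    density = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

    # First derivatives of the log-density (score functions), indexed [i].
    d1 = np.stack([z / sigma, (z**2 - 1.0) / sigma])

    # Second derivatives of the log-density, indexed [i, j].
    d2 = np.empty((2, 2, x.size))
    d2[0, 0] = -1.0 / sigma**2
    d2[0, 1] = d2[1, 0] = -2.0 * z / sigma**2
    d2[1, 1] = (1.0 - 3.0 * z**2) / sigma**2

    # Bracketed factor: d_i d_j l + (1 - alpha)/2 * d_i l * d_j l.
    factor = d2 + 0.5 * (1.0 - alpha) * d1[:, None, :] * d1[None, :, :]

    # Expectation of factor * d_k l under the density, as a Riemann sum.
    return np.einsum("ijx,kx,x->ijk", factor, d1, density) * dx

gamma0 = gaussian_alpha_coefficients(mu=0.0, sigma=2.0, alpha=0.0)
print(np.round(gamma0, 6))
```

With α = 0 the values can be compared against the Levi-Civita coefficients of the Fisher metric lowered with that metric, for instance $\Gamma_{11,2} = 1/\sigma^3$ for this family, which is the Fisher-metric analogue of the α = 0 statement the paper establishes for the Otto metric.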
Next, we recall some important results on the Otto metric which is a Riemannian metric on the Wasserstein
space.
3. Otto metric
3.1. Wasserstein metric
Let (X , µ) and (Y, ν) be two probability spaces. A coupling of (µ, ν) is a random vector (X, Y ) such that the
law of X is µ and the law of Y is ν . By abuse of language, the law of (X, Y ) is also called a coupling of (µ, ν).
We denote by Π(µ, ν) the set of couplings of (µ, ν).
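Definition 3.1. Let $\mathcal{X}$ be a subset of $\mathbb{R}^n$, $n \in \mathbb{N}^*$, and let $p \in [1, \infty[$. For any two probability measures $\mu, \nu$ on $\mathcal{X}$,
the Wasserstein distance of order $p$ between $\mu$ and $\nu$ is defined by
\[
  W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} \|x - y\|^p \, d\pi(x, y) \right)^{1/p}.
  \tag{3.1}
\]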
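The infimum in (3.1) is rarely available in closed form, but on the real line it is easy to evaluate for empirical measures. The sketch below assumes two samples of equal size on $\mathbb{R}$ with uniform weights and relies on a standard fact of one-dimensional optimal transport that the text does not state: for $p \ge 1$, matching the sorted samples is an optimal coupling. The function name `empirical_wasserstein_1d` is a hypothetical choice.

```python
import numpy as np

def empirical_wasserstein_1d(samples_x, samples_y, p=2.0):
    """W_p between two uniform empirical measures on R with the same number of atoms.

    Hypothetical helper: uses the sorted (monotone) coupling, which is optimal
    in dimension one for p >= 1.
    """
    xs = np.sort(np.asarray(samples_x, dtype=float))
    ys = np.sort(np.asarray(samples_y, dtype=float))
    if xs.ndim != 1 or xs.shape != ys.shape:
        raise ValueError("expected two 1-D samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Average transport cost of the sorted coupling, then the 1/p power as in (3.1).
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

# The second sample is the first one translated by 5, so the distance is 5.
print(empirical_wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0], p=2))
```

In higher dimensions no such sorting shortcut exists, and the infimum over couplings has to be solved as a genuine optimization problem.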
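Definition 3.2. Let $P(\mathcal{X})$ be the set of probability measures on $\mathcal{X}$. The Wasserstein space of order $p$, $p \in [1, \infty[$,
is defined as
\[
  P_p(\mathcal{X}) = \left\{ \mu \in P(\mathcal{X});\ \int_{\mathcal{X}} \|x\|^p \, d\mu(x) < +\infty \right\}.
  \tag{3.2}
\]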
Wp defines a (finite) distance on Pp (X ). For more details on Wasserstein space see [13].
3.2. Otto metric
We consider an n-dimensional regular statistical manifold M = {p(·; θ); θ = (θ1 , · · · (...truncated)