Large Deviations of Continuous Regular Conditional Probabilities
W. van Zuijlen
Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
We study product regular conditional probabilities under measures of two coordinates with respect to the second coordinate that are weakly continuous on the support of the marginal of the second coordinate. Assuming that there exists a sequence of probability measures on the product space that satisfies a large deviation principle, we present necessary and sufficient conditions for the conditional probabilities under these measures to satisfy a large deviation principle. The arguments of these conditional probabilities are assumed to converge. A way to view regular conditional probabilities as a special case of product regular conditional probabilities is presented. This is used to derive conditions for large deviations of regular conditional probabilities. In addition, we derive a Sanov-type theorem for large deviations of the empirical distribution of the first coordinate conditioned on fixing the empirical distribution of the second coordinate.
Keywords: (Product) regular conditional kernel; Weakly continuous; Large deviations

1 Introduction
In this paper, we study the large deviations of conditional probabilities of the form
P(Xn ∈ A | Yn = yn), (1.1)
where ((Xn, Yn))n∈N is a sequence of couples of random variables that satisfies a
large deviation principle and yn → y for some y. As the event [Yn = yn] may have
probability zero, we make sense of (1.1) in terms of a kernel ηn, so that ηn(yn, A)
“represents” (1.1).
Such kernels are called regular conditional probabilities and form an important
object in probability theory. The existence of regular conditional probabilities has been
studied extensively, for example, by Faden [12] or by Leao et al. [21]. There exist in
fact various forms of regular conditional probabilities, namely either with respect to a
σ-algebra, with respect to a measurable map, or with respect to the projection on one
of the coordinates (in the case of a product space).
In order to consider large deviations of conditional probabilities, we have to specify
which conditional probability we are considering; the conditional probability may not
be unique. However, if a (product) regular conditional probability is weakly continuous
on the support of the measure composed with the inverse of the measurable map
(or projection), it is unique on that domain. For these (product) regular conditional
probabilities, it is natural to study their large deviations whenever the argument of
the probability is in the domain on which it is unique. In this paper, we study the
large deviations in the case when the arguments of these kernels converge, i.e. we
study large deviations of (ηn(yn, ·))n∈N for the case that yn → y. To the best of our
knowledge, current literature does not provide a general condition under which such
kernels satisfy a large deviation principle.
1.1 Literature
Some examples in this direction exist. For example, in Adams et al. [1], the
large deviation principle is proved for the empirical distribution that is evolved by
independent Brownian motions conditioned on their initial empirical distribution to
lie in a ball (see [1, Theorem 1]). They proceed by proving that the large deviation
principle rate function converges as the radius of the ball converges to zero. For the
purpose of this paper, we have to show that the limit of the radius of the ball and
the limit belonging to the large deviation principle can be interchanged. Léonard
[22] proves the large deviation principle of the empirical distribution that is evolved
by independent Brownian motions conditioned on their initial empirical distribution;
those initial empirical distributions are assumed to be converging (see [22, Proposition
2.19]). In both papers, the evolved state is conditioned on the initial state, while there is
also interest in large deviations of the initial state conditioned on the evolved state. In
this paper, we prove the large deviation principle in this setting for finite state spaces.
There exist various results on quenched large deviations, i.e. large deviations for
regular conditional probabilities in the sense that for almost all realisations of the
disorder, the conditional probabilities satisfy the large deviation principle with a rate
function that does not depend on the disorder. Examples of papers on quenched large
deviations are Comets [5] for conditional large deviations of i.i.d. random fields,
Greven and den Hollander [14] and Comets et al. [6] for random walks in random
environments, Kosygina–Rezakhanlou–Varadhan [18] for a diffusion with a random
drift and Rassoul-Agha et al. [24] for polymers in a random potential.
Biggins [2] obtains the large deviation principle for mixtures of probability
measures that satisfy the large deviation principle with kernels that satisfy the large
deviation principle as their arguments converge. To some extent, we complement
that work in the opposite direction, in the sense that we assume the large deviation
principle of the mixture and derive the large deviation principle of the kernels.
Our main motivation to study the above large deviations lies in the theory of
Gibbs–non-Gibbs transitions. There is a correspondence between the large deviation rate
function of the conditional probability with respect to the evolved coordinate and the
evolved state (measure or sequence) being Gibbs (see van Enter et al. [9]). We refer
to Sect. 1.4 for further discussions on Gibbs–non-Gibbs transitions.
1.2 Large Deviations
In the literature on large deviations, two dominant definitions of large deviation
principles are used. One is in terms of a σ-algebra on the topological space, as is done in
the book by Dembo and Zeitouni [7] and in the book by Deuschel and Stroock [8];
the other is in terms of the topology, i.e. in terms of open and closed sets, as is done
in the book by den Hollander [16] and in the book by Rassoul-Agha and Seppäläinen
[25]. Whenever one considers the Borel σ-algebra on the topological space, the two
definitions agree.
We define the large deviation lower bound and the large deviation upper bound
separately, as in Sect. 1.3, and in Sect. 6, we describe the necessary and sufficient
conditions for each of the bounds separately. Moreover, we define them on a set of
subsets of the topological space, which is not required to be a σ-algebra. In Remark 7.4,
we motivate the choice for this definition.
Definition 1.1 Let X be a topological space and A be a set of subsets of X . Let
I : X → [0, ∞] be lower semicontinuous. Let (μn)n∈N be a sequence of probability
measures on A. Let (rn)n∈N be an increasing sequence in (0, ∞) with limn→∞ rn =
∞. We say that (μn)n∈N satisfies a large deviation lower bound on A with rate function
I and rates (rn)n∈N if
lim infn→∞ (1/rn) log μn(A) ≥ − inf I(A◦)  (A ∈ A). (1.2)
We say that (μn)n∈N satisfies a large deviation upper bound on A with rate function
I and rates (rn)n∈N if
lim supn→∞ (1/rn) log μn(A) ≤ − inf I(Ā)  (A ∈ A). (1.3)
In the rest of the paper we only consider the rates rn = n. However, the theory presented
is still valid for general rates (rn)n∈N. We say that (μn)n∈N satisfies a large deviation
principle on A with rate function I whenever it satisfies both the large deviation lower
bound and the large deviation upper bound with rate function I .
We omit “on A” whenever A is the Borel σ-algebra B(X) on X. In this case, the
large deviation lower bound is satisfied if and only if the inequality in (1.2) holds for
all open subsets of X and the large deviation upper bound is satisfied if and only if
the inequality in (1.3) holds for all closed subsets of X .
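As a concrete numerical illustration of Definition 1.1 (not taken from the paper; the choice of Bernoulli sample means, the threshold a = 0.8, the sample size and all function names below are ours), one can compare −(1/n) log μn(A) for the closed set A = [a, 1] with the classical Cramér rate function, which is the rate function in this example:

```python
# Numerical sketch (assumptions ours): mu_n is the law of the mean of n i.i.d.
# Bernoulli(1/2) variables; then -(1/n) log mu_n([a, 1]) approaches the Cramer
# rate inf I([a, 1]) = I(a) as n grows, matching the bounds (1.2)-(1.3).
from math import comb, log

def cramer_rate(a, p=0.5):
    """Rate function of Bernoulli(p) sample means: relative entropy of a w.r.t. p."""
    return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))

def ldp_rate_estimate(n, a=0.8):
    """Exact -(1/n) log P(S_n / n >= a) for S_n ~ Binomial(n, 1/2)."""
    mass = sum(comb(n, k) for k in range(int(a * n), n + 1))
    return -log(mass / 2**n) / n
```

For n = 400 the estimate differs from cramer_rate(0.8) ≈ 0.19 only by the usual O((log n)/n) correction, consistent with the rates rn = n.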
1.3 Main Results
See Sects. 3 and 4 for the definitions of the objects in the statements of the following
theorems. In Sects. 6 and 7, we consider a more general situation. Theorem 1.2 is a
consequence of Theorem 6.9, and Theorem 1.3 is a consequence of Theorem 7.5.
In this section X and Y are metric spaces.
Theorem 1.2 Let π : X × Y → Y be given by π(x , y) = y. Suppose that (μn)n∈N is
a sequence of probability measures on B(X ) ⊗ B(Y) that satisfies the large deviation
principle with rate function J : X × Y → [0, ∞] that has compact sublevel sets.
Suppose that for each n ∈ N, there exists a product regular conditional probability
ηn : Y × B(X ) → [0, 1] under μn with respect to π that is weakly continuous
on supp(μn ◦ π −1), which we assume to be nonempty. Let y ∈ Y be such that
inf J (X × {y}) < ∞. Define I : X → [0, ∞] by
I (x ) = J (x , y) − inf J (X × {y}).
I has compact sublevel sets, and, for each n ∈ N, ηn is unique on supp(μn ◦ π −1).
Moreover,
(A1) ⇐⇒ (A2) and (B1) ⇐⇒ (B2), where
(A1) For all (yn)n∈N with yn → y and yn ∈ supp(μn ◦ π −1) for all n large
enough,1 the sequence (ηn(yn, ·))n∈N satisfies the large deviation lower bound
with rate function I .
(A2) For all x ∈ X and r > 0, with U = B(x, r),
supε>0 lim infn→∞ inf{ (1/n) log μn(U × Y | X × B(z, δ)) : z ∈ Y, δ ∈ (0, ε), B(z, δ) ⊂ B(y, ε) } ≥ − inf I(U). (1.5)
(B1) For all (yn)n∈N with yn → y and yn ∈ supp(μn ◦ π −1) for all n large
enough, the sequence (ηn(yn, ·))n∈N satisfies the large deviation upper bound
with rate function I .
(B2) For all x1, . . . , xk ∈ X and r1, . . . , rk > 0, with W = X \ [B(x1, r1) ∪ · · · ∪
B(xk , rk )],
1 Meaning that there exists an N ∈ N such that yn ∈ supp(μn ◦ π−1) for all n ≥ N .
infε>0 lim supn→∞ sup{ (1/n) log μn(W◦ × Y | X × B(z, δ)) : z ∈ Y, δ ∈ (0, ε), B(z, δ) ⊂ B(y, ε) } ≤ − inf I(W). (1.6)
The next theorem is similar to Theorem 1.2, but considers the large deviation bounds
for regular conditional kernels instead of product regular conditional probabilities.
Theorem 1.3 Let τ : X → Y be continuous. Suppose that (νn)n∈N is a sequence
of probability measures on B(X ) that satisfies the large deviation principle with rate
function J : X → [0, ∞] that has compact sublevel sets. Suppose that for each n ∈ N
there exists a regular conditional probability ηn : Y × B(X ) → [0, 1] under νn with
respect to τ that is weakly continuous on supp(νn ◦ τ −1), which is assumed to be
nonempty. Let y ∈ Y be such that inf J (τ −1({y})) < ∞. Define I : X → [0, ∞] by
I(x) = J(x) − inf J(τ−1({y})) if τ(x) = y, and I(x) = ∞ otherwise.
I has compact sublevel sets, and, for each n ∈ N, ηn is unique on supp(νn ◦ τ −1).
Moreover,
(A1) ⇐⇒ (A2) and (B1) ⇐⇒ (B2),
(A1) For all (yn)n∈N with yn → y and yn ∈ supp(νn ◦ τ−1) for all n large enough,
the sequence (ηn(yn, ·))n∈N satisfies the large deviation lower bound with rate
function I .
(A2) For all x ∈ X and r > 0, with U = B(x, r),
supε>0 lim infn→∞ inf{ (1/n) log νn(U | τ−1(B(z, δ))) : z ∈ Y, δ ∈ (0, ε), B(z, δ) ⊂ B(y, ε) } ≥ − inf I(U).
(B1) For all (yn)n∈N with yn → y and yn ∈ supp(νn ◦ τ−1) for all n large enough,
the sequence (ηn(yn, ·))n∈N satisfies the large deviation upper bound with rate
function I .
(B2) For all x1, . . . , xk ∈ X and r1, . . . , rk > 0, with W = X \ [B(x1, r1) ∪ · · · ∪ B(xk, rk)],
infε>0 lim supn→∞ sup{ (1/n) log νn(W◦ | τ−1(B(z, δ))) : z ∈ Y, δ ∈ (0, ε), B(z, δ) ⊂ B(y, ε) } ≤ − inf I(W).
1.4 Gibbs–Non-Gibbs Transitions
In this section we discuss the relation between the large deviation results in this paper
and Gibbs–non-Gibbs transitions in more detail. In particular, we discuss possible
future directions regarding large deviations of conditional kernels.
The following situation for interacting particle systems occurs in the mean-field
context (a similar situation holds in the context of lattices). The initial system of
so-called spins consists of distributions describing the interaction between spins via a
potential V (for each n there is a distribution describing the law of n spins). This
initial system is assumed to be Gibbs, which is called sequentially Gibbs in the
mean-field context. Allowing the initial state to be transformed, for example, by an evolution
of the spins, a question of interest is whether the transformed state is (sequentially)
Gibbs. This question has been addressed in the mean-field context by Ermolaev and
Külske [11] and by Fernández et al. [13] for {−1, +1}-valued spins, by den Hollander
et al. [17] for R-valued spins and by Külske and Opoku [19] and van Enter et al. [10]
for compactly valued spins. In these papers, independent dynamics of the spins are
considered (the evolution of each spin is independent of the evolution of the other
spins). Independent dynamics simplify the situation. Namely, the evolved measure on
either the product space of the initial and the final space, or—in case of an evolution—
the space of trajectories, is a tilted measure of the evolved measure when considering
V = 0. In this case the measure is a product measure, which means that the spins
are independent. As a consequence (this will be clarified in a forthcoming paper), the
conditional kernel ηn of the initial state on n spins with respect to the final state (for
a fixed potential V ) is a tilted version of the conditional kernel ηn0 of the initial state
with respect to the final state of independent spins (i.e. V = 0). Because of this tilting,
by Varadhan’s lemma, (ηn(yn, ·))n∈N satisfies the large deviation principle with rate
function V + Iy − inf(V + Iy ) if (ηn0(yn, ·))n∈N satisfies the large deviation principle
with rate function Iy . In the forthcoming paper, we will prove that the evolved sequence
is sequentially Gibbs if V + Iζ has a unique global minimiser.
The large deviation principle of (ηn(yn, ·))n∈N has been mentioned in the case of
trajectories in [11, Corollary 2.4] and—as a corollary of that theorem—for the case of
the product space of the initial and the final space in [13, Corollary 1.3]. However, no
proof was given. Theorem 8.2 provides a rigorous proof of the large deviation principle
statement in [13, Corollary 1.3]. In this paper, we do not provide a rigorous proof of
[11, Corollary 2.4]. But Theorem 1.3 may be used, as the conditioning on the final
state is a regular conditional kernel with respect to the map τ : C ([0, T ], X ) → X ,
τ ( f ) = f (T ).
In order to deal with empirical distributions (and not with magnetisations as is
done in [17]), in future research we strive to “extend” the statement of Theorem 8.2
to infinite and possibly noncompact state spaces. In the case of noncompact spaces,
it may be that topologies on the space of probability measures are considered that are
not metrisable.
1.5 Outline
We list some notations, definitions and assumptions in Sect. 2. In Sect. 3, we give and
compare the notions of regular conditional kernels, and we show that a regular
conditional kernel under a measure ν is in fact a product regular conditional kernel under
a measure that is related to ν. In Sect. 4, we introduce and study weakly continuous
regular conditional kernels. In Sect. 5, we present some facts about lower
semicontinuous functions with compact sublevel sets. Relying on the results of Sects. 4 and 5, in
Sect. 6, we present results on large deviation bounds for product regular conditional
probabilities, in particular necessary and sufficient conditions for these bounds to hold.
In Sect. 7, we discuss how to obtain large deviation bounds for regular conditional
probabilities from the results in Sect. 6. In Sect. 8, we apply the theory to obtain
the large deviation principle for the empirical density of the first coordinate given
the empirical density of the second coordinate, for independent and identically
distributed pairs of random variables. In Sect. 9, we give some examples. We also include
an example for which the conditions are not satisfied. For this example we compare
the quenched large deviations with large deviations of the weakly continuous regular
conditional probabilities and comment on the difference with an example by La Cour
and Schieve [20]. In “Appendices 1 and 2” we state some general results considering
large deviations bounds that are used in the different sections. In “Appendix 3” we
provide the proof of a theorem on which the examples of Sect. 9 rely.
2 Notations and Conventions
N = {1, 2, 3, . . . }. For a topological space X, we write B(X) for the Borel σ-algebra
and P(X) and M(X) for the spaces of probability and signed measures on B(X),
respectively. For A ⊂ X we write A◦ for the interior of A and Ā for the closure
of A. For x ∈ X we write δx for the element in P(X ) with δx ( A) = 1 if x ∈ A
and δx ( A) = 0 otherwise. For x ∈ X we write Nx for the set of B(X )measurable
neighbourhoods of x . For a μ ∈ M(X ) we write supp μ = {x ∈ X : μ(V ) >
0 for all V ∈ Nx } and call this the support of μ. For a function f from a set X into
R and c ∈ R we write [ f ≥ c] = {x ∈ X : f (x ) ≥ c}. Similarly, we use the
notations [ f > c], [ f ≤ c] and [ f < c]. Whenever (xι)ι∈I is a net, where I is a
set directed by ≥ (a direction), we write lim infι∈I xι = supι0∈I infι≥ι0, ι∈I xι (similarly
for lim sup). In particular, if V ⊂ Nx and ⋂V = {x} and f : V → R, then
lim infV∈V f(V) = supV0∈V infV⊂V0, V∈V f(V) (i.e. we consider ( f(V))V∈V as a
net, where V is directed by ⊃ (as ≥)).
Whenever we write μ(A | B), we implicitly assume that it is well defined (as μ(A ∩
B)/μ(B)), i.e. that μ(B) ≠ 0.
We use the conventions log 0 = −∞ and inf I (∅) = ∞ whenever I is a function
with values in [0, ∞].
All measures in this paper are signed measures, unless mentioned otherwise.
3 Regular Conditional Kernels Being Product Regular Conditional Kernels
In this section we introduce the notion of a (product) regular conditional kernel. For
an extensive study on regular conditional kernels, see Bogachev [4, Section 10.4].
The notion of a product regular conditional kernel does not appear in [4], but it does
in Faden [12] and in Leao et al. [21]. Besides giving definitions, we make a few
observations, of which Theorem 3.6 is used later on to derive statements of regular
conditional kernels from statements of product regular conditional kernels.
In this section (X, A), (Y, B) are measurable spaces, ν is a measure on A and μ
is a measure on A ⊗ B, τ : X → Y is measurable, and π : X × Y → Y is given by
π(x , y) = y.
Definition 3.1 A function η : Y × A → R is called a (B-)kernel if η(·, A) is
(B-)measurable for all A ∈ A and η(y, ·) is a measure for all y ∈ Y. A kernel η is called
a probability kernel if η(y, ·) is a probability measure for all y ∈ Y.
Definition 3.2 Let η : Y × A → R be a (probability) kernel.
(a) η is called a regular conditional kernel (regular conditional probability) under ν
with respect to τ if
ν(A ∩ τ−1(B)) = ∫Y 1B(y) η(y, A) d[ν ◦ τ−1](y)  (A ∈ A, B ∈ B). (3.1)
(b) η is called a product regular conditional kernel (product regular conditional
probability) under μ with respect to π if
μ(A × B) = ∫Y 1B(y) η(y, A) d[μ ◦ π−1](y)  (A ∈ A, B ∈ B). (3.2)
3.3 Suppose that E is a sub-σ-algebra of A. Let (Y, B) = (X, E) and Id : (X, A) →
(Y, B) be the identity map. In agreement with [4, Definition 10.4.1], a kernel η : Y × A →
R is a regular conditional kernel under ν with respect to E if and only if η is a regular
conditional kernel under ν with respect to Id.
3.4 Consider two kernels η : Y × A → R and ξ : Y × (A ⊗ B) → R, corresponding
to each other by the formulas ξ(y, F) = ∫X 1F(x, y) d[η(y, ·)](x) and η(y, A) =
ξ(y, A × Y). Then ξ is a regular conditional kernel under μ with respect to π if and only if η
is a product regular conditional kernel under μ with respect to π.
In general, X × Y may be equipped with a σ-algebra F different from A ⊗ B. In this
situation, where μ is a measure on F and π is F-measurable, the above correspondence
cannot be used in general to reduce statements about product regular conditional
kernels to statements about regular conditional kernels. See also 4.5.
On the other hand, regular conditional probabilities can be seen as special cases of
product regular conditional probabilities; see Theorem 3.6. In the present paper we use
this to derive Theorem 1.3 from Theorem 1.2 but also Theorem 7.5 from Theorem 6.9.
Remark 3.5 If A is generated by a countable set, two regular conditional probabilities
under a measure with respect to a σ-algebra (see 3.3) are almost everywhere equal (see
Bogachev [4, Theorem 10.4.3]). Similarly one could state an analogous statement for
regular conditional kernels with respect to measurable maps and for product regular
conditional kernels. In Theorem 4.3 we prove that (product) regular conditional kernels
are unique on the domain on which they are weakly continuous, in case the underlying
topological space is perfectly normal. For such a space, the Borel σ-algebra need not be
generated by a countable set.2
2 The Sorgenfrey line, the space R with the right half-open interval topology, is perfectly normal but not
second countable (see Steen and Seebach [27, Example 51]).
Theorem 3.6 (a) There exists a measure μ˜ on (X × Y, A ⊗ B) for which μ˜ ( A × B) =
ν( A ∩ τ −1(B)).
(b) η : Y × A → R is a regular conditional kernel under ν with respect to τ if and
only if η is a product conditional kernel under μ˜ with respect to π .
Proof (a) We may assume ν to be positive, since ν = ν+ − ν−. Let E be the
set that consists of the unions A1 × B1 ∪ · · · ∪ An × Bn, where n ∈ N and Ai ∈ A, Bi ∈ B
are such that A1 × B1, . . . , An × Bn are disjoint. Define ν∗ : E → [0, ∞)
by ν∗(A1 × B1 ∪ · · · ∪ An × Bn) = ν((A1 ∩ τ−1(B1)) ∪ · · · ∪ (An ∩ τ−1(Bn))) for A1, . . . , An ∈ A and
B1, . . . , Bn ∈ B as above. Checking that E is a ring of sets and that ν∗ is σ-additive
is left to the reader. The existence and unicity of the extension μ˜ follow
from the Carathéodory theorem (see Halmos [15, Section 13, Theorem A]).
(b) It follows from the definition of μ˜ (note that ν ◦ τ−1 = μ˜ ◦ π−1).
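For a finite state space, the construction of μ˜ in Theorem 3.6(a) can be written out directly. The following minimal sketch (the toy measure ν and the map τ are ours, chosen for illustration) evaluates μ˜ on rectangles via the defining identity μ˜(A × B) = ν(A ∩ τ−1(B)):

```python
# Finite-state sketch of Theorem 3.6(a) (toy data, ours): nu lives on
# X = {0, 1, 2, 3}, tau maps into Y = {0, 1}, and mu~ is defined on rectangles
# by mu~(A x B) = nu(A ∩ tau^{-1}(B)).
X = [0, 1, 2, 3]
nu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def tau(x):
    """The measurable map X -> Y = {0, 1}."""
    return x % 2

def mu_tilde(A, B):
    """mu~ on the rectangle A x B: the nu-mass of A ∩ tau^{-1}(B)."""
    return sum(nu[x] for x in A if tau(x) in B)
```

In particular mu_tilde(X, B) equals ν ◦ τ−1(B), i.e. the marginal μ˜ ◦ π−1 coincides with ν ◦ τ−1, which is the observation used in part (b).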
4 Weakly Continuous Kernels
In this section we introduce the notion of weak continuity for kernels on topological
spaces. In Theorem 4.3 we show uniqueness of (product) regular conditional kernels
that are weakly continuous. In Theorems 4.6 and 4.8 we describe conditions that imply
the existence of weakly continuous regular conditional probabilities. Similarly as is
done in the Portmanteau theorem when one considers metric spaces, weak convergence
implies lower bounds for open sets and upper bounds for closed sets, as is shown in
Theorem 4.10. As described in Lemmas 4.11 and 4.12, these lim inf and lim sup bounds
imply bounds for (product) regular conditional probabilities on which the results of
Sects. 6 and 7 are based.
In this section X and Y are topological spaces, ν is a measure on B(X ), μ is a
measure on B(X ) ⊗ B(Y), τ : X → Y is measurable, and π : X × Y → Y is given
by π(x , y) = y.
Definition 4.1 We equip the space of measures, M(X), with the weak topology
(generated by Cb(X), which we denote by σ(M(X), Cb(X)) as in the book of Schaefer
[26, Chapter II, Section 5]). In this topology, a net (μι)ι∈I in M(X) converges to a μ
in M(X) if and only if ∫X f dμι → ∫X f dμ for all f ∈ Cb(X). Let D ⊂ Y. A kernel
η : Y × B(X) → R is called weakly continuous on D if the map
D → M(X) given by y → η(y, ·) is continuous in the weak topology. η is called
weakly continuous if η is weakly continuous on Y.
Theorem 4.2 Let X be a perfectly normal3 space and μ ∈ M(X). Then
supp(μ) = { x ∈ X : ∫X f dμ > 0 for all f ∈ C(X, [0, 1]) with f(x) > 0 }.
3 Perfectly normal means that every open set in X is equal to f −1((0, ∞)) for some f ∈ C(X ). All metric
spaces are perfectly normal; Bogachev [4, Proposition 6.3.5].
Moreover, μ(X \ supp(μ)) = 0.4 As a consequence, μ = 0 if and only if ∫X f dμ =
0 for all f ∈ Cb(X).
Proof We may assume μ is positive. Let x ∈ supp μ. Then μ(V) > 0 for all V ∈ Nx.
Let f ∈ C(X, [0, 1]) be such that f(x) > 0. Then V = f−1((0, ∞)) has strictly
positive measure. Since μ(V) = limn→∞ ∫X min{n f, 1} dμ, there exists an n such
that ∫X min{n f, 1} dμ > 0. Consequently, as f ≥ (1/n) min{n f, 1}, we have
∫X f dμ > 0.
Let x ∈ X be such that ∫X f dμ > 0 for all f ∈ C(X, [0, 1]) with f(x) > 0.
Let V ∈ Nx. As V = f−1((0, ∞)) for some f ∈ C(X, [0, 1]), we have μ(V) ≥
∫X f dμ > 0.
Theorem 4.3 Suppose that X is a perfectly normal space.
(a) Let η and ζ be regular conditional kernels under ν with respect to τ that are weakly
continuous on supp(ν ◦ τ −1). Then η(y, ·) = ζ (y, ·) for all y ∈ supp(ν ◦ τ −1).
If ν is a probability measure, then η(y, ·) is a probability measure for all y ∈
supp(ν ◦ τ −1).
(b) Let η and ζ be product regular conditional kernels under μ with respect to π
that are weakly continuous on supp(μ ◦ π −1). Then η(y, ·) = ζ (y, ·) for all
y ∈ supp(μ ◦ π −1). If μ is a probability measure, then η(y, ·) is a probability
measure for all y ∈ supp(μ ◦ π −1).
Proof We prove (a); the proof of (b) is similar (replace “ν ◦ τ−1” by “μ ◦ π−1”).
To prove η = ζ on D = supp(ν ◦ τ−1), by Theorem 4.2, it is sufficient to prove
∫X f dη(y, ·) = ∫X f dζ(y, ·) for all y ∈ D and all f ∈ Cb(X). Let f ∈ Cb(X).
Because f is the uniform limit of simple functions, one has for all B ∈ B(Y)
∫B [∫X f dη(y, ·)] d[ν ◦ τ−1](y) = ∫B [∫X f dζ(y, ·)] d[ν ◦ τ−1](y).
Therefore there exists a set Z ∈ B(Y) with ν ◦ τ−1(Y \ Z) = 0 such that
∫X f dη(y, ·) = ∫X f dζ(y, ·) for all y ∈ Z. Since both y → ∫X f dη(y, ·) and
y → ∫X f dζ(y, ·) are continuous on D (as η and ζ are weakly continuous on D),
and Z ∩ D is dense in D by Theorem 4.2, we have ∫X f dη(y, ·) = ∫X f dζ(y, ·) for all
y ∈ D. The second statement is proved by taking f = 1X.
4.4 When η is a regular conditional kernel under ν with respect to τ , the value of the
function η(·, A) on the complement of supp(ν ◦ τ −1) is not determined, in the sense
that if η˜ is a kernel with η˜(y, ·) = η(y, ·) for all y ∈ supp(ν ◦ τ −1), then η˜ is also a
regular conditional kernel under ν with respect to τ .
4 This is not true in general. For an example, see Bogachev [4, Example 7.1.3].
For example, η˜ given by η˜(y, ·) = η(y, ·) for y ∈ supp(ν ◦ τ−1) and η˜(y, ·) = δx
for y ∈ supp(ν ◦ τ−1)c, for some chosen x ∈ X, is such a regular conditional kernel.
Whence if ν is a probability measure and there exists a regular conditional kernel
under ν with respect to τ that is weakly continuous on supp(ν ◦ τ −1), then we may
as well assume this kernel to be a probability kernel. A similar statement is true for
product regular conditional kernels.
4.5 By Theorem 3.6, statement (a) of Theorem 4.3 is a consequence of statement
(b). In an attempt to reduce statement (b) to statement (a), the following problem
occurs with the correspondence between regular conditional kernels and product regular
conditional kernels that is mentioned in 3.4.
The Borel σ-algebra of X × Y, i.e. B(X × Y), may be strictly larger than B(X) ⊗
B(Y) (see, e.g. Bogachev [4, Lemma 6.4.1 and Example 6.4.3]). If this is the case, i.e.
B(X) ⊗ B(Y) ⊊ B(X × Y), and B(X × Y) equals the Baire σ-algebra on X × Y, i.e.
the smallest σ-algebra that makes all continuous functions X × Y → R measurable,
then there exists a continuous function f ∈ C(X × Y) that is not B(X) ⊗ B(Y)-measurable.
Composing the function f with arctan, we obtain a g ∈ Cb(X × Y) that
is not measurable with respect to B(X) ⊗ B(Y). So if η : Y × B(X) → R is a product
regular conditional kernel under μ with respect to π, and ξ : Y × (B(X) ⊗ B(Y)) → R
is as in 3.4, then g is not integrable with respect to ξ(y, ·) for any y ∈ Y.
B(X × Y) equals the Baire σ-algebra if X × Y is a metric space (Bogachev [4,
Proposition 6.3.4]). Therefore X = Y = R equipped with the discrete topology
form an example for which the above is the case.
We state two theorems (Theorems 4.6 and 4.8) showing the existence of product
regular conditional probabilities that are weakly continuous on supp(μ ◦ π −1).
Theorem 4.6 Suppose that Y is countable and equipped with the discrete topology.
Then η : Y × B(X) → R defined by
η(y, A) = μ(A × {y}) / μ ◦ π−1({y}) if μ ◦ π−1({y}) ≠ 0, and η(y, ·) = 0 otherwise,
is a product regular conditional kernel under μ with respect to π that is weakly
continuous on supp(μ ◦ π −1).
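In the countable discrete setting of Theorem 4.6, conditioning on the second coordinate is plain division by the marginal mass of X × {y}. A minimal finite-state sketch (the toy probability measure μ below is a hypothetical example of ours) that also checks the disintegration identity of Definition 3.2(b):

```python
# Finite-state sketch of Theorem 4.6 (toy measure, ours): eta(y, A) is the
# conditional probability mu(A x {y} | X x {y}) on a discrete product space.
X = ["a", "b"]
Y = [0, 1]
mu = {("a", 0): 0.1, ("b", 0): 0.3, ("a", 1): 0.2, ("b", 1): 0.4}

def marginal(y):
    """(mu o pi^{-1})({y}): the mass of the slice X x {y}."""
    return sum(mu[(x, y)] for x in X)

def eta(y, A):
    """Product regular conditional probability eta(y, A) = mu(A x {y} | X x {y})."""
    return sum(mu[(x, y)] for x in A) / marginal(y)
```

Summing η against the marginal recovers μ on rectangles, i.e. Σy∈B η(y, A) · marginal(y) = μ(A × B), which is exactly the defining identity (3.2) in this discrete case.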
4.7 In case Y is first countable, the notions of open and closed sets and continuity of
functions Y → R are characterised by the convergence of sequences. Therefore the
following are equivalent for a kernel η : Y × B(X) → R:
(a) η is weakly continuous.
(b) For all y ∈ Y and all (yn)n∈N in Y with yn → y, one has η(yn, ·) −→w η(y, ·).
The following theorem is an easy consequence of the Lebesgue dominated convergence
theorem.
Theorem 4.8 Let Y be first countable. Let λ be a probability measure on B(X ). Let
D ⊂ Y. Let f : X × Y → [0, ∞) be a bounded B(X ) ⊗ B(Y)measurable function
such that y → f(x, y) is continuous on D and equal to zero on Y \ D for λ-almost all
x ∈ X. Suppose that ∫X f(x, y) dλ(x) > 0 for all y ∈ D. If η : Y × B(X) → [0, 1]
is given by
η(y, A) = ∫A f(x, y) dλ(x) / ∫X f(x, y) dλ(x)  if y ∈ D,
η(y, A) = λ(A)  if y ∉ D,
then η is weakly continuous on D (even strongly continuous, i.e. y → η(y, A) is
continuous for all A ∈ B(X)). Let κ be a probability measure on B(Y) and assume
D = supp κ. Then η is a product regular conditional kernel under the measure μ on
B(X) ⊗ B(Y) given by μ(A × B) = ∫B ∫A f(x, y) dλ(x) dκ(y)
with respect to π, that is weakly continuous on D = supp(μ ◦ π−1).
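A numerical sketch of the kernel in Theorem 4.8 (all concrete choices are ours, not from the paper): X = Y = [0, 1], λ the uniform distribution, a hypothetical density f(x, y) = exp(−(x − y)²), and the integrals approximated on a fixed grid:

```python
# Numerical sketch of Theorem 4.8 (assumptions ours): eta(y, A) is the
# normalised integral of the density f(., y) over A, approximated on a grid.
import math

GRID = [i / 1000 for i in range(1000)]  # grid approximation of [0, 1)

def f(x, y):
    """Hypothetical density, continuous in y and strictly positive."""
    return math.exp(-(x - y) ** 2)

def eta(y, A):
    """eta(y, A) ~ ∫_A f(x, y) dλ(x) / ∫_X f(x, y) dλ(x); A is a predicate."""
    den = sum(f(x, y) for x in GRID)
    num = sum(f(x, y) for x in GRID if A(x))
    return num / den
```

Since f(·, y) is bounded and bounded away from zero here, y → η(y, A) varies continuously for each fixed A, matching the (even strong) continuity asserted in the theorem.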
Remark 4.9 In the above theorem the conditions may be weakened. Instead of
assuming f to be bounded and λ, κ to be probability measures, we may as well assume that
λ and κ are positive nonzero measures; that for all y ∈ D there exist a V ∈ Ny
and a λ-integrable h : X → [0, ∞) such that f(x, z) ≤ h(x) for all x ∈ X and all
z ∈ V ∩ D; and that f is λ ⊗ κ-integrable.
In Sect. 6, condition (b) of Theorem 4.10 is one of the key assumptions. If X
is a metric space, this property follows from weak continuity as in the Portmanteau
theorem. We state this in Theorem 4.10.
Theorem 4.10 Let η : Y × B(X ) → R be a probability kernel. Let D ⊂ Y, y ∈ D
and V ⊂ Ny be such that ⋂V = {y}. Consider the following conditions.
(a) D → M(X ), y → η(y, ·) is weakly continuous in y.
(b) lim infι∈I η(yι, G) ≥ η(y, G) for all open G ⊂ X and (yι)ι∈I in D with yι → y.
(c) lim supι∈I η(yι, F ) ≤ η(y, F ) for all closed F ⊂ X and (yι)ι∈I in D with yι → y.
(d) supV ∈V infv∈V ∩D η(v, G) ≥ η(y, G) for all open sets G ⊂ X .
(e) inf V ∈V supv∈V ∩D η(v, F ) ≤ η(y, F ) for all closed sets F ⊂ X .
(b), (c), (d), (e) are equivalent. If X is metrisable, then (a) implies (b). If X is metrisable
and Y is first countable, then (a) is equivalent to (b) and hence to (c), (d), and (e).
Proof We leave it to the reader to check the equivalences between (b), (c), (d), (e). If
X is a metric space, one can follow the lines of the Portmanteau theorem in the book
of Billingsley [3, Theorem 2.1] for the implication (a) implies (b); the fact that the
measures in the proof are indexed by the natural numbers instead of a general directed
set I does not affect the argument. The proof of (b) ⇒(a) in the book of Billingsley
relies on the Lebesgue dominated convergence theorem. But when Y is first countable,
one can restrict to sequences (see 4.7) and obtain the implication (b) ⇒(a) as is done
in the book of Billingsley.
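The inequalities in conditions (b) and (c) of Theorem 4.10 can be strict. The standard example δ1/n →w δ0 on X = [0, 1], sketched below with measures represented as set functions on predicates (a representation of ours, purely for illustration), exhibits this:

```python
# Sketch for Theorem 4.10 (representation ours): on X = [0, 1] the Dirac
# measures delta_{1/n} converge weakly to delta_0; the liminf bound (b) for the
# open set (0, 1) and the limsup bound (c) for the closed set {0} are strict.
def delta(point):
    """Dirac measure at `point`, as a set function acting on predicates."""
    return lambda A: 1.0 if A(point) else 0.0

G = lambda x: 0 < x < 1   # the open set (0, 1)
F = lambda x: x == 0      # the closed set {0}

mu_n = [delta(1 / n) for n in range(2, 102)]  # delta_{1/n} for n >= 2
mu_0 = delta(0.0)
```

Here lim inf μn(G) = 1 > 0 = μ0(G) and lim sup μn(F) = 0 < 1 = μ0(F), so the open- and closed-set bounds hold with strict inequality.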
Lemma 4.11 Assume that μ is a probability measure. Let η be a product regular
conditional probability under μ with respect to π . Write D = supp(μ ◦ π −1) and let
y ∈ D. Then for every U ∈ Ny, one has μ(X × U) > 0 and
μ(A × Y | X × U) = (1 / μ ◦ π−1(U)) ∫U η(z, A) d[μ ◦ π−1](z)  (A ∈ B(X)). (4.7)
Moreover, if V ⊂ Ny is such that ⋂V = {y} and η satisfies condition (d) (respectively (e)) of Theorem 4.10, then for all open G ⊂ X and all closed F ⊂ X
supV∈V μ(G × Y | X × V) ≥ η(y, G), (4.8)
infV∈V μ(F × Y | X × V) ≤ η(y, F). (4.9)
Proof Let U ∈ Ny. Since y ∈ D = supp(μ ◦ π−1), one has μ(X × U) > 0. (4.7)
follows from the fact that for all A ∈ B(X)
μ(A × U) = ∫Y 1U(z) η(z, A) d[μ ◦ π−1](z) = ∫U η(z, A) d[μ ◦ π−1](z).
For an open G ⊂ X we have for V as above
μ(G × Y | X × V) ≥ infv∈V∩D η(v, G).
Thus (4.8) follows when assuming (d) of Theorem 4.10. Similarly, one obtains (4.9).
For a regular conditional probability we have a similar statement; see Lemma 4.12.
The proof can be done following the lines of the proof of Lemma 4.11 or as a
consequence of Lemma 4.11 using Theorem 3.6.
Lemma 4.12 Assume that ν is a probability measure. Let η be a regular conditional
probability under ν with respect to τ. Write D = supp(ν ◦ τ−1) and let y ∈ D. Then
for every U ∈ Ny, one has ν(τ−1(U)) > 0 and
ν(A | τ−1(U)) = (1 / ν ◦ τ−1(U)) ∫U η(z, A) d[ν ◦ τ−1](z)  (A ∈ B(X)). (4.10)
Moreover, if V ⊂ Ny is such that ⋂V = {y} and η satisfies condition (d) (respectively (e)) of Theorem 4.10, then for all open G ⊂ X and all closed F ⊂ X
supV∈V ν(G | τ−1(V)) ≥ η(y, G), (4.11)
infV∈V ν(F | τ−1(V)) ≤ η(y, F). (4.12)
5 Some Facts About Functions with Compact Sublevel Sets
In this section we present some facts about functions with compact sublevel sets which
are used in Sects. 6, 7 and 8.
In this section X , Y and Z are topological spaces.
Definition 5.1 Let J : X → [0, ∞]. We call the set [ J ≤ α] (see Sect. 2) a sublevel
set of J for α ∈ [0, ∞). J is said to be lower semicontinuous if all sublevel sets of J are
closed. J is said to have compact sublevel sets if all sublevel sets of J are compact.
5.2 Let J : X → [0, ∞] be lower semicontinuous. Then
J(x) = supV∈Nx inf J(V)  (x ∈ X).
Indeed, for all α < J(x) the set [ J > α] is open and contains x.
Hence, a function J : X → [0, ∞] is lower semicontinuous if and only if
J(x) ≤ lim infι∈I J(xι)
for all x ∈ X and all nets (xι)ι∈I in X that converge to x.
Lemma 5.3 Let τ : Z → Y be continuous. Let J : Z → [0, ∞] have compact
sublevel sets. Let y ∈ Y and V ⊂ Ny with ⋂V = {y}. Let F ⊂ Z be closed. Then
lim infV∈V inf J(F ∩ τ−1(V)) = inf J(F ∩ τ−1({y})). (5.3)
Proof The ≤ inequality in (5.3) is immediate. Because lim infV∈V inf J(F ∩
τ−1(V)) ≥ lim infV∈Ny inf J(F ∩ τ−1(V)), it is sufficient to prove
α := lim infV∈Ny inf J(F ∩ τ−1(V)) ≥ inf J(F ∩ τ−1({y})).
Suppose that α < ∞. Whence F ∩ τ−1(V) ∩ [ J ≤ α + ε] ≠ ∅ for all V ∈ Ny and all
ε > 0. Since [ J ≤ α + ε] is compact, this implies that ⋂V∈Ny F ∩ τ−1(V) ∩ [ J ≤
α + ε] ≠ ∅, i.e. inf J(F ∩ τ−1({y})) ≤ α + ε for all ε > 0.
5.4 The assumption that τ be continuous is not redundant: consider Y = Z =
[0, 1], J = 1_{(1/2,1]} and τ given by τ (0) = 0, τ (1) = 1 and τ (x ) = 1 − x for
x ∈ (0, 1), with F = [0, 1] and y = 1. Then, for every neighbourhood V of y, τ −1(V )
contains the interval (0, ε) for some ε > 0, whence inf J (F ∩ τ −1(V )) = 0 but
inf J (F ∩ τ −1({y})) = J (1) = 1.
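The counterexample above can be checked mechanically; the following Python sketch (ours, not part of the paper) evaluates both infima on a grid. The grid resolution and δ are arbitrary illustrative choices.

```python
# Grid check (not from the paper) of the counterexample in 5.4:
# Y = Z = [0, 1], J = indicator of (1/2, 1], tau(0) = 0, tau(1) = 1 and
# tau(x) = 1 - x on (0, 1); F = [0, 1], y = 1, V = (1 - delta, 1].

def J(x):
    return 1.0 if 0.5 < x <= 1.0 else 0.0

def tau(x):
    if x in (0.0, 1.0):
        return x
    return 1.0 - x

grid = [i / 10_000 for i in range(10_001)]        # discretisation of Z = [0, 1]
delta = 0.1
V = [z for z in grid if 1 - delta < tau(z) <= 1]  # tau^{-1}((1 - delta, 1]) on the grid

# the small interval (0, delta) lies in tau^{-1}(V), so the inf over F cap tau^{-1}(V) is 0,
print(min(J(z) for z in V))
# while F cap tau^{-1}({1}) = {1} gives J(1) = 1.
print(J(1.0))
```

The discontinuity of τ at 0 and 1 is precisely what pulls the preimage of a small neighbourhood of 1 down to points where J vanishes.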
Lemma 5.5 Let X be normal and let G be a basis for the topology of X . Let J :
X × Y → [0, ∞] have compact sublevel sets and let y ∈ Y.
(a) For all open G ⊂ X and ε > 0, there exists a U ∈ G with U ⊂ cl(U ) ⊂ G such that
inf J (U × {y}) ≤ inf J (G × {y}) + ε.
(b) For all closed F ⊂ X and all α < inf J (F × {y}), there exist U1, . . . , Uk ∈ G such that,
with W = X \ (U1 ∪ · · · ∪ Uk ), one has F ⊂ W ◦ and
α < inf J (W × {y}) ≤ inf J (W ◦ × {y}) ≤ inf J (F × {y}).
Proof (a) Let ε > 0. Let x ∈ G be such that J (x , y) ≤ inf J (G × {y}) + ε. Since X
is a normal topological space, there exists an open set U with x ∈ U ⊂ cl(U ) ⊂ G.
Because G is a basis, U may be chosen in G. Then inf J (G × {y}) + ε ≥ J (x , y) ≥
inf J (U × {y}).
(b) Let β > α be such that β < inf J (F × {y}). The set K := {x ∈ X : J (x , y) ≤ β}
is a compact set that is disjoint from F . Hence there exist disjoint open U, V ⊂
X with K ⊂ U and F ⊂ V . Since G is a basis and K is compact, there exist
U1, . . . , Uk in G with K ⊂ U1 ∪ · · · ∪ Uk ⊂ U . Then cl(U1 ∪ · · · ∪ Uk ) ∩ V = ∅.
Hence with W := X \ (U1 ∪ · · · ∪ Uk ), one has F ⊂ W ◦ and W ⊂ X \ K , which
implies inf J (W × {y}) ≥ β > α.
6 Large Deviations for Product Regular Conditional Probabilities
In this section we consider the following situation.
(i) X and Y are topological spaces, where X is normal.
(ii) G is a basis for the topology of X and H is a basis for the topology of Y.
(iii) π : X × Y → Y is given by π(x , y) = y.
(iv) (μn)n∈N is a sequence of probability measures on B(X ) ⊗ B(Y) satisfying the
large deviation principle on { A × B : A ∈ B(X ), B ∈ B(Y)} with a rate function
J : X × Y → [0, ∞] that has compact sublevel sets.
(v) For each n ∈ N we assume the following: supp(μn ◦ π −1) ≠ ∅,5 and there exists
a product regular conditional probability ηn : Y × B(X ) → [0, 1] under μn
with respect to π , which satisfies the following continuity condition (see
Theorem 4.10):
I (x ) = J (x , y) − inf J (X × {y}).
5 As we are considering large deviation bounds for (ηn(yn , ·))n∈N with yn ∈ supp(μn ◦ π−1), we want such
yn to exist. Instead of this condition, one could of course deal with the situation where supp(μn ◦ π−1) ≠ ∅
for all n ≥ N for some large N and consider sequences (yn)n≥N with yn ∈ supp(μn ◦ π−1) for n ≥ N .
In this section we derive necessary and sufficient conditions for the large deviation
bounds with rate function I for sequences of the form (ηn(yn, ·))n∈N. We prove this
for general topological spaces instead of metric spaces as it does not cost more effort.
In Theorem 6.3 we consider a fixed sequence (yn)n∈N with yn → y and describe
equivalent conditions for the lower and upper large deviation bound to hold.
We are interested in the question whether for all sequences (yn)n∈N with yn → y
the sequence (ηn(yn, ·))n∈N satisfies the lower and upper large deviation bound with
rate function I . In Theorem 6.9 we give equivalent6 and sufficient conditions for these
bounds in a way that does not depend on sequences (yn)n∈N and the sets (Vn)n∈N as
in Theorem 6.3.
Finally in 6.12 we comment on deriving Theorem 1.2 from Theorem 6.9.
But first we consider specific situations, providing a simple proof of the large
deviation bounds with rate function I for sequences of the form (ηn(yn, ·))n∈N. Namely,
we consider the case that Y is a discrete space (Theorem 6.1) and the case where μn
is a product measure for all n ∈ N (Theorem 6.2).
Theorem 6.1 Suppose that Y is countable and equipped with the discrete topology.
Let y ∈ Y be such that inf J (X × {y}) < ∞. For all (yn)n∈N in Y with yn ∈
supp(μn ◦ π −1) and yn → y, the sequence (ηn(yn, ·))n∈N satisfies the large deviation
principle with rate function I .
Proof This basically follows from the following inequalities, which follow from the
large deviation principle and from Theorem 4.6:
lim inf_{n→∞} (1/n) log μn(G × {y}) ≥ − inf J (G × {y}) for all open G ⊂ X ,
lim sup_{n→∞} (1/n) log μn(F × {y}) ≤ − inf J (F × {y}) for all closed F ⊂ X .
Theorem 6.2 (Independent coordinates) Suppose that X and Y are second countable
and Y is regular. Suppose that μn = μn1 ⊗ μn2 for some μn1 on B(X ) and μn2 on
B(Y) for all n ∈ N. Then (ηn(yn, ·))n∈N satisfies the large deviation principle with
rate function I for all sequences (yn)n∈N in Y. In particular, ηn(yn, ·) = μn1 and
I (x ) = inf J ({x } × Y).
Proof I is lower semicontinuous (e.g. by 5.2) and for c ∈ R the set [I ≤ c] is a subset
of the compact set {x ∈ X : ∃z ∈ Y, J (x , z) ≤ c − inf J (X × {y})}.
[I ≤ c] = π([ J ≤ c + inf J (X × {y})]).
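Theorem 6.2 can be illustrated by a finite toy computation (ours, not from the paper): for a product measure, conditioning on the second coordinate returns the first marginal, whatever the conditioning value. All concrete measures below are arbitrary illustrative choices.

```python
# For mu = mu1 (x) mu2 the conditional probability of the first coordinate
# given the second does not depend on the conditioning point:
# mu(A x Y | X x {y}) = mu1(A) whenever mu2({y}) > 0.

from fractions import Fraction as Fr

X = ["a", "b", "c"]
Y = [0, 1]
mu1 = {"a": Fr(1, 2), "b": Fr(1, 3), "c": Fr(1, 6)}   # marginal on X
mu2 = {0: Fr(1, 4), 1: Fr(3, 4)}                      # marginal on Y
mu = {(x, y): mu1[x] * mu2[y] for x in X for y in Y}  # product measure

def conditional_first(A, y):
    """mu(A x Y | X x {y}), computed from the joint measure."""
    return sum(mu[(x, y)] for x in A) / sum(mu[(x, y)] for x in X)

for y in Y:
    assert conditional_first(["a", "b"], y) == mu1["a"] + mu1["b"]
print("conditional law equals the first marginal for every conditioning value")
```

This is the content of ηn(yn, ·) = μn1 in Theorem 6.2, reduced to a two-point second coordinate.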
Theorem 6.3 Let (yn)n∈N be a sequence in Y with yn ∈ supp(μn ◦ π −1) that converges
to y. For n ∈ N let Vn ⊂ Nyn be such that ⋂ Vn = {yn}. Then (a2) ⇐⇒ (a3)
⇐⇒ (a1) and (b2) ⇐⇒ (b3) ⇐⇒ (b1).
(a1) For all open G ⊂ X
6 Under the condition that Y is first countable.
(a2) For all U ∈ G7
(a3) For all open U ⊂ X , one has
(b1) For all closed F ⊂ X
(b2) For all U1, . . . , Uk ∈ G, one has for W = X \ (U1 ∪ · · · ∪ Uk )
(b3) For all closed W ⊂ X
Proof The implications (a3) ⇒ (a2) and (b3) ⇒ (b2) are immediate.
(a1) ⇒ (a3) Let U ⊂ X be an open set. By Lemma 4.11, (4.8),
lim inf_{n→∞} lim inf_{V ∈Vn} (1/n) log μn(U × Y | X × V ) ≥ lim inf_{n→∞} (1/n) log ηn(yn , U ).
(b1) ⇒ (b3) Let W ⊂ X be a closed set. By Lemma 4.11, (4.9),
lim sup_{n→∞} lim sup_{V ∈Vn} (1/n) log μn(W × Y | X × V ) ≤ lim sup_{n→∞} (1/n) log ηn(yn , W ).
≥ lim inf_{n→∞} lim sup_{V ∈Vn} (1/n) log μn(U × Y | X × V )
≥ − inf I (U ) = − inf J (U × {y}) + inf J (X × {y})
≥ − inf J (G × {y}) + inf J (X × {y}) − ε. (6.13)
7 Note that μn(X × V ) > 0 for all n ∈ N and V ∈ Nyn , as yn ∈ supp(μn ◦ π−1).
(b2) ⇒ (b1) Let α < inf J (F × {y}) and let U1, . . . , Uk and W be as in Lemma 5.5(b).
Then we obtain, using Lemma 4.11,
lim sup_{n→∞} (1/n) log ηn(yn , F ) ≤ lim sup_{n→∞} (1/n) log ηn(yn , W ◦)
6.4 (Fixed y) Note that if yn = y for all n ∈ N, one can take Vn = V for a V ⊂ Ny with
⋂ V = {y}. Then Theorem 6.3 implies that (ηn(y, ·))n∈N satisfies the large deviation
principle with rate function I if and only if (a2) and (b2) hold (with Vn = V).
6.5 Let (yn)n∈N in Y be such that yn ∈ supp(μn ◦ π −1) and yn → y. From Theorem 6.3
we derive that (a2) holds for some Vn ⊂ Nyn with ⋂ Vn = {yn} if and only if (a2)
holds for all such Vn . Similarly, (b2) holds for some Vn ⊂ Nyn with ⋂ Vn = {yn} if
and only if (b2) holds for all such Vn ⊂ Nyn .
In Lemma 6.7, we give a consequence of the large deviation principle of (μn)n∈N.
In Theorems 6.9 and 6.10 we use this to formulate sufficient conditions for upper or
lower large deviation bounds on sequences (ηn(yn, ·))n∈N with yn → y and sequences
(ηn(y, ·))n∈N.
We assumed X to be normal in this section. For Lemma 6.7 this assumption can be
dropped.
6.6 For all neighbourhoods V of y one has by the large deviation principle
lim inf_{n→∞} (1/n) log μn(X × V ) ≥ − inf J (X × V ◦) ≥ − inf J (X × {y}) > −∞. (6.15)
In particular, there exists an N ∈ N such that μn(X × V ) > 0 for all n ≥ N . Therefore
μn(G × Y | X × V ) is well defined for large n.
Lemma 6.7 (a) For open G ⊂ X
lim inf_{V ∈Ny} lim inf_{n→∞} (1/n) log μn(G × Y | X × V ) ≥ − inf I (G).
(b) For closed F ⊂ X
lim sup_{V ∈Ny} lim sup_{n→∞} (1/n) log μn(F × Y | X × V ) ≤ − inf I (F ).
Proof (a) Let ε > 0. By Lemma 5.3, there exists a V0 ∈ Ny such that for all V ∈ Ny
with V ⊂ V0
inf J (X × {y}) ≥ inf J (X × V ) ≥ inf J (X × cl(V0)) ≥ inf J (X × {y}) − ε.
(6.18)
Let V ∈ Ny be such that V ⊂ V0. As lim sup_{n→∞} (1/n) log μn(X × V ) > −∞
(see 6.6), we can “split the lim inf in two” and we get, by the large deviation
principle and by (6.18),
lim inf_{n→∞} (1/n) log μn(G × Y | X × V )
≥ lim inf_{n→∞} (1/n) log μn(G × V ) − lim sup_{n→∞} (1/n) log μn(X × V )
≥ − inf J (G × {y}) + inf J (X × cl(V )) ≥ − inf I (G) − ε.
(b) Let α < inf J (F × {y}). By Lemma 5.3, there exists a V0 ∈ Ny such that for all
V ∈ Ny with V ⊂ V0
inf J (F × {y}) ≥ inf J (F × V ) ≥ inf J (F × cl(V0)) ≥ α.
Let V ∈ Ny be such that y ∈ V ⊂ V0. Similarly as above, we get
lim sup_{n→∞} (1/n) log μn(F × Y | X × V ) ≤ −α + inf J (X × {y}).
Theorem 6.8 I has compact sublevel sets.
Proof [I ≤ c] is the image under the projection (x , z) → x of the compact set
[ J ≤ c + inf J (X × {y})] ∩ (X × {y}) and is therefore compact.
Theorem 6.9 We have (A5) ⇒ (A4) ⇐⇒ (A3) ⇒ (A2) ⇒ (A1) and, if Y is first
countable, then (A1) ⇐⇒ (A2).
(A1) For all (yn)n∈N with yn ∈ supp(μn ◦ π −1) and yn → y, the sequence
(ηn(yn , ·))n∈N satisfies the large deviation lower bound with rate function I .
(A2) For all U ∈ G and ε > 0 there exist V0 ∈ Ny and N ∈ N such that for all
n ≥ N and all V ∈ H with V ⊂ V0 and V ∩ supp(μn ◦ π −1) ≠ ∅
(1/n) log μn(U × Y | X × V ) ≥ − inf I (U ) − ε. (6.22)
(A3) For all U ∈ G
(A4) For all U ∈ G we have ∀Z0 ∈ Ny ∀ε > 0 ∃V0 ∈ Ny ∃Z ∈ Ny , Z ⊂
Z0 ∀M ∃m ≥ M ∃N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) ≠ ∅:
(1/n) log μn(U × Y | X × V ) ≥ (1/m) log μm(U × Y | X × Z ) − ε.
(A5) For all U ∈ G we have ∀ε > 0 ∀V0 ∈ Ny ∃N ∈ N ∀n ≥ N ∀V ∈ H, V ⊂
V0, V ∩ supp(μn ◦ π −1) ≠ ∅:
μn(U × Y | X × V ) ≥ e^{−nε} μn(U × Y | X × V0).
We have (B5) ⇒ (B4) ⇐⇒ (B3) ⇒ (B2) ⇒ (B1) and, if Y is first countable,
then (B1) ⇐⇒ (B2).
(B1) For all (yn)n∈N with yn ∈ supp(μn ◦ π −1) and yn → y the sequence
(ηn(yn , ·))n∈N satisfies the large deviation upper bound with rate function I .
(B2) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk ) and for every ε > 0
there exist V0 ∈ Ny and N ∈ N such that for all n ≥ N and all V ∈ H with
V ⊂ V0 and V ∩ supp(μn ◦ π −1) ≠ ∅
(1/n) log μn(W ◦ × Y | X × V ) ≤ − inf I (W ) + ε. (6.26)
(B3) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk )
(B4) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk ) we have ∀Z0 ∈ Ny
∀ε > 0 ∃V0 ∈ Ny ∃Z ∈ Ny , Z ⊂ Z0 ∀M ∃m ≥ M ∃N ∀n ≥ N ∀V ∈
H, V ⊂ V0, V ∩ supp(μn ◦ π −1) ≠ ∅:
(1/n) log μn(W ◦ × Y | X × V ) ≤ (1/m) log μm(W ◦ × Y | X × Z ) + ε.
(B5) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk ) we have ∀ε > 0 ∀V0 ∈
Ny ∃N ∈ N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) ≠ ∅:
μn(W ◦ × Y | X × V ) ≤ e^{nε} μn(W × Y | X × V0).
Proof The proofs of (B5) ⇒ (B4) ⇐⇒ (B3) ⇒ (B2) ⇒ (B1) and of (B1) ⇒
(B2) are similar to the proofs of the following implications.
(A4) ⇐⇒ (A3) follows by the definitions of sup, inf, lim sup and lim inf.
(A5) ⇒ (A3) Let U ∈ G. Assuming (A5), we obtain: ∀ε > 0 ∀V0 ∈ Ny ∃N ∈
N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) ≠ ∅, one has μn(X × V0) > 0 and
(1/n) log μn(U × Y | X × V ) ≥ (1/n) log μn(U × Y | X × V0) − ε. (6.30)
(A3) ⇒ (A2) follows by Lemma 6.7.
(A2) ⇒ (A1) Suppose that (A2) holds. Let U ∈ G with inf J (U × {y}) < ∞
and let ε > 0. Let V0 ∈ Ny and N ∈ N be such that (1/n) log μn(U × Y | X × V ) ≥
− inf I (U ) − ε for all n ≥ N and all V ∈ H with V ⊂ V0 and V ∩ supp(μn ◦ π −1) ≠ ∅.
Let (yn)n∈N be such that yn ∈ supp(μn ◦ π −1) and yn → y. Let N0 ≥ N be such that
yn ∈ V0 for all n ≥ N0. Then for all n ≥ N0 and V ∈ Nyn ∩ H with V ⊂ V0 we have
(1/n) log μn(U × Y | X × V ) ≥ − inf I (U ) − ε. This implies (a2) of Theorem 6.3 (with
Vn = Nyn ∩ H).
(A1) ⇒ (A2) (assuming Y is first countable). Suppose that (A2) does not hold.
Let (Vm )m∈N be a decreasing sequence in H with ⋂_{m∈N} Vm = {y}. Then there exist
a U ∈ G with inf J (U × {y}) < ∞ and an α > inf I (U ) such that for all M ∈ N and
N ∈ N there exist an n ≥ N and a V ∈ H with V ⊂ VM and V ∩ supp(μn ◦ π −1) ≠ ∅
such that
inf_{z∈V ∩supp(μn ◦π −1)} (1/n) log ηn(z, U ) ≤ (1/n) log μn(U × Y | X × V ).
For each m ∈ N there exist an nm and a ynm ∈ Vm ∩ supp(μnm ◦ π −1) such that
We may choose n1 < n2 < n3 < · · · . With yk = y for k ∉ {nm : m ∈ N} we have
yn → y and
lim inf_{n→∞} (1/n) log ηn(yn , U ) ≤ lim inf_{m→∞} (1/nm ) log ηnm (ynm , U ) ≤ −β.
Therefore (a1) of Theorem 6.3 does not hold, which implies that (A1) does not hold.
We can also use Lemma 6.7 and Theorem 6.3 (see also 6.4) to obtain sufficient
conditions for the lower or upper large deviation bounds for (ηn(y, ·))n∈N.
Theorem 6.10 Let V ⊂ Ny be such that ⋂ V = {y}.
(a) Suppose that for all U ∈ G with inf J (U × {y}) < ∞
Then (ηn(y, ·))n∈N satisfies the large deviation lower bound with rate function I .
(b) Suppose that for all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk )
6.11 (6.36) and (6.37) hold for example when ∀ε > 0 ∀V0 ∈ V ∃N ∈ N ∀n ≥
N ∀V ∈ V, V ⊂ V0 :
μn(U × Y | X × V ) ≥ e^{−nε} μn(U × Y | X × V0),
μn(W ◦ × Y | X × V ) ≤ e^{nε} μn(W × Y | X × V0).
6.12 Theorem 1.2 is a consequence of Theorems 4.10, 6.8 and 6.9 with G = {B(x , r ) :
x ∈ X , r > 0} and H = {B(y, δ) : y ∈ Y, δ > 0}.
7 Large Deviations for Regular Conditional Probabilities
In this section X and Y are topological spaces, (νn)n∈N is a sequence of probability
measures on B(X ) that satisfies the large deviation principle with rate function K :
X → [0, ∞], and τ : X → Y is continuous. For more assumptions, see 7.2.
We derive statements analogous to those in Sect. 6, but for regular conditional kernels
instead of product regular conditional kernels (7.3 and Theorem 7.5). First we show that,
with μn the probability measure on the product space corresponding
to νn as in Theorem 3.6, the sequence (μn)n∈N satisfies the large deviation principle
with a rate function described in terms of K (Theorem 7.1).
If (ηn)n∈N are regular conditional probabilities under (νn)n∈N given τ , then one
could also follow the proofs in Sect. 6 for the product regular conditional probabilities
to obtain similar results for large deviations for sequences of the form (ηn(yn, ·))n∈N.
Instead, we make the approach via Theorem 3.6 to translate the results to the setting
of regular conditional probabilities.
Theorem 7.1 For all n ∈ N let μn be the probability measure on B(X ) ⊗ B(Y) for
which μn( A × B) = νn( A ∩ τ −1(B)) for A ∈ B(X ), B ∈ B(Y) (as in Theorem 3.6).
Then (μn)n∈N satisfies the large deviation principle on { A × B : A ∈ B(X ), B ∈
B(Y)} with rate function J : X × Y → [0, ∞] given by
J (x , y) = K (x ) if τ (x ) = y, and J (x , y) = ∞ otherwise.
If K has compact sublevel sets, then so does J .
Proof By definition of J we have, for all A ∈ B(X ) and B ∈ B(Y),
inf K ( A ∩ τ −1(B)) = inf J ( A × B).
We have ( A ∩ τ −1(B))◦ = A◦ ∩ τ −1(B)◦ and τ −1(B)◦ ⊃ τ −1(B◦), whence
inf K (( A ∩ τ −1(B))◦) ≤ inf K ( A◦ ∩ τ −1(B◦))
= inf J ( A◦ × B◦) = inf J (( A × B)◦).
lim sup_{n→∞} (1/n) log μn( A × B) = lim sup_{n→∞} (1/n) log νn( A ∩ τ −1(B)).
We have cl( A ∩ τ −1(B)) ⊂ cl( A) ∩ cl(τ −1(B)) and cl(τ −1(B)) ⊂ τ −1(cl(B)), whence
inf K (cl( A ∩ τ −1(B))) ≥ inf K (cl( A) ∩ τ −1(cl(B))) = inf J (cl( A × B)).
Suppose that K has compact sublevel sets. Let c ≥ 0. Then [ J ≤ c] is contained in
the compact set [K ≤ c] × τ ([K ≤ c]).
By Theorem 6.8, I has compact sublevel sets, where
I (x ) = J (x , y) − inf J (X × {y}).
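The identity inf K(A ∩ τ−1(B)) = inf J(A × B) underlying the proof of Theorem 7.1 can be verified exhaustively in a finite setting. The Python sketch below is our own; the spaces, K and τ are arbitrary illustrative choices.

```python
import itertools, math

# J(x, y) = K(x) if tau(x) = y, and infinity otherwise (Theorem 7.1);
# then inf K(A ∩ tau^{-1}(B)) = inf J(A x B) for all A, B.

X = [0, 1, 2, 3]
Y = ["u", "v"]
K = {0: 0.5, 1: 2.0, 2: 0.0, 3: 1.5}
tau = {0: "u", 1: "v", 2: "u", 3: "v"}

def J(x, y):
    return K[x] if tau[x] == y else math.inf

def inf_over(values):
    values = list(values)
    return min(values) if values else math.inf  # inf over the empty set is +infinity

for r in range(len(X) + 1):
    for A in itertools.combinations(X, r):
        for s in range(len(Y) + 1):
            for B in itertools.combinations(Y, s):
                lhs = inf_over(K[x] for x in A if tau[x] in B)  # inf K(A ∩ tau^{-1}(B))
                rhs = inf_over(J(x, y) for x in A for y in B)   # inf J(A x B)
                assert lhs == rhs
print("inf K(A ∩ tau^{-1}(B)) == inf J(A x B) for all A and B")
```

Only the pairs (x, y) with y = τ(x) contribute a finite value to the right-hand infimum, which is exactly how J transports K to the product space.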
7.3 By Theorem 3.6, ηn is the product regular conditional kernel under μn with
respect to π , and
νn( A | τ −1(V )) = μn( A × Y | X × V ).
In this sense Theorem 1.3 also follows from Theorem 1.2. We present some of the
equivalent statements of Theorem 6.9 in Theorem 7.5.
Remark 7.4 Because of the relation between μn and νn and between K and J , in
Theorem 7.1 we were able to prove the large deviation principle on { A × B : A ∈
B(X ), B ∈ B(Y)}. Whether it can be extended to the large deviation principle on
B(X ) ⊗ B(Y) is a priori not clear. However, for the purpose of using the results of
Sect. 6, this is not required (as only (iv) of Sect. 6 is required). This is the main reason
to define the large deviation bounds as in Definition 1.1.
Theorem 7.5 (A3) ⇒ (A1). If Y is first countable, then (A1) ⇐⇒ (A2).
(A3) For all U ∈ G
(B3) ⇒ (B1). If Y is first countable, then (B1) ⇐⇒ (B2).
(B3) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk )
(1/n) log νn(W ◦ | τ −1(V )) ≤ − inf I (W ).
In terms of random variables, Sanov’s theorem gives us the large deviation principle
of empirical distributions (1/n) Σ_{i=1}^n δ_{Xi}, where X1, X2, . . . are independent and
identically distributed random variables. We consider large deviations of (1/n) Σ_{i=1}^n δ_{Xi}
conditioned on (1/n) Σ_{i=1}^n δ_{Yi} = ψn, where (X1, Y1), (X2, Y2), . . . are independent and identically
distributed couples of random variables, both random variables attaining their values
in a finite set. This large deviation principle is formalised in Theorem 8.2.
In this section we consider the following.
• Let π : P(R) × P(S) → P(S) be the map given by π(ξ, ζ ) = ζ .
• Let μn be the probability measure on B(P(R)) ⊗ B(P(S)) defined by
μn = (⊗_{i=1}^n λ) ◦ L_n^{−1} ◦ m^{−1}, so that for A ∈ B(P(R)) and B ∈ B(P(S))
• Define θ : S × B(R) → [0, 1] by θ (s, A) = λ( A × S | R × {s}).
• Define ηn : P(S) × B(P(R)) → [0, 1] by
• Let J : P(R) × P(S) → [0, ∞] be given by
where H (ξ | λ) is the relative entropy of ξ with respect to λ ([7, Definition 2.1.5]).
• Let ψ ∈ P(S) be such that
8.1 We present some facts which follow from the assumptions with little effort; to
some facts we give some explanation or references.
(a) P^n_emp(S) is closed in P(S). Moreover, if ξk and ξ in P^n_emp(S) are such that ξk → ξ ,
then there exist s_k^i and q_i in S for i ∈ {1, . . . , n} such that ξk = Ln((s_k^1, . . . , s_k^n )),
ξ = Ln((q_1, . . . , q_n )) and s_k^i → q_i for all i ∈ {1, . . . , n}.
(b) supp(μn ◦ π −1) = P^n_emp(S).
(c) ηn is a product regular conditional kernel under μn with respect to π that is weakly
continuous on P^n_emp(S).
(d) (λ^{⊗n} ◦ L_n^{−1})n∈N satisfies the large deviation principle with rate function H (· | λ).
(e) m is continuous.
(f) (μn)n∈N satisfies the large deviation principle with rate function J .
(a) follows from the fact that S is a finite space. (b) follows from (a), from the
fact that the complement of P^n_emp(S) has μn ◦ π −1-measure zero and because
μn ◦ π −1({Ln(s)}) > 0 for all s ∈ S^n, which is due to the assumptions on λ. (c)
follows by a straightforward calculation, and the continuity follows from (a). For (d)
see Sanov’s theorem (Dembo and Zeitouni [7, Theorem 6.2.10]). (e) follows from the
fact that if ξn → ξ in P(R × S), then the R- and S-marginals of ξn converge to
the R- and S-marginals of ξ , respectively. Then (f) follows from (d) and (e) by the
contraction principle [7, Theorem 4.2.1].
In the rest of this section, we prove the following theorem.
Theorem 8.2 For all (ψn)n∈N with ψn ∈ P^n_emp(S) and ψn → ψ , the sequence
(ηn(ψn, ·))n∈N satisfies the large deviation principle with rate function I : P(R) →
[0, ∞], given by
I is continuous on [I < ∞].
As P(S) is first countable, it is sufficient to show that (A2) and (B2) of
Theorem 6.9 hold. In 8.4 we use the bounds of Lemma 8.3 to derive other bounds which
imply (A2) and (B2). The continuity of I follows by continuity of the map ν → H (νλ)
(Lemma 8.5).
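The statement of Theorem 8.2 can be made concrete by exact enumeration. The following Python sketch is entirely our own (λ, ψn and the event are arbitrary illustrative choices): it computes −(1/n) log ηn(ψn, A) for R = S = {0, 1} by summing over all empirical measures with the prescribed S-marginal, and compares it with the difference of constrained relative-entropy minima suggested by the bounds in 8.4.

```python
import math

lam = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}  # law of one pair (X_i, Y_i)

def log_prob(counts, n):
    """log of the multinomial probability that L_n equals counts / n."""
    res = math.lgamma(n + 1)
    for rs, c in counts.items():
        res += c * math.log(lam[rs]) - math.lgamma(c + 1)
    return res

def H(counts, n):
    """relative entropy H(nu | lambda) for nu = counts / n."""
    return sum((c / n) * math.log(c / (n * lam[rs])) for rs, c in counts.items() if c > 0)

def logaddexp(a, b):
    if -math.inf in (a, b):
        return max(a, b)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def analyse(n, k, in_A):
    """Condition on the S-empirical measure putting mass k/n on {1};
    in_A is a predicate on the R-empirical mass of {1}."""
    log_num = log_den = -math.inf
    rate_num = rate_den = math.inf
    for c01 in range(k + 1):              # pairs (0, 1); then c11 = k - c01
        for c10 in range(n - k + 1):      # pairs (1, 0); then c00 = n - k - c10
            counts = {(0, 0): n - k - c10, (0, 1): c01,
                      (1, 0): c10, (1, 1): k - c01}
            lp, h = log_prob(counts, n), H(counts, n)
            log_den = logaddexp(log_den, lp)
            rate_den = min(rate_den, h)
            if in_A((c10 + k - c01) / n):  # R-marginal mass of {1}
                log_num = logaddexp(log_num, lp)
                rate_num = min(rate_num, h)
    exact = -(log_num - log_den) / n       # -(1/n) log eta_n(psi_n, A)
    entropy = rate_num - rate_den          # difference of constrained minima
    return exact, entropy

# psi_n({1}) = 1/2; A = {empirical X-mass of {1} exceeds 0.8}
for n in (40, 200, 1000):
    print(n, analyse(n, n // 2, lambda p: p > 0.8))
```

As n grows the two numbers approach each other, in line with the polynomial prefactors (n + 1)^{±M} in the bounds of 8.4.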
8.4 From Lemma 8.3 we obtain the following bounds for A ∈ B(P(R)) and B ∈
B(P(S)).
μn( A × B) ≤ #L_n^{−1}( A) · #L_n^{−1}(B) · e^{−n inf_{ν∈m^{−1}( A×B)∩P^n_emp(R×S)} H (ν | λ)}
≤ (n + 1)^M e^{−n inf_{ν∈m^{−1}( A×B)} H (ν | λ)},
μn( A × B) ≥ (n + 1)^{−M} e^{−n inf_{ν∈m^{−1}( A×B)∩P^n_emp(R×S)} H (ν | λ)}.
In order to derive (A2) and (B2) of Theorem 6.9, we make the following observation.
By (8.10) we have for an open U and a closed W that if for both A = U and C = R
(1/n) log (n + 1)^{−2M}
≤ (1/n) log (n + 1)^{2M} −
(1/n) log μn(U × P(S) | P(R) × V ) ≥ − inf I (U ),
(1/n) log μn(W ◦ × P(S) | P(R) × V ) ≤ − inf I (W ),
≤ inf_{V0∈Nψ} lim sup_{n→∞} sup_{ζ ∈P^n_emp(S)∩V0} inf_{ν∈m^{−1}( A×{ζ })∩P^n_emp(R×S)} H (ν | λ)
(8.11) holds (for both A = U and C = R as well as for A = R and C = W , where
U is open and W is closed) if for all open U and all closed W
inf_{V0∈Nψ} lim sup_{n→∞} sup_{ζ ∈P^n_emp(S)∩V0} inf_{ν∈m^{−1}(U ×{ζ })∩P^n_emp(R×S)} H (ν | λ)
(8.16) is a consequence of Lemma 5.3, as m−1(W × V ) = m−1(W × P(S)) ∩
m−1(P(R) × V ), the set F = m−1(W × P(S)) is closed for closed W , m−1(P(R) ×
V ) = (π ◦ m)−1(V ), and π ◦ m is continuous. The proof of inequality (8.15) requires a
little more attention. First we present some facts which are used to prove this inequality
in Lemma 8.8.
Consequently, I as in (8.6) is continuous on [I < ∞].
Lemma 8.6 (a) Let k, l ∈ N and ζ ∈ P^k_emp(S). For all m ≥ kl there exists a ν ∈
P^m_emp(S) such that d(ν, ζ ) < 1/l .
(b) For all open Γ ⊂ P(S) there exists an N ∈ N such that P^n_emp(S) ∩ Γ ≠ ∅ for all
n ≥ N .
Proof (a) Let i ∈ {1, . . . , k}. Let ξ ∈ P^i_emp(S). Then the measure (lk/(lk + i))ζ + (i/(lk + i))ξ is
an element of P^{lk+i}_emp(S). For every A ⊂ S
|[(lk/(lk + i))ζ + (i/(lk + i))ξ ]( A) − ζ ( A)| ≤ 2 i/(lk + i) ≤ 2 k/(lk) = 2/l .
By definition of the Prohorov metric, this implies d((lk/(lk + i))ζ + (i/(lk + i))ξ, ζ ) ≤ 2/l .
(b) Let ξ ∈ P(S) and δ > 0 be such that B(ξ, δ) ⊂ Γ . For each ξ ∈ P(S) there
is a k ∈ N and a ζ ∈ P^k_emp(S) such that d(ζ, ξ ) < δ/2. Because of this, (b) follows
from (a) by letting l be such that 1/l < δ/2 and N = lk.
Proof In this proof, for a measure ξ ∈ P(R × S), we write ξrs = ξ({(r, s)}), so that
ξ = Σ_{rs} ξrs δ_{(r,s)}, where we use the shorthand notation “Σ_{rs}” instead of “Σ_{r∈R,s∈S}”.
Let M = #R · #S. Note that
Let κ > 0 and n ∈ N. We first give an estimate from which it is clear which κ and
N one should choose. By the assumptions on λ, for every s ∈ S there exists an r_s ∈ R
with λ_{r_s s} > 0.
First we show that there exists a ξ ∗ ∈ P^n_emp(R × S) with ξ ∗ ≪ ξ and |ξ ∗_{rs} − ξrs | ≤ 2/n
for all r ∈ R and s ∈ S. For each pair (r, s) ∈ R × S with ξrs > 0 we can choose a
ξ'_{rs} ∈ {0, 1/n, 2/n, . . . , 1} such that |ξ'_{rs} − ξrs | < 1/n. By letting ξ ∗_{rs} = 0 when ξrs = 0 and
adding or subtracting 1/n to some of the ξ'_{rs} we obtain a collection of ξ ∗_{rs} ∈ {0, 1/n, 2/n, . . . , 1}
with Σ_{rs} ξ ∗_{rs} = 1, |ξ ∗_{rs} − ξrs | ≤ 2/n and ξ ∗_{rs} = 0 whenever ξrs = 0, for all r ∈ R
and s ∈ S.
Let ξ ∈ P(R × S). Suppose that ζ ∈ P^n_emp(S) is such that |ζs − Σ_r ξrs | < κ.
Then |ζs − Σ_r ξ ∗_{rs}| < κ + 2M/n. We construct a ν ∈ P^n_emp(R × S) by defining the
νrs for each s separately. Let s ∈ S. If ζs − Σ_r ξ ∗_{rs} < 0, then we choose νrs ≤ ξ ∗_{rs}
with νrs ∈ {0, 1/n, . . . , 1} in such a way that Σ_r νrs = ζs (note that |νrs − ξ ∗_{rs}| ≤
|ζs − Σ_r ξ ∗_{rs}|). While, if ζs − Σ_r ξ ∗_{rs} ≥ 0, then we let νrs = ξ ∗_{rs} for all r ≠ r_s and we
let ν_{r_s s} = ξ ∗_{r_s s} + ζs − Σ_r ξ ∗_{rs} (so that Σ_r νrs = ζs ). As ξ ∗ ≪ ξ and ξ ≪ λ, by the
construction of ν we have ν ≪ λ. Moreover, we have π ◦ m(ν) = ζ and
≤ κ + 2M/n + 2/n,
which implies by (8.20)
Moreover, as 
d ν(· × S), ξ(· × S) ≤ M max
r∈R
By choosing κ > 0 and N ∈ N such that M²κ + (2/N )(M³ + M²) < δ, the proof is
complete.
Lemma 8.8 For all open U ⊂ R
0 ≤ inf_{V0∈Nψ} lim sup_{n→∞} sup_{ζ ∈P^n_emp(S)∩V0} inf_{ν∈m^{−1}(U ×{ζ })∩P^n_emp(R×S)} H (ν | λ)
Proof We assume inf_{ν∈m^{−1}(U ×{ψ})} H (ν | λ) < ∞. Let ξ ∈ m^{−1}(U × {ψ}) be such
that H (ξ | λ) < ∞. Let ε > 0. We show that there exist a V0 ∈ Nψ and an N ∈ N such
that for all n ≥ N the set P^n_emp(S) ∩ V0 is not empty and for all ζ ∈ P^n_emp(S) ∩ V0
there exists a ν ∈ m^{−1}(U × {ζ }) ∩ P^n_emp(R × S) with
9 Examples
In Sect. 8, we showed that the regular conditional kernel ηn as in (8.3) satisfies (A1)
and (B1) of Theorem 6.9 by showing that (A2) and (B2) of that theorem hold. This
is not always the optimal approach; in Example 9.1 we show that for a specific
example of Gaussian measures the expression of ηn allows us to derive (A1) and (B1)
directly.
Furthermore, relying on Theorem 9.2, in Example 9.4, we give an example of a
(ηn)n∈N for which (A1) of Theorem 6.9 does not hold. In Remark 9.5 we mention that
for the one choice of measures in Example 9.4 a quenched large deviation principle is
satisfied, while for the other choice of measures there is no quenched large deviation
principle. In Example 9.6 we show that for a choice of measures as in Example 9.4 the
regular conditional kernel at a specifically chosen point does not satisfy any large deviation
principle. In Remark 9.7 we discuss exponential tightness of the regular conditional
kernel. In Remark 9.8 we discuss the differences between the present paper and the
paper of La Cour and Schieve [20].
Example 9.1 Let r ≠ 0, Zn := ∫_R ∫_R e^{−(n/2)(x²−2r x y+y²)} dx dy, and consider the
sequence (μn)n∈N of probability measures on B(R × R) determined by
μn(C) = Z_n^{−1} ∫∫_C e^{−(n/2)(x²−2r x y+y²)} dx dy, C ∈ B(R × R).
The sequence satisfies the large deviation principle with rate function J : R² → [0, ∞]
given by J (x , y) = ½(x² − 2r x y + y²). By Theorem 4.8,
ηn(y, A) = (∫_A e^{−(n/2)(x²−2r x y)} dx ) / (∫_R e^{−(n/2)(x²−2r x y)} dx )
= (∫_A e^{−(n/2)(x−r y)²} dx ) / (∫_R e^{−(n/2)(x−r y)²} dx )
is the weakly continuous product regular conditional probability under μn with respect
to the projection on the Y-coordinate. If yn → y, one can show that for λ ∈ R
lim_{n→∞} (1/n) log ∫ e^{nλx} d[ηn(yn , ·)](x ) = λr y + ½λ².
Then by the Gärtner–Ellis theorem (see for example Dembo and Zeitouni [7, Theorem
2.3.6]) we conclude that (ηn(yn , ·))n∈N satisfies the large deviation principle with the
same rate function as the one of the large deviation principle of (ηn(y, ·))n∈N, which is
x → ½(x − r y)². Note that this equals J (x , y) − inf J (R × {y}) because of the equality
x² − 2r x y + y² = (x − r y)² + (1 − r²)y².
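The scaled log-moment generating function in this example can be checked numerically. The following Python sketch (ours; the values of r, y and λ are arbitrary choices) approximates (1/n) log ∫ e^{nλx} d[ηn(y, ·)](x) by a midpoint rule and compares it with λry + ½λ², using that ηn(y, ·) = N(ry, 1/n).

```python
import math

def scaled_log_mgf(n, r, y, lam, half_width=60.0, steps=100_000):
    """(1/n) log int exp(n*lam*x) dN(r*y, 1/n)(x), by a midpoint rule."""
    m, var = r * y, 1.0 / n
    lo = m - half_width / math.sqrt(n)
    dx = (2.0 * half_width / math.sqrt(n)) / steps
    log_terms = [n * lam * (lo + (i + 0.5) * dx)
                 - (lo + (i + 0.5) * dx - m) ** 2 / (2.0 * var)
                 for i in range(steps)]
    top = max(log_terms)
    s = sum(math.exp(t - top) for t in log_terms)  # log-sum-exp for stability
    log_integral = top + math.log(s) + math.log(dx) - 0.5 * math.log(2.0 * math.pi * var)
    return log_integral / n

r, y, lam = 0.5, 1.0, 0.7
limit = lam * r * y + lam ** 2 / 2.0   # the value lambda*r*y + (1/2)*lambda^2 from the text
for n in (10, 100, 1000):
    print(n, scaled_log_mgf(n, r, y, lam), limit)
```

Since the mean here is exactly ry, the computed value agrees with the limit for every n up to discretisation error; with yn → y instead, the agreement holds in the limit.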
The proof of the following theorem can be found in “Appendix 3”.
Theorem 9.2 Let X and Y be separable metric spaces. Let (μn1)n∈N and (μn2)n∈N be
sequences of probability measures on B(X ). Let (νn )n∈N be a sequence of probability
measures on B(Y) that satisfies the large deviation principle with a rate function
L : Y → [0, ∞]. Suppose that y ∈ Y and Wn ∈ Ny are such that ⋂_{n∈N} Wn = {y}
and αn : Y → [0, 1] is a continuous function with αn(y) = 0 and αn = 1 on Y \ Wn
such that
= 0. (9.4)
Assume (μn1 )n∈N satisfies the large deviation principle with rate function I . Assume
furthermore that for all open A ⊂ X
lim inf_{n→∞} (1/n) log μn1( A) ≥ lim inf_{n→∞} (1/n) log μn2( A),
lim sup_{n→∞} (1/n) log μn1(X \ A) ≥ lim sup_{n→∞} (1/n) log μn2(X \ A).
Then (μn)n∈N satisfies the large deviation principle with rate function J : X × Y →
[0, ∞] given by J (x , y) = I (x ) + L(y). ηn : Y × B(X ) → [0, 1] defined by
ηn(y, A) = αn(y)μn1 ( A) + (1 − αn(y))μn2 ( A)
is the weakly continuous product regular conditional probability under μn with respect
to π : X × Y → Y given by π(x , y) = y.
Note that I (x ) = J (x , y) − inf J (X × {y}) for all x ∈ X , y ∈ Y.
Example 9.3 We give examples of Y, Wn , αn , νn and L such that (9.4) of Theorem 9.2
is satisfied and (νn)n∈N satisfies the large deviation principle with rate function L.
(a) Let Y = [0, ∞), αn(y) = min{ny, 1} for y ∈ Y and let νn(B) =
∫_0^∞ 1_B (y) n e^{−ny} dy for B ∈ B([0, ∞)). Then ∫_0^{1/n} αn dνn = 1 − 2e^{−1} and
∫_0^{1/n} (1 − αn) dνn = e^{−1}. Therefore with this νn , αn and Wn = [−1/n, 1/n ] (9.4) is
satisfied. Moreover, (νn)n∈N satisfies the large deviation principle with rate
function L : Y → [0, ∞], L(y) = y (this follows for example from the Gärtner–Ellis
theorem [7, Theorem 2.3.6]).
Let β = ν1([−1, 1]). Let κn = 1/√n. Then νn[−κn , κn ] = β for all n ∈ N. Let φε :
R → [0, 1] be defined by φε(z) = min{ε^{−1}z, 1}. Then lim_{ε↓0} ∫_{[−κn ,κn ]} φε dν1 =
β, lim_{ε↓0} ∫_{[−κn ,κn ]} (1 − φε) dν1 = 0 and
Therefore, for all n ∈ N, there exists an εn ∈ (0, κn ) such that
Example 9.4 With X = R, μn1 = μ_{N (0,1/n)} (the normal distribution with mean 0 and
variance 1/n), μn2 = δ_{1/n} and I (x ) = ½x² for x ∈ R, and
Y, νn (or νn0), αn , Wn and L as in Examples 9.3 (a) or (b), the conditions of Theorem 9.2
are satisfied (note that (δ_{1/n})n∈N satisfies the large deviation principle with rate function
H : R → [0, ∞] given by H (0) = 0 and H (x ) = ∞ for x ≠ 0).
Then ηn(0, ·) = δ_{1/n} and ηn(εn , ·) = μ_{N (0,1/n)} for all n ∈ N. Whence (ηn(0, ·))n∈N
satisfies the large deviation principle with rate function H and (ηn(εn , ·))n∈N (and
also (ηn(y, ·))n∈N for y > 0) satisfies the large deviation principle with rate function
I . Because I ≤ H , the sequence (ηn(0, ·))n∈N satisfies the large deviation upper
bound not only with H but also with I instead of H . Therefore (b1) of Theorem 6.3
holds in case yn = 0 for all n. Since (ηn(0, ·))n∈N does not satisfy the large deviation
principle with rate function I , (a1) of Theorem 6.3 does not hold. Therefore for any
decreasing sequence (Vm )m∈N in N0 with ⋂_{m∈N} Vm = {0} there exists an open set U
with inf I (U ) < ∞ with
We illustrate this for Y, αn , Wn , νn and L as in Examples 9.3 (a): for Vm = [0, 1/m )
and U = (1, ∞) we get for m ≥ n, using that αn(y) = ny on [0, 1/m ] and δ_{1/n}(U ) = 0,
μn(U × Y | X × Vm ) = μ_{N (0,1/n)}(U ) · (∫_0^{1/m} ny · ne^{−ny} dy) / (∫_0^{1/m} ne^{−ny} dy)
≤ (n/m) μ_{N (0,1/n)}(U ),
which converges to zero as m → ∞, which implies
lim sup_{n→∞} lim_{m→∞} (1/n) log μn(U × Y | X × Vm ) = −∞ < −½ = − inf I (U ).
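The collapse of this conditional mass can be observed numerically. The sketch below is ours (the concrete n and m are arbitrary): it integrates the construction of Theorem 9.2 with μn1 = N(0, 1/n), μn2 = δ_{1/n}, αn(y) = min{ny, 1} and νn(dy) = ne^{−ny} dy, and compares μn(U × Y | X × Vm) with the bound (n/m) μn1(U).

```python
import math

def gaussian_tail(n):
    """mu1_n(U) = P(N(0, 1/n) > 1)."""
    return 0.5 * math.erfc(math.sqrt(n / 2.0))

def conditional_mass(n, m, steps=100_000):
    """mu_n(U x Y | X x [0, 1/m)) for U = (1, oo) and m >= n, by quadrature;
    the delta_{1/n} part contributes 0 because 1/n is not in U."""
    h = (1.0 / m) / steps
    num = den = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        w = n * math.exp(-n * y) * h      # nu_n weight of the cell around y
        num += min(n * y, 1.0) * w        # alpha_n(y) * nu_n(dy)
        den += w
    return gaussian_tail(n) * num / den

n = 4
for m in (4, 40, 400, 4000):
    print(m, conditional_mass(n, m), (n / m) * gaussian_tail(n))  # value vs bound
```

Shrinking Vm kills the Gaussian contribution faster than any exponential rate in n, which is exactly why the lower large deviation bound with rate I fails at y = 0.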
Remark 9.5 (Quenched large deviations) Consider the situation as in Example 9.4.
For all n ∈ N we have the following. If ζn : Y × B(X ) → [0, 1] is a product
regular conditional probability under μn with respect to π , then ζn(y, ·) = ηn(y, ·)
for [μn ◦ π −1]almost all y (see Remark 3.5).
Whence, with νn as in Examples 9.3 (a) or (b), we have a quenched large deviation
principle of the conditional probability with respect to the second coordinate with rate
function I ; for every product regular conditional probability ζn under μn with respect
to π there exists a Z ⊂ Y with μn ◦ π −1(Z ) = νn(Z ) = 1 such that (ζn(y, ·))n∈N
satisfies the large deviation principle with rate function I for all y ∈ Z .
However, with νn0 as in Examples 9.3 (b) instead of νn , for such ζn one has ζn(0, ·) =
ηn(0, ·), as νn0({0}) > 0. Thus in this case we do not have such a quenched large
deviation principle.
Example 9.6 With X = N, μn1 = Σ_{k∈N} 2^{−k} δk , μn2 = δn and I (x ) = 0 for x ∈ N as in
Example 9.4, and Y, Wn , αn , νn and L as in Examples 9.3 (a) or (b), the conditions of
Theorem 9.2 are satisfied. In this case, (ηn(0, ·))n∈N does not satisfy a large deviation
principle.
Remark 9.8 Example 9.4 with νn (or νn0) and αn as in Examples 9.3 (b) fits the
assumptions made in Sect. 4 of La Cour and Schieve [20].8 In Sect. 4 of that paper, it is claimed
that the law of the first coordinate conditioned on the second coordinate satisfies the
large deviation principle with the rate function I . Their notion of conditioning on y is
“condition on an arbitrarily small neighbourhood around y”. This approach needs to
be justified. Our results are different, as by Example 9.4 the conditioned kernel at 0,
8 The logarithmic moment generating function (see Dembo and Zeitouni [7, Assumption 2.3.2]) is given
by (x , y) → ½x² + ½y², whence its Hessian equals the identity matrix and is therefore invertible.
In [20], it is mentioned that one cannot carry out the conditioning on all elements; only those that equal
the derivative of y → ½y² at a certain point are considered, of which 0 is an example.
ηn (0, ·) does not satisfy the large deviation principle with the rate function I (even in
the sense of quenched large deviations as discussed in Remark 9.5).
Acknowledgements The author is supported by ERC Advanced Grant VARIS267356 of Frank den
Hollander. The author is grateful to both Frank den Hollander and Frank Redig for valuable suggestions and
useful discussions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix 1: An Elementary Fact About Limsup and Liminf
Lemma 9.9 Let k ∈ N and a_n^i ∈ [0, ∞) for all n ∈ N and i ∈ {1, . . . , k}. If there
exists an N ∈ N such that max_{i∈{1,...,k}} a_n^i > 0 for all n ≥ N , then9
lim_{n→∞} (1/n) log k = 0, (9.16)
lim sup_{n→∞} (1/n) log Σ_{i=1}^k a_n^i = max_{i∈{1,...,k}} lim sup_{n→∞} (1/n) log a_n^i , (9.17)
lim inf_{n→∞} (1/n) log Σ_{i=1}^k a_n^i ≥ max_{i∈{1,...,k}} lim inf_{n→∞} (1/n) log a_n^i . (9.18)
Proof (9.16), (9.17) and (9.18) follow from the inequality
(1/n) log max_{i∈{1,...,k}} a_n^i ≤ (1/n) log Σ_{i=1}^k a_n^i ≤ (1/n) log k + (1/n) log max_{i∈{1,...,k}} a_n^i .
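A quick numerical illustration of (9.17) (ours; the two sequences are arbitrary choices whose scaled-log limits are −1/2 and −1):

```python
import math

def scaled_log_sum(n):
    """(1/n) log(a_n^1 + a_n^2) with a_n^1 = e^{-n/2}, a_n^2 = n^5 e^{-n}."""
    la = -n / 2.0                  # log a_n^1
    lb = 5.0 * math.log(n) - n     # log a_n^2
    m = max(la, lb)
    return (m + math.log(math.exp(la - m) + math.exp(lb - m))) / n

for n in (10, 100, 1000, 10_000):
    print(n, scaled_log_sum(n))    # approaches max(-1/2, -1) = -1/2, as (9.17) predicts
```

For the lim inf only the inequality (9.18) survives in general: with sequences that alternate in which term dominates, the lim inf of the sum can exceed the maximum of the individual lim infs.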
Appendix 2: Sufficient Bounds for Large Deviation Bounds
Let X be a topological space. Let I : X → [0, ∞] have compact sublevel sets. Let
(μn )n∈N be a sequence of probability measures on B(X ).
Lemma 9.10 Let F1 ⊃ F2 ⊃ · · · be closed subsets of X and let F = ⋂_{m∈N} Fm . Then
sup_{m∈N} inf I (Fm ) = inf I (F ).
9 Equation (9.17) can also be found in Dembo and Zeitouni [7, Theorem 1.2.15].
Proof Let c := sup_{m∈N} inf I (Fm ). Note that c ≤ inf I (F ). If c = ∞, there is nothing
to prove. Assume that c < ∞. Let K be the compact set [I ≤ c]. Then Fm ∩ K ≠ ∅
for all m ∈ N, whence F ∩ K ≠ ∅ and thus inf I (F ) ≤ c.
9.11 For Lemma 9.10 the condition that I has compact sublevel sets is not redundant.
For example: Let I : N ∪ {0} → [0, ∞] be given by I (0) = 1 and I (x ) = 0 for x ∈ N.
Then for Fm = {0} ∪ {m, m + 1, . . . } and F = {0} one has supm∈N inf I (Fm ) = 0
and inf I (F ) = 1.
Then G = ⋃ U satisfies (9.21) as well.10
(b) Let F1, F2, . . . be closed. Suppose that for all F ∈ {Fm : m ∈ N}
Then F = ⋂_{m∈N} Fm satisfies (9.22) as well.
Now apply Lemma 9.10.
" U ≥ sup lim inf n1 log μn(G) ≥ sup (− inf I (G)),
G∈U n→∞ G∈U
m∈N
≤ min∈fN linm→s∞up n1 log μn (Fm ) ≤ min∈fN(− inf I (Fm )).
As a consequence of Lemma 9.12 we obtain the following.
Theorem 9.13 Suppose that G is a basis for the topology on X , such that (9.21) holds
for all G ∈ G and (9.22) holds for F = X \ G. Suppose that every open G can be
written as a countable union of elements of G. Then (μn)n∈N satisfies the large deviation
principle with rate function I .
Appendix 3: Proof of Theorem 9.2
Proof of Theorem 9.2 As X and Y are separable metric spaces, every open subset of
X × Y is a countable union of elements of the form A × B where A ⊂ X is open and
B ∈ H, where (with dY the metric on Y)
H = {B(y, δ) : δ > 0} ∪ {B(z, δ) : z ≠ y, 0 < δ < dY (y, z)}.
10 This can also be found in O’Brien [23, Proposition 2.1].
We use Theorem 9.13 to prove the large deviation bounds. Note first that (X × Y) \ (A × B) = (X × (Y \ B)) ∪ ((X \ A) × Y), that min{inf I(X \ A), inf L(Y \ B)} = inf_{(x,y)∈(X×Y)\(A×B)} [I(x) + L(y)], and that by (9.17) it therefore suffices to prove, for all open A ⊂ X and all B ∈ H,

lim sup_{n→∞} (1/n) log μ_n(X × (Y \ B)) ≤ − inf L(Y \ B),  (9.26)

lim sup_{n→∞} (1/n) log μ_n((X \ A) × Y) ≤ − inf I(X \ A),  (9.27)

lim inf_{n→∞} (1/n) log μ_n(A × B) ≥ −(inf I(A) + inf L(B)).  (9.28)

Let A ⊂ X be open and B ∈ H.
• (9.26) follows from the fact that μn(X × (Y \ B)) = νn(Y \ B).
• (9.27) follows from the fact that by (9.4), (9.6) and (9.17) we have

lim sup_{n→∞} (1/n) log μ_n((X \ A) × Y) = max { lim sup_{n→∞} (1/n) log μ_n^1(X \ A), lim sup_{n→∞} (1/n) log μ_n^2(X \ A) } = lim sup_{n→∞} (1/n) log μ_n^1(X \ A) ≤ − inf I(X \ A).
• (9.28) follows by separating two cases (as either y ∈ B or y ∉ B): If y ∉ B, then W_n ∩ B = ∅ and so μ_n(A × B) = μ_n^1(A)ν_n(B) for large n, whence

lim inf_{n→∞} (1/n) log μ_n(A × B) ≥ lim inf_{n→∞} (1/n) log μ_n^1(A) + lim inf_{n→∞} (1/n) log ν_n(B) ≥ −(inf I(A) + inf L(B)).

Suppose instead that y ∈ B, i.e., W_n ⊂ B for large n. By (9.18) we obtain

lim inf_{n→∞} (1/n) log μ_n(A × B) ≥ max { lim inf_{n→∞} (1/n) log μ_n^1(A), lim inf_{n→∞} (1/n) log μ_n^2(A) } ≥ − inf I(A).

Because inf L(B) ≥ 0, we conclude (9.28).
We leave it to the reader to check that ηn is the weakly continuous product regular
conditional probability under μn with respect to π .
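In the first case of (9.28) the product structure μ_n(A × B) = μ_n^1(A)ν_n(B) means that on the exponential scale the rates add. A small numerical illustration with hypothetical rates (not values from the paper):

```python
import math

# If mu_n^1(A) and nu_n(B) decay like exp(-n*r1) and exp(-n*r2) up to
# subexponential prefactors, then (1/n) log [mu_n^1(A) * nu_n(B)]
# approaches -(r1 + r2), i.e. the rates add for product sets.
r1, r2 = 0.4, 0.7       # hypothetical values of inf I(A) and inf L(B)
n = 100
mu1_A = (1 + 1 / n) * math.exp(-n * r1)   # subexponential prefactor 1 + 1/n
nu_B = (1 - 1 / n) * math.exp(-n * r2)
rate = math.log(mu1_A * nu_B) / n
print(rate)
```

The prefactors contribute only O((log n)/n) hmm, in fact only O(1/n) here, so the computed rate is essentially −(r1 + r2).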
References

1. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Commun. Math. Phys. 307(3), 791–815 (2011)
2. Biggins, J.D.: Large deviations for mixtures. Electron. Commun. Probab. 9, 60–71 (2004) (electronic)
3. Billingsley, P.: Convergence of Probability Measures, Wiley Series in Probability and Statistics: Probability and Statistics, 2nd edn. Wiley, New York (1999)
4. Bogachev, V.: Measure Theory. Springer, Berlin (2007)
5. Comets, F.: Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures. Probab. Theory Relat. Fields 80(3), 407–432 (1989)
6. Comets, F., Gantert, N., Zeitouni, O.: Quenched, annealed and functional large deviations for one-dimensional random walk in random environment. Probab. Theory Relat. Fields 118(1), 65–114 (2000)
7. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010). [Corrected reprint of the second edition (1998)]
8. Deuschel, J.D., Stroock, D.W.: Large Deviations, Pure and Applied Mathematics, vol. 137. Academic Press, Boston (1989)
9. van Enter, A.C.D., Fernández, R., den Hollander, F., Redig, F.: A large-deviation view on dynamical Gibbs-non-Gibbs transitions. Mosc. Math. J. 10(4), 687–711 (2010)
10. van Enter, A.C.D., Külske, C., Opoku, A.A., Ruszel, W.M.: Gibbs-non-Gibbs properties for n-vector lattice and mean-field models. Braz. J. Probab. Stat. 24(2), 226–255 (2010)
11. Ermolaev, V., Külske, C.: Low-temperature dynamics of the Curie-Weiss model: periodic orbits, multiple histories, and loss of Gibbsianness. J. Stat. Phys. 141(5), 727–756 (2010)
12. Faden, A.: The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab. 13(1), 288–298 (1985)
13. Fernández, R., den Hollander, F., Martínez, J.: Variational description of Gibbs-non-Gibbs dynamical transitions for the Curie-Weiss model. Commun. Math. Phys. 319(3), 703–730 (2013)
14. Greven, A., den Hollander, F.: Large deviations for a random walk in random environment. Ann. Probab. 22(3), 1381–1428 (1994)
15. Halmos, P.R.: Measure Theory. Springer, Berlin (1974)
16. den Hollander, F.: Large Deviations, Fields Institute Monographs, vol. 14. American Mathematical Society, Providence, RI (2000)
17. den Hollander, F., Redig, F., van Zuijlen, W.: Gibbs-non-Gibbs dynamical transitions for mean-field interacting Brownian motions. Stoch. Process. Appl. 125(1), 371–400 (2015)
18. Kosygina, E., Rezakhanlou, F., Varadhan, S.R.S.: Stochastic homogenization of Hamilton-Jacobi-Bellman equations. Commun. Pure Appl. Math. 59(10), 1489–1521 (2006)
19. Külske, C., Opoku, A.A.: Continuous spin mean-field models: limiting kernels and Gibbs properties of local transforms. J. Math. Phys. 49(12), 125215 (2008)
20. La Cour, B.R., Schieve, W.C.: A general conditional large deviation principle. J. Stat. Phys. 161(1), 123–130 (2015)
21. Leao Jr., D., Fragoso, M., Ruffino, P.: Regular conditional probability, disintegration of probability and Radon spaces. Proyecciones 23(1), 15–29 (2004)
22. Léonard, C.: A large deviation approach to optimal transport. arXiv:0710.1461v1 (2007)
23. O'Brien, G.L.: Sequences of capacities, with connections to large-deviation theory. J. Theor. Probab. 9(1), 19–35 (1996)
24. Rassoul-Agha, F., Seppäläinen, T., Yilmaz, A.: Quenched free energy and large deviations for random walk in random potential. Commun. Pure Appl. Math. 66(2), 202–244 (2013)
25. Rassoul-Agha, F., Seppäläinen, T.: A Course on Large Deviations with an Introduction to Gibbs Measures, Graduate Studies in Mathematics, vol. 162. American Mathematical Society, Providence, RI (2015)
26. Schaefer, H.H.: Topological Vector Spaces. Springer, New York (1971). (Third printing corrected, Graduate Texts in Mathematics, vol. 3)
27. Steen, L.A., Seebach Jr., J.A.: Counterexamples in Topology. Holt, Rinehart and Winston, New York (1970)