Large Deviations of Continuous Regular Conditional Probabilities

Journal of Theoretical Probability, Dec 2016

We study product regular conditional probabilities under measures of two coordinates with respect to the second coordinate that are weakly continuous on the support of the marginal of the second coordinate. Assuming that there exists a sequence of probability measures on the product space that satisfies a large deviation principle, we present necessary and sufficient conditions for the conditional probabilities under these measures to satisfy a large deviation principle. The arguments of these conditional probabilities are assumed to converge. A way to view regular conditional probabilities as a special case of product regular conditional probabilities is presented. This is used to derive conditions for large deviations of regular conditional probabilities. In addition, we derive a Sanov-type theorem for large deviations of the empirical distribution of the first coordinate conditioned on fixing the empirical distribution of the second coordinate.


https://link.springer.com/content/pdf/10.1007%2Fs10959-016-0733-1.pdf


W. van Zuijlen
Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands

Keywords: (Product) regular conditional kernel; Weakly continuous; Large deviations

1 Introduction

In this paper we consider conditional probabilities of the form

\[ \mathbb{P}(X_n \in A \mid Y_n = y_n), \tag{1.1} \]

where ((X_n, Y_n))_{n∈N} is a sequence of pairs of random variables that satisfies a large deviation principle and y_n → y for some y. As the event [Y_n = y_n] may have probability zero, we make sense of (1.1) in terms of a kernel η_n, so that η_n(y_n, A) "represents" (1.1). Such kernels are called regular conditional probabilities and form an important object in probability theory. The existence of regular conditional probabilities has been studied extensively, for example, by Faden [12] and by Leao et al. [21]. There exist in fact various forms of regular conditional probabilities, namely with respect to a σ-algebra, with respect to a measurable map, or with respect to the projection on one of the coordinates (in the case of a product space).
In order to consider large deviations of conditional probabilities, we have to specify which conditional probability we are considering, as the conditional probability may not be unique. However, if a (product) regular conditional probability is weakly continuous on the support of the measure composed with the inverse of the measurable map (or projection), it is unique on that domain. For these (product) regular conditional probabilities, it is natural to study their large deviations whenever the argument of the probability is in the domain on which it is unique. In this paper, we study the large deviations in the case when the arguments of these kernels converge, i.e. we study large deviations of (η_n(y_n, ·))_{n∈N} for the case that y_n → y. To the best of our knowledge, the current literature does not provide a general condition under which such kernels satisfy a large deviation principle.

1.1 Literature

Some examples in this direction are present. For example, in Adams et al. [1], the large deviation principle is proved for the empirical distribution that is evolved by independent Brownian motions conditioned on their initial empirical distribution to lie in a ball (see [1, Theorem 1]). They proceed by proving that the large deviation principle rate function converges as the radius of the ball converges to zero. For the purpose of this paper, we have to show that the limit of the radius of the ball and the limit belonging to the large deviation principle can be interchanged. Léonard [22] proves the large deviation principle of the empirical distribution that is evolved by independent Brownian motions conditioned on their initial empirical distribution; those initial empirical distributions are assumed to be converging (see [22, Proposition 2.19]). In both papers, the evolved state is conditioned on the initial state, while there is also interest in large deviations of the initial state conditioned on the evolved state.
In this paper, we prove the large deviation principle in this setting for finite state spaces.

There exist various results on quenched large deviations, i.e. large deviations for regular conditional probabilities in the sense that for almost all realisations of the disorder, the conditional probabilities satisfy the large deviation principle with a rate function that does not depend on the disorder. Examples of papers on quenched large deviations are Comets [5] for conditional large deviations of i.i.d. random fields, Greven and den Hollander [14] and Comets et al. [6] for random walks in random environments, Kosygina–Rezakhanlou–Varadhan [18] for a diffusion with a random drift and Rassoul-Agha et al. [24] for polymers in a random potential. Biggins [2] obtains the large deviation principle for mixtures of probability measures that satisfy the large deviation principle with kernels that satisfy the large deviation principle as their arguments converge. To some extent, we complement that article in the opposite direction, in the sense that we assume the large deviation principle of the mixture and derive the large deviation principle of the kernels.

Our main motivation to study the above large deviations lies in the theory of Gibbs–non-Gibbs transitions. There is a correspondence between the large deviation rate function of the conditional probability with respect to the evolved coordinate and the evolved state (measure or sequence) being Gibbs (see van Enter et al. [9]). We refer to Sect. 1.4 for further discussion of Gibbs–non-Gibbs transitions.

1.2 Large Deviations

In the literature on large deviations, two dominant definitions of large deviation principles are used. One is in terms of a σ-algebra on the topological space, as is done in the book by Dembo and Zeitouni [7] and in the book by Deuschel and Stroock [8]; the other is in terms of the topology, i.e.
in terms of open and closed sets, as is done in the book by den Hollander [16] and in the book by Rassoul-Agha and Seppäläinen [25]. Whenever one considers the Borel σ-algebra on the topological space, the two definitions agree. We define the large deviation lower bound and the large deviation upper bound separately, as in Sect. 1.3, and in Sect. 6 we describe the necessary and sufficient conditions for each of the bounds separately. Moreover, we define them on a set of subsets of the topological space, which is not required to be a σ-algebra. In Remark 7.4, we motivate the choice for this definition.

Definition 1.1 Let X be a topological space and 𝒜 be a set of subsets of X. Let I : X → [0, ∞] be lower semicontinuous. Let (μ_n)_{n∈N} be a sequence of probability measures on 𝒜. Let (r_n)_{n∈N} be an increasing sequence in (0, ∞) with lim_{n→∞} r_n = ∞. We say that (μ_n)_{n∈N} satisfies a large deviation lower bound on 𝒜 with rate function I and rates (r_n)_{n∈N} if

\[ \liminf_{n\to\infty} \frac{1}{r_n} \log \mu_n(A) \ge -\inf I(A^\circ) \qquad (A \in \mathcal{A}). \tag{1.2} \]

We say that (μ_n)_{n∈N} satisfies a large deviation upper bound on 𝒜 with rate function I and rates (r_n)_{n∈N} if

\[ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(A) \le -\inf I(\overline{A}) \qquad (A \in \mathcal{A}). \tag{1.3} \]

In the rest of the paper we only consider the rates r_n = n. However, the theory presented is still valid for general rates (r_n)_{n∈N}. We say that (μ_n)_{n∈N} satisfies a large deviation principle on 𝒜 with rate function I whenever it satisfies both the large deviation lower bound and the large deviation upper bound with rate function I. We omit "on 𝒜" whenever 𝒜 is the Borel σ-algebra ℬ(X) on X. In this case, the large deviation lower bound is satisfied if and only if the inequality in (1.2) holds for all open subsets of X, and the large deviation upper bound is satisfied if and only if the inequality in (1.3) holds for all closed subsets of X.

1.3 Main Results

See Sects. 3 and 4 for the definitions of the objects in the statements of the following theorems. In Sects. 6 and 7, we consider a more general situation.
Theorem 1.2 is a consequence of Theorem 6.9, and Theorem 1.3 is a consequence of Theorem 7.5. In this section X and Y are metric spaces.

Theorem 1.2 Let π : X × Y → Y be given by π(x, y) = y. Suppose that (μ_n)_{n∈N} is a sequence of probability measures on ℬ(X) ⊗ ℬ(Y) that satisfies the large deviation principle with rate function J : X × Y → [0, ∞] that has compact sublevel sets. Suppose that for each n ∈ N, there exists a product regular conditional probability η_n : Y × ℬ(X) → [0, 1] under μ_n with respect to π that is weakly continuous on supp(μ_n ∘ π⁻¹), which we assume to be non-empty. Let y ∈ Y be such that inf J(X × {y}) < ∞. Define I : X → [0, ∞] by

\[ I(x) = J(x, y) - \inf J(X \times \{y\}). \]

Then I has compact sublevel sets and, for each n ∈ N, η_n is unique on supp(μ_n ∘ π⁻¹). Moreover, (A1) ⇐⇒ (A2) and (B1) ⇐⇒ (B2), where

(A1) For all (y_n)_{n∈N} with y_n → y and y_n ∈ supp(μ_n ∘ π⁻¹) for all n large enough,1 the sequence (η_n(y_n, ·))_{n∈N} satisfies the large deviation lower bound with rate function I.

(A2) For all x ∈ X and r > 0, with U = B(x, r),

\[ \sup_{\varepsilon>0}\ \liminf_{n\to\infty}\ \inf_{\substack{z\in Y,\ \delta\in(0,\varepsilon)\\ B(z,\delta)\subset B(y,\varepsilon)}} \frac{1}{n} \log \mu_n\bigl(U \times Y \,\big|\, X \times B(z,\delta)\bigr) \ge -\inf I(U). \]

(B1) For all (y_n)_{n∈N} with y_n → y and y_n ∈ supp(μ_n ∘ π⁻¹) for all n large enough, the sequence (η_n(y_n, ·))_{n∈N} satisfies the large deviation upper bound with rate function I.

(B2) For all x_1, ..., x_k ∈ X and r_1, ..., r_k > 0, with W = X \ [B(x_1, r_1) ∪ ··· ∪ B(x_k, r_k)],

\[ \inf_{\varepsilon>0}\ \limsup_{n\to\infty}\ \sup_{\substack{z\in Y,\ \delta\in(0,\varepsilon)\\ B(z,\delta)\subset B(y,\varepsilon)}} \frac{1}{n} \log \mu_n\bigl(W^\circ \times Y \,\big|\, X \times B(z,\delta)\bigr) \le -\inf I(W). \tag{1.6} \]

1 Meaning that there exists an N ∈ N such that y_n ∈ supp(μ_n ∘ π⁻¹) for all n ≥ N.

The next theorem is similar to Theorem 1.2, but considers the large deviation bounds for regular conditional kernels instead of product regular conditional probabilities.

Theorem 1.3 Let τ : X → Y be continuous. Suppose that (ν_n)_{n∈N} is a sequence of probability measures on ℬ(X) that satisfies the large deviation principle with rate function J : X → [0, ∞] that has compact sublevel sets.
Suppose that for each n ∈ N there exists a regular conditional probability η_n : Y × ℬ(X) → [0, 1] under ν_n with respect to τ that is weakly continuous on supp(ν_n ∘ τ⁻¹), which is assumed to be non-empty. Let y ∈ Y be such that inf J(τ⁻¹({y})) < ∞. Define I : X → [0, ∞] by

\[ I(x) = \begin{cases} J(x) - \inf J(\tau^{-1}(\{y\})) & \text{if } \tau(x) = y, \\ \infty & \text{if } \tau(x) \ne y. \end{cases} \]

Then I has compact sublevel sets and, for each n ∈ N, η_n is unique on supp(ν_n ∘ τ⁻¹). Moreover, (A1) ⇐⇒ (A2) and (B1) ⇐⇒ (B2), where

(A1) For all (y_n)_{n∈N} with y_n → y and y_n ∈ supp(ν_n ∘ τ⁻¹) for all n large enough, the sequence (η_n(y_n, ·))_{n∈N} satisfies the large deviation lower bound with rate function I.

(A2) For all x ∈ X and r > 0, with U = B(x, r),

\[ \sup_{\varepsilon>0}\ \liminf_{n\to\infty}\ \inf_{\substack{z\in Y,\ \delta\in(0,\varepsilon)\\ B(z,\delta)\subset B(y,\varepsilon)}} \frac{1}{n} \log \nu_n\bigl(U \,\big|\, \tau^{-1}(B(z,\delta))\bigr) \ge -\inf I(U). \]

(B1) For all (y_n)_{n∈N} with y_n → y and y_n ∈ supp(ν_n ∘ τ⁻¹) for all n large enough, the sequence (η_n(y_n, ·))_{n∈N} satisfies the large deviation upper bound with rate function I.

(B2) For all x_1, ..., x_k ∈ X and r_1, ..., r_k > 0, with W = X \ [B(x_1, r_1) ∪ ··· ∪ B(x_k, r_k)],

\[ \inf_{\varepsilon>0}\ \limsup_{n\to\infty}\ \sup_{\substack{z\in Y,\ \delta\in(0,\varepsilon)\\ B(z,\delta)\subset B(y,\varepsilon)}} \frac{1}{n} \log \nu_n\bigl(W^\circ \,\big|\, \tau^{-1}(B(z,\delta))\bigr) \le -\inf I(W). \]

1.4 Gibbs–Non-Gibbs Transitions

In this section we discuss the relation between the large deviation results in this paper and Gibbs–non-Gibbs transitions in more detail. In particular, we discuss possible future directions regarding large deviations of conditional kernels.

The following situation for interacting particle systems occurs in the mean-field context (a similar situation holds in the context of lattices). The initial system of so-called spins consists of distributions describing the interaction between spins via a potential V (for each n there is a distribution describing the law of n spins). This initial system is assumed to be Gibbs, which is called sequentially Gibbs in the mean-field context. Allowing the initial state to be transformed, for example, by an evolution of the spins, a question of interest is whether the transformed state is (sequentially) Gibbs. This question has been addressed in the mean-field context by Ermolaev and Külske [11] and by Fernández et al. [13] for {−1, +1}-valued spins, by den Hollander et al.
[17] for R-valued spins and by Külske and Opoku [19] and van Enter et al. [10] for compactly valued spins. In these papers, independent dynamics of the spins are considered (the evolution of each spin is independent of the evolution of the other spins). Independent dynamics simplify the situation. Namely, the evolved measure on either the product space of the initial and the final space or, in the case of an evolution, the space of trajectories, is a tilted measure of the evolved measure when considering V = 0. In this case the measure is a product measure, which means that the spins are independent. As a consequence (this will be clarified in a forthcoming paper), the conditional kernel η_n of the initial state on n spins with respect to the final state (for a fixed potential V) is a tilted version of the conditional kernel η_n^0 of the initial state with respect to the final state of independent spins (i.e. V = 0). Because of this tilting, by Varadhan's lemma, (η_n(y_n, ·))_{n∈N} satisfies the large deviation principle with rate function V + I_y − inf(V + I_y) if (η_n^0(y_n, ·))_{n∈N} satisfies the large deviation principle with rate function I_y. In the forthcoming paper, we will prove that the evolved sequence is sequentially Gibbs if V + I_ζ has a unique global minimiser.

The large deviation principle of (η_n(y_n, ·))_{n∈N} has been mentioned in the case of trajectories in [11, Corollary 2.4] and, as a corollary of that theorem, for the case of the product space of the initial and the final space in [13, Corollary 1.3]. However, no proof was given. Theorem 8.2 provides a rigorous proof of the large deviation principle statement in [13, Corollary 1.3]. In this paper, we do not provide a rigorous proof of [11, Corollary 2.4]. But Theorem 1.3 may be used, as the conditioning on the final state is a regular conditional kernel with respect to the map τ : C([0, T], X) → X, τ(f) = f(T).
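The tilting statement above can be checked numerically on a finite state space. The following is only a toy sketch under stated assumptions: the untilted kernel is taken to have weights exactly exp(−n I(x)), the tilt is taken to multiply these by exp(−n V(x)) (a sign convention chosen here so that it matches the rate function V + I − inf(V + I)), and the names `tilted_rate`, `tilted_log_mass` and the values of I and V are hypothetical.

```python
import math

def tilted_rate(I, V):
    """Rate function V + I - min(V + I) on a finite state space."""
    shifted = {x: V[x] + I[x] for x in I}
    m = min(shifted.values())
    return {x: s - m for x, s in shifted.items()}

def tilted_log_mass(I, V, n):
    """-(1/n) log of the normalised weights exp(-n (V(x) + I(x)))."""
    # Toy assumption: untilted weights exp(-n I(x)), tilted by exp(-n V(x)).
    logw = {x: -n * (V[x] + I[x]) for x in I}
    mx = max(logw.values())
    # log-sum-exp for a numerically stable normalising constant
    log_z = mx + math.log(sum(math.exp(v - mx) for v in logw.values()))
    return {x: -(logw[x] - log_z) / n for x in I}

I = {"a": 0.0, "b": 0.5, "c": 2.0}   # hypothetical rate function I_y
V = {"a": 1.0, "b": 0.1, "c": 0.0}   # hypothetical potential
rate = tilted_rate(I, V)
approx = tilted_log_mass(I, V, n=2000)
gap = max(abs(rate[x] - approx[x]) for x in rate)
```

For large n the value of `gap` is small, which is the finite-state shadow of the Varadhan-type tilting; it does not replace the argument announced for the forthcoming paper.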
In order to deal with empirical distributions (and not with magnetisations as is done in [17]), in future research we strive to "extend" the statement of Theorem 8.2 to infinite and possibly non-compact state spaces. In the case of non-compact spaces, it may be that topologies on the space of probability measures are considered that are not metrisable.

1.5 Outline

We list some notations, definitions and assumptions in Sect. 2. In Sect. 3, we give and compare the notions of regular conditional kernels, and we show that a regular conditional kernel under a measure ν is in fact a product regular conditional kernel under a measure that is related to ν. In Sect. 4, we introduce and study weakly continuous regular conditional kernels. In Sect. 5, we present some facts about lower semicontinuous functions with compact sublevel sets. Relying on the results of Sects. 4 and 5, in Sect. 6, we present results on large deviation bounds for product regular conditional probabilities, in particular necessary and sufficient conditions for these bounds to hold. In Sect. 7, we discuss how to obtain large deviation bounds for regular conditional probabilities from the results in Sect. 6. In Sect. 8, we apply the theory to obtain the large deviation principle for the empirical density of the first coordinate given the empirical density of the second coordinate, for independent and identically distributed pairs of random variables. In Sect. 9, we give some examples. We also include an example for which the conditions are not satisfied. For this example we compare the quenched large deviations with large deviations of the weakly continuous regular conditional probabilities and comment on the difference with an example by La Cour and Schieve [20]. In "Appendices 1 and 2" we state some general results concerning large deviation bounds that are used in the different sections. In "Appendix 3" we provide the proof of a theorem on which the examples of Sect. 9 rely.
2 Notations and Conventions

N = {1, 2, 3, ...}. For a topological space X, we write ℬ(X) for the Borel σ-algebra and 𝒫(X) and ℳ(X) for the spaces of probability and signed measures on ℬ(X), respectively. For A ⊂ X we write A° for the interior of A and \overline{A} for the closure of A. For x ∈ X we write δ_x for the element in 𝒫(X) with δ_x(A) = 1 if x ∈ A and δ_x(A) = 0 otherwise. For x ∈ X we write 𝒩_x for the set of ℬ(X)-measurable neighbourhoods of x. For μ ∈ ℳ(X) we write supp μ = {x ∈ X : |μ|(V) > 0 for all V ∈ 𝒩_x} and call this the support of μ. For a function f from a set X into R and c ∈ R we write [f ≥ c] = {x ∈ X : f(x) ≥ c}. Similarly, we use the notations [f > c], [f ≤ c] and [f < c].

Whenever (x_ι)_{ι∈I} is a net, where I is a set directed by (a direction) ⪰, we write

\[ \liminf_{\iota\in I} x_\iota = \sup_{\iota_0\in I}\ \inf_{\iota\succeq\iota_0,\ \iota\in I} x_\iota \]

(similarly we write lim sup). In particular, if 𝒱 ⊂ 𝒩_x with ⋂𝒱 = {x} and f : 𝒱 → R, then

\[ \liminf_{V\in\mathcal{V}} f(V) = \sup_{V_0\in\mathcal{V}}\ \inf_{V\subset V_0,\ V\in\mathcal{V}} f(V) \]

(i.e. we consider (f(V))_{V∈𝒱} as a net where 𝒱 is directed by ⊃ (as ⪰)).

Whenever we write μ(A|B), we implicitly assume that it is well defined (as μ(A ∩ B)/μ(B)), i.e. that μ(B) ≠ 0. We use the conventions log 0 = −∞ and inf I(∅) = ∞ whenever I is a function with values in [0, ∞]. All measures in this paper are signed measures, unless mentioned otherwise.

3 Regular Conditional Kernels Being Product Regular Conditional Kernels

In this section we introduce the notion of a (product) regular conditional kernel. For an extensive study on regular conditional kernels, see Bogachev [4, Section 10.4]. The notion of a product regular conditional kernel does not appear in [4], but it does in Faden [12] and in Leao et al. [21]. Besides giving definitions, we make a few observations, of which Theorem 3.6 is used later on to derive statements about regular conditional kernels from statements about product regular conditional kernels.
In this section (X, 𝒜), (Y, ℬ) are measurable spaces, ν is a measure on 𝒜 and μ is a measure on 𝒜 ⊗ ℬ, τ : X → Y is measurable, and π : X × Y → Y is given by π(x, y) = y.

Definition 3.1 A function η : Y × 𝒜 → R is called a (ℬ-)kernel if η(·, A) is (ℬ-)measurable for all A ∈ 𝒜 and η(y, ·) is a measure for all y ∈ Y. A kernel η is called a probability kernel if η(y, ·) is a probability measure for all y ∈ Y.

Definition 3.2 Let η : Y × 𝒜 → R be a (probability) kernel.

(a) η is called a regular conditional kernel (regular conditional probability) under ν with respect to τ if

\[ \nu\bigl(A \cap \tau^{-1}(B)\bigr) = \int_Y 1_B(y)\,\eta(y, A)\, d\bigl[|\nu| \circ \tau^{-1}\bigr](y) \qquad (A \in \mathcal{A},\ B \in \mathcal{B}). \tag{3.1} \]

(b) η is called a product regular conditional kernel (product regular conditional probability) under μ with respect to π if

\[ \mu(A \times B) = \int_Y 1_B(y)\,\eta(y, A)\, d\bigl[|\mu| \circ \pi^{-1}\bigr](y) \qquad (A \in \mathcal{A},\ B \in \mathcal{B}). \]

3.3 Suppose that ℰ is a sub-σ-algebra of 𝒜. Let (Y, ℬ) = (X, ℰ) and let Id : (X, 𝒜) → (Y, ℬ) be the identity map. In agreement with [4, Definition 10.4.1], a kernel η : Y × 𝒜 → R is a regular conditional kernel under ν with respect to ℰ if and only if η is a regular conditional kernel under ν with respect to Id.

3.4 Consider the two kernels η : Y × 𝒜 → R and ξ : Y × (𝒜 ⊗ ℬ) → R, corresponding to each other by the formulas

\[ \xi(y, F) = \int_X 1_F(x, y)\, d[\eta(y, \cdot)](x) \qquad\text{and}\qquad \eta(y, A) = \xi(y, A \times Y). \]

Then ξ is a regular conditional kernel under μ with respect to π if and only if η is a product regular conditional kernel under μ with respect to π. In general, X × Y may be equipped with a σ-algebra ℱ different from 𝒜 ⊗ ℬ. In this situation, where μ is a measure on ℱ and π is ℱ-measurable, the above correspondence cannot be used in general to reduce statements about product regular conditional kernels to statements about regular conditional kernels. See also 4.5. On the other hand, regular conditional probabilities can be seen as special cases of product regular conditional probabilities; see Theorem 3.6.
In the present paper we use this to derive Theorem 1.3 from Theorem 1.2, and also Theorem 7.5 from Theorem 6.9.

Remark 3.5 If 𝒜 is generated by a countable set, two regular conditional probabilities under a measure with respect to a σ-algebra (see 3.3) are almost everywhere equal (see Bogachev [4, Theorem 10.4.3]). An analogous statement holds for regular conditional kernels with respect to measurable maps and for product regular conditional kernels. In Theorem 4.3 we prove that (product) regular conditional kernels are unique on the domain on which they are weakly continuous, in case the underlying topological space is perfectly normal. For such a space the Borel σ-algebra may not be generated by a countable set.2

2 The Sorgenfrey line, the space R with the right half-open interval topology, is perfectly normal but not second countable (see Steen and Seebach [27, Example 51]).

Theorem 3.6 (a) There exists a measure μ̃ on (X × Y, 𝒜 ⊗ ℬ) for which μ̃(A × B) = ν(A ∩ τ⁻¹(B)).
(b) η : Y × 𝒜 → R is a regular conditional kernel under ν with respect to τ if and only if η is a product regular conditional kernel under μ̃ with respect to π.

Proof (a) We may assume ν to be positive, since ν = ν⁺ − ν⁻. Let ℰ be the set that consists of all unions ⋃_{i=1}^n A_i × B_i, where n ∈ N and A_i ∈ 𝒜, B_i ∈ ℬ are such that A_1 × B_1, ..., A_n × B_n are disjoint. Define ν* : ℰ → [0, ∞) by

\[ \nu^*\Bigl(\bigcup_{i=1}^n A_i \times B_i\Bigr) = \nu\Bigl(\bigcup_{i=1}^n A_i \cap \tau^{-1}(B_i)\Bigr) \]

for A_1, ..., A_n ∈ 𝒜 and B_1, ..., B_n ∈ ℬ as above. Checking that ℰ is a ring of sets and that ν* is σ-additive is left to the reader. The existence and uniqueness of the extension μ̃ follow from the Carathéodory theorem (see Halmos [15, Section 13, Theorem A]).
(b) This follows from the definition of μ̃ (note that ν ∘ τ⁻¹ = μ̃ ∘ π⁻¹).

4 Weakly Continuous Kernels

In this section we introduce the notion of weak continuity for kernels on topological spaces. In Theorem 4.3 we show uniqueness of (product) regular conditional kernels that are weakly continuous.
In Theorems 4.6 and 4.8 we describe conditions that imply the existence of weakly continuous regular conditional probabilities. As in the Portmanteau theorem for metric spaces, weak convergence implies lower bounds for open sets and upper bounds for closed sets, as is shown in Theorem 4.10. As described in Lemmas 4.11 and 4.12, these lim inf and lim sup bounds imply bounds for (product) regular conditional probabilities on which the results of Sects. 6 and 7 are based.

In this section X and Y are topological spaces, ν is a measure on ℬ(X), μ is a measure on ℬ(X) ⊗ ℬ(Y), τ : X → Y is measurable, and π : X × Y → Y is given by π(x, y) = y.

Definition 4.1 We equip the space of measures, ℳ(X), with the weak topology (generated by C_b(X), which we denote by σ(ℳ(X), C_b(X)) as in the book of Schaefer [26, Chapter II, Section 5]). In this topology, a net (μ_ι)_{ι∈I} in ℳ(X) converges to a μ in ℳ(X) if and only if ∫_X f dμ_ι → ∫_X f dμ for all f ∈ C_b(X). Let D ⊂ Y. A kernel η : Y × ℬ(X) → R is called weakly continuous on D if the map D → ℳ(X) given by y ↦ η(y, ·) is continuous in the weak topology. η is called weakly continuous if η is weakly continuous on Y.

Theorem 4.2 Let X be a perfectly normal3 space and μ ∈ ℳ(X). Then

\[ \operatorname{supp}\mu = \Bigl\{ x \in X : \int_X f \, d|\mu| > 0 \text{ for all } f \in C(X, [0,1]) \text{ with } f(x) > 0 \Bigr\}. \]

Moreover, |μ|(X \ supp(μ)) = 0.4 As a consequence, μ = 0 if and only if ∫_X f dμ = 0 for all f ∈ C_b(X).

3 Perfectly normal means that every open set in X is equal to f⁻¹((0, ∞)) for some f ∈ C(X). All metric spaces are perfectly normal; see Bogachev [4, Proposition 6.3.5].

Proof We may assume μ is positive. Let x ∈ supp μ. Then μ(V) > 0 for all V ∈ 𝒩_x. Let f ∈ C(X, [0, 1]) be such that f(x) > 0. Then V = f⁻¹((0, ∞)) has strictly positive measure. Since μ(V) = lim_{n→∞} ∫_X min{nf, 1} dμ, there exists an n such that ∫_X min{nf, 1} dμ > 0. Consequently, as f ≥ (1/n) min{nf, 1}, we have ∫_X f dμ > 0.

Conversely, let x ∈ X be such that ∫_X f dμ > 0 for all f ∈ C(X, [0, 1]) with f(x) > 0. Let V ∈ 𝒩_x.
As we may take V to be open (every neighbourhood of x contains an open one), V = f⁻¹((0, ∞)) for some f ∈ C(X, [0, 1]), and so μ(V) ≥ ∫_X f dμ > 0.

Theorem 4.3 Suppose that X is a perfectly normal space.
(a) Let η and ζ be regular conditional kernels under ν with respect to τ that are weakly continuous on supp(|ν| ∘ τ⁻¹). Then η(y, ·) = ζ(y, ·) for all y ∈ supp(|ν| ∘ τ⁻¹). If ν is a probability measure, then η(y, ·) is a probability measure for all y ∈ supp(|ν| ∘ τ⁻¹).
(b) Let η and ζ be product regular conditional kernels under μ with respect to π that are weakly continuous on supp(|μ| ∘ π⁻¹). Then η(y, ·) = ζ(y, ·) for all y ∈ supp(|μ| ∘ π⁻¹). If μ is a probability measure, then η(y, ·) is a probability measure for all y ∈ supp(|μ| ∘ π⁻¹).

Proof We prove (a); the proof of (b) is similar (replace "|ν| ∘ τ⁻¹" by "|μ| ∘ π⁻¹"). To prove η = ζ on D = supp(|ν| ∘ τ⁻¹), by Theorem 4.2, it is sufficient to prove ∫_X f dη(y, ·) = ∫_X f dζ(y, ·) for all y ∈ D and all f ∈ C_b(X). Let f ∈ C_b(X). Because f is the uniform limit of simple functions, (3.1) gives for all B ∈ ℬ(Y)

\[ \int_B \Bigl[ \int_X f \, d\eta(y,\cdot) \Bigr] d\bigl[|\nu|\circ\tau^{-1}\bigr](y) = \int_B \Bigl[ \int_X f \, d\zeta(y,\cdot) \Bigr] d\bigl[|\nu|\circ\tau^{-1}\bigr](y). \]

Therefore there exists a set Z ∈ ℬ(Y) with |ν| ∘ τ⁻¹(Y \ Z) = 0 such that

\[ \int_X f \, d\eta(y,\cdot) = \int_X f \, d\zeta(y,\cdot) \qquad (y \in Z). \]

Since both y ↦ ∫_X f dη(y, ·) and y ↦ ∫_X f dζ(y, ·) are continuous on D, and Z is dense in D by Theorem 4.2, we have ∫_X f dη(y, ·) = ∫_X f dζ(y, ·) for all y ∈ D. The second statement is proved by taking f = 1_X.

4.4 When η is a regular conditional kernel under ν with respect to τ, the value of the function η(·, A) on the complement of supp(|ν| ∘ τ⁻¹) is not determined, in the sense that if η̃ is a kernel with η̃(y, ·) = η(y, ·) for all y ∈ supp(|ν| ∘ τ⁻¹), then η̃ is also a regular conditional kernel under ν with respect to τ.

4 This is not true in general. For an example, see Bogachev [4, Example 7.1.3].

For example η̃ given by η̃(y, ·) = η(y, ·) for y ∈ supp(|ν| ∘ τ⁻¹) and η̃(y, ·) = δ_x for y ∈ supp(|ν| ∘ τ⁻¹)^c, for some chosen x ∈ X, is such a regular conditional kernel.
Whence if ν is a probability measure and there exists a regular conditional kernel under ν with respect to τ that is weakly continuous on supp(|ν| ∘ τ⁻¹), then we may as well assume this kernel to be a probability kernel. A similar statement is true for product regular conditional kernels.

4.5 By Theorem 3.6, statement (a) of Theorem 4.3 is a consequence of statement (b). In an attempt to reduce statement (b) to statement (a), the following problem occurs with the correspondence between regular conditional kernels and product regular conditional kernels that is mentioned in 3.4. The Borel σ-algebra of X × Y, i.e. ℬ(X × Y), may be strictly larger than ℬ(X) ⊗ ℬ(Y) (see, e.g. Bogachev [4, Lemma 6.4.1 and Example 6.4.3]). If this is the case, i.e. ℬ(X) ⊗ ℬ(Y) ⊊ ℬ(X × Y), and ℬ(X × Y) equals the Baire σ-algebra on X × Y, i.e. the smallest σ-algebra that makes all continuous functions X × Y → R measurable, then there exists a continuous function f ∈ C(X × Y) that is not ℬ(X) ⊗ ℬ(Y)-measurable. Composing the function f with arctan, we obtain a g ∈ C_b(X × Y) that is not measurable with respect to ℬ(X) ⊗ ℬ(Y). So if η : Y × ℬ(X) → R is a product regular conditional kernel under μ with respect to π, and ξ : Y × (ℬ(X) ⊗ ℬ(Y)) → R is as in 3.4, then g is not integrable with respect to ξ(y, ·) for any y ∈ Y. ℬ(X × Y) equals the Baire σ-algebra if X × Y is a metric space (Bogachev [4, Proposition 6.3.4]). Therefore X = Y = R^R equipped with the discrete topology form an example for which the above is the case.

We state two theorems (Theorems 4.6 and 4.8) showing the existence of product regular conditional probabilities that are weakly continuous on supp(|μ| ∘ π⁻¹).

Theorem 4.6 Suppose that Y is countable and equipped with the discrete topology. Then η : Y × ℬ(X) → R defined by

\[ \eta(y, A) = \begin{cases} \dfrac{\mu(A \times \{y\})}{|\mu|(X \times \{y\})} & \text{if } |\mu|(X \times \{y\}) > 0, \\[2mm] 0 & \text{otherwise}, \end{cases} \]

is a product regular conditional kernel under μ with respect to π that is weakly continuous on supp(|μ| ∘ π⁻¹).
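For a probability measure on a finite product space, the kernel of Theorem 4.6 amounts to normalising the joint weights over each fibre X × {y}. The sketch below is an illustration only: the function name `conditional_kernel`, the example law `mu` and the choice to leave the kernel undefined on points y of zero mass are assumptions made here. It checks the defining identity μ(A × B) = Σ_{y∈B} η(y, A) μ(X × {y}) with exact rational arithmetic.

```python
from fractions import Fraction

def conditional_kernel(mu):
    """Conditional kernel of a finite joint law.

    mu maps pairs (x, y) to probabilities. Returns (eta, marginal) with
    eta[y][x] = mu({x} x {y}) / mu(X x {y}) for every y of positive mass.
    """
    marginal = {}
    for (x, y), p in mu.items():
        marginal[y] = marginal.get(y, 0) + p
    eta = {}
    for (x, y), p in mu.items():
        if marginal[y] > 0:
            eta.setdefault(y, {})[x] = p / marginal[y]
    return eta, marginal

# Hypothetical joint law on {a, b} x {0, 1}.
mu = {("a", 0): Fraction(1, 8), ("b", 0): Fraction(3, 8),
      ("a", 1): Fraction(1, 2), ("b", 1): Fraction(0)}
eta, marginal = conditional_kernel(mu)

# Defining identity: mu(A x B) = sum over y in B of eta(y, A) * mu(X x {y}).
A, B = {"a"}, {0, 1}
lhs = sum(p for (x, y), p in mu.items() if x in A and y in B)
rhs = sum(sum(eta[y].get(x, 0) for x in A) * marginal[y] for y in B if y in eta)
```

Since Y carries the discrete topology, every map y ↦ η(y, ·) is automatically continuous, so no further check is needed for weak continuity in this finite setting.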
4.7 In case Y is first countable, the notion of open and closed sets and continuity of functions Y → R is characterised by the convergence of sequences. Therefore the following are equivalent for a kernel η : Y × ℬ(X) → R:
(a) η is weakly continuous.
(b) For all (y_n)_{n∈N} in Y with y_n → y, one has η(y_n, ·) → η(y, ·) weakly.

The following theorem is an easy consequence of the Lebesgue dominated convergence theorem.

Theorem 4.8 Let Y be first countable. Let λ be a probability measure on ℬ(X). Let D ⊂ Y. Let f : X × Y → [0, ∞) be a bounded ℬ(X) ⊗ ℬ(Y)-measurable function such that y ↦ f(x, y) is continuous on D and equal to zero on Y \ D for λ-almost all x ∈ X. Suppose that ∫_X f(x, y) dλ(x) > 0 for all y ∈ D. If η : Y × ℬ(X) → [0, 1] is given by

\[ \eta(y, A) = \begin{cases} \dfrac{\int_A f(x, y)\, d\lambda(x)}{\int_X f(x, y)\, d\lambda(x)} & y \in D, \\[2mm] 0 & y \notin D, \end{cases} \]

then η is weakly continuous on D (even strongly continuous, i.e. y ↦ η(y, A) is continuous for all A ∈ ℬ(X)). Let κ be a probability measure on ℬ(Y) and assume D = supp κ. Then η is a product regular conditional kernel under the measure μ given by

\[ \mu(A \times B) = \int_B \int_A f(x, y)\, d\lambda(x)\, d\kappa(y) \]

with respect to π, that is weakly continuous on D = supp(|μ| ∘ π⁻¹).

Remark 4.9 In the above theorem the conditions may be weakened. Instead of assuming f to be bounded and λ, κ to be probability measures, we may as well assume that λ and κ are positive non-zero measures; that for all y ∈ D there exist a V ∈ 𝒩_y and a λ-integrable h : X → [0, ∞) such that f(x, z) ≤ h(x) for all x ∈ X and all z ∈ V ∩ D; and that f is λ ⊗ κ-integrable.

In Sect. 6, the condition (b) of Theorem 4.10 is one of the key assumptions. If X is a metric space, this property follows from weak continuity as in the Portmanteau theorem. We state this in Theorem 4.10.

Theorem 4.10 Let η : Y × ℬ(X) → R be a probability kernel. Let D ⊂ Y, y ∈ D and 𝒱 ⊂ 𝒩_y be such that ⋂𝒱 = {y}. Consider the following conditions.
(a) The map D → ℳ(X), v ↦ η(v, ·), is weakly continuous at y.
(b) lim inf_{ι∈I} η(y_ι, G) ≥ η(y, G) for all open G ⊂ X and all nets (y_ι)_{ι∈I} in D with y_ι → y.
(c) lim sup_{ι∈I} η(y_ι, F) ≤ η(y, F) for all closed F ⊂ X and all nets (y_ι)_{ι∈I} in D with y_ι → y.
(d) sup_{V∈𝒱} inf_{v∈V∩D} η(v, G) ≥ η(y, G) for all open sets G ⊂ X.
(e) inf_{V∈𝒱} sup_{v∈V∩D} η(v, F) ≤ η(y, F) for all closed sets F ⊂ X.
(b), (c), (d), (e) are equivalent. If X is metrisable, then (a) implies (b). If X is metrisable and Y is first countable, then (a) is equivalent to (b) and hence to (c), (d), and (e).

Proof We leave it to the reader to check the equivalences between (b), (c), (d), (e). If X is a metric space, one can follow the lines of the Portmanteau theorem in the book of Billingsley [3, Theorem 2.1] for the implication (a) ⇒ (b); the fact that the measures in that proof are indexed by the natural numbers instead of a general directed set I does not affect the argument. The proof of (b) ⇒ (a) in the book of Billingsley relies on the Lebesgue dominated convergence theorem. But when Y is first countable, one can restrict to sequences (see 4.7) and obtain the implication (b) ⇒ (a) as is done in the book of Billingsley.

Lemma 4.11 Assume that μ is a probability measure. Let η be a product regular conditional probability under μ with respect to π. Write D = supp(μ ∘ π⁻¹) and let y ∈ D. Then for every U ∈ 𝒩_y, one has μ(X × U) > 0 and

\[ \inf_{v \in U \cap D} \eta(v, A) \le \mu\bigl(A \times Y \,\big|\, X \times U\bigr) \le \sup_{v \in U \cap D} \eta(v, A) \qquad (A \in \mathcal{B}(X)). \tag{4.7} \]

Moreover, if 𝒱 ⊂ 𝒩_y is such that ⋂𝒱 = {y}, then (d) of Theorem 4.10 implies

\[ \liminf_{V \in \mathcal{V}} \mu\bigl(G \times Y \,\big|\, X \times V\bigr) \ge \eta(y, G) \qquad \text{for all open } G \subset X, \tag{4.8} \]

and (e) of Theorem 4.10 implies

\[ \limsup_{V \in \mathcal{V}} \mu\bigl(F \times Y \,\big|\, X \times V\bigr) \le \eta(y, F) \qquad \text{for all closed } F \subset X. \tag{4.9} \]

Proof Let U ∈ 𝒩_y. Since y ∈ D = supp(μ ∘ π⁻¹), one has μ(X × U) > 0. (4.7) follows from the fact that for all A ∈ ℬ(X)

\[ \mu(A \times U) = \int_Y 1_U(v)\, \eta(v, A)\, d\bigl[\mu \circ \pi^{-1}\bigr](v). \]

For an open G ⊂ X we have for 𝒱 as above

\[ \liminf_{V \in \mathcal{V}} \mu\bigl(G \times Y \,\big|\, X \times V\bigr) \ge \sup_{V \in \mathcal{V}}\ \inf_{v \in V \cap D} \eta(v, G). \]

Thus (4.8) follows when assuming (d) of Theorem 4.10. Similarly, one obtains (4.9).

For a regular conditional probability we have a similar statement; see Lemma 4.12. The proof can be done following the lines of the proof of Lemma 4.11 or as a consequence of Lemma 4.11 using Theorem 3.6.

Lemma 4.12 Assume that ν is a probability measure. Let η be a regular conditional probability under ν with respect to τ. Write D = supp(ν ∘ τ⁻¹) and let y ∈ D.
Then for every U ∈ Ny, one has ν(τ−1(U)) > 0 and Moreover, if V ⊂ Ny is such that

5 Some Facts About Functions with Compact Sublevel Sets

In this section we present some facts for functions with compact sublevel sets which are used in Sects. 6, 7 and 8. In this section X, Y and Z are topological spaces.

Definition 5.1 Let J : X → [0, ∞]. We call the set [J ≤ α] (see Sect. 2) a sublevel set of J for α ∈ [0, ∞). J is said to be lower semicontinuous if all sublevel sets of J are closed. J is said to have compact sublevel sets if all sublevel sets of J are compact.

5.2 Let J : X → [0, ∞] be lower semicontinuous. Then Indeed, for all α < J(x) the set [J > α] is open and contains x. Hence, a function J : X → [0, ∞] is lower semicontinuous if and only if $\liminf_{\iota \in I} J(x_\iota) \ge J(x)$ for all x ∈ X and all nets (xι)ι∈I in X that converge to x.

Lemma 5.3 Let τ : Z → Y be continuous. Let J : Z → [0, ∞] have compact sublevel sets. Let y ∈ Y and 𝒱 ⊂ Ny with ⋂𝒱 = {y}. Let F ⊂ Z be closed. Then

Proof The ≤ inequality in (5.3) is immediate. Because $\liminf_{V \in \mathcal{V}} \inf J(F \cap \tau^{-1}(V)) \ge \liminf_{V \in N_y} \inf J(F \cap \tau^{-1}(V))$, it is sufficient to prove
$\alpha := \liminf_{V \in N_y} \inf J(F \cap \tau^{-1}(V)) \ge \inf J(F \cap \tau^{-1}(\{y\}))$.
Suppose that α < ∞. Then F ∩ τ−1(V) ∩ [J ≤ α + ε] ≠ ∅ for all V ∈ Ny and all ε > 0. Since [J ≤ α + ε] is compact, this implies that $\bigcap_{V \in N_y} F \cap \tau^{-1}(V) \cap [J \le \alpha + \varepsilon] \ne \emptyset$, i.e. inf J(F ∩ τ−1({y})) ≤ α + ε for all ε > 0.

5.4 The assumption that τ be continuous is not redundant. For example, consider Y = Z = [0, 1], $J = 1_{(\frac12, 1]}$, and τ given by τ(0) = 0, τ(1) = 1 and τ(x) = 1 − x for x ∈ (0, 1), F = [0, 1] and y = 1. Then, for all neighbourhoods V of y, τ−1(V) contains the interval (0, ε) for some ε > 0, whence inf J(F ∩ τ−1(V)) = 0 but inf J(F ∩ τ−1({y})) = J(1) = 1.

Lemma 5.5 Let X be normal and let G be a basis for the topology of X. Let J : X × Y → [0, ∞] have compact sublevel sets.
(a) For all open G ⊂ X and ε > 0, there exists a U ∈ G with $U \subset \overline{U} \subset G$ such that inf J(U × {y}) ≤ inf J(G × {y}) + ε.
(b) For all closed F ⊂ X and α < inf J(F × {y}), there exist U1, …, Uk ∈ G such that, with W = X \ (U1 ∪ ⋯ ∪ Uk), one has F ⊂ W° and
α < inf J(W × {y}) ≤ inf J(W° × {y}) ≤ inf J(F × {y}).

Proof (a) Let ε > 0. Let x ∈ G be such that J(x, y) ≤ inf J(G × {y}) + ε. Since X is a normal topological space, there exists an open set U with $x \in U \subset \overline{U} \subset G$. Because G is a basis, U may be chosen in G. Then inf J(G × {y}) + ε ≥ J(x, y) ≥ inf J(U × {y}).
(b) Let β > α be such that β < inf J(F × {y}). The set K := {x ∈ X : J(x, y) ≤ β} is a compact set that is disjoint from F. Whence there exist disjoint open U, V ⊂ X with K ⊂ U and F ⊂ V. Since G is a basis and K is compact, there exist U1, …, Uk in G with K ⊂ U1 ∪ ⋯ ∪ Uk ⊂ U. Then (U1 ∪ ⋯ ∪ Uk) ∩ V = ∅. Whence with W := X \ (U1 ∪ ⋯ ∪ Uk), one has F ⊂ W° and W ⊂ X \ K, which implies inf J(W × {y}) ≥ β > α.

6 Large Deviations for Product Regular Conditional Probabilities

In this section we consider the following situation.
(i) X and Y are topological spaces, where X is normal.
(ii) G is a basis for the topology of X and H is a basis for the topology of Y.
(iii) π : X × Y → Y is given by π(x, y) = y.
(iv) (μn)n∈N is a sequence of probability measures on B(X) ⊗ B(Y) satisfying the large deviation principle on {A × B : A ∈ B(X), B ∈ B(Y)} with a rate function J : X × Y → [0, ∞] that has compact sublevel sets.
(v) For each n ∈ N we assume the following: supp(μn ◦ π−1) ≠ ∅,⁵ there exists a product regular conditional probability ηn : Y × B(X) → [0, 1] under μn with respect to π, which satisfies the following continuity condition (see Theorem 4.10): I(x) = J(x, y) − inf J(X × {y}).

5 As we are considering large deviation bounds for (ηn(yn, ·))n∈N with yn ∈ supp(μn ◦ π−1), we want such yn to exist. Instead of this condition, one could of course deal with the situation where supp(μn ◦ π−1) ≠ ∅ for all n ≥ N for some large N, and consider sequences (yn)n∈N with yn ∈ supp(μn ◦ π−1) for n ≥ N.
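As a concrete illustration of the definition I(x) = J(x, y) − inf J(X × {y}), the following worked computation uses the quadratic rate function that appears in Example 9.1. The choice X = Y = R and the restriction |r| < 1 (which makes J have compact sublevel sets) are assumptions made for this sketch only.

```latex
% Worked instance of I(x) = J(x,y) - \inf J(X \times \{y\}).
% Assumptions for this sketch: X = Y = \mathbb{R} and |r| < 1.
\begin{align*}
J(x,y) &= \tfrac12\bigl(x^2 - 2rxy + y^2\bigr)
        = \tfrac12 (x - ry)^2 + \tfrac12 (1 - r^2)\, y^2, \\
\inf J(\mathbb{R} \times \{y\}) &= \inf_{x \in \mathbb{R}} J(x,y)
        = \tfrac12 (1 - r^2)\, y^2 \qquad (\text{attained at } x = ry), \\
I(x) &= J(x,y) - \inf J(\mathbb{R} \times \{y\}) = \tfrac12 (x - ry)^2 .
\end{align*}
```

Subtracting the infimum over the slice X × {y} normalises I so that inf I(X) = 0, as one expects of the rate function of a sequence of probability measures.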
In this section we derive necessary and sufficient conditions for the large deviation bounds with rate function I for sequences of the form (ηn(yn, ·))n∈N. We prove this for general topological spaces instead of metric spaces, as it does not cost more effort. In Theorem 6.3 we consider a fixed sequence (yn)n∈N with yn → y and describe equivalent conditions for the lower and upper large deviation bounds to hold. We are interested in the question whether for all sequences (yn)n∈N with yn → y the sequence (ηn(yn, ·))n∈N satisfies the lower and upper large deviation bounds with rate function I. In Theorem 6.9 we give equivalent⁶ and sufficient conditions for these bounds in a way that does not depend on the sequences (yn)n∈N and the sets (𝒱n)n∈N as in Theorem 6.3. Finally, in 6.12 we comment on deriving Theorem 1.2 from Theorem 6.9. But first we consider two specific situations that admit a simple proof of the large deviation bounds with rate function I for sequences of the form (ηn(yn, ·))n∈N: the case that Y is a discrete space (Theorem 6.1) and the case where μn is a product measure for all n ∈ N (Theorem 6.2).

Theorem 6.1 Suppose that Y is countable and equipped with the discrete topology. Let y ∈ Y be such that inf J(X × {y}) < ∞. For all (yn)n∈N in Y with yn ∈ supp(μn ◦ π−1) and yn → y, the sequence (ηn(yn, ·))n∈N satisfies the large deviation principle with rate function I.

Proof This basically follows from the following inequalities, which follow from the large deviation principle and from Theorem 4.6:
$\liminf_{n \to \infty} \frac1n \log \mu_n(G \times \{y\}) \ge -\inf J(G \times \{y\})$ for all open G ⊂ X,
$\limsup_{n \to \infty} \frac1n \log \mu_n(F \times \{y\}) \le -\inf J(F \times \{y\})$ for all closed F ⊂ X.

Theorem 6.2 (Independent coordinates) Suppose that X and Y are second countable and Y is regular. Suppose that $\mu_n = \mu_n^1 \otimes \mu_n^2$ for some $\mu_n^1$ on B(X) and $\mu_n^2$ on B(Y) for all n ∈ N. Then (ηn(yn, ·))n∈N satisfies the large deviation principle with rate function I for all sequences (yn)n∈N in Y.
In particular, ηn(yn, ·) = μn1 and I (x ) = inf J ({x } × Y). Proof I is lower semicontinuous (e.g. by 5.2) and for c ∈ R the set [I ≤ c] is a subset of the compact set {x ∈ X : ∃z ∈ Y, J (x , z) ≤ c − inf J (X × {y})}. [I ≤ c] = π([ J ≤ c + inf J (X × {y})]). Theorem 6.3 Let (yn)n∈N be a sequence in Y with yn ∈ supp(μn ◦π −1) that converges to y. For n ∈ N let Vn ⊂ Nyn be such that Vn = {yn}. Then (a2) ⇐⇒ (a3) ⇐⇒ (a1) and (b2) ⇐⇒ (b3) ⇐⇒ (b1) (a1) For all open G ⊂ X 6 Under the condition that Y is first countable. (a2) For all U ∈ G7 (a3) For all open U ⊂ X , one has (b1) For all closed F ⊂ X (b2) For all U1, . . . , Uk ∈ G, one has for W = X \ (U1 ∪ · · · ∪ Uk ) (b3) For all closed W ⊂ X Proof The implications (a3) ⇒ (a2) and (b3) ⇒ (b2) are immediate. (a1) ⇒ (a3) Let U ⊂ X be an open set. By Lemma 4.11, (4.8), lim inf liVm∈Vinnf n1 log μn(U × Y|X × V ) ≥ linm→i∞nf n1 log ηn(yn, U ). n→∞ ⇒ (b3) Let W ⊂ X be a closed set. By Lemma 4.11, (4.9), lim sup lim sup n1 log μn(W × Y|X × V ) ≤ lim sup n1 log ηn(yn, W ). n→∞ V ∈Vn n→∞ ≥ linm→i∞nf lim sup n1 log μn(U × Y|X × V ) V ∈Vn ≥ − inf I (U ) = − inf J (U × {y}) + inf J (X × {y}) ≥ − inf J (G × {y}) + inf J (X × {y}) − ε. (6.13) 7 Note that μn(X × V ) > 0 for all n ∈ N and V ∈ Nyn , as yn ∈ supp(μn ◦ π−1). (b2) ⇒ (b1). Let α < inf J (F ×{y}) and U1, . . . , Uk and W be as in Lemma 5.5(b). Then we obtain using Lemma 4.11 lim sup n1 log ηn(yn, F ) ≤ lim sup n1 log ηn(yn, W ◦) n→∞ n→∞ 6.4 (Fixed y) Note that if yn = y for all n ∈ N, one can take Vn = V for a V ⊂ Ny with V = {y}. Then Theorem 6.3 implies that (ηn(y, ·))n∈N satisfies the large deviation principle with rate function I if and only if (a2) and (b2) hold (with Vn = V). 6.5 Let (yn)n∈N in Y be such that yn ∈ supp(μn ◦π −1) and yn → y. From Theorem 6.3 we derive that (a2) holds for some Vn ⊂ Nyn with Vn = {yn} if and only if (a2) holds for all such Vn. Similarly, (b2) holds for some Vn ⊂ Nyn with Vn = {yn} if and only if (b2) holds for all such Vn ⊂ Nyn . 
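The conditional quantities μn(U × Y | X × V) that appear in the conditions of Theorem 6.3 can be unpacked as follows. The sketch assumes the usual convention μ(A | B) = μ(A ∩ B)/μ(B) for μ(B) > 0, which this part of the text uses without restating; positivity of the denominator for V ∈ Nyn is the content of footnote 7.

```latex
% Assumed convention: \mu(A \mid B) = \mu(A \cap B) / \mu(B) whenever \mu(B) > 0.
\begin{align*}
\mu_n(U \times Y \mid X \times V)
  &= \frac{\mu_n\bigl((U \times Y) \cap (X \times V)\bigr)}{\mu_n(X \times V)}
   = \frac{\mu_n(U \times V)}{\mu_n(X \times V)}, \\
\tfrac1n \log \mu_n(U \times Y \mid X \times V)
  &= \tfrac1n \log \mu_n(U \times V) - \tfrac1n \log \mu_n(X \times V), \\
\text{if } \mu_n = \mu_n^1 \otimes \mu_n^2: \quad
\mu_n(U \times Y \mid X \times V)
  &= \frac{\mu_n^1(U)\, \mu_n^2(V)}{\mu_n^1(X)\, \mu_n^2(V)} = \mu_n^1(U).
\end{align*}
```

The last line recovers the situation of Theorem 6.2: for product measures the conditional probability does not depend on V at all, which is why no condition relating yn and V is needed there.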
In Lemma 6.7, we give a consequence of the large deviation principle of (μn)n∈N. In Theorems 6.9 and 6.10 we use this to formulate sufficient conditions for upper or lower large deviation bounds on sequences (ηn(yn, ·))n∈N with yn → y and sequences (ηn(y, ·))n∈N. We assumed X to be normal in this section. For Lemma 6.7 this assumption can be dropped. 6.6 For all neighbourhoods V of y one has by the large deviation principle lim inf n1 log μn(X × V ) ≥ − inf J (X × V ◦) ≥ − inf J (X × {y}) > −∞. (6.15) n→∞ In particular, there exists an N ∈ N such that μn(X × V ) > 0 for all n ≥ N . Therefore μn(G × Y|X × V ) is well defined for large n. Lemma 6.7 (a) For open G ⊂ X (b) For closed F ⊂ X n1 log μn(G × Y|X × V ) ≥ − inf I (G). n1 log μn(F × Y|X × V ) ≤ − inf I (F ). Proof (a) Let ε > 0. By Lemma 5.3, there exists a V0 ∈ Ny such that for all V ∈ Ny with V ⊂ V0 inf J (X × {y}) ≥ inf J (X × V ) ≥ inf J (X × V 0) ≥ inf J (X × {y}) − ε. (6.18) Let V ∈ Ny be such that V ⊂ V0. As lim supn→∞ n1 log μn(X × V ) > −∞ (see 6.6), we can “split the lim inf in two” and we get by the large deviation principle and by (6.18) = linm→i∞nf n1 log μn(G × V ) − lim sup n1 log μn(X × V ) n→∞ ≥ − inf J (G × {y}) + inf J (X × V ) ≥ − inf I (G) − ε. inf J (F × {y}) ≥ inf J (F × V ) ≥ inf J (F × V 0) ≥ α. Let V ∈ Ny be such that y ∈ V ⊂ V0. Similarly as above, we get n1 log μn(F × Y|X × V ) ≤ −α + inf J (X × {y}). Theorem 6.8 I has compact sublevel sets. Proof [I ≤ c] = π([ J ≤ c + inf J (X × {y})]). Theorem 6.9 We have and, if Y is first countable, then ⇒ (A4) ⇐⇒ (A3) (A1) ⇐⇒ (A2), (A3) For all U ∈ G (A1) For all (yn)n∈N with yn ∈ supp(μn ◦ π −1) and yn → y, the sequence (ηn(yn, ·))n∈N satisfies the large deviation lower bound with rate function I . (A2) For all U ∈ G n1 log μn(U × Y|X × V ) ≥ − inf I (U ). (6.22) (A4) For all U ∈ G we have ∀Z0 ∈ Ny ∀ε > 0∃V0 ∈ Ny ∃Z ∈ Ny, Z ⊂ Z0 ∀M ∃m ≥ M ∃N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) = ∅: n1 log μn(U × Y|X × V ) ≥ m1 log μm(U × Y|X × Z ) − ε. 
(A5) For all U ∈ G we have ∀ε > 0 ∀V0 ∈ Ny∃N ∈ N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) = ∅: μn(U × Y|X × V ) ≥ e−nεμn(U × Y|X × V0). and, if Y is first countable, then ⇒ (B4) ⇐⇒ (B3) (B1) ⇐⇒ (B2), (B1) For all (yn)n∈N with yn ∈ supp(μn ◦ π −1) and yn → y the sequence (ηn(yn, ·))n∈N satisfies the large deviation upper bound with rate function I . (B2) For all U1, . . . , Uk ∈ G one has for W = X \ (U1 ∪ · · · ∪ Uk) n1 log μn(W ◦ × Y|X × V ) ≤ − inf I (W ). (6.26) (B3) For all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk) NH,y V∀ε⊂>V00,V∃V∩0 s∈upNp(yμ∃nZ◦ π∈−N1)y=,Z∅:⊂ Z0 ∀M ∃m ≥ M ∃N ∀n ≥ N ∀V ∈ n1 log μn(U × Y|X × V ) ≤ m1 log μm(U × Y|X × Z ) + ε. (B5) For all U1, . . . , Uk ∈ G with W = X \(U1 ∪· · ·∪Uk ) we have ∀ε > 0 ∀V0 ∈ Ny ∃N ∈ N ∀n ≥ N ∀V ∈ H, V ⊂ V0, V ∩ supp(μn ◦ π −1) = ∅: μn(W ◦ × Y|X × V ) ≤ enεμn(W × Y|X × V0) Proof The proofs of (B5) ⇒ (B4) ⇐⇒ (B3) ⇒ (B2) ⇒ (B1) and of (B1) ⇒ (B2) are similar to the proofs of the following implications. (A4) ⇐⇒ (A3) follows by definition of sup, inf, lim sup and lim inf. (A5) ⇒ (A3) Let U ∈ G. Assuming (A5) we obtain ∀ε > 0 ∀V0 ∈ Ny ∃N ∈ N ∀n ≥ N and one has μn(X × V0) > 0 and n1 log μn(U × Y|X × V ) ≥ n1 log μn(U × Y|X × V0) − ε. (6.30) (A3) ⇒ (A2) Follows by Lemma 6.7. (A2) ⇒ (A1). Suppose that (A2) holds. Let U ∈ G with inf J (U × {y}) < ∞ and let ε > 0. Let V0 ∈ Ny and N ∈ N be such that n1 log μn(U × Y|X × V ) ≥ − inf I (U )−ε for all n ≥ N and all V ∈ H with V ⊂ V0 and V ∩supp(μn ◦π −1) = ∅. Let (yn)n∈N be such that yn ∈ supp(μn ◦ π −1) and yn → y. Let N0 ≥ N be such that yn ∈ V0 for all n ≥ N0. Then for all n ≥ N0 and V ∈ Nyn ∩ H with V ⊂ V0 we have n1 log μn(U × Y|X × V ) ≥ − inf I (U ) − ε. This implies (a2) of Theorem 6.3 (with Vn = Nyn ∩ H). (A1) ⇒ (A2) (assuming Y is first countable). Suppose that (A2) does not hold. Let (Vm )m∈N be a decreasing sequence in H with m∈N Vm = {y}. 
Then there exists a U ∈ G with inf J (U × {y}) < ∞ and an α > inf I (U ) such that for all M ∈ N and N ∈ N there exist an n ≥ N and a V ∈ H with V ⊂ VM and V ∩ supp(μn ◦ π −1) = ∅ such that z∈V ∩supp(μn◦π−1) n1 log ηn(z, U ) ≤ n1 log μn(U × Y|X × V ). inf For each m ∈ N there exist an nm and a ynm ∈ Vm ∩ supp(μnm ◦ π −1) such that We may choose n1 < n2 < n3 < · · · . With yk = y for k ∈/ {nm : m ∈ N} we have yn → y and linm→i∞nf n1 log ηn(yn, U ) ≤ lmim→i∞nf n1m log ηnm (ynm , U ) ≤ −β. Therefore (a1) of Theorem 6.3 does not hold, which implies that (A1) does not hold. We can also use Lemma 6.7 and Theorem 6.3 (see also 6.4) to obtain sufficient conditions for the lower or upper large deviation bounds for (ηn(y, ·))n∈N. Theorem 6.10 Let V ⊂ Ny be such that V = {y}. (a) Suppose that for all U ∈ G with inf J (U × {y}) < ∞ Then (ηn(y, ·))n∈N satisfies the large deviation lower bound with rate function I . (b) Suppose that for all U1, . . . , Uk ∈ G with W = X \ (U1 ∪ · · · ∪ Uk ) 6.11 (6.36) and (6.37) hold for example when ∀ε > 0 ∀V0 ∈ V ∃N ∈ N ∀n ≥ N ∀V ∈ V, V ⊂ V0 : μn(U × Y|X × V ) ≥ e−nεμn(U × Y|X × V0), μn(W ◦ × Y|X × V ) ≤ enεμn(W × Y|X × V0), 6.12 Theorem 1.2 is a consequence of Theorems 4.10, 6.8 and 6.9 with G = {B(x , r ) : x ∈ X , r > 0} and H = {B(y, δ) : y ∈ Y, δ > 0}. 7 Large Deviations for Regular Conditional Probabilities In this section X and Y are topological spaces, (νn)n∈N is a sequence of probability measures on B(X ) that satisfies the large deviation principle with rate function K : X → [0, ∞], and τ : X → Y is continuous. For more assumptions, see 7.2. We derive the analogous statements as in Sect. 6 but for regular conditional kernels instead of product regular conditional kernels (7.3 and Theorem 7.5). 
First we show that, with μn the probability measure on the product space corresponding to νn as in Theorem 3.6, the sequence (μn)n∈N satisfies the large deviation principle with a rate function described in terms of K (Theorem 7.1). If (ηn)n∈N are regular conditional probabilities under (νn)n∈N given τ, then one could also follow the proofs in Sect. 6 for the product regular conditional probabilities to obtain similar results for large deviations for sequences of the form (ηn(yn, ·))n∈N. Instead, we take the approach via Theorem 3.6 to translate the results to the setting of regular conditional probabilities.

Theorem 7.1 For all n ∈ N let μn be the probability measure on B(X) ⊗ B(Y) for which μn(A × B) = νn(A ∩ τ−1(B)) for A ∈ B(X), B ∈ B(Y) (as in Theorem 3.6). Then (μn)n∈N satisfies the large deviation principle on {A × B : A ∈ B(X), B ∈ B(Y)} with rate function J : X × Y → [0, ∞] given by J(x, y) = K(x) if τ(x) = y and J(x, y) = ∞ otherwise. If K has compact sublevel sets, then so does J.

Proof By definition of J we have inf K(A ∩ τ−1(B)) = inf J(A × B) for A ∈ B(X), B ∈ B(Y). Let A ∈ B(X) and B ∈ B(Y). Then
$\liminf_{n \to \infty} \frac1n \log \mu_n(A \times B) = \liminf_{n \to \infty} \frac1n \log \nu_n(A \cap \tau^{-1}(B)) \ge -\inf K\bigl((A \cap \tau^{-1}(B))^\circ\bigr)$.
We have (A ∩ τ−1(B))° = A° ∩ τ−1(B)° and τ−1(B)° ⊃ τ−1(B°), whence
inf K((A ∩ τ−1(B))°) ≤ inf K(A° ∩ τ−1(B°)) = inf J(A° × B°) = inf J((A × B)°).
Similarly,
$\limsup_{n \to \infty} \frac1n \log \mu_n(A \times B) = \limsup_{n \to \infty} \frac1n \log \nu_n(A \cap \tau^{-1}(B)) \le -\inf K\bigl(\overline{A \cap \tau^{-1}(B)}\bigr)$.
We have $\overline{A \cap \tau^{-1}(B)} \subset \overline{A} \cap \overline{\tau^{-1}(B)}$ and $\overline{\tau^{-1}(B)} \subset \tau^{-1}(\overline{B})$, whence
$\inf K\bigl(\overline{A \cap \tau^{-1}(B)}\bigr) \ge \inf K\bigl(\overline{A} \cap \tau^{-1}(\overline{B})\bigr) = \inf J(\overline{A} \times \overline{B})$.
Suppose that K has compact sublevel sets. Let c ≥ 0. Then [J ≤ c] is contained in the compact set [K ≤ c] × τ([K ≤ c]).

By Theorem 6.8 I has compact sublevel sets. I(x) = J(x, y) − inf J(X × {y})

7.3 By Theorem 3.6, ηn is the product regular conditional kernel under μn, with νn(A | τ−1(V)) = μn(A × Y | X × V). In this sense also Theorem 1.3 follows from Theorem 1.2. We present some of the equivalent statements of Theorem 6.9 in Theorem 7.5.
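The following sketch spells out how the rate function of Theorem 7.1 behaves on product sets and what the resulting conditional rate function looks like. It takes J(x, y) = K(x) if τ(x) = y and J(x, y) = ∞ otherwise, the form consistent with the identity inf K(A ∩ τ−1(B)) = inf J(A × B) used in the proof; treat this form, the convention inf ∅ = ∞, and the finiteness of inf K(τ−1({y})) as assumptions of the sketch.

```latex
% Assumptions: J as in the first line, \inf\emptyset = \infty,
% and \inf K(\tau^{-1}(\{y\})) < \infty so that the subtraction is defined.
\begin{align*}
J(x,y) &= \begin{cases} K(x) & \text{if } \tau(x) = y, \\ \infty & \text{otherwise,} \end{cases} \\
\inf J(A \times B) &= \inf\{ K(x) : x \in A,\ \tau(x) \in B \}
                    = \inf K\bigl(A \cap \tau^{-1}(B)\bigr), \\
\inf J(X \times \{y\}) &= \inf K\bigl(\tau^{-1}(\{y\})\bigr), \\
I(x) &= J(x,y) - \inf J(X \times \{y\})
      = \begin{cases} K(x) - \inf K\bigl(\tau^{-1}(\{y\})\bigr) & \text{if } \tau(x) = y, \\
                      \infty & \text{otherwise.} \end{cases}
\end{align*}
```

In words, conditioning on τ = y restricts K to the fibre τ−1({y}) and shifts it so that its infimum over the fibre is zero.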
Remark 7.4 Because of the relation between μn and νn and between K and J, in Theorem 7.1 we were able to prove the large deviation principle on {A × B : A ∈ B(X), B ∈ B(Y)}. Whether it can be extended to the large deviation principle on B(X) ⊗ B(Y) is a priori not clear. However, for the purpose of using the results of Sect. 6, this is not required (as only (iv) of Sect. 6 is required). This is the main reason to define the large deviation bounds as in Definition 1.1.

Theorem 7.5 (A3) ⇒ (A1). If Y is first countable, then (A1) ⇐⇒ (A2). (A3) For all U ∈ G ⇒ (B1). If Y is first countable, then (B1) ⇐⇒ (B2). $\frac1n \log \nu_n(W^\circ \mid \tau^{-1}(V)) \le -\inf I(W)$. (B3) For all U1, …, Uk ∈ G with W = X \ (U1 ∪ ⋯ ∪ Uk) ≤ lim sup

In terms of random variables, Sanov's theorem gives us the large deviation principle of empirical densities $\frac1n \sum_{i=1}^n \delta_{X_i}$, where X1, X2, … are independent and identically distributed random variables. We consider large deviations of $\frac1n \sum_{i=1}^n \delta_{X_i}$ conditioning on $\frac1n \sum_{i=1}^n \delta_{Y_i} = \psi_n$, where (X1, Y1), (X2, Y2), … are independent and identically distributed couples of random variables, both random variables attaining their values in a finite set. This large deviation principle is formalised in Theorem 8.2.

In this section we consider the following.
• Let π : P(R) × P(S) → P(S) be the map given by π(ξ, ζ) = ζ.
• Let μn be the probability measure on B(P(R)) ⊗ B(P(S)) defined by $\mu_n = \bigl(\bigotimes_{i=1}^n \lambda\bigr) \circ L_n^{-1} \circ m^{-1}$, so that for A ∈ B(P(R)) and B ∈ B(P(S)) i=1
• Define θ : S × B(R) → [0, 1] by θ(s, A) = λ(A × S | R × {s}).
• Define ηn : P(S) × B(P(R)) → [0, 1] by ⎪⎩ 0
• Let J : P(R) × P(S) → [0, ∞] be given by where H(ξ|λ) is the relative entropy of ξ with respect to λ ([7, Definition 2.1.5]).
• Let ψ ∈ P(S) be such that

8.1 We present some facts which follow from the assumptions with little effort; for some facts we give an explanation or references.
(a) $P^n_{\mathrm{emp}}(S)$ is closed in P(S). Moreover, if ξk and ξ in $P^n_{\mathrm{emp}}(S)$ are such that ξk → ξ, then there exist ski and qi in S for i ∈ {1, . .
. , n} such that ξk = Ln((sk1, …, skn)), ξ = Ln((q1, …, qn)) and ski → qi for all i ∈ {1, …, n}.
(b) $\operatorname{supp}(\mu_n \circ \pi^{-1}) = P^n_{\mathrm{emp}}(S)$.
(c) ηn is a product regular conditional kernel under μn with respect to π that is weakly continuous on $P^n_{\mathrm{emp}}(S)$.
(d) $(\lambda^n \circ L_n^{-1})_{n \in \mathbb{N}}$ satisfies the large deviation principle with rate function H(·|λ).
(e) m is continuous.
(f) (μn)n∈N satisfies the large deviation principle with rate function J.

(a) follows from the fact that S is a finite space. (b) follows from (a), from the fact that the complement of $P^n_{\mathrm{emp}}(S)$ has μn ◦ π−1-measure zero and because μn ◦ π−1({Ln(s)}) > 0 for all s ∈ Sⁿ, which is due to the assumptions on λ. (c) follows by a straightforward calculation, and the continuity follows from (a). For (d) see Sanov's theorem (Dembo and Zeitouni [7, Theorem 6.2.10]). (e) follows from the fact that if ξn → ξ in P(R × S), then the R- and S-marginals of ξn converge to the R- and S-marginals of ξ, respectively. Then (f) follows from (a) and (d) by the contraction principle [7, Theorem 4.2.1].

In the rest of this section, we prove the following theorem.

Theorem 8.2 For all (ψn)n∈N with $\psi_n \in P^n_{\mathrm{emp}}(S)$ and ψn → ψ, the sequence (ηn(ψn, ·))n∈N satisfies the large deviation principle with rate function I : P(R) → [0, ∞], given by I is continuous on [I < ∞].

As P(S) is first countable, it is sufficient to show that (A2) and (B2) of Theorem 6.9 hold. In 8.4 we use the bounds of Lemma 8.3 to derive other bounds which imply (A2) and (B2). The continuity of I follows by continuity of the map ν ↦ H(ν|λ) (Lemma 8.5). i=1

8.4 From Lemma 8.3 we obtain the following bounds for A ∈ B(P(R)) and B ∈ B(P(S)):
$\mu_n(A \times B) \le \# L_n^{-1}(A)\, \# L_n^{-1}(B)\, e^{-n \inf_{\nu \in m^{-1}(A \times B) \cap P^n_{\mathrm{emp}}(R \times S)} H(\nu|\lambda)} \le (n+1)^M e^{-n \inf_{\nu \in m^{-1}(A \times B)} H(\nu|\lambda)}$,
$\mu_n(A \times B) \ge (n+1)^{-M} e^{-n \inf_{\nu \in m^{-1}(A \times B) \cap P^n_{\mathrm{emp}}(R \times S)} H(\nu|\lambda)}$.
In order to derive (A2) and (B2) of Theorem 6.9, we make the following observation.
By (8.10) we have for an open U and a closed W that if for both A = U and C = R n1 log (n + 1)−2M ≤ n1 log (n + 1)2M − n1 log μn(U × S|R × V ) ≥ − inf I (U ), n1 log μn(W ◦ × S|R × V ) ≤ − inf I (W ), − sup inf lim sup sup inf ≤ V0∈Nψ n→∞ ζ ∈Penmp(S)∩V0 ν∈m−1(A×{ζ })∩Penmp(R×S) (8.11) holds (for both A = U and C = R as well as for A = R and C = W , where U is open and W is closed) if for all open U and all closed W inf lim sup sup inf V0∈Nψ n→∞ ζ ∈Penmp(S)∩V0 ν∈m−1(U ×{ζ })∩Penmp(R×S) (8.16) is a consequence of Lemma 5.3, as m−1(W × V ) = m−1(W × P(S)) ∩ m−1(P(R) × V ), the set F = m−1(W × P(S)) is closed for closed W , m−1(P(R) × V ) = (π ◦ m)−1(V ), and π ◦ m is continuous. The proof of inequality (8.15) requires a little more attention. First we present some facts which are used to prove this inequality in Lemma 8.8. Consequently, I as in (8.6) is continuous on [ J < ∞]. Lemma 8.6 (a) Let k, l ∈ N and ζ ∈ Pekmp(S). For all m ≥ kl there exists a ν ∈ Pemmp(S) such that d(ν, ζ ) < 1l . (b) For all open ⊂ P(S) there exists an N ∈ N such that Penmp(S) ∩ = ∅ for all n ≥ N . Proof (a) Let i ∈ {1, . . . , k}. Let ξ ∈ Pemp(S). Then the measure lkl+ki ζ + lki+i ξ is i an element of Pelkm+pi (S). For every A ⊂ S [ lkl+ki ζ + lki+i ξ ]( A) − ζ ( A) ≤ 2 lki+i ≤ 2 lkk = 2l . By definition of the Prohorov metric, this implies d([ lkl+ki ζ + lki+i ξ ], ζ ) ≤ 2l . (b) Let ξ ∈ P(S) and δ > 0 be such that B(ξ, δ) ⊂ . For each ξ ∈ P(S) there is a k ∈ N and a ζ ∈ Pemp(S) such that d(ζ, ξ ) < 2δ . Because of this, (b) follows k from (a) by letting l be such that 1l < 2δ and N = lk. Proof In this proof, for a measure ξ ∈ P(R × S), we write ξrs = ξ({(r, s)}), so that ξ = rs ξrs δ(r,s) where we use the shorthand notation “ rs ” instead of “ r∈R,s∈S ”. Let M = #R#S. Note that Let κ > 0 and n ∈ N. We first give an estimation by which it is clear which κ and N one should choose. By the assumptions on λ for every s ∈ S there exists a rs ∈ R with λrss > 0. 
First we show that there exists a ξ ∗ ∈ Penmp(X ×Y) with ξ ∗ ξ and |ξr∗s −ξrs | ≤ n2 for all r ∈ R and s ∈ S. For each pair (r, s) ∈ R × S with ξrs > 0 we can choose a ξrs ∈ {0, n1 , n2 , . . . , 1} such that |ξrs − ξrs | < n1 . By letting ξr∗s = 0 when ξrs = 0 and add or subtract n1 to some of the ξrs we obtain a collection of ξr∗s ∈ {0, n1 , n2 , . . . , 1} with rs ξr∗s = 1 and |ξr∗s − ξrs | ≤ n2 and ξr∗s = 0 whenever ξrs = 0 for all r ∈ R and s ∈ S. Let ξ ∈ P(R × S). Suppose that ζ ∈ Penmp(S) is such that |ζs − r ξrs | < κ. Then |ζs − r ξr∗s | < κ + n2 M . We construct a ν ∈ Penmp(R × S) by defining the νrs by each s separately. Let s ∈ S. If ζs − r ξr∗s < 0, then we choose νrs ≤ ξr∗s with νrs ∈ {0, n1 , . . . , 1} in such way that r νrs = ζs (note that |νrs − ξr∗s | ≤ |ζs − r ξr∗s |). While, if ζs − r ξr∗s ≥ 0, then we let νrs = ξr∗s for all r = rs and we let νrs s = ξr∗s s + ζs − r ξr∗s (so that r νrs = ζs ). As ξ ∗ ξ and ξ λ, by the construction of ν we have ν λ. Moreover, we have π ◦ m(ν) = ζ and r ≤ κ + n2 M + n2 . which implies by (8.20) Moreover, as | d ν(· × S), ξ(· × S) ≤ M max r∈R By choosing κ > 0 and N ∈ N such that M 2κ + n2 (M 3 + M 2) < δ the proof is complete. Lemma 8.8 For all open U ⊂ R 0 ≤ V0∈Nψ n→∞ ζ ∈Penmp(S)∩V0 ν∈m−1(U ×{ζ })∩Penmp(R×S) inf lim sup sup inf Proof We assume infν∈m−1(U ×{ψ}) H (ν|λ) < ∞. Let ξ ∈ m−1(U × {ψ }) be such that H (ξ |λ) < ∞. Let ε > 0. We show there exists a V0 ∈ Nψ and an N ∈ N such n n that for all n ≥ N the set Pemp(S) ∩ V0 is not empty and for all ζ ∈ Pemp(S) ∩ V0 there exists a ν ∈ m−1(U × {ζ }) ∩ Penmp(R × S) with 9 Examples In Sect. 8, we showed that the regular conditional kernel ηn as in (8.3) satisfies (A1) and (B1) of Theorem 6.9 by showing that (A2) and (B2) of that theorem hold. This is not always the most optimal approach; in Example 9.1 we show that for a specific example of Gaussian measures the expression of ηn allows us to derive (A1) and (B1) directly. 
Furthermore, relying on Theorem 9.2, in Example 9.4 we give an example of a sequence (ηn)n∈N for which (A1) of Theorem 6.9 does not hold. In Remark 9.5 we mention that for one choice of measures in Example 9.4 a quenched large deviation principle is satisfied, while for the other choice of measures there is no quenched large deviation principle. In Example 9.6 we show that for a choice of measures as in Example 9.4 the regular conditional kernel at a specifically chosen point does not satisfy any large deviation principle. In Remark 9.7 we discuss exponential tightness of the regular conditional kernel. In Remark 9.8 we discuss the differences between the present paper and the paper of La Cour and Schieve [20].

Example 9.1 Let r ≠ 0, $Z_n := \int_{\mathbb{R}} \int_{\mathbb{R}} e^{-\frac n2 (x^2 - 2rxy + y^2)}\, dx\, dy$, and consider the sequence (μn)n∈N of probability measures on B(R × R) determined by
$\mu_n(A \times B) = \frac{1}{Z_n} \int_B \int_A e^{-\frac n2 (x^2 - 2rxy + y^2)}\, dx\, dy$.
The sequence satisfies the large deviation principle with rate function J : R² → [0, ∞] given by J(x, y) = ½(x² − 2rxy + y²). By Theorem 4.8, ηn given by
$\eta_n(y, A) = \frac{\int_A e^{-\frac n2 (x^2 - 2rxy)}\, dx}{\int_{\mathbb{R}} e^{-\frac n2 (x^2 - 2rxy)}\, dx} = \frac{\int_A e^{-\frac n2 (x - ry)^2}\, dx}{\int_{\mathbb{R}} e^{-\frac n2 (x - ry)^2}\, dx}$
is the weakly continuous product regular conditional probability under μn with respect to the projection on the Y-coordinate. If yn → y, one can show that for λ ∈ R
$\lim_{n \to \infty} \frac1n \log \int e^{n \lambda x}\, d[\eta_n(y_n, \cdot)](x) = \lim_{n \to \infty} \frac1n \log \int e^{n \lambda x}\, d[\eta_n(y, \cdot)](x)$.
Then by the Gärtner–Ellis theorem (see for example Dembo and Zeitouni [7, Theorem 2.3.6]) we conclude that (ηn(yn, ·))n∈N satisfies the large deviation principle with the same rate function as the one of the large deviation principle of (ηn(y, ·))n∈N, which is x ↦ ½(x − ry)². Note that this equals J(x, y) − inf J(R × {y}) because of the equality x² − 2rxy + y² = (x − ry)² + (1 − r²)y².

The proof of the following theorem can be found in "Appendix 3".

Theorem 9.2 Let X and Y be separable metric spaces. Let $(\mu_n^1)_{n \in \mathbb{N}}$ and $(\mu_n^2)_{n \in \mathbb{N}}$ be sequences of probability measures on B(X).
Let (νn)n∈N be a sequence of probability measures on B(Y) that satisfies the large deviation principle with a rate function L : Y → [0, ∞]. Suppose that y ∈ Y and Wn ∈ Ny are such that $\bigcap_{n \in \mathbb{N}} W_n = \{y\}$ and αn : Y → [0, 1] is a continuous function with αn(y) = 0 and αn = 1 on Y \ Wn such that = 0. (9.4) Assume $(\mu_n^1)_{n \in \mathbb{N}}$ satisfies the large deviation principle with rate function I. Assume furthermore that for all open A ⊂ X
$\liminf_{n \to \infty} \frac1n \log \mu_n^1(A) \ge \liminf_{n \to \infty} \frac1n \log \mu_n^2(A)$,
$\limsup_{n \to \infty} \frac1n \log \mu_n^1(X \setminus A) \ge \limsup_{n \to \infty} \frac1n \log \mu_n^2(X \setminus A)$.
Then (μn)n∈N satisfies the large deviation principle with rate function J : X × Y → [0, ∞] given by J(x, y) = I(x) + L(y). The kernel ηn : Y × B(X) → [0, 1] defined by
$\eta_n(y, A) = \alpha_n(y)\, \mu_n^1(A) + (1 - \alpha_n(y))\, \mu_n^2(A)$
is the weakly continuous product regular conditional probability under μn with respect to π : X × Y → Y given by π(x, y) = y. Note that I(x) = J(x, y) − inf J(X × {y}) for all x ∈ X, y ∈ Y.

Example 9.3 We give examples of Y, Wn, αn, νn and L such that (9.4) of Theorem 9.2 is satisfied and (νn)n∈N satisfies the large deviation principle with rate function L.
(a) Let Y = [0, ∞), αn(y) = min{ny, 1} for y ∈ Y and let $\nu_n(B) = \int_0^\infty 1_B(y)\, n e^{-ny}\, dy$ for B ∈ B([0, ∞)). Then $\int_0^{1/n} \alpha_n\, d\nu_n = 1 - 2e^{-1}$ and $\int_0^{1/n} (1 - \alpha_n)\, d\nu_n = e^{-1}$. Therefore with this νn, αn and Wn = [−1/n, 1/n], (9.4) is satisfied. Moreover (νn)n∈N satisfies the large deviation principle with rate function L : Y → [0, ∞], L(y) = y (this follows for example from the Gärtner–Ellis theorem [7, Theorem 2.3.6]).
Let β = ν1([−1, 1]). Let κn = 1/√n. Then νn[−κn, κn] = β for all n ∈ N. Let φε : R → [0, 1] be defined by φε(z) = min{ε⁻¹|z|, 1}.
Then $\lim_{\varepsilon \downarrow 0} \int_{[-\kappa_n, \kappa_n]} \varphi_\varepsilon\, d\nu_1 = \beta$, $\lim_{\varepsilon \downarrow 0} \int_{[-\kappa_n, \kappa_n]} (1 - \varphi_\varepsilon)\, d\nu_1 = 0$ and Therefore, for all n ∈ N, there exists an εn ∈ (0, κn) such that

Example 9.4 With X = R, $\mu_n^1 = \mu_{N(0, \frac1n)}$, $\mu_n^2 = \delta_{\frac1n}$ and I(x) = ½x² for x ∈ R, and Y, νn (or $\nu_n^0$), αn, Wn and L as in Examples 9.3 (a) or (b), the conditions of Theorem 9.2 are satisfied (note that $(\delta_{\frac1n})_{n \in \mathbb{N}}$ satisfies the large deviation principle with rate function H : R → [0, ∞] given by H(0) = 0 and H(x) = ∞ for x ≠ 0). Then $\eta_n(0, \cdot) = \delta_{\frac1n}$ and $\eta_n(\varepsilon_n, \cdot) = \mu_{N(0, \frac1n)}$ for all n ∈ N. Whence (ηn(0, ·))n∈N satisfies the large deviation principle with rate function H, and (ηn(εn, ·))n∈N (and also (ηn(y, ·))n∈N for y > 0) satisfies the large deviation principle with rate function I. Because I ≤ H, the sequence (ηn(0, ·))n∈N satisfies the large deviation upper bound not only with H but also with I instead of H. Therefore (b1) of Theorem 6.3 holds in case yn = 0 for all n. Since (ηn(0, ·))n∈N does not satisfy the large deviation principle with rate function I, (a1) of Theorem 6.3 does not hold. Therefore for any decreasing sequence (Vm)m∈N in N0 with $\bigcap_{m \in \mathbb{N}} V_m = \{0\}$ there exists an open set U with inf I(U) < ∞ with We illustrate this for Y, αn, Wn, νn and L as in Examples 9.3 (a): for Vm = [0, 1/m) and U = (1, ∞) we get for m ≥ n
$\mu_n(U \times Y \mid X \times V_m) = \mu_{N(0, \frac1n)}(U)\, \frac{\int_0^{1/m} ny \cdot n e^{-ny}\, dy}{\int_0^{1/m} n e^{-ny}\, dy} \le \frac nm\, \mu_{N(0, \frac1n)}(U)$,
which converges to zero as m → ∞, which implies
$\limsup_{m \to \infty} \frac1n \log \mu_n(U \times Y \mid X \times V_m) = -\infty < -\tfrac12 = -\inf I(U)$.

Remark 9.5 (Quenched large deviations) Consider the situation as in Example 9.4. For all n ∈ N we have the following. If ζn : Y × B(X) → [0, 1] is a product regular conditional probability under μn with respect to π, then ζn(y, ·) = ηn(y, ·) for [μn ◦ π−1]-almost all y (see Remark 3.5).
Whence, with νn as in Examples 9.3 (a) or (b), we have a quenched large deviation principle of the conditional probability with respect to the second coordinate with rate function I: for every product regular conditional probability ζn under μn with respect to π there exists a Z ⊂ Y with μn ◦ π−1(Z) = νn(Z) = 1 such that (ζn(y, ·))n∈N satisfies the large deviation principle with rate function I for all y ∈ Z. However, with $\nu_n^0$ as in Examples 9.3 (b) instead of νn, for such ζn one has ζn(0, ·) = ηn(0, ·), as $\nu_n^0(\{0\}) > 0$. Thus in this case we do not have such a quenched large deviation principle.

Example 9.6 With X = N, $\mu_n^1 = \sum_{k \in \mathbb{N}} 2^{-k} \delta_k$, $\mu_n^2 = \delta_n$ and I(x) = 0 for x ∈ N as in Example 9.4, and Y, Wn, αn, νn and L as in Examples 9.3 (a) or (b), the conditions of Theorem 9.2 are satisfied. In this case, (ηn(0, ·))n∈N does not satisfy a large deviation principle.

Remark 9.8 Example 9.4 with νn (or $\nu_n^0$) and αn as in Examples 9.3 (b) fits the assumptions made in Sect. 4 of La Cour and Schieve [20].⁸ In Sect. 4 of that paper, it is claimed that the law of the first coordinate conditioned on the second coordinate satisfies the large deviation principle with the rate function I. Their notion of conditioning on y is "condition on an arbitrarily small neighbourhood around y". This approach needs to be justified. Our results are different, as by Example 9.4 the conditioned kernel in 0, ηn(0, ·), does not satisfy the large deviation principle with the rate function I (even in the sense of quenched large deviations as discussed in Remark 9.5).

8 The logarithmic moment generating function (see Dembo and Zeitouni [7, Assumption 2.3.2]) is given by (x, y) ↦ ½x² + ½y², whence its Hessian equals the identity matrix and is therefore invertible. In [20], it is mentioned that one cannot carry out the conditioning on all elements, but only those that equal the derivative of y ↦ ½y² at a certain point are considered, of which 0 is an example.
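The failure of the lower bound at y = 0 in Example 9.4 can be checked by hand on the open set U = (1, ∞) used there. In the sketch below, Z denotes a standard normal random variable introduced only for this illustration, and the Gaussian tail asymptotics in the last line are a standard fact that the text does not derive.

```latex
% Setting of Example 9.4: I(x) = x^2/2, \eta_n(0,\cdot) = \delta_{1/n},
% \eta_n(\varepsilon_n,\cdot) = N(0, 1/n). Z ~ N(0,1) is notation for this sketch.
\begin{align*}
\inf I(U) &= \inf_{x > 1} \tfrac12 x^2 = \tfrac12, \\
\eta_n(0, U) &= \delta_{1/n}\bigl((1,\infty)\bigr) = 0 \quad (n \ge 1),
  \qquad\text{so}\qquad \tfrac1n \log \eta_n(0, U) = -\infty < -\tfrac12 = -\inf I(U), \\
\eta_n(\varepsilon_n, U) &= \mu_{N(0, 1/n)}\bigl((1,\infty)\bigr) = P\bigl(Z > \sqrt{n}\bigr),
  \qquad \lim_{n \to \infty} \tfrac1n \log P\bigl(Z > \sqrt{n}\bigr) = -\tfrac12 .
\end{align*}
```

So the lower bound with rate function I holds with equality along yn = εn → 0 but fails along the constant sequence yn = 0, which is the discrepancy discussed in Remark 9.8.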
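Acknowledgements The author is supported by ERC Advanced Grant VARIS-267356 of Frank den Hollander. The author is grateful to both Frank den Hollander and Frank Redig for valuable suggestions and useful discussions.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix 1: An Elementary Fact About Limsup and Liminf

Lemma 9.9 Let $k \in \mathbb{N}$ and $a_n^i \in [0,\infty)$ for all $n \in \mathbb{N}$ and $i \in \{1,\dots,k\}$. If there exists an $N \in \mathbb{N}$ such that $\max_{i\in\{1,\dots,k\}} a_n^i > 0$ for all $n \ge N$, then⁹

$$\lim_{n\to\infty} \Big( \frac{1}{n}\log \sum_{i=1}^{k} a_n^i - \max_{i\in\{1,\dots,k\}} \frac{1}{n}\log a_n^i \Big) = 0, \tag{9.16}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log \sum_{i=1}^{k} a_n^i = \max_{i\in\{1,\dots,k\}} \limsup_{n\to\infty} \frac{1}{n}\log a_n^i, \tag{9.17}$$

$$\liminf_{n\to\infty} \frac{1}{n}\log \sum_{i=1}^{k} a_n^i = \max_{i\in\{1,\dots,k\}} \liminf_{n\to\infty} \frac{1}{n}\log a_n^i. \tag{9.18}$$

Proof (9.16), (9.17) and (9.18) follow from the inequality

$$\max_{i\in\{1,\dots,k\}} \frac{1}{n}\log a_n^i \le \frac{1}{n}\log \sum_{i=1}^{k} a_n^i \le \frac{1}{n}\log\Big(k \max_{i\in\{1,\dots,k\}} a_n^i\Big) \le \frac{1}{n}\log k + \max_{i\in\{1,\dots,k\}} \frac{1}{n}\log a_n^i.$$

⁹ Equation (9.17) can also be found in Dembo and Zeitouni [7, Theorem 1.2.15].

Appendix 2: Sufficient Bounds for Large Deviation Bounds

Let $\mathcal{X}$ be a topological space. Let $I : \mathcal{X} \to [0,\infty]$ have compact sublevel sets. Let $(\mu_n)_{n\in\mathbb{N}}$ be a sequence of probability measures on $\mathcal{B}(\mathcal{X})$.

Lemma 9.10 Let $F_1 \supset F_2 \supset \cdots$ be closed subsets of $\mathcal{X}$ and $F = \bigcap_{m\in\mathbb{N}} F_m$. Then

$$\inf I(F) = \sup_{m\in\mathbb{N}} \inf I(F_m).$$

Proof Let $c := \sup_{m\in\mathbb{N}} \inf I(F_m)$. Note that $c \le \inf I(F)$. If $c = \infty$, there is nothing to prove. Assume that $c < \infty$. Let $K$ be the compact set $[I \le c]$. Then $F_m \cap K \ne \emptyset$ for all $m \in \mathbb{N}$, whence $F \cap K \ne \emptyset$ and thus $\inf I(F) \le c$.

Remark 9.11 For Lemma 9.10 the condition that $I$ has compact sublevel sets is not redundant. For example, let $I : \mathbb{N} \cup \{0\} \to [0,\infty]$ be given by $I(0) = 1$ and $I(x) = 0$ for $x \in \mathbb{N}$. Then for $F_m = \{0\} \cup \{m, m+1, \dots\}$ and $F = \{0\}$ one has $\sup_{m\in\mathbb{N}} \inf I(F_m) = 0$ and $\inf I(F) = 1$.

Lemma 9.12 (a) Let $\mathcal{U}$ be a collection of open subsets of $\mathcal{X}$ and let $U = \bigcup \mathcal{U}$. Suppose that for all $G \in \mathcal{U}$

$$\liminf_{n\to\infty} \frac{1}{n}\log \mu_n(G) \ge -\inf I(G). \tag{9.21}$$

Then $G = U$ satisfies (9.21) as well.¹⁰

(b) Let $F_1, F_2, \dots$ be closed. Suppose that for all $F \in \{F_m : m \in \mathbb{N}\}$

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n(F) \le -\inf I(F). \tag{9.22}$$

Then $F = \bigcap_{m\in\mathbb{N}} F_m$ satisfies (9.22) as well.

Proof We have

$$\liminf_{n\to\infty} \frac{1}{n}\log \mu_n\Big(\bigcup \mathcal{U}\Big) \ge \sup_{G\in\mathcal{U}} \liminf_{n\to\infty} \frac{1}{n}\log \mu_n(G) \ge \sup_{G\in\mathcal{U}} \big(-\inf I(G)\big),$$

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n\Big(\bigcap_{m\in\mathbb{N}} F_m\Big) \le \inf_{m\in\mathbb{N}} \limsup_{n\to\infty} \frac{1}{n}\log \mu_n(F_m) \le \inf_{m\in\mathbb{N}} \big(-\inf I(F_m)\big).$$

Now apply Lemma 9.10.

¹⁰ This can also be found in O'Brien [23, Proposition 2.1].

As a consequence of Lemma 9.12 we obtain the following.

Theorem 9.13 Suppose that $\mathcal{G}$ is a basis for the topology on $\mathcal{X}$ such that (9.21) holds for all $G \in \mathcal{G}$ and (9.22) holds for $F = \mathcal{X} \setminus G$. Suppose that every open set can be written as a countable union of elements of $\mathcal{G}$. Then $(\mu_n)_{n\in\mathbb{N}}$ satisfies the large deviation principle with rate function $I$.

Appendix 3: Proof of Theorem 9.2

Proof of Theorem 9.2 As $\mathcal{X}$ and $\mathcal{Y}$ are separable metric spaces, every open subset of $\mathcal{X} \times \mathcal{Y}$ is a countable union of elements of the form $A \times B$, where $A \subset \mathcal{X}$ is open and $B \in \mathcal{H}$, where (with $d_{\mathcal{Y}}$ the metric on $\mathcal{Y}$)

$$\mathcal{H} = \{B(y,\delta) : \delta > 0\} \cup \{B(z,\delta) : z \ne y,\ 0 < \delta < d_{\mathcal{Y}}(y,z)\}.$$

We use Theorem 9.13 to prove the large deviation bounds. Note first that $(\mathcal{X} \times \mathcal{Y}) \setminus (A \times B) = (\mathcal{X} \times (\mathcal{Y} \setminus B)) \cup ((\mathcal{X} \setminus A) \times \mathcal{Y})$, that

$$\min\{\inf I(\mathcal{X} \setminus A),\ \inf L(\mathcal{Y} \setminus B)\} = \inf_{(x,y)\in(\mathcal{X}\times\mathcal{Y})\setminus(A\times B)} \big( I(x) + L(y) \big),$$

and that by (9.17)

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n\big((\mathcal{X}\times\mathcal{Y})\setminus(A\times B)\big) \le \max\Big\{ \limsup_{n\to\infty} \frac{1}{n}\log \mu_n(\mathcal{X}\times(\mathcal{Y}\setminus B)),\ \limsup_{n\to\infty} \frac{1}{n}\log \mu_n((\mathcal{X}\setminus A)\times\mathcal{Y}) \Big\}.$$

Let $A \subset \mathcal{X}$ be open and $B \in \mathcal{H}$. It therefore suffices to show that

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n(\mathcal{X}\times(\mathcal{Y}\setminus B)) \le -\inf L(\mathcal{Y}\setminus B), \tag{9.26}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n((\mathcal{X}\setminus A)\times\mathcal{Y}) \le -\inf I(\mathcal{X}\setminus A), \tag{9.27}$$

$$\liminf_{n\to\infty} \frac{1}{n}\log \mu_n(A\times B) \ge -\inf I(A) - \inf L(B). \tag{9.28}$$

• (9.26) follows from the fact that $\mu_n(\mathcal{X} \times (\mathcal{Y} \setminus B)) = \nu_n(\mathcal{Y} \setminus B)$.

• (9.27) follows from the fact that by (9.4), (9.6) and (9.17) we have

$$\limsup_{n\to\infty} \frac{1}{n}\log \mu_n((\mathcal{X}\setminus A)\times\mathcal{Y}) = \max\Big\{ \limsup_{n\to\infty} \frac{1}{n}\log \mu_n^1(\mathcal{X}\setminus A),\ \limsup_{n\to\infty} \frac{1}{n}\log \mu_n^2(\mathcal{X}\setminus A) \Big\} = \limsup_{n\to\infty} \frac{1}{n}\log \mu_n^1(\mathcal{X}\setminus A) \le -\inf I(\mathcal{X}\setminus A).$$

• (9.28) follows by separating two cases (as either $y \in B$ or $y \notin B$). If $y \notin B$, then $W_n \cap B = \emptyset$ and so $\mu_n(A \times B) = \mu_n^1(A)\,\nu_n(B)$ for large $n$, whence

$$\liminf_{n\to\infty} \frac{1}{n}\log \mu_n(A\times B) \ge -\inf I(A) - \inf L(B).$$

Suppose that $y \in B$, i.e. $W_n \subset B$ for large $n$. By (9.18) we obtain

$$\liminf_{n\to\infty} \frac{1}{n}\log \mu_n(A\times B) = \max\Big\{ \liminf_{n\to\infty} \frac{1}{n}\log \mu_n^1(A),\ \liminf_{n\to\infty} \frac{1}{n}\log \mu_n^2(A) \Big\} \ge -\inf I(A).$$

Because $\inf L(B) \ge 0$, we conclude (9.28).

We leave it to the reader to check that $\eta_n$ is the weakly continuous product regular conditional probability under $\mu_n$ with respect to $\pi$.

References

1. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.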
: From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Commun. Math. Phys. 307(3), 791–815 (2011)
2. Biggins, J.D.: Large deviations for mixtures. Electron. Commun. Probab. 9, 60–71 (2004). (electronic)
3. Billingsley, P.: Convergence of Probability Measures, Wiley Series in Probability and Statistics: Probability and Statistics, 2nd edn. Wiley, New York (1999)
4. Bogachev, V.: Measure Theory. Springer, Berlin (2007)
5. Comets, F.: Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures. Probab. Theory Relat. Fields 80(3), 407–432 (1989)
6. Comets, F., Gantert, N., Zeitouni, O.: Quenched, annealed and functional large deviations for one-dimensional random walk in random environment. Probab. Theory Relat. Fields 118(1), 65–114 (2000)
7. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010). [Corrected reprint of the second edition (1998)]
8. Deuschel, J.-D., Stroock, D.W.: Large Deviations, Pure and Applied Mathematics, vol. 137. Academic Press, Boston (1989)
9. van Enter, A.C.D., Fernández, R., den Hollander, F., Redig, F.: A large-deviation view on dynamical Gibbs-non-Gibbs transitions. Mosc. Math. J. 10(4), 687–711 (2010)
10. van Enter, A.C.D., Külske, C., Opoku, A.A., Ruszel, W.M.: Gibbs-non-Gibbs properties for n-vector lattice and mean-field models. Braz. J. Probab. Stat. 24(2), 226–255 (2010)
11. Ermolaev, V., Külske, C.: Low-temperature dynamics of the Curie-Weiss model: periodic orbits, multiple histories, and loss of Gibbsianness. J. Stat. Phys. 141(5), 727–756 (2010)
12. Faden, A.: The existence of regular conditional probabilities: necessary and sufficient conditions. Ann. Probab. 13(1), 288–298 (1985)
13. Fernández, R., den Hollander, F., Martínez, J.: Variational description of Gibbs-non-Gibbs dynamical transitions for the Curie-Weiss model. Commun. Math. Phys. 319(3), 703–730 (2013)
14. Greven, A., den Hollander, F.: Large deviations for a random walk in random environment. Ann. Probab. 22(3), 1381–1428 (1994)
15. Halmos, P.R.: Measure Theory. Springer, Berlin (1974)
16. den Hollander, F.: Large Deviations, Fields Institute Monographs, vol. 14. American Mathematical Society, Providence, RI (2000)
17. den Hollander, F., Redig, F., van Zuijlen, W.: Gibbs-non-Gibbs dynamical transitions for mean-field interacting Brownian motions. Stoch. Process. Appl. 125(1), 371–400 (2015)
18. Kosygina, E., Rezakhanlou, F., Varadhan, S.R.S.: Stochastic homogenization of Hamilton-Jacobi-Bellman equations. Commun. Pure Appl. Math. 59(10), 1489–1521 (2006)
19. Külske, C., Opoku, A.A.: Continuous spin mean-field models: limiting kernels and Gibbs properties of local transforms. J. Math. Phys. 49(12), 125215 (2008)
20. La Cour, B.R., Schieve, W.C.: A general conditional large deviation principle. J. Stat. Phys. 161(1), 123–130 (2015)
21. Leao Jr., D., Fragoso, M., Ruffino, P.: Regular conditional probability, disintegration of probability and Radon spaces. Proyecciones 23(1), 15–29 (2004)
22. Léonard, C.: A large deviation approach to optimal transport. arXiv:0710.1461v1 (2007)
23. O'Brien, G.L.: Sequences of capacities, with connections to large-deviation theory. J. Theor. Probab. 9(1), 19–35 (1996)
24. Rassoul-Agha, F., Seppäläinen, T., Yilmaz, A.: Quenched free energy and large deviations for random walk in random potential. Commun. Pure Appl. Math. 66(2), 202–244 (2013)
25. Rassoul-Agha, F., Seppäläinen, T.: A Course on Large Deviations with an Introduction to Gibbs Measures, Graduate Studies in Mathematics, vol. 162. American Mathematical Society, Providence, RI (2015)
26. Schaefer, H.H.: Topological Vector Spaces. Springer, New York (1971). (Third printing corrected, Graduate Texts in Mathematics, Vol. 3)
27. Steen, L.A., Seebach Jr., J.A.: Counterexamples in Topology. Holt, Rinehart and Winston, New York (1970)



Large Deviations of Continuous Regular Conditional Probabilities, Journal of Theoretical Probability, 2016, DOI: 10.1007/s10959-016-0733-1