#### Turning a Coin over Instead of Tossing It

0 Centre for Mathematical Sciences, Lund University, 22100-118 Lund, Sweden
1 Department of Mathematics, University of Colorado, Boulder, CO 80309-0395, USA
2 The hospitality of Microsoft Research and the University of Washington is gratefully acknowledged by the first author. Research of the second author was supported in part by the Swedish Research Council Grant VR2014-5157
Given a sequence of numbers $(p_n)_{n\ge 2}$ in $[0,1]$, consider the following experiment. First, we flip a fair coin and then, at step $n$, we turn the coin over to the other side with probability $p_n$, $n \ge 2$, independently of the sequence of the previous terms. What can we say about the distribution of the empirical frequency of heads as $n \to \infty$? We show that a number of phase transitions take place as the turning gets slower (i.e., as $p_n$ gets smaller), leading first to the breakdown of the Central Limit Theorem and then to that of the Law of Large Numbers. It turns out that the critical regime is $p_n = \mathrm{const}/n$. Among the scaling limits, we obtain uniform, Gaussian, semicircle, and arcsine laws.
Coin tossing; Central Limit Theorem; Laws of Large Numbers
1 General Model
In this paper, we examine what happens if, instead of tossing a coin, we turn it over
(from heads to tails and from tails to heads), with certain probabilities.
To define the model precisely, let $p_n$, $n = 2, 3, \dots$, be a given deterministic sequence
of numbers between 0 and 1. We define a time-dependent “coin turning process” $X$
with $X_n \in \{0, 1\}$, $n \ge 1$, as follows. Let $X_1 = 1$ (“heads”) or $X_1 = 0$ (“tails”) with
probability 1/2 each. For $n \ge 2$, set recursively
$$
X_n := \begin{cases} 1 - X_{n-1}, & \text{with probability } p_n;\\ X_{n-1}, & \text{otherwise,} \end{cases}
$$
that is, we turn the coin over with probability $p_n$ and do nothing with probability
$1 - p_n$, independently of the sequence of the previous terms.
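To make the definition concrete, here is a minimal Python sketch (our own illustration, not code from the paper) that simulates $X_1, \dots, X_N$ for a user-supplied turning sequence. The helper name `coin_turning_path` and the convention that `p` is a callable returning $p_n$ for $n \ge 2$ are hypothetical choices made for this example.

```python
import random
from typing import Callable, List


def coin_turning_path(n_steps: int, p: Callable[[int], float], seed: int = 0) -> List[int]:
    """Simulate X_1, ..., X_N of the coin-turning process (hypothetical helper).

    X_1 is 1 ("heads") or 0 ("tails") with probability 1/2 each; for n >= 2 the
    coin is turned over (X_n = 1 - X_{n-1}) with probability p(n), otherwise kept.
    """
    rng = random.Random(seed)
    path = [1 if rng.random() < 0.5 else 0]
    for n in range(2, n_steps + 1):
        if rng.random() < p(n):
            path.append(1 - path[-1])  # turn the coin over
        else:
            path.append(path[-1])      # leave the coin as it is
    return path


if __name__ == "__main__":
    xs = coin_turning_path(10_000, lambda n: 1.0 / n, seed=1)
    print("empirical frequency of heads:", sum(xs) / len(xs))
```

Rerunning the example with different seeds and $p_n = 1/n$ gives frequencies that are spread out over $[0,1]$ instead of clustering near $1/2$, in line with the critical-case limit theorem below.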
Consider $\frac{1}{N}\sum_{n=1}^{N} X_n$, that is, the empirical frequency of 1’s (“heads”) in the
sequence of $X_n$’s. We are interested in the asymptotic behavior, in law, of this random
variable as $N \to \infty$.
Since we are interested in limit theorems, we center the variable $X_n$; for
convenience, we also multiply it by two, and thus focus on $Y_n := 2X_n - 1 \in \{-1, +1\}$ instead
of $X_n$. We have
$$
Y_n := \begin{cases} -Y_{n-1}, & \text{with probability } p_n;\\ Y_{n-1}, & \text{otherwise.} \end{cases}
$$
Note that the sequence $\{Y_n\}$ can be defined equivalently as follows.
Let $Y_n := (-1)^{\sum_{i=1}^{n} W_i}$, where $W_1, W_2, W_3, \dots$ are independent Bernoulli variables
with parameters $p_1, p_2, p_3, \dots$, respectively, and $p_1 = 1/2$. The number of turns that
occurred up to $n$ is $\sum_{i=2}^{n} W_i$ (its distribution is Poisson binomial).
This representation is important for the proofs, and below are some easy
observations it implies. However, it is important to point out that the process $Y$ is
also a non-homogeneous Markov chain with state space $\{-1, 1\}$, initial distribution
$P(Y_1 = 1) = P(Y_1 = -1) = 1/2$, and doubly stochastic symmetric transition
matrices
$$
\begin{pmatrix} 1 - p_n & p_n \\ p_n & 1 - p_n \end{pmatrix}, \qquad n \ge 2.
$$
Using the symmetry in the definition (or the
double stochasticity and induction), $Y_n$ ($n \ge 2$) has the same distribution as $Y_1$, namely
the uniform distribution on $\{-1, 1\}$ (equivalently, $X_n \sim$ Bernoulli(1/2)). Hence, the limit theorems in this paper involve particular cases of this
two-state Markov chain, but the methods of our proofs rely on the above representation
for the random variables $\{Y_n\}$, and not on Markovian techniques.
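The following quantity will play an important role: for $1 \le i < j \le N$, let
$$
e_{i,j} := \prod_{k=i+1}^{j} (1 - 2p_k),
$$
with the convention $e_{i,i} := 1$. Using the representation for the random variables $\{Y_n\}$ and $\mathbb{E}(-1)^{W_k} = 1 - 2p_k$, we have
$$
\mathbb{E}(Y_i Y_j) = \mathbb{E}(-1)^{W_{i+1} + \dots + W_j} = e_{i,j} \quad\text{and}\quad \mathbb{E}(Y_j) = e_{i,j}\, \mathbb{E}(Y_i), \qquad 1 \le i \le j, \tag{1}
$$
and hence if $i = 1$, then we get $\mathbb{E}(Y_j) = e_{1,j}\, \mathbb{E}(Y_1)$. In particular, since we assumed
$p_1 = 1/2$, $\mathbb{E}(Y_1)$ as well as all subsequent $\mathbb{E}(Y_j)$, $j \ge 2$, equal 0. In fact, for
arbitrary $\{p_n, n \ge 1\}$ satisfying $p_n \ne 1/2$, $n \ge 2$, the entire sequence $(Y_n)_{n\ge1}$ is
centered in expectation (equivalently, $\mathbb{E}(X_n) = 1/2$, $n \ge 1$) if and only if $p_1 = 1/2$.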
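As a sanity check of (1), the sketch below (our own; the helper names `e_product` and `e_markov` are hypothetical) computes $e_{i,j}$ directly as a product and, independently, as $P(Y_j = Y_i) - P(Y_j \ne Y_i)$ from the product of the symmetric $2\times 2$ transition matrices mentioned above. Exact rational arithmetic via Python's `fractions.Fraction` makes the two agree exactly rather than up to rounding.

```python
from fractions import Fraction
from typing import Callable

Prob = Callable[[int], Fraction]


def e_product(i: int, j: int, p: Prob) -> Fraction:
    """e_{i,j} = prod_{k=i+1}^{j} (1 - 2 p_k); the empty product (i == j) is 1."""
    out = Fraction(1)
    for k in range(i + 1, j + 1):
        out *= 1 - 2 * p(k)
    return out


def e_markov(i: int, j: int, p: Prob) -> Fraction:
    """E(Y_i Y_j) from the two-state chain.

    A symmetric 2x2 stochastic matrix [[s, f], [f, s]] is stored as (s, f);
    after multiplying by [[1-p_k, p_k], [p_k, 1-p_k]] for k = i+1..j,
    s = P(Y_j = Y_i) and f = P(Y_j != Y_i), so E(Y_i Y_j) = s - f.
    """
    stay, flip = Fraction(1), Fraction(0)
    for k in range(i + 1, j + 1):
        q = p(k)
        stay, flip = stay * (1 - q) + flip * q, stay * q + flip * (1 - q)
    return stay - flip


if __name__ == "__main__":
    p = lambda k: Fraction(1, k + 1)  # example turning probabilities
    print(e_product(3, 9, p), e_markov(3, 9, p))
```

If some $p_k = 1/2$ with $i < k \le j$, both functions return 0, reflecting that a single fair turn erases all correlation.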
Throughout the paper for N ≥ 1, we set
TN := X1 + · · · + X N , SN := Y1 + · · · + YN .
Then SN = 2TN − N and hence limit theorems we establish below for SN /N will
easily imply analogous results for TN /N = SN /(2N ) + 1/2. At a more elementary
level, we first observe that SN is symmetric in distribution around zero (hence its
odd moments vanish), and as a result, TN is symmetric about N /2. The symmetry of
the law of SN about zero follows from the symmetry in the definition of the model.
In fact, a straightforward calculation gives $\mathbb{E}e^{itS_N} = \mathbb{E}\cos(U_N t)$, where
$U_N := 1 + (-1)^{W_2} + (-1)^{W_2+W_3} + \dots + (-1)^{W_2+W_3+\dots+W_N}$.
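Using Corr and Cov for correlation and covariance, respectively, one also has, by (1) and since $\mathbb{E}Y_n = 0$ and $Y_n^2 = 1$,
$$
\mathrm{Corr}(Y_i, Y_j) = \mathrm{Cov}(Y_i, Y_j) = \mathbb{E}(Y_i Y_j) = e_{i,j}, \qquad 1 \le i \le j.
$$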
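Corollary 1 (Correlation estimate) Assume that $\lim_{k\to\infty} p_k = 0$ and let $n_* \in \mathbb{N}$ be
such that $p_k \le 1/2$ for $k \ge n_*$. For $n_* \le i < j$,
$$
\exp\Big(-2\sum_{k=i+1}^{j} p_k\Big) \prod_{k=i+1}^{j} (1 - r_k) \le e_{i,j} \le \exp\Big(-2\sum_{k=i+1}^{j} p_k\Big), \qquad r_k := 2p_k^2 e^{2p_k}.
$$
Moreover, for every $C > 1$, after enlarging $n_*$ if necessary,
$$
\exp\Big(-2C\sum_{k=i+1}^{j} p_k\Big) \le e_{i,j} \le \exp\Big(-2\sum_{k=i+1}^{j} p_k\Big).
$$
Proof Use the Remainder Theorem for Taylor series, yielding
$$
0 \le e^{-2p_k} - (1 - 2p_k) \le 2p_k^2,
$$
so that
$$
\exp(-2p_k)\,(1 - r_k) \le 1 - 2p_k \le \exp(-2p_k),
$$
and multiply these inequalities, to get the first statement.
For the second statement, use that for $C > 1$ and sufficiently small positive $x$,
$e^{-Cx} \le 1 - x \le e^{-x}$.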
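A quick numerical illustration of Corollary 1 (a sketch, not part of the original argument): `corollary1_bounds` is a hypothetical helper, the example sequence $p_k = 0.3/k^{0.7}$ is an arbitrary choice with $p_k \to 0$, and we take $r_k := 2p_k^2 e^{2p_k}$, which is what the Taylor bound in the proof yields.

```python
import math
from typing import Callable, Tuple


def corollary1_bounds(i: int, j: int, p: Callable[[int], float]) -> Tuple[float, float, float]:
    """Return (lower, e_ij, upper) for n_* <= i < j, where
    upper = exp(-2 sum p_k), lower = upper * prod(1 - r_k), r_k = 2 p_k^2 exp(2 p_k),
    and e_ij = prod(1 - 2 p_k), all taken over k = i+1..j."""
    ks = range(i + 1, j + 1)
    upper = math.exp(-2 * sum(p(k) for k in ks))
    lower = upper * math.prod(1 - 2 * p(k) ** 2 * math.exp(2 * p(k)) for k in ks)
    e_ij = math.prod(1 - 2 * p(k) for k in ks)
    return lower, e_ij, upper


if __name__ == "__main__":
    p = lambda k: 0.3 / k ** 0.7   # example sequence with p_k -> 0
    for i, j in [(5, 50), (20, 2000), (100, 100_000)]:
        print(i, j, corollary1_bounds(i, j, p))
```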
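Similarly to (1), if $K = 2m$ is a positive even number and $i_1 < i_2 < \dots < i_K$, then,
using the fact that
$$
\sum_{k=1}^{i_1} W_k + \sum_{k=1}^{i_2} W_k + \dots + \sum_{k=1}^{i_K} W_k \equiv \sum_{k=i_1+1}^{i_2} W_k + \sum_{k=i_3+1}^{i_4} W_k + \dots + \sum_{k=i_{K-1}+1}^{i_K} W_k \pmod 2,
$$
we obtain that
$$
\mathbb{E}(Y_{i_1} Y_{i_2} \cdots Y_{i_K}) = \mathbb{E}(-1)^{\sum_{k=i_1+1}^{i_2} W_k + \sum_{k=i_3+1}^{i_4} W_k + \dots + \sum_{k=i_{K-1}+1}^{i_K} W_k} = e_{i_1,i_2}\, e_{i_3,i_4} \cdots e_{i_{K-1},i_K}. \tag{2}
$$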
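Formula (2) can be checked by brute force for small $N$. The sketch below (ours; `exact_moment` and `pair_product` are hypothetical helper names) averages $Y_{i_1}\cdots Y_{i_K}$ over $Y_1 = \pm 1$ and all $2^{N-1}$ turn patterns $(W_2, \dots, W_N)$ with exact fractions, and compares the result with the product of the $e$'s.

```python
import math
from fractions import Fraction
from itertools import product
from typing import Callable, Sequence

Prob = Callable[[int], Fraction]


def exact_moment(indices: Sequence[int], p: Prob) -> Fraction:
    """E(Y_{i_1} ... Y_{i_K}) by exhaustive enumeration, with
    Y_n = Y_1 * (-1)^(W_2 + ... + W_n) and Y_1 = +1 or -1 with probability 1/2."""
    n = max(indices)
    total = Fraction(0)
    for w in product((0, 1), repeat=n - 1):  # w[k - 2] is W_k, k = 2..n
        prob = math.prod(p(k) if w[k - 2] else 1 - p(k) for k in range(2, n + 1))
        for y1 in (1, -1):
            value = 1
            for i in indices:
                value *= y1 * (-1) ** sum(w[: i - 1])  # parity of W_2 + ... + W_i
            total += Fraction(1, 2) * prob * value
    return total


def pair_product(indices: Sequence[int], p: Prob) -> Fraction:
    """Right-hand side of (2): e_{i1,i2} e_{i3,i4} ... e_{i_{K-1},i_K}."""
    out = Fraction(1)
    for a, b in zip(indices[0::2], indices[1::2]):
        for k in range(a + 1, b + 1):
            out *= 1 - 2 * p(k)
    return out
```

For an odd number of indices the average over $Y_1 = \pm 1$ forces the moment to vanish, which is the symmetry used repeatedly below.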
We close this section by introducing some frequently used notation.
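Notation: In the sequel, $\mathrm{Bessel}\,I_\alpha$ and $\mathrm{Bessel}\,K_\alpha$ will denote the modified Bessel
function of the first kind (or Bessel-I function) and the modified Bessel function of
the second kind (or Bessel-K function), respectively.
Writing out these functions explicitly, one has
$$
\mathrm{Bessel}\,I_\alpha(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m+\alpha+1)} \Big(\frac{x}{2}\Big)^{2m+\alpha}, \qquad \mathrm{Bessel}\,K_\alpha(x) = \frac{\pi}{2}\,\frac{\mathrm{Bessel}\,I_{-\alpha}(x) - \mathrm{Bessel}\,I_\alpha(x)}{\sin(\alpha\pi)},
$$
where the second formula holds if $\alpha$ is not an integer (otherwise it is defined through limits), and $\Gamma$ is Euler’s gamma
function. See, e.g., Sections 9–10 in [1], and formula (6.8) in [2].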
2 Review of Literature and Comparison with Our Results
As the Associate Editor kindly pointed out to us, the problem has a history, going
back to at least the 1950s. Below we give a review of the relevant achievements in the
past and compare them to the results presented in this paper.
The case $p_n = 1/n$ was already introduced in R. Dobrushin’s thesis in the 1950s (he considered his thesis work a continuation of the work of Markov, Bernstein, Sapogov, and Linnik on
time-inhomogeneous Markov chains), and it is attributed to Bernstein (see [4]). Dobrushin did not seem to explicitly identify
the limiting frequency of heads with the uniform distribution. However, in their 2007
paper [3] Dietz and Sethuraman proved that for the more general $p_n = a/n$ case
($a > 0$), the limiting frequency is Beta($a$, $a$); see their Theorems 1.3 and 1.4. (They
consider a state space consisting of $m \ge 2$ points and so they treat the more general
Dirichlet distributions.) Therefore, in the $p_n = a/n$ case, our contribution is only
a different proof. The proof in [3] is significantly longer and more complicated
than ours; however, we only consider $m = 2$.
The case $p_n = a/n^\gamma$ is also treated in [3], albeit only the Weak Law of Large
Numbers is established there (and the SLLN when $0 < \gamma < 1/2$). The authors note that simulations suggest that
a.s. convergence might also hold on the range $1/2 \le \gamma < 1$; we
prove this statement in Corollary 2. Fluctuations about the mean are not considered
in [3], though.
The situation with the Central Limit Theorem is more interesting in this case. First,
Dobrushin’s Central Limit Theorem for inhomogeneous Markov chains (Theorem 1.1
in [11]) only provides the statement in the sense that, after centering and normalizing
with the standard deviation, the limit is standard normal. We did not find, however,
any result in the literature identifying the order of the standard deviation, which we
do provide. (See the estimates on p. 411 and also Corollary 15 on p. 421 in [10]).
Secondly, and more importantly, Dobrushin’s Theorem only applies to the case
when 0 < γ < 1/3. Although the condition given in that result (formula (1.3) in [11])
is known to be optimal (see Section 2 in [11]), this is only true in the very general
setting in which the theorem is stated. It is therefore interesting, we believe, that we
also prove the CLT in the $1/3 \le \gamma < 1$ case excluded in Dobrushin’s result. (In [10],
Dobrushin’s condition was improved by Peligrad, but it is still not applicable in our
case when $1/3 \le \gamma < 1$: formula (7) in [10] is actually more stringent than the
Dobrushin condition (9).)
To the best of our knowledge, Theorem 3 is completely new.
3 Supercritical Cases
First, if $\sum_n p_n < \infty$, then by the Borel–Cantelli lemma, only finitely many turns will
occur a.s.; therefore, the $X_j$’s will eventually become all ones or all zeros, and hence
$$
\lim_{N\to\infty} \frac{T_N}{N} = \zeta \quad \text{a.s.},
$$
where $\zeta \in \{0, 1\}$. By the symmetry of the definition with respect to heads and tails
(or by the bounded convergence theorem), $\zeta$ is a Bernoulli(1/2) random variable.
4 The Critical Case
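Fix $a > 0$, and let
$$
p_n = \frac{a}{n}, \qquad n \ge n_0,
$$
for some $n_0 \in \mathbb{N}$. Denote by Beta($a$, $a$) the symmetric (around the point 1/2) Beta
distribution with $a > 0$, with density
$$
f_{\mathrm{Beta}(a,a)}(x) = \frac{x^{a-1}(1-x)^{a-1}}{B(a,a)}
$$
on the unit interval (the normalizing constant is $B(a,a) := \Gamma^2(a)/\Gamma(2a)$, using
Euler’s Gamma function), and moment generating function
$$
M_{\mathrm{Beta}(a,a)}(t) = 1 + \sum_{k=1}^{\infty} \Bigg(\prod_{r=0}^{k-1} \frac{a+r}{2a+r}\Bigg) \frac{t^k}{k!} = e^{t/2}\, \Gamma\Big(a + \frac12\Big) \Big(\frac{t}{4}\Big)^{1/2-a} \mathrm{Bessel}\,I_{a-\frac12}\Big(\frac{t}{2}\Big).
$$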
Theorem 1 The law of $\frac{1}{N}\sum_{n=1}^{N} X_n = T_N/N$ converges to Beta($a$, $a$) as $N \to \infty$.
Remark 1 It turns out that the convergence in distribution cannot be strengthened to
convergence in probability; see, e.g., the example in Section 2 of [7].
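Proof We will verify the statement by analyzing the moments of $S_N$. The odd moments
are all zero from the symmetry of $S_N$ around 0; on the other hand, for even $K$ we can
use the multinomial theorem:
$$
S_N^K = I + K! \sum_{1\le i_1<i_2<\dots<i_K\le N} Y_{i_1} Y_{i_2} \cdots Y_{i_K},
$$
where $I$ collects the $O(N^{K-1})$ terms having at least one repeated index, each bounded by 1 in absolute value. Hence, by (2),
$$
\mathbb{E} S_N^K = K! \sum_{1\le i_1<\dots<i_K\le N} \mathbb{E}(Y_{i_1} Y_{i_2} \cdots Y_{i_K}) + O(N^{K-1}) = K! \sum_{1\le i_1<\dots<i_K\le N} e_{i_1,i_2}\, e_{i_3,i_4} \cdots e_{i_{K-1},i_K} + O(N^{K-1}). \tag{4}
$$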
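Let us now analyze the elements in the sum above. From (1), for $j > i > \max\{2a, n_0\}$
we have
$$
e_{i,j} = \exp\Bigg\{\sum_{n=i+1}^{j} \log\Big(1 - \frac{2a}{n}\Big)\Bigg\} = \exp\Bigg\{-2a\sum_{n=i+1}^{j}\frac{1}{n} + O\Big(\frac{1}{i}\Big)\Bigg\} = \exp\Big\{O\Big(\frac1i\Big) - 2a\log\frac{j}{i}\Big\} = \Big(\frac{i}{j}\Big)^{2a}\Big(1 + O\Big(\frac1i\Big)\Big).
$$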
Consequently, (4) can be approximated as
$$
\mathbb{E} S_N^K = K! \sum_{1\le i_1<i_2<\dots<i_K\le N} \Big(\frac{i_1}{i_2}\Big)^{2a} \Big(\frac{i_3}{i_4}\Big)^{2a} \cdots \Big(\frac{i_{K-1}}{i_K}\Big)^{2a} + O(N^{K-1})
$$
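(the contribution from the terms where $i_1 \le \max\{2a, n_0\}$ as well as the other remainder
terms in the formula for $e_{i,j}$ is of order at most $N^{K-1}$). Since we are working on a
compact interval, we may conclude (see, e.g., Section 2, Exercise 3.27 in [5]) that
$S_N/N \to \xi_a$ in distribution, where $\xi_a$ is distributed on $[-1, 1]$ and has the following
moments:
$$
\mathbb{E}\xi_a^K = \begin{cases} 0, & K \text{ is odd};\\ K! \displaystyle\int_{0<x_1<x_2<\dots<x_K<1} \Big(\frac{x_1}{x_2}\Big)^{2a} \Big(\frac{x_3}{x_4}\Big)^{2a} \cdots \Big(\frac{x_{K-1}}{x_K}\Big)^{2a} dx_1 \cdots dx_K, & K \text{ is even}, \end{cases}
$$
which, for even moments, can be equivalently written as
$$
\mathbb{E}\xi_a^{2m} = \frac{(2m)!\,\Gamma(a + 1/2)}{2^{2m}\, m!\, \Gamma(m + a + 1/2)}.
$$
Hence, for the moment generating function,
$$
\mathbb{E}e^{t\xi_a} = \sum_{m=0}^{\infty} \frac{t^{2m}\,\Gamma(a+1/2)}{2^{2m}\, m!\, \Gamma(m+a+1/2)} = \mathrm{Bessel}\,I_{a-1/2}(t)\, \Gamma(a + 1/2)\, (t/2)^{1/2-a}.
$$
Let $\zeta_a := (\xi_a + 1)/2$. We know that $\frac{1}{N}\sum_{n=1}^{N} X_n = T_N/N \to \zeta_a$ in distribution, and
$$
\mathbb{E}e^{t\zeta_a} = e^{t/2}\, \mathbb{E}e^{(t/2)\xi_a} = e^{t/2}\, \Gamma(a+1/2)\, (t/4)^{1/2-a}\, \mathrm{Bessel}\,I_{a-1/2}(t/2) = M_{\mathrm{Beta}(a,a)}(t),
$$
completing the proof.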
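The second-moment part of this argument can be checked without simulation. The sketch below (ours; `second_moment` is a hypothetical name) evaluates $\mathbb{E}S_N^2 = N + 2\sum_{i<j\le N} e_{i,j}$ in $O(N)$ time through the recursion $A_j := \sum_{i<j} e_{i,j} = (1-2p_j)(A_{j-1}+1)$, which follows from $e_{i,j} = e_{i,j-1}(1-2p_j)$. For $p_n = a/n$ the ratio $\mathbb{E}S_N^2/N^2$ should approach $\mathbb{E}\xi_a^2 = 1/(2a+1)$. Capping $p_n$ at $1/2$ for small $n$ is merely a convenient (assumed) choice of the first $n_0$ terms.

```python
from typing import Callable


def second_moment(n: int, p: Callable[[int], float]) -> float:
    """E(S_N^2) = N + 2 * sum_{1 <= i < j <= N} e_{i,j}, computed in O(N).

    A_j := sum_{i<j} e_{i,j} satisfies A_j = (1 - 2 p_j) (A_{j-1} + 1), A_1 = 0,
    because e_{i,j} = e_{i,j-1} (1 - 2 p_j) and e_{j-1,j} = 1 - 2 p_j.
    """
    a_j = 0.0
    pair_sum = 0.0
    for j in range(2, n + 1):
        a_j = (1 - 2 * p(j)) * (a_j + 1)
        pair_sum += a_j
    return n + 2 * pair_sum


if __name__ == "__main__":
    n = 20_000
    for a in (0.5, 1.0, 2.0):
        ratio = second_moment(n, lambda k: min(a / k, 0.5)) / n ** 2
        print(a, ratio, 1 / (2 * a + 1))
```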
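Remark 2 (Particular cases and densities) Note that in particular, for $a = 1$, $a = 1/2$,
and $a = 3/2$, the limiting law Beta($a$, $a$) of the relative frequencies in Theorem 1
is Uniform([0, 1]), the arcsine law, and the transformed semicircle law on [0, 1] (whose density is a semi-ellipse),
respectively.
Turning to $S_N/N$, the transformation $x \mapsto 2x - 1$ yields that $\lim_{N\to\infty} \mathrm{Law}(S_N/N)$
equals Uniform([−1, 1]), the transformed arcsine law on [−1, 1], and Wigner’s
semicircle law on [−1, 1], respectively. Concerning the corresponding densities on
[−1, 1], we have the following explicit formulas.
• Transformed Arcsine Law: Let $a = 1/2$. Then
$$
\mathbb{E}e^{t\xi} = \sum_{m=0}^{\infty} \frac{t^{2m}}{(2^m\, m!)^2} = \mathrm{Bessel}\,I_0(t) = \frac{1}{\pi}\int_0^\pi e^{t\cos\theta}\, d\theta;
$$
consequently (see, e.g., [1], formula 29.3.60) $\xi_{1/2}$ has the transformed arcsine
density
$$
\frac{1}{\pi\sqrt{1 - x^2}}, \qquad -1 < x < 1.
$$
• Semicircle Law: Let $a = 3/2$. Then
$$
\mathbb{E}e^{t\xi} = \frac{2\,\mathrm{Bessel}\,I_1(t)}{t} = \frac{2}{\pi}\int_{-1}^{1} e^{tx}\sqrt{1 - x^2}\, dx = \mathrm{Bessel}\,I_0(t) - \mathrm{Bessel}\,I_2(t);
$$
consequently $\xi_{3/2}$ has Wigner’s semicircle density $\frac{2}{\pi}\sqrt{1 - x^2}$, $-1 < x < 1$.
• General Case: For any $a > 0$, $\xi_a$ has density
$$
\frac{(1 - x^2)^{a-1}}{\mathrm{Beta}(1/2, a)}, \qquad -1 < x < 1.
$$
Indeed, for $m \in \mathbb{N}$ we have
$$
\int_{-1}^{1} x^{2m} (1 - x^2)^{a-1}\, dx = \int_0^1 y^{m-1/2} (1 - y)^{a-1}\, dy = \mathrm{Beta}(m + 1/2, a) = \frac{\Gamma(m+1/2)\,\Gamma(a)}{\Gamma(m+a+1/2)},
$$
and dividing by $\mathrm{Beta}(1/2, a)$ gives the even moments of $\xi_a$ computed in the proof of Theorem 1.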
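The closed forms in this section can be cross-checked numerically. The sketch below (ours; all helper names are hypothetical) evaluates $\mathrm{Bessel}\,I_\alpha$ by its power series, compares the two expressions for $M_{\mathrm{Beta}(a,a)}$, and compares the even moments $\mathbb{E}\xi_a^{2m}$ from the proof of Theorem 1 with $\mathrm{Beta}(m+1/2,a)/\mathrm{Beta}(1/2,a)$. Only the standard `math` module is used; truncating the series after a fixed number of terms is an assumption that is adequate only for moderate arguments.

```python
import math


def bessel_i(alpha: float, x: float, terms: int = 60) -> float:
    """Modified Bessel function I_alpha(x) from its power series (moderate x > 0 only)."""
    return sum((x / 2) ** (2 * m + alpha) / (math.factorial(m) * math.gamma(m + alpha + 1))
               for m in range(terms))


def mgf_beta_series(a: float, t: float, terms: int = 200) -> float:
    """1 + sum_k prod_{r<k} (a+r)/(2a+r) * t^k / k!  (moment series of Beta(a, a))."""
    total, coeff = 1.0, 1.0
    for k in range(1, terms):
        coeff *= (a + k - 1) / (2 * a + k - 1) * t / k
        total += coeff
    return total


def mgf_beta_bessel(a: float, t: float) -> float:
    """e^{t/2} Gamma(a + 1/2) (t/4)^{1/2 - a} I_{a - 1/2}(t/2), for t > 0."""
    return math.exp(t / 2) * math.gamma(a + 0.5) * (t / 4) ** (0.5 - a) * bessel_i(a - 0.5, t / 2)


def xi_even_moment(a: float, m: int) -> float:
    """(2m)! Gamma(a + 1/2) / (4^m m! Gamma(m + a + 1/2))."""
    return (math.factorial(2 * m) * math.gamma(a + 0.5)
            / (4 ** m * math.factorial(m) * math.gamma(m + a + 0.5)))


def xi_even_moment_density(a: float, m: int) -> float:
    """Beta(m + 1/2, a) / Beta(1/2, a): the moment under (1 - x^2)^(a-1) / Beta(1/2, a)."""
    beta = lambda u, v: math.gamma(u) * math.gamma(v) / math.gamma(u + v)
    return beta(m + 0.5, a) / beta(0.5, a)
```

For $a = 1/2$ the even moments reduce to $\binom{2m}{m}/4^m$, the moments of the transformed arcsine law, and for $a = 1$ the Bessel form reduces to the uniform moment generating function $(e^t - 1)/t$.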
5 Subcritical Case
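Fix $a > 0$ and $\gamma > 0$, and let
$$
p_n = \frac{a}{n^\gamma}, \qquad n \ge n_0,
$$
for some $n_0$. Note that $\gamma > 1$ corresponds to the supercritical case studied in Sect. 3,
so from now on assume $0 < \gamma < 1$.
Theorem 2 As $N \to \infty$,
$$
\eta_{a,\gamma,N} := \frac{S_N}{N^{(1+\gamma)/2}} \to \mathrm{Normal}(0, \sigma^2) \quad\text{in law}, \qquad \sigma^2 := \frac{1}{a(1+\gamma)}.
$$
Proof Fix $A > a$. By Corollary 1, there is $n_* = n_*(a, A, \gamma)$ such that for $n_* \le i < j$,
$$
\exp\Big(-\frac{2A}{1-\gamma}\big[j^{1-\gamma} - i^{1-\gamma}\big]\Big) \le e_{i,j} \le \exp\Big(-\frac{2a}{1-\gamma}\big[(j+1)^{1-\gamma} - (i+1)^{1-\gamma}\big]\Big). \tag{10}
$$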
One can check that $\sup_N \mathbb{E}\eta_{a,\gamma,N}^2 < \infty$ (see the computation below with $m = 1$),
and thus Chebyshev’s inequality implies that $(\eta_{a,\gamma,N})_{N\ge1}$ is a tight sequence of random
variables. Hence, it is enough to show that all subsequential limits coincide.
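Assume that $(N_l)_{l\ge1}$ is a subsequence and $\lim_{l\to\infty} \mathrm{Law}(\eta_{a,\gamma,N_l}) = L$. Since
$|Y_1 + \dots + Y_{n_*-1}| \le n_*$, so that $(Y_1 + \dots + Y_{n_*-1})/N_l^{(1+\gamma)/2} \to 0$,
one has $L = \lim_{l\to\infty} L_{N_l,A}$ too, where
$$
L_{N_l,A} := \mathrm{Law}\Bigg(\frac{Y_{n_*} + \dots + Y_{N_l}}{N_l^{(1+\gamma)/2}}\Bigg),
$$
and in fact, this limit must be the same for any $A > a$ (and corresponding $n_* =
n_*(a, A, \gamma)$). Informally, this just means that we may throw away a finite chunk of the
sequence of $Y_i$’s (at the beginning) without affecting its limit.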
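Let us denote the even moments of $L$ by $M_{2m} \in [0, \infty]$, $m \ge 1$, while we note
again that the odd moments must be zero by symmetry. Also, $M_{N_l,A,K}$ will denote the
$K$th moment under $L_{N_l,A}$.
We will show below that for a fixed $A > a$ and $K = 2m$, $m \ge 1$,
$$
\frac{(2m-1)!!}{[A(1+\gamma)]^m} \le \liminf_{l\to\infty} M_{N_l,A,K} = \liminf_{l\to\infty} \mathbb{E}\Bigg[\frac{Y_{n_*} + \dots + Y_{N_l}}{N_l^{(1+\gamma)/2}}\Bigg]^K \le \limsup_{l\to\infty} M_{N_l,A,K} \le \frac{(2m-1)!!}{[a(1+\gamma)]^m}. \tag{11}
$$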
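Once (11) is shown, it will follow from the upper estimate (which yields uniform integrability of each fixed power) and from the relation
$L = \lim_{l\to\infty} L_{N_l,A}$ for all $A > a$ that
$$
M_K = \lim_{l\to\infty} M_{N_l,A,K} \tag{12}
$$
for all $K \ge 1$ and all $A > a$. Since (11) holds for any $A > a$, letting $A \downarrow a$ and
using (11) and (12), one has that in fact
$$
M_K = \frac{(2m-1)!!}{[a(1+\gamma)]^m}, \qquad K = 2m.
$$
In summary, we obtain that for any fixed $A > a$,
$$
\lim_{l\to\infty} M_{N_l,A,K} = M_K = \begin{cases} 0, & K \text{ odd};\\ \dfrac{(2m-1)!!}{[a(1+\gamma)]^m}, & K = 2m. \end{cases} \tag{13}
$$
At the same time, we recall that the normal distribution is uniquely determined by
its moments, and therefore the convergence toward a normal law is implied by the
convergence of all the moments (see, e.g., [5], Section 2.3.e). In our case, (13) along
with (9) implies $L = \lim_{l\to\infty} L_{N_l,A} = \mathrm{Normal}(0, \sigma^2)$. Therefore, it only remains to
prove (11).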
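We have
$$
\mathbb{E}\big(Y_{n_*} + \dots + Y_N\big)^K = I + K! \sum_{n_*\le i_1<i_2<\dots<i_K\le N} \mathbb{E}(Y_{i_1} Y_{i_2}\cdots Y_{i_K}),
$$
where $I$ are lower-order terms, as will be shown below. Using (2) along with (10),
we may continue with
$$
\mathbb{E}\big(Y_{n_*} + \dots + Y_N\big)^K \le I + K! \sum_{n_*+1\le i_1<i_2<\dots<i_K\le N+1} \exp\big(c\, U_{i_1,\dots,i_K}\big), \qquad c := \frac{2a}{1-\gamma}, \tag{14}
$$
where
$$
U_{i_1,\dots,i_K} := i_1^{1-\gamma} - i_2^{1-\gamma} + i_3^{1-\gamma} - i_4^{1-\gamma} + \dots + i_{K-1}^{1-\gamma} - i_K^{1-\gamma}.
$$
By the calculation in the “Appendix,” the RHS of (14) is
$$
I + K! \times \frac{N^{K(1+\gamma)/2}}{c^m (1-\gamma^2)^m\, m!} \cdot (1 + o(1)).
$$
By the same token,
$$
\mathbb{E}\big(Y_{n_*} + \dots + Y_N\big)^K \ge I + K! \sum_{n_*\le i_1<i_2<\dots<i_K\le N} \exp\big(d\, U_{i_1,\dots,i_K}\big) = I + K! \times \frac{N^{K(1+\gamma)/2}}{d^m (1-\gamma^2)^m\, m!} \cdot (1 + o(1)), \qquad d := \frac{2A}{1-\gamma}.
$$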
The reason the remaining terms, collected in $I$, are of lower order is as follows.
Apart from the already estimated term, in the expansion for $\mathbb{E}(Y_{n_*} + \dots + Y_N)^K$, for
$r = 1, 2, \dots, K - 1$ we also have to sum up the terms of the type
$\mathbb{E}(Y_{i_1}^{p_1} Y_{i_2}^{p_2} \cdots Y_{i_r}^{p_r})$, where $n_* \le i_1 < \dots < i_r \le N$, all $p_j \ge 1$, and $p_1 + p_2 + \dots + p_r = K$.
Since $Y_i = \pm 1$, and thus $Y_i^p = 1$ if $p$ is even and $Y_i^p = Y_i$ if $p$ is odd, it suffices to
estimate only the sums $R(r; \ell_1, \dots, \ell_r; N; K; \gamma)$ of such terms, where the summation is taken over all sets $(i_1, \dots, i_r)$ such that $i_{k+1} \ge i_k + \ell_k$
for all $k$, $i_1 \ge 1$, and $i_r \le N$. However, since $r \le K - 1$, each of
the sums $R(r; \ell_1, \dots, \ell_r; N; K; \gamma)$ is at most of order $N^{r(1+\gamma)/2} \le N^{(K-1)(1+\gamma)/2}$,
precisely by the same arguments which were used to estimate the sum in (14). The
number of those sums can be large, as it is the number of integer partitions of $K$, but
it depends only on $K$ and does not increase with $N$.
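Consequently, for $m \ge 1$ we have, with $c = 2a/(1-\gamma)$ and $d = 2A/(1-\gamma)$,
$$
\limsup_{l\to\infty} \mathbb{E}\Bigg[\frac{Y_{n_*} + \dots + Y_{N_l}}{N_l^{(1+\gamma)/2}}\Bigg]^K \le \mathrm{(II)}, \qquad \mathrm{(II)} := \frac{K!}{c^m (1-\gamma^2)^m\, m!} = \frac{(2m-1)!!}{[a(1+\gamma)]^m},
$$
and, by a similar computation,
$$
\liminf_{l\to\infty} \mathbb{E}\Bigg[\frac{Y_{n_*} + \dots + Y_{N_l}}{N_l^{(1+\gamma)/2}}\Bigg]^K \ge \mathrm{(I)}, \qquad \mathrm{(I)} := \frac{K!}{d^m (1-\gamma^2)^m\, m!} = \frac{(2m-1)!!}{[A(1+\gamma)]^m},
$$
which is (11). The proof is complete.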
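The case $m = 1$ of the moment bounds says that $\mathbb{E}S_N^2/N^{1+\gamma} \to 1/(a(1+\gamma))$. The sketch below (ours; `second_moment` and `normalized_variance` are hypothetical helpers) checks this for one parameter choice using the exact recursion $A_j = (1-2p_j)(A_{j-1}+1)$ for $A_j := \sum_{i<j} e_{i,j}$. Convergence is slow (relative corrections of roughly order $N^{-\gamma}$ and $N^{\gamma-1}$), so only a loose tolerance is meaningful.

```python
from typing import Callable


def second_moment(n: int, p: Callable[[int], float]) -> float:
    """Exact E(S_N^2) = N + 2 sum_{i<j<=N} e_{i,j} via A_j = (1 - 2 p_j)(A_{j-1} + 1)."""
    a_j, pair_sum = 0.0, 0.0
    for j in range(2, n + 1):
        a_j = (1 - 2 * p(j)) * (a_j + 1)
        pair_sum += a_j
    return n + 2 * pair_sum


def normalized_variance(n: int, a: float, gamma: float) -> float:
    """E(S_N^2) / N^(1+gamma) for p_n = a / n^gamma (n >= 2)."""
    return second_moment(n, lambda k: a / k ** gamma) / n ** (1 + gamma)


if __name__ == "__main__":
    a, gamma = 0.5, 0.5
    for n in (10_000, 100_000, 200_000):
        print(n, normalized_variance(n, a, gamma), 1 / (a * (1 + gamma)))
```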
6 When Does the Law of Large Numbers Hold for General Sequences $\{p_n\}$?
A natural question to ask is when does SN obey the Strong (Weak) Law of Large
Numbers. The following result gives a partial answer.
For a positive even number $K$, introduce the shorthand
$$
\mathcal{E}(N, K) := N^{-K} \sum_{1\le i_1<i_2<\dots<i_K\le N} e_{i_1,i_2}\, e_{i_3,i_4} \cdots e_{i_{K-1},i_K}.
$$
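For experiments with a general sequence $\{p_n\}$, $\mathcal{E}(N, K)$ need not be computed by summing over all $\binom{N}{K}$ index sets. The following sketch is our own dynamic program (not a method from the paper; the function names are hypothetical): it scans $j = 1, \dots, N$ and keeps, for each number of completed pairs, the total weight of partial index sets whose last pair is either closed or still open, which gives $N^K\mathcal{E}(N, K)$ in $O(NK)$ operations. A brute-force version is included for checking.

```python
from itertools import combinations
from typing import Callable

Prob = Callable[[int], float]


def pair_sum(n: int, k: int, p: Prob) -> float:
    """N^K E(N, K): sum over 1 <= i_1 < ... < i_K <= N of e_{i1,i2} e_{i3,i4} ... e_{i_{K-1},i_K}."""
    if k <= 0 or k % 2:
        raise ValueError("K must be a positive even number")
    m = k // 2
    closed = [1.0] + [0.0] * m   # closed[l]: 2l indices chosen, every pair completed
    open_ = [0.0] * m            # open_[l]: 2l + 1 indices chosen, last pair still open
    for j in range(1, n + 1):
        if j >= 2:
            q = 1 - 2 * p(j)
            open_ = [w * q for w in open_]          # extend e_{i,j-1} to e_{i,j}
        new_closed, new_open = closed[:], open_[:]  # option: index j is not chosen
        for l in range(m):
            new_closed[l + 1] += open_[l]           # j closes the open pair
            new_open[l] += closed[l]                # j opens a new pair (e_{j,j} = 1)
        closed, open_ = new_closed, new_open
    return closed[m]


def script_e(n: int, k: int, p: Prob) -> float:
    """E(N, K) = N^{-K} * pair_sum(N, K)."""
    return pair_sum(n, k, p) / n ** k


def pair_sum_brute(n: int, k: int, p: Prob) -> float:
    """Same quantity by direct enumeration (small N only), for checking."""
    total = 0.0
    for idx in combinations(range(1, n + 1), k):
        weight = 1.0
        for a, b in zip(idx[0::2], idx[1::2]):
            for t in range(a + 1, b + 1):
                weight *= 1 - 2 * p(t)
        total += weight
    return total
```

For instance, with $p_n = 1/n$ one has $e_{i,j} = i(i-1)/(j(j-1))$, and `pair_sum(N, 2, lambda j: 1 / j)` returns $(N-1)(N-2)/6$ up to rounding.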
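The first condition in the next theorem may be reminiscent of Kolmogorov’s
sufficient condition for the Strong Law of Large Numbers.
Theorem 3 (a) (Strong Law) Assume that one of the following conditions holds:
(C1) $\displaystyle\sum_{N=1}^{\infty} \frac{\mathcal{E}(N, 2)}{N} < \infty$.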
(C2) For some even number K ,
Then SLLN holds, that is, SN /N → 0 a.s.
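(b) (Weak Law) The WLLN holds if and only if for each positive even number $K$,
$$
\lim_{N\to\infty} \mathcal{E}(N, K) = 0.
$$
(c) (no LLN) If for each positive even number $K$, the limit
$$
\mu_K := K! \lim_{N\to\infty} \mathcal{E}(N, K) \tag{15}
$$
exists and satisfies $0 < \mu_K < \infty$, and
$$
\sum_{m=1}^{\infty} \mu_{2m}^{-1/(2m)} = \infty, \tag{16}
$$
then the Law of Large Numbers breaks down, and in fact, $\mathrm{Law}(S_N/N)$ converges
to a law which has zero odd moments and even moments $\{\mu_K\}$.
Note that (16) is the so-called Carleman condition, guaranteeing that the $\mu_K$’s
correspond to at most one probability law (see Theorem 3.11, Section 2, in [5]).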
Proof We will use the facts about the method of moments for weak convergence
discussed in the proof of Theorem 2, along with the fact that from (4), it follows that
$$
\mathbb{E}\Big(\frac{S_N}{N}\Big)^K = K!\, \mathcal{E}(N, K) + O\Big(\frac1N\Big). \tag{17}
$$
(b) Since $|S_N/N| \le 1$, we know that $S_N/N$ converges to zero in law (i.e., in
probability, since the limit is deterministic) if and only if all its moments converge to zero.
(One direction is to realize that the $K$th moment is the same as $\mathbb{E} f^K(S_N/N)$, where
$f(x) = x$ on $[-1, 1]$, $f(x) := 1$ for $x > 1$, and $f(x) := -1$ for $x < -1$;
then $f$ is bounded and continuous. The other direction is also known, since the
deterministically zero distribution is uniquely determined by its moments.) By
symmetry, it is enough to check the even moments, for which we know (17). The
statement then follows from the fact that the remainder term is $O(1/N)$.
(c) Assume that the conditions in (c) hold. Since the moments of SN /N converge (the
odd moments are zero by symmetry), the corresponding laws are tight and, by the
Carleman condition, all subsequential limits are the same. That is, as N → ∞,
Law(SN /N ) converges to a law with moments given by μK , and since μK > 0,
the limit cannot be deterministically zero.
Corollary 2 When $p_n = a/n^\gamma$ with $0 < \gamma < 1$, $a > 0$ (subcritical case), the Strong
Law of Large Numbers holds: $S_N/N \to 0$, $P$-a.s. (Observe that in view of Theorem 2,
convergence in probability is immediate.)
Proof We have seen in the proof of Theorem 2 that all moments, and in particular the
second moment of the ratio $S_N/N^{(1+\gamma)/2}$, converge as $N \to \infty$. Thus $\mathbb{E}(S_N^2) \sim N^{1+\gamma}$
and, since $\mathbb{E}(S_N^2/N^2) = 1/N + 2\mathcal{E}(N, 2)$,
$$
2\mathcal{E}(N, 2) = \mathbb{E}(S_N^2/N^2) - 1/N \sim N^{\gamma - 1};
$$
hence condition (C1) of Theorem 3 is satisfied. (Here $f_N \sim g_N$ means that
$\lim_{N\to\infty} f_N/g_N$ exists and is positive.)
Corollary 3 (Monotonicity) If WLLN holds for the sequence $\{p_n\}$, then it also holds
for the sequence $\{\hat p_n\}$, whenever $\hat p_n \ge p_n$ for all $n$.
Proof This follows from Theorem 3(b) along with the definition of $\mathcal{E}(N, K)$ in terms
of the $e_{i,j}$ and the fact that $e_{i,j} = \prod_{k=i+1}^{j} (1 - 2p_k)$ is monotone decreasing in the
$p_k$’s for each given $1 \le i < j$.
7 Giving Up Symmetry
Now we will show that, in the supercritical case as well as in the setups of Theorem 1
and of Theorem 2, the initial condition being symmetric (i.e., X1 is equally likely to
be 0 or 1) is actually not essential for the limiting distributions.
Thus, in this section we assume w.l.o.g. that $X_1 \equiv 1$ and thus $Y_1 \equiv 1$ and $Y_k =
(-1)^{W_2+\dots+W_k}$, $k \ge 2$.
In the supercritical case, we will have again $T_N/N \to \zeta \in \{0, 1\}$ a.s., with $\zeta \sim$ Bernoulli($q$),
but because of the lack of symmetry, we can no longer claim that $q = 1/2$. Our
next statement, the proof of which may already be known, gives the exact value of $q$
for any sequence $\{p_n\}$ with $\sum_n p_n < \infty$:
$$
q = \frac{1 + e_{1,\infty}}{2}, \qquad e_{1,\infty} := \prod_{k=2}^{\infty} (1 - 2p_k).
$$
In particular, if at least one of the $p_i$’s is 1/2, then $q = 1/2$,
which is already clear from the symmetry.
Proof Since we are in the supercritical regime, only finitely many turns occur a.s.,
and hence $Y_n = Y_\infty \in \{-1, 1\}$ for all large $n$; as a result, $Y_n \to Y_\infty$ a.s. Hence, using
Cesàro means, $S_N/N \to Y_\infty$ a.s. as well. By the Bounded Convergence Theorem,
$\mathbb{E}Y_\infty = \lim_{n\to\infty} \mathbb{E}Y_n = e_{1,\infty}$. Since $Y_\infty = 2\zeta - 1$ and $\mathbb{E}\zeta = q$, we have $2q - 1 =
e_{1,\infty}$.
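Numerically, $q$ is easy to evaluate. The sketch below (ours; `heads_limit_probability` is a hypothetical name) truncates the infinite product $e_{1,\infty} = \prod_{k\ge2}(1 - 2p_k)$, assuming, as in the supercritical case, that $\sum p_k < \infty$ and that the coin starts at $Y_1 \equiv 1$. For $p_k = 1/k^2$ the product has the closed form $-\sin(\pi\sqrt2)/(\pi\sqrt2)$ (Euler's product $\prod_{k\ge1}(1 - x^2/k^2) = \sin(\pi x)/(\pi x)$ at $x = \sqrt2$, after removing the $k = 1$ factor $-1$), which serves as a check.

```python
import math
from typing import Callable


def heads_limit_probability(p: Callable[[int], float], n_terms: int = 100_000) -> float:
    """q = (1 + e_{1,inf}) / 2 with e_{1,inf} = prod_{k>=2} (1 - 2 p_k),
    truncated after n_terms factors (assumes sum p_k < infinity)."""
    e_inf = 1.0
    for k in range(2, n_terms + 2):
        e_inf *= 1 - 2 * p(k)
    return (1 + e_inf) / 2


if __name__ == "__main__":
    q = heads_limit_probability(lambda k: 1 / k ** 2)
    closed_form = (1 - math.sin(math.pi * math.sqrt(2)) / (math.pi * math.sqrt(2))) / 2
    print(q, closed_form)
```

If any single $p_k$ equals $1/2$, the product is exactly zero and the function returns $q = 1/2$, matching the remark about symmetry.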
Since now $Y_1 \equiv 1$, for odd $K$ and $1 \le i_1 < i_2 < \dots < i_K$ the analogue of (2) reads
$$
\mathbb{E}(Y_{i_1} Y_{i_2} \cdots Y_{i_K}) = e_{0,i_1}\, e_{i_2,i_3} \cdots e_{i_{K-1},i_K}. \tag{18}
$$
The calculation (6) remains valid for even $K$; however, if $K$ is odd, one cannot claim
any more that $\mathbb{E}S_N^K = 0$. At the same time, a calculation similar to (6) immediately
shows that if $K = 2m + 1$, then $\mathbb{E}S_N^K = \mathbb{E}S_N^{2m+1} = O(N^{2m}) = o(N^K)$, and hence
the rescaled odd moments tend to zero, while the even moments are the same as in the
original model. Hence the limiting distribution must be the same.
A similar argument holds for the subcritical case as well. Indeed, the even moments
$\mathbb{E}S_N^K$ remain the same, while the odd moments for $K = 2m + 1$ will be $\mathbb{E}S_N^{2m+1} =
O(N^{m(1+\gamma)}) = o(N^{K(1+\gamma)/2})$ due to (18) and the result from the “Appendix.”
8 Further Heuristic Arguments and a Conjecture
In this section, we omit the details of some calculations—the reader can find them in
the preprint of this paper [6].
To avoid ambiguity, by the “classical CLT” we mean the situation where, after
normalizing with the standard deviation, the limit has a standard normal distribution,
and the standard deviation itself is of order √N ; a “nonstandard CLT” will mean that
the standard deviation (and thus the fluctuation) is of a different order.
Consider a sum of N ≥ 1 variables having the same law with finite variance. As
is well known, the two “extreme cases” for a sum are the independent case, when the
variance is linear and one gets the Central Limit Theorem, and the other one is when
all the variables are identical and the variance grows like $N^2$. By analogy then (after
recalling that in our model
$$
\mathrm{Var}(S_N) = N + 2N^2\, \mathcal{E}(N, 2)
$$
holds), it seems that the first crucial question is whether
$$
\mathcal{E}(N, 2) = O(1/N), \qquad N \to \infty, \tag{19}
$$
is still the case. If (19) is true, then $\mathrm{Var}(S_N)$ is of order $N$, and one can expect that
the classical CLT holds. This happens when $p_n \equiv p \in (0, 1)$.
In a situation when (19) fails, one should know at least whether
$$
\mathcal{E}(N, 2) = o(1) \tag{20}
$$
holds. Indeed, we know from Theorem 3(b) that the exact criterion for WLLN to hold
is that
$$
\mathcal{E}(N, K) = o(1) \quad \text{for all } K = 2m. \tag{21}
$$
In light of this, we make the following conjecture.
Conjecture 1 If (21) fails for $\{p_n\}$, then WLLN is no longer valid for the proportion, that is, the
proportion is not concentrated about 1/2 at all (see Examples 3 and 4 below).
Examples Supporting the Discussion and Conjecture 1
In the examples below, the deviations from the classical CLT become more
marked as we go from Example 2 to Example 3 to Example 4. Recall that $S_N :=
Y_1 + \dots + Y_N$ and $T_N := X_1 + \dots + X_N$ with $T_N = (S_N + N)/2$; the frequency of heads
is $T_N/N$.
Example 1 Consider the case $p_n \equiv c \in (0, 1)$, $n \ge 2$, and let $\kappa := 1 - 2c$. Then
$$
N^2\, \mathcal{E}(N, 2) = \sum_{1\le i<j\le N} \kappa^{j-i} = \frac{\kappa}{1-\kappa}\, N + O(1).
$$
Therefore, the variance is still of order $N$, but the constant has changed. Recall
that $\mathrm{Cov}(Y_i, Y_j) = e_{i,j} = \kappa^{j-i}$, and, following [8], define
$$
\sigma_c^2 := 1 + 2\sum_{i=1}^{\infty} \mathrm{Cov}(Y_1, Y_{1+i}) = \frac{1-c}{c},
$$
assuming that $Y_1$ is uniform on $\{-1, 1\}$. In this case,
since we are dealing with a time-homogeneous Markov chain, it is well known (see [8])
that
$$
\frac{S_N}{\sqrt N} \to \mathrm{Normal}(0, \sigma_c^2) \quad \text{in law}.
$$
Therefore, unless $c = 1/2$, the classical CLT is slightly changed, since
$\mathrm{Var}(S_N) \sim \sigma_c^2 N$. It is also clear that the limiting normal variance can be
arbitrarily large when $c$ is sufficiently small and thus turns occur very rarely. On the other
hand, it can be arbitrarily small if $c$ is sufficiently close to 1 and thus turns occur
very frequently. (If the turns were certain, then the limiting variance would vanish, of
course.)
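A direct check of $\mathrm{Var}(S_N) \sim \sigma_c^2 N$ with $\sigma_c^2 = (1-c)/c$, as a sketch under the assumptions of Example 1 ($p_n \equiv c$ for $n \ge 2$, $\kappa = 1 - 2c$); `variance_constant_turning` is a hypothetical name. It evaluates $\mathrm{Var}(S_N) = N + 2\sum_{i<j} \kappa^{j-i}$ exactly via the recursion $A_j = \kappa(A_{j-1} + 1)$ for $A_j := \sum_{i<j} \kappa^{j-i}$.

```python
def variance_constant_turning(n: int, c: float) -> float:
    """Var(S_N) for p_n = c (n >= 2): N + 2 * sum_{1<=i<j<=N} kappa^(j-i), kappa = 1 - 2c."""
    kappa = 1 - 2 * c
    a_j, pair_sum = 0.0, 0.0
    for _ in range(2, n + 1):
        a_j = kappa * (a_j + 1)   # A_j = sum_{i<j} kappa^(j-i)
        pair_sum += a_j
    return n + 2 * pair_sum


if __name__ == "__main__":
    n = 100_000
    for c in (0.1, 0.25, 0.5, 0.8):
        print(c, variance_constant_turning(n, c) / n, (1 - c) / c)
```

The finite-$N$ correction is $-2\kappa(1-\kappa^N)/(N(1-\kappa)^2)$, so the printed ratios approach $(1-c)/c$ at rate $1/N$.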
Example 2 (Classical CLT breaks down) Consider the case $p_n := a/n^\gamma$ with $0 <
\gamma < 1$. Then $N^2\,\mathcal{E}(N, 2)$ is of order $N^{1+\gamma}$,
that is, $\mathrm{Var}(S_N)$ is of order $N^{\gamma+1}$, and the power is strictly between 1 and 2. Now (21)
is true, WLLN is still in force, and $S_N/N$ is still around zero (the proportion of heads
is around 1/2). But (19) is false. The closer $\gamma$ is to 1, the more the situation differs
from the classical CLT. We now have a nonstandard CLT, with larger than classical
fluctuations.
Example 3 Consider the case $p_n := 1/n$, $n \ge 2$. Then $e_{i,j} = \frac{(i-1)\,i}{(j-1)\,j}$ for $1 \le i < j$. Consequently,
$$
N^2\, \mathcal{E}(N, 2) = \sum_{j=2}^{N} \frac{j-2}{3} = \frac{(N-1)(N-2)}{6}
$$
is of order $N^2$, that is, (19) and even (21) are false, causing the Law of Large Numbers
to break down, and $S_N/N$ is no longer around zero. This means that the correlation is
as strong as in the case of identical variables, and the fluctuations are now of order $N$,
destroying the LLN. The situation is similar when $p_k = a/k$ with $a > 0$.
In terms of the relative frequency of heads, instead of being around the $\delta_{1/2}$
distribution, now it tends to the Beta($a$, $a$) distribution.
Example 4 (Extreme limit) Consider the case when $\sum_n p_n < \infty$. Then $\liminf_{N\to\infty}
\mathcal{E}(N, 2) > 0$ holds (hence (19) and even (21) are false).
Indeed, as we know, the limit of $S_N/N$ is “extreme”: $\frac12(\delta_{-1} + \delta_1)$, which is as far
away from $\delta_0$ as possible (i.e., Beta(0, 0) $= \frac12(\delta_0 + \delta_1) \equiv$ Bernoulli(1/2) for the
frequencies of heads).
We conclude this section with an open problem.
Problem 1 (Monotonicity for SLLN) Is it true that if SLLN holds for the sequence
{ pn}, then it also holds for the sequence { pˆn}, whenever pˆn ≥ pn for all n?
The corresponding statement for WLLN is true by Corollary 3.
Acknowledgements We would like to thank the Associate Editor and the referee for many useful
suggestions and recommendations on how to improve the present paper. The Associate Editor’s remarks about the
history of the problem and the suggested references were especially valuable.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
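Appendix
In this appendix, we will estimate the quantity
$$
Q(n_*, N) := \sum_{n_*\le i_1<i_2<\dots<i_K\le N} \exp\Big(c\big[i_1^{1-\gamma} - i_2^{1-\gamma} + i_3^{1-\gamma} - i_4^{1-\gamma} + \dots + i_{K-1}^{1-\gamma} - i_K^{1-\gamma}\big]\Big)
$$
for large $N$ and fixed $n_*$, $\gamma$, $c$, with $K = 2m$, $m \ge 1$. As needed for equation (14),
we will show that it asymptotically equals $\dfrac{N^{K(1+\gamma)/2}}{c^m (1-\gamma^2)^m\, m!}$ as $N \to \infty$.
The result will immediately follow from the next statement, as $Q(n_*, N)$
(asymptotically) does not depend on $n_*$ as $N \to \infty$.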
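Lemma 1 For $0 \le l \le m$, define $Z_l(x) := \dfrac{x^{(1+\gamma)l}}{c^l (1-\gamma^2)^l\, l!}$ (so that $Z_0 \equiv 1$). Then, for $l = 0, 1, \dots, m - 1$,
$$
Q(n_*, N) = (1 + o(1)) \sum_{n_*+2l\le i_{2l+1}<i_{2l+2}<\dots<i_{2m-1}<i_{2m}\le N} \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma} + \dots + i_{2m-1}^{1-\gamma} - i_{2m}^{1-\gamma}\big]\Big) \times Z_l(i_{2l+1}) + o\big(N^{m(1+\gamma)}\big), \tag{22}
$$
and
$$
Q(n_*, N) = (1 + o(1))\, \frac{N^{m(1+\gamma)}}{c^m (1-\gamma^2)^m\, m!}. \tag{23}
$$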
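Proof We are going to prove the statement by induction on $l$.
For $l = 0$, it is true. Now assume that we have established (22) for some $l \ge 0$.
Then
$$
\begin{aligned}
Q(n_*, N) = (1 + o(1)) &\sum_{n_*+2l+2\le i_{2l+3}<i_{2l+4}<\dots<i_{2m-1}<i_{2m}\le N} \exp\Big(c\big[i_{2l+3}^{1-\gamma} - i_{2l+4}^{1-\gamma} + \dots + i_{2m-1}^{1-\gamma} - i_{2m}^{1-\gamma}\big]\Big)\\
&\times \sum_{n_*+2l\le i_{2l+1}<i_{2l+2}<i_{2l+3}} \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma}\big]\Big)\, Z_l(i_{2l+1}) + o\big(N^{m(1+\gamma)}\big),
\end{aligned}
$$
where the sum in the second line is taken over $i_{2l+1}$ and $i_{2l+2}$ only. We shall estimate
this sum below.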
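First, note that the double sum
$$
(*) = \sum_{n_*\le i_{2l+1}<i_{2l+2}<N} \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma}\big]\Big),
$$
where each term is between 0 and 1, can be very well approximated by the
corresponding integral, since, whenever $y \le x$, $|\tilde x - x| \le 1$, and $|\tilde y - y| \le 1$, the ratio
$$
\frac{\exp\big(c[\tilde y^{1-\gamma} - \tilde x^{1-\gamma}]\big)}{\exp\big(c[y^{1-\gamma} - x^{1-\gamma}]\big)}
$$
is bounded above by $e^{c_1[y^{-\gamma} + x^{-\gamma}]}$, where $c_1 > 0$ is some constant. Hence, outside of
the area where $x$ and $y$ are both smaller than $\sqrt N$, the above ratio is very close to 1,
while the double sum over that area can be at most $N$. Therefore, as $N \to \infty$,
$$
(*) = (1 + o(1)) \int_{n_*}^{N} \int_{y}^{N} e^{cy^{1-\gamma} - cx^{1-\gamma}}\, dx\, dy + O(N).
$$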
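To calculate the inner integral, observe that, by differentiating $-x^\gamma e^{-cx^{1-\gamma}}/(c(1-\gamma))$,
$$
\int_y^N e^{-cx^{1-\gamma}} \Big(1 - \frac{\gamma x^{\gamma-1}}{c(1-\gamma)}\Big)\, dx = R(y, N), \qquad R(y, N) := \frac{y^\gamma e^{-cy^{1-\gamma}} - N^\gamma e^{-cN^{1-\gamma}}}{c(1-\gamma)}. \tag{24}
$$
Hence $\int_y^N e^{-cx^{1-\gamma}}\, dx \le \psi(y)\, R(y, N)$, where $\psi(y) := \big(1 - \frac{\gamma y^{\gamma-1}}{c(1-\gamma)}\big)^{-1}$ (we take $n_*$ large enough that the bracket is positive for $y \ge n_*$).
Note that $\psi(y) \downarrow 1$ as $y \to \infty$; hence, since $y \ge n_*$,
$$
\int_{n_*}^{N} \Big(\int_y^N e^{-cx^{1-\gamma}}\, dx\Big) e^{cy^{1-\gamma}}\, dy \le \int_{n_*}^{N} (1 + \psi(n_*))\, R(y, N)\, e^{cy^{1-\gamma}}\, dy \le \frac{(1 + \psi(n_*))\, N^{1+\gamma}}{c(1-\gamma)(1+\gamma)}.
$$
Consequently, $(*) = O(N^{1+\gamma})$ and
$$
\sum_{n_*+2l\le i_{2l+1}<i_{2l+2}<i_{2l+3}} \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma}\big]\Big) \cdot o\big(N^{l(1+\gamma)}\big) = o\big(N^{(l+1)(1+\gamma)}\big).
$$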
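The next step is to compute
$$
\sum_{n_*+2l\le i_{2l+1}<i_{2l+2}<i_{2l+3}} i_{2l+1}^{(1+\gamma)l} \times \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma}\big]\Big). \tag{25}
$$
The terms with $i_{2l+1} \le N^{\gamma/2}$ are negligible: since each exponential is at most 1,
$$
\sum_{\substack{n_*+2l\le i_{2l+1}<i_{2l+2}<i_{2l+3}\\ i_{2l+1}\le N^{\gamma/2}}} i_{2l+1}^{(1+\gamma)l} \exp\Big(c\big[i_{2l+1}^{1-\gamma} - i_{2l+2}^{1-\gamma}\big]\Big) \le N^{\gamma/2} \cdot N \cdot N^{(1+\gamma)l} = o\big(N^{(l+1)(1+\gamma)}\big).
$$
For the remaining terms, we replace the double sum by the integral
$$
\int_{N^{\gamma/2}}^{b} \int_{y}^{b} y^q e^{c[y^{1-\gamma} - x^{1-\gamma}]}\, dx\, dy = \int_{N^{\gamma/2}}^{b} y^q e^{cy^{1-\gamma}} \Big(\int_y^b e^{-cx^{1-\gamma}}\, dx\Big) dy, \tag{26}
$$
where $q := (1+\gamma)l$ and $b := i_{2l+3}$. We are again allowed to do this, as for $|\tilde x - x| \le 1$
and $|\tilde y - y| \le 1$ the ratio of the corresponding exponentials is bounded by $e^{c_1[y^{-\gamma} + x^{-\gamma}]}$, and the fact
that $x \ge y \ge N^{\gamma/2}$ yields that this ratio is very close to 1; thus the double sum
in (25) equals
$$
(1 + o(1)) \int_{N^{\gamma/2}}^{b} y^q e^{cy^{1-\gamma}} \Big(\int_y^b e^{-cx^{1-\gamma}}\, dx\Big) dy + o\big(N^{(l+1)(1+\gamma)}\big).
$$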
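From (24), since $y \ge N^{\gamma/2}$ and hence is large, we get that the inner integral equals
$(1 + o(1))\, R(y, b)$. Therefore, the main expression in (26), up to a factor $1 + o(1)$ and
the remainder term, equals
$$
\int_{N^{\gamma/2}}^{b} y^q e^{cy^{1-\gamma}} R(y, b)\, dy = \int_{N^{\gamma/2}}^{b} \frac{y^{q+\gamma}}{c(1-\gamma)}\, dy - (**) = \frac{b^{q+1+\gamma} - N^{\gamma(q+1+\gamma)/2}}{(q+1+\gamma)\, c(1-\gamma)} - (**),
$$
where
$$
(**) := \frac{b^\gamma e^{-cb^{1-\gamma}}}{c(1-\gamma)} \int_{N^{\gamma/2}}^{b} y^q e^{cy^{1-\gamma}}\, dy.
$$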
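Now the only remaining step is to show that the integral $(**)$ is of smaller order; then
the induction step is finished. To this end, fix some $\gamma < \theta < 1$, and do the following
estimation:
$$
(**) \propto b^\gamma e^{-cb^{1-\gamma}} \Big(\int_{N^{\gamma/2}}^{b-b^\theta} + \int_{b-b^\theta}^{b}\Big) y^q e^{cy^{1-\gamma}}\, dy \le \int_{N^{\gamma/2}}^{b-b^\theta} y^q\, b^\gamma e^{c[(b-b^\theta)^{1-\gamma} - b^{1-\gamma}]}\, dy + b^\theta \cdot b^{q+\gamma} = o\big(b^{q+1+\gamma}\big),
$$
because $(b - b^\theta)^{1-\gamma} - b^{1-\gamma} \le -(1-\gamma)\, b^{\theta-\gamma} \to -\infty$ (as $\theta > \gamma$) and $\theta < 1$. Since $q + 1 + \gamma = (l+1)(1+\gamma)$ and $b \le N$, the leading term above, divided by $c^l (1-\gamma^2)^l\, l!$, equals $Z_{l+1}(b)$ up to an error of order $o(N^{(l+1)(1+\gamma)})$, and hence
$$
Q(n_*, N) = (1 + o(1)) \sum_{n_*+2l+2\le i_{2l+3}<\dots<i_{2m}\le N} \exp\Big(c\big[i_{2l+3}^{1-\gamma} - i_{2l+4}^{1-\gamma} + \dots + i_{2m-1}^{1-\gamma} - i_{2m}^{1-\gamma}\big]\Big) \times Z_{l+1}(i_{2l+3}) + o\big(N^{m(1+\gamma)}\big),
$$
which is (22) with $l + 1$ in place of $l$.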
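Finally, (23) follows from repeating the above steps verbatim with the sum
$$
\sum_{n_*+2m<i_{2m-1}<i_{2m}<i_{2m+1}} \exp\Big(c\big[i_{2m-1}^{1-\gamma} - i_{2m}^{1-\gamma}\big]\Big)\, Z_{m-1}(i_{2m-1}),
$$
with $i_{2m+1}$ replaced by $N + 1$.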
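For $K = 2$ the claim of this appendix reads $Q(n_*, N) \sim N^{1+\gamma}/(c(1-\gamma^2))$, which is easy to test numerically. The sketch below (ours; `q_two` is a hypothetical name, and the parameters $c = 1$, $\gamma = 1/2$ are an arbitrary choice) avoids overflow by using the recursion $B_j := \sum_{n_*\le i<j} e^{c(i^{1-\gamma} - j^{1-\gamma})} = (B_{j-1} + 1)\, e^{c((j-1)^{1-\gamma} - j^{1-\gamma})}$, so that $Q(n_*, N) = \sum_j B_j$.

```python
import math


def q_two(n_star: int, n: int, c: float, gamma: float) -> float:
    """Q(n_*, N) for K = 2: sum over n_* <= i < j <= N of exp(c (i^{1-gamma} - j^{1-gamma}))."""
    b_j, total = 0.0, 0.0
    for j in range(n_star + 1, n + 1):
        # B_j = (B_{j-1} + 1) * exp(c ((j-1)^{1-gamma} - j^{1-gamma})), with B_{n_*} = 0
        b_j = (b_j + 1) * math.exp(c * ((j - 1) ** (1 - gamma) - j ** (1 - gamma)))
        total += b_j
    return total


if __name__ == "__main__":
    c, gamma = 1.0, 0.5
    for n in (10_000, 200_000):
        predicted = n ** (1 + gamma) / (c * (1 - gamma ** 2))
        print(n, q_two(1, n, c, gamma) / predicted)
```

Changing `n_star` from 1 to 10 barely moves the ratio, consistent with the remark that $Q(n_*, N)$ asymptotically does not depend on $n_*$.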
1. Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Reprint of the 1972 edition. Dover (1992)
2. Bowman, F.: Introduction to Bessel Functions. Dover Publications Inc., New York (1958)
3. Dietz, Z., Sethuraman, S.: Occupation laws for some time-nonhomogeneous Markov chains. Electron. J. Probab. 12(23), 661–683 (2007)
4. Dobrushin, R.L.: Central limit theorems for non-stationary Markov chains. I, II. Theory Probab. Appl. 1, 65–80 (1956)
5. Durrett, R.: Probability: Theory and Examples, 2nd edn. Duxbury Press, Belmont (1995)
6. Engländer, J., Volkov, S.: Turning a coin over instead of tossing it. Preprint, arXiv:1606.03281
7. Gantert, N.: Laws of large numbers for the annealing algorithm. Stoch. Process. Appl. 35, 309–313 (1990)
8. Jones, G.L.: On the Markov chain central limit theorem. Probab. Surv. 1, 299–320 (2004)
9. Lyons, R.: Strong laws of large numbers for weakly correlated random variables. Mich. Math. J. 35(3), 353–359 (1988)
10. Peligrad, M.: Central limit theorem for triangular arrays of non-homogeneous Markov chains. Probab. Theory Relat. Fields 154(3–4), 409–428 (2012)
11. Sethuraman, S., Varadhan, S.R.S.: A martingale proof of Dobrushin's theorem for non-homogeneous Markov chains. Electron. J. Probab. 10(36), 1221–1235 (2005)