Maximal Inequalities for Martingales and Their Differential Subordinates
Adam Osękowski
We introduce a method of proving maximal inequalities for Hilbert-space-valued differentially subordinate local martingales. As an application, we prove that if $X=(X_t)_{t\geq 0}$, $Y=(Y_t)_{t\geq 0}$ are local martingales such that $Y$ is differentially subordinate to $X$, then

$(\mathcal{F}_t)_{t\geq 0}$ of sub-$\sigma$-fields of $\mathcal{F}$. In addition, we assume that $\mathcal{F}_0$ contains all the events of probability $0$. Let $X$, $Y$ be two adapted local martingales, taking values in a certain separable Hilbert space $\mathcal{H}$ with norm $\|\cdot\|$ and scalar product $\langle\cdot,\cdot\rangle$. With no loss of generality, we may take $\mathcal{H}=\ell^2$. As usual, we assume that the trajectories of the processes are right-continuous and have limits from the left. The symbol $[X,X]$ will stand for the quadratic covariance process of $X$: this object is given by $[X,X]=\sum_{n=1}^\infty [X^n,X^n]$, where $X^n$ denotes the $n$-th coordinate of $X$ and $[X^n,X^n]$ is the usual square bracket of the real-valued martingale $X^n$ (see e.g. Dellacherie and Meyer [15] for details). In what follows, $X^*=\sup_{t\geq 0}\|X_t\|$ will denote the maximal function of $X$; we also use the notation $X_t^*=\sup_{0\leq s\leq t}\|X_s\|$. Furthermore, for $1\leq p\leq\infty$, we shall write $\|X\|_p=\sup_{t\geq 0}\|X_t\|_p$ and, equivalently, $\|X\|_p=\sup_\tau\|X_\tau\|_p$, where the second supremum is taken over all adapted bounded stopping times $\tau$.
Throughout the paper we assume that the process $Y$ is differentially subordinate to $X$. This concept was originally introduced by Burkholder [8] in the discrete-time case: a martingale $g=(g_n)_{n\geq 0}$ is differentially subordinate to $f=(f_n)_{n\geq 0}$ if for any $n\geq 0$ we have $\|dg_n\|\leq\|df_n\|$. Here $df=(df_n)_{n\geq 0}$, $dg=(dg_n)_{n\geq 0}$ are the difference sequences of $f$ and $g$, respectively, given by the equations
$$f_n=\sum_{k=0}^n df_k\quad\text{and}\quad g_n=\sum_{k=0}^n dg_k,\qquad n=0,1,2,\ldots.$$
The extension of the domination to the continuous-time setting is due to Bañuelos and Wang [3] and Wang [23]. We say that $Y$ is differentially subordinate to $X$ if the process $([X,X]_t-[Y,Y]_t)_{t\geq 0}$ is nondecreasing and nonnegative as a function of $t$. If we treat given discrete-time martingales $f$, $g$ as continuous-time processes (via $X_t=f_{\lfloor t\rfloor}$ and $Y_t=g_{\lfloor t\rfloor}$, $t\geq 0$), we see that this domination is consistent with Burkholder's original definition of differential subordination.
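As a concrete discrete-time illustration (a numerical aside, not part of the original text: the random-walk martingale and the multiplier sequence below are illustrative choices), a martingale transform with predictable multipliers in $[-1,1]$ satisfies both Burkholder's condition $\|dg_n\|\leq\|df_n\|$ and the bracket condition of the continuous-time definition:

```python
import random

random.seed(0)

# A simple real-valued martingale: partial sums of independent
# symmetric +/-1 steps.
n = 50
df = [random.choice([-1.0, 1.0]) for _ in range(n)]

# Multipliers eps_k with |eps_k| <= 1 (deterministic here, so
# predictability is immediate); g is the martingale transform of f.
eps = [(-1.0) ** k * 0.7 for k in range(n)]
dg = [e * d for e, d in zip(eps, df)]

# Differential subordination in Burkholder's sense: |dg_k| <= |df_k|.
assert all(abs(b) <= abs(a) for a, b in zip(df, dg))

# Continuous-time formulation for the embedded processes:
# [X,X]_t - [Y,Y]_t = sum_{k <= t} (df_k^2 - dg_k^2) should be
# nonnegative and nondecreasing in t.
diff = 0.0
gaps = []
for a, b in zip(df, dg):
    diff += a * a - b * b
    gaps.append(diff)
assert all(g >= 0 for g in gaps)
assert all(gaps[i] <= gaps[i + 1] for i in range(len(gaps) - 1))
print("differential subordination verified for the transform")
```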
To illustrate this notion, consider the following example. Suppose that $X$ is an $\mathcal{H}$-valued martingale, $H$ is a predictable process taking values in the interval $[-1,1]$, and let $Y$ be given as the stochastic integral $Y_t=H_0X_0+\int_{0+}^t H_s\,dX_s$, $t\geq 0$. Then $Y$ is differentially subordinate to $X$: we have
$$[X,X]_t-[Y,Y]_t=(1-H_0^2)\|X_0\|^2+\int_{0+}^t(1-H_s^2)\,d[X,X]_s.$$
Another example for stochastic integrals, which plays an important role in applications (see e.g., [2,3,16]), is the following. Suppose that $B$ is a Brownian motion in $\mathbb{R}^d$ and $H$, $K$ are predictable processes taking values in the matrices of dimensions $m\times d$ and $n\times d$, respectively. For any $t\geq 0$, define
$$X_t=\int_0^t H_s\,dB_s\quad\text{and}\quad Y_t=\int_0^t K_s\,dB_s.$$
If the Hilbert-Schmidt norms of $H$ and $K$ satisfy $\|K_t\|_{HS}\leq\|H_t\|_{HS}$ for all $t>0$, then $Y$ is differentially subordinate to $X$: this follows from the identity
$$[X,X]_t-[Y,Y]_t=\int_0^t\bigl(\|H_s\|_{HS}^2-\|K_s\|_{HS}^2\bigr)\,ds.$$
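In discretized form, the identity above reduces to checking the pointwise comparison of Hilbert-Schmidt norms. The following sketch (the particular matrices $H_s$, $K_s$ and the time grid are illustrative assumptions) evaluates the integrand on a grid and confirms that the resulting difference of brackets is nonnegative and nondecreasing:

```python
import math

def hs_norm_sq(mat):
    """Squared Hilbert-Schmidt norm: sum of squared entries."""
    return sum(x * x for row in mat for x in row)

# Illustrative integrands on a time grid: K_s is H_s with every entry
# scaled by a factor c(s), |c(s)| <= 1, forcing ||K_s||_HS <= ||H_s||_HS.
grid = [0.01 * i for i in range(1, 101)]

def H(s):   # a 2 x 3 matrix-valued process
    return [[math.sin(s), 1.0, s], [0.5, math.cos(s), -s]]

def K(s):
    c = math.tanh(s)          # |tanh(s)| <= 1
    return [[c * x for x in row] for row in H(s)]

# Integrand of [X,X]_t - [Y,Y]_t = int_0^t (||H||_HS^2 - ||K||_HS^2) ds.
integrand = [hs_norm_sq(H(s)) - hs_norm_sq(K(s)) for s in grid]
assert all(v >= 0 for v in integrand)

# Riemann sums of a nonnegative integrand are nondecreasing in t.
bracket_diff, partial = [], 0.0
for v in integrand:
    partial += v * 0.01
    bracket_diff.append(partial)
assert all(bracket_diff[i] <= bracket_diff[i + 1]
           for i in range(len(bracket_diff) - 1))
print("bracket difference is nonnegative and nondecreasing")
```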
The differential subordination implies many interesting inequalities comparing the sizes of $X$ and $Y$. A celebrated result of Burkholder gives the following information on the $L^p$-norms (see [8,10,12,13,23]).

Theorem 1.1 Suppose that $X$, $Y$ are Hilbert-space-valued local martingales such that $Y$ is differentially subordinate to $X$. Then
$$\|Y\|_p\leq(p^*-1)\,\|X\|_p,\qquad 1<p<\infty, \tag{1.1}$$
where $p^*=\max\{p,p/(p-1)\}$.
For $p=1$, the above moment inequality does not hold with any finite constant, but we have the corresponding weak-type (1, 1) estimate. In fact, we have the following result for a wider range of parameters $p$, proved by Burkholder [8] for $1\leq p\leq 2$ and Suh [22] for $p>2$. See also Wang [23].

Theorem 1.2 Suppose that $X$, $Y$ are Hilbert-space-valued local martingales such that $Y$ is differentially subordinate to $X$. Then
$$\mathbb{P}(Y^*\geq 1)\leq\frac{2}{\Gamma(p+1)}\,\|X\|_p^p,\qquad 1\leq p\leq 2,$$
$$\mathbb{P}(Y^*\geq 1)\leq\frac{p^{p-1}}{2}\,\|X\|_p^p,\qquad 2<p<\infty.$$
Both inequalities are sharp, even if $\mathcal{H}=\mathbb{R}$.
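As a quick numerical aside (not part of the paper), the two weak-type constants, namely Burkholder's $2/\Gamma(p+1)$ for $1\leq p\leq 2$ and Suh's $p^{p-1}/2$ for $p>2$, fit together continuously at $p=2$ and recover the classical weak-type (1,1) constant $2$ at $p=1$; this is easy to check directly:

```python
import math

def weak_type_constant(p):
    """Weak-type (p,p) constant under differential subordination:
    2/Gamma(p+1) for 1 <= p <= 2 (Burkholder), p^(p-1)/2 for p > 2 (Suh)."""
    if 1 <= p <= 2:
        return 2.0 / math.gamma(p + 1)
    return p ** (p - 1) / 2.0

# At p = 1 we recover the classical weak-type (1,1) constant 2.
assert abs(weak_type_constant(1.0) - 2.0) < 1e-12
# The two formulas match at the interface p = 2 (both give 1).
assert abs(2.0 / math.gamma(3.0) - 1.0) < 1e-12
assert abs(2.0 ** (2 - 1) / 2.0 - 1.0) < 1e-12
for p in (1.0, 1.5, 2.0, 3.0, 4.0):
    print(p, weak_type_constant(p))
```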
There are many other related results; see e.g., the papers [3] and [4] by Bañuelos and Wang, [11] and [13] by Burkholder, and consult the references therein. For more recent works, we refer the interested reader to the papers [18-20] by the author, and [6,7] by Borichev et al. The estimates have found numerous applications in many areas of mathematics, in particular, in the study of the boundedness of various classes of Fourier multipliers (consult, for instance, [1-3,12,16,17]).
There is a general method, invented by Burkholder, which not only enables one to establish various estimates for differentially subordinated martingales, but is also very efficient in determining the optimal constants in such inequalities. The idea is to construct an appropriate special function, an upper solution to a nonlinear problem corresponding to the inequality under investigation, and then to exploit its properties. See the survey [13] for a detailed description of the technique in the discrete-time setting and consult Wang [23] for the necessary changes which have to be implemented so that the method works in the continuous-time setting.
The above results can be extended in another, very interesting direction. Namely, in the present paper we will be interested in inequalities involving the maximal functions of $X$ and/or $Y$. Burkholder [14] modified his technique so that it could be used to study such inequalities for stochastic integrals, and applied it to obtain the following result, which can be regarded as another version of (1.1) for $p=1$.

Theorem 1.3 Suppose that $X$ is a real-valued martingale and $Y$ is the stochastic integral, with respect to $X$, of some predictable real-valued process $H$ taking values in $[-1,1]$. Then we have the sharp estimate (1.2).
As we have already observed above, if $X$ and $Y$ satisfy the assumptions of this theorem, then $Y$ is differentially subordinate to $X$. An appropriate modification of the proof in [14] shows that the assertion is still valid if we impose this less restrictive condition on the processes. However, the assertion does not hold any more if we pass from the real to the vector-valued case. Here is one of the main results of this paper.

Theorem 1.4 Suppose that $X$, $Y$ are Hilbert-space-valued local martingales such that $Y$ is differentially subordinate to $X$. Then
$$\|Y\|_1\leq\beta\,\|X^*\|_1. \tag{1.3}$$
The constant $\beta$ is the best possible, even for discrete-time martingales taking values in a two-dimensional subspace of $\mathcal{H}$.
This is a very surprising result. In most cases, the inequalities for stochastic integrals of real-valued martingales carry over, with unchanged constants, to the corresponding bounds for vector-valued local martingales satisfying differential subordination. In other words, given a sharp inequality for $\mathcal{H}$-valued differentially subordinated martingales, the extremal processes, i.e. those for which equality is (almost) attained, can usually be realized as stochastic integrals in which the integrator takes values in a one-dimensional subspace of $\mathcal{H}$. See e.g., the statements of Theorems 1.1 and 1.2. Here the situation is different: the optimal constant does depend on the dimension of the range of $X$ and $Y$.
Finally, let us mention here another related result. In general, the best constants in non-maximal inequalities for differentially subordinated local martingales do not change when we restrict ourselves to continuous-path processes; see e.g., Section 15 in [8] for the justification of this phenomenon. However, if we study the maximal estimates, the best constants may be different: for example, the passage to continuous-path local martingales reduces the constant in (1.2) to 2. Specifically, we have the following theorem, which is one of the principal results of [21].
Theorem 1.5 Assume that $X$, $Y$ are Hilbert-space-valued, continuous-path local martingales such that $Y$ is differentially subordinate to $X$. Then
Y  p
$$\|Y\|_p\leq(p-1)\,\|X^*\|_p,\qquad 2<p<\infty.$$
Both inequalities are sharp, even if $\mathcal{H}=\mathbb{R}$.
We have organized the paper as follows. The next section is devoted to an extension
of Burkholders method. In Sect. 3 we apply the technique to establish (1.3). In Sect. 4
we prove that the constant cannot be replaced in (1.3) by a smaller one. The final part
of the paper contains the proofs of technical facts needed in the earlier considerations.
2 On the Method of Proof
Burkholder's method from [14] is a powerful tool for proving maximal inequalities for transforms of discrete-time real-valued martingales. The results for the wider setting of stochastic integrals are then obtained by the use of approximation theorems of Bichteler [5]. This approach has the advantage that it avoids practically all the technicalities which arise naturally in the study of continuous-time processes. On the other hand, it does not allow one to study estimates for (local) martingales under differential subordination; the purpose of this section is to present a refinement of the method which can be used to handle such problems.
The general statement is the following. Let $V:\mathcal{H}\times\mathcal{H}\times[0,\infty)\times[0,\infty)\to\mathbb{R}$ be a given Borel function and suppose that we want to show the estimate
$$\mathbb{E}\,V(X_t,Y_t,X^*_t,Y^*_t)\leq 0 \tag{2.1}$$
for any $t\geq 0$ and any $\mathcal{H}$-valued local martingales $X$, $Y$ such that $Y$ is differentially subordinate to $X$. Due to some technical reasons, we shall deal with a slightly different, localized version of (2.1) (see Theorem 2.2 for the precise statement). Let $D=\mathcal{H}\times\mathcal{H}\times(0,\infty)\times(0,\infty)$. Introduce the class $\mathcal{U}(V)$, which consists of all $C^2$ functions $U:D\to\mathbb{R}$ satisfying (2.2)-(2.5) below: for any $(x,y,z,w)\in D$,
$$U(x,y,z,w)\leq 0\quad\text{if }\|x\|\leq z,\ \|y\|\leq\min\{\|x\|,w\}, \tag{2.2}$$
$$U(x,y,z,w)\geq V(x,y,z,w)\quad\text{if }\|x\|\leq z,\ \|y\|\leq w. \tag{2.3}$$
Furthermore, there is a locally bounded measurable function $c:D\to[0,\infty)$ such that for all $(x,y,z,w)\in D$ with $\|x\|\leq z$, $\|y\|\leq w$ and all $h,k\in\mathcal{H}$,
$$\langle U_{xx}(x,y,z,w)h,h\rangle+2\langle U_{xy}(x,y,z,w)h,k\rangle+\langle U_{yy}(x,y,z,w)k,k\rangle\leq c(x,y,z,w)\bigl(\|k\|^2-\|h\|^2\bigr), \tag{2.4}$$
and, for all $h,k\in\mathcal{H}$ with $\|k\|\leq\|h\|$,
$$U\bigl(x+h,\,y+k,\,\|x+h\|\vee z,\,\|y+k\|\vee w\bigr)\leq U(x,y,z,w)+\langle U_x(x,y,z,w),h\rangle+\langle U_y(x,y,z,w),k\rangle. \tag{2.5}$$
The latter condition implies that
$$U_z(x,y,z,w)\leq 0\quad\text{if }\|x\|=z\qquad\text{and}\qquad U_w(x,y,z,w)\leq 0\quad\text{if }\|y\|=w. \tag{2.6}$$
Recall the decomposition of the square bracket into its continuous and jump parts:
$$[X,X]_t=\|X_0\|^2+[X^c,X^c]_t+\sum_{0<s\leq t}\|\Delta X_s\|^2\qquad\text{for }t\geq 0.$$
Here $\Delta X_s=X_s-X_{s-}$ is the jump of $X$ at time $s$. Furthermore, $[X^c,X^c]=[X,X]^c$, the pathwise continuous part of $[X,X]$. Here is Lemma 1 of Wang [23].

Lemma 2.1 If $X$ and $Y$ are semimartingales, then $Y$ is differentially subordinate to $X$ if and only if $Y^c$ is differentially subordinate to $X^c$, $\|\Delta Y_t\|\leq\|\Delta X_t\|$ for all $t>0$, and $\|Y_0\|\leq\|X_0\|$.
We are ready to study the interplay between the class $\mathcal{U}(V)$ and the bound (2.1).

Theorem 2.2 Assume that $\mathcal{U}(V)$ is nonempty and $X$, $Y$ are Hilbert-space-valued local martingales such that $Y$ is differentially subordinate to $X$. Then there is a nondecreasing sequence $(\tau_N)_{N\geq 1}$ of stopping times such that $\lim_{N\to\infty}\tau_N=\infty$ and, for any $t\geq 0$,
$$\mathbb{E}\,V\bigl(X_{\tau_N\wedge t},\,Y_{\tau_N\wedge t},\,X^*_{\tau_N\wedge t},\,Y^*_{\tau_N\wedge t}\bigr)\leq 0. \tag{2.7}$$
Proof Let $(\sigma_n)_{n\geq 1}$ be the localizing sequence for $X$ and $Y$. Fix $t>0$, $\varepsilon>0$, $N\in\{1,2,\ldots\}$ and let
$$\tau_N=N\wedge\inf\bigl\{s>0:\|X_s\|+\|Y_s\|+\|X^c_s\|+\|Y^c_s\|\geq N\bigr\}.$$
For $0\leq s\leq t$ and $d\geq 1$, put
$$X^{(d)}_s=(X^1_s,X^2_s,\ldots,X^d_s,0,0,\ldots),\qquad Y^{(d)}_s=(Y^1_s,Y^2_s,\ldots,Y^d_s,0,0,\ldots)$$
and
$$Z^{(d)}_s=\bigl(X^{(d)}_s,\,Y^{(d)}_s,\,X^{(d)*}_s\vee\varepsilon,\,Y^{(d)*}_s\vee\varepsilon\bigr).$$
There is a sequence $(T_{N,j})_{j\geq 1}$ of stopping times with $T_{N,j}\uparrow\tau_N$, localizing the stochastic integrals $\int U_x(Z^{(d)}_{s-})\,dX^{(d)}_s$, $\int U_y(Z^{(d)}_{s-})\,dY^{(d)}_s$; in particular,
$$\mathbb{E}\int_{0+}^{T_{N,j}\wedge t}\bigl\langle U_x(Z^{(d)}_{s-}),dX^{(d)}_s\bigr\rangle=\mathbb{E}\int_{0+}^{T_{N,j}\wedge t}\bigl\langle U_y(Z^{(d)}_{s-}),dY^{(d)}_s\bigr\rangle=0. \tag{2.8}$$
Since $X^{(d)}$, $Y^{(d)}$ take values in a finite-dimensional subspace, we may apply Itô's formula to get
$$U\bigl(Z^{(d)}_{T_{N,j}\wedge t}\bigr)-U\bigl(Z^{(d)}_0\bigr)=I_1+I_2+I_3/2+I_4, \tag{2.9}$$
where
$$I_1=\int_{0+}^{T_{N,j}\wedge t}\bigl\langle U_x(Z^{(d)}_{s-}),dX^{(d)}_s\bigr\rangle+\int_{0+}^{T_{N,j}\wedge t}\bigl\langle U_y(Z^{(d)}_{s-}),dY^{(d)}_s\bigr\rangle,$$
$$I_2=\int_{0+}^{T_{N,j}\wedge t}U_z(Z^{(d)}_{s-})\,d\bigl(X^{(d)*}_s\vee\varepsilon\bigr)^c+\int_{0+}^{T_{N,j}\wedge t}U_w(Z^{(d)}_{s-})\,d\bigl(Y^{(d)*}_s\vee\varepsilon\bigr)^c,$$
$$I_3=\sum_{m,n=1}^{d}\int_{0+}^{T_{N,j}\wedge t}\Bigl(U_{x^mx^n}(Z^{(d)}_{s-})\,d[X^{mc},X^{nc}]_s+2U_{x^my^n}(Z^{(d)}_{s-})\,d[X^{mc},Y^{nc}]_s+U_{y^my^n}(Z^{(d)}_{s-})\,d[Y^{mc},Y^{nc}]_s\Bigr),$$
$$I_4=\sum_{0<s\leq T_{N,j}\wedge t}\Bigl[U(Z^{(d)}_s)-U(Z^{(d)}_{s-})-\bigl\langle U_x(Z^{(d)}_{s-}),\Delta X^{(d)}_s\bigr\rangle-\bigl\langle U_y(Z^{(d)}_{s-}),\Delta Y^{(d)}_s\bigr\rangle\Bigr].$$
Note that the integrals in $I_2$ are taken with respect to the continuous parts of the processes $X^{(d)*}\vee\varepsilon$ and $Y^{(d)*}\vee\varepsilon$; the jumps of these processes are contained in the summands of $I_4$.
The measure $d(X^{(d)*}\vee\varepsilon)^c_s$ is supported in the set $\{s:\|X^{(d)}_s\|=X^{(d)*}_s\}$, on which $U_z(Z^{(d)}_{s-})\leq 0$, as noted above. This gives that the first integral in $I_2$ is nonpositive; the second one is handled analogously. To deal with $I_3$, fix $0\leq s_0<s_1\leq t$. For any $\lambda\geq 0$, let $(\eta_i)_{1\leq i\leq i_\lambda}$ be a nondecreasing sequence of stopping times with $\eta_0=s_0$, $\eta_{i_\lambda}=s_1$ such that $\lim_{\lambda\to\infty}\max_{1\leq i\leq i_\lambda-1}|\eta_{i+1}-\eta_i|=0$. Keeping $\lambda$ fixed, we apply, for each $i=0,1,2,\ldots,i_\lambda$, the property (2.4) to $x=X^{(d)}_{s_0}$, $y=Y^{(d)}_{s_0}$, $z=X^{(d)*}_{s_0}\vee\varepsilon$, $w=Y^{(d)*}_{s_0}\vee\varepsilon$ and $h=h_i=X^{(d)c}_{T_{N,j}\wedge\eta_{i+1}}-X^{(d)c}_{T_{N,j}\wedge\eta_i}$, $k=k_i=Y^{(d)c}_{T_{N,j}\wedge\eta_{i+1}}-Y^{(d)c}_{T_{N,j}\wedge\eta_i}$. We sum the obtained $i_\lambda+1$ inequalities and let $\lambda\to\infty$. Using the notation $[S,T]_s^u=[S,T]_u-[S,T]_s$, we may write the result in the form
$$\sum_{m=1}^d\sum_{n=1}^d\Bigl(U_{x^mx^n}(Z^{(d)}_{s_0})\,[X^{mc},X^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}+2U_{x^my^n}(Z^{(d)}_{s_0})\,[X^{mc},Y^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}+U_{y^my^n}(Z^{(d)}_{s_0})\,[Y^{mc},Y^{nc}]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\Bigr)$$
$$\leq c\bigl(Z^{(d)}_{s_0}\bigr)\Bigl([Y^c,Y^c]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}-[X^c,X^c]_{T_{N,j}\wedge s_0}^{T_{N,j}\wedge s_1}\Bigr)\leq 0,$$
where in the last passage we have exploited the differential subordination of $Y^c$ to $X^c$. From the local boundedness of $c$ and the definition of $\tau_N$, we infer that on the set $\{T_{N,j}>0\}$ we have $I_3\leq 0$,
using a standard approximation of the integrals by discrete sums. Finally, we see that each summand in $I_4$ is nonpositive, directly from (2.5) and the fact that $\|\Delta Y_s\|\leq\|\Delta X_s\|$; see Lemma 2.1. Consequently, on the set $\{T_{N,j}>0\}$,
$$I_4\leq U\bigl(Z^{(d)}_{T_{N,j}\wedge t}\bigr)-U\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr)-\bigl\langle U_x\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr),\Delta X^{(d)}_{T_{N,j}\wedge t}\bigr\rangle-\bigl\langle U_y\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr),\Delta Y^{(d)}_{T_{N,j}\wedge t}\bigr\rangle.$$
Plug all the above estimates into (2.9) and take the expectation of both sides. By (2.8), the bound we obtain can be rewritten in the form
$$\mathbb{E}\Bigl[U\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr)-U\bigl(Z^{(d)}_0\bigr)+\bigl\langle U_x\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr),\Delta X^{(d)}_{T_{N,j}\wedge t}\bigr\rangle+\bigl\langle U_y\bigl(Z^{(d)}_{(T_{N,j}\wedge t)-}\bigr),\Delta Y^{(d)}_{T_{N,j}\wedge t}\bigr\rangle\Bigr]\mathbf{1}_{\{T_{N,j}>0\}}\leq 0. \tag{2.10}$$
For fixed $N$, the random variables $Z^{(d)}_{T_{N,j}\wedge t}$, $j\geq 1$, $d\geq 1$, are uniformly bounded on $\{\tau_N>0\}$, in view of the definition of $\tau_N$. Moreover, we have
$$\bigl\|\Delta X^{(d)}_{T_{N,j}\wedge t}\bigr\|=\bigl\|\Delta X^{(d)}_{T_{N,j}\wedge t}\bigr\|\mathbf{1}_{\{T_{N,j}=\tau_N\}}+\bigl\|\Delta X^{(d)}_{T_{N,j}\wedge t}\bigr\|\mathbf{1}_{\{T_{N,j}<\tau_N\}}\leq\bigl\|X_{\tau_N\wedge t}\bigr\|+2N$$
and, similarly, $\|\Delta Y^{(d)}_{T_{N,j}\wedge t}\|\leq\|Y_{\tau_N\wedge t}\|+2N$. The random variables $\|X_{\tau_N\wedge t}\|$ and $\|Y_{\tau_N\wedge t}\|$ are integrable on $\{\tau_N>0\}$, since $(\tau_N)_{N\geq 1}$ localizes $X$ and $Y$. Thus, if we let $j\to\infty$ and then $d\to\infty$ in (2.10), we obtain
$$\mathbb{E}\Bigl[U\bigl(Z_{(\tau_N\wedge t)-}\bigr)-U(Z_0)+\bigl\langle U_x\bigl(Z_{(\tau_N\wedge t)-}\bigr),\Delta X_{\tau_N\wedge t}\bigr\rangle+\bigl\langle U_y\bigl(Z_{(\tau_N\wedge t)-}\bigr),\Delta Y_{\tau_N\wedge t}\bigr\rangle\Bigr]\mathbf{1}_{\{\tau_N>0\}}\leq 0,$$
by Lebesgue's dominated convergence theorem. Here $Z_s=(X_s,Y_s,X^*_s\vee\varepsilon,Y^*_s\vee\varepsilon)$, $s\geq 0$. Apply (2.5) and then let $\varepsilon\to 0$ to get
$$\mathbb{E}\bigl[U(Z_{\tau_N\wedge t})-U(Z_0)\bigr]=\mathbb{E}\bigl[U(Z_{\tau_N\wedge t})-U(Z_0)\bigr]\mathbf{1}_{\{\tau_N>0\}}\leq 0.$$
It remains to use (2.2) and (2.3) to complete the proof.
Remark 2.3 A careful inspection of the proof of the above theorem shows that the function $U$ need not be given on the whole $D=\mathcal{H}\times\mathcal{H}\times(0,\infty)\times(0,\infty)$. Indeed, it suffices to define it on a certain neighborhood of the set $\{(x,y,z,w)\in D:\|x\|\leq z,\ \|y\|\leq w\}$ in which the process $Z$ takes its values. This can be further relaxed: if we are allowed to work with those $X$, $Y$ which are bounded away from $0$, then all we need is a $C^2$ function $U$ given on some neighborhood of $\{(x,y,z,w)\in D:0<\|x\|\leq z,\ 0<\|y\|\leq w\}$, satisfying (2.2)-(2.5) on this set.
3 The Special Function Corresponding to (1.3)
Now we apply the approach described in the previous section to establish (1.3). Let $V:D\to\mathbb{R}$ be given by $V(x,y,z,w)=\|y\|-\beta\,(\|x\|\vee z)$. Furthermore, put
U (x , y, z, w) = z
where $\Phi:[0,\infty)\to\mathbb{R}$ is given by
$$\Phi(t)=t-\log(1+t)-(2-\log 2).$$
We start with four technical lemmas, which will be proved in Sect. 5.
Lemma 3.1 (i) We have ( t ) ( 1) 0 for t 1.
(ii) We have ( t ) t for t 0.
(iii) For any c 0 the function
(2 log 2)s
is convex and nonincreasing.
(iv) For any c > 0, the function
f (s) = s c log 1 +
Lemma 3.2 The function $y\mapsto\Phi(\|y\|)$ is convex on $\mathcal{H}$.
Lemma 3.3 (i) For any $y,k\in\mathcal{H}$, we have
(2 log 2)(1
1 + k2) + (1
1 + k2) log( 1 + k2 + y + k)
1 + k2 log( 1 + k2) 0.
(ii) For any $y,k\in\mathcal{H}$ with $\|y\|+1\geq\sqrt{1+\|k\|^2}+\|k\|-\|y\|$ we have
(2log 2)(1
Lemma 3.4 Assume that $x,y,h,k\in\mathcal{H}$ and $z>0$ satisfy $\|x\|=z$, $\langle x,h\rangle\geq 0$ and $\|k\|\leq\|h\|$. Then
$$U\bigl(x+h,\,y+k,\,\|x+h\|\vee z,\,\|y+k\|\vee w\bigr)\leq U(x,y,z,w)$$
1
+ 1+
y, k x , h
Equipped with these four lemmas, we turn to the following statement.
Theorem 3.5 The function U belongs to the class U (V ).
Proof We check each of the conditions (2.2)-(2.5) separately.
The estimate (2.2): this follows immediately from the first part of Lemma 3.1.
The property (2.4): we derive that the left-hand side of the estimate equals
with $S=\|y\|^2-\|x\|^2+z^2$. The property follows.
The majorization (2.3): in particular, (2.4) implies that for any $h$ the function $t\mapsto U(x+th,y,z,w)$ is concave on $[t_-,t_+]$, where $t_-=\inf\{t:\|x+th\|\leq z\}$ and $t_+=\sup\{t:\|x+th\|\leq z\}$. Consequently, it suffices to verify (2.3) only for $(x,y,z,w)$ satisfying $\|x\|=z$. But this reduces to the second part of Lemma 3.1.
The condition (2.5): by homogeneity and continuity of both sides, we may assume that $z=1$ and $\|x\|<1$. Define
$$H(t)=U\bigl(x+th,\,y+tk,\,\|x+th\|\vee 1,\,\|y+tk\|\vee w\bigr)$$
for $t\in\mathbb{R}$ and let $t_-$, $t_+$ be as above; note that $t_-<0$ and $t_+>0$. By (2.4), $H$ is concave on $[t_-,t_+]$ and hence (2.5) holds if $\|x+h\|\leq 1$. Suppose then that $\|x+h\|>1$ or, in other words, that $t_+<1$. The vector $x'=x+t_+h$ satisfies $\langle x',h\rangle\geq 0$: this is equivalent to $\frac{d}{dt}\|x+th\|^2\big|_{t=t_+}\geq 0$. Hence, by (3.4), if we put $y'=y+t_+k$, then
$$U\bigl(x+h,\,y+k,\,\|x+h\|\vee 1,\,\|y+k\|\vee w\bigr)\leq H(t_+)+H'(t_+)(1-t_+)\leq H(0)+H'(0)t_++H'(0)(1-t_+)=H(0)+H'(0).$$
This is precisely the claim.
Proof of (1.3) It suffices to establish the estimate for $X^*\in L^1$, because otherwise there is nothing to prove. Furthermore, we may assume that $Y$ is bounded away from $0$. To see this, consider a new Hilbert space $\mathbb{R}\times\mathcal{H}$ and the martingales $(\varepsilon,X)$ and $(\varepsilon,Y)$, with $\varepsilon>0$. These martingales are bounded away from $0$ and $(\varepsilon,Y)$ is differentially subordinate to $(\varepsilon,X)$. Having proved (1.3) for these processes, we let $\varepsilon\to 0$ and get the bound for $X$ and $Y$, by Lebesgue's dominated convergence theorem.

We must show that for any bounded stopping time $\tau$ we have
$$\mathbb{E}\,\|Y_\tau\|\leq\beta\,\mathbb{E}\,X^*_\tau.$$
Now we make use of the methodology described in the previous section (in particular, we exploit Remark 2.3). Since $U\in\mathcal{U}(V)$, the above estimate follows immediately from (2.7), applied to the local martingales $(X_{\tau\wedge t})_{t\geq 0}$, $(Y_{\tau\wedge t})_{t\geq 0}$, and letting $N\to\infty$, $t\to\infty$ and $\varepsilon\to 0$.
4 Sharpness

The constant can be shown to be optimal in (1.3) by the use of appropriate examples, but then the calculations are quite involved. To simplify the proof, we use a different approach. Assume that the probability space is the interval $[0,1]$ equipped with its Borel subsets and Lebesgue measure. Suppose that there is $\beta_0\in(0,\infty)$ with the following property: for any discrete filtration $(\mathcal{F}_n)_{n\geq 0}$ and any adapted martingales $f$, $g$ taking values in $\mathbb{R}^2$ such that $g$ is differentially subordinate to $f$, we have
$$\mathbb{E}\,\|g_n\|\leq\beta_0\,\mathbb{E}\,f^*_n,\qquad n=0,1,2,\ldots. \tag{4.1}$$
We shall show that the validity of this estimate implies the existence of a certain special function, with properties similar to those in the definition of the class $\mathcal{U}(V)$. Then, by proper exploitation of these conditions, we shall deduce that $\beta_0\geq\beta$.
Recall that a sequence $(f_n)_{n\geq 0}$ is called simple if for any $n$ the term $f_n$ takes only a finite number of values and there is a deterministic $N$ such that $f_N=f_{N+1}=f_{N+2}=\ldots=f_\infty$. For any $(x,y)\in\mathbb{R}^2\times\mathbb{R}^2$, introduce the class $M(x,y)$ which consists of those simple martingale pairs $(f,g)$ with values in $\mathbb{R}^2\times\mathbb{R}^2$ which satisfy the following two conditions:
(i) $(f_0,g_0)\equiv(x,y)$,
(ii) for any $n\geq 1$ we have $\|dg_n\|\leq\|df_n\|$.
Here we also allow the filtration $(\mathcal{F}_n)_{n\geq 0}$ to vary. Let $W:\mathbb{R}^2\times\mathbb{R}^2\times(0,\infty)\to\mathbb{R}\cup\{\infty\}$ be given by the formula
$$W(x,y,z)=\sup\bigl\{\mathbb{E}\,\|g_\infty\|-\beta_0\,\mathbb{E}(f^*\vee z)\bigr\},$$
where the supremum is taken over all $(f,g)\in M(x,y)$.
Lemma 4.1 The function $W$ enjoys the following properties.
(i) $W$ is finite.
(ii) $W$ is homogeneous of order 1: for any $(x,y,z)$ and any $\lambda>0$, $W(\lambda x,\lambda y,\lambda z)=\lambda\,W(x,y,z)$.
(iii) For any $x$, $y$, the function $z\mapsto W(x,y,z)$ is nonincreasing.
(iv) $W(x,y,z)\geq\|y\|-\beta_0\,(\|x\|\vee z)$.
(v) For any $x$ and $z$, the function $y\mapsto W(x,y,z)$ is convex.
(vi) For any $(x,y,z)$ with $\|x\|\leq z$, any $h,k\in\mathbb{R}^2$ with $\|k\|\leq\|h\|$ and any $s,t>0$,
$$\frac{s}{s+t}\,W(x+th,\,y+tk,\,z)+\frac{t}{s+t}\,W(x-sh,\,y-sk,\,z)\leq W(x,y,z). \tag{4.2}$$
Proof (i) This follows from (4.1): for any $(f,g)\in M(x,y)$ the martingale $g-y=(g_n-y)_{n\geq 0}$ is differentially subordinate to $f$, so for any $z>0$,
$$\mathbb{E}\,\|g_\infty\|-\beta_0\,\mathbb{E}(f^*\vee z)\leq\|y\|+\mathbb{E}\,\|g_\infty-y\|-\beta_0\,\mathbb{E}\,f^*\leq\|y\|.$$
Taking the supremum over $(f,g)\in M(x,y)$ yields $W(x,y,z)\leq\|y\|<\infty$.
(ii) Use the fact that $(f,g)\in M(x,y)$ if and only if $(\lambda f,\lambda g)\in M(\lambda x,\lambda y)$.
(iii) This follows immediately from the very definition of $W$.
(iv) The constant pair $(f,g)\equiv(x,y)$ belongs to $M(x,y)$.
(v) Take any $x$, $y_1$, $y_2\in\mathbb{R}^2$, $\lambda\in(0,1)$ and let $y=\lambda y_1+(1-\lambda)y_2$. Pick $(f,g)\in M(x,y)$ and observe that $(f,g+y_i-y)\in M(x,y_i)$, $i=1,2$. Thus,
$$\mathbb{E}\,\|g_\infty\|-\beta_0\,\mathbb{E}(f^*\vee z)\leq\lambda\Bigl[\mathbb{E}\,\|g_\infty+y_1-y\|-\beta_0\,\mathbb{E}(f^*\vee z)\Bigr]+(1-\lambda)\Bigl[\mathbb{E}\,\|g_\infty+y_2-y\|-\beta_0\,\mathbb{E}(f^*\vee z)\Bigr].$$
Taking the supremum over $(f,g)\in M(x,y)$ gives the desired convexity.
(vi) This is a consequence of the so-called splicing argument of Burkholder (see e.g., [9, p. 77]). For the convenience of the reader, let us provide the easy proof. Pick $(f^+,g^+)\in M(x+th,y+tk)$ and $(f^-,g^-)\in M(x-sh,y-sk)$. These two pairs are spliced together into one pair $(f,g)$ as follows: set $(f_0,g_0)\equiv(x,y)$ and (recall that $\Omega=[0,1]$)
$$(f_n,g_n)(\omega)=\begin{cases}\bigl(f^+_{n-1},g^+_{n-1}\bigr)\Bigl(\dfrac{s+t}{s}\,\omega\Bigr)&\text{if }\omega\in\bigl[0,\tfrac{s}{s+t}\bigr),\\[6pt]\bigl(f^-_{n-1},g^-_{n-1}\bigr)\Bigl(\dfrac{s+t}{t}\Bigl(\omega-\dfrac{s}{s+t}\Bigr)\Bigr)&\text{if }\omega\in\bigl[\tfrac{s}{s+t},1\bigr],\end{cases}$$
for $n=1,2,\ldots$. It is not difficult to see that $(f,g)$ is a martingale pair with respect to its natural filtration. Furthermore, it is clear that this pair belongs to $M(x,y)$. Finally, since $\|x\|\leq z$, we have $f^*_n\vee z=\sup_{1\leq k\leq n}\|f_k\|\vee z$ for $n=1,2,\ldots$, and therefore
$$\mathbb{E}\,\|g_\infty\|-\beta_0\,\mathbb{E}(f^*\vee z)=\frac{s}{s+t}\Bigl[\mathbb{E}\,\|g^+_\infty\|-\beta_0\,\mathbb{E}\bigl((f^+)^*\vee z\bigr)\Bigr]+\frac{t}{s+t}\Bigl[\mathbb{E}\,\|g^-_\infty\|-\beta_0\,\mathbb{E}\bigl((f^-)^*\vee z\bigr)\Bigr].$$
Taking the supremum over the pairs $(f^+,g^+)$ and $(f^-,g^-)$ yields (4.2).
Introduce
$$\varphi(r)=\inf\bigl\{W(x,y,1):\|x\|=1,\ \|y\|=r\bigr\}.$$
We shall establish the following property of this object.

Lemma 4.2 For any $r>0$ and any $\delta>0$ we have
$$\varphi(r)\geq\delta\,\varphi(1)+\varphi\bigl(\sqrt{r^2+2\delta r+2\delta}\,\bigr). \tag{4.3}$$
Proof Fix $\delta>0$. Pick $(x,y,z)\in\mathbb{R}^2\times\mathbb{R}^2\times(0,\infty)$ satisfying $\|x\|=z=1$, $\|y\|=r$, and apply (4.2) with $h=x$, $k=-y/r$, $s=\delta$ and $t>0$. We obtain
$$W(x,y,1)\geq\frac{\delta(1+t)}{\delta+t}\,W\Bigl(x,\,\frac{1-t/r}{1+t}\,y,\,1\Bigr)+\frac{t}{\delta+t}\,W\bigl(x-\delta x,\,y+\delta y/r,\,1\bigr),$$
where we have used parts (ii) and (iii) of Lemma 4.1. By part (v) of that lemma, the function $s\mapsto W(x,sy,1)$, $s\in\mathbb{R}$, is continuous. Thus, if we let $t\to\infty$, we get
$$W(x,y,1)\geq\delta\,W(x,-y/r,1)+W\bigl(x-\delta x,\,y+\delta y/r,\,1\bigr)\geq\delta\,\varphi(1)+W\bigl(x-\delta x,\,y+\delta y/r,\,1\bigr). \tag{4.4}$$
Now we have come to the point where we use the fact that we are in the vector-valued setting. Namely, we pick a vector $d\in\mathbb{R}^2\setminus\{0\}$, orthogonal to $y+\delta y/r-(x-\delta x)$. Let $s,t>0$ be uniquely determined by the equalities $\|x-\delta x-sd\|=\|x-\delta x+td\|=1$. Then
$$\|y+\delta y/r-sd\|^2-\|x-\delta x-sd\|^2=\|y+\delta y/r\|^2-\|x-\delta x\|^2,$$
since, as we have assumed at the beginning, $\|y\|=r$ and $\|x\|=1$. In other words, we have $\|y+\delta y/r-sd\|=\sqrt{r^2+2\delta r+2\delta}$ and, similarly, $\|y+\delta y/r+td\|=\sqrt{r^2+2\delta r+2\delta}$. Therefore, if we apply (4.2) with $x:=x-\delta x$, $y:=y+\delta y/r$, $z=1$, $h=k=d$ and $s$, $t$ as above, and combine it with the definition of $\varphi$, we get
$$W\bigl(x-\delta x,\,y+\delta y/r,\,1\bigr)\geq\varphi\bigl(\sqrt{r^2+2\delta r+2\delta}\,\bigr).$$
Plugging this into (4.4) and taking the infimum over $x$, $y$ (satisfying $\|x\|=1$, $\|y\|=r$), we arrive at the estimate (4.3).
Now we are ready to prove that $\beta_0\geq\beta$; suppose on the contrary that this inequality does not hold. By induction, (4.3) yields
$$\varphi(1)\geq\varphi(r_n)+\varphi(1)\sum_{k=0}^{n-1}\frac{r_{k+1}^2-r_k^2}{2(1+r_k)},$$
where $1=r_0<r_1<\ldots<r_n$. Fix $t>1$, put $r_{k+1}^2-r_k^2=(t^2-1)/n$ (so that $r_n=t$) and let $n\to\infty$ to obtain
$$\varphi(1)\geq\varphi(t)+\varphi(1)\int_1^t\frac{r\,dr}{1+r}=\varphi(t)+\varphi(1)\Bigl(t-1-\log\frac{1+t}{2}\Bigr) \tag{4.5}$$
for all $t>1$.
Now we shall choose an appropriate $t$. We have $\varphi(1)<-1$; otherwise, we would let $t\to\infty$ and obtain a contradiction with the assumption $\beta_0<\beta$. Furthermore, $\varphi(1)\geq 1-\beta_0>-2$. Thus, the number $t$, determined by the equation $\varphi(1)=-\frac{1+t}{t}$, satisfies $t>1$. An application of (4.5) with this choice of $t$, combined with the bound $\varphi(t)\geq t-\beta_0$, gives
$$\beta_0\geq t+\frac{1+t}{t}\Bigl(2-t+\log\frac{1+t}{2}\Bigr).$$
It remains to note that for any $t>1$ the right-hand side is not smaller than $\beta$. This follows from a standard analysis of the derivative. The proof is complete.
5 Proofs of Technical Lemmas
Proof of Lemma 3.1 (i) We have
(1 + 1) < 0.
(ii) The claim is equivalent to ( t ) := ( t 2) t + 0 for all t 0. We
easily check that is convex on [0, ) and, by virtue of (1.4), satisfies () =
() = 0.
(t ) = (1 + 1)/(2(1 + t )) > 0 and ( 1) =
c
log 1 + s
and the expression in the square brackets is nonnegative: indeed, the function
$$x\mapsto\log(1+x)-(1+x)^{-1}+(1+x)^{-2},\qquad x\geq 0,$$
vanishes at $0$ and is nondecreasing.
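A numerical confirmation of this elementary claim (an aside, not part of the original argument; the grid is arbitrary), using the closed-form derivative $((1+x)^2+(1+x)-2)/(1+x)^3$:

```python
import math

def f(x):
    """Auxiliary function from the proof of Lemma 3.1 (iii)."""
    return math.log(1 + x) - 1 / (1 + x) + 1 / (1 + x) ** 2

# Vanishes at 0: log 1 - 1 + 1 = 0.
assert abs(f(0.0)) < 1e-15

# Nondecreasing on [0, infinity): check values on a grid and the sign
# of f'(x) = ((1+x)^2 + (1+x) - 2) / (1+x)^3, which is >= 0 for x >= 0.
xs = [0.01 * i for i in range(2000)]
vals = [f(x) for x in xs]
assert all(vals[i] <= vals[i + 1] + 1e-15 for i in range(len(vals) - 1))
deriv = [((1 + x) ** 2 + (1 + x) - 2) / (1 + x) ** 3 for x in xs]
assert all(d >= 0 for d in deriv)
print("auxiliary function vanishes at 0 and is nondecreasing")
```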
(iv) We compute that f (s) = [ 4(c + s)2s ]1 0.
Proof of Lemma 3.2 Pick $y_1,y_2\in\mathcal{H}$ and $\lambda\in(0,1)$. By the concavity of the logarithm, we have
$$\lambda\bigl(\|y_1\|-\log(1+\|y_1\|)\bigr)+(1-\lambda)\bigl(\|y_2\|-\log(1+\|y_2\|)\bigr)\geq\lambda\|y_1\|+(1-\lambda)\|y_2\|-\log\bigl(1+\lambda\|y_1\|+(1-\lambda)\|y_2\|\bigr).$$
This can be further bounded from below by
$$\|\lambda y_1+(1-\lambda)y_2\|-\log\bigl(1+\|\lambda y_1+(1-\lambda)y_2\|\bigr),$$
since $t\mapsto t-\log(1+t)$ is nondecreasing on $[0,\infty)$ and $\|\lambda y_1+(1-\lambda)y_2\|\leq\lambda\|y_1\|+(1-\lambda)\|y_2\|$.
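The mechanism of the proof, namely that an increasing convex function of the (convex) norm is convex, can be confirmed numerically for the prototypical composition $y\mapsto\|y\|-\log(1+\|y\|)$ on $\mathbb{R}^2$ (an illustrative aside; the random sample is arbitrary):

```python
import math
import random

random.seed(1)

def phi0(y):
    """phi0(y) = |y| - log(1 + |y|) for y in R^2: an increasing convex
    function of the Euclidean norm, hence convex."""
    r = math.hypot(y[0], y[1])
    return r - math.log(1 + r)

# Midpoint convexity on random pairs of points.
for _ in range(10000):
    y1 = (random.uniform(-5, 5), random.uniform(-5, 5))
    y2 = (random.uniform(-5, 5), random.uniform(-5, 5))
    mid = ((y1[0] + y2[0]) / 2, (y1[1] + y2[1]) / 2)
    assert phi0(mid) <= (phi0(y1) + phi0(y2)) / 2 + 1e-12
print("midpoint convexity verified on 10000 random pairs")
```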
Proof of Lemma 3.3 (i) This follows easily from the obvious estimates
1 + k2 + y + k log( 1 + k2)
1
1 + k2 log( 1 + k2).
(ii) For simplicity, we shall write $k$, $y$ instead of $\|k\|$, $\|y\|$, respectively. We consider two major cases.
Case I: Suppose that
$$\sqrt{1+k^2}\geq(2-\log 2)(1+y). \tag{5.1}$$
Then $\sqrt{1+k^2}\geq 2-\log 2$, or $k\geq k_0:=\sqrt{(2-\log 2)^2-1}$. In addition, $\frac{k-y}{\sqrt{1+k^2}}\leq 1$,
so using the convexity of the function $\zeta(s)=2s-\log(1+s)$, $s\geq 0$, which gives $\zeta(s)\leq(2-\log 2)\,s$ for $s\in[0,1]$, we have
$$2(k-y)-\sqrt{1+k^2}\,\log\Bigl(1+\frac{k-y}{\sqrt{1+k^2}}\Bigr)\leq(2-\log 2)(k-y).$$
Hence it suffices to prove that
$$(2-\log 2)\bigl(1+k-\sqrt{1+k^2}\bigr)\leq 1+y-\log(1+y)+(2-\log 2)\,y. \tag{5.2}$$
(1) If y 1, then the function
k (2 log 2)(1 + k
is nonincreasing: its derivative at k equals
(2 log 2) 1
$=-0.03\ldots<0$.
Thus, for $y\leq 1$ all we need is to check (5.2) for $k$ satisfying the equation $\sqrt{1+k^2}=(2-\log 2)(1+y)$. But then the estimate is equivalent to
$$(2-\log 2)\bigl(k-\sqrt{1+k^2}\bigr)\leq-\log(1+y)+(2-\log 2)\,y,$$
and the left-hand side is negative, while the right-hand side is nonnegative.
(2) If $y>1$, then by (5.1) we have $k\geq\sqrt{4(2-\log 2)^2-1}>2.4$. Consequently, the left-hand side of (5.2) is smaller than $2-\log 2$, while the right-hand side exceeds this quantity.
Case II: Now we assume that
$$\sqrt{1+k^2}<(2-\log 2)(1+y). \tag{5.3}$$
F (k) = (2 log 2)(1
1 + k2 log 1 +
We derive that $F'(k)=J_1+J_2$, where
J1 =
log 1 +
k y
y k y
J2 = 1 + y + 1 + k2 + k y
Since $\log(1+x)\geq x/(x+1)$ for $x>-1$, we have $J_1\geq 0$. Furthermore, using the assumption $\sqrt{k^2+1}+k-y\leq 1+y$, we get
=
(2 log 2)
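The elementary bound $\log(1+x)\geq x/(x+1)$, valid for all $x>-1$, underlies the treatment of $J_1$; a quick numerical check (an aside, on an arbitrary grid approaching $x=-1$):

```python
import math

# log(1+x) >= x/(x+1) for all x > -1, with equality only at x = 0.
# (Substituting u = 1 + x, this is the standard bound log u >= 1 - 1/u.)
xs = [-0.99 + 0.001 * i for i in range(6000)]   # covers (-1, 5]
for x in xs:
    assert math.log(1 + x) >= x / (x + 1) - 1e-12

# Equality at x = 0.
assert abs(math.log(1.0) - 0.0) < 1e-15
print("log(1+x) >= x/(x+1) verified on the grid")
```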
Proof of Lemma 3.4 Of course, we may assume that $h\neq 0$. Furthermore, by homogeneity, it suffices to verify the estimate for $z=1$. It is convenient to split the reasoning into three parts.
Step 1 First we shall show (3.4) in the case when x and h are linearly dependent.
Introduce the function G : [0, ) R given by
G(t ) = x + t h
We shall prove that this function is convex. To do this, fix $t_1,t_2\geq 0$, $\lambda_1,\lambda_2\in(0,1)$ with $\lambda_1+\lambda_2=1$, and let $t=\lambda_1t_1+\lambda_2t_2$. Using Lemma 3.2, we get
where in the third passage we have exploited the linear dependence of $x$, $h$ and the inequality $\langle x,h\rangle\geq 0$. Therefore, using the bound $\|k\|\leq\|h\|$ and Lemma 3.1 (i),
G(t ) G(1)
=
=
Step 2 Next we check (3.4) in the case when $x$ and $h$ are orthogonal. The inequality becomes
y + k
y log(1 + y) (2 log 2).
As a function of $\|h\|$, the left-hand side of the inequality is nonincreasing (see Lemma 3.1 (iii)), so it suffices to prove the bound for $\|h\|=\|k\|$. Fix $\|y\|$, $\|k\|$ and consider the left-hand side as a function $F$ of $\langle y,k\rangle$. This function is concave (Lemma 3.1 (iv)), and
F ( y, k ) =
1 + k2 + y + k 1 + y
Now, if $\|y\|+1>\sqrt{1+\|k\|^2}+\|k\|-\|y\|$, then $F'$ vanishes at $\langle y,k\rangle=(1+\|y\|)\bigl(1-\sqrt{1+\|k\|^2}\bigr)$ and hence it suffices to establish (5.4) for $y$ and $k$ satisfying this equation. A little calculation transforms the estimate into (3.2). On the other hand, if $\|y\|+1\leq\sqrt{1+\|k\|^2}+\|k\|-\|y\|$, then $F'$ is nonpositive on $[-\|y\|\|k\|,\|y\|\|k\|]$ and we need to verify (5.4) for $\langle y,k\rangle=-\|y\|\|k\|$. Then the bound reduces to (3.3).
Step 3 Finally, we treat (3.4) for general vectors. The bound is equivalent to
y + k
y + k x + h log 1 +
(2 log 2)x + h +
For fixed $\|x\|$, $\|y\|$, $\|h\|$ and $\|k\|$, the left-hand side, as a function of $\langle x,h\rangle$, is convex (see Lemma 3.1 (iii)) and hence it suffices to verify the estimate in the cases $\langle x,h\rangle\in\{\|x\|\|h\|,\,0\}$. These cases have been considered in Steps 1 and 2.
Acknowledgments This work was partially supported by Polish Ministry of Science and Higher Education (MNiSW) Grant N N201 397437. The author would like to thank the anonymous Referee for the careful reading of the first version of the paper and for the comments and remarks, which greatly improved the presentation.
Open Access This article is distributed under the terms of the Creative Commons Attribution License
which permits any use, distribution, and reproduction in any medium, provided the original author(s) and
the source are credited.