The Geometry of Random {-1,1}-Polytopes

Discrete & Computational Geometry, Jul 2005

S. Mendelson, A. Pajor, M. Rudelson

S. Mendelson: Centre for Mathematics and its Applications, Institute of Advanced Studies, The Australian National University, Canberra, ACT 0200, Australia

A. Pajor: Laboratoire d'Analyse et Mathématiques Appliquées, Université de Marne-la-Vallée, 5 boulevard Descartes, Champs sur Marne, 77454 Marne-la-Vallée Cedex 2, France, and Centre for Mathematics and its Applications, Institute of Advanced Studies, The Australian National University, Canberra, ACT 0200, Australia

M. Rudelson: Department of Mathematics, University of Missouri, Columbia, MO 65211, USA

∗ The research by the first author was supported in part by an Australian Research Council Discovery grant, that by the second author was supported in part by an Australian Research Council Fellowship, and the third author's research was supported in part by NSF Grant DMS-024380.

Random $\{-1,1\}$-polytopes demonstrate extremal behavior with respect to many geometric characteristics. We illustrate this by showing that the combinatorial dimension, entropy and Gelfand numbers of these polytopes are extremal at every scale of their arguments.

1. Introduction

The goal of this article is to investigate some geometric properties of $\{-1,1\}$-polytopes, which are symmetric convex hulls of subsets of the combinatorial cube $\{-1,1\}^n$. Formally, let $n \ge 1$ and $N \ge 1$ be integers. For any set $\{\omega_i : 1 \le i \le N\} \subset \{-1,1\}^n$, define

$$K_{n,N} = K_{n,N}(\omega_1,\dots,\omega_N) = \operatorname{conv}(\pm\omega_1,\dots,\pm\omega_N) = \operatorname{absconv}(\omega_1,\dots,\omega_N).$$

Our focus is on random $\{-1,1\}$-polytopes, where the randomness is generated by the uniform (counting) probability measure on $\{-1,1\}^n$. We say that a certain property is satisfied by a random $\{-1,1\}$-polytope if the set of polytopes $K_{n,N}$ satisfying this property has probability larger than $1 - c^n$, where $c \in (0,1)$ is a numerical constant which is independent of $n$ and $N$. Equivalently, one can consider the random structure at hand in the following manner.
Let $\xi$ be a symmetric $\{-1,1\}$-valued random variable and let $(\xi_{i,j})$, $1 \le i \le N$, $1 \le j \le n$, be independent copies of $\xi$. If $e_1,\dots,e_n$ denote the standard unit vectors, each $X_i = \sum_{j=1}^n \xi_{i,j}e_j$ is a random point in $\{-1,1\}^n$ and $K_{n,N} = \operatorname{absconv}(X_1,\dots,X_N)$.

Throughout this article, we denote by $\|\cdot\|$ the canonical Euclidean norm. The corresponding unit ball and its unit sphere are denoted by $B_2^n$ and $S^{n-1}$, respectively. For any Lebesgue measurable set $L \subset \mathbb{R}^n$, put $\operatorname{vol}(L)$ to be the volume of $L$ and for a set $T \subset \mathbb{R}^n$, let $\operatorname{absconv}(T)$ be its symmetric convex hull.

It is well known that random polytopes generated by random points on the sphere demonstrate extremal behavior with respect to many geometric characteristics (see for instance [23] and an extensive survey [12]). The investigation of the complexity of random $\{-1,1\}$-polytopes or, equivalently, 0/1-polytopes is more recent (see the survey [26]). For example, see [3] for the study of the number of facets and [9], where it is established that the volume of a random $\{-1,1\}$-polytope with $N$ vertices is the largest possible among all polytopes $K_{n,N}$.

The main results of this article show that this extremal behavior is true for three important geometric parameters: the combinatorial dimension, the entropy and the Gelfand numbers (defined below). All three parameters are scale-sensitive, and our results show that random polytopes are the "worst possible" among all polytopes $K_{n,N}$ at every scale of the parameter in question. Indeed, we show that the behavior in the random case matches the upper bounds that hold for any polytope $K_{n,N}$. The significance of such results is the fact that the parameters in question play a central role in Asymptotic Geometric Analysis, Empirical Processes theory and Nonparametric Statistics (see, e.g., [1], [8], [14], [16]–[18] and references therein), where they serve as a way of measuring the richness or the complexity of a given set.
Hence, our result is yet another indication that random polytopes are the "most complicated" in the class $K_{n,N}$.

Definition 1.1. Let $(Y,d)$ be a metric space and let $K \subset Y$. For every $\varepsilon > 0$, we define the covering number $N(K,\varepsilon,d)$ at scale $\varepsilon$ as the minimal number of balls of radius $\varepsilon$ (with respect to the metric $d$) needed to cover $K$.

Usually, we use the Euclidean metric in $\mathbb{R}^n$, in which case, for any $\varepsilon > 0$, we denote the covering number at scale $\varepsilon$ by $N(K,\varepsilon B_2^n)$, that is, the number of translates of the $n$-dimensional Euclidean ball of radius $\varepsilon$ needed to cover $K$. More generally, $N(A,B)$ is the number of translates of $B$ needed to cover $A$.

Definition 1.2. Let $(Y,d)$ be a metric space. A set is $\varepsilon$-separated with respect to a metric $d$ if the distance between every two distinct points in the set is larger than $\varepsilon$. We denote the maximal cardinality of an $\varepsilon$-separated subset of $Y$ by $D(Y,\varepsilon,d)$.

As for the covering numbers, when using the Euclidean metric on a set $K$ of $\mathbb{R}^n$, we denote by $D(K,\varepsilon B_2^n)$ the maximal cardinality of an $\varepsilon$-separated subset of $K$. It is easy to see that the cardinality of a maximal $\varepsilon$-separated subset of $Y$ is equivalent to the covering numbers of $Y$, namely, for every $\varepsilon > 0$,

$$N(Y,\varepsilon,d) \le D(Y,\varepsilon,d) \le N(Y,\varepsilon/2,d).$$

The second parameter we study is the combinatorial dimension, which measures the tradeoff between the size of a cube contained in a coordinate projection of a set $F$ and the dimension of the projection. This parameter was introduced independently by several authors, particularly in the context of empirical processes (see, for example, [17] and [22]).

Definition 1.3. Let $F$ be a set of functions $f : \Omega \to \mathbb{R}$. For every $\varepsilon > 0$, a set $\sigma = \{x_1,\dots,x_n\} \subset \Omega$ is said to be $\varepsilon$-shattered by $F$ if there is some function $s : \sigma \to \mathbb{R}$, such that for every $I \subset \{1,\dots,n\}$ there is some $f_I \in F$ for which $f_I(x_i) \ge s(x_i) + \varepsilon$ if $i \in I$, and $f_I(x_i) \le s(x_i) - \varepsilon$ if $i \notin I$.
Define the shattering dimension at scale $\varepsilon$ as

$$\mathrm{VC}(F,\Omega,\varepsilon) = \sup\{|\sigma| : \sigma \subset \Omega,\ \sigma \text{ is } \varepsilon\text{-shattered by } F\},$$

where $|\sigma|$ denotes the cardinality of $\sigma$. In cases where the underlying space is clear we denote the combinatorial dimension by $\mathrm{VC}(F,\varepsilon)$. If $F$ is $\{-1,1\}$-valued, we denote its combinatorial dimension by $\mathrm{VC}(F)$.

Observe that the combinatorial dimension is a scale-sensitive version of the Vapnik–Chervonenkis (VC) dimension [25], which is defined for subsets of the combinatorial cube as the largest dimension of a coordinate projection of $F$ which is the entire combinatorial cube of that dimension.

In our case the underlying space will always be the set of coordinates given by the standard unit basis $\{e_1,\dots,e_n\}$, and each vector in $\mathbb{R}^n$ is a function on this set in the natural way. Also, since we are only interested in convex symmetric sets (as $F = K_{n,N}$ is convex and symmetric), it is possible to take the level function $s \equiv 0$ (see, e.g., [13]). Hence, the combinatorial dimension of $K_{n,N}$ at scale $\varepsilon$ is simply the largest dimension of a subset $\sigma \subset \{1,\dots,n\}$ such that the coordinate projection $P_\sigma$ from $\mathbb{R}^n$ onto $\mathbb{R}^\sigma$ satisfies

$$\varepsilon B_\infty^{|\sigma|} \subset P_\sigma K_{n,N} = \{(k(i))_{i\in\sigma} : k \in K_{n,N}\},$$

where $B_\infty^d$ is the cube of dimension $d$.

Since our results only hold for a certain range of $N$ and $n$, we require the following assumption:

Assumption 1. $2n \le N \le 2^n$.

A result we use throughout this article was recently proved in [11], and shows that a random polytope contains the interpolation body generated by the cube and a "large" Euclidean ball.

Theorem 1.4. There exist absolute positive constants $c$, $c_1$ and $c_2$ for which the following holds. Let $n$ and $N$ be integers such that $n < N \le 2^n$ and let $\alpha = \alpha(N,n) = n/(N-n)$. For every $0 < \beta \le \frac{1}{2}$ one has

$$\Pr\left(K_{n,N} \supset C(\alpha)\left(\sqrt{\beta\log(2N/n)}\,B_2^n \cap B_\infty^n\right)\right) \ge 1 - \exp(-cn^\beta N^{1-\beta}),$$

where $C(\alpha) = c_1c_2^{\alpha}$.
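The cube-containment reformulation of the combinatorial dimension, and the containment asserted in Theorem 1.4, can be probed numerically through support functions. The following is a minimal sketch, not part of the original article: it relies on the standard duality fact that $tB_\infty^{|\sigma|} \subset P_\sigma K$ holds exactly when $t\|y\|_1 \le \max_i |\langle X_i, y\rangle|$ for every $y$ supported on $\sigma$. The function names (`sample_sign_points`, `support_function`, `cube_radius_upper_bound`) and the Gaussian sampling of directions are illustrative assumptions. Since only finitely many directions are sampled, the returned value is an upper bound on the largest admissible $t$; it can refute a containment but cannot certify one.

```python
import random


def sample_sign_points(num_points, dim, seed=0):
    """Draw num_points independent uniform points of {-1, 1}^dim (the X_i)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(num_points)]


def support_function(points, y):
    """h_K(y) = max_i |<X_i, y>| for K = absconv(points)."""
    return max(abs(sum(p_j * y_j for p_j, y_j in zip(p, y))) for p in points)


def cube_radius_upper_bound(points, sigma, trials=2000, seed=0):
    """Upper bound on the largest t with t * B_inf^sigma inside P_sigma K.

    Hypothetical helper: minimises h_K(y) / ||y||_1 over randomly sampled
    directions y supported on the coordinate set sigma.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    best = float("inf")
    for _ in range(trials):
        y = [0.0] * dim
        for j in sigma:
            y[j] = rng.gauss(0.0, 1.0)
        l1_norm = sum(abs(v) for v in y)
        if l1_norm == 0.0:
            continue  # degenerate sample, skip it
        best = min(best, support_function(points, y) / l1_norm)
    return best
```

For instance, `cube_radius_upper_bound(sample_sign_points(64, 8), sigma=range(4))` gives a sampled upper bound on the size of the largest cube contained in the projection of a random $K_{8,64}$ onto its first four coordinates.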
We mention that a similar result was obtained by Giannopoulos and Hartzoulaki [9], though for a slightly more restrictive range of $N$, namely, for $N \ge n\log^2 n$, and with a weaker probability estimate, only $1 - \exp(-cn)$. Observe that

$$C(\alpha)\left(\sqrt{\beta\log(2N/n)}\,B_2^n \cap B_\infty^n\right) \supset C(\alpha)\sqrt{\frac{\beta\log(2N/n)}{n}}\,B_\infty^n,$$

and, in particular, Theorem 1.4 implies that if Assumption 1 is satisfied and indeed $N \ge 2n$, then with probability at least $1 - \exp(-cn^\beta N^{1-\beta})$,

$$\operatorname{vol}^{1/n}(K_{n,N}) \ge c_1\sqrt{\frac{\beta\log(2N/n)}{n}} \tag{1}$$

for some absolute constants $c$ and $c_1$.

The article is organized as follows. The next section is devoted to the proof of some deterministic upper bounds on the entropy and the combinatorial dimension of symmetric convex hulls of subsets of cardinality $N$ of $\sqrt{n}\,S^{n-1}$; hence, these estimates hold true for any $\{-1,1\}$-polytope. In particular, we prove a complementary result to the Carl–Pajor theorem [6], by obtaining an entropy estimate for scales smaller than $c\sqrt{\log(N/n)}$. In Section 3 we show that both upper bounds are sharp, as they are attained by a random $\{-1,1\}$-polytope in both cases. We end the article by proving a similar result for Gelfand numbers (defined below).

Finally, a notational convention. Throughout, all absolute constants are positive numbers and are denoted by $c$, $C$, $K$ and $\kappa$. Their values may change from line to line, or even in the same line. We write $a \sim b$ if there are absolute positive constants $c$ and $C$ such that $ca \le b \le Ca$.

2. Deterministic Upper Bounds

The first deterministic upper bound we require is on the $\ell_2^n$ entropy of any $\{-1,1\}$-polytope, and was established in [6].

Theorem 2.1. There exist absolute positive constants $c_0$ and $c_1$ for which the following holds. Let $N \ge n$, let $T \subset \sqrt{n}\,S^{n-1}$ with $|T| \le N$ and put $K = \operatorname{absconv}(T)$. Then, for any $\varepsilon \ge c_0\sqrt{n/N}$,

$$\log N(K,\varepsilon B_2^n) \le c_1\frac{n}{\varepsilon^2}\log\left(\frac{c_1N\varepsilon^2}{n}\right).$$

A result of a similar flavor is a volumetric estimate on $K$, which was established independently in [2], [6] and [10].

Theorem 2.2.
There exists an absolute positive constant $c$ such that for any $K$ as above,

$$\operatorname{vol}(K)^{1/n} \le c\left(\frac{\log(cN/n)}{n}\right)^{1/2}.$$

An immediate corollary which follows from Theorem 2.2 is an estimate on the combinatorial dimension of any $\{-1,1\}$-polytope.

Corollary 2.3. There exists an absolute positive constant $C$ such that for any polytope $K_{n,N}$ and any $0 < \varepsilon \le 1$,

$$\mathrm{VC}(K_{n,N},\varepsilon) \le \min\left\{\frac{C\log(CN\varepsilon^2)}{\varepsilon^2},\ n\right\}.$$

Proof. Since a projection onto $k$ coordinates of a $\{-1,1\}$-polytope in $\mathbb{R}^n$ is a $\{-1,1\}$-polytope in $\mathbb{R}^k$, then by the volumetric estimate of Theorem 2.2, it is clear that a $k$-projection of such a polytope cannot contain $rB_\infty^k$ for $r$ larger than $c(\log(N/k)/k)^{1/2}$, from which the estimate easily follows.

It is evident from the formulation of Theorem 2.1 that it is not optimal for all scales of $\varepsilon$. The main result of this section is an entropy estimate for any polytope $K_{n,N}$ and $\varepsilon \le c\sqrt{\log(N/n)}$. This estimate will later be shown to be sharp.

Theorem 2.4. There exist absolute positive constants $c_0$ and $c_1$ for which the following holds. Let $T \subset \sqrt{n}\,S^{n-1}$ with $|T| \le N$ and set $K$ to be its symmetric convex hull. Then for any $\varepsilon \le \sqrt{\log(c_0N/n)}$,

$$\log N(K,\varepsilon B_2^n) \le n\log\left(\frac{c_1\sqrt{\log(c_1N/n)}}{\varepsilon}\right).$$

Before presenting the proof, we introduce some volumetric parameters of a convex body $K$ which are related to its mixed volumes (see [18] and [20]).

Definition 2.5. Let $K$ be a convex compact subset of $\mathbb{R}^n$. For every $1 \le d \le n$, set

$$w_d(K) = \left(\frac{1}{\operatorname{vol}(B_2^d)}\int_{G_{n,d}}\operatorname{vol}(P_EK)\,dE\right)^{1/d},$$

where $P_E$ is the orthogonal projection onto $E$ and $dE$ is the Haar probability measure on the Grassmann manifold of subspaces of dimension $d$ of $\mathbb{R}^n$. We also set $w_0(K) = 1$.

The well-known Alexandrov inequalities state that for $1 \le d \le n$, $w_d(K)$ is nonincreasing in $d$. For a symmetric convex body $K$, let $K^\circ = \{x : \langle x,y\rangle \le 1 \text{ for any } y \in K\}$, where $\langle x,y\rangle$ denotes the scalar product of vectors $x$ and $y$. Set

$$M^*(K) = \int_{S^{n-1}}\|x\|_{K^*}\,d\sigma,$$

where $\sigma$ is the Haar probability measure on the sphere and $\|\cdot\|_{K^*}$ is the norm for which $K^\circ$ is its unit ball.
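For $K = \operatorname{absconv}(T)$ with $T$ finite, the norm whose unit ball is $K^\circ$ is simply $\max_{t \in T}|\langle x,t\rangle|$, so $M^*(K)$ can be estimated by averaging this quantity over random points of the sphere. The sketch below is an illustration under that observation and is not taken from the article; the helper names (`random_unit_vector`, `dual_norm`, `estimate_m_star`), the sample size and the seed are assumptions.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform point on S^{dim-1}, obtained by normalising a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def dual_norm(points, x):
    """Norm with unit ball K° for K = absconv(points): max_i |<x, t_i>|."""
    return max(abs(sum(p_j * x_j for p_j, x_j in zip(p, x))) for p in points)


def estimate_m_star(points, samples=5000, seed=0):
    """Monte Carlo estimate of M*(K), the spherical average of the dual norm."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(samples):
        total += dual_norm(points, random_unit_vector(dim, rng))
    return total / samples
```

As a sanity check, for the four vertices of $\{-1,1\}^2$ the dual norm is $|x_1| + |x_2|$, whose average over the unit circle is $4/\pi$, and the estimate should land close to that value.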
It is easy to verify that $w_1(K) = M^*(K)$, and thus, for $1 \le d \le n$, $w_d(K) \le M^*(K)$ (see Chapter 9 of [18] or Chapter 6 of [20]). Finally, recall the Steiner–Minkowski formula (see [18] and [20]), that for any $t > 0$,

$$\frac{\operatorname{vol}(K + tB_2^n)}{\operatorname{vol}(B_2^n)} = \sum_{d=0}^{n}\binom{n}{d}t^{n-d}w_d^d(K). \tag{2}$$

Lemma 2.6. Let $T$ and $K$ be as in Theorem 2.4. Then for every $1 \le d \le n$,

$$w_d(K) \le c\sqrt{\log\left(\frac{cN}{d}\right)},$$

where $c$ is an absolute positive constant.

Proof. Fix $1 \le d \le n$ and for $u \ge 1$ set

$$\Omega_u = \left\{E \in G_{n,d} : u\sqrt{d} \le \sup_{t\in T}\|P_Et\| < (u+1)\sqrt{d}\right\}.$$

By a standard concentration argument for Lipschitz functions on the sphere and the connection between the Haar measure on the sphere and on the Grassmann manifold [16], there exists $\kappa > 0$ such that for every $d \ge \kappa\log N$ and $u \ge 1$, $\Pr(\Omega_{u+1}) \le \exp(-c_0u^2d)$. Applying Theorem 2.2, it is evident that if $T \subset \sqrt{d}\,B_2^d$ and $|T| \le N$ then $\operatorname{vol}(\operatorname{absconv}(T)) \le c^d(\log(cN/d)/d)^{d/2}$. Hence, if $E \in \Omega_u$ then $P_EK = \operatorname{absconv}(P_ET) \subset (u+1)\sqrt{d}\,B_2^d$, and

$$\int_{G_{n,d}}\operatorname{vol}(P_EK)\,dE \le \int\operatorname{vol}(P_EK)\,dE + \cdots$$

The claim now follows for $d \ge \kappa\log N$ because $\operatorname{vol}(B_2^d)^{1/d} \sim 1/\sqrt{d}$.

It is well known (see, for instance, Lemma 4.14 of [18]) that if $T \subset \sqrt{n}\,S^{n-1}$ and $|T| \le N$ then $M^*(K) \le c_2\sqrt{\log N}$, and since $w_d(K) \le M^*(K)$, then for $d \le \kappa\log N$,

$$w_d(K) \le M^*(K) \le c_2\sqrt{\log N} \le c_3\sqrt{\log\left(\frac{c_3N}{d}\right)}.$$

Proof of Theorem 2.4. It is standard to verify that if $A$ and $B$ are convex and symmetric sets in $\mathbb{R}^n$ and $B \subset A$ then $N(A,B) \le 3^n\operatorname{vol}(A)/\operatorname{vol}(B)$ (see [18]). In particular,

$$N(K,\varepsilon B_2^n) \le N(K + \varepsilon B_2^n, \varepsilon B_2^n) \le 3^n\,\frac{\operatorname{vol}(K + \varepsilon B_2^n)}{\operatorname{vol}(\varepsilon B_2^n)}.$$

By the Steiner–Minkowski formula (2) and the previous lemma,

$$\frac{\operatorname{vol}((1/\varepsilon)K + B_2^n)}{\operatorname{vol}(B_2^n)} = \sum_{d=0}^{n}\binom{n}{d}\left(\frac{w_d(K)}{\varepsilon}\right)^d \le \sum_{d=0}^{n}\binom{n}{d}\left(\frac{c^2}{\varepsilon^2}\log\frac{cN}{d}\right)^{d/2} = \sum_{d=0}^{n}\binom{n}{d}\rho_d,$$

where $\rho_d = ((c^2/\varepsilon^2)\log(cN/d))^{d/2}$. A straightforward computation shows that there exists an absolute positive constant $c_3$ such that if $\varepsilon \le \sqrt{\log(cN/n)}$, then for every $1 \le d \le n$ and every $N$ and $\varepsilon$,

$$\rho_d \le \left(\frac{c_3}{\varepsilon^2}\log\left(\frac{c_3N}{n}\right)\right)^{n/2}.$$
Hence, for some absolute constant $c_4$, we have

$$\log N(K,\varepsilon B_2^n) \le n\log\left(\frac{c_4\sqrt{\log(c_4N/n)}}{\varepsilon}\right),$$

as claimed.

It is convenient to use the terminology of the so-called $s$-numbers (see [18]). For a subset $K \subset \mathbb{R}^n$ and any $j \ge 1$, the $j$th Gelfand number is defined by

$$c_j(K) = \inf\left\{\max_{x\in K\cap E}\|x\| : E \text{ subspace of } \mathbb{R}^n,\ \operatorname{codim}(E) < j\right\}$$

and the $j$th entropy number is defined by

$$e_j(K) = \inf\{\varepsilon : N(K,\varepsilon B_2^n) \le 2^{j-1}\}.$$

Thus, the $k$th Gelfand number of a body is half of the smallest diameter of a $(k-1)$-codimensional section of $K$, and the entropy numbers are the discrete inverse of the logarithm of the covering numbers. Just like the upper bound on the entropy (and thus on $e_k$), one can prove the following upper estimate on the Gelfand numbers.

Theorem 2.7 [6]. There exists an absolute positive constant $c_0$ such that the following holds. Let $N \ge n$, let $T \subset \sqrt{n}\,B_2^n$ and put $K = \operatorname{absconv}(T)$. Then, for any $1 \le k \le n$,

$$c_k(K) \le c_0\min\left\{\sqrt{n},\ \left(\frac{n\log(2N/k)}{k}\right)^{1/2}\right\}.$$

3. Lower Bounds for Random Polytopes

We start by formulating and proving the lower bound on the combinatorial dimension of a random polytope.

Theorem 3.1. There exist absolute positive constants $c$ and $c_1$ for which the following holds. Let $n$ and $N$ be integers which satisfy Assumption 1. Then, for any $0 < \beta < \frac{1}{2}$ and $N \ge n$, with probability of at least $1 - \exp(-cn^\beta N^{1-\beta})$, for every $0 < \varepsilon < 1$,

$$\mathrm{VC}(K_{n,N},\varepsilon) \ge \min\left\{\frac{C_\beta\log(c_1N\varepsilon^2)}{\varepsilon^2},\ n\right\},$$

where $C_\beta$ depends only on $\beta$.

A well-known bound on the cardinality of subsets of the combinatorial cube is the Sauer–Shelah lemma [19], [21], [25].

Theorem 3.2. If $T \subset \{-1,1\}^n$ and $d = \mathrm{VC}(T)$, then

$$|T| \le \sum_{i=0}^{d}\binom{n}{i} \le \left(\frac{en}{d}\right)^d,$$

where the last inequality holds if $n \ge d$. In particular, if $|T| \ge 2^{\alpha n}$ then $\mathrm{VC}(T) \ge C_\alpha n$, where $C_\alpha$ depends only on $\alpha$.

Proof of Theorem 3.1. We prove a lower bound on the inverse function of the combinatorial dimension of a convex symmetric set $A$. For $1 \le d \le n$, let $f_A(d)$ be the largest $\varepsilon$ such that, for some $\sigma \subset \{1,\dots,n\}$ with $|\sigma| = d$,

$$\varepsilon B_\infty^d \subset P_\sigma A = \{(a(i))_{i\in\sigma} : a \in A\}.$$
Clearly, our claim will follow if we show that with high probability, for any $1 \le d \le n$, $f_{K_{n,N}}(d) \ge \min\{C_\beta\sqrt{\log(2N/d)/d},\ 1\}$.

First, suppose that $4d \le \log_2 N$ and divide the set $\{1,\dots,n\}$ into subsets of cardinality $2d$. Consider one of these subsets, say $J = \{1,\dots,2d\}$, and denote by $P_J$ the coordinate projection from $\mathbb{R}^n$ onto $\mathbb{R}^J$. Let $T_{n,N}$ be the set of vertices of $K_{n,N}$. Then

$$\Pr\left(\{|P_JT_{n,N}| \le 2^{2d-1}\}\right) \le \sum_{\ell=1}^{2^{2d-1}}\binom{2^{2d}}{\ell}\left(\frac{\ell}{2^{2d}}\right)^N \le 2^{2^{2d}}\cdot\left(\frac{1}{2}\right)^N.$$

Since $4d \le \log_2 N$, the last expression does not exceed $2^{-N/2}$. Note that the projections $P_JT_{n,N}$ are independent for disjoint subsets $J$, so the probability that all such projections contain at most $2^{2d-1}$ distinct elements is at most $2^{-(n/2d)N/2}$. Assume now that the projection on at least one subset $J$ contains more than $2^{2d-1}$ elements. By the Sauer–Shelah lemma, $\mathrm{VC}(P_JT_{n,N}) \ge d$ and thus $\mathrm{VC}(T_{n,N}) \ge d$. Therefore, when $4d \le \log_2 N$, we have $f_{K_{n,N}}(d) \ge 1$ with probability higher than $1 - 2^{-(n/2d)N/2} \ge 1 - \exp(-cn^\beta N^{1-\beta})$ for some absolute constant $c$.

Next, fix $d \ge \log_2 N$ and thus $2^d \ge N \ge 2n \ge 2d$. Again, we divide $\{1,\dots,n\}$ into disjoint subsets with $d$ elements, and since the coordinate projections onto these subsets are "independent" random $K_{d,N}$ polytopes, then by Theorem 1.4 at least one of these polytopes contains a cube of size $C\sqrt{\beta\log(2N/d)/d}$ with probability greater than $1 - \exp(-c(n/d)d^\beta N^{1-\beta}) \ge 1 - \exp(-cn^\beta N^{1-\beta})$. Hence, with that probability, $f_{K_{n,N}}(d) \ge C\sqrt{\beta\log(2N/d)/d}$.

Since the function $f_{K_{n,N}}$ is non-increasing, for $\frac{1}{4}\log_2 N \le d < \log_2 N$ and $1 \le d \le n$, we have

$$f_{K_{n,N}}(d) \ge f_{K_{n,N}}(\log_2 N) \ge c \ge C\sqrt{\frac{\beta\log(2N/d)}{d}}$$

with probability at least $1 - \exp(-cn^\beta N^{1-\beta})$.

Theorem 3.1 can be used to resolve the following question. It was shown in [15] that there are absolute positive constants $c$ and $C$ such that for any class $F$ of functions bounded by 1,

$$\mathrm{VC}(\operatorname{conv}(F),\varepsilon) \le C\cdot\frac{\mathrm{VC}(F,c\varepsilon)}{\varepsilon^2}.$$

Theorem 3.3. There exist absolute positive constants $C$ and $c$ for which the following holds.
For every $0 < \varepsilon < \frac{1}{2}$ there is a class $F_\varepsilon$ of functions bounded by 1 such that

$$\mathrm{VC}(\operatorname{conv}(F_\varepsilon),\varepsilon) \ge C\cdot\frac{\mathrm{VC}(F_\varepsilon,c\varepsilon)}{\varepsilon^2\log(1/\varepsilon)}.$$

Now, one can remove the logarithmic factor and construct a set for which the lower bound matches the upper one for "most" values of $\varepsilon$.

Theorem 3.4. There exist absolute positive constants $c_1$ and $c_2$ for which the following holds. Let $T$ be a random subset of $\{-1,1\}^n$ with $2n$ elements and set $F = T \cup -T$. Then, with probability at least $1 - \exp(-c_1n)$, for any $\gamma < \frac{1}{2}$ and $\varepsilon \ge c_2/n^\gamma$,

$$\mathrm{VC}(\operatorname{conv}(F),\varepsilon) \ge c_3(\gamma)\cdot\frac{\mathrm{VC}(F,\varepsilon)}{\varepsilon^2},$$

where $c_3(\gamma)$ depends only on $\gamma$.

Proof. Since $F$ consists of $\{-1,1\}$-valued functions (on the coordinates $\{e_1,\dots,e_n\}$), then for any $\varepsilon > 0$, $\mathrm{VC}(F,\varepsilon) \le c\log n$. On the other hand, by Theorem 3.1 for $\beta = \frac{1}{2}$, with probability at least $1 - \exp(-cn^\beta N^{1-\beta}) \ge 1 - \exp(-cn)$, for any $\gamma < \frac{1}{2}$ and $\varepsilon \ge c_2/n^\gamma$,

$$\mathrm{VC}(\operatorname{conv}(F),\varepsilon) \ge \frac{c}{\varepsilon^2}\log(cn\varepsilon^2) \ge \frac{c'(\gamma)}{\varepsilon^2}\log n \ge c_3(\gamma)\,\frac{\mathrm{VC}(F,\varepsilon)}{\varepsilon^2}.$$

Next, we turn to the question of entropy. We will show that at a scale below $c\sqrt{\log(N/n)}$, a lower bound on the entropy follows from the fact that $K_{n,N}$ contains the interpolation body $\alpha B_2^n \cap B_\infty^n$ for an appropriate value of $\alpha$, and thus must have a large entropy. However, for larger scales, one needs an additional argument in order to construct a large separated subset in $K_{n,N}$.

Theorem 3.5. There exist absolute positive constants $C$, $\kappa$, $c$, $c_1$ and $c_2$ for which the following holds. For any $\kappa\sqrt{\log(N/n)} \le \varepsilon \le C\sqrt{n}$, with probability at least $1 - \exp(-cn)$,

$$\log D(K_{n,N},\varepsilon B_2^n) \ge c_1\frac{n}{\varepsilon^2}\log\left(\frac{c_2N\varepsilon^2}{n}\right).$$

Lemma 3.6. Let $0 < \lambda \le \frac{1}{2}$ and for every integer $N$ fix $m \le N/2$. Let $B(N,m)$ be the family of subsets of $\{1,\dots,N\}$ of cardinality $m$. Then there exists a subset $P \subset B(N,m)$ which satisfies that $\log|P| \ge (1-\lambda)m\log(c\lambda(N/m))$ and if $I, J \in P$ and $I \ne J$ then $|I \,\triangle\, J| \ge \lambda m$. In other words,

$$\log D(B(N,m),\lambda m,d_H) \ge (1-\lambda)m\log\left(\frac{c\lambda N}{m}\right),$$

where $d_H$ is the Hamming metric (that is, $d_H(I,J) = |I \,\triangle\, J|$).

Proof. Without loss of generality, assume that $\lambda m$ is an integer.
Pick any subset $I$ of cardinality $m$ of $\{1,\dots,N\}$ and throw away all subsets $J$ of size $m$ such that $|I \,\triangle\, J| \le \lambda m$. There are at most

$$\sum_{k=(1-\lambda)m}^{m}\binom{m}{k}\binom{N-m}{m-k} \le 2^m\max_{(1-\lambda)m\le k\le m}\binom{N-m}{m-k} \le 2^m\binom{N}{\lambda m}$$

such subsets, since $m \le N/2$. Now, select a new subset of size $m$ from the remaining subsets. Repeating this argument, we obtain a family $P$ of subsets of size $m$ which are $\lambda m$-separated in the Hamming metric and with cardinality larger than

$$\frac{\binom{N}{m}}{2^m\binom{N}{\lambda m}} \ge \frac{(N/m)^m}{2^m(Ne/\lambda m)^{\lambda m}},$$

which concludes the proof.

Next, we use the following formulation of Bernstein's inequality:

Theorem 3.7 [4], [24]. Let $Z_1,\dots,Z_n$ be independent random variables with zero mean, such that for every $i$ and every $k \ge 2$, $E|Z_i|^k \le k!\,M^{k-2}v_i/2$. Then, for any $v \ge \sum_{i=1}^{n}v_i$ and any $u > 0$,

$$\Pr\left(\left|\sum_{i=1}^{n}Z_i\right| > u\right) \le 2\exp\left(-\frac{u^2}{2(v + uM)}\right).$$

One can formulate Theorem 3.7 using the $\psi_1$ norm of the random variable $Z$. Recall that $\|Z\|_{\psi_1} = \inf\{b > 0 : E\exp(|Z|/b) \le 2\}$. Random variables with a bounded $\psi_1$ norm display an exponential tail (see, for example, [24]) and the sum of independent copies of such a variable is highly concentrated. Indeed, it is easy to see that if $E\exp(|Z|/b) \le 2$, that is, if $\|Z\|_{\psi_1} \le b$, then $\sum_{k=1}^{\infty}E|Z|^k/(b^kk!) \le 2$. Hence, if $Z_i$ are distributed as $Z$, the assumptions of Theorem 3.7 are satisfied for $M = \|Z\|_{\psi_1}$ and $v = 4n\|Z\|_{\psi_1}^2$, implying the concentration estimates (3) and (4) […].

As an example, consider $Z_i = (\sum_{j=1}^{l}\xi_{i,j})^2 - l$ where, as before, $(\xi_{i,j})$ are independent, symmetric, $\{-1,1\}$-valued random variables. It is easily verified that $E\exp(Z_i/l) \le 2$, and thus (3) is satisfied with $\|Z\|_{\psi_1} \le l$.

Proof of Theorem 3.5. Let $m \le N/2$, to be defined later, and set $P$ as in Lemma 3.6 for $\lambda = \frac{1}{2}$. Let $X_i = \sum_{j=1}^{n}\xi_{i,j}e_j$ and define the random vectors $Y_I = (1/m)\sum_{i\in I}X_i$. Thus, each $X_i$ is a random point in $\{-1,1\}^n$ and $Y_I$ is a convex combination of points $X_i$ out of the set $\{X_i,\ 1 \le i \le N\}$.
If $I, J \in P$ and $I \ne J$ then

$$Y_I - Y_J = \frac{1}{m}\left(\sum_{i\in I\setminus J}X_i - \sum_{i\in J\setminus I}X_i\right).$$

Since the random variables $\xi_{i,j}$ are symmetric the same holds for each $X_i$, implying that $Y_I - Y_J$ has the same distribution as $(1/m)\sum_{i\in I\triangle J}X_i$. Thus, $(m^2/n)\cdot\|Y_I - Y_J\|_{\ell_2^n}^2$ has the same distribution as $(1/n)\sum_{j=1}^{n}(\sum_{i\in I\triangle J}\xi_{i,j})^2$. Note that this random variable is highly concentrated. Indeed, setting $Z_j = (\sum_{i\in I\triangle J}\xi_{i,j})^2$, it is easy to see that $\|Z_j\|_{\psi_1} \le |I \,\triangle\, J| \le m$. Hence, by (3),

$$\Pr\left(\left|\|Y_I - Y_J\|_{\ell_2^n}^2 - E\|Y_I - Y_J\|_{\ell_2^n}^2\right| > \frac{un}{m^2}\right) = \Pr\left(\left|\frac{1}{n}\sum_{j=1}^{n}(Z_j - EZ_j)\right| > u\right).$$

Since $E\|Y_I - Y_J\|_{\ell_2^n}^2 = n|I \,\triangle\, J|/m^2 \ge \lambda n/m = n/2m$, then applying (4) with $u = m/4$ it follows that […] for some absolute constant $c_0$. Moreover, by (4), for any $t > 0$,

$$\Pr\left(\left\{\|Y_I - Y_J\|_{\ell_2^n} \ge (1 + 2t)E\|Y_I - Y_J\|_{\ell_2^n}\right\}\right) \le 2\exp(-c_0nt),$$

and by a standard integration argument all the $L_p$ norms of $\|Y_I - Y_J\|_{\ell_2^n}$ are equivalent to the $L_1$ norm with a constant depending only on $p$. In particular,

$$E\|Y_I - Y_J\|_{\ell_2^n} \ge c\left(E\|Y_I - Y_J\|_{\ell_2^n}^2\right)^{1/2} \ge c_1\sqrt{\frac{n}{m}}.$$

Therefore, with probability at least $1 - 2\exp(-c_0n)$, $\|Y_I - Y_J\|_{\ell_2^n} \ge c_2\sqrt{n/m}$.

Set $m = c_2^2n/\varepsilon^2$ and $\kappa = c_2/\sqrt{\log 2}$. Fix $\varepsilon \ge \kappa\sqrt{\log(N/n)}$, and thus $m \le n \le N/2$ as required in Lemma 3.6. Also, […] and thus $2\log|P| \le c_0n/8$. Hence, for every such $\varepsilon$, with probability at least $1 - \exp(-c_0n/4)$, for every distinct $I, J \in P$, $\|Y_I - Y_J\|_{\ell_2^n} \ge \varepsilon$, implying that $K_{n,N}$ contains an $\varepsilon$-separated set whose cardinality satisfies that […].

To handle scales below $\kappa\sqrt{\log(N/n)}$, we prove the following:

Lemma 3.8. Let $\kappa$, $N$ and $n$ be as in Theorem 3.5. There exist absolute positive constants $c$, $c_1$, $c_2$ and $c_3$ for which the following holds. For any $\varepsilon \le \min\{\kappa\sqrt{\log(N/n)},\ c\sqrt{n}\}$, with probability at least $1 - \exp(-c_1N^{1/2}n^{1/2})$,

$$\log D(K_{n,N},\varepsilon B_2^n) \ge c_2n\log\left(\frac{c_3\sqrt{\log(N/n)}}{\varepsilon}\right).$$

Observe that the constant $\kappa$ appearing in the restriction $\varepsilon \ge \kappa\sqrt{\log(N/n)}$ is of no particular significance, and we could have chosen to use any other absolute constant. Indeed, this follows from the fact that the cardinality of an $\varepsilon$-separated set is monotone in the scale and since the estimates of Theorem 3.5 and of Lemma 3.8 coincide for $\varepsilon \sim \sqrt{\log(N/n)}$.

Proof.
Recall that for any two convex, symmetric bodies $A$ and $B$ in $\mathbb{R}^n$, the covering number $N(A,B)$ satisfies that $N(A,B) \ge \operatorname{vol}(A)/\operatorname{vol}(B)$. Hence, if we apply the volumetric estimate (1), which holds for a random $\{-1,1\}$-polytope, it is evident that with probability $1 - \exp(-c_1N^{1/2}n^{1/2})$,

$$D(K_{n,N},\varepsilon B_2^n) \ge N(K_{n,N},\varepsilon B_2^n) \ge \frac{\operatorname{vol}(K_{n,N})}{\operatorname{vol}(\varepsilon B_2^n)} \ge \left(\frac{c_2\sqrt{\log(2N/n)}}{\varepsilon}\right)^n.$$

Corollary 3.9. There exist absolute positive constants $c_i$, $0 \le i \le 4$, and $\kappa$ such that if $n$ and $N$ satisfy Assumption 1, and if we set

$$H(\varepsilon) = c_3n\cdot\begin{cases}\log\left(\dfrac{\sqrt{\log(2N/n)}}{\varepsilon}\right) & \text{if } \varepsilon \le \kappa\sqrt{\log(N/n)},\\[2ex] \dfrac{1}{\varepsilon^2}\log\left(\dfrac{c_4N\varepsilon^2}{n}\right) & \text{if } \kappa\sqrt{\log(N/n)} \le \varepsilon \le \sqrt{n},\end{cases}$$

then with probability at least $1 - \exp(-c_0n)$, for any $c_1\exp(-\exp(c_2n)) \le \varepsilon \le \sqrt{n}$, $\log D(K_{n,N},\varepsilon B_2^n) \ge H(\varepsilon)$.

Proof. By the previous results it is evident that for any fixed $0 \le \varepsilon < \sqrt{n}$, with probability at least $1 - \exp(-cn)$, $\log D(K_{n,N},\varepsilon B_2^n) \ge H(\varepsilon)$. Fix $\varepsilon_0 = \exp(-\exp(cn))$ and $k = \exp(c'n/2)$, and let $\varepsilon_i = 2^i\varepsilon_0$ for $0 \le i \le k$. Then, with probability at least $1 - \exp(-c'n)$, $\log D(K_{n,N},\varepsilon_iB_2^n) \ge H(\varepsilon_i)$, which implies that with the same order of probability, for any $\varepsilon \in [\varepsilon_0,\sqrt{n}]$, $\log D(K_{n,N},\varepsilon B_2^n) \ge cH(\varepsilon)$ for a suitable constant $c$.

We conclude by applying Theorem 3.5 to obtain a lower estimate on the Gelfand numbers of a random $K_{n,N}$. Recall that the upper estimate holds for any polytope $K_{n,N}$ and was established in [6].

Theorem 3.10. There exist absolute positive constants $c_1$, $c_2$ and $c_3$ for which the following holds. For any $1 \le k \le n$, with probability at least $1 - \exp(-c_1n)$,

$$c_2\min\left\{1,\ \left(\frac{\log(2N/k)}{k}\right)^{1/2}\right\} \le \frac{c_k(K_{n,N})}{\sqrt{n}} \le c_3\min\left\{1,\ \left(\frac{\log(2N/k)}{k}\right)^{1/2}\right\}.$$

Before presenting the proof we recall the following application of a general inequality from [5].

Lemma 3.11. There exists an absolute constant $\rho$ such that for any symmetric convex body $K \subset \mathbb{R}^n$ and $1 \le k \le n$,

$$\sup_{1\le j\le k}j\,e_j(K) \le \rho\sup_{1\le j\le k}j\,c_j(K). \tag{5}$$

Observe that in terms of entropy numbers, Theorem 3.5 states that there exist absolute constants $c_1$ and $c_2$ such that, for any $1 \le k \le n$, with probability at least $1 - \exp(-c_1n)$, one has

$$e_k(K_{n,N}) \ge c_2\min\left\{\sqrt{n},\ \left(\frac{n\log(2N/k)}{k}\right)^{1/2}\right\}. \tag{6}$$

Proof of Theorem 3.10. To prove the lower estimate we can assume that $k \ge k_0 = c\log N$. Indeed, if $k < k_0$, then $c_k(K_{n,N}) \ge c_{k_0}(K_{n,N})$, while for $k = k_0$ the minimum in Theorem 3.10 is a constant. Fix $k$ in that range and let $\alpha$ be a parameter larger than 1, to be defined later. From reformulation (6) of Theorem 3.5 and from (5),

$$c_3\left(n\alpha k\log\frac{2N}{\alpha k}\right)^{1/2} \le \alpha k\,e_{\alpha k}(K_{n,N}) \le \rho\sup_{1\le j\le\alpha k}j\,c_j(K_{n,N}) \tag{7}$$

for some absolute constant $c_3$. Clearly, one has

$$\sup_{1\le j\le\alpha k}j\,c_j(K_{n,N}) \le \sup_{1\le j<k}j\,c_j(K_{n,N}) + \sup_{k\le j\le\alpha k}j\,c_j(K_{n,N}).$$

Applying the upper bound of Theorem 2.7 for the first term on the right-hand side, it is evident that

$$\sup_{1\le j\le k}j\,c_j(K_{n,N}) \le c_4\left(nk\log\frac{2N}{k}\right)^{1/2}. \tag{8}$$

Since for all $j \ge k$, $c_j(K_{n,N}) \le c_k(K_{n,N})$, then

$$\alpha k\,c_k(K_{n,N}) \ge \sup_{k\le j\le\alpha k}j\,c_j(K_{n,N}),$$

and combining this with (7) and (8) implies

$$c_3\left(n\alpha k\log\frac{2N}{\alpha k}\right)^{1/2} - c_4\rho\left(nk\log\frac{2N}{k}\right)^{1/2} \le \rho\,\alpha k\,c_k(K_{n,N}).$$

To conclude, it is evident that one can choose $\alpha$ such that the term on the left-hand side is larger than $c_4\rho\,(nk\log(2N/k))^{1/2}$.

References

1. M. Anthony, P.L. Bartlett, Neural Network Learning, Cambridge University Press, Cambridge, 1999.
2. I. Bárány, Z. Füredy, Approximation of the sphere by polytopes having few vertices, Proc. Amer. Math. Soc. 102(3), 651-659, 1988.
3. I. Bárány, A. Pór, On 0-1 polytopes with many facets, Adv. Math. 161, 209-228, 2001.
4. G. Bennett, Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc. 57, 33-45, 1962.
5. B. Carl, Inequalities of Bernstein-Jackson type and the degree of compactness of operators in Banach spaces, Ann. Inst. Fourier 35, 79-118, 1985.
6. B. Carl, A. Pajor, Gelfand numbers of operators with values in a Hilbert space, Invent. Math. 94, 479-504, 1988.
7. R.M. Dudley, Universal Donsker classes and metric entropy, Ann. Probab. 15, 1306-1326, 1987.
8. R.M. Dudley, Uniform Central Limit Theorems, Cambridge Studies in Advanced Mathematics 63, Cambridge University Press, Cambridge, 1999.
9. A. Giannopoulos, M. Hartzoulaki, Random spaces generated by vertices of the cube, Discrete Comput. Geom. 28, 255-273, 2002.
10. E.D. Gluskin, Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces (in Russian), Mat. Sb. (N.S.) 136(178), no. 1, 85-96, 1988; translation in Math. USSR-Sb. 64(1), 85-96, 1989.
11. A.E. Litvak, A. Pajor, M. Rudelson, N. Tomczak-Jaegermann, Smallest singular value of random matrices and geometry of random polytopes, Adv. Math., to appear.
12. P. Mankiewicz, N. Tomczak-Jaegermann, Quotients of finite-dimensional Banach spaces; random phenomena, in Handbook of the Geometry of Banach Spaces, Vol. 2, pp. 1201-1246, North-Holland, Amsterdam, 2003.
13. S. Mendelson, G. Schechtman, The shattering dimension of sets of linear functionals, Ann. Probab. 32(3A), 1746-1770, 2004.
14. S. Mendelson, R. Vershynin, Entropy and the combinatorial dimension, Invent. Math. 152(1), 37-55, 2003.
15. S. Mendelson, R. Vershynin, Remarks on the geometry of coordinate projections in $\mathbb{R}^n$, Israel J. Math. 140, 203-220, 2004.
16. V.D. Milman, G. Schechtman, Asymptotic Theory of Finite Dimensional Normed Spaces, Lecture Notes in Mathematics 1200, Springer-Verlag, Berlin, 1986.
17. A. Pajor, Sous-espaces $\ell_1^n$ des espaces de Banach, Hermann, Paris, 1985.
18. G. Pisier, The Volume of Convex Bodies and Banach Space Geometry, Cambridge University Press, Cambridge, 1989.
19. N. Sauer, On the density of families of sets, J. Combin. Theory Ser. A 13, 145-147, 1972.
20. R. Schneider, Convex Bodies: the Brunn-Minkowski Theory, Cambridge University Press, Cambridge, 1993.
21. S. Shelah, A combinatorial problem: stability and orders for models and theories in infinitary languages, Pacific J. Math. 41, 247-261, 1972.
22. M. Talagrand, The Glivenko-Cantelli problem, Ann. Probab. 6, 837-870, 1987.
23. N. Tomczak-Jaegermann, Banach-Mazur Distances and Finite-Dimensional Operator Ideals, Pitman Monographs and Surveys in Pure and Applied Mathematics 38, Longman, Harlow; co-published in the United States with Wiley, New York, 1989.
24. A.W. Van der Vaart, J.A. Wellner, Weak Convergence and Empirical Processes, Springer-Verlag, Berlin, 1996.
25. V.N. Vapnik, A.Ya. Chervonenkis, Necessary and sufficient conditions for uniform convergence of means to mathematical expectations, Theory Probab. Appl. 26(3), 532-553, 1971.
26. G.M. Ziegler, Lectures on 0/1 polytopes, in Polytopes-Combinatorics and Computation (G. Kalai and G.M. Ziegler, eds.), pp. 1-44, DMV Seminars, Birkhäuser, Basel, 2000.


This is a preview of a remote PDF: https://link.springer.com/content/pdf/10.1007%2Fs00454-005-1186-y.pdf

S. Mendelson, A. Pajor, M. Rudelson. The Geometry of Random {-1,1}-Polytopes, Discrete & Computational Geometry, 2005, 365-379, DOI: 10.1007/s00454-005-1186-y