Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications

Discrete & Computational Geometry, Sep 2009

Let A be an n×N real-valued matrix with n<N; we count the number of k-faces \(f_k(AQ)\) when Q is either the standard N-dimensional hypercube \(I^N\) or else the positive orthant \(\mathbb{R}_+^N\). To state results simply, consider a proportional-growth asymptotic, where for fixed δ, ρ in (0,1), we have a sequence of matrices \(A_{n,N_{n}}\) and of integers \(k_n\) with \(n/N_n \to \delta\) and \(k_n/n \to \rho\) as \(n \to \infty\). If each matrix \(A_{n,N_{n}}\) has its columns in general position, then \(f_k(AI^N)/f_k(I^N)\) tends to zero or one depending on whether \(\rho > \max(0, 2-\delta^{-1})\) or \(\rho < \max(0, 2-\delta^{-1})\). Also, if each \(A_{n,N_{n}}\) is a random draw from a distribution which is invariant under right multiplication by signed permutations, then \(f_k(A\mathbb{R}_+^N)/f_k(\mathbb{R}_+^N)\) tends almost surely to zero or one depending on whether \(\rho > \max(0, 2-\delta^{-1})\) or \(\rho < \max(0, 2-\delta^{-1})\). We make a variety of contrasts to related work on projections of the simplex and/or cross-polytope. These geometric face-counting results have implications for signal processing, information theory, inverse problems, and optimization. Indeed, face counting is related to conditions for uniqueness of solutions of underdetermined systems of linear equations. Below, let A be a fixed n×N matrix, n<N, with columns in general position.

(a) Call a vector in \(\mathbb{R}_+^N\) k-sparse if it has at most k nonzeros. For such a k-sparse vector \(x_0\), \(b = Ax_0\) generates an underdetermined system b = Ax having a k-sparse solution. Among inequality-constrained systems Ax = b, x ≥ 0, having k-sparse solutions, the fraction having a unique nonnegative solution is \(f_k(A\mathbb{R}_+^N)/f_k(\mathbb{R}_+^N)\).

(b) Call a vector in the hypercube \(I^N\) k-simple if all entries except at most k are at the bounds 0 or 1. For such a k-simple vector \(x_0\), \(b = Ax_0\) generates an underdetermined system b = Ax with a k-simple solution. Among inequality-constrained systems Ax = b, \(x \in I^N\), having k-simple solutions, the fraction having a unique hypercube-constrained solution is \(f_k(AI^N)/f_k(I^N)\).



https://link.springer.com/content/pdf/10.1007%2Fs00454-009-9221-z.pdf


David L. Donoho · Jared Tanner

D.L. Donoho: Department of Statistics, Stanford University, Stanford, CA, USA
J. Tanner: School of Mathematics, University of Edinburgh, Edinburgh, UK

… for hosting the programme “Statistical Challenges of High Dimensional Data” in 2008 and Professor D.M. Titterington for organizing this programme. D.L. Donoho acknowledges support from NSF DMS 05-05303 and a Rothschild Visiting Professorship at the University of Cambridge. J. Tanner acknowledges support from the Alfred P. Sloan Foundation and thanks John E. and Marva M. Warnock for their generous support in the form of an endowed chair.
1 Introduction

There are three fundamental regular polytopes in \(\mathbb{R}^N\), \(N \ge 5\): the hypercube \(I^N\), the cross-polytope \(C^N\), and the simplex \(T^{N-1}\). For each of these, projecting the vertices into \(\mathbb{R}^n\), \(n < N\), yields the vertices of a new polytope; in fact, up to translation and dilation, every polytope in \(\mathbb{R}^n\) is obtained by rotating the simplex \(T^{N-1}\) and orthogonally projecting on the first n coordinates, for some choice of N and of N-dimensional rotation. Similarly, every centrosymmetric polytope can be generated by projecting the cross-polytope, and every zonotope by projecting the hypercube.

1.1 Random Polytopes

Choosing the projection A at random has become popular. Let A be a random orthogonal projection obtained by first applying a uniformly distributed rotation to \(\mathbb{R}^N\) and then projecting on the first n coordinates. Let Q be a polytope in \(\mathbb{R}^N\). Then AQ is a random polytope in \(\mathbb{R}^n\).
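Such a projector can be simulated without forming a full N × N rotation. The Python sketch below is our own illustration, not code from the paper, and the function names are hypothetical: it applies Gram–Schmidt to n i.i.d. standard Gaussian vectors in \(\mathbb{R}^N\). Because the Gaussian distribution is rotation invariant, the resulting orthonormal rows are distributed like the first n rows of a uniformly random rotation, which is the projection described above.

```python
import math
import random


def random_orthoprojector(n, N, rng=random):
    """Return an n x N matrix (list of rows) with orthonormal rows.

    Sketch: modified Gram-Schmidt on n i.i.d. standard Gaussian vectors in R^N.
    Rotation invariance of the Gaussian makes the row space uniformly
    distributed, i.e. "rotate uniformly, keep the first n coordinates".
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(N)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            c = sum(a * b for a, b in zip(v, r))
            v = [a - c * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # degenerate draws occur with probability zero
            rows.append([a / norm for a in v])
    return rows


def apply(A, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


if __name__ == "__main__":
    rng = random.Random(0)
    A = random_orthoprojector(3, 6, rng)
    gram = [[round(sum(a * b for a, b in zip(r, s)), 10) for s in A] for r in A]
    print(gram)  # approximately the 3 x 3 identity
    print(apply(A, [1, 0, 1, 1, 0, 0]))  # image of one vertex of [0, 1]^6
```

Mapping all \(2^N\) vertices of a hypercube through `apply` gives points whose convex hull is the random zonotope \(AI^N\); the same function can be used on the vertices of the simplex or cross-polytope.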
Taking Q in turn from each of the three families of regular polytopes, we get three arenas for scholarly study:
• Random polytopes of the form \(AT^{N-1}\) were first studied by Affentranger and Schneider [2] and by Vershik and Sporyshev [25];
• Random polytopes of the form \(AC^N\) were first studied extensively by Böröczky and Henk [6];
• The random zonotope \(AI^N\) was studied in passing in [6] and will be heavily studied in this paper; a literature on zonotopes can be found in [3, 5, 20, 23, 29].

Starting with [2, 25], interest has focused on the number \(f_k(AQ)\) of k-faces of such random polytopes AQ; in those papers, fundamental formulas were developed for the expected values \(E f_k(AQ)\). Deriving insights from these formulae in the high-dimensional case has also been an important theme; Böröczky and Henk [6] studied the expected number \(E f_k(AQ)\) for each of these families of random polytopes, focusing on the asymptotic framework where the small dimension n is held fixed while the large dimension N → ∞ (this was previously done for \(Q = T^{N-1}\) in [2]). Vershik and Sporyshev [25] studied the case \(AT^{N-1}\) in an asymptotic framework with the dimensions N and n both proportionally large, and observed a phenomenon of sharp thresholds: random polytopes can have face lattices undergoing abrupt changes in properties as dimensions change relatively slightly. Our own previous work considered both \(AT^{N-1}\) and \(AC^N\) [10, 12, 14, 16] and gave precise information about several such threshold phenomena.

To make precise the notion of “threshold phenomenon,” consider the following proportional-dimensional asymptotic framework. A dimension specifier is a triple of integers (k, n, N), representing a “face” dimension k, a “small” dimension n, and a “large” dimension N; k < n < N. For fixed δ, ρ ∈ (0, 1), consider sequences of dimension specifiers, indexed by n, and obeying
\[ k_n/n \to \rho \quad \text{and} \quad n/N_n \to \delta \quad \text{as } n \to \infty. \tag{1.1} \]
For such sequences, the small dimension n is held proportional to the large dimension N as both dimensions grow. We omit subscripts on \(k_n\) and \(N_n\) when possible. For \(Q = T^{N-1}, C^N\), the papers [10, 12, 14, 16] exhibited thresholds \(\rho_W(\delta; Q)\) for the ratio between the expected number of faces of the low-dimensional polytope AQ and the number of faces of the high-dimensional polytope Q:
\[ \lim_{n\to\infty} \frac{E f_k(AQ)}{f_k(Q)} = \begin{cases} 1, & \rho < \rho_W(\delta; Q), \\ <1, & \rho > \rho_W(\delta; Q). \end{cases} \tag{1.2} \]
(In this relation, we take a limit as n → ∞ along some sequence obeying the proportional-dimensional constraint (1.1).) In words, the random object AQ has roughly as many k-faces as its generator Q for k below a threshold and has noticeably fewer k-faces than Q for k above the threshold. The threshold functions are defined in terms of Gaussian integrals and other special functions, and can be calculated numerically.

1.2 Random Zonotopes

Missing from the above picture is information about the third family of regular polytopes, the hypercube. Böröczky and Henk [6] mentioned in passing the case of the projected hypercube, in the case of A a random orthogonal projection. Böröczky and Henk largely worked in the asymptotic framework n fixed and N → ∞. In that framework the threshold phenomenon of interest here is not visible. In this paper, we adopt the proportional-dimensional framework (1.1) and prove the following.

Theorem 1.1 (“Weak” Threshold for Hypercube) Define
\[ \rho_W(\delta; I) := \max\{0,\ 2 - \delta^{-1}\}, \qquad 0 < \delta < 1. \tag{1.3} \]
For ρ, δ in (0, 1), consider a sequence of dimension specifiers (k, n, N) obeying (1.1). Consider a sequence of real-valued n × N matrices \(A = A_{n,N}\), each one with columns in general position in \(\mathbb{R}^n\). Then
\[ \lim_{n\to\infty} \frac{f_k(AI^N)}{f_k(I^N)} = \begin{cases} 1, & \rho < \rho_W(\delta; I), \\ 0, & \rho > \rho_W(\delta; I). \end{cases} \tag{1.4} \]

Remarks
• Use of the modifier “weak” and the subscript W on ρ matches corresponding usage with \(T^{N-1}\) and \(C^N\).
• The result shows a sharp discontinuity in the behavior of the face lattices of random zonotopes; the location of the threshold is precisely identified. Such discontinuity is also observed empirically for the other two polytopes in (1.2) above; to our knowledge, a proof of discontinuity has not yet been published in that setting.
• The result is universal across matrices; only general position is required. Universality of threshold effects across a range of matrix ensembles has been observed empirically for the other two regular polytopes [17]. However, theoretical results [1] for other polytopes do not yet match empirical facts. This result gives a rigorous universality result for one family of regular polytopes; this may inspire studies to see if parallel results exist for the others.

We briefly discuss the ideas leading to this result. Böröczky and Henk [6] applied a fundamental identity of Affentranger and Schneider [2] on general projected polytopes and gave the explicit expression
\[ E f_k(AI^N) = 2\binom{N}{k} \sum_{\ell=N-n}^{N-k-1} \binom{N-k-1}{\ell}, \tag{1.6} \]
valid where A is a uniformly distributed random orthoprojector. In a previous version of this manuscript [15], the authors proved that the same formula holds much more generally, in fact under the assumption that A has an orthant-symmetric nullspace in general position. One of our referees pointed out that even more is true: for any A in general position, \(f_k(AI^N)\) is the fixed number
\[ f_k(AI^N) = 2\binom{N}{k} \sum_{\ell=N-n}^{N-k-1} \binom{N-k-1}{\ell}. \]
This fact follows from Theorem 1.7 [27] on partitions of n-space by hyperplanes, as we show below in Sect. 2.1. Formula (1.6) appears to be known to workers on oriented matroids ([4, p. 220]) but may not seem evident to workers on convex polytopes. The recent survey article “What is known about unit cubes” [29] states that “no good bound for . . . [\(f_k(AI^N)\)] is known”; however, see [21, p. 410a].

1.3 Random Cones

Convex cones provide another family of fundamental polyhedral sets.
Amongst these, the simplest and most natural is surely the positive orthant \(P = \mathbb{R}^N_+\). The image K = AP of a cone under a projection \(A: \mathbb{R}^N \to \mathbb{R}^n\) is again a cone. Such a cone may be expected to have \(f_0(K) = 1\) vertex (at 0), as many as \(f_1(K) = N\) extreme rays, and so on. In fact, every pointed cone in \(\mathbb{R}^n\) can be generated as a (nonorthogonal) projection of the positive orthant under an appropriate projection from an appropriate \(\mathbb{R}^N\). There seems to be relatively little prior research on random projections of the positive orthant, except for the special case k = n, which was studied by Buchta [8]. As with the polytope models, surprising threshold phenomena can arise when the projector is random, and we work in the proportional-dimensional framework. The following result makes use of the notion of a random matrix with centrosymmetric exchangeable columns; for details, see Sect. 2.2 below.

Theorem 1.2 (“Weak” Threshold for Orthant) Let A be a random matrix with centrosymmetric exchangeable columns which are in general position almost surely. In the proportional-dimensional framework (1.1), we have
\[ \lim_{n\to\infty} \frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \begin{cases} 1, & \rho < \rho_W(\delta; \mathbb{R}_+), \\ 0, & \rho > \rho_W(\delta; \mathbb{R}_+), \end{cases} \tag{1.7} \]
with \(\rho_W(\delta; \mathbb{R}_+) \equiv \rho_W(\delta; I)\) as defined in (1.3). Here the threshold for the orthant is at precisely the same place as it was for the hypercube.

1.4 Exact Equality in the Number of Faces

Our focus in Sects. 1.1–1.3 was on “weak” agreement of \(E f_k(AQ)\) with \(f_k(Q)\); in the proportional-dimensional framework, for ρ below threshold \(\rho_W(\delta; Q)\), we have limiting relative equality:
\[ E f_k(AQ)/f_k(Q) \to 1, \qquad n \to \infty. \]
We now focus on “strong” agreement; it turns out that in the proportional-dimensional framework, for ρ below a somewhat lower threshold \(\rho_S(\delta; Q)\), we actually have exact equality with overwhelming probability:
\[ \mathrm{Prob}\{f_k(Q) = f_k(AQ)\} \to 1, \qquad n \to \infty. \tag{1.8} \]
The existence of such “strong” thresholds for \(Q = T^{N-1}\) and \(Q = C^N\) was proven in [10, 12], which exhibited thresholds \(\rho_S(\delta; Q)\) below which (1.8) occurs.
These “strong thresholds” and the previously mentioned “weak thresholds” (1.2) are depicted in Fig. 3. A similar strong threshold exists for the projected orthant.

Theorem 1.3 (“Strong” Threshold for Orthant) Let
\[ H(\gamma) := \gamma \log(1/\gamma) - (1-\gamma)\log(1-\gamma) \tag{1.9} \]
denote the usual (base-e) Shannon entropy. Let
\[ \psi_S^{\mathbb{R}_+}(\delta, \rho) := H(\delta) + \delta H(\rho) - (1 - \rho\delta)\log 2. \tag{1.10} \]
For δ ≥ 1/2, let \(\rho_S(\delta; \mathbb{R}_+)\) denote the zero crossing of \(\psi_S^{\mathbb{R}_+}(\delta, \rho)\). In the proportional-dimensional framework (1.1) with \(\rho < \rho_S(\delta; \mathbb{R}_+)\),
\[ \mathrm{Prob}\{f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)\} \to 1 \quad \text{as } n \to \infty. \tag{1.11} \]
The thresholds \(\rho_W(\delta; Q)\) for \(Q = \mathbb{R}^N_+\) and \(I^N\), and \(\rho_S(\delta; \mathbb{R}_+)\), are depicted in Fig. 1.

In contrast, the hypercube offers no phenomenon like (1.8).

Theorem 1.4 (Zonotope Vertices) Let A be an n × N matrix with n < N. Then \(f_k(AI^N) < f_k(I^N)\).

Proof of Theorem 1.4 \(f_k(AI^N)\) attains its maximum when A is in general position, and in this case Theorem 1.8 gives the exact value of \(f_k(I^N) - f_k(AI^N)\), a value which is strictly positive when n < N.

1.5 Exact Nonasymptotic Results

We have so far emphasized the Vershik–Sporyshev proportional-dimensional asymptotic framework; this makes for the most natural comparisons between results for the three families of regular polytopes. However, for the positive orthant and hypercube, much more can be said than for the other two polytopes, as there are simple exact expressions for finite N. Moreover, these expressions can be derived from two beautiful results in geometric probability: Wendel's Theorem and Theorem 1.7.

Theorem 1.5 (Wendel [26]) Let M points in \(\mathbb{R}^m\) be drawn i.i.d. from a centrosymmetric distribution such that the points are in general position. Then the probability that all the points fall in some half-space is
\[ P_{m,M} = 2^{-M+1} \sum_{\ell=0}^{m-1} \binom{M-1}{\ell}. \tag{1.12} \]
Wendel's elegant result is often regarded as merely a piece of recreational mathematics. Our original submission [15] obtained from it a simple proof of the following identity.
Theorem 1.6 Let A be an n × N random matrix with centrosymmetric exchangeable columns in general position almost surely. Then
\[ \frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = 1 - P_{N-n,N-k}. \tag{1.13} \]
In this revision, the result derives from Theorem 1.7.¹

Theorem 1.7 (Winder [27]; Cover [9]) A set of M hyperplanes in general position in \(\mathbb{R}^m\), all passing through some common point, divides the space into \(2^M P_{m,M}\) regions.

Theorem 1.8 Let A be an n × N matrix with columns in general position in \(\mathbb{R}^n\). Then
\[ \frac{f_k(AI^N)}{f_k(I^N)} = 1 - P_{N-n,N-k}. \tag{1.14} \]
This shows that \(f_k(AI^N)\) satisfies the same formula as \(E f_k(A\mathbb{R}^N_+)\), but without the expectation. Formula (1.14) coincides with Böröczky and Henk's formula (1.6) [6]; but whereas (1.6) was proven for the case where A is a uniformly distributed random orthoprojector, Theorem 1.8 holds for any A in general position. Theorem 1.8 is proven in Sect. 2.1. Theorem 1.6 is proven in Sect. 2.2, where it is derived from Theorem 1.8 by symmetrization.

1.6 Contents

Proofs of the above results are given in Sect. 2. The hypercube is contrasted with the other regular polytopes in Sect. 3; the cone and hypercube are contrasted in Sect. 4, where we also present additional results for specially constructed matrices. These phenomena, described here from the viewpoint of combinatorial geometry, have surprising consequences in probability theory, information theory, and signal processing; see [11, 13, 16] and Sect. 5.

2 Proofs of Main Results

We start with the key nonasymptotic exact identities (1.13) and (1.14) and then derive from (1.13) Theorems 1.2 and 1.3 by asymptotic analysis of the probabilities \(P_{m,M}\). Throughout the paper we write \(\mathcal{N}(A)\) for the nullspace of A.

2.1 Proof of Theorem 1.8

For convenience, in this section we let \(I^N\) denote the hypercube \([-1, 1]^N\).
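The proof below uses Theorem 1.7 only through its region count \(2^M P_{m,M} = 2\sum_{\ell=0}^{m-1}\binom{M-1}{\ell}\). As an illustration outside the original argument, the following Python sketch (function names are our own) checks that closed form against the classical recursion C(m, M) = C(m, M − 1) + C(m − 1, M − 1) for central arrangements in general position, and against two familiar small cases.

```python
from functools import lru_cache
from math import comb


def regions_closed_form(m, M):
    """Regions cut out of R^m by M hyperplanes in general position through
    a common point: 2^M * P_{m,M} = 2 * sum_{l=0}^{m-1} C(M-1, l)."""
    return 2 * sum(comb(M - 1, l) for l in range(m))


@lru_cache(maxsize=None)
def regions_recursive(m, M):
    """Same count via the classical recursion: the M-th hyperplane splits one
    existing region for each region of the arrangement induced on it
    (M - 1 hyperplanes in an (m - 1)-dimensional space)."""
    if M == 1 or m == 1:
        return 2
    return regions_recursive(m, M - 1) + regions_recursive(m - 1, M - 1)


if __name__ == "__main__":
    print(regions_closed_form(2, 5))  # 10: five lines through 0 cut the plane into 10 sectors
    print(regions_closed_form(3, 3))  # 8: the three coordinate planes give the octants
    print(all(regions_closed_form(m, M) == regions_recursive(m, M)
              for m in range(1, 9) for M in range(1, 13)))  # True
```

When M ≤ m the hyperplanes impose no dependence and the count is simply \(2^M\); the binomial tail only starts to bite once M exceeds m, which is the regime used in Lemma 2.2 below.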
¹ This formula appears to have been derived by multiple authors independently at about the same time; in the discrete geometry literature, Winder's paper is often cited [27]; in the machine learning and information theory literature, Cover's paper [9] is typically cited instead; see Cover [9] for a history of early related results and of the method of proof dating back to Schläfli [22].

Each k-face F of \(I^N\) is a set of vectors with N − k particular coordinates taking fixed, specific values: for each such coordinate a specific endpoint from {−1, 1} is chosen, and the resulting choice in \(\{-1, 1\}^{N-k}\) applies to every member of the face. Within each face, the remaining k coordinate values may vary throughout the range \([-1, 1]^k\).

Let Q be a polyhedron (polytope or polyhedral cone) in \(\mathbb{R}^N\) and \(x_0 \in Q\). The vector v is a feasible direction for Q at \(x_0\) if \(x_0 + tv \in Q\) for all sufficiently small t > 0. Let \(\mathrm{Feas}_{x_0}(Q)\) denote the cone of all feasible directions for Q at \(x_0\).

Lemma 2.1 Let F be a k-face of the polytope or polyhedral cone Q, and let \(x_0\) be a vector in relint(F). For an n × N matrix A in general position, the following are equivalent:
(Survive(A, F, Q)): AF is a k-face of AQ;
(Transverse(A, x_0, Q)): \(\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(Q) = \{0\}\).
A proof of Lemma 2.1 is given in [24, p. 329].

Each face F of \(I^N\) can be identified with its centroid \(x_F\); this is a vector in \(\mathbb{R}^N\) with k of its coordinates equal to 0 and the remaining N − k entries taking the values \(\sigma_F \in \{-1, 1\}^{N-k}\). We speak of supp(F), the support of F, which is the set of indices of coordinates that vary among members of the face, and of \(\sigma_F\), the sign pattern of F, which is the common sign pattern of the coordinates that do not vary among members of the face and so lie outside the support. Thus, for example, if F is the set of all vectors with \(-1 \le x(1), \dots, x(k) \le 1\) and \(x(k+1) = \cdots = x(N) = 1\), then supp(F) = {1, …, k} and \(\sigma_F = (1, \dots, 1)\). For each whole number m, let [m] := {1, …, m}.
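The centroid description makes the k-faces of \([-1, 1]^N\) easy to enumerate by machine. The Python sketch below is only an illustration (the helper names are ours): it builds each centroid from a support J and a sign pattern \(\sigma_F\), then groups the faces by support, confirming that each of the \(\binom{N}{k}\) supports carries \(2^{N-k}\) faces, so that \(f_k(I^N) = 2^{N-k}\binom{N}{k}\).

```python
from collections import Counter
from itertools import combinations, product
from math import comb


def cube_face_centroids(N, k):
    """Yield the centroids x_F of all k-faces F of [-1, 1]^N.

    The k coordinates in the support J are free (0 at the centroid); the
    other N - k coordinates carry a sign pattern sigma_F in {-1, 1}^(N - k).
    """
    for J in combinations(range(N), k):
        fixed = [i for i in range(N) if i not in J]
        for sigma in product((-1, 1), repeat=N - k):
            x = [0] * N
            for i, s in zip(fixed, sigma):
                x[i] = s
            yield tuple(x)


def faces_per_support(N, k):
    """Group the k-faces by support supp(F), read off as the zero coordinates."""
    counts = Counter()
    for x in cube_face_centroids(N, k):
        counts[tuple(i for i, xi in enumerate(x) if xi == 0)] += 1
    return counts


if __name__ == "__main__":
    N, k = 5, 2
    counts = faces_per_support(N, k)
    print(len(counts), set(counts.values()))  # 10 {8}
    print(sum(counts.values()), comb(N, k) * 2 ** (N - k))  # 80 80
```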
For an index set J ⊂ [N] of cardinality k, let \(\mathcal{F}(J)\) denote the collection of all k-faces F with supp(F) = J. There are of course \(2^{N-k}\) such faces; they differ in the choice of \(\sigma_F\).

Lemma 2.2 Let A be an n × N matrix with n < N whose columns are in general position in \(\mathbb{R}^n\). Then
\[ \mathrm{Card}\{F \in \mathcal{F}([k]) : \text{Survive}(A, F, I^N) \text{ does not hold}\} = 2^{N-k} P_{N-n,N-k}. \]
Proof Let \(\mathrm{Feas}_{x_F}(I^N)\) denote the cone of feasible directions for \(I^N\) at \(x_F\). The collection of such cones associated to a common support is a cover of \(\mathbb{R}^N\):
\[ \bigcup_{F \in \mathcal{F}([k])} \mathrm{Feas}_{x_F}(I^N) = \mathbb{R}^N; \tag{2.1} \]
moreover, the terms appearing in the union have pairwise disjoint interiors: if F, G are distinct members of \(\mathcal{F}([k])\), then
\[ \mathrm{int}\,\mathrm{Feas}_{x_F}(I^N) \cap \mathrm{int}\,\mathrm{Feas}_{x_G}(I^N) = \emptyset. \tag{2.2} \]
Roughly speaking, the collection of feasible cones associated to \(\mathcal{F}([k])\) forms a partition of the space \(\mathbb{R}^N\). Define hyperplanes \(H_j = \{x : x(j) = 0\}\); the hyperplanes \(\{H_j : j = k+1, \dots, N\}\), where the index avoids the support set [k], also induce a partition of \(\mathbb{R}^N\), and it is the same as the one induced by the above cones. Set now m = N − n; by general position, \(\mathcal{N}(A) \cong \mathbb{R}^m\). Set M = N − k and define \(\tilde H_j = H_{k+j} \cap \mathcal{N}(A)\), j = 1, …, M. Since \(\mathcal{N}(A)\) is in general position, these are relative hyperplanes of \(\mathcal{N}(A) \cong \mathbb{R}^m\). Thus, up to linear isomorphism, \(\{\tilde H_j : j = 1, \dots, M\}\) is a collection of hyperplanes in general position in \(\mathbb{R}^m\); these hyperplanes intersect in the common point 0. Theorem 1.7 tells us that \(\mathbb{R}^m\) is partitioned by M hyperplanes into \(2^M P_{m,M}\) regions. Correspondingly, \(\mathcal{N}(A)\) is partitioned into \(2^{N-k} P_{N-n,N-k}\) regions. The relative interior of each such region in \(\mathcal{N}(A)\) belongs to the interior of exactly one cone \(\mathrm{Feas}_{x_F}(I^N) \subset \mathbb{R}^N\) (by (2.1) and (2.2)). That cone specifies exactly one k-face F for which (Transverse(A, x_F, I^N)) does not hold. Equivalently, (Survive(A, F, I^N)) does not hold.
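For small cases the count behind Lemma 2.2 and Theorem 1.8 can be checked by brute force. The sketch below is an illustration under our own assumptions, not the authors' code, and all helper names are hypothetical: it evaluates \(f_k(AI^N) = \binom{N}{k}\big(2^{N-k} - 2^{N-k}P_{N-n,N-k}\big)\) in exact integer arithmetic and, for n = 2, compares the vertex count with the convex hull of the \(2^N\) images of the vertices of \([0,1]^N\) under a random 2 × N Gaussian matrix, which is in general position with probability one. Using \([0,1]^N\) instead of \([-1,1]^N\) only rescales and translates the zonotope.

```python
import random
from itertools import product
from math import comb


def zonotope_face_count(N, n, k):
    """f_k(A I^N) for an n x N matrix A in general position, 0 <= k < N.

    Uses (1.14): f_k(I^N) * (1 - P_{N-n,N-k}), with f_k(I^N) = C(N,k) 2^(N-k)
    and 2^(N-k) P_{N-n,N-k} = 2 * sum_{l < N-n} C(N-k-1, l).
    """
    lost = 2 * sum(comb(N - k - 1, l) for l in range(N - n))
    return comb(N, k) * (2 ** (N - k) - lost)


def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_vertices(points):
    """Andrew's monotone chain; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def projected_cube_vertex_count(N, rng):
    """Vertices of A[0,1]^N for a random 2 x N matrix with Gaussian columns."""
    cols = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
    images = [
        (sum(e * c[0] for e, c in zip(eps, cols)),
         sum(e * c[1] for e, c in zip(eps, cols)))
        for eps in product((0, 1), repeat=N)
    ]
    return len(hull_vertices(images))


if __name__ == "__main__":
    rng = random.Random(0)
    for N in range(3, 8):
        print(N, projected_cube_vertex_count(N, rng), zonotope_face_count(N, 2, 0))
```

In the plane the projected cube is a centrally symmetric 2N-gon, so both columns should read 2N; the hull computation is exponential in N and is meant only as a sanity check of the exact formula.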
Theorem 1.8 follows from Lemmas 2.1 and 2.2 by noting that the set of all k-faces of \(I^N\) can be partitioned cleanly by specifying one of the \(\binom{N}{k}\) k-element subsets J ⊂ [N], card(J) = k, and then considering \(\mathcal{F}(J)\). In combinatorics one denotes by \(\binom{[N]}{k}\) the collection of different k-element subsets of [N]. Thus we have the disjoint union \(\mathcal{F}_k(I^N) = \bigcup\{\mathcal{F}(J) : J \in \binom{[N]}{k}\}\), and, since Lemma 2.2 applies equally to every support J (permute the coordinates),
\[ f_k(I^N) - f_k(AI^N) = \binom{N}{k} \cdot 2^{N-k} P_{N-n,N-k}. \tag{2.3} \]
Dividing by \(f_k(I^N) = \binom{N}{k} 2^{N-k}\) gives (1.14).

2.2 Proof of Theorem 1.6

In the original submission of this manuscript, we proved (1.13) using Wendel's Theorem and then derived (1.14) from it by an averaging argument. Prompted by a referee, in this revision we go in the opposite direction: having first proved Theorem 1.8 using Theorem 1.7, we now derive (1.13) from Theorem 1.8 by symmetrization. We start with the following observation on the expected number of k-faces of \(A\mathbb{R}^N_+\):
\[ \frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \mathrm{Ave}_F\, \mathrm{Prob}\{AF \text{ is a } k\text{-face of } A\mathbb{R}^N_+\}. \tag{2.4} \]
Here \(\mathrm{Ave}_F\) denotes “the arithmetic mean over all k-faces F of \(\mathbb{R}^N_+\).”

In this section, it is convenient to let \(I^N = [0, 1]^N\). This choice does not affect face counts. With this representation of \(I^N\), the “lower k-faces” of \(I^N\), whose fixed coordinates all equal 0, are in one-to-one correspondence with the k-faces of \(\mathbb{R}^N_+\). Namely, if F is such a lower k-face of \(I^N\), then the cone pos(F) is a k-face of \(\mathbb{R}^N_+\). Adopt now the notational convention that within the proof, for a lower face F of \(I^N\), \(\tilde F = \mathrm{pos}(F)\) denotes the corresponding face of \(\mathbb{R}^N_+\). We observe that for a vector \(x_0\) with nonnegative coordinates all strictly less than 1, we have \(\mathrm{Feas}_{x_0}(I^N) = \mathrm{Feas}_{x_0}(\mathbb{R}^N_+)\); hence, by Lemma 2.1, for each lower k-face F,
\[ AF \text{ is a } k\text{-face of } AI^N \iff A\tilde F \text{ is a } k\text{-face of } A\mathbb{R}^N_+. \tag{2.5} \]
Similarly,
\[ \frac{E f_k(AI^N)}{f_k(I^N)} = \mathrm{Ave}_F\, \mathrm{Prob}\{AF \text{ is a } k\text{-face of } AI^N\}. \tag{2.6} \]
Here \(\mathrm{Ave}_F\) denotes “the arithmetic mean over all k-faces F of \(I^N\).”

Definition 2.3 (Centrosymmetric Exchangeable Columns) Let A be a random n by N matrix such that, for each signed permutation matrix Π and for every measurable set Ω, Prob{A ∈ Ω} = Prob{AΠ ∈ Ω}. Then we say that A has centrosymmetric exchangeable columns.

Below we assume without loss of generality that A has centrosymmetric exchangeable columns. Then all k-faces of \(\mathbb{R}^N_+\) become statistically equivalent:
\[ \mathrm{Prob}\{AF \text{ is a } k\text{-face of } A\mathbb{R}^N_+\} = \mathrm{Prob}\{AG \text{ is a } k\text{-face of } A\mathbb{R}^N_+\} \]
for each distinct F, G in \(\mathcal{F}_k(\mathbb{R}^N_+)\); indeed, there is always a permutation Π for which G is the image of F under Π, G = ΠF, and the probabilities are Π-invariant. Then (2.4) becomes: let F be a fixed k-face of \(\mathbb{R}^N_+\); then
\[ \frac{E f_k(A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} = \mathrm{Prob}\{AF \text{ is a } k\text{-face of } A\mathbb{R}^N_+\}. \tag{2.7} \]
Similarly, all k-faces of \(I^N\) become statistically equivalent; indeed, there is always a signed permutation Π for which G is the image of F under Π, G = ΠF, and the probabilities are Π-invariant. Hence (2.6) becomes: let F be a fixed k-face of \(I^N\); then
\[ \frac{E f_k(AI^N)}{f_k(I^N)} = \mathrm{Prob}\{AF \text{ is a } k\text{-face of } AI^N\}. \tag{2.8} \]
Combining these displays with (2.5) and (1.14) implies (1.13).

2.3 Some Generalities About Binomial Probabilities

The probability \(P_{m,M}\) has a classical interpretation: it gives the probability of at most m − 1 heads in M − 1 tosses of a fair coin. The usual normal approximation to the binomial tells us that
\[ P_{m,M} \approx \Phi\!\left( \frac{(m-1) - (M-1)/2}{\sqrt{(M-1)/4}} \right), \]
with Φ the usual standard normal distribution function \(\Phi(x) = \int_{-\infty}^{x} e^{-y^2/2}\, dy/\sqrt{2\pi}\); here the approximation symbol ≈ can be made precise using standard limit theorems, e.g., appropriate for small or large deviations. In this expression, the approximating normal has mean (M − 1)/2 and standard deviation \(\sqrt{(M-1)/4}\). There are three regimes of interest, for large m, M, and three behaviors for \(P_{m,M}\):
• Lower tail: \(m \ll M/2 - \sqrt{M/4}\). \(P_{m,M} \approx 0\).
• Middle: \(m \approx M/2\). \(P_{m,M} \in (0, 1)\).
• Upper tail: \(m \gg M/2 + \sqrt{M/4}\). \(P_{m,M} \approx 1\).

2.4 Proof of Theorem 1.2

Using the correspondence N − n ↔ m, N − k ↔ M, and the connection to Wendel's theorem, we have three regimes of interest:
• \(N - n \ll (N - k)/2\).
• \(N - n \approx (N - k)/2\).
• \(N - n \gg (N - k)/2\).
In the proportional-dimensional framework, the above discussion translates into three separate regimes, and separate behaviors we expect to be true:
• Case 1: \(\rho < \rho_W(\delta; \mathbb{R}_+)\). \(P_{N_n-n,\,N_n-k_n} \to 0\).
• Case 2: \(\rho = \rho_W(\delta; \mathbb{R}_+)\). \(P_{N_n-n,\,N_n-k_n} \in (0, 1)\).
• Case 3: \(\rho > \rho_W(\delta; \mathbb{R}_+)\). \(P_{N_n-n,\,N_n-k_n} \to 1\).
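The three regimes can be seen numerically. The Python sketch below uses function names of our own choosing: it computes \(P_{m,M}\) exactly from integer binomial coefficients, compares it with the normal approximation of Sect. 2.3 (written, as there, without a continuity correction), and evaluates the face ratio \(1 - P_{N-n,N-k}\) of (1.13)–(1.14) on either side of \(\rho_W(3/4) = \max(0, 2 - 4/3) = 2/3\), with n = 300 and N = 400 chosen arbitrarily.

```python
from math import comb, erf, sqrt


def wendel_P(m, M):
    """P_{m,M} of (1.12): probability of at most m-1 heads in M-1 fair tosses."""
    return sum(comb(M - 1, l) for l in range(m)) / 2 ** (M - 1)


def normal_approx_P(m, M):
    """Normal approximation Phi(((m-1) - (M-1)/2) / sqrt((M-1)/4)) from Sect. 2.3."""
    z = ((m - 1) - (M - 1) / 2) / sqrt((M - 1) / 4)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def face_ratio(n, N, k):
    """f_k(A I^N) / f_k(I^N) = 1 - P_{N-n,N-k}, as in (1.14)."""
    return 1.0 - wendel_P(N - n, N - k)


if __name__ == "__main__":
    # Lower tail, middle, and upper tail for M - 1 = 400 tosses.
    for m in (170, 201, 232):
        print(m, wendel_P(m, 401), normal_approx_P(m, 401))
    # delta = n/N = 3/4, so rho_W = max(0, 2 - 4/3) = 2/3.
    n, N = 300, 400
    for k in (150, 240):  # rho = 0.5 (below rho_W) and rho = 0.8 (above)
        print(k / n, face_ratio(n, N, k))
```

With these sizes the ratio is already close to 1 at ρ = 0.5 and close to 0 at ρ = 0.8, in line with Cases 1 and 3; the transition sharpens as n grows.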
Case 2 is trivially true, but it has no role in the statement of Theorem 1.2. Cases 1 and 3 correspond exactly to the two parts of (1.7) that we must prove. To prove Cases 1 and 3, we need an upper bound deriving from standard large-deviations analysis of the lower tail of the binomial.

Lemma 2.4 Let N − n < (N − k)/2. Then
\[ P_{N-n,N-k} \le n^{3/2} \exp\!\Big( N\, \psi_W^{\mathbb{R}_+}\big(\tfrac{n}{N}, \tfrac{k}{n}\big) \Big), \tag{2.9} \]
where the exponent is defined as
\[ \psi_W^{\mathbb{R}_+}(\delta, \rho) := H(\delta) + \delta H(\rho) - H(\rho\delta) - (1 - \rho\delta)\log 2, \tag{2.10} \]
with H(·) the Shannon entropy (1.9); see Fig. 2.

Proof Upper-bounding the sum in \(P_{N-n,N-k}\) by N − n times its largest term \(\binom{N-k-1}{N-n-1}\), we arrive at
\[ P_{N-n,N-k} \le \frac{N-k+1}{2^{N-k-1}} \binom{N-k}{n-k} = \frac{N-k+1}{2^{N-k-1}} \binom{N}{n}\binom{n}{k}\binom{N}{k}^{-1}. \tag{2.11} \]
We can bound \(\binom{m}{\gamma m}\) for γ < 1 using the Shannon entropy (1.9):
\[ c_1 m^{-1/2} e^{mH(\gamma)} \le \binom{m}{\gamma m} \le c_2 e^{mH(\gamma)}, \tag{2.12} \]
where \(c_1 := \tfrac{16}{25}\sqrt{2/\pi}\) and \(c_2 := 5/(4\sqrt{2\pi})\). Recalling the definition of \(\psi_W^{\mathbb{R}_+}\), we obtain (2.9).

We will now consider Cases 1 and 3, and prove the corresponding conclusion.

Case 1: \(\rho < \rho_W(\delta; \mathbb{R}_+)\). The threshold function \(\rho_W(\delta; \mathbb{R}_+)\) is the location of the lowest zero crossing of \(\psi_W^{\mathbb{R}_+}(\delta, \rho)\) as a function of ρ for δ fixed; i.e.,
\[ \rho_W(\delta; \mathbb{R}_+) = \inf\{\rho : \psi_W^{\mathbb{R}_+}(\delta, \rho) \ge 0\}. \]
Thus, for any ρ strictly below \(\rho_W(\delta; \mathbb{R}_+)\), the exponent \(\psi_W^{\mathbb{R}_+}(\delta, \rho)\) is strictly negative. Lemma 2.4 thus implies that \(P_{N_n-n,\,N_n-k_n} \to 0\) as n → ∞.

Case 3: \(\rho > \rho_W(\delta; \mathbb{R}_+)\). Binomial probabilities have a standard symmetry (relabel every “head” outcome as a “tail”, and vice versa). It follows that \(P_{m,M} = 1 - P_{M-m,M}\), and so \(P_{N-n,N-k} = 1 - P_{n-k,N-k}\). In this case N − n > (N − k)/2, equivalently n − k < (N − k)/2, so the same lower-tail bound as in Lemma 2.4 tells us that \(P_{n-k,N-k} \to 0\) as n → ∞; we conclude \(P_{N-n,N-k} \to 1\) as n → ∞.

2.5 Proof of Theorem 1.3

\(P_{N-n,N-k}\) is the probability that one fixed k-dimensional face F of \(\mathbb{R}^N_+\) fails to generate a k-face AF of \(A\mathbb{R}^N_+\) (by (1.13) and (2.7)). The probability that some k-dimensional face fails to generate a k-face can be upper-bounded, using Boole's inequality, by \(f_k(\mathbb{R}^N_+) \cdot P_{N-n,N-k}\).
From (2.12), (2.9), and \(f_k(\mathbb{R}^N_+) = \binom{N}{k}\) we have, with \(\delta_n = n/N_n\) and \(\rho_n = k_n/n\),
\[ f_k(\mathbb{R}^N_+) \cdot P_{N-n,N-k} \le n^{3/2} \exp\!\big( N\, \psi_S^{\mathbb{R}_+}(\delta_n, \rho_n) \big). \]
Recall that for δ ≥ 1/2, \(\rho_S(\delta; \mathbb{R}_+)\) is the location of the lowest zero crossing of \(\psi_S^{\mathbb{R}_+}\) as a function of ρ for δ fixed; i.e.,
\[ \rho_S(\delta; \mathbb{R}_+) = \inf\{\rho : \psi_S^{\mathbb{R}_+}(\delta, \rho) \ge 0\}. \]
For any \(\rho < \rho_S(\delta; \mathbb{R}_+)\), we have \(\psi_S^{\mathbb{R}_+}(\delta, \rho) < 0\), and as a result (1.11) follows.

3 Contrasting the Hypercube with Other Polytopes

The theorems in Sect. 1 contrast strongly with existing results for other polytopes.

3.1 Nonexistence of Weak Thresholds at δ < 1/2

Theorem 1.1 identifies a region of the phase diagram (δ = n/N, ρ = k/n) where the typical random zonotope has nearly as many k-faces as its generating hypercube; outside that region, and in particular whenever n < N/2, it has many fewer k-faces than the hypercube, for every k. This behavior at δ = n/N < 1/2 is quite different from the behavior seen for random projections of the simplex and the cross-polytope at small δ. Those polytopes have \(f_k(AQ) \approx f_k(Q)\) for quite a large range of k even at relatively small values of δ; [16] shows that we can have \(k \sim n/(2\log(\delta^{-1}))\) for small δ at both of those polytopes, while we could not have even k = 1 at the hypercube for such small δ; see also the visual evidence in Fig. 3.

3.2 Nonexistence of Strong Thresholds for Hypercube

Theorem 1.4 shows that projected zonotopes always have strictly fewer k-faces than their generators, \(f_k(AI^N) < f_k(I^N)\), for every n < N. This is again quite different from the situation with the simplex and the cross-polytope, where we can even have n ≪ N and still find k for which \(f_k(AQ) = f_k(Q)\) [16], roughly \(k \sim n/(2e\log(\delta^{-1}))\); again see visual evidence in Fig. 3.

3.3 Universality of Weak Phase Transitions

Theorem 1.1 holds for any A in general position. In proving weak and strong threshold results for the simplex and cross-polytope, we required A either to be a random orthoprojector or to have i.i.d. Gaussian entries.
Thus, what we proved for those families of regular polytopes applies to a much more limited range of matrix ensembles than for hypercubes.

4 Contrasting the Cone with the Hypercube

4.1 Universality of Weak Phase Transitions

For Theorem 1.2, A can be sampled from any ensemble of random matrices invariant under right multiplication by signed permutations. The result is thus universal across a wide class of matrix ensembles. In proving weak and strong threshold results for the simplex and cross-polytope, we required A either to be a random orthoprojector or to have i.i.d. Gaussian entries. Thus, what we proved for those families of regular polytopes applies to a much more limited range of matrix ensembles than for the orthant. Our empirical studies suggest that the same ensembles of matrices which “work” for the orthant weak threshold also “work” for the simplex and cross-polytope thresholds. It seems to us that the universality across matrix ensembles proven here may point to a much larger phenomenon, valid also for other polytope families; however, this universality class is far more restrictive than is the case for the hypercube. For our empirical studies, see [17].

The weak threshold for the orthant depends much more delicately on details of A than do the hypercube thresholds; unlike \(f_k(AI^N)\), \(f_k(A\mathbb{R}^N_+)\) is not the same number for all A in general position. It makes a substantial difference to the results if the matrix A is not “zero-mean.”

4.2 The Low-Frequency Partial Fourier Matrix

Consider the special partial Fourier matrix made only of the n lowest-frequency entries.

Corollary 4.1 Assume n is odd and, for j = 1, 2, …, N, let
\[ \Omega_{ij} = \begin{cases} \cos\big(\pi (j-1)(i-1)/N\big), & i = 1, 3, 5, \dots, n, \\ \sin\big(\pi (j-1)\, i/N\big), & i = 2, 4, 6, \dots, n-1. \end{cases} \tag{4.1} \]
Then
\[ f_k(\Omega\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+), \qquad 0 \le k \le n/2. \]
The result is a corollary of [18, Theorem 3, p. 56]. The key steps of the proof are included in an extended technical report [15].
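For concreteness, here is a Python sketch that builds Ω exactly as in (4.1); the function name is our own. It checks two structural features used in what follows: the first row is all ones, and each even row and the odd row after it pair a sine and a cosine of the same frequency, so the columns lie on the trigonometric moment curve.

```python
import math


def low_frequency_fourier(n, N):
    """The n x N matrix Omega of (4.1), returned as a list of rows (n odd).

    Row i (1-based), column j (1-based):
      odd i:  cos(pi * (j - 1) * (i - 1) / N)
      even i: sin(pi * (j - 1) * i / N)
    """
    if n % 2 == 0:
        raise ValueError("Corollary 4.1 assumes n is odd")
    rows = []
    for i in range(1, n + 1):
        if i % 2 == 1:
            rows.append([math.cos(math.pi * (j - 1) * (i - 1) / N) for j in range(1, N + 1)])
        else:
            rows.append([math.sin(math.pi * (j - 1) * i / N) for j in range(1, N + 1)])
    return rows


if __name__ == "__main__":
    Omega = low_frequency_fourier(5, 8)
    print(Omega[0])  # the row of ones
    # Rows 2 and 3 (list indices 1 and 2) are sin and cos of the same angle.
    print([round(Omega[1][j] ** 2 + Omega[2][j] ** 2, 12) for j in range(8)])
```

With n = 3, dropping the leading 1 leaves the N points \((\sin\theta_j, \cos\theta_j)\), \(\theta_j = 2\pi(j-1)/N\), which are distinct points on a circle; each therefore spans an extreme ray of \(\Omega\mathbb{R}^N_+\), consistent with \(f_1(\Omega\mathbb{R}^N_+) = N = f_1(\mathbb{R}^N_+)\).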
This behavior is dramatically different from the case for random A of the type considered so far, and in some sense dramatically better. Corollary 4.1 is closely connected with the classical question of neighborliness. There are famous polytopes which can be generated by projections \(AT^{N-1}\) and have exactly as many k-faces as \(T^{N-1}\) for k ≤ n/2. A standard example is provided by the matrix Ω defined in (4.1); it obeys \(f_k(\Omega T^{N-1}) = f_k(T^{N-1})\), 0 ≤ k ≤ n/2. (There is a vast literature touching in some way on the phenomenon \(f_k(\Omega T^{N-1}) = f_k(T^{N-1})\). In that literature, the polytope \(\Omega T^{N-1}\) is usually called a cyclic polytope, and the columns of Ω are called points of the trigonometric moment curve; see standard references [21, 28].) Hence the matrix Ω offers both \(f_k(\Omega T^{N-1}) = f_k(T^{N-1})\) and \(f_k(\Omega\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)\) for 0 ≤ k ≤ n/2. This is exceptional. For random A of the type discussed in earlier sections, there is a large disparity between the sets of triples (k, n, N) where \(f_k(AT^{N-1}) = f_k(T^{N-1})\) (this happens for \(k/n < \rho_S(n/N; T)\)) and those where \(f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)\) (this happens for \(k/n < \rho_S(n/N; \mathbb{R}_+)\)). These two strong thresholds are displayed in Figs. 3 and 1, respectively. Even if we relax our notion of agreement of face counts to weak agreement, the collections of triples where \(f_k(AT^{N-1}) \approx f_k(T^{N-1})\) and \(f_k(A\mathbb{R}^N_+) \approx f_k(\mathbb{R}^N_+)\) are very different, because \(\rho_W(n/N; T)\) and \(\rho_W(n/N; \mathbb{R}_+)\) are so dramatically different, particularly at n < N/2.

4.3 Adjoining a Row of Ones to A

An important feature of the random matrices A studied earlier is orthant symmetry. In particular, the positive orthant plays no distinguished role with respect to these matrices. On the other hand, the partial Fourier matrix Ω constructed in the last subsection contains a row of ones, and thus the positive orthant has a distinguished role to play for this matrix.
Moreover, this distinction is crucial; we find empirically that removing the row of ones from $\Omega$ causes the conclusion of Corollary 4.1 to fail drastically. Conversely, consider the matrix $\tilde A$ obtained by adjoining a row of $N$ ones to some matrix $A$:

$$\tilde A = \begin{pmatrix} \mathbf{1}^T \\ A \end{pmatrix}, \qquad \mathbf{1} = (1, \dots, 1)^T \in \mathbb{R}^N.$$

Adding this row of ones to a random matrix causes a drastic shift in the strong and weak thresholds.

Theorem 4.1 Consider the proportional-dimensional asymptotic framework with parameters $\delta, \rho$ in $(0, 1)$. Let a random $(n-1) \times N$ matrix $A$ have i.i.d. standard normal entries. Let $\tilde A$ denote the corresponding $n \times N$ matrix whose first row is all ones and whose remaining rows are identical to those of $A$. Then

$$\lim_{n\to\infty} \frac{E\, f_k(\tilde A\mathbb{R}^N_+)}{f_k(\mathbb{R}^N_+)} \;\begin{cases} = 1, & \rho < \rho_W(\delta; T), \\ < 1, & \rho > \rho_W(\delta; T); \end{cases} \tag{4.2}$$

$$\lim_{n\to\infty} P\left\{ f_k(\tilde A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+) \right\} = \begin{cases} 1, & \rho < \rho_S(\delta; T), \\ 0, & \rho > \rho_S(\delta; T). \end{cases} \tag{4.3}$$

Note particularly the mixed form of this relationship. Although the conclusions concern the faces of the randomly-projected orthant, the thresholds are those previously obtained for the randomly-projected simplex. Since there is such a dramatic difference between $\rho(\delta; T)$ and $\rho(\delta; \mathbb{R}_+)$, the single row of ones can fairly be said to have a huge effect. In particular, the region “below” the simplex weak phase transition $\rho_W(\delta; T)$ comprises $\approx 0.5634$ of the $(\delta, \rho)$ parameter area, whereas the region below the hypercube weak phase transition $\rho_W(\delta; I)$, which coincides with the orthant weak transition $\rho_W(\delta; \mathbb{R}_+)$, comprises only $1 - \log 2 \approx 0.3069$.

Theorem 4.1 is an immediate consequence of the following identity.

Lemma 4.2 Suppose that the all-ones row vector $\mathbf{1}^T$ is not in the row span of $A$. Then

$$f_k(\tilde A\mathbb{R}^N_+) = f_{k-1}(AT^{N-1}), \qquad 0 < k < n.$$

Proof We observe that there is a natural bijection between the $k$-faces of $\mathbb{R}^N_+$ and the $(k-1)$-faces of $T^{N-1}$. The $(k-1)$-faces of $T^{N-1}$ are in bijection with the corresponding support sets of cardinality $k$: i.e., we can identify with each $(k-1)$-face $F$ the union $I$ of all supports of all members of the face.
Similarly, to each support set $I$ of cardinality $k$ there is a unique $k$-face $\tilde F$ of $\mathbb{R}^N_+$, consisting of all points in $\mathbb{R}^N_+$ whose support lies in $I$. Composing the bijections $F \leftrightarrow I \leftrightarrow \tilde F$, we have the bijection $F \leftrightarrow \tilde F$.

Concretely, let $x_0$ be a point in the relative interior of some $(k-1)$-face $F$ of $T^{N-1}$. Then $x_0$ has $k$ nonzeros, and $x_0$ is also in the relative interior of the $k$-face $\tilde F$ of $\mathbb{R}^N_+$. Conversely, let $y_0$ be a point in the relative interior of some $k$-face of $\mathbb{R}^N_+$; then $x_0 = (\mathbf{1}^T y_0)^{-1} y_0$ is a point in the relative interior of a $(k-1)$-face of $T^{N-1}$.

The last two paragraphs show that for each pair of corresponding faces $(F, \tilde F)$ we may find a point $x_0$ in both the relative interior of $\tilde F$ and the relative interior of $F$. For such $x_0$, clearly $N(\tilde A) \cap \mathrm{lin}(x_0) = \{0\}$, because $\mathbf{1}^T x_0 > 0$. Moreover, $N(\tilde A) = N(A) \cap \mathbf{1}^\perp$ and $\mathrm{Feas}_{x_0}(T^{N-1}) = \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) \cap \mathbf{1}^\perp$, so that $N(\tilde A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = N(A) \cap \mathrm{Feas}_{x_0}(T^{N-1})$. We conclude that the following are equivalent:

$\mathrm{Transverse}(A, x_0, T^{N-1})$: $N(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.
$\mathrm{Transverse}(\tilde A, x_0, \mathbb{R}^N_+)$: $N(\tilde A) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$.

Rephrasing [13], the following are equivalent for $x_0$ a point in the relative interior of $F$:

$\mathrm{Survive}(A, F, T^{N-1})$: $AF$ is a $(k-1)$-face of $AT^{N-1}$.
$\mathrm{Transverse}(A, x_0, T^{N-1})$: $N(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.

We conclude that for two corresponding faces $F$, $\tilde F$, the following are equivalent:

$\mathrm{Survive}(A, F, T^{N-1})$.
$\mathrm{Survive}(\tilde A, \tilde F, \mathbb{R}^N_+)$.

Combining this with the natural bijection $F \leftrightarrow \tilde F$, the lemma is proved.

5 Application: Compressed Sensing

Our face-counting results can all be reinterpreted as statements about “simple” solutions of underdetermined systems of linear equations. This reinterpretation allows us to make connections with numerous problems of current interest in signal processing, information theory, and probability. The reinterpretation follows from the following two lemmas, which are restatements of Lemma 2.1 for $Q = \mathbb{R}^N_+$ and $Q = I^N$, replacing the notion $\mathrm{Transverse}(A, x_0, Q)$ with the all-but-linguistically equivalent $\mathrm{Unique}(A, x_0, Q)$.

Lemma 5.1 Let $x_0$ be a vector in $\mathbb{R}^N_+$ with exactly $k$ nonzeros.
Let $F$ denote the associated $k$-face of $\mathbb{R}^N_+$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$ under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The following are equivalent:

$\mathrm{Survive}(A, F, \mathbb{R}^N_+)$: $AF$ is a $k$-face of $A\mathbb{R}^N_+$.
$\mathrm{Unique}(A, x_0, \mathbb{R}^N_+)$: The system $b_0 = Ax$ has a unique solution in $\mathbb{R}^N_+$.

Lemma 5.2 Let $x_0$ be a vector in $I^N$ with exactly $k$ entries strictly between the bounds $\{0, 1\}$. Let $F$ denote the associated $k$-face of $I^N$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$ under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The following are equivalent:

$\mathrm{Survive}(A, F, I^N)$: $AF$ is a $k$-face of $AI^N$.
$\mathrm{Unique}(A, x_0, I^N)$: The system $b_0 = Ax$ has a unique solution in $I^N$.

Note that the systems of linear equations referred to in these lemmas are underdetermined: $n < N$. Hence these lemmas identify conditions on underdetermined systems of linear equations under which seemingly weak prior knowledge, namely that the solution obeys certain constraints, in fact uniquely determines the solution. The first result can be paraphrased as saying that nonnegativity constraints can be very powerful if the object is known to have relatively few nonzeros; the second says that upper and lower bounds can be very powerful, provided that many of those bounds are binding.

5.1 Reconstruction Exploiting Nonnegativity Constraints

We wish to reconstruct the unknown $x$, knowing only the linear measurements $b = Ax$, the matrix $A$, and the constraint $x \in \mathbb{R}^N_+$. Let $J(x)$ be some function of $x$. Consider the positivity-constrained variational problem

$$(\mathrm{Pos}_J) \qquad \min J(x) \quad \text{subject to} \quad b = Ax, \; x \in \mathbb{R}^N_+.$$

Let $\mathrm{pos}_J(b, A)$ denote any solution of the problem instance $(\mathrm{Pos}_J)$ defined by data $b$ and matrix $A$. We conclude the following:

Corollary 5.1 Suppose that $f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+)$. Let $x_0 \ge 0$ and $\|x_0\|_0 \le k$. For the problem instance defined by $b = Ax_0$,

$$\mathrm{pos}_J(b, A) = x_0.$$
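The face counts in the hypothesis of Corollary 5.1, and the “associated face” of Lemmas 5.1 and 5.2, can be read off directly from $x_0$: for the orthant the face is determined by the support, for the hypercube by the set of free coordinates together with the values of the coordinates pinned at 0 or 1. The following minimal Python sketch, whose helper names are hypothetical and our own, computes these labels and the standard counts $f_k(\mathbb{R}^N_+) = \binom{N}{k}$ and $f_k(I^N) = \binom{N}{k} 2^{N-k}$, the denominators of the fractions $f_k(A\mathbb{R}^N_+)/f_k(\mathbb{R}^N_+)$ and $f_k(AI^N)/f_k(I^N)$. It assumes exact entries (no tolerance for round-off) and says nothing about whether a given face survives projection by $A$.

```python
from math import comb


def orthant_face(x0):
    """Label of the face of R^N_+ whose relative interior contains x0 >= 0.

    The face is {x >= 0 : supp(x) within S}, S = supp(x0); its dimension is |S|.
    """
    return frozenset(i for i, v in enumerate(x0) if v != 0)


def hypercube_face(x0):
    """Label of the face of I^N whose relative interior contains x0 in [0, 1]^N.

    Returns (free, pinned): the coordinates strictly inside (0, 1), and the
    (index, bound) pairs of coordinates sitting at 0 or 1. The dimension is |free|.
    """
    free = frozenset(i for i, v in enumerate(x0) if 0 < v < 1)
    pinned = tuple((i, int(v)) for i, v in enumerate(x0) if i not in free)
    return free, pinned


def f_orthant(N, k):
    # one k-face of R^N_+ per k-subset of coordinates
    return comb(N, k)


def f_hypercube(N, k):
    # choose k free coordinates, then put each of the other N - k at 0 or 1
    return comb(N, k) * 2 ** (N - k)


print(orthant_face([0.0, 0.3, 0.0, 0.7]))    # frozenset({1, 3})
print(hypercube_face([1.0, 0.4, 0.0, 1.0]))  # (frozenset({1}), ((0, 1), (2, 0), (3, 1)))
print([f_hypercube(3, k) for k in range(4)]) # [8, 12, 6, 1]: vertices, edges, squares, the cube
```

Summing over $k$ gives $2^N$ faces for the orthant and $3^N$ for the hypercube, since each coordinate of a hypercube face is either free, pinned at 0, or pinned at 1.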
In words: under the given conditions on the face numbers, any variational prescription which imposes nonnegativity constraints will correctly recover the $k$-sparse solution in any problem instance where such a $k$-sparse solution exists.

Corresponding to this “strong” statement is a “weak” statement. Consider the following probability measure on $k$-sparse problem instances.

• Choose a random subset $L$ of size $k$ from $\{1, \dots, N\}$ by $k$ simple random draws without replacement.
• Set the entries of $x_0$ not in the selected subset to zero.
• Choose the entries of $x_0$ in the selected set $L$ from some fixed joint distribution $\psi_L$ supported in $(0, 1)^k$.
• Generate the problem instance $b = Ax_0$.

We speak of drawing a $k$-sparse problem instance at random.

Corollary 5.2 Suppose that, for some $\epsilon$,

$$f_k(A\mathbb{R}^N_+) \ge (1 - \epsilon) \cdot f_k(\mathbb{R}^N_+).$$

For $(b, A)$ a problem instance drawn at random, as above,

$$\mathrm{Prob}\{\mathrm{pos}_J(b, A) = x_0\} \ge 1 - \epsilon.$$

In words: under the given conditions on the face lattice, any variational prescription which imposes nonnegativity constraints will correctly recover the $k$-sparse solution in at least a fraction $(1 - \epsilon)$ of all $k$-sparse problem instances. For more discussion, including potential applications, see [7, 13, 18, 19].

5.2 Reconstruction Exploiting Box Constraints

Consider again the problem of reconstruction from measurements $b = Ax$, but this time assuming that the object $x$ obeys box constraints: $0 \le x(j) \le 1$, $1 \le j \le N$. Define the box-constrained variational problem

$$(\mathrm{Box}_J) \qquad \min J(x) \quad \text{subject to} \quad b = Ax, \; 0 \le x(j) \le 1, \; j = 1, \dots, N.$$

Let $\mathrm{box}_J(b, A)$ denote any solution of the problem instance $(\mathrm{Box}_J)$ defined by data $b$ and matrix $A$. In this setting, the notion corresponding to “sparse” is “simple”: we say that a vector $x$ is $k$-simple if at most $k$ of its entries differ from the bounds $\{0, 1\}$. Consider the following probability measure on problem instances having $k$-simple solutions.
Recall that $k$-simple vectors have all entries equal to 0 or 1 except at $k$ exceptional locations.

• Choose the subset $L$ of $k$ exceptional entries uniformly at random from $\{1, \dots, N\}$, without replacement.
• Choose each nonexceptional entry to be 0 or 1 by tossing a fair coin.
• Choose the values of the $k$ exceptional entries according to a joint probability measure $\psi_L$ supported in $(0, 1)^k$.
• Define the problem instance $b = Ax_0$.

Corollary 5.3 Suppose that, for some $\epsilon$,

$$f_k(AI^N) \ge (1 - \epsilon) \cdot f_k(I^N).$$

Randomly sample a problem instance $(b, A)$ using the method just described. Then

$$\mathrm{Prob}\{\mathrm{box}_J(b, A) = x_0\} \ge 1 - \epsilon.$$

In words: under the given conditions on the face lattice, any variational prescription which imposes box constraints will correctly recover the $k$-simple solution in at least a fraction $(1 - \epsilon)$ of all underdetermined systems generated by the matrix $A$ which have $k$-simple solutions.

In the hypercube case there is no phenomenon comparable to that which arose in the positive orthant with the special constructions $\Omega$ and $\tilde A$: $f_k(AI^N)$ is a fixed number if $A$ is in general position, and it decreases if $A$ is not in general position. Consequently, the hypercube weak threshold is the best general result on the ability to undersample by exploiting box constraints.

In particular, the difference between the weak simplex threshold and the weak hypercube threshold has this interpretation: a given degree $k$ of sparsity of a nonnegative object is much more powerful than the same degree of simplicity of a box-constrained object. Specifically, since the hypercube weak threshold vanishes for $\delta = n/N \le 1/2$, we should not expect to be able to undersample a typical box-constrained object by more than a factor of 2 and then reconstruct it using some garden-variety variational prescription. In comparison, the last section showed that we can severely undersample very sparse nonnegative objects. Moreover, when $n < N$, there is no region where $f_k(AI^N) = f_k(I^N)$, and consequently box constraints are never enough to ensure $\mathrm{box}_J(b, A) = x_0$ for all $k$-simple problem instances.
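The two random-instance schemes described in Sects. 5.1 and 5.2 (the $k$-sparse and the $k$-simple one) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, $A$ is a plain list of rows, indices are 0-based, and we assume $\psi_L$ is i.i.d. Uniform(0, 1) on the selected entries, which is one admissible choice among the joint distributions supported in $(0, 1)^k$ that the text allows. The sketch only generates the pair $(b, x_0)$; solving $(\mathrm{Pos}_J)$ or $(\mathrm{Box}_J)$ requires an optimization solver, which is not shown.

```python
import random


def matvec(A, x):
    # b = A x for A given as a list of rows
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def open_unit(rng):
    # Uniform(0, 1) with the endpoint 0 rejected (rng.random() never returns 1)
    while True:
        v = rng.random()
        if v > 0.0:
            return v


def k_sparse_instance(A, k, rng):
    """Sect. 5.1 scheme; assumes psi_L = i.i.d. Uniform(0, 1)."""
    N = len(A[0])
    L = rng.sample(range(N), k)        # k draws without replacement
    x0 = [0.0] * N                     # entries outside L are zero
    for i in L:
        x0[i] = open_unit(rng)
    return matvec(A, x0), x0


def k_simple_instance(A, k, rng):
    """Sect. 5.2 scheme; assumes psi_L = i.i.d. Uniform(0, 1)."""
    N = len(A[0])
    L = set(rng.sample(range(N), k))   # exceptional entries
    x0 = [open_unit(rng) if i in L else float(rng.randint(0, 1))  # fair coin for 0 or 1
          for i in range(N)]
    return matvec(A, x0), x0


rng = random.Random(0)
A = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(4)]  # a 4 x 10 Gaussian A
b, x0 = k_sparse_instance(A, 3, rng)
b2, x1 = k_simple_instance(A, 3, rng)
```

Passing an explicit `random.Random` instance keeps experiments reproducible and lets the same generator drive both $A$ and the instances.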
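The area figure quoted in Sect. 4.3 and the “factor of 2” remark can be checked with a little arithmetic. The minimal Python sketch below assumes the closed form $\rho_W(\delta; I) = \max(0, 2 - \delta^{-1})$ for the hypercube weak threshold (by Theorem 1.2 also the orthant weak threshold). A midpoint rule reproduces the area $1 - \log 2 \approx 0.3069$, and rewriting $\rho < 2 - \delta^{-1}$ with $\rho = k/n$, $\delta = n/N$ gives $k < 2n - N$, which is vacuous unless $n > N/2$. This is asymptotic, proportional-growth bookkeeping only, not a finite-$n$ guarantee; the helper names are our own.

```python
import math


def rho_weak_hypercube(delta):
    """Weak threshold max(0, 2 - 1/delta) for the hypercube (and orthant), delta in (0, 1]."""
    return max(0.0, 2.0 - 1.0 / delta)


def area_below(rho, m=100_000):
    # midpoint rule for the integral of rho over delta in (0, 1)
    h = 1.0 / m
    return h * sum(rho((i + 0.5) * h) for i in range(m))


def weak_k_bound(n, N):
    # rho = k/n < 2 - N/n  <=>  k < 2n - N; a bound of 0 means no k qualifies
    return max(0, 2 * n - N)


print(round(area_below(rho_weak_hypercube), 4))          # 0.3069
print(round(1 - math.log(2), 4))                         # 0.3069
print(weak_k_bound(600, 1000), weak_k_bound(500, 1000))  # 200 0
```

For example, with $N = 1000$ measurements at $n = 600$ the weak hypercube threshold tolerates roughly 200 exceptional entries, while at $n = 500$ it tolerates none.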
Acknowledgements

Art Owen suggested that we pay attention to Wendel’s Theorem. We also thank Goodman, Pollack, and Schneider for providing scholarly background and the anonymous referees for the most useful and instructive referee reports we have ever seen.

References

1. Adamczak, R., Litvak, A.E., Pajor, A., Tomczak-Jaegermann, N.: Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling. arXiv:0904.4723v1 (2009)
2. Affentranger, F., Schneider, R.: Random projections of regular simplices. Discrete Comput. Geom. 7(3), 219–226 (1992)
3. Baryshnikov, Y.M.: Gaussian samples, regular simplices, and exchangeability. Discrete Comput. Geom. 17(3), 257–261 (1997)
4. Björner, A., Las Vergnas, M., Sturmfels, B., White, N., Ziegler, G.: Oriented Matroids. Encyclopedia of Mathematics and Its Applications, vol. 46. Cambridge University Press, Cambridge (1999)
5. Bolker, E.D.: A class of convex bodies. Trans. Am. Math. Soc. 145, 323–345 (1969)
6. Böröczky, K., Jr., Henk, M.: Random projections of regular polytopes. Arch. Math. (Basel) 73(6), 465–473 (1999)
7. Bruckstein, A.M., Elad, M., Zibulevsky, M.: On the uniqueness of non-negative sparse and redundant representations. In: ICASSP 2008, Special Session on Compressed Sensing, Las Vegas, Nevada (2008)
8. Buchta, C.: On nonnegative solutions of random systems of linear inequalities. Discrete Comput. Geom. 2(1), 85–95 (1987)
9. Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. EC-14(3), 326–334 (1965)
10. Donoho, D.L.: High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617–652 (2006)
11. Donoho, D.L.: Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical Report, Stanford University (2006)
12. Donoho, D.L., Tanner, J.: Neighborliness of randomly-projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102(27), 9452–9457 (2005)
13. Donoho, D.L., Tanner, J.: Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102(27), 9446–9451 (2005)
14. Donoho, D.L., Tanner, J.: Exponential bounds implying construction of neighborly polytopes, error-correcting codes and compressed sensing matrices by random sampling. Preprint (2007)
15. Donoho, D.L., Tanner, J.: Counting the faces of randomly-projected hypercubes and orthants, with applications. arXiv:0807.3590v1 (2008)
16. Donoho, D.L., Tanner, J.: Counting faces of randomly-projected polytopes when the projection radically lowers dimension. J. AMS 22(1), 1–53 (2009)
17. Donoho, D.L., Tanner, J.: Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. A (2009)
18. Donoho, D.L., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc., Ser. B (Methodological) 54(1), 41–81 (1992)
19. Fuchs, J.-J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341–1344 (2004)
20. Goodey, P., Weil, W.: Zonoids and generalisations. In: Handbook of Convex Geometry, vols. A, B, pp. 1297–1326. North-Holland, Amsterdam (1993)
21. Grünbaum, B.: Convex Polytopes, 2nd edn. Graduate Texts in Mathematics, vol. 221. Springer, New York (2003). Prepared and with a preface by Volker Kaibel, Victor Klee, and Günter M. Ziegler
22. Schläfli, L.: In: Gesammelte Mathematische Abhandlungen, vol. 1, pp. 209–212. Birkhäuser, Basel (1950)
23. Schneider, R., Weil, W.: Zonoids and related topics. In: Convexity and Its Applications, pp. 296–317. Birkhäuser, Basel (1983)
24. Schneider, R., Weil, W.: Stochastic and Integral Geometry. Springer, Berlin (2008)
25. Vershik, A.M., Sporyshev, P.V.: Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem. Sel. Math. Sov. 11(2), 181–201 (1992)
26. Wendel, J.G.: A problem in geometric probability. Math. Scand. 11, 109–111 (1962)
27. Winder, R.O.: Partitions of n-space by hyperplanes. SIAM J. Appl. Math. 14(4), 811–818 (1966)
28. Ziegler, G.M.: Lectures on Polytopes. Graduate Texts in Mathematics, vol. 152. Springer, New York (1995)
29. Zong, C.: What is known about unit cubes. Bull. AMS 42(2), 181–211 (2005)



David L. Donoho, Jared Tanner. Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications, Discrete & Computational Geometry, 2009, 522-541, DOI: 10.1007/s00454-009-9221-z