Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications
Discrete Comput Geom
David L. Donoho 0 1 2
Jared Tanner 0 1 2
0 J. Tanner ( ) School of Mathematics, University of Edinburgh , Edinburgh , UK
1 D.L. Donoho Department of Statistics, Stanford University , Stanford, CA , USA
2 for hosting the programme “Statistical Challenges of High Dimensional Data” in 2008 and Professor D.M. Titterington for organizing this programme. D.L. Donoho acknowledges support from NSF DMS 0505303 and a Rothschild Visiting Professorship at the University of Cambridge
Let A be an n × N real-valued matrix with n < N; we count the number of k-faces f_k(AQ) when Q is either the standard N-dimensional hypercube I^N or else the positive orthant R^N_+. To state results simply, consider a proportional-growth asymptotic, where for fixed δ, ρ in (0, 1), we have a sequence of matrices A_{n,N_n} and of integers k_n with n/N_n → δ and k_n/n → ρ as n → ∞. If each matrix A_{n,N_n} has its columns in general position, then f_k(AI^N)/f_k(I^N) tends to zero or one depending on whether ρ > max(0, 2 − δ^{−1}) or ρ < max(0, 2 − δ^{−1}). Also, if each A_{n,N_n} is a random draw from a distribution which is invariant under right multiplication by signed permutations, then f_k(AR^N_+)/f_k(R^N_+) tends almost surely to zero or one depending on whether ρ > max(0, 2 − δ^{−1}) or ρ < max(0, 2 − δ^{−1}). We make a variety of contrasts to related work on projections of the simplex and/or cross-polytope. These geometric face-counting results have implications for signal processing, information theory, inverse problems, and optimization. Indeed, face counting is related to conditions for uniqueness of solutions of underdetermined systems of linear equations. Below, let A be a fixed n × N matrix, n < N, with columns in general position.

J. Tanner acknowledges support from the Alfred P. Sloan Foundation and thanks John E. and Marva
M. Warnock for their generous support in the form of an endowed chair.
(a) Call a vector in R^N_+ k-sparse if it has at most k nonzeros. For such a k-sparse
vector x0, b = Ax0 generates an underdetermined system b = Ax having a k-sparse
solution. Among inequality-constrained systems Ax = b, x ≥ 0, having k-sparse
solutions, the fraction having a unique nonnegative solution is f_k(AR^N_+)/f_k(R^N_+).
(b) Call a vector in the hypercube I^N k-simple if all entries except at most k are at
the bounds 0 or 1. For such a k-simple vector x0, b = Ax0 generates an
underdetermined system b = Ax with a k-simple solution. Among inequality-constrained
systems Ax = b, x ∈ I^N, having k-simple solutions, the fraction having a unique
hypercube-constrained solution is f_k(AI^N)/f_k(I^N).
1 Introduction
There are three fundamental regular polytopes in R^N, N ≥ 5: the hypercube I^N,
the cross-polytope C^N, and the simplex T^{N−1}. For each of these, projecting the
vertices into R^n, n < N, yields the vertices of a new polytope; in fact, up to
translation and dilation, every polytope in R^n is obtained by rotating the simplex T^{N−1}
and orthogonally projecting on the first n coordinates, for some choice of N and of
N-dimensional rotation. Similarly, every centrosymmetric polytope can be generated
by projecting the cross-polytope, and every zonotope by projecting the hypercube.
1.1 Random Polytopes
Choosing the projection A at random has become popular. Let A be a random
orthogonal projection obtained by first applying a uniformly-distributed rotation to R^N and
then projecting on the first n coordinates. Let Q be a polytope in R^N. Then AQ is a
random polytope in R^n. Taking Q in turn from each of the three families of regular
polytopes, we get three arenas for scholarly study:
• Random polytopes of the form AT^{N−1} were first studied by Affentranger and
Schneider [2] and by Vershik and Sporyshev [25];
• Random polytopes of the form AC^N were first studied extensively by Böröczky
and Henk [6];
• The random zonotope AI^N was studied in passing in [6] and will be heavily
studied in this paper; a literature on zonotopes can be found in [3, 5, 20, 23, 29].
Starting with [2, 25], interest has focused on the number f_k(AQ) of k-faces of
such random polytopes AQ; in those papers, fundamental formulas were developed
for the expected values E f_k(AQ). Deriving insights from these formulae in the
high-dimensional case has also been an important theme; Böröczky and Henk [6] studied
the expected number f_k(AQ) for each of these families of random polytopes,
focusing on the asymptotic framework where the small dimension n is held fixed while the
large dimension N → ∞ (this was previously done for Q = T^{N−1} in [2]).
Vershik and Sporyshev [25] studied the case AT^{N−1} in an asymptotic framework
with the dimensions N and n both proportionally large, and observed a
phenomenon of sharp thresholds: random polytopes can have face lattices undergoing abrupt
changes in properties as dimensions change relatively slightly. Our own previous
work considered both AT^{N−1} and AC^N [10, 12, 14, 16] and gave precise
information about several such threshold phenomena.
To make precise the notion of “threshold phenomenon,” consider the following
proportional-dimensional asymptotic framework. A dimension specifier is a triple of
integers (k, n, N), representing a “face” dimension k, a “small” dimension n, and
a “large” dimension N; k < n < N. For fixed δ, ρ ∈ (0, 1), consider sequences of
dimension specifiers, indexed by n, and obeying
$$ k_n/n \to \rho \quad \text{and} \quad n/N_n \to \delta \quad \text{as } n \to \infty. \tag{1.1} $$
For such sequences, the small dimension n is held proportional to the large dimension
N as both dimensions grow. We omit subscripts on k_n and N_n when possible. For
Q = T^{N−1}, C^N, the papers [10, 12, 14, 16] exhibited thresholds ρ_W(δ; Q) for the
ratio between the expected number of faces of the low-dimensional polytope AQ and
the number of faces of the high-dimensional polytope Q:
$$ \lim_{n \to \infty} \frac{E f_k(AQ)}{f_k(Q)} \; \begin{cases} = 1, & \rho < \rho_W(\delta; Q), \\ < 1, & \rho > \rho_W(\delta; Q). \end{cases} \tag{1.2} $$
(In this relation, we take a limit as n → ∞ along some sequence obeying the
proportional-dimensional constraint (1.1).) In words, the random object AQ has
roughly as many k-faces as its generator Q for k below a threshold and has noticeably
fewer k-faces than Q for k above the threshold. The threshold functions are defined
in terms of Gaussian integrals and other special functions, and can be calculated
numerically.
1.2 Random Zonotopes
Missing from the above picture is information about the third family of regular
polytopes, the hypercube. Böröczky and Henk [6] mentioned in passing the case of the
projected hypercube, in the case of A a random orthogonal projection. Böröczky
and Henk largely worked in the asymptotic framework n fixed and N → ∞. In that
framework the threshold phenomenon of interest here is not visible. In this paper, we
adopt the proportional-dimensional framework (1.1) and prove the following.
Theorem 1.1 (“Weak” Threshold for Hypercube) Define
$$ \rho_W(\delta; I) := \max\left(0, \, 2 - \delta^{-1}\right), \quad 0 < \delta < 1. \tag{1.3} $$
For ρ, δ in (0, 1), consider a sequence of dimension specifiers (k, n, N) obeying
(1.1). Consider a sequence of real-valued n × N matrices A = A_{n,N}, each one with
columns in general position in R^n. Then
$$ \lim_{n \to \infty} \frac{f_k(AI^N)}{f_k(I^N)} = \begin{cases} 1, & \rho < \rho_W(\delta; I), \\ 0, & \rho > \rho_W(\delta; I). \end{cases} \tag{1.4} $$
Remarks
• Use of the modifier “weak” and the subscript W on ρ matches corresponding usage
with T^{N−1} and C^N.
• The result shows a sharp discontinuity in the behavior of the face lattices of random
zonotopes; the location of the threshold is precisely identified. Such discontinuity
is also observed empirically for the other two polytopes (1.2) above; to our
knowledge, a proof of discontinuity has not yet been published in that setting.
• The result is universal across matrices; only general position is required.
Universality of threshold effects across a range of matrix ensembles has been observed
empirically for the other two regular polytopes [17]. However, theoretical results
[1] for other polytopes do not yet match empirical facts. This result gives a
rigorous universality result for one family of regular polytopes; this may inspire studies
to see if parallel results exist for the others.
We briefly discuss the ideas leading to this result. Böröczky and Henk [6] applied
a fundamental identity of Affentranger and Schneider [2] on general projected
polytopes and gave the explicit expression
$$ E f_k(AI^N) = 2 \binom{N}{k} \sum_{\ell = N-n}^{N-k-1} \binom{N-k-1}{\ell}, \tag{1.5} $$
valid where A is a uniformly-distributed random orthoprojector. In a previous version
of this manuscript [15], the authors proved that the same formula holds much more
generally, in fact under the assumption that A has an orthant-symmetric nullspace in
general position. One of our referees pointed out that even more is true: for any A in
general position, f_k(AI^N) is the fixed number
$$ f_k(AI^N) = 2 \binom{N}{k} \sum_{\ell = N-n}^{N-k-1} \binom{N-k-1}{\ell}. \tag{1.6} $$
This fact follows from Theorem 1.7, [27], on partitions of n-space by hyperplanes,
as we show below in Sect. 2.1. Formula (1.6) appears to be known to workers on oriented
matroids ([4, p. 220]) but may not seem evident to workers on convex polytopes. The
recent survey article What is known about unit cubes states that “no good bound for ...
[f_k(AI^N)] is known,” [29]. However, see [21, p. 410a].
1.3 Random Cones
Convex cones provide another family of fundamental polyhedral sets. Amongst these,
the simplest and most natural is surely the positive orthant P = R^N_+. The image K =
AP of a cone under projection A: R^N → R^n is again a cone. Such a cone may be
expected to have f_0(K) = 1 vertex (at 0), and as many as f_1(K) = N extreme rays,
etc. In fact, every pointed cone in R^n can be generated as a (nonorthogonal) projection
of the positive orthant under an appropriate projection from an appropriate R^N.
There seems to be relatively little prior research on random projections of the
positive orthant, except for the special case k = n, which was studied by Buchta [8].
As with the polytope models, surprising threshold phenomena can arise when the
projector is random, and we work in the proportional-dimensional framework. The
following result makes use of the notion of a random matrix with centrosymmetric
exchangeable columns; for details, see Sect. 2.2 below.
Theorem 1.2 (“Weak” Threshold for Orthant) Let A be a random matrix with
centrosymmetric exchangeable columns which are in general position almost surely. In the
proportional-dimensional framework (1.1), we have
$$ \lim_{n \to \infty} \frac{E f_k(AR^N_+)}{f_k(R^N_+)} = \begin{cases} 1, & \rho < \rho_W(\delta; R_+), \\ 0, & \rho > \rho_W(\delta; R_+), \end{cases} \tag{1.7} $$
with ρ_W(δ; R_+) ≡ ρ_W(δ; I) as defined in (1.3).
Here the threshold for the orthant is at precisely the same place as it was for the
hypercube.
1.4 Exact Equality in the Number of Faces
Our focus in Sects. 1.1–1.3 was on “weak” agreement of E f_k(AQ) with f_k(Q); in
the proportional-dimensional framework, for ρ below threshold ρ_W(δ; Q), we have
limiting relative equality:
$$ \frac{E f_k(AQ)}{f_k(Q)} \to 1, \quad n \to \infty. $$
We now focus on the “strong” agreement; it turns out that in the
proportional-dimensional framework, for ρ below a somewhat lower threshold ρ_S(δ; Q), we actually
have exact equality with overwhelming probability:
$$ \mathrm{Prob}\{ f_k(Q) = f_k(AQ) \} \to 1, \quad n \to \infty. \tag{1.8} $$
The existence of such “strong” thresholds for Q = T^{N−1} and Q = C^N was proven
in [10, 12], which exhibited thresholds ρ_S(δ; Q) below which (1.8) occurs. These
“strong thresholds” and the previously mentioned “weak thresholds” (1.2) are
depicted in Fig. 3. A similar strong threshold exists for the projected orthant.
Theorem 1.3 (“Strong” Threshold for Orthant) Let
$$ H(\gamma) := \gamma \log(1/\gamma) - (1 - \gamma) \log(1 - \gamma) \tag{1.9} $$
denote the usual (base-e) Shannon Entropy. Let
$$ \psi_S^{R_+}(\delta, \rho) := H(\delta) + \delta H(\rho) - (1 - \rho\delta) \log 2. \tag{1.10} $$
For δ ≥ 1/2, let ρ_S(δ; R_+) denote the zero crossing of ψ_S^{R_+}(δ, ρ). In the
proportional-dimensional framework (1.1) with ρ < ρ_S(δ; R_+),
$$ \mathrm{Prob}\{ f_k(AR^N_+) = f_k(R^N_+) \} \to 1 \quad \text{as } n \to \infty. \tag{1.11} $$
The threshold ρ_W(δ; Q) for Q = R^N_+ and I^N, and ρ_S(δ; R_+) are depicted in Fig. 1.
In contrast, the hypercube offers no phenomenon like (1.8).
Theorem 1.4 (Zonotope Vertices) Let A be an n × N matrix with n < N. Then
$$ f_k(AI^N) < f_k(I^N). $$
Proof of Theorem 1.4 f_k(AI^N) attains its maximum when A is in general position,
and in this case Theorem 1.8 gives the exact value of f_k(I^N) − f_k(AI^N), a value
which is strictly positive when n < N.
1.5 Exact Nonasymptotic Results
We have so far emphasized the Vershik–Sporyshev proportional-dimensional
asymptotic framework; this makes for the most natural comparisons between results for the
three families of regular polytopes. However, for the positive orthant and hypercube,
much more can be said than for the other two polytopes as there are simple exact
expressions for finite N. Moreover, these expressions can be derived from two beautiful
results in geometric probability, Wendel’s Theorem and Theorem 1.7.
Theorem 1.5 (Wendel, [26]) Let M points in R^m be drawn i.i.d. from a
centrosymmetric distribution such that the points are in general position. Then the
probability that all the points fall in some half space is
$$ P_{m,M} = 2^{-M+1} \sum_{\ell=0}^{m-1} \binom{M-1}{\ell}. \tag{1.12} $$
Wendel’s elegant result is often regarded simply as a piece of recreational
mathematics. Our original submission [15] obtained from it a simple proof of the following
identity.
Theorem 1.6 Let A be an n × N random matrix with centrosymmetric exchangeable
columns in general position almost surely. Then
$$ \frac{E f_k(AR^N_+)}{f_k(R^N_+)} = 1 - P_{N-n,N-k}. \tag{1.13} $$
In this revision, the result derives from Theorem 1.7 (see footnote 1).
Theorem 1.7 (Winder [27]; Cover [9]) A set of M hyperplanes in general position
in R^m, all passing through some common point, divides the space into 2^M P_{m,M}
regions.
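The region count in Theorem 1.7 can be sanity-checked against a few familiar cases. The sketch below is only an illustration; `region_count` is a name we introduce here, and it uses the equivalent form 2^M P_{m,M} = 2 Σ_{ℓ=0}^{m−1} C(M−1, ℓ), which follows directly from (1.12).

```python
from math import comb


def region_count(m: int, M: int) -> int:
    """Number of regions cut out of R^m by M central hyperplanes in general position.

    Hypothetical helper; evaluates 2^M * P_{m,M} = 2 * sum_{l=0}^{m-1} C(M-1, l).
    """
    if m < 1 or M < 1:
        raise ValueError("m and M must be positive integers")
    return 2 * sum(comb(M - 1, l) for l in range(m))


if __name__ == "__main__":
    # M lines through the origin split the plane into 2M sectors.
    print([region_count(2, M) for M in range(1, 6)])
    # With M <= m hyperplanes every one of the 2^M sign patterns is realized.
    print(region_count(4, 3))
```

The two printed checks correspond to the planar case (2M sectors) and to the case M ≤ m, where the hyperplanes behave like coordinate hyperplanes.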
This shows that f_k(AI^N) satisfies the same formula as E f_k(AR^N_+), but without
the expectation.
Theorem 1.8 Let A be an n × N matrix with columns in general position in R^n.
Then
$$ \frac{f_k(AI^N)}{f_k(I^N)} = 1 - P_{N-n,N-k}. \tag{1.14} $$
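To make the identity concrete, the following sketch compares two ways of counting the k-faces of a zonotope AI^N for A in general position: the closed-form sum 2 C(N,k) Σ_{ℓ=N−n}^{N−k−1} C(N−k−1, ℓ), and the product f_k(I^N)(1 − P_{N−n,N−k}) with f_k(I^N) = C(N,k) 2^{N−k}. All function names are ours and purely illustrative.

```python
from fractions import Fraction
from math import comb


def wendel_probability(m: int, M: int) -> Fraction:
    """P_{m,M} = 2^{-(M-1)} * sum_{l=0}^{m-1} C(M-1, l), computed exactly."""
    return Fraction(sum(comb(M - 1, l) for l in range(m)), 2 ** (M - 1))


def hypercube_faces(N: int, k: int) -> int:
    """f_k(I^N): choose the k free coordinates, then a vertex value for the rest."""
    return comb(N, k) * 2 ** (N - k)


def zonotope_faces_sum(n: int, N: int, k: int) -> int:
    """Closed-form count 2*C(N,k)*sum_{l=N-n}^{N-k-1} C(N-k-1, l), for k < n < N."""
    return 2 * comb(N, k) * sum(comb(N - k - 1, l) for l in range(N - n, N - k))


def zonotope_faces_wendel(n: int, N: int, k: int) -> Fraction:
    """Count obtained as f_k(I^N) * (1 - P_{N-n,N-k})."""
    return hypercube_faces(N, k) * (1 - wendel_probability(N - n, N - k))


if __name__ == "__main__":
    n, N = 3, 6
    for k in range(n):
        print(k, zonotope_faces_sum(n, N, k), zonotope_faces_wendel(n, N, k))
```

For n = 2 and k = 0 the count reduces to 2N, the number of vertices of a planar zonotope with N generators in general position, which gives an independent check of the formula.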
Formula (1.14) coincides with Böröczky and Henk’s formula (1.6), [6]; but
whereas (1.6) was proven for the case where A is a uniformly-distributed random
orthoprojector, Theorem 1.8 holds for any A in general position. Theorem 1.8 is proven
in Sect. 2.1. Theorem 1.6 is proven in Sect. 2.2, where it is derived from Theorem 1.8
by symmetrization.
1.6 Contents
Proofs of the above results are given in Sect. 2. The hypercube is contrasted with the
other regular polytopes in Sect. 3; the cone and hypercube are contrasted in Sect. 4,
where we also present additional results for specially constructed matrices. These
phenomena, described here from the viewpoint of combinatorial geometry, have
surprising consequences in probability theory, information theory, and signal processing;
see [11, 13, 16] and Sect. 5.
2 Proofs of Main Results
We start with the key nonasymptotic exact identities (1.13) and (1.14) and then derive
from (1.13) Theorems 1.2 and 1.3 by asymptotic analysis of the probabilities Pm,M .
Throughout the paper we write N (A) for the nullspace of A.
2.1 Proof of Theorem 1.8
For convenience, in this section we let I^N denote the hypercube [−1, 1]^N. Each
k-face F of I^N is a set of vectors with N − k particular coordinates taking fixed,
specific values; namely, for these coordinates a specific choice from the
endpoints {−1, 1}^{N−k} applies for every member of the face. Within each face, the
remaining k coordinate values may vary throughout the range [−1, 1]^k.
Footnote 1: This formula appears to have been derived by multiple authors independently at about the same time; in
the discrete geometry literature, Winder’s paper is often cited [27]; in the Machine Learning and
Information Theory literature, Cover’s paper [9] is typically cited instead; see Cover [9] for a history of early
related results and of the method of proof dating back to Schläfli [22].
Let Q be a polyhedron (polytope or polyhedral cone) in R^N and x0 ∈ Q. The
vector v is a feasible direction for Q at x0 if x0 + tv ∈ Q for all sufficiently small
t > 0. Let Feas_{x0}(Q) denote the cone of all feasible directions for Q at x0.
Lemma 2.1 Let F be a k-face of the polytope or polyhedral cone Q, and let x0 be
a vector in relint(F). For an n × N matrix A in general position, the following are
equivalent:
(Survive(A, F, Q)): AF is a k-face of AQ,
(Transverse(A, x0, Q)): N(A) ∩ Feas_{x0}(Q) = {0}.
A proof of Lemma 2.1 is given in [24, p. 329].
Each face F of I^N can be identified with its centroid x_F; this is a vector in R^N
with k of its coordinates equal to 0 and N − k entries taking the values σ_F ∈ {−1, 1}^{N−k}.
We speak of supp(F), the support of F, which is the set of indices of coordinates which
vary among members of the face, and of σ_F, the sign pattern of F, which is the common sign
pattern of the coordinates which do not vary among members of the face and so
are outside the support. Thus, for example, if F is the set of all vectors with −1 ≤
x(1), ..., x(k) ≤ 1 and x(k + 1) = · · · = x(N) = 1, then supp(F) = {1, ..., k} and
σ_F = (1, ..., 1).
For each whole number m, let [m] := {1, ..., m}. For an index set J ⊂ [N] of
cardinality k, let F(J) denote the collection of all k-faces F with supp(F) = J.
There are of course 2^{N−k} such faces; they differ in the choice of σ_F.
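The bookkeeping above can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names are ours, coordinates are indexed from 0 rather than 1, and each face of [−1, 1]^N is represented by its centroid x_F.

```python
from itertools import combinations, product
from math import comb


def faces_with_support(N, J):
    """Yield the centroids x_F of the faces F in F(J): zeros on J, signs elsewhere."""
    outside = [j for j in range(N) if j not in J]
    for signs in product((-1, 1), repeat=len(outside)):
        x = [0] * N
        for j, s in zip(outside, signs):
            x[j] = s
        yield tuple(x)


def all_k_faces(N, k):
    """Yield the centroids of all k-faces, grouped by support J (0-based indices)."""
    for J in combinations(range(N), k):
        yield from faces_with_support(N, set(J))


if __name__ == "__main__":
    N, k = 5, 2
    print(len(list(faces_with_support(N, {0, 1}))))  # 2^(N-k) faces share a support
    print(len(set(all_k_faces(N, k))), comb(N, k) * 2 ** (N - k))
```

Grouping the faces by support is exactly the decomposition used later to pass from a single support set to all k-faces.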
Lemma 2.2 Let A be an n × N matrix with n < N whose columns are in general
position in R^n. Then
$$ \mathrm{Card}\{ F \in \mathcal{F}([k]) : (\mathrm{Survive}(A, F, I^N)) \text{ does not hold} \} = 2^{N-k} P_{N-n,N-k}. $$
Proof Let Feas_{x_F}(I^N) denote the cone of feasible directions for I^N at x_F. The
collection of such cones associated to a common support is a cover of R^N:
$$ \bigcup_{F \in \mathcal{F}([k])} \mathrm{Feas}_{x_F}(I^N) = R^N; \tag{2.1} $$
moreover, the terms appearing in the union have pairwise disjoint interiors. If F, G
are distinct members of F([k]), then
$$ \mathrm{int}\, \mathrm{Feas}_{x_F}(I^N) \cap \mathrm{int}\, \mathrm{Feas}_{x_G}(I^N) = \emptyset; \tag{2.2} $$
roughly speaking, the collection of feasible cones associated to F([k]) forms a
partition of the space R^N.
Define hyperplanes H_j = {x : x(j) = 0}; the hyperplanes {H_j : j = k + 1, ..., N},
where the index avoids the support set [k], also induce a partition of R^N; it is the same
as the one induced by the above cones.
Set now m = N − n; by general position, N(A) ≅ R^m. Set M = N − k and define
$$ \tilde{H}_j = H_{k+j} \cap N(A), \quad j = 1, \dots, M. $$
Since N(A) is in general position, these are relative hyperplanes of N(A) ≅ R^m.
Thus, up to linear isomorphism, {H̃_j : j = 1, ..., M} is a collection of hyperplanes
in general position in R^m; these hyperplanes intersect in the common point 0.
Theorem 1.7 tells us that R^m is partitioned by M hyperplanes into 2^M P_{m,M} regions.
Correspondingly N(A) is partitioned into 2^{N−k} P_{N−n,N−k} regions. The relative
interior of each such region in N(A) belongs to the interior of exactly one cone
Feas_{x_F}(I^N) ⊂ R^N (by (2.2) and (2.1)). That cone specifies exactly one k-face F
for which (Transverse(A, x_F, I^N)) does not hold. Equivalently, (Survive(A, F, I^N))
does not hold.
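Theorem 1.8 follows from Lemmas 2.1 and 2.2 by noting that the set of all k-faces
of I^N can be partitioned cleanly by specifying one of the \binom{N}{k} k-element subsets
J ⊂ [N], card(J) = k, and then considering F(J). In combinatorics one denotes by
\binom{[N]}{k} the collection of different k-element subsets of [N]. Thus we have the disjoint
union
$$ \mathcal{F}_k(I^N) = \bigcup \left\{ \mathcal{F}(J) : J \in \binom{[N]}{k} \right\}. $$
Applying Lemma 2.2 to each F(J) after relabeling coordinates, the number of k-faces F
of I^N for which (Survive(A, F, I^N)) does not hold is
$$ \binom{N}{k} \cdot 2^{N-k} P_{N-n,N-k}. \tag{2.3} $$
Since f_k(I^N) = \binom{N}{k} 2^{N−k}, dividing by f_k(I^N) gives (1.14).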
2.2 Proof of Theorem 1.6
In the original submission of this manuscript, we proved (1.13) using Wendel’s
Theorem and then derived (1.14) from it, by an averaging argument. Prompted by a
referee, in this revision we go in the opposite direction: having first proved Theorem 1.8
using Theorem 1.7, we now derive (1.13) from Theorem 1.8 by symmetrization.
We start with the following observation on the expected number of k-faces of AR^N_+:
$$ \frac{E f_k(AR^N_+)}{f_k(R^N_+)} = \mathrm{Ave}_F \, \mathrm{Prob}\{ AF \text{ is a } k\text{-face of } AR^N_+ \}. \tag{2.4} $$
Here Ave_F denotes “the arithmetic mean over all k-faces F of R^N_+.”
In this section, it is convenient to let I^N = [0, 1]^N. This choice does not affect face
counts. With this representation of I^N, the “lower k-faces” of
I^N are in one-one correspondence with the k-faces of R^N_+. Namely, if, in the previous
subsection’s notation, F is a k-face of I^N with σ_F > 0, then the cone pos(F) is a
k-face of R^N_+. Adopt now the notational convention that within the proof, for a lower
face F of I^N, F̃ = pos(F) denotes the corresponding face of R^N_+. We observe that
for a vector x0 with nonnegative coordinates all strictly less than 1, we have
$$ \mathrm{Feas}_{x_0}(I^N) = \mathrm{Feas}_{x_0}(R^N_+), \tag{2.5} $$
so that, by Lemma 2.1, AF is a k-face of AI^N exactly when AF̃ is a k-face of AR^N_+.
In parallel with (2.4), for the hypercube we have
$$ \frac{E f_k(AI^N)}{f_k(I^N)} = \mathrm{Ave}_F \, \mathrm{Prob}\{ AF \text{ is a } k\text{-face of } AI^N \}. \tag{2.6} $$
Here Ave_F denotes “the arithmetic mean over all k-faces F of I^N.”
Definition 2.3 (Centrosymmetric Exchangeable Columns) Let A be a random n by
N matrix such that, for each signed permutation matrix Π and for every measurable
set Ω ,
Prob{A ∈ Ω} = Prob{AΠ ∈ Ω}.
Then we say that A has centrosymmetric exchangeable columns.
Below we assume without loss of generality that A has centrosymmetric
exchangeable columns. Then all k-faces of R^N_+ become statistically equivalent:
$$ \mathrm{Prob}\{ AF \text{ is a } k\text{-face of } AR^N_+ \} = \mathrm{Prob}\{ AG \text{ is a } k\text{-face of } AR^N_+ \} $$
for each distinct F, G in F_k(R^N_+); indeed, there is always a permutation Π for which
G is the image of F under Π: G = ΠF, and the probabilities are Π-invariant. Then
(2.4) becomes: let F be a fixed k-face of R^N_+; then
$$ \frac{E f_k(AR^N_+)}{f_k(R^N_+)} = \mathrm{Prob}\{ AF \text{ is a } k\text{-face of } AR^N_+ \}. \tag{2.7} $$
Similarly, all k-faces of I^N become statistically equivalent; indeed, there is always a
signed permutation Π for which G is the image of F under Π: G = ΠF, and the
probabilities are Π-invariant. Hence (2.6) becomes: let F be a fixed k-face of I^N;
then
$$ \frac{E f_k(AI^N)}{f_k(I^N)} = \mathrm{Prob}\{ AF \text{ is a } k\text{-face of } AI^N \}. \tag{2.8} $$
Combining these displays with (1.14) implies (1.13).
2.3 Some Generalities About Binomial Probabilities
The probability P_{m,M} has a classical interpretation: it gives the probability of at most
m − 1 heads in M − 1 tosses of a fair coin. The usual Normal approximation to the
binomial tells us that
$$ P_{m,M} \approx \Phi\left( \frac{(m-1) - (M-1)/2}{\sqrt{(M-1)/4}} \right), $$
with Φ the usual standard normal distribution function Φ(x) = ∫_{−∞}^{x} e^{−y²/2} dy/√(2π);
here the approximation symbol ≈ can be made precise using standard limit theorems,
e.g., appropriate for small or large deviations. In this expression, the approximating
normal has mean (M − 1)/2 and standard deviation √((M − 1)/4). There are three
regimes of interest, for large m, M, and three behaviors for P_{m,M}.
• Lower Tail: m ≪ M/2 − √(M/4). P_{m,M} ≈ 0.
• Middle: m ≈ M/2. P_{m,M} ∈ (0, 1).
• Upper Tail: m ≫ M/2 + √(M/4). P_{m,M} ≈ 1.
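The three regimes can be seen numerically by comparing the exact binomial sum with the Normal approximation. The sketch below is illustrative only; the function names and the particular values M = 401 and m ∈ {150, 201, 250} are our own choices, not taken from the paper.

```python
from math import comb, erf, sqrt


def exact_probability(m: int, M: int) -> float:
    """Exact P_{m,M}: probability of at most m-1 heads in M-1 fair coin tosses."""
    return sum(comb(M - 1, l) for l in range(m)) / 2 ** (M - 1)


def normal_approximation(m: int, M: int) -> float:
    """Phi(((m-1) - (M-1)/2) / sqrt((M-1)/4)) with Phi the standard normal CDF."""
    z = ((m - 1) - (M - 1) / 2) / sqrt((M - 1) / 4)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


if __name__ == "__main__":
    M = 401
    for label, m in (("lower tail", 150), ("middle", 201), ("upper tail", 250)):
        print(label, m, exact_probability(m, M), normal_approximation(m, M))
```

In the middle regime the two values differ by about half the probability of the central binomial term, which is the usual continuity-correction effect; in the tails both are numerically indistinguishable from 0 or 1.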
2.4 Proof of Theorem 1.2
Using the correspondence N − n ↔ m, N − k ↔ M, and the connection to Wendel’s
theorem, we have three regimes of interest:
• N − n ≪ (N − k)/2.
• N − n ≈ (N − k)/2.
• N − n ≫ (N − k)/2.
In the proportional-dimensional framework, the above discussion translates into three
separate regimes, and separate behaviors we expect to be true:
• Case 1: ρ < ρ_W(δ; R_+). P_{N_n−n,N_n−k_n} → 0.
• Case 2: ρ = ρ_W(δ; R_+). P_{N_n−n,N_n−k_n} ∈ (0, 1).
• Case 3: ρ > ρ_W(δ; R_+). P_{N_n−n,N_n−k_n} → 1.
Case 2 is trivially true, but it has no role in the statement of Theorem 1.2. Cases 1
and 3 correspond exactly to the two parts of (1.7) that we must prove.
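A finite-size illustration of the three cases: the sketch below evaluates 1 − P_{N−n,N−k} for one choice of (n, N) on either side of the threshold ρ_W = max(0, 2 − δ^{−1}). The names and the values n = 300, N = 400 are assumptions made for this example only.

```python
from math import comb


def wendel_probability(m: int, M: int) -> float:
    """P_{m,M} = 2^{-(M-1)} * sum_{l=0}^{m-1} C(M-1, l), as a float."""
    return sum(comb(M - 1, l) for l in range(m)) / 2 ** (M - 1)


def face_fraction(n: int, N: int, k: int) -> float:
    """1 - P_{N-n,N-k}: the ratio appearing in (1.13) and (1.14)."""
    return 1.0 - wendel_probability(N - n, N - k)


def rho_weak(delta: float) -> float:
    """Weak threshold max(0, 2 - 1/delta) from Theorem 1.1."""
    return max(0.0, 2.0 - 1.0 / delta)


if __name__ == "__main__":
    n, N = 300, 400  # delta = 0.75, so rho_W = 2/3
    print("rho_W =", rho_weak(n / N))
    for k in (150, 200, 250):  # rho = 0.5 (Case 1), 2/3 (Case 2), 0.833 (Case 3)
        print(k, k / n, face_fraction(n, N, k))
```

At k = 200 the relation N − n = (N − k)/2 holds exactly, and the symmetry of the binomial gives a ratio of exactly 1/2, a concrete instance of Case 2.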
To prove Cases 1 and 3, we need an upper bound deriving from standard
large-deviations analysis of the lower tail of the binomial.
Lemma 2.4 Let N − n < (N − k)/2. Then
$$ P_{N-n,N-k} \le n^{3/2} \exp\left( N \psi_W^{R_+}\left( \frac{n}{N}, \frac{k}{n} \right) \right), \tag{2.9} $$
where the exponent is defined as
$$ \psi_W^{R_+}(\delta, \rho) := H(\delta) + \delta H(\rho) - H(\rho\delta) - (1 - \rho\delta) \log 2 \tag{2.10} $$
with H(·) the Shannon Entropy (1.9), see Fig. 2.
Proof Upper-bounding the sum in P_{N−n,N−k} by N − n − 1 times \binom{N−k−1}{N−n−1}, we arrive
at
$$ P_{N-n,N-k} \le 2^{-(N-k-1)} \, \frac{n-k}{N-k} \cdot (N-k+1) \binom{N}{n} \binom{n}{k} \binom{N}{k}^{-1}. \tag{2.11} $$
We can bound \binom{m}{γ·m} for γ < 1 using the Shannon entropy (1.9):
$$ c_1 n^{-1/2} e^{mH(\gamma)} \le \binom{m}{\gamma \cdot m} \le c_2 e^{mH(\gamma)}, \tag{2.12} $$
where c_1 := (16/25)√(2/π), c_2 := 5/(4√(2π)). Recalling the definition of ψ_W^{R_+}, we
obtain (2.9).
We will now consider Cases 1 and 3, and prove the corresponding conclusion.
Case 1: ρ < ρ_W(δ; R_+). The threshold function ρ_W(δ; R_+) is the location of the
lowest zero crossing of ψ_W^{R_+}(δ, ρ) as a function of ρ for δ fixed; i.e.,
$$ \rho_W(\delta; R_+) = \inf\{ \rho : \psi_W^{R_+}(\delta, \rho) \ge 0 \}. $$
Thus, for any ρ strictly below ρ_W(δ; R_+), the exponent ψ_W^{R_+}(δ, ρ) is strictly negative.
Lemma 2.4 thus implies that P_{N_n−n,N_n−k_n} → 0 as n → ∞.
Case 3: ρ > ρ_W(δ; R_+). Binomial probabilities have a standard symmetry (relabel
every “head” outcome as a “tail”, and vice versa). It follows that P_{m,M} =
1 − P_{M−m,M}. We have P_{N−n,N−k} = 1 − P_{n−k,N−k}. In this case N − n > (N − k)/2,
that is, n − k < (N − k)/2, so Lemma 2.4 tells us that P_{n−k,N−k} → 0 as n → ∞;
we conclude P_{N−n,N−k} → 1 as n → ∞.
2.5 Proof of Theorem 1.3
P_{N−n,N−k} is the probability that one fixed k-dimensional face F of R^N_+ fails to generate a
k-face AF of AR^N_+. The probability that some k-dimensional face fails to generate a k-face
can be upper-bounded, using Boole’s inequality, by f_k(R^N_+) · P_{N−n,N−k}.
From (2.12), (2.9), and f_k(R^N_+) = \binom{N}{k} we have
$$ f_k(R^N_+) \cdot P_{N-n,N-k} \le n^{3/2} \exp\left( N \psi_S^{R_+}(\delta_n, \rho_n) \right), $$
where δ_n := n/N_n and ρ_n := k_n/n.
Recall that for δ ≥ 1/2, ρ_S(δ; R_+) is the location of the lowest zero crossing of ψ_S^{R_+}
as a function of ρ for δ fixed; i.e.,
$$ \rho_S(\delta; R_+) = \inf\{ \rho : \psi_S^{R_+}(\delta, \rho) \ge 0 \}. $$
For any ρ < ρ_S(δ; R_+), we have ψ_S^{R_+}(δ, ρ) < 0, and as a result (1.11) follows.
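The strong threshold has no closed form, but its lowest zero crossing is easy to locate numerically. The sketch below uses plain bisection and is our own illustration, with assumed function names; it is not the procedure used to produce the paper's figures. It relies on two facts that follow from the definition ψ_S^{R_+}(δ, ρ) = H(δ) + δH(ρ) − (1 − ρδ) log 2: for 1/2 < δ < 1 the exponent is negative at ρ = 0, and it is increasing in ρ on (0, 2/3) because its ρ-derivative is δ log(2(1 − ρ)/ρ).

```python
from math import log


def entropy(g: float) -> float:
    """Base-e Shannon entropy H(g), with H(0) = H(1) = 0."""
    if g <= 0.0 or g >= 1.0:
        return 0.0
    return -g * log(g) - (1.0 - g) * log(1.0 - g)


def psi_strong(delta: float, rho: float) -> float:
    """Exponent H(delta) + delta*H(rho) - (1 - rho*delta)*log 2."""
    return entropy(delta) + delta * entropy(rho) - (1.0 - rho * delta) * log(2.0)


def rho_strong(delta: float, tol: float = 1e-12) -> float:
    """Lowest zero crossing of psi_strong in rho, for 1/2 < delta < 1, by bisection."""
    if not 0.5 < delta < 1.0:
        raise ValueError("this sketch only handles 1/2 < delta < 1")
    lo, hi = 0.0, 2.0 / 3.0  # psi_strong < 0 at lo, > 0 at hi, increasing between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_strong(delta, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for delta in (0.6, 0.75, 0.9):
        print(delta, rho_strong(delta), max(0.0, 2.0 - 1.0 / delta))
```

The last printed column is the weak threshold ρ_W(δ; R_+), for comparison with the lower strong threshold.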
3 Contrasting the Hypercube with Other Polytopes
The theorems in Sect. 1 contrast strongly with existing results for other polytopes.
3.1 Nonexistence of Weak Thresholds at δ < 1/2
Theorem 1.1 identifies a region of the phase diagram (δ = n/N, ρ = k/n) where the
typical random zonotope has nearly as many k-faces as its generating hypercube; in
particular, if n < N/2, it has many fewer k-faces than the hypercube, for every k.
This behavior at δ = n/N < 1/2 is quite different from the behavior seen for random
projections of the simplex and the cross-polytope at small δ. Those polytopes have
f_k(AQ) ≈ f_k(Q) for quite a large range of k even at relatively small values of δ;
[16] shows that we can have k ∼ n/(2 log(δ^{−1})) for small δ at both of those polytopes,
while we could not have even k = 1 at the hypercube for such small δ; see also the
visual evidence in Fig. 3.
3.2 Nonexistence of Strong Thresholds for Hypercube
Theorem 1.4 shows that projected zonotopes always have strictly fewer k-faces than
their generators, f_k(AI^N) < f_k(I^N), for every n < N. This is again quite
different from the situation with the simplex and the cross-polytope, where we can
even have n ≪ N and still find k for which f_k(AQ) = f_k(Q) [16], roughly
k ∼ n/(2e log(δ^{−1})); again see visual evidence in Fig. 3.
3.3 Universality of Weak Phase Transitions
Theorem 1.1 holds for any A in general position.
In proving weak and strong threshold results for the simplex and crosspolytope,
we required A to either be a random orthoprojector or to have Gaussian i.i.d. entries.
Thus, what we proved for those families of regular polytopes applies to a much more
limited range of matrix ensembles than for hypercubes.
4 Contrasting the Cone with the Hypercube
4.1 Universality of Weak Phase Transitions
For Theorem 1.2, A can be sampled from any ensemble of random matrices invariant
under right multiplication by signed permutations. The result is thus universal across
a wide class of matrix ensembles.
In proving weak and strong threshold results for the simplex and crosspolytope,
we required A to either be a random orthoprojector or to have Gaussian i.i.d. entries.
Thus, what we proved for those families of regular polytopes applies to a much more
limited range of matrix ensembles than for the orthant.
Our empirical studies suggest that the same ensembles of matrices which “work”
for the orthant weak threshold also “work” for the simplex and cross-polytope
thresholds. It seems to us that the universality across matrix ensembles proven here may
point to a much larger phenomenon, valid also for other polytope families; however,
this universality class is far more restrictive than is the case for the hypercube. For
our empirical studies, see [17].
The weak threshold for the orthant depends very much more delicately on details
about A than do the hypercube thresholds; unlike f_k(AI^N), f_k(AR^N_+) is not the same
number for all A in general position. It makes a substantial difference to the results if
the matrix A is not “zero-mean.”
4.2 The LowFrequency Partial Fourier Matrix
Consider the special partial Fourier matrix made only of the n lowest frequency
entries.
Corollary 4.1 Assume n is odd and, for j = 1, 2, ..., N, let
$$ \Omega_{ij} = \begin{cases} \cos\left( \dfrac{\pi (j-1)(i-1)}{N} \right), & i = 1, 3, 5, \dots, n, \\[1ex] \sin\left( \dfrac{\pi (j-1) i}{N} \right), & i = 2, 4, 6, \dots, n-1. \end{cases} \tag{4.1} $$
Then
$$ f_k(\Omega R^N_+) = f_k(R^N_+), \quad 0 \le k \le \lfloor n/2 \rfloor. $$
The result is a corollary of [18, Theorem 3, p. 56]. The key steps of the proof are
included in an extended technical report [15].
This behavior is dramatically different from the case for random A of the type
considered so far, and in some sense dramatically better.
Corollary 4.1 is closely connected with the classical question of
neighborliness. There are famous polytopes which can be generated by projections AT^{N−1}
and have exactly as many k-faces as T^{N−1} for k ≤ ⌊n/2⌋. A standard example
is provided by the matrix Ω defined in (4.1); it obeys f_k(ΩT^{N−1}) = f_k(T^{N−1}),
0 ≤ k ≤ ⌊n/2⌋. (There is a vast literature touching in some way on the phenomenon
f_k(ΩT^{N−1}) = f_k(T^{N−1}). In that literature, the polytope ΩT^{N−1} is usually called a
cyclic polytope, and the columns of Ω are called points of the trigonometric moment
curve; see standard references [21, 28].)
Hence the matrix Ω offers both f_k(ΩT^{N−1}) = f_k(T^{N−1}) and f_k(ΩR^N_+) =
f_k(R^N_+) for 0 ≤ k ≤ ⌊n/2⌋. This is exceptional. For random A of the type
discussed in earlier sections, there is a large disparity between the sets of triples
(k, n, N) where f_k(AT^{N−1}) = f_k(T^{N−1}) (this happens for k/n < ρ_S(n/N; T))
and those where f_k(AR^N_+) = f_k(R^N_+) (this happens for k/n < ρ_S(n/N; R_+)). These
two strong thresholds are displayed in Figs. 3 and 1, respectively.
Even if we relax our notion of agreement of face counts to weak agreement, the
collections of triples where f_k(AT^{N−1}) ≈ f_k(T^{N−1}) and f_k(AR^N_+) ≈ f_k(R^N_+) are
very different, because ρ_W(n/N; T) and ρ_W(n/N; R_+) are so dramatically different,
particularly at n < N/2.
4.3 Adjoining a Row of Ones to A
An important feature of the random matrices A studied earlier is orthant symmetry. In
particular, the positive orthant plays no distinguished role with respect to these
matrices. On the other hand, the partial Fourier matrix Ω constructed in the last subsection
contains a row of ones, and thus the positive orthant has a distinguished role to play
for this matrix. Moreover, this distinction is crucial; we find empirically that removing
the row of ones from Ω causes the conclusion of Corollary 4.1 to fail drastically.
Conversely, consider the matrix Ã obtained by adjoining a row of N ones to some
matrix A:
$$ \tilde{A} = \begin{bmatrix} 1^T \\ A \end{bmatrix}. $$
Adding this row of ones to a random matrix causes a drastic shift in the strong and
weak thresholds.
Theorem 4.1 Consider the proportional-dimensional asymptotic framework with
parameters δ, ρ in (0, 1). Let a random (n − 1) × N matrix A have i.i.d. standard
normal entries. Let Ã denote the corresponding n × N matrix whose first row is all
ones and whose remaining rows are identical to those of A. Then
$$ \lim_{n \to \infty} \frac{E f_k(\tilde{A} R^N_+)}{f_k(R^N_+)} \; \begin{cases} = 1, & \rho < \rho_W(\delta, T), \\ < 1, & \rho > \rho_W(\delta, T); \end{cases} \tag{4.2} $$
$$ \lim_{n \to \infty} P\{ f_k(\tilde{A} R^N_+) = f_k(R^N_+) \} = \begin{cases} 1, & \rho < \rho_S(\delta, T), \\ 0, & \rho > \rho_S(\delta, T). \end{cases} \tag{4.3} $$
Note particularly the mixed form of this relationship. Although the conclusions
concern the behavior of faces of the randomlyprojected orthant, the thresholds are
those that were previously obtained for the randomlyprojected simplex.
Since there is such a dramatic difference between ρ(δ; T ) and ρ(δ; R+), the
single row of ones can fairly be said to have a huge effect. In particular, the
region “below” the simplex weak phase transition ρW (δ; T ) comprises ≈0.5634 of the
(δ, ρ) parameter area, and the hypercube weak phase transition ρW (δ; I ) comprises
1 − log 2 ≈ 0.3069.
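The value 1 − log 2 quoted for the hypercube can be checked by integrating ρ_W(δ; I) = max(0, 2 − δ^{−1}) over δ ∈ (0, 1); analytically the area is ∫_{1/2}^{1} (2 − 1/δ) dδ = 1 − log 2. The sketch below confirms this with a simple midpoint rule. The function names and the step count are our own choices for the illustration.

```python
from math import log


def rho_weak(delta: float) -> float:
    """Hypercube weak threshold max(0, 2 - 1/delta)."""
    return max(0.0, 2.0 - 1.0 / delta)


def area_below_weak_threshold(steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the area under rho_weak over delta in (0, 1)."""
    h = 1.0 / steps
    return h * sum(rho_weak((i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    print(area_below_weak_threshold(), 1.0 - log(2.0))
```

An even number of steps places the kink of ρ_W at δ = 1/2 on a cell boundary, so the midpoint rule keeps its second-order accuracy.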
Theorem 4.1 is an immediate consequence of the following identity.
Lemma 4.2 Suppose that the row vector 1 is not in the row span of A. Then
$$ f_k(\tilde{A} R^N_+) = f_{k-1}(A T^{N-1}), \quad 0 < k < n. $$
Proof We observe that there is a natural bijection between k-faces of R^N_+ and the
(k − 1)-faces of T^{N−1}. The (k − 1)-faces of T^{N−1} are in bijection with the
corresponding support sets of cardinality k: i.e., we can identify with each (k − 1)-face F the
union I of all supports of all members of the face. Similarly, to each support set I of
cardinality k, there is a unique k-face F̃ of R^N_+ consisting of all points in R^N_+ whose
support lies in I. Composing bijections F ↔ I ↔ F̃, we have the bijection F ↔ F̃.
Concretely, let x0 be a point in the relative interior of some (k − 1)-face F of
T^{N−1}. Then x0 has k nonzeros. x0 is also in the relative interior of the k-face F̃ of
R^N_+. Conversely, let y0 be a point in the relative interior of some k-face of R^N_+; then
x0 = (1^T y0)^{−1} y0 is a point in the relative interior of a (k − 1)-face of T^{N−1}.
The last two paragraphs show that for each pair of corresponding faces $(F, \tilde{F})$, we
may find a point $x_0$ lying in the relative interior of both $\tilde{F}$ and $F$.
For such $x_0$,
\[ \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \mathrm{Feas}_{x_0}(T^{N-1}) + \mathrm{lin}(x_0). \]
Clearly $\mathcal{N}(\tilde{A}) \cap \mathrm{lin}(x_0) = \{0\}$, because $\mathbf{1}^T x_0 > 0$. We conclude that the following are
equivalent:
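(Transverse$(A, x_0, T^{N-1})$): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.
(Transverse$(\tilde{A}, x_0, \mathbb{R}^N_+)$): $\mathcal{N}(\tilde{A}) \cap \mathrm{Feas}_{x_0}(\mathbb{R}^N_+) = \{0\}$.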
Rephrasing [13], the following are equivalent for $x_0$ a point in the relative interior
of $F$:
(Survive$(A, F, T^{N-1})$): $AF$ is a $(k-1)$-face of $AT^{N-1}$,
(Transverse$(A, x_0, T^{N-1})$): $\mathcal{N}(A) \cap \mathrm{Feas}_{x_0}(T^{N-1}) = \{0\}$.
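We conclude that for two corresponding faces $F$, $\tilde{F}$, the following are equivalent:
(Survive$(A, F, T^{N-1})$): $AF$ is a $(k-1)$-face of $AT^{N-1}$,
(Survive$(\tilde{A}, \tilde{F}, \mathbb{R}^N_+)$): $\tilde{A}\tilde{F}$ is a $k$-face of $\tilde{A}\mathbb{R}^N_+$.
Combining this with the natural bijection $F \leftrightarrow \tilde{F}$, the lemma is proved.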
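The bijection $F \leftrightarrow I \leftrightarrow \tilde{F}$ used in the proof can be made concrete for small $N$. The following sketch enumerates the support sets of cardinality $k$, which index both the $(k-1)$-faces of $T^{N-1}$ and the $k$-faces of $\mathbb{R}^N_+$, and applies the normalization $x_0 = (\mathbf{1}^T y_0)^{-1} y_0$. The helper names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb

def support_sets(N: int, k: int):
    """All index sets I of cardinality k; each labels one (k-1)-face of the
    simplex T^{N-1} and one k-face of the positive orthant."""
    return list(combinations(range(N), k))

def support(x):
    """Indices of the nonzero entries of x."""
    return tuple(j for j, v in enumerate(x) if v != 0)

def normalize_to_simplex(y):
    """Map a nonzero point y of the orthant to x = y / (1^T y) on the simplex."""
    total = sum(y)
    return [v / total for v in y]

N, k = 4, 2
faces = support_sets(N, k)
print(len(faces), comb(N, k))          # both face counts equal C(N, k)

y0 = [3.0, 0.0, 1.0, 0.0]              # relative interior of a 2-face of the orthant
x0 = normalize_to_simplex(y0)          # relative interior of a 1-face of T^3
print(x0, support(x0) == support(y0))  # the support set I is preserved
```

Normalization rescales a point without changing its support, which is why the same index set $I$ labels both faces.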
5 Application: Compressed Sensing
Our face counting results can all be reinterpreted as statements about “simple”
solutions of underdetermined systems of linear equations. This reinterpretation allows us
to make connections with numerous problems of current interest in signal
processing, information theory, and probability. The reinterpretation follows from the two
following lemmas, which are restatements of Lemma 2.1 for $Q = \mathbb{R}^N_+$ and $Q = I^N$,
rephrasing the notion of (Transverse$(A, x_0, Q)$) with the all but linguistically
equivalent (Unique$(A, x_0, Q)$).
Lemma 5.1 Let $x_0$ be a vector in $\mathbb{R}^N_+$ with exactly $k$ nonzeros. Let $F$ denote the
associated $k$-face of $\mathbb{R}^N_+$. For an $n \times N$ matrix $A$, let $AF$ denote the image of $F$
under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The following are equivalent:
(Survive$(A, F, \mathbb{R}^N_+)$): $AF$ is a $k$-face of $A\mathbb{R}^N_+$,
(Unique$(A, x_0, \mathbb{R}^N_+)$): The system $b_0 = Ax$ has a unique solution in $\mathbb{R}^N_+$.
Lemma 5.2 Let $x_0$ be a vector in $I^N$ with exactly $k$ entries strictly between the
bounds $\{0, 1\}$. Let $F$ denote the associated $k$-face of $I^N$. For an $n \times N$ matrix $A$,
let $AF$ denote the image of $F$ under $A$ and $b_0 = Ax_0$ the image of $x_0$ under $A$. The
following are equivalent:
(Survive$(A, F, I^N)$): $AF$ is a $k$-face of $AI^N$,
(Unique$(A, x_0, I^N)$): The system $b_0 = Ax$ has a unique solution in $I^N$.
Note that the systems of linear equations referred to in these lemmas are
underdetermined: n < N . Hence these lemmas identify conditions on underdetermined
systems of linear equations such that, when the solutions are known to obey certain
constraints, such seemingly weak prior knowledge in fact uniquely determines the
solution. The first result can be paraphrased as saying that nonnegativity constraints
can be very powerful if the object is known to have relatively few nonzeros; the
second result says that upper and lower bounds can be very powerful, provided that many
of those bounds are binding.
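The Survive/Unique equivalence of Lemma 5.1 can be checked by hand on a toy instance. The sketch below is restricted to matrices with $n = 2$ rows whose first row is all ones, so that the feasible set $\{x \ge 0 : Ax = b\}$ is bounded and equals the convex hull of its basic feasible solutions; uniqueness then means that all of them coincide with $x_0$. The matrix, the helper names, and the vertex-enumeration approach are illustrative assumptions, not a method from the paper.

```python
from fractions import Fraction
from itertools import combinations

def nonnegative_vertices(A, b):
    """Basic feasible solutions of {x >= 0 : A x = b} for a 2-row integer matrix A."""
    N = len(A[0])
    vertices = set()
    for i, j in combinations(range(N), 2):
        det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
        if det == 0:
            continue  # columns i and j do not form a basis
        # Cramer's rule for the 2x2 system in the unknowns (x_i, x_j)
        xi = Fraction(b[0] * A[1][j] - A[0][j] * b[1], det)
        xj = Fraction(A[0][i] * b[1] - b[0] * A[1][i], det)
        if xi >= 0 and xj >= 0:
            x = [Fraction(0)] * N
            x[i], x[j] = xi, xj
            vertices.add(tuple(x))
    return vertices

def is_unique(A, x0):
    """True if x0 is the only nonnegative solution of A x = A x0.
    Assumes the first row of A is all ones, so the feasible set is bounded."""
    b = [sum(a * v for a, v in zip(row, x0)) for row in A]
    return nonnegative_vertices(A, b) == {tuple(Fraction(v) for v in x0)}

# Toy example: a row of ones adjoined to the 1 x 4 matrix A = [0 1 2 3].
A_tilde = [[1, 1, 1, 1],
           [0, 1, 2, 3]]
unit = lambda j: [1 if i == j else 0 for i in range(4)]
survivors = [j for j in range(4) if is_unique(A_tilde, unit(j))]
print(survivors)  # indices of the 1-faces (rays) of the orthant that survive
```

In this example the image of $T^3$ under $A = [0\ 1\ 2\ 3]$ is the segment $[0, 3]$, whose two vertices come from the first and last columns; these are exactly the two rays reported as surviving, in agreement with Lemma 4.2 for $k = 1$.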
5.1 Reconstruction Exploiting Nonnegativity Constraints
We wish to reconstruct the unknown $x$, knowing only the linear measurements $b = Ax$,
the matrix $A$, and the constraint $x \in \mathbb{R}^N_+$.
Let $J(x)$ be some function of $x$. Consider the positivity-constrained variational
problem
\[ (\mathrm{Pos}_J) \qquad \min J(x) \quad \text{subject to} \quad b = Ax, \quad x \in \mathbb{R}^N_+. \]
Let $\mathrm{pos}_J(b, A)$ denote any solution of the problem instance $(\mathrm{Pos}_J)$ defined by data $b$
and matrix $A$.
We conclude the following:
Corollary 5.1 Suppose that
\[ f_k(A\mathbb{R}^N_+) = f_k(\mathbb{R}^N_+). \]
Let $x_0 \ge 0$ and $\|x_0\|_0 \le k$. For the problem instance defined by $b = Ax_0$,
\[ \mathrm{pos}_J(b, A) = x_0. \]
In words: under the given conditions on the face numbers, any variational
prescription which imposes nonnegativity constraints will correctly recover the $k$-sparse
solution in any problem instance where such a $k$-sparse solution exists.
Corresponding to this "strong" statement is a "weak" statement. Consider the
following probability measure on $k$-sparse problem instances.
• Choose a random subset $L$ of size $k$ from $\{1, \dots, N\}$, by $k$ simple random draws
without replacement.
• Set the entries of $x_0$ not in the selected subset to zero.
• Choose the entries of $x_0$ in the selected set $L$ from some fixed joint distribution $\psi_L$
supported in $(0, 1)^k$.
• Generate the problem instance $b = Ax_0$.
We speak of drawing a $k$-sparse problem instance at random.
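The sampling procedure above can be sketched as follows. The paper leaves the joint distribution $\psi_L$ unspecified; the sketch assumes i.i.d. uniform entries on $(0, 1)$ purely for illustration, and the function names and the example matrix are hypothetical.

```python
import random

def open_unit(rng):
    """Draw from the open interval (0, 1); random() may return exactly 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def draw_sparse_instance(A, k, rng):
    """Return (x0, b): x0 is a k-sparse nonnegative vector and b = A x0."""
    N = len(A[0])
    L = rng.sample(range(N), k)      # k draws without replacement
    x0 = [0.0] * N                   # entries outside L are set to zero
    for j in L:
        x0[j] = open_unit(rng)       # assumed psi_L: i.i.d. uniform on (0, 1)
    b = [sum(a * v for a, v in zip(row, x0)) for row in A]
    return x0, b

rng = random.Random(0)               # fixed seed so the draw is reproducible
A = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
x0, b = draw_sparse_instance(A, 2, rng)
print(x0, b)
```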
Corollary 5.2 Suppose that for some $\epsilon$,
\[ f_k(A\mathbb{R}^N_+) \ge (1 - \epsilon) \cdot f_k(\mathbb{R}^N_+). \]
For $(b, A)$ a problem instance drawn at random, as above:
\[ \mathrm{Prob}\{\mathrm{pos}_J(b, A) = x_0\} \ge (1 - \epsilon). \]
In words: under the given conditions on the face lattice, any variational
prescription which imposes nonnegativity constraints will succeed in recovering the
$k$-sparse solution in at least a fraction $(1 - \epsilon)$ of all $k$-sparse problem instances.
For more discussion, including potential applications, see [7, 13, 18, 19].
5.2 Reconstruction Exploiting Box Constraints
Consider again the problem of reconstruction from measurements $b = Ax$, but this
time assuming that the object $x$ obeys box constraints: $0 \le x(j) \le 1$, $1 \le j \le N$.
Define the box-constrained variational problem
\[ (\mathrm{Box}_J) \qquad \min J(x) \quad \text{subject to} \quad b = Ax, \quad 0 \le x(j) \le 1, \quad j = 1, \dots, N. \]
Let $\mathrm{box}_J(b, A)$ denote any solution of the problem instance $(\mathrm{Box}_J)$ defined by data
$b$ and matrix $A$. In this setting, the notion corresponding to "sparse" is "simple". We
say that a vector $x$ is $k$-simple if at most $k$ of its entries differ from the bounds $\{0, 1\}$.
Consider the following probability measure on problem instances having $k$-simple
solutions. Recall that $k$-simple vectors have all entries equal to 0 or 1 except at $k$
exceptional locations.
• Choose the subset $L$ of $k$ exceptional entries uniformly at random from the set
$\{1, \dots, N\}$ without replacement.
• Choose the nonexceptional entries to be either 0 or 1 based on tossing a fair coin.
• Choose the values of the exceptional $k$ entries according to a joint probability
measure $\psi_L$ supported in $(0, 1)^k$.
• Define the problem instance $b = Ax_0$.
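A minimal sketch of this procedure is given below. As before, $\psi_L$ is not specified in the paper, so i.i.d. uniform values on $(0, 1)$ are assumed for illustration; the function name and the example matrix are hypothetical.

```python
import random

def draw_simple_instance(A, k, rng):
    """Return (x0, b): x0 has exactly k entries strictly inside (0, 1),
    all other entries are fair-coin choices of 0 or 1, and b = A x0."""
    N = len(A[0])
    L = set(rng.sample(range(N), k))          # exceptional locations
    x0 = []
    for j in range(N):
        if j in L:
            u = rng.random()
            while u == 0.0:                   # keep the value strictly inside (0, 1)
                u = rng.random()
            x0.append(u)                      # assumed psi_L: i.i.d. uniform
        else:
            x0.append(float(rng.randint(0, 1)))  # fair coin: 0 or 1
    b = [sum(a * v for a, v in zip(row, x0)) for row in A]
    return x0, b

rng = random.Random(1)                        # fixed seed for reproducibility
A = [[1.0, -1.0, 2.0, 0.5, 3.0],
     [0.0, 1.0, 1.0, -2.0, 1.0]]
x0, b = draw_simple_instance(A, 2, rng)
print(x0, b)
```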
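Corollary 5.3 Suppose that for some $\epsilon$,
\[ f_k(AI^N) \ge (1 - \epsilon) \cdot f_k(I^N). \]
Randomly sample a problem instance $(b, A)$ using the method just described. Then
\[ \mathrm{Prob}\{\mathrm{box}_J(b, A) = x_0\} \ge (1 - \epsilon). \]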
In words: under the given conditions on the face lattice, any variational
prescription which imposes box constraints will correctly recover at least a fraction $(1 - \epsilon)$
of all underdetermined systems generated by the matrix $A$ which have $k$-simple
solutions.
In the hypercube case there is no phenomenon comparable to that which arose
in the positive orthant with the special constructions $\Omega$ and $\tilde{A}$; $f_k(AI^N)$ is a fixed
number if $A$ is in general position and decreases if $A$ is not in general position.
Consequently, the hypercube weak threshold is the best general result on the ability
to undersample by exploiting box constraints. In particular, the difference between
the weak simplex threshold and the weak hypercube threshold has this interpretation:
A given degree $k$ of sparsity of a nonnegative object is much more powerful
than that same degree of simplicity of a box-constrained object.
Specifically, we should not expect to be able to undersample a typical
box-constrained object by more than a factor of 2 and then reconstruct it using some
garden-variety variational prescription. In comparison, the last section showed that
we can severely undersample very sparse nonnegative objects. Moreover, when
$n < N$, there is no region where $f_k(AI^N) = f_k(I^N)$, and consequently box
constraints are never enough to ensure $\mathrm{box}_J(b, A) = x_0$ for all $k$-simple problem
instances.
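Both observations can be illustrated numerically. The sketch below assumes that, for $A$ in general position, the fixed face count takes the Wendel form $f_k(AI^N)/f_k(I^N) = 1 - P_{N-n,N-k}$, where $P_{m,M} = 2^{-(M-1)} \sum_{\ell=0}^{m-1} \binom{M-1}{\ell}$ is Wendel's probability [26]. This exact formula is not restated in the present section and is treated here as an assumption; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def wendel_probability(m: int, M: int) -> Fraction:
    """Wendel's probability P_{m,M} = 2^{-(M-1)} * sum_{l=0}^{m-1} C(M-1, l)."""
    return Fraction(sum(comb(M - 1, l) for l in range(m)), 2 ** (M - 1))

def surviving_fraction(n: int, N: int, k: int) -> Fraction:
    """Assumed ratio f_k(A I^N) / f_k(I^N) for an n x N matrix A in general position."""
    return 1 - wendel_probability(N - n, N - k)

n, N = 150, 200                      # delta = 0.75, assumed threshold 2 - 1/delta = 2/3
for k in (30, 90, 110, 140):         # rho = k/n = 0.2, 0.6, 0.73, 0.93
    print(k, float(surviving_fraction(n, N, k)))

# With n < N the ratio stays strictly below one for every admissible k.
print(all(surviving_fraction(n, N, k) < 1 for k in range(1, n)))
```

Under this assumption the ratio is close to one well below the threshold and close to zero well above it, yet it never equals one exactly when $n < N$, which matches the closing remark above.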
Acknowledgements Art Owen suggested that we pay attention to Wendel’s Theorem. We also thank
Goodman, Pollack, and Schneider for providing scholarly background and the anonymous referees for the
most useful and instructive referee reports we have ever seen.
References
1. Adamczak, R., Litvak, A.E., Pajor, A., Tomczak-Jaegermann, N.: Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling. arXiv:0904.4723v1 (2009)
2. Affentranger, F., Schneider, R.: Random projections of regular simplices. Discrete Comput. Geom. 7(3), 219-226 (1992)
3. Baryshnikov, Y.M.: Gaussian samples, regular simplices, and exchangeability. Discrete Comput. Geom. 17(3), 257-261 (1997)
4. Björner, A., Las Vergnas, M., Sturmfels, B., White, N., Ziegler, G.: Oriented Matroids. Encyclopedia of Mathematics and Its Applications, vol. 46. Cambridge University Press, Cambridge (1999)
5. Bolker, E.D.: A class of convex bodies. Trans. Am. Math. Soc. 145, 323-345 (1969)
6. Böröczky, K., Jr., Henk, M.: Random projections of regular polytopes. Arch. Math. (Basel) 73(6), 465-473 (1999)
7. Bruckstein, A.M., Elad, M., Zibulevsky, M.: On the uniqueness of nonnegative sparse and redundant representations. In: ICASSP 2008. Special session on Compressed Sensing, Las Vegas, Nevada (2008)
8. Buchta, C.: On nonnegative solutions of random systems of linear inequalities. Discrete Comput. Geom. 2(1), 85-95 (1987)
9. Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. EC-14(3), 326-334 (1965)
10. Donoho, D.L.: High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension. Discrete Comput. Geom. 35(4), 617-652 (2006)
11. Donoho, D.L.: Neighborly polytopes and sparse solutions of underdetermined linear equations. Technical Report, Stanford University (2006)
12. Donoho, D.L., Tanner, J.: Neighborliness of randomly-projected simplices in high dimensions. Proc. Natl. Acad. Sci. USA 102(27), 9452-9457 (2005)
13. Donoho, D.L., Tanner, J.: Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc. Natl. Acad. Sci. USA 102(27), 9446-9451 (2005)
14. Donoho, D.L., Tanner, J.: Exponential bounds implying construction of neighborly polytopes, error-correcting codes and compressed sensing matrices by random sampling. Preprint (2007)
15. Donoho, D.L., Tanner, J.: Counting the faces of randomly-projected hypercubes and orthants, with applications. arXiv:0807.3590v1 (2008)
16. Donoho, D.L., Tanner, J.: Counting faces of randomly-projected polytopes when the projection radically lowers dimension. J. AMS 22(1), 1-53 (2009)
17. Donoho, D.L., Tanner, J.: Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing. Philos. Trans. A (2009)
18. Donoho, D.L., Johnstone, I.M., Hoch, J.C., Stern, A.S.: Maximum entropy and the nearly black object. J. R. Stat. Soc., Ser. B (Methodological) 54(1), 41-81 (1992)
19. Fuchs, J.J.: On sparse representations in arbitrary redundant bases. IEEE Trans. Inf. Theory 50(6), 1341-1344 (2004)
20. Goodey, P., Weil, W.: Zonoids and generalisations. In: Handbook of Convex Geometry, vols. A, B, pp. 1297-1326. North-Holland, Amsterdam (1993)
21. Grünbaum, B.: Convex Polytopes, 2nd edn. Graduate Texts in Mathematics, vol. 221. Springer, New York (2003). Prepared and with a preface by Volker Kaibel, Victor Klee, and Günter M. Ziegler
22. Schläfli, L.: In: Gesammelte Mathematische Abhandlungen, vol. 1, pp. 209-212. Birkhäuser, Basel (1950)
23. Schneider, R., Weil, W.: Zonoids and related topics. In: Convexity and Its Applications, pp. 296-317. Birkhäuser, Basel (1983)
24. Schneider, R., Weil, W.: Stochastic and Integral Geometry. Springer, Berlin (2008)
25. Vershik, A.M., Sporyshev, P.V.: Asymptotic behavior of the number of faces of random polyhedra and the neighborliness problem. Sel. Math. Sov. 11(2), 181-201 (1992)
26. Wendel, J.G.: A problem in geometric probability. Math. Scand. 11, 109-111 (1962)
27. Winder, R.O.: Partitions of n-space by hyperplanes. SIAM J. Appl. Math. 14(4), 811-818 (1966)
28. Ziegler, G.M.: Lectures on Polytopes. Graduate Texts in Mathematics, vol. 152. Springer, New York (1995)
29. Zong, C.: What is known about unit cubes. Bull. AMS 42(2), 181-211 (2005)