A Simple Sampling Lemma: Analysis and Applications in Geometric Optimization
Discrete Comput Geom
B. Gärtner
E. Welzl
Institut für Theoretische Informatik, ETH Zürich, ETH Zentrum, CH-8092 Zürich, Switzerland
Random sampling is an efficient method to deal with constrained optimization problems in computational geometry. In a first step, one finds the optimal solution subject to a random subset of the constraints; in many cases, the expected number of constraints still violated by that solution is then significantly smaller than the overall number of constraints that remain. This phenomenon can be exploited in several ways, and typically results in simple and asymptotically fast algorithms. Very often the analysis of random sampling in this context boils down to a simple identity (the sampling lemma) which holds in a general framework, yet has not been stated explicitly in the literature. In the more restricted but still general setting of LP-type problems, we prove tail estimates for the sampling lemma, giving Chernoff-type bounds for the number of constraints violated by the solution of a random subset. As an application, we provide the first theoretical analysis of multiple pricing, a heuristic used in the simplex method for linear programming in order to reduce a large problem to few small ones. This follows from our analysis of a reduction scheme for general LP-type problems, which can be considered as a simplification of an algorithm due to Clarkson. The simplified version needs less random resources and allows a Chernoff-type tail estimate.

The first author acknowledges support from the Swiss Science Foundation (SNF), Project No. 2150647.97. A preliminary version of this paper appeared in the Proceedings of the 16th Annual ACM Symposium on Computational Geometry (SCG), 2000, pp. 91-99.

Random sampling and randomized incremental construction have become well-established, by now even classical, design paradigms in the field of computational geometry, see [27]. Many algorithms following that paradigm have been simplified to a point where they can easily be taught in introductory CS courses, with almost no technical difficulties. This was not always the case; pioneering papers, notably the ones by Clarkson and Shor [6], [9], Mulmuley [26], and by Guibas et al. [18], still required more technical derivations.
This changed when Seidel popularized the backwards analysis paradigm for randomized algorithms [30]. Together with the abstract framework of configuration spaces, this technique allows us to treat many different algorithms in a simple and unified way [11].
The goal of this paper is to popularize and prove results around a simple identity (the sampling lemma) which underlies the analysis of randomized algorithms for many geometric optimization problems. By that we mean problems defined in a low-dimensional space, which usually implies that they have few constraints or few variables when written as mathematical programs.
As we show below, special cases of the identity, or inequalities implied by it, are used in many places, including the analysis of the general configuration space framework. To the knowledge of the authors, the identity itself, however, has not been noticed explicitly.
The Sampling Lemma
Let $S$ be a set of size $n$ and let $\varphi$ be a function that maps any set $R \subseteq S$ to some value $\varphi(R)$.¹ Define
$$V(R) := \{s \in S \setminus R \mid \varphi(R \cup \{s\}) \neq \varphi(R)\},$$
$$X(R) := \{s \in R \mid \varphi(R \setminus \{s\}) \neq \varphi(R)\}.$$
$V(R)$ is the set of violators of $R$, while $X(R)$ is the set of extreme elements in $R$. Obviously,
$$s \text{ violates } R \iff s \text{ is extreme in } R \cup \{s\}. \qquad (1)$$
For a random sample $R$ of size $r$, i.e. a set $R$ chosen uniformly at random from the set $\binom{S}{r}$ of all $r$-element subsets of $S$, we define random variables $V_r: R \mapsto |V(R)|$ and $X_r: R \mapsto |X(R)|$, and we consider the expected values
$$v_r := E(V_r), \qquad x_r := E(X_r).$$
Lemma 1.1 (Sampling Lemma). For $0 \leq r < n$,
$$\frac{v_r}{n - r} = \frac{x_{r+1}}{r + 1}.$$
¹ Here, the only purpose of $\varphi$ is to partition $2^S$ into equivalence classes; later, the function notation becomes clear.
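Proof. Using the definitions of $v_r$ and $x_{r+1}$ as well as (1), we can argue as follows:
$$\binom{n}{r} v_r = \sum_{R \in \binom{S}{r}} \sum_{s \in S \setminus R} [s \text{ violates } R]$$
$$= \sum_{R \in \binom{S}{r}} \sum_{s \in S \setminus R} [s \text{ is extreme in } R \cup \{s\}]$$
$$= \sum_{Q \in \binom{S}{r+1}} \sum_{s \in Q} [s \text{ is extreme in } Q]$$
$$= \binom{n}{r+1} x_{r+1}.$$
Here, $[\cdot]$ is the indicator variable for the event in brackets. Finally, $\binom{n}{r+1} / \binom{n}{r} = (n - r)/(r + 1)$.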
To appreciate the simplicity (if not triviality) of the lemma, one should consider it as a special case of the following observation: given a bipartite graph, the average vertex degree in one color class times the size of that class equals the average vertex degree in the other color class times its size.
In our case, the two color classes are the subsets of $S$ of sizes $r$ and $r + 1$, respectively, and two sets $R$ and $R \cup \{s\}$ share an edge if and only if $s$ violates $R$ (equivalently, if $s$ is extreme in $R \cup \{s\}$). This means that the sampling lemma still holds if "violation" is individually defined for every pair $(R, s)$.
A situation of quite similar flavor, where a simple bipartite graph underlies a probabilistic scenario, has been studied by Dubhashi and Ranjan [12].
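Since the lemma is an identity for every function $\varphi$, it can be checked by brute force on small ground sets. The following Python sketch does so with exact rational arithmetic; the helper names (`violators`, `extremes`, `expected_counts`, `check_sampling_lemma`) and the two example functions are our own illustrative choices, not part of the original text.

```python
from fractions import Fraction
from itertools import combinations
import random


def violators(phi, S, R):
    """V(R): elements outside R whose addition changes phi."""
    return {s for s in S - R if phi(R | {s}) != phi(R)}


def extremes(phi, R):
    """X(R): elements of R whose removal changes phi."""
    return {s for s in R if phi(R - {s}) != phi(R)}


def expected_counts(phi, S, r):
    """Return (v_r, x_r) exactly, by enumerating all r-element subsets of S."""
    subsets = [frozenset(c) for c in combinations(sorted(S), r)]
    v = Fraction(sum(len(violators(phi, S, R)) for R in subsets), len(subsets))
    x = Fraction(sum(len(extremes(phi, R)) for R in subsets), len(subsets))
    return v, x


def check_sampling_lemma(phi, S):
    """Check v_r / (n - r) == x_{r+1} / (r + 1) for all 0 <= r < n."""
    n = len(S)
    for r in range(n):
        v_r, _ = expected_counts(phi, S, r)
        _, x_next = expected_counts(phi, S, r + 1)
        if v_r / (n - r) != x_next / (r + 1):
            return False
    return True


def min_phi(R):
    """phi(R) = min(R); the empty set gets its own value."""
    return min(R) if R else None


# An arbitrary phi without any structure: random labels, assigned lazily.
_rng = random.Random(0)
_labels = {}


def arbitrary_phi(R):
    return _labels.setdefault(frozenset(R), _rng.randrange(3))


S = frozenset(range(7))
print(check_sampling_lemma(min_phi, S), check_sampling_lemma(arbitrary_phi, S))
```

The second example function has no monotonicity or locality whatsoever, which illustrates that the lemma needs no assumption on $\varphi$.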
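We can also establish a version of the sampling lemma in the model of Bernoulli sampling, where $R$ is chosen by picking each element of $S$ independently with some fixed probability $p \in [0, 1]$ (we say $R$ is a random $p$-sample). Let $V(p)$ and $X(p)$ denote the random variables for the number of violators and extreme elements, respectively, in a $p$-sample, and let $v(p)$ and $x(p)$ be the corresponding expectations.
Lemma 1.2 ($p$-Sampling Lemma). For $0 \leq p \leq 1$,
$$p\, v(p) = (1 - p)\, x(p).$$
Proof. A $p$-sample has size $r$ with probability
$$\binom{n}{r} p^r (1 - p)^{n-r},$$
and, conditioned on its size being $r$, it is a uniformly random $r$-element subset of $S$. Using the Sampling Lemma 1.1 in the form $\binom{n}{r} v_r = \binom{n}{r+1} x_{r+1}$, together with $v_n = 0$ and $x_0 = 0$, it follows that
$$p\, v(p) = p \sum_{r=0}^{n-1} \binom{n}{r} p^r (1 - p)^{n-r} v_r$$
$$= p \sum_{r=0}^{n-1} \binom{n}{r+1} p^r (1 - p)^{n-r} x_{r+1} = p \sum_{r=1}^{n} \binom{n}{r} p^{r-1} (1 - p)^{n-r+1} x_r$$
$$= (1 - p) \sum_{r=1}^{n} \binom{n}{r} p^r (1 - p)^{n-r} x_r = (1 - p) \sum_{r=0}^{n} \binom{n}{r} p^r (1 - p)^{n-r} x_r = (1 - p)\, x(p).$$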
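The $p$-sampling identity can be checked in the same brute-force manner. The sketch below computes $v(p)$ and $x(p)$ exactly by summing over all subsets with their Bernoulli weights; the function `p_sample_expectations` and the example function `two_smallest` are hypothetical names chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def p_sample_expectations(phi, S, p):
    """Exact v(p) and x(p) when each element of S is kept with probability p."""
    S = frozenset(S)
    n = len(S)
    v = Fraction(0)
    x = Fraction(0)
    for r in range(n + 1):
        for c in combinations(sorted(S), r):
            R = frozenset(c)
            weight = p**r * (1 - p) ** (n - r)  # probability of this exact set
            v += weight * sum(1 for s in S - R if phi(R | {s}) != phi(R))
            x += weight * sum(1 for s in R if phi(R - {s}) != phi(R))
    return v, x


def two_smallest(R):
    """Example phi: the (at most) two smallest elements of R."""
    return tuple(sorted(R)[:2])


p = Fraction(1, 3)
v, x = p_sample_expectations(two_smallest, range(8), p)
print(p * v == (1 - p) * x)
```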
In the next section we discuss some well-known results obtained by random sampling and show that all of them easily follow from the sampling (respectively $p$-sampling) lemma. Concentrating on the Sampling Lemma 1.1, we elaborate on its connection to configuration spaces and backwards analysis. Section 3 deals with LP-type problems, which can be considered as functions $\varphi$ with specific properties. Section 4 establishes Chernoff-type tail estimates for the random variable $V_r$, i.e. for the number of violators of a random sample. The sampling lemma and the tail estimates are finally used in Section 5 to analyze an algorithm for general LP-type problems, which can be considered as the "practical" version of Clarkson's reduction scheme [16]. Its specialization to linear programming is a variant of multiple pricing [5].
2. Incarnations of the Sampling Lemmata
Searching in a Sorted Compact List
A sorted compact list represents a set $S$ of $n$ ordered keys in an array, where the order among the keys is established by additional pointers linking each element to its predecessor in the order, see Fig. 1. It is well known that the smallest key in a sorted compact list can be found in $O(\sqrt{n})$ expected time [10, Problem 11-3].
For this, one draws a random sample $R$ of $r = \Theta(\sqrt{n})$ keys, finds the smallest key $s_0$ in the sample, and finally follows the links from $s_0$ to the overall smallest key. The efficiency comes from the fact that an expected number of only $\Theta(\sqrt{n})$ keys is still smaller than $s_0$. In general, setting $\varphi(R) = \min(R)$ and observing that $X_{r+1} \equiv 1$, the sampling lemma yields
$$E(\#\{s \in S \setminus R \mid s < \min(R)\}) = \frac{n - r}{r + 1}. \qquad (2)$$
Note that $s < \min(R)$ is equivalent to $\min(R \cup \{s\}) \neq \min(R)$.
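The search procedure just described can be sketched in a few lines of Python. The array-plus-predecessor representation below (`keys`, `pred`) is one plausible encoding of a sorted compact list, and the names `build_compact_list` and `find_min` are ours; distinct keys are assumed.

```python
import math
import random


def build_compact_list(keys):
    """Return pred, where pred[i] is the array index of the next smaller key
    (None for the overall minimum). Assumes all keys are distinct."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    pred = [None] * len(keys)
    for smaller, larger in zip(order, order[1:]):
        pred[larger] = smaller
    return pred


def find_min(keys, pred, rng):
    """Sample about sqrt(n) array positions, start at the smallest sampled
    key, and follow predecessor links down to the overall minimum.
    Returns the minimum key and the number of links followed."""
    n = len(keys)
    r = max(1, math.isqrt(n))
    sample = rng.sample(range(n), r)
    i = min(sample, key=keys.__getitem__)
    steps = 0
    while pred[i] is not None:
        i = pred[i]
        steps += 1
    return keys[i], steps


keys = [8, 5, 4, 2, 1, 6, 3, 7]
pred = build_compact_list(keys)
smallest, steps = find_min(keys, pred, random.Random(1))
print(smallest, steps)
```

By (2), the expected number of links followed is $(n - r)/(r + 1)$, which is about $\sqrt{n}$ for the sample size used here.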
Property (2) was exploited by Seidel in the following observation: given a simple $d$-polytope $P$ with $n$ vertices, specified by its 1-skeleton (the graph of vertices and edges of $P$), one can find the vertex that minimizes some linear function $f$ in expected time $O(d\sqrt{n})$. The corresponding randomized subroutine serves as a building block of a simple algorithm for computing the intersection of halfspaces, or, dually, the convex hull of points in $d$-dimensional space. For $d \geq 4$, this algorithm achieves optimal expected worst-case performance [31].
[Fig. 1: a sorted compact list; array contents 8 5 4 2 1 6 3 7.]
Smallest Enclosing Ball
Consider the problem of computing the smallest enclosing ball of a set $S$ of $n$ points in $d$-dimensional space, for some fixed $d$. Randomized incremental algorithms do this in expected $O(n)$ time [33], based on the following fact: if the points are added in random order, the probability that the $n$th point is outside the smallest enclosing ball of the first $n - 1$ points is bounded by $(d + 1)/n$. In general, it holds that if $R \subseteq S$ is a random sample of $r$ points, and $\mathrm{ball}(R)$ denotes the smallest enclosing ball of $R$, then
$$E(\#\{p \in S \setminus R \mid p \notin \mathrm{ball}(R)\}) \leq (d + 1)\, \frac{n - r}{r + 1}. \qquad (3)$$
Again, this follows from the sampling lemma, with $\varphi(R) = \mathrm{ball}(R)$, together with the observation that any set $R$ has at most $d + 1$ extreme elements [33], and the fact that $s \notin \mathrm{ball}(R) \iff \mathrm{ball}(R \cup \{s\}) \neq \mathrm{ball}(R)$.
Similar results hold for the smallest enclosing ellipsoid problem. The randomized incremental algorithm based on them was the first one to achieve an expected runtime of $O(n)$ for that problem, see [33]. The pioneering applications of randomized incremental construction along these lines were Clarkson's and Seidel's linear-time algorithms for linear programming with a fixed number $d$ of variables [8], [29].
Planar Convex Hull
For a planar point set $S$, $|S| = n$, the randomized incremental construction adds the points in random order, always maintaining the convex hull of the points added so far. When a point $p$ is added, it has to "locate" itself, i.e. it has to know whether it is outside the current convex hull, and in this case identify some hull edge $e$ visible from $p$.
As it turns out, the amortized expected cost for doing this in the $r$th step (after which the points added so far form a random sample $R$ of size $r$) is proportional to $a_r / r$, where
$$a_r := E(\#\{p \in S \setminus R \mid p \notin \mathrm{conv}(R)\}).$$
The "trick" now is to express this in terms of another quantity:
$$b_r := E(\#\{p \in R \mid p \text{ vertex of } \mathrm{conv}(R)\}).$$
The sampling lemma with $\varphi(R) = \mathrm{conv}(R)$ then shows that
$$a_r = b_{r+1}\, \frac{n - r}{r + 1}. \qquad (4)$$
For this, we need the observation that $p \notin \mathrm{conv}(R)$ is equivalent to $\mathrm{conv}(R \cup \{p\}) \neq \mathrm{conv}(R)$, which in turn means that $p$ is a vertex of $\mathrm{conv}(R \cup \{p\})$. The expected overall location cost (which dominates the runtime) is then proportional to
$$\sum_{r=1}^{n} \frac{a_r}{r} \leq n \sum_{r=1}^{n} \frac{b_{r+1}}{r(r + 1)}.$$
Because $b_{r+1} \leq r + 1$, this gives an $O(n \log n)$ algorithm. However, the bound is much better in some cases. For example, if the input points are chosen randomly from the unit square (unit disk, respectively), we get $b_r = O(\log r)$ ($b_r = O(\sqrt[3]{r})$, respectively) [28], [20]. In both cases the algorithm actually runs in linear time. In higher dimensions, an analysis along these lines is available, but requires substantial refinements [9], [30].
Minimum Spanning Forests
Let $G = (V, E)$ be an edge-weighted graph, $|V| = n$. For $D \subseteq E$, let $\mathrm{msf}(D)$ denote the minimum spanning forest of the graph $(V, D)$ (which we assume to be unique for all $D$). An edge $e \in E$ is called $D$-light if it either connects two components of $\mathrm{msf}(D)$ or it has smaller weight than some edge on the unique path in $\mathrm{msf}(D)$ between its two vertices. The expected linear-time algorithm for computing $\mathrm{msf}(E)$ due to Karger et al. [21], [25] relies (among other insights) on the following fact: if $D$ is a random $p$-sample, the expected number of $D$-light edges is bounded by $n/p$. Using the $p$-Sampling Lemma 1.2, this fact is easily derived. Namely, it is a simple observation that $e$ is $D$-light if and only if $\mathrm{msf}(D) \neq \mathrm{msf}(D \cup \{e\})$. With $\varphi(D) = \mathrm{msf}(D)$, this means that the set of $D$-light edges is exactly the set of violators of $D$. By the $p$-sampling lemma, if $D$ is a random $p$-sample, their expected number is given by
$$v(p) = \frac{1 - p}{p}\, x(p) \leq \frac{x(p)}{p}.$$
It remains to observe that $x(p) \leq n - 1$, because $X(D)$ contains exactly the edges in $\mathrm{msf}(D)$, for all $D$.
Along these lines, Chan has proved a bound for the expected number of $D$-light edges in the case where $D$ is a random sample of size $r$ [4]. His argument uses backwards analysis and boils down to a proof of the Sampling Lemma 1.1 in this specific scenario.
Backwards Analysis and Configuration Spaces
The Sampling Lemma 1.1 in its full generality can be easily proved using backwards analysis, and as indicated in the previous subsection, this is usually the way its specializations are derived in the applications. For this, one considers the randomized incremental "construction" of $\varphi(S)$, via adding the elements of $S$ in random order, and analyzes the situation in step $r + 1$ [30].
There is also a connection to configuration spaces. In general, such a space consists of an abstract set of configurations over some set $S$, where each configuration $\Delta$ has a defining set $D(\Delta) \subseteq S$ and a conflict set $K(\Delta) \subseteq S$. $\Delta$ is active with respect to $R \subseteq S$ if and only if $D(\Delta) \subseteq R$ and $K(\Delta) \subseteq S \setminus R$. The goal is to compute the configurations active with respect to $S$, by adding the elements in random order, always maintaining the active configurations of the current subset. The abstract framework provides bounds for the expected overall structural change (number of configurations ever becoming active) during that construction [9], [27], [11].
In our case, every subset $R$ has exactly one active configuration $\Delta = \varphi(R)$ associated with it, where $D(\Delta) = X(R)$ and $K(\Delta) = V(R)$.² In this case the sampling lemma provides a bound for the expected structural change $v_r/(n - r)$ that occurs in step $r + 1$. For example, it specializes to Theorem 9.14 of [11] if $x_{r+1}$ is bounded by a constant $d$.
In the following we are interested not only in the expectation but also in the distribution of the random variable $V_r$, something the configuration space framework does not handle. For this, we concentrate on the case in which $(S, \varphi)$ has the structure of an LP-type problem. This situation covers many important optimization problems, including linear programming and all motivating examples discussed above.
3. LP-Type Problems
If $\varphi$ maps subsets to some ordered set $O$, we can consider functions $\varphi$ that are monotone, i.e. $\varphi(F) \leq \varphi(G)$ for $F \subseteq G$. In this situation, we can regard a pair $(S, \varphi)$ as an optimization problem over $O$, as follows: $S$ is an abstract set of constraints, and for any $R \subseteq S$, $\varphi(R)$ represents the minimum value in $O$ subject to the constraints in $R$. The examples above are all of this type, if we define appropriate orderings on the $\varphi$-values. For $\varphi(R) = \min(R)$ in the case of keys, we simply take the decreasing order on the keys. For $S$ a point set and $\varphi(R) = \mathrm{ball}(R)$, we can order the balls according to their radii, while for $\varphi(R) = \mathrm{conv}(R)$, we may use the area of $\mathrm{conv}(R)$.
Moreover, in all these examples, $\varphi$ has another special property which we refer to as locality. We say that $\varphi$ is local if $R \subseteq Q$ and $\varphi(R) = \varphi(Q)$ implies $V(R) = V(Q)$, for all $R, Q \subset S$. An example of a nonlocal problem is the diameter: for a set $S$ of points and $R \subseteq S$, we define $\varphi(R)$ to be the Euclidean diameter of $R$. In Fig. 2 we have $\varphi(R) = \varphi(Q)$ for $R = \{q, s\}$ and $Q = \{p, q, s\}$, but $\emptyset = V(R) \neq V(Q) = \{r\}$.
[Fig. 2: point set with labels p, q, r, s.]
Still, locality is present in many problems of practical relevance, the most prominent one being linear programming (LP). In a geometric formulation of linear programming, $S$ is a set of halfspaces in $d$-dimensional space, and $\varphi(R)$ is the lexicographically smallest point among all the ones that minimize some fixed linear function over the intersection of all halfspaces in $R$. If that intersection is empty, we set $\varphi(R) = \infty$, with the understanding that this value dominates all other values. If the function is unbounded over the intersection, we set $\varphi(R) = \bot$, standing for "undefined."
Linear programming is also the motivating example for the following definition [32].
² Some care is in order here; in degenerate situations, $R$ can define several configurations $\Delta$ with different sets $D(\Delta)$, in which case $X(R)$ is the intersection of all those sets.
Definition 3.1. Let $S$ be a finite set, $O$ some ordered set, and $\varphi: 2^S \to O \cup \{\bot\}$ a function, where $\bot$ is assumed to be the minimum value in $O \cup \{\bot\}$. The pair $(S, \varphi)$ is called an LP-type problem if $\varphi$ is monotone and local, i.e. if for all $R \subseteq Q \subseteq S$ with $\varphi(R) \neq \bot$,
(i) $\varphi(R) \leq \varphi(Q)$, and
(ii) $\varphi(R) = \varphi(Q)$ implies $V(R) = V(Q)$.
The concept of LP-type problems has proved useful in the understanding of geometric optimization, see for example [2]. For many problems (including linear programming and smallest enclosing ball), the currently best theoretical runtime bounds in the unit cost model can be obtained by an algorithm that works for general LP-type problems [16], [23].
We recall the following further notations only briefly and refer to the above literature for details.
Definition 3.2. Let $L = (S, \varphi)$ be an LP-type problem.
(i) A basis of $R \subseteq S$ is an inclusion-minimal subset $B \subseteq R$ with $\varphi(B) = \varphi(R)$. A basis in $L$ is a basis of some set $R \subseteq S$. A basis in $R$ is a basis in $L$ contained in $R$.
(ii) The combinatorial dimension of $L$, denoted by $\delta = \delta(L)$, is the size of a largest basis in $L$.
(iii) $L$ is regular if all bases of sets $R$, $|R| \geq \delta$ (regular bases), have size exactly $\delta$.
(iv) $L$ is nondegenerate if every set $R$, $|R| \geq \delta$, has a unique basis $B(R)$.
The following implications can easily be derived.
Fact 3.3. Let $L = (S, \varphi)$ be an LP-type problem and $R \subseteq S$ with $\varphi(R) \neq \bot$. Then
(i) $\varphi(R) = \varphi(S \setminus V(R))$, and
(ii) the set $X(R)$ of extreme elements of $R$ is the intersection of all bases of $R$.
If $L$ has combinatorial dimension $\delta$, it follows that $|X(R)| \leq \delta$ for all $R$, so that the sampling lemma yields
$$v_r \leq \delta\, \frac{n - r}{r + 1}.$$
In particular, a random sample of size $r \approx \sqrt{\delta n}$ has no more than $r$ violators on average, and this is the "balancing" that will prove useful below.
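The balancing claim can be spelled out in one line. As a worked instance only, assume $r = \sqrt{\delta n}$ exactly (ignoring rounding to an integer); then the bound on $v_r$ gives:

```latex
v_r \;\le\; \delta\,\frac{n-r}{r+1} \;<\; \frac{\delta n}{r}
    \;=\; \frac{\delta n}{\sqrt{\delta n}} \;=\; \sqrt{\delta n} \;=\; r .
```

For smaller $r$ the expected number of violators exceeds the sample size, for larger $r$ it falls below it, so $r \approx \sqrt{\delta n}$ is where the two quantities meet.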
In the next section we derive bounds for regular, nondegenerate LP-type problems that apply to the general case only in a weaker form. While regularity can be enforced in the nondegenerate case (we describe a well-behaved "regularizing" construction below), nondegeneracy is a more subtle issue. It is not known how to make a general LP-type problem nondegenerate without substantially changing its structure [22]. For most geometric LP-type problems, however, a slight perturbation of the input will entail a nondegenerate problem, essentially equivalent to the original one. Most notably, this is the case for linear programming.
Enforcing Regularity
Given a nondegenerate LP-type problem $(S, \varphi)$ of combinatorial dimension $\delta$, the idea is to make it regular by "pumping up" bases which are too small. For this, we define an arbitrary linear order on $S$, and consider the function
$$\varphi'(R) := (\varphi(R), E(R)),$$
where $E(R)$ consists of the vector of the $m$ largest elements in $R \setminus B(R)$, for $m = \min(\delta, |R|) - |B(R)|$. $\varphi'$-values are compared lexicographically, i.e. by the $\varphi$-component first. If the $\varphi$-values are equal, the lexicographic order of the $E$-components (well defined with respect to the chosen order on $S$) decides the comparison. $\varphi'$ can be considered as a "refinement" of $\varphi$.
Lemma 3.4 [22]. If $L = (S, \varphi)$ is nondegenerate, then $(S, \varphi')$ is a regular, nondegenerate LP-type problem of combinatorial dimension $\delta(L)$.
Moreover, if $V(R)$ and $V'(R)$ denote the violating sets of $R \subseteq S$ with respect to $\varphi$ and $\varphi'$, we have the following simple but important fact:
$$V(R) \subseteq V'(R). \qquad (5)$$
This holds because $\varphi(R \cup \{s\}) > \varphi(R)$ implies $\varphi'(R \cup \{s\}) > \varphi'(R)$. It follows that when we develop tail estimates for the size of $V'(R)$ (more generally, for any regular and nondegenerate LP-type problem), those estimates then also apply to nonregular problems.
4. Tail Estimates
In the following we consider regular and nondegenerate LP-type problems $(S, \varphi)$ with $|S| = n$ and $\delta(S, \varphi) = d$, where we assume $n$ and $d$ to be fixed for the rest of this section. For given parameters $r \geq d$ and $k$, we want to bound
$$\mathrm{prob}(V_r \geq k).$$
The most important observation is that this quantity does not depend on the LP-type problem, but is merely a function of the parameters $n$, $d$, $r$, and $k$.
This follows from a result first proved by Clarkson [7] in the context of linear programming, and later generalized to LP-type problems by Matoušek [22]. We rederive the statement here.
Theorem 4.1. Let $(S, \varphi)$ be a regular, nondegenerate LP-type problem with $|S| = n$ and $\delta(S, \varphi) = d$. Then
$$\mathrm{prob}(V_r = k) = \frac{\binom{k+d-1}{d-1} \binom{n-d-k}{r-d}}{\binom{n}{r}}.$$
Proof. A basis $B$ is the basis of a set $R$ if and only if $B \subseteq R \subseteq S \setminus V(B)$. This means that, for any regular basis $B$ with $k$ violators, there are $\binom{n-d-k}{r-d}$ sets $R$ of size $r$ which have $B$ as their (unique) basis. It follows that
$$\mathrm{prob}(V_r = k) = b_k\, \frac{\binom{n-d-k}{r-d}}{\binom{n}{r}}, \qquad r = d, \ldots, n,$$
where $b_k$ is the number of regular bases with $k$ violators in $(S, \varphi)$. By summing over all $k$, we get
$$\binom{n}{r} = \sum_{k=0}^{n-d} b_k \binom{n-d-k}{r-d}, \qquad r = d, \ldots, n. \qquad (6)$$
This system of linear equations can be written in the form
$$\left(\binom{n}{d}, \binom{n}{d+1}, \ldots, \binom{n}{n}\right) = (b_{n-d}, b_{n-d-1}, \ldots, b_0)\, T,$$
where $T$ is an upper-triangular matrix with all diagonal entries equal to 1, therefore invertible. This means that the $b_k$'s are uniquely determined by the system (6), from which
$$b_k = \binom{k+d-1}{d-1}$$
follows via a standard binomial coefficient identity [17, equation (5.26)]. This proves the statement of the theorem.
This result leads to an explicit formula for $\mathrm{prob}(V_r \geq k)$, but useful tail estimates do not yet follow from that. By severe grinding it might be possible to extract good bounds directly from the formula (we did not succeed), but there is another approach: as we know that the quantity in question does not depend on the particular LP-type problem, we might as well use our favorite LP-type problem in the analysis. In fact, for any given parameters $n$ and $d$, there is a "canonical" LP-type problem from which statements about the distribution of $V_r$ can be extracted without pain.
The d-Smallest Number Problem
Let $N$ be the set $\{1, \ldots, n\}$. For $R \subseteq N$, define $\min_d(R)$ as the $d$-smallest number in $R$ (equivalently, the element of rank $d$ in $R$). If $|R| < d$, this is undefined, and $\min_d(R) := \bot$. We have the following easy facts (proofs omitted).
Lemma 4.2.
(i) $(N, \varphi)$ with $\varphi(R) := \min_d(R)$ is a regular, nondegenerate LP-type problem of combinatorial dimension $d$, if $\varphi$-values are compared according to decreasing order in $N$.
(ii) The basis of any set $R$, $|R| \geq d$, consists of the $d$ smallest numbers in $R$.
(iii) $s \in N \setminus R$ violates $R$ if and only if $s$ is smaller than the $d$-smallest number in $R$.
For $d = 1$, we have $\min_d(R) = \min(R)$, thus we recover the LP-type problem underlying the efficient minimum search in a sorted compact list described in the Introduction.
As a warm-up exercise, we rederive the formula for the number of bases with exactly $k$ violators in a regular and nondegenerate LP-type problem, by using the fact that this number does not depend on the actual LP-type problem, see Theorem 4.1.
Observation 4.3. The $d$-smallest number problem has
$$b_k = \binom{k+d-1}{d-1}$$
regular bases with exactly $k$ violators.
Proof. Any set $B$ with $d$ elements is a regular basis. $B$ has $k$ violators if and only if the $d$-smallest number $x$ in $B$ is the $(k + d)$-smallest number in $N$. The elements in $B \setminus \{x\}$ can be any $d - 1$ among the $k + d - 1$ smaller numbers in $N$.
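Because the $d$-smallest number problem is so concrete, the distribution of $V_r$ can be enumerated directly and compared with the closed form of Theorem 4.1. The Python sketch below does this for small parameters; the function names are ours, and the count of violators uses Lemma 4.2(iii): a sample $R$ with $d$-smallest element $m$ has exactly $m - d$ violators.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def violator_distribution(n, d, r):
    """Exact distribution of V_r for the d-smallest number problem on {1..n},
    obtained by enumerating all r-element samples."""
    counts = {}
    for R in combinations(range(1, n + 1), r):  # R is an increasing tuple
        k = R[d - 1] - d  # numbers below the d-smallest element, not in R
        counts[k] = counts.get(k, 0) + 1
    total = comb(n, r)
    return {k: Fraction(c, total) for k, c in counts.items()}


def theorem_4_1(n, d, r, k):
    """Closed form for prob(V_r = k) from Theorem 4.1."""
    return Fraction(comb(k + d - 1, d - 1) * comb(n - d - k, r - d), comb(n, r))


n, d, r = 9, 3, 5
dist = violator_distribution(n, d, r)
agree = all(dist.get(k, 0) == theorem_4_1(n, d, r, k) for k in range(n - r + 1))
print(agree)
```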
The proof of this observation might be somewhat simpler than the one we had in the general case, but it does not lead to new insights. However, the next theorem about higher moments of $V_r$ is an example of a statement which we think is not immediate to prove (let alone discover) without making use of the $d$-smallest number problem.
Theorem 4.4. Let $(S, \varphi)$ be a regular, nondegenerate LP-type problem, and let $R$ be a random sample of size $r$. For $j \in \{0, \ldots, n - r\}$, we have
$$E\left(\binom{V_r}{j}\right) = \frac{\binom{n}{r+j} \binom{j+d-1}{j}}{\binom{n}{r}}.$$
Proof. We evaluate the expectation for the $d$-smallest number problem and then use Theorem 4.1. For this, we need to count the expected number of sets $J$, $|J| = j$, with $J \subseteq V(R)$. Observe that this inclusion holds if and only if all elements of $J$ are smaller than the $d$-smallest number in $R$, equivalently, if $J$ is among the $j + d - 1$ smallest numbers in $R \cup J$. For any set $L$ of size $r + j$, there are $\binom{j+d-1}{j}$ pairs $(R, J)$, $R \cup J = L$, with this property. Thus we get
$$\binom{n}{r} E\left(\binom{V_r}{j}\right) = \sum_{|R| = r} \sum_{\substack{J \subseteq S \setminus R \\ |J| = j}} [J \subseteq V(R)] = \sum_{|L| = r+j} \binom{j+d-1}{j} = \binom{n}{r+j} \binom{j+d-1}{j}.$$
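As a sanity check, the binomial moments can also be computed numerically from the distribution in Theorem 4.1 and compared with the closed form of Theorem 4.4. The sketch below uses exact arithmetic; the function names, and the small `variance` helper built from the cases $j = 1$ and $j = 2$, are illustrative additions of ours.

```python
from fractions import Fraction
from math import comb


def prob_violators(n, d, r, k):
    """prob(V_r = k) according to Theorem 4.1."""
    return Fraction(comb(k + d - 1, d - 1) * comb(n - d - k, r - d), comb(n, r))


def binomial_moment(n, d, r, j):
    """E(binom(V_r, j)), summed directly over the distribution of V_r."""
    return sum(prob_violators(n, d, r, k) * comb(k, j) for k in range(n - r + 1))


def theorem_4_4(n, d, r, j):
    """Closed form for E(binom(V_r, j)) from Theorem 4.4."""
    return Fraction(comb(n, r + j) * comb(j + d - 1, j), comb(n, r))


def variance(n, d, r):
    """Var(V_r) from the first two binomial moments (requires n - r >= 2):
    E(V^2) = 2 E(binom(V, 2)) + E(V)."""
    m1 = theorem_4_4(n, d, r, 1)
    m2 = theorem_4_4(n, d, r, 2)
    return 2 * m2 + m1 - m1**2


n, d, r = 12, 3, 6
moments_agree = all(
    binomial_moment(n, d, r, j) == theorem_4_4(n, d, r, j) for j in range(n - r + 1)
)
print(moments_agree, variance(n, d, r))
```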
When applied to $j = 2$, the theorem can be used to compute the variance of $V_r$, leading to a Chebyshev-type tail estimate. The higher moments give still better bounds. We are going for Chernoff-type bounds, by exploiting the special structure of the $d$-smallest number problem.
A Chernoff-Type Tail Estimate
To choose a random subset $R \subseteq N$ of size $r$, one can proceed in $r$ rounds, where round $i$ selects an element $s_i$ uniformly at random among the ones not chosen so far. Equivalently, one may choose a "rank" $\ell_i$ uniformly at random in $\{1, \ldots, n + 1 - i\}$ and let $s_i$ be the element of rank $\ell_i$ among the ones not chosen so far.
Fix some positive integer $k$ and let $U_k$ be the random variable for the number of indices $i$ with $\ell_i \leq k$. We have the following relation to the random variable $V_r$.
Lemma 4.5. Let $R = R(\ell)$ denote the set determined by $\ell = (\ell_1, \ldots, \ell_r)$. Then
$$U_k(\ell) \geq d \implies V_r(R) \leq k - 1.$$
Proof. We claim that $U_k \geq d$ implies $\min_d(R) \leq k + d - 1$. Because the latter is equivalent to $V_r \leq k - 1$, the lemma follows.
To prove the claim, we first note that
$$s_i = \ell_i + \#\{j < i \mid s_j < s_i\}. \qquad (7)$$
Consider some set $I$ of $d$ indices $i$ such that $\ell_i \leq k$ for $i \in I$. Such a set exists if $U_k \geq d$. If $s_i \leq k + d - 1$ for all $i \in I$, we get $\min_d(R) \leq k + d - 1$, as required. Otherwise, there is some $i \in I$ such that $s_i = k + e$, $e \geq d$. Then we get
$$\#\{j < i \mid s_j < k + e\} = k + e - \ell_i \geq e,$$
which implies $\#\{j < i \mid s_j < k + d\} \geq d$. As before, this means that $\min_d(R) \leq k + d - 1$.
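The rank-based sampling process, relation (7), and Lemma 4.5 are all finite statements and can be checked exhaustively for small parameters. The following Python sketch enumerates every rank vector $\ell$; the helper names `sample_from_ranks` and `check_lemma_4_5` are ours.

```python
from itertools import product


def sample_from_ranks(n, ranks):
    """Turn ranks (l_1, ..., l_r) into the chosen elements (s_1, ..., s_r):
    s_i is the element of rank l_i among those not chosen so far."""
    remaining = list(range(1, n + 1))
    return [remaining.pop(l - 1) for l in ranks]


def check_lemma_4_5(n, r, d):
    """Check relation (7) and Lemma 4.5 for every rank vector and every k."""
    rank_ranges = [range(1, n + 2 - i) for i in range(1, r + 1)]
    for ranks in product(*rank_ranges):
        s = sample_from_ranks(n, ranks)
        # Relation (7): s_i = l_i + #{j < i : s_j < s_i}.
        for i in range(r):
            if s[i] != ranks[i] + sum(1 for j in range(i) if s[j] < s[i]):
                return False
        v = sorted(s)[d - 1] - d  # number of violators of R = {s_1, ..., s_r}
        for k in range(1, n + 1):
            u_k = sum(1 for l in ranks if l <= k)
            if u_k >= d and v > k - 1:
                return False
    return True


print(check_lemma_4_5(6, 3, 2))
```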
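Corollary 4.6. $\mathrm{prob}(V_r \geq k) \leq \mathrm{prob}(U_k \leq d - 1)$.
Chernoff-type bounds for $U_k$ are easy to obtain now. $U_k$ can be expressed as the sum of independent random variables $U_{k,i}$, $i = 1, \ldots, r$, where
$$U_{k,i} := \begin{cases} 1, & \text{if } \ell_i \leq k, \\ 0, & \text{otherwise,} \end{cases}$$
and it holds that
$$\mathrm{prob}(U_{k,i} = 1) = \frac{k}{n + 1 - i} =: p_i.$$
The following is one of the basic Chernoff bounds [19].
Lemma 4.7. With $E(U_k) = p_1 + \cdots + p_r$ and $t \geq 0$,
$$\mathrm{prob}(U_k \leq E(U_k) - t) \leq \exp\left(-\frac{t^2}{2 E(U_k)}\right).$$
Using $t = E(U_k) - d + 1$ (which is nonnegative for the values of $k$ we will be interested in below), we obtain
$$\mathrm{prob}(U_k \leq d - 1) \leq \exp\left(-\frac{(E(U_k) - d + 1)^2}{2 E(U_k)}\right).$$
Fix some value $\lambda \geq 0$ and choose $k$ in such a way that $E(U_k) = (1 + \lambda) d$. Then we get
$$\mathrm{prob}(U_k \leq d - 1) \leq \exp\left(-\frac{(\lambda d + 1)^2}{2 (1 + \lambda) d}\right) \leq \exp\left(-\frac{\lambda^2}{2 (1 + \lambda)}\, d\right).$$
The value of $k$ that entails $E(U_k) = (1 + \lambda) d$ satisfies
$$k = \frac{(1 + \lambda) d}{\sum_{i=0}^{r-1} 1/(n - i)} \leq (1 + \lambda) d\, \frac{n}{r},$$
and we obtain our result.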
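Theorem 4.8. Let $L = (S, \varphi)$ be a nondegenerate LP-type problem with $|S| = n$ and $\dim(S, \varphi) = d$. For $r \geq d$ and any $\lambda \geq 0$,
$$\mathrm{prob}\left(V_r \geq (1 + \lambda) d\, \frac{n}{r}\right) \leq \exp\left(-\frac{\lambda^2}{2 (1 + \lambda)}\, d\right).$$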
We have derived this bound only for regular problems, but as we have shown before, any problem can be regularized, and, by (5), the estimate then also holds for nonregular problems. Because $E(V_r) \leq d(n - r)/(r + 1) \approx dn/r$, this bound establishes estimates for the tail "to the right" of the expectation. It might seem that the bound is rather weak, in particular because it does not depend on $n$ and $r$. However, it is essentially best possible, as the following lower bound shows (the actual formulation has been chosen in order to minimize computational effort).
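To get a feeling for how the estimate of Theorem 4.8 compares with the true tail, one can evaluate the exact distribution from Theorem 4.1 numerically. The sketch below does this for one arbitrarily chosen parameter set ($n = 200$, $d = 3$, $r = 40$, our choice); the function names are illustrative, and the threshold is rounded up to an integer since $V_r$ is integer-valued.

```python
from math import ceil, comb, exp


def exact_tail(n, d, r, k):
    """prob(V_r >= k) for a regular, nondegenerate problem, via Theorem 4.1."""
    numerator = sum(
        comb(j + d - 1, d - 1) * comb(n - d - j, r - d) for j in range(k, n - r + 1)
    )
    return numerator / comb(n, r)


def chernoff_type_bound(lam, d):
    """Right-hand side of Theorem 4.8."""
    return exp(-(lam**2) * d / (2 * (1 + lam)))


n, d, r = 200, 3, 40
rows = []
for lam in (0.5, 1.0, 2.0, 4.0):
    k = ceil((1 + lam) * d * n / r)
    rows.append((lam, k, exact_tail(n, d, r, k), chernoff_type_bound(lam, d)))

for lam, k, tail, bound in rows:
    print(f"lambda={lam}: k={k}, exact tail={tail:.3e}, bound={bound:.3e}")
```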
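Theorem 4.9. Let $L = (S, \varphi)$ be a nondegenerate LP-type problem with $|S| = n$ and $\dim(S, \varphi) = d$. For $r \geq d$ and any $\lambda \geq 0$ such that $(1 + \lambda) d \leq r/2$,
$$\mathrm{prob}\left(V_r > (1 + \lambda) d\, \frac{n + 1 - r}{r} - d\right) \geq \exp\left(-(1 + \lambda) d - \frac{(1 + \lambda)^2 d^2}{r}\right).$$
Proof. With $U_k$ as defined above and $R = R(\ell)$, relation (7) immediately entails
$$V_r(R) \leq k - d \iff \min\nolimits_d(R) \leq k \implies U_k(\ell) \geq d,$$
so that we get $\mathrm{prob}(V_r > k - d) \geq \mathrm{prob}(U_k \leq d - 1)$. Furthermore,
$$\mathrm{prob}(U_k \leq d - 1) \geq \mathrm{prob}(U_k = 0) = \prod_{i=1}^{r} \left(1 - \frac{k}{n + 1 - i}\right).$$
With $k = (1 + \lambda) d (n + 1 - r)/r$, it follows that
$$\prod_{i=1}^{r} \left(1 - \frac{k}{n + 1 - i}\right) \geq \left(1 - \frac{k}{n + 1 - r}\right)^r = \left(1 - \frac{(1 + \lambda) d}{r}\right)^r \geq \exp\left(-(1 + \lambda) d - \frac{(1 + \lambda)^2 d^2}{r}\right),$$
where the last step uses $1 - x \geq \exp(-x - x^2)$ for $0 \leq x \leq 1/2$.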
An open question is whether the statement of Theorem 4.8 also holds in the degenerate case. It is tempting to conjecture that $\mathrm{prob}(V_r \geq k)$ is maximized for nondegenerate problems; this would yield Theorem 4.8 for the general case. Moreover, while the bound is tight in the regular case, one might be able to improve it for a given nonregular problem.
We conclude this section by proving a weaker tail estimate which applies to the general case. Using this, we can show that the number of violators exceeds the expected value by no more than a logarithmic factor, with high probability.
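Theorem 4.10. Let $L = (S, \varphi)$ be an LP-type problem with $|S| = n$ and $\dim(S, \varphi) = d$. For $r \geq d$ and any $\lambda \geq 0$,
$$\mathrm{prob}\left(V_r \geq \left(\ln \frac{ne}{d} + \lambda\right) d\, \frac{n}{r}\right) \leq \exp(-\lambda d).$$
Proof. Let $B_k$ denote the set of regular bases with exactly $k$ violators (recall that a regular basis is a basis of some set $R$ with $|R| \geq d$). Any fixed $B \in B_k$ is a basis of all the sets $R$ satisfying $B \subseteq R \subseteq S \setminus V(B)$. It follows that $B$ is a basis of a random sample $R$ of size $r$ with probability
$$\frac{\binom{n - |B| - k}{r - |B|}}{\binom{n}{r}} \leq \frac{\binom{n-k}{r}}{\binom{n}{r}}.$$
We have $|V(R)| = k$ if and only if $R$ has some basis (equivalently, all its bases) in $B_k$, which gives
$$\mathrm{prob}(V_r = k) \leq b_k\, \frac{\binom{n-k}{r}}{\binom{n}{r}}, \qquad b_k = |B_k|.$$
Consequently,
$$\mathrm{prob}(V_r \geq k) \leq \sum_{\ell = k}^{n-r} b_\ell\, \frac{\binom{n-\ell}{r}}{\binom{n}{r}},$$
where we know that
$$\sum_{\ell = k}^{n-r} b_\ell \leq \binom{n}{\leq d} := \sum_{i=0}^{d} \binom{n}{i},$$
because all bases have size at most $d$. Then we can further argue that
$$\mathrm{prob}(V_r \geq k) \leq \binom{n}{\leq d} \frac{\binom{n-k}{r}}{\binom{n}{r}}.$$
Since (see [24])
$$\binom{n}{\leq d} \leq \left(\frac{ne}{d}\right)^d$$
and
$$\frac{\binom{n-k}{r}}{\binom{n}{r}} \leq \left(1 - \frac{k}{n}\right)^r,$$
we finally get, by substituting $k = (\ln(ne/d) + \lambda)\, d\, (n/r)$,
$$\mathrm{prob}(V_r \geq k) \leq \left(\frac{ne}{d}\right)^d \left(1 - \frac{(\ln(ne/d) + \lambda)\, d}{r}\right)^r \leq \left(\frac{ne}{d}\right)^d \exp\left(-\left(\ln \frac{ne}{d} + \lambda\right) d\right) = \exp(-\lambda d).$$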
5. Multiple Pricing and Clarkson's Reduction Scheme
The simplex method [5] is usually the most efficient algorithm to solve linear programming problems in practice. Even in the theoretical setting, all known algorithms to solve general LP-type problems boil down to variants of the (dual) simplex method, when they are applied to linear programming [13]. In this section we introduce and analyze an algorithm in the general framework which, although new in its precise formulation, follows a well-known design paradigm, whose simplex counterpart is known as multiple pricing [5]. The idea of multiple pricing is to reduce a large problem to a (hopefully) small number of small problems. This can be useful in case the whole problem does not fit into main memory, but it also helps in general to reduce the cost of a single simplex iteration. Taking a slightly different approach, partial pricing [5] is a related technique following the same paradigm. Applications have been found in the context of very large-scale linear programming [3], but also in geometric optimization [14], [15].
We do not elaborate on those simplex techniques here; the reader may verify that the algorithm we are going to present is actually a variant of multiple pricing, when translated into simplex terminology.
Consider an LP-type problem $(S, \varphi)$ (not necessarily nondegenerate) of combinatorial dimension $d$, and assume we are given an algorithm lp_type$(G, B)$ to compute for any subset $G$ of $S$ some basis $B_G$ of $G$, given a candidate basis $B \subseteq G$. Of course, one can directly solve the problem of finding $B_S$ by calling lp_type with the large set $S$ and some basis $B \subseteq S$. As we will see, an efficient alternative is provided by the following method, parameterized with a sample size $r$. We assume the initial basis $B$ to be fixed for the rest of this section.
Algorithm 5.1.
lp_type_sampling$_r(S, B)$:
    (* returns some basis $B_S$ of $S$ *)
    choose $R$ with $|R| = r$, $R \subseteq S \setminus B$ at random
    $G := R \cup B$
    REPEAT
        $B :=$ lp_type$(G, B)$
        $G := G \cup V(B)$
    UNTIL $V(B) = \emptyset$
    RETURN $B$
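To make the control flow of Algorithm 5.1 concrete, here is a minimal Python sketch instantiated for the $d$-smallest number problem of Section 4. This is only an illustration under stated assumptions: the stand-in `lp_type` simply sorts $G$ and ignores the candidate basis, `violators` implements Lemma 4.2(iii), and the extra parameters `d` and `rng` as well as the returned statistics are our additions, not part of the algorithm as stated.

```python
import random


def lp_type(G, B, d):
    """Stand-in solver (assumption): the basis of G in the d-smallest number
    problem is the set of its d smallest numbers. The hint B is ignored."""
    return frozenset(sorted(G)[:d])


def violators(S, B):
    """V(B): elements of S smaller than the d-smallest number of B.
    Assumes B has exactly d elements, so that number is max(B)."""
    threshold = max(B)
    return {s for s in S if s < threshold and s not in B}


def lp_type_sampling(S, B, r, d, rng):
    """Sketch of Algorithm 5.1. Returns the final basis, the number of
    rounds, and the final size of G."""
    R = rng.sample(sorted(S - B), r)  # random r-subset of S \ B
    G = set(R) | set(B)
    rounds = 0
    while True:
        rounds += 1
        B = lp_type(G, B, d)
        V = violators(S, B)
        G |= V
        if not V:
            return B, rounds, len(G)


S = frozenset(range(1, 1001))
d = 3
initial_basis = frozenset({998, 999, 1000})
basis, rounds, final_size = lp_type_sampling(S, initial_basis, 50, d, random.Random(0))
print(sorted(basis), rounds, final_size)
```

In this toy instance the expensive step `lp_type` only ever sees the small set $G$; the full set $S$ is touched only by the violation tests, which is the point of the reduction.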
lp_type_sampling reduces the problem to several calls of lp_type, and Fact 3.3(i)
shows that if the procedure terminates, $V(B) = \emptyset$ implies that $B$ is a basis of $S$. Moreover,
it must eventually terminate, because every round adds at least one element to $G$. The
algorithm captures the spirit of Clarkson's linear programming algorithm [8] (and its
generalizations [1], [16]), but is simpler and more practical. To guarantee its theoretical
complexity, Clarkson's algorithm draws a random sample in every round, and it restarts a
round whenever $|V(B)|$ turns out to be too large. Thus, Algorithm 5.1 can be interpreted as
the canonical simplification of Clarkson's algorithm for practical use, where one observes
that resampling and restarting are not necessary (and even decrease the efficiency).
The general phenomenon behind this is that often the theoretically best algorithms
are not competitive in practice, while the algorithms one actually chooses in an
implementation cannot be analyzed. On the one hand this is due to the fact that the worst-case
complexity is an inappropriate measure in many practical situations; on the other hand,
sometimes algorithms used in practice are simply not understood, although they might
allow a worst-case analysis.
In the case of Algorithm 5.1 we have the fortunate situation that it combines efficiency
in practice with provable time bounds (developed below). With the procedure lp_type
replaced by a call to a standard simplex implementation, the method has been successfully
used in a linear programming code for geometric optimization [14], [15], without any
further changes. In its original version, due to Clarkson, Algorithm 5.1 is a building block
of an ingenious linear-time algorithm for linear programming in constant dimension $d$
[8], [16].
The theoretical analysis starts with a bound on the number of rounds.
Observation 5.2 [8]. Fix some basis $B_S$ of $S$. Then in every round except the last one,
$V(B)$ contains an element of $B_S$. In particular, there are at most $d + 1$ rounds.
Proof. Assume that $B_S$ is disjoint from $V(B)$. From Fact 3.3 and monotonicity we then
get $\varphi(B) = \varphi(S \setminus V(B)) \ge \varphi(B_S) = \varphi(S)$, from which $\varphi(B) = \varphi(S)$ follows. Locality
then implies $V(B) = V(S) = \emptyset$, which means that we are already in the last round.
The critical parameter we are interested in is the size of G in the last round. If this is
small, then all calls to lp_type(G, B) are cheap.
We fix some notation for that. We define $S' := S \setminus B$, $B$ being the initial candidate
basis plugged into lp_type_sampling. By
$$B_R^{(i)}, \quad V_R^{(i)}, \quad \text{and} \quad G_R^{(i)}$$
we denote the sets $B$, $V(B)$, and $G$ computed in round $i$. Furthermore, we set
$G_R^{(0)} = R \cup B$, while $B_R^{(0)}$ and $V_R^{(0)}$ are undefined. This means we have
$$B_R^{(i)} \text{ is a basis of } G_R^{(i-1)}, \qquad V_R^{(i)} = V(G_R^{(i-1)}).$$
If the algorithm performs exactly $\ell$ rounds, sets with indices $i > \ell$ are defined to be the
corresponding sets in round $\ell$.
We will need a generalization of Observation 5.2.
Lemma 5.3. For $j < i \le \ell$, $B_R^{(i)} \cap V_R^{(j)} \ne \emptyset$.
Proof. Assume on the contrary that $B_R^{(i)} \cap V_R^{(j)} = \emptyset$. As in the proof of Observation 5.2,
Fact 3.3 and monotonicity then imply
$$\varphi(G_R^{(j-1)}) = \varphi(S \setminus V_R^{(j)}) \ge \varphi(B_R^{(i)}) = \varphi(G_R^{(i-1)}),$$
a contradiction to the fact that $\varphi(G)$ strictly increases in every round but the last.
The following lemma is the crucial result. It interprets Algorithm 5.1 as an LP-type
problem itself! Under this interpretation, the set $G$ in the last round is essentially the set
of violators of the initial sample $R$. Then the techniques of the previous sections (the
sampling lemma and the tail estimates) can be applied to bound the expected value of $|G|$,
and even get Chernoff-type bounds for the distribution of $|G|$.
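Lemma 5.4. For $R \subseteq S' := S \setminus B$ define
$$\varphi'(R) = \left(\varphi(G_R^{(0)}), \varphi(G_R^{(1)}), \ldots, \varphi(G_R^{(d-1)})\right).$$
Then the following holds:
(i) $(S', \varphi')$ is an LP-type problem, with values compared in the lexicographic order, of
combinatorial dimension at most $\binom{d+1}{2}$.
(ii) $V'(R) = \bigcup_{i=1}^{d} V_R^{(i)}$, where $V'(R)$ denotes the violators of $R$ with respect to $\varphi'$.
(iii) If $(S, \varphi)$ is nondegenerate, then so is $(S', \varphi')$.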
Before we go into the technical (although not difficult) proof, we derive the main
result of this section, namely the analysis of Algorithm 5.1. This analysis is now merely
a consequence of previous results.
Theorem 5.5. For $R \subseteq S'$, a random sample of size $r$,
$$E\bigl(|G_R^{(d)}|\bigr) \le \;\cdots$$
Choosing $r = d\sqrt{n/2}$ yields
$$E\bigl(|G_R^{(d)}|\bigr) \le 2(d + 1)\sqrt{\frac{n}{2}}.$$
Proof. The first inequality directly follows from the sampling lemma, applied to the
LP-type problem $(S', \varphi')$, together with part (ii) of the previous lemma. The second
inequality is routine.
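The routine step can be spelled out as follows. This is a sketch under assumptions that are not stated explicitly in this section: that the sampling lemma gives $E(|V'(R)|) \le \delta(S', \varphi')\,(|S'| - r)/(r + 1)$ with $\delta(S', \varphi') \le \binom{d+1}{2}$, that $|B| \le d$, that $G_R^{(d)} = R \cup B \cup V'(R)$ by part (ii), and that rounding $r$ to an integer is ignored. Writing $s := \sqrt{n/2}$, so that $r = ds$ and $n = 2s^2 \ge r$:

```latex
\begin{align*}
E\bigl(|G_R^{(d)}|\bigr)
  &\le r + d + \binom{d+1}{2}\,\frac{n - r}{r + 1} \\
  &\le ds + d + \frac{d(d+1)}{2}\cdot\frac{2s^2 - ds}{ds} \\
  &= ds + d + (d+1)\,s - \frac{d(d+1)}{2} \\
  &= (2d+1)\,s - \frac{d(d-1)}{2} \\
  &\le 2(d+1)\sqrt{\frac{n}{2}} .
\end{align*}
```

The second line uses $|S'| \le n$ and $r + 1 > r$; the last line uses $d \ge 1$ and $s > 0$.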
The theorem shows that Algorithm lp_type_sampling reduces a problem of size $n$
to at most $d$ problems of expected size no more than $O(d\sqrt{n})$. This explains the practical
efficiency of multiple pricing and similar reduction schemes if $d \ll n$.
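To get a feeling for the numbers, the following sketch estimates the final working-set size empirically. It is an illustration under assumptions, not an experiment from the paper: the toy problem (smallest interval enclosing the integers 0, ..., n - 1, which has combinatorial dimension 2), the empty initial basis, the function name `final_working_set_size`, and all parameter values are hypothetical choices.

```python
import math
import random


def final_working_set_size(n, r, rng):
    """|G| at termination for the toy interval problem on 0..n-1.

    With an empty initial basis, G ends up as the sample R together with
    all points lying outside the interval [min R, max R].
    """
    sample = rng.sample(range(n), r)
    lo, hi = min(sample), max(sample)
    outside = lo + (n - 1 - hi)   # points below lo plus points above hi
    return r + outside


n, d = 2000, 2                           # the interval problem has d = 2
r = round(d * math.sqrt(n / 2))          # sample size suggested by the theorem
bound = 2 * (d + 1) * math.sqrt(n / 2)   # stated bound on the expected size
rng = random.Random(0)
trials = 500
mean_size = sum(final_working_set_size(n, r, rng) for _ in range(trials)) / trials
print(f"r = {r}, mean |G| = {mean_size:.1f}, bound = {bound:.1f}")
```

On this easy instance the observed mean stays well below the bound, which is consistent with the bound being a worst-case statement over all LP-type problems of dimension d.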
If $(S, \varphi)$ is nondegenerate, we get the following tail estimate, using part (iii) of
Lemma 5.4 and Theorem 4.8. Again, routine computations yield
Theorem 5.6. If $(S, \varphi)$ is a nondegenerate LP-type problem, then for $R \subseteq S'$, a random
sample of size $r = d\sqrt{n/2}$, and $\lambda \ge 0$,
$$\mathrm{prob}\left(|G_R^{(d)}| \ge (2 + \lambda)(d + 1)\sqrt{\frac{n}{2}}\right) \le \exp(\,\cdots)$$
Theorem 5.7. If $(S, \varphi)$ is a general LP-type problem, then for $R \subseteq S'$, a random
sample of size $r = d\sqrt{(n \ln n)/2}$, and $\lambda \ge 0$,
$$\mathrm{prob}\left(|G_R^{(d)}| \ge (3 + \lambda)(d + 1)\sqrt{\frac{n \ln n}{2}}\right) \le \exp\left(-\lambda\binom{d+1}{2}\right).$$
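We conclude this section with the proof of Lemma 5.4.
We start by establishing an auxiliary claim:
Claim. For any set $Q$ with $Q = R \mathbin{\dot\cup} T \subseteq S'$ and $i < d$,
$$\varphi(G_R^{(j)}) = \varphi(G_Q^{(j)}), \quad j \le i,$$
implies
$$G_Q^{(j)} = G_R^{(j)} \mathbin{\dot\cup} T \quad \text{and} \quad V_Q^{(j+1)} = V_R^{(j+1)}, \quad j \le i.$$
To prove the claim, we proceed by induction on $i$, noting that the statements hold for
$i = 0$ by the locality of $\varphi$. Now assume the implications are true for $j \le i - 1$. Then we
get
$$G_Q^{(i)} = G_Q^{(i-1)} \mathbin{\dot\cup} V_Q^{(i)} = G_R^{(i-1)} \mathbin{\dot\cup} T \mathbin{\dot\cup} V_R^{(i)} = G_R^{(i)} \mathbin{\dot\cup} T.$$
Because $\varphi(G_R^{(i)}) = \varphi(G_Q^{(i)})$, the locality of $\varphi$ implies
$$V_Q^{(i+1)} = V_R^{(i+1)}.$$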
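For part (ii), consider $s \in S' \setminus R$ and $Q := R \cup \{s\}$. If $s \in V'(R)$, i.e. $\varphi'(R) \ne \varphi'(Q)$,
let $i + 1 \le d - 1$ be the smallest index with $\varphi(G_R^{(i+1)}) \ne \varphi(G_Q^{(i+1)})$.
By the claim above, $G_Q^{(i+1)} = G_R^{(i+1)} \cup \{s\}$, and the monotonicity of $\varphi$ implies
$$\varphi(G_R^{(i+1)}) < \varphi(G_R^{(i+1)} \cup \{s\}) = \varphi(G_Q^{(i+1)}). \tag{8}$$
This means $s \in V_R^{(i+2)}$.
On the other hand, if $s \notin V'(R)$, then the precondition of the claim holds for $i = d - 1$,
implying
$$V_R^{(j+1)} = V_Q^{(j+1)} \not\ni s, \quad j \le d - 1.$$
This means $s \notin V_R^{(1)}, \ldots, V_R^{(d)}$.
To prove (i), we need to verify the monotonicity and locality (see Definition 3.1).
Inequality (8) shows that $\varphi'(R) \le \varphi'(R \cup \{s\})$ in the lexicographic order, for all
$s \in V'(R)$, and this implies monotonicity.
For locality, assume $R \subseteq Q$ with $\varphi'(R) = \varphi'(Q)$. From the claim and part (ii), we get
$$V'(R) = \bigcup_{i=1}^{d} V_R^{(i)} = \bigcup_{i=1}^{d} V_Q^{(i)} = V'(Q),$$
and this is the required property.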
It remains to bound the combinatorial dimension of $(S', \varphi')$. To this end we prove
that $\varphi'(B_R) = \varphi'(R)$, for
$$B_R := R \cap \bigcup_{i=1}^{d} B_R^{(i)}.$$
We equivalently show that $\varphi(G_R^{(j)}) = \varphi(G_{B_R}^{(j)})$, for $j \le d - 1$, using induction on $j$. For
$j = 0$, we get
$$G_R^{(0)} = G_{B_R}^{(0)} \cup (R \setminus B_R),$$
hence
$$\varphi(G_R^{(j)}) = \varphi\bigl(G_{B_R}^{(j)} \cup (R \setminus B_R)\bigr) = \varphi(G_{B_R}^{(j)}), \tag{9}$$
because $R \setminus B_R$ is disjoint from $B_R^{(1)}$, the basis of $G_R^{(0)}$. Hence, $R \setminus B_R$ can be removed
from $G_R^{(0)}$ without changing the $\varphi$-value.
Now assume the statement holds for $j \le d - 2$ and consider the case $j = d - 1$.
By the claim, we get $G_R^{(j)} = G_{B_R}^{(j)} \cup (R \setminus B_R)$, so, as before, (9) follows, because $R \setminus B_R$ is
disjoint from the basis $B_R^{(j+1)}$ of $G_R^{(j)}$.
To bound the size of $B_R$, we observe that
$$|R \cap B_R^{(i)}| \le d + 1 - i,$$
for all $i \le \ell$ (the number of rounds in which $V(B) \ne \emptyset$). This follows from Lemma 5.3:
$B_R^{(i)}$ has at least one element in each of the $i - 1$ sets $V_R^{(1)}, \ldots, V_R^{(i-1)}$, which are in turn
disjoint from $R$. Hence we get
$$|B_R| \le \sum_{i=1}^{\ell} |R \cap B_R^{(i)}| \le \binom{d+1}{2}.$$
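The last inequality is a short computation. As a sketch, it uses the bound $|R \cap B_R^{(i)}| \le d + 1 - i$ from above together with $\ell \le d + 1$ (Observation 5.2), noting that a term with $i = d + 1$ would contribute nothing:

```latex
\[
\sum_{i=1}^{\ell} \bigl|R \cap B_R^{(i)}\bigr|
  \;\le\; \sum_{i=1}^{d} (d + 1 - i)
  \;=\; \sum_{k=1}^{d} k
  \;=\; \frac{d(d+1)}{2}
  \;=\; \binom{d+1}{2}.
\]
```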
Proof of part (iii). Nondegeneracy of $(S', \varphi')$ follows if we can show that every set
$R \subseteq S'$ has the set $B_R$ as its unique basis. To this end we prove that whenever we have
$L \subseteq R$ with $\varphi'(L) = \varphi'(R)$, then $B_R \subseteq L$.
Fix $L \subseteq R$ with $\varphi'(L) = \varphi'(R)$, i.e.
$$\varphi(G_R^{(i)}) = \varphi(G_L^{(i)}), \quad i \le d - 1.$$
By the claim, this implies
$$G_R^{(i)} = G_L^{(i)} \mathbin{\dot\cup} (R \setminus L), \quad i \le d - 1,$$
and the nondegeneracy of $\varphi$ yields that $G_R^{(i)}$ and $G_L^{(i)}$ have the same unique basis $B_R^{(i+1)}$,
for all $i$. It follows that $G_L^{(d-1)}$ contains
$$\bigcup_{i=1}^{d} B_R^{(i)},$$
so $L$ contains
$$L \cap \bigcup_{i=1}^{d} B_R^{(i)} = R \cap \bigcup_{i=1}^{d} B_R^{(i)}.$$
The latter equality holds because $R \setminus L$ is disjoint from $G_L^{(d-1)}$, thus in particular from
the union of the $B_R^{(i)}$.
6. Conclusion
The curious fact that, in the regular and nondegenerate case, the distribution of $V_r$
does not depend on the actual LP-type problem deserves a word of warning: namely, this
property does not mean that all nondegenerate LP-type problems with given parameters
$n$ and $d$ are equally difficult (or easy) to solve. On the contrary, because the random
variable $V_r$ does not depend on the actual problem, it does not carry any information
about the difficulty of a particular problem. There are very easy problems (like $d$-smallest
number), and very difficult ones (like linear programming). For example, Algorithm 5.1
never needs more than two rounds in the case of the $d$-smallest number, and for other
easy LP-type problems characterized by the following property: for any sets $B \subseteq R$
such that $\varphi(B) = \varphi(R)$, and for any set $T$,
$$\varphi(B \cup T) = \varphi(R \cup T)$$
holds. This means elements in $R \setminus B$ can be "forgotten," as they will not contribute to the
final solution. The absence of this property is what makes linear programming and other
problems difficult.
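The property can be checked by brute force on small instances. The sketch below is only an illustration: the helper `can_forget`, the two toy objective functions, and the encoding of one-variable linear programs as ("ge", c) and ("le", c) constraints are assumptions of this example, not taken from the paper. It contrasts a problem that has the property (length of the smallest enclosing interval) with a one-variable linear program, where a constraint outside the current basis can later make the problem infeasible.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def can_forget(S, phi):
    """Brute-force test: for all B <= R with phi(B) == phi(R) and all T <= S,
    does phi(B | T) == phi(R | T) hold?"""
    all_sets = [frozenset(x) for x in subsets(S)]
    for R in all_sets:
        for B in (frozenset(x) for x in subsets(R)):
            if phi(B) != phi(R):
                continue
            if any(phi(B | T) != phi(R | T) for T in all_sets):
                return False
    return True


def interval_length(G):
    """Toy problem: length of the smallest interval enclosing the numbers in G."""
    return max(G) - min(G) if G else float("-inf")


def lp_1d(G):
    """Toy problem: minimise x subject to ("ge", c): x >= c and ("le", c): x <= c.
    Unbounded instances get -inf, infeasible ones +inf."""
    lower = max((c for kind, c in G if kind == "ge"), default=float("-inf"))
    upper = min((c for kind, c in G if kind == "le"), default=float("inf"))
    return lower if lower <= upper else float("inf")


points = [0, 1, 3, 7]
constraints = [("ge", 0), ("ge", 5), ("le", 3), ("le", 9)]
print(can_forget(points, interval_length))
print(can_forget(constraints, lp_1d))
```

In the linear programming instance, the constraint x <= 3 is not needed to certify the optimum of {x >= 0, x <= 3}, yet it cannot be forgotten: once x >= 5 is added, it decides between optimum 5 and infeasibility.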
In general, it seems that the combinatorial dimension of the LP-type problem $(S', \varphi')$
derived from $(S, \varphi)$ according to the definition in Lemma 5.4 is a more meaningful
indicator of $(S, \varphi)$'s difficulty than $\delta(S, \varphi)$ itself. For example, in the case of the $d$-smallest
number, we get $\delta(S', \varphi') = d$, much less than the $O(d^2)$ upper bound. This
alternative notion of dimension needs to be further investigated.
An open problem that remains is to improve the tail estimates in the case of degenerate
LP-type problems. Here, the distribution of $V_r$ typically depends on the concrete instance,
and so does $b_k$, the number of bases with $k$ violators. Using only trivial bounds for the
numbers $b_k$, we have obtained the weaker estimate given by Theorem 4.10, indicating
that this estimate might not be the final answer.
Acknowledgment
We thank the referee for carefully pointing out simplifications and suggesting
improvements in the presentation. In particular, we are grateful for the question concerning the
sharpness of our main Chernoff-type bound.
References
[1] I. Adler and R. Shamir. A randomized scheme for speeding up algorithms for linear and convex programming with high constraints-to-variable ratio. Math. Programming, 61:39-52, 1993.
[2] N. Amenta. Helly-type theorems and generalized linear programming. Discrete Comput. Geom., 12:241-261, 1994.
[3] R. E. Bixby, J. W. Gregory, I. J. Lustig, R. E. Marsten, and D. F. Shanno. Very large-scale linear programming: a case study in combining interior point and simplex methods. Oper. Res., 40(5):885-897, 1992.
[4] T. Chan. Backwards analysis of the Karger-Klein-Tarjan algorithm for minimum spanning trees. Inform. Process. Lett., 67:303-304, 1998.
[5] V. Chvátal. Linear Programming. Freeman, New York, 1983.
[6] K. L. Clarkson. New applications of random sampling in computational geometry. Discrete Comput. Geom., 2:195-222, 1987.
[7] K. L. Clarkson. A bound on local minima of arrangements that implies the upper bound theorem. Discrete Comput. Geom., 10:427-433, 1993.
[8] K. L. Clarkson. Las Vegas algorithms for linear and integer programming. J. Assoc. Comput. Mach., 42:488-499, 1995.
[9] K. L. Clarkson and P. W. Shor. Applications of random sampling in computational geometry, II. Discrete Comput. Geom., 4:387-421, 1989.
[10] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, Cambridge, MA, 1990.
[11] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, 1997.
[12] D. Dubhashi and D. Ranjan. Great(er) expectations. BRICS Newsletter, 5:11-13, 1996.
[13] B. Gärtner. Randomized Optimization by Simplex-Type Methods. Ph.D. thesis, Freie Universität, Berlin, 1995.
[14] B. Gärtner. Exact arithmetic at low cost - a case study in linear programming. Comput. Geom. Theory Appl., 13:121-139, 1999.
[15] B. Gärtner and S. Schönherr. An efficient, exact and generic quadratic programming solver for geometric optimization. In Proc. 16th ACM Symp. Comput. Geom., pages 110-118, 2000.
[16] B. Gärtner and E. Welzl. Linear programming - randomization and abstract frameworks. In Proc. 13th Symp. Theoret. Aspects Comput. Sci., volume 1046 of Lecture Notes in Computer Science, pages 669-687. Springer-Verlag, Berlin, 1996.
[17] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 1989.
[18] L. J. Guibas, D. E. Knuth, and M. Sharir. Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica, 7:381-413, 1992.
[19] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Inform. Process. Lett., 33:305-308, 1990.
[20] S. Har-Peled. On the Expected Complexity of Random Convex Hulls. Technical Report 330, School of Mathematical Sciences, Tel-Aviv University, 1998.
[21] D. Karger, P. N. Klein, and R. E. Tarjan. A randomized linear-time algorithm to find minimum spanning trees. J. Assoc. Comput. Mach., 42:321-328, 1995.
[22] J. Matoušek. On geometric optimization with few violated constraints. Discrete Comput. Geom., 14:365-384, 1995.
[23] J. Matoušek, M. Sharir, and E. Welzl. A subexponential bound for linear programming. Algorithmica, 16:498-516, 1996.
[24] J. Matoušek and J. Nešetřil. Invitation to Discrete Mathematics. Oxford University Press, Oxford, 1998.
[25] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, 1995.
[26] K. Mulmuley. A fast planar partition algorithm, I. J. Symbolic Comput., 10(3-4):253-280, 1990.
[27] K. Mulmuley. Computational Geometry: An Introduction Through Randomized Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1994.
[28] A. Rényi and R. Sulanke. Über die konvexe Hülle von n zufällig gewählten Punkten. Z. Wahrsch., 2:75-84, 1963.
[29] R. Seidel. Small-dimensional linear programming and convex hulls made easy. Discrete Comput. Geom., 6:423-434, 1991.
[30] R. Seidel. Backwards analysis of randomized geometric algorithms. In J. Pach, editor, New Trends in Discrete and Computational Geometry, volume 10 of Algorithms and Combinatorics, pages 37-68. Springer-Verlag, New York, 1993.
[31] R. Seidel. Personal communication, 1996.
[32] M. Sharir and E. Welzl. A combinatorial bound for linear programming and related problems. In Proc. 9th Symp. Theoret. Aspects Comput. Sci., volume 577 of Lecture Notes in Computer Science, pages 569-579. Springer-Verlag, Berlin, 1992.
[33] E. Welzl. Smallest enclosing disks (balls and ellipsoids). In H. Maurer, editor, New Results and New Trends in Computer Science, volume 555 of Lecture Notes in Computer Science, pages 359-370. Springer-Verlag, Berlin, 1991.