#### On the Least Median Square Problem

Discrete Comput Geom
Jeff Erickson¹ · Sariel Har-Peled¹ · David M. Mount²
¹ Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
² Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
We consider the exact and approximate computational complexity of the multivariate least median-of-squares (LMS) linear regression estimator. The LMS estimator is among the most widely used robust linear statistical estimators. Given a set of n points in R^d and a parameter k, the problem is equivalent to computing the narrowest slab bounded by two parallel hyperplanes that contains k of the points. We present algorithms for the exact and approximate versions of the multivariate LMS problem. We also provide nearly matching lower bounds for these problems. These lower bounds hold under the assumptions that k is Θ(n) and that deciding whether n given points in R^d are affinely non-degenerate requires Ω(n^d) time.

∗ A preliminary version of this paper appeared in Proc. 20th Annu. ACM Sympos. Comput. Geom., pages 273–279, 2004. See http://www.uiuc.edu/∼jeffe/pubs/halfslab.html for the most recent version of this paper. JE was partially supported by NSF CAREER award CCR-0093348 and NSF ITR Grants DMR-0121695 and CCR-0219594; SH-P was partially supported by NSF CAREER award CCR-0132901; and DMM was partially supported by NSF Grant CCR-0098151.
1. Introduction
Fitting a hyperplane to a finite collection of points in space is a fundamental problem
in statistical estimation. Robust estimators are of particular interest because of their
insensitivity to outlying data. The principal measure of the robustness of an estimator
is its breakdown value, that is, the fraction (up to 50%) of outlying data points that
can corrupt the estimator. Rousseeuw’s least median-of-squares (LMS) linear regression
estimator [R1] is among the best known and most widely used robust estimators.
The LMS estimator (with intercept) is defined formally as follows. Consider a set
P = {p1, p2, . . . , pn} of n points in R^d, where pi = (xi,1, xi,2, . . . , xi,d−1, xi,d). We
would like to compute a parameter vector θ = (θ1, θ2, . . . , θd ) that best fits the data by
the linear model
xi,d = xi,1θ1 + xi,2θ2 + · · · + xi,d−1θd−1 + θd + ri
for all i , where r1, r2, . . . , rn are the (unknown) errors, or residuals. The LMS estimator is
defined to be the parameter vector θ that minimizes the median of the squared residuals.
More generally, given a parameter k, where d + 1 ≤ k ≤ n, the problem is to find a
parameter vector θ that minimizes the kth smallest squared residual. This more general
problem is also called the least-quantile squared (LQS) estimator [RL]. Typically, we
are interested in the case where both k = Θ(n) and n − k = Θ(n).
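As a concrete illustration, the LQS objective for a given parameter vector θ can be evaluated directly. The following sketch (the function name and calling convention are ours, not from the paper) computes the kth smallest squared residual of the linear model above:

```python
def lqs_objective(points, theta, k):
    """k-th smallest squared residual of the fit x_d = sum_j x_j*theta_j + theta_d.

    points: list of d-tuples (x_1, ..., x_d); theta: d-tuple (theta_1, ..., theta_d).
    With k roughly n/2 this is (up to the median convention) the LMS objective.
    """
    residuals = []
    for p in points:
        *xs, xd = p                     # split off the dependent coordinate
        pred = sum(x * t for x, t in zip(xs, theta[:-1])) + theta[-1]
        residuals.append((xd - pred) ** 2)
    residuals.sort()
    return residuals[k - 1]
```

For instance, for points on the line y = 2x + 1 plus one outlier, the objective is 0 for any quantile that can ignore the outlier, and jumps only at k = n.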
This estimator is widely used in practice, for example in finance, chemistry, electrical
engineering, process control, and computer vision [R2]. In addition to having a high
breakdown value, the LMS estimator is regression-, scale-, and affine-equivariant, which
means that the estimate transforms properly under these types of transformations [RL].
The LMS estimator may be used either alone or as an initial step in more complex
estimation schemes [Y2].
The LMS estimator has an easy geometric interpretation. A slab is the volume enclosed
by two parallel hyperplanes; the height of a slab is the vertical separation between its
bounding hyperplanes. Computing the LMS estimator is clearly equivalent to computing
a slab of minimum height that encloses at least k of the points. The LMS estimator is
obtained as the hyperplane that bisects this slab. In the dual setting the problem is
equivalent to finding the shortest vertical segment in R^d that intersects at least k of a set
of n given hyperplanes. We also consider the closely related problem of computing the
slab of minimum width enclosing k points, where the width of a slab is measured normal
to its bounding hyperplanes. This problem is also called LMS orthogonal regression [RL].
The most efficient exact algorithm for computing the LMS estimator in the plane is
a topological plane-sweep algorithm of Edelsbrunner and Souvaine [ES1], which runs
in O(n^2) time and requires O(n) space. In higher dimensions an O(n^(d+1) log n)-time
algorithm has long been known [MMRK], [RL].
The LMS problem can be expressed as a linear programming (LP) problem with
violations. To see this, observe that the decision problem of determining whether there
exists a slab of a given vertical width that encloses all n points can be reduced to
determining the feasibility of a d-dimensional LP involving 2n linear constraints. If the
vertical width of the slab is added as a new variable, the problem of finding the slab of
minimum vertical width is thus reduced to solving a feasible LP in dimension d + 1. By
allowing n − k of these constraints to be violated, we obtain the LMS problem. From
known results on solving LPs with violations [M2], it follows that LMS can be solved in
O(n(n − k)^(d+1)) time for general dimension d. In dimensions 2 and 3 we can apply Chan's
results [C2] to obtain LMS algorithms running in time O(n log n + (n − k)^2 log^2 n) in the
plane and in O(n log n + (n − k)^(11/4) n^(1/4) polylog n) in 3-space. For the most interesting
case, where n − k = Θ(n), these results imply running times of O(n^2 log^2 n) in the
plane, O(n^3 polylog n) in 3-space, and O(n^(d+2)) in dimension d.
An alternative approach that can also be deployed in this case is an algorithm by
Har-Peled and Wang for approximate shape fitting with outliers [HW], which extracts
a small coreset and finds the best possible solution within this coreset. However, this
algorithm excels only when the number of outliers is small; specifically, the running time
is near-linear only if k ≥ n − o(n^(1/(2d))). The assumption that n − k = Θ(n) results in the
worst case for these algorithms. It is natural to ask whether either of these approaches
can be extended to handle the case where the number of outliers is large.
Finally, the problem of fitting the data with two slabs of minimum width seems to
be inherently connected to the problem of LMS; intuitively, we want to find a good
clustering for most of the points by the first slab, and also a good clustering for the
remaining (outlier) points. Thus, a better understanding of the LMS problem might lead
to a better understanding of the two-slab problem, which seems to be surprisingly hard
[H]. Currently no near-linear-time approximation algorithm is known for d > 3.
Given the high complexity of computing the LMS estimator exactly, it is natural to
consider whether more efficient approximation algorithms exist. Mount et al. [MNR+]
presented a practical approximation algorithm for the LMS line estimator in the plane,
based on approximating the quantile and/or the vertical width of the slab. Their algorithm,
however, does not guarantee a better than O(n^2) running time when the quantile is
required to be exact. Olson presented a 2-approximation algorithm for LMS, which
runs in O(n log^2 n) time in the plane and in O(n^(d−1) log n) time for any fixed d ≥ 3 [O].
In this paper we consider the computational complexity of both the exact and
approximate versions of the LMS problem. In Section 3 we describe a randomized
algorithm to compute the exact LMS estimator for n points in R^d in O(n^d log n) time
with high probability. We also describe, in Section 4, a randomized ε-approximation
algorithm whose running time is O((n^d/kε) polylog n) with high probability. For the
most interesting case k = Θ(n), this is faster than our exact algorithm by roughly a
factor of n/ε.
In Section 5 we provide results suggesting that any exact algorithm for the LMS
problem requires Ω(n^d) time, and any constant-factor approximation algorithm requires
Ω(n^(d−1)) time, thus providing a strong indication that our algorithms are close to optimal.
Specifically, we describe linear-time reductions from the affine degeneracy problem:
Given a set of n points on the d-dimensional integer¹ lattice Z^d, do any d + 1 of
the points lie on a common hyperplane? These reductions imply lower bounds if the
following widely believed conjecture is true.

Conjecture 1.1. Solving the d-dimensional affine degeneracy problem requires Ω(n^d)
time in the worst case.
If this conjecture is true, the Ω(n^d) lower bound is tight; the problem can be solved
in O(n^d) time by constructing the dual hyperplane arrangement. Erickson and Seidel
[ES2], [E3] proved an Ω(n^d) lower bound on the number of sidedness queries required
to solve this problem; however, the model of computation in which their lower bound
holds is not strong enough to solve the LMS problem, since it does not allow us to
compare widths of different slabs. The strongest lower bound known in any general
model of computation is Ω(n log n), for any fixed dimension, in the algebraic decision
and computation tree models [B], [Y1], although the problem is known to be NP-complete
when d is not fixed [K], [E3]. The planar affine degeneracy problem is one of Gajentaan
and Overmars's canonical 3SUM-hard problems [GO]. (In higher dimensions the affine
degeneracy problem is actually (d + 1)-SUM-hard [E3]; however, the best lower bound
that this could imply is only Ω(n^(⌈d/2⌉+1)) [E2].)

¹ In the algebraic decision tree model of computation, the restriction to integers can be removed by using
formal infinitesimals in our reductions [E1], [E2].
2. Notation and Terminology
Whenever we work in the space Rd , we will refer to the xd -coordinate direction as vertical
and each of the other coordinate directions as horizontal. A hyperplane is vertical if it
contains a vertical line and horizontal if its normal direction is vertical.
A slab is the non-empty intersection of two closed halfspaces whose bounding
hyperplanes are parallel. We will distinguish between two different natural notions of the
“thickness” of a slab. The height of a slab σ , denoted ht(σ ), is the length of a vertical line
segment with one endpoint on each bounding hyperplane. If one slab has smaller height
than another, we say that the first slab is shorter and the second is taller. We denote by
ht( P, k) the height of the shortest slab containing at least k points from a point set P.
On the other hand, the width of a slab σ , denoted wd(σ ), is the distance between the two
bounding hyperplanes, measured along their common normal direction. If one slab has
smaller width than another, we say that the first slab is narrower and the second is wider.
We denote by wd( P, k) the width of the narrowest slab containing at least k points in
a point set P. A slab whose height or width is zero is just a hyperplane. Vertical slabs,
even those with zero width, have infinite height.
The LMS problem has a natural dual formulation, which is crucial in the
development of our algorithms. We use the standard duality transformation that maps any point
(a1, a2, . . . , ad ) to the hyperplane xd = a1x1 + a2x2 + · · · + ad−1xd−1 − ad and vice
versa. Under this duality map, a point p is above (resp. below) a hyperplane h if and only
if the dual hyperplane p∗ is below (resp. above) the dual point h∗; moreover, the vertical
distance between the point and the hyperplane is preserved. The dual of a slab σ is a
vertical line segment σ ∗ of length ht(σ ); a point p lies inside the slab σ if and only if the
dual hyperplane p∗ intersects the dual segment σ ∗. Thus, the LMS problem is equivalent
to finding, given a set H of hyperplanes, the shortest vertical segment that intersects at
least k of them. To be consistent with the primal formulation, we let ht(H, k) denote the
length of this segment.
For any hyperplane h and any real value t , let h + t denote the hyperplane resulting
from translating h upward by (vertical) distance t , and let σ (h, t ) denote the slab bounded
by h and h + t .
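A quick way to see that this duality behaves as claimed is to check the vertical-distance identity numerically. In this sketch (our own naming) a hyperplane is stored as a coefficient tuple (b1, . . . , bd−1, bd) for x_d = b1 x1 + · · · + bd−1 xd−1 + bd:

```python
def point_to_dual_hyperplane(p):
    # (a1, ..., ad)  ->  hyperplane x_d = a1 x1 + ... + a_{d-1} x_{d-1} - a_d
    return p[:-1] + (-p[-1],)

def hyperplane_to_dual_point(h):
    # hyperplane x_d = b1 x1 + ... + b_{d-1} x_{d-1} + bd  ->  point (b1, ..., b_{d-1}, -bd)
    return h[:-1] + (-h[-1],)

def vertical_distance(point, h):
    # signed height of `point` above the hyperplane with coefficients h
    *b, bd = h
    return point[-1] - (sum(bi * xi for bi, xi in zip(b, point)) + bd)
```

For example, the point p = (2, 3, 7) lies at vertical distance 2 above the plane x3 = x1 + x2, and the dual point of that plane lies the same vertical distance above the dual hyperplane p∗, with the above/below relation reversed exactly as stated.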
3. Exact Algorithms
3.1. Minimum Height
In this section we develop efficient exact algorithms for computing the LMS estimator
in any fixed dimension. The two-dimensional problem can be solved in O(n^2) time
using Edelsbrunner and Souvaine's dual topological sweep algorithm [ES1]. Our
higher-dimensional algorithm improves the previous best running time by a factor of O(n). It
is based on a "sample and sweep" approach, which has been used earlier in randomized
algorithms for two-dimensional robust regression [DMN], [M1]. The first
step computes a random sample of critical vertical length values and applies binary
search with a decision procedure to identify a small interval containing the optimum
length; a sweep through this interval then determines the optimum value.
Theorem 3.1. Given a set H of n hyperplanes in Rd and an integer k, we can compute
the shortest vertical segment that stabs k hyperplanes of H in O(n^d log n) time with high
probability.
Proof. By simple local minimality conditions, the endpoints of the required segment
together lie on a total of at least d + 1 hyperplanes of H. Let T be the (multi)set of
critical values t such that the arrangement of slabs Σ(t) = {σ(h, t) | h ∈ H} contains
a vertex on d + 1 bounding hyperplanes. For notational convenience, let t∗ = ht(H, k).
We easily observe that |T| = O(n^(d+1)) and that t∗ ∈ T.

A vertical line segment of length t stabs k hyperplanes in H if and only if some point
lies in at least k slabs in Σ(t), or, equivalently, if and only if some vertex has depth
at least k in the arrangement of Σ(t). Thus, given a candidate value t, we can decide
whether t < t∗ in time O(n^d) by constructing the arrangement of Σ(t) and computing
the depth of every vertex.
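In the plane (d = 2, hyperplanes are lines y = mx + c) this decision procedure can be sketched naively: a vertical segment of length t placed at abscissa x stabs a line iff the line's value at x falls inside a window of length t, and the answer can only change at x-coordinates where a line meets another line shifted by 0 or ±t. This is our own illustrative version, not the arrangement-based O(n^d) procedure of the proof:

```python
def stabs_k_with_length(lines, k, t):
    """Planar decision procedure: is there a vertical segment of length t
    stabbing at least k of the given lines y = m*x + c?  Equivalently, does
    some point lie in >= k of the slabs sigma(h, t)?  It suffices to test
    x-coordinates where a line meets another line shifted by 0 or +-t."""
    xs = [0.0]  # fallback for all-parallel inputs
    for i, (m1, c1) in enumerate(lines):
        for m2, c2 in lines[i + 1:]:
            if m1 != m2:
                xs += [(c2 - c1) / (m1 - m2),
                       (c2 + t - c1) / (m1 - m2),
                       (c2 - t - c1) / (m1 - m2)]
    for x in xs:
        v = sorted(m * x + c for m, c in lines)
        if any(v[i + k - 1] - v[i] <= t + 1e-9 for i in range(len(v) - k + 1)):
            return True
    return False
```

Three concurrent lines, for example, are stabbed by a degenerate segment of length 0 at their common point, while three parallel lines at unit spacing need t ≥ 1 to stab two of them.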
We transform this decision procedure into a search algorithm as follows. We randomly
choose a sample R of O(n^d) vertical distances from T, by repeatedly choosing d + 1
hyperplanes of H and computing the shortest vertical segment stabbing all of them. We
then use the decision procedure to guide a binary search through the random sample R.
In O(n^d log n) time, this binary search finds an interval [t−, t+] that contains t∗ but no
other elements of R. With high probability, the interval [t−, t+] contains only O(n log n)
values from T.
To locate t∗ within this interval, we maintain the arrangement of slabs Σ(t) in a kinetic
data structure [BGH] and perform a sweep by letting t vary continuously from t = t−
to t = t+. Throughout the sweep process, we maintain the depth of each vertex in the
arrangement. The combinatorics of the arrangement changes only at times in T. At each
such critical event, a simplicial cell in the arrangement of Σ(t) collapses to a point and
reverses orientation. We maintain a priority queue of collapse times for the simplicial
cells in Σ(t), ignoring any cell that collapses after time t+. We prepare the initial priority
queue by constructing the arrangement of Σ(t−), computing the collapse time of each
simplicial cell (in O(1) time), and inserting only the relevant future events into the queue.
Each critical event requires a constant number of priority queue operations, plus O(1)
additional work. Thus, the total cost of the sweep is O(n^d + E log E), where E is the
number of critical events. With high probability, E = O(n log n), so the sweep requires
only O(n^d) time. At some time in the sweep we must encounter a vertex of depth exactly
k, and the hyperplanes incident to this vertex define the desired slab.
Corollary 3.2. Given a set P of n points in Rd and an integer k, we can compute the
shortest slab containing k points of P in O(n^d log n) time with high probability.
Chan [C3] observes that his randomized optimization technique [C1] can be used
in place of sampling and sweeping, using a constant-complexity cutting to generate
subproblems, as in his other results on linear programming with violations [C2]. Chan’s
techniques reduce the expected running time of our algorithm to O(n^d) but do not
improve our high-probability time bound.
3.2. Minimum Width
We can use similar techniques to solve the orthogonal LMS regression problem. A simple
modification of Edelsbrunner and Souvaine’s topological sweep algorithm [ES1] solves
the two-dimensional version of this problem in O(n^2) time and O(n) space.
Theorem 3.3. Given a set P of n points in Rd and an integer k, we can compute the
narrowest slab containing k points of P in O(n^d log n) time with high probability.
Proof. In light of the results of [ES1] we may assume that d is at least 3. We easily
observe that the target slab has at least one point (in fact, d + 1 points) from P on its
boundary. We describe an algorithm to find the narrowest slab that contains k points in
P and has a specific point q ∈ P on its boundary. To find the narrowest unrestricted slab
containing k points in P , we simply run this algorithm once for every reference point
q ∈ P and return the narrowest result.
We first solve the following decision problem: Is there a slab of width w, whose
boundary contains q, that contains k points in P? Any slab with q on its boundary
is uniquely determined by the normal vector pointing into the slab from the bounding
hyperplane that contains q. Thus, we have a "duality" transformation between slabs with
q on their boundary and points on the sphere of directions S^(d−1) centered at q. For any
point p ∈ P, the family of slabs of width w that contain p defines a strip s(p, w) on
S^(d−1); this strip is the intersection of S^(d−1) and a slab with q on its boundary. Thus, our
decision problem can be rephrased as follows: Is any point in S^(d−1) covered by k of the
strips in {s(p, w) | p ∈ P}?
This decision problem can be solved in O(n^(d−1)) time by reducing it to searching a
number of two-dimensional arrangements. Let A(w) denote the above arrangement of
strips on S^(d−1). We may assume that the points of P are in general position and that the
slabs defining the strips are topologically closed. It suffices to determine the existence
of a vertex of A(w) that is covered by at least k strips. Such a vertex is the intersection
of d − 1 hyperplanes forming the slab boundaries together with S^(d−1). To determine the
existence of such a vertex, we enumerate all subsets of slab boundaries of cardinality
d − 3, where each slab may contribute either one of its two bounding hyperplanes to
each subset, but not both. There are O(n^(d−3)) such subsets. The intersection of any such
subset is a 3-flat in R^d, and so the intersection of A(w) with this flat is either empty
or is an arrangement of at most n − (d − 3) strips on a 2-sphere. The strip boundaries
are circles on this 2-sphere, which implies that each pair intersects at most twice. It
follows that the size of the resulting arrangement on the 2-sphere is O(n^2), and so the
standard randomized incremental algorithm can be adapted to compute the arrangement
in O(n^2) time with high probability [M3]. In the same time bound we can determine
whether there is any vertex of this arrangement that is covered by k − (d − 1) strips.
Repeating this for all O(n^(d−3)) subsets provides the desired result.
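For intuition, here is the d = 2 analogue of this decision step (our own sketch, with our naming): slabs of width w with q on the boundary correspond to points of the circle S^1 of normal directions, each input point contributes arcs of admissible directions, and it suffices to test the arc endpoints.

```python
import math

def width_w_slab_exists(points, q, w, k):
    """Planar analogue of the decision step: is there a slab of width w with
    q on its boundary containing >= k points?  Such a slab is determined by
    its unit normal u; point p lies inside iff 0 <= u.(p - q) <= w.  Each p
    contributes arcs of directions on S^1, so it suffices to test the arc
    endpoints, where u.(p - q) equals 0 or w."""
    cands = [0.0]
    for p in points:
        dx, dy = p[0] - q[0], p[1] - q[1]
        r = math.hypot(dx, dy)
        base = math.atan2(dy, dx)
        cands += [base + math.pi / 2, base - math.pi / 2]  # u.(p - q) = 0
        if r >= w > 0:
            a = math.acos(w / r)                           # u.(p - q) = w
            cands += [base + a, base - a]
    for th in cands:
        u = (math.cos(th), math.sin(th))
        cnt = sum(1 for p in points
                  if -1e-9 <= u[0] * (p[0] - q[0]) + u[1] * (p[1] - q[1]) <= w + 1e-9)
        if cnt >= k:
            return True
    return False
```

In dimension 2 a width-0 slab is a line, so the test degenerates to asking whether k points are collinear with q.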
By applying the same random sampling and sweeping techniques used to prove
Theorem 3.1, we can now transform this decision procedure into a search algorithm,
increasing the running time by a factor of O(log n). We omit the remaining straightforward
details.
Again, Chan’s randomized optimization technique [C1], [C2] can be used to reduce
the expected running time of our algorithm to O(n^d).
4. Approximation Algorithms
Our algorithms for approximating the LMS estimator rely on the same techniques used
in our exact algorithms. Specifically, we exploit the dual formulation of the problem, we
use random sampling and sweeping to reduce the search to a decision problem, and we
begin by considering only slabs with a given reference point (or dual hyperplane).
Lemma 4.1. Given a set H of n hyperplanes in R^d, a hyperplane g in R^d (not necessarily
in H), and an integer k, we can compute the shortest vertical segment that stabs k
hyperplanes in H and whose midpoint lies on g, in O(n^(d−1) log n + n log^2 n) time with
high probability.
Proof. As usual, we reduce the search problem to a decision problem: Given a parameter
ℓ ≥ 0, is there a vertical segment of length ℓ, whose midpoint lies on g, that stabs k
hyperplanes of H?

For every hyperplane h ∈ H, let τ(h, ℓ) be the set of points on g that lie at vertical
distance at most ℓ/2 from h, or equivalently, the intersection of g with the slab of height
ℓ centered at h. Generically, τ(h, ℓ) is a (d − 1)-dimensional slab in g; however, if h is
parallel to g, then τ(h, ℓ) is either the empty set or all of g. Finally, let T(ℓ) = {τ(h, ℓ) |
h ∈ H}. Our decision problem can now be rephrased as follows: Is any point in g covered
by k or more regions in T(ℓ)? This problem can be decided in O(n^(d−1) + n log n) time by
simply constructing the arrangement of T(ℓ) and computing the depth of every vertex.
To complete the proof of the lemma, we apply the usual random sampling and
sweeping techniques to our decision procedure.
Note that the results of this lemma apply if we require instead that the upper endpoint
(rather than the midpoint) of the vertical segment lies on g. If d > 2, it follows that
by applying the lemma to every hyperplane g ∈ H we obtain an alternative proof of
Theorem 3.1. Once again, we can save a factor of O(log n) in the expected running time
using Chan's randomized optimization technique [C1], [C2].
Using this lemma we have the following result. Note that, unlike the results of the
previous section, the randomized algorithm of our next result is a Monte Carlo algorithm.
Theorem 4.2. Given a set H of n hyperplanes in Rd , an integer k, and a real parameter
ε > 0, we can compute a vertical segment of length at most (1 + ε) ht(H, k) that stabs
k hyperplanes of H in O((n^d/kε) polylog n) time with high probability.
Proof. Let s be the shortest vertical segment that stabs k hyperplanes in H. Let R be
a random sample of O((n/k) log n) hyperplanes in H. By ε-net theory [C4] (or by a
direct probabilistic argument), s stabs at least one of the hyperplanes of R with high
probability. For every hyperplane in R, apply the algorithm of Lemma 4.1 (with respect
to the entire set of hyperplanes H), and let s˜ be the shortest segment thus computed.
Clearly, the length of s˜ is at most twice the length of s.

Now let δ = ε|s˜|/2, and consider the set of hyperplanes

R′ = {h + iδ | h ∈ R, i = −⌈1/ε⌉, . . . , ⌈1/ε⌉}.

(Recall that h + t denotes the hyperplane resulting from translating h vertically by
distance t.) With high probability, at least one of the hyperplanes of R′ intersects s and is
within distance δ/2 = ε|s˜|/4 ≤ ε|s|/2 = ε ht(H, k)/2 of the midpoint of s. Thus,
the shortest vertical segment computed by applying the algorithm of Lemma 4.1 to each
hyperplane of R′ is the required (1 + ε)-approximation.

The algorithm of Lemma 4.1 is invoked O((n/kε) log n) times, so the overall running
time is O((n^2/kε) log^3 n) if d = 2 and O((n^d/kε) log^2 n) otherwise.
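The grid of vertically translated hyperplanes used in this proof is straightforward to generate. In this sketch (our naming) a hyperplane is a coefficient tuple whose last entry is the constant term, so translating by t just shifts that entry:

```python
import math

def translated_family(R, delta, eps):
    """Build {h + i*delta : h in R, i = -ceil(1/eps), ..., ceil(1/eps)},
    where h + t vertically translates the hyperplane h by distance t.
    Hyperplanes are coefficient tuples; the last entry is the constant term."""
    r = math.ceil(1 / eps)
    return [h[:-1] + (h[-1] + i * delta,) for h in R for i in range(-r, r + 1)]
```

Each sampled hyperplane spawns 2⌈1/ε⌉ + 1 translates, matching the O((n/kε) log n) invocations of Lemma 4.1 counted above.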
The algorithm of Theorem 4.2 clearly also solves the dual problem of finding an
approximately shortest slab containing k points.
Corollary 4.3. Given a set P of n points in Rd , an integer k, and a real parameter
ε > 0, we can compute a slab of height at most (1 + ε) ht(P, k) that contains k points
of P in O((n^d/kε) polylog n) time with high probability.
In the special case k = Θ(n), we can improve the running time of our algorithm by a
factor of O(log n) using Chazelle's deterministic algorithm for constructing ε-nets [C4].
Finding an approximately narrowest slab is only slightly different. Using a similar
algorithm to the one described above, together with the techniques of Theorem 3.3, we
obtain the following result. We omit further details.
Theorem 4.4. Given a set P of n points in Rd , an integer k, and a real parameter
ε > 0, we can compute a slab of width at most (1 + ε) wd(P, k) that contains k points
of P in O((n^d/kε) polylog n) time with high probability.
5. Hardness Results
In this section we prove relative lower bounds for both the exact and approximate LMS
hyperplane-fitting problems. Our reductions suggest that computing the exact LMS
estimator requires Ω(n^d) time and that computing any constant-factor approximation
requires Ω(n^(d−1)) time, in the worst case. Thus, it is unlikely that our LMS algorithms can
be sped up by more than polylogarithmic factors, at least when k and n − k are both
Θ(n). Our reductions also directly imply Ω(n log n) lower bounds for both the exact
and approximate LMS problems in any fixed dimension, in the algebraic decision tree
model; they also imply that when the dimension is not fixed, even the approximate LMS
problem is NP-hard.
Our reductions rely on the following observations. We say that a slab is minimal for
a set of points if its boundary contains at least d + 1 affinely independent points from
the set. For any set P of more than d points, the shortest and narrowest slabs containing
P are both minimal.
Lemma 5.1. Let P be a set of d + 1 affinely independent points on the integer grid
[−M .. M]^d, and let σ be any minimal slab containing P.

(a) ht(σ) can be written as a ratio of integers a/b, where a and b are both O(M^d).
(b) wd(σ)^2 can be written as a ratio of integers a′/b′, where a′ and b′ are both
O(M^(2d)).
(c) Either ht(σ) = 0 or ht(σ) = Ω(1/M^d).
(d) Either wd(σ) = 0 or wd(σ) = Ω(1/M^d).
(e) σ has an integer normal vector n(σ) whose coefficients have absolute value
O(M^d).
(f) ht(σ) ≤ wd(σ) · O(M^d).

(All these bounds hide constant factors exponential in d.)
Proof. Without loss of generality, we assume the slab σ is not vertical. We refer to
the d-dimensional Lebesgue measure as volume and the (d − 1)-dimensional Lebesgue
measure as area.
(a) Let Δ denote the convex hull of P, and let V denote the volume of Δ. We can
express V as the ratio A · ht(σ)/d, where A is the sum of the signed areas of the vertical
projections of certain facets of Δ onto R^(d−1). Specifically, a facet contributes its area to
A if it touches the lower bounding hyperplane of σ, positively if Δ is locally above that
facet and negatively otherwise. Clearly, V = O(M^d), and the projected area of each
facet is at most O(M^(d−1)). Moreover, since every coordinate is an integer, V is an integer
multiple of 1/d!, and the projected area of each facet is an integer multiple of 1/(d − 1)!.
We conclude that

ht(σ) = dV/A = (d! V)/((d − 1)! A),

where the integers d! V and (d − 1)! A are both O(M^d).
(b) Without loss of generality, suppose the origin is one of the points in P. We can
express σ by the inequality 0 ≤ u · x ≤ 1 for some vector u normal to σ. For every
point p ∈ P we have either u · p = 0 or u · p = 1. We can compute u by solving a
system of d linear equations (one for each point in P except the origin) in d variables.
By Cramer's rule, we have u = (a1/b, a2/b, . . . , ad/b), where the numerators ai and
the common denominator b are each determinants of d × d integer matrices. Since each
element of these matrices has absolute value O(M), we have ai = O(M^d) for each i
and b = O(M^d). Finally, we observe that wd(σ) = 1/‖u‖.
(c) This follows immediately from part (a).
(d) This follows immediately from part (b).
(e) Take n(σ) = b u = (a1, a2, . . . , ad), where u is the normal vector constructed in
part (b).
(f) This follows from part (e) and the identity ht(σ) = wd(σ) ‖n‖/|nd|, where n is
any vector normal to σ and nd is its vertical component.
5.1. Exact Height
We now establish our (relative) lower bounds for computing minimum-height slabs
exactly.
Theorem 5.2. Conjecture 1.1 implies that computing the shortest slab containing d + 1
points from a given set of n points in Z^d requires Ω(n^d) time in the worst case.
Proof. The given points are affinely degenerate if and only if the shortest slab containing
d + 1 points has height zero.
Theorem 5.3. Conjecture 1.1 implies that computing the shortest slab containing n/2
points from a given set of n points in Z^d requires Ω(n^d) time in the worst case.
Proof. Suppose we are given a set P of m = n/2 − d − 1 points in Z^d. Let Δd denote
the difference between the largest and smallest xd-coordinates of any point in P. In
O(n) time we can construct a new set Q of 2m + 2(d + 1) = n points by taking the
union of P, a copy of P shifted upward by 2Δd, and a set of n − 2m = 2(d + 1) extra
points at least 5Δd above everything else. The original set P contains d + 1 points on
a common hyperplane if and only if the shortest slab that contains n/2 points in Q has
height exactly 2Δd.
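The O(n)-time construction in this proof is easy to make concrete (a sketch with our own naming; points are d-tuples of integers whose last coordinate is x_d):

```python
def build_Q(P):
    """Theorem 5.3 construction: from a set P of m points, return Q = P,
    plus a copy of P shifted up by 2*Delta, plus 2(d+1) extra points at
    least 5*Delta above everything else, where Delta is the x_d-extent
    of P.  Runs in O(n) time."""
    d = len(P[0])
    zs = [p[-1] for p in P]
    delta = max(zs) - min(zs)

    def lift(p, t):
        return p[:-1] + (p[-1] + t,)

    top = max(zs) + 2 * delta  # highest x_d used by the two copies
    extras = [lift(P[0], top - P[0][-1] + 5 * delta + 1 + i)
              for i in range(2 * (d + 1))]
    return P + [lift(p, 2 * delta) for p in P] + extras
```

The output has exactly 2m + 2(d + 1) points, and every extra point sits at least 5Δd above both copies of P, as the proof requires.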
This reduction can be generalized easily to either larger or smaller numbers of points
in the target slab, as follows:
Theorem 5.4. Conjecture 1.1 implies that computing the shortest slab containing k
points from a given set of n points in Z^d requires Ω(min{k, n − k}^d) time in the worst
case.
Proof. If k ≤ (n/2) + d + 1, we start with a set P of m = k − (d + 1) points. We
construct a new set Q containing two copies of P , one directly above the other, with
n − 2m extra points far above both copies.
Similarly, if k > (n/2) + d + 1, we start with a set P of m = n − k + d + 1 points.
We construct a new set Q that contains two copies of P , one directly above the other,
with n − 2m extra points directly between the two copies.
In both cases the shortest slab containing k points of Q has height equal to the vertical
distance between the two copies of P if and only if P is affinely degenerate. In the first
case the extra points lie above the shortest slab; in the second case the extra points are
inside the shortest slab. Thus, Conjecture 1.1 implies that computing the shortest slab
containing k points in Q requires Ω(m^d) time.
Theorem 5.5. Conjecture 1.1 implies that computing the shortest slab containing k
points from a given set of n points in Z^d requires Ω((n/k)^d) time in the worst case.
Proof. Suppose we are given a set P of m = n(d + 1)/k points in Z^d. In linear
time we compute an upper bound M on the absolute value of every coordinate. Let
δ = α(d + 1)/(kM^d) for some sufficiently small constant α > 0, possibly depending on
d. We construct a new set P′ consisting of k/(d + 1) copies of P, where the ith copy is
shifted upward a distance of iδ.

If some hyperplane h contains d + 1 points in P, then the slab σ(h, (k/(d + 1) − 1)δ)
contains k points in P′, and thus ht(P′, k) ≤ (k/(d + 1) − 1)δ ≤ α/M^d.
On the other hand, suppose P is affinely non-degenerate. Let σ be any slab containing
k points in P′. This slab contains copies of at least d + 1 points in P, so its height is at
least ht(P, d + 1) − (k/(d + 1) − 1)δ = Ω(1/M^d) by Lemma 5.1(c). By choosing α
small enough, we guarantee that ht(P′, k) > α/M^d.
These reductions suggest Ω(n^d) lower bounds for exact LMS for the most interesting
case, when both k and n − k are Θ(n), as well as the simplest non-trivial case k = O(1).
We conjecture that the true complexity is Ω((n − k)^d) for any k, but there is still a gap
between the upper and lower bounds for most small values of k.
Since the affine degeneracy problem is NP-hard when the dimension d is not fixed
[K], [E3], our reductions imply a similar NP-hardness result for the LMS estimator.
Corollary 5.6. For any k ≤ n − Ω(n^c) for some constant c, computing the shortest
slab containing k points from a given set of n points in Z^n is NP-hard.
5.2. Approximate Height
Theorem 5.7. Conjecture 1.1 implies that for any function f(n, k) ≥ 1 computable
in constant time, computing a slab of height at most f(n, k) · ht(P, k) containing k points
from a given set P of n points in Z^d requires Ω((n − k)^(d−1)) time in the worst case.
Proof. Suppose we are given a set P of m = n/2 − k/2 − d points on the integer
lattice Z^(d−1). Let M be an upper bound on the maximum absolute value of any coordinate
in P, and let δ = α/M^(d−1) for a sufficiently small constant α > 0, possibly depending
on d. We can compute these values in O(m) time.
In O(n) time we construct a new set Q comprised of three subsets: (1) a copy of P on the vertical hyperplane x1 = 1, (2) a set of k − 2d points placed on a vertical line passing through the origin and within distance δ/f(n, k) of the origin, and (3) a copy of −P (the reflection of P through the origin) on the hyperplane x1 = −1. For any non-vertical slab σ, let σx denote the intersection of σ with the hyperplane x1 = x; this is a (d − 1)-dimensional slab with the same height as σ.
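The three-part construction of Q can be sketched as follows. This is an illustrative Python rendering only; in particular, the exact placement of the middle cluster along the vertical axis is one of many valid choices consistent with the proof.

```python
def build_Q(P, k, delta, f):
    """Assemble Q from P, a list of points in Z^{d-1} given as tuples:
      (1) a copy of P on the vertical hyperplane x1 = 1,
      (2) k - 2d points on the vertical axis within delta/f of the origin,
      (3) the reflection -P on the hyperplane x1 = -1.
    Assumes k > 2d so the middle cluster is nonempty.
    """
    d = len(P[0]) + 1                 # ambient dimension
    eps = delta / f
    right = [(1,) + p for p in P]
    left = [(-1,) + tuple(-c for c in p) for p in P]
    m = k - 2 * d
    # spread the middle points along the last (vertical) coordinate,
    # all strictly within distance eps of the origin
    middle = [(0,) * (d - 1) + (eps * j / m,) for j in range(m)]
    return right + middle + left
```

Any slab covering k points of Q must pick up at least d points from one of the two outer copies, which is what lets the proof apply Lemma 5.1(c) in dimension d − 1.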
If any d points of P lie on a common (d − 2)-flat, then there is a slab of height at most δ/f(n, k) containing k points of Q. Otherwise, let σ be any slab containing k points of Q. Without loss of generality, σ1 contains at least d points of P, so by applying Lemma 5.1(c) to P (in dimension d − 1), we have ht(σ) = ht(σ1) = Ω(1/M^{d−1}). We can choose the constant α so that the parameter δ is strictly smaller than this lower bound. Thus, by approximating ht(Q, k) within a factor of f(n, k), we can determine whether the original set P contains a degeneracy. Conjecture 1.1 implies that this requires Ω(m^{d−1}) = Ω((n − k)^{d−1}) time in the worst case.
The proof of Theorem 5.5 implies an even stronger hardness result when k is small.
Theorem 5.8. Conjecture 1.1 implies that for any function f(n, k) ≥ 1 computable in constant time, computing a slab of height at most f(n, k) · ht(P, k) containing k points from a given set P of n points in Z^d requires Ω((n/k)^d) time in the worst case.
Proof. Suppose we are given a set P of m = n(d+1)/k points in the finite integer lattice [−M .. M]^d. Let δ = α(d+1)/(kM^d) for some sufficiently small constant α > 0, possibly depending on d. We construct a new set P′ consisting of k/(d+1) copies of P, where the ith copy is shifted upward a distance of iδ/f(n, k).
The proof of Theorem 5.5 implies that if P is affinely degenerate, then ht(P′, k) ≤ α/(M^d f(n, k)), and if P is non-degenerate, then ht(P′, k) > α/M^d. Thus, by approximating ht(P′, k) within a factor of f(n, k), we can determine whether P is degenerate. Conjecture 1.1 implies that this requires Ω(m^d) = Ω((n/k)^d) time in the worst case.
Our reductions strongly suggest that our approximation algorithm is within a polylogarithmic factor of optimal, at least when k = Θ(n) or k = O(1). They also imply that the approximate LMS problem is NP-hard when the dimension is not fixed.
Corollary 5.9. For any function f(n, k) ≥ 1 computable in polynomial time, and any k ≤ n − Ω(n^c) for some constant c, computing an f(n, k)-approximation of the shortest slab containing k points from a given set of n points in Z^n is NP-hard.
Finally, we describe a general reduction from computing slabs of minimum height to computing slabs of minimum width. This reduction implies that all our lower bounds for minimizing height apply verbatim to the corresponding width problem. The key observation is that horizontally scaling R^d does not change the height of any slab, although it does change the width. If we scale any point set P far enough, then sorting the non-vertical minimal slabs by width is the same as sorting them by height; in particular, the narrowest non-vertical slab containing k points of P will also be the shortest slab containing k points of P. There are two main technical difficulties: quantifying the amount of scaling required and eliminating vertical slabs from consideration.
Suppose we want to find the shortest slab containing k ≥ d + 1 points from a given set P of n points on the integer lattice [−M .. M]^d. If M is not given, we can easily compute it in O(n) time. Let P′ be the set obtained by scaling P horizontally (that is, in every direction except vertically) by a large integer factor λ := Θ(M^{6d}). Scaling any slab σ horizontally by λ gives us a slab σ′ with the same height, containing the corresponding subset of points.
Fix a minimal non-vertical slab σ containing at least d + 1 points of P. Let n be the integer normal vector of σ described by Lemma 5.1(e). To obtain a normal vector n′ for the scaled slab σ′, we can simply scale n in the vertical direction by a factor of λ. We can decompose n′ into a vertical component n′v and a horizontal component n′h. Lemma 5.1(e) implies that ‖n′h‖ = O(M^d), and since σ is not vertical, n′v ≥ λ = Ω(M^{6d}). We have the following bound on the width of σ′ in terms of its height:

    ht(σ′) ≥ wd(σ′) = ht(σ′) · n′v/‖n′‖ ≥ ht(σ′) · (1 − O(1/M^{10d})) ≥ ht(σ′) − O(1/M^{9d}).
Lemma 5.1(a) implies that the heights of any two minimal slabs σ1 and σ2 either are equal or differ by at least Ω(1/M^{2d}). It follows that ht(σ1) < ht(σ2) implies wd(σ′1) < wd(σ′2); the height order and the width order of the non-vertical minimal slabs are the same, except that some equal-height pairs may not have equal width. In particular, the narrowest non-vertical slab containing k points of P′ is also the shortest such slab. The entire reduction requires only linear time, and increases the bit length of the input by at most a factor of O(d).
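The scaling step itself is trivial to implement. A minimal sketch in Python, with the scale factor passed in as a parameter rather than fixed to Θ(M^{6d}):

```python
def scale_horizontally(P, factor):
    """Scale every coordinate except the last (vertical) one by `factor`.

    This preserves the height of every slab while making the width of each
    non-vertical slab approach its height, so for a large enough factor,
    sorting minimal slabs by width agrees with sorting them by height.
    """
    return [tuple(c * factor for c in p[:-1]) + (p[-1],) for p in P]
```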
This completes the reduction from finding the shortest slab containing k of n given points to finding the narrowest such slab that is not vertical, but what about vertical slabs? Lemma 5.1(d) implies that the narrowest vertical slab σv containing k points of P′ is either a single hyperplane or has width Ω(λ/M^{d−1}) = Ω(M^{5d}). Thus, if no vertical hyperplane contains k points of P, the narrowest slab containing k points of P′ is not vertical, since the entire point set fits in a slab of width 2M, so our reduction is complete. However, if k points of P lie on a vertical hyperplane, the shortest and narrowest slabs containing k points of P′ may not coincide.
To avoid this problem, we first perturb the initial set P, essentially following the infinitesimal perturbation method of Emiris and Canny [EC], [ECS]. Let δ = 1/M^{4d}. For any point p ∈ P, let p̃ denote a point at distance at most δ from p, and let P̃ = {p̃ | p ∈ P}. For any minimal slab σ containing some subset of P, we define a corresponding slab σ̃ that is minimal for the corresponding subset of P̃.
Lemma 5.1(f) implies that ht(σ) ≤ O(M^d) · wd(σ), so ht(σ̃) ≤ ht(σ) + O(1/M^{3d}), and Lemma 5.1(a) implies that any two minimal slabs either have equal height or differ in height by at least Ω(1/M^{2d}). It follows that if σ1 is shorter than σ2, then σ̃1 is shorter than σ̃2. In other words, the shortest slab containing k points of P̃ has the same combinatorial description as some shortest slab containing k points of P.
Arbitrarily index the points in P as p1, p2, . . . , pn, and let q be the smallest prime number larger than n (and therefore less than 2n). We choose the specific perturbation p̃i = pi + δ μq(i), where μq is the modular moment curve

    μq(t) := (1/q) (t, t² mod q, t³ mod q, . . . , t^d mod q).
We can express the volume of any simplex in P̃ as a polynomial in δ. Lemma 5.1 implies that the sign of this polynomial is determined by the sign of its largest term. Moreover, the coefficient of the δ^d term is the volume of a simplex whose vertices are integer points on the modular moment curve, and is therefore nonzero. We conclude that no d + 1 points of P̃ lie on a common hyperplane; in particular, no k points lie on a vertical hyperplane.
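A sketch of this perturbation in Python, using exact rational arithmetic so that the degeneracy-breaking argument is not spoiled by rounding. The trial-division prime search is an illustrative choice, and since every index t satisfies t ≤ n < q, the first coordinate t of μq(t) coincides with t mod q, which the code exploits:

```python
from fractions import Fraction

def next_prime(n):
    """Smallest prime larger than n (by Bertrand's postulate, less than 2n)."""
    def is_prime(m):
        return m >= 2 and all(m % i for i in range(2, int(m ** 0.5) + 1))
    m = n + 1
    while not is_prime(m):
        m += 1
    return m

def perturb(P, delta):
    """Replace p_i by p_i + delta * mu_q(i), where mu_q is the modular
    moment curve mu_q(t) = (1/q)(t, t^2 mod q, ..., t^d mod q)."""
    d = len(P[0])
    q = next_prime(len(P))
    def mu(t):
        return [Fraction(pow(t, j, q), q) for j in range(1, d + 1)]
    return [tuple(c + delta * m for c, m in zip(p, mu(i)))
            for i, p in enumerate(P, start=1)]
```

Passing `delta` as a `Fraction` (e.g. `Fraction(1, M ** (4 * d))`) keeps every perturbed coordinate rational, so degeneracy tests on P̃ reduce to exact sign computations.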
Scaling the set P̃ by a factor of q/δ gives us an integer point set in which every coordinate has absolute value at most O(M^{5d}). Thus, to find the shortest slab containing k points of P, we can apply our earlier reduction to P̃. The entire reduction requires only linear time and increases the bit length of the input by at most a factor of O(d²).
Theorem 5.10. Conjecture 1.1 implies that computing the narrowest slab containing k points from a given set of n points in Z^d requires Ω(min{k, n − k}^d) and Ω((n/k)^d) time in the worst case.
Theorem 5.11. Conjecture 1.1 implies that computing a slab of width at most 2 wd(P, k) containing k points from a given set P of n points in Z^d requires Ω((n − k)^{d−1}) and Ω((n/k)^d) time in the worst case.
Corollary 5.12. For any k ≤ n − Ω(n^c) for some constant c, computing a 2-approximation of the thinnest slab containing k points from a given set of n points in Z^n is NP-hard.
Acknowledgments
We thank Timothy Chan for suggesting the use of his randomized optimization technique.
We also thank the anonymous referees for making a number of suggestions for improving
the presentation.
References
[B] M. Ben-Or. Lower bounds for algebraic computation trees. In Proc. 15th Annu. ACM Sympos. Theory Comput., pages 80-86, 1983.
[BGH] J. Basch, L. J. Guibas, and J. Hershberger. Data structures for mobile data. J. Algorithms, 31(1):1-28, 1999.
[C1] T. M. Chan. Geometric applications of a randomized optimization technique. Discrete Comput. Geom., 22:547-567, 1999.
[C2] T. M. Chan. Low-dimensional linear programming with violations. In Proc. 43rd Annu. IEEE Sympos. Found. Comput. Sci., pages 570-579, 2002.
[C3] T. M. Chan. Personal communication, February 2004.
[C4] B. Chazelle. The Discrepancy Method: Randomness and Complexity. Cambridge University Press, New York, 2001.
[DMN] M. B. Dillencourt, D. M. Mount, and N. S. Netanyahu. A randomized algorithm for slope selection. Internat. J. Comput. Geom. Appl., 2:1-27, 1992.
[E1] J. Erickson. On the relative complexities of some geometric problems. In Proc. 7th Canad. Conf. Comput. Geom., pages 85-90, 1995.
[E2] J. Erickson. Lower bounds for linear satisfiability problems. Chicago J. Theoret. Comput. Sci., 1999(8), 1999.
[E3] J. Erickson. New lower bounds for convex hull problems in odd dimensions. SIAM J. Comput., 28:1198-1214, 1999.
[EC] I. Z. Emiris and J. F. Canny. A general approach to removing degeneracies. SIAM J. Comput., 24:650-664, 1995.
[ECS] I. Z. Emiris, J. F. Canny, and R. Seidel. Efficient perturbations for handling geometric degeneracies. Algorithmica, 19(1-2):219-242, September 1997.
[ES1] H. Edelsbrunner and D. L. Souvaine. Computing median-of-squares regression lines and guided topological sweep. J. Amer. Statist. Assoc., 85:115-119, 1990.
[ES2] J. Erickson and R. Seidel. Better lower bounds on detecting affine and spherical degeneracies. Discrete Comput. Geom., 13:41-57, 1995.
[GO] A. Gajentaan and M. H. Overmars. On a class of O(n²) problems in computational geometry. Comput. Geom. Theory Appl., 5:165-185, 1995.
[H] S. Har-Peled. No coreset, no cry. In Proc. 24th Conf. Found. Soft. Tech. Theoret. Comput. Sci., pages 324-335, 2004.
[HW] S. Har-Peled and Y. Wang. Shape fitting with outliers. SIAM J. Comput., 33(2):269-285, 2004.
[K] L. Khachiyan. On the complexity of approximating determinants in matrices. J. Complexity, 11:138-153, 1995.
[M1] J. Matoušek. Randomized optimal algorithm for slope selection. Inform. Process. Lett., 39:183-187, 1991.
[M2] J. Matoušek. On geometric optimization with few violated constraints. Discrete Comput. Geom., 14:365-384, 1995.
[M3] K. Mulmuley. Computational Geometry: An Introduction through Randomized Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1994.
[MMRK] P. Meer, D. Mintz, A. Rosenfeld, and D. Y. Kim. Robust regression methods for computer vision: a review. Internat. J. Comput. Vision, 6:59-70, 1991.
[MNR+] D. M. Mount, N. S. Netanyahu, K. R. Romanik, R. Silverman, and A. Y. Yu. A practical approximation algorithm for the LMS line estimator. In Proc. 8th ACM-SIAM Sympos. Discrete Algorithms, pages 473-482, January 1997.
[O] C. F. Olson. An approximation algorithm for least median of squares regression. Inform. Process. Lett., 63:237-241, 1997.
[R1] P. J. Rousseeuw. Least median of squares regression. J. Amer. Statist. Assoc., 79(388):871-880, December 1984.
[R2] P. J. Rousseeuw. Introduction to positive breakdown methods. In G. S. Maddala and C. R. Rao, editors, Handbook of Statistics, Vol. 15: Robust Inference, pages 101-121. Elsevier, Amsterdam, 1997.
[RL] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. Wiley, New York, 1987.
[Y1] A. C. Yao. Lower bounds for algebraic computation trees with integer inputs. SIAM J. Comput., 20(4):655-668, 1991.
[Y2] V. J. Yohai. High breakdown-point and high efficiency robust estimates for regression. Ann. Statist., 15:642-656, 1987.