Orthogonal Range Searching in Moderate Dimensions: k-d Trees and Range Trees Strike Back∗†
Timothy M. Chan
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana,
IL, USA
Abstract
We revisit the orthogonal range searching problem and the exact $\ell_\infty$ nearest neighbor searching
problem for a static set of n points when the dimension d is moderately large. We give the first
data structure with near linear space that achieves truly sublinear query time when the dimension
is any constant multiple of $\log n$. Specifically, the preprocessing time and space are $O(n^{1+\delta})$ for
any constant $\delta > 0$, and the expected query time is $n^{1-1/O(c\log c)}$ for $d = c\log n$. The data
structure is simple and is based on a new “augmented, randomized, lopsided” variant of k-d trees.
It matches (in fact, slightly improves) the performance of previous combinatorial algorithms that
work only in the case of offline queries [Impagliazzo, Lovett, Paturi, and Schneider (2014) and
Chan (SODA’15)]. It leads to slightly faster combinatorial algorithms for all-pairs shortest paths
in general real-weighted graphs and rectangular Boolean matrix multiplication.
In the offline case, we show that the problem can be reduced to the Boolean orthogonal vectors
problem and thus admits an $n^{2-1/O(\log c)}$-time non-combinatorial algorithm [Abboud, Williams,
and Yu (SODA’15)]. This reduction is also simple and is based on range trees.
Finally, we use a similar approach to obtain a small improvement to Indyk’s data structure
[FOCS’98] for approximate $\ell_\infty$ nearest neighbor search when $d = c\log n$.
1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems
Keywords and phrases computational geometry, data structures, range searching, nearest neighbor searching
Digital Object Identifier 10.4230/LIPIcs.SoCG.2017.27
1 Introduction
In this paper, we revisit some classical problems in computational geometry:
• In orthogonal range searching, we want to preprocess n data points in $\mathbb{R}^d$ so that we can
detect if there is a data point inside any query axis-aligned box, or report or count all
such points.
• In dominance range searching, we are interested in the special case when the query box is
d-sided, of the form $(-\infty, q_1] \times \cdots \times (-\infty, q_d]$; in other words, we want to detect if there
is a data point $(p_1, \ldots, p_d)$ that is dominated by a query point $(q_1, \ldots, q_d)$, in the sense
that $p_j \le q_j$ for all $j \in \{1, \ldots, d\}$, or report or count all such points.
• In $\ell_\infty$ nearest neighbor searching, we want to preprocess n data points in $\mathbb{R}^d$ so that we
can find the nearest neighbor to the given query point under the $\ell_\infty$ metric.
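The three problems above can be pinned down with brute-force reference implementations. The following is a minimal sketch for illustration only: the function names are hypothetical, and each query simply scans all n points in O(dn) time, which is the baseline that the data structures in this paper aim to beat.

```python
from typing import List, Sequence

Point = Sequence[float]


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True iff p lies in the axis-aligned box [lo_1, hi_1] x ... x [lo_d, hi_d]."""
    return all(a <= x <= b for x, a, b in zip(p, lo, hi))


def range_report(points: List[Point], lo: Point, hi: Point) -> List[Point]:
    """Orthogonal range reporting: all data points inside the query box."""
    return [p for p in points if in_box(p, lo, hi)]


def dominated(p: Point, q: Point) -> bool:
    """True iff p_j <= q_j for every coordinate j."""
    return all(pj <= qj for pj, qj in zip(p, q))


def dominance_detect(points: List[Point], q: Point) -> bool:
    """Dominance range detection: is some data point dominated by q?"""
    return any(dominated(p, q) for p in points)


def linf_dist(p: Point, q: Point) -> float:
    """Distance between p and q under the l_infinity metric."""
    return max(abs(a - b) for a, b in zip(p, q))


def linf_nearest(points: List[Point], q: Point) -> Point:
    """Exact l_infinity nearest neighbor of q by linear scan."""
    return min(points, key=lambda p: linf_dist(p, q))
```

Counting and detection for orthogonal ranges follow by taking the length of the reported list or testing whether it is empty.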
∗ A full version of the paper is available at http://tmc.web.engr.illinois.edu/high_ors3_17.pdf.
† This work was done while the author was at the Cheriton School of Computer Science, University of Waterloo.
© Timothy M. Chan;
licensed under Creative Commons License CC-BY
33rd International Symposium on Computational Geometry (SoCG 2017).
Editors: Boris Aronov and Matthew J. Katz; Article No. 27; pp. 27:1–27:15
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
All three problems are related. Orthogonal range searching in d dimensions reduces to
dominance range searching in 2d dimensions.¹ Furthermore, ignoring logarithmic factors,
$\ell_\infty$ nearest neighbor searching reduces to its decision problem (deciding whether the $\ell_\infty$
nearest neighbor distance to a given query point is at most a given radius) by parametric
search or randomized search [7], and the decision problem clearly reduces to orthogonal range
searching.
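The two reductions just described can be sketched in a few lines. This is an illustrative sketch with hypothetical function names; the coordinate map follows footnote 1, and the decision version of $\ell_\infty$ nearest neighbor search is expressed as a box query of half-width r centered at the query point. The dominance test is done here by linear scan purely to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def to_dominance_point(p: Point) -> Tuple[float, ...]:
    """Map a data point (p_1..p_d) to (-p_1, p_1, ..., -p_d, p_d) in 2d dimensions."""
    out: List[float] = []
    for x in p:
        out.extend((-x, x))
    return tuple(out)


def to_dominance_query(lo: Point, hi: Point) -> Tuple[float, ...]:
    """Map a box [a_1,b_1] x ... x [a_d,b_d] to the point (-a_1, b_1, ..., -a_d, b_d)."""
    out: List[float] = []
    for a, b in zip(lo, hi):
        out.extend((-a, b))
    return tuple(out)


def dominated(p: Point, q: Point) -> bool:
    """True iff p_j <= q_j for every coordinate j."""
    return all(pj <= qj for pj, qj in zip(p, q))


def in_box_via_dominance(p: Point, lo: Point, hi: Point) -> bool:
    """Box membership, answered through the 2d-dimensional dominance test."""
    return dominated(to_dominance_point(p), to_dominance_query(lo, hi))


def linf_within(points: List[Point], q: Point, r: float) -> bool:
    """Decision problem: is some data point within l_infinity distance r of q?

    Equivalent to asking whether the box [q_j - r, q_j + r] (for all j) is nonempty.
    """
    lo = [x - r for x in q]
    hi = [x + r for x in q]
    return any(in_box_via_dominance(p, lo, hi) for p in points)
```

The map is correct because $a_j \le p_j \le b_j$ holds exactly when $-p_j \le -a_j$ and $p_j \le b_j$.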
The standard k-d tree [22] has $O(dn\log n)$ preprocessing time and $O(dn)$ space, but
the worst-case query time is $O(dn^{1-1/d})$. The standard range tree [22] requires $O(n\log^d n)$
preprocessing time and space and $O(\log^d n)$ query time, excluding an $O(K)$ term for the
reporting version of the problem with output size K. Much work in computational geometry has been devoted to small improvements of a few logarithmic factors. For example, the current best result for orthogonal range reporting has $O(n\log^{d-3+\varepsilon} n)$ space and
$O(\log^{d-3} n/\log^{d-4}\log n + K)$ time [12]; there are also other small improvements for various
offline versions of the problems [12, 13, 2].
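For concreteness, a textbook k-d tree with orthogonal range reporting might look like the sketch below. This is the classical structure only, not the augmented, randomized, lopsided variant developed in this paper; the names `Node`, `build`, and `report` are hypothetical. For brevity it re-sorts the points at every level, so its preprocessing is slower than the $O(dn\log n)$ bound quoted above, which requires presorting or linear-time median selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point              # the median point stored at this node
    axis: int                 # coordinate used to split at this node
    left: Optional["Node"]    # points with coordinate <= point[axis]
    right: Optional["Node"]   # points with coordinate >= point[axis]


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(
        point=tuple(pts[mid]),
        axis=axis,
        left=build(pts[:mid], depth + 1),
        right=build(pts[mid + 1:], depth + 1),
    )


def report(node: Optional[Node], lo: Point, hi: Point, out: List[Point]) -> None:
    """Append to out every stored point inside the box [lo, hi]."""
    if node is None:
        return
    p, j = node.point, node.axis
    if all(a <= x <= b for x, a, b in zip(p, lo, hi)):
        out.append(p)
    # The left subtree only holds coordinates <= p[j]; skip it if the box starts later.
    if lo[j] <= p[j]:
        report(node.left, lo, hi, out)
    # The right subtree only holds coordinates >= p[j]; skip it if the box ends earlier.
    if p[j] <= hi[j]:
        report(node.right, lo, hi, out)
```

A query is issued as `out = []; report(build(points), lo, hi, out)`. The pruning tests are what keep the number of visited nodes sublinear in low dimensions.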
In this paper, we are concerned with the setting when the dimension is nonconstant. Traditional approaches from computational geometry tend to suffer from exponential dependencies
in d (the so-called “curse of dimensionality”). For example, the $O(dn^{1-1/d})$ or $O(\log^d n)$
query time bound for k-d trees or range trees, respectively, is sublinear only when $d \ll \log n/\log\log n$. By
a more careful analysis [10], one can show that range trees still have sublinear query time
when $d \ll \alpha_0 \log n$ for a sufficiently small constant $\alpha_0$. The case when the dimension is close
to logarithmic in n is interesting in view of known dimensionality reduction techniques [16]
(although such techniques technically are not applicable to exact problems and, even with
approximation, do not work well for $\ell_\infty$). The case of polylogarithmic dimensions is also
useful in certain non-geometric applications such as all-pairs shortest paths (as we explain
later). From a theoretical perspective, it is important to understand when the time complexity
transitions from sublinear to superlinear.
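As a rough sanity check on the $\log n/\log\log n$ threshold for the two classical bounds (a back-of-the-envelope calculation, not taken from the paper; logarithms are base 2 and constant factors are ignored), each bound can be compared against n directly:

```latex
\begin{align*}
\log^d n < n
  &\iff 2^{\,d \log\log n} < 2^{\log n}
   \iff d < \frac{\log n}{\log\log n}, \\
d\, n^{1-1/d} < n
  &\iff d < n^{1/d}
   \iff d \log d < \log n .
\end{align*}
```

For the second line, if $d \le \log n/\log\log n$ then $\log d \le \log\log n$, so $d\log d \le \log n$. Both bounds therefore stop being sublinear at roughly the same point, $d \approx \log n/\log\log n$.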
Previous offline results. We first consider the offline version of the problems where we
want to answer a batch of n queries all given in advance. In high dimensions, it is possible
to do better than $O(dn^2)$-time brute-force search, by a method of Matoušek [21] using fast
(rectangular) matrix multiplication [20]; for example, we can get $n^{2+o(1)}$ time for $d \ll n^{0.15}$.
However, this approach inherently cannot give subquadratic bounds.
In 2014, a surprising discovery was made by Impagliazzo et al. [17]: range-tree-like
divide-and-conquer can still work well even when the dimension goes a bit above logarithmic.
Their algorithm can answer n offline dominance range queries (and thus orthogonal range
queries and $\ell_\infty$ nearest neighbor queries) in total time $n^{2-1/O(c^{15}\log c)}$ (ignoring an $O(K)$
term for reporting) in dimension $d = c\log n$ for any possibly nonconstant c ranging from 1 to
about $\log^{1/15} n$ (ignoring $\log\log n$ factors). Shortly after, by a more careful analysis of the
same algorithm, Chan [8] refined the time bound to $n^{2-1/O(c\log^2 c)}$, which is subquadratic
for c up to about $\log n$, i.e., dimension up to about $\log^2 n$.
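To see where such ranges of c come from, here is a rough calculation (not from the paper; base-2 logarithms, constants ignored) for a generic bound of the form $n^{2-1/f(c)}$:

```latex
\begin{align*}
n^{2-1/f(c)} \;=\; \frac{n^2}{n^{1/f(c)}} \;=\; \frac{n^2}{2^{\log n / f(c)}},
\qquad\text{so the saving over } n^2 \text{ is superconstant} \iff f(c) = o(\log n). \\[4pt]
f(c) = c^{15}\log c:\quad c \ll \left(\frac{\log n}{\log\log n}\right)^{1/15},
\qquad
f(c) = c\log^2 c:\quad c \ll \frac{\log n}{(\log\log n)^2}.
\end{align*}
```

Ignoring the $\log\log n$ factors, these are the thresholds of about $\log^{1/15} n$ and about $\log n$, respectively; the latter corresponds to dimension $d = c\log n$ up to about $\log^2 n$.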
At SODA’15, Abboud, Williams, and Yu [1] obtained an even better time bound for
dominance range detection in the Boolean special case, where all coordinate values are 0’s and
1’s (in this case, the problem is better known as the Boolean orthogonal vectors problem²).
¹ $(p_1, \ldots, p_d)$ is inside the box $[a_1, b_1] \times \cdots \times [a_d, b_d]$ iff $(-p_1, p_1, \ldots, -p_d, p_d)$ is dominated by
$(-a_1, b_1, \ldots, -a_d, b_d)$ in $\mathbb{R}^{2d}$.
Pd
2
Two vectors (p1 , . . . (...truncated)