Orthogonal Range Searching in Moderate Dimensions: k-d Trees and Range Trees Strike Back

Jun 2017

We revisit the orthogonal range searching problem and the exact l_infinity nearest neighbor searching problem for a static set of n points when the dimension d is moderately large. We give the first data structure with near linear space that achieves truly sublinear query time when the dimension is any constant multiple of log n. Specifically, the preprocessing time and space are O(n^{1+delta}) for any constant delta>0, and the expected query time is n^{1-1/O(c log c)} for d = c log n. The data structure is simple and is based on a new "augmented, randomized, lopsided" variant of k-d trees. It matches (in fact, slightly improves) the performance of previous combinatorial algorithms that work only in the case of offline queries [Impagliazzo, Lovett, Paturi, and Schneider (2014) and Chan (SODA'15)]. It leads to slightly faster combinatorial algorithms for all-pairs shortest paths in general real-weighted graphs and rectangular Boolean matrix multiplication. In the offline case, we show that the problem can be reduced to the Boolean orthogonal vectors problem and thus admits an n^{2-1/O(log c)}-time non-combinatorial algorithm [Abboud, Williams, and Yu (SODA'15)]. This reduction is also simple and is based on range trees. Finally, we use a similar approach to obtain a small improvement to Indyk's data structure [FOCS'98] for approximate l_infinity nearest neighbor search when d = c log n.

Timothy M. Chan
Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems

Keywords and phrases: computational geometry, data structures, range searching, nearest neighbor searching

Digital Object Identifier: 10.4230/LIPIcs.SoCG.2017.27

A full version of the paper is available at http://tmc.web.engr.illinois.edu/high_ors3_17.pdf. This work was done while the author was at the Cheriton School of Computer Science, University of Waterloo.

© Timothy M. Chan; licensed under Creative Commons License CC-BY. 33rd International Symposium on Computational Geometry (SoCG 2017). Editors: Boris Aronov and Matthew J. Katz; Article No. 27; pp. 27:1–27:15. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

1 Introduction

In this paper, we revisit some classical problems in computational geometry:

In orthogonal range searching, we want to preprocess n data points in R^d so that we can detect if there is a data point inside any query axis-aligned box, or report or count all such points.

In dominance range searching, we are interested in the special case when the query box is d-sided, of the form (−∞, q_1] × ··· × (−∞, q_d]; in other words, we want to detect if there is a data point (p_1, ..., p_d) that is dominated by a query point (q_1, ..., q_d), in the sense that p_j ≤ q_j for all j ∈ {1, ..., d}, or report or count all such points.

In ℓ_∞ nearest neighbor searching, we want to preprocess n data points in R^d so that we can find the nearest neighbor to the given query point under the ℓ_∞ metric.

All three problems are related.
Orthogonal range searching in d dimensions reduces to dominance range searching in 2d dimensions (footnote 1). Furthermore, ignoring logarithmic factors, ℓ_∞ nearest neighbor searching reduces to its decision problem (deciding whether the ℓ_∞ nearest neighbor distance to a given query point is at most a given radius) by parametric search or randomized search [7], and the decision problem clearly reduces to orthogonal range searching.

The standard k-d tree [22] has O(dn log n) preprocessing time and O(dn) space, but the worst-case query time is O(dn^{1-1/d}). The standard range tree [22] requires O(n log^d n) preprocessing time and space and O(log^d n) query time, excluding an O(K) term for the reporting version of the problem with output size K. Much work in computational geometry has been devoted to small improvements of a few logarithmic factors. For example, the current best result for orthogonal range reporting has O(n log^{d-3+ε} n) space and O(log^{d-3} n / log^{d-4} log n + K) time [12]; there are also other small improvements for various offline versions of the problems [12, 13, 2].

In this paper, we are concerned with the setting when the dimension is nonconstant. Traditional approaches from computational geometry tend to suffer from exponential dependencies in d (the so-called "curse of dimensionality"). For example, the O(dn^{1-1/d}) or O(log^d n) query time bound for k-d trees or range trees, respectively, is sublinear only when d ≪ log n / log log n. By a more careful analysis [10], one can show that range trees still have sublinear query time when d ≪ α_0 log n for a sufficiently small constant α_0. The case when the dimension is close to logarithmic in n is interesting in view of known dimensionality reduction techniques [16] (although such techniques technically are not applicable to exact problems and, even with approximation, do not work well for ℓ_∞).
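
For orientation, the standard k-d tree mentioned above can be sketched as follows. This is a minimal textbook-style Python sketch under stated assumptions, not the paper's "augmented, randomized, lopsided" variant: the names `Node`, `build`, and `range_report` are illustrative, and the build step simply re-sorts at every level (roughly O(n log^2 n) comparisons) rather than meeting the O(dn log n) preprocessing bound quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point              # point stored at this node (the median along `axis`)
    axis: int                 # coordinate index used to split at this node
    left: Optional["Node"]    # subtree of points with coordinate <= point[axis]
    right: Optional["Node"]   # subtree of points with coordinate >= point[axis]


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by median splits, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])  # simple, not optimal
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def range_report(node: Optional[Node], lo: Point, hi: Point) -> List[Point]:
    """Report all stored points p with lo[j] <= p[j] <= hi[j] for every j."""
    out: List[Point] = []

    def visit(v: Optional[Node]) -> None:
        if v is None:
            return
        if all(l <= x <= h for l, x, h in zip(lo, v.point, hi)):
            out.append(v.point)
        # Descend only into subtrees whose side of the split can meet the box.
        if lo[v.axis] <= v.point[v.axis]:
            visit(v.left)
        if v.point[v.axis] <= hi[v.axis]:
            visit(v.right)

    visit(node)
    return out
```

A detection or counting query would follow the same traversal, stopping at the first hit or incrementing a counter instead of appending to the list.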
The case of polylogarithmic dimensions is also useful in certain non-geometric applications such as all-pairs shortest paths (as we explain later). From a theoretical perspective, it is important to understand when the time complexity transitions from sublinear to superlinear.

Previous offline results. We first consider the offline version of the problems where we want to answer a batch of n queries all given in advance. In high dimensions, it is possible to do better than O(dn^2)-time brute-force search, by a method of Matoušek [21] using fast (rectangular) matrix multiplication [20]; for example, we can get n^{2+o(1)} time for d ≪ n^{0.15}. However, this approach inherently cannot give subquadratic bounds.

In 2014, a surprising discovery was made by Impagliazzo et al. [17]: range-tree-like divide-and-conquer can still work well even when the dimension goes a bit above logarithmic. Their algorithm can answer n offline dominance range queries (and thus orthogonal range queries and ℓ_∞ nearest neighbor queries) in total time n^{2-1/O(c^{15} log c)} (ignoring an O(K) term for reporting) in dimension d = c log n for any possibly nonconstant c ranging from 1 to about log^{1/15} n (ignoring log log n factors). Shortly after, by a more careful analysis of the same algorithm, Chan [8] refined the time bound to n^{2-1/O(c log^2 c)}, which is subquadratic for c up to about log n, i.e., dimension up to about log^2 n.

At SODA'15, Abboud, Williams, and Yu [1] obtained an even better time bound for dominance range detection in the Boolean special case, where all coordinate values are 0's and 1's (in this case, the problem is better known as the Boolean orthogonal vectors problem (footnote 2)).

Footnote 1: (p_1, ..., p_d) is inside the box [a_1, b_1] × ··· × [a_d, b_d] iff (−p_1, p_1, ..., −p_d, p_d) is dominated by (−a_1, b_1, ..., −a_d, b_d) in R^{2d}.

Footnote 2: Two vectors (p_1, ... (...truncated)
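
The two reductions stated in the introduction (a box query in d dimensions as a dominance query in 2d dimensions, following the mapping in footnote 1, and the ℓ_∞ nearest neighbor decision problem as a box query) can be illustrated with a small Python sketch. The function names `dominated`, `lift_point`, `lift_box`, `box_nonempty`, and `linf_within` are hypothetical, and the linear scan is only a stand-in for whatever dominance data structure would be used in practice.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def dominated(p: Sequence[float], q: Sequence[float]) -> bool:
    """True iff p_j <= q_j for every coordinate j."""
    return all(pj <= qj for pj, qj in zip(p, q))


def lift_point(p: Sequence[float]) -> Vec:
    """Map (p_1, ..., p_d) to (-p_1, p_1, ..., -p_d, p_d) in 2d dimensions."""
    out = []
    for pj in p:
        out.extend((-pj, pj))
    return tuple(out)


def lift_box(lo: Sequence[float], hi: Sequence[float]) -> Vec:
    """Map the box [a_1, b_1] x ... x [a_d, b_d] to (-a_1, b_1, ..., -a_d, b_d)."""
    out = []
    for a, b in zip(lo, hi):
        out.extend((-a, b))
    return tuple(out)


def box_nonempty(points: Sequence[Sequence[float]],
                 lo: Sequence[float], hi: Sequence[float]) -> bool:
    """Box detection answered as a dominance query on lifted points.

    Brute-force scan over all points; a real structure would replace this loop.
    """
    q = lift_box(lo, hi)
    return any(dominated(lift_point(p), q) for p in points)


def linf_within(points: Sequence[Sequence[float]],
                q: Sequence[float], r: float) -> bool:
    """Decide whether some point lies within l_infinity distance r of q.

    The l_infinity ball of radius r around q is the axis-aligned box
    [q_j - r, q_j + r] in every coordinate j.
    """
    lo = tuple(x - r for x in q)
    hi = tuple(x + r for x in q)
    return box_nonempty(points, lo, hi)
```

The lifting works because −p_j ≤ −a_j is the same condition as p_j ≥ a_j, so each two-sided interval constraint becomes two one-sided dominance constraints.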


This is a preview of a remote PDF: http://drops.dagstuhl.de/opus/volltexte/2017/7226/pdf/LIPIcs-SoCG-2017-27.pdf
Article home page: http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=7226

Timothy M. Chan. Orthogonal Range Searching in Moderate Dimensions: k-d Trees and Range Trees Strike Back, 2017, pp. 27:1-27:15, 77, DOI: 10.4230/LIPIcs.SoCG.2017.27