Cache-Oblivious Range Reporting with Optimal Queries Requires Superlinear Space

Discrete & Computational Geometry, Apr 2011

We consider a number of range reporting problems in two and three dimensions and prove lower bounds on the amount of space used by any cache-oblivious data structure for these problems that achieves the optimal query bound of O(log_B N + K/B) block transfers, where K is the size of the query output. The problems we study are three-sided range reporting, 3-d dominance reporting, and 3-d halfspace range reporting. We prove that, in order to achieve the above query bound or even a bound of f(log_B N, K/B), for any monotonically increasing function f(·,·), the data structure has to use Ω(N (log log N)^ε) space. This lower bound holds also for the expected size of any Las-Vegas-type data structure that achieves an expected query bound of at most f(log_B N, K/B) block transfers. The exponent ε depends on the function f(·,·) and on the range of permissible block sizes. Our result has a number of interesting consequences. The first one is a new type of separation between the I/O model and the cache-oblivious model, as deterministic I/O-efficient data structures with the optimal query bound in the worst case and using linear or O(N log* N) space are known for the above problems. The second consequence is the non-existence of linear-space cache-oblivious persistent B-trees with optimal 1-d range reporting queries.

The full text is available as a PDF:

https://link.springer.com/content/pdf/10.1007%2Fs00454-011-9347-7.pdf

Peyman Afshani, Chris Hamilton, Norbert Zeh

P. Afshani: MADALGO, Department of Computer Science, Aarhus University, IT Parken, Aabogade 34, 8200 Aarhus N, Denmark

Part of this work was done while P. Afshani was visiting Dalhousie University. C. Hamilton was supported by a Killam Predoctoral Scholarship. N. Zeh was supported in part by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs programme.

1 Introduction

Range reporting is a well-studied fundamental problem in computational geometry.
Given a set S of points in R^d, the goal is to preprocess S so that, for any query range q of a given shape, all points in S ∩ q can be reported efficiently. Typical query shapes include axis-aligned boxes, circles, simplices, and halfspaces. To indicate the type of permissible queries, the problem is then referred to more specifically as orthogonal, circular, simplex, or halfspace range reporting. Three-sided range reporting is a special case of 2-d orthogonal range reporting that considers axis-aligned boxes whose top boundaries are fixed at y = +∞. Dominance reporting is another important special case of orthogonal range reporting: given a query point q, the problem is to report all points in S that are dominated by q, that is, whose coordinates are less than q's in all dimensions. These different query types are illustrated in Fig. 1.

Most previous work on this type of problem has focused on standard models of computation, such as the RAM model or the pointer machine model. The distinguishing feature of these models is that the access cost to a data item is independent of the location where the item is stored in memory. These models are useful for studying the fundamental computational difficulty of a problem, but they ignore that in reality the time to access an item can vary by a factor of up to 10^6 depending on its present location (disk, internal memory, CPU cache, etc.).

A number of models have been proposed to capture the non-uniform access costs in real memory hierarchies. See [31] for a survey. The two most widely adopted ones are the input/output model (or I/O model) [6] and the cache-oblivious model [18]. Their success is due to the balance they provide between simplicity, in order to allow the design and analysis of sophisticated algorithms, and accuracy in predicting the performance of algorithms on real memory hierarchies.
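To make the three-sided and dominance query types defined in this section concrete, the following minimal Python sketch spells out their semantics by brute force. It is not a data structure from the paper and has no I/O efficiency; the function names and the choice of closed boundaries for the three-sided box are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


def three_sided_query(points: List[Point2], x_lo: float, x_hi: float,
                      y_lo: float) -> List[Point2]:
    """Report the points in the box [x_lo, x_hi] x [y_lo, +inf).

    The top boundary is fixed at y = +inf, so only three sides are given.
    Closed boundaries are an assumption of this sketch.
    """
    return [p for p in points if x_lo <= p[0] <= x_hi and p[1] >= y_lo]


def dominance_query(points: List[Sequence[float]],
                    q: Sequence[float]) -> List[Sequence[float]]:
    """Report the points dominated by q: every coordinate is less than q's."""
    return [p for p in points if all(pc < qc for pc, qc in zip(p, q))]


if __name__ == "__main__":
    pts2 = [(1, 1), (2, 5), (3, 2), (6, 7)]
    print(three_sided_query(pts2, 2, 5, 2))

    pts3 = [(1, 2, 3), (4, 1, 1), (0, 0, 0)]
    print(dominance_query(pts3, (2, 3, 4)))
```

Both functions scan all N points, so they serve only as a reference for what a correct answer is; the structures discussed in the paper aim to answer the same queries in O(log_B N + K/B) block transfers.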
The I/O model considers two levels of memory: a fast internal memory with the capacity to hold M data items, and a slow but conceptually unlimited external memory. All computation has to happen on data in internal memory. The transfer of data between internal and external memory happens in blocks of B consecutive data items; the complexity of an algorithm is the number of such block transfers it performs.

The cache-oblivious model provides a simple framework for designing algorithms for multi-level memory hierarchies, while using the simple two-level I/O model for the analysis. In this model, the algorithm is oblivious of the memory hierarchy and, thus, cannot initiate block transfers explicitly. Instead, the swapping of data between internal and external memory is the responsibility of a paging algorithm, which is assumed to be offline optimal, that is, to perform the minimum number of block transfers possible for the memory access sequence of the algorithm. Since the memory parameters are used only in the analysis, the analysis applies to any two consecutive levels of the memory hierarchy. In particul (...truncated)
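The cost accounting of the I/O model described above can be sketched in a few lines of Python. The sketch counts how many blocks of B consecutive items a contiguous scan touches and evaluates the quantity log_B N + K/B that appears inside the optimal query bound. The function names are hypothetical, and the sketch assumes each touched block is transferred exactly once, ignoring the memory size M and the paging policy.

```python
import math


def blocks_touched(start: int, count: int, B: int) -> int:
    """Number of size-B blocks overlapping the items [start, start + count)."""
    if count <= 0:
        return 0
    first_block = start // B
    last_block = (start + count - 1) // B
    return last_block - first_block + 1


def query_bound(N: int, K: int, B: int) -> float:
    """The quantity log_B N + K/B inside the O(.) of the optimal query bound."""
    return math.log(N, B) + K / B


if __name__ == "__main__":
    N, K = 2 ** 20, 1024
    # A cache-oblivious layout is fixed once and then analysed for every B,
    # so the same scan is evaluated here under several block sizes.
    for B in (16, 256, 1024):
        print(B, blocks_touched(5, K, B), query_bound(N, K, B))
```

Reporting K items stored contiguously touches at most ⌈K/B⌉ + 1 blocks regardless of alignment, which is where the K/B term of the query bound comes from.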


Peyman Afshani, Chris Hamilton, Norbert Zeh. Cache-Oblivious Range Reporting with Optimal Queries Requires Superlinear Space, Discrete & Computational Geometry, 2011, pp. 824-850, Volume 45, Issue 4, DOI: 10.1007/s00454-011-9347-7