Cache-Oblivious Range Reporting with Optimal Queries Requires Superlinear Space
Discrete Comput Geom
Peyman Afshani
Chris Hamilton
Norbert Zeh
P. Afshani: MADALGO, Department of Computer Science, Aarhus University, IT Parken, Aabogade 34, 8200 Aarhus N, Denmark
We consider a number of range reporting problems in two and three dimensions and prove lower bounds on the amount of space used by any cache-oblivious data structure for these problems that achieves the optimal query bound of $O(\log_B N + K/B)$ block transfers, where $K$ is the size of the query output. The problems we study are three-sided range reporting, 3-d dominance reporting, and 3-d halfspace range reporting. We prove that, in order to achieve the above query bound, or even a bound of $f(\log_B N, K/B)$ for any monotonically increasing function $f(\cdot,\cdot)$, the data structure has to use $\Omega(N(\log\log N)^{\varepsilon})$ space. This lower bound also holds for the expected size of any Las-Vegas-type data structure that achieves an expected query bound of at most $f(\log_B N, K/B)$ block transfers. The exponent $\varepsilon$ depends on the function $f(\cdot,\cdot)$ and on the range of permissible block sizes. Our result has a number of interesting consequences. The first is a new type of separation between the I/O model and the cache-oblivious model, as deterministic I/O-efficient data structures that achieve the optimal query bound in the worst case and use linear or $O(N\log^* N)$ space are known for the above problems. The second consequence is the non-existence of linear-space cache-oblivious persistent B-trees with optimal 1-d range reporting queries.

Part of this work was done while P. Afshani was visiting Dalhousie University. C. Hamilton was supported by a Killam Predoctoral Scholarship. N. Zeh was supported in part by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs programme.
1 Introduction
Range reporting is a well-studied, fundamental problem in computational geometry.
Given a set S of points in $\mathbb{R}^d$, the goal is to preprocess S so that, for any query range
q of a given shape, all points in S ∩ q can be reported efficiently. Typical query shapes
include axis-aligned boxes, circles, simplices, and halfspaces. To indicate the type of
permissible queries, the problem is then referred to more specifically as orthogonal,
circular, simplex or halfspace range reporting. Three-sided range reporting is a
special case of 2-d orthogonal range reporting that considers axis-aligned boxes whose
top boundaries are fixed at y = +∞. Dominance reporting is another important
special case of orthogonal range reporting: given a query point q, the problem is to report
all points in S that are dominated by q, that is, whose coordinates are less than q’s in
all dimensions. These different query types are illustrated in Fig. 1.
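To make the two special query types concrete, the following brute-force sketch (our illustration, not a data structure from this paper) answers a three-sided query and a 3-d dominance query by scanning every point. Such a scan costs $\Theta(N/B)$ block transfers on a contiguously stored point set, far from the optimal $O(\log_B N + K/B)$. The function names `three_sided` and `dominated_by` are hypothetical, and dominance is taken to be strict, as in the definition above.

```python
from typing import Iterable, List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def three_sided(points: Iterable[Point2], x1: float, x2: float,
                y1: float) -> List[Point2]:
    """Report the points in [x1, x2] x [y1, +inf).

    The query box is axis-aligned and its top boundary is fixed at y = +inf,
    so only three sides (left, right, bottom) are specified.
    """
    return [p for p in points if x1 <= p[0] <= x2 and p[1] >= y1]


def dominated_by(points: Iterable[Point3], q: Point3) -> List[Point3]:
    """Report the points dominated by q: every coordinate is less than q's."""
    return [p for p in points if all(pc < qc for pc, qc in zip(p, q))]


pts = [(1, 5), (2, 1), (4, 7), (9, 9)]
print(three_sided(pts, 0, 5, 2))  # [(1, 5), (4, 7)]
print(dominated_by([(1, 1, 1), (2, 0, 3), (0, 0, 0)], (2, 2, 2)))
# [(1, 1, 1), (0, 0, 0)]
```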
Most previous work on this type of problem has focused on standard models of
computation, such as the RAM model or the pointer machine model. The
distinguishing feature of these models is that the access cost to a data item is independent of the
location where the item is stored in memory. These models are useful for studying
the fundamental computational difficulty of a problem, but they ignore that in reality
the time to access an item can vary by a factor of up to $10^6$ depending on its present
location (disk, internal memory, CPU cache, etc.).
A number of models have been proposed to capture the non-uniform access costs
in real memory hierarchies; see [31] for a survey. The two most widely adopted ones
are the input/output model (or I/O model) [6] and the cache-oblivious model [18].
Their success is due to the balance they strike between simplicity, which permits
the design and analysis of sophisticated algorithms, and accuracy in predicting the
performance of algorithms on real memory hierarchies.
The I/O model considers two levels of memory: a fast internal memory with the
capacity to hold M data items, and a slow but conceptually unlimited external
memory. All computation has to happen on data in internal memory. The transfer of data
between internal and external memory happens in blocks of B consecutive data items;
the complexity of an algorithm is the number of such block transfers it performs.
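The cost measure of the I/O model can be illustrated by counting block transfers for a sequence of item accesses. The sketch below is a simplified simulator under stated assumptions: item `a` lives in block `a // B`, internal memory holds `M // B` blocks, evictions follow least-recently-used order (in the I/O model proper, the algorithm itself chooses which blocks to transfer), and writes are not modeled. The function name `count_block_transfers` is hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def count_block_transfers(addresses: Iterable[int], B: int, M: int) -> int:
    """Count block reads for a sequence of item addresses.

    Internal memory holds M items, i.e. M // B blocks of B consecutive items.
    Eviction is least-recently-used, an illustrative assumption: in the I/O
    model the algorithm itself decides which blocks to transfer. Writes are
    not modeled.
    """
    capacity = M // B
    if capacity < 1:
        raise ValueError("internal memory must hold at least one block")
    resident: "OrderedDict[int, None]" = OrderedDict()
    transfers = 0
    for a in addresses:
        block = a // B  # addresses a with equal a // B share one block
        if block in resident:
            resident.move_to_end(block)  # mark as most recently used
        else:
            transfers += 1  # block must be read from external memory
            if len(resident) >= capacity:
                resident.popitem(last=False)  # evict least recently used
            resident[block] = None
    return transfers


# Scanning N contiguous items costs ceil(N / B) transfers ...
print(count_block_transfers(range(1000), B=64, M=256))  # 16
# ... while touching one item per block costs one transfer per item.
print(count_block_transfers(range(0, 64 * 100, 64), B=64, M=256))  # 100
```

The gap between the two calls is why the query bounds in this paper are stated in block transfers: reporting $K$ points costs $K/B$ transfers only if they are stored contiguously.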
The cache-oblivious model provides a framework for designing algorithms
for multi-level memory hierarchies while using the simple two-level I/O model for
the analysis. In this model, the algorithm is oblivious of the memory hierarchy and,
thus, cannot initiate block transfers explicitly. Instead, the swapping of data between
internal and external memory is the responsibility of a paging algorithm, which is
assumed to be offline optimal, that is, to perform the minimum number of block
transfers possible for the memory access sequence of the algorithm. Since the memory
parameters are used only in the analysis, the analysis applies to any two consecutive
levels of the memory hierarchy. In particul (...truncated)
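The assumption that the paging algorithm is offline optimal can be made concrete with Belady's furthest-in-future rule, a classical offline policy known to minimize the number of misses for a fixed access sequence and memory size. The sketch below is our illustration, not part of the model's definition: it assumes a fully associative internal memory of `M // B` blocks, places item `a` in block `a // B`, counts only reads, and uses the hypothetical name `belady_transfers`.

```python
from typing import Dict, List, Sequence


def belady_transfers(addresses: Sequence[int], B: int, M: int) -> int:
    """Count block reads under Belady's furthest-in-future eviction rule."""
    capacity = M // B
    if capacity < 1:
        raise ValueError("internal memory must hold at least one block")
    blocks: List[int] = [a // B for a in addresses]
    never = len(blocks)  # sentinel: later than any real position
    # next_use[i] = position of the next access to blocks[i] after i.
    next_use = [never] * len(blocks)
    last_seen: Dict[int, int] = {}
    for i in range(len(blocks) - 1, -1, -1):
        next_use[i] = last_seen.get(blocks[i], never)
        last_seen[blocks[i]] = i
    resident: Dict[int, int] = {}  # block -> position of its next use
    transfers = 0
    for i, blk in enumerate(blocks):
        if blk not in resident:
            transfers += 1
            if len(resident) >= capacity:
                # Evict the block whose next use lies furthest in the future.
                victim = max(resident, key=lambda b: resident[b])
                del resident[victim]
        resident[blk] = next_use[i]
    return transfers


cyclic = [0, 1, 2, 0, 1, 2, 0, 1, 2]
print(belady_transfers(cyclic, B=1, M=2))  # 6
```

With room for two blocks, LRU misses on all nine accesses of this cyclic pattern, because it always evicts the block needed next; the furthest-in-future rule misses only six times. Because the analysis charges the algorithm only for such an optimal schedule, a cache-oblivious algorithm never has to manage transfers itself.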