Maximum Volume Subset Selection for Anchored Boxes
Karl Bringmann¹, Sergio Cabello∗², and Michael T. M. Emmerich³
¹ Max Planck Institute for Informatics, Saarland Informatics Campus,
Saarbrücken, Germany
² Department of Mathematics, IMFM, Ljubljana, Slovenia; and
Department of Mathematics, FMF, University of Ljubljana, Ljubljana,
Slovenia
³ Leiden Institute of Advanced Computer Science (LIACS), Leiden University,
Leiden, The Netherlands
Abstract
Let B be a set of n axis-parallel boxes in $\mathbb{R}^d$ such that each box has a corner at the origin and
the other corner in the positive quadrant of $\mathbb{R}^d$, and let k be a positive integer. We study the
problem of selecting k boxes in B that maximize the volume of the union of the selected boxes.
The research is motivated by applications in skyline queries for databases and in multicriteria
optimization, where the problem is known as the hypervolume subset selection problem. It is
known that the problem can be solved in polynomial time in the plane, while the best known
running time in any dimension d ≥ 3 is $\Omega\bigl(\binom{n}{k}\bigr)$. We show that:
- The problem is NP-hard already in 3 dimensions.
- In 3 dimensions, we break the bound $\Omega\bigl(\binom{n}{k}\bigr)$ by providing an $n^{O(\sqrt{k})}$ algorithm.
- For any constant dimension d, we give an efficient polynomial-time approximation scheme.
1998 ACM Subject Classification F.2.2 Nonnumerical Algorithms and Problems
Keywords and phrases geometric optimization, subset selection, hypervolume indicator, Klee’s
measure problem, boxes, NP-hardness, PTAS
Digital Object Identifier 10.4230/LIPIcs.SoCG.2017.22
1 Introduction
An anchored box is an orthogonal range of the form $\mathrm{box}(p) := [0, p_1] \times \dots \times [0, p_d] \subset \mathbb{R}^d_{\ge 0}$,
spanned by the point $p \in \mathbb{R}^d_{>0}$. This paper is concerned with the problem Volume Selection:
Given a set P of n points in $\mathbb{R}^d_{>0}$, select k points in P maximizing the volume of the union
of their anchored boxes. That is, we want to compute
$$\mathrm{VolSel}(P, k) := \max_{S \subseteq P,\ |S| = k} \mathrm{vol}\Bigl(\bigcup_{p \in S} \mathrm{box}(p)\Bigr),$$
as well as a set $S^* \subseteq P$ of size k realizing this value. Here, vol denotes the usual volume.
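To make the definition concrete, the following minimal sketch evaluates VolSel(P, k) directly from the formula on a tiny instance. It relies on the observation that the intersection of anchored boxes is again an anchored box, spanned by the coordinate-wise minimum, so the volume of a union follows from inclusion-exclusion. The names `union_volume` and `vol_sel_bruteforce` are illustrative and not from the paper, and the approach takes exponential time; it is meant only to illustrate the problem statement.

```python
from itertools import combinations
from math import prod


def union_volume(points):
    """Volume of the union of the anchored boxes box(p) for p in points.

    Uses inclusion-exclusion; exponential in len(points), so only
    suitable for very small inputs.
    """
    points = list(points)
    total = 0.0
    for r in range(1, len(points) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for subset in combinations(points, r):
            # The intersection of anchored boxes is the anchored box
            # spanned by the coordinate-wise minimum of their corners.
            corner = [min(coords) for coords in zip(*subset)]
            total += sign * prod(corner)
    return total


def vol_sel_bruteforce(points, k):
    """Return (VolSel(P, k), one optimal subset) by trying all size-k subsets."""
    best = max(combinations(points, k), key=union_volume)
    return union_volume(best), best


if __name__ == "__main__":
    P = [(1, 1, 4), (2, 2, 2), (4, 1, 1), (1, 4, 1)]
    value, subset = vol_sel_bruteforce(P, 2)
    print(value, subset)
```

On this example every optimal pair contains the point (2, 2, 2), whose box alone has volume 8; several pairs tie for the optimum, and the sketch simply returns the first one it encounters.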
∗ Supported by the Slovenian Research Agency, program P1-0297 and project L7-5459.
© Karl Bringmann, Sergio Cabello, and Michael T. M. Emmerich;
licensed under Creative Commons License CC-BY
33rd International Symposium on Computational Geometry (SoCG 2017).
Editors: Boris Aronov and Matthew J. Katz; Article No. 22; pp. 22:1–22:15
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
Motivation
This geometric problem is of key importance in the context of multicriteria optimization and
decision analysis, where it is known as the hypervolume subset selection problem (HSSP)
[2, 3, 4, 24, 12, 13]. In this context, the points in P correspond to solutions of an optimization
problem with d objectives, and the goal is to find a small subset of P that “represents”
the set P well. The quality of a representative subset S ⊆ P is measured by the volume
of the union of the anchored boxes spanned by points in S; this is also known as the
hypervolume indicator [34]. Note that with this quality indicator, finding the optimal size-k
representation is equivalent to our problem VolSel(P, k). In applications, such bounded-size
representations are required in archivers for non-dominated sets [23] and for multicriteria
optimization algorithms and heuristics [3, 10, 7].¹ Besides, the problem has recently received
attention in the context of skyline operators in databases [17].
In 2 dimensions, the problem can be solved in polynomial time [2, 13, 24], which is used
in applications such as analyzing benchmark functions [2] and efficient postprocessing of
multiobjective algorithms [12]. A natural question is whether efficient algorithms also exist in
dimension d ≥ 3, and thus whether these applications can be pushed beyond two objectives.
In this paper, we answer this question negatively, by proving that Volume Selection
is NP-hard already in 3 dimensions. We then consider the question whether the previous
$\Omega\bigl(\binom{n}{k}\bigr)$ bound can be improved, which we answer affirmatively in 3 dimensions. Finally, in
any constant dimension, we improve the best-known (1 − 1/e)-approximation to an efficient
polynomial-time approximation scheme (EPTAS). See Section 1.2 for details.
1.1 Further Related Work
Klee’s Measure Problem
Computing the volume of the union of n (not necessarily anchored) axis-aligned boxes in $\mathbb{R}^d$
is known as Klee’s measure problem. The fastest known algorithm takes time² $O(n^{d/2})$, which
can be improved to $O(n^{d/3}\,\mathrm{polylog}(n))$ if all boxes are cubes [15]. By a simple reduction [8],
the same running time as on cubes can be obtained on anchored boxes, which can be improved
to $O(n \log n)$ for d ≤ 3 [6]. These results are relevant to this paper because Klee’s measure
problem on anchored boxes (spanned by the points in P) is a special case of Volume
Selection (by calling VolSel(P, |P|)).
Chan [14] gave a reduction from k-Clique to Klee’s measure problem in 2k dimensions.
This proves NP-hardness of Klee’s measure problem when d is part of the input (and thus
d can be as large as n). Moreover, since k-Clique has no $f(k) \cdot n^{o(k)}$ algorithm under the
Exponential Time Hypothesis [16], Klee’s measure problem has no $f(d) \cdot n^{o(d)}$ algorithm
under the same assumption. The same hardness results also hold for Klee’s measure problem
on anchored boxes, by a reduction in [8] (NP-hardness was first proven in [11]).
Finally, we mention that Klee’s measure problem has a very efficient randomized $(1 \pm \varepsilon)$-approximation
algorithm in time $O(n \log(1/\delta)/\varepsilon^2)$ with error probability $\delta$ [9].
Known Results for Volume Selection
As mentioned above, 2-dimensional Volume Selection can be solved in polynomial time;
the initial $O(kn^2)$ algorithm [2] was later improved to $O((n - k)k + n \log n)$ [13, 24]. In higher
dimensions, by enumerating all size-k subsets and solving an instance of Klee’s measure
problem on anchored boxes for each one, there is an $O\bigl(\binom{n}{k}\, k^{d/3}\,\mathrm{polylog}(k)\bigr)$ algorithm. For
¹ We remark that in these applications the anchor point is often not the origin; however, by a simple
translation we can move our anchor point from (0, . . . , 0) to any other point in $\mathbb{R}^d$.
² In O-notation, we always assume d to be a constant, and log(x) is to be understood as max{1, log(x)}.
small n − k, this can be improved to $O(n^{d/2} \log n + n^{n-k})$ [10]. Volume Selection is
NP-hard when d is part of the input, since the same holds already for Klee’s measure problem
on anchored boxes. However, this does not explain the exponential dependence on k for
constant d.
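For the planar case mentioned at the start of this paragraph, a polynomial running time can be illustrated with a short dynamic program. The sketch below is a straightforward $O(kn^2)$ recurrence over the non-dominated points sorted by x-coordinate; it matches the stated bound of the initial algorithm, but we do not claim that it reproduces the algorithm of [2]. The function name `vol_sel_2d` is hypothetical. The sketch returns only the optimal value and assumes that dominated points can be ignored, since they add no area once a dominating point is selected.

```python
def vol_sel_2d(points, k):
    """Optimal VolSel(P, k) value for points in the positive quadrant of the plane."""
    # Keep only non-dominated points: sweep by decreasing x and keep a
    # point only if its y exceeds every y seen so far.
    front, best_y = [], 0.0
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            front.append((x, y))
            best_y = y
    front.reverse()  # now x is strictly increasing and y strictly decreasing

    m = len(front)
    k = min(k, m)  # further (dominated) points would not add any area
    if k == 0:
        return 0.0

    # f[i]: best area using exactly j front points, front[i] being the rightmost.
    f = [x * y for x, y in front]  # j = 1
    for _ in range(2, k + 1):
        g = [float("-inf")] * m
        for i, (x, y) in enumerate(front):
            for prev in range(i):
                # Adding front[i] to the right of front[prev] contributes the
                # strip between the two x-coordinates, which has height y.
                g[i] = max(g[i], f[prev] + (x - front[prev][0]) * y)
        f = g
    return max(f)


if __name__ == "__main__":
    P = [(1, 4), (2, 3), (4, 1), (1, 1)]  # (1, 1) is dominated
    print(vol_sel_2d(P, 2))
```

The recurrence uses the staircase shape of the union: with the selected points ordered by increasing x, each point adds a vertical strip whose height is its own y-coordinate.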
Since the volume of the union of boxes is a submodular function (see, e.g., [31]), the
greedy algorithm for submodular function maximization [27] yields a (1 − 1/e)-approximation
of VolSel(P, k). This algorithm solves O(nk) instances of Klee’s measure problem on at
mo (...truncated)