Maximum Volume Subset Selection for Anchored Boxes

Jun 2017

Let B be a set of n axis-parallel boxes in d dimensions such that each box has a corner at the origin and the other corner in the positive quadrant, and let k be a positive integer. We study the problem of selecting k boxes in B that maximize the volume of the union of the selected boxes. The research is motivated by applications in skyline queries for databases and in multicriteria optimization, where the problem is known as the hypervolume subset selection problem. It is known that the problem can be solved in polynomial time in the plane, while the best known algorithms in any dimension d>2 enumerate all size-k subsets. We show that:
* The problem is NP-hard already in 3 dimensions.
* In 3 dimensions, we break the enumeration of all size-k subsets by providing an n^O(sqrt(k)) algorithm.
* For any constant dimension d, we give an efficient polynomial-time approximation scheme.



Karl Bringmann (1), Sergio Cabello∗ (2), and Michael T. M. Emmerich (3)

1 Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
2 Department of Mathematics, IMFM, Ljubljana, Slovenia; and Department of Mathematics, FMF, University of Ljubljana, Ljubljana, Slovenia
3 Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands

Abstract

Let B be a set of n axis-parallel boxes in $\mathbb{R}^d$ such that each box has a corner at the origin and the other corner in the positive quadrant of $\mathbb{R}^d$, and let k be a positive integer. We study the problem of selecting k boxes in B that maximize the volume of the union of the selected boxes. The research is motivated by applications in skyline queries for databases and in multicriteria optimization, where the problem is known as the hypervolume subset selection problem. It is known that the problem can be solved in polynomial time in the plane, while the best known running time in any dimension $d \ge 3$ is $\Omega\big(\binom{n}{k}\big)$. We show that:
* The problem is NP-hard already in 3 dimensions.
* In 3 dimensions, we break the bound $\Omega\big(\binom{n}{k}\big)$ by providing an $n^{O(\sqrt{k})}$ algorithm.
* For any constant dimension d, we give an efficient polynomial-time approximation scheme.

1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
Keywords and phrases: geometric optimization, subset selection, hypervolume indicator, Klee's measure problem, boxes, NP-hardness, PTAS
Digital Object Identifier: 10.4230/LIPIcs.SoCG.2017.22

1 Introduction

An anchored box is an orthogonal range of the form $\mathrm{box}(p) := [0, p_1] \times \dots \times [0, p_d] \subset \mathbb{R}^d_{\ge 0}$, spanned by the point $p \in \mathbb{R}^d_{>0}$. This paper is concerned with the problem Volume Selection: Given a set P of n points in $\mathbb{R}^d_{>0}$, select k points in P maximizing the volume of the union of their anchored boxes.
That is, we want to compute

$$\mathrm{VolSel}(P, k) := \max_{S \subseteq P,\ |S| = k} \mathrm{vol}\Big(\bigcup_{p \in S} \mathrm{box}(p)\Big),$$

as well as a set $S^* \subseteq P$ of size k realizing this value. Here, vol denotes the usual volume.

∗ Supported by the Slovenian Research Agency, program P1-0297 and project L7-5459.
© Karl Bringmann, Sergio Cabello, and Michael T. M. Emmerich; licensed under Creative Commons License CC-BY.
33rd International Symposium on Computational Geometry (SoCG 2017). Editors: Boris Aronov and Matthew J. Katz; Article No. 22; pp. 22:1–22:15.
Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

Motivation

This geometric problem is of key importance in the context of multicriteria optimization and decision analysis, where it is known as the hypervolume subset selection problem (HSSP) [2, 3, 4, 24, 12, 13]. In this context, the points in P correspond to solutions of an optimization problem with d objectives, and the goal is to find a small subset of P that "represents" the set P well. The quality of a representative subset $S \subseteq P$ is measured by the volume of the union of the anchored boxes spanned by points in S; this is also known as the hypervolume indicator [34]. Note that with this quality indicator, finding the optimal size-k representation is equivalent to our problem VolSel(P, k). In applications, such bounded-size representations are required in archivers for non-dominated sets [23] and for multicriteria optimization algorithms and heuristics [3, 10, 7] (see footnote 1). Besides, the problem has recently received attention in the context of skyline operators in databases [17]. In 2 dimensions, the problem can be solved in polynomial time [2, 13, 24], which is used in applications such as analyzing benchmark functions [2] and efficient postprocessing of multiobjective algorithms [12].
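The definition of VolSel(P, k) above can be read directly as a (very slow) algorithm: enumerate every size-k subset and measure the union of its anchored boxes. The sketch below does exactly that and is meant only to make the definition concrete. The function names `union_volume` and `vol_sel_bruteforce` are illustrative choices, not names from the paper, and the grid-based volume computation is a naive stand-in, not one of the Klee's measure algorithms cited in the paper.

```python
from itertools import combinations, product


def union_volume(points):
    """Exact volume of the union of anchored boxes box(p) for p in points.

    Naive grid decomposition (an assumption of this sketch): the distinct
    coordinates in each dimension cut space into cells, and a cell is
    covered iff some point dominates the cell's upper corner.
    """
    points = list(points)
    if not points:
        return 0.0
    d = len(points[0])
    cuts = [sorted({p[i] for p in points}) for i in range(d)]
    total = 0.0
    for idx in product(*(range(len(c)) for c in cuts)):
        upper = [cuts[i][j] for i, j in enumerate(idx)]
        if any(all(p[i] >= upper[i] for i in range(d)) for p in points):
            cell = 1.0
            for i, j in enumerate(idx):
                lower = cuts[i][j - 1] if j > 0 else 0.0
                cell *= cuts[i][j] - lower
            total += cell
    return total


def vol_sel_bruteforce(points, k):
    """Return (VolSel(P, k), best subset) by trying all size-k subsets."""
    points = list(points)
    if not 0 <= k <= len(points):
        raise ValueError("k must be between 0 and len(points)")
    best = max(combinations(points, k), key=union_volume)
    return union_volume(best), best
```

For the three planar points (1, 4), (2, 2), (4, 1) and k = 2, the pair (1, 4), (4, 1) covers volume 4 + 4 - 1 = 7, while either pair containing (2, 2) covers only 6, so the brute force returns 7.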
A natural question is whether efficient algorithms also exist in dimension $d \ge 3$, and thus whether these applications can be pushed beyond two objectives. In this paper, we answer this question negatively, by proving that Volume Selection is NP-hard already in 3 dimensions. We then consider the question of whether the previous $\Omega\big(\binom{n}{k}\big)$ bound can be improved, which we answer affirmatively in 3 dimensions. Finally, in any constant dimension, we improve the best-known (1 − 1/e)-approximation to an efficient polynomial-time approximation scheme (EPTAS). See Section 1.2 for details.

1.1 Further Related Work

Klee's Measure Problem

Computing the volume of the union of n (not necessarily anchored) axis-aligned boxes in $\mathbb{R}^d$ is known as Klee's measure problem. The fastest known algorithm takes time $O(n^{d/2})$ (see footnote 2), which can be improved to $O(n^{d/3}\,\mathrm{polylog}(n))$ if all boxes are cubes [15]. By a simple reduction [8], the same running time as on cubes can be obtained on anchored boxes, which can be improved to $O(n \log n)$ for $d \le 3$ [6]. These results are relevant to this paper because Klee's measure problem on anchored boxes (spanned by the points in P) is a special case of Volume Selection (by calling VolSel(P, |P|)).

Chan [14] gave a reduction from k-Clique to Klee's measure problem in 2k dimensions. This proves NP-hardness of Klee's measure problem when d is part of the input (and thus d can be as large as n). Moreover, since k-Clique has no $f(k) \cdot n^{o(k)}$ algorithm under the Exponential Time Hypothesis [16], Klee's measure problem has no $f(d) \cdot n^{o(d)}$ algorithm under the same assumption. The same hardness results also hold for Klee's measure problem on anchored boxes, by a reduction in [8] (NP-hardness was first proven in [11]).

Finally, we mention that Klee's measure problem has a very efficient randomized $(1 \pm \varepsilon)$-approximation algorithm in time $O(n \log(1/\delta)/\varepsilon^2)$ with error probability $\delta$ [9].
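To illustrate why anchored boxes are a structured special case of Klee's measure problem, note that the intersection of anchored boxes is again an anchored box, spanned by the coordinate-wise minimum of the points. This gives a short exact formula via inclusion-exclusion. The sketch below is only an illustration of that identity: it takes time exponential in n, it is not any of the algorithms cited above, and the name `union_volume_incl_excl` is a hypothetical choice.

```python
from itertools import combinations
from math import prod


def union_volume_incl_excl(points):
    """Volume of the union of anchored boxes via inclusion-exclusion.

    For anchored boxes, the intersection of box(p) over a subset is the
    anchored box spanned by the coordinate-wise minimum, so its volume
    is the product of those minima. Exponential in len(points); for
    illustration on tiny inputs only.
    """
    points = list(points)
    total = 0.0
    for r in range(1, len(points) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(points, r):
            corner = [min(coords) for coords in zip(*subset)]
            total += sign * prod(corner)
    return total
```

For the points (1, 4), (2, 2), (4, 1), the single boxes contribute 4 + 4 + 4, the pairwise intersections subtract 2 + 1 + 2, and the triple intersection adds back 1, giving a union volume of 8.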
Footnote 1: We remark that in these applications the anchor point is often not the origin; however, by a simple translation we can move our anchor point from (0, . . . , 0) to any other point in $\mathbb{R}^d$.
Footnote 2: In O-notation, we always assume d to be a constant, and log(x) is to be understood as max{1, log(x)}.

Known Results for Volume Selection

As mentioned above, 2-dimensional Volume Selection can be solved in polynomial time; the initial $O(kn^2)$ algorithm [2] was later improved to $O((n - k)k + n \log n)$ [13, 24]. In higher dimensions, by enumerating all size-k subsets and solving an instance of Klee's measure problem on anchored boxes for each one, there is an $O\big(\binom{n}{k}\, k^{d/3}\, \mathrm{polylog}(k)\big)$ algorithm. For small n − k, this can be improved to $O(n^{d/2} \log n + n^{n-k})$ [10]. Volume Selection is NP-hard when d is part of the input, since the same holds already for Klee's measure problem on anchored boxes. However, this does not explain the exponential dependence on k for constant d.

Since the volume of the union of boxes is a submodular function (see, e.g., [31]), the greedy algorithm for submodular function maximization [27] yields a (1 − 1/e)-approximation of VolSel(P, k). This algorithm solves O(nk) instances of Klee's measure problem on at mo (...truncated)
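The greedy (1 − 1/e)-approximation mentioned above can be sketched as follows: start from the empty set and, k times, add the point whose anchored box increases the union volume the most. This is a minimal sketch of the standard greedy scheme for submodular maximization, not code from the paper. The helper `union_volume` is a naive inclusion-exclusion routine assumed here for self-containment (exponential in the subset size), and `greedy_vol_sel` is a hypothetical name.

```python
from itertools import combinations
from math import prod


def union_volume(points):
    """Union volume of anchored boxes by inclusion-exclusion (tiny inputs only)."""
    points = list(points)
    total = 0.0
    for r in range(1, len(points) + 1):
        sign = 1 if r % 2 == 1 else -1
        for subset in combinations(points, r):
            total += sign * prod(min(c) for c in zip(*subset))
    return total


def greedy_vol_sel(points, k):
    """Greedy selection: repeatedly add the point with the largest gain.

    Evaluates the union volume O(n * k) times. Ties are broken in favor
    of the earliest point in the input order.
    """
    remaining = list(points)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda p: union_volume(chosen + [p]))
        chosen.append(best)
        remaining.remove(best)
    return union_volume(chosen), chosen
```

The greedy choice is not always optimal. For the points (5, 8), (7, 6), (8, 5) and k = 2, greedy first takes (7, 6) with volume 42 and then (5, 8), reaching 42 + 40 - 30 = 52, whereas the pair (5, 8), (8, 5) covers 40 + 40 - 25 = 55. The ratio 52/55 is still well above 1 − 1/e.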


This is a preview of a remote PDF: http://drops.dagstuhl.de/opus/volltexte/2017/7201/pdf/LIPIcs-SoCG-2017-22.pdf
Article home page: http://drops.dagstuhl.de/opus/frontdoor.php?source_opus=7201

Karl Bringmann, Sergio Cabello, Michael T. M. Emmerich. Maximum Volume Subset Selection for Anchored Boxes, 2017, pp. 22:1-22:15, 77, DOI: 10.4230/LIPIcs.SoCG.2017.22