Semi-Direct Sum Theorem and Nearest Neighbor under l_infty

LIPICS - Leibniz International Proceedings in Informatics, Aug 2018

We introduce the semi-direct sum theorem as a framework for proving asymmetric communication lower bounds for functions of the form ∨_{i=1}^n f(x, y_i). Utilizing tools developed in proving the direct sum theorem for information complexity, we show that if the function is of the form ∨_{i=1}^n f(x, y_i), where Alice is given x and Bob is given the y_i's, it suffices to prove a lower bound for a single f(x, y_i). This opens a new avenue of attack, other than the conventional combinatorial technique (i.e., the "richness lemma" from [Miltersen et al., 1995]), for proving randomized asymmetric communication lower bounds for functions of this form. As the main technical result and an application of the semi-direct sum framework, we prove an information lower bound on c-approximate Nearest Neighbor (ANN) under l_infty which implies that the algorithm of [Indyk, 2001] for c-approximate Nearest Neighbor under l_infty is optimal even under randomization, for both the decision tree and the cell probe data structure models (under certain parameter assumptions for the latter). In particular, this shows that randomization cannot improve [Indyk, 2001] in the decision tree model. Previously, only a deterministic lower bound was known, due to [Andoni et al., 2008], and a randomized lower bound for the cell probe model, due to [Kapralov and Panigrahy, 2012]. We suspect further applications of our framework in exhibiting randomized asymmetric communication lower bounds for big data applications.

Mark Braverman and Young Kun Ko, Department of Computer Science, Princeton University, 35 Olden St., Princeton NJ 08540, USA

2012 ACM Subject Classification: Theory of computation → Communication complexity
Keywords and phrases: Asymmetric Communication Lower Bound; Data Structure Lower Bound; Nearest Neighbor Search
Funding: Research supported in part by NSF Awards DMS-1128155, CCF-1525342, and CCF-1149888, a Packard Fellowship in Science and Engineering, and the Simons Collaboration on Algorithms and Geometry. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

1 Introduction

The Direct Sum Theorem in communication and information complexity is a key technique for lower bounding the communication and information complexity of computing functions of the form

F(~x, ~y) = ∨_{i=1}^{n} f(x_i, y_i),

where Alice is given a length-n string ~x and Bob is given another length-n string ~y according to some distribution. Many fundamental functions, namely Disjointness and Equality (under de Morgan's Law), are of the above form. The Direct Sum Theorem allows us to reduce computing f to computing F. This implies that a lower bound for f can be translated into a lower bound for F. Since f is a function over a smaller number of bits (a 1-bit AND in the case of Disjointness), providing lower bounds for f is much easier than for F from the technical perspective.

Asymmetric communication complexity addresses a different setting, where the bit-size of Bob's input is much larger than Alice's. One example of such a setting is determining whether S ∩ T = ∅ when |S| ≪ |T| (Lopsided Disjointness). In this setting, it is more meaningful to lower bound the lengths of the messages sent by Alice and by Bob separately, instead of bounding the total length of the transcript as in the symmetric setting. In the asymmetric setting, the trivial protocol where Alice sends her whole input is usually the most efficient protocol in terms of the total number of bits communicated between Alice and Bob.
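To make the asymmetry concrete, here is a minimal Python sketch (our own illustration, not from the paper; the universe encoding and the bit accounting are assumptions) of the trivial protocol for Lopsided Disjointness, in which Alice simply ships her whole small set.

```python
import math

def trivial_lopsided_disjointness(S, T, universe_size):
    """Trivial protocol: Alice sends S verbatim (|S| * ceil(log2 |universe|) bits),
    Bob replies with a single bit saying whether S and T are disjoint."""
    bits_per_element = math.ceil(math.log2(universe_size))
    alice_bits = len(S) * bits_per_element   # Alice's whole (small) input
    bob_bits = 1                             # Bob's single-bit reply
    disjoint = len(set(S) & set(T)) == 0
    return disjoint, alice_bits, bob_bits

# |S| = 3 versus |T| ~ 10^6 over a universe of size 2^20
S = [7, 42, 99]
T = range(100, 1_000_000)
print(trivial_lopsided_disjointness(S, T, universe_size=2**20))
# -> (True, 60, 1): the total communication is dominated by Alice's 60 bits
```

The question that asymmetric lower bounds address is whether Alice can send far fewer bits if Bob is allowed to send more.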
Lower bounds in asymmetric communication are interesting not only from the perspective of pure communication complexity theory but also for their applications to data structure lower bounds, as seen in [11, 12, 14]. It is by now a well-taught fact in graduate communication complexity classes that asymmetric communication lower bounds translate to data structure lower bounds (in the cell probe and decision tree models). We explain this connection formally in Appendix Section B. The key technical tool used for showing asymmetric communication lower bounds is the "Richness Lemma" from [12]. This lemma is an extension of the monochromatic rectangle lower bound for symmetric communication complexity. The main observation is that lower bounds on the height and width of any monochromatic rectangle translate to lower bounds on asymmetric communication complexity. But as in the symmetric setting, rectangle lower bounds usually require complicated combinatorial lemmas. Adding to this technical complication, the Richness Lemma is also not a great tool for analyzing the performance of randomized protocols: for randomized protocols, rectangles are no longer monochromatic but only "roughly" monochromatic, which makes it harder to argue that such a rectangle does not exist. To avoid the technical complications of asymmetric communication complexity, [13, 10] introduce a notion of "robust expansion" to lower bound the number of cells required in the cell-probe data structure model even under randomization. However, this method does not imply a lower bound in the decision tree model, as shown in [1] for the deterministic case.

Instead of using the Richness Lemma, we take an information theoretic approach to establishing asymmetric communication lower bounds, similar to the one observed in [8] for lopsided disjointness. But we emphasize that, unlike in [8] or lopsided disjointness, the functions under consideration are of the form

F(x, ~y) = ∨_{i=1}^{n} f(x, y_i).

These functions are especially interesting for big data applications. In particular, they capture the following type of query: Alice (the user) with input x queries Bob (the database) holding y_1, ..., y_n for whether there exists a data point satisfying a certain condition (i.e., f(x, y_i)). More importantly, unlike in [8], Alice's input is recycled over the singleton function f, while Bob's inputs are all distinct. Therefore, a simple application of the Direct Sum Theorem does not suffice for this application.

As in the Direct Sum Theorem, we reduce the task of computing f to computing F. Indeed, as in the symmetric setting, a direct sum does not necessarily hold if one considers the total length of the transcript, that is, the number of communicated bits. Instead, we introduce the "asymmetric information cost" and the analogous "asymmetric information complexity," as introduced in [8, 15], as the key complexity measures in asymmetric communication. We then show that the asymmetric information cost lower bounds the asymmetric communication cost, with an argument analogous to the symmetric setting.

1.1 Technical Results

Semi-Direct Sum. With a properly defined notion of asymmetric information cost, it is rather straightforward to prove the following semi-direct sum theorem.

Theorem 1 (Semi-Direct Sum (informal)). Consider a protocol Π for computing F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr_π[π(x, ~y) ≠ 1] < ε_1.
For all (x, ~y) ∈ F^{-1}(0), Pr_π[π(x, ~y) ≠ 0] < ε_0.
That is, Π has a prior free error guarantee.
Suppose μ is a distribution such that X, Y_1, ..., Y_n are i.i.d. Furthermore, suppose f(X, Y_i) = 1 with probability at most o(1/n) under μ. Then there exists a protocol Π′ (with input X and Y_i) for computing a singleton f with the following guarantee:
For all (x, y_i) ∈ f^{-1}(1), Pr_{π′}[π′(x, y_i) ≠ 1] < ε_1 + 0.01.
For all (x, y_i) ∈ f^{-1}(0), Pr_{π′}[π′(x, y_i) ≠ 0] < ε_0 + 0.01.
Moreover, n · I_{(x,y_i)∼μ}(Π′; Y_i|X) ≤ I_{(x,~y)∼μ}(Π; Y_1, ..., Y_n|X) and |Π′_a| ≤ |Π_a|, that is, the total number of bits sent by Alice does not increase.

This allows us to translate a lower bound for the singleton function f into a lower bound for the whole F whenever the distribution used for measuring the information cost is i.i.d. In particular, if one shows that any Π′ with the prior free error guarantee requires I_{(x,y_i)∼μ}(Π′; Y_i|X) > β, this implies that Π requires I_{(x,~y)∼μ}(Π; ~Y|X) > βn. We emphasize that the error guarantee is "prior free" in the sense that the protocol must be correct (up to the error parameters) on all inputs, instead of being correct on average under some distribution. In particular, the probability of error is only over the protocol's randomness, not over the input. Under this guarantee, we can convert the prior free lower bound into a lower bound for a fixed prior using a minimax argument as in [3]. More formally, we have the following theorem.

Theorem 2 (Minimax Theorem). Fix some 0 < α < 1. Consider a protocol Π for computing F such that for any (x, ~y), Pr_π[π(x, ~y) ≠ F(x, ~y)] < ε/α. Then there exists a "hard" distribution μ on X and ~Y such that for any protocol Π′ with Pr_{(x,~y)∼μ}[π′(x, ~y) ≠ F(x, ~y)] < ε,

I_{(x,~y)∼μ}(Π′; ~Y|X) ≥ (1 − α) · sup_ν I_{(x,~y)∼ν}(Π; ~Y|X).

Intuitively, this implies that the asymmetric information costs of protocols with a distributional error guarantee and with a prior free error guarantee are within a constant factor of each other. In particular, setting α = 1/2 and exhibiting a lower bound on I_{(x,~y)∼ν}(Π; ~Y|X) for a particular ν (which is a product distribution, coming from the semi-direct sum theorem), we exhibit the existence of a hard distribution μ such that any protocol that errs with probability at most ε on average must reveal a large amount of information about ~Y. The main technical disadvantage of the rectangle bound is that it is extremely challenging to prove any bounds for non-product distributions over the Y_i's, as mentioned in [1]. We avoid this complication by a standard technique in information complexity: we first prove a lower bound for prior-free protocols, and then, using the minimax argument, we argue that a hard distribution exists without having to construct it explicitly.

Nearest Neighbor Lower Bound. As the main application of our approach, we revisit c-approximate Nearest Neighbor Search under the ℓ∞-norm, improving the corresponding asymmetric communication lower bound to an asymmetric information lower bound. This strengthens the deterministic asymmetric communication lower bound of [1] to a randomized asymmetric communication lower bound, and we obtain two corollaries, for the decision tree model and the cell-probe model respectively. (1) For the cell-probe model, this reproves and simplifies the proof of [10]; (2) for the decision tree model, this shows that [9] is tight even under randomization, substantially improving upon [1], which only proved tightness for deterministic decision trees. Therefore, this closes the remaining gap for c-approximate Nearest Neighbor Search under the ℓ∞-norm in the (randomized) decision tree model.
Formally, consider the partial function

F(x, ~y) = 1 if ∃ y_i : ||x − y_i||_∞ ≤ 1,  and  F(x, ~y) = 0 if ∀ y_i : ||x − y_i||_∞ ≥ c,

for c > 3, with x, y_1, ..., y_n ∈ {0, ..., M}^d, where Alice (the user) is given x and Bob (the database) is given y_1, ..., y_n. The goal of the database is to compute F. [9] gave an unorthodox yet efficient deterministic data structure which achieves O(log_ρ log d) approximation using O(d · n^ρ · poly log n) space. Surprisingly, [1] showed the optimality of [9] among all deterministic data structures (in the decision tree model), while [10] showed that it is optimal for (randomized) cell probe data structures with word size n^{o(1)}. Whether randomization can improve on [9] in the decision tree model remained an open problem for over a decade. We answer this question negatively using the semi-direct sum theorem. In particular, we prove the following asymmetric information lower bound.

Theorem 3. Set ρ = (τ log d)^{1/c} and d ≥ (2 log n)^{1/(1−τ)} for some constant 1 > τ > 0. Let Π be a protocol (with both private and public randomness) that computes F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr[π(x, ~y) ≠ 1] < 0.05.
For all (x, ~y) ∈ F^{-1}(0), Pr[π(x, ~y) ≠ 0] < 0.05.
There exists a distribution u such that for any such Π and any sufficiently small constant δ > 0, if |Π_a| ≤ δρ log n (the number of bits sent by Alice), then

I_b = I_{(x,~y)∼u^{⊗(n+1)}}(Π; ~Y|X) ≥ n^{1−O(δ)}.

This is the analogue of the asymmetric communication lower bound of [1], but we point out that the required error guarantee is prior free, and that this is an information lower bound, which is stronger than a communication lower bound. This theorem, together with the standard techniques for translating asymmetric communication lower bounds to decision tree lower bounds, shows that [9] is optimal in the randomized decision tree model with a prior free error guarantee. [1] raised an interesting technical bottleneck: the hard distribution for randomized asymmetric communication (with a distributional error guarantee) must be a non-product distribution over the inputs. We avoid this bottleneck by applying the minimax theorem, thereby exhibiting a lower bound with a distributional error guarantee without explicitly constructing a hard (non-product) distribution. Furthermore, using the standard technique for translating asymmetric communication lower bounds to cell-probe lower bounds, this reproves [10].
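For concreteness, here is a minimal executable rendering of the promise problem F defined above (our own sketch; the function names and the treatment of promise-violating inputs, returned as None, are ours).

```python
import numpy as np

def f_single(x, y, c):
    """Single-copy predicate: 1 if ||x - y||_inf <= 1, 0 if ||x - y||_inf >= c,
    and None when (x, y) falls in the gap (the promise is violated)."""
    dist = int(np.max(np.abs(np.asarray(x) - np.asarray(y))))
    if dist <= 1:
        return 1
    if dist >= c:
        return 0
    return None

def F_near_neighbor(x, ys, c):
    """F(x, y_1..y_n) = OR_i f(x, y_i): 1 if some database point is within
    l_inf distance 1 of the query x, 0 if every point is at distance >= c."""
    vals = [f_single(x, y, c) for y in ys]
    if any(v == 1 for v in vals):
        return 1
    if all(v == 0 for v in vals):
        return 0
    return None  # promise violated by some y_i at distance in (1, c)

# toy instance with d = 4 and c = 4
x  = [2, 5, 1, 7]
ys = [[9, 0, 8, 2],   # far from x (distance >= c)
      [2, 6, 1, 7]]   # within distance 1 of x
print(F_near_neighbor(x, ys, c=4))   # -> 1
```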
2 Preliminary

2.1 Asymmetric Information Cost

In this section, we define the analogous quantities for asymmetric communication. We refer the reader to Section A for the definitions in the symmetric setting. First, recall the following natural definition of asymmetric communication cost.

Definition 4 (Asymmetric Communication Cost). Protocol Π has asymmetric communication cost (a, b) if |Π_a| := Σ_{i odd} |Π_i| ≤ a and |Π_b| := Σ_{i even} |Π_i| ≤ b.

Since we have two parameters, the naive notion of a lower bound on a single quantity no longer applies. Here, a lower bound will be of the form "if |Π_a| < a, then |Π_b| > b(a)." We introduce the analogous definitions for information cost, as first defined in [15], extending the intuition explained in Section A.

Definition 5 (Asymmetric Information Cost). Protocol Π has asymmetric information cost [I_a, I_b] under μ if I_a^μ(Π) := I_μ(Π; X|Y) ≤ I_a and I_b^μ(Π) := I_μ(Π; Y|X) ≤ I_b.

Just as the information cost of a protocol lower bounds its communication cost, it is straightforward to prove the analogous lemma in the asymmetric setting.

Lemma 6 (Asymmetric Information Cost lower bounds Communication Cost). For any protocol Π and any distribution μ, I_a^μ(Π) ≤ |Π_a| and I_b^μ(Π) ≤ |Π_b|.

Proof. Suppose at round r Alice sends a_r bits. The information cost incurred at round r, that is, how much additional information Bob gains, is I_μ(M_{r+1}; X|Y, Π_r), where M_{r+1} is the message sent by Alice and Π_r is the transcript so far. Now

I_μ(M_{r+1}; X|Y, Π_r) ≤ H_μ(M_{r+1}|Y, Π_r) ≤ a_r,

where the last inequality holds since M_{r+1} has length a_r and thus entropy at most a_r. Applying the chain rule (Fact 27),

I_μ(Π; X|Y) = Σ_r I_μ(M_{r+1}; X|Y, M_1, ..., M_r) ≤ Σ_r a_r = |Π_a|.

Similarly we also get I_μ(Π; Y|X) ≤ |Π_b|. ∎

Indeed, it is no longer meaningful to argue about the infimum of a single quantity. Instead, we impose a condition on |Π_a| or I_a^μ(Π) and then argue about the infimum of I_b^μ(Π). In analogy with the distributional and prior-free information complexity defined in [3], we can define the corresponding notions for the asymmetric setting, conditioned on |Π_a| ≤ a.

Definition 7 (Distributional Asymmetric Information Complexity). The distributional asymmetric information complexity for Bob of f under μ with error ε and subject to |Π_a| < a is defined as

IC_μ^{<a}(f, ε) = inf_Π I_b^μ(Π),

where the infimum is taken over the set of protocols that achieve Pr_{(x,y)∼μ, π∼Π}[π(x, y) ≠ f(x, y)] < ε.

Definition 8 (Prior Free Asymmetric Information Complexity). The prior free asymmetric information complexity for Bob of f with error ε and |Π_a| < a is defined as

IC^{<a}(f, ε) = inf_Π max_μ I_b^μ(Π),

where the infimum is taken over the set of protocols that achieve Pr[π(x, y) ≠ f(x, y)] < ε for all (x, y).

Indeed, one could similarly define these notions with an upper bound on the information revealed by Alice, say I_a^μ, rather than on |Π_a|. But for our application (to data structure lower bounds), the above definitions suffice. Using a standard minimax argument, we can also show that the prior free asymmetric information complexity lower bounds the distributional asymmetric information complexity, up to a constant factor in the error parameter.

Theorem 9 (Theorem 2 rephrased). For any f, ε ≥ 0, and α ∈ (0, 1), there exists a distribution μ on (x, y) such that

IC^{<a}(f, ε/α) ≤ IC_μ^{<a}(f, ε) / (1 − α).

Proof. The proof follows the standard minimax technique (Theorem 3.5 of [3]) for translating a prior free lower bound into a distributional lower bound. We define the following two-player zero-sum game, where Player 1 picks a protocol Π for f subject to |Π_a| < a (note that this set of protocols is closed under convex combination) and Player 2 picks a distribution μ on (x, y), with payoff

P(Π, μ) := (1 − α) · I_b^μ(Π)/I + α · Pr_{(x,y)∼μ}[π(x, y) ≠ f(x, y)]/ε,

where I := max_μ IC_μ^{<a}(f, ε). The rest of the argument follows [3]. ∎

We remark that our hard distribution for the distributional error guarantee is therefore not explicitly defined, since the proof is non-constructive and follows from bounding IC^{<a}(f, ε/α). Since IC^{<a}(f, ε/α) involves a maximum over distributions, it suffices to exhibit a lower bound for one particular distribution (in our case a product distribution). But this does not imply that μ is a product distribution as well, since a convex combination of product distributions is not necessarily a product distribution.
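To make the last remark concrete, the following small self-contained check (our own toy example, not from the paper) shows that an even mixture of two product distributions on {0, 1} × {0, 1} need not be a product distribution.

```python
from itertools import product

def mix(p, q, w=0.5):
    """Convex combination of two joint distributions given as dicts."""
    return {xy: w * p.get(xy, 0.0) + (1 - w) * q.get(xy, 0.0)
            for xy in set(p) | set(q)}

# two product distributions on (X, Y): X = Y = 0 w.p. 1, and X = Y = 1 w.p. 1
p0 = {(0, 0): 1.0}
p1 = {(1, 1): 1.0}
m = mix(p0, p1)                       # X and Y are now perfectly correlated

marg_x = {x: sum(pr for (a, _), pr in m.items() if a == x) for x in (0, 1)}
marg_y = {y: sum(pr for (_, b), pr in m.items() if b == y) for y in (0, 1)}

for x, y in product((0, 1), repeat=2):
    joint = m.get((x, y), 0.0)
    print((x, y), joint, marg_x[x] * marg_y[y])   # e.g. (0, 1): 0.0 vs 0.25
```

The mixture makes X and Y perfectly correlated, so the joint probabilities differ from the products of the marginals.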
3 Semi-Direct Sum Theorem

In this section, we prove the semi-direct sum theorem for prior free information cost, that is, for protocols that are guaranteed to be correct with good probability on all inputs, when the underlying distribution is a product distribution.

Let F(x, ~y) be of the form ∨_{i=1}^{n} f(x, y_i), that is, an OR of functions on singletons. Recall that in the symmetric setting (refer to [4]) one proves a direct sum theorem by extracting a strategy for a single copy (x_i, y_i) from a protocol that solves (~x, ~y). Similarly, we can prove a semi-direct sum theorem for this class of functions by extracting a strategy for f(x, y_i) from a protocol for F(x, ~y). More precisely, we prove the following theorem.

Theorem 10 (Semi Direct Sum Theorem). Consider a protocol Π for computing F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr_π[π(x, ~y) ≠ 1] < ε_1.
For all (x, ~y) ∈ F^{-1}(0), Pr_π[π(x, ~y) ≠ 0] < ε_0.
|Π_a| ≤ a and I_b^μ(Π) ≤ b, where μ is a distribution such that Y_1, ..., Y_n are i.i.d., with X distributed independently as well. Furthermore, suppose f(X, Y_i) = 1 with probability at most o(1/n) under μ. Then there exists a protocol Π′ (with input X and Y_i) for computing a singleton f with the following guarantee:
For all (x, y_i) ∈ f^{-1}(1), Pr_{π′}[π′(x, y_i) ≠ 1] < ε_1 + 0.01.
For all (x, y_i) ∈ f^{-1}(0), Pr_{π′}[π′(x, y_i) ≠ 0] < ε_0 + 0.01.
|Π′_a| ≤ a and I_b^μ(Π′) ≤ b/n.
In particular, |Π′_a| ≤ |Π_a|, that is, the total number of bits sent by Alice does not increase.

Before proving the theorem, we record the necessary facts from information theory.

Proposition 11. Suppose Y_1, ..., Y_n are all independent given X. Then

I(Π; Y_i|X, Y_1, ..., Y_{i−1}) ≥ I(Π; Y_i|X).

Proof. By our independence assumption, I(Y_i; Y_1, ..., Y_{i−1}|X) = 0. Applying Fact 26, we get the desired inequality. ∎

Lemma 12 (Semi Direct Sum). Suppose that, given X, the Y_i's are i.i.d. Then

(1/n) · I(Π; ~Y|X) ≥ I(Π; Y_i|X).

Proof. First we show that I(Π; ~Y|X) ≥ Σ_i I(Π; Y_i|X). By Fact 27 we have

I(Π; Y_1, ..., Y_n|X) = I(Π; Y_1|X) + I(Π; Y_2|Y_1, X) + ... + I(Π; Y_n|Y_1, ..., Y_{n−1}, X).

Each term I(Π; Y_i|Y_1, ..., Y_{i−1}, X) can be lower bounded by I(Π; Y_i|X) using Proposition 11, thanks to the independence assumption. Applying this bound term by term gives I(Π; ~Y|X) ≥ Σ_i I(Π; Y_i|X). By our assumption on the distribution, I(Π; Y_i|X) = I(Π; Y_j|X) for all i ≠ j, and the desired inequality follows. ∎

Proof of Theorem 10. Consider the following protocol Π′ (Protocol 1):
1. Alice and Bob jointly and publicly sample J ∈ [n].
2. Bob privately samples Y_1, ..., Y_{J−1}, Y_{J+1}, ..., Y_n.
3. Alice and Bob set X = x and Y_J = y, then run protocol Π on (x, ~y).

First observe that F(X, ~Y) = f(X, Y_J) with high probability, since by our assumption on the density of f^{-1}(0) under μ,

Pr[∨_{i≠J} f(X, Y_i) = 1] ≤ 1 − (1 − o(1/n))^{n−1} = o(1).

Therefore, Π′ satisfies the claimed error guarantee whenever Π does. Also |Π′_a| = |Π_a| by design. The bound on I_b^μ(Π′) follows from Proposition 11 and Lemma 12 via the following chain:

I(Π′; Y|X) = I(Π; Y|X) ≤ I(Π, J; Y|X) = I(J; Y|X) + I(Π; Y_J|J, X) = I(Π; Y_J|J, X) = (1/n) · Σ_{i=1}^{n} I(Π; Y_i|X) ≤ I(Π; Y_1, ..., Y_n|X)/n. ∎

As the contrapositive of Theorem 10, we obtain the following corollary, which will be the main component in establishing the asymmetric information lower bound.

Corollary 13. Suppose there exists a product distribution μ on X and Y with Pr_μ[f(x, y) = 1] = o(1/n) such that any protocol Π that computes f with Pr[f(x, y) ≠ π(x, y)] < ε for all (x, y) and with |Π_a| < a satisfies I_μ(Π; Y|X) ≥ b. Then there exists a (product) distribution μ^n such that any protocol Π^n that computes F with Pr[F(x, ~y) ≠ π^n(x, ~y)] < ε + 0.01 for all (x, ~y) and with |Π^n_a| < a satisfies I_{μ^n}(Π^n; ~Y|X) ≥ n · b. In other words, IC^{<a}(F, ε + 0.01) ≥ n · b.
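A minimal executable sketch of Protocol 1 from the proof of Theorem 10 (our own rendering; `Pi` and `sample_y` are hypothetical stand-ins for an arbitrary protocol computing F and for sampling a single Y_j from μ, and both parties are merged into one function for simplicity).

```python
import random

def reduced_protocol(Pi, x, y, n, sample_y):
    """Protocol 1: solve the single-copy problem f(x, y) by planting (x, y)
    into a random coordinate of an n-copy instance and running Pi on it.

    Pi       : callable Pi(x, ys) -> {0, 1}, any protocol computing F = OR_i f(x, y_i)
    sample_y : callable returning a fresh sample Y_j ~ mu (i.i.d. across j)
    """
    J = random.randrange(n)                  # step 1: publicly sampled coordinate J
    ys = [sample_y() for _ in range(n)]      # step 2: Bob's private samples
    ys[J] = y                                # step 3: plant the real input at coordinate J
    return Pi(x, ys)                         # run Pi; w.h.p. its answer equals f(x, y_J)

# toy usage: f(x, y) = [x == y], Pi computes the OR exactly
toy_Pi = lambda x, ys: int(any(x == yj for yj in ys))
print(reduced_protocol(toy_Pi, x=3, y=3, n=8, sample_y=lambda: random.randrange(10)))
```

Since f(X, Y_i) = 1 with probability o(1/n) under μ, with probability 1 − o(1) the planted coordinate is the only one that can make the OR evaluate to 1, which is where the additional 0.01 error in Theorem 10 comes from.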
In this section, we prove Theorem 3. To utilize Theorem 10, we focus on a prior free information cost lower bound for the following function on x, y ∈ {0, ..., M}^d:

f(x, y) := 1 if ||x − y||_∞ ≤ 1,  and  f(x, y) := 0 if ||x − y||_∞ ≥ c.

We focus on lower bounding IC^{<a}(f, ε) for some sufficiently small constant ε. The main idea of the proof is that any protocol achieving the desired error guarantee must distinguish f^{-1}(1) from f^{-1}(0) for any given x; in other words, it must decide whether y is in the neighborhood of x or not. To simplify the proof, we prove the following Compression Lemma (its proof is attached in Section C), which allows us to assume without loss of generality (at some minor cost) that the protocol has one round and that Bob's reply is a single bit.

Lemma 14 (Compression for Bob). Consider a protocol Π that computes f with I_{u^{⊗2}}(Π; X|Y) < I_a and I_{u^{⊗2}}(Π; Y|X) < I_b. Further assume that I_a · I_b = o(1). Then Π can be compressed to a one round protocol τ ∼ Π′ with I_{u^{⊗2}}(Π′; X|Y) < O(I_a) and I_{u^{⊗2}}(Π′; Y|X) < O(H(√(I_a·I_b))) and with the following guarantee:
For (x, y) ∈ f^{-1}(1), Pr[τ(x, y) ≠ 1] < ε_1 + 0.05.
There exists Z ⊆ f^{-1}(0) with μ(Z) ≥ μ(f^{-1}(0)) − 0.01 such that for all (x, y) ∈ Z, Pr[τ(x, y) ≠ 0] < ε_0 + 0.05,
where ε_0 = max_{(x,y)∈f^{-1}(0)} Pr[π(x, y) ≠ f(x, y)] and similarly ε_1 = max_{(x,y)∈f^{-1}(1)} Pr[π(x, y) ≠ f(x, y)].

We also prove the following simple observation, which allows us to assume without loss of generality that only Bob has access to private randomness in a single round protocol.

Lemma 15. Suppose there is a single round protocol Π in which Alice and Bob both have access to private and public randomness, Alice sends at most a bits (|Π_a| ≤ a), and Bob sends at most b bits of information (I(Π; Y|X) ≤ b). Then there is a protocol Π′ in which Alice does not have access to private randomness but |Π′_a| ≤ a and I(Π′; Y|X) ≤ b.

Proof. Consider the following simple modification Π′ of Π. Alice samples her randomness from additional public randomness R^a_pub (thereby revealing it to Bob) instead of using private randomness, and otherwise follows Π; Bob ignores R^a_pub and follows Π. By design |Π′_a| ≤ a. Let M_b denote the reply by Bob and M_a the message by Alice. Then

I(Π′; Y|X) = I(M_b; Y|X, R^a_pub, M_a) ≤ I(M_b; Y|X, M_a) = I(Π; Y|X),

since I(Y; R^a_pub|M_a, X) = 0. ∎

Lemma 15 implies that it suffices to prove a lower bound in the case where Alice has no private randomness. We then let u(~x), for a d-dimensional vector ~x, be the product of the u(x_i)'s over the coordinates. We then prove the main technical theorem, whose proof is attached in Section D.

Theorem 16 (Single Function Lower Bound). Let Π be a one round protocol in which Alice does not have access to private randomness, Bob replies with one bit, and which computes f with the following guarantee:
For (x, y) ∈ f^{-1}(1), Pr[π(x, y) ≠ 1] < 0.1.
There exists S ⊆ f^{-1}(0) with u(S) ≥ u(f^{-1}(0)) − 0.01 such that for all (x, y) ∈ S, Pr[π(x, y) ≠ 0] < 0.1.
For such Π and any sufficiently small constant δ > 0, if I_a = I_{(x,y)∼u^{⊗2}}(Π; X|Y) ≤ δρ log n, then I_b = I_{(x,y)∼u^{⊗2}}(Π; Y|X) ≥ n^{−O(δ)}.

The main intuition of the proof is to bound the ℓ1-norm distance between transcripts arising from (x, y) ∈ f^{-1}(1) and from (x, y) ∈ Z. If Bob sends less information than the claimed bound, then the protocol cannot distinguish f^{-1}(1) from f^{-1}(0) (more precisely, the corresponding transcript distributions are close in ℓ1-norm), which is a contradiction.
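The ℓ1 gap that this argument exploits can be seen on a toy numeric example (our own numbers; recall that after the compression of Lemma 14 the transcript, conditioned on Alice's message, is just Bob's single bit).

```python
def l1_distance(p, q):
    """l1 distance between two distributions given as dicts {outcome: prob}."""
    return sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in set(p) | set(q))

# distributions of Bob's one-bit reply for three input pairs
accept_a = {1: 0.95, 0: 0.05}   # some (x, y) in f^{-1}(1): outputs 1 w.p. >= 0.9
accept_b = {1: 0.92, 0: 0.08}   # another pair in f^{-1}(1)
reject   = {1: 0.03, 0: 0.97}   # a pair in Z (subset of f^{-1}(0)): outputs 0 w.p. >= 0.9

print(l1_distance(accept_a, accept_b))  # 0.06  -- same answer => transcripts close
print(l1_distance(accept_a, reject))    # 1.84  -- different answers => transcripts far
```

If Bob reveals too little information, the transcript distributions for inputs in f^{-1}(1) and in Z cannot remain this far apart, which is the contradiction derived in Section D.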
Combining Theorem 16 with Lemma 14 and Lemma 15, we get the following prior free bound as a corollary, conditioned on |Π_a| ≤ δρ log n.

Corollary 17. Let Π be any protocol that computes f with the following guarantee:
For all (x, y) ∈ f^{-1}(1), Pr[π(x, y) ≠ 1] < 0.05.
For all (x, y) ∈ f^{-1}(0), Pr[π(x, y) ≠ 0] < 0.05.
For such Π and any sufficiently small constant δ > 0, if I_a = I_{(x,y)∼u×u}(Π; X|Y) ≤ δρ log n, then I_b = I_{(x,y)∼u×u}(Π; Y|X) ≥ n^{−O(δ)}.

Proof. Suppose we have a protocol that computes f with this guarantee, with I_a ≤ δρ log n and I_b = I(Π; Y|X) < n^{−ω(δ)}. Then by Lemma 14, we can compress the protocol to a one-round protocol with a 1-bit response from Bob, with information cost O(δρ log n) and n^{−ω(δ)} respectively for Alice and Bob, and with the following error guarantee:
For all (x, y) ∈ f^{-1}(1), Pr[τ(x, y) ≠ 1] < 0.1.
There exists S ⊆ f^{-1}(0) with u(S) ≥ u(f^{-1}(0)) − 0.01 such that for all (x, y) ∈ S, Pr[τ(x, y) ≠ 0] < 0.1.
This contradicts Theorem 16. ∎

Finally, combining Corollary 17 with Corollary 13, together with the guarantee that f(x, y) = 1 with probability o(1/n) (proved in Claim 40), we get a lower bound for any protocol that computes F(x, ~y) with a prior free error guarantee, when the information cost is measured over the distribution in which X and all the Y_i's are distributed according to u.

Theorem 18. Let Π be any protocol that computes F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr[π(x, ~y) ≠ 1] < 0.04.
For all (x, ~y) ∈ F^{-1}(0), Pr[π(x, ~y) ≠ 0] < 0.04.
For such Π and any sufficiently small constant δ > 0, if |Π_a| ≤ δρ log n, then I_b = I(Π; ~Y|X) ≥ n^{1−O(δ)}.

Via the minimax argument (applying Theorem 2), we get the following distributional error version.

Corollary 19. For any sufficiently small constant δ > 0, there exists μ such that for any protocol Π that computes F with Pr_{(x,~y)∼μ}[π(x, ~y) ≠ F(x, ~y)] < 0.01, if |Π_a| ≤ δρ log n, then I_b = I(Π; ~Y|X) ≥ n^{1−O(δ)}.

This yields the desired data structure lower bounds via the translations in Section B, where we also give the definitions of the data structure models.

Corollary 20 (Cell-Probe Lower Bound). Consider any randomized cell-probe data structure solving d-dimensional near-neighbor search under ℓ∞ with approximation factor c = O(log_ρ log d), with the guarantee that if there exists y_i ∈ ~y close to x, the querier outputs 1 with probability > 0.95. If the word size is w = n^{1−δ} for some δ > 0, then the data structure requires space n^{Ω(ρ/t)} for query time t.

Corollary 21 (Decision Tree Lower Bound). Let δ > 0 be an arbitrary constant. A decision tree of depth r = n^{1−2δ} and node size w = n^{δ} that solves d-dimensional near-neighbor search under ℓ∞ with approximation c = O(log_ρ log d) must have size s = n^{Ω(ρ)}.

A Information Theory

In this section, we provide the necessary background on information theory and information complexity used in this paper. For further reference, we refer the reader to [7] and [6, 3, 4, 2, 5].

Definition 22 (Entropy). The entropy of a random variable X is defined as

H(X) := Σ_x Pr[X = x] log(1/Pr[X = x]).

Similarly, the conditional entropy is defined as

H(X|Y) := E_Y[ Σ_x Pr[X = x|Y = y] log(1/Pr[X = x|Y = y]) ].

As an abuse of notation, we also write H for the binary entropy function H : [0, 1] → [0, 1],

H(p) := p log(1/p) + (1 − p) log(1/(1 − p)).

Definition 23 (Mutual Information). The mutual information between X and Y (conditioned on Z) is defined as

I(X; Y) := H(X) − H(X|Y),   I(X; Y|Z) := H(X|Z) − H(X|Y, Z).

Definition 24 (KL-Divergence). The KL-Divergence between two distributions μ and ν is defined as

D(μ||ν) := Σ_x μ(x) log(μ(x)/ν(x)).
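A small self-contained Python version of Definitions 22-24 (our own helper functions; distributions are represented as dicts mapping outcomes to probabilities), together with the standard identity I(X; Y) = H(X) − H(X|Y).

```python
from math import log2

def entropy(p):
    """H(X) = sum_x p(x) log(1/p(x))."""
    return sum(px * log2(1 / px) for px in p.values() if px > 0)

def cond_entropy(joint):
    """H(X|Y) from a joint distribution {(x, y): prob}."""
    py = {}
    for (x, y), pr in joint.items():
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * log2(py[y] / pr) for (x, y), pr in joint.items() if pr > 0)

def kl(p, q):
    """D(p || q) = sum_x p(x) log(p(x)/q(x)); assumes supp(p) is contained in supp(q)."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

# joint distribution of (X, Y): X is a uniform bit, Y = X with probability 0.9
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 1): 0.45, (1, 0): 0.05}
px = {0: 0.5, 1: 0.5}

mi = entropy(px) - cond_entropy(joint)              # I(X; Y) = H(X) - H(X|Y)
print(round(mi, 4))                                 # about 0.531 bits
print(round(kl({0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}), 4))   # D(uniform || biased) ~ 0.737
```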
In order to bound mutual information, it suffices to bound the KL-divergence, due to the following fact.

Fact 25 (KL-Divergence and Mutual Information). The following equality between mutual information and KL-Divergence holds:

I(A; B|C) = E_{B,C}[ D( A|_{B=b,C=c} || A|_{C=c} ) ].

Fact 26. Let A, B, C, D be random variables such that I(A; D|C) = 0. Then

I(A; B|C) ≤ I(A; B|C, D).

Fact 27 (Mutual Information Chain Rule).

I(A; B_1, ..., B_n|C) = Σ_{i=1}^{n} I(A; B_i|C, B_1, ..., B_{i−1}).

Fact 28. Let p(x, y) and q(x, y) be joint distributions of two random variables X and Y. Then

D(p(x, y)||q(x, y)) = D(p(x)||q(x)) + E_{x∼p}[ D(p(y|x)||q(y|x)) ].

To provide context for the asymmetric information costs introduced in Section 2, we first recall the usual definition of the information cost of a protocol.

Definition 29 (Information Cost). The information cost of a protocol Π, with input X for Alice and Y for Bob, under distribution μ is defined as

IC_μ(Π) = I_μ(Π; X|Y) + I_μ(Π; Y|X),

where I_μ(Π; X|Y) denotes the mutual information when X and Y are distributed according to μ, and similarly for I_μ(Π; Y|X).

Intuitively, the I(Π; X|Y) term captures how much information Alice must inject into the transcript to compute F, and the I(Π; Y|X) term captures how much information Bob must inject into the transcript. This intuition is useful in defining the analogous quantities for the asymmetric communication setting.

B Data Structure and Asymmetric Communication Lower Bound

In this section, we introduce the relevant data structure models, for which asymmetric communication lower bounds translate into lower bounds.

Cell Probe Data Structure. The cell probe data structure model, which is similar to a Random Access Machine, is formally defined as follows.

Definition 30 (Cell-Probe). A cell probe data structure is the following model for computing F. Alice (the user) sends a cell address in each round. Bob (the database) answers with the contents of the queried cell. The word size w is the maximum number of bits in a cell.

The following classic "translation" theorem holds.

Theorem 31 (Lemma 1 of [12]). If there exists a (randomized) cell-probe data structure that solves F with s cells, word size w and query time t, then there exists a (2t log s, 2tw)-(randomized) communication protocol for F.

In other words, asymmetric communication lower bounds translate to cell-probe lower bounds. This is the key observation in [1].

Decision Tree Data Structure. The decision tree data structure model is formally defined as follows.

Definition 32 (Decision Tree). Depending on ~y and the randomness, Bob constructs a decision tree T_{~y}, a complete binary tree, in which:
each node v contains a predicate function f_v : X → {0, 1} with f_v ∈ F, the set of allowed predicates;
each edge is labeled 0 or 1, denoting the answer to the parent's f_v;
each leaf is labeled 0 or 1, denoting the final output for F.
The size s of the tree is its number of nodes, the depth r is its depth, and the predicate size is w = log_2 |F|.

Under this model, a more efficient translation theorem holds, which results in a better lower bound for the decision tree model.

Theorem 33 ([1]). If there exists a (randomized) decision tree that solves F with size s, depth d, and node size w, then there exists an (O(log s), O(dw log s))-(randomized) communication protocol for F.
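A schematic Python sketch of the translation in Theorem 31 (our own rendering under assumed interfaces: `query_algorithm` is a generator driving the probes on the user's side, and `memory` is the database's table of s cells of w bits each).

```python
import math

def simulate_cell_probe(query_algorithm, memory, s, w):
    """Simulate a cell-probe query as a two-party protocol: per probe, Alice
    sends a cell address (ceil(log2 s) bits) and Bob replies with the cell
    contents (w bits). Returns (query output, Alice's bits, Bob's bits)."""
    alice_bits = bob_bits = 0
    gen = query_algorithm()            # generator yielding addresses, receiving cell contents
    try:
        address = next(gen)
        while True:
            alice_bits += math.ceil(math.log2(s))
            contents = memory[address]         # w-bit word held by Bob
            bob_bits += w
            address = gen.send(contents)
    except StopIteration as stop:
        return stop.value, alice_bits, bob_bits

# toy data structure: memory[i] stores i*i; the "query" probes two cells and sums them
memory = [i * i for i in range(16)]
def query_algorithm():
    a = yield 3          # probe cell 3
    b = yield 5          # probe cell 5
    return a + b         # output of the querier

print(simulate_cell_probe(query_algorithm, memory, s=16, w=8))  # (34, 8, 16)
```

Each probe costs Alice ceil(log2 s) bits for the address and Bob w bits for the contents; allowing both players to speak in every round, as in the statement of Theorem 31, gives the (2t log s, 2tw) bound.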
As a corollary of Lemma 14 and Theorem 3, we obtain the data structure lower bounds below. Combining Theorem 31 with Theorem 18 (or Corollary 19, depending on the error guarantee), we get the following corollary.

Corollary 20. Consider any randomized cell-probe data structure solving d-dimensional near-neighbor search under ℓ∞ with approximation factor c = O(log_ρ log d), with the guarantee that if there exists y_i ∈ ~y close to x, the querier outputs 1 with probability > 0.95. If the word size is w = n^{1−δ} for some δ > 0, then the data structure requires space n^{Ω(ρ/t)} for query time t.

Combining Theorem 33 with Theorem 18 (or Corollary 19, depending on the error guarantee), we get the following corollary.

Corollary 21. Let δ > 0 be an arbitrary constant. A decision tree of depth r = n^{1−2δ} and node size w = n^{δ} that solves d-dimensional near-neighbor search under ℓ∞ with approximation c = O(log_ρ log d) must have size s = n^{Ω(ρ)}.

C Proof of Compression Lemma

In this section, we show that any protocol in which Alice sends at most I_a bits of information and Bob sends at most I_b bits of information, with I_b ≪ 1 and I_a · I_b = o(1), can be compressed to a one-round protocol in which Alice sends at most O(I_a) bits and Bob sends a single bit, revealing at most O(H(√(I_a·I_b))) bits of information. First we use the following compression theorem to compress the size of the transcript, where τ denotes the resulting compressed protocol.

Theorem 34 (Theorem 1.2 in [15]). Suppose I_a = ω(1). Then any protocol Π in which Alice sends at most I_a bits of information and Bob sends at most I_b bits of information can be simulated with O(I_a · 2^{O(I_b)}) bits of communication with the following guarantee:
For (x, y) ∈ F^{-1}(1), Pr[π(x, y) ≠ τ(x, y)] < ε_1 + 0.01.
There exists Γ ⊆ F^{-1}(0) with μ(Γ) ≥ μ(F^{-1}(0)) − 0.01 such that for all (x, y) ∈ Γ, Pr[π(x, y) ≠ τ(x, y)] < ε_0 + 0.01,
where ε_1 and ε_0 refer to the respective guarantees of the original protocol Π.

Remark. Theorem 1.2 in [15] is not stated in the above form. But setting the output to 1 on high-divergence (x, y) pairs and adjusting the constants gives the above guarantee. Also note that 2^{O(I_b)} ≤ 2 in our setting, where I_b ≪ 1.

Theorem 34 allows us to assume that the length of the protocol, |Π| = |Π_a| + |Π_b|, is at most O(I_a) in our setting. We now introduce the following round compression protocol; its guarantee is Lemma 14.

Protocol 2 (Compression Protocol):
1. Alice samples π ∼ Π_x, the distribution of the protocol conditioned on Alice's input x.
2. Alice sends π to Bob.
3. Bob answers 1 if π agrees with his input y and his private randomness r_b, and rejects otherwise.

Lemma 14 (restated). Consider a protocol Π that computes f under the distribution μ with I(Π; X|Y) < I_a and I(Π; Y|X) < I_b. Further assume that I_a · I_b = o(1). Then Π can be compressed to a one round protocol τ ∼ Π′ with I(Π′; X|Y) < O(I_a) and I(Π′; Y|X) < O(H(√(I_a·I_b))) and with the following guarantee:
For (x, y) ∈ f^{-1}(1), Pr[τ(x, y) ≠ 1] < ε_1 + 0.05.
There exists Z ⊆ f^{-1}(0) with μ(Z) ≥ μ(f^{-1}(0)) − 0.01 such that for all (x, y) ∈ Z, Pr[τ(x, y) ≠ 0] < ε_0 + 0.05,
where ε_0 = max_{(x,y)∈f^{-1}(0)} Pr[π(x, y) ≠ f(x, y)] and similarly ε_1 = max_{(x,y)∈f^{-1}(1)} Pr[π(x, y) ≠ f(x, y)].

Towards proving Lemma 14, we first record the following two claims.

Claim 35. Let Π_{x,y} denote the distribution of the transcript conditioned on X = x and Y = y, and let Π_x denote the distribution conditioned on X = x alone. Similarly, let M_i|M_{<i},x,y and M_i|M_{<i},x denote the distribution of the message in the i-th round conditioned on all previous messages and on (x, y), or on x alone. Then

D(Π_{x,y}||Π_x) = Σ_{i=1}^{R} E_{M_{<i}|x,y}[ D( M_i|M_{<i},x,y || M_i|M_{<i},x ) ].

Proof. Follows from Fact 28. ∎

Claim 36. Consider R non-negative numbers I_1, ..., I_R. Then

Σ_{r=1}^{R} √(I_r) ≤ √( R · Σ_{r=1}^{R} I_r ).

Proof. Follows from Cauchy-Schwarz. ∎
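A quick numeric sanity check of Claim 36 on random non-negative inputs (our own check; it is not part of the paper's argument).

```python
import math
import random

random.seed(0)
for _ in range(1000):
    R = random.randint(1, 20)
    I = [random.uniform(0, 5) for _ in range(R)]
    lhs = sum(math.sqrt(x) for x in I)
    rhs = math.sqrt(R * sum(I))
    assert lhs <= rhs + 1e-12      # Cauchy-Schwarz: sum sqrt(I_r) <= sqrt(R * sum I_r)
print("Claim 36 holds on all sampled instances")
```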
Proof of Lemma 14. Note that without loss of generality, the total number of bits communicated in Protocol 2 is |Π| + 1 = O(I_a), since one can first compress the transcript using Theorem 34. This immediately implies that I(Π′; X|Y) < O(I_a), since the number of bits sent by Alice is at most O(I_a). Similarly, it implies that the number of rounds R is at most O(I_a).

To prove the full lemma, it then suffices to bound the probability of Bob answering 0. We do so by bounding the probability that Bob rejects the transcript at each round i + 1 (i.e., at the odd-round messages) and then summing these probabilities. Let M_i denote the message sent in the i-th round of the protocol. If i + 1 is odd (a round in which Bob sends a message), then M_{i+1}|M_{≤i},X,Y = M_{i+1}|M_{≤i},Y, since Π is a protocol and Bob's response depends only on Y and M_{≤i}. By Pinsker's inequality, the probability of a sampling error at round i + 1 can be bounded as

E_{M_{≤i},X,Y}[ ||M_{i+1}|M_{≤i},X,Y − M_{i+1}|M_{≤i},X||_1 ]
  ≤ O( E_{M_{≤i},X,Y}[ √( D( M_{i+1}|M_{≤i},X,Y || M_{i+1}|M_{≤i},X ) ) ] )
  ≤ O( √( E_{M_{≤i},X,Y}[ D( M_{i+1}|M_{≤i},X,Y || M_{i+1}|M_{≤i},X ) ] ) ),    (1)

where the last inequality uses the concavity of √x. Summing over the odd rounds, the total probability of error is bounded by

Σ_{i odd} E_{M_{≤i},X,Y}[ ||M_{i+1}|M_{≤i},X,Y − M_{i+1}|M_{≤i},X||_1 ]
  ≤ O( Σ_{i odd} √( E_{M_{≤i},X,Y}[ D( M_{i+1}|M_{≤i},X,Y || M_{i+1}|M_{≤i},X ) ] ) )
  ≤ O( √( R · Σ_{i odd} E_{M_{≤i},X,Y}[ D( M_{i+1}|M_{≤i},X,Y || M_{i+1}|M_{≤i},X ) ] ) )
  ≤ O( √( R · E_{X,Y}[ D(Π_{X,Y}||Π_X) ] ) ) = O( √(R · I_b) ),    (2)

where the second inequality follows from Claim 36 and the third from Claim 35. Now R ≤ O(I_a), since the total number of bits communicated, which upper bounds the number of rounds, is O(I_a) by Theorem 34. This ℓ1-norm bound implies that Bob answers 0 with probability at most O(√(I_a·I_b)) in expectation over (x, y). Then the amount of information that Alice learns is at most O(H(√(I_a·I_b))).

It remains to show that the error rate guarantee is preserved. We consider two cases. If f(x, y) = 1, by design no guarantee is lost on (x, y) ∈ f^{-1}(1), except for the loss from Theorem 34. Now suppose f(x, y) = 0. Recall that the compression fails with probability at most O(√(I_a·I_b)) in expectation. Let Z_0 := {(x, y) : ||Π|X=x,Y=y − Π|X=x||_1 < (I_a·I_b)^{2/3}}. By Markov's inequality, μ(Z_0) ≥ 1 − o(1). Set Z = Z_0 ∩ Γ ∩ f^{-1}(0), where Γ is the set from Theorem 34. It is easy to check that the probability of error increases only by o(1) for any (x, y) ∈ Z, since (I_a·I_b)^{2/3} = o(1) by assumption, while being in Γ increases the error rate only by 0.01. Thus Z indeed satisfies the guarantee. ∎

D Omitted Proof from Section 4

Recall that u(~x) is defined as the product of the u(x_i)'s over the coordinates. Under such u, we make use of the following lemma from [1].

Lemma 37 (Isoperimetric Inequality (Lemma 9 of [1])). Consider any set S ⊆ {0, ..., M}^d. Let N(S) be the set of points at distance at most 1 from S under ℓ∞. Then u(N(S)) ≥ u(S)^{1/ρ}.

Then we get the following corollary in terms of KL-Divergence.

Corollary 38. Let u|_S denote u restricted to a subset S ⊆ {0, ..., M}^d. Then for any S,

u(N(S)) ≥ 2^{−D(u|_S||u)/ρ}.

Proof. Observe that D(u|_S||u) = −log u(S), i.e., u(S) = 2^{−D(u|_S||u)}. Then Lemma 37 implies the desired inequality. ∎

We also state the following simple fact about norms.

Fact 39 (ℓ1-norm convexity). Consider a distribution μ and a family of distributions {ν_i}_i. Then

||μ − E_i[ν_i]||_1 ≤ E_i[ ||μ − ν_i||_1 ].

Proof. Follows from the convexity of the ℓ1-norm. ∎
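Fact 39 can similarly be sanity-checked numerically (again our own check, with randomly generated distributions on a small support).

```python
import random

random.seed(1)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

k, m = 4, 6                                        # support size, number of nu_i's
mu  = normalize([random.random() for _ in range(k)])
nus = [normalize([random.random() for _ in range(k)]) for _ in range(m)]
avg_nu = [sum(nu[j] for nu in nus) / m for j in range(k)]

lhs = l1(mu, avg_nu)                               # ||mu - E_i[nu_i]||_1
rhs = sum(l1(mu, nu) for nu in nus) / m            # E_i[ ||mu - nu_i||_1 ]
print(lhs <= rhs + 1e-12, round(lhs, 3), round(rhs, 3))
```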
Now we are ready to prove Theorem 16.

Proof of Theorem 16. Let a := δρ log n and b := 2^{−2000a/ρ}. Recall that we may assume the protocol is one round; therefore the distribution of the protocol Π is over Alice's message π_a and Bob's message π_b. Let Π_b|π_a be the distribution of Bob's message given that Alice's message is π_a. If π_b ∼ Π_b|π_a, note that each of Bob's messages π_b, conditioned on π_a, induces a prior μ_{π_b} on Bob's input y. Suppose further that E_{π_b∼Π_b|π_a}[ D(μ_{π_b}||u) ] < 10b; such a message π_a must exist by a Markov argument. Since Alice has no private randomness, the posterior on X conditioned on π_a is u restricted to a subset of {0, ..., M}^d, which we denote S_{π_a}. With π_a fixed, and with a slight abuse of notation, let Π_{x,y} denote the distribution of the remaining transcript (Bob's message Π_b) conditioned on the input being (x, y) and on Alice's message, and let μ denote the distribution of x conditioned on the message π_a, that is, u|_{S_{π_a}}.

Note that Π_{x,y_1} and Π_{x,y_2} are close in ℓ1-norm whenever f(x, y_1) = f(x, y_2) = 1 or (x, y_1), (x, y_2) ∈ Z, since the protocol is one round and Alice's message is fixed. In particular, since both Π_{x,y_1} and Π_{x,y_2} output f(x, y_1) = f(x, y_2) with probability at least 0.9 on f^{-1}(1) and on Z,

||Π_{x,y_1} − Π_{x,y_2}||_1 < 0.2.    (3)

Similarly, if (x, y_1) ∈ Z and f(x, y_2) = 1, then ||Π_{x,y_1} − Π_{x,y_2}||_1 > 1.8. Now consider

ν_{f^{-1}(0)} := E_{x∼μ} E_{y∼u : (x,y)∈Z}[ Π_{x,y} ],   ν_{f^{-1}(1)} := E_{x∼μ} E_{y∼u : f(x,y)=1}[ Π_{x,y} ].

First observe from Fact 39 that if (x, y) ∈ Z,

||Π_{x,y} − ν_{f^{-1}(0)}||_1 ≤ E_{x∼μ} E_{y′∼u : (x,y′)∈Z}[ ||Π_{x,y} − Π_{x,y′}||_1 ] < 0.2,    (4)

and similarly, if f(x, y) = 1,

||Π_{x,y} − ν_{f^{-1}(1)}||_1 ≤ E_{x∼μ} E_{y′∼u : f(x,y′)=1}[ ||Π_{x,y} − Π_{x,y′}||_1 ] < 0.2.    (5)

Then by the triangle inequality, if (x, y_1) ∈ Z and f(x, y_2) = 1,

1.8 < ||Π_{x,y_1} − Π_{x,y_2}||_1 ≤ ||Π_{x,y_1} − ν_{f^{-1}(0)}||_1 + ||ν_{f^{-1}(0)} − ν_{f^{-1}(1)}||_1 + ||Π_{x,y_2} − ν_{f^{-1}(1)}||_1.

Combining (3), (4) and (5) we get ||ν_{f^{-1}(0)} − ν_{f^{-1}(1)}||_1 > 1.4. Now, to derive the contradiction, we define ν and ν′ as follows:

ν := Pr_{x∼μ, y∼u}[(x, y) ∈ Z] · ν_{f^{-1}(0)} + Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)] · ν_{f^{-1}(1)} + Pr_{x∼μ, y∼u}[(x, y) ∉ Z ∪ f^{-1}(1)] · ν_{f^{-1}(∗)},

where ν_{f^{-1}(∗)} denotes the expected transcript distribution over inputs that are neither in Z nor in f^{-1}(1). Then we have

||ν′ − ν||_1 ≥ || Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)] · (ν_{f^{-1}(0)} − ν_{f^{-1}(1)}) ||_1
  = Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)] · ||ν_{f^{-1}(0)} − ν_{f^{-1}(1)}||_1
  ≥ Ω( Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)] ),

since the difference is minimized when ν_{f^{-1}(∗)} = ν_{f^{-1}(0)}. Via Pinsker's inequality this implies

D(ν′||ν) ≥ Ω( Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)]^2 ).

Moreover,

Pr_{x∼μ, y∼u}[(x, y) ∈ f^{-1}(1)]^2 = u(N(Supp(μ)))^2 ≥ 2^{−2D(μ||u)/ρ}.

Claim 40 (Density). Set ρ = (ε log d)^{1/c} and d ≥ (2 log n)^{1/(1−ε)}. Then

Pr_{(x,y)∼u×u}[ f(x, y) = 0 ] ≥ 1 − o(1/n).

Proof. We prove this by bounding Pr_{(x,y)∼u×u}[ ||x − y||_∞ ≤ c ].

References

[1] Alexandr Andoni, Dorian Croitoru, and Mihai Patrascu. Hardness of nearest neighbor under l-infinity. In Foundations of Computer Science, 2008 (FOCS'08), 49th Annual IEEE Symposium on, pages 424-433. IEEE, 2008.
[2] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. SIAM Journal on Computing, 42(3):1327-1363, 2013.
[3] Mark Braverman. Interactive information complexity. SIAM Journal on Computing, 44(6):1698-1739, 2015. doi:10.1137/130938517.
[4] Mark Braverman, Ankit Garg, Denis Pankratov, and Omri Weinstein. From information to exact communication. In Proceedings of the forty-fifth annual ACM Symposium on Theory of Computing, pages 151-160. ACM, 2013.
[5] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Transactions on Information Theory, 60(10):6058-6069, 2014.
[6] Amit Chakrabarti, Yaoyun Shi, Anthony Wirth, and Andrew Yao. Informational complexity and the direct sum problem for simultaneous message complexity. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 270-278. IEEE, 2001.
[7] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
[8] Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. Sparse and lopsided set disjointness via information theory. In Anupam Gupta, Klaus Jansen, José Rolim, and Rocco Servedio, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 517-528, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.
[9] Piotr Indyk. On approximate nearest neighbors under l∞ norm. Journal of Computer and System Sciences, 63(4):627-638, 2001.
[10] Michael Kapralov and Rina Panigrahy. NNS lower bounds via metric expansion for l∞ and EMD. In International Colloquium on Automata, Languages, and Programming, pages 545-556. Springer, 2012.
[11] Peter Bro Miltersen. Lower bounds for union-split-find related problems on random access machines. In Proceedings of the twenty-sixth annual ACM Symposium on Theory of Computing, pages 625-634. ACM, 1994.
[12] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asymmetric communication complexity. In Proceedings of the twenty-seventh annual ACM Symposium on Theory of Computing, pages 103-111. ACM, 1995.
[13] Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower bounds on near neighbor search via metric expansion. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 805-814. IEEE, 2010.
[14] Mihai Patrascu. Lower Bound Techniques for Data Structures. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2008. AAI0821553.
[15] Sivaramakrishnan Natarajan Ramamoorthy and Anup Rao. How to compress asymmetric communication. In Proceedings of the 30th Conference on Computational Complexity, pages 102-123. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.


This is a preview of a remote PDF: http://drops.dagstuhl.de/opus/volltexte/2018/9410/pdf/LIPIcs-APPROX-RANDOM-2018-6.pdf

Mark Braverman, Young Kun Ko. Semi-Direct Sum Theorem and Nearest Neighbor under l_infty, LIPICS - Leibniz International Proceedings in Informatics, 2018, 6:1-6:17, DOI: 10.4230/LIPIcs.APPROX-RANDOM.2018.6