#### Semi-Direct Sum Theorem and Nearest Neighbor under ℓ∞

LIPIcs.APPROX-RANDOM.

Young Kun Ko and Mark Braverman
Department of Computer Science, Princeton University, 35 Olden St., Princeton NJ 08540, USA
We introduce the semi-direct sum theorem as a framework for proving asymmetric communication lower bounds for functions of the form F(x, ~y) = ⋁_{i=1}^n f(x, y_i). Utilizing tools developed in proving the direct sum theorem for information complexity, we show that if the function is of this form, where Alice is given x and Bob is given the y_i's, it suffices to prove a lower bound for a single copy f(x, y_i). This opens a new avenue of attack, other than the conventional combinatorial technique (i.e., the "richness lemma" from [12]), for proving randomized asymmetric communication lower bounds for functions of this form. As the main technical result and an application of the semi-direct sum framework, we prove an information lower bound on c-approximate Nearest Neighbor (ANN) under ℓ∞, which implies that the algorithm of [9] for c-approximate Nearest Neighbor under ℓ∞ is optimal even under randomization, for both the decision tree and the cell probe data structure model (under certain parameter assumptions for the latter). In particular, this shows that randomization cannot improve upon [9] in the decision tree model. Previously, only a deterministic lower bound was known [1], along with a randomized lower bound for the cell probe model [10]. We suspect further applications of our framework in exhibiting randomized asymmetric communication lower bounds for big data applications.

2012 ACM Subject Classification Theory of computation → Communication complexity

Keywords and phrases Asymmetric Communication Lower Bound; Data Structure Lower Bound; Nearest Neighbor Search

1 Research supported in part by NSF Awards DMS-1128155, CCF-1525342, and CCF-1149888, a Packard Fellowship in Science and Engineering, and the Simons Collaboration on Algorithms and Geometry. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Introduction

The Direct Sum Theorem in communication and information complexity is a key technique for lower bounding the communication and information complexity of computing functions of the following form, where Alice is given a length-n string ~x and Bob is given another length-n string ~y according to some distribution:

F(~x, ~y) = ⋁_{i=1}^n f(x_i, y_i).

Many fundamental functions, namely Disjointness and Equality (under de Morgan's Law), are of the above form. The Direct Sum Theorem allows us to reduce computing f to computing F. This implies that a lower bound for f can be translated to a lower bound for F. Since f is a function over a smaller number of bits (a 1-bit AND in the case of Disjointness), proving lower bounds for f is much easier than for F from the technical perspective.
Asymmetric communication complexity addresses a different setting, where the bit-size of Bob's input is much larger than Alice's. One example of such a setting is determining whether S ∩ T = ∅ when |S| ≪ |T| (Lopsided Disjointness). In this setting, it is more meaningful to lower bound the lengths of the messages sent by Alice and by Bob separately, instead of bounding the total length of the transcript as in the symmetric setting. In the asymmetric setting, the trivial protocol where Alice sends her whole input is usually the most efficient protocol in terms of the total number of bits communicated between Alice and Bob.
Lower bounds in asymmetric communication are interesting not only from the perspective of pure communication complexity theory but also for their applications to data structure lower bounds, as seen in [11, 12, 14]. It is now a well-taught fact in graduate communication complexity classes that asymmetric communication lower bounds translate to data structure lower bounds (in the cell probe and decision tree models). We explain this connection formally in the appendix, Section B.
The key technical tool used for showing asymmetric communication lower bounds is the "Richness Lemma" from [12]. This lemma is an extension of the monochromatic rectangle lower bound for symmetric communication complexity. The main observation is that lower bounds on the height and width of any monochromatic rectangle translate to lower bounds on asymmetric communication complexity. But as in the symmetric communication setting, rectangle lower bounds usually require complicated combinatorial lemmas. Further adding to this technical complication, the Richness Lemma is also not a great tool for analyzing the performance of randomized protocols. For randomized protocols, rectangles are no longer monochromatic but are only required to be "roughly" monochromatic, which makes it harder to argue that such rectangles do not exist.
To avoid the technical complications of asymmetric communication complexity, [13, 10] introduce a notion of "robust expansion" to lower bound the number of cells required in the cell-probe data structure model, even under randomization. However, this method does not imply a lower bound in the decision tree model, as shown in [1] for the deterministic case.
Instead of using the Richness Lemma, we take an information theoretic approach to establishing asymmetric communication lower bounds, similar to the one observed in [8] for Lopsided Disjointness. But we emphasize that, unlike in [8] or Lopsided Disjointness, the functions under consideration are of the form

F(x, ~y) = ⋁_{i=1}^n f(x, y_i).
These functions are especially interesting for big data applications. In particular, our result provides a lower bound for the following type of query:
Alice (user) with input x queries Bob (database) with y_1, . . . , y_n on whether there exists a data point that satisfies a certain condition (i.e., f(x, y_i)).
More importantly, unlike in [8], Alice's input is recycled over the singleton function f, while Bob's inputs are all distinct. Therefore, a simple application of the Direct Sum Theorem does not suffice for this application.
As in the Direct Sum Theorem, we reduce the task of computing f to computing F. Indeed, as in the symmetric communication setting, a direct sum does not necessarily hold if one considers the total length of the transcript, that is, the number of communicated bits. Instead, we introduce an "asymmetric information cost," and an analogous "asymmetric information complexity," in the spirit of [8, 15], as the key complexity measure in asymmetric communication. We then show that asymmetric information cost lower bounds asymmetric communication cost, by an argument analogous to the one in the symmetric communication setting.
1.1 Technical Results
Semi-Direct Sum
With a properly defined notion of asymmetric information cost, it is rather straightforward to prove the following semi-direct sum theorem.
I Theorem 1 (Semi-Direct Sum (informal)). Consider a protocol Π for computing F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr_π[π(x, ~y) ≠ 1] < ε_1
For all (x, ~y) ∈ F^{-1}(0), Pr_π[π(x, ~y) ≠ 0] < ε_0
that is, with a prior-free error guarantee. Suppose μ is a distribution such that X, Y_1, . . . , Y_n are i.i.d. Furthermore, suppose f(X, Y_i) = 1 with probability at most o(1/n) under μ. Then there exists a protocol Π′ (with input X and Y_i) for computing a singleton f with the following guarantee:
For all (x, y_i) ∈ f^{-1}(1), Pr_{π′}[π′(x, y_i) ≠ 1] < ε_1 + 0.01
For all (x, y_i) ∈ f^{-1}(0), Pr_{π′}[π′(x, y_i) ≠ 0] < ε_0 + 0.01
with n · I_{(x,y_i)∼μ}(Π′; Y_i|X) ≤ I_{(x,~y)∼μ}(Π; Y_1, . . . , Y_n|X) and |Π′_a| ≤ |Π_a|, where |Π_a| denotes the total number of bits sent by Alice.
This allows us to translate a lower bound on a singleton function f to a lower bound on the whole F when the distribution for measuring the information cost is i.i.d. In particular, if one shows that any Π′ with the prior-free error guarantee requires I_{(x,y_i)∼μ}(Π′; Y_i|X) > β, this implies that Π requires I_{(x,~y)∼μ}(Π; ~Y|X) > βn.
We emphasize that the error guarantee is "prior free" in the sense that the protocol must be correct (up to some error parameters) for all inputs, instead of being correct on average under some distribution. In particular, the probability of error is only over the randomness of the protocol, not the input. Under this guarantee, we can convert the prior-free lower bound to a lower bound for a fixed prior using a minimax argument as in [3]. More formally, we have the following theorem.
I Theorem 2 (Minimax Theorem). Fix some 0 < α < 1. Consider a protocol Π for computing F such that for any (x, ~y), Pr_π[π(x, ~y) ≠ F(x, ~y)] < ε/α. Then there exists a "hard" distribution μ on X and ~Y such that for any protocol Π′ with Pr_{(x,~y)∼μ}[π′(x, ~y) ≠ F(x, ~y)] < ε,

I_{(x,~y)∼μ}(Π′; ~Y|X) / (1 − α) ≥ sup_ν I_{(x,~y)∼ν}(Π; ~Y|X).
Intuitively, this implies that the asymmetric information costs of protocols with a distributional error guarantee and with a prior-free error guarantee are within a constant factor of each other. In particular, setting α = 1/2 and exhibiting a lower bound on I_{(x,~y)∼ν}(Π; ~Y|X) for a particular ν (which is a product distribution, via the semi-direct sum theorem), we exhibit the existence of a hard distribution μ such that any protocol that errs with probability at most ε on average must reveal a large amount of information about ~Y.
The main technical disadvantage of the rectangle bound is that it is extremely challenging to give any bounds for non-product distributions over the Y_i's, as mentioned in [1]. We avoid this complication by a standard technique in information complexity. We first prove a lower bound for prior-free protocols. Then, using a minimax argument, we can argue that a hard distribution exists without having to construct one explicitly.
Nearest Neighbor Lower Bound
As the main application of our approach, we revisit c-approximate Nearest Neighbor Search under the ℓ∞-norm, improving the corresponding asymmetric communication lower bound to an asymmetric information lower bound. This strengthens the deterministic asymmetric communication lower bound given in [1] to a randomized asymmetric communication lower bound, and yields two corollaries, for the decision tree model and the cell-probe model respectively. (1) For the cell-probe model, this reproves and simplifies the proof of [10]; (2) for the decision tree model, this shows that [9] is tight even under randomization, substantially improving upon [1], which only proved tightness for the deterministic decision tree model. Therefore, this closes the remaining gap for c-approximate Nearest Neighbor Search under the ℓ∞-norm in the (randomized) decision tree model.
Formally, consider the partial function

F(x, ~y) =
  1 if ∃ y_i : ‖x − y_i‖_∞ ≤ 1
  0 if ∀ y_i : ‖x − y_i‖_∞ ≥ c
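For concreteness, the promise problem above can be evaluated by brute force. The following is an illustrative sketch (not from the paper); the function name and the use of `None` for inputs outside the promise are our own conventions:

```python
def ann_linf(x, ys, c):
    """Evaluate the l_infinity near-neighbor promise problem F(x, ~y).

    Returns 1 if some database point y_i lies within distance 1 of x,
    0 if every y_i lies at distance at least c, and None when the input
    violates the promise (some y_i at an intermediate distance).
    """
    dists = [max(abs(xi - yi) for xi, yi in zip(x, y)) for y in ys]
    if any(d <= 1 for d in dists):
        return 1
    if all(d >= c for d in dists):
        return 0
    return None  # outside the promise: no distance-1 neighbor, but some y_i closer than c
```

A data structure is only required to answer correctly on inputs where this returns 0 or 1.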
for c > 3, with x, y_1, . . . , y_n ∈ {0, . . . , M}^d, where Alice (user) is given x and Bob (database) is given y_1, . . . , y_n. The goal of the database is to compute F. [9] showed an unorthodox yet efficient deterministic data structure which achieves O(log_ρ log d) approximation using O(d · n^ρ · poly log n) space. Surprisingly, [1] showed the optimality of [9] among all deterministic data structures (in the decision tree model), while [10] showed that this is optimal for (randomized) cell probe data structures with word size n^{o(1)}. Whether randomization can improve upon [9] in the decision tree model remained an open problem for over a decade. We answer this question negatively using the semi-direct sum theorem. In particular, we prove the following asymmetric information lower bound.
I Theorem 3. Set ρ = (τ log d)^{1/c} and d ≥ (2 · log n)^{1/(1−τ)} for some constant 1 > τ > 0. Let Π be a protocol (with both private and public randomness) that computes F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr[π(x, ~y) ≠ 1] < 0.05
For all (x, ~y) ∈ F^{-1}(0), Pr[π(x, ~y) ≠ 0] < 0.05
There exists a distribution u such that for any such Π and for any sufficiently small constant δ > 0, if |Π_a| ≤ δρ log n (the number of bits sent by Alice), then I_b = I_{(x,~y)∼u^{⊗(n+1)}}(Π; ~Y|X) ≥ n^{1−O(δ)}.
This theorem is analogous to the asymmetric communication lower bound provided in [1]. But we point out that the error guarantee required is prior free, and that this is an information lower bound, which is stronger than a communication lower bound. This theorem, together with the standard techniques for translating asymmetric communication lower bounds to decision tree lower bounds, shows that [9] is optimal in the randomized decision tree model with a prior-free error guarantee.
[1] raised an interesting technical bottleneck: the hard distribution for randomized asymmetric communication (with a distributional error guarantee) must be a non-product distribution over the inputs. We avoid this bottleneck by applying the minimax theorem, thereby exhibiting a lower bound with a distributional error guarantee over the inputs without explicitly constructing a hard (non-product) distribution. Furthermore, using the standard technique for translating asymmetric communication lower bounds to cell-probe lower bounds, this reproves [10].
2 Preliminaries

2.1 Asymmetric Information Cost
In this section, we define analogous quantities for asymmetric communication. We refer the reader to Section A for the definitions in the symmetric setting. First, recall the following natural definition of asymmetric communication cost.
I Definition 4 (Asymmetric Communication Cost). Protocol Π has asymmetric communication cost (a, b) if |Π_a| := Σ_{i odd} |Π_i| ≤ a and |Π_b| := Σ_{i even} |Π_i| ≤ b.
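Concretely, with Alice speaking in odd rounds and Bob in even rounds, the pair (|Π_a|, |Π_b|) is just two partial sums of the per-round message lengths. A minimal sketch (the function name and input encoding are our own):

```python
def asymmetric_cost(round_lengths):
    """Split per-round message lengths (round 1 = Alice, round 2 = Bob, ...)
    into Alice's total |Pi_a| and Bob's total |Pi_b|."""
    a = sum(l for i, l in enumerate(round_lengths, start=1) if i % 2 == 1)
    b = sum(l for i, l in enumerate(round_lengths, start=1) if i % 2 == 0)
    return a, b
```

For example, round lengths [3, 100, 2, 50] give (5, 150): the asymmetric view keeps the two totals separate instead of reporting the single total 155.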
Since we have two parameters, the naive notion of a lower bound on a single quantity no longer applies. Here, a lower bound will be of the form "if |Π_a| < a, then |Π_b| > b(a)."
We introduce analogous definitions for information cost, as first defined in [15], extending
the intuition explained in Section A.
I Definition 5 (Asymmetric Information Cost). Protocol Π has asymmetric information cost [I_a, I_b] under μ if I_a^μ(Π) := I_μ(Π; X|Y) ≤ I_a and I_b^μ(Π) := I_μ(Π; Y|X) ≤ I_b.
Just as the information cost of a protocol lower bounds its communication cost, it is straightforward to prove an analogous lemma in the asymmetric setting.
I Lemma 6 (Asymmetric Information Cost lower bounds Communication Cost). For any protocol Π and any distribution μ, I_a^μ(Π) ≤ |Π_a| and I_b^μ(Π) ≤ |Π_b|.
Proof. Suppose that at round r, Alice sends a_r bits. We can write the information cost incurred at round r, that is, how much additional information Bob gains, as I_μ(M_{r+1}; X|Y, Π_r), where M_{r+1} is the message sent by Alice. Now

I_μ(M_{r+1}; X|Y, Π_r) ≤ H_μ(M_{r+1}|Y, Π_r) ≤ a_r

where the last inequality holds since M_{r+1} is of length a_r and thus can have entropy at most a_r. Applying Fact 26,

I_μ(Π; X|Y) = Σ_r I_μ(M_{r+1}; X|Y, M_1, . . . , M_r) ≤ Σ_r a_r = |Π_a|.

Similarly, we also get I_μ(Π; Y|X) ≤ |Π_b|. J
Indeed, it is no longer meaningful to argue about the infimum of a single quantity. Instead, we impose a condition on |Π_a| or I_a^μ(Π) and then argue about the infimum of I_b^μ(Π). Then, similar to the distributional information complexity and prior-free information complexity defined in [3], we can define analogous notions for the asymmetric setting, conditioned on |Π_a| ≤ a.
I Definition 7 (Distributional Asymmetric Information Complexity). The distributional asymmetric information complexity for Bob of f under μ with error ε, subject to |Π_a| < a, is defined as

IC_μ^{<a}(f, ε) = inf_Π I_b^μ(Π)

where the infimum is taken over the set of protocols that achieve Pr_{(x,y)∼μ, π∼Π}[π(x, y) ≠ f(x, y)] < ε.
I Definition 8 (Prior Free Asymmetric Information Complexity). The prior-free asymmetric information complexity for Bob of f with error ε and |Π_a| < a is defined as

IC^{<a}(f, ε) = inf_Π max_μ I_b^μ(Π)

where the infimum is taken over the set of protocols that achieve Pr[π(x, y) ≠ f(x, y)] < ε for all (x, y).
Indeed, one can also define this with an upper bound on the information revealed by Alice, say I_a^μ, rather than on |Π_a|. But for our application (to data structure lower bounds), the above definition suffices. Then, using a standard minimax argument, we can also show that prior-free asymmetric information complexity lower bounds distributional asymmetric information complexity, up to a constant factor in the error parameter.
I Theorem 9 (Theorem 2 rephrased). For any f, ε ≥ 0, and α ∈ (0, 1), there exists μ on (x, y) such that

IC^{<a}(f, ε/α) ≤ IC_μ^{<a}(f, ε) / (1 − α).
Proof. The proof follows a standard minimax technique (Theorem 3.5 of [3]) for translating a prior-free lower bound to a distributional lower bound. We define the following two-player zero-sum game, where Player 1 comes up with a protocol Π for f conditioned on |Π_a| < a (note that this set of protocols is closed under convex combination) and Player 2 comes up with a distribution μ on (x, y)'s, with the following payoff:

P(Π, μ) := (1 − α) · I_b^μ(Π)/I + α · Pr_{(x,y)∼μ}[π(x, y) ≠ f(x, y)]/ε

where I := max_μ IC_μ^{<a}(f, ε). Then the rest of the argument follows from [3]. J
We remark that our hard distribution for the distributional error guarantee is therefore not explicitly defined, since the proof is non-constructive and follows from bounding IC^{<a}(f, ε/α). Since IC^{<a}(f, ε/α) involves a maximum over distributions, it suffices to exhibit a lower bound for one particular distribution (in our case, a product distribution). But this does not imply that μ is a product distribution as well, since a convex combination of product distributions is not necessarily a product distribution.
3 Semi-Direct Sum Theorem
In this section, we prove the semi-direct sum theorem for prior-free information cost, that is, where the protocol is guaranteed to be correct with good probability on all inputs, while the underlying distribution is a product distribution.
Let F(x, ~y) be of the form ⋁_{i=1}^n f(x, y_i), that is, an OR of functions on singletons. Recall that in the symmetric setting (refer to [4]) we prove a direct sum by extracting a strategy for a single copy (x_i, y_i) from a protocol that solves (~x, ~y). Similarly, we can prove the semi-direct sum theorem for this family of functions by extracting a strategy for f(x, y_i) from a protocol for F(x, ~y). More precisely, we prove the following theorem.
I Theorem 10 (Semi Direct Sum Theorem). Consider a protocol Π for computing F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr_π[π(x, ~y) ≠ 1] < ε_1
For all (x, ~y) ∈ F^{-1}(0), Pr_π[π(x, ~y) ≠ 0] < ε_0
|Π_a| ≤ a and I_b^μ(Π) ≤ b
where μ is a distribution such that Y_1, . . . , Y_n are i.i.d., with X distributed independently as well. Furthermore, suppose f(X, Y_i) = 1 with probability at most o(1/n) under μ. Then there exists a protocol Π′ (with input X and Y_i) for computing a singleton f with the following guarantee:
For all (x, y_i) ∈ f^{-1}(1), Pr_{π′}[π′(x, y_i) ≠ 1] < ε_1 + 0.01
For all (x, y_i) ∈ f^{-1}(0), Pr_{π′}[π′(x, y_i) ≠ 0] < ε_0 + 0.01
|Π′_a| ≤ |Π_a| ≤ a and I_b^μ(Π′) ≤ b/n, where |Π′_a| is the total number of bits sent by Alice.
Before proving the theorem, we prove necessary facts in information theory.
I Proposition 11. Suppose Y_1, . . . , Y_n are all independent given X. Then

I(Π; Y_i|X, Y_1, . . . , Y_{i−1}) ≥ I(Π; Y_i|X).

Proof. By our independence assumption, I(Y_i; Y_1, . . . , Y_{i−1}|X) = 0. Then applying Fact 26, we get the desired inequality. J
I Lemma 12 (Semi Direct Sum). Suppose that, given X, the Y_i's are i.i.d. Then

(1/n) · I(Π; ~Y|X) ≥ I(Π; Y_i|X).

Proof. First we show that I(Π; ~Y|X) ≥ Σ_i I(Π; Y_i|X). By Fact 27 we have

I(Π; Y_1, . . . , Y_n|X) = I(Π; Y_1|X) + I(Π; Y_2|Y_1, X) + . . . + I(Π; Y_n|Y_1, . . . , Y_{n−1}, X).

Now each term I(Π; Y_i|Y_1, . . . , Y_{i−1}, X) can be lower bounded by I(Π; Y_i|X) using Proposition 11, by our independence assumption. Applying this bound term by term, we have I(Π; ~Y|X) ≥ Σ_i I(Π; Y_i|X). By our assumption on the distribution, I(Π; Y_i|X) = I(Π; Y_j|X) for all i ≠ j. Thus we have the desired inequality. J
Proof of Theorem 10.
Protocol 1 Protocol Π′
1. Alice and Bob jointly and publicly sample J ∈ [n].
2. Bob privately samples Y_1, . . . , Y_{J−1}, Y_{J+1}, . . . , Y_n.
3. Alice and Bob set X = x and Y_J = y, then run protocol Π on (x, ~y).
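The reduction of Protocol 1 can be sketched in code. This is an illustrative simulation under our own interface assumptions: `run_Pi(x, ys)` simulates the n-copy protocol Π, and `sample_Y()` draws a fresh Y_i from the marginal of μ:

```python
import random

def run_single_copy(x, y, n, run_Pi, sample_Y):
    """Protocol 1 (sketched): embed the single instance (x, y) of f into a
    random coordinate J of an n-copy instance of F = OR_i f(x, y_i), fill
    the remaining coordinates with fresh samples, and run the n-copy
    protocol Pi. Its answer is taken as Pi'(x, y)."""
    J = random.randrange(n)                 # step 1: public choice of the slot
    ys = [sample_Y() for _ in range(n)]     # step 2: Bob's private samples
    ys[J] = y                               # step 3: plant the real input
    return run_Pi(x, ys)
```

Since each dummy copy satisfies f(X, Y_i) = 1 with probability o(1/n), the OR over the unplanted coordinates is 0 with probability 1 − o(1), so Π's answer agrees with f(x, y) up to the stated 0.01 loss.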
Consider Protocol 1. First observe that F(X, ~Y) = f(X, Y_J) with high probability, since by our assumption on the density of f^{-1}(1) under μ,

Pr[⋁_{i=1, i≠J}^n f(X, Y_i) = 1] ≤ 1 − (1 − o(1/n))^{n−1} = o(1).
Therefore, Π′ satisfies the claimed guarantee whenever Π does. Also |Π′_a| = |Π_a| by design. The bound on I_b^μ(Π′) follows from Proposition 11 and Lemma 12 via the following observation:

I(Π′; Y|X) = I(Π; Y|X) ≤ I(Π, J; Y|X) = I(J; Y|X) + I(Π; Y_J|J, X)
= I(Π; Y_J|J, X) = (1/n) · Σ_{i=1}^n I(Π; Y_i|X) ≤ I(Π; Y_1, . . . , Y_n|X)/n. J
As a contrapositive of Theorem 10, we obtain the following corollary, which will be the main component in establishing the asymmetric information lower bound.
I Corollary 13. Suppose there exists a product distribution μ on X and Y with Pr_μ[f(x, y) = 1] = o(1/n) such that any protocol Π that computes f with Pr[f(x, y) ≠ π(x, y)] < ε for all (x, y) and with |Π_a| < a has I_μ(Π; Y|X) ≥ b. Then there exists a (product) distribution μ^n such that any protocol Π^n that computes F with Pr[F(x, ~y) ≠ π^n(x, ~y)] < ε − 0.01 for all (x, ~y) and with |Π^n_a| < a has I_{μ^n}(Π^n; ~Y|X) ≥ n · b. In other words, IC^{<a}(F, ε − 0.01) ≥ n · b.
In this section, we prove Theorem 3. To utilize Theorem 10, we focus on a prior-free information cost lower bound for the following function on x, y ∈ {0, . . . , M}^d:

f(x, y) :=
  1 if ‖x − y‖_∞ ≤ 1
  0 if ‖x − y‖_∞ ≥ c
We focus on lower bounding IC^{<a}(f, ε) for some sufficiently small constant ε. The main idea of the proof is that any protocol that achieves the desired error guarantee must distinguish between f^{-1}(1) and f^{-1}(0) for any given x; in other words, it must determine whether y is in the neighborhood of x or not.
To simplify the proof, we prove the following Compression Lemma, whose proof is deferred to Section C. It allows us to assume without loss of generality (at some minor cost) that the protocol is one round and that Bob's reply is a single bit.
I Lemma 14 (Compression for Bob). Consider Π that computes f with I_{u⊗2}(Π; X|Y) < I_a and I_{u⊗2}(Π; Y|X) < I_b. Further, assume that I_a · I_b = o(1). Then Π can be compressed to a one-round protocol τ ∼ Π′ with I_{u⊗2}(Π′; X|Y) < O(I_a) and I_{u⊗2}(Π′; Y|X) < O(H(√(I_a I_b))) with the following guarantee:
For (x, y) ∈ f^{-1}(1), Pr[τ(x, y) ≠ 1] < ε_1 + 0.05
There exists Z ⊂ f^{-1}(0), with μ(Z) = μ(f^{-1}(0)) − 0.01, such that for all (x, y) ∈ Z, Pr[τ(x, y) ≠ 0] < ε_0 + 0.05
where ε_0 = max_{(x,y)∈f^{-1}(0)} Pr[π(x, y) ≠ f(x, y)] and similarly ε_1 = max_{(x,y)∈f^{-1}(1)} Pr[π(x, y) ≠ f(x, y)].
We also prove the following simple observation, which allows us to assume without loss of generality that, in a single-round protocol, only Bob has access to private randomness.
I Lemma 15. Suppose there is a single-round protocol Π where
Alice and Bob both have access to both private and public randomness;
Alice sends at most a bits, |Π_a| ≤ a;
Bob sends at most b bits of information, I(Π; Y|X) ≤ b.
Then there is a protocol Π′ where Alice does not have access to private randomness but |Π′_a| ≤ a and I(Π′; Y|X) ≤ b.
Proof. Consider the following simple modification of Π to obtain Π′. Alice samples her randomness from the public random tape R^a_pub (thereby revealing it) instead of using private randomness, and follows Π. Bob ignores R^a_pub and follows Π. By design, |Π′_a| ≤ a. Let M_b denote the reply by Bob and M_a the message by Alice. Then

I(Π′; Y|X) = I(M_b; Y|X, R^a_pub, M_a) ≤ I(M_b; Y|X, M_a) = I(Π; Y|X)

since I(Y; R^a_pub|M_a, X) = 0. J
Lemma 15 implies that it suffices to lower bound the case where Alice does not have access to private randomness. We then set the coordinate distribution u, with u(~x) for a d-dimensional vector ~x defined as the product of the u(x_i)'s. We then prove the main technical theorem, whose proof is attached in Section D.
I Theorem 16 (Single Function Lower Bound). Let Π be a one-round protocol in which Alice does not have access to private randomness and Bob replies with one bit, and which computes f with the following guarantee:
For (x, y) ∈ f^{-1}(1), Pr[π(x, y) ≠ 1] < 0.1
There exists S ⊂ f^{-1}(0), with u(S) ≥ u(f^{-1}(0)) − 0.01, such that for all (x, y) ∈ S, Pr[π(x, y) ≠ 0] < 0.1
For such Π and any sufficiently small constant δ > 0, if I_a = I_{(x,y)∼u⊗2}(Π; X|Y) ≤ δρ log n, then I_b = I_{(x,y)∼u⊗2}(Π; Y|X) ≥ n^{−O(δ)}.
The main intuition of the proof comes from bounding the ℓ_1-distance between the transcript distributions arising from (x, y) ∈ f^{-1}(1) and (x, y) ∈ S. If the information sent by Bob is lower than the claimed bound, then the protocol cannot distinguish between f^{-1}(1) and f^{-1}(0) (more technically, the transcript distributions are close in ℓ_1-norm), which is a contradiction.
Combining Theorem 16 with Lemma 14 and Lemma 15, we get the following prior-free bound as a corollary, conditioned on |Π_a| ≤ δρ log n.
I Corollary 17. Let Π be any protocol that computes f with the following guarantee:
For all (x, y) ∈ f^{-1}(1), Pr[π(x, y) ≠ 1] < 0.05
For all (x, y) ∈ f^{-1}(0), Pr[π(x, y) ≠ 0] < 0.05
For such Π and any sufficiently small constant δ > 0, if I_a = I_{(x,y)∼u×u}(Π; X|Y) ≤ δρ log n, then I_b = I_{(x,y)∼u×u}(Π; Y|X) ≥ n^{−O(δ)}.
Proof. Suppose we have a protocol that computes f with this guarantee, with I_a ≤ δρ log n and I_b = I(Π; Y|X) < n^{−ω(δ)}. Then by Lemma 14 (and Lemma 15), we can compress the protocol to a one-round protocol with a 1-bit response from Bob, with information costs O(δρ log n) and n^{−ω(δ)} for Alice and Bob respectively, and with the following error guarantee:
For all (x, y) ∈ f^{-1}(1), Pr[τ(x, y) ≠ 1] < 0.1
There exists S ⊂ f^{-1}(0), with u(S) ≥ u(f^{-1}(0)) − 0.01, such that for all (x, y) ∈ S, Pr[τ(x, y) ≠ 0] < 0.1
This contradicts Theorem 16. J
Finally, combining Corollary 17 with Corollary 13, together with the guarantee that f(x, y) = 1 with probability o(1/n) (the proof of which we append in Claim 40), we get a lower bound for any protocol that computes F(x, ~y) with a prior-free error guarantee, where the information cost is measured over the distribution in which X and all the Y's are distributed according to u.
I Theorem 18. Let Π be any protocol that computes F with the following guarantee:
For all (x, ~y) ∈ F^{-1}(1), Pr[π(x, ~y) ≠ 1] < 0.04
For all (x, ~y) ∈ F^{-1}(0), Pr[π(x, ~y) ≠ 0] < 0.04
For such Π and any sufficiently small constant δ > 0, if |Π_a| ≤ δρ log n, then I_b = I(Π; ~Y|X) ≥ n^{1−O(δ)}.
Via the minimax argument (applying Theorem 2), we get the following for the distributional error setting.
I Corollary 19. For any sufficiently small constant δ > 0, there exists μ such that for any protocol Π that computes F with Pr_{(x,~y)∼μ}[π(x, ~y) ≠ F(x, ~y)] < 0.01, if |Π_a| ≤ δρ log n, then I_b = I(Π; ~Y|X) ≥ n^{1−O(δ)}.
This yields the desired data structure lower bounds from the translations in Section B. For
the definition of data structures, we refer the reader to Section B as well.
I Corollary 20 (Cell-Probe Lower Bound). Consider any randomized cell-probe data structure solving d-dimensional near-neighbor search under ℓ∞ with approximation factor c = O(log_ρ log d), with the guarantee that if there exists y_i ∈ ~y that is close to x, the querier outputs 1 with probability > 0.95. If the word size is w = n^{1−δ} for some δ > 0, the data structure requires space n^{Ω(ρ/t)} for query time t.
I Corollary 21 (Decision Tree Lower Bound). Let δ > 0 be an arbitrary constant. A decision tree of depth r = n^{1−2δ} and node size w = n^δ that solves d-dimensional near-neighbor search under ℓ∞ with approximation c = O(log_ρ log d) must have size s = n^{Ω(ρ)}.
A Information Theory
In this section, we provide the necessary background on information theory and information complexity used in this paper. For further reference, we refer the reader to [7] and [6, 3, 4, 2, 5].
I Definition 22 (Entropy). The entropy of a random variable X is defined as

H(X) := Σ_x Pr[X = x] log(1/Pr[X = x]).

Similarly, the conditional entropy is defined as

H(X|Y) := E_Y[Σ_x Pr[X = x|Y = y] log(1/Pr[X = x|Y = y])].

As an abuse of notation, we also denote by H the binary entropy function H : [0, 1] → [0, 1],

H(p) := p log(1/p) + (1 − p) log(1/(1 − p)).
I Definition 23 (Mutual Information). The mutual information between X and Y (conditioned on Z) is defined as

I(X; Y|Z) := H(X|Z) − H(X|Y, Z).

I Definition 24 (KL-Divergence). The KL-divergence between two distributions μ and ν is defined as

D(μ||ν) := Σ_x μ(x) log(μ(x)/ν(x)).

In order to bound mutual information, it suffices to bound KL-divergence, due to the following fact.
I Fact 25 (KL-Divergence and Mutual Information). The following equality between mutual information and KL-divergence holds:

I(A; B|C) = E_{B,C}[D(A|_{B=b,C=c} || A|_{C=c})].

I Fact 26. Let A, B, C, D be random variables such that I(A; D|C) = 0. Then

I(A; B|C) ≤ I(A; B|C, D).
I Fact 27 (Mutual Information Chain Rule).

I(A; B_1, . . . , B_n|C) = Σ_{i=1}^n I(A; B_i|C, B_1, . . . , B_{i−1}).
I Fact 28. Let p(x, y) and q(x, y) be joint distributions of two random variables X and Y. Then

D(p(x, y)||q(x, y)) = D(p(x)||q(x)) + E_{x∼p}[D(p(y|x)||q(y|x))].
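As a numerical sanity check on these definitions (an illustrative sketch, not part of the paper), entropy and mutual information can be computed directly for small discrete distributions; for instance, a uniform bit X with Y = X gives I(X; Y) = H(X) = 1, while independent uniform bits give I(X; Y) = 0:

```python
from math import log2

def entropy(p):
    """Shannon entropy H(X) of a distribution given as {outcome: prob}."""
    return sum(q * log2(1 / q) for q in p.values() if q > 0)

def mutual_information(joint):
    """I(X; Y) computed from a joint distribution {(x, y): prob} as
    D(p(x, y) || p(x) p(y)), using the KL-divergence of Definition 24."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return sum(q * log2(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)

# Fully correlated bits: I(X; Y) = H(X) = 1 bit.
correlated = {(0, 0): 0.5, (1, 1): 0.5}
# Independent uniform bits: I(X; Y) = 0.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
```

The same helper can be used to spot-check the chain rule of Fact 27 on small examples.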
To provide context for the asymmetric information costs introduced in Section 2.1, we first recall the usual definition of the information cost of a protocol.
I Definition 29 (Information Cost). The information cost of a protocol Π, with input X for Alice and Y for Bob, under distribution μ is defined as

IC_μ(Π) = I_μ(Π; X|Y) + I_μ(Π; Y|X)

where I_μ(Π; X|Y) denotes the mutual information when X and Y are distributed according to μ, and similarly for I_μ(Π; Y|X).
Intuitively, the I(Π; X|Y) term captures how much information Alice must inject into the transcript to compute F. Similarly, the I(Π; Y|X) term captures how much information Bob must inject into the transcript. This intuition is useful in defining analogous quantities for the asymmetric communication setting.
B Data Structure and Asymmetric Communication Lower Bound
In this section, we introduce the relevant data structure models for which an asymmetric communication lower bound translates to a lower bound.
Cell Probe Data Structure
A cell probe data structure, which is similar to a Random Access Machine, can be formally defined as follows.
I Definition 30 (Cell-Probe). A cell probe data structure is the following model for computing F. Alice (user) sends a cell address in each round. Bob (database) answers with the contents of the queried cell. The word size w is the maximum number of bits in a cell.
The following classic "translation" theorem holds.
I Theorem 31 (Lemma 1 of [12]). If there exists a (randomized) cell-probe data structure that solves F with s cells, word size w, and query time t, then there exists a (2t log s, 2tw)-(randomized) communication protocol for F.
In other words, asymmetric communication lower bounds translate to cell-probe data structure lower bounds. This is the key observation in [1].
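The translation behind Theorem 31 is a direct simulation: each probe becomes one round, with Alice sending a cell address (log s bits) and Bob replying with the cell's contents (w bits). A minimal sketch, under our own interface assumption that the query algorithm is a Python generator yielding addresses and receiving contents:

```python
def simulate_cell_probe(query, table):
    """Run a cell-probe query as a two-party protocol. `query` is a
    generator function that yields cell addresses (Alice's messages) and
    is sent back cell contents (Bob's replies); its return value is the
    final answer. Returns the answer and the round-by-round transcript."""
    transcript = []
    gen = query()
    try:
        addr = next(gen)                  # Alice's first message
        while True:
            content = table[addr]         # Bob looks up the queried cell
            transcript.append((addr, content))
            addr = gen.send(content)      # Alice's next (adaptive) probe
    except StopIteration as stop:
        return stop.value, transcript

# A toy adaptive query: probe cell 0, then probe the cell it points to.
def chase_pointer():
    c0 = yield 0
    c1 = yield c0
    return c1
```

A t-probe query thus yields t rounds, i.e., (t log s, tw) bits, matching the theorem's bound up to the factor 2.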
Decision Tree Data Structure
The decision tree data structure model is formally defined as follows.
I Definition 32 (Decision Tree). Depending on ~y and the randomness, Bob constructs a decision tree T_~y, a complete binary tree, in which:
Each node v contains a predicate function f_v : X → {0, 1}, where f_v ∈ F, the set of allowed predicates.
Each edge is labeled 0 or 1, denoting the answer to the parent's f_v.
Each leaf is labeled 0 or 1, denoting the final output of F.
The size s of the tree is its number of nodes. The depth r is the depth of the tree. The predicate size w is log_2 |F|.
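A query against such a tree is just a root-to-leaf walk driven by Alice's input x. A minimal sketch (our own encoding: an internal node is a tuple `(predicate, child0, child1)`, a leaf is the integer 0 or 1):

```python
def evaluate(node, x):
    """Walk the decision tree T_y on query x: at each internal node,
    evaluate its predicate f_v on x and follow the matching edge;
    the reached leaf's label is the output of F."""
    while not isinstance(node, int):
        f_v, child0, child1 = node
        node = child1 if f_v(x) else child0
    return node
```

Roughly, the translation of Theorem 33 works by having the parties identify the root-to-leaf path taken, which is what drives the O(log s) and O(rw log s) message bounds.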
Under this model, a more efficient translation theorem holds, which results in a better lower bound for the decision tree model.
I Theorem 33 ([1]). If there exists a (randomized) decision tree that solves F with size s, depth r, and node size w, then there exists an (O(log s), O(rw log s))-(randomized) communication protocol for F.
As a corollary of Lemma 14 and Theorem 3, we get the following corollaries in data
structure lower bound. Combining Theorem 31 with Theorem 18 (or Corollary 19 depending
on the error guarantee), we get the following corollary.
I Corollary 20. Consider any randomized cell-probe data structure solving d-dimensional
near-neighbor search under `∞ with approximation factor c = O(logρ log d), with the guarantee
that if there exists yi ∈ ~y that is close to x, the querier outputs 1 with > 0.95 probability. If
the word size is w = n1−δ for some δ > 0, the data structure requires space nΩ(ρ/t) for query
time t.
Combining Theorem 33 with Theorem 18 (or Corollary 19 depending on the error
guarantee), we get the following corollary.
I Corollary 21. Let δ > 0 be an arbitrary constant. A decision tree of depth r = n^{1−2δ} and node
size w = n^δ that solves d-dimensional near-neighbor search under ℓ∞ with approximation
c = O(log_ρ log d) must have size s = n^{Ω(ρ)}.
C Proof of Compression Lemma
In this section, we show that any protocol in which Alice sends at most I_a bits of information
and Bob sends at most I_b bits of information, with I_b ≪ 1 and I_a · I_b = o(1), can be
compressed to a one-round protocol in which Alice sends at most O(I_a) bits and Bob sends
at most 1 bit, carrying at most H(√(I_a I_b)) bits of information.
First we use the following compression theorem to compress the size of the transcript
where τ is the resulting compressed protocol:
I Theorem 34 (Theorem 1.2 in [15]). Suppose I_a = ω(1). Then any protocol Π such that
Alice sends at most I_a bits of information and Bob sends at most I_b bits of information can
be simulated with O(I_a · 2^{O(I_b)}) bits of communication with the following guarantee:
For (x, y) ∈ F^{−1}(1), Pr[π(x, y) ≠ τ(x, y)] < ε_1 + 0.01.
There exists Γ ⊂ F^{−1}(0), with μ(Γ) = μ(F^{−1}(0)) − 0.01, such that for all (x, y) ∈ Γ,
Pr[π(x, y) ≠ τ(x, y)] < ε_0 + 0.01.
Here ε_1 and ε_0 refer to the respective guarantees of the original protocol Π.
I Remark. Theorem 1.2 in [15] is not stated in the above form, but setting the output to 1
on high-divergence (x, y) pairs and adjusting the constants gives the above guarantee. Also
note that 2^{O(I_b)} ≤ 2 in our setting, where I_b ≪ 1.
Theorem 34 allows us to assume that the length of the protocol satisfies |Π| = |Π_a| + |Π_b| ≤ O(I_a)
in our setting. We now introduce the following round compression protocol, whose guarantee
is stated as Lemma 14.
Protocol 2 Compression Protocol
Alice samples π ∼ Π_x, the distribution of the protocol conditioned on Alice's input x.
Alice sends π to Bob.
Bob answers 1 if π agrees with his input y and private randomness r_b; he rejects otherwise.
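A minimal simulation of this one-round compression on a toy two-round protocol (the protocol and all parameters below are hypothetical, chosen only to exercise the accept/reject logic): Alice samples a complete transcript using her prior over y, and Bob's single bit reports whether the sampled transcript matches what he would really have sent.

```python
import random

def alice_msg(x):
    return x & 1                        # round 1: Alice reveals a bit of x

def bob_msg(y, a_msg):
    return (y + a_msg) & 1              # round 2: Bob's true reply

def alice_sample_transcript(x, belief_y):
    # Alice samples the whole transcript from Pi | x: she plays her own
    # message for real and guesses Bob's reply from her prior over y.
    a = alice_msg(x)
    y_guess = random.choice(belief_y)
    return (a, bob_msg(y_guess, a))

def one_round_compression(x, y, belief_y):
    pi = alice_sample_transcript(x, belief_y)    # Alice -> Bob: transcript
    accept = pi[1] == bob_msg(y, pi[0])          # Bob -> Alice: one bit
    return pi, int(accept)

# When Alice's prior on y is sharp (Bob leaked little information about
# y), her guess is usually right, so Bob rarely rejects.
random.seed(0)
rejects = sum(1 - one_round_compression(5, 2, [2, 2, 2, 2, 3])[1]
              for _ in range(1000))
# rejects is roughly 200 of 1000: exactly the 1/5 prior mass on the
# wrong-parity y = 3.
```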
I Lemma 14. Consider Π such that I(Π; X|Y) < I_a and I(Π; Y|X) < I_b that computes f
under the distribution μ. Further, assume that I_a · I_b = o(1). Then Π can be compressed to a
one-round protocol τ ∼ Π' with I(Π'; X|Y) < O(I_a) and I(Π'; Y|X) < O(H(√(I_a I_b))) with
the following guarantee:
For (x, y) ∈ f^{−1}(1), Pr[τ(x, y) ≠ 1] < ε_1 + 0.05.
There exists Z ⊂ f^{−1}(0), with μ(Z) = μ(f^{−1}(0)) − 0.01, such that for all (x, y) ∈ Z,
Pr[τ(x, y) ≠ 0] < ε_0 + 0.05.
Here ε_0 = max_{(x,y)∈f^{−1}(0)} Pr[π(x, y) ≠ f(x, y)] and similarly
ε_1 = max_{(x,y)∈f^{−1}(1)} Pr[π(x, y) ≠ f(x, y)].
Towards proving Lemma 14, we first state two auxiliary claims.
I Claim 35. Let Π_{x,y} denote the distribution of the transcript conditioned on X = x and Y = y,
and let Π_x denote the distribution conditioned on X = x alone. Similarly, let M_i|M_{<i}, x, y
and M_i|M_{<i}, x denote the distribution of the message in the i-th round conditioned on all
previous messages and on x, y or just x, respectively. Then

    D(Π_{x,y} || Π_x) = Σ_{i=1}^{R} E_{M_{<i}|x,y} [ D(M_i|M_{<i}, x, y || M_i|M_{<i}, x) ]

Proof. Follows from Fact 28. J
I Claim 36. Consider R non-negative numbers I_1, . . . , I_R. Then

    Σ_{r=1}^{R} √(I_r) ≤ √( R · Σ_{r=1}^{R} I_r )

Proof. Follows from Cauchy-Schwarz. J
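Claim 36 is Cauchy-Schwarz applied to the vectors (√I_1, . . . , √I_R) and (1, . . . , 1); a quick numeric sanity check on random non-negative inputs:

```python
import math
import random

# Numeric sanity check of Claim 36: for non-negative I_1..I_R,
#   sum(sqrt(I_r)) <= sqrt(R * sum(I_r)),
# i.e. Cauchy-Schwarz against the all-ones vector.
random.seed(1)
for _ in range(1000):
    R = random.randint(1, 20)
    I = [10 * random.random() for _ in range(R)]
    lhs = sum(math.sqrt(v) for v in I)
    rhs = math.sqrt(R * sum(I))
    assert lhs <= rhs + 1e-9
```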
Proof of Lemma 14. Note that without loss of generality, the total number of bits
communicated in Protocol 2 is |Π| + 1 = O(I_a), since one can compress the transcript using
Theorem 34. This immediately implies that I(Π'; X|Y) < O(I_a), since the number of bits sent
by Alice is at most O(I_a). Similarly, it also implies that the number of rounds R is at most
O(I_a).
Now, to prove the full lemma, it suffices to bound the probability of Bob sending 0. We do
so by bounding the probability that Bob rejects the transcript at each round i + 1 (i.e., the
odd-round messages), then summing up these probabilities.

Let M_i denote the message sent in the i-th round of the protocol. If i + 1 is odd (a round
where Bob sends a message), then we have

    M_{i+1}|M_{≤i}, X, Y = M_{i+1}|M_{≤i}, Y

since it is a protocol and Bob's response only depends on Y and M_{≤i}. The probability of
making an error at the (i + 1)-th round can then be bounded using Pinsker's inequality as

    E_{M_{≤i},X,Y} [ ‖M_{i+1}|M_{≤i}, X, Y − M_{i+1}|M_{≤i}, X‖_1 ]
        ≤ O( E_{M_{≤i},X,Y} [ √( D(M_{i+1}|M_{≤i}, X, Y || M_{i+1}|M_{≤i}, X) ) ] )
        ≤ O( √( E_{M_{≤i},X,Y} [ D(M_{i+1}|M_{≤i}, X, Y || M_{i+1}|M_{≤i}, X) ] ) )    (1)

where the last inequality uses the concavity of √x. Then we can sum and bound the
probability of error as

    Σ_{i=0, i odd}^{R−1} E_{M_{≤i},X,Y} [ ‖M_{i+1}|M_{≤i}, X, Y − M_{i+1}|M_{≤i}, X‖_1 ]
        ≤ O( Σ_{i=0, i odd}^{R−1} √( E_{M_{≤i},X,Y} [ D(M_{i+1}|M_{≤i}, X, Y || M_{i+1}|M_{≤i}, X) ] ) )
        ≤ O( √( R · Σ_{i=0, i odd}^{R−1} E_{M_{≤i},X,Y} [ D(M_{i+1}|M_{≤i}, X, Y || M_{i+1}|M_{≤i}, X) ] ) )
        ≤ O( √( R · E_{X,Y} [ D(Π_{X,Y} || Π_X) ] ) ) = O( √(R · I_b) )    (2)

where the second inequality follows from Claim 36 and the third from Claim 35. Now
R ≤ O(I_a), since the total number of bits communicated by Alice, which upper-bounds the
number of rounds, is O(I_a) by Theorem 34. This ℓ1-norm bound implies that Bob answers 0
with probability at most O(√(I_a I_b)) in expectation over (x, y). Hence the amount of
information that Alice learns is at most O(H(√(I_a I_b))).
It remains to show that the error rate guarantee is preserved. We divide into two cases.
If f(x, y) = 1, then by design no guarantee is lost on (x, y) ∈ f^{−1}(1), except for the loss
from Theorem 34.
Now suppose f(x, y) = 0. Recall that the compression fails with probability at most
O(√(I_a I_b)) in expectation. Let Z' := {(x, y) : ‖Π_{x,y} − Π_x‖_1 < (I_a I_b)^{1/3}}.
By Markov's inequality, μ(Z') ≥ 1 − o(1). Set Z = Z' ∩ Γ ∩ f^{−1}(0), where Γ is the set
from Theorem 34. It is easy to check that the probability of error only increases by o(1)
for any (x, y) ∈ Z, since (I_a I_b)^{1/3} = o(1) by assumption, while being in Γ only increases
the error rate by 0.01. Thus Z indeed satisfies the guarantee. J
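The proof above leans on Pinsker's inequality, ‖P − Q‖_1 ≤ √(2 ln 2 · D(P||Q)) with divergence measured in bits; a quick numeric check on random distribution pairs (toy dimensions, with the constant absorbed into the O(·) above):

```python
import math
import random

# Numeric check of Pinsker's inequality ||P - Q||_1 <= sqrt(2 D(P||Q)),
# with D in nats; in bits this reads sqrt(2 ln 2 * D_bits(P||Q)).

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def kl_bits(p, q):
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

random.seed(2)
for _ in range(1000):
    raw_p = [random.random() + 1e-6 for _ in range(5)]
    raw_q = [random.random() + 1e-6 for _ in range(5)]
    p = [v / sum(raw_p) for v in raw_p]
    q = [v / sum(raw_q) for v in raw_q]
    assert l1(p, q) <= math.sqrt(2 * math.log(2) * kl_bits(p, q)) + 1e-9
```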
D Omitted Proof from Section 4
Recall the distribution u defined in Section 4, with u(x) defined as the product of the u(x_i)'s.
Under such u, we make use of the following lemma from [1].
I Lemma 37 (Isoperimetric Inequality, Lemma 9 of [1]). Consider any set S ⊆ {0, . . . , M}^d.
Let N(S) be the set of points at distance at most 1 from S under ℓ∞. Then
u(N(S)) ≥ u(S)^{1/ρ}.
Then we get the following corollary in terms of KL-Divergence.
I Corollary 38. Let u|_S denote u restricted to a subset S ⊆ {0, . . . , M}^d. Then for any S,

    u(N(S)) ≥ 2^{−D(u|_S || u)/ρ}

Proof. Observe that D(u|_S || u) = −log u(S), i.e., u(S) = 2^{−D(u|_S || u)}. Then Lemma 37
implies the desired inequality. J
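The identity D(u|_S || u) = −log u(S) used in this proof can be checked numerically on a toy distribution (the numbers below are arbitrary):

```python
import math

# Check of the identity behind Corollary 38: restricting u to a set S
# gives D(u|_S || u) = -log2 u(S). Toy distribution on 4 points.
u = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
S = {1, 2}
uS_mass = sum(u[x] for x in S)                  # u(S) = 0.375
u_restricted = {x: u[x] / uS_mass for x in S}   # u conditioned on S
kl = sum(p * math.log2(p / u[x]) for x, p in u_restricted.items())
assert abs(kl - (-math.log2(uS_mass))) < 1e-9
```

The identity holds because every likelihood ratio u|_S(x)/u(x) equals 1/u(S) on S, so the divergence is the constant log(1/u(S)).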
We also state the following simple fact about norms.
I Fact 39 (ℓ1-norm convexity). Consider a distribution μ and a family of distributions {ν_i}_i. Then

    ‖μ − E_i[ν_i]‖_1 ≤ E_i [ ‖μ − ν_i‖_1 ]

Proof. Follows from the convexity of the ℓ1-norm. J
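Fact 39 can likewise be sanity-checked numerically (the distributions below are arbitrary):

```python
# Numeric check of Fact 39: ||mu - E_i[nu_i]||_1 <= E_i[||mu - nu_i||_1],
# i.e. Jensen's inequality for the convex function p -> ||mu - p||_1.
mu = [0.2, 0.3, 0.5]
nus = [[0.6, 0.2, 0.2], [0.1, 0.1, 0.8], [0.2, 0.5, 0.3]]
avg = [sum(n[j] for n in nus) / len(nus) for j in range(3)]

l1 = lambda p, q: sum(abs(a - b) for a, b in zip(p, q))
lhs = l1(mu, avg)                            # distance to the average
rhs = sum(l1(mu, n) for n in nus) / len(nus) # average of the distances
assert lhs <= rhs + 1e-12
```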
Now we are ready to prove Theorem 16.
Proof of Theorem 16. Let a := δρ log n and b := 2^{−2000a/ρ}. Recall that we assume the
protocol is one round; therefore the distribution on the protocol Π consists of Alice's message π_a
and Bob's message π_b. Also let Π_b|π_a be the distribution on Bob's message given Alice's
message π_a. If π_b ∼ Π_b|π_a, note that each of Bob's messages π_b conditioned on π_a induces
a prior on Bob's input y, denoted μ_{π_b}. Suppose further that E_{π_b∼Π_b|π_a} D(μ_{π_b} || u) < 10b.
Note that such a message π_a must exist by a Markov argument.

Since Alice does not have access to private randomness, the prior on X conditioned on π_a is
exactly a subset of {0, . . . , M}^d, which we denote by S_{π_a}.

With π_a fixed, and with an abuse of notation, let Π_{x,y} denote the distribution on the transcript
(the remaining message, Π_b) conditioned on the input being (x, y) and Alice's message. And
let μ denote the distribution on x conditioned on the message π_a, that is, u|_{S_{π_a}}. Note that
Π_{x,y_1} and Π_{x,y_2} are close in ℓ1 norm if f(x, y_1) = f(x, y_2) = 1 or (x, y_1), (x, y_2) ∈ Z, since
the protocol is a one-round protocol and Alice's message is fixed. In particular,

    ‖Π_{x,y_1} − Π_{x,y_2}‖_1 < 0.2    (3)

since both Π_{x,y_1} and Π_{x,y_2} output f(x, y_1) = f(x, y_2) with probability at least 0.9 on f^{−1}(1)
and on Z. Similarly, if (x, y_1) ∈ Z and f(x, y_2) = 1, then ‖Π_{x,y_1} − Π_{x,y_2}‖_1 > 1.8. Now consider

    ν_{f^{−1}(0)} := E_{x∼μ} E_{y∼u : (x,y)∈Z} [Π_{x,y}]
    ν_{f^{−1}(1)} := E_{x∼μ} E_{y∼u : f(x,y)=1} [Π_{x,y}]

First observe from Fact 39 that if (x, y) ∈ Z,

    ‖Π_{x,y} − ν_{f^{−1}(0)}‖_1 ≤ E_{x∼μ} E_{y'∼u : (x,y')∈Z} [ ‖Π_{x,y} − Π_{x,y'}‖_1 ] < 0.2    (4)

and similarly, if f(x, y) = 1,

    ‖Π_{x,y} − ν_{f^{−1}(1)}‖_1 ≤ E_{x∼μ} E_{y'∼u : f(x,y')=1} [ ‖Π_{x,y} − Π_{x,y'}‖_1 ] < 0.2    (5)

Then by the triangle inequality, if (x, y_1) ∈ Z and f(x, y_2) = 1,

    1.8 < ‖Π_{x,y_1} − Π_{x,y_2}‖_1 ≤ ‖Π_{x,y_1} − ν_{f^{−1}(0)}‖_1 + ‖ν_{f^{−1}(0)} − ν_{f^{−1}(1)}‖_1 + ‖Π_{x,y_2} − ν_{f^{−1}(1)}‖_1

Combining (3), (4) and (5) we get

    ‖ν_{f^{−1}(0)} − ν_{f^{−1}(1)}‖_1 > 1.4.

Now, to derive the contradiction, we define ν and ν' as follows:

    ν := Pr_{x∼μ, y∼u}[(x, y) ∈ Z] · ν_{f^{−1}(0)} + Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)] · ν_{f^{−1}(1)}
         + Pr_{x∼μ, y∼u}[(x, y) ∉ Z ∪ f^{−1}(1)] · ν_{f^{−1}(∗)}

where ν_{f^{−1}(∗)} denotes the expected prior of transcripts from inputs that are neither in Z
nor in f^{−1}(1). Then we have

    ‖ν' − ν‖_1 ≥ ‖ Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)] · (ν_{f^{−1}(0)} − ν_{f^{−1}(1)}) ‖_1
               = Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)] · ‖ν_{f^{−1}(0)} − ν_{f^{−1}(1)}‖_1
               ≥ Ω( Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)] )

since the difference is minimized when ν_{f^{−1}(∗)} = ν_{f^{−1}(0)}. Via Pinsker's inequality this implies

    D(ν' || ν) ≥ Ω( Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)]^2 ).

Moreover,

    Pr_{x∼μ, y∼u}[(x, y) ∈ f^{−1}(1)]^2 = u(N(Supp(μ)))^2 ≥ 2^{−2D(μ||u)/ρ}
I Claim 40 (Density). Set ρ = (ε log d)^{1/c} and d ≥ (2 · log n)^{1/(1−ε)}. Then

    Pr_{(x,y)∼u×u} [f(x, y) = 0] ≥ 1 − o(1/n)

Proof. We prove this by bounding Pr_{(x,y)∼u×u} [ ‖x − y‖_∞ ≤ c ].
[1] Alexandr Andoni, Dorian Croitoru, and Mihai Patrascu. Hardness of nearest neighbor under l-infinity. In 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 424-433. IEEE, 2008.
[2] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao. How to compress interactive communication. SIAM Journal on Computing, 42(3):1327-1363, 2013.
[3] Mark Braverman. Interactive information complexity. SIAM Journal on Computing, 44(6):1698-1739, 2015. doi:10.1137/130938517.
[4] Mark Braverman, Ankit Garg, Denis Pankratov, and Omri Weinstein. From information to exact communication. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pages 151-160. ACM, 2013.
[5] Mark Braverman and Anup Rao. Information equals amortized communication. IEEE Transactions on Information Theory, 60(10):6058-6069, 2014.
[6] Amit Chakrabarti, Yaoyun Shi, Anthony Wirth, and Andrew Yao. Informational complexity and the direct sum problem for simultaneous message complexity. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 270-278. IEEE, 2001.
[7] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.
[8] Anirban Dasgupta, Ravi Kumar, and D. Sivakumar. Sparse and lopsided set disjointness via information theory. In Anupam Gupta, Klaus Jansen, José Rolim, and Rocco Servedio, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 517-528. Springer, Berlin, Heidelberg, 2012.
[9] Piotr Indyk. On approximate nearest neighbors under l∞ norm. Journal of Computer and System Sciences, 63(4):627-638, 2001.
[10] Michael Kapralov and Rina Panigrahy. NNS lower bounds via metric expansion for l∞ and EMD. In International Colloquium on Automata, Languages, and Programming, pages 545-556. Springer, 2012.
[11] Peter Bro Miltersen. Lower bounds for union-split-find related problems on random access machines. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing, pages 625-634. ACM, 1994.
[12] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asymmetric communication complexity. In Proceedings of the 27th Annual ACM Symposium on Theory of Computing, pages 103-111. ACM, 1995.
[13] Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower bounds on near neighbor search via metric expansion. In 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 805-814. IEEE, 2010.
[14] Mihai Patrascu. Lower Bound Techniques for Data Structures. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2008. AAI0821553.
[15] Sivaramakrishnan Natarajan Ramamoorthy and Anup Rao. How to compress asymmetric communication. In Proceedings of the 30th Conference on Computational Complexity, pages 102-123. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.