Tighter Connections Between Formula-SAT and Shaving Logs
Amir Abboud¹
IBM Almaden Research Center, San Jose, USA
Karl Bringmann
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
Abstract
A noticeable fraction of Algorithms papers in the last few decades improve the running time
of well-known algorithms for fundamental problems by logarithmic factors. For example, the
$O(n^2)$ dynamic programming solution to the Longest Common Subsequence problem (LCS) was
improved to $O(n^2/\log^2 n)$ in several ways and using a variety of ingenious tricks. This line of
research, also known as the art of shaving log factors, lacks a tool for proving negative results.
Specifically, how can we show that it is unlikely that LCS can be solved in time $O(n^2/\log^3 n)$?
Perhaps the only approach for such results was suggested in a recent paper of Abboud, Hansen,
Vassilevska W. and Williams (STOC’16). The authors blame the hardness of shaving logs on
the hardness of solving satisfiability on boolean formulas (Formula-SAT) faster than exhaustive
search. They show that an $O(n^2/\log^{1000} n)$ algorithm for LCS would imply a major advance in
circuit lower bounds. Whether this approach can lead to tighter barriers was unclear.
In this paper, we push this approach to its limit and, in particular, prove that a well-known
barrier from complexity theory stands in the way of shaving five additional log factors for
fundamental combinatorial problems. For LCS, regular expression pattern matching, as well as
the Fréchet distance problem from Computational Geometry, we show that an $O(n^2/\log^{7+\varepsilon} n)$
runtime would imply new Formula-SAT algorithms.
Our main result is a reduction from SAT on formulas of size s over n variables to LCS on
sequences of length $N = 2^{n/2} \cdot s^{1+o(1)}$. Our reduction is essentially as efficient as possible, and it
greatly improves the previously known reduction for LCS with $N = 2^{n/2} \cdot s^c$, for some $c \ge 100$.
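To see why the sequence length $N$ matters, the following back-of-the-envelope calculation plugs the reduction's output size into a hypothetical LCS algorithm running in time $O(N^2/\log^k N)$. It is only a sketch and is not taken from the paper's proofs: the exponent name $k$ is introduced here for illustration, and the calculation assumes that $s = n^{O(1)}$, that $k$ is a constant, and that the time to run the reduction itself is negligible compared with the LCS call.

```latex
\begin{align*}
\log N &= \tfrac{n}{2} + (1+o(1))\log s = (1+o(1))\,\tfrac{n}{2}
  && \text{(assuming } s = n^{O(1)}\text{)} \\
\frac{N^2}{\log^k N}
  &= \frac{2^{n}\cdot s^{2+o(1)}}{\bigl((1+o(1))\,n/2\bigr)^{k}}
   = O\!\left(\frac{2^{n}\cdot s^{2+o(1)}}{n^{k}}\right)
  && \text{(for constant } k\text{)}
\end{align*}
```

Under these assumptions, the shaved factor $n^k$ has to outweigh $s^{2+o(1)}$ before the resulting SAT running time beats exhaustive search. With the earlier reduction, where $N = 2^{n/2} \cdot s^c$ for some $c \ge 100$, the numerator becomes $2^n \cdot s^{2c}$, so a correspondingly larger $k$ would be needed.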
2012 ACM Subject Classification Theory of computation → Problems, reductions and completeness
Keywords and phrases Fine-Grained Complexity, Hardness in P, Formula-SAT, Longest Common Subsequence, Fréchet Distance
Digital Object Identifier 10.4230/LIPIcs.ICALP.2018.8
Related Version A full version can be found at https://arxiv.org/abs/1804.08978.
Acknowledgements Part of the work was performed while visiting the Simons Institute for the
Theory of Computing, Berkeley, CA. We are grateful to Avishay Tal for telling us about his
algorithm for SAT on bipartite formulas. We also thank Mohan Paturi, Rahul Santhanam,
Srikanth Srinivasan, and Ryan Williams for answering our questions about the state of the art
of Formula-SAT algorithms, and Arturs Backurs, Piotr Indyk, Mikkel Thorup, and Virginia
Vassilevska Williams for helpful discussions regarding regular expressions. We also thank an
anonymous reviewer for ideas leading to shaving off a second log-factor for Formula-Pair.
¹ The work was completed when A.A. was at Stanford University and was supported by Virginia Vassilevska
Williams’ NSF Grants CCF-1417238 and CCF-1514339, and BSF Grant BSF:2012338.
© Amir Abboud and Karl Bringmann;
licensed under Creative Commons License CC-BY
45th International Colloquium on Automata, Languages, and Programming (ICALP 2018).
Editors: Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella;
Article No. 8; pp. 8:1–8:18
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
1 Introduction
Since the early days of Algorithms research, a noticeable fraction of papers each year shave
log factors for fundamental problems: they reduce the best known upper bound on the time
complexity from $T(n)$ to $T(n)/\log^c n$, for some $c > 0$. While in some cases a cynic would
call such results “hacks” and “bit tricks”, there is no doubt that they often involve ingenious
algorithmic ideas and suggest fundamental new ways to look at the problem at hand. In
his survey, Timothy Chan calls this kind of research “The Art of Shaving Logs” [37]. In
many cases, we witness a race of shaving logs for some problem, in which a new upper
bound is found every few months, without giving any hints on when this race is going to
halt. For example, in the last few years, the upper bound for combinatorial Boolean Matrix
Multiplication dropped from $O(n^3/\log^2 n)$ [16], to $O(n^3/\log^{2.25} n)$ [20], to $O(n^3/\log^3 n)$
[38], and most recently to $O(n^3/\log^4 n)$ [99]. Perhaps the single most important missing
technology for this kind of research is a tool for proving lower bounds.
Consider the problem of computing the Longest Common Subsequence (LCS) of two
strings of length $n$. LCS has a simple $O(n^2)$ time dynamic programming algorithm [93, 46].
Several approaches have been utilized in order to shave log factors such as the “Four
Russians” technique [16, 61, 74, 23, 58], utilizing bit-parallelism [10, 47, 62], and working
with compressed strings [48, 54]. The best known upper bounds are $O(n^2/\log^2 n)$ for constant
size alphabets [74], and $O(n^2 \log\log n/\log^2 n)$ for large alphabets [58]. But can we do better?
Can we solve LCS in $O(n^2/\log^3 n)$ time? While the mathematical intrigue is obvious, we
remark that even such mild speedups for LCS could be significant in practice. Besides its
use as the diff operation in Unix, LCS is at the core of highly impactful similarity measures
between biological data. A heuristic algorithm called BLAST for a generalized version of
LCS (namely, the Local Alignment problem [87]) has been cited more than sixty thousand
times [14]. While such heuristics are much faster than the near-quadratic time algorithms
above, they are not guaranteed to return an optimal solution and are thus useless in many
applications, and biologists often fall back to (highly optimized implementations of) the
quadratic solutions, see, e.g. [71, 72].
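As a concrete reference point for the quadratic baseline discussed above, here is a minimal Python sketch of the textbook dynamic program for the length of an LCS. The function name `lcs_length` and the rolling-row layout are choices made here for illustration and are not taken from the cited papers; none of the log-shaving techniques are applied.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Textbook dynamic program: O(len(a) * len(b)) time, O(len(b)) space.
    """
    # prev[j] = LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]  # LCS with the empty prefix of b is always 0
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Matching characters extend the LCS of both shorter prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last character of one of the two prefixes.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Each of the roughly $n^2$ table cells is filled in constant time, which is where the $O(n^2)$ bound comes from. The speedups mentioned above work, roughly speaking, by processing many cells at once.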
How would one show that it is hard to shave logs for some problem? A successful line of
work, inspired by NP-hardness, utilizes “fine-grained reductions” to prove statements of the
form: a small improvement over the known runtime for problem A implies a breakthrough
algorithm for problem B, refuting a plausible hypothesis about the complexity of B. For
example, it has been shown that if LCS can be solved in $O(n^{2-\varepsilon})$ time, where $\varepsilon > 0$, then
there is a breakthrough $(2-\delta)^n$ algorithm for CNF-SAT, and the Strong Exponential Time
Hypothesis (SETH, defined below) is refuted [2, 29]. Another conjecture that has been
used to derive interesting lower bounds states that the 3-SUM problem² cannot be solved
in $O(n^{2-\varepsilon})$ time. It is natural to ask: can we use these conjectures to rule out log-factor
improvements for problems like LCS? And even more optimistically, one might hope to base
the hardness of LCS on a more standard assumption like $\mathrm{P} \neq \mathrm{NP}$. Unfortunately, we can
formally prove that these assumptions ar (...truncated)