Tighter Connections Between Formula-SAT and Shaving Logs

https://drops.dagstuhl.de/opus/volltexte/2018/9012/pdf/LIPIcs-ICALP-2018-8.pdf

Amir Abboud¹
IBM Almaden Research Center, San Jose, USA

Karl Bringmann
Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

Abstract

A noticeable fraction of Algorithms papers in the last few decades improve the running time of well-known algorithms for fundamental problems by logarithmic factors. For example, the $O(n^2)$ dynamic programming solution to the Longest Common Subsequence problem (LCS) was improved to $O(n^2/\log^2 n)$ in several ways and using a variety of ingenious tricks. This line of research, also known as the art of shaving log factors, lacks a tool for proving negative results. Specifically, how can we show that it is unlikely that LCS can be solved in time $O(n^2/\log^3 n)$?

Perhaps the only approach for such results was suggested in a recent paper of Abboud, Hansen, Vassilevska W. and Williams (STOC’16). The authors blame the hardness of shaving logs on the hardness of solving satisfiability on boolean formulas (Formula-SAT) faster than exhaustive search. They show that an $O(n^2/\log^{1000} n)$ algorithm for LCS would imply a major advance in circuit lower bounds. Whether this approach can lead to tighter barriers was unclear.

In this paper, we push this approach to its limit and, in particular, prove that a well-known barrier from complexity theory stands in the way for shaving five additional log factors for fundamental combinatorial problems. For LCS, regular expression pattern matching, as well as the Fréchet distance problem from Computational Geometry, we show that an $O(n^2/\log^{7+\varepsilon} n)$ runtime would imply new Formula-SAT algorithms.

Our main result is a reduction from SAT on formulas of size $s$ over $n$ variables to LCS on sequences of length $N = 2^{n/2} \cdot s^{1+o(1)}$. Our reduction is essentially as efficient as possible, and it greatly improves the previously known reduction for LCS with $N = 2^{n/2} \cdot s^c$, for some $c \geq 100$.
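To see how the reduction size translates into a SAT running time, the following back-of-the-envelope calculation may help. It is only a sketch under stated assumptions and is not taken from the paper's proofs: the exponents $a$ and $c$ are illustrative parameters introduced here, with $s = n^a$ the formula size and $O(N^2/\log^c N)$ the running time of a hypothetical LCS algorithm, and the cost of computing the reduction itself is assumed to be dominated by the final bound.

```latex
\begin{align*}
N &= 2^{n/2} \cdot s^{1+o(1)},
  \qquad \log N = \tfrac{n}{2} + (1+o(1)) \log s = \Theta(n)
  \quad \text{for } s = n^{a}, \\
\frac{N^{2}}{\log^{c} N}
  &= \frac{2^{n} \cdot s^{2+o(1)}}{\Theta(n^{c})}
   = 2^{n} \cdot n^{2a - c + o(1)}, \\
c = 7 + \varepsilon
  &\;\Longrightarrow\;
  \frac{N^{2}}{\log^{7+\varepsilon} N}
   = \frac{2^{n}}{n^{\,1 + \varepsilon - 2(a-3) - o(1)}} .
\end{align*}
```

Under these assumptions, for formula sizes $s = n^a$ with $a$ slightly above $3$ (more precisely $a < 3 + \varepsilon/2$), the hypothetical LCS algorithm would decide satisfiability faster than exhaustive search by more than a factor of $n$. Which savings count as a breakthrough depends on the precise hypothesis stated in the body of the paper.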
2012 ACM Subject Classification Theory of computation → Problems, reductions and completeness

Keywords and phrases Fine-Grained Complexity, Hardness in P, Formula-SAT, Longest Common Subsequence, Frechet Distance

Digital Object Identifier 10.4230/LIPIcs.ICALP.2018.8

Related Version A full version can be found at https://arxiv.org/abs/1804.08978.

Acknowledgements Part of the work was performed while visiting the Simons Institute for the Theory of Computing, Berkeley, CA. We are grateful to Avishay Tal for telling us about his algorithm for SAT on bipartite formulas. We also thank Mohan Paturi, Rahul Santhanam, Srikanth Srinivasan, and Ryan Williams for answering our questions about the state of the art of Formula-SAT algorithms, and Arturs Backurs, Piotr Indyk, Mikkel Thorup, and Virginia Vassilevska Williams for helpful discussions regarding regular expressions. We also thank an anonymous reviewer for ideas leading to shaving off a second log-factor for Formula-Pair.

¹ The work was completed when A.A. was at Stanford University and was supported by Virginia Vassilevska Williams’ NSF Grants CCF-1417238 and CCF-1514339, and BSF Grant BSF:2012338.

© Amir Abboud and Karl Bringmann; licensed under Creative Commons License CC-BY
45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Editors: Ioannis Chatzigiannakis, Christos Kaklamanis, Dániel Marx, and Donald Sannella; Article No. 8; pp. 8:1–8:18
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany

1 Introduction

Since the early days of Algorithms research, a noticeable fraction of papers each year shave log factors for fundamental problems: they reduce the best known upper bound on the time complexity from $T(n)$ to $T(n)/\log^c n$, for some $c > 0$.
While in some cases a cynic would call such results “hacks” and “bit tricks”, there is no doubt that they often involve ingenious algorithmic ideas and suggest fundamental new ways to look at the problem at hand. In his survey, Timothy Chan calls this kind of research “The Art of Shaving Logs” [37]. In many cases, we witness a race of shaving logs for some problem, in which a new upper bound is found every few months, without giving any hints on when this race is going to halt. For example, in the last few years, the upper bound for combinatorial Boolean Matrix Multiplication dropped from $O(n^3/\log^2 n)$ [16], to $O(n^3/\log^{2.25} n)$ [20], to $O(n^3/\log^3 n)$ [38], and most recently to $O(n^3/\log^4 n)$ [99].

Perhaps the single most important missing technology for this kind of research is a tool for proving lower bounds.

Consider the problem of computing the Longest Common Subsequence (LCS) of two strings of length $n$. LCS has a simple $O(n^2)$ time dynamic programming algorithm [93, 46]. Several approaches have been utilized in order to shave log factors such as the “Four Russians” technique [16, 61, 74, 23, 58], utilizing bit-parallelism [10, 47, 62], and working with compressed strings [48, 54]. The best known upper bounds are $O(n^2/\log^2 n)$ for constant size alphabets [74], and $O(n^2 \log\log n/\log^2 n)$ for large alphabets [58]. But can we do better? Can we solve LCS in $O(n^2/\log^3 n)$ time?

While the mathematical intrigue is obvious, we remark that even such mild speedups for LCS could be significant in practice. Besides its use as the diff operation in unix, LCS is at the core of highly impactful similarity measures between biological data. A heuristic algorithm called BLAST for a generalized version of LCS (namely, the Local Alignment problem [87]) has been cited more than sixty thousand times [14].
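The quadratic dynamic program and the bit-parallel idea mentioned above can be sketched as follows. This is a minimal illustration, not the algorithm of any specific cited paper: the function names `lcs_length_dp` and `lcs_length_bitparallel` are hypothetical, and the bit-vector update is one commonly described formulation of bit-parallel LCS. A Python integer stands in for a bit vector here; on a Word-RAM with word size $w$, each update would cost about $|a|/w$ word operations, which is where a factor of $w = \Theta(\log n)$ would be saved.

```python
def lcs_length_dp(a: str, b: str) -> int:
    """Textbook O(|a| * |b|) dynamic program, keeping only two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best common subsequence of the two prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one of the two prefixes.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_length_bitparallel(a: str, b: str) -> int:
    """Bit-vector LCS length: one integer update per symbol of b.

    Assumption: this update rule is a standard bit-parallel formulation,
    used here for illustration only.
    """
    m = len(a)
    full = (1 << m) - 1  # m one-bits; bit i corresponds to a[i]

    # match[c] has bit i set exactly when a[i] == c.
    match = {}
    for i, x in enumerate(a):
        match[x] = match.get(x, 0) | (1 << i)

    v = full
    for y in b:
        mask = match.get(y, 0)
        # The addition propagates carries through runs of one-bits,
        # which plays the role of the row-wise max in the DP table.
        v = ((v + (v & mask)) | (v & ~mask)) & full

    # Each zero bit among the low m bits marks one unit of LCS length.
    return m - bin(v).count("1")


if __name__ == "__main__":
    for a, b in [("ABCBDAB", "BDCABA"), ("AGGTAB", "GXTXAYB"), ("ab", "ba")]:
        print(a, b, lcs_length_dp(a, b), lcs_length_bitparallel(a, b))
```

Both functions return only the length of a longest common subsequence; recovering an actual subsequence would require keeping more of the table.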
While such heuristics are much faster than the near-quadratic time algorithms above, they are not guaranteed to return an optimal solution and are thus useless in many applications, and biologists often fall back to (highly optimized implementations of) the quadratic solutions, see, e.g., [71, 72].

How would one show that it is hard to shave logs for some problem? A successful line of work, inspired by NP-hardness, utilizes “fine-grained reductions” to prove statements of the form: a small improvement over the known runtime for problem A implies a breakthrough algorithm for problem B, refuting a plausible hypothesis about the complexity of B. For example, it has been shown that if LCS can be solved in $O(n^{2-\varepsilon})$ time, where $\varepsilon > 0$, then there is a breakthrough $(2-\delta)^n$ algorithm for CNF-SAT, and the Strong Exponential Time Hypothesis (SETH, defined below) is refuted [2, 29]. Another conjecture that has been used to derive interesting lower bounds states that the 3-SUM problem² cannot be solved in $O(n^{2-\varepsilon})$ time. It is natural to ask: can we use these conjectures to rule out log-factor improvements for problems like LCS? And even more optimistically, one might hope to base the hardness of LCS on a more standard assumption like $P \neq NP$. Unfortunately, we can formally prove that these assumptions ar (...truncated)