Boundedness of Conjunctive Regular Path Queries
ICALP
Diego Figueira, CNRS, LaBRI, Talence, France
Pablo Barceló, Department of Computer Science, University of Chile, Santiago, Chile; IMFD, Santiago, Chile
Miguel Romero, Department of Computer Science, University of Oxford, Oxford, UK
Category Track B: Automata, Logic, Semantics, and Theory of Programming
We study the boundedness problem for unions of conjunctive regular path queries with inverses (UC2RPQs). This is the problem of, given a UC2RPQ, checking whether it is equivalent to a union of conjunctive queries (UCQ). We show the problem to be ExpSpace-complete, thus coinciding with the complexity of containment for UC2RPQs. As a corollary, when a UC2RPQ is bounded, it is equivalent to a UCQ of at most triple-exponential size, and in fact we show that this bound is optimal. We also study better-behaved classes of UC2RPQs, namely acyclic UC2RPQs of bounded thickness, and strongly connected UCRPQs, whose boundedness problems are, respectively, PSpace-complete and $\Pi^p_2$-complete. Most upper bounds exploit results on limitedness for distance automata, in particular extending the model with alternation and two-wayness, which may be of independent interest.
2012 ACM Subject Classification Theory of computation → Database query languages (principles); Theory of computation → Quantitative automata
Keywords and phrases regular path queries; boundedness; limitedness; distance automata

Funding Barceló is funded by the Millennium Institute for Foundational Research on Data and Fondecyt
1170109, and Figueira by ANR project DELTA (grant ANR-16-CE40-0007) and ANR project
QUID (grant ANR-18-CE40-0031). This project has received funding from the European Research
Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant
agreement No 714532). The paper reflects only the authors' views and not the views of the ERC or
the European Commission. The European Union is not liable for any use that may be made of the
information contained therein.
Acknowledgements We are grateful to Thomas Colcombet for helpful discussions and valuable ideas
in relation to the results of Section 5.
1
Introduction
Boundedness is an important property of formulas in logics with fixed-point features. At
the intuitive level, a formula $\varphi$ in any such logic is bounded if its fixed-point depth, i.e., the
number of iterations that are needed to evaluate $\varphi$ on a structure $A$, is fixed (and thus it is
independent of $A$). In databases and knowledge representation, boundedness is regarded
as an interesting theoretical phenomenon with relevant practical implications [25, 8]. In
fact, while several applications in these areas require the use of recursive features, actual
real-world systems are either not designed or not optimized to cope with the computational
demands that such features impose. Bounded formulas, in turn, can be reformulated in
non-recursive logics, such as FO, or even as a union of conjunctive queries (UCQ) when $\varphi$
itself is positive. UCQs form the core of most systems for data management and ontological
query answering, and, in addition, are the focus of advanced optimization techniques. It has
also been experimentally verified in some contexts that recursive features encountered in
practice are often used in a somewhat "harmless" way, and that many such queries are
in fact bounded [23]. Thus, checking whether a recursive formula $\varphi$ is bounded, and building an
equivalent non-recursive formula $\varphi'$ when it is, are important optimization tasks.
The study of boundedness for Datalog programs, i.e., the least fixed-point extension of
the class of UCQs, received a lot of attention during the late 80s and early 90s. Two seminal
results established that checking boundedness is undecidable in general for Datalog [22],
but becomes decidable for monadic Datalog, i.e., those programs in which each intensional
predicate is monadic [19]. The past few years have seen a resurgence of interest in boundedness
problems. This is due, in part, to the development of the theory of cost automata over trees
(both finite and infinite) in a series of landmark results, in particular relating to its limitedness
problem. In a few words, cost automata are generalizations of finite automata associating
a cost from $\mathbb{N} \cup \{\infty\}$ to every input tree (instead of simply accepting or rejecting). The
limitedness problem asks, given a cost automaton, whether there is a uniform bound on the
cost over all (accepting) input trees. Some deep results establish that checking limitedness is
decidable for well-behaved classes of cost automata over trees [18, 35, 36, 7]. Remarkably, for
several logics of interest the boundedness problem can be reduced to the limitedness problem for cost
automata in such well-behaved classes. Those reductions have enabled powerful decidability
results for the boundedness problem. As an example, it has been shown in this way that
boundedness is decidable for monadic second-order logic (MSO) over structures of bounded
treewidth [11], which corresponds to an extension of Courcelle's Theorem, and also for the
guarded negation fragment of least fixed-point logic (LFP), even in the presence of unguarded
parameters [6]. Cost automata have also been used to study the complexity of boundedness
for guarded Datalog programs [7, 3].
Graph databases are a prominent area of study within database theory, in which the use
of recursive queries is crucial [2, 1]. A graph database is a finite edge-labeled directed
graph. The most basic navigational querying mechanism for graph databases corresponds
to the class of regular path queries (RPQs), which check whether two nodes of the graph
are connected by a path whose label belongs to a given regular language. RPQs are often
extended with the ability to traverse edges in both directions, giving rise to the class of
two-way RPQs, or 2RPQs [15]. The core of the most popular recursive query languages
for graph databases is defined by conjunctive 2RPQs, or C2RPQs, which are the closure of
2RPQs under conjunction and existential quantification [14]. We also consider unions of
C2RPQs, or UC2RPQs. It can be shown that a UC2RPQ is bounded iff it is equivalent to
some UCQ. In spite of the inherent recursive nature of UC2RPQs, their boundedness problem
has not been studied in depth. Here we develop such a study by showing the following:
- The boundedness problem for UC2RPQs is ExpSpace-complete. The lower bound holds
  even for CRPQs. This implies that boundedness is not more difficult than containment
  for UC2RPQs, which was shown to be ExpSpace-complete in [14].
- From our upper bound construction it follows that if a UC2RPQ is bounded, then it is
  equivalent to a UCQ of triple-exponential size. We show that this bound is optimal.
- Finally, we obtain better complexity bounds for some subclasses of UC2RPQs; namely,
  for acyclic UC2RPQs of bounded thickness, in which case boundedness becomes
  PSpace-complete, and for strongly connected UCRPQs, for which it is $\Pi^p_2$-complete.
It is important to stress that UC2RPQs can be easily translated into guarded LFP
with unguarded parameters, for which boundedness was shown to be decidable by applying
sophisticated cost automata techniques as mentioned above. However, the complexity of
the boundedness problem for such a logic is currently not well understood (it is at
least 2ExpTime-hard [7]), and hence this translation does not yield, in principle, optimal
complexity bounds for our problem. To study boundedness for UC2RPQs, we develop
instead techniques especially tailored to UC2RPQs. In fact, since the recursive structure of
UC2RPQs is quite tame, their boundedness problem can be translated into the limitedness
problem for a much simpler automata model than cost automata on trees; namely, distance
automata on finite words. Distance automata are nothing more than usual NFAs with two
sorts of transitions: costly and non-costly. Such an automaton is limited if there is an integer
$k \geq 1$ such that every word accepted by the NFA has an accepting run with at most $k$
costly transitions. A beautiful result in automata theory established the decidability of the
limitedness problem for distance automata [24], which is actually in PSpace [29]. Although
this is a difficult result, quite transparent proofs of it are by now available (see, e.g., [26]).
We exploit our translation to obtain tight complexity upper bounds for boundedness of
UC2RPQs. Some of the proofs in the paper require extending the study of limitedness to
alternating and two-way distance automata, while preserving the PSpace bound for the
limitedness problem. We believe these results to be of independent interest.
Organization of the paper. Section 2 contains preliminaries. We present characterizations
of boundedness for UC2RPQs in Section 3 and an application of those to pinpoint the
complexity of Boundedness for RPQs in Section 4. Distance automata and results about
them are given in Section 5. We analyze the complexity of Boundedness for general
UC2RPQs in Section 6 and present some classes of UC2RPQs with better complexity of
Boundedness in Section 7. We finish with a discussion in Section 8.
2
Preliminaries
We assume familiarity with nondeterministic finite automata (NFA), two-way NFA (2NFA),
and alternating finite automata (AFA) over finite words. We often blur the distinction
between an NFA $\mathcal{A}$ and the language $L(\mathcal{A})$ it defines; similarly for regular expressions.
Graph databases and conjunctive regular path queries. A graph database over a finite
alphabet $\mathbb{A}$ is a finite edge-labelled graph $G = (V, E)$ over $\mathbb{A}$, where $V$ is a finite set of
vertices and $E \subseteq V \times \mathbb{A} \times V$ is the set of labelled edges. We write $u \xrightarrow{a} v$ to denote
an edge $(u, a, v) \in E$. We define the alphabet $\mathbb{A}^{\pm} := \mathbb{A} \mathbin{\dot\cup} \mathbb{A}^{-1}$ that extends $\mathbb{A}$ with the
set $\mathbb{A}^{-1} := \{a^{-1} \mid a \in \mathbb{A}\}$ of "inverses" of symbols in $\mathbb{A}$. An oriented path from $u$ to $v$
in a graph database $G = (V, E)$ over alphabet $\mathbb{A}$ is a pair $\pi = (\sigma, \ell)$ where $\sigma$ and $\ell$ are
(possibly empty) sequences $\sigma = (v_0, a_1, v_1), (v_1, a_2, v_2), \ldots, (v_{k-1}, a_k, v_k) \in V \times \mathbb{A} \times V$, and
$\ell = \ell_1, \ldots, \ell_k \in \{-1, 1\}$, for $k \geq 0$, such that $u = v_0$, $v = v_k$, and for each $1 \leq i \leq k$, we have
that $\ell_i = 1$ implies $(v_{i-1}, a_i, v_i) \in E$; and $\ell_i = -1$ implies $(v_i, a_i, v_{i-1}) \in E$. The label of $\pi$
is the word $b_1 \ldots b_k \in (\mathbb{A}^{\pm})^*$, where $b_i = a_i$ if $\ell_i = 1$; otherwise $b_i = a_i^{-1}$. When $k = 0$ the
label of $\pi$ is the empty word $\varepsilon$. If $\ell_i = 1$ for every $1 \leq i \leq k$, we say that $\pi$ is a directed path.
Note that in this case, the label of $\pi$ belongs to $\mathbb{A}^*$.
A regular path query (RPQ) over $\mathbb{A}$ is a regular language $L \subseteq \mathbb{A}^*$, which we assume to
be given as an NFA. The evaluation of $L$ on a graph database $G = (V, E)$ over $\mathbb{A}$, written
$L(G)$, is the set of pairs $(u, v) \in V \times V$ such that there is a directed path from $u$ to $v$ in $G$
whose label belongs to $L$. 2RPQs extend RPQs with the ability to traverse edges in both
directions. Formally, a 2RPQ $L$ over $\mathbb{A}$ is simply an RPQ over $\mathbb{A}^{\pm}$. The evaluation $L(G)$ of
$L$ over a graph database $G = (V, E)$ over $\mathbb{A}$ is the set of pairs $(u, v) \in V \times V$ such that there
is an oriented path from $u$ to $v$ in $G$ whose label belongs to $L$.
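The evaluation of a 2RPQ can be sketched as a reachability search in the product of the graph database with the NFA. The sketch below is only illustrative and rests on assumptions that are not part of the text: inverse letters are encoded as strings ending in `^-1`, the NFA is a plain dictionary with keys `init`, `final` and `delta`, and only vertices that occur in some edge are considered.

```python
from collections import deque

def inv(letter):
    """Inverse of a letter: 'a' <-> 'a^-1' (hypothetical string encoding)."""
    return letter[:-3] if letter.endswith("^-1") else letter + "^-1"

def eval_2rpq(edges, nfa):
    """Evaluate a 2RPQ, given as an NFA, on a graph database.

    edges: set of triples (u, a, v) with a in the base alphabet.
    nfa:   dict with 'init' (a state), 'final' (set of states) and
           'delta' (set of triples (p, letter, q)); letters may be inverses.
    Returns the pairs (u, v) joined by an oriented path with accepted label.
    """
    # Every edge (u, a, v) can be read forwards as a, or backwards as a^-1.
    steps = {}
    for (u, a, v) in edges:
        steps.setdefault(u, []).append((a, v))
        steps.setdefault(v, []).append((inv(a), u))
    nodes = {u for (u, _, _) in edges} | {v for (_, _, v) in edges}

    answers = set()
    for source in nodes:
        # Breadth-first search over pairs (vertex, NFA state).
        seen = {(source, nfa["init"])}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in nfa["final"]:
                answers.add((source, node))
            for (letter, target) in steps.get(node, []):
                for (p, b, q) in nfa["delta"]:
                    if p == state and b == letter and (target, q) not in seen:
                        seen.add((target, q))
                        queue.append((target, q))
    return answers
```

For instance, on the graph with edges 1 -a-> 2 and 3 -b-> 2, the 2RPQ with the single word a·b^-1 relates vertex 1 to vertex 3, since the b-edge is traversed backwards.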
Conjunctive 2RPQs (C2RPQs) are obtained by taking the closure of 2RPQs under
conjunction and existential quantification, i.e., a C2RPQ over $\mathbb{A}$ is an expression
$\gamma := \exists \bar z\, \big((x_1 \xrightarrow{L_1} y_1) \wedge \cdots \wedge (x_m \xrightarrow{L_m} y_m)\big)$, where each $L_i$ is a 2RPQ over $\mathbb{A}$ and $\bar z$ is a tuple
of variables among those in $\{x_1, y_1, \ldots, x_m, y_m\}$. We say that $\gamma$ is a CRPQ if each $L_i$ is
an RPQ. If $\bar x = (x_1, \ldots, x_n)$ is the tuple of free variables of $\gamma$, i.e., those that are not
existentially quantified in $\bar z$, then the evaluation $\gamma(G)$ of the C2RPQ $\gamma$ over a graph database
$G$ is the set of all tuples $h(\bar x) = (h(x_1), \ldots, h(x_n))$, where $h$ ranges over all mappings
$h : \{x_1, y_1, \ldots, x_m, y_m\} \to V$ such that $(h(x_i), h(y_i)) \in L_i(G)$ for each $1 \leq i \leq m$.
A union of C2RPQs (UC2RPQ) is an expression of the form $\Gamma := \bigvee_{1 \leq i \leq n} \gamma_i$, where the
$\gamma_i$s are C2RPQs, all of which have exactly the same free variables. The evaluation $\Gamma(G)$ of $\Gamma$
over a graph database $G$ is $\bigcup_{1 \leq i \leq n} \gamma_i(G)$. We often write $\Gamma(\bar x)$ to denote that $\bar x$ is the tuple
of free variables of $\Gamma$. A UC2RPQ $\Gamma$ is Boolean if it contains no free variables.
Given UC2RPQs $\Gamma$ and $\Gamma'$, we write $\Gamma \subseteq \Gamma'$ if $\Gamma(G) \subseteq \Gamma'(G)$ for each graph database $G$.
Hence, $\Gamma$ and $\Gamma'$ are equivalent if $\Gamma \subseteq \Gamma'$ and $\Gamma' \subseteq \Gamma$, i.e., $\Gamma(G) = \Gamma'(G)$ for every $G$.
Boundedness of UC2RPQs. CRPQs, and even UC2RPQs, can easily be expressed in
Datalog, the least fixed-point extension of the class of unions of conjunctive queries (UCQs).
Hence, we can directly define the boundedness of a UC2RPQ in terms of the boundedness of
its equivalent Datalog program, which is a well-studied problem [25]. The latter, however,
coincides with being equivalent to some UCQ [31]. In the setting of graph databases, a
conjunctive query (CQ) over $\mathbb{A}$ is simply a CRPQ over $\mathbb{A}$ of the form $\exists \bar z \bigwedge_{1 \leq i \leq m} (x_i \xrightarrow{a_i} y_i)$,
where the $a_i$s range over $\mathbb{A} \cup \{\varepsilon\}$. Notice that atoms of the form $x \xrightarrow{\varepsilon} y$ correspond to
equality atoms $x = y$. Analogously, one can define unions of CQs (UCQs). Note that, modulo
equality atoms, a CQ over $\mathbb{A}$ can be seen as a graph database over $\mathbb{A}$. Hence, we shall slightly
abuse notation and, in the setting of CQs, use notions defined for graph databases (such as
oriented paths).
A UC2RPQ $\Gamma$ is bounded if it is equivalent to some UCQ $\Phi$. In this article we study
the complexity of the problem Boundedness, which takes as input a UC2RPQ $\Gamma$ and asks
whether $\Gamma$ is bounded.
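Example 1. Consider the Boolean UCRPQ $\Gamma = \gamma_1 \vee \gamma_2$ over the alphabet $\mathbb{A} = \{a, b, c, d\}$
such that $\gamma_1 = \exists x, y\, (x \xrightarrow{L_b} y \wedge x \xrightarrow{L_{b,d}} y)$ and $\gamma_2 = \exists x, y\, (x \xrightarrow{L_d} y \wedge x \xrightarrow{L_{b,d}} y)$, where
$L_b := a^+ b^+ c$, $L_d := a d^+ c^+$, and $L_{b,d} := a^+ (b + d) c^+$. For $e \in \mathbb{A}$, recall that $e^+$ denotes the
language $e(e^*)$. As we shall explain in Example 4, we have that $\gamma_1$ and $\gamma_2$ are unbounded.
However, $\Gamma$ is bounded, and in particular, it is equivalent to the UCQ $\Phi = \theta_1 \vee \theta_2$, where
$\theta_1$ and $\theta_2$ correspond to $\exists x, y\, (x \xrightarrow{abc} y)$ and $\exists x, y\, (x \xrightarrow{adc} y)$, respectively.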
3
Characterizations of Boundedness for UC2RPQs
In this section we provide two simple characterizations of when a UC2RPQ is bounded that
will be useful to analyze the complexity of Boundedness. Let $\theta(\bar x)$ and $\theta'(\bar x)$ be CQs
over $\mathbb{A}$ with variable sets $\mathcal{V}$ and $\mathcal{V}'$, respectively. Let $=_\theta$ and $=_{\theta'}$ be the binary relations
induced on $\mathcal{V}$ and $\mathcal{V}'$ by the equality atoms of $\theta$ and $\theta'$, respectively, and $=^*_\theta$ and $=^*_{\theta'}$ be
their reflexive-transitive closure. A homomorphism from $\theta$ to $\theta'$ is a mapping $h : \mathcal{V} \to \mathcal{V}'$
such that: (i) $x =^*_\theta y$ implies $h(x) =^*_{\theta'} h(y)$; (ii) $h(\bar x) = \bar x$; and (iii) for each atom $x \xrightarrow{a} y$
in $\theta$ with $a \in \mathbb{A}$, there is an atom $x' \xrightarrow{a} y'$ in $\theta'$ such that $h(x) =^*_{\theta'} x'$ and $h(y) =^*_{\theta'} y'$. We
write $\theta \to \theta'$ if such a homomorphism exists. It is known that $\theta \to \theta'$ iff $\theta' \subseteq \theta$ [16].
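As an illustration of the homomorphism test, the following brute-force sketch tries every mapping of the bound variables. It makes simplifying assumptions that are not in the text: both CQs are free of equality atoms (so conditions (i) and (iii) reduce to plain atom preservation), atoms are encoded as triples `(x, a, y)`, and variable names are strings. The running time is exponential in the number of variables, so this is meant for small examples only.

```python
from itertools import product

def variables(atoms):
    """Collect the variables occurring in a set of atoms (x, a, y)."""
    return {x for (x, _, _) in atoms} | {y for (_, _, y) in atoms}

def has_homomorphism(src_atoms, dst_atoms, free_vars=()):
    """Brute-force test for a homomorphism between equality-free CQs.

    src_atoms, dst_atoms: sets of atoms (x, a, y).
    free_vars: free variables, which must be mapped to themselves.
    """
    bound = sorted(variables(src_atoms) - set(free_vars))
    targets = sorted(variables(dst_atoms) | set(free_vars))
    for image in product(targets, repeat=len(bound)):
        h = dict(zip(bound, image))
        h.update({x: x for x in free_vars})  # condition (ii)
        # Every atom of the source must be mapped onto an atom of the target.
        if all((h[x], a, h[y]) in dst_atoms for (x, a, y) in src_atoms):
            return True
    return False
```

For example, the Boolean CQ given by a directed path labelled ab maps homomorphically into the one given by a path labelled aab, but not the other way around.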
An expansion of a C2RPQ $\gamma(\bar x)$ over $\mathbb{A}$ is a CQ $\lambda(\bar x)$ over $\mathbb{A}$ with minimal number of
variables and atoms such that (i) $\lambda$ contains each variable of $\gamma$, (ii) for each atom $A = x \xrightarrow{L} y$
of $\gamma$, there is an oriented path $\pi_A$ in $\lambda$ from $x$ to $y$ with label $w_A \in L$ whose intermediate
variables (i.e., those not in $\{x, y\}$) are distinct from one another, and (iii) intermediate
variables of different oriented paths $\pi_A$ and $\pi_{A'}$ are disjoint. Note that the free variables of
$\gamma$ and $\lambda$ coincide. Intuitively, the expansion $\lambda$ is obtained from $\gamma$ by choosing for each atom
$A = x \xrightarrow{L} y$ a word $w_A \in L$, and "expanding" $x \xrightarrow{L} y$ into the "fresh oriented path" $\pi_A$ from
$x$ to $y$ with label $w_A$. When $w_A = \varepsilon$ then $\lambda$ contains the equality atom $x = y$. An expansion
of a UC2RPQ $\Gamma$ is an expansion of some C2RPQ in $\Gamma$. Observe that a (U)C2RPQ is always
equivalent to the (potentially infinite) UCQ given by its set of expansions. Even more, it is
equivalent to the UCQ defined by its minimal expansions, as introduced below.
If $\lambda$ is an expansion of a UC2RPQ $\Gamma$, we define the size of $\lambda$, denoted by $\|\lambda\|$, to be the
number of (non-equality) atoms in $\lambda$. We say that $\lambda$ is minimal if there is no expansion $\lambda'$
such that $\lambda' \to \lambda$ and $\|\lambda'\| < \|\lambda\|$. Intuitively, an expansion is minimal if its answers cannot
be covered by a smaller expansion. We can then establish the following.
Lemma 2. Every UC2RPQ $\Gamma$ is equivalent to the (potentially infinite) UCQ given by its
set of minimal expansions.
We can now provide our basic characterizations of boundedness.
Proposition 3. The following conditions are equivalent for each UC2RPQ $\Gamma$.
1. $\Gamma$ is bounded.
2. There is $k \geq 1$ such that for every expansion $\lambda$ of $\Gamma$ there exists an expansion $\lambda'$ of $\Gamma$
with $\|\lambda'\| \leq k$ such that $\lambda \subseteq \lambda'$ (i.e., such that $\lambda' \to \lambda$).
3. $\Gamma$ has finitely many minimal expansions.
Example 4. Consider the Boolean UCRPQ $\Gamma = \gamma_1 \vee \gamma_2$ over $\mathbb{A} = \{a, b, c, d\}$ from Example 1.
To see that $\gamma_1$ is unbounded (the case of $\gamma_2$ is similar) we can apply Proposition 3. Indeed, the
expansions of $\gamma_1$ corresponding to $\{\exists x, y\, (x \xrightarrow{a b^n c} y \wedge x \xrightarrow{adc} y) : n \geq 1\}$ are all minimal. On
the other hand, $\Gamma$ is bounded as its minimal expansions correspond to $\exists x, y\, (x \xrightarrow{abc} y \wedge x \xrightarrow{abc} y)$
and $\exists x, y\, (x \xrightarrow{adc} y \wedge x \xrightarrow{adc} y)$.
4
Boundedness for Existentially Quantified RPQs
As a first application of Proposition 3, we study Boundedness for CRPQs consisting of
a single RPQ; that is, RPQs or existentially quantified RPQs. Let $v, w$ be words over $\mathbb{A}$.
Recall that a word $v$ is a prefix [resp. suffix and factor] of $w$ if $w \in v \cdot \mathbb{A}^*$ [resp. $w \in \mathbb{A}^* \cdot v$
and $w \in \mathbb{A}^* \cdot v \cdot \mathbb{A}^*$]. If in addition we have $v \neq w$, then we say that $v$ is a proper prefix [resp.
suffix and factor] of $w$. For a language $L \subseteq \mathbb{A}^*$, we define its prefix-free sublanguage $L_{\mathrm{pf}}$ to
be the set of words $w \in L$ such that $w$ has no proper prefix in $L$. Similarly, we define $L_{\mathrm{sf}}$
and $L_{\mathrm{ff}}$ with respect to the suffix and factor relation. We have the following:
Proposition 5. The following statements hold.
1. An RPQ $L$ is bounded iff $L$ is finite.
2. A CRPQ $\exists y\, (x \xrightarrow{L} y)$ [resp. $\exists x\, (x \xrightarrow{L} y)$] with $x \neq y$ is bounded iff $L_{\mathrm{pf}}$ [resp. $L_{\mathrm{sf}}$] is finite.
3. A Boolean CRPQ $\exists x, y\, (x \xrightarrow{L} y)$ with $x \neq y$ is bounded iff $L_{\mathrm{ff}}$ is finite.
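The three sublanguages can be computed directly when the language is a finite set of words. The sketch below does so with plain string operations; the sample language, a length-bounded truncation of a⁺b, is an assumption chosen for illustration. Since proper prefixes, suffixes and factors are shorter than the word itself, applying these functions to all words of a language up to some length yields exactly the corresponding sublanguage restricted to that length.

```python
def prefix_free(lang):
    """Words of lang that have no proper prefix in lang."""
    return {w for w in lang if not any(v != w and w.startswith(v) for v in lang)}

def suffix_free(lang):
    """Words of lang that have no proper suffix in lang."""
    return {w for w in lang if not any(v != w and w.endswith(v) for v in lang)}

def factor_free(lang):
    """Words of lang that have no proper factor in lang."""
    return {w for w in lang if not any(v != w and v in w for v in lang)}

# Truncation of the language a^+ b to the words with at most five a's.
sample = {"a" * n + "b" for n in range(1, 6)}
```

On this sample, every word survives in the prefix-free sublanguage, while only `ab` survives in the suffix-free and factor-free ones. This is consistent with the prefix-free sublanguage of a⁺b being infinite and the other two being finite.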
Theorem 6. The problem of, given an NFA accepting the language $L$, checking whether
$L_{\mathrm{pf}}$ is finite is PSpace-complete. The same holds if we replace $L_{\mathrm{pf}}$ by $L_{\mathrm{sf}}$ or $L_{\mathrm{ff}}$.
Proof. We focus on upper bounds; the lower bounds are in the appendix. Given an NFA $\mathcal{A}$
accepting the language $L$, we can construct an NFA $\mathcal{B}$ of polynomial size in $\mathcal{A}$ that accepts
precisely those words that have a proper prefix in $L$. By complementing $\mathcal{B}$ and intersecting
with $\mathcal{A}$, we obtain an NFA $\mathcal{B}'$ of exponential size in $\mathcal{A}$ that accepts the language $L_{\mathrm{pf}}$. Hence,
we only need to check whether the language accepted by $\mathcal{B}'$ is finite, which can be done
on-the-fly in NL w.r.t. $\mathcal{B}'$, and hence in PSpace. The other two cases are analogous.
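The last step of the proof, testing whether an NFA accepts a finite language, amounts to looking for a cycle through a state that is both reachable from the initial state and able to reach a final state. The sketch below implements this test explicitly on a materialized automaton, which is an assumption made for readability: the proof instead explores the exponential-size automaton on-the-fly. It also assumes an NFA without epsilon-transitions, encoded as a set of triples `(p, a, q)`.

```python
def nfa_language_is_finite(init, finals, delta):
    """Decide whether an epsilon-free NFA accepts finitely many words.

    delta: set of transitions (p, a, q).
    """
    succ, pred = {}, {}
    for (p, _, q) in delta:
        succ.setdefault(p, set()).add(q)
        pred.setdefault(q, set()).add(p)

    def closure(start, edges):
        """States reachable from the set start by following edges."""
        seen, stack = set(start), list(start)
        while stack:
            s = stack.pop()
            for t in edges.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    # Useful states: reachable from init and co-reachable from a final state.
    useful = closure({init}, succ) & closure(set(finals), pred)
    # The language is infinite iff some useful state lies on a cycle.
    return not any(s in closure(succ.get(s, set()), succ) for s in useful)
```

A cycle through a useful state can be pumped to produce infinitely many accepted words, while in the absence of such a cycle every accepting run visits each state at most once.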
By applying Theorem 6 and Proposition 5, we can now pinpoint the complexity of
Boundedness for CRPQs with a single RPQ.
Corollary 7. The following statements hold.
1. Boundedness for RPQs is NL-complete.
2. Boundedness for CRPQs of the form $\exists y\, (x \xrightarrow{L} y)$, with $x \neq y$, is PSpace-complete.
The same holds for CRPQs $\exists x\, (x \xrightarrow{L} y)$ and Boolean CRPQs $\exists x, y\, (x \xrightarrow{L} y)$, where $x \neq y$.
It is not clear, though, how the usual automata techniques, such as the ones applied in the proof
of Theorem 6, can be used to solve Boundedness for more complex CRPQs. To solve
this problem we develop an approach based on distance automata, as introduced next. Our
approach also handles inverses and unions, thus dealing with arbitrary UC2RPQs.
5
Distance Automata
Distance automata [24] (equivalent to weighted automata over the $(\min, +)$-semiring [21],
min-automata [12], or $\{\varepsilon, \mathrm{ic}\}$-B-automata [17]) are an extension of finite automata which
associate to each word in the language a natural number or "cost". They can be represented
as nondeterministic finite automata with two sorts of transitions: costly and non-costly. For
a given distance automaton, the cost of a run on a word is the number of costly transitions,
and the cost of a word $w \in \mathbb{A}^*$ is the minimum cost of an accepting run on $w$. We will use
this automaton model to encode boundedness as the problem of whether there is a uniform
bound on the cost of words, known as the limitedness problem.
Formally, a distance automaton (henceforth DA) is a tuple $\mathcal{A} = (\mathbb{A}, Q, q_0, F, \delta)$, where
$\mathbb{A}$ is a finite alphabet, $Q$ is a finite set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the
set of final states and $\delta \subseteq Q \times \mathbb{A} \times \{0, 1\} \times Q$ is the transition relation. A word $w \in \mathbb{A}^*$ is
accepted by $\mathcal{A}$ if there is an accepting run of $\mathcal{A}$ on $w$, i.e., a (possibly empty) sequence of
transitions $\rho = (p_1, a_1, c_1, r_1) \cdots (p_n, a_n, c_n, r_n) \in \delta^*$ with the usual properties: (1) if $\rho = \varepsilon$
then $q_0 \in F$ and $w = \varepsilon$, (2) $p_1 = q_0$ and $r_n \in F$, (3) for every $1 \leq i < n$ we have $r_i = p_{i+1}$,
and (4) $w = a_1 \cdots a_n$. The cost of the run $\rho$ is $\mathrm{cost}(\rho) = c_1 + \cdots + c_n$ (or 0 if $\rho = \varepsilon$); and the
cost $\mathrm{cost}_{\mathcal{A}}(w)$ of a word $w$ accepted by $\mathcal{A}$ is the minimum cost of an accepting run of $\mathcal{A}$ on
$w$. For convenience, we assume the cost of words not accepted by $\mathcal{A}$ to be 0.
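The cost of a word in a distance automaton can be computed by a simple dynamic program that keeps, for every state, the minimum cost of a run reaching it on the prefix read so far. The sketch below follows the definition above, including the convention that non-accepted words have cost 0. The encoding of the automaton as a set of tuples `(p, a, c, q)` and the example automaton `MIN_AB` are assumptions made for illustration.

```python
import math

def da_cost(word, init, finals, delta):
    """Minimum cost of an accepting run of a DA on word (0 if none exists).

    delta: set of transitions (p, a, c, q) with cost c in {0, 1}.
    """
    best = {init: 0}  # state -> cheapest cost of a run reaching it
    for letter in word:
        nxt = {}
        for (p, a, c, q) in delta:
            if a == letter and p in best:
                cand = best[p] + c
                if cand < nxt.get(q, math.inf):
                    nxt[q] = cand
        best = nxt
    costs = [cost for state, cost in best.items() if state in finals]
    return min(costs) if costs else 0

# Hypothetical example: the cost of a word over {a, b} is the minimum of
# its number of a's and its number of b's. State "x" counts a's, state
# "y" counts b's, and the initial state "s" guesses which one to count.
MIN_AB = {
    ("s", "a", 1, "x"), ("s", "b", 0, "x"), ("x", "a", 1, "x"), ("x", "b", 0, "x"),
    ("s", "a", 0, "y"), ("s", "b", 1, "y"), ("y", "a", 0, "y"), ("y", "b", 1, "y"),
}
```

With all three states final, this example automaton is not limited: the word consisting of n a's followed by n b's has cost n.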
The limitedness problem for DA is defined as follows: given a DA $\mathcal{A}$, determine whether
$\sup_{w \in \mathbb{A}^*} \mathrm{cost}_{\mathcal{A}}(w) < \infty$. This problem is known to be PSpace-complete.
Theorem 8 ([28, 29]). The following statements hold:
1. The limitedness problem for DA is PSpace-complete.
2. If a DA with $n$ states is limited, then $\sup_{w \in \mathbb{A}^*} \mathrm{cost}_{\mathcal{A}}(w) \leq 2^{O(n^3)}$.
We use two extensions of DA: alternating and two-way. Two-way DA is defined as for
NFA, extending the cost function accordingly. The cost of a word is still the minimum over
the cost of all (potentially infinitely many) runs. Alternating DA is defined as usual by having
two sorts of states: universal and existential. Existential states can be seen as computing
the minimum among the cost of all possible continuations of the run, and universal states as
computing the maximum (or supremum if the automaton is also two-way). As we will see,
these extensions preserve the above PSpace upper bound for the limitedness problem.
Formally, an alternating two-way DA with epsilon transitions (A2DA$^\varepsilon$) over $\mathbb{A}$ is a tuple
$\mathcal{A} = (\mathbb{A}, Q_\exists, Q_\forall, q_0, F, \delta)$, where $q_0 \in Q_\exists$, $F \subseteq Q_\exists$ and
$\delta \subseteq (Q_\exists \cup Q_\forall) \times (\mathbb{A}^{\pm} \cup \{\varepsilon\}) \times \{\mathrm{end}, \overline{\mathrm{end}}\} \times \{0, 1\} \times (Q_\exists \cup Q_\forall)$;
where $\mathrm{end}$ indicates that after reading the letter we arrive at the end of the word (i.e., either
the leftmost or the rightmost end) and $\overline{\mathrm{end}}$ indicates that we do not. When the automaton
$\mathcal{A}$ is two-way, it is convenient to think of its head as being between the letter positions of the
word, so an end-flagged transition can be applied only if it moves the head to be right before
the first letter of the word, or right after the last one.
For any given word $w \in \mathbb{A}^*$, consider the edge-labelled graph $G_{\mathcal{A},w} = (V, E)$ over $\delta$,
where $V = Q \times \{0, \ldots, |w|\}$, with $Q = Q_\exists \cup Q_\forall$, and $E \subseteq V \times \delta \times V$ consists of all edges
$(q, i) \xrightarrow{(q,a,e,c,p)} (p, j)$ such that $e = \mathrm{end}$ iff $j = 0$ or $j = |w|$, and either (a) $i < |w|$, $a = w[i + 1]$,
and $j = i + 1$; (b) $i > 0$, $a = (w[i])^{-1}$, and $j = i - 1$; or (c) $a = \varepsilon$ and $j = i$.
An accepting run of $\mathcal{A}$ on $w$ from $(q, i) \in Q \times \{0, \ldots, |w|\}$ is a finite (possibly empty)
edge-labelled directed rooted tree¹ $t$ over $\delta$ and a labelling $h$ from the nodes of $t$ to the nodes
of $G_{\mathcal{A},w}$, such that if $t$ is empty then $q \in F$, and otherwise $h$ maps the root of $t$ to $(q, i)$,
every leaf of $t$ to $F \times \{0, \ldots, |w|\}$, and for every node $x$ of $t$:
- if $(x, \tau, y)$ is a (labelled) edge in $t$ for some $y$, then $(h(x), \tau, h(y))$ is an edge in $G_{\mathcal{A},w}$;
- if $h(x) \in Q_\forall \times \{0, \ldots, |w|\}$, then for every edge $(h(x), \tau, c)$ in $G_{\mathcal{A},w}$, there is an edge
  $(x, \tau, y)$ in $t$ so that $h(y) = c$;
- if $h(x) \in Q_\exists \times \{0, \ldots, |w|\}$, then $x$ has at most one child.
Each branch of $t$ with label $(q_1, a_1, e_1, c_1, p_1), \ldots, (q_n, a_n, e_n, c_n, p_n)$ has an associated
cost of $c_1 + \cdots + c_n$; and the cost associated with $t$ is the maximum among the costs of its
branches, or 0 if $t$ is empty. The cost $\mathrm{cost}_{\mathcal{A}}(w, q, i)$ is the minimum cost of an accepting run
on $w$ from $(q, i)$, or 0 if none exists; $\mathrm{cost}_{\mathcal{A}}(w)$ is defined as $\mathrm{cost}_{\mathcal{A}}(w, q_0, 0)$.
An A2DA$^\varepsilon$ with $\delta \subseteq Q \times (\mathbb{A} \cup \{\varepsilon\}) \times \{\mathrm{end}, \overline{\mathrm{end}}\} \times \{0, 1\} \times Q$ is an alternating DA with $\varepsilon$-transitions
(ADA$^\varepsilon$). An A2DA$^\varepsilon$ with $Q_\forall = \emptyset$ is a two-way DA with $\varepsilon$-transitions (2DA$^\varepsilon$). An
A2DA$^\varepsilon$ with both the aforementioned conditions is (equivalent to) a DA with $\varepsilon$-transitions
(DA$^\varepsilon$). Notice that in the last two cases, accepting runs can be represented as words from $\delta^*$
rather than trees. By A2DA (resp., ADA, 2DA, DA) we denote an A2DA$^\varepsilon$ (resp., ADA$^\varepsilon$,
2DA$^\varepsilon$, DA$^\varepsilon$) with no $\varepsilon$-transitions. Note that DA as just defined is in every sense equivalent
to the distance automata model we have defined at the beginning of this section; this is
why we overload the same "DA" name.
We first observe that 2DA can be transformed into DA while preserving both the language
and limitedness by adapting the standard "crossing sequence" construction for
translating 2NFA into NFA [34]. This fact will be useful for proving the ExpSpace upper
bound for Boundedness of general UC2RPQs in Section 6.
Proposition 9. There is an exponential time procedure which for every 2DA $\mathcal{A}$ over $\mathbb{A}$
produces a DA $\mathcal{B}$ over $\mathbb{A}$ such that the languages accepted by $\mathcal{A}$ and $\mathcal{B}$ are the same, and
$\mathrm{cost}_{\mathcal{B}}(w) \leq \mathrm{cost}_{\mathcal{A}}(w) \leq f(\mathrm{cost}_{\mathcal{B}}(w))$ for every $w \in \mathbb{A}^*$, where $f$ is a polynomial function that
depends on the number of states of $\mathcal{A}$.
¹ That is, a tree-shaped finite edge-labelled graph over $\delta$ with edges directed in the root-to-leaf sense.
Recall that the universality problem for NFAs is known to be PSpace-complete [27];
and that this bound actually extends to two-way and even alternating automata. We show
that, likewise, the limitedness problem remains in PSpace for A2DA$^\varepsilon$. This result will be
useful to show in Section 7 that Boundedness for the class of acyclic UC2RPQs of bounded
thickness is in PSpace.
Theorem 10. The limitedness problem for A2DA$^\varepsilon$ is PSpace-complete.
The novelty of this result is the PSpace upper bound. In fact, decidability follows from
known results, and in particular [7, Theorem 14] claims ExpTime-membership in the more
challenging setup of infinite trees. However, this is obtained via an involved construction
spanning several papers. The proof of Theorem 10, instead, is obtained by the
composition of the following reductions:
lim. A2DA$^\varepsilon$ $\xrightarrow{(1)}$ lim. A2DA $\xrightarrow{(2)}$ lim. 2DA $\xrightarrow{(3)}$ lim. ADA$^\varepsilon$ $\xrightarrow{(4)}$ lim. ADA $\xrightarrow{(5)}$ lim. DA.
Reductions (1), (3) and (4) are in polynomial time, while reductions (2) and (5), which are
basically the same, are in exponential time. Specifically, reductions (2) and (5) preserve
the state space but the size of the alphabet grows exponentially in the number of states and
linearly in the size of the source alphabet. However, the alphabet and transition set resulting
from these reductions can be succinctly described: letters are encoded in polynomial space,
and checking for membership in the transition set is polynomial time computable.
In summary, the composition (1)+(2)+(3)+(4)+(5) yields a DA with the following
characteristics: (i) it has a polynomial number of states $Q$; (ii) it runs on an exponential
alphabet $\mathbb{A}$ (every letter of which is encoded in polynomial space); and (iii) one can check in
polynomial time whether a tuple $t \in Q \times \mathbb{A} \times \{\mathrm{end}, \overline{\mathrm{end}}\} \times \{0, 1\} \times Q$ is in its transition
relation. This, coupled with Theorem 8, item (2) (which offers a bound depending only on
the number of states), provides a polynomial space algorithm for the limitedness of A2DA$^\varepsilon$:
we can nondeterministically check the existence of a word with cost greater than the single-exponential
bound $N$ using only polynomial space, by guessing one letter at a time and
keeping the set of reachable states together with the associated costs, where each cost is
encoded in binary using polynomial space if it is smaller than $N$, or with an "$\infty$" flag otherwise.
The algorithm accepts if at least one final state is reached and the costs of all reachable final
states are marked $\infty$. Since NPSpace = PSpace (Savitch's Theorem), Theorem 10 follows.
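The bookkeeping of this procedure can be sketched for an ordinary DA as follows. Instead of guessing letters nondeterministically, the sketch explores all configurations deterministically and stores them, so it does not run in polynomial space; it is meant only to illustrate how reachable states are tracked together with capped costs. The encoding of transitions as tuples `(p, a, c, q)`, the function name, and the choice of flagging costs strictly above the given bound are assumptions.

```python
from collections import deque

def exists_word_with_cost_above(bound, alphabet, init, finals, delta):
    """Search for an accepted word whose cost exceeds bound.

    A configuration maps each reachable state to the minimum cost of a
    run reaching it, capped at bound + 1 (the "infinity" flag).
    """
    INF = bound + 1
    start = frozenset({(init, 0)})
    seen = {start}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        final_costs = [c for (q, c) in config if q in finals]
        # Accept: a final state is reached and all final states are flagged.
        if final_costs and min(final_costs) == INF:
            return True
        costs = dict(config)
        for letter in alphabet:
            nxt = {}
            for (p, a, c, q) in delta:
                if a == letter and p in costs:
                    cand = min(costs[p] + c, INF)
                    nxt[q] = min(cand, nxt.get(q, INF))
            new_config = frozenset(nxt.items())
            if new_config not in seen:
                seen.add(new_config)
                queue.append(new_config)
    return False
```

Called with the bound given by Theorem 8, item (2), a positive answer would witness that the automaton is not limited; the sketch itself accepts any bound as a parameter.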
We now provide a brief description of the reductions used in the proof of Theorem 10.
(1) From A2DA$^\varepsilon$ to A2DA. This is a trivial reduction obtained by simulating $\varepsilon$-transitions
by reading $a \cdot a^{-1}$ for some $a \in \mathbb{A}$.
(2) From A2DA to 2DA. Given an A2DA $\mathcal{A} = (\mathbb{A}, Q_\exists, Q_\forall, q_0, F, \delta)$, we build a 2DA $\mathcal{B}$ over
a larger alphabet $\mathbb{B}$, where we trade alternation for extra alphabet letters. The alphabet
$\mathbb{B}$ consists of triples $(f^{\leftarrow}, a, f^{\rightarrow})$, where $a \in \mathbb{A}$ and $f^{\leftarrow}, f^{\rightarrow} : Q_\forall \to \delta$. The idea is that
$f^{\leftarrow}, f^{\rightarrow}$ are "choice functions" for the alternation: whenever we are to the left (resp.,
right) of a position of the word labelled $(f^{\leftarrow}, a, f^{\rightarrow})$ in state $q \in Q_\forall$, instead of exploring
all transitions departing from $q$ and taking the maximum cost over all such runs (this is
what alternation does in $\mathcal{A}$), $\mathcal{B}$ chooses to just take the transition $f^{\leftarrow}(q)$ (resp., $f^{\rightarrow}(q)$).
Note that $\mathbb{B}$ is exponential in the number of states but not in the size of $\mathbb{A}$. In this way,
we build a 2DA $\mathcal{B}$ having the same set of states as $\mathcal{A}$ but with a transition function which
is essentially deterministic on the states of $Q_\forall$. In the end we obtain that
- for every $w \in \mathbb{B}^*$, $\mathrm{cost}_{\mathcal{B}}(w) \leq \mathrm{cost}_{\mathcal{A}}(w_{\mathbb{A}})$; and
- for every $w \in \mathbb{A}^*$ there is $\tilde{w} \in \mathbb{B}^*$ so that $\tilde{w}_{\mathbb{A}} = w$ and $\mathrm{cost}_{\mathcal{A}}(w) = \mathrm{cost}_{\mathcal{B}}(\tilde{w})$,
where $w_{\mathbb{A}}$ and $\tilde{w}_{\mathbb{A}}$ denote the projections onto the alphabet $\mathbb{A}$. This implies that the
limitedness problem is preserved.
(3) From 2DA to ADA$^\varepsilon$. We show a polynomial-time translation from 2DA to ADA$^\varepsilon$ which
preserves limitedness. In the case of finite automata, there are language-preserving
reductions from 2NFA to AFA with a quadratic blowup in the state space [9, 32]. However,
these translations, when applied blindly to reduce from 2DA to ADA$^\varepsilon$, preserve neither
the cost semantics nor the limitedness of languages. On the other hand, [10] shows an
involved construction that results in a reduction from 2DA to ADA$^\varepsilon$ on infinite trees,
which preserves limitedness but is not polynomial in the number of states. We show a
translation from 2DA to ADA$^\varepsilon$ which serves our purpose: it preserves limitedness and
it is polynomial time computable. The translation is close to the language-preserving
reduction from 2NFA to AFA of [32], upgraded to take into account the cost of different
alternation branches, somewhat in the same spirit as the history summaries from [10].
(4) From ADA$^\varepsilon$ to ADA. This is a straightforward polynomial time reduction which
preserves limitedness but, as opposed to (1), does not preserve the language: we need to
add an extra letter to the alphabet in order to make the reduction work in polynomial
time.
(5) From ADA to DA. This is exactly the same reduction as (2), noticing that the alphabet
will still be single exponential in the original A2DA$^\varepsilon$.
6
Complexity of Boundedness for UC2RPQs
Here we show that Boundedness for UC2RPQs is ExpSpace-complete. We do so by
applying distance automata results presented in the previous section on top of the semantic
characterizations presented in Section 3. The lower bound applies even for CRPQs. We
further show that there is a triple-exponential tight bound for the size of the equivalent UCQ
of a UC2RPQ (and even CRPQ), whenever this exists. This is summarized in the following
theorem. If $\Gamma$ is a UC2RPQ, we write $\|\Gamma\|$ for the length of an arbitrary reasonable encoding
of $\Gamma$, in particular, encodings in which regular languages are described through NFA or
regular expressions.
Theorem 11. The following statements hold.
1. Boundedness for UC2RPQs is ExpSpace-complete. The problem remains
ExpSpace-hard even for Boolean CRPQs.
2. If a UC2RPQ $\Gamma$ is bounded, there is a UCQ $\Phi$ that is equivalent to $\Gamma$ and such that $\Phi$ has
at most triple-exponentially many CQs, each one of which is at most of double-exponential
size with respect to $\|\Gamma\|$.
3. There is a family $\{\Gamma_n\}_{n \geq 1}$ of Boolean CRPQs such that for each $n \geq 1$ it is the case that:
(1) $\|\Gamma_n\| = O(n)$, (2) $\Gamma_n$ is bounded, and (3) every UCQ that is equivalent to $\Gamma_n$ has at
least triple-exponentially many CQs with respect to $n$.
6.1
Upper bounds
Our upper bound proof builds on top of techniques developed by Calvanese et al. [14] for
studying the containment problem for UC2RPQs: Given UC2RPQs ?, ?0, is it the case
that ? ? ?0? It is shown in [14] that from ?, ?0 it is possible to construct exponentially
sized NFAs A?,?0 and A0?,?0 , such that ? ? ?0 iff there is a word in A?,?0 ? A0?,?0 . It is a
wellknown result that the latter is solvable in NL in the combined size of (A?,?0 , A0?,?0 ), i.e.,
in ExpSpace. We modify this construction to study the boundedness of a given UC2RPQ $\Gamma$.
In particular, we construct from $\Gamma$ in exponential time a DA $D_\Gamma$ such that $\Gamma$ is bounded iff
$D_\Gamma$ is limited. The result then follows from Theorem 8, which establishes that limitedness for
$D_\Gamma$ can be solved in polynomial space in the number of its states, and thus in ExpSpace.
Proposition 12. There is a single-exponential time procedure that takes as input a UC2RPQ
$\Gamma$ and constructs a DA $D_\Gamma$ such that $\Gamma$ is bounded iff $D_\Gamma$ is limited.
Proof. As in [14], the DA $D_\Gamma$ will run over encodings of expansions of the
UC2RPQ $\Gamma$, i.e., words over the alphabet $\mathbb{A}_1 := \mathbb{A}^{\pm} \cup V \cup \{\$\}$, where $\mathbb{A}$ is the alphabet of
$\Gamma$, $V$ is the set of variables of $\Gamma$, and $\$$ is a fresh symbol. If $\gamma = \exists \bar{z} \bigwedge_{1 \le i \le m} (x_i \xrightarrow{L_i} y_i)$ is
a C2RPQ in $\Gamma$ and $\lambda$ is the expansion of $\gamma$ obtained by expanding each $x_i \xrightarrow{L_i} y_i$ into an
oriented path $\pi_i$ from $x_i$ to $y_i$ with label $w_i \in L_i$, then we encode $\lambda$ as the word
$$w_\lambda = \$\, x_1 w_1 y_1 \,\$\, x_2 w_2 y_2 \,\$ \cdots \$\, x_m w_m y_m \,\$ \in \mathbb{A}_1^{*}.$$
Note how the subword $x_i w_i y_i$ encodes the oriented path $\pi_i$. Every position $j \in \{1, \dots, |w_\lambda|\}$
with $w_\lambda[j] \neq \$$ represents a variable in $\lambda$: either $x_i$ or $y_i$ if $w_\lambda[j] = x_i$ or $w_\lambda[j] = y_i$,
respectively; or the $(\ell + 1)$-th variable in the oriented path $\pi_i$ if $w_\lambda[j]$ is the $\ell$-th symbol in
the subword $w_i$. Hence different positions in $w_\lambda$ could represent the same variable in $\lambda$, e.g.,
in the encoding $\$\,x\,a\,b\,c\,y\,\$$, the 5th position containing a 'c' and the 6th position containing
a 'y' represent the same variable, namely, the last vertex $y$ of the oriented path. It is
then easy to build, in polynomial time, an NFA $\mathcal{A}_1$ over $\mathbb{A}_1$ recognizing the language of all
such encodings of expansions of $\Gamma$. Our automaton $D_\Gamma$ is the product of $\mathcal{A}_1$ and the DA $C_\Gamma$
defined below. In particular, $D_\Gamma$ is limited iff $C_\Gamma$ is limited over words of the form $w_\lambda$, for $\lambda$
an expansion of $\Gamma$.
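The encoding of an expansion can be illustrated with a small sketch. It is not part of the construction in [14]; the function names, the tuple representation of expanded atoms, and the names given to internal path variables (`_p{i}_{l}`) are assumptions made only for this example.

```python
from typing import List, Optional, Tuple

# One expanded atom: (source variable x_i, label word w_i, target variable y_i).
ExpandedAtom = Tuple[str, List[str], str]


def encode_expansion(atoms: List[ExpandedAtom]) -> List[str]:
    """Return the word $ x1 w1 y1 $ x2 w2 y2 $ ... $ xm wm ym $ as a list of symbols."""
    word = ["$"]
    for x, label, y in atoms:
        word += [x, *label, y, "$"]
    return word


def represented_variables(atoms: List[ExpandedAtom]) -> List[Optional[str]]:
    """For each position of the encoding, name the variable it represents.

    Separator positions ($) represent no variable (None). The l-th symbol of
    w_i represents the (l+1)-th variable of the path pi_i: this is y_i when l
    is the last symbol of w_i, and a fresh internal variable otherwise.
    """
    out: List[Optional[str]] = [None]
    for i, (x, label, y) in enumerate(atoms, start=1):
        out.append(x)
        for l in range(1, len(label) + 1):
            out.append(y if l == len(label) else f"_p{i}_{l}")
        out.append(y)
        out.append(None)
    return out


# The example from the text: the encoding $xabcy$.
atoms = [("x", ["a", "b", "c"], "y")]
word = encode_expansion(atoms)
variables = represented_variables(atoms)
```

In this example the 5th and 6th positions (indices 4 and 5 in the zero-based Python lists) both map to `y`, matching the observation that distinct positions may represent the same variable.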
Fix a disjunct $\gamma$ of $\Gamma$. As in [14], we consider words over the alphabet $\mathbb{A}_2 := \mathbb{A}_1 \times (2^V \cup \{\#\})$
of the form $(\ell_1, \alpha_1) \cdots (\ell_n, \alpha_n)$, such that $w_\lambda = \ell_1 \cdots \ell_n$, for some expansion $\lambda$ of $\Gamma$, and the
$\alpha_i$'s are valid $\gamma$-annotations, i.e., (1) $\alpha_i = \#$ if $\ell_i = \$$, (2) $\alpha_1, \dots, \alpha_n \in 2^V$ induce a partition
of the variable set $V_\gamma$ of $\gamma$, and (3) for each free variable $x \in V_\gamma$ there is some $(\ell_i, \alpha_i)$ such
that $\ell_i = x$ and $x \in \alpha_i$. It is easy to construct an NFA $B_1^\gamma$ of exponential size that, given
$w = (\ell_1, \alpha_1) \cdots (\ell_n, \alpha_n)$ with $w_\lambda = \ell_1 \cdots \ell_n$, checks if the $\alpha_i$'s are valid $\gamma$-annotations. Note
that if the latter holds, then the annotations encode a mapping $h_w$ from $V_\gamma$ to the variables
of $\lambda$ such that $h_w(\bar{x}) = \bar{x}$, where $\bar{x}$ are the free variables of $\gamma$.
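The three validity conditions can be checked directly on an annotated word, as in the following sketch. It is an illustration only, not the NFA from [14]. It assumes that "induce a partition" means that every variable of the disjunct occurs in exactly one annotation (empty annotations allowed), and that the symbol `#` is used only on separator positions; the function name and data representation are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple, Union

Annotation = Union[str, FrozenSet[str]]  # "#" or a set of query variables
Letter = Tuple[str, Annotation]          # (symbol of the encoding, annotation)


def annotation_mapping(
    word: List[Letter], query_vars: Set[str], free_vars: Set[str]
) -> Optional[Dict[str, int]]:
    """Return the mapping variable -> position (1-based) encoded by a valid
    annotation, or None if the annotation is not valid."""
    position_of: Dict[str, int] = {}
    for j, (symbol, ann) in enumerate(word, start=1):
        if symbol == "$":
            if ann != "#":          # condition (1)
                return None
            continue
        if ann == "#":              # assumption: '#' only annotates '$'
            return None
        for v in ann:
            # condition (2): each query variable is placed exactly once
            if v not in query_vars or v in position_of:
                return None
            position_of[v] = j
    if set(position_of) != set(query_vars):
        return None
    for x in free_vars:
        # condition (3): a free variable sits on the position labelled by itself
        if word[position_of[x] - 1][0] != x:
            return None
    return position_of


good = [("$", "#"), ("x", frozenset({"x"})), ("a", frozenset({"z"})),
        ("b", frozenset()), ("y", frozenset({"y"})), ("$", "#")]
bad = [("$", "#"), ("x", frozenset()), ("a", frozenset({"x", "z"})),
       ("b", frozenset()), ("y", frozenset({"y"})), ("$", "#")]

mapping = annotation_mapping(good, {"x", "y", "z"}, {"x"})
rejected = annotation_mapping(bad, {"x", "y", "z"}, {"x"})
```

Here `good` places the free variable `x` on the position labelled `x`, whereas `bad` places it on a position labelled `a` and is therefore rejected by condition (3).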
Now, given $w = (\ell_1, \alpha_1)(\ell_2, \alpha_2) \cdots (\ell_n, \alpha_n)$ with $w_\lambda = \ell_1 \cdots \ell_n$ and the $\alpha_i$'s being valid
$\gamma$-annotations, it is shown in [14] that one can construct in polynomial time a 2NFA $B_2^\gamma$ that
checks the existence of an expansion $\lambda'$ of $\gamma$ and a homomorphism $h$ from $\lambda'$ to $\lambda$ consistent
with $h_w$. For each atom $x \xrightarrow{L} y$ of $\gamma$, the automaton $B_2^\gamma$ guesses an oriented path $\pi$ in $\lambda$
from $h_w(x)$ to $h_w(y)$ with label $w' \in L$, directly over the encoding $w_\lambda$, starting at a position
$j_x$ and ending at a position $j_y$ in $\{0, \dots, n\}$ (recall that the head moves in $\{0, \dots, n\}$) with
$j_x, j_y > 0$, $w[j_x] = (\ell, \alpha)$, $w[j_y] = (\ell', \alpha')$, $x \in \alpha$ and $y \in \alpha'$. Note that we have two types of
transitions: (1) transitions that consume a symbol $a \in \mathbb{A}^{\pm}$ and actually guess an atom of $\lambda$, and (2)
transitions to "jump" from position $j$ to $j'$ in $\{0, \dots, n\}$ representing equivalent variables of
$\lambda$. The latter means that $j, j' > 0$ and either $w_\lambda[j]$ and $w_\lambda[j']$ represent exactly the same
variable of $\lambda$, or $w_\lambda[j]$ and $w_\lambda[j']$ represent variables $z, z'$ of $\lambda$ such that $z =^{*}_{\lambda} z'$, where $=^{*}_{\lambda}$
is the reflexive-transitive closure of the relation induced by the equality atoms in $\lambda$.
Let $D_2^\gamma$ be the 2DA obtained from the 2NFA $B_2^\gamma$ by setting to 0 and 1 the cost of transitions
of type (2) and (1), respectively. Hence, for a word $w$ such that the projection of $w$ to $\mathbb{A}_1$ is
$w_\lambda$, and the one to $(2^V \cup \{\#\})$ is a valid $\gamma$-annotation, we have that $\mathrm{cost}_{D_2^\gamma}(w)$ is precisely the
minimum size of an expansion $\lambda'$ that can be mapped to $\lambda$ via a homomorphism compatible
with $h_w$. By Proposition 9, we can construct in exponential time in $D_2^\gamma$ a DA $C_2^\gamma$ accepting the
same language as $D_2^\gamma$ and having an exponential number of states, so that for every word $w'$,
we have $\mathrm{cost}_{C_2^\gamma}(w') \le \mathrm{cost}_{D_2^\gamma}(w') \le f(\mathrm{cost}_{C_2^\gamma}(w'))$ for some polynomial function $f$. Let $\tilde{C}_\gamma$
be the result of taking the product of $B_1^\gamma$ and $C_2^\gamma$ and then projecting over the alphabet $\mathbb{A}_1$.
For every expansion $\lambda$ of $\Gamma$, if $\lambda'$ is a minimal size expansion of $\gamma$ such that $\lambda' \to \lambda$, then we
obtain that $\mathrm{cost}_{\tilde{C}_\gamma}(w_\lambda) \le \|\lambda'\| \le f(\mathrm{cost}_{\tilde{C}_\gamma}(w_\lambda))$. We define our desired $C_\Gamma$ to be the union
of $\tilde{C}_\gamma$ over all $\gamma$ in $\Gamma$. We have that for every expansion $\lambda$, if $\lambda_{\min}$ is a minimal size expansion
of $\Gamma$ such that $\lambda_{\min} \to \lambda$, then $\mathrm{cost}_{C_\Gamma}(w_\lambda) \le \|\lambda_{\min}\| \le f(\mathrm{cost}_{C_\Gamma}(w_\lambda))$. By Proposition 3,
item (2), $\Gamma$ is bounded iff $\|\lambda_{\min}\|$ is bounded over all $\lambda$. The latter condition holds iff $C_\Gamma$
is limited over words $w_\lambda$, for all expansions $\lambda$. By definition, the latter is equivalent to $D_\Gamma$
being limited. Summing up, we obtain that $\Gamma$ is bounded iff $D_\Gamma$ is limited, as required. Note
that the whole construction can be done in exponential time.
As a corollary to Proposition 12 and Theorem 8 we obtain the desired upper bound for
part (1) of Theorem 11.
Corollary 13. Boundedness for UC2RPQs is in ExpSpace.
Size of equivalent UCQs. Here we prove part (2) of Theorem 11. Since $\Gamma$ is bounded
we have from Proposition 12 that $D_\Gamma$ is limited. Then, from Theorem 8 we obtain that
the maximum cost that $D_\Gamma$ assigns to a word is $N$, where $N$ is exponential in the
number of states of $D_\Gamma$, and thus double exponential in $\|\Gamma\|$ by construction. Therefore, for
every expansion $\lambda$ of $\Gamma$, if $\lambda_{\min}$ is a minimal size expansion of $\Gamma$ such that $\lambda_{\min} \to \lambda$, then
$\|\lambda_{\min}\| \le f(N)$, where $f$ is the polynomial function of the proof of Proposition 12. In
particular, all minimal expansions of $\Gamma$ are of size $\le f(N)$. By Lemma 2, the UC2RPQ
$\Gamma$ is equivalent to the union of all its minimal expansions. The number of such minimal
expansions is thus at most exponential in $f(N)$, and hence triple exponential in $\|\Gamma\|$.
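The counting behind this bound can be sketched as the following chain of inequalities. It is only an illustration of the argument above: the polynomials $p$, $q$ and $r$ are placeholder names that do not appear in the text. They stand, respectively, for the exponent bounding the number of states of $D_\Gamma$, the exponent relating $N$ to that number of states, and the exponent bounding how many expansions of a given size exist.

```latex
\begin{align*}
  \#\mathrm{states}(D_\Gamma) &\le 2^{p(\|\Gamma\|)}
     && \text{(Proposition 12: exponential construction)} \\
  N &\le 2^{q(\#\mathrm{states}(D_\Gamma))} \le 2^{q\left(2^{p(\|\Gamma\|)}\right)}
     && \text{(Theorem 8: double exponential in } \|\Gamma\|\text{)} \\
  \|\lambda_{\min}\| &\le f(N)
     && \text{(} f \text{ polynomial, so still double exponential)} \\
  \#\{\text{minimal expansions}\} &\le 2^{r(f(N))}
     && \text{(triple exponential in } \|\Gamma\|\text{)}
\end{align*}
```

Each line adds at most one level of exponentiation, which is where the double-exponential size of each CQ and the triple-exponential number of CQs in part (2) of Theorem 11 come from.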
6.2 Lower bounds
We reduce from the $2^n$-tiling problem, that is, a tiling problem restricted to $2^n$ many columns,
which is ExpSpace-complete (see, e.g., [14]). We show that for every $2^n$-tiling problem
$T$ there is a CRPQ $\Gamma$, computable in polynomial time from $T$, whose number of minimal
expansions is essentially the number of solutions to $T$ in the following sense.
Lemma 14. For every $2^n$-tiling problem $T$ with $m$ solutions there is a Boolean CRPQ
$\Gamma$, computable in polynomial time from $T$, such that the number of minimal expansions of
$\Gamma$ is $O((g(T) + m)^{n+1})$ and $\Omega(m)$, for some double-exponential function $g$. Further, $\Gamma$
consists of a Boolean CRPQ of the form $\exists x, y \bigwedge_{0 \le i \le n} (x \xrightarrow{L_i} y)$, where each $L_i$ is given as a
regular expression.
As a corollary, this yields an ExpSpace lower bound for the boundedness problem (part
(1) of Theorem 11), as well as a triple-exponential lower bound for the size of the UCQ
equivalent to any bounded CRPQ (part (3) of Theorem 11), since one can produce $2^n$-tiling
problems having triple-exponentially many solutions.
7 Better-behaved Classes of UC2RPQs
Here we present two restrictions of UC2RPQs that exhibit a better behavior in terms of the
complexity of Boundedness than the general case, namely, acyclic UC2RPQs of bounded
thickness and strongly connected UCRPQs. The improved bounds are PSpace and $\Pi_2^P$,
respectively, which turn out to be optimal.
Acyclic UC2RPQs of Bounded Thickness. For any two distinct variables $x, y$ of a C2RPQ
$\gamma$, we denote by $\mathrm{Atoms}_\gamma(x, y)$ the set of atoms in $\gamma$ of the form $x \xrightarrow{L} y$ or $y \xrightarrow{L} x$. The
thickness of a C2RPQ $\gamma$ is the maximum cardinality of a set of the form $\mathrm{Atoms}_\gamma(x, y)$, for
$x, y$ variables of $\gamma$ with $x \neq y$. The thickness of a UC2RPQ $\Gamma$ is the maximum thickness
over all the C2RPQs in $\Gamma$. The underlying undirected graph of $\gamma$ has as vertex set the set of
variables of $\gamma$ and contains an edge $\{x, y\}$ iff $x \neq y$ and $\mathrm{Atoms}_\gamma(x, y) \neq \emptyset$. A C2RPQ $\gamma$ is
acyclic if its underlying undirected graph is an acyclic graph (i.e., a forest). A UC2RPQ $\Gamma$ is
acyclic if each C2RPQ in $\Gamma$ is.
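The two definitions can be made concrete with a short sketch. The representation of an atom as a triple `(x, L, y)`, with the language `L` kept as an opaque string, and the function names are assumptions made only for this illustration; acyclicity is tested with a standard union-find cycle check on the underlying undirected graph.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Atom = Tuple[str, str, str]  # (x, name of the language L, y)


def thickness(atoms: List[Atom]) -> int:
    """Maximum number of atoms between two distinct variables, in either direction."""
    counts = Counter(frozenset((x, y)) for x, _, y in atoms if x != y)
    return max(counts.values(), default=0)


def is_acyclic(variables: Iterable[str], atoms: List[Atom]) -> bool:
    """True iff the underlying undirected graph is a forest."""
    parent: Dict[str, str] = {v: v for v in variables}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Parallel atoms collapse to a single undirected edge; self-loops are ignored.
    edges = {frozenset((x, y)) for x, _, y in atoms if x != y}
    for edge in edges:
        x, y = tuple(edge)
        rx, ry = find(x), find(y)
        if rx == ry:
            return False  # this edge closes a cycle
        parent[rx] = ry
    return True


# Three parallel atoms between x and y: thick, but acyclic.
parallel = [("x", "L0", "y"), ("x", "L1", "y"), ("x", "L2", "y")]
# Routing each atom through its own intermediate variable: thin, but cyclic.
split = [("x", "eps", "z0"), ("z0", "L0", "y"),
         ("x", "eps", "z1"), ("z1", "L1", "y")]

parallel_stats = (thickness(parallel), is_acyclic(["x", "y"], parallel))
split_stats = (thickness(split), is_acyclic(["x", "y", "z0", "z1"], split))
```

The two sample queries show that the notions are independent: the first has thickness three and is acyclic, the second has thickness one and is not acyclic.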
We show next that Boundedness for acyclic UC2RPQs of bounded thickness is
PSpace-complete. These classes of UC2RPQs have been previously studied in the literature [4, 5].
In particular, it follows from [5, Theorem 4.2] that the containment problem for the acyclic
UC2RPQs of bounded thickness is PSpace-complete, and hence Theorem 15 below shows
that Boundedness is not more costly than containment for these classes.
Theorem 15. Fix $k \ge 1$. The problem Boundedness is PSpace-complete for acyclic
UC2RPQs of thickness at most $k$.
Proof (sketch). The lower bound follows directly from PSpace-hardness of Boundedness
for RPQs (see Corollary 7). For the PSpace upper bound, we follow a similar strategy as
in the case of arbitrary UC2RPQs (Section 6.1), i.e., we reduce boundedness of $\Gamma$ to DA
limitedness. The main difference is that, since $\Gamma$ is acyclic, we can exploit the power of
alternation and construct an A2DA$^{\varepsilon}$ $B$ (instead of a 2DA, as in the proof of Proposition 12),
such that $\Gamma$ is bounded iff $B$ is limited. The constant upper bound on the thickness of $\Gamma$
implies that $B$ is actually of polynomial size. The result then follows, as limitedness of an
A2DA$^{\varepsilon}$ can be decided in PSpace by virtue of Theorem 10.
Both conditions in Theorem 15, i.e., acyclicity and bounded thickness, are necessary.
Indeed, it follows from Lemma 14 that Boundedness is ExpSpace-hard even for:
1. Boolean acyclic CRPQs.
2. Boolean CRPQs of thickness one, whose underlying undirected graph is of treewidth two.
Recall that the treewidth is a measure of how much a graph resembles a tree (cf. [20]);
acyclic graphs are precisely the graphs of treewidth one.
To see this, note that the CRPQs of the form $\exists x, y \bigwedge_i (x \xrightarrow{L_i} y)$ used in Lemma 14 are Boolean and acyclic
(but have unbounded thickness). Replacing each $(x \xrightarrow{L_i} y)$ with $(x \xrightarrow{\varepsilon} z_i) \wedge (z_i \xrightarrow{L_i} y)$ yields
an equivalent CRPQ of thickness one whose underlying undirected graph has treewidth two.
Strongly Connected UCRPQs. We conclude this section with an even better behaved class
of CRPQs in terms of Boundedness. Unlike the previous case, the definition of this class
depends on the underlying directed graph of a CRPQ $\gamma$. This contains a directed edge from
variable $x$ to $y$ iff there is an atom in $\gamma$ of the form $x \xrightarrow{L} y$. A CRPQ $\gamma$ is strongly connected
if its underlying directed graph is strongly connected, i.e., every pair of variables is connected
by some directed path. A UCRPQ $\Gamma$ is strongly connected if every CRPQ in $\Gamma$ is. We can
then establish the following.
Theorem 16. Boundedness is $\Pi_2^P$-complete for strongly connected UCRPQs.
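Membership in this class is a purely syntactic condition on the query and can be tested as in the sketch below. The triple representation of atoms and the function names are assumptions for this illustration; the test uses the standard fact that a directed graph is strongly connected iff some vertex reaches every vertex both along the edges and along the reversed edges.

```python
from typing import Dict, Iterable, List, Set, Tuple

Atom = Tuple[str, str, str]  # (x, name of the language L, y)


def _reachable(start: str, adj: Dict[str, Set[str]]) -> Set[str]:
    """Vertices reachable from `start` by a depth-first search over `adj`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(variables: Iterable[str], atoms: List[Atom]) -> bool:
    """True iff the underlying directed graph of the CRPQ is strongly connected."""
    all_vars = set(variables)
    if not all_vars:
        return True
    forward: Dict[str, Set[str]] = {}
    backward: Dict[str, Set[str]] = {}
    for x, _, y in atoms:
        forward.setdefault(x, set()).add(y)
        backward.setdefault(y, set()).add(x)
    root = next(iter(all_vars))
    return (_reachable(root, forward) == all_vars
            and _reachable(root, backward) == all_vars)


cycle = [("x", "L1", "y"), ("y", "L2", "z"), ("z", "L3", "x")]
one_way = [("x", "L1", "y"), ("y", "L2", "z")]

cycle_ok = is_strongly_connected(["x", "y", "z"], cycle)
one_way_ok = is_strongly_connected(["x", "y", "z"], one_way)
```

A UCRPQ would then be checked by applying `is_strongly_connected` to each of its CRPQs.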
8 Discussion and Future Work
The main conclusion of our work is that techniques previously used in the study of containment
of UC2RPQs can be naturally leveraged to pinpoint the complexity of Boundedness by using
DA instead of NFA. This, however, requires extending results on limitedness to alternating
and two-way DA. For all the classes of UC2RPQs studied in the paper we show in fact that
the complexity of Boundedness coincides with that of the containment problem. We leave
open the exact size of UCQ rewritings for the classes of acyclic UC2RPQs of bounded
thickness and the strongly connected UCRPQs that are bounded.
The most natural next step is to study Boundedness for the class of regular queries
(RQs), which are the closure of UC2RPQs under binary transitive closure. RQs are one of
the most powerful recursive languages for which containment is decidable in elementary time.
In fact, containment of RQs has been proved to be 2ExpSpace-complete by applying
sophisticated techniques based on NFA [33]. We will study whether it is possible to settle the
complexity of Boundedness for RQs with the help of DA techniques.
Another interesting future line of work is the study of Boundedness for UC2RPQs
based on the restricted classes of regular expressions often found in practical applications [13].
As it has been shown lately, the complexity of some query evaluation problems is alleviated
under this restriction [30], and it would be nice to see if the same holds for the boundedness
problem. This would be good news for the applicability of boundedness techniques in
practical applications. In fact, it would be an indication that the high complexity lower
bounds obtained in this paper are mostly witnessed by complicated interactions between
regular expressions not commonly arising in practice.
References
Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter, and Domagoj Vrgoč. Foundations of Modern Query Languages for Graph Databases. ACM Computing Surveys, 50(5):68:1-68:40, 2017.
Pablo Barceló. Querying graph databases. In ACM Symposium on Principles of Database Systems (PODS), pages 175-188, 2013.
Pablo Barceló, Gerald Berger, Carsten Lutz, and Andreas Pieris. First-Order Rewritability of Frontier-Guarded Ontology-Mediated Queries. In International Joint Conference on Artificial Intelligence (IJCAI), pages 1707-1713, 2018.
Pablo Barceló, Miguel Romero, and Moshe Y. Vardi. Does Query Evaluation Tractability Help Query Containment? In ACM Symposium on Principles of Database Systems (PODS), pages 188-199, 2014.
SIAM Journal on Computing, 45(4):1339-1376, 2016.
Michael Benedikt, Pierre Bourhis, and Michael Vanden Boom. A Step Up in Expressiveness of Decidable Fixpoint Logics. In Annual IEEE Symposium on Logic in Computer Science (LICS), pages 817-826, 2016.
Michael Benedikt, Balder ten Cate, Thomas Colcombet, and Michael Vanden Boom. The Complexity of Boundedness for Guarded Logics. In Annual IEEE Symposium on Logic in Computer Science (LICS), pages 293-304. IEEE Computer Society Press, 2015. doi:10.1109/LICS.2015.36.
Meghyn Bienvenu, Peter Hansen, Carsten Lutz, and Frank Wolter. First Order-Rewritability and Containment of Conjunctive Queries in Horn Description Logics. In International Joint Conference on Artificial Intelligence (IJCAI), pages 965-971, 2016.
Jean-Camille Birget. State-complexity of finite-state devices, state compressibility and incompressibility. Mathematical Systems Theory, 26(3):237-269, 1993.
Hing Leung. Limitedness Theorem on Finite Automata with Distance Functions: An Algebraic Proof. Theoretical Computer Science (TCS), 81(1):137-145, 1991. doi:10.1016/0304-3975(91)90321-R.
Hing Leung and Viktor Podolskiy. The limitedness problem on distance automata: Hashiguchi's method revisited. Theoretical Computer Science (TCS), 310(1-3):147-158, 2004. doi:10.1016/S0304-3975(03)00377-3.
Wim Martens and Tina Trautner. Evaluation and Enumeration Problems for Regular Path Queries. In International Conference on Database Theory (ICDT), pages 19:1-19:21, 2018.
Jeffrey F. Naughton. Data Independent Recursion in Deductive Databases. Journal of Computer and System Sciences (JCSS), 38(2):259-289, 1989.
Nir Piterman and Moshe Y. Vardi. From bidirectionality to alternation. Theoretical Computer Science (TCS), 295:295-321, 2003. doi:10.1016/S0304-3975(02)00410-3.
Theoretical Computer Science (TCS), 61(1):31-83, 2017.
John C. Shepherdson. The reduction of two-way automata to one-way automata. IBM Journal of Research and Development, 3(2):198-200, 1959.
Michael Vanden Boom. Weak Cost Monadic Logic over Infinite Trees. In International Symposium on Mathematical Foundations of Computer Science (MFCS), pages 580-591, 2011.
Michael Vanden Boom. Weak cost automata over infinite trees. PhD thesis, University of Oxford, UK, 2012.