Decidable classes of documents for XPath

Leibniz International Proceedings in Informatics, Dec 2012

We study the satisfiability problem for XPath over XML documents of bounded depth. We define two parameters, called match width and braid width, that assign a number to any class of documents. We show that for all k, satisfiability for XPath restricted to bounded depth documents with match width at most k is decidable; and that XPath is undecidable on any class of documents with unbounded braid width. We conjecture that these two parameters are equivalent, in the sense that a class of documents has bounded match width iff it has bounded braid width.

Vince Bárány, Mikołaj Bojańczyk, Diego Figueira, Paweł Parys

TU Darmstadt, Darmstadt, Germany; University of Warsaw, Warsaw, Poland; University of Edinburgh, Edinburgh, United Kingdom

1998 ACM Subject Classification F.1.1 Models of Computation, F.4.1 Mathematical Logic, F.4.3 Formal Languages, H.2.3 Languages

Keywords and phrases XPath; XML; class automata; data trees; data words; satisfiability

Introduction

This paper is about satisfiability of XPath over XML documents, modelled as data trees. A data tree is a tree where every position carries a label from a finite set, and a data value from an infinite set. The data values can only be tested for equality.

XPath satisfiability. XPath can be seen as a logic for expressing properties of data trees. Here are some examples of properties of data trees that can be expressed in XPath: “every two positions carry a different data value”, “if x and y are positions that carry the same data value, then on the path from x to y there is at most one position that has label b”.

Our interest in XPath stems from the fact that it is arguably the most widely used XML query language. It is implemented in XSLT and XQuery and it is used in many specification and update languages. Query containment and query equivalence are important static analysis problems, which are useful for query optimization tasks. These problems reduce to checking for satisfiability: Is there a document on which a given XPath query has a non-empty result? By answering this question we can decide at compile time whether the query contains a contradiction and thus the computation of the query (or subquery) on the document can be avoided. Or, by answering the query equivalence problem, one can test if a query can be safely replaced by another one which is more optimized in some sense (e.g., in the use of some resource). Moreover, the satisfiability problem is crucial for applications in security, type checking transformations, and consistency of XML specifications.

Our point of departure is that XPath satisfiability is an undecidable problem [9]. There are two main approaches to working around this undecidability.

1. Restrict the formulas. The first way is to consider fragments of XPath that have decidable satisfiability. For example, fragments without negation or without recursive axes [1]; or fragments whose only navigation can be done downwards [7] or downwards and rightwards [6] or downwards and upwards [8]. However, even though all these restrictions yield decidable fragments, the most expressive ones have huge, non-primitive-recursive, complexity bounds.

2. Restrict the models. When proving undecidability of XPath satisfiability, for each Minsky machine one constructs an XPath formula ϕ, such that models of ϕ describe computations of the Minsky machine. XML documents that describe computations of Minsky machines seem unlikely to appear in the real world; and therefore it sounds reasonable to place some restrictions on data trees, restrictions that are satisfied by normal XML documents, but violated by descriptions of Minsky machines.

This paper is devoted to the second approach.

Comparison with tree width. The archetype for our research is the connection between tree width and satisfiability of guarded second-order logic, over graphs. Guarded second-order logic is a logic for expressing properties of undirected graphs. A formula of guarded second-order logic uses a predicate E(x, y) for the edge relation, and can quantify over nodes of the graph, sets of nodes of the graph, and subsets of the edges in the graph. Satisfiability of guarded second-order logic over graphs is undecidable (already first-order logic has undecidable satisfiability). However, the picture changes when one bounds the tree width of graphs. Bounded tree width is a sufficient condition for decidability. More precisely, for every k ∈ N, one can decide if a formula of guarded second-order logic is satisfied in a graph of tree width at most k [5]. Bounded tree width is also a necessary condition for decidability: if a set of graphs X has unbounded tree width, then it is undecidable if a given formula of guarded second-order logic has a model in X [12].

Our contribution. Our goal in this paper is to find a parameter which is to XPath over data trees what tree width is to guarded second-order logic over graphs. As candidates, we de (...truncated)
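
The data tree model from the introduction, and the two example properties quoted there, can be illustrated with a small sketch. This is not code from the paper: the names `Node`, `positions`, `path`, `all_data_distinct` and `at_most_one_b_between_equal_data` are hypothetical, and the sketch assumes that "the path from x to y" includes both endpoints. It only checks the properties on one given tree; it does not decide satisfiability, which is the problem the paper studies.

```python
from dataclasses import dataclass, field
from typing import Hashable, Iterator, List, Optional


@dataclass(eq=False)  # positions are compared by identity, not by content
class Node:
    label: str                      # label from a finite alphabet
    data: Hashable                  # data value; only equality is ever used
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child below this position and return it."""
        child.parent = self
        self.children.append(child)
        return child


def positions(root: Node) -> Iterator[Node]:
    """Yield all positions of the tree in preorder."""
    yield root
    for child in root.children:
        yield from positions(child)


def all_data_distinct(root: Node) -> bool:
    """Property 1: every two positions carry a different data value."""
    seen = set()  # hashing is an implementation shortcut for equality tests
    for node in positions(root):
        if node.data in seen:
            return False
        seen.add(node.data)
    return True


def path(x: Node, y: Node) -> List[Node]:
    """Positions on the tree path from x to y (same tree), endpoints included."""
    up = []
    node: Optional[Node] = x
    while node is not None:          # x and all of its ancestors
        up.append(node)
        node = node.parent
    index = {id(n): i for i, n in enumerate(up)}
    down = []
    node = y
    while id(node) not in index:     # climb from y to the common ancestor
        down.append(node)
        node = node.parent
    return up[: index[id(node)] + 1] + down[::-1]


def at_most_one_b_between_equal_data(root: Node, b: str = "b") -> bool:
    """Property 2: equal data at x and y implies at most one b on their path."""
    nodes = list(positions(root))
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            if x.data == y.data:
                if sum(1 for n in path(x, y) if n.label == b) > 1:
                    return False
    return True
```

The quadratic loop over pairs of positions is chosen for readability rather than efficiency; it mirrors the "for all x and y" shape of the property as stated in the text.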


This is a preview of a remote PDF: http://drops.dagstuhl.de/opus/volltexte/2012/3851/pdf/11.pdf

Vince Bárány, Mikołaj Bojańczyk, Diego Figueira, Paweł Parys. Decidable classes of documents for XPath, Leibniz International Proceedings in Informatics, vol. 18, 2012, pp. 99-111, DOI: 10.4230/LIPIcs.FSTTCS.2012.99