Non-LR(1) Precedence Cascade Grammars (Short Paper)
José-Luis Sierra
Fac. Informática. Universidad Complutense de Madrid
C/ Prof. José García Santesmases 9. 28040 Madrid, Spain
https://orcid.org/0000-0002-0317-0510
Abstract
Precedence cascade is a well-known pattern for writing context-free grammars (CFGs) that model
the syntax of expression languages. According to this method, precedence levels are represented by non-terminals, and operators’ attributes are used to write syntax rules properly. In
most cases, the resulting precedence cascade grammar (PCG) has neat properties that facilitate
its implementation. In particular, many PCGs are LR(1) grammars, which serve as input for
conventional bottom-up parser generators. However, for some cumbersome operator tables the
method does not produce such neat grammars. This paper focuses on these cumbersome operator
tables by identifying several conditions leading to non-LR(1) PCGs.
2012 ACM Subject Classification Software and its engineering → Syntax
Keywords and phrases grammarware, expression grammars, grammar patterns, grammar ambiguity, LR grammars
Digital Object Identifier 10.4230/OASIcs.SLATE.2018.11
Category Short Paper
Funding This work is supported by the project grants TIN2014-52010-R and TIN2017-88092-R.
1 Introduction
Most computer languages include an expression sub-language as their most distinctive feature.
This sub-language allows users to begin with a repertoire of primitive expressions and create
more complex expressions by combining simpler ones. Such a combination is carried out by
operators [13].
In this paper we will focus only on the most common classes of operators: binary infix,
and unary prefix and postfix operators. In addition, we will adopt the conventions of the
Prolog language to describe the attributes for these operators [5]:
Each operator will have a name (e.g., +, −, ∗ . . . ). It will be possible to overload this
name, allowing different operator definitions to share such a name.
Each operator will belong to a precedence level. Each precedence level will be represented
by a positive natural number. Operators in lower precedence levels will take priority over
(i.e., will bind tighter than) operators in higher ones¹. In addition, when an operator
is used to build an expression, this expression will take the precedence level for that
operator. Precedence levels for basic expressions will be 0.
¹ That is, following Prolog conventions, in this paper precedence and priority of operators will be
contravariant properties.
© José-Luis Sierra;
licensed under Creative Commons License CC-BY
7th Symposium on Languages, Applications and Technologies (SLATE 2018).
Editors: Pedro Rangel Henriques, José Paulo Leal, António Leitão, and Xavier Gómez Guinovart
Article No. 11; pp. 11:1–11:8
OpenAccess Series in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
Name   Precedence   Type
⊗      3            fy
⊕      3            xfy
[?]    2            yfx
⊗      2            xfx
⊗      1            yf

(a) Operator table for a sample expression language.

E3 → ⊗ E3 | E2 ⊕ E3 | E2
E2 → E2 [?] E1 | E1 ⊗ E1 | E1
E1 → E1 ⊗ | E0
E0 → a | (E3)

(b) PCG for the descriptions in Table 1a; it is an LR(1) grammar.

Figure 1 An operator table and its associated PCG.
Operators will constrain the precedence levels of their arguments to be: (i) lower than
their own precedence level (denoted by x in the description of the operator’s argument),
or (ii) lower than or equal to that precedence level (denoted by y).
The fixity and the arguments’ allowed precedences will together form the operator’s
syntactic type. Following Prolog conventions, this type will take one of the following forms:
(i) for infix operators, yfx, xfy, xfx; (ii) for prefix operators, fy, fx; and (iii) for postfix
operators, yf, xf. This way, yfx operators are left-associative, xfy right-associative, and
xfx non-associative. In turn, fy and yf are associative, while xf and fx are non-associative
unary (prefix and postfix) operators. All this information can be condensed into an operator
table for the language. Table 1a gives an example of an operator table².
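The operator attributes described above can be captured in a small record type. The following Python fragment is only a sketch: the names `Operator`, `optype`, `fixity`, `associativity` and `max_arg_levels` are illustrative assumptions, not notation used in this paper.

```python
from dataclasses import dataclass

# Prolog-style syntactic types, grouped by fixity.
INFIX = {"yfx", "xfy", "xfx"}
PREFIX = {"fy", "fx"}
POSTFIX = {"yf", "xf"}


@dataclass(frozen=True)
class Operator:
    """One operator definition (hypothetical helper type for illustration)."""
    name: str        # may be shared by several definitions (overloading)
    precedence: int  # positive; lower numbers bind tighter
    optype: str      # one of the seven Prolog-style types

    def __post_init__(self):
        if self.precedence < 1:
            raise ValueError("operator precedence levels must be positive")
        if self.optype not in INFIX | PREFIX | POSTFIX:
            raise ValueError(f"unknown operator type: {self.optype!r}")

    @property
    def fixity(self):
        if self.optype in INFIX:
            return "infix"
        return "prefix" if self.optype in PREFIX else "postfix"

    @property
    def associativity(self):
        return {"yfx": "left", "xfy": "right", "xfx": "none",
                "fy": "associative", "yf": "associative",
                "fx": "none", "xf": "none"}[self.optype]

    def max_arg_levels(self):
        """Upper bound on the precedence level of each argument, left to right.

        'y' allows levels up to the operator's own level; 'x' requires a
        strictly lower level, i.e. at most precedence - 1.
        """
        return [self.precedence if c == "y" else self.precedence - 1
                for c in self.optype if c != "f"]
```

For instance, `Operator("+", 3, "xfy")` (with `+` as an ASCII stand-in for an operator name) is a right-associative infix operator whose left argument must lie at level 2 or below and whose right argument may reach level 3.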
To model the syntax of this kind of expression language, it is possible to use a precedence
cascade pattern, which is described to a greater or lesser extent in any typical textbook on
compiler construction (e.g., [3, 8]). In order to describe the pattern, we will introduce the
following notation:
By ↓(i) we will denote the precedence level immediately smaller than i, or 0 if i is the
smallest precedence level.
By > we will denote the greatest precedence level.
The pattern itself is based on the following conventions (Figure 1b shows the CFG that
results from applying these conventions to Table 1a):
Each precedence level i has a non-terminal Ei associated with it that represents expressions
built with operators at that level.
Each operator in level i has a rule associated with it that characterizes the syntax of
the expressions formed with that operator. Writing op for the operator’s name, this rule
depends on the operator’s type: (i) Ei → Ei op E↓(i) if the type is yfx; (ii) Ei → E↓(i) op Ei
if it is xfy; (iii) Ei → E↓(i) op E↓(i) if xfx; (iv) Ei → op Ei if fy; (v) Ei → op E↓(i) if fx;
(vi) Ei → Ei op if yf; and (vii) Ei → E↓(i) op if the type is xf.
There is an additional rule Ei → E↓(i) for each level i.
Finally, there is a non-terminal symbol E0 that models the basic expressions (i.e., literals,
variables, function calls, etc.) and parenthesized expressions. In the sequel we will abstract
all the basic expressions with a single symbol a. Thus, there will be an additional pair of rules
E0 → a | (E>)
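The conventions above amount to a mechanical translation from operator tables to grammars. The following Python sketch illustrates that translation under stated assumptions: the function names `build_pcg` and `show`, the encoding of a table as (name, precedence, type) triples, and the ASCII stand-ins `*`, `+` and `%` for the operator names of Table 1a are all hypothetical choices made for this example.

```python
def build_pcg(table):
    """Return PCG rules for an operator table.

    `table` is a list of (name, precedence, type) triples. The result maps
    each non-terminal "E<i>" to a list of right-hand sides (lists of symbols).
    """
    levels = sorted({prec for _, prec, _ in table})
    top = levels[-1] if levels else 0   # the greatest precedence level

    def below(i):
        # The level immediately smaller than i, or 0 if i is the smallest.
        smaller = [l for l in levels if l < i]
        return smaller[-1] if smaller else 0

    rules = {}
    for i in reversed(levels):
        same, lower = f"E{i}", f"E{below(i)}"
        arg = {"y": same, "x": lower}
        rhss = []
        for name, prec, optype in table:
            if prec == i:
                # Read the type left to right: x and y become argument
                # non-terminals, f becomes the operator's name.
                rhss.append([name if c == "f" else arg[c] for c in optype])
        rhss.append([lower])            # the additional rule Ei -> E↓(i)
        rules[same] = rhss
    rules["E0"] = [["a"], ["(", f"E{top}", ")"]]
    return rules


def show(rules):
    """Render the rules one non-terminal per line."""
    return "\n".join(f"{lhs} -> " + " | ".join(" ".join(rhs) for rhs in rhss)
                     for lhs, rhss in rules.items())


# Table 1a with ASCII stand-ins for the operator names (an assumption).
TABLE_1A = [("*", 3, "fy"), ("+", 3, "xfy"), ("%", 2, "yfx"),
            ("*", 2, "xfx"), ("*", 1, "yf")]

print(show(build_pcg(TABLE_1A)))
# E3 -> * E3 | E2 + E3 | E2
# E2 -> E2 % E1 | E1 * E1 | E1
# E1 -> E1 * | E0
# E0 -> a | ( E3 )
```

Up to the renaming of operators, the printed grammar has the same shape as the PCG in Figure 1b. The sketch performs no consistency checks on the table, so contradictory definitions pass through unchanged.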
² Notice that, according to this operator table, an expression like “⊗a ⊕ a ⊕ a ⊗ a⊗” will mean “⊗(a ⊕
(a ⊕ (a ⊗ (a⊗))))”, while another one like “a ⊕ ⊗a” will be ill-formed (it should be written “a ⊕ (⊗a)”).
Name   Prec.   Type
[?]    2       yfx
[?]    1       xfx

(a) Operator table with multiple definitions of the infix operator [?].

E2 → E2 [?] E1 | E1
E1 → E0 [?] E0 | E0
E0 → a | (E2)

(b) PCG resulting from the operator table presented in Table 2a.

E2(E1(E0(a) [?] E0(a)))
E2(E2(E1(E0(a))) [?] E1(E0(a)))

(c) Two different parse trees for “a [?] a”.

Figure 2 Example regarding multiple operator definitions with the same name and fixity.
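The ambiguity illustrated in Figure 2 can be checked mechanically by counting parse trees. The sketch below is one possible way to do so in Python, not a procedure taken from this paper: `count_parse_trees` and `GRAMMAR` are hypothetical names, and `@` is an assumed ASCII stand-in for the infix operator of Table 2a. The counting relies on the grammar having no empty rules and no cyclic unit rules, which holds for the PCG of Figure 2b.

```python
from functools import lru_cache

# PCG of Figure 2b; "@" is a hypothetical stand-in for the infix operator.
GRAMMAR = {
    "E2": [("E2", "@", "E1"), ("E1",)],
    "E1": [("E0", "@", "E0"), ("E0",)],
    "E0": [("a",), ("(", "E2", ")")],
}


def count_parse_trees(tokens, start="E2", grammar=GRAMMAR):
    """Count the distinct parse trees of `tokens` rooted at `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in grammar:                  # terminal symbol
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return trees(rhs[0], i, j)
        # Every symbol derives at least one token, so the first symbol takes
        # tokens[i:k] and must leave one token per remaining symbol.
        return sum(trees(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return trees(start, 0, len(tokens))


print(count_parse_trees(["a", "@", "a"]))  # 2
```

A count greater than one for some token sequence is enough to show that the grammar is ambiguous, and therefore that it cannot be LR(1).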
We will refer to the CFGs produced by this pattern as precedence cascade grammars
(PCGs). A well-known example of using this pattern for a real programming language is Jeff
Lee’s YACC grammar for ANSI C³.
For most operator tables, the PCGs are LR(1) grammars [6] suitable for typical bottom-up, YACC-like parser generators (this is the case, for instance, of the PCG in Figure 1b)⁴.
However, there are also operator tables that lead to non-LR(1) grammars. Most of the time,
this is due to contradictory operator definitions, which in turn produce ambiguous PCGs.
Other times, such contradictions do not exist, but even so the resulting PCGs require more
than one look-ahead symbol. In this paper we address t (...truncated)