Canonical LR(1) items

An LR(1) item has the form $[A\to\alpha ● \beta, a]$ where $A\to\alpha\beta$ is a production and $a$ is a terminal or the right end marker \$.
The lookahead symbol $a$ has no effect when $\beta$ is not $\epsilon$. However, an item of the form $[A\to\alpha ●, a]$ calls for a reduction by $A\to\alpha$ only if the next input symbol is $a$.
An LR(1) item $[A\to\alpha ● \beta,a]$ is said to be valid for a viable prefix $\gamma$ if there is a rightmost derivation $S \overset{*}{\underset{rm}{\Rightarrow}} \delta Aw \underset{rm}{\Rightarrow} \delta\alpha\beta w$, where
- $\gamma = \delta\alpha$, and
- either $a$ is the first terminal symbol of $w$, or $w$ is $\epsilon$ and $a$ is \$.
Just like LR(0) items, the LR(1) items are created from the augmented grammar $G'$.
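
As a worked instance of the validity definition, take the grammar $S'\to S$, $S\to XX$, $X\to aX \mid b$ (the example grammar used later in these notes) and the rightmost derivation

$$S \overset{*}{\underset{rm}{\Rightarrow}} aaXab \underset{rm}{\Rightarrow} aaaXab.$$

Matching this against the definition with $\delta = aa$, $A = X$, $w = ab$, $\alpha = a$ and $\beta = X$ shows that the item $[X\to a ● X, a]$ is valid for the viable prefix $\gamma = \delta\alpha = aaa$, since $a$ is the first symbol of $w$.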

Constructing LR(1) sets of items

Algorithms

CLOSURE(I) {
    repeat {
        for each item [A → α●Bβ, a] in I do {
            for each production B → γ in G' do {
                for each terminal b in FIRST(βa) do {
                    add [B → ●γ, b] to set I;
                }
            }
        }
    } until no more items are added to I;
    return I;
}
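
CLOSURE relies on FIRST(βa), which these notes do not spell out. The following is a minimal Python sketch of one way to compute it, assuming a grammar stored as a dict from nonterminal to a list of right-hand-side tuples. The names `first_sets`, `first_of_string`, `EPS` and the dict layout are illustrative choices, not part of the original algorithm.

```python
EPS = ""  # assumed marker meaning "can derive the empty string"


def first_sets(grammar, terminals):
    """Compute FIRST for every grammar symbol by fixed-point iteration.

    grammar maps each nonterminal to a list of bodies (tuples of symbols);
    the empty tuple () would stand for an epsilon production.
    """
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                nullable_prefix = True
                for sym in body:
                    first[head] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        nullable_prefix = False
                        break
                if nullable_prefix:
                    first[head].add(EPS)
                changed |= len(first[head]) != before
    return first


def first_of_string(first, beta, a):
    """Return FIRST(beta a) for a symbol string beta and a lookahead a."""
    result = set()
    for sym in beta:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(a)  # all of beta can vanish, so the lookahead a shows through
    return result


# The example grammar from these notes.
GRAMMAR = {"S'": [("S",)], "S": [("X", "X")], "X": [("a", "X"), ("b",)]}
FIRST = first_sets(GRAMMAR, {"a", "b", "$"})

# For [S -> .XX, $] we have beta = X and a = $, giving lookaheads a and b.
print(sorted(first_of_string(FIRST, ("X",), "$")))
# For [S -> X.X, $] beta is empty, so only the lookahead $ survives.
print(sorted(first_of_string(FIRST, (), "$")))
```

The two calls correspond to the lookaheads a/b attached to the X-items in I0 and the lookahead $ attached to the X-items in I2 of the example.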

GOTO(I, X) {
    initialize J to empty set;
    for each item [A → α●Xβ, a] in I do {
        add item [A → αX●β, a] to set J;
    }
    return CLOSURE(J);
}

items(G') {
    initialize C to { CLOSURE({ [S' → ●S, $] }) };
    repeat {
        for each set of items I in C do {
            for each grammar symbol X do {
                if GOTO(I, X) is not empty and not in C then
                    add GOTO(I,X) to C;
            }
        }
    } until no new sets of items are added to C;
    return C;
}
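
The three routines above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: items are tuples `(head, body, dot, lookahead)`, the grammar is the example grammar from these notes, and `first_of` assumes the grammar has no ε-productions, so FIRST(βa) depends only on the first symbol of β. All names are hypothetical.

```python
GRAMMAR = {"S'": [("S",)], "S": [("X", "X")], "X": [("a", "X"), ("b",)]}
TERMINALS = {"a", "b"}


def first_sets():
    """FIRST of each nonterminal. ASSUMES no epsilon productions."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                sym = body[0]
                new = {sym} if sym in TERMINALS else first[sym]
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = first_sets()


def first_of(beta, a):
    """FIRST(beta a) under the no-epsilon assumption."""
    if not beta:
        return {a}
    return {beta[0]} if beta[0] in TERMINALS else FIRST[beta[0]]


def closure(items):
    result = set(items)
    work = list(items)
    while work:  # worklist version of "repeat until nothing is added"
        head, body, dot, a = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            B, beta = body[dot], body[dot + 1:]
            for gamma in GRAMMAR[B]:
                for b in first_of(beta, a):
                    item = (B, gamma, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(I, X):
    # Move the dot over X in every item that has X right after the dot.
    moved = {(h, body, dot + 1, a) for (h, body, dot, a) in I
             if dot < len(body) and body[dot] == X}
    return closure(moved)


def items():
    C = [closure({("S'", ("S",), 0, "$")})]
    symbols = list(GRAMMAR) + sorted(TERMINALS)
    for I in C:  # C grows during iteration, so new sets get visited too
        for X in symbols:
            J = goto(I, X)
            if J and J not in C:
                C.append(J)
    return C


C = items()
print(len(C))  # number of sets of LR(1) items
```

For the example grammar this produces ten sets of items. Storing each set as a `frozenset` makes the "not in C" test a plain equality check on item sets.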

Construction of LR(1) parsing tables

Algorithm

1. Construct the collection C' = { I0, ..., In } of sets of LR(1) items for G'.
2. State i of the parser is constructed from Ii. Its parsing actions are determined as follows.
    (a) If [A → α●aβ, b] is in Ii, a is a terminal, and GOTO(Ii, a) = Ij,
        then set ACTION[i, a] to "shift j".
    (b) If [A → α●, a] is in Ii and A ≠ S', then set ACTION[i, a]
        to "reduce A → α".
    (c) If [S' → S●, $] is in Ii, then set ACTION[i, $] to "accept".
    If these rules produce conflicting actions, the grammar is not LR(1).
3. The goto transitions for state i are constructed for all
   nonterminals A using the rule: If GOTO(Ii, A) = Ij, then
   GOTO[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from
   the set of items containing [S' → ●S, $].
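
Rules 2 and 3 can be sketched for a single state as below. This is a hedged illustration: the function name `build_row`, the tuple encoding of items `(head, body, dot, lookahead)`, and the `transitions` dict (symbol to target state number, standing in for GOTO(Ii, X) = Ij) are assumptions made for this example.

```python
def build_row(items, transitions, terminals):
    """Apply rules 2(a)-(c) and 3 to one set of LR(1) items.

    Returns (action, goto) dicts for that state. Entries that are absent
    from the dicts correspond to "error" (rule 4).
    """
    action, goto = {}, {}

    def set_action(a, entry):
        # Two different entries for the same cell mean the grammar is not LR(1).
        if action.get(a, entry) != entry:
            raise ValueError(f"conflict on {a!r}: {action[a]} vs {entry}")
        action[a] = entry

    for head, body, dot, lookahead in items:
        if dot < len(body):
            sym = body[dot]
            if sym in terminals:                       # rule 2(a)
                set_action(sym, ("shift", transitions[sym]))
        elif head == "S'":                             # rule 2(c)
            set_action("$", ("accept",))
        else:                                          # rule 2(b)
            set_action(lookahead, ("reduce", head, body))

    for sym, target in transitions.items():            # rule 3
        if sym not in terminals:
            goto[sym] = target
    return action, goto


TERMINALS = {"a", "b", "$"}

# I0 and I4 of the example grammar S' -> S, S -> XX, X -> aX | b.
I0 = {("S'", ("S",), 0, "$"), ("S", ("X", "X"), 0, "$"),
      ("X", ("a", "X"), 0, "a"), ("X", ("a", "X"), 0, "b"),
      ("X", ("b",), 0, "a"), ("X", ("b",), 0, "b")}
I4 = {("X", ("b",), 1, "a"), ("X", ("b",), 1, "b")}

row0 = build_row(I0, {"S": 1, "X": 2, "a": 3, "b": 4}, TERMINALS)
row4 = build_row(I4, {}, TERMINALS)
print(row0)
print(row4)
```

Row 0 gets shifts on a and b plus goto entries for S and X, while row 4 gets a reduction by X → b on exactly the lookaheads a and b.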

Example

Starting from this augmented grammar

0) S' → S
1) S  → XX
2) X  → aX
3) X  → b

we arrive at these sets of LR(1) items

I0: S' → ●S, $          I4: X → b●, a/b
    S  → ●XX, $
    X  → ●aX, a/b       I5: S → XX●, $
    X  → ●b, a/b
                        I6: X → a●X, $
I1: S' → S●, $              X → ●aX, $
                            X → ●b, $
I2: S → X●X, $
    X → ●aX, $          I7: X → b●, $
    X → ●b, $
                        I8: X → aX●, a/b
I3: X → a●X, a/b
    X → ●aX, a/b        I9: X → aX●, $
    X → ●b, a/b

Example continued

From the above sets of LR(1) items, we obtain this LR(1) parsing table

==================================
 STATE |    ACTION     |   GOTO
       +---------------+----------
       |  a    b    $  |   S   X
-------+---------------+----------
   0   |  s3   s4      |   1   2
   1   |           acc |
   2   |  s6   s7      |       5
   3   |  s3   s4      |       8
   4   |  r3   r3      |
   5   |            r1 |
   6   |  s6   s7      |       9
   7   |            r3 |
   8   |  r2   r2      |
   9   |            r2 |
-------+---------------+----------
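
To see the table in use, here is a small sketch of a table-driven LR driver in Python. The `ACTION` and `GOTO` dicts transcribe the table above; the function name `parse`, the one-character-per-token input, and returning the list of production numbers used are assumptions made for illustration.

```python
# ACTION and GOTO transcribed from the parsing table; missing keys mean "error".
ACTION = {
    (0, "a"): ("shift", 3), (0, "b"): ("shift", 4),
    (1, "$"): ("accept",),
    (2, "a"): ("shift", 6), (2, "b"): ("shift", 7),
    (3, "a"): ("shift", 3), (3, "b"): ("shift", 4),
    (4, "a"): ("reduce", 3), (4, "b"): ("reduce", 3),
    (5, "$"): ("reduce", 1),
    (6, "a"): ("shift", 6), (6, "b"): ("shift", 7),
    (7, "$"): ("reduce", 3),
    (8, "a"): ("reduce", 2), (8, "b"): ("reduce", 2),
    (9, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "X"): 2, (2, "X"): 5, (3, "X"): 8, (6, "X"): 9}

# Production number -> (head, length of body), numbered as in the grammar.
PRODUCTIONS = {1: ("S", 2), 2: ("X", 2), 3: ("X", 1)}


def parse(text):
    """Return the production numbers used, in order, or None on error."""
    tokens = list(text) + ["$"]
    stack = [0]          # stack of parser states
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            return None                      # blank table entry: error
        if entry[0] == "shift":
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "reduce":
            head, length = PRODUCTIONS[entry[1]]
            del stack[-length:]              # pop one state per body symbol
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(entry[1])
        else:
            return reductions                # accept


print(parse("bab"))  # [3, 3, 2, 1]
print(parse("b"))    # None
```

On input bab the driver reduces by X → b twice, then by X → aX, then by S → XX, which is the reverse of a rightmost derivation. The input b alone is rejected because state 4 has no entry for $.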

LALR parser generation

An LR(1) parser can have a lot more states than the corresponding SLR(1) or LR(0) parser.
A Lookahead LR (LALR) parser attempts to reduce the number of states in an LR(1) parser by merging similar states.
The resulting parser has the same number of states as the SLR(1) parser for the grammar, but it retains some of the power of LR(1) lookaheads.

Example

Consider the sets of LR(1) items we derived earlier.

I0: S' → ●S, $          I4: X → b●, a/b
    S  → ●XX, $
    X  → ●aX, a/b       I5: S → XX●, $
    X  → ●b, a/b
                        I6: X → a●X, $
I1: S' → S●, $              X → ●aX, $
                            X → ●b, $
I2: S → X●X, $
    X → ●aX, $          I7: X → b●, $
    X → ●b, $
                        I8: X → aX●, a/b
I3: X → a●X, a/b
    X → ●aX, a/b        I9: X → aX●, $
    X → ●b, a/b

States I3 and I6 are similar. In fact, they contain the same items except for the lookaheads. States I4 and I7 are similar in the same way, and so are states I8 and I9.

Example continued

The LALR construction merges these three pairs of states into three states:

I36: X → a●X, a/b/$
     X → ●aX, a/b/$
     X → ●b, a/b/$

I47: X → b●, a/b/$

I89: X → aX●, a/b/$

However, this gives us back the SLR(1) table!
The good news is that merging similar states does not always yield the SLR(1) table. Sometimes the resulting LALR table does a better job than the SLR(1) table.
Can merging states in this way introduce new conflicts? If the grammar is LR(1), merging states this way cannot create new shift-reduce conflicts. However, it can introduce new reduce-reduce conflicts. When this happens we say that the grammar is not LALR(1).
We have described the LALR parse table construction as a post-processing step on the LR(1) construction. However, there are techniques to construct the LALR table without fully constructing the LR(1) table first.
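
The post-processing view of merging can be sketched in Python as follows. Two states are merged when they agree once lookaheads are ignored; the sketch calls that lookahead-free part the "core". The encoding of an item as a pair `(dotted production, lookahead)` and the name `merge_by_core` are assumptions for illustration, and the sketch only merges item sets; it does not rebuild the table or check for reduce-reduce conflicts.

```python
from collections import defaultdict

# The ten sets of LR(1) items from the example, one (item, lookahead) pair
# per lookahead symbol.
LR1_STATES = {
    0: {("S' → ●S", "$"), ("S → ●XX", "$"),
        ("X → ●aX", "a"), ("X → ●aX", "b"),
        ("X → ●b", "a"), ("X → ●b", "b")},
    1: {("S' → S●", "$")},
    2: {("S → X●X", "$"), ("X → ●aX", "$"), ("X → ●b", "$")},
    3: {("X → a●X", "a"), ("X → a●X", "b"),
        ("X → ●aX", "a"), ("X → ●aX", "b"),
        ("X → ●b", "a"), ("X → ●b", "b")},
    4: {("X → b●", "a"), ("X → b●", "b")},
    5: {("S → XX●", "$")},
    6: {("X → a●X", "$"), ("X → ●aX", "$"), ("X → ●b", "$")},
    7: {("X → b●", "$")},
    8: {("X → aX●", "a"), ("X → aX●", "b")},
    9: {("X → aX●", "$")},
}


def merge_by_core(states):
    """Merge states whose items agree once lookaheads are dropped."""
    groups = defaultdict(list)
    for number, items in sorted(states.items()):
        core = frozenset(dotted for dotted, _ in items)
        groups[core].append(number)
    merged = {}
    for numbers in groups.values():
        name = "".join(str(n) for n in numbers)   # e.g. states 3 and 6 -> "36"
        merged[name] = set().union(*(states[n] for n in numbers))
    return merged


LALR = merge_by_core(LR1_STATES)
print(sorted(LALR))  # ['0', '1', '2', '36', '47', '5', '89']
print(sorted(lookahead for _, lookahead in LALR["47"]))  # ['$', 'a', 'b']
```

The ten LR(1) states collapse to seven, and the merged state I47 carries the union a/b/$ of the lookaheads of I4 and I7.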

Canonical LR and LALR Parser Generation

San Skulrattanakulchai

March 11, 2019
