Typical regular languages are
Typical context-free languages are
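A Context-free Grammar (CFG) is a 4-tuple \((V, \Sigma, R, S)\) where \(V\) is a finite set of variables, \(\Sigma\) is a finite set of terminals disjoint from \(V\), \(R\subseteq V\times (V\cup\Sigma)^*\) is a finite set of rules, and \(S\in V\) is the start variable.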
A rule \((A, \alpha)\) is usually written \(A\to \alpha\).
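Here is a simple CFG example. Let \(V=\{S\}\), let \(\Sigma=\{(,\ )\}\), let \(R=\{S\to (S),\ S\to SS,\ S\to \varepsilon\}\), and let \(S\) be the start variable.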
Instead of specifying all the components \(V\), \(\Sigma\), \(R\), and \(S\) of a CFG, we usually give just the rules, with these understood assumptions:
Example. The previous CFG is usually given as
\(S\to (S)\ \mid\ SS\ \mid\ \varepsilon\)
A CFG is a string rewriting system: one starts by writing down the start symbol, and then at each step chooses some variable appearing in the current string and replaces that occurrence by the RHS of some rule that has that variable as its LHS.
Formally, for strings \(\alpha,\gamma\in (V\cup\Sigma)^*\) and rule \(B\to \beta\), we write \(\alpha B\gamma\Rightarrow \alpha\beta\gamma\) and we say that \(\alpha B\gamma\) yields \(\alpha\beta\gamma\).
We write \(\alpha\Rightarrow^* \beta\), and say that \(\alpha\) derives \(\beta\) if \(\beta\) can be obtained from \(\alpha\) in zero or more yield steps.
Example. Using our example grammar, \(S\Rightarrow SS\Rightarrow (S)S\Rightarrow ()S\) shows that \(S\Rightarrow^* ()S\).
Trivially, \(\alpha\Rightarrow^* \alpha\) for any string \(\alpha\).
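The yield relation can be made concrete with a small sketch. It assumes the balanced-parentheses grammar \(S\to (S)\mid SS\mid \varepsilon\), represents strings over \(V\cup\Sigma\) as Python strings, and uses the illustrative names `yields` and `is_derivation`, which are not part of any library.

```python
# Rules of the example grammar as (LHS, RHS) pairs; "" stands for epsilon.
RULES = [("S", "(S)"), ("S", "SS"), ("S", "")]

def yields(form):
    """Return every string obtainable from `form` in exactly one yield step."""
    successors = set()
    for i, symbol in enumerate(form):
        for lhs, rhs in RULES:
            if symbol == lhs:
                # Replace this one occurrence of the variable by the RHS.
                successors.add(form[:i] + rhs + form[i + 1:])
    return successors

def is_derivation(chain):
    """Check that each string in `chain` yields the next one."""
    return all(b in yields(a) for a, b in zip(chain, chain[1:]))

print(is_derivation(["S", "SS", "(S)S", "()S"]))
```

A chain with a single string has zero yield steps, so `is_derivation` accepts it, which matches the fact that every string derives itself.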
The language generated by the CFG \(G=(V,\Sigma,R,S)\), written \(L(G)\), is the set of all strings over \(\Sigma\) that can be derived from \(S\), that is, \(L(G) = \{ w\in\Sigma^* : S\Rightarrow^* w \}\).
A Context-free Language (CFL) is one that can be generated by some CFG.
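As a rough illustration of \(L(G)\), the sketch below lists the short strings generated by the balanced-parentheses grammar \(S\to (S)\mid SS\mid \varepsilon\). It searches over leftmost expansions only, discards forms with too many terminals (terminals never disappear), and also discards forms longer than an arbitrary cap. The cap is an assumption that is generous enough for this grammar, not a general-purpose bound, and the name `language_up_to` is made up for this example.

```python
from collections import deque

# Example grammar: variable -> list of right-hand sides; "" stands for epsilon.
RULES = {"S": ["(S)", "SS", ""]}

def language_up_to(max_len, start="S"):
    """Collect the terminal strings of length <= max_len derivable from `start`."""
    cap = 2 * max_len + 2  # assumed cap on the length of intermediate forms
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        pos = next((i for i, c in enumerate(form) if c in RULES), None)
        if pos is None:
            words.add(form)  # no variables left: a string in L(G)
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if c not in RULES)
            if terminals <= max_len and len(new) <= cap and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

print(sorted(language_up_to(4), key=lambda w: (len(w), w)))
```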
Example CFLs and their CFGs follow.
A parse tree is a rooted, ordered, node-labeled tree with these properties.
The yield of a parse tree is the string obtained from concatenating the labels of all its leaves from left to right.
Using the grammar of the language of balanced parentheses, we get the string (()) as the yield of this parse tree
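One parse tree with yield (()) can be written down and checked with a short sketch. The encoding of a tree as a `(label, children)` pair, the helper names `leaf`, `node`, and `tree_yield`, and the use of the empty string for an \(\varepsilon\) leaf are all assumptions of this example.

```python
EPSILON = ""  # label of a leaf that stands for the empty string

def leaf(label):
    """A leaf carries a terminal or EPSILON and has no children."""
    return (label, [])

def node(label, *children):
    """An internal node carries a variable and an ordered list of children."""
    return (label, list(children))

def tree_yield(tree):
    """Concatenate the leaf labels from left to right."""
    label, children = tree
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)

# S -> (S) at the root, S -> (S) below it, and S -> epsilon at the bottom.
inner = node("S", leaf("("), node("S", leaf(EPSILON)), leaf(")"))
tree = node("S", leaf("("), inner, leaf(")"))

print(tree_yield(tree))
```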
A leftmost derivation is a derivation in which every yield step replaces the leftmost variable with the RHS of some rule for that variable.
E.g., using the grammar for the balanced parentheses, \[
S \Rightarrow SS \Rightarrow (S)S \Rightarrow ()S \Rightarrow ()
\] is a leftmost derivation of () from \(S\).
A rightmost derivation is a derivation in which every yield step replaces the rightmost variable with the RHS of some rule for that variable.
And this is a rightmost derivation of the same string () from \(S\): \[
S \Rightarrow SS \Rightarrow S(S) \Rightarrow S() \Rightarrow ()
\]
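Both kinds of derivation can be replayed mechanically. The sketch below assumes the balanced-parentheses grammar \(S\to (S)\mid SS\mid \varepsilon\) and takes a list of chosen right-hand sides; the function name `derive` and its `leftmost` flag are illustrative only.

```python
# Example grammar: variable -> list of right-hand sides; "" stands for epsilon.
RULES = {"S": ["(S)", "SS", ""]}

def derive(start, rhs_choices, leftmost=True):
    """Apply the chosen right-hand sides, always to the leftmost
    (or always to the rightmost) variable, and return every form."""
    form = start
    steps = [form]
    for rhs in rhs_choices:
        positions = [i for i, c in enumerate(form) if c in RULES]
        if not positions:
            raise ValueError("no variable left to replace")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in RULES[form[i]]:
            raise ValueError("not a rule of the grammar")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

choices = ["SS", "(S)", "", ""]
print(" => ".join(derive("S", choices, leftmost=True)))
print(" => ".join(derive("S", choices, leftmost=False)))
```

The same sequence of rule choices gives the leftmost derivation in one mode and the rightmost derivation in the other, because both correspond to the same parse tree.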
A string of terminals is ambiguous if it is the yield of at least two distinct parse trees; equivalently, if it has at least two leftmost (or rightmost) derivations from the start variable.
Exercise. Show that the string () is ambiguous.
Answer.
\[ S\Rightarrow (S) \Rightarrow () \] \[ S\Rightarrow SS \Rightarrow (S)S \Rightarrow ()S\Rightarrow () \] are two distinct leftmost derivations of ().
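Ambiguity of a particular string can be explored by brute force. The sketch below enumerates leftmost derivations of a target string in the grammar \(S\to (S)\mid SS\mid \varepsilon\), up to a chosen number of steps. The step bound is an assumption needed for termination, since the rules \(S\to SS\) and \(S\to\varepsilon\) allow arbitrarily long derivations of the same string; the name `leftmost_derivations` is made up for this example.

```python
# Example grammar: variable -> list of right-hand sides; "" stands for epsilon.
RULES = {"S": ["(S)", "SS", ""]}

def leftmost_derivations(target, max_steps, start="S"):
    """Return all leftmost derivations of `target` with at most `max_steps` steps."""
    results = []

    def extend(derivation):
        current = derivation[-1]
        pos = next((i for i, c in enumerate(current) if c in RULES), None)
        if pos is None:
            if current == target:
                results.append(derivation)
            return
        if len(derivation) - 1 == max_steps:
            return  # step budget used up
        if not target.startswith(current[:pos]):
            return  # the terminal prefix can no longer match the target
        for rhs in RULES[current[pos]]:
            extend(derivation + [current[:pos] + rhs + current[pos + 1:]])

    extend([start])
    return results

for d in leftmost_derivations("()", 4):
    print(" => ".join(d))
```

Finding two or more derivations for the same string is enough to conclude that the string is ambiguous; finding only one within the step bound proves nothing by itself.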
A grammar is ambiguous if its start variable derives some ambiguous string of terminals.
Exercise. Give an unambiguous grammar for the language of balanced parentheses.
Answer. \(S\to (S)S\ \mid\ \varepsilon\)
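One practical benefit of the grammar \(S\to (S)S\mid\varepsilon\) is that the next input symbol determines which rule to use, so a string can be checked by a simple recursive procedure. The following is a minimal sketch under that reading; the function names `parse_S` and `is_balanced` are illustrative.

```python
def parse_S(s, i=0):
    """Try to match S starting at index i.
    Return the index just past the match, or None on failure."""
    if i < len(s) and s[i] == "(":
        # Rule S -> (S)S: match "(", an inner S, ")", then a trailing S.
        j = parse_S(s, i + 1)
        if j is not None and j < len(s) and s[j] == ")":
            return parse_S(s, j + 1)
        return None
    # Rule S -> epsilon: match nothing.
    return i

def is_balanced(s):
    """A string is in the language if S matches all of it."""
    return parse_S(s) == len(s)

for w in ["", "()", "(())()", "(()", ")("]:
    print(repr(w), is_balanced(w))
```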
A CFL is inherently ambiguous if every CFG that generates it is ambiguous.
A right-linear grammar is a CFG in which the RHS of every rule contains at most one variable (nonterminal), and that variable, if present, is the last symbol of the RHS.
For example, the grammar \[ S\to aS \ \mid\ baS \ \mid\ b \ \mid\ \varepsilon \] generates all strings over \(\{a,b\}\) that do not contain two consecutive \(b\)’s.
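The claim about this grammar can be spot-checked by brute force. The sketch below encodes each rule as a terminal prefix plus a flag saying whether \(S\) follows, decides membership by consuming prefixes, and compares the result with a direct test for the substring `bb` on all strings up to a chosen length. The encoding and the name `generates` are assumptions of this example, and the check is evidence for short strings rather than a proof.

```python
from itertools import product

# Rules of S -> aS | baS | b | epsilon as (terminal prefix, is S appended?).
RULES = [("a", True), ("ba", True), ("b", False), ("", False)]

def generates(w):
    """Decide whether S derives w in the right-linear grammar above."""
    for prefix, continues in RULES:
        if not w.startswith(prefix):
            continue
        rest = w[len(prefix):]
        if continues:
            # Rules ending in S consume a nonempty prefix, so this terminates.
            if generates(rest):
                return True
        elif rest == "":
            return True
    return False

mismatches = [
    "".join(chars)
    for n in range(9)
    for chars in product("ab", repeat=n)
    if generates("".join(chars)) != ("bb" not in "".join(chars))
]
print(mismatches)
```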
A left-linear grammar is defined similarly, with the variable, if present, as the first symbol of the RHS.
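A linear grammar, as the term is used here, is one that is either a right-linear grammar or a left-linear grammar. (Many texts call such grammars regular grammars and reserve "linear" for CFGs with at most one variable anywhere in each RHS.)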
Theorem. A language \(L\) is regular if and only if it is generated by some linear grammar.
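One direction of the theorem is commonly shown by turning a DFA into a right-linear grammar: each transition from \(q\) to \(p\) on symbol \(c\) becomes a rule \(q\to cp\), and each accepting state \(q\) gets a rule \(q\to\varepsilon\). The sketch below applies this to a hypothetical three-state DFA for the strings over \(\{a,b\}\) without two consecutive \(b\)’s; the state names `A`, `B`, `D` and all function names are assumptions of the example.

```python
from itertools import product

# Hypothetical DFA: A = start, B = last symbol was b, D = dead state.
DELTA = {
    ("A", "a"): "A", ("A", "b"): "B",
    ("B", "a"): "A", ("B", "b"): "D",
    ("D", "a"): "D", ("D", "b"): "D",
}
START = "A"
ACCEPTING = {"A", "B"}

def dfa_accepts(w):
    """Run the DFA on w."""
    state = START
    for ch in w:
        state = DELTA[(state, ch)]
    return state in ACCEPTING

def dfa_to_right_linear(delta, accepting):
    """One rule q -> c p per transition, one rule q -> epsilon per accepting q."""
    rules = [(state, ch + target) for (state, ch), target in delta.items()]
    rules += [(state, "") for state in sorted(accepting)]
    return rules

def grammar_generates(rules, start, w):
    """Track which variables can remain after deriving each prefix of w."""
    current = {start}
    for ch in w:
        current = {rhs[1] for lhs, rhs in rules
                   if lhs in current and rhs[:1] == ch}
    return any((v, "") in rules for v in current)

rules = dfa_to_right_linear(DELTA, ACCEPTING)
for lhs, rhs in rules:
    print(lhs, "->", rhs or "epsilon")
```

In this construction the variables of the grammar are exactly the states of the DFA, and the start variable is the start state.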