Regular Expressions
San Skulrattanakulchai
February 26, 2019
Definition by Structural Induction
- Given an alphabet \(\Sigma\), a regular expression over \(\Sigma\) is an expression (a piece of syntax, not itself a language) defined recursively as follows.
- \(\emptyset\) is a regular expression.
- \(\varepsilon\) is a regular expression.
- Each \(a\in\Sigma\) is a regular expression.
- If \(R_1\) and \(R_2\) are regular expressions, then \((R_1 \cup R_2)\) is a regular expression.
- If \(R_1\) and \(R_2\) are regular expressions, then \((R_1 \cdot R_2)\) is a regular expression.
- If \(R\) is a regular expression, then \((R^*)\) is a regular expression.
- Something is a regular expression if and only if it can be built by finitely many applications of the above rules.
- We will soon see that the rule “\(\varepsilon\) is a regular expression” is superfluous.
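The inductive definition above can be mirrored by a small data type, with one constructor per rule. The following is a minimal sketch, not part of the original slides; the class names (`Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`) and the checker `is_regex` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# One class per rule of the inductive definition (names are illustrative).

@dataclass(frozen=True)
class Empty:            # the expression \emptyset
    pass

@dataclass(frozen=True)
class Eps:              # the expression \varepsilon
    pass

@dataclass(frozen=True)
class Sym:              # a single symbol a in Sigma
    a: str

@dataclass(frozen=True)
class Union:            # (R1 u R2)
    left: object
    right: object

@dataclass(frozen=True)
class Concat:           # (R1 . R2)
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # (R*)
    inner: object

def is_regex(r, sigma):
    """Return True iff r is built from the six rules over the alphabet sigma."""
    if isinstance(r, (Empty, Eps)):
        return True
    if isinstance(r, Sym):
        return r.a in sigma
    if isinstance(r, (Union, Concat)):
        return is_regex(r.left, sigma) and is_regex(r.right, sigma)
    if isinstance(r, Star):
        return is_regex(r.inner, sigma)
    return False        # anything else does not follow from the rules

# The expression ((0 . (1*)) u epsilon) over the alphabet {0, 1}.
example = Union(Concat(Sym("0"), Star(Sym("1"))), Eps())
print(is_regex(example, {"0", "1"}))   # True
print(is_regex(Sym("2"), {"0", "1"}))  # False: 2 is not in the alphabet
```

Each recursive call in `is_regex` corresponds to one inductive rule, and the final `return False` plays the role of the closure clause.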
Shortcut notation for REs
- To make regular expressions easy to write and also unambiguous, we
- use juxtaposition instead of \(\cdot\)
- declare that \({}^*\) has higher precedence than \(\cdot\), and that \(\cdot\) has higher precedence than \(\cup\), and omit enclosing parentheses when possible
- declare that the two binary operators, \(\cdot\) and \(\cup\), are left-associative
- retain pairs of enclosing parentheses only when needed to override the default precedence and associativity rules
- Therefore, \(01^*\) means \((0\cdot (1^*))\), which is different from \(((0\cdot 1)^*)\).
- Similarly, \(10 \cup 01\) means \(((1\cdot 0) \cup (0\cdot 1))\), which is different from \(((1\cdot (0\cup 0))\cdot 1)\) or \((1\cdot ((0\cup 0)\cdot 1))\).
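To see the precedence and associativity conventions at work, here is a rough sketch of a recursive-descent parser that restores all the parentheses that the shortcut notation omits. It is an illustration under stated assumptions, not part of the original slides: the function name `fully_parenthesize` is hypothetical, the ASCII character `|` stands for \(\cup\), `.` stands for \(\cdot\) in the output, and the alphabet is assumed to consist of single characters (by default `0` and `1`).

```python
def fully_parenthesize(s, sigma="01"):
    """Parse the shorthand expression s and return it fully parenthesized.

    Grammar, from lowest to highest precedence:
        union  := concat ('|' concat)*     left-associative
        concat := star star*               juxtaposition, left-associative
        star   := atom '*'*
        atom   := symbol | '(' union ')'
    """
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def advance():
        nonlocal pos
        pos += 1

    def atom():
        c = peek()
        if c == "(":
            advance()
            r = union()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return r
        if c is not None and c in sigma:
            advance()
            return c
        raise ValueError(f"unexpected {c!r} at position {pos}")

    def star():
        r = atom()
        while peek() == "*":          # star binds tightest
            advance()
            r = f"({r}*)"
        return r

    def concat():
        r = star()
        # Juxtaposition: continue while the next character can start an atom.
        while peek() is not None and (peek() == "(" or peek() in sigma):
            r = f"({r}.{star()})"
        return r

    def union():
        r = concat()
        while peek() == "|":          # union binds loosest
            advance()
            r = f"({r}|{concat()})"
        return r

    result = union()
    if pos != len(s):
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return result

print(fully_parenthesize("01*"))    # (0.(1*))
print(fully_parenthesize("(01)*"))  # ((0.1)*)
print(fully_parenthesize("10|01"))  # ((1.0)|(0.1))
```

The nesting of the three helper functions encodes the precedence order, and building the result inside a `while` loop (rather than by right recursion) is what makes the binary operators left-associative.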
Semantics of REs
- The meaning of each regular expression \(R\) is its language \(L(R)\), defined recursively as follows.
- \(L(\emptyset) := \emptyset\).
- \(L(\varepsilon) := \{ \varepsilon \}\).
- For each \(a\in\Sigma\), we let \(L(a) := \{ a \}\).
- For any regular expressions \(R_1\), \(R_2\), we let \(L(R_1 \cup R_2) := L(R_1) \cup L(R_2)\).
- For any regular expressions \(R_1\), \(R_2\), we let \(L(R_1 \cdot R_2) := L(R_1) \cdot L(R_2)\).
- For any regular expression \(R\), we let \(L(R^*) := L(R)^*\).
- Exercise. Why is the rule “\(\varepsilon\) is a regular expression” in the definition of regular expressions superfluous?
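The semantic rules translate almost line by line into a recursive function. Since \(L(R)\) may be infinite, the sketch below computes only the strings of \(L(R)\) of length at most \(n\). The function name `lang`, the bound `n`, and the nested-tuple encoding of expressions are assumptions made for this illustration; symbols are assumed to be single characters.

```python
def lang(r, n):
    """Return the set of strings in L(r) of length at most n.

    Expressions are nested tuples (an encoding assumed for this sketch):
    ("empty",), ("eps",), ("sym", a), ("union", r1, r2),
    ("concat", r1, r2), ("star", r1).
    """
    tag = r[0]
    if tag == "empty":                      # L(emptyset) = {}
        return set()
    if tag == "eps":                        # L(epsilon) = { epsilon }
        return {""}
    if tag == "sym":                        # L(a) = { a }
        return {r[1]} if n >= 1 else set()
    if tag == "union":                      # L(R1 u R2) = L(R1) u L(R2)
        return lang(r[1], n) | lang(r[2], n)
    if tag == "concat":                     # L(R1 . R2) = L(R1) . L(R2)
        return {x + y
                for x in lang(r[1], n)
                for y in lang(r[2], n)
                if len(x + y) <= n}
    if tag == "star":                       # L(R*) = L(R)*
        base = lang(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:                     # stop when no new strings appear
            frontier = {x + y
                        for x in frontier
                        for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression tag: {tag!r}")

zero, one = ("sym", "0"), ("sym", "1")
print(sorted(lang(("concat", zero, ("star", one)), 3)))  # ['0', '01', '011']
print(sorted(lang(("star", ("concat", zero, one)), 4)))  # ['', '01', '0101']
```

The two printed examples are \(01^*\) and \((01)^*\), whose languages differ as the precedence rules predict.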
Identities of languages under regular operations
- The identities below hold for arbitrary languages under the regular operations. In particular they hold for regular languages, so they can also be read as identities between regular expressions if we blur the distinction between an RE and its language.
- Here are some identities on languages:
- \(R\emptyset = \emptyset R = \emptyset\)
- \(R\varepsilon = \varepsilon R = R\)
- \(R \cup \emptyset = \emptyset \cup R = R\)
- \(R \cup R = R\)
- \(R_1 \cup R_2 = R_2 \cup R_1\)
- \(R_1 (R_2 \cup R_3) = R_1 R_2 \cup R_1 R_3\)
- \((R_1 \cup R_2) R_3 = R_1 R_3 \cup R_2 R_3\)
- \(R_1 (R_2 R_3) = (R_1 R_2) R_3\)
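The identities above involve only union and concatenation, so on finite languages they can be checked exactly with Python sets. The snippet below is a sanity check on a few arbitrarily chosen sample languages, not a proof; the helper `concat` and the sample sets `R1`, `R2`, `R3` are hypothetical names introduced for illustration.

```python
def concat(A, B):
    """Concatenation of two languages given as finite sets of strings."""
    return {x + y for x in A for y in B}

EMPTY = set()   # the empty language
EPS = {""}      # the language containing only the empty string

# Arbitrary sample languages, chosen only for illustration.
R1, R2, R3 = {"0", "01"}, {"", "1"}, {"10", "1"}

checks = {
    "R.empty = empty.R = empty": concat(R1, EMPTY) == concat(EMPTY, R1) == EMPTY,
    "R.eps = eps.R = R": concat(R1, EPS) == concat(EPS, R1) == R1,
    "R u empty = empty u R = R": (R1 | EMPTY) == (EMPTY | R1) == R1,
    "R u R = R": (R1 | R1) == R1,
    "union commutes": (R1 | R2) == (R2 | R1),
    "left distributivity":
        concat(R1, R2 | R3) == (concat(R1, R2) | concat(R1, R3)),
    "right distributivity":
        concat(R1 | R2, R3) == (concat(R1, R3) | concat(R2, R3)),
    "concatenation associates":
        concat(R1, concat(R2, R3)) == concat(concat(R1, R2), R3),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

A passing check on one choice of languages only rules out typos in an identity; the general statement still needs an argument about arbitrary strings.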
Language identities (continued)
- More identities on languages:
- \(\emptyset^* = \varepsilon\)
- \(\varepsilon^* = \varepsilon\)
- \((\varepsilon \cup R)^* = R^*\)
- \((\varepsilon\cup R)(\varepsilon\cup R)^* = R^*\)
- \(R^*(\varepsilon\cup R) = (\varepsilon\cup R)R^* = R^*\)
- \(R_1^*R_2\cup R_2 = R_1^*R_2\)
- \(R_1(R_2R_1)^* = (R_1R_2)^*R_1\)
- \((R_1\cup R_2)^* = (R_1^*R_2)^*R_1^* = (R_2^*R_1)^*R_2^*\)
- Exercise. Prove the identities on this slide and the previous one.
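Before attempting the proofs, it can help to test the star identities numerically. Because \(R^*\) is usually infinite, the sketch below compares both sides of each identity only on strings of length at most \(n\). This works because truncating to length \(n\) commutes with union, concatenation, and star when each operation is itself truncated. The helpers `cat` and `star`, the bound `n`, and the sample languages are assumptions made for illustration, and a passing check is evidence, not a proof.

```python
def cat(A, B, n):
    """Strings of A.B with length at most n."""
    return {x + y for x in A for y in B if len(x + y) <= n}

def star(A, n):
    """Strings of A* with length at most n."""
    result = {""}
    frontier = {""}
    while frontier:                      # stop when no new strings appear
        frontier = cat(frontier, A, n) - result
        result |= frontier
    return result

n = 6
EMPTY, EPS = set(), {""}
# Arbitrary sample languages, chosen only for illustration.
R, R1, R2 = {"0", "11"}, {"0", "01"}, {"1", "10"}

checks = {
    "empty* = eps": star(EMPTY, n) == EPS,
    "eps* = eps": star(EPS, n) == EPS,
    "(eps u R)* = R*": star(EPS | R, n) == star(R, n),
    "(eps u R)(eps u R)* = R*":
        cat(EPS | R, star(EPS | R, n), n) == star(R, n),
    "R*(eps u R) = (eps u R)R* = R*":
        cat(star(R, n), EPS | R, n) == cat(EPS | R, star(R, n), n) == star(R, n),
    "R1*R2 u R2 = R1*R2":
        (cat(star(R1, n), R2, n) | R2) == cat(star(R1, n), R2, n),
    "R1(R2R1)* = (R1R2)*R1":
        cat(R1, star(cat(R2, R1, n), n), n) == cat(star(cat(R1, R2, n), n), R1, n),
    "(R1 u R2)* = (R1*R2)*R1* = (R2*R1)*R2*":
        star(R1 | R2, n)
        == cat(star(cat(star(R1, n), R2, n), n), star(R1, n), n)
        == cat(star(cat(star(R2, n), R1, n), n), star(R2, n), n),
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Increasing `n` or changing the sample languages gives further evidence, at the cost of more computation.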