Definition by Structural Induction

Given an alphabet \(\Sigma\), the set of regular expressions over \(\Sigma\) is defined recursively as follows.
- \(\emptyset\) is a regular expression.
- \(\varepsilon\) is a regular expression.
- Each \(a\in\Sigma\) is a regular expression.
- If \(R_1\) and \(R_2\) are regular expressions, then \((R_1 \cup R_2)\) is a regular expression.
- If \(R_1\) and \(R_2\) are regular expressions, then \((R_1 \cdot R_2)\) is a regular expression.
- If \(R\) is a regular expression, then \((R^*)\) is a regular expression.
- Something is a regular expression if and only if it can be obtained by finitely many applications of the above rules.
We will soon see that the rule “\(\varepsilon\) is a regular expression” is superfluous.
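To make the inductive definition concrete, here is a minimal Python sketch that checks whether a candidate object is a regular expression over \(\Sigma = \{0, 1\}\). It assumes a hypothetical encoding of regular expressions as nested tuples; the tags "empty", "eps", "sym", "union", "cat", "star" and the function name is_regex are our own illustrative choices. Each branch of is_regex corresponds to one rule of the definition, and the final `return False` plays the role of the last rule.

```python
# Hypothetical encoding (an illustrative assumption, not part of the notes):
#   ("empty",)          stands for  ∅
#   ("eps",)            stands for  ε
#   ("sym", a)          stands for  a, with a in SIGMA
#   ("union", r1, r2)   stands for  (r1 ∪ r2)
#   ("cat", r1, r2)     stands for  (r1 · r2)
#   ("star", r)         stands for  (r*)

SIGMA = {"0", "1"}

def is_regex(r, sigma=SIGMA):
    """Return True iff r is built by finitely many applications of the rules."""
    if not isinstance(r, tuple) or not r:
        return False
    tag, args = r[0], r[1:]
    if tag in ("empty", "eps"):          # base rules for ∅ and ε
        return len(args) == 0
    if tag == "sym":                     # base rule: each a in SIGMA
        return len(args) == 1 and args[0] in sigma
    if tag in ("union", "cat"):          # inductive rules for ∪ and ·
        return len(args) == 2 and all(is_regex(x, sigma) for x in args)
    if tag == "star":                    # inductive rule for *
        return len(args) == 1 and is_regex(args[0], sigma)
    return False                         # nothing else is a regular expression

print(is_regex(("cat", ("sym", "0"), ("star", ("sym", "1")))))  # True
print(is_regex(("sym", "2")))                                    # False
```

The first call encodes \((0\cdot (1^*))\); the second fails because \(2\notin\Sigma\).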

Shorthand notation for REs

To make regular expressions easier to write while keeping them unambiguous, we
- use juxtaposition instead of \(\cdot\)
- declare that \({}^*\) has higher precedence than \(\cdot\), and that \(\cdot\) has higher precedence than \(\cup\), and omit enclosing parentheses when possible
- declare that the two binary operators \(\cup\) and \(\cdot\) are left-associative (\({}^*\) is unary, so associativity does not arise for it)
- retain pairs of enclosing parentheses only when needed to override the default precedence and associativity rules
Therefore, \(01^*\) means \((0\cdot (1^*))\), which is different from \(((0\cdot 1)^*)\).
Similarly, \(10 \cup 01\) means \(((1\cdot 0) \cup (0\cdot 1))\), which is different from \(((1\cdot (0\cup 0))\cdot 1)\) or \((1\cdot ((0\cup 0)\cdot 1))\).
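The shorthand conventions can be turned into a printer. The Python sketch below uses a hypothetical nested-tuple encoding of REs and a hypothetical function show; it assigns \({}^*\), \(\cdot\) and \(\cup\) the precedences 3, 2 and 1, and inserts parentheses only where the precedence and left-associativity rules would otherwise give a different reading. Because the binary operators are left-associative, a left operand of the same precedence needs no parentheses, while a right operand of the same precedence does.

```python
# Hypothetical encoding: ("empty",), ("eps",), ("sym", a),
# ("union", r1, r2), ("cat", r1, r2), ("star", r).
PREC = {"union": 1, "cat": 2, "star": 3}  # atoms bind tightest (precedence 4)

def prec(r):
    return PREC.get(r[0], 4)

def show(r):
    """Print r in shorthand: juxtaposition for ·, * > · > ∪,
    left-associative binary operators, and minimal parentheses."""
    tag = r[0]
    if tag == "empty":
        return "∅"
    if tag == "eps":
        return "ε"
    if tag == "sym":
        return r[1]
    if tag == "star":
        s = show(r[1])
        # the operand of * needs parentheses unless it is an atom or a star
        return (f"({s})" if prec(r[1]) < 3 else s) + "*"
    # binary operator
    p = prec(r)
    left, right = show(r[1]), show(r[2])
    if prec(r[1]) < p:      # left operand: parenthesize only if it binds looser
        left = f"({left})"
    if prec(r[2]) <= p:     # right operand: also parenthesize on equal precedence
        right = f"({right})"
    sep = " ∪ " if tag == "union" else ""
    return left + sep + right

zero, one = ("sym", "0"), ("sym", "1")
print(show(("cat", zero, ("star", one))))                        # 01*
print(show(("star", ("cat", zero, one))))                        # (01)*
print(show(("union", ("cat", one, zero), ("cat", zero, one))))   # 10 ∪ 01
```

The same function prints \(((1\cdot (0\cup 0))\cdot 1)\) as `1(0 ∪ 0)1` and \((1\cdot ((0\cup 0)\cdot 1))\) as `1((0 ∪ 0)1)`, so the two readings stay distinct.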

The meaning of each R.E. \(R\) is its language \(L(R)\), defined by structural induction as follows.
- \(L(\emptyset) := \emptyset\).
- \(L(\varepsilon) := \{ \varepsilon \}\).
- For each \(a\in\Sigma\), we let \(L(a) := \{ a \}\).
- For any regular expressions \(R_1\), \(R_2\), we let \(L(R_1 \cup R_2) := L(R_1) \cup L(R_2)\).
- For any regular expressions \(R_1\), \(R_2\), we let \(L(R_1 \cdot R_2) := L(R_1) \cdot L(R_2)\).
- For any regular expression \(R\), we let \(L(R^*) := L(R)^*\).
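Since \(L(R^*)\) is infinite whenever \(L(R)\) contains a nonempty string, a program cannot list \(L(R)\) in general. The Python sketch below instead computes the finite set of strings of length at most n in \(L(R)\), following the inductive clauses one by one. The nested-tuple encoding and the name lang are hypothetical choices for illustration. Truncating is safe: a string of length at most n in \(L(R_1)\cdot L(R_2)\) splits into two pieces of length at most n, and a string in \(L(R)^*\) of length at most n splits into nonempty pieces of \(L(R)\) of length at most n.

```python
# Hypothetical encoding: ("empty",), ("eps",), ("sym", a),
# ("union", r1, r2), ("cat", r1, r2), ("star", r).

def lang(r, n):
    """Return the set of strings of length <= n in L(r)."""
    tag = r[0]
    if tag == "empty":                      # L(∅) = ∅
        return set()
    if tag == "eps":                        # L(ε) = {ε}
        return {""}
    if tag == "sym":                        # L(a) = {a}
        return {r[1]} if len(r[1]) <= n else set()
    if tag == "union":                      # L(R1 ∪ R2) = L(R1) ∪ L(R2)
        return lang(r[1], n) | lang(r[2], n)
    if tag == "cat":                        # L(R1 · R2) = L(R1) · L(R2)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if tag == "star":                       # L(R*) = L(R)*
        base = lang(r[1], n)
        result = {""}                       # ε is always in L(R)*
        frontier = {""}
        while frontier:                     # append one more nonempty piece
            frontier = {x + y for x in frontier for y in base
                        if y and len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"not a regular expression: {r!r}")

print(sorted(lang(("star", ("sym", "0")), 3)))                        # ['', '0', '00', '000']
print(sorted(lang(("cat", ("sym", "0"), ("star", ("sym", "1"))), 2)))  # ['0', '01']
```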
Exercise. Why is the rule “\(\varepsilon\) is a regular expression” in the definition of regular expressions superfluous?

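For an R.E. \(R\), we write \(R^+\) to mean \((R\cdot (R^*))\).
For an R.E. \(R\) and a positive integer \(n\), we write \(R^n\) to mean the concatenation of \(n\) copies of \(R\), with the parentheses grouped in any way (all groupings denote the same language, since concatenation of languages is associative).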
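The abbreviations \(R^+\) and \(R^n\) are simply functions from regular expressions to regular expressions. A minimal Python sketch, assuming a hypothetical nested-tuple encoding (the names plus and power are our own): power groups the copies to the left, matching the shorthand convention; any other grouping would denote the same language.

```python
# Hypothetical encoding: ("sym", a), ("cat", r1, r2), ("star", r), ...

def plus(r):
    """R+ abbreviates (R · (R*))."""
    return ("cat", r, ("star", r))

def power(r, n):
    """R^n: n >= 1 copies of R, concatenated left-associatively."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = r
    for _ in range(n - 1):
        result = ("cat", result, r)   # ((R · R) · R) · ...
    return result

print(power(("sym", "0"), 3))
# ('cat', ('cat', ('sym', '0'), ('sym', '0')), ('sym', '0'))
```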
The three operations \(\cup\), \(\cdot\) and \({}^*\) on languages are called the regular operations.
A language representable by an R.E. is called a regular language.
We sometimes blur the distinction between a regular expression \(R\) and the language \(L(R)\) it represents. That is, we write \(R\) where we should write \(L(R)\) instead.

Below are some identities on languages under the regular operations; they hold for all languages \(R\), \(R_1\), \(R_2\), \(R_3\). In particular they hold for regular languages, so (blurring the distinction between REs and their languages) we may also read them as identities between regular expressions:
- \(R\emptyset = \emptyset R = \emptyset\)
- \(R\varepsilon = \varepsilon R = R\)
- \(R \cup \emptyset = \emptyset \cup R = R\)
- \(R \cup R = R\)
- \(R_1 \cup R_2 = R_2 \cup R_1\)
- \(R_1 (R_2 \cup R_3) = R_1 R_2 \cup R_1 R_3\)
- \((R_1 \cup R_2) R_3 = R_1 R_3 \cup R_2 R_3\)
- \(R_1 (R_2 R_3) = (R_1 R_2) R_3\)
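Identities like these can be sanity-checked by brute force on small finite languages. The Python sketch below represents a language as a set of strings, implements concatenation as cat (a hypothetical helper), and tests each identity above on randomly chosen sets of strings over \(\{0,1\}\) of length at most 2. A passing run is evidence, not a proof, since the identities are claimed for all languages.

```python
import itertools
import random

def cat(A, B):
    """Concatenation of languages: {xy : x in A, y in B}."""
    return {x + y for x in A for y in B}

def random_lang(rng, max_len=2, alphabet="01"):
    """A random finite language: a random subset of strings of length <= max_len."""
    words = ["".join(p) for k in range(max_len + 1)
             for p in itertools.product(alphabet, repeat=k)]
    return {w for w in words if rng.random() < 0.5}

rng = random.Random(0)
EMPTY, EPS = set(), {""}          # the languages ∅ and {ε}
for _ in range(200):
    R1, R2, R3 = (random_lang(rng) for _ in range(3))
    R = R1
    assert cat(R, EMPTY) == cat(EMPTY, R) == EMPTY
    assert cat(R, EPS) == cat(EPS, R) == R
    assert R | EMPTY == EMPTY | R == R
    assert R | R == R
    assert R1 | R2 == R2 | R1
    assert cat(R1, R2 | R3) == cat(R1, R2) | cat(R1, R3)
    assert cat(R1 | R2, R3) == cat(R1, R3) | cat(R2, R3)
    assert cat(R1, cat(R2, R3)) == cat(cat(R1, R2), R3)

# Unlike ∪, concatenation is not commutative:
assert cat({"0"}, {"1"}) != cat({"1"}, {"0"})
print("all identities hold on 200 random finite languages")
```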