Formal Languages
San Skulrattanakulchai
February 12, 2019
Topics
- Terminology
- Languages
- Language Operations
  - Set Operations
  - Concatenation
  - Power
  - Kleene Closure
  - Positive Closure
Terminology I
- A formal language is our model for the data manipulated by computers.
- A symbol (or letter) is an undefined term.
- An alphabet is a nonempty, finite set of symbols, e.g., \(\Sigma = \{a, b\}\) is an alphabet; \(a\), \(b\) are symbols.
- A string (or word, or sentence) is a finite list of symbols, e.g., \(\langle a, b, a\rangle\), usually written as \(aba\).
- The empty string, denoted \(\varepsilon\), is the empty list.
- The length of string \(w\), denoted \(|w|\), is the length of the list.
- \(|\varepsilon| = 0\).
- If \(|w| = n\), we write \(w\) as \(w_1w_2\ldots w_n\), e.g., if \(w=aba\) then \(w_1=w_3=a\), and \(w_2=b\).
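The definitions above can be tried out in code. The following is a minimal sketch, assuming symbols are modeled as single characters and strings as Python `str` values; this modeling choice and the helper names `is_string_over` and `symbol_at` are illustrative assumptions, not part of the definitions.

```python
# Assumption: a symbol is a single character and a string is a Python str.
SIGMA = {"a", "b"}   # the alphabet Σ = {a, b}
EPSILON = ""         # the empty string ε


def is_string_over(w: str, alphabet: set) -> bool:
    """Return True if every symbol of w belongs to the alphabet."""
    return all(symbol in alphabet for symbol in w)


def symbol_at(w: str, i: int) -> str:
    """Return w_i, using the 1-based indexing of the notes."""
    if not 1 <= i <= len(w):
        raise IndexError("i must satisfy 1 <= i <= |w|")
    return w[i - 1]  # Python indexes from 0, the notes index from 1


w = "aba"
print(len(w))            # |w|  -> 3
print(len(EPSILON))      # |ε|  -> 0
print(symbol_at(w, 2))   # w_2  -> b
```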
Terminology II
- For string \(w\) and symbol \(a\), we write \(|w|_a\) to denote how many times \(a\) occurs in \(w\), e.g.,
  - \(|aba|_a = 2\)
  - \(|aba|_b = 1\)
  - \(|aba|_c = 0\)
- Note how we use the metalanguage of names like \(w\) and \(a\) to talk about symbols and strings in another language. Keep the two separate in your mind!
- Let \(x=x_1x_2\ldots x_m\) and \(y=y_1y_2\ldots y_n\) be strings. The concatenation of \(x\) and \(y\), written \(xy\), is the string \(x_1x_2\ldots x_my_1y_2\ldots y_n\) of length \(m+n\) that results from appending \(y\) to the end of \(x\), e.g., concatenating back and bone gives backbone.
- String concatenation is associative, i.e., \((xy)z = x(yz)\) for all strings \(x\), \(y\), and \(z\).
- The empty string \(\varepsilon\) is the identity element for concatenation, i.e., any string \(w\) satisfies \(\varepsilon w = w \varepsilon = w\).
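The occurrence count and the two algebraic laws of concatenation can be spot-checked on concrete strings. This is a minimal sketch, assuming strings are Python `str` values; the function names `count_symbol` and `concat` are hypothetical, and checking a few examples illustrates the laws but does not prove them.

```python
EPSILON = ""  # the empty string ε


def count_symbol(w: str, a: str) -> int:
    """Return |w|_a, the number of occurrences of symbol a in w."""
    return sum(1 for symbol in w if symbol == a)


def concat(x: str, y: str) -> str:
    """Return the concatenation xy."""
    return x + y


print(count_symbol("aba", "a"))  # -> 2
print(count_symbol("aba", "c"))  # -> 0
print(concat("back", "bone"))    # -> backbone

x, y, z = "back", "bone", "s"
# Length is additive: |xy| = |x| + |y|.
assert len(concat(x, y)) == len(x) + len(y)
# Associativity: (xy)z = x(yz).
assert concat(concat(x, y), z) == concat(x, concat(y, z))
# Identity: εw = wε = w.
assert concat(EPSILON, x) == concat(x, EPSILON) == x
```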
Terminology III
- For string \(w\) and positive integer \(n\), the \(n\)th power of \(w\), written \(w^n\), is the concatenation of \(n\) copies of \(w\). The zeroth power \(w^0\) is defined to be \(\varepsilon\); negative powers are undefined. For example,
  - \((ab)^0=\varepsilon\)
  - \((ab)^1=ab\)
  - \((ab)^2=abab\)
  - \((ab)^3=ababab\)
- A string \(y\) is a substring of string \(w\) if there exist strings \(x\), \(z\) such that \(w=xyz\), e.g., ran is a substring of strange.
- A string \(x\) is a prefix of string \(w\) if there exists a string \(y\) such that \(w=xy\), e.g., pea is a prefix of peachy.
- A string \(y\) is a suffix of string \(w\) if there exists a string \(x\) such that \(w=xy\), e.g., age is a suffix of language.
Terminology IV
- The empty string is trivially a substring, prefix, and suffix of any string.
- Any string is a substring, prefix, and suffix of itself.
- A string \(x\) is a proper substring of string \(y\) if \(x\) is a substring of \(y\) and \(0 < |x| < |y|\). Similarly for proper prefix and proper suffix.
- String \(x\) is a subsequence of string \(y\) if \(x\) can be obtained by omitting 0 or more occurrences of symbols from \(y\), e.g., bat is a subsequence of habitat.
- Let \(w=w_1w_2\ldots w_n\) be a string of length \(n\). The reversal of \(w\), written \(w^R\), is the string \(w_n w_{n-1} \ldots w_1\), e.g., star\({}^R\) = rats.
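A short sketch of the last three notions, assuming strings are Python `str` values and using hypothetical function names. The proper-substring test follows the definition given here, which excludes both the empty string and the whole string; the subsequence test scans \(y\) once from left to right, matching the symbols of \(x\) in order.

```python
def is_proper_substring(x: str, y: str) -> bool:
    """True if x is a substring of y and 0 < |x| < |y|."""
    return x in y and 0 < len(x) < len(y)


def is_subsequence(x: str, y: str) -> bool:
    """True if x can be obtained from y by omitting zero or more symbols."""
    remaining = iter(y)
    # Each membership test consumes the iterator up to and including the
    # match, so the symbols of x must appear in y in the same order.
    return all(symbol in remaining for symbol in x)


def reverse(w: str) -> str:
    """Return w^R, the symbols of w in reverse order."""
    return w[::-1]


print(is_proper_substring("ran", "strange"))  # -> True
print(is_proper_substring("", "strange"))     # -> False
print(is_subsequence("bat", "habitat"))       # -> True
print(is_subsequence("tab", "habitat"))       # -> False
print(reverse("star"))                        # -> rats
```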
Terminology V
- A string \(w\) is a palindrome if \(w^R = w\). Example palindromes are eve, madam, racecar, deified, and rotator.
- Given alphabet \(\Sigma\), define \(\Sigma^*\) to be the set of all possible strings over \(\Sigma\).
- E.g., if \(\Sigma = \{a, b\}\) then \(\Sigma^* = \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \ldots\}\).
- The listing of strings above is in shortlex order (or string order, or radix order), i.e., ordered like in a dictionary, except that a shorter string always precedes a longer one.
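The palindrome test and the shortlex listing of \(\Sigma^*\) can both be sketched in a few lines. The code assumes strings are Python `str` values and that the symbols of the alphabet are ordered by Python's default character ordering; the names `is_palindrome` and `sigma_star` are hypothetical. Because \(\Sigma^*\) is infinite, `sigma_star` is a generator and only a finite prefix of the listing is ever materialized.

```python
from itertools import count, islice, product


def is_palindrome(w: str) -> bool:
    """True if w equals its own reversal."""
    return w == w[::-1]


def sigma_star(alphabet):
    """Yield the strings over the alphabet in shortlex order, without end."""
    symbols = sorted(alphabet)  # assumed ordering of the symbols
    for length in count(0):     # lengths 0, 1, 2, ...
        # product() yields tuples of the given length in dictionary order.
        for letters in product(symbols, repeat=length):
            yield "".join(letters)


print(all(is_palindrome(w) for w in ["eve", "madam", "racecar"]))  # -> True
print(list(islice(sigma_star({"a", "b"}), 8)))
# -> ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']
```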
Exercises
- Define precisely the less-than relation \(<\) for dictionary order (or lexicographic order).
- Define precisely the less-than relation \(<\) for shortlex order (or string order, or radix order).
- What is the position of the string ab when the strings of \(\{a,b\}^*\) are arranged in dictionary order? In shortlex order?
Languages
- A language over an alphabet \(\Sigma\) is a subset of \(\Sigma^*\).
- Example languages follow.
  - The set of all strings with an odd number of occurrences of a.
  - The set of all palindromes.
  - The set of all strings of “balanced” left and right parentheses.
  - The set of all strings with equal numbers of a, b, and c.
  - The set of all binary strings that represent prime numbers.
  - The set of all graphs with a Hamiltonian cycle, where the graphs are encoded as strings.
  - The empty language \(\emptyset\).
  - The singleton language \(\{\varepsilon\}\).
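Since a language is just a set of strings, one way to model it in code is as a membership predicate: a function that takes a string and reports whether it belongs to the language. The sketch below does this for several of the examples above. All function names are hypothetical, and two interpretations are assumptions: "balanced" is read as properly nested parentheses, and a binary string is read as an unsigned integer with leading zeros allowed.

```python
def has_odd_number_of_a(w: str) -> bool:
    """Strings with an odd number of occurrences of a."""
    return w.count("a") % 2 == 1


def is_balanced(w: str) -> bool:
    """Strings of properly nested parentheses (assumed meaning of 'balanced')."""
    depth = 0
    for symbol in w:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '(' before it
                return False
        else:
            return False       # not a string over the alphabet {(, )}
    return depth == 0


def has_equal_abc(w: str) -> bool:
    """Strings with equal numbers of a, b, and c."""
    return w.count("a") == w.count("b") == w.count("c")


def is_binary_prime(w: str) -> bool:
    """Nonempty binary strings whose value is a prime number."""
    if not w or any(symbol not in "01" for symbol in w):
        return False
    n = int(w, 2)
    if n < 2:
        return False
    d = 2
    while d * d <= n:          # trial division up to the square root
        if n % d == 0:
            return False
        d += 1
    return True


def in_empty_language(w: str) -> bool:
    """The empty language contains no string at all."""
    return False


def in_epsilon_language(w: str) -> bool:
    """The singleton language {ε} contains only the empty string."""
    return w == ""


print(has_odd_number_of_a("ab"), is_balanced("(()())"), is_binary_prime("101"))
# -> True True True
```

Note that the empty language and \(\{\varepsilon\}\) are different: the first predicate rejects every string, while the second accepts exactly one.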
Language Operations
- Set Operations.
  - union \(\cup\)
  - intersection \(\cap\)
  - set difference \(\setminus\)
  - symmetric difference \(\triangle\)
  - complementation
- Concatenation. The concatenation of two languages \(A\) and \(B\), written \(AB\), is the set of all strings \(xy\) where \(x\in A\) and \(y\in B\). When precision is desired, concatenation is denoted by \(\cdot\), e.g., \(x\cdot y\), \(A\cdot B\).
- Reversal. The reversal of a language \(A\), denoted \(A^R\), is the language \(A^R = \{ w^R : w \in A \}\).
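For finite languages, concatenation and reversal can be computed directly with set comprehensions. This is a minimal sketch, assuming a finite language is represented as a Python `set` of `str` values; the function names and the example languages `A` and `B` are hypothetical.

```python
def concat_languages(A: set, B: set) -> set:
    """Return AB = { xy : x in A and y in B }."""
    return {x + y for x in A for y in B}


def reverse_language(A: set) -> set:
    """Return A^R = { w^R : w in A }."""
    return {w[::-1] for w in A}


A = {"a", "ab"}
B = {"", "b"}   # contains the empty string ε and the string b

print(sorted(concat_languages(A, B)))  # -> ['a', 'ab', 'abb']
print(sorted(reverse_language(A)))     # -> ['a', 'ba']
```

The concatenation has three elements rather than four because two different pairs, a followed by b and ab followed by the empty string, produce the same string ab.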
Language Operations (cont)
- Power. For a language \(A\), denote by \(A^0\) the language \(\{\varepsilon\}\), and denote by \(A^i\) the language \(AA^{i-1}\), whenever \(i > 0\).
- Kleene Closure. \[A^* = \bigcup_{i=0}^\infty A^i = A^0 \cup A^1 \cup A^2 \cup \cdots\] Note that this definition agrees with our previous definition of \(\Sigma^*\).
- Positive Closure. \[A^+ = \bigcup_{i=1}^\infty A^i = A^1 \cup A^2 \cup A^3 \cup \cdots\]
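Language powers are computable for finite languages, but the closures are infinite unions, so code can only build a finite approximation. The sketch below assumes a finite language is a Python `set` of `str` values and truncates each union at a bound `k`; the function names and the bound are hypothetical devices for illustration, not part of the definitions.

```python
def language_power(A: set, i: int) -> set:
    """Return A^i, using A^0 = {ε} and A^i = A A^(i-1)."""
    if i < 0:
        raise ValueError("negative powers are undefined")
    result = {""}                                   # A^0 = {ε}
    for _ in range(i):
        result = {x + y for x in A for y in result}  # A · A^(i-1)
    return result


def kleene_closure_up_to(A: set, k: int) -> set:
    """Return A^0 ∪ A^1 ∪ ... ∪ A^k, a finite part of A*."""
    return set().union(*(language_power(A, i) for i in range(0, k + 1)))


def positive_closure_up_to(A: set, k: int) -> set:
    """Return A^1 ∪ ... ∪ A^k, a finite part of A+."""
    return set().union(*(language_power(A, i) for i in range(1, k + 1)))


A = {"a", "bb"}
print(sorted(language_power(A, 2)))
# -> ['aa', 'abb', 'bba', 'bbbb']
print(sorted(kleene_closure_up_to({"a", "b"}, 2)))
# -> ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

The second call treats the alphabet \(\{a, b\}\) as a language of one-symbol strings; the result is every string over the alphabet of length at most 2, consistent with the earlier definition of \(\Sigma^*\).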
Exercises
- What is \(\emptyset^*\) equal to?
- Is it true that \(A^+ = A^* \setminus \{\varepsilon\}\) for every language \(A\)?
- Characterize the languages \(A\) that satisfy \(A^* = A^+\).
- Describe these languages:
  - \(A\emptyset\)
  - \(A \{\varepsilon\}\)
  - \(A\cup\emptyset\)
  - \(A\cup\{\varepsilon\}\)
- What is the identity element under \(\cup\)? under \(\cdot\)?
- What rules are obeyed by \(\cup\) and \(\cdot\)?