Formal Languages
      San Skulrattanakulchai
      February 12, 2019
    
    
Topics
- Terminology
- Languages
- Language Operations
- Set Operations
- Concatenation
- Power
- Kleene Closure
- Positive Closure
 
 
Terminology I
- A formal language is our model for the data manipulated by computers.
- A symbol (or letter) is an undefined term.
- An alphabet is a nonempty, finite set of symbols, e.g., \(\Sigma = \{a, b\}\) is an alphabet; \(a\), \(b\) are symbols.
- A string (or word, or sentence) is a finite list of symbols, e.g., \(\langle a, b, a\rangle\), usually written as \(aba\).
- The empty string, denoted \(\varepsilon\), is the empty list.
- The length of string \(w\), denoted \(|w|\), is the length of the list.
- \(|\varepsilon| = 0\).
- If \(|w| = n\), we write \(w\) as \(w_1w_2\ldots w_n\), e.g., if \(w=aba\) then \(w_1=w_3=a\), and \(w_2=b\).
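
As an informal illustration of these definitions, the sketch below models a string as a Python list of symbols. The names `SIGMA`, `is_string_over`, and `symbol_at` are hypothetical helpers introduced only for this example.

```python
# Assumption: symbols are modeled as one-character Python strings.
SIGMA = {"a", "b"}        # the alphabet {a, b}
w = ["a", "b", "a"]       # the list <a, b, a>, usually written aba
EPSILON = []              # the empty string is the empty list


def is_string_over(w, alphabet):
    """Return True if every symbol of w belongs to the alphabet."""
    return all(symbol in alphabet for symbol in w)


def symbol_at(w, i):
    """Return w_i using the 1-based indexing of the slides."""
    return w[i - 1]
```

Here `len(w)` plays the role of \(|w|\), so `len(EPSILON)` is 0 and `symbol_at(w, 2)` is the symbol \(b\).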
 
Terminology II
- For string \(w\) and symbol \(a\), we write \(|w|_a\) to denote how many times \(a\) occurs in \(w\), e.g.,
- \(|aba|_a\) = 2
- \(|aba|_b\) = 1
- \(|aba|_c\) = 0
 
- Note how we use the metalanguage of names like \(w\) and \(a\) to talk about symbols and strings in another language. Keep the two separate in your mind!
- Let \(x=x_1x_2\ldots x_m\) and \(y=y_1y_2\ldots y_n\) be strings. The concatenation of \(x\) and \(y\), written \(xy\), is the string \(x_1x_2\ldots x_my_1y_2\ldots y_n\) of length \(m+n\) that results from appending \(y\) to the end of \(x\), e.g., concatenating back and bone gives backbone.
- String concatenation is associative, i.e., \((xy)z = x(yz)\) for all strings \(x\), \(y\), and \(z\).
- The empty string \(\varepsilon\) is the identity element for concatenation, i.e., any string \(w\) satisfies \(\varepsilon w = w \varepsilon = w\).
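
The counting notation and the concatenation laws above can be checked on small examples. This is a minimal sketch that assumes strings are modeled as Python `str` values; `count_symbol` and `EPSILON` are names chosen for this example only.

```python
EPSILON = ""  # assumption: the empty string is modeled as the empty str


def count_symbol(w, a):
    """Return |w|_a, the number of times symbol a occurs in w."""
    return sum(1 for symbol in w if symbol == a)


x, y, z = "back", "bone", "s"

# Concatenation appends y to the end of x and adds the lengths.
xy = x + y
assert xy == "backbone"
assert len(xy) == len(x) + len(y)

# Associativity and the identity element.
assert (x + y) + z == x + (y + z)
assert EPSILON + x == x + EPSILON == x
```

For instance, `count_symbol("aba", "a")` evaluates to 2 and `count_symbol("aba", "c")` to 0, matching the examples above.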
 
Terminology III
- For string \(w\) and positive integer \(n\), the \(n\)th power of \(w\), written \(w^n\), is the concatenation of \(n\) copies of \(w\). The zeroth power \(w^0\) is defined to be \(\varepsilon\); negative powers are undefined. For example,
- \((ab)^0=\varepsilon\)
- \((ab)^1=ab\)
- \((ab)^2=abab\)
- \((ab)^3 = ababab\).
 
- A string \(y\) is a substring of string \(w\) if there exist strings \(x\), \(z\) such that \(w=xyz\), e.g., ran is a substring of strange.
- A string \(x\) is a prefix of string \(w\) if there exists a string \(y\) such that \(w=xy\), e.g., pea is a prefix of peachy.
- A string \(y\) is a suffix of string \(w\) if there exists a string \(x\) such that \(w=xy\), e.g., age is a suffix of language.
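
A minimal sketch of powers, substrings, prefixes, and suffixes, assuming strings are Python `str` values. The function names are hypothetical; each one leans on a built-in `str` operation rather than searching for the witnesses \(x\), \(y\), \(z\) explicitly.

```python
def power(w, n):
    """Return w^n, the concatenation of n copies of w (w^0 is empty)."""
    if n < 0:
        raise ValueError("negative powers are undefined")
    return w * n


def is_substring(y, w):
    """True if w = x y z for some strings x and z."""
    return y in w


def is_prefix(x, w):
    """True if w = x y for some string y."""
    return w.startswith(x)


def is_suffix(y, w):
    """True if w = x y for some string x."""
    return w.endswith(y)
```

With these helpers, `power("ab", 3)` gives `"ababab"`, and `is_substring("ran", "strange")`, `is_prefix("pea", "peachy")`, and `is_suffix("age", "language")` all hold.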
 
Terminology IV
- The empty string is trivially a substring, prefix, and suffix of any string.
- Any string is a substring, prefix, and suffix of itself.
- A string \(x\) is a proper substring of string \(y\) if \(x\) is a substring of \(y\) and \(0 < |x| < |y|\). Similarly for proper prefix and proper suffix.
- String \(x\) is a subsequence of string \(y\) if \(x\) can be obtained by omitting 0 or more occurrences of symbols from \(y\), e.g., bat is a subsequence of habitat.
- Let \(w=w_1w_2\ldots w_n\) be a string of length \(n\). The reversal of \(w\), written \(w^R\), is the string \(w_n w_{n-1} \ldots w_1\), e.g., star\({}^R\) = rats.
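
The following sketch tests the notions on this slide on small inputs, again assuming Python `str` values. The helper names are illustrative, not standard.

```python
def is_proper_substring(x, y):
    """True if x is a substring of y and 0 < |x| < |y|."""
    return x in y and 0 < len(x) < len(y)


def is_subsequence(x, y):
    """True if x can be obtained from y by omitting zero or more symbols."""
    remaining = iter(y)
    # Each membership test consumes `remaining` up to the matched symbol,
    # so the symbols of x must be found in y from left to right.
    return all(symbol in remaining for symbol in x)


def reversal(w):
    """Return w^R, the symbols of w in reverse order."""
    return w[::-1]
```

For example, `is_subsequence("bat", "habitat")` holds while `is_subsequence("tab", "habitat")` does not, and `reversal("star")` is `"rats"`.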
 
Terminology V
- A string \(w\) is a palindrome if \(w^R = w\). Example palindromes are
- eve
- madam
- racecar
- deified
- rotator
 
- Given alphabet \(\Sigma\), define \(\Sigma^*\) to be the set of all possible strings over \(\Sigma\).
- E.g., if \(\Sigma = \{a, b\}\) then \(\Sigma^* =  \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \ldots\}\).
- The listing of strings above is in shortlex order (or string order, or radix order), i.e., ordered like in a dictionary, except that a shorter string always precedes a longer one.
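
To make the listing of \(\Sigma^*\) concrete, the sketch below generates strings length by length, using `itertools.product` for the strings of each fixed length. It assumes symbols are single characters whose order is the usual character order; `sigma_star` and `is_palindrome` are names made up for this example.

```python
from itertools import count, islice, product


def is_palindrome(w):
    """True if w equals its own reversal."""
    return w == w[::-1]


def sigma_star(alphabet):
    """Yield the strings over the alphabet, shorter strings first."""
    symbols = sorted(alphabet)
    for n in count(0):                      # lengths 0, 1, 2, ...
        for letters in product(symbols, repeat=n):
            yield "".join(letters)


# Sigma* is infinite, so only a finite prefix of the listing is taken.
first_eight = list(islice(sigma_star({"a", "b"}), 8))
# first_eight == ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']
```

The generator never terminates on its own, which is why `islice` is used to cut off the enumeration.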
 
Exercises
- Define precisely the less-than relation \(<\) for dictionary order (or lexicographic order).
- Define precisely the less-than relation \(<\) for shortlex order (or string order, or radix order).
- What is the position of the string ab, when the strings of \(\{a,b\}^*\) are arranged in dictionary order? in shortlex order?
 
Languages
- A language over an alphabet \(\Sigma\) is a subset of \(\Sigma^*\).
- Example languages follow.
- The set of all strings with an odd number of \(a\)'s.
- The set of all palindromes.
- The set of all strings of “balanced” left and right parentheses.
- The set of all strings with equal numbers of \(a\), \(b\), and \(c\).
- The set of all binary strings that represent prime numbers.
- The set of all graphs with a Hamiltonian cycle, where the graphs are encoded as strings.
- The empty language \(\emptyset\).
- The singleton language \(\{\varepsilon\}\).
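
Since most of these languages are infinite, one way to experiment with them is to represent a language by its membership test. The sketch below does this for a few of the examples. The predicate names are hypothetical, and the reading of "balanced" (every right parenthesis closes an earlier unmatched left parenthesis, and none is left open) is an assumption, since the slide does not define it.

```python
def odd_number_of_a(w):
    """Membership test: strings with an odd number of a's."""
    return w.count("a") % 2 == 1


def balanced(w):
    """Membership test: balanced strings over the alphabet {(, )}."""
    depth = 0
    for symbol in w:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:        # a ")" with no matching "(" before it
                return False
        else:                    # not a symbol of the alphabet
            return False
    return depth == 0


def equal_abc(w):
    """Membership test: equal numbers of a, b, and c."""
    return w.count("a") == w.count("b") == w.count("c")


def in_empty_language(w):
    """The empty language contains no string at all."""
    return False


def in_singleton_epsilon(w):
    """The language whose only member is the empty string."""
    return w == ""
```

Note the difference between the last two: the empty language rejects every string, whereas \(\{\varepsilon\}\) accepts exactly one.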
 
 
Language Operations
- Set Operations.
- union \(\cup\)
- intersection \(\cap\)
- set difference \(\setminus\)
- symmetric difference \(\triangle\)
- complementation
 
- Concatenation. The concatenation of two languages \(A\) and \(B\), written \(AB\), is the set of all strings \(xy\) where \(x\in A\) and \(y\in B\). When precision is desired, concatenation is denoted by \(\cdot\), e.g., \(x\cdot y\), \(A\cdot B\).
- Reversal. The reversal of a language \(A\), denoted \(A^R\), is the language \(A^R = \{ w^R : w \in A \}\).
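
For finite languages these operations can be tried out directly with Python sets. This is only a sketch: `concat` and `reverse_language` are names chosen here, and complementation is left out because the complement of a finite language relative to \(\Sigma^*\) is infinite.

```python
def concat(A, B):
    """Return AB, the set of all xy with x in A and y in B."""
    return {x + y for x in A for y in B}


def reverse_language(A):
    """Return A^R, the set of reversals of the strings in A."""
    return {w[::-1] for w in A}


A = {"a", "ab"}
B = {"b", "bb"}
C = {"ab", "b"}

AB = concat(A, B)              # {'ab', 'abb', 'abbb'}
A_reversed = reverse_language(A)   # {'a', 'ba'}

# The set operations are Python's built-in set operators.
union = A | C                  # {'a', 'ab', 'b'}
intersection = A & C           # {'ab'}
difference = A - C             # {'a'}
symmetric_difference = A ^ C   # {'a', 'b'}
```

Here \(AB\) has only three strings although \(A\) and \(B\) have two each, because \(abb\) arises both as \(a\cdot bb\) and as \(ab\cdot b\).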
 
Language Operations (cont)
- Power. For a language \(A\), denote by \(A^0\) the language \(\{\varepsilon\}\), and denote by \(A^i\) the language \(AA^{i-1}\), whenever \(i > 0\).
- Kleene Closure. \[A^*  = \bigcup_{i=0}^\infty A^i = A^0 \cup A^1 \cup A^2 \cup \cdots\] Note that this definition agrees with our previous definition of \(\Sigma^*\).
- Positive Closure. \[A^+ = \bigcup_{i=1}^\infty A^i = A^1 \cup A^2 \cup A^3 \cup \cdots\]
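
The closures are infinite unions, so a program can only compute a finite part of them. The sketch below computes \(A^i\) for a finite language \(A\) and the strings of \(A^*\) up to a given length. The names `language_power` and `kleene_up_to` and the length cutoff `max_len` are assumptions of this example, not part of the definitions.

```python
def language_power(A, i):
    """Return A^i, using A^0 = {epsilon} and A^i = A A^(i-1)."""
    result = {""}
    for _ in range(i):
        result = {x + y for x in A for y in result}
    return result


def kleene_up_to(A, max_len):
    """Return the strings of A* whose length is at most max_len.

    Only the powers A^0, ..., A^max_len are needed: a string of length
    at most max_len is a concatenation of at most max_len nonempty
    factors, and empty factors can be dropped.
    """
    result = set()
    for i in range(max_len + 1):
        result |= {w for w in language_power(A, i) if len(w) <= max_len}
    return result
```

For example, `kleene_up_to({"a", "b"}, 2)` returns the seven strings of length at most 2 over \(\{a, b\}\), in agreement with the earlier listing of \(\Sigma^*\). The approach is meant for small examples only, since the powers grow quickly.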
 
Exercises
- What is \(\emptyset^*\) equal to?
- Is it true that \(A^+ = A^* \setminus \{\varepsilon\}\) for every language \(A\)?
- Characterize the languages \(A\) that satisfy \(A^* = A^+\).
- Describe these languages:
- \(A\emptyset\)
- \(A \{\varepsilon\}\)
- \(A\cup\emptyset\)
- \(A\cup\{\varepsilon\}\).
 
- What is the identity element under \(\cup\)? under \(\cdot\)?
- What rules are obeyed by \(\cup\) and \(\cdot\)?