San Skulrattanakulchai
Feb 9, 2016
Formal languages are our models for the data manipulated by computers.
A symbol (or letter) is an undefined term.
An alphabet is a nonempty, finite set of symbols, e.g., if Σ = {a, b} then Σ is the alphabet while a and b are symbols.
A string (word, sentence) is a finite list of symbols chosen from an alphabet, e.g., ⟨a, b, a⟩, usually written aba.
The length of string w, denoted ∣w∣, is the length of the list.
The empty string ɛ has length 0.
Some branches of formal language theory also study infinite-length strings; we don't.
If $|w| = n$, we write w as $w_1 w_2 \cdots w_n$. E.g., letting w = aba, we have $w_1 = w_3 = a$ and $w_2 = b$.
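A minimal Python sketch of length and 1-based indexing, assuming strings over Σ are represented as ordinary Python `str` values whose characters are the symbols. Python indexes from 0, so the hypothetical helper `symbol_at` shifts by one to match the notation $w_i$.

```python
def length(w: str) -> int:
    """Return |w|, the number of symbols in w."""
    return len(w)


def symbol_at(w: str, i: int) -> str:
    """Return w_i, using 1-based positions as in w = w_1 w_2 ... w_n."""
    if not 1 <= i <= len(w):
        raise IndexError(f"position {i} is outside 1..{len(w)}")
    return w[i - 1]  # Python's own indexing starts at 0


w = "aba"
print(length(w))                                          # 3
print(symbol_at(w, 1), symbol_at(w, 2), symbol_at(w, 3))  # a b a
print(length(""))                                         # 0 (the empty string ɛ)
```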
For any string w and symbol a, we write $|w|_a$ to denote the number of times the symbol a occurs in string w. E.g., $|aba|_a = 2$, $|aba|_b = 1$, and $|aba|_c = 0$.
… a and b)? Keep them separate in your mind!
Let $x = x_1 x_2 \cdots x_m$ and $y = y_1 y_2 \cdots y_n$ be strings. The concatenation of x and y, written xy, is the string $x_1 x_2 \cdots x_m y_1 y_2 \cdots y_n$ of length m + n that results from appending y to the end of x, e.g., concatenating back and bone gives backbone.
String concatenation is associative, i.e., (xy)z = x(yz) for all strings x, y, z, and ɛ is its identity element, i.e., ɛw = wɛ = w for any string w.
∴ the set of all strings over an alphabet is a monoid under concatenation.
If w is a string and n is a positive integer, we write $w^n$ to mean the concatenation of n copies of w. The notation $w^0$ is defined to be ɛ.
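The monoid laws and the power notation can be spot-checked in Python, assuming strings are `str` values, so that concatenation is `+` and ɛ is `""`. The names `concat` and `power` are illustrative; `power` follows the recursive reading $w^0 = ɛ$, $w^n = w\,w^{n-1}$ rather than Python's built-in `w * n`, which yields the same string.

```python
EPSILON = ""  # the empty string ɛ


def concat(x: str, y: str) -> str:
    """Return xy: the symbols of y appended to the end of x."""
    return x + y


def power(w: str, n: int) -> str:
    """Return w^n, with w^0 = ɛ and w^n = w w^(n-1) for n > 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return EPSILON
    return concat(w, power(w, n - 1))


x, y, z = "back", "bo", "ne"
assert concat(x, concat(y, z)) == concat(concat(x, y), z) == "backbone"  # associativity
assert concat(EPSILON, x) == concat(x, EPSILON) == x                     # ɛ is the identity
print(power("ab", 3))        # ababab
print(repr(power("ab", 0)))  # ''
```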
A string y is a substring (or subword) of string w if there exist strings x, z such that w = xyz.
A string x is a prefix of string w if there exists a string y such that w = xy.
A string y is a suffix of string w if there exists a string x such that w = xy.
By definition,
the empty string is a substring, prefix, and suffix of any string
any string is a substring, prefix, and suffix of itself
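The three definitions can be transcribed almost literally by trying every way of splitting w into two pieces. This is a deliberately naive sketch with hypothetical helper names; in practice Python's `w.startswith(x)`, `w.endswith(y)`, and `y in w` give the same answers, including for the empty string.

```python
def splits(w: str):
    """Yield every pair (u, v) with w = uv, including u = ɛ and v = ɛ."""
    for k in range(len(w) + 1):
        yield w[:k], w[k:]


def is_prefix(x: str, w: str) -> bool:
    """x is a prefix of w iff w = xy for some string y."""
    return any(u == x for u, _ in splits(w))


def is_suffix(y: str, w: str) -> bool:
    """y is a suffix of w iff w = xy for some string x."""
    return any(v == y for _, v in splits(w))


def is_substring(y: str, w: str) -> bool:
    """y is a substring of w iff w = xyz for some x, z,
    i.e., iff y is a prefix of some suffix of w."""
    return any(is_prefix(y, v) for _, v in splits(w))


w = "backbone"
print(is_prefix("back", w), is_suffix("bone", w), is_substring("kbo", w))  # True True True
print(is_prefix("", w), is_suffix(w, w), is_substring("", ""))            # True True True
```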
String x is a subsequence of string y if x can be obtained by deleting zero or more symbols from y, keeping the remaining symbols in their original order. E.g., bat is a subsequence of habitat.
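A common way to test the subsequence relation is a single greedy left-to-right scan of y. This is a standard technique rather than anything prescribed in these notes, and `is_subsequence` is an illustrative name.

```python
def is_subsequence(x: str, y: str) -> bool:
    """Return True iff x can be obtained from y by deleting zero or more symbols.

    Greedy scan: match each symbol of x with its earliest possible
    occurrence in y, moving left to right.
    """
    i = 0  # number of symbols of x matched so far
    for symbol in y:
        if i < len(x) and symbol == x[i]:
            i += 1
    return i == len(x)


print(is_subsequence("bat", "habitat"))  # True
print(is_subsequence("tab", "habitat"))  # False: same letters, wrong order
print("bat" in "habitat")                # False: bat is not a substring
```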
Let $w = w_1 w_2 \cdots w_n$ be a string of length n. By the reverse of w, notated $w^R$, we mean the string $w_n w_{n-1} \cdots w_1$. For example, $\mathrm{star}^R = \mathrm{rats}$.
A string w is a palindrome if $w^R = w$. Examples of palindromes are eve, madam, racecar, deified, and rotator.
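Reversal and the palindrome test translate directly into Python, assuming strings are `str` values; `reverse` and `is_palindrome` are illustrative names, and the slice `w[::-1]` reads w from its last symbol to its first.

```python
def reverse(w: str) -> str:
    """Return w^R = w_n w_(n-1) ... w_1."""
    return w[::-1]


def is_palindrome(w: str) -> bool:
    """A string is a palindrome iff it equals its own reverse."""
    return reverse(w) == w


print(reverse("star"))  # rats
print([p for p in ["eve", "madam", "racecar", "deified", "rotator", "star"] if is_palindrome(p)])
# ['eve', 'madam', 'racecar', 'deified', 'rotator']
```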
Given alphabet Σ, define $\Sigma^*$ to be the set of all strings over Σ. E.g., if Σ = {a, b} then $\Sigma^* = \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \ldots\}$.
The listing of strings above is in shortlex order (string order, radix order): shorter strings always come first, and strings of the same length are ordered as in a dictionary.
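Because $\Sigma^*$ is infinite, a program can only list it lazily. A minimal Python sketch of the shortlex listing, assuming the symbols are single characters ordered by Python's built-in string comparison; `shortlex` is a hypothetical name.

```python
from itertools import count, islice, product


def shortlex(sigma):
    """Yield every string over the alphabet sigma in shortlex order.

    Strings come out length by length (0, 1, 2, ...); within one length,
    itertools.product over the sorted alphabet yields them in
    dictionary order.
    """
    symbols = sorted(sigma)
    for n in count(0):
        for letters in product(symbols, repeat=n):
            yield "".join(letters)


print(list(islice(shortlex({"a", "b"}), 8)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']
```

Generating strings length by length is what makes every string appear after finitely many steps.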
Define precisely the less than relation < for dictionary order (lexicographic order).
Define precisely the less than relation < for shortlex order (string order, radix order).
What is the position of the string ab when the strings of $\{a, b\}^*$ are arranged in dictionary order? In shortlex order?
A language over the alphabet Σ is any subset of $\Sigma^*$.
Some example languages:
The set of all strings with an odd number of a's.
The set of all palindromes.
The set of all strings of "balanced" left and right parentheses.
The set of all strings with equal numbers of a's, b's, and c's.
The set of all binary strings that represent prime numbers.
The set of all graphs with a Hamiltonian cycle, where the graph is encoded as a string.
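Five of the example languages can be written as Python membership tests. This is a sketch under stated assumptions: the function names are hypothetical, strings are Python `str` values, and for the prime language leading zeros are allowed (so 011 counts as 3). The Hamiltonian-cycle language is left out because it depends on a graph encoding that the notes do not fix.

```python
def odd_as(w: str) -> bool:
    """Strings with an odd number of a's."""
    return w.count("a") % 2 == 1


def palindrome(w: str) -> bool:
    """Strings equal to their own reverse."""
    return w == w[::-1]


def balanced(w: str) -> bool:
    """Strings over {(, )} whose parentheses are balanced."""
    depth = 0  # number of currently unmatched "("
    for symbol in w:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:  # a ")" with no "(" left to match
                return False
        else:
            return False  # symbol outside the alphabet {(, )}
    return depth == 0


def equal_abc(w: str) -> bool:
    """Strings with equal numbers of a's, b's, and c's."""
    return w.count("a") == w.count("b") == w.count("c")


def binary_prime(w: str) -> bool:
    """Binary strings whose value is prime (assumption: leading zeros allowed)."""
    if w == "" or any(s not in "01" for s in w):
        return False
    n = int(w, 2)
    if n < 2:
        return False
    d = 2
    while d * d <= n:  # trial division up to sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True


print(balanced("(()())"), balanced("())("))       # True False
print(binary_prime("101"), binary_prime("1001"))  # True False
```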
∅ and {ɛ} are different languages: ∅ contains no strings at all, while {ɛ} contains exactly one string, the empty string.
The subject matter of this course is languages and machines that recognize/compute them!
Finite languages are trivial: a machine can recognize one simply by comparing its input against a fixed, finite list of strings.
A lone letter like a is ambiguous: it represents either a symbol or a string of length 1. Context decides which meaning is intended.
The concepts of "string", "concatenation", "string length", "string reversal", etc., can be defined inductively.
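One common way to set up these inductive definitions (an assumption; the notes do not fix the details) treats every nonempty string as ua, a shorter string u followed by a last symbol a. The recursive Python functions below mirror that, using slicing only to split off the last symbol and `+` only to attach a single symbol; the names are illustrative.

```python
def length(w: str) -> int:
    # |ɛ| = 0;  |ua| = |u| + 1
    if w == "":
        return 0
    return length(w[:-1]) + 1


def reverse(w: str) -> str:
    # ɛ^R = ɛ;  (ua)^R = a u^R
    if w == "":
        return ""
    return w[-1] + reverse(w[:-1])


def concat(x: str, y: str) -> str:
    # xɛ = x;  x(ua) = (xu)a
    if y == "":
        return x
    return concat(x, y[:-1]) + y[-1]


print(length("aba"), reverse("star"), concat("back", "bone"))  # 3 rats backbone
```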
Set Operations: union ∪, intersection ∩, difference \, symmetric difference △, and the complement Ā of a language A (taken relative to $\Sigma^*$).
Concatenation: The concatenation of two languages A and B is the language AB, i.e., the set of all strings xy where x ∈ A and y ∈ B. When precision is desired, concatenation is denoted by ∘, e.g., x ∘ y, A ∘ B.
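For finite languages, AB can be computed straight from the definition. A minimal sketch, assuming languages are Python sets of strings; `concat_languages` is a hypothetical name.

```python
def concat_languages(A: set[str], B: set[str]) -> set[str]:
    """Return AB = {xy : x in A, y in B} for finite languages A and B."""
    return {x + y for x in A for y in B}


A = {"a", "ab"}
B = {"b", ""}
print(sorted(concat_languages(A, B)))  # ['a', 'ab', 'abb']
```

Note that AB has only 3 strings although there are 4 pairs (x, y): ab arises both as a∘b and as ab∘ɛ.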
Let O = {all strings of odd length}, E = {all strings of even length}, and N = {a}. Find ON, OE, and EE.
Answer:
ON = { all strings of even length ending in a }
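OE = O (each xy with |x| odd and |y| even has odd length, so OE ⊆ O; and O = Oɛ ⊆ OE because ɛ ∈ E)
EE = E (each xy with |x| and |y| even has even length, so EE ⊆ E; and E = Eɛ ⊆ EE because ɛ ∈ E)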
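The answers can be sanity-checked by brute force on all strings up to a fixed length. This is a sketch under assumptions: the alphabet {a, b} and the cutoff L = 6 are arbitrary choices, and the helper names are hypothetical. Truncating first loses nothing, because any xy of length at most L has both x and y of length at most L.

```python
from itertools import product

SIGMA = ("a", "b")  # assumption: the notes do not fix the alphabet; any alphabet containing a works
L = 6               # check only strings of length <= L


def strings_up_to(sigma, max_len):
    """All strings over sigma of length 0..max_len."""
    return {"".join(p) for n in range(max_len + 1) for p in product(sigma, repeat=n)}


def concat_languages(A, B):
    """AB = {xy : x in A, y in B}."""
    return {x + y for x in A for y in B}


def upto(X):
    """Keep only the strings of length <= L."""
    return {w for w in X if len(w) <= L}


universe = strings_up_to(SIGMA, L)
O = {w for w in universe if len(w) % 2 == 1}
E = {w for w in universe if len(w) % 2 == 0}
N = {"a"}

assert upto(concat_languages(O, N)) == {w for w in E if w.endswith("a")}
assert upto(concat_languages(O, E)) == O
assert upto(concat_languages(E, E)) == E
print("ON, OE and EE match the answers on all strings of length <=", L)
```

A finite check like this is evidence, not a proof; the inclusion arguments above are what establish the answers.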
Power: For any language A, $A^0 = \{\varepsilon\}$, and $A^i = A A^{i-1}$ whenever i > 0.
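Kleene Closure: $A^* = \bigcup_{i=0}^{\infty} A^i$. E.g., $\emptyset^* = \{\varepsilon\}$, since $\emptyset^0 = \{\varepsilon\}$ while $\emptyset^i = \emptyset$ for every i > 0. Note how this definition of * agrees nicely with our previous definition of $\Sigma^*$ if we identify a string of length one with the symbol contained in it.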
Positive Closure: $A^+ = \bigcup_{i=1}^{\infty} A^i$.
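Since $A^*$ is usually infinite, a program can only compute the part of it up to some length bound. A sketch, assuming A is a finite set of Python strings; `power`, `star`, `plus`, and the cutoff `max_len` are hypothetical names. Stopping at the power i = max_len loses nothing: a string of length at most n lying in some $A^i$ also lies in $A^j$ for some j ≤ n, since its ɛ factors can be dropped.

```python
def power(A, i, max_len):
    """A^i restricted to strings of length <= max_len (A^0 = {ɛ}, A^i = A A^(i-1))."""
    result = {""}
    for _ in range(i):
        result = {x + y for x in A for y in result if len(x) + len(y) <= max_len}
    return result


def star(A, max_len):
    """A* restricted to strings of length <= max_len."""
    return set().union(*(power(A, i, max_len) for i in range(max_len + 1)))


def plus(A, max_len):
    """A+ restricted to strings of length <= max_len (always includes the power i = 1)."""
    return set().union(*(power(A, i, max_len) for i in range(1, max(max_len, 1) + 1)))


print(star(set(), 3))           # {''}
print(sorted(star({"ab"}, 6)))  # ['', 'ab', 'abab', 'ababab']
print(sorted(plus({"ab"}, 6)))  # ['ab', 'abab', 'ababab']
```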
Is it true that $A^+ = A^* \setminus \{\varepsilon\}$ for every language A?
Which of the example languages above satisfy $A = A^*$?
Characterize the languages A that satisfy $A^* = A^+$.
Describe these languages: A∅, A{ɛ}, A ∪ ∅, A ∪ {ɛ}.
The ∪ and the ∘ operators for languages are comparable to the + and the × operators for numbers, respectively.
What is the identity element for ∪ ? for ∘ ?
What rules governing + and × are also obeyed by ∪ and ∘ ?