Formal Regular Expressions
Part of my notes from ITEC 420 "Computability Theory and Formal Languages" at Radford University, Fall 2015
Syntax
A regular expression is a string of symbols that describes a language (as an arithmetic expression is a string of symbols that describes a numeric value).
For some alphabet Σ, a valid (syntactically correct) regular expressions R is:
R→∅∣ϵ∣symbol∣R∗∣R+∣(R)∣RR∣R∪R
symbol→any symbol in Σ
This definition can be broken up into base cases and recursive cases:
A regular expression is:
- ∅
- ϵ
- Any single symbol in Σ
And if α and β are regular expressions, then so are:
- αβ
- α∪β
- α∗
- α+
- (α)
Semantics
Assume L is a function that, given some regular expressions, returns the equivalent regular language:
L : reg. expr. -> language
Then each of the forms given in the syntactic definition above has the following meanings:
- L(∅)={} (the empty language)
- L(ϵ)={ϵ} (language containing the empty string)
- ∃c∈Σ→L(c)={c} (the language containing c)
- L(αβ)=L(α)L(β) (concatenation)
- L(α∪β)=L(α)∪L(β) (union)
- L(α∗)=(L(α))∗ (Kleene star)
- L(α+)=L(αα∗)=L(α)(L(α))∗ (Kleene plus)
- L((α))=L(α) (grouping with parenthesis)
Operators
Precedence
From highest to lowest:
- Kleene star
- Concatenation
- Union
So a∗b∪c=((a∗)b)∪c
Properties
Union
Regular expressions describe sets, so union behaves as with sets...
- Commutative: a∪b=b∪a
- Associative: (a∪b)∪c=a∪(b∪c)
- Idempotent: a∪a=a
- ∅ for identity: a∪∅=a
Concatenation
- Associative: (ab)c=a(bc)
- ϵ for identity: aϵ=a
- ∅ for "zero": a∅=∅
- Distributes over union: (a∪b)c=(ac)∪(bc), c(a∪b)=(ca)∪(cb)
Kleene star
(these are... less intuitive)
- ∅∗=ϵ
- ϵ∗=ϵ
- (a∗)∗=a∗
- a∗a∗=a∗
- (a∪b)∗=(a∗b∗)∗
- a∗⊆b∗⇒a∗b∗=b∗a∗=b∗
- a⊆b∗⇒a∗b∗=b∗a∗=b∗