Notes for Formal Languages and Automata Theory

阅读: 评论:0

Notes for Formal Languages and Automata Theory

Notes for Formal Languages and Automata Theory

Operations : union , concatenation , Kleene closure, positive closure

Alphabet

Non-empty finite set of symbols
Σ Sigma Σ

Strings

finite sequence of symbols of Σ Sigma Σ
(word)

empty string : ε varepsilon ε

Σ ∗ Sigma^* Σ∗ : the set of all strings over alphabet Σ Sigma Σ (including ε varepsilon ε)

Σ ∗ Sigma^* Σ∗ is monoid with respect to concatenation

Languages

A language over alphabet Σ Sigma Σ is a subset (which can be infinite) of Σ ∗ Sigma^* Σ∗
(personally, similar to vocabulary)

Kleene star or Kleene closure of a language L L L
L ∗ = ∪ n ≥ 0 L n L^* = cup_{ngeq 0}L^n L∗=∪n≥0​Ln

which is consistent with the notation of Σ ∗ Sigma^* Σ∗ by considering Σ Sigma Σ as a language over alphabet Σ Sigma Σ

Positive closure of a language L L L
L + = ∪ n ≥ 1 L n L^+ = cup_{ngeq 1} L^n L+=∪n≥1​Ln

Some Properties

P1 Concatenation is associative

P2 Concatenation is not commutative

P3 L { ε } = { ε } L = L L{varepsilon}={varepsilon}L=L L{ε}={ε}L=L

P4 L ∅ = ∅ L = ∅ Lvarnothing=varnothing L= varnothing L∅=∅L=∅

P5 Distributive properties

  1. ( L 1 ∪ L 2 ) L 3 = L 1 L 3 ∪ L 2 L 3 (L_1cup L_2)L_3 = L_1L_3cup L_2L_3 (L1​∪L2​)L3​=L1​L3​∪L2​L3​
  2. L 1 ( L 2 ∪ L 3 ) = L 1 L 2 ∪ L 1 L 3 L_1(L_2cup L_3) = L_1L_2cup L_1L_3 L1​(L2​∪L3​)=L1​L2​∪L1​L3​

P6 if L 1 ⊆ L 2 L_1subseteq L_2 L1​⊆L2​ and L 3 ⊆ L 4 L_3subseteq L_4 L3​⊆L4​, then L 1 L 3 ⊆ L 2 L 4 L_1L_3 subseteq L_2L_4 L1​L3​⊆L2​L4​

P7 ∅ ∗ = { ε } varnothing^*={varepsilon} ∅∗={ε}

P8 { ε } ∗ = { ε } {varepsilon}^*={varepsilon} {ε}∗={ε}

P9 if ε ∈ L varepsilon in L ε∈L, then L ∗ = L + L^*=L^+ L∗=L+

P10 L L ∗ = L ∗ L = L + LL^*=L^*L=L^+ LL∗=L∗L=L+

P11 ( L ∗ ) ∗ = L ∗ (L^*)^*=L^* (L∗)∗=L∗

P12 L ∗ L ∗ = L ∗ L^*L^*=L^* L∗L∗=L∗

P13 ( L 1 L 2 ) ∗ L 1 = L 1 ( L 2 L 1 ) ∗ (L_1L_2)^*L_1=L_1(L_2L_1)^* (L1​L2​)∗L1​=L1​(L2​L1​)∗

P14 ( L 1 ∪ L 2 ) ∗ = ( L 1 ∗ L 2 ∗ ) ∗ (L_1cup L_2)^*=(L_1^*L_2^*)^* (L1​∪L2​)∗=(L1∗​L2∗​)∗

Regular Expressions

  1. ∅ varnothing ∅, ε varepsilon ε, and a a a for each a ∈ Σ a in Sigma a∈Σ are regular expressions representing the language for ∅ varnothing ∅, { ε } {varepsilon} {ε}, and { a } {a} {a} respectively
  2. a. ( r + s ) (r+s) (r+s) representing the language R ∪ S Rcup S R∪S
    b. ( r s ) (rs) (rs) representing the language R S RS RS
    c. ( r ∗ ) (r^*) (r∗) representing the language R ∗ R^* R∗

(keep minimum possible number of parenthesis in writing)

regular language L ( r ) L(r) L(r) represented by regular expression r r r
sometimes just use r r r instead of L ( r ) L(r) L(r) to indicate the regular language L L L

Equivalent
expressions r 1 r_1 r1​ and r 2 r_2 r2​ representing the same language
we write r 1 ≈ r 2 r_1 approx r_2 r1​≈r2​ ( r 1 = r 2 r_1=r_2 r1​=r2​ is also reasonable)

Some identities

  1. r ε ≈ ε r ≈ r rvarepsilonapprox varepsilon rapprox r rε≈εr≈r
  2. r 1 r 2 ≉ r 2 r 1 r_1r_2notapprox r_2r_1 r1​r2​​≈r2​r1​, in general
  3. ( r 1 r 2 ) r 3 ≈ r 1 ( r 2 r 3 ) (r_1r_2)r_3approx r_1(r_2r_3) (r1​r2​)r3​≈r1​(r2​r3​)
  4. r ∅ ≈ ∅ r ≈ ∅ rvarnothingapprox varnothing r approx varnothing r∅≈∅r≈∅
  5. ∅ ∗ ≈ ε varnothing^*approx varepsilon ∅∗≈ε
  6. ε ∗ ≈ ε varepsilon^*approx varepsilon ε∗≈ε
  7. if { ε } ∈ L ( r ) {varepsilon} in L(r) {ε}∈L(r) then r ∗ ≈ r + r^*approx r^+ r∗≈r+
  8. r r ∗ ≈ r ∗ r ≈ r + rr^*approx r^*r approx r^+ rr∗≈r∗r≈r+
  9. Distributive identity.
    ( r 1 + r 2 ) r 3 ≈ r 1 r 3 + r 2 r 3 (r_1+r_2)r_3approx r_1r_3+r_2r_3 (r1​+r2​)r3​≈r1​r3​+r2​r3​
    r 1 ( r 2 + r 3 ) ≈ r 1 r 2 + r 1 r 3 r_1(r_2+r_3)approx r_1r_2+r_1r_3 r1​(r2​+r3​)≈r1​r2​+r1​r3​
  10. ( r ∗ ) ∗ ≈ r ∗ (r^*)^*approx r^* (r∗)∗≈r∗
  11. ( r 1 r 2 ) ∗ r 1 ≈ r 1 ( r 2 r 1 ) ∗ (r_1r_2)^*r_1approx r_1(r_2r_1)^* (r1​r2​)∗r1​≈r1​(r2​r1​)∗
  12. ( r 1 + r 2 ) ∗ ≈ ( r 1 ∗ r 2 ∗ ) ∗ (r_1+r_2)^*approx (r_1^*r_2^*)^* (r1​+r2​)∗≈(r1∗​r2∗​)∗

Grammars

Context-Free Grammars (CFG)

A grammar is a quadruple
G = ( V , T , S , P ) mathcal{G}=(V, T, S, P) G=(V,T,S,P)
where

  1. V = N ∪ T V=N cup T V=N∪T, where N N N is a finite set of non-terminals
  2. T T T is a finite set of terminals
  3. S ∈ N S in N S∈N is the start symbol
  4. P P P is a finite set of N × V ∗ N times V^* N×V∗, called the set of production rules.

Some definitions

  • Consider a binary relation ⇒ G Rightarrow_{mathcal{G}} ⇒G​ on V ∗ V^* V∗, let’s say, one step relation.
  • α ⇒ G β alpha Rightarrow_{mathcal{G}} beta α⇒G​β iff. α = α 1 A α 2 , β = α 1 γ α 2 alpha=alpha_1Aalpha_2, beta=alpha_1gammaalpha_2 α=α1​Aα2​,β=α1​γα2​ and A → γ ∈ P Arightarrow gamma in P A→γ∈P.
  • if α ⇒ G β alpha Rightarrow_{mathcal{G}} beta α⇒G​β, then we call α alpha α yields β beta β in one step in G mathcal{G} G.
  • The reflexive-transitive closure(自反传递闭包)of ⇒ G Rightarrow_{mathcal{G}} ⇒G​ is denoted by ⇒ G ∗ Rightarrow_{mathcal{G}}^* ⇒G∗​. That is, for α , β ∈ V ∗ alpha,beta in V^* α,β∈V∗, α ⇒ G ∗ β alpha Rightarrow_{mathcal{G}}^* beta α⇒G∗​β iff. ∃ n ≥ 0 exist ngeq 0 ∃n≥0 and α 1 , α 2 , . . . , α n ∈ V ∗ alpha_1,alpha_2,...,alpha_n in V^* α1​,α2​,...,αn​∈V∗ such that α = α 0 ⇒ G α 1 ⇒ G α 2 ⇒ G . . . ⇒ G α n − 1 ⇒ G α n = β alpha=alpha_0Rightarrow_{mathcal{G}}alpha_1Rightarrow_{mathcal{G}}alpha_2Rightarrow_{mathcal{G}}...Rightarrow_{mathcal{G}}alpha_{n-1}Rightarrow_{mathcal{G}}alpha_n=beta α=α0​⇒G​α1​⇒G​α2​⇒G​...⇒G​αn−1​⇒G​αn​=β
  • if α ⇒ G ∗ β alphaRightarrow_{mathcal{G}}^*beta α⇒G∗​β, then we say α alpha α derives β beta β. Further, α ⇒ G ∗ β alphaRightarrow_{mathcal{G}}^*beta α⇒G∗​β is called as a derivation in G mathcal{G} G, and β beta β is the yield of the derivation.
  • In a given context, if we’re dealing with only one grammar G mathcal{G} G, then simply write ⇒ Rightarrow ⇒ instead of ⇒ G Rightarrow_{mathcal{G}} ⇒G​
  • if α = α 0 ⇒ G α 1 ⇒ G α 2 ⇒ G . . . ⇒ G α n − 1 ⇒ G α n = β alpha=alpha_0Rightarrow_{mathcal{G}}alpha_1Rightarrow_{mathcal{G}}alpha_2Rightarrow_{mathcal{G}}...Rightarrow_{mathcal{G}}alpha_{n-1}Rightarrow_{mathcal{G}}alpha_n=beta α=α0​⇒G​α1​⇒G​α2​⇒G​...⇒G​αn−1​⇒G​αn​=β is a derivation, then the length of the derivation is n n n and it maybe written as α ⇒ G n β alphaRightarrow_{mathcal{G}}^nbeta α⇒Gn​β
  • A string α ∈ V ∗ alpha in V^* α∈V∗ is said to be a sentential form in G mathcal{G} G, if α alpha α can be derived from the start symbol S S S of G mathcal{G} G, viz. S ⇒ ∗ α SRightarrow^*alpha S⇒∗α
  • In particular, if α ∈ Σ ∗ alpha in Sigma^* α∈Σ∗, then the sentential form α alpha α is known as a sentence. In which case, we say α alpha α is generated by G mathcal{G} G
  • The language generated by G mathcal{G} G, denoted by L ( G ) L(mathcal{G}) L(G), is the set of all sentences generated by G mathcal{G} G. That is, L ( G ) = { x ∈ Σ ∗ ∣ S ⇒ ∗ x } L(mathcal{G})={xin Sigma^*|SRightarrow^*x} L(G)={x∈Σ∗∣S⇒∗x}

Production rules A → α Arightarrow alpha A→α is independent of neighbouring symbols, i.e. context free.

Thus the grammar defined here is known as context free grammar, simply CFG.

Derivation Trees

also known as parse trees
the yield α alpha α of derivation A ⇒ ∗ α ARightarrow^*alpha A⇒∗α can be identified in the derivation tree by juxtaposing the lables of leaf nodes from left to right.

Ambiguity

ambiguous grammar/unambiguous grammar
inherently ambiguous language

Regular Grammars

linear grammar : every production rule has at most one nonterminal symbol
left-linear (right-linear) grammar : the nonterminal symbol is at the left end (right end)

The language generated by a right linear grammar is regular.
For every regular language L L L, there exists a right linear grammar that generates L L L

Right linear grammars are also called regular grammars

Things also hold for left linear grammar.

Directed Graph Representation

Directed graph (short for digraph)

Chomsky Normal Form (CNF)

All productions are of the form A → B C Arightarrow BC A→BC or A → a Arightarrow a A→a, where A , B , C A,B,C A,B,C are variables, and a a a is terminal symbol.

Greibach Normal Form (GNF)

All productions are shown as the following form :
A → a X , w h e r e a ∈ T , X ∈ V ∗ A rightarrow aX, where ain T, Xin V^* A→aX, where a∈T,X∈V∗

Simplification of CFG

“3R”

  1. Reduction (two phases, ) (Removal of useless production)
  2. Removal of unit production
  3. Removal of ϵ epsilon ϵ-production

Finite Automata (FA)

Deterministic Finite Automata (DFA)

A (deterministic) finite automaton A = ( Q , Σ , q 0 , δ , F ) mathscr{A}=(Q,Sigma,q_0,delta,F) A=(Q,Σ,q0​,δ,F)
Q Q Q is a finite set called the set of states
Σ Sigma Σ is a finite set called the input alphabet
q 0 ∈ Q q_0 in Q q0​∈Q, called the initial/start state
δ : Q × Σ → Q delta : Q times Sigma rightarrow Q δ:Q×Σ→Q is a function called transition function or next-state function (which can be extended, say δ ^ : Q × Σ ∗ → Q hatdelta : Q times Sigma^*rightarrow Q δ^:Q×Σ∗→Q)
F ⊆ Q F subseteq Q F⊆Q, called the set of final/accept states

Useful graphical representation : Transition Diagram
which can be constructed as follows:

  1. Every state in Q Q Q is represented by a node
  2. If δ ( p , a ) = q delta(p,a)=q δ(p,a)=q, then there’s an arc from p p p to q q q labeled a a a
  3. There’s an arrow with no source into the initial state q 0 q_0 q0​
  4. Final states are indicated by double circle

“accepted by”

Language of a DFA : L ( A ) L(mathscr{A}) L(A)

Configurations
C ∈ Q × Σ ∗ C in Qtimes Sigma^* C∈Q×Σ∗

the computation of A mathscr{A} A on the input x x x

Nondeterministic Finite Automata (NFA)

NFA

ε varepsilon ε-NFA

A nondeterministic finite automaton N = ( Q , Σ , q 0 , δ , F ) mathscr{N}=(Q,Sigma,q_0,delta,F) N=(Q,Σ,q0​,δ,F), where Q , Σ , q 0 , F Q,Sigma,q_0,F Q,Σ,q0​,F are as in a DFA; whereas, the transition function δ delta δ is as below:
δ : Q × ( Σ ∪ { ε } ) → ℘ ( Q ) delta : Q times (Sigma cup {varepsilon}) rightarrow wp(Q) δ:Q×(Σ∪{ε})→℘(Q)
is a function so that, for a given state an a input symbol (possibly ε varepsilon ε), δ delta δ assigns a set of next states, possibly empty set.

δ delta δ can also be extended to δ ^ hatdelta δ^

Every DFA can be treated as an NFA.

L ( N ) = { x ∈ Σ ∗ ∣ δ ^ ( q 0 , x ) ∩ F ≠ ∅ } L(mathscr{N})={x in Sigma^* | hatdelta(q_0,x) cap F neq varnothing } L(N)={x∈Σ∗∣δ^(q0​,x)∩F​=∅}

Equivalence of RE & FA

Pushdown Automata (PDA)

P = ( Q , Σ , Γ , δ , q 0 , Z 0 , F ) P = (Q,Sigma,Gamma,delta,q_0,Z_0,F) P=(Q,Σ,Γ,δ,q0​,Z0​,F)
Γ Gamma Γ : A finite stack alphabet.
Z 0 Z_0 Z0​ : The start symbol in the stack.

Equivalence of CFG & PDA

Turing Machine

< L = { a n b n c n ∣ n ≥ 0 } L = { a^nb^nc^n | ngeq 0} L={anbncn∣n≥0}
not a CFL

Can be recognized by “A modified PDA with 2 stacks” - an variation of TM

TM is a seven-tuple P = ( Q , Σ , Γ , δ , q 0 , B , F ) P=(Q, Sigma, Gamma, delta, q_0, B, F) P=(Q,Σ,Γ,δ,q0​,B,F)

  • Q Q Q is finite set of states
  • Σ Sigma Σ is finite set of input symbols
  • Γ Gamma Γ is finite set of tape symbols
  • d e l t a delta delta is transition function : Q × Γ ⇒ Q × Γ × { R , L } Qtimes Gamma Rightarrow Qtimes Gamma times {R,L} Q×Γ⇒Q×Γ×{R,L}, δ ( q , X ) = ( p , Y , → / ← ) delta(q, X)=(p, Y, rightarrow/leftarrow) δ(q,X)=(p,Y,→/←)
  • q 0 q_0 q0​ is start state
  • B B B is blank symbol
  • F F F is finite set of final states

Instantaneous

How to describe the configuration of TM

  • sequence of symbols in tape
  • state of TM
  • read/write head of TM
< X 1 . . . X i − 1 q X i . . . X n X_{i-1}X_n X1​...Xi−1​qXi​...Xn​, where q ∈ Q qin Q q∈Q

Language of TM

{ w ∣ q 0 w − > α p β , p ∈ F , α , β ∈ Γ ∗ } {w|q_0w -> alpha pbeta , pin F, alpha,beta in Gamma^*} {w∣q0​w−>αpβ,p∈F,α,β∈Γ∗}

The kind of languages accepted by TM is called recursively enumerable(RE) language. (递归可枚举语言)

本文发布于:2024-01-30 18:29:07,感谢您对本站的认可!

本文链接:https://www.4u4v.net/it/170661054821972.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:Formal   Notes   Languages   Theory   Automata
留言与评论(共有 0 条评论)
   
验证码:

Copyright ©2019-2022 Comsenz Inc.Powered by ©

网站地图1 网站地图2 网站地图3 网站地图4 网站地图5 网站地图6 网站地图7 网站地图8 网站地图9 网站地图10 网站地图11 网站地图12 网站地图13 网站地图14 网站地图15 网站地图16 网站地图17 网站地图18 网站地图19 网站地图20 网站地图21 网站地图22/a> 网站地图23