CMPS 260
Notes on Chapter 2 of Webber (Finite Automata)

Section 2.1 introduces the Farmer, Wolf, Goat, Cabbage, Rowboat problem and suggests that a solution could be found if we devise a state-transition diagram. Each state corresponds to one possible partitioning of the set {farmer, wolf, goat, cabbage} into two subsets that identify the entities on the eastern and western sides of the river, respectively. The transitions correspond to river crossings in which the farmer, possibly with one among the wolf, goat, and cabbage, uses the rowboat to go from one side of the river to the other.

The initial/start state is { East: {f,w,g,c}, West: ∅ }, describing the situation in which all four entities are on the eastern side of the river. The goal state is { East: ∅, West: {f,w,g,c} }, describing the situation in which all four entities are on the western side of the river. (For brevity, we are using 'f', 'w', 'g', and 'c', respectively, as shorthands for "farmer", "wolf", "goat", and "cabbage".)

The complete transition diagram (showing all relevant states, meaning those that are both reachable from the initial state and from which the goal state can be reached) is on Slide 7. Any state in which the goat is alone with either the wolf or the cabbage —meaning without the farmer also being present— is omitted, because the goal state cannot be reached from it (the assumption being that either the goat or the cabbage will end up having been eaten).

In accord with the diagram, a sequence of river crossings (i.e., moves) is described by a string over the alphabet {f,w,g,c}. (Webber uses n (for "nothing", I guess) rather than f as the symbol labeling a transition that represents the farmer crossing the river in the rowboat by himself.) A solution to the problem is any such sequence that takes you from the start state to the goal state.

From the completed diagram, one can easily construct a minimal-length solution, or any other solution, for that matter. There are two minimal-length solutions, namely gfcgwfg and gfwgcfg.

Also, provided with a proposed solution, one can use the diagram to determine whether it really is a solution. All you have to do is to follow the proposed sequence of transitions and see whether it describes a path from the start state to the goal state.

Observe then that the state diagram defines a language over the alphabet {f,w,g,c}, namely the set of all strings that correspond to solutions.
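The checking procedure just described can be sketched in code. The following is only an illustration, not Webber's presentation: it represents a state by the set of entities on the eastern side, uses f for the farmer crossing alone (as in these notes), and treats any impossible or unsafe move as leading to a dead end. The function names step, is_solution, and shortest_solutions are made up for this sketch.

```python
ALL = frozenset("fwgc")  # farmer, wolf, goat, cabbage

def step(east, move):
    """Apply one crossing; return the new east-bank set, or None if the
    move is impossible or leaves the goat with the wolf or the cabbage."""
    here = east if "f" in east else ALL - east    # bank the farmer is on
    if move not in here:                          # passenger must be with the farmer
        return None
    movers = {"f", move}                          # move == "f": farmer rows alone
    new_east = east - movers if "f" in east else east | movers
    left_behind = ALL - new_east if "f" in new_east else new_east
    if "g" in left_behind and ("w" in left_behind or "c" in left_behind):
        return None
    return new_east

def is_solution(moves):
    """Follow the moves from the start state; accept iff we end at the goal."""
    east = ALL                                    # start: everyone on the east side
    for m in moves:
        east = step(east, m)
        if east is None:
            return False
    return not east                               # goal: east side is empty

def shortest_solutions():
    """Breadth-first search, level by level, collecting every shortest solution."""
    level, seen = [(ALL, "")], {ALL}
    while level:
        done = sorted(path for east, path in level if not east)
        if done:
            return done
        next_level = []
        for east, path in level:
            for m in "fwgc":
                nxt = step(east, m)
                if nxt is not None and nxt not in seen:
                    next_level.append((nxt, path + m))
        seen |= {east for east, _ in next_level}
        level = next_level
    return []

print(shortest_solutions())
```

Under these assumptions the search reports the same two minimal-length solutions named above, and is_solution plays the role of tracing a proposed string through the diagram.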

In Section 2.2, the notion of an error state, more commonly referred to as a dead state, is introduced. For the problem under consideration, you can think of the dead state as representing all those situations in which the goat has been left alone in the presence of either the wolf or the cabbage, without the farmer also being there. No solution can describe a move sequence that passes through any such state.

In the state diagram in Slide 7, the error/dead state was omitted, but for purposes of completeness and unambiguity, it can be included. (See Slide 11.) By doing so, we get the property that for every state/symbol pair, there is a transition from that state labeled by that symbol. Thus, we obtain a deterministic state diagram.

Section 2.3 begins with an informal definition of deterministic finite automaton (DFA). (Often, the word "automaton" is replaced by "machine".) Such a structure is composed of

a set of states, each of which is depicted by a circle (or bubble, or rectangle)
- one of which is designated as the start (or initial) state, depicted by an unlabeled arrow pointing to it.
- zero or more of which are designated to be accepting (or final) states. To indicate such a state, a double circle (or bubble, or rectangle) is used.
for each (state,symbol) pair (p,c), a transition (or move) to some state q, depicted by an arrow, labeled by c, and directed from state p to state q. (For the purposes of reducing clutter, we can label an arrow with multiple symbols, thereby depicting multiple transitions.)

By this definition a DFA is a state transition diagram (i.e., directed graph) in which exactly one state (i.e., vertex/node) is designated as the start/initial state, zero or more states are designated as accepting/final, and every state has exactly one outgoing transition for each member of the alphabet.

A DFA defines a language, namely the set of all strings that describe transition sequences going from the start state to some accepting state. (A transition sequence is described by the string "spelled out" by the labels of the transitions in the sequence.) A string in that set is said to be accepted by the DFA; strings that are not accepted are said to be rejected.

As a very simple example, Slide 16 presents a DFA that accepts the language over {a,b} whose strings end in a. Using a set former, such a language is described by

{ xa | x ∈ {a,b}* }

Unlike transitions, states need not be labeled (i.e., named), but for the purpose of making it possible to refer to states easily, it is helpful to name them. For that matter, it is often helpful to give meaningful names to states (or in some way to augment the diagram with descriptions as to what each state "means") in order to aid the reader's understanding of the DFA's underlying logic.

In the simple DFA on Slide 16, it is clear that the machine will be in the accepting state iff it has read/consumed at least one input symbol and the last such symbol is a. That the language accepted by this DFA is { xa | x ∈ {a,b}* } is thus readily apparent.

This would be a good point to look at the end-of-chapter exercises:

Exercise 1 is flawed and is too easy anyway. (For a given state transition diagram (intended to depict a DFA) and several strings, it asks, for each string, whether it is accepted by the DFA.)
As it stands, the state transition diagram fails to be deterministic because each state has two outgoing transitions labeled by the same symbol and no outgoing transitions labeled by the other. To correct the mistake, the directions of the "horizontal" transitions should be reversed.
Exercise 2 is like Exercise 1, but with a more complicated DFA. It is still very easy. A more interesting problem is to precisely describe the language accepted by the given DFA.
Exercise 3 presents some DFAs and asks the reader to describe each one's language using a set former.
Exercise 4 presents some set formers and, for each one, asks that the reader present a DFA that accepts the described language.
Exercise 5 presents three algebraic descriptions of DFA's (in accord with the 5-tuple model) and asks the reader to draw the corresponding state transition diagrams.
Exercise 6 is the inverse of 5, presenting three DFA's as state transition diagrams and asking for the corresponding algebraic descriptions.
Exercise 7 asks for the languages accepted by the DFA's in Exercise 6 to be described using set formers. All three are extremely easy.
Exercise 8 presents a DFA and asks for several examples of expressions of the form δ*(p,x) to be evaluated.
Exercise 9 is like 4, but with more complex languages.
Exercise 10 asks for algorithms to determine, for a given DFA M, whether L(M) is empty and whether it is finite. It also asks for an algorithm to find a minimum-length member of L(M).

DFA's are rather limited in what features of a string they can ascertain, but they are able to

count occurrences of a symbol (or even a substring) up to a threshold. A special case of this is to ensure that a "required" substring occurs and/or a "prohibited" one does not.
count occurrences of a symbol (or even a substring) modulo any fixed integer n.

Thus, for example, you can design a DFA to accept a language such as

{ x ∈ {a,b}* | #_a(x) is odd ∧ #_b(x) is at least four }

which counts occurrences of a modulo 2 and occurrences of b with respect to a threshold of four. (For a character c and a string z, #_c(z) is the number of occurrences of c within z.)

Exercise: Design such a DFA.

Section 2.5 explains how a DFA can be described in an algebraic style, as a 5-tuple M = (Q, Σ, δ, q₀, F).

Q is the set of states
Σ is the alphabet
δ: Q × Σ ⟶ Q is the transition function, corresponding to the DFA's transitions. δ(p,c) = q means that the outgoing transition from state p on symbol c goes to state q.
q₀ is the start/initial state
F ⊆ Q is the set of accepting/final states.
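As a rough illustration of the 5-tuple, here is one way it might be written down in code. This is a sketch under assumptions: the class name DFA, the helper names, and the state names q0 and q1 for the ending-in-a machine are invented here and need not match the names on the slides.

```python
from typing import NamedTuple

class DFA(NamedTuple):
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    delta: dict            # δ, keyed by (state, symbol) pairs
    start: str             # q₀
    accepting: frozenset   # F

def is_well_formed(m):
    """Check q₀ ∈ Q, F ⊆ Q, and that δ is a total function from Q × Σ to Q."""
    pairs = {(p, c) for p in m.states for c in m.alphabet}
    return (m.start in m.states
            and m.accepting <= m.states
            and set(m.delta) == pairs
            and all(q in m.states for q in m.delta.values()))

def accepts(m, x):
    """Follow the transitions spelled out by x, starting from q₀."""
    state = m.start
    for c in x:
        state = m.delta[(state, c)]
    return state in m.accepting

# Hypothetical naming: q1 means "the last symbol read was a".
ENDS_IN_A = DFA(
    states=frozenset({"q0", "q1"}),
    alphabet=frozenset("ab"),
    delta={("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q1", ("q1", "b"): "q0"},
    start="q0",
    accepting=frozenset({"q1"}),
)

print(is_well_formed(ENDS_IN_A), accepts(ENDS_IN_A, "abba"), accepts(ENDS_IN_A, "ab"))
```

The check in is_well_formed is the determinism requirement in executable form: every (state, symbol) pair has exactly one transition.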

A third way to describe a DFA is in a tabular format. Thus, we have at least three formats that are commonly used to describe a DFA: as a state transition diagram, as a table, and in the 5-tuple algebraic form.

Slide 21 shows a transition diagram (for the ending-in-a language) and the corresponding algebraic description of it.

In class, several DFA's were examined, including one that accepts the language

{ x ∈ {a,b}* | #_a(x) (mod 3) = #_b(x) (mod 2) }

That DFA is shown below as both a state transition diagram and as a table:

Start? | Accepting? | State ID | δ on a | δ on b
✓      | ✓          | [0,0]    | [1,0]  | [0,1]
       |            | [1,0]    | [2,0]  | [1,1]
       |            | [2,0]    | [0,0]  | [2,1]
       |            | [0,1]    | [1,1]  | [0,0]
       | ✓          | [1,1]    | [2,1]  | [1,0]
       |            | [2,1]    | [0,1]  | [2,0]

Assuming that we used M to name this DFA, its 5-tuple description would go something like this:

M = (Q, {a,b}, δ, [0,0], {[0,0], [1,1]}), where Q = {0,1,2}×{0,1} and δ is ...

A verbose description of δ comes first below; following it is a more concise version that exploits the fact that the states are named in a way that allows the transitions to be expressed in accord with the (modular) arithmetic that underlies the DFA's logic.

δ([0,0],a) = [1,0], δ([0,0],b) = [0,1],
δ([1,0],a) = [2,0], δ([1,0],b) = [1,1],
δ([2,0],a) = [0,0], δ([2,0],b) = [2,1],
δ([0,1],a) = [1,1], δ([0,1],b) = [0,0],
δ([1,1],a) = [2,1], δ([1,1],b) = [1,0],
δ([2,1],a) = [0,1], δ([2,1],b) = [2,0]

For all j ∈ {0,1,2} and k ∈ {0,1}: δ([j,k],a) = [(j+1)%3, k] and δ([j,k],b) = [j, (k+1)%2]
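To see that the concise, arithmetic form of δ agrees with the verbose one, and that the resulting machine accepts the intended language, one can check both mechanically. The sketch below assumes states [j,k] are represented as Python tuples (j, k); the names delta, VERBOSE, and accepts are made up for illustration.

```python
from itertools import product

Q = set(product(range(3), range(2)))        # {0,1,2} × {0,1}
START = (0, 0)
F = {(0, 0), (1, 1)}                        # states [j,k] with j = k

def delta(state, c):
    """Concise form: a advances j modulo 3, b advances k modulo 2."""
    j, k = state
    if c == "a":
        return ((j + 1) % 3, k)
    if c == "b":
        return (j, (k + 1) % 2)
    raise ValueError(f"symbol {c!r} is not in the alphabet")

# Verbose form, copied from the twelve equations: state -> (δ on a, δ on b).
VERBOSE = {
    (0, 0): ((1, 0), (0, 1)), (1, 0): ((2, 0), (1, 1)), (2, 0): ((0, 0), (2, 1)),
    (0, 1): ((1, 1), (0, 0)), (1, 1): ((2, 1), (1, 0)), (2, 1): ((0, 1), (2, 0)),
}

def accepts(x):
    state = START
    for c in x:
        state = delta(state, c)
    return state in F

# The two descriptions of δ agree on every (state, symbol) pair.
agree = all(VERBOSE[q] == (delta(q, "a"), delta(q, "b")) for q in Q)

# The DFA matches the set former on every string of length at most 6.
matches = all(
    accepts(x) == (x.count("a") % 3 == x.count("b") % 2)
    for n in range(7)
    for x in map("".join, product("ab", repeat=n))
)
print(agree, matches)
```

The exhaustive check over short strings is not a proof, but it is a quick sanity test of the state naming: state [j,k] records the number of a's modulo 3 and the number of b's modulo 2.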

Section 2.5 goes on to show how to formally define the language accepted by a DFA in terms of the 5-tuple algebraic description of a DFA.

To aid in that, we define an extended transition function δ* : Q × Σ* ⟶ Q. The intent is that, for a state p and a string x,

δ*(p,x) = q corresponds to the statement that, if we begin in state p and follow the sequence of transitions that "spell out" x, we will end in state q. The latter condition is more concisely represented by the notation p ⟶^x q.

δ* can be defined in terms of δ recursively as follows:

δ*(p,λ) = p (base case)

δ*(p,xc) = δ(δ*(p,x),c) (recursive case)

Here, it is understood that x ∈ Σ* and c ∈ Σ.

Suppose, for example, that in some DFA with states p, q, r, and s there is a path p ⟶^aba s of transitions whose labels "spell out" aba like this:

p ⟶^a q ⟶^b r ⟶^a s

Of course, this means that δ(p,a) = q, δ(q,b) = r, and δ(r,a) = s. Then δ*(p,aba) = s should hold. That it does is shown by the following calculation:

δ*(p,aba)
= < applying recursive case with x=ab, c=a >
δ(δ*(p,ab), a)
= < applying recursive case with x=a, c=b >
δ(δ(δ*(p,a), b), a)
= < applying recursive case with x=λ, c=a >
δ(δ(δ(δ*(p,λ), a), b), a)
= < applying base case >
δ(δ(δ(p, a), b), a)
= < applying δ(p,a) = q >
δ(δ(q, b), a)
= < applying δ(q,b) = r >
δ(r, a)
= < applying δ(r,a) = s >
s

Now, recall that, in terms of the state transition graph representing a DFA, the language L(M) accepted by a DFA M is

{ x ∈ Σ* | q₀ ⟶^x s for some s ∈ F}

Equivalently, in terms of the δ* function, that language is

{ x ∈ Σ* | δ*(q₀, x) = s for some s ∈ F}
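The recursive definition of δ* translates almost directly into code. The sketch below is illustrative only: it uses just the three transitions from the p, q, r, s example (so it is a fragment of a DFA rather than a complete one), and it assumes, purely for the sake of the example, that p is the start state and that F = {s}.

```python
# The three transitions from the example; a full DFA would define δ on every pair.
DELTA = {("p", "a"): "q", ("q", "b"): "r", ("r", "a"): "s"}

def delta_star(p, x):
    """Extended transition function, following the recursive definition."""
    if x == "":                                   # base case: δ*(p,λ) = p
        return p
    # recursive case: δ*(p,xc) = δ(δ*(p,x), c), with c the last symbol
    return DELTA[(delta_star(p, x[:-1]), x[-1])]

def in_language(x, start="p", accepting=frozenset({"s"})):
    """x ∈ L(M) iff δ*(q₀, x) ∈ F (start and accepting are assumptions here)."""
    return delta_star(start, x) in accepting

print(delta_star("p", "aba"), in_language("aba"), in_language("ab"))
```

Note that the recursion peels symbols off the right end of the string, exactly as the definition does, even though the transitions end up being applied left to right.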
