CMPS 260 (Theoretical Foundations of CS)
The CYK Algorithm

The CYK algorithm (named for Cocke, Younger, and Kasami, each of whom developed it independently of the others in the mid-1960s) solves the membership problem for context-free grammars in Chomsky Normal Form (CNF). That is, given as input a CFG $G$ in CNF and a string $w$, the algorithm determines whether or not $w \in L(G)$. Because any context-free grammar can be transformed into CNF (with the possible loss of the empty string from the generated language), this gives us a way of solving the membership problem for all CFGs. Assuming that CYK is a boolean function that, given a CNF grammar and a string, returns true iff the string is generated by the grammar, membership of a string $w$ in the language of an arbitrary CFG $G$ can be decided as follows:

function Member_of( G : CFG; w : string ) return boolean is
   -- S denotes the start symbol of G; epsilon denotes the empty string
begin
   if w = epsilon then
      if S is erasable      --(i.e., S derives epsilon)
         then return true;
         else return false;
      end if;
   else  -- w /= epsilon
      G' := CNF grammar generating L(G) - {epsilon};
      return CYK(G',w);
   end if;
end Member_of;
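
The test "S is erasable" asks whether the start symbol derives the empty string. As a rough illustration, the Python sketch below computes the set of erasable nonterminals by repeatedly adding any nonterminal that has a rule whose right-hand side consists entirely of erasable symbols. The representation of rules as (head, body) pairs, the name `erasable_nonterminals`, and the sample grammar are assumptions made for this example and do not come from the notes.

```python
def erasable_nonterminals(rules):
    """Return the set of nonterminals that derive the empty string.

    rules: list of (head, body) pairs, where body is a tuple of symbols
    (an assumed encoding; the empty tuple encodes an epsilon rule).
    """
    erasable = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # A head is erasable if some body consists only of erasable symbols
            # (trivially true for an empty body). Terminals are never erasable.
            if head not in erasable and all(sym in erasable for sym in body):
                erasable.add(head)
                changed = True
    return erasable


# Hypothetical grammar: S -> A B, A -> a | epsilon, B -> b | epsilon
SAMPLE_RULES = [
    ("S", ("A", "B")),
    ("A", ("a",)),
    ("A", ()),
    ("B", ("b",)),
    ("B", ()),
]

print("S" in erasable_nonterminals(SAMPLE_RULES))  # True
```

With this set in hand, the first branch of Member_of reduces to checking whether the start symbol belongs to it.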

Recall that a context-free grammar is said to be in Chomsky normal form if each of its rules has one of the two forms $A \to b$ or $A \to BC$, where $b$ is a terminal symbol and $B$ and $C$ are nonterminals.
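
To make the definition concrete, here is a minimal Python sketch of a check that every rule of a grammar has one of the two permitted forms. The encoding of a rule as a (head, body) pair, the function name `is_cnf`, and the example grammar are assumptions introduced for illustration only.

```python
def is_cnf(rules, nonterminals):
    """Return True iff every rule has the form A -> b or A -> B C.

    rules: list of (head, body) pairs, body being a tuple of symbols
    (assumed encoding). Any symbol not in `nonterminals` is treated
    as a terminal.
    """
    for head, body in rules:
        if head not in nonterminals:
            return False
        if len(body) == 1 and body[0] not in nonterminals:
            continue  # form A -> b, with b a terminal
        if len(body) == 2 and all(sym in nonterminals for sym in body):
            continue  # form A -> B C, with B and C nonterminals
        return False
    return True


# Hypothetical example: S -> A B | B A, A -> a, B -> b
EXAMPLE_NONTERMINALS = {"S", "A", "B"}
EXAMPLE_RULES = [
    ("S", ("A", "B")),
    ("S", ("B", "A")),
    ("A", ("a",)),
    ("B", ("b",)),
]

print(is_cnf(EXAMPLE_RULES, EXAMPLE_NONTERMINALS))  # True
```

Rules such as S -> a B (mixing a terminal and a nonterminal), S -> A (a unit rule), or S -> epsilon are rejected by this check.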

The CYK algorithm is based on the following. Let $G$ be a context-free grammar in Chomsky normal form, and let $w = a_1 a_2 \cdots a_n$ ($a_i \in \Sigma$) be a string over the terminal alphabet $\Sigma$ of $G$. For $i$ and $j$ satisfying $1 \le i \le j \le n$, let $w_{i,j}$ denote the substring $a_i a_{i+1} \cdots a_j$ of $w$ beginning with its $i$-th symbol and ending with its $j$-th symbol, and let $N_{i,j}$ denote the set of nonterminals in $G$ from which $w_{i,j}$ can be derived. That is,

\[ N_{i,j} = \{\, A : A \Rightarrow^{*} w_{i,j} \,\} \]

Lemma 1: For all $i$, $N_{i,i} = \{\, A : A \to a_i \text{ is a rule in } G \,\}$.

Proof: Because $G$ is in CNF, the only way that a string of length one can be derived from a nonterminal symbol is via an application of a rule of the form $A \to b$.
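
Lemma 1 translates almost directly into code. The following Python sketch computes the sets $N_{i,i}$ for a given string by scanning the rules once per symbol. The (head, body) encoding of rules, the name `diagonal_sets`, and the small grammar are assumptions made for this illustration.

```python
# Assumed encoding (not from the notes): a rule is a (head, body) pair,
# where body is a tuple of symbols, e.g. ("A", ("a",)) for A -> a.
RULES = [
    ("S", ("A", "B")),
    ("A", ("a",)),
    ("B", ("b",)),
]


def diagonal_sets(rules, w):
    """Return {(i, i): N[i,i]} for 1 <= i <= len(w), following Lemma 1."""
    return {
        # N[i,i] holds every head A with a rule A -> a_i (a_i is w[i - 1]).
        (i, i): {head for head, body in rules if body == (w[i - 1],)}
        for i in range(1, len(w) + 1)
    }


print(diagonal_sets(RULES, "ab"))  # {(1, 1): {'A'}, (2, 2): {'B'}}
```

Indices are kept 1-based, as in the text, by using (i, i) pairs as dictionary keys.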

Lemma 2: For all $i$ and $j$ satisfying $1 \le i < j \le n$, $A \in N_{i,j}$ if and only if there exist nonterminals $B$ and $C$ and a number $k$ satisfying $i \le k < j$ such that $A \to BC$ is a rule in $G$, $B \in N_{i,k}$, and $C \in N_{k+1,j}$.

Proof: Sufficiency (if):

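\[
\begin{array}{cl}
 & A \to BC \text{ is a rule} \;\land\; B \in N_{i,k} \;\land\; C \in N_{k+1,j} \\
= & \qquad \langle\, \text{by defn of } N \,\rangle \\
 & A \to BC \text{ is a rule} \;\land\; B \Rightarrow^{*} w_{i,k} \;\land\; C \Rightarrow^{*} w_{k+1,j} \\
\Longrightarrow & \qquad \langle\, \text{by properties of derivations} \,\rangle \\
 & A \Rightarrow BC \Rightarrow^{*} w_{i,k}\, C \Rightarrow^{*} w_{i,k} \cdot w_{k+1,j} \\
\Longrightarrow & \qquad \langle\, \text{by properties of derivations} \,\rangle \\
 & A \Rightarrow^{*} w_{i,k} \cdot w_{k+1,j} \\
\Longrightarrow & \qquad \langle\, w_{i,j} = w_{i,k} \cdot w_{k+1,j} \,\rangle \\
 & A \Rightarrow^{*} w_{i,j} \\
= & \qquad \langle\, \text{by defn of } N_{i,j} \,\rangle \\
 & A \in N_{i,j}
\end{array}
\]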

Necessity (only if):

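\[
\begin{array}{cl}
 & A \in N_{i,j} \\
= & \qquad \langle\, \text{defn of } N_{i,j} \,\rangle \\
 & A \Rightarrow^{*} w_{i,j} \\
\Longrightarrow & \qquad \langle\, \text{see note below} \,\rangle \\
 & A \Rightarrow BC \Rightarrow^{*} w_{i,j} \text{ for some nonterminals } B, C \\
\Longrightarrow & \qquad \langle\, \text{property of derivations} \,\rangle \\
 & A \to BC \text{ is a rule} \;\land\; B \Rightarrow^{*} w_{i,k} \;\land\; C \Rightarrow^{*} w_{k+1,j} \text{ for some } B, C, k \text{ with } i \le k < j \\
= & \qquad \langle\, \text{defn of } N_{i,k},\ N_{k+1,j} \,\rangle \\
 & A \to BC \text{ is a rule} \;\land\; B \in N_{i,k} \;\land\; C \in N_{k+1,j} \text{ for some } B, C, k \text{ with } i \le k < j
\end{array}
\]

Note: The second step in the proof of necessity is justified by the fact that, in a CNF CFG, a derivation of a terminal string of length two or more from a nonterminal symbol $A$ must begin with the application of a rule of the form $A \to BC$. In the third step, $k$ satisfies $i \le k < j$ because no nonterminal of a CNF grammar derives the empty string, so $B$ and $C$ each derive a nonempty part of $w_{i,j}$.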

End of proof of Lemma 2.
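
Lemma 2 gives a recipe for computing one set $N_{i,j}$ from sets for shorter substrings: try every split point $k$ and every rule $A \to BC$. The Python sketch below shows that single step in isolation. The (head, body) rule encoding, the dictionary `table` keyed by (i, j) pairs, and the name `combine` are assumptions chosen for this example.

```python
def combine(rules, table, i, j):
    """Compute N[i,j] for i < j from already-computed entries, per Lemma 2.

    rules: list of (head, body) pairs, body a tuple of symbols (assumed encoding).
    table: dict mapping (i, k) and (k + 1, j) to sets of nonterminals.
    """
    result = set()
    for k in range(i, j):  # split points i <= k < j
        for head, body in rules:
            # Only rules of the form A -> B C are relevant here.
            if len(body) == 2 and body[0] in table[i, k] and body[1] in table[k + 1, j]:
                result.add(head)
    return result


# Hypothetical grammar: S -> A B, A -> a, B -> b, applied to w = "ab".
RULES = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
table = {(1, 1): {"A"}, (2, 2): {"B"}}  # the sets N[1,1] and N[2,2]

print(combine(RULES, table, 1, 2))  # {'S'}
```

Here the only split point is k = 1, and the rule S -> A B applies because A is in N[1,1] and B is in N[2,2].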

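Lemma 3: $w \in L(G)$ if and only if $S \in N_{1,n}$, where $S$ is the start symbol of $G$.

Proof:

\[
\begin{array}{cl}
 & w \in L(G) \\
= & \qquad \langle\, \text{defn of } L(G) \,\rangle \\
 & S \Rightarrow^{*} w \\
= & \qquad \langle\, w = w_{1,n} \,\rangle \\
 & S \Rightarrow^{*} w_{1,n} \\
= & \qquad \langle\, \text{defn of } N_{1,n} \,\rangle \\
 & S \in N_{1,n}
\end{array}
\]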

From Lemma 3, it follows that, in order to determine whether $w \in L(G)$, it suffices to compute the set $N_{1,n}$ and then to check whether $S$ is a member of that set. But how can we compute $N_{1,n}$? Lemmas 1 and 2 provide strong suggestions. Lemma 1 tells us that, for any $i$, in order to compute $N_{i,i}$ it suffices to examine each rule in $G$. Lemma 2 tells us that, for any $i$ and $m$ such that $1 \le i < i+m \le n$, in order to compute $N_{i,i+m}$ it suffices to examine each rule in $G$, as well as the sets $N_{i,k}$ and $N_{k+1,i+m}$ for $k$ satisfying $i \le k < i+m$. Each of these sets corresponds to a substring shorter than $w_{i,i+m}$. From this we conclude that the "correct" order in which to compute the sets $N_{i,i+m}$ is in increasing order of $m$. That way, each time we are to compute a particular set $N_{i,i+m}$, all the sets $N_{i,k}$ and $N_{k+1,i+m}$ ($i \le k < i+m$) on which its value depends have been computed already.
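
The ordering argument can be checked mechanically. The short Python sketch below lists the index pairs (i, i+m) in increasing order of m and, for each pair with m > 0, the pairs it depends on according to Lemma 2. The helper names `fill_order` and `dependencies` are hypothetical and exist only for this illustration.

```python
def fill_order(n):
    """List index pairs (i, i+m) in increasing order of m, then of i."""
    return [(i, i + m) for m in range(n) for i in range(1, n - m + 1)]


def dependencies(i, j):
    """Pairs whose sets are needed to compute N[i,j] when i < j (Lemma 2)."""
    deps = []
    for k in range(i, j):  # split points i <= k < j
        deps.append((i, k))
        deps.append((k + 1, j))
    return deps


print(fill_order(3))
# [(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (1, 3)]
print(dependencies(1, 3))
# [(1, 1), (2, 3), (1, 2), (3, 3)]
```

For n = 3, every pair that (1, 3) depends on appears earlier in the list, which is the property the paragraph above relies on.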

We arrive at the following algorithm. (For ease of typesetting, in the algorithm we enclose subscripts within square brackets.)

CYK Algorithm.

Input:  CFG G in CNF, string w = a[1] a[2] a[3] ... a[n]   
Output: YES if w in L(G), NO otherwise


   for i in 1..n  loop
      N[i,i] := empty set;
      for each nonterminal A in G  loop
         if A --> a[i] is a production in G  then    --note that a[i] = w[i,i]
            insert A into N[i,i];
         end if;
      end loop;
   end loop;

   for m in 1..n-1  loop
      --compute N[i,i+m] for i satisfying 1 <= i <= n-m
      for i in 1..n-m  loop
         --compute N[i,i+m]
         N[i,i+m] := empty set;
         for k in i..i+m-1  loop
            for each production A --> BC in G  loop 
               if B is in N[i,k]  and  C is in N[k+1,i+m]  then
                  insert A into N[i,i+m];
               end if;
            end loop;
         end loop;
      end loop;
   end loop;

   if S is in N[1,n]  then    --w in L(G) iff S is in N[1,n]
      return YES;
   else
      return NO;
   end if;
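
For readers who want to experiment, the pseudocode above can be transcribed into Python roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: rules are encoded as (head, body) pairs with body a tuple of symbols, the table N is a dictionary keyed by 1-based (i, j) pairs, and the function name `cyk` and the sample grammar (intended to generate strings of the form a^n b^n with n >= 1) are choices made for this example.

```python
def cyk(rules, start, w):
    """Return True iff the CNF grammar given by `rules` generates the string w."""
    n = len(w)
    if n == 0:
        return False  # a CNF grammar in the form used here never generates epsilon
    N = {}
    # Lemma 1: N[i,i] holds every A with a rule A -> a[i] (a[i] is w[i - 1]).
    for i in range(1, n + 1):
        N[i, i] = {head for head, body in rules if body == (w[i - 1],)}
    # Lemma 2: compute N[i,i+m] in increasing order of m.
    for m in range(1, n):
        for i in range(1, n - m + 1):
            j = i + m
            N[i, j] = set()
            for k in range(i, j):  # split points i <= k < j
                for head, body in rules:
                    if len(body) == 2 and body[0] in N[i, k] and body[1] in N[k + 1, j]:
                        N[i, j].add(head)
    # Lemma 3: w is in L(G) iff the start symbol is in N[1,n].
    return start in N[1, n]


# Hypothetical CNF grammar: S -> A B | A T, T -> S B, A -> a, B -> b
RULES = [
    ("S", ("A", "B")),
    ("S", ("A", "T")),
    ("T", ("S", "B")),
    ("A", ("a",)),
    ("B", ("b",)),
]

print(cyk(RULES, "S", "aabb"))  # True
print(cyk(RULES, "S", "aab"))   # False
```

As in the pseudocode, the loops over m, i, and k are nested, so the number of rule checks grows roughly as the cube of the length of w times the number of rules.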

