CMPS 144 Intersession 2023
Binary Codeword Tree Builder

Background

              *
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       *             *
      / \           / \
     /   \         /   \
    *     *       *     *
   / \     \      E    /
  /   \     \         /
 /     \     \       /
*       *     *     *
A      / \    F    / \
      /   \       /   \
     *     *     *     *
     G     C     D     B
A : 000
B : 1101
C : 0011
D : 1100
E : 10
F : 011
G : 0010
A binary codeword tree The induced
mapping
Huffman trees (trees that are constructed during the execution of David Huffman's algorithm by which a mapping from symbols to binary codewords is computed) are examples of a wider class of rooted binary trees that here we will refer to as binary codeword trees

To the right is an example of such a tree. We imagine that the edges leading into left children are labeled by 0 and those leading into right children are labeled by 1. The sequence of labels on the edges along the path from the root node to a given node is referred to as that node's path label. For each leaf node, the path label of that node is taken to be the binary codeword of the symbol shown underneath that node.

In this way, a binary codeword tree defines/induces a mapping from a set of characters (namely those that are associated with the leaf nodes of the tree) to a set of binary codewords (namely those that are the path labels of the leaves). The tree to the right induces the mapping shown next to it.

Note that, because only path labels of leaf nodes serve as codewords, it is not possible for a binary codeword tree to define a mapping in which one codeword is a proper prefix of another. (Reasoning: For two nodes to be such that one's path label is a proper prefix of another's, the former node must be a proper ancestor of the latter. But leaf nodes are never proper ancestors.)

It follows that the set of codewords produced by any binary codeword tree is uniquely decipherable, which is of vital importance.

One restriction that we could have placed upon binary codeword trees (but did not) is that all interior nodes must have two children. Indeed, all Huffman trees satisfy this property. Allowing a node to have only one child gives rise to unnecessarily long codewords. In our example, the codewords for characters F, D, and B could be shortened by one bit by getting rid of the two one-child interior nodes in the tree.

With respect to a character-to-codeword mapping, the problem of encoding is that of translating a sequence of characters into the concatenation of their codewords. The problem of decoding is the inverse, in which a concatenation of codewords is translated back into the corresponding sequence of characters.

The mapping defined by our example tree would encode the string DEAD as 1100100001100 (the concatenation of 1100 for D, 10 for E, 000 for A, and 1100 for D, respectively). Decoding translates in the opposite direction, which here is to say that 1100100001100 would decode to DEAD.

A binary codeword tree would be a very valuable tool in performing decoding, but it would be quite a bad tool to use when encoding. A much better tool to use for encoding is an array of type String[]. Such an array codeWords[] could define a character-to-codeword mapping like this:

Suppose character x (of type char) has bit string y (of type String) as its codeword. Then codeWords[x] should have y as its value.

In case it strikes you as odd that Java allows an array index to be of type char, note that values of type char are represented by integers, and the characters that typically occur in "plain text" files are represented by the integers in the range [32..127). Thus, for example, the character 'k' is represented by the integer 107, and so codeWords['k'] means the same thing as codeWords[107].

The problems addressed in this programming assignment are

  1. building a binary codeword tree that defines the same character-to-codeword mapping as does a given array map[] (of type String[]), where, for example, map['w'] = "011001" means that the character w has 011001 as its codeword.
  2. making use of a binary codeword tree to decode a message that had been encoded using the mapping defined by the tree.

The Student's Task

Pertaining to the problem of building a binary codeword tree, provided are the following files:

Mapping described in input file:
A : 000
B : 1101
C : 0011
D : 1100
E : 10
F : 011
G : 0010

Mapping described by the constructed tree:
A: 000
G: 0010
C: 0011
F: 011
E: 10
D: 1100
B: 1101

To illustrate how the augmentTree() method works, suppose that on successive calls it was used to build the tree representing the mapping described in map0.txt (as shown above). The figures below show what the tree would like initially and after each call to augmentTree().

  *
         *
        /
       /
      *
     /
    /
   *
  /
 /
*
A 
         *
        / \
       /   \
      *     *
     /       \
    /         \
   *           *
  /           /
 /           /
*           *
A            \
              \
               *
               B
         *
        / \
       /   \
      *     *
     /       \
    /         \
   *           *
  / \         /
 /   \       /
*     *     *
A      \     \
        \     \
         *     *
         C     B
           *
          / \
         /   \
        /     \
       /       \
      *         *
     /           \
    /             \
   *               *
  / \             /
 /   \           /
*     *         *
A      \       / \
        \     /   \
         *   *     *
         C   D     B
Initially After adding
A:000
After adding
B:1101
After adding
C:0011
After adding
D:1100

              *
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       *             *
      /             / \
     /             /   \
    *             *     *
   / \            E    /
  /   \               /
 /     \             /
*       *           *
A        \         / \
          \       /   \
           *     *     *
           C     D     B
              *
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       *             *
      / \           / \
     /   \         /   \
    *     *       *     *
   / \     \      E    /
  /   \     \         /
 /     \     \       /
*       *     *     *
A        \    F    / \
          \       /   \
           *     *     *
           C     D     B
              *
             / \
            /   \
           /     \
          /       \
         /         \
        /           \
       *             *
      / \           / \
     /   \         /   \
    *     *       *     *
   / \     \      E    /
  /   \     \         /
 /     \     \       /
*       *     *     *
A      / \    F    / \
      /   \       /   \
     *     *     *     *
     G     C     D     B 
After adding
E:10
After adding
F:011
After adding
G:0010

Pertaining to the problem of decoding an encoded message by using a binary codeword tree, provided are the following files:


Submitting Your Work

Submit the files BinCodeTreeBuilder.java and Decoder.java to the approprate Brightspace dropbox.