In standard Huffman coding, the compressor builds a Huffman Tree based upon the counts/frequencies of the symbols occurring in the file to be compressed and then assigns to each symbol the codeword implied by the path from the root to the leaf node associated with that symbol. For example, if we adopt the convention that an edge from a node to its left (respectively, right) child is labeled 0 (resp., 1), then if the path from the root to a particular leaf is left, left, right, left, right, left, the codeword assigned to the associated symbol will be 001010.
Canonical Huffman Coding recognizes that the essential information provided by a Huffman Tree is the mapping from symbols to their codeword lengths; the particular bit patterns of the codewords are secondary and can be computed independently of the tree. Indeed, in Canonical Huffman Coding the set of codewords that is employed depends solely upon the distribution of codeword lengths. This set is chosen so as to satisfy not only the familiar prefix-freeness property (i.e., no codeword is a prefix of any other) but also this property:
Longer-is-Lesser property:
If x and y are codewords, with |x| > |y|, then x' ≺ y, where x' is the prefix of x of length |y|.
Using standard notation, |z| denotes the length of z and ≺ denotes the "lexicographically less than" relation. Lexicographic ordering is essentially the same as alphabetic ordering.
With respect to bit strings u and v, to say that u ≺ v is to say that either u is a proper prefix of v or else the leftmost bit in which they differ is a 0 in u and a 1 in v. For example, 100101 ≺ 10100 because of the bits in the 3rd position (counting from one at the left). (For essentially the same reason, the word "carwash" precedes "cattle" in the dictionary.)
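This comparison is easy to express in code. In Python, for instance, strings of '0'/'1' characters compare in exactly this order, since '0' precedes '1' character-wise (a small illustrative sketch; the function name is mine):

```python
def lex_less(u: str, v: str) -> bool:
    """True iff u ≺ v: u is a proper prefix of v, or the leftmost
    bit in which they differ is 0 in u and 1 in v."""
    # Python's built-in string ordering on '0'/'1' strings coincides
    # with the lexicographic relation defined in the text.
    return u != v and u < v

assert lex_less("100101", "10100")    # differ at position 3: 0 vs 1
assert lex_less("10", "10100")        # proper prefix
assert not lex_less("10100", "100101")
```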
Now, if A and B are leaves in a Huffman Tree (in which edges to left (respectively, right) children are labeled 0 (resp., 1)) with corresponding codewords x and y (i.e., the labels on the edges along the path from the root to A (respectively, B) spell out x (resp., y)) then x ≺ y is equivalent to A being to the left of B in the tree.
Thus, in order for the set of codewords induced by a Huffman Tree to satisfy the Longer-is-Lesser property, the tree must have this property:
Lefter-is-Deeper property:
If A and B are leaves and A is to the left of B, then depthOf(A) ≥ depthOf(B). (The depth of a node is its distance from the root.)
But we can take any Huffman Tree and, by a judicious sequence of swaps of subtrees rooted at nodes of the same depth, arrive at another Huffman Tree having the Lefter-is-Deeper property and having a set of codewords whose length distribution is the same as that in the original tree.
Even though such a tree transformation process is possible, it is not necessary to do it that way. A better approach is to take the codeword length distribution of the original tree and to build a Lefter-is-Deeper tree directly from it. Indeed, for any given distribution of lengths, there is only one possible Lefter-is-Deeper tree structure.
For example, suppose that the symbol frequencies led us to build one of the many Huffman Trees in which the codeword length distribution was as on the left below. Then the corresponding (unique) Lefter-is-Deeper tree (where each leaf's depth is explicitly indicated) is in the middle, and the resulting set of codewords (listed in lexicographically increasing, and thus length non-increasing, order) is to the right:

[Length distribution: one codeword of length 2, four of length 3, three of length 4, one of length 5, two of length 6. Tree diagram: the unique Lefter-is-Deeper tree, whose leaves, from left to right, lie at depths 6, 6, 5, 4, 4, 4, 3, 3, 3, 3, 2.]
Codewords: 000000, 000001, 00001, 0001, 0010, 0011, 010, 011, 100, 101, 11
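That codeword set can be generated directly from the length distribution, with no tree in sight. The Python sketch below (the function name and {length: count} interface are mine, not from the text) assigns numeric values starting from 0 at the longest length; each time the codeword length shrinks by a bit, the next value to assign is halved, rounding up:

```python
def canonical_codewords(counts):
    """Generate the Lefter-is-Deeper codeword set from a {length: count}
    map, longest codewords first (i.e., in lexicographically increasing
    order).  A sketch; assumes counts come from a valid Huffman tree."""
    words = []
    value = 0                          # numeric value of next codeword
    prev = max(counts)                 # start at the longest length
    for length in sorted(counts, reverse=True):
        for _ in range(prev - length):
            value = (value + 1) // 2   # halve, rounding up: ceil(value/2)
        for _ in range(counts[length]):
            words.append(format(value, f"0{length}b"))
            value += 1
        prev = length
    return words

print(canonical_codewords({2: 1, 3: 4, 4: 3, 5: 1, 6: 2}))
# -> ['000000', '000001', '00001', '0001', '0010', '0011',
#     '010', '011', '100', '101', '11']
```

Running it on the example's distribution reproduces the listing above exactly.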
Significantly, the Longer-is-Lesser set of codewords that arises from a Lefter-is-Deeper tree has some interesting properties when you interpret each codeword as a natural number (in accord with the binary numeral system).
Longer's-Prefix-is-Lesser property:
Let x and y be codewords, with |x| > |y|, and let x' be the prefix of x of length |y|. Then #(x') < #(y), where # is the function that maps bit strings to their numerical equivalents according to the standard binary numeral system. (E.g., #(1001) = 9, #(00110) = 6.)
Consecutive-Values property:
For any particular length, the codewords of that length represent a consecutive range of natural numbers.
In our example, the codewords of length four represent the range 1..3 and those of length three represent 2..5.
Half-of-Successor property:
For all k less than the maximum length among codewords, the smallest codeword of length k has value ⌈(m+1)/2⌉, where m is the value of the largest codeword of length k+1.
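All three properties can be checked mechanically against the example codeword set. A small Python self-check (the variable names are mine):

```python
import math
from collections import defaultdict

# The example codeword set, copied from the listing above.
codewords = ["000000", "000001", "00001", "0001", "0010", "0011",
             "010", "011", "100", "101", "11"]

def num(w):                 # the # function: bit string -> natural number
    return int(w, 2)

# Longer's-Prefix-is-Lesser: a longer codeword's prefix is numerically
# smaller than every codeword of that prefix's length.
for x in codewords:
    for y in codewords:
        if len(x) > len(y):
            assert num(x[:len(y)]) < num(y)

# Consecutive-Values: within each length, the values form a run.
by_len = defaultdict(list)
for w in codewords:
    by_len[len(w)].append(num(w))
for vals in by_len.values():
    assert sorted(vals) == list(range(min(vals), max(vals) + 1))

# Half-of-Successor: the smallest value at length k is ceil((m+1)/2),
# where m is the largest value at length k+1.
for k in by_len:
    if k + 1 in by_len:
        assert min(by_len[k]) == math.ceil((max(by_len[k + 1]) + 1) / 2)

print("all three properties hold")
```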
All this is quite interesting, of course, but is there any advantage in employing a set of codewords that arises from a Lefter-is-Deeper tree? Answer: Yes.
Briefly, we list the advantages:
One way to encode the distribution is by indicating the minimum and maximum among the codeword lengths, and then indicating, for each length in that range, the number of codewords of that length. For the example shown above, the list would be <2, 6, 1, 4, 3, 1, 2> (minimum length 2, maximum length 6, followed by the counts for lengths 2 through 6).
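As a sketch, that list could be produced as follows (the function name and the {length: count} interface are mine, not from the text):

```python
def header_list(counts):
    """<min, max, count(min), ..., count(max)> from a {length: count}
    map.  Lengths with no codewords get an explicit zero."""
    lo, hi = min(counts), max(counts)
    return [lo, hi] + [counts.get(L, 0) for L in range(lo, hi + 1)]

print(header_list({2: 1, 3: 4, 4: 3, 5: 1, 6: 2}))
# -> [2, 6, 1, 4, 3, 1, 2]
```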
For our ridiculously small example, encoding this list would require slightly more bits than encoding the tree's structure. But for a realistic-sized example, the opposite would usually be the case, although the savings would typically be small.
Details to be provided at some point in time...
The biggest gains from using Canonical Huffman Coding come in performing decompression, so we look at that first.
Because of the constrained nature of the codeword set (in particular, the Longer's-Prefix-is-Lesser and Consecutive-Values properties), it turns out that, in place of storing an explicit representation of the Huffman Tree, all that the decompressor needs are two arrays, minCW[] and CW2Symbol[][]. For each relevant value of i, minCW[i] contains the (numeric) value of the lexicographically smallest codeword of length i. For each pair of relevant values of i and j, CW2Symbol[i][j] is the native code of the jth symbol having a codeword of length i.
For the example tree above, these arrays would look like this, where each element of minCW[] shows not only the numeric value but also the corresponding codeword.
      minCW               CW2Symbol

 2:  3  (11)           2:  'e'
 3:  4  (100)          3:  'c'  'i'
 4:  3  (0011)         4:  'a'  'f'  'l'  'o'  'j'
 5:  2  (00010)        5:  'b'  'k'  'h'  'g'
 6:  0  (000000)       6:  'p'  'n'  'm'  'd'
                             0    1    2    3    4
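Given the left-to-right list of symbols at each length, both tables can be built in a single pass from the longest length to the shortest, using the Half-of-Successor property to locate each minCW entry. A Python sketch (dicts stand in for the two arrays; all names are mine):

```python
def build_tables(symbols_by_len):
    """Build minCW and CW2Symbol from a {length: [symbols]} map, where
    each list gives that length's symbols in left-to-right tree order.
    A sketch of the decompressor-side table construction."""
    minCW = {}
    value = 0                          # value of next codeword to assign
    prev = max(symbols_by_len)         # start at the longest length
    for length in sorted(symbols_by_len, reverse=True):
        for _ in range(prev - length):
            value = (value + 1) // 2   # Half-of-Successor: ceil(value/2)
        minCW[length] = value
        value += len(symbols_by_len[length])   # step past this length's run
        prev = length
    return minCW, symbols_by_len       # CW2Symbol is just the map itself

CW2Symbol = {2: ['e'], 3: ['c', 'i'], 4: ['a', 'f', 'l', 'o', 'j'],
             5: ['b', 'k', 'h', 'g'], 6: ['p', 'n', 'm', 'd']}
minCW, _ = build_tables(CW2Symbol)
print(minCW)   # -> {6: 0, 5: 2, 4: 3, 3: 4, 2: 3}
```

The computed values agree with the table above.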
Of course, the decompressor must make use of the metadata at the beginning of the compressed file to construct these arrays. (How that is accomplished is addressed later.) Having done that, its job is basically that described by this high-level algorithm:
while (hasMoreBits()) {
   BitString x := nextBit();
   while (!isCodeword(x)) {
      x := x · nextBit();    // append next bit onto rear of x
   }
   emit nativeCodeOf(x);     // emit the native code of the
}                            // symbol whose codeword is x
What is not obvious is how to implement isCodeword() and nativeCodeOf() making use of nothing but the data stored in arrays minCW[] and CW2Symbol[][].
The solutions to these two problems rely, respectively, upon the guarantees that the set of codewords possesses the Longer's-Prefix-is-Lesser and Consecutive-Values properties!
To illustrate how we can tell whether the value of x (in the algorithm above) is a codeword, suppose that z is a codeword and let z_{k} be the prefix of z of length k, for all k in the range 1..|z|. By the Longer-is-Lesser property of the codewords, we have that z_{i} ≺ minCW[i] for all i < |z| (identifying minCW[i] with the length-i codeword it represents). Trivially, we also have that z ≽ minCW[|z|]. That is, every proper prefix of z is lexicographically less than the smallest codeword of its length, but z itself is (obviously) lexicographically greater than or equal to the smallest codeword of its length. Hence, as the algorithm appends bits onto x, the moment at which the value of x first reaches or exceeds minCW[|x|] is precisely the moment at which x becomes a codeword.
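Putting this to work: since, within a single length, lexicographic order coincides with numeric order, x is a codeword exactly when codewords of length |x| exist and #(x) ≥ minCW[|x|]; and then, by the Consecutive-Values property, the symbol's index within its length class is #(x) − minCW[|x|]. A Python sketch of the whole decoder, using the example tables above (all names are mine):

```python
def decode(bits, minCW, CW2Symbol):
    """Decode a bit string using only the two tables.  A sketch;
    assumes `bits` is a well-formed sequence of codewords under the
    conventions above."""
    out = []
    value, length = 0, 0
    for b in bits:
        value = (value << 1) | int(b)   # append next bit onto rear of x
        length += 1
        # isCodeword test, justified by Longer's-Prefix-is-Lesser:
        if length in minCW and value >= minCW[length]:
            # nativeCodeOf, justified by Consecutive-Values:
            out.append(CW2Symbol[length][value - minCW[length]])
            value, length = 0, 0        # start the next codeword
    return out

minCW = {2: 3, 3: 4, 4: 3, 5: 2, 6: 0}
CW2Symbol = {2: ['e'], 3: ['c', 'i'], 4: ['a', 'f', 'l', 'o', 'j'],
             5: ['b', 'k', 'h', 'g'], 6: ['p', 'n', 'm', 'd']}
print(decode("11" + "100" + "0011" + "00010", minCW, CW2Symbol))
# -> ['e', 'c', 'a', 'b']
```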
We will assume, of course, that the file produced by the compressor begins with metadata describing a symbol-to-codeword mapping that is consistent with a Lefter-is-Deeper Huffman Tree. We consider two possible ways in which the metadata might describe that mapping, one in which the Huffman tree is described explicitly and the other in which it is described implicitly. (Both of these possibilities were mentioned earlier.)
Following that would be a list of the native codes of the symbols, going from the symbol with the lexicographically smallest codeword (corresponding to the tree's leftmost leaf) to the one with the largest (corresponding to the tree's rightmost leaf). Of course, this list of native codes would have to be parsable, meaning that the boundaries between the elements could be determined algorithmically. (If the native codes are of a known fixed length, that would not be a problem; otherwise one could precede each native code with a length indicator in Elias gamma form, for example.) Here, however, we are not concerned with the details of how to encode the list of native codes.
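For reference, the Elias gamma code of a positive integer n is its binary representation preceded by one 0 for each bit after the first, which makes each encoded length self-delimiting. A minimal sketch:

```python
def elias_gamma(n):
    """Elias gamma code of a positive integer: as many leading zeros
    as there are bits after the first, then n in binary."""
    assert n >= 1
    b = format(n, "b")
    return "0" * (len(b) - 1) + b

print(elias_gamma(1), elias_gamma(5), elias_gamma(8))
# -> 1 00101 0001000
```

A decoder counts the leading zeros to learn how many bits follow, so no explicit separator between lengths is needed.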