SICP Exercise 2.70
Question
The following eight-symbol alphabet with associated relative frequencies was designed to efficiently encode the lyrics of 1950s rock songs. (Note that the “symbols” of an “alphabet” need not be individual letters.)
A | 2 | NA | 16 |
BOOM | 1 | SHA | 3 |
GET | 2 | YIP | 9 |
JOB | 2 | WAH | 1 |
Use generate-huffman-tree
(Exercise 2.69) to generate a corresponding Huffman
tree, and use encode
(Exercise 2.68) to encode the following message:
Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip
yip yip yip yip yip
Sha boom
How many bits are required for the encoding? What is the smallest number of bits that would be needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?
Answer
The encoded string is
111111100111101110000000001111111001111011
100000000011010101010101010101010111011011
which is 84 characters long. Since our Huffman set does not have leaves that represent line breaks and is also case-sensitive, the message had to be brought into the right format:
GET A JOB SHA NA NA NA NA NA NA NA NA GET A JOB SHA NA NA NA NA
NA NA NA NA WAH YIP YIP YIP YIP YIP YIP YIP YIP YIP SHA BOOM
If we were to change this to a fixed-length alphabet, we would require 3 bits per word (because \(2^3=8\)). The original text has 36 words, so the encoded message would require \(36*3=108\) bits in total, which is an increase of about 29%.