Greedy

Huffman coding data compression algorithm

Pinterest LinkedIn Tumblr

Huffman coding is a lossless data compression algorithm and derives this table based on the estimated frequency of occurrence for each possible value of the source symbol.

The most frequent character gets smallest code and least frequent character get the largest code. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. Repeat until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits.

Write program to implement Huffman coding?

source Code

package com.dsacode.Algorithm.greedy;
 
import java.util.PriorityQueue;
 
 
abstract class HuffmanTree implements Comparable < HuffmanTree > {
    public final int frequency;
 
    public HuffmanTree(int freq) {
        frequency = freq;
    }
 
    public int compareTo(HuffmanTree tree) {
        return frequency - tree.frequency;
    }
}
 
class HuffmanLeaf extends HuffmanTree {
 
    public final char value;
 
    public HuffmanLeaf(int freq, char val) {
        super(freq);
        value = val;
    }
}
 
class HuffmanNode extends HuffmanTree {
    public final HuffmanTree left, right;
 
    public HuffmanNode(HuffmanTree l, HuffmanTree r) {
        super(l.frequency + r.frequency);
        left = l;
        right = r;
    }
}
 
 
public class HuffmanCode {
 
    public static HuffmanTree buildTree(int[] charFreqs) {
            PriorityQueue < HuffmanTree > trees = new PriorityQueue < HuffmanTree >();
             
            for (int i = 0; i < charFreqs.length; i++)
                if (charFreqs[i] > 0)
                    trees.offer(new HuffmanLeaf(charFreqs[i], (char) i));
 
             
            while (trees.size() > 1) {
                HuffmanTree a = trees.poll();
                HuffmanTree b = trees.poll();
                trees.offer(new HuffmanNode(a, b));
            }
            return trees.poll();
        }
 
        public static void printCodes(HuffmanTree tree, StringBuffer prefix) {
 
            if (tree instanceof HuffmanLeaf) {
                HuffmanLeaf leaf = (HuffmanLeaf) tree;
                System.out.println(leaf.value + "\t" + leaf.frequency + "\t"
                        + prefix);
 
            } else if (tree instanceof HuffmanNode) {
                HuffmanNode node = (HuffmanNode) tree;
 
                prefix.append('0');
                printCodes(node.left, prefix);
                prefix.deleteCharAt(prefix.length() - 1);
 
                prefix.append('1');
                printCodes(node.right, prefix);
                prefix.deleteCharAt(prefix.length() - 1);
            }
        }
 
        public static void main(String[] args) {
            String str = "Hello World";
 
            System.out.println("Huffman code for '"+str +"'");
            int[] charFreqs = new int[256];
            for (char c : str.toCharArray())
                charFreqs[c]++;
 
            HuffmanTree tree = buildTree(charFreqs);
 
            System.out.println("char\tWeight\tHuffmanCode");
            printCodes(tree, new StringBuffer());
        }
    }
Huffman code for 'Hello World'
char    Weight  HuffmanCode
    1   000
r   1   001
e   1   010
W   1   011
l   3   10
o   2   110
H   1   1110
d   1   1111

Algorithm Explanation

Create a leaf node for each symbol and add it to the priority queue
While there is more than one node in the queue:
* Remove the two nodes of highest priority (lowest probability) from the queue
* Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes probabilities.
* Add the new node to the queue.
The remaining node is the root node and the tree is complete.

Time Complexity

Best CaseAverage CaseWorst Case
O(n log k)

Reference

  1. https://en.wikipedia.org/wiki/Huffman_coding
  2. https://www.cs.auckland.ac.nz/software/AlgAnim/huffman.html
  3. http://mathworld.wolfram.com/HuffmanCoding.html

Write A Comment