Huffman Code in Java
The Huffman coding is a data compression algorithm that creates a binary tree of nodes. The node can be either internal nodes or leaf nodes.
This tutorial describes and demonstrates the Huffman code with Java in detail.
Demonstrate the Use of Huffman Coding Algorithm in Java
The idea of the Huffman coding algorithm is to assign variable-length codes to input characters based on the frequencies of corresponding characters.
These codes are called the Prefix codes since the code given to each character is unique, which helps Huffman coding with decoding without any ambiguity.
We can build a Huffman tree using a priority queue in Java, where the node with the highest priority has the lowest frequency. We will follow the steps given below.
- First, create a leaf node for each character of the given text and add the nodes to the priority queue.
- If there is more than one node in a queue, remove two nodes with the lowest frequency and highest priority from that queue.
- Now, create a new node with two children nodes that were removed before, the frequency of the new node will be equal to the sum of both nodes’ frequencies. And then add that node to the priority queue.
- Finally, the remaining node will be the root node, and the tree will be completed.
Let’s see an example in Java to convert a text into Huffman coding.
The main class Huffman.java
:
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
class Huffman {
// Huffman Tree Traversing and storing the Huffman Codes in a dictionary.
public static void encode_huffman(
Huffman_Node root_node, String str, Map<Character, String> huffman_Code) {
if (root_node == null) {
return;
}
// if the root node is a leaf node
if (is_Leaf(root_node)) {
huffman_Code.put(root_node.charac, str.length() > 0 ? str : "1");
}
encode_huffman(root_node.left, str + '0', huffman_Code);
encode_huffman(root_node.right, str + '1', huffman_Code);
}
// Huffman Tree Traversing and decoding the encoded string
public static int decode_huffman(Huffman_Node root_node, int index, StringBuilder sb) {
if (root_node == null) {
return index;
}
// if the root node is a leaf node
if (is_Leaf(root_node)) {
System.out.print(root_node.charac);
return index;
}
index++;
root_node = (sb.charAt(index) == '0') ? root_node.left : root_node.right;
index = decode_huffman(root_node, index, sb);
return index;
}
// This function checks if Huffman Tree contains only one single node
public static boolean is_Leaf(Huffman_Node root_node) {
return root_node.left == null && root_node.right == null;
}
// Main Huffman tree build function
public static void Main_Build_HuffmanTree(String text) {
// Base case: empty string
if (text == null || text.length() == 0) {
return;
}
// Calculate the frequency of each character and store it in a map of dict
Map<Character, Integer> frequency = new HashMap<>();
for (char c : text.toCharArray()) {
frequency.put(c, frequency.getOrDefault(c, 0) + 1);
}
// priority queue to store nodes of the Huffman tree
// the highest priority item has the lowest frequency
PriorityQueue<Huffman_Node> prio_queue;
prio_queue = new PriorityQueue<>(Comparator.comparingInt(l -> l.frequency));
// leaf node for each character, adding it to the priority queue.
for (var entry : frequency.entrySet()) {
prio_queue.add(new Huffman_Node(entry.getKey(), entry.getValue()));
}
// repeat the process till there is more than one node in the queue
while (prio_queue.size() != 1) {
// Then remove the two nodes with the highest priority and lowest frequency
Huffman_Node left = prio_queue.poll();
Huffman_Node right = prio_queue.poll();
// Now create a new internal node with two children nodes, and the frequency will be the some
// of both nodes; add the new node to the priority queue.
int sum = left.frequency + right.frequency;
prio_queue.add(new Huffman_Node(null, sum, left, right));
}
Huffman_Node root_node = prio_queue.peek();
// Huffman tree Traversing and storing the Huffman codes in a dict or map
Map<Character, String> huffmanCode = new HashMap<>();
encode_huffman(root_node, "", huffmanCode);
// Display the Huffman codes
System.out.println("The Huffman Codes for the given text are: " + huffmanCode);
System.out.println("The original text is: " + text);
// display the encoded string
StringBuilder sb = new StringBuilder();
for (char c : text.toCharArray()) {
sb.append(huffmanCode.get(c));
}
System.out.println("The encoded text is: " + sb);
System.out.print("The decoded text is: ");
if (is_Leaf(root_node)) {
// For input like a, aa, aaa, etc.
while (root_node.frequency-- > 0) {
System.out.print(root_node.charac);
}
} else {
// Huffman Tree traversing with decoding the encoded string
int index = -1;
while (index < sb.length() - 1) {
index = decode_huffman(root_node, index, sb);
}
}
}
// Call the Huffman code
public static void main(String[] args) {
String text = "This is delftstack";
Main_Build_HuffmanTree(text);
}
}
The node class Huffman_Node.java
:
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
// A Tree node
class Huffman_Node {
Character charac;
Integer frequency;
Huffman_Node left = null, right = null;
Huffman_Node(Character charac, Integer frequency) {
this.charac = charac;
this.frequency = frequency;
}
public Huffman_Node(Character charac, Integer frequency, Huffman_Node left, Huffman_Node right) {
this.charac = charac;
this.frequency = frequency;
this.left = left;
this.right = right;
}
}
The first class is the main class that performs the operations of the Huffman coding algorithm, and the second class is for creating the nodes. The code will generate Huffman codes for a given text, the encoded text, and decode it.
Output:
The Huffman Codes for the given text are: { =010, a=11100, c=1010, d=11101, e=1000, f=11011, H=0110, h=10010, i=1111, k=11010, l=000, m=01110, .=01111, o=1100, s=001, T=10011, t=1011}
The original text is: Hello This is delftstack.com
The encoded text is: 011010000000001100010100111001011110010101111001010111011000000110111011001101111100101011010011111010110001110
The decoded text is: Hello This is delftstack.com
As we can see, the given text contains 25 characters which are 25×8 = 200 bits, and the encoded string is only 111 bits, almost 45% data compression. This data compression is the primary purpose of Huffman coding.
Sheeraz is a Doctorate fellow in Computer Science at Northwestern Polytechnical University, Xian, China. He has 7 years of Software Development experience in AI, Web, Database, and Desktop technologies. He writes tutorials in Java, PHP, Python, GoLang, R, etc., to help beginners learn the field of Computer Science.
LinkedIn Facebook