Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to correctly understand Huffman coding

2025-02-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article mainly explains "how to correctly understand Huffman coding". The content of the explanation is simple and clear, and it is easy to learn and understand. let's follow the editor's train of thought to study and learn "how to correctly understand Huffman coding"!

To tell you the truth, I heard about Huffman coding a long time ago. In addition to knowing that it is commonly used in conventional compression formats such as GZIP, BZIP2, and PKZIP, I also know that it is usually used to compress character data with high repetition rates.

If you think about it, the infinite combination of 26 letters in English has a high repetition rate. There are not many commonly used Chinese characters, about 2500, do not ask me how to know, I have asked the search engine.

The higher the frequency of character repetition, the more efficient Huffman coding will be!

It's time to join you to understand how Huffman coding works. after all, a good programmer needs to know why-please allow me to use this sentence again.

Suppose the following string is to be sent over the network.

As you should know, each character occupies 8 bits, and the above string of characters has a total of 15 characters, so it takes up a total of 15, 800, 120 bits. Is there any doubt? If you have any questions, please feel sorry.

If we use Huffman coding, we can compress this string of characters to a smaller size. How do you do that?

Huffman coding first uses the frequency of characters to create a tree, and then generates a specific encoding for each character through the structure of the tree. Characters with high frequency use shorter encoding and those with low frequency use longer encoding. This will reduce the average length of the encoded string, thus achieving the purpose of lossless data compression.

Use the above string of initial characters to explain the steps of Huffman coding step by step.

The first step is to calculate the frequency of each character in the string.

B appeared once, C appeared 6 times, An appeared 5 times, D appeared 3 times.

The second step is to sort the characters according to the frequency of occurrence to form a queue Q.

The one with the lowest frequency is in the front, and the one with high frequency is behind.

The third step is to use these characters as leaf nodes to start building a tree. First create an empty node z, assign the characters of the minimum frequency to the left side of z, and assign the second frequency to the right side of z, and then assign z to the sum of the two character frequencies.

B has the lowest frequency, so it's on the left, then D with frequency 3, on the right, and then set the value of their parent node to 4, the sum of the frequencies of their child nodes.

Then remove B and D from queue Q and add their sum to the queue, as indicated by * in the figure above. Then, recreate an empty node z with 4 as the left node, A with a frequency of 5 as the right node, and the sum of 4 and 5 as the parent node.

Continue to build the tree as before until all the characters appear in the nodes of the tree.

In the fourth step, for each non-leaf node, 0 is assigned to the left side of the connector and 1 to the right side of the connector. At this point, the Hoffman tree is built. Hoffman tree, also known as optimal binary tree, is a kind of binary tree with the shortest length of weighted path.

When the tree is built, let's count the number of bits to be sent.

1) look at the character column. The four characters A, B, C and D add up to 4'8'32 bits. Each English letter occupies one byte, that is, 8 bits.

2) look at the frequency column. A 5 times, B 1 time, C 6 times, D 3 times, a total of 15 bits.

3) look at the coding column. The code of An is 11, which corresponds to 15 → 9 → 5 on the Huffman tree, that is, from the root node to the leaf node A, you need to go through the path 11; the corresponding B needs to go through the path 100; the corresponding D needs to go through the path 101; and the corresponding C needs to go through the path of 0.

4) look at the length column. The code of An is 11, and it appears 5 times, so it takes up 10 bits, that is, 1111111111 position B has a code of 100, and appears once, so it takes up 3 bits, that is, the code of 100 position C is 0, it appears 6 times, so it takes up 6 bits, that is, the code of 000000th D is 101.It appears 3 times, so it takes up 9 bits, that is, 101101101.

Huffman coding, in essence, is to give the most valuable resources (the shortest coding) to the data with the highest probability of occurrence. In the above example, C appears most frequently, and its coding is 0, which saves a lot of space.

Think about it in the light of some situations in our life. In the same way, we keep the most commonly used ones at hand, so that we can improve efficiency and save time. So, I have a bold guess that this is how Hoffman found the optimal solution of the code.

Before being encoded by Huffman, the binary of the string "BCAADDDCCACACAC" is:

10000100100001101000001010000010100010001000100010001000100001101000011010000010100001101000001010000110100000101000011

That's 120 bits.

After encoding, it is:

0000001001011011011111111111

Accounted for 28 bits.

However, considering the decoding, the structure of the Huffman tree needs to be passed, so the 32 bits occupied by characters and 15 bits occupied by frequency also need to be passed. Overall, the number of bits after coding is 32 + 15 + 28 = 75, 45 bits less than 120 bits, and the efficiency is still very high.

With regard to the Java example of Hoffman coding, I will also post it here for your reference.

Class HuffmanNode {int item; char c; HuffmanNode left; HuffmanNode right;} class ImplementComparator implements Comparator {public int compare (HuffmanNode x, HuffmanNode y) {return x.item-y.item }} public class Huffman {public static void printCode (HuffmanNode root, String s) {if (root.left = = null & & root.right = = null & & Character.isLetter (root.c)) {System.out.println (root.c + "|" + s); return;} printCode (root.left, s + "0"); printCode (root.right, s + "1") } public static void main (String [] args) {int n = 4; char [] charArray = {'Agar,' Bond, 'Clearing,' D'}; int [] charfreq = {5,1,6,3}; PriorityQueue Q = new PriorityQueue (n, new ImplementComparator ()); for (int I = 0; I

< n; i++) { HuffmanNode hn = new HuffmanNode(); hn.c = charArray[i]; hn.item = charfreq[i]; hn.left = null; hn.right = null; q.add(hn); } HuffmanNode root = null; while (q.size() >

1) {HuffmanNode x = q.peek (); q.poll (); HuffmanNode y = q.peek (); q.poll (); HuffmanNode f = new HuffmanNode (); f.item = x.item + y.item; f.c ='-'; f.left = x; f.right = y Root = f; q.add (f);} System.out.println ("character | Huffman coding"); System.out.println ("-"); printCode (root, ");}}

The output of this example is as follows:

Character | Huffman coding-C | 0 B | 100 D | 101 A | 11 Thank you for your reading. The above is the content of "how to correctly understand Huffman coding". After the study of this article, I believe you have a deeper understanding of how to correctly understand Huffman coding, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report