In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article introduces the relevant knowledge of "what is Huffman coding". In the operation of actual cases, many people will encounter such a dilemma. Then let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!
Basic introduction
Huffman coding is also translated into (Huffman coding) Huffman Coding, also known as Huffman coding, which is a coding method and belongs to a program algorithm.
Huffman coding is one of the classic application scenarios of Huffman tree in telecommunications.
Huffman coding is widely used in data file compression. Its compression ratio is usually between 20% and 90%.
Huffman code is a kind of variable word length coding (VLC). Hufuuman proposed a coding method in 1952, which is called optimal coding.
Analysis of the principle of information processing in the field of communication 1: fixed length coding
For example, i like like like java do you like a java has a total of 40 characters, including spaces, and the corresponding ASCII code and binary code are as follows
The message is transmitted in binary, with a total length of 359 (including spaces)
The processing of information in the field of communication 2: variable length coding
I like like like java do you like a java has a total of 40 characters, including spaces. The variable length coding process is as follows
The encoding of characters cannot be the prefix of other character encodings, and the coding that meets this requirement is called prefix coding, that is, the encoding that cannot be matched to repetition.
The processing of information in the field of communication 3: Huffman coding
I like like like java do you like a java has a total of 40 characters, including spaces. The variable length coding process is as follows
Construct a Huffman tree according to the number of times the above characters appear, with the number of times as the weight.
According to the Huffman tree, encode each character (prefix coding), the path to the left is 0 and the path to the right is 1: the code is as follows:
According to the Huffman code above, the encoding corresponding to our "i like like like java do you like a java" string (note the lossless compression we use here) is shown in the following figure.
Description:
The original length is 359, compressed (359-133) / 359-62.9%
This encoding satisfies the prefix encoding, that is, the encoding of the characters cannot be the prefix of other character encodings. It does not cause polysemy of the match.
Huffman coding is lossless compression!
Note:
This Huffman tree may be different depending on the sorting method, so the corresponding Huffman code is not exactly the same, but the wpl is the same and is the smallest. For example, if we always rank the new binary tree with the same weight at the end of the binary tree with the same weight, the resulting binary tree is:
Create the corresponding Huffman tree
According to the principle of Huffman coding and compressing data, it is necessary to create a Huffman tree corresponding to "i like like like java do you like a java".
Train of thought:
First create Node node, Node {data {store data}, weight (weight), left,right}
Get the byte [] array corresponding to "i like like like java do you like a java"
Write a method to put the node node that is going to build the Huffman tree into the List collection
You can create a corresponding Huffman tree through the collection List
Application case of Huffman Tree
Compress and decompress a string of strings
Package com.xie.huffmancode; import java.util.*; public class HuffmanCode {public static void main (String [] args) {String str = "i like like like java do you like a java"; byte [] contentBytes= str.getBytes (); System.out.println ("contentBytes=" + Arrays.toString (contentBytes)); List nodes = getNodes (contentBytes); / / generate Huffman tree Node hufffmanTreeRoot = createHufffmanTree (nodes) / / generated Huffman coding table getCodes (hufffmanTreeRoot, ", stringBuilder); byte [] huffmanCodeBytes = zip (contentBytes, huffmanCodes); System.out.println (" huffmanCodeBytes = "+ Arrays.toString (huffmanCodeBytes)); byte [] decode = decode (huffmanCodes, huffmanCodeBytes); System.out.println (" corresponding array after Huffman decoding "+ new String (decode)) / * contentBytes= [105,32,108,105,107,101,108,105,107,101,106,97,118,97,32,100,111,31,121,111,117,32,105,107,101,32,97,106,97,118,97] * huffmanCodeBytes = [- 88,-65,-56,-65,-56,-65,-55 77,-57, 6,-24,-14,-117,-4,-60,-90, 28] * Huffman decoded the corresponding array i like like like java do you like a java * /} / / to complete the data decompression / / 1. Convert huffmanCodeBytes [- 88,-65,-56,-65,-56,-65,-55, 77,-57, 6,-24,-14,-117,-4,-60,-90, 28] / / back to the Huffman coded binary string "101010001011111111001000101111." / / 2. Huffman coding corresponding binary string "10101000101111111001000101111." = > compare with Huffman coding table = > "i like like like java do you like a java" / * * complete decoding of compressed data * @ param huffmanCodes Huffman coding table * @ param huffmanBytes Huffman encoded byte array * @ return corresponding to the original string * / public static byte [] decode (Map huffmanCodes Byte [] huffmanBytes) {StringBuilder stringBuilder = new StringBuilder () For (int I = 0; I
< huffmanBytes.length; i++) { //判断是不是最后一个字节 boolean flag = (i == huffmanBytes.length - 1); stringBuilder.append(byteToBitString(!flag, huffmanBytes[i])); } Map map = new HashMap(); for (Map.Entry entry : huffmanCodes.entrySet()) { Byte k = entry.getKey(); String v = entry.getValue(); map.put(v, k); } List list = new ArrayList(); for (int i = 0; i < stringBuilder.length();) { int count = 1; boolean flag = true; Byte b = null; while (flag) { String key = stringBuilder.substring(i, i + count);//i 不动,count移动,直到匹配一个字符 b = map.get(key); if (b == null) { count++; } else { flag = false; } } list.add(b); i += count; } byte[] bytes = new byte[list.size()]; for (int i = 0; i < list.size(); i++) { bytes[i] = list.get(i); } return bytes; } /** * 将一个byte 转成一个二进制的字符串 * * @param flag 标识是否需要补高位,true标识需要补高位,如果是false表示不补,如果是最后一个字节,无需补高位 * @param b 传入的byte * @return 该byte对应的二进制字符串,(注意是按补码返回) */ public static String byteToBitString(boolean flag, byte b) { //将b 转成 int int temp = b; //如果temp是正数还需要补高位 if (flag) { // 按位或 如 256|1=>1 0000 0000 | 0000 0001 = > 1 0000 0001 temp | = 256;} / / returns the temp binary complement String bitStr = Integer.toBinaryString (temp); if (flag) {/ / the last 8-bit return bitStr.substring (bitStr.length ()-8);} else {return bitStr }} / * Encapsulation original byte array to Huffman byte array * * @ param bytes * @ return * / public static byte [] huffmanZip (byte [] bytes) {List nodes = getNodes (bytes); / / create Huffman tree Node hufffmanTreeRoot = createHufffmanTree (nodes) / generate Huffman coding getCodes (hufffmanTreeRoot, "", stringBuilder); / / return the compressed Huffman coded byte array return zip (bytes, huffmanCodes) } / * the byte [] array corresponding to the string is passed through the Huffman coding table Return a Huffman code * @ return generated by byte [] * @ param huffmanCodes corresponding to the byte [] * * @ param bytes original string after Huffman coding is compressed. Return byte [] * / public static byte [] zip (byte [] bytes) after Huffman coding processing. Map huffmanCodes) {/ / convert bytes to Huffman encoded string StringBuilder stringBuilder = new StringBuilder () using huffmanCodes For (byte b: bytes) {stringBuilder.append (huffmanCodes.get (b));} / will "101010001011111111001000101111...." Convert to byte [] / return byte [] huffmanCodeBytes length int len; if (stringBuilder.length ()% 8 = = 0) {len = stringBuilder.length () / 8;} else {len = stringBuilder.length () / 8 + 1 } / / create byte [] array byte [] huffmanCodeBytes = new byte [len]; int index = 0; for (int I = 0; I) after storage compression
< stringBuilder.length(); i += 8) { String strByte; if (i + 8 >StringBuilder.length () {strByte = stringBuilder.substring (I);} else {strByte = stringBuilder.substring (I, I + 8);} / / convert strByte into a byte and put it into huffmanCodeBytes huffmanCodeBytes [index] = (byte) Integer.parseInt (strByte, 2); index++;} return huffmanCodeBytes } / / generate Huffman coding table corresponding to Huffman tree / / idea: / / 1. The Huffman coding table is stored in Map in the form of 32-> 01 < 97-> 100. Static Map huffmanCodes = new HashMap () / / 2. When generating the Huffman coding table, you need to splice the path and define a path where StringBuilder stores a leaf node static StringBuilder stringBuilder = new StringBuilder () / * get the Huffman coding of all the leaves of the incoming node node, and put it into the huffmanCodes collection * * @ param node incoming node * @ param code path: the left child node is 0 The right child node is 1 * @ param stringBuilder for splicing path * / public static void getCodes (Node node, String code, StringBuilder stringBuilder) {StringBuilder stringBuilder2 = new StringBuilder (stringBuilder) StringBuilder2.append (code); if (node! = null) {/ / determine whether the current node is a leaf node or a non-leaf node if (node.data = = null) {/ / non-leaf node / / left recursive processing getCodes (node.left, "0", stringBuilder2) / / right recursive processing getCodes (node.right, "1", stringBuilder2);} else {/ / leaf node huffmanCodes.put (node.data, stringBuilder2.toString ()) } / / preorder traversal public static void preOrder (Node root) {if (root! = null) {root.preOrder ();} else {System.out.println ("Huffman tree cannot be empty ~") }} / * convert byte array to node collection * * @ param bytes byte array * @ return * / public static List getNodes (byte [] bytes) {ArrayList nodes = new ArrayList (); / / store the number of occurrences per byte Map counts = new HashMap () For (byte b: bytes) {counts.merge (b, 1, (a, b1)-> a + b1);} / / convert each key-value pair into a node object and add it to the nodes collection counts.forEach ((k, v)-> nodes.add (new Node (k, v)); return nodes } / * * generate Huffman tree * @ param nodes incoming node * @ return * / public static Node createHufffmanTree (List nodes) {while (nodes.size () > 1) {/ / sort, from small to large Collections.sort (nodes) / / (1) take out the node with the lowest weight (binary tree) Node leftNode = nodes.get (0); / / (2) take out the node with the second smallest weight (binary tree) Node rightNode = nodes.get (1); / / (3) build a new binary tree Node parent = new Node (null, leftNode.weight + rightNode.weight) Parent.left = leftNode; parent.right = rightNode; / / (4) delete the processed binary tree nodes.remove (leftNode) from ArrayList; nodes.remove (rightNode); / / (5) add parent to nodes nodes.add (parent) The last one of the / / nodes is the root node return nodes.get (0) of the Huffman tree;} / / create a Node with data and weights class Node implements Comparable {/ / store the data itself, such as' axiom = >'97 'Byte data;' = >'32 'Byte data; / / weight, indicating the number of character occurrences int weight; Node left; Node right Public Node (Byte data, int weight) {this.data = data; this.weight = weight;} public void preOrder () {System.out.println (this); if (this.left! = null) {this.left.preOrder ();} if (this.right! = null) {this.right.preOrder () } @ Override public int compareTo (Node o) {/ / sort return this.weight-o.weight;} @ Override public String toString () {return "Node {" + "data=" + data + ", weight=" + weight +'}' }} points for attention of Huffman compressed file
If the file itself has been compressed, then the use of Huffman coding re-compression efficiency will not change significantly, such as video, ppt and other files.
Huffman coding is processed in bytes, so it can handle all files (binary files, text files).
If the contents of a file, there is not much duplicate data, the compression effect will not be obvious.
This is the end of "what is Huffman Code". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.