In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-14 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > IT Information >
Share
Shulou(Shulou.com)11/24 Report--
Who would have thought that the "capriciousness" of a student who did not want to take an exam later affected the entire Internet?
In an information theory class in MIT 70 years ago, a teacher put forward a multiple choice question in order to "reduce stress" for his students.
Either take the final exam, or write a paper to improve the existing algorithm, choose your own.
What the teacher, Robert Fano, did not tell his students is that this "existing algorithm" is the Shannon-Fano code, which he co-authored with Shannon, the founder of information theory. In order to improve the algorithm, he has invested a lot of time in research.
(OS: what a surprise.) )
It's a little damaging, but it really works. As soon as they listened to the paper, they didn't have to take an exam and decided to write a paper, including David Huffman.
If you don't choose and don't know, you'll be shocked when you choose. The fledgling Huffman soon realized that the teacher had dug the hole-the paper was too difficult.
After months of writing, and struggling, Huffman still got nothing.
But fate, sometimes it is very wonderful. Just when Huffman finally gave up "skipping the exam" and was ready to throw his paper notes into the dustbin, there was a sudden inspiration! Here comes the answer!
Huffman abandoned the existing research on coding, turned to a new exploration, and finally found a method based on ordered frequency binary tree coding.
The efficiency of his idea successfully surpassed that of his teacher's methodology. Even in the later development, the coding method named after him, Huffman coding, directly changed the data compression paradigm.
As for the concluding report at that time, it has been quoted nearly ten thousand times.
Inefficient traditional coding methods in 1951, Robert Fano, who was teaching at MIT, was thinking about an information theory puzzle:
How to efficiently represent numbers, letters, or other symbols in binary code?
The most common and direct method at that time was to assign a unique binary number to each character.
For example, the letter A may denote 01000001pm! Expressed as 00100001, each eight-digit number corresponds to a character.
This makes the code easy to parse, but extremely inefficient.
There is also an optimization method, similar to Morse code. The commonly used letter E is represented by only one dot, but the uncommon Q requires a longer and more laborious "-".
In this way, the length of the code is different, the information is not easy to understand, and the gap between characters needs to be added in the transmission, otherwise it is impossible to distinguish different character combinations.
Fano realized that perhaps the advantages of these two methods could be combined-representing characters in binary codes of different lengths. Furthermore, in order to avoid code "overlap", he also built a binary tree.
He thoroughly tested the possibility of each arrangement for maximum efficiency, and finally came up with an effective situation:
Each message is divided into two branches according to frequency, and the letters on both sides are used as often as possible.
In this way, the commonly used characters will be on shorter, less dense branches.
In 1948, Shannon, the father of information theory, put forward this method in his article "Mathematical Theory of Communication", which introduced information theory; soon after, Fano published it independently in the form of a technical report. Therefore, this method is called Shannon-Fano coding.
But this method does not always work. When the probability of occurrence of letters is respectively {0.35, 0.17, 0.17, 0.16, 0.15}, the ideal code cannot be given.
Fano believes that there must be a better compression strategy. As a result, such an important task was left to his students.
In a flash of inspiration, a century paper said that Professor Fano's method was to build a character tree from top to bottom and to keep the pairs of branches as symmetrical as possible.
So Huffman's approach directly subverts this process-- building a binary tree from the bottom up.
He believes that no matter what happens, in a valid piece of code, the two least common characters should have the two longest codes.
So first identify the two least common characters, combine them as a branch pair, then repeat the process, and then find the least common character (pair) from the remaining character pairs with the one you just built.
Take schoolroom as an example, O appears four times, S, C, H, L, R, M appear once each.
Fano's approach is to first assign O and another letter to the left branch, so that there are five total usage on both sides, resulting in a total of 27 bits of code.
By contrast, Huffman's method, such as starting with the usual r and m, combines them into a letter pair.
After the combination, the existing characters (pairs) include O (4 times), RM (2 times), and a single letter S, C, H, and L.
Divide by frequency of occurrence, repeat the previous operation-group two unusual options, and then update the number tree and frequency graph.
Eventually, "schoolroom" becomes 11101111110000110110000101, one bit less than the Fano top-down approach.
Although 1 bit is not much here, it is a big savings when expanded to billions of bytes.
In fact, Huffman's method has proved to be so powerful that, according to Google academic statistics, papers have been cited 9570 times that year.
As for his teacher's method, it has hardly been used again.
Until today, almost all lossless compression methods use Huffman's method, which can compress images, audio, tables and so on. It supports everything from the PNG image standard to the ubiquitous software PKZip.
Gartner, a pioneer in modern computer science and winner of the Turing Award, once described Huffman's achievements as follows:
Huffman coding is the basic idea that people have been using in the field of computer science and data communication.
Later, Huffman recalled that "flash of inspiration" moment, when he was about to throw his paper notes into the dustbin when suddenly his thoughts converged and the answer came to mind:
It was the strangest moment of my life.
It suddenly dawned on me like lightning.
He said that if he had known that his professor Fano had struggled with the problem, he might never have tried to solve it, let alone at the age of 25.
Achievement and sense of order, using mathematical art Huffman coding to change the data compression paradigm, but also won it many honors and medals.
For example, Huffman received the Golden Jubilee Award for technological Innovation from the IEEE Information Theory Society in 1998 and the Richard hamming Medal (Richard Hamming Medal) from the Institute of Electrical and Electronic Engineers (IEEE) in 1999.
But even so, in his life, compared to the invention of lossless compression, he is most proud of this doctoral thesis.
Title: The Synthesis of Sequential Switching Circuits.
Huffman published this important paper on sequential switching circuits while studying for his doctoral degree at MIT. At that time, Huffman was almost the first scholar to explain how to design asynchronous sequential switching circuits, and this theory later provided an important logical support for the development of computers.
The publication of this paper not only helped him get Louis E. Levy Medal of the Franklin Institute, but also granted him a detention qualification to teach courses on switching circuits.
While in school, Huffman also proposed an innovative mathematical formula that could convert one binary number sequence into another without losing any information. This research played an important role in cryptography at that time and got him an important position.
William O. Baker, then vice president of research at Bell Labs, included his recruitment on a review committee that reviewed future technology plans for the National Security Agency. Dr. Baker has served as a scientific adviser to presidents Eisenhower, Kennedy, Johnson, Nixon and Reagan.
Hoffman, who was a full professor in 1967, chose to leave MIT to join the University of California, Santa Cruz (UCSC). During this period, he led the establishment of the Department of computer Science and participated in the development of academic courses, which laid an important foundation for the development of the Department of computer Science.
Mathematics can be said to be one of Huffman's lifelong pursuits, so that later, when he was engaged in art, he could not do without mathematics.
Since the 1970s, Huffman developed a strong interest in origami, studied mathematics and origami art, made hundreds of origami works, and published papers to analyze the mathematical properties of origami, becoming a pioneer in the field of origami mathematics.
In retrospect, Huffman won numerous honors and awards in his life, but never applied for a patent for any of his inventions.
Finally, to borrow a passage from Huffman himself.
As a scientist and teacher, I am really very persistent. If I feel that I have not found the simplest solution to the problem, I will be very dissatisfied, and this dissatisfaction will continue until I find the best way. To me, this is the essence of a scientist.
Reference link:
[1] https://www.quantamagazine.org/how-lossless-data-compression-works-20230531
[2] https://www.huffmancoding.com/my-uncle/scientific-american
[3] https://www.nytimes.com/1999/10/13/us/d-a-huffman-computer-expert-dies-at-74.html
This article comes from the official account of Wechat: quantum bit (ID:QbitAI), author: Yang Jingshang
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.