Why is the IDEA output garbled?

2025-07-04 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains why IDEA output can come out garbled. It walks through what encoding and decoding are, how character sets and character encodings relate to each other, and why Unicode exists.

Preface

I believe everyone has run into garbled text. Today my girlfriend came to me in a hurry: "Honey, why is my IDEA output garbled?"

I fixed it for her in no time, but Sanjie is my girlfriend, and she is as curious as I am: she wants to get to the bottom of everything.

So why does garbled text appear?

What is encoding, and what is decoding?

What is a character code, and what is a character set?

Why Unicode? What is the difference between UTF-8 and GBK?

Sitting on my lap, Sanjie coaxed me with this string of questions. I dote on my fans, but I spoil my girlfriend even more, so I wrote this article.

Why is there garbled text?

We know that a computer only ever stores a stream of 0s and 1s. Numbers alone do not meet our needs, though: we also need to process text. Since the computer only understands numbers, we have to tell it which number stands for which character.

For example, suppose I decide that 0000 stands for A and 0001 stands for B. To store these two characters, the computer actually stores 0000 0001. In effect, I have assigned a unique code to each character.

But that is only my convention, and different people have different ideas. Say Xiaoming prefers 1000 for A and 1111 for B. His computer stores the text according to his scheme, as 1000 1111, and then sends it to my computer. I receive 1000 1111, and under my code table that might be %&, which is garbled.
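The scenario above can be sketched in a few lines of Python. Both code tables are hypothetical, and the assumption that 1000 and 1111 mean % and & in my table is only there to reproduce the "%&" outcome from the text.

```python
# Two hypothetical, hand-made code tables (4-bit codes, as in the text).
MY_TABLE = {"A": "0000", "B": "0001", "%": "1000", "&": "1111"}
XIAOMING_TABLE = {"A": "1000", "B": "1111"}


def encode(text, table):
    """Turn characters into a space-separated bit string using `table`."""
    return " ".join(table[ch] for ch in text)


def decode(bits, table):
    """Turn a bit string back into characters using `table`."""
    reverse = {code: ch for ch, code in table.items()}
    # Codes the table does not know are shown as "?".
    return "".join(reverse.get(code, "?") for code in bits.split())


sent = encode("AB", XIAOMING_TABLE)    # what Xiaoming's computer stores
print(sent)                             # 1000 1111
print(decode(sent, XIAOMING_TABLE))     # AB
print(decode(sent, MY_TABLE))           # %&
```

The bits never change on the way; only the table used to read them does.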

So the essence of garbled text is that the encoding and the decoding do not match.

Some readers may not be familiar with the concepts of encoding and decoding, so let me explain:

Encoding: the process of converting characters into a byte stream according to some set of rules.

Decoding: the reverse process of parsing a byte stream back into characters.
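As a minimal sketch of these two definitions with real encodings, the snippet below encodes a sample string as UTF-8 and then decodes it twice, once with the matching rules and once with GBK. The sample text and variable names are chosen only for illustration.

```python
text = "你好"

# Encoding: characters -> bytes, following the UTF-8 rules.
data = text.encode("utf-8")

# Decoding with the matching rules recovers the original text.
same = data.decode("utf-8")

# Decoding the same bytes with a different rule set (GBK here) does not.
# errors="replace" keeps the sketch from raising if a byte pair is invalid.
garbled = data.decode("gbk", errors="replace")

print(data.hex(" "))
print(same)
print(garbled)
```

The last line prints unrelated characters even though the bytes are intact, which is exactly the mismatch described earlier.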

As you can see, if everyone invents their own code, computers cannot parse each other's data correctly. So there needs to be a standard under which everyone agrees on the correspondence between characters and numbers.

Standard character encodings

The American National Standards Institute (ANSI) developed such a standard, the American Standard Code for Information Interchange (ASCII), which specifies a set of commonly used characters and the number assigned to each. For example, 65 represents A.

ASCII is actually a 7-bit code, running from 0000000 to 1111111 in binary, but since a byte has 8 bits, each character is usually stored in 8 bits. ASCII can therefore represent 128 characters, and it is very much an American code: even for the UK, which is also English-speaking, ASCII has no pound sign.
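These ASCII facts are easy to check with Python's built-in codecs. This is just a small sketch; the variable names are illustrative.

```python
# The character code of "A" in ASCII is 65, which fits in 7 bits.
code = ord("A")
print(code, format(code, "07b"))   # 65 1000001

# An ASCII character is stored as a single byte.
print("A".encode("ascii"))         # b'A'

# The pound sign is not one of the 128 ASCII characters.
try:
    "\u00a3".encode("ascii")       # U+00A3 is the pound sign
    pound_in_ascii = True
except UnicodeEncodeError:
    pound_in_ascii = False
print(pound_in_ascii)              # False
```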

Then there are other languages such as Korean and Japanese, not to mention Chinese.

One byte can represent at most 256 characters, which is nowhere near enough for us, so the code had to be extended. GB2312, for example, is the "Chinese character coded character set for information interchange" issued by China's State Administration of Standards. GBK was released later; the K stands for "extension" (kuozhan in pinyin), and it adds many characters, such as traditional characters, on top of GB2312.
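The relationship between GB2312 and GBK can be sketched with Python's built-in "gb2312" and "gbk" codecs. The two sample characters are picked only for illustration.

```python
simplified = "\u4e2d"     # a common simplified character (zhong, "middle")
traditional = "\u9f8d"    # a traditional character (long, "dragon")

# One character no longer fits in one byte: GB2312 uses two bytes for it.
print(len(simplified.encode("gb2312")))    # 2

# GB2312 does not include this traditional character ...
try:
    traditional.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False
print(in_gb2312)                            # False

# ... but the extended GBK set does, and GBK keeps GB2312's codes unchanged.
print(len(traditional.encode("gbk")))       # 2
print(simplified.encode("gbk") == simplified.encode("gb2312"))  # True
```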

In other words, each country had its own standard because the languages differ, and the differences between these character sets made exchanging documents between computers very difficult. So another round of standardization began.

For example, there is the so-called ANSI encoding, named after ANSI in the United States, which in practice just means the platform's default encoding. A Chinese operating system uses GBK, for instance, while a US one uses an ASCII-based encoding, and the operating system ships with these standard character sets preinstalled.
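To see which default a given machine uses, something like the following sketch can help. The helper name `platform_encodings` is made up for this example, and the values it prints depend entirely on the operating system and its settings. When such a default differs from the encoding of the bytes a program actually writes, garbled console output is a typical result.

```python
import locale
import sys


def platform_encodings():
    """Collect the encodings this machine uses by default."""
    return {
        # Default text encoding suggested by the OS locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding used when printing to the console, if known.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = platform_encodings()
for name, encoding in info.items():
    print(name, "->", encoding)
```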

However, this only handles the case where a document uses a single character encoding. What if my document contains Japanese, French, German, Russian and Chinese all at once?

Unicode

So Unicode came along, also known as the unified code, universal code, or single code.

The Unicode character set aims to cover all the characters humans currently use, giving each character a uniform number, that is, a unique character code. Someone has to do this kind of work, otherwise nothing would ever be unified.

Here are a few terms I would like to explain to make things clearer.

Character: a single letter of the English alphabet, for example, or a single Chinese character.

Character set: a collection of characters together with the number assigned to each.

Character code: the number that corresponds to a character in a character set. For example, in the ASCII character set, the character code of A is 65.

Character encoding: the rules for converting characters into a byte stream, based on the character-to-number mapping of a character set.
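The four terms can be lined up on a single character. This is a small sketch using Python's built-ins. Note that `ord` returns the character code in the Unicode character set, while the GBK bytes come from GBK's own numbering.

```python
ch = "\u4e2d"                  # a character (zhong, "middle")

code = ord(ch)                 # its character code in the Unicode character set
print(hex(code))               # 0x4e2d

# Character encodings: two different ways to turn the character into bytes.
utf8_bytes = ch.encode("utf-8")
gbk_bytes = ch.encode("gbk")
print(utf8_bytes.hex(" "))     # e4 b8 ad
print(gbk_bytes.hex(" "))      # d6 d0
```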

One way Unicode differs from the earlier encodings is that it decouples the character set from the encoding implementation.

With earlier encodings such as ASCII and GBK, the character set and the encoding implementation are bound together. You can think of such an encoding as a lookup table: a fixed table stores each character with its fixed binary form. For example, the number for A is 65, and its binary sequence is 01000001.

Unicode is different: it separates the character set from the character encoding implementation. The number for A is still 65, but the binary sequence is not fixed; it depends on the specific character encoding. In UTF-8 it is 01000001, and in UTF-16 (big-endian) it is 00000000 01000001.
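The same comparison can be reproduced directly. In this sketch, `bits` is a helper written only for this example.

```python
def bits(data):
    """Render bytes as groups of 8 binary digits."""
    return " ".join(format(b, "08b") for b in data)


print(ord("A"))                        # 65
print(bits("A".encode("utf-8")))       # 01000001
print(bits("A".encode("utf-16-be")))   # 00000000 01000001
```

The character code stays 65 in both cases; only the bytes produced by the chosen encoding differ.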

This is in fact why UTF-8 is now more commonly used than UTF-16. UTF-16 is less storage-efficient for this kind of text because it uses at least two bytes per character, and many functions, such as those in C, treat a 0x00 byte as the end of a string. Hence UTF-8, a variable-length encoding that uses 1 to 4 bytes per character. I won't go into how the encoding works here; it is easy to look up.
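A quick way to see both points, the variable length and the 0x00 bytes, is to compare byte counts for a few characters. The sample characters below are chosen only for illustration.

```python
# "A", e with acute accent, a Chinese character, and an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

# UTF-8 is variable-length: 1 to 4 bytes depending on the character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)                              # [1, 2, 3, 4]

# UTF-16 never uses fewer than 2 bytes per character.
utf16_lengths = [len(ch.encode("utf-16-be")) for ch in samples]
print(utf16_lengths)                             # [2, 2, 2, 4]

# Plain English text in UTF-16 is full of 0x00 bytes; in UTF-8 it has none.
print(b"\x00" in "idea".encode("utf-16-be"))     # True
print(b"\x00" in "idea".encode("utf-8"))         # False
```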
