How to understand ANSI encoding on Linux

Updated 2025-04-16. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains what "ANSI encoding" really is: why a file saved as ANSI displays correctly on one system and as garbled text on another, and how the same problem shows up on Linux.

Create a text file test.txt with Notepad++. The default encoding is shown as ANSI (easy to misread as ASCII at first glance), and the Chinese text you type, "汉字" ("Chinese characters"), displays without any garbling.

Save it as test.txt and send it to Bob, a colleague in the United States. He also uses Notepad++, but on his machine your file shows up as garbled text.

You might think: I use a Chinese system, so Chinese displays normally; he uses an English system, so it cannot display Chinese.

That sounds reasonable.

But think again: when a system shows garbled text, it means the system does not support the encoding (or decoded it with the wrong one). Does the English system not support ANSI? Is ANSI a Chinese encoding?

If you have a Korean system at hand, install Notepad++ on it as well. The default encoding is again ANSI, and if you type "한국어" ("Korean"), it displays normally.

But if you type "汉字" there, you may find that it is garbled.

This counterexample shows that ANSI is not a Chinese encoding. So what exactly is ANSI?

Open the test.txt file containing "汉字" with a hexadecimal editor.

You will find the bytes baba and d7d6, which are exactly the GBK code values of the characters "汉" and "字".

Similarly, open the test.txt file containing "한국어" with a hexadecimal editor.

You will find the bytes c7d1, b1b9, and beee, which are exactly the EUC-KR code values of "한", "국", and "어".
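The two hex dumps can be reproduced without a hex editor. The following is a minimal Python sketch, assuming the two test strings are 汉字 and 한국어 (which is what the byte values above decode to) and using Python's built-in `gbk` and `euc_kr` codecs. The helper name `hex_dump` is made up for illustration.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Encode `text` and return its bytes as hex, grouped two bytes at a time."""
    data = text.encode(encoding)
    return " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))


print(hex_dump("汉字", "gbk"))       # baba d7d6
print(hex_dump("한국어", "euc_kr"))  # c7d1 b1b9 beee
```

Each character occupies two bytes in its own national encoding, which is why the dumps group naturally into pairs.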

From this we can see that ANSI is not one specific character encoding: on different systems, "ANSI" stands for different encodings. On your American colleague Bob's system, the ANSI encoding is effectively ASCII (strictly speaking Windows-1252, a single-byte superset of ASCII), which cannot represent Chinese characters, so they come out garbled. On your system, where "汉字" displays normally, the ANSI encoding is actually GBK. On the Korean system, where "한국어" displays normally, the ANSI encoding is actually EUC-KR.
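To see how one file turns into different text on different systems, the sketch below decodes the four bytes of the Chinese test.txt with several candidate encodings. It assumes Python's codec names `gbk`, `euc_kr`, `cp1252`, and `ascii`; `cp1252` (Windows-1252) is included because it is the ANSI code page on US English Windows. `decode_or_report` is a hypothetical helper, not part of any library.

```python
RAW = bytes.fromhex("babad7d6")  # the bytes of test.txt as saved on the Chinese system


def decode_or_report(data: bytes, encoding: str) -> str:
    """Decode `data`, or return a short note if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<cannot decode: {exc.reason}>"


for enc in ("gbk", "euc_kr", "cp1252", "ascii"):
    print(f"{enc:>7}: {decode_or_report(RAW, enc)}")
```

Only the `gbk` line prints 汉字. Strict ASCII rejects the bytes outright, and the other encodings either fail or produce unrelated characters, which is the garbling Bob sees.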

The story goes that computers were invented in the United States, and the Americans figured that one byte (which can represent 256 codes) was more than enough for all the letters, digits, and common special symbols of the English-speaking world (in fact, ASCII uses only the first 128 codes, 0 through 127).

Later the Europeans objected. The French said: I need accent marks on my lowercase letters (e.g., é). The Germans said: I want a few more letters too (ä, ö, ü, ß). So the Europeans used the codes ASCII had left unused (128-255) for their own symbols (later called the "extended character set").
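The split between the ASCII range and the extended range is easy to check. This is a small sketch that assumes Latin-1 (ISO 8859-1) as one example of such an extended single-byte set; the function name `fits_in_ascii` is made up for illustration.

```python
def fits_in_ascii(ch: str) -> bool:
    """True if the character's code point is inside ASCII's 0-127 range."""
    return ord(ch) < 128


for ch in "Aéü":
    code = ch.encode("latin-1")[0]  # the single byte used by Latin-1
    kind = "ASCII (0-127)" if fits_in_ascii(ch) else "extended (128-255)"
    print(ch, code, kind)
```

The letter A sits in the ASCII half at 65, while é and ü land in the upper half at 233 and 252.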

By the time we Chinese started using computers, 256 codes were nowhere near enough. China has well over 10,000 Chinese characters, and even primary school students have to master two or three thousand of them. The national standard (GB) committee finally decided: if one byte is not enough, we will use multiple bytes per character. But times were hard and bytes were expensive; three bytes were unaffordable, so two bytes it was. Encode the few thousand most common characters first, and extend the scheme later when the country is strong and the people are rich. And so GB2312 came into being.

Our compatriots in Taiwan saw that it was all simplified characters, with no room for traditional ones, so they came up with their own encoding for traditional characters: Big5 (Big-5). At the same time, other countries were encoding their own scripts. In the end it was Microsoft that suffered: the customer is always right, so every one of these encodings had to be supported.
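The difference in coverage between these encodings can be probed directly. The sketch below, assuming Python's built-in `gb2312`, `gbk`, and `big5` codecs, checks whether the simplified character 汉 and its traditional form 漢 can be encoded in each; `can_encode` is a hypothetical helper.

```python
def can_encode(ch: str, encoding: str) -> bool:
    """True if `ch` has a code value in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in ("汉", "漢"):
    support = {enc: can_encode(ch, enc) for enc in ("gb2312", "gbk", "big5")}
    print(ch, support)

print(len("汉".encode("gb2312")))  # 2 bytes per character
```

GB2312 covers the simplified form only, Big5 the traditional form only, and the later GBK extension covers both.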

So the system sold in the United States defaults to ASCII, the system sold in China defaults to GBK, and the system sold in South Korea defaults to EUC-KR. But to keep you from thinking that the systems sold to different customers behave differently, the default encoding is displayed as "ANSI" everywhere. This story is pure fiction, but "ANSI encoding" really does exist only on Windows.

So how does Windows tell which real encoding is behind ANSI?

Microsoft uses a value called the "Windows code page" to determine the system's default encoding (the current code page can be checked by running the chcp command at the command line). For example, the code page for Simplified Chinese is 936 (GBK; before Windows 95 it was GB2312; for details see: Microsoft Windows' Code Page 936), and the code page for Traditional Chinese is 950 (Big-5).
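The code page numbers can also be read programmatically. The sketch below calls the Win32 functions `GetACP` and `GetOEMCP` through `ctypes`; it only does real work on Windows and returns `None` elsewhere. Note that chcp reports the console's code page, which equals the ANSI code page on a Simplified Chinese system (both 936) but not on a US English one (ANSI 1252, console 437). The function name `windows_code_pages` is made up for illustration.

```python
import sys


def windows_code_pages():
    """Return the ANSI and OEM code page numbers on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here so the module loads cleanly on other platforms

    kernel32 = ctypes.windll.kernel32
    return {
        "ansi": kernel32.GetACP(),    # the encoding editors label "ANSI"
        "oem": kernel32.GetOEMCP(),   # the default for console programs
    }


print(windows_code_pages())
```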

Can we change the "ANSI encoding" by changing the Windows code page?

At the command prompt, the chcp command changes the active code page of the current terminal. For example: (1) running chcp 437 switches the code page to 437, so the terminal's default encoding becomes ASCII (strictly, the original IBM PC character set, a superset of ASCII), and Chinese characters turn into garbled text; (2) running chcp 936 switches the code page to 936, so the terminal's default encoding becomes GBK and Chinese characters display normally. These changes affect only the current terminal and do not change the system's default "ANSI encoding". (To change the default code page of the command line, see: how to set the codepage of cmd.)

The code page on Windows is set according to the current system locale. To change the system's default "ANSI encoding", change the system locale ("Control Panel" => "Clock, Language, and Region" => "Region and Language" => "Administrative" => "Change system locale...").

When the system locale is Chinese (Simplified), the current "ANSI encoding" is actually GBK. When you change it to Korean (Korea), the "ANSI encoding" is actually EUC-KR, and "한국어" displays normally. When you change it to English (US), the "ANSI encoding" is effectively ASCII, and both "汉字" and "한국어" are garbled. (The system needs to be rebooted after the change.)

Note: locale is an important concept in internationalization and localization, which is not explained in depth in this article.

Everything above is about Windows, isn't it? What about Linux?

Copy the test.txt file containing "汉字" to Linux and open it with Emacs.

It is garbled here too! The cause is once again the locale.
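One way to check which encoding the current locale implies is sketched below. It uses only the standard `locale` and `os` modules; the function name `locale_summary` is made up for illustration, and the values it prints depend entirely on the machine it runs on.

```python
import locale
import os


def locale_summary() -> dict:
    """Collect the locale-related environment variables and the implied encoding."""
    return {
        "env": {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")},
        "preferred_encoding": locale.getpreferredencoding(False),
    }


print(locale_summary())
```

If the reported encoding is not GBK (on most current Linux systems it is UTF-8), a program that trusts the locale will misread a GBK file.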

Change the locale to one whose encoding matches the file before opening it, and the text shows up normally.
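Instead of changing the locale, a program can also name the file's encoding explicitly. The following is a minimal sketch under the assumption that the Linux locale is UTF-8 and the file was saved as GBK on Windows; it writes a stand-in test.txt to a temporary directory, and `read_text_as` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def read_text_as(path: Path, encoding: str) -> str:
    """Read the raw bytes of `path` and decode them with an explicit encoding."""
    return path.read_bytes().decode(encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.txt"
    path.write_bytes(bytes.fromhex("babad7d6"))  # stand-in for the file copied from Windows

    try:
        read_text_as(path, "utf-8")
    except UnicodeDecodeError:
        print("utf-8: the bytes are not valid UTF-8")

    text = read_text_as(path, "gbk")
    print("gbk:", text)

    # Re-save as UTF-8 so programs that follow a UTF-8 locale can read the file.
    path.write_text(text, encoding="utf-8")
    utf8_bytes = path.read_bytes()
```

Converting the file once to UTF-8 is often more practical than switching locales, because the result no longer depends on any system's "ANSI" setting.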

