How to fix garbled UTF-8 documents in the Linux editor Vim

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to fix garbled UTF-8 documents in the Linux editor Vim. Many people run into this problem in daily work, so the editor consulted a range of sources and put together a simple, practical approach that should help clear up the confusion. Follow along below.

1. Relevant background

In Vim, four options relate to character encoding: fileencodings, fileencoding, encoding, and termencoding. In practice, a wrong value in any one of them can produce garbled text, so every Vim user should understand what these four options mean. The sections below describe the meaning and role of each.
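Before changing anything, it can help to look at the values currently in effect. A minimal sketch, typed at Vim's command line; the commands are standard, but what they print depends on your own configuration:

```vim
" Show the current value of all four options at once
:set encoding? termencoding? fileencoding? fileencodings?

" Show which file last set fileencodings
" (useful when a plugin or a system-wide vimrc overrides your own setting)
:verbose set fileencodings?
```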

(1) encoding

encoding is the character encoding Vim uses internally. Once it is set, all of Vim's buffers, registers, strings in scripts, and so on use this encoding. When Vim works with text in a different encoding, it first converts that text to the internal encoding. If the text contains characters that cannot be represented in the internal encoding, those characters are lost. So choose an internal encoding expressive enough to represent everything you edit, or normal work will be affected.

Because encoding determines the internal representation of every character in Vim, it should be set only once, when Vim starts. Changing encoding while Vim is running can cause many problems. The user manual recommends changing it only in .vimrc, and in practice that is the only place where doing so makes sense. Unless you have a specific reason not to, always set encoding to utf-8. To avoid garbled menus and system messages on non-UTF-8 systems such as Windows, you can add these settings at the same time:

set encoding=utf-8

set langmenu=zh_CN.UTF-8

language messages zh_CN.UTF-8
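The same settings can be wrapped in a guard so that the menu and message language are only changed where needed. This is a sketch of a .vimrc fragment, not a definitive configuration: the has('win32') condition is an assumption about where the problem appears, and the zh_CN.UTF-8 message locale has to be available on the system.

```vim
" Sketch of a .vimrc fragment. The has('win32') guard is an assumption;
" adjust it to whichever systems show garbled menus for you.
set encoding=utf-8
" Tell Vim that the rest of this script is written in UTF-8
scriptencoding utf-8

if has('win32')
  " langmenu should be set before the menus are loaded
  set langmenu=zh_CN.UTF-8
  " silent! hides the error if this message locale is not installed
  silent! language messages zh_CN.UTF-8
endif
```

If the menus were already loaded before langmenu was set, a commonly used workaround is to source $VIMRUNTIME/delmenu.vim followed by $VIMRUNTIME/menu.vim so that they are rebuilt.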

(2) termencoding

termencoding is the encoding Vim uses for screen output. When displaying text, Vim converts it from the internal encoding to the terminal encoding before writing it out. If the text contains a character that cannot be converted to the terminal encoding, that character is shown as a question mark, but editing it is unaffected. If termencoding is not set, Vim uses encoding directly, with no conversion.

For example, if you log in to a Linux workstation over telnet from Windows, text in Vim will appear garbled, because the Windows telnet client uses GBK while Linux uses UTF-8. There are two ways to fix this: change Vim's encoding to gbk, or keep encoding as utf-8, set termencoding to gbk, and let Vim convert at display time. With the first approach, any characters in the file that GBK cannot represent are lost. With the second approach, those characters cannot be displayed because of the terminal's limits, but they are not lost during editing.
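The second approach can be sketched as a .vimrc fragment like the one below. It assumes the terminal really is using GBK, as in the telnet example; on a UTF-8 terminal these lines would themselves cause garbled output.

```vim
" Keep UTF-8 as the internal encoding so no characters are lost while editing
set encoding=utf-8
" Convert to GBK only when drawing on the terminal (assumes a GBK terminal)
set termencoding=gbk
```

If the terminal encoding is not known in advance, a common variant copies the startup value of encoding into termencoding before switching encoding to utf-8.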

GVim does not draw through a terminal, so termencoding has no meaning for it. In GVim with GTK2, termencoding is always utf-8 and cannot be changed, and GVim on Windows ignores termencoding altogether.

(3) fileencoding

When Vim reads a file from disk, it probes the file's encoding. If that encoding differs from Vim's internal encoding, Vim converts the text, and once the conversion is complete it sets fileencoding to the detected encoding. When Vim saves the file, it converts again if encoding and fileencoding differ. So by opening a file, changing fileencoding, and saving, we can convert a file from one encoding to another. Note, however, that fileencoding is set automatically by detection when the file is opened. If the text is already garbled, the wrong decoding has already happened, so resetting fileencoding after opening the file cannot correct it.

In short, fileencoding is the character encoding of the file currently being edited, and Vim saves the file in this encoding, whether or not the file is new.
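The conversion described above can be sketched as a short command sequence. The file name myfile.txt and the target encoding gbk are placeholders chosen for illustration.

```vim
" Open the file; Vim detects its encoding while reading
:e myfile.txt
" Check what Vim detected
:set fileencoding?
" Choose the target encoding for the file on disk
:set fileencoding=gbk
" Write the file; Vim converts from the internal encoding while saving
:w
```

If the buffer contains characters that the target encoding cannot represent, expect the write to fail with a conversion error rather than silently produce a damaged file.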

(4) fileencodings

Automatic detection of the encoding is controlled by fileencodings (note the plural). fileencodings is a comma-separated list in which each item is an encoding name. When a file is opened, Vim tries the encodings in the list in order. If decoding with one of them succeeds, Vim uses it and sets fileencoding to that value; if it fails, Vim moves on to the next one.

So when setting fileencodings, put the strict encodings (those likely to fail when the file is in some other encoding) at the front and the loose encodings at the back. For example, latin1 is extremely loose: text in any encoding decodes as latin1 without error, although the result is of course garbled. If latin1 comes first in fileencodings, every Chinese file will naturally appear garbled.

The following is a fileencodings setting recommended online:

set fileencodings=ucs-bom,utf-8,cp936,gb18030,big5,euc-jp,euc-kr,latin1

Of these, ucs-bom is very strict, and a file in some other encoding is almost never misjudged as ucs-bom, so it goes first.

utf-8 is also quite strict. Apart from very short files (the often-cited classic is the GBK-encoded word "Unicom" being misjudged as UTF-8), ordinary files are almost never misjudged in practice, so it goes second.

Next come cp936 and gb18030. Both are relatively loose, and placing them near the front would cause many misjudgments, so they go further back. cp936 has a smaller code space than gb18030, so cp936 comes before gb18030.

As for big5, euc-jp, and euc-kr, they are about as strict as cp936. Placing them after it inevitably causes frequent misjudgments when editing files in those encodings, but this is something Vim's built-in detection cannot solve. Since Chinese users rarely edit such files, we put cp936 and gb18030 first to make sure those encodings are identified.

Finally, there is latin1. It is so loose that it has to go last. Unfortunately, a genuine latin1 file usually never gets the chance to fall back to latin1, because one of the earlier encodings accepts it first. As noted above, though, Chinese users seldom encounter such files.
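Since misjudgments of this kind are silent, it can help to keep the detected encoding visible. The fragment below is one possible sketch for .vimrc, not part of the recommended setting above; the exact status line layout is an arbitrary choice.

```vim
" Always show the status line
set laststatus=2
" Show the file name followed by the encoding Vim detected for this buffer
set statusline=%f\ [%{&fileencoding}]
```

With this in place, a latin1 file that was picked up as cp936 shows cp936 in the status line, which makes the wrong guess easy to spot.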

If the encoding is misjudged, the decoded result is unreadable, which is what we call garbled text. If you know the file's correct encoding, you can specify it with ++enc=encoding when opening the file, for example:

:e ++enc=utf-8 myfile.txt
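The same option also works on a file that is already open and garbled. A minimal sketch, where cp936 is only an example value and should be replaced by the encoding the file actually uses:

```vim
" Reload the file in the current buffer, forcing cp936 instead of auto-detection
:e ++enc=cp936
" Confirm what Vim now reports for this buffer
:set fileencoding?
```

If the buffer has unsaved changes, Vim refuses to reload it unless :e! is used, which discards those changes.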

That concludes this look at how to fix garbled UTF-8 documents in the Linux editor Vim. Theory sinks in best alongside practice, so try these settings out for yourself.
