How to fix the display problem caused by the UTF-8 BOM when PHP reads a CSV file

2025-04-05 Update From: SLTechnology News&Howtos shulou

Shulou (Shulou.com) 06/03

This article explains how to fix the display problem that the UTF-8 BOM causes on a page after PHP reads a CSV file. It goes into some detail and should be a useful reference.

Date.csv:

"ID","NAME","EMAIL"

"1","Xiao Ming","xm@163.com"

"2","Xiao Dong","xd@sina.com"

"3","Xiao Xiao","shaozi@hotmai.com"

Read this CSV file with PHP's fgetcsv function and print each row.
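
A minimal sketch of such a read loop, assuming the file sits in the current directory as Date.csv. The helper name readCsvRows and the output formatting are illustrative assumptions, not the article's original listing.

```php
<?php
// Hypothetical helper: read every row of a CSV file with fgetcsv().
function readCsvRows(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // Delimiter, enclosure and escape are passed explicitly; they match the defaults.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

// Print each row on its own line; assumes Date.csv is in the current directory.
if (is_readable('Date.csv')) {
    foreach (readCsvRows('Date.csv') as $row) {
        echo htmlspecialchars(implode(' ', $row)), "<br>\n";
    }
}
```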

When it is displayed on the page after reading, it looks like this:

"ID" NAME EMAIL

1 Xiao Ming xm@163.com

2 Xiao Dong xd@sina.com

3 Xiao Xiao shaozi@hotmai.com

The field enclosure character of the fgetcsv function defaults to the double quotation mark.

So why do all the other fields come out fine, while ID is still wrapped in double quotation marks?

After some searching, I found that PHP does not recognize the UTF-8 BOM.
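
The effect can be reproduced without a file. This is a small sketch, assuming str_getcsv parses a line the same way fgetcsv does; the variable names are illustrative. Because the three BOM bytes come before the opening quotation mark, the first field is not treated as enclosed, so its quotation marks are kept as literal data.

```php
<?php
$bom  = "\xEF\xBB\xBF";            // UTF-8 byte order mark
$line = '"ID","NAME","EMAIL"';     // header row from the example file

$clean   = str_getcsv($line, ',', '"', '\\');
$withBom = str_getcsv($bom . $line, ',', '"', '\\');

var_dump($clean[0]);    // the bare field name
var_dump($withBom[0]);  // BOM bytes followed by the still-quoted field name
var_dump($withBom[1]);  // later fields are unaffected
```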

The following is the information found:

The Unicode specification has a concept called the BOM (Byte Order Mark). Here is a note about the BOM:

UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in real data. The UCS specification recommends sending "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver then sees FE FF, the byte stream is big-endian; if it sees FF FE, the byte stream is little-endian. This is why the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.

UTF-8 does not need a BOM to indicate byte order, but a BOM can still be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so if the receiver gets a byte stream that starts with EF BB BF, it knows the stream is UTF-8 encoded.
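
The byte patterns described above can be checked in a few lines. This is a sketch only: detectBom is a hypothetical helper name, and it deliberately ignores UTF-32, whose little-endian BOM also begins with FF FE.

```php
<?php
// Hypothetical helper: name the encoding suggested by a leading BOM, or null if none.
// Assumption: only UTF-8 and UTF-16 are considered; UTF-32 is not handled.
function detectBom(string $bytes): ?string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE'; // FEFF received as-is: big-endian
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE'; // bytes swapped: little-endian
    }
    return null;
}

var_dump(detectBom("\xEF\xBB\xBF\"ID\",\"NAME\""));
var_dump(detectBom('"ID","NAME"'));
```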

Windows uses BOM to mark the encoding of text files.

In addition, the BOM FAQ on the Unicode website covers the BOM in detail. It is the official and therefore authoritative source, but it is only available in English, which makes it slower reading.

The BOM takes up three bytes at the start of a UTF-8-encoded file. If you use Notepad to save a text file as UTF-8, then open it with UE (UltraEdit) and switch to hex editing mode, you can see the mark at the very beginning. On disk the bytes are EF BB BF, as described above; an editor that converts the file to UTF-16 internally may show FF FE instead. This is a good way to identify UTF-8-encoded files. Some software uses the BOM to decide whether a file is UTF-8-encoded, and some even requires that input files carry one. However, a lot of software still does not recognize the BOM. When I was studying Firefox, I learned that extensions could not contain a BOM in early versions, although versions from Firefox 1.5 onward support it. Now it turns out that PHP does not support the BOM either.

PHP was not designed with the BOM in mind, which means it does not skip the three BOM bytes at the beginning of a UTF-8-encoded file. The fix is therefore to re-save the file without them: either convert it from UTF-8 to ASCII, or choose ASCII encoding in Save As. If the file uses DOS-format line endings, you can open it in Notepad, click Save As, and choose ASCII encoding. If it contains Chinese characters, use UE's Save As function and select "UTF-8 without BOM".
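
When re-saving the file is not an option, the script itself can step over the BOM before parsing. This is a swapped-in alternative to the editor-based fixes, not something the original article does; openSkippingBom is a hypothetical helper name.

```php
<?php
// Hypothetical helper: open a file and leave the pointer just after a UTF-8 BOM, if present.
function openSkippingBom(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Read the first three bytes; if they are not the BOM, go back to the start.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    return $handle;
}
```

The returned handle can be passed to fgetcsv as usual. With the BOM out of the way, the first field starts with the enclosure character again, so the header comes back as ID without quotation marks.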

According to the Bo-Blog wiki, in EditPlus the file has to be saved as GB first and then as UTF-8. Be careful, however: every character that is not included in the GBK encoding will be lost, so do not use this method if the file contains characters outside GBK. (On this small point, UE, that is UltraEdit-32, really is much better than EditPlus, which is too lightweight.)

In addition, I found a way to do it with the file editor that WordPress provides. This method has no such limitation and does not require downloading a special editor, which is convenient if you already use WordPress. First grant write permission on the file over FTP, then go to the WordPress admin area -> Management -> File Editor, enter the path of the file, and click to edit it. The first three bytes are not visible in the editing interface, but that does not matter: place the cursor in front of the first character of the file and press Backspace. Then click to update the file and refresh the listing in FTP. The file is now 3 bytes smaller, and the job is done.
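
The same three-byte removal can also be scripted. This is a minimal sketch, assuming the file is small enough to load into memory and is writable by the script; stripBomFromFile is a hypothetical helper name.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true if a BOM was found and removed, false if the file had none.
function stripBomFromFile(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // nothing to do
    }
    // Write back everything after the first three bytes.
    if (file_put_contents($path, (string) substr($contents, 3)) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return true;
}
```

After a successful call the file is 3 bytes smaller, which matches what the FTP listing shows after the manual edit.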

That is all for this article on how to fix the problem caused by the UTF-8 BOM after PHP reads a CSV file. Thank you for reading, and I hope it helps.

© 2024 shulou.com SLNews company. All rights reserved.