How to solve the problem of Chinese truncation in php using iconv

2025-01-18 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article shows how to solve the problem of Chinese text being truncated when PHP converts it with iconv.

The specific analysis is as follows:

Today I wrote a scraper. The principle is very simple: use curl to fetch the HTML of the target page, extract the required data with regular expressions, and save it to the database.
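The article does not show the scraper itself, so the following is only a minimal sketch of the fetch-and-extract step under stated assumptions. The helper names fetchPage and extractLinks, the cURL options, and the link-matching pattern are all hypothetical and chosen for illustration; the author's actual pattern and target pages are not given.

```php
<?php
// Hypothetical helper: download the raw body of a page with cURL.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Hypothetical helper: pull link targets and link text out of raw HTML
// with a regular expression (an assumed pattern, not the author's).
function extractLinks(string $html): array
{
    preg_match_all('/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is', $html, $matches, PREG_SET_ORDER);
    $links = [];
    foreach ($matches as $match) {
        $links[] = ['href' => $match[1], 'text' => trim(strip_tags($match[2]))];
    }
    return $links;
}
```

Keeping the download and the extraction in separate functions makes it easy to insert the encoding conversion between the two steps.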

The target page is encoded in GB2312, while the local side uses UTF-8, so the content has to be converted after it is fetched.

The iconv function is used for the conversion:

iconv: convert a string to the requested character encoding

string iconv ( string $in_charset , string $out_charset , string $str )

Converts the string str from in_charset to out_charset.

The conversion itself is very simple: just call iconv directly.
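As a small illustration of that direct call, the sketch below converts the two characters "中文" from GB2312 to UTF-8. The sample bytes and variable names are assumptions made for the example, and it presumes the local iconv build knows the GB2312 charset name.

```php
<?php
// "中文" encoded as GB2312, written as byte escapes so that this
// source file itself can stay in UTF-8.
$gb2312 = "\xD6\xD0\xCE\xC4";

// Convert from the page's encoding to the local encoding.
$utf8 = iconv('GB2312', 'UTF-8', $gb2312);

// Show the resulting UTF-8 bytes.
echo bin2hex($utf8), PHP_EOL; // prints e4b8ade69687
```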

In tests on several pages, everything was collected correctly. Later, however, a few pages came back incomplete.

At first I suspected the regular expression, but checking ruled that out. Further investigation showed that the content after iconv conversion was a long section shorter than the content that had been fetched.

The Apache log contained this message: Notice: iconv(): Detected an illegal character in input string.

The manual gives the following explanation:

If you append the string //TRANSLIT to out_charset, transliteration is enabled. This means that when a character cannot be represented in the target character set, it can be approximated by one or several similar characters.

If you append the string //IGNORE, characters that cannot be represented in the target character set are silently discarded. Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.

So when iconv encounters a character it cannot convert, it cuts the string at the first such character and generates an E_NOTICE. Everything after that character is lost.
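One way to notice this failure in code, rather than only in the Apache log, is to capture whatever iconv reports during the call. The sketch below is an assumption-laden illustration: convertStrict is a hypothetical helper, and the 0xFF byte is just a sample of input that is not valid GB2312. Recent PHP versions are documented to return false on failure rather than a shortened string, so the sketch relies only on a message being raised.

```php
<?php
// Hypothetical helper: run iconv and collect any notices or warnings
// it raises, instead of letting them go only to the server log.
function convertStrict(string $bytes, string $from, string $to): array
{
    $messages = [];
    set_error_handler(function (int $errno, string $errstr) use (&$messages): bool {
        $messages[] = $errstr;
        return true; // mark the error as handled
    });
    try {
        $result = iconv($from, $to, $bytes);
    } finally {
        restore_error_handler();
    }
    return [$result, $messages];
}

// "中", then a stray 0xFF byte that is not valid GB2312, then "文".
[$result, $messages] = convertStrict("\xD6\xD0\xFF\xCE\xC4", 'GB2312', 'UTF-8');
if ($messages !== []) {
    echo 'iconv reported: ', implode('; ', $messages), PHP_EOL;
}
```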

Appending //IGNORE to the output character set discards only the characters that cannot be converted, instead of cutting the string and losing everything after them.

After the program was modified this way, everything worked normally.
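The fix can be sketched as follows. The helper name gb2312ToUtf8 and the sample bytes are hypothetical. The fallback to mb_convert_encoding is an addition that is not part of the original article: it is included on the assumption that some iconv builds still report an error, or reject the //IGNORE suffix, in which case PHP returns false.

```php
<?php
// Hypothetical helper: convert GB2312 bytes to UTF-8 without losing the
// text that follows an unconvertible character.
function gb2312ToUtf8(string $bytes): string
{
    // //IGNORE asks iconv to drop unconvertible characters. The @ hides
    // the notice that some builds still raise for such input.
    $out = @iconv('GB2312', 'UTF-8//IGNORE', $bytes);
    if ($out !== false) {
        return $out;
    }
    // Assumed fallback (not from the article): mbstring replaces invalid
    // sequences instead of failing. EUC-CN is mbstring's name for GB2312.
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($bytes, 'UTF-8', 'EUC-CN');
    }
    throw new RuntimeException('Could not convert the input to UTF-8');
}

// "中", a stray invalid byte, then "文": the text after the bad byte survives.
echo gb2312ToUtf8("\xD6\xD0\xFF\xCE\xC4"), PHP_EOL;
```

Depending on which branch runs, the invalid byte is either dropped or replaced by a substitute character such as "?", but the content after it is kept in both cases.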

Tip: when you want UTF-8 encoding with iconv, write the charset name as UTF-8 rather than UTF8, because the UTF8 spelling may cause problems on some servers.
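If charset names come from configuration or from the scraped page itself, one option is to normalize the spelling before passing it to iconv. The normalizeCharset helper below is hypothetical and handles only the UTF8 case mentioned in the tip.

```php
<?php
// Hypothetical helper: rewrite the "UTF8" spelling to "UTF-8" and leave
// every other charset name unchanged apart from trimming and upper-casing.
function normalizeCharset(string $name): string
{
    $name = strtoupper(trim($name));
    return $name === 'UTF8' ? 'UTF-8' : $name;
}

echo normalizeCharset('utf8'), PHP_EOL; // prints UTF-8
```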
