
How to save special characters in WeChat nicknames with MySQL

Updated 2025-01-19. Source: Shulou (Shulou.com), SLTechnology News&Howtos, Database section, 06/01.

I was saving WeChat nicknames in MySQL, and inserting the nickname data raised an error. Here is what I did to fix it:

I. Introduction

MySQL added the utf8mb4 encoding in version 5.5.3. The mb4 suffix stands for "most bytes 4", that is, up to four bytes per character, and the encoding exists specifically to handle four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so no conversion is needed other than changing the encoding to utf8mb4. Of course, if saving space matters and you do not need four-byte characters, utf8 is usually enough.

II. Description of content

Since utf8 can store most Chinese characters, why use utf8mb4 at all? The utf8 character set supported by MySQL has a maximum character length of 3 bytes, so inserting a 4-byte character raises an error. The largest Unicode code point that can be encoded in three bytes of UTF-8 is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP). In other words, any Unicode character outside the BMP cannot be stored with MySQL's utf8 character set. This includes emoji (a special group of Unicode characters commonly seen on iOS and Android phones), many rarely used Chinese characters, and any newly added Unicode characters.
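To make the three-byte limit concrete, here is a minimal Python sketch that reports how many UTF-8 bytes each character of a nickname needs and whether any character lies outside the BMP. The function names and the sample nicknames are made up for illustration; they are not part of MySQL or of the original fix.

```python
def needs_utf8mb4(text):
    """Return True if any character lies outside the BMP (code point > 0xFFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def utf8_byte_lengths(text):
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Hypothetical sample nicknames, for illustration only.
plain = "微信"                  # common Chinese characters, inside the BMP
with_emoji = "微信\U0001F600"   # the same nickname followed by an emoji (U+1F600)

print(utf8_byte_lengths(plain), needs_utf8mb4(plain))            # [3, 3] False
print(utf8_byte_lengths(with_emoji), needs_utf8mb4(with_emoji))  # [3, 3, 4] True
```

A nickname for which needs_utf8mb4 returns True is the kind of value that a utf8 column rejects.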

III. Root cause of the problem

The original UTF-8 format used one to six bytes and could encode up to 31-bit characters. The latest UTF-8 specification uses only one to four bytes and can encode up to 21 bits, just enough to represent all 17 Unicode planes.
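The byte counts above follow directly from the code point ranges of the current UTF-8 specification. The following Python sketch (the function name utf8_length is illustrative) maps a code point to its encoded length and checks the arithmetic behind "21 bits is enough for 17 planes".

```python
def utf8_length(code_point):
    """Number of bytes the current UTF-8 specification uses for a code point.

    Surrogates (0xD800-0xDFFF) are not valid scalar values and cannot be
    encoded on their own; this sketch only looks at the numeric range.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:      # end of the Basic Multilingual Plane
        return 3
    return 4                      # supplementary planes 1 to 16


PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000   # 65,536 code points per plane

total_code_points = PLANES * CODE_POINTS_PER_PLANE
print(total_code_points)               # 1114112, i.e. 0x110000
print(total_code_points <= 2 ** 21)    # True: 21 bits cover all 17 planes
print(utf8_length(0x1F600))            # 4
```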

In MySQL, utf8 is a character set that only supports UTF-8 characters up to three bytes long, that is, the Basic Multilingual Plane of Unicode.

Why does utf8 in MySQL only support UTF-8 characters of up to three bytes? My guess is that when MySQL was first being developed, Unicode had no supplementary planes yet. At the time, the Unicode committee was still dreaming that "65535 characters are enough for the whole world." String lengths in MySQL are counted in characters rather than bytes, and for the CHAR data type enough space has to be reserved for the longest possible string. With the utf8 character set, the space to reserve is the maximum byte length of a utf8 character multiplied by the declared string length, so the maximum character length of utf8 was naturally capped at 3. For example, for CHAR(100) MySQL reserves 300 bytes. As for why later versions did not extend utf8 to 4-byte UTF-8 characters, I think one reason is backward compatibility and the other is that characters outside the Basic Multilingual Plane are rarely used.
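The CHAR reservation described above is simple multiplication. As a rough Python sketch (the helper name is hypothetical, and real on-disk storage also depends on the storage engine and row format, which this ignores):

```python
# Maximum bytes per character for the two MySQL character sets discussed here.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def reserved_bytes(char_length, charset):
    """Worst-case bytes to reserve for a CHAR(char_length) column."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


print(reserved_bytes(100, "utf8"))     # 300
print(reserved_bytes(100, "utf8mb4"))  # 400
```

The same CHAR(100) column therefore needs a third more reserved space under utf8mb4.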

To store 4-byte UTF-8 characters in MySQL, you need the utf8mb4 character set, which is only available in version 5.5.3 and later (check your version with: select version();). In my view, for better compatibility you should always use utf8mb4 instead of utf8. For CHAR columns utf8mb4 consumes somewhat more space; the official MySQL recommendation is to use VARCHAR instead of CHAR.
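If the version check needs to be automated, one possible approach is to parse the string returned by select version(); and compare it with 5.5.3. The Python sketch below assumes the string starts with three dot-separated numbers, as in "5.5.3" or "5.7.44-log"; the function names are illustrative, and forks with their own version numbering are not considered.

```python
import re


def parse_mysql_version(version_string):
    """Extract (major, minor, patch) from a string such as '5.7.44-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return tuple(int(part) for part in match.groups())


def supports_utf8mb4(version_string):
    """True if the MySQL version is 5.5.3 or later."""
    return parse_mysql_version(version_string) >= (5, 5, 3)


print(supports_utf8mb4("5.1.73-log"))  # False
print(supports_utf8mb4("5.5.3"))       # True
print(supports_utf8mb4("8.0.36"))      # True
```

Comparing integer tuples avoids the mistake of comparing version strings as text, where "5.10.0" would sort before "5.5.3".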

The specific steps:

1. Edit the MySQL configuration file /usr/local/mysql/my.cnf as follows:

[client]
#password = your_password
port = 3306
socket = /usr/local/mysql/data/mysql.sock
default-character-set=utf8mb4

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port = 3306
socket = /usr/local/mysql/data/mysql.sock
character-set-server=utf8mb4
# the collation must belong to the utf8mb4 character set
collation-server=utf8mb4_general_ci
#no-auto-rehash
datadir = /usr/local/mysql/data
skip-external-locking
key_buffer_size = 16K
max_allowed_packet = 1M
table_open_cache = 4
sort_buffer_size = 64K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
thread_stack = 128K
log_error=/usr/local/mysql/data/mysql-error.log

[mysql]
no-auto-rehash
socket = /usr/local/mysql/data/mysql.sock
default-character-set=utf8mb4

Some people on the Internet say the [mysqld] section should be changed as follows:

[mysqld]
character-set-client-handshake=FALSE
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'

My configuration did not contain these settings, so I did not add them.
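Because a mismatch between the server character set and the server collation is easy to overlook in a long my.cnf, a small script can check the [mysqld] section before restarting the server. This is only a sketch under some assumptions: it uses Python's configparser, which handles simple files like the ones shown here but not directives such as !includedir, and the function name check_mysqld_charset is made up for illustration.

```python
import configparser


def check_mysqld_charset(cnf_text):
    """Return a list of problems found in the [mysqld] charset settings."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # options such as skip-external-locking have no value
        strict=False,          # tolerate repeated options
        interpolation=None,    # treat '%' literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]

    # MySQL accepts '-' and '_' interchangeably in option names.
    options = {name.replace("_", "-"): value
               for name, value in parser.items("mysqld")}

    problems = []
    charset = options.get("character-set-server")
    collation = options.get("collation-server")
    if charset != "utf8mb4":
        problems.append("character-set-server is %r, expected 'utf8mb4'" % charset)
    if collation is not None and not collation.startswith("utf8mb4_"):
        problems.append("collation-server %r does not belong to utf8mb4" % collation)
    return problems


# Hypothetical fragment, for illustration only.
sample = """
[mysqld]
port = 3306
skip-external-locking
character-set-server=utf8mb4
collation-server=utf8_general_ci
"""

for problem in check_mysqld_charset(sample):
    print(problem)
```

For the sample fragment the script reports the collation line, since utf8_general_ci is a collation of utf8 rather than of utf8mb4.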

2. Modify the character set of the table column

For example: alter table users change nickname nickname varchar(50) character set utf8mb4 collate utf8mb4_unicode_ci;

3. Modify the connection string

Check the JDBC connection string as well: some projects specify the encoding in it, for example: jdbc:mysql://localhost/mydb?characterEncoding=UTF-8

Summary

That is the method for saving special characters in WeChat nicknames with MySQL. I hope it helps. If you have any questions, please leave a message and I will reply as soon as I can.
