What is the compatibility analysis of emoji in Android JNI?


2025-07-11 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 report

This article analyzes the compatibility of emoji in Android JNI: why a string containing emoji can turn into different bytes in native code depending on the Android version.

Cause

Recently we ran into the following problem: we computed the MD5 of a string, encrypted the string, and uploaded it to the server together with the MD5. After decrypting, the server recomputed the MD5 and found that it did not match the uploaded one. Every problematic string contained emoji, yet when I uploaded a string containing an emoji myself, everything worked.

We finally confirmed that the problem lies in converting a jstring to a char array in native code on Android versions below 5.1. The following example reproduces the process.

Reproducing the problem

Suppose there is a string String s = "\uD83D\uDC8B"; which corresponds to the emoji 💋. Calling its getBytes() method (Android's default charset is UTF-8) returns the byte array [-16, -97, -110, -117], which in hexadecimal is [f0, 9f, 92, 8b].
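Java's byte type is a signed 8-bit value, which is why getBytes() reports 0xF0 as -16. A minimal sketch of the reinterpretation in C++ (the language of the native side here); the helper name java_bytes_to_unsigned is hypothetical:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: reinterpret Java's signed byte values as the unsigned
// bytes that were actually encoded. Conversion to uint8_t is modulo 256,
// so -16 becomes 0xF0 (240).
std::vector<uint8_t> java_bytes_to_unsigned(const std::vector<int8_t>& java_bytes) {
    std::vector<uint8_t> out;
    out.reserve(java_bytes.size());
    for (int8_t b : java_bytes) {
        out.push_back(static_cast<uint8_t>(b));
    }
    return out;
}
```

Applied to {-16, -97, -110, -117}, this yields the bytes 0xF0, 0x9F, 0x92, 0x8B.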

Define a native method that takes a String argument, public native String test(String str);. In the corresponding JNI C++ code, obtain the char array for the incoming String with env->GetStringUTFChars and print each element of the array in hexadecimal.
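A minimal sketch of the hex-printing step, assuming a C++ native layer. The helper name hex_dump is hypothetical; only GetStringUTFChars and ReleaseStringUTFChars, shown in a comment and not compiled here, are real JNI functions:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte of a NUL-terminated C string as
// lowercase hex, e.g. "[f0,9f,92,8b]".
// Possible use inside the JNI method (sketch only):
//   const char* utf = env->GetStringUTFChars(str, nullptr);
//   std::string dump = hex_dump(utf);
//   env->ReleaseStringUTFChars(str, utf);
std::string hex_dump(const char* s) {
    std::string out = "[";
    for (const char* p = s; *p != '\0'; ++p) {
        char buf[3];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned v = static_cast<unsigned char>(*p);
        std::snprintf(buf, sizeof buf, "%02x", v);
        if (p != s) out += ",";
        out += buf;
    }
    out += "]";
    return out;
}
```

The unsigned char cast matters: on platforms where char is signed, printing *p directly would show 0xF0 as a negative number.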

On an Android 7.1.2 test device, the native layer prints [f0, 9f, 92, 8b], the same as the Java byte array, but on an Android 4.4.4 test device it prints [ed, a0, bd, ed, b2, 8b]. As a result, the encrypted output differs between the two devices.

When the server decrypts data sent from the older Android version, it gets [ed, a0, bd, ed, b2, 8b]. Naturally, the MD5 of these bytes cannot match the MD5 of [f0, 9f, 92, 8b].

Unicode, UTF-8 and UTF-16

Some readers may not know exactly where the two byte arrays above come from. First, note that UTF-8 and UTF-16 are both encodings of Unicode. \uD83D\uDC8B is the big-endian UTF-16 representation of the character. For code points above 0xFFFF (0x10000 to 0x10FFFF), the conversion to UTF-16 works as follows:

Subtract 0x10000 from the code point; the result fits in 20 bits. Add the high 10 bits of this 20-bit value to 0xD800 to get the high surrogate, and add the low 10 bits to 0xDC00 to get the low surrogate. The high surrogate followed by the low surrogate is the big-endian UTF-16 form of the code point.
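The three steps translate almost directly into code. A minimal C++ sketch; the function name to_surrogates is hypothetical, and it assumes the caller passes a code point in 0x10000 to 0x10FFFF:

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary code point into a UTF-16
// surrogate pair. Precondition: 0x10000 <= cp <= 0x10FFFF.
std::pair<uint16_t, uint16_t> to_surrogates(uint32_t cp) {
    uint32_t v = cp - 0x10000;                                    // 20-bit value
    uint16_t high = static_cast<uint16_t>(0xD800 + (v >> 10));    // high 10 bits
    uint16_t low  = static_cast<uint16_t>(0xDC00 + (v & 0x3FF));  // low 10 bits
    return {high, low};
}
```

For 0x1F48B, v is 0xF48B, whose high 10 bits are 0x3D and low 10 bits are 0x8B, giving the pair 0xD83D, 0xDC8B.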

Working these steps backwards:

The binary form of \uD83D\uDC8B is 1101 1000 0011 1101 1101 1100 1000 1011, so the high surrogate is 1101 1000 0011 1101 and the low surrogate is 1101 1100 1000 1011. Since the high surrogate is 0xD800 plus the high 10 bits, the high 10 bits are 00 0011 1101. Since the low surrogate is 0xDC00 plus the low 10 bits, the low 10 bits are 00 1000 1011. The full 20-bit value is therefore 0000 1111 0100 1000 1011. Adding 0x10000 gives 0001 1111 0100 1000 1011, or 0x1F48B.
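The same backward derivation as a minimal C++ sketch; the function name from_surrogates is hypothetical, and it assumes a valid high/low surrogate pair as input:

```cpp
#include <cstdint>

// Hypothetical helper: recombine a UTF-16 surrogate pair into a code point.
// Precondition: 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF.
uint32_t from_surrogates(uint16_t high, uint16_t low) {
    uint32_t hi10 = static_cast<uint32_t>(high) - 0xD800;  // 0xD83D -> 0x3D
    uint32_t lo10 = static_cast<uint32_t>(low) - 0xDC00;   // 0xDC8B -> 0x8B
    // Join the halves into the 20-bit value, then add back 0x10000.
    return ((hi10 << 10) | lo10) + 0x10000;
}
```

Here (0x3D << 10) | 0x8B is 0xF48B, the 20-bit value from the derivation, and adding 0x10000 gives 0x1F48B.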

So the emoji 💋 corresponds to the Unicode code point 0x1F48B.

The UTF-8 rule is as follows: for a character that occupies N bytes (N > 1), the first N bits of the first byte are 1 and bit N + 1 is 0; every following byte starts with 10. The code point's bits fill the remaining positions, padded with leading zeros. The code point 0x1F48B needs 4 bytes, whose 21 free bits hold 000 011111 010010 001011, so the 4 bytes are:

11110 000 | 10 011111 | 10 010010 | 10 001011

That is, 0xF09F928B, which is where the byte array [f0, 9f, 92, 8b] comes from.
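The UTF-8 rule above can be sketched as a small C++ encoder for a single code point. The function name utf8_encode is hypothetical, and the sketch assumes the input is at most 0x10FFFF and is not a surrogate value:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: standard UTF-8 encoding of one code point.
// The first byte carries N leading 1 bits and a 0; continuation bytes start
// with 10; the code-point bits fill the remaining positions.
std::vector<uint8_t> utf8_encode(uint32_t cp) {
    if (cp < 0x80) return {static_cast<uint8_t>(cp)};
    if (cp < 0x800) return {
        static_cast<uint8_t>(0xC0 | (cp >> 6)),
        static_cast<uint8_t>(0x80 | (cp & 0x3F))};
    if (cp < 0x10000) return {
        static_cast<uint8_t>(0xE0 | (cp >> 12)),
        static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
        static_cast<uint8_t>(0x80 | (cp & 0x3F))};
    return {
        static_cast<uint8_t>(0xF0 | (cp >> 18)),           // 11110 + top 3 bits
        static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F)),  // 10 + next 6 bits
        static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)),   // 10 + next 6 bits
        static_cast<uint8_t>(0x80 | (cp & 0x3F))};         // 10 + last 6 bits
}
```

For 0x1F48B the four branches of the last case produce 0xF0, 0x9F, 0x92 and 0x8B, matching the byte array from getBytes().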

So where does [ed, a0, bd, ed, b2, 8b] come from? It results from treating \uD83D\uDC8B as two separate characters. Each surrogate lies in the range 0x800 to 0xFFFF, so each takes the 3-byte form. Applying the Unicode-to-UTF-8 rule above, 0xD83D becomes 1110 1101 10 100000 10 111101, that is, 0xEDA0BD, and 0xDC8B becomes 1110 1101 10 110010 10 001011, that is, 0xEDB28B.
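The 6-byte result can be reproduced by encoding each UTF-16 code unit on its own. The JNI specification calls this format "modified UTF-8": supplementary characters are written as two 3-byte sequences instead of one 4-byte sequence. A minimal C++ sketch, with the hypothetical name encode_units_separately; it does not handle modified UTF-8's special two-byte encoding of U+0000:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encode every UTF-16 code unit independently, so each
// half of a surrogate pair becomes its own 3-byte sequence.
std::vector<uint8_t> encode_units_separately(const std::vector<uint16_t>& units) {
    std::vector<uint8_t> out;
    for (uint16_t u : units) {
        if (u < 0x80) {
            out.push_back(static_cast<uint8_t>(u));
        } else if (u < 0x800) {
            out.push_back(static_cast<uint8_t>(0xC0 | (u >> 6)));
            out.push_back(static_cast<uint8_t>(0x80 | (u & 0x3F)));
        } else {
            // Surrogates (0xD800..0xDFFF) always land here.
            out.push_back(static_cast<uint8_t>(0xE0 | (u >> 12)));
            out.push_back(static_cast<uint8_t>(0x80 | ((u >> 6) & 0x3F)));
            out.push_back(static_cast<uint8_t>(0x80 | (u & 0x3F)));
        }
    }
    return out;
}
```

Fed the units 0xD83D, 0xDC8B, this yields ed, a0, bd, ed, b2, 8b, the output the article observed on Android 4.4.4.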


