2025-01-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to handle emoji in Java. The content is simple and easy to follow; read along with the editor's train of thought to learn how to work with emoji in Java.
I happened to have some free time, and this feature looked simple enough, so I set out to implement it.
It turned out not to be that simple.
I first tried to store an emoji in the database.
Sure enough, it failed with an exception: the character encoding the database was using cannot store emoji. So what exactly is an emoji?
In essence, everything a computer stores is binary 0s and 1s, and emoji are no exception. As long as the data is written and read with the same encoding (codec), it will be displayed correctly.
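The point about writing and reading with the same codec can be tried out directly. The following is a minimal sketch, not code from the original project; the class and method names are made up for illustration. It encodes a string as UTF-8 and decodes the bytes once with the same charset and once with a different one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encode with UTF-8, decode with a chosen charset.
class CodecRoundTrip {
    static String encodeThenDecode(String text, Charset decodeWith) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);
        return new String(stored, decodeWith);
    }

    public static void main(String[] args) {
        // The two escapes after "ok " form the emoji U+1F604.
        String original = "ok \uD83D\uDE04";
        String same = encodeThenDecode(original, StandardCharsets.UTF_8);
        String garbled = encodeThenDecode(original, StandardCharsets.ISO_8859_1);
        System.out.println("same codec round-trips:  " + original.equals(same));
        System.out.println("other codec round-trips: " + original.equals(garbled));
    }
}
```

Decoding with a different charset does not lose the bytes; it only interprets them differently, which is what shows up as garbled text.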
Encoding is covered in more detail later; first, let's think about how to solve the problem quickly.
Storing emoji
There are several ways to store emoji in MySQL. For example, you can switch the stored character set to one that can hold emoji (utf8mb4), but this requires support from the MySQL version in use.
A safer approach is to solve it at the application level: store the emoji as a plain string and convert it back to an emoji when it is displayed, which works with every database version.
So the requirement is to convert an emoji to a string, and to convert that string back to the emoji.
For this I found a library on GitHub that converts an emoji to a string alias and also converts the alias back to the emoji.
https://github.com/vdurmont/emoji-java
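import com.vdurmont.emoji.EmojiParser;
import org.junit.Test;

@Test
public void emoji() throws Exception {
    String str = "An :grinning:awesome :smiley:string 😄 with a few :wink:emojis!";
    // Replace aliases such as :wink: with the emoji characters.
    String result = EmojiParser.parseToUnicode(str);
    System.out.println(result);
    // Replace emoji characters with their aliases.
    result = EmojiParser.parseToAliases(str);
    System.out.println(result);
}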
With this library as the foundation, the emoji feature was finally implemented.
Under the hood, it maintains a mapping between each emoji alias and its Unicode character (held in Java as UTF-16), and looks up this table every time the data is formatted.
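The mapping-table idea can be sketched in a few lines. This is an illustration only, not the emoji-java implementation: the class name, the three table entries, and the alias spellings are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a tiny alias table, not the real library's data.
class EmojiAliasTable {
    // alias -> emoji, each emoji written as its two UTF-16 code units
    private static final Map<String, String> ALIAS_TO_EMOJI = new LinkedHashMap<>();
    static {
        ALIAS_TO_EMOJI.put(":grinning:", "\uD83D\uDE00"); // U+1F600
        ALIAS_TO_EMOJI.put(":smile:", "\uD83D\uDE04");    // U+1F604
        ALIAS_TO_EMOJI.put(":wink:", "\uD83D\uDE09");     // U+1F609
    }

    // Before saving: replace every known emoji with its alias.
    static String toAliases(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ALIAS_TO_EMOJI.entrySet()) {
            result = result.replace(e.getValue(), e.getKey());
        }
        return result;
    }

    // Before displaying: replace every known alias with its emoji.
    static String toUnicode(String text) {
        String result = text;
        for (Map.Entry<String, String> e : ALIAS_TO_EMOJI.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String stored = toAliases("Nice \uD83D\uDE04");
        System.out.println(stored);
        System.out.println(toUnicode(stored));
    }
}
```

The stored form contains only ASCII characters, so it fits in any database character set; the cost is a table lookup on every read and write, and emoji missing from the table pass through unchanged.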
A review of character encoding
At this point the requirement is met, but a few questions remain:
How is emoji stored in Java?
How is emoji encoded?
ASCII
Before talking about emoji, it is worth getting to know ASCII, the ancestor of computer character encodings.
We know that data inside a computer is stored as binary 0s and 1s, with 8 bits to a byte. Each bit has two states, 0 or 1, so one byte can represent 256 (2^8) different states.
In the United States, everyday English needs only the 26 English letters and some punctuation marks to communicate with a computer.
So in the 1960s a mapping between binary values and English characters was defined. It covers 128 different characters and is what we now call ASCII.
This way one byte is enough to represent modern English, which looks very good.
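To make the mapping concrete, here is a small sketch (the class and method names are assumptions for illustration) that prints the ASCII code of each character in a string, in decimal and as an 8-bit pattern.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for looking at ASCII codes.
class AsciiDemo {
    // One byte per character; characters outside ASCII would come back as '?'.
    static int[] asciiCodes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        int[] codes = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            codes[i] = bytes[i];
        }
        return codes;
    }

    // Left-pad the binary form to 8 digits so the unused top bit is visible.
    static String toBits(int code) {
        return String.format("%8s", Integer.toBinaryString(code)).replace(' ', '0');
    }

    public static void main(String[] args) {
        String text = "Hi!";
        int[] codes = asciiCodes(text);
        for (int i = 0; i < codes.length; i++) {
            System.out.println(text.charAt(i) + " -> " + codes[i] + " -> " + toBits(codes[i]));
        }
    }
}
```

Every pattern starts with 0, because the 128 ASCII codes need only 7 of the 8 bits in a byte.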
Unicode
As computers developed, they became popular in Europe and Asia as well. ASCII is clearly not enough for exchanging information there: many regions do not use English at all, and their scripts need far more than 128 characters (Chinese most of all).
ASCII uses only 128 of the 256 values a byte can hold, but the remaining 128 are still not enough to describe other languages.
What is needed is a character set that contains all the world's text, in which every character of every region has a unique binary representation, so that garbled text is no longer a problem.
Unicode exists to do exactly this. It now includes more than 100,000 characters, covering practically every character you are likely to use.
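In Unicode each character is identified by a number called a code point, usually written as U+ followed by hexadecimal digits. A minimal sketch (the class and method names are made up here) that prints the code points of characters from different scripts:

```java
// Hypothetical demo class: show the Unicode code point of each character.
class CodePoints {
    // Format a code point in the usual U+XXXX notation.
    static String label(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // A Latin letter, an accented letter, a Chinese character, and an emoji,
        // all written as escapes so the source file stays plain ASCII.
        String text = "A\u00E9\u4F60\uD83D\uDE04";
        text.codePoints().forEach(cp -> System.out.println(label(cp)));
    }
}
```

The code point says which character is meant; how many bytes it takes in storage depends on the encoding rule applied afterwards.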
UTF-8
Although Unicode contains almost every character, we rarely seem to use it directly in daily work; what we use are encoding rules such as UTF-8.
There are several reasons. For example, apart from English, most scripts need 2 or more bytes per character; if every character were stored with one uniform width, that width would have to be the size of the character that takes the most bytes.
Say a Chinese character needs 2 bytes while an English letter needs only one; every character would then have to be given 2 bytes, otherwise Chinese characters could not be represented.
But this brings a problem: using two bytes for an English letter wastes the first byte entirely, and if a piece of text is all English, that is a huge waste of memory.
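The waste is easy to measure. The sketch below (class and method names are assumptions) uses UTF-16BE as a stand-in for a "two bytes per character" scheme, which holds for the characters in this example, and compares it with UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compare byte counts of two encodings.
class WidthComparison {
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String english = "hello";
        String chinese = "\u4F60\u597D"; // two Chinese characters
        System.out.println("english: UTF-16 " + utf16Bytes(english) + ", UTF-8 " + utf8Bytes(english));
        System.out.println("chinese: UTF-16 " + utf16Bytes(chinese) + ", UTF-8 " + utf8Bytes(chinese));
    }
}
```

For the English text the two-byte scheme doubles the size, which is the waste described above. For the Chinese text UTF-8 is actually larger, so the variable-length approach is a trade-off rather than a win for every script.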
By now you have probably realized that we need a variable-length encoding: an English character takes a single byte, and the encoding can even be fully compatible with ASCII.
UTF-8 meets this requirement, using two rules to distinguish single-byte from multi-byte characters.
The general rules are as follows:
When the first bit of the first byte is 0, the byte is a single-byte character, identical to its ASCII code and fully compatible with it.
When the first bit of the first byte is 1, the number of leading 1 bits tells how many bytes the character occupies; each following byte of the character starts with 10.
In this way, storage space is saved as far as possible according to the size of each character.
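The two rules can be checked against real bytes. The following sketch is an illustration with made-up class and method names: it reads the leading bits of the first byte to decide how long the character is, and prints the bit pattern of every byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: inspect the leading bits of UTF-8 bytes.
class Utf8LeadByte {
    // Number of bytes in the sequence that starts with this byte,
    // or -1 if the byte is a continuation byte (10xxxxxx) or not a valid lead.
    static int sequenceLength(byte first) {
        int b = first & 0xFF;
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;
    }

    // Render each byte as 8 binary digits, separated by spaces.
    static String bits(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte value : bytes) {
            if (sb.length() > 0) sb.append(' ');
            String raw = Integer.toBinaryString(value & 0xFF);
            sb.append(String.format("%8s", raw).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00E9", "\u4F60", "\uD83D\uDE04"};
        for (String s : samples) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            System.out.println(sequenceLength(utf8[0]) + " byte(s): " + bits(utf8));
        }
    }
}
```

A decoder that lands in the middle of a character sees a byte starting with 10 and knows it is not at a character boundary, which is one practical benefit of the continuation-byte rule.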
There are other encoding rules as well, such as UTF-16 and UTF-32, which are less common for storing and exchanging text. In essence, like UTF-8, they are different encodings of Unicode, the character set that covers most of the world's text.
Emoji in Java
Now let's return to the topic: emoji.
As mentioned, Unicode contains most of the world's characters, and emoji are no exception.
https://apps.timwhitlock.info/emoji/tables/unicode
This table lists emoji with their Unicode code points and the corresponding UTF-8 bytes.
The table also shows that an emoji such as 😄 takes up 4 bytes in UTF-8, so how is it stored in Java?
It is easy to find out: a quick debugging session shows the answer.
In Java an emoji is also stored as char values, and char, a primitive type, takes 2 bytes. Since the emoji takes four bytes in UTF-8, a single char obviously cannot hold it; Java strings actually use UTF-16, and the emoji is stored as a pair of chars (a surrogate pair).
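What the debugger shows can also be reproduced in code. This is a minimal sketch with made-up names: it builds the emoji U+1F604 from its code point and looks at the chars Java uses to hold it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: look at how one emoji sits inside a Java String.
class EmojiInJava {
    // List the UTF-16 code units (the char values) of a string in hex.
    static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F604));
        System.out.println("chars:       " + smile.length());
        System.out.println("code points: " + smile.codePointCount(0, smile.length()));
        System.out.println("UTF-16:      " + utf16Units(smile));
        System.out.println("UTF-8 bytes: " + smile.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("surrogates:  " + Character.isHighSurrogate(smile.charAt(0))
                + " / " + Character.isLowSurrogate(smile.charAt(1)));
    }
}
```

Because length() counts chars rather than characters, code that truncates or iterates over text containing emoji is safer working with code points, for example via codePoints() or offsetByCodePoints().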
Based on this principle, we can also convert an emoji to a string, and a string back to an emoji, by ourselves.
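One possible way to do that conversion by hand is sketched below. It is an assumption-laden illustration, not the approach of any particular library: the class name and the `[U+1F604]` token format are invented here. Every character outside the Basic Multilingual Plane (the ones that need 4 bytes in UTF-8) is replaced by a plain-ASCII token, and the token is turned back into the character when reading.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the token format is made up for this example.
class EmojiEscaper {
    // Matches tokens for code points U+10000 to U+10FFFF only.
    private static final Pattern TOKEN =
            Pattern.compile("\\[U\\+(10[0-9A-F]{4}|[1-9A-F][0-9A-F]{4})\\]");

    // Replace each supplementary code point with an ASCII token.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(String.format("[U+%X]", cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    // Turn each token back into the character it stands for.
    static String unescape(String text) {
        Matcher m = TOKEN.matcher(text);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(text, last, m.start());
            sb.appendCodePoint(Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Hi \uD83D\uDE04!";
        String stored = escape(original);
        System.out.println(stored);
        System.out.println(unescape(stored).equals(original));
    }
}
```

One caveat of this sketch: text that already contains something shaped like a token would be turned into an emoji on the way back, so a real implementation would need an escaping rule for that case.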