Character encodings every Java architect should know

2025-03-28 Update From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article covers the character encodings that Java architects should be familiar with. The material is short and practical, so let's walk through it.

Encoding comes up constantly in day-to-day development. Common character sets and encodings include ASCII, ISO-8859-1, GB2312, GBK, GB18030, Unicode, UTF-8 and UTF-16; of these, GB2312, GBK, GB18030, UTF-8 and UTF-16 can represent Chinese.

Why do we need encodings?

The smallest addressable storage unit in a computer is the byte. One byte is 8 bits, so it can take at most 256 distinct values, far too few for all the characters used by the world's languages. Java's char type addresses this on the character side: one char occupies two bytes. Since storage and transmission work in bytes, text must still be converted from chars into bytes, and an encoding defines that mapping.
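The char-to-byte step described above can be sketched in Java as follows. The class name CharToBytes and its helper methods are made up for this illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows one string turned into bytes by three encodings.
class CharToBytes {
    // Encode text with the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A\u4e2d"; // 'A' followed by the Chinese character U+4E2D
        System.out.println(text.length());                                    // 2 chars
        System.out.println(hex(encode(text, StandardCharsets.UTF_8)));        // 41 E4 B8 AD
        System.out.println(hex(encode(text, StandardCharsets.UTF_16BE)));     // 00 41 4E 2D
        // ISO-8859-1 has no Chinese characters, so getBytes substitutes '?' (0x3F).
        System.out.println(hex(encode(text, StandardCharsets.ISO_8859_1)));   // 41 3F
    }
}
```

The same two chars produce different byte sequences under each encoding, which is why the reader of a byte stream has to know which encoding the writer used.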

Common encodings

ASCII

ASCII stands for American Standard Code for Information Interchange. It is the most widely used single-byte encoding in the world and is mainly used for modern English text.

ASCII uses 7 bits and can represent only 128 characters. Codes 0-31 and 127 are control characters such as carriage return, backspace and delete. Codes 32-126 are printable characters that can be typed on a keyboard and displayed.

Among them, 48-57 are the ten Arabic digits 0 to 9, 65-90 are the 26 uppercase English letters, and 97-122 are the 26 lowercase English letters. The rest are punctuation marks, operators and so on; see the standard ASCII table for details.
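The ranges above can be checked with a small Java sketch. AsciiRanges and classify are hypothetical names used only for this example.

```java
// Hypothetical helper that maps a code value to the ASCII range it falls in.
class AsciiRanges {
    static String classify(int code) {
        if (code < 0 || code > 127) return "not ASCII";
        if (code <= 31 || code == 127) return "control";
        if (code >= 48 && code <= 57) return "digit";
        if (code >= 65 && code <= 90) return "uppercase letter";
        if (code >= 97 && code <= 122) return "lowercase letter";
        return "space, punctuation or symbol"; // everything else in 32-126
    }

    public static void main(String[] args) {
        // A char widens to its numeric code when passed as an int.
        System.out.println(classify('7'));      // digit
        System.out.println(classify('A'));      // uppercase letter
        System.out.println(classify('z'));      // lowercase letter
        System.out.println(classify('\n'));     // control
        System.out.println(classify('~'));      // space, punctuation or symbol
        System.out.println(classify('\u4e2d')); // not ASCII
    }
}
```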

ISO-8859-1

Because ASCII can represent only 128 characters, it cannot cover other languages. ISO-8859-1 (Latin-1) extends ASCII with the letters and symbols needed by Western European languages; other parts of the ISO 8859 series cover scripts such as Greek, Thai, Arabic and Hebrew. ISO-8859-1 is backward compatible with ASCII.

ISO-8859-1 is also a single-byte encoding, but it uses all 8 bits and so has 256 code positions.
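As a rough illustration of the single-byte limit, the sketch below encodes a Western European letter and tests a Chinese character against ISO-8859-1. Latin1Demo and its methods are hypothetical names; CharsetEncoder.canEncode is a standard JDK method.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what ISO-8859-1 can and cannot represent.
class Latin1Demo {
    // Encode text as ISO-8859-1 bytes.
    static byte[] latin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // True if every character of the text exists in ISO-8859-1.
    static boolean fits(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        byte[] eAcute = latin1("\u00e9");             // e with acute accent
        System.out.println(eAcute.length);            // 1
        System.out.println(eAcute[0] & 0xFF);         // 233, above the 7-bit ASCII range
        System.out.println(latin1("A")[0]);           // 65, same value as in ASCII
        System.out.println(fits("caf\u00e9"));        // true
        System.out.println(fits("\u4e2d"));           // false: no Chinese in Latin-1
    }
}
```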

GB2312

Its full name is "Chinese Character Coded Character Set for Information Interchange, Basic Set". It was issued in China in 1980 and is mainly used for Chinese character processing in computer systems. GB2312 contains 6763 Chinese characters and 682 symbols.

GB2312 covers the vast majority of Chinese characters in everyday use, but it cannot handle rare characters such as those found in classical Chinese, so GBK and GB18030 were introduced later.

GBK

GBK, whose full name is Chinese Internal Code Specification, is an extension specification for Chinese character encoding formulated in 1995. It extends GB2312 with more Chinese characters, for a total of 21003.

GBK is backward compatible with GB2312: Chinese text encoded with GB2312 decodes correctly as GBK without garbling, but text encoded with GBK may not decode correctly as GB2312.
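The compatibility rule above can be tried out with a sketch like the following. It assumes the JDK in use ships the GBK and GB2312 charsets (they are optional, so the code checks first); GbkCompat and its methods are hypothetical names. The character U+9555 is used as an example of a character that GBK added beyond GB2312.

```java
import java.nio.charset.Charset;

// Hypothetical demo: GB2312 bytes decode as GBK, but GBK covers more characters.
class GbkCompat {
    // Encode with one charset and decode with another.
    static String roundTrip(String text, String encodeWith, String decodeWith) {
        byte[] bytes = text.getBytes(Charset.forName(encodeWith));
        return new String(bytes, Charset.forName(decodeWith));
    }

    // True if the named charset can represent every character in the text.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK") || !Charset.isSupported("GB2312")) {
            System.out.println("GBK/GB2312 not available on this JDK");
            return;
        }
        String common = "\u4e2d\u6587"; // the word for "Chinese (language)"
        String rare = "\u9555";         // a character added by GBK
        System.out.println(roundTrip(common, "GB2312", "GBK").equals(common)); // true
        System.out.println(canEncode(rare, "GBK"));                            // true
        System.out.println(canEncode(rare, "GB2312"));                         // false
    }
}
```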

GB18030

GB18030 is a Chinese national character set standard released in 2000 and made mandatory in 2001. It covers the scripts of most of China's ethnic minorities, and its 2005 revision contains more than 70000 Chinese characters.

It uses single-byte, double-byte and four-byte encodings and is backward compatible with GB2312 and GBK. Although it is a mandatory standard in China, it is rarely used in practice; GBK and GB2312 remain more common.
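The three encoded lengths can be observed with a short Java sketch, assuming the JDK in use provides the optional GB18030 charset. Gb18030Lengths and byteLength are hypothetical names for this example.

```java
import java.nio.charset.Charset;

// Hypothetical demo: one-, two- and four-byte forms in GB18030.
class Gb18030Lengths {
    // Number of bytes the text occupies when encoded as GB18030.
    static int byteLength(String text) {
        return text.getBytes(Charset.forName("GB18030")).length;
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GB18030")) {
            System.out.println("GB18030 not available on this JDK");
            return;
        }
        String ascii = "A";
        String common = "\u4e2d";                              // common Chinese character
        String rare = new String(Character.toChars(0x20000));  // character outside the 16-bit range
        System.out.println(byteLength(ascii));   // 1
        System.out.println(byteLength(common));  // 2
        System.out.println(byteLength(rare));    // 4
    }
}
```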

UNICODE

To display their own languages on computers, countries and regions each defined their own encodings, and with so many in use they could not interpret one another's text. Unicode, developed by the Unicode Consortium and kept in sync with the ISO/IEC 10646 standard, was proposed as a single character set covering the scripts and symbols of the whole world. By the time it was designed, storage capacity was no longer a major constraint, so it was originally given a fixed width of two bytes: every character took 16 bits, including English characters that previously needed only 8, which wastes space. (Unicode has since grown beyond 16 bits, with code points up to U+10FFFF.) For a long time Unicode was not widely adopted.

UTF-16

UTF-16 is a concrete encoding of Unicode; the 16 in its name is the size in bits of its code unit. It defines how Unicode characters are stored in a computer. Most commonly used characters take one 16-bit unit (two bytes), while characters beyond U+FFFF take a pair of units (four bytes). Because most text consists of fixed two-byte units, string operations are efficient, which is an important reason why Java uses UTF-16 as its in-memory character format.
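Because Java strings are stored as UTF-16, the difference between 16-bit units and actual characters is visible through the String API. The sketch below is illustrative only; SurrogateDemo and its methods are hypothetical names, while length, codePointCount and Character.isHighSurrogate are standard JDK methods.

```java
// Hypothetical demo: UTF-16 code units versus Unicode code points in a Java String.
class SurrogateDemo {
    // Number of 16-bit char units in the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4e2d";                                 // fits in one 16-bit unit
        String supp = new String(Character.toChars(0x1F600));  // needs a surrogate pair
        System.out.println(charUnits(bmp) + " " + codePoints(bmp));    // 1 1
        System.out.println(charUnits(supp) + " " + codePoints(supp));  // 2 1
        System.out.println(Character.isHighSurrogate(supp.charAt(0))); // true
    }
}
```

Code that indexes strings by char therefore works per character only as long as the text stays within the 16-bit range.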

UTF-16 suits use between disk and memory, where converting between characters and bytes is simple and efficient. It is less suited to network transmission: sender and receiver must agree on byte order, and if a byte is lost or corrupted in the stream, the two-byte units that follow can be misread.
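One way to see why UTF-16 is fragile on the wire is to look at byte order. In this sketch (Utf16ByteOrder and its methods are hypothetical names), the same character is written in big-endian and little-endian form, and then decoded with the wrong byte order.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same character in both UTF-16 byte orders.
class Utf16ByteOrder {
    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode big-endian bytes as if they were little-endian and return the resulting code point.
    static int misreadAsLittleEndian(String text) {
        byte[] bigEndian = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(bigEndian, StandardCharsets.UTF_16LE).codePointAt(0);
    }

    public static void main(String[] args) {
        String text = "\u4e2d";
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE))); // 2D 4E
        // Java's plain "UTF-16" encoder prepends a byte order mark (FE FF).
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 4E 2D
        // Wrong byte order yields a different character, U+2D4E instead of U+4E2D.
        System.out.println(Integer.toHexString(misreadAsLittleEndian(text))); // 2d4e
    }
}
```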

UTF-8

Although UTF-16 is efficient to process, it shares the biggest drawback of the original Unicode design: every single-byte character takes two bytes, doubling the storage, which wastes resources and is a poor fit for the rapidly growing Internet. UTF-8 was created to address this. It is a variable-length encoding of Unicode that uses 1 to 4 bytes per character (the original design allowed up to 6).

UTF-8 stores ASCII characters in a single byte, and damage to one character does not affect the characters after it, so UTF-8 is well suited to network transmission and is one of the most widely used encodings today.
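The variable length of UTF-8, and the reason a damaged character does not spoil the ones after it, can be sketched as follows. Utf8Lengths and its methods are hypothetical names. The resynchronization property rests on the fact that every continuation byte has the bit pattern 10xxxxxx, so a decoder can always find the start of the next character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: UTF-8 byte lengths and continuation bytes.
class Utf8Lengths {
    // Number of UTF-8 bytes needed for a single Unicode code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Continuation bytes always look like 10xxxxxx; lead bytes never do.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(0x41));    // 1: 'A'
        System.out.println(utf8Length(0xE9));    // 2: e with acute accent
        System.out.println(utf8Length(0x4E2D));  // 3: common Chinese character
        System.out.println(utf8Length(0x1F600)); // 4: character beyond U+FFFF

        byte[] bytes = "\u4e2d".getBytes(StandardCharsets.UTF_8); // E4 B8 AD
        System.out.println(isContinuation(bytes[0])); // false: lead byte
        System.out.println(isContinuation(bytes[1])); // true
        System.out.println(isContinuation(bytes[2])); // true
    }
}
```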

For Chinese text, UTF-8 is less compact than GBK or UTF-16: a common Chinese character takes three bytes in UTF-8 but two in GBK and UTF-16. Its ASCII compatibility and robustness in transmission still make it the most practical choice alongside GBK.
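The size difference for Chinese text can be measured directly. This is a minimal sketch, assuming the JDK in use provides the optional GBK charset; ChineseSizeComparison and size are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: bytes needed for the same Chinese text in three encodings.
class ChineseSizeComparison {
    // Encoded size of the text in bytes.
    static int size(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u7f16\u7801"; // four common Chinese characters
        System.out.println(size(text, StandardCharsets.UTF_8));     // 12
        System.out.println(size(text, StandardCharsets.UTF_16BE));  // 8
        if (Charset.isSupported("GBK")) {
            System.out.println(size(text, Charset.forName("GBK"))); // 8
        }
    }
}
```

For text that is mostly ASCII the ranking reverses, since UTF-8 and GBK need one byte per ASCII character while UTF-16 needs two.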

At this point you should have a better understanding of the character encodings a Java architect needs to know. It is worth trying them out in practice.

© 2024 shulou.com SLNews company. All rights reserved.