2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces several character encodings relevant to Python (ASCII, Unicode, and UTF-8) and explains how to set the encoding of a Python source file. Encoding problems are a common stumbling block in practice, so the sections below walk through how each encoding works and how to handle these situations.
ASCII code
Computers store all data as 0s and 1s. To store English text, the ASCII table was created: it maps each English character to a number, which is then stored in binary. ASCII covers only the 26 English letters along with digits and punctuation, while Chinese alone has more than 60,000 characters, so ASCII is not enough. Unicode, UTF-8, and other encodings were created to fill that gap. In essence, they map the world's writing systems to the 0s and 1s that a computer can store and recognize.
ASCII uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters (the 8-bit variants are usually called extended ASCII). Standard ASCII, also known as basic ASCII, uses 7 bits to represent the uppercase and lowercase letters, the digits 0 to 9, punctuation, and the special control characters used in American English. Specifically:
0-31 and 127 (33 in total) are control characters or special communication characters (the rest are printable characters). Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell). Special communication characters include SOH (start of heading), EOT (end of transmission), and ACK (acknowledge).
ASCII values 8, 9, 10, and 13 are the backspace, horizontal tab, line feed, and carriage return characters, respectively. They have no graphical representation of their own, but they affect how text is displayed, depending on the application.
32-126 (95 in total) are printable characters (32 is the space), of which 48-57 are the ten Arabic numerals 0 to 9.
65-90 are the 26 uppercase letters and 97-122 are the 26 lowercase letters; the rest are punctuation marks, arithmetic symbols, and so on.
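The ranges above can be checked directly in Python. The following is a minimal sketch using the built-in ord() and chr() functions; the variable names are illustrative and do not come from the original article.

```python
# ord() maps a character to its numeric code; chr() maps a code back to a character.
print(ord("A"), ord("a"), ord("0"), ord(" "))

# Rebuild the printable ranges listed above from their numeric codes.
digits = "".join(chr(code) for code in range(48, 58))
uppercase = "".join(chr(code) for code in range(65, 91))
lowercase = "".join(chr(code) for code in range(97, 123))

# Codes 0-31 plus 127 are the control characters; 32-126 are printable.
control_codes = list(range(0, 32)) + [127]
printable_codes = list(range(32, 127))

print(digits)
print(uppercase)
print(lowercase)
print(len(control_codes), len(printable_codes))

# Values 8, 9, 10, and 13 correspond to the usual escape sequences.
print(chr(8) == "\b", chr(9) == "\t", chr(10) == "\n", chr(13) == "\r")
```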
Unicode
Unicode was created to overcome the limitations of traditional character encodings. It catalogs and assigns codes to most of the world's writing systems so that computers can process and display text more easily. Early versions of Unicode used a 16-bit code space, with each character occupying 2 bytes; the current code space runs from U+0000 to U+10FFFF and needs up to 21 bits. The concrete byte-level representations of Unicode are called Unicode Transformation Formats (UTF).
Unicode extends the ASCII character set: its first 128 code points are identical to ASCII. In strict ASCII, each character is represented by 7 bits, although on computers it is usually stored in an 8-bit byte. Unicode was originally designed as a full 16-bit character set and has since grown beyond 16 bits. This lets Unicode represent the letters, ideographs, and other symbols of all the world's written languages that may be used in computer communication.
Mixing different encodings produces garbled text. Unicode addresses this by including the world's symbols in a single set and giving each one its own unique code. The current Unicode code space can hold more than 1 million symbols. For example, U+4E0A represents 上 and U+4E0B represents 下. The full symbol table can be viewed at: http://www.chi2ko.com/tool/CJK.htm
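As a small illustration of code points, the sketch below looks up the two code points cited above in Python 3, where str values are sequences of Unicode code points. The variable names are illustrative only.

```python
import sys
import unicodedata

# chr() turns a code point into a one-character string.
up = chr(0x4E0A)
down = chr(0x4E0B)
print(up, down)

# ord() goes the other way; format it in the usual U+XXXX notation.
code_point_label = f"U+{ord(up):04X}"
print(code_point_label)

# The Unicode character database gives each code point an official name.
print(unicodedata.name(up))

# The largest code point Python 3 supports, and the size of the code space.
print(hex(sys.maxunicode), sys.maxunicode + 1)
```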
UTF-8
UTF stands for Unicode Transformation Format: a way of encoding the Unicode code points described earlier as bytes. The most common format is UTF-8; UTF-16 and UTF-32 also exist.
The 8 in UTF-8 means that the encoding works in 8-bit units: each character is encoded as one to four 8-bit bytes. UTF-16 and UTF-32 similarly work in 16-bit and 32-bit units. A Unicode code point needs at most 21 bits, and 32 is greater than 21, so UTF-32 can store the code point of any character directly. However, 32 bits is 4 bytes, so every character takes up 4 bytes of space, which is wasteful for most text. That is why the variable-length UTF-8 and UTF-16 exist.
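The space trade-off can be seen by encoding a few characters in each format. This is a minimal sketch; the sample characters and the helper name encoded_sizes are chosen for illustration, and the "-le" codec variants are used so that no byte order mark is added to the output.

```python
# Sample characters: A, e-acute (U+00E9), U+4E0A, and an emoji (U+1F600).
samples = ["A", "\u00e9", "\u4e0a", "\U0001F600"]


def encoded_sizes(char):
    """Return the byte length of one character in UTF-8, UTF-16, and UTF-32."""
    return tuple(
        len(char.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")
    )


for char in samples:
    print(f"U+{ord(char):04X}", encoded_sizes(char))
```

UTF-32 always uses 4 bytes, while UTF-8 grows from 1 to 4 bytes as the code point gets larger.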
The UTF-8 coding rules are as follows:
Unicode range        Bits     UTF-8 encoding                             Bytes
0x0000-0x007F        0-7      0XXX XXXX                                  1
0x0080-0x07FF        8-11     110X XXXX 10XX XXXX                        2
0x0800-0xFFFF        12-16    1110 XXXX 10XX XXXX 10XX XXXX              3
0x10000-0x10FFFF     17-21    1111 0XXX 10XX XXXX 10XX XXXX 10XX XXXX    4
If a character fits in a single byte, the high (leftmost) bit is 0, giving 0XXX XXXX; any unused high bits of the code are filled with 0.
For a character encoded in two bytes, the first byte starts with two 1s and a 0 (110), and the second byte starts with 10.
Three-byte and four-byte characters follow the same pattern: the first byte starts with as many 1s as there are bytes in total, followed by a 0, and each remaining byte starts with 10.
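The rules above can be sketched as a hand-written encoder and compared with Python's built-in str.encode. The function utf8_encode is a hypothetical helper written for illustration; it performs no validation (for example, it does not reject surrogate code points), so it is not a replacement for the built-in codec.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes by following the rules above."""
    if code_point < 0x80:
        # 1 byte: 0XXX XXXX
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110X XXXX 10XX XXXX
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point < 0x10000:
        # 3 bytes: 1110 XXXX 10XX XXXX 10XX XXXX
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    # 4 bytes: 1111 0XXX 10XX XXXX 10XX XXXX 10XX XXXX
    return bytes([
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b111111),
        0b10000000 | ((code_point >> 6) & 0b111111),
        0b10000000 | (code_point & 0b111111),
    ])


# Show the bit pattern for U+4E0A and compare with the built-in encoder.
encoded = utf8_encode(0x4E0A)
print(" ".join(f"{byte:08b}" for byte in encoded))
print(encoded == chr(0x4E0A).encode("utf-8"))
```

U+4E0A falls in the three-byte range, so its first byte starts with 1110 and the two following bytes start with 10.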
Set the encoding format
In Python 2, the default source encoding is ASCII. If it is not changed, Chinese characters cannot be printed correctly and an error is reported when the interpreter reads Chinese text in the file. The solution is to add # -*- coding: utf-8 -*- or # coding=utf-8 on the first or second line of the file. In Python 3, the default source encoding is already UTF-8, so the declaration is optional.
When writing Python programs, source files are generally saved in UTF-8, and web pages usually declare UTF-8 as well. UTF-8 is a universal encoding that can represent Chinese, English, Japanese, and the rest of the world's characters.
To set the encoding in PyCharm: File -> Settings -> Editor -> File Encodings
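A minimal sketch of a source file that carries the declaration is shown below. The string content and variable names are illustrative; the example assumes Python 3, where sys.getdefaultencoding() reports utf-8.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file.
import sys

greeting = "你好, world"

# Default encoding used by str.encode() and bytes.decode() in this interpreter.
default_encoding = sys.getdefaultencoding()

# Each Chinese character takes 3 bytes in UTF-8; each ASCII character takes 1.
encoded = greeting.encode("utf-8")

print(default_encoding)
print(len(greeting), len(encoded))
print(encoded.decode("utf-8") == greeting)
```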
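The same advice applies to data files that a program reads and writes. The sketch below assumes Python 3 and passes encoding="utf-8" to open() explicitly rather than relying on the platform default; the file name and sample text are made up for illustration.

```python
import os
import tempfile

# Mixed English, Chinese, and Japanese text, written with escapes for portability.
text = "Hello, \u4e16\u754c, \u3053\u3093\u306b\u3061\u306f"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # Write the text as UTF-8.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Read the raw bytes back to see what was stored on disk.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Read the file again as text, decoding with the same encoding.
    with open(path, "r", encoding="utf-8") as handle:
        round_trip = handle.read()

print(len(text), len(raw))
print(round_trip == text)
```

Reading the file with a different encoding than it was written with is a typical source of garbled text or decoding errors.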
This concludes the introduction to several encodings used with Python and the methods for setting the encoding format. Thank you for reading.
© 2024 shulou.com SLNews company. All rights reserved.