In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly introduces the relevant knowledge of what the RPC message protocol design principle is, the content is detailed and easy to understand, the operation is simple and fast, and has a certain reference value. I believe you will gain something after reading this article on the design principle of RPC message protocol. Let's take a look at it.
Message boundary
RPC requires multiple messages to be delivered on a single TCP link. There must be a clear segmentation rule between two consecutive messages so that the receiver can split the message, where the receiver can receive the request from the RPC server or the response from the RPC client.
If a single message based on a TCP link is too large, it will be split into multiple packets by the network protocol stack for transmission. If the message is too small, the network protocol stack may combine multiple messages into a single packet to send. For the receiver, all it sees is a string of byte arrays. If there are no clear message boundary rules, the receiver has no way to know whether the string of byte arrays contains multiple messages or just part of a message.
The two commonly used segmentation methods are special separator method and length prefix method.
The message sender appends a special separator to the end of each message and ensures that the data in the middle of the message cannot contain a special separator. For example, the most common separator is. When the receiver traverses the byte array, it can immediately conclude that the previous byte array is a complete message and can be passed to the upper logic to continue processing. Separators are widely used in HTTP and Redis protocols. This kind of message generally requires that the content of the message body is a text message.
The message sender adds a 4-byte integer value at the beginning of each message to mark the length of the message body. In this way, the message receiver can first read the length information, and then read the byte array of the corresponding length to separate a complete message. This kind of message is more commonly used in binary messages.
The advantage of the special separator-based method is that the message is readable and the text content of the message can be seen directly. The disadvantage is that it is not suitable to transmit binary messages, because two consecutive bytes can easily appear in the binary byte array, which happens to be the ascii value of the separator. If it needs to be delivered, the binary is generally base64-encoded into a plain text message and then transmitted.
The advantages and disadvantages of the length prefix method are exactly the opposite of the special separator method. The length prefix method is not readable because it is suitable for binary protocols. However, there are no special restrictions on the content itself, both text and content can be transmitted, and no special processing is needed. The Content-Length header of the HTTP protocol is used to mark the length of the message body, which can also be seen as an application of the length prefix method.
HTTP protocol is a hybrid protocol based on special separator and length prefix. For example, the header of HTTP uses plain text plus a separator, while the body determines the length by the value of Content-Type in the header. Although HTTP protocol is called text transfer protocol, it can also transmit binary data in the message body, such as audio and video images, so HTTP protocol is called "hypertext" transmission protocol.
The structure of the message
Each message has its semantic structure information, some message protocol structure information is explicit, and some are implicit. For example, json message, its structure can be directly reflected through its content, so it is an explicit structure of the message protocol.
The readability of json, an intuitive messaging protocol, is great, but its shortcomings are also obvious, because there is too much redundant information. For example, each string is delimited by double quotes, key/value must be separated by colons, objects must be separated by curly braces, and so on. These are just small heads of redundancy, and the biggest redundancy is that consecutive multiple json messages need to send the same key string information even if the structure is exactly the same, but only the value of value is different.
The structure of the message can be reused on the same message channel. For example, at the beginning of establishing the link, the RPC client and the server first communicate and negotiate the structure of the message. When sending the message later, you only need to send a series of value values of the message, and the receiver will automatically associate the value value with the key of the corresponding location to form a completed structured message. Avro message protocol, which is widely used in Hadoop system, is implemented in this way. The structure of the message is exchanged where the RPC link is established, and the subsequent message delivery can save a lot of traffic.
The implicit structure of the message generally refers to the message protocol in which the structural information is prescribed by the code. In the message data of RPC interaction, it is only pure binary data, and the code determines which field the binary of the corresponding position belongs to. For example, the following code
If you simply look at the message content, it is impossible to know the meaning of which bytes in the node message content, its message structure is determined by the structural order of the code. The advantage of this implicit message is that it saves transport traffic and does not need to transmit structural information at all.
Message compression
If the content of the message is too large, it is necessary to consider compressing the message, which can reduce the pressure on network bandwidth. But this will also increase the burden on CPU, because the compression algorithm is a computing-intensive operation of CPU, which will increase the load on the operating system. Therefore, whether to compress the message in the end must be weighed according to the business situation.
If compression is determined, be sure to pick the underlying algorithm libraries implemented in C when choosing the compression algorithm package, because the Python bytecode is too slow to execute. The more popular message compression algorithm is Google's snappy algorithm, its performance is very good, although the compression ratio is not optimal, but the gap from the optimal is not very large. Ali's SOFA RPC uses snappy as the protocol layer compression algorithm.
Ultimate optimization of flow rate
Open source popular RPC messaging protocols often optimize message traffic to the extreme, and they impress users and attract users to use them in this way. For example, for a shaping number, 4 bytes are generally used to represent an integer value.
However, after research, it is found that most of the integer values used in message delivery are very small non-negative integers, so it would be wasteful to use 4 bytes to represent an integer. So we invented a type called variable length integer varint. The value is very small and only needs to be stored in one byte, a slightly larger value can use 2 bytes, and a larger value can use 3 bytes, and it can also be more than 4 bytes to represent long shaping numbers.
The principle is also very simple, that is, to keep the highest bit of bit of each byte to identify whether there are any more bytes, 1 means that there are still bytes to read, and 0 means that the reading ends when the current byte is read.
What if it's negative? The hexadecimal number of-1 is 0xFFFFFFFF. If you want to follow this code, it will take 6 bytes to save it. -1 is also a very common integer.
So the zigzag code came to solve the problem of negative numbers. Zigzag encoding maps a range of integers to a range of natural numbers one by one, followed by varint encoding.
Zigzag encodes negative numbers as positive odd numbers and positive numbers as even numbers. When decoding, an even number is directly divided into 2 is the original value, and when an odd number is encountered, add 1 divided by 2 and then take a negative value.
This is the end of the article on "what is the principle of RPC message protocol design?" Thank you for reading! I believe you all have a certain understanding of "what is the principle of RPC message protocol design". If you want to learn more, you are welcome to follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.