2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
RPC is a convenient programming model for network communication. Because it integrates closely with the programming language, it greatly reduces the complexity of handling network data and makes code much more readable. However, an RPC system is itself fairly complex: constrained by the programming language, the network model, and usage habits, its design involves many compromises and trade-offs. This article analyzes several popular RPC implementations to provide a reference for designing an RPC system.
The low-level network development behind RPC generally depends on the specific environment, and it can be programmed in many different ways, none of which affect the user. This article therefore deals mainly with how an RPC system is designed, not with the low-level network code.
Understanding RPC (remote invocation)
The concept of "remote invocation" appears in many operating systems and programming language ecosystems. Generally speaking, it means using a simple line of code to call a program on another computer over the network. For example:
RMI (Remote Method Invocation): a call to a remote method. A "method" is usually attached to an object, so RMI usually refers to calling a method of an object on a remote computer.
RPC (Remote Procedure Call): a call to a specific piece of function code on another computer on the network.
Remote invocation is one model of network communication; its defining feature is that it wraps network communication in something that looks like a function call. Besides remote calls, network communication is commonly organized around several other models: packet processing, message queuing, stream filtering, and resource pulling. Let's compare them:
| Scheme | Programming mode | Information encapsulation | Transmission model | Typical application |
|---|---|---|---|---|
| Remote call | Call a function, pass in parameters, and get the return value | Uses the variables, types, and functions of a programming language | Make a request and get a response | Java RMI |
| Packet processing | Call Send() / Recv(), work with raw byte data, encode and decode, and process the content | The communication content is built into a binary protocol packet | Send / receive | UDP programming |
| Message queue | Call Put() / Get(), and process the contents of the message object | Messages are encapsulated into objects or structures that the language can use | Put a message into a queue; take a message out | ActiveMQ |
| Stream filtering | Read or write a stream; each unit in the stream is processed as soon as it arrives | A uniform data structure with a very small unit length | Connect; send / receive; process | Network video |
| Resource pull | Supply a resource ID and get the resource content | Each request or response consists of header + body | Send a request, then wait for the response | WWW |
Because remote calls look like function calls, the industry has developed similar solutions in many languages, and some of them try to be cross-language. Although remote calls seem the easiest model to program against, they also have obvious disadvantages. Understanding the advantages and disadvantages of remote invocation is therefore key to deciding whether to develop or adopt this model.
The advantages of remote calls are:
Hiding the network layer. Because the network layer is hidden, different transport protocols and encoding protocols can be chosen freely. For example, the Web Service scheme uses the HTTP transport protocol with SOAP encoding, while REST schemes often use HTTP with JSON. Facebook's Thrift even lets you combine different transport and encoding protocols, such as TCP with Google Protocol Buffers or UDP with JSON. Since the network layer is hidden, the network part can be optimized independently according to actual needs without touching the business logic code, which is very valuable for programs that must run in a variety of network environments.
Function mapping protocol. You can write data structures and function definitions directly in the programming language instead of writing large amounts of encoding formats and packet-splitting logic. For systems with very complex business logic, such as online games, this saves a great deal of time otherwise spent defining message formats. The function call model is also very easy to learn: there is no need to study communication protocols and processes, so inexperienced programmers can start doing network programming easily.
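To make the first advantage concrete, here is a minimal Python sketch, not taken from any real framework, in which the same business function is called through two different encodings without changing the calling code. All names (JsonCodec, PickleCodec, LoopbackTransport, RpcClient, make_server) are hypothetical, and the "transport" is an in-process stand-in for TCP, UDP, or HTTP.

```python
import json
import pickle

# Hypothetical names throughout; this is an illustration, not a real library.

class JsonCodec:
    def encode(self, obj):
        return json.dumps(obj).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class PickleCodec:
    def encode(self, obj):
        return pickle.dumps(obj)

    def decode(self, data):
        return pickle.loads(data)


class LoopbackTransport:
    """Stands in for TCP/UDP/HTTP: hands request bytes to a handler in-process."""

    def __init__(self, handler):
        self.handler = handler

    def round_trip(self, request):
        return self.handler(request)


def make_server(codec, functions):
    """Build a bytes-in/bytes-out handler that dispatches to business functions."""
    def handle(request):
        msg = codec.decode(request)
        result = functions[msg["method"]](*msg["args"])
        return codec.encode({"result": result})
    return handle


class RpcClient:
    """The caller only sees call(); transport and codec are swappable parts."""

    def __init__(self, transport, codec):
        self.transport = transport
        self.codec = codec

    def call(self, method, *args):
        request = self.codec.encode({"method": method, "args": list(args)})
        reply = self.transport.round_trip(request)
        return self.codec.decode(reply)["result"]


def add(a, b):
    # Business logic: unaware of how it is reached over the network.
    return a + b


functions = {"add": add}
json_client = RpcClient(LoopbackTransport(make_server(JsonCodec(), functions)), JsonCodec())
pickle_client = RpcClient(LoopbackTransport(make_server(PickleCodec(), functions)), PickleCodec())
```

Both `json_client.call("add", 2, 3)` and `pickle_client.call("add", 2, 3)` reach the same `add` function; only the bytes exchanged differ. A real system would replace LoopbackTransport with a socket or HTTP client in the same way.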
Disadvantages of remote calls:
Increased performance cost. Because network communication is packaged as a "function", a lot of extra processing is needed. For example, code must be generated in advance, or a reflection mechanism must be used, and both consume extra CPU and memory. In addition, to express complex data types such as variable-length string/map/list values, more descriptive information has to be added to each packet, which increases packet length.
Unnecessary complication. If you only have a specific business requirement, such as transferring a fixed file, you should use the HTTP/FTP protocol model. For monitoring or IM software, sending and receiving messages with a simple message encoding is faster and more efficient. For a proxy server, stream processing is the easy choice. And for data broadcasting, a message queue does the job easily, while remote invocation can hardly do it at all.
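The packet-length point can be illustrated with a small Python comparison. This is only a sketch: the field names and values are made up, and JSON is used here as one example of a self-describing encoding that a generic RPC layer might choose.

```python
import json
import struct

# Hypothetical game-state fields, chosen only for illustration.
player_id, level, x, y = 1001, 42, 12.5, -3.25

# Hand-written packet: both sides agree on the layout in advance
# (little-endian uint32, uint16, float32, float32), so no field names are sent.
fixed_packet = struct.pack("<IHff", player_id, level, x, y)

# Self-describing packet: field names and structure travel with every message.
described_packet = json.dumps(
    {"player_id": player_id, "level": level, "x": x, "y": y}
).encode("utf-8")

print(len(fixed_packet), len(described_packet))
```

The fixed layout is shorter because all the descriptive information lives in the agreed format rather than in the packet, which is exactly the information a general-purpose remote call layer often has to send along.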
Therefore, remote invocation is best suited to scenarios where both the business requirements and the network environment change frequently.
The core issues of an RPC scheme
Because the interface for remote calls is a "function", how to build this "function" gives rise to three problems that require decision-making:
1. How to represent "remote" information
The so-called remote refers to another location on the network, so the network address is the part that must be entered. In the TCP/IP network, the IP address and port number represent an entrance to a running program. So specifying the IP address and port is necessary to initiate a remote call.
However, a program may provide many functions and accept remote calls with many different meanings. How to let the user specify these different remote call entry points becomes another problem. The simplest approach is one call per port, but an IP address supports at most 65535 ports, and other network functions may also need ports, so this may not be enough. Moreover, a number that stands for a function is hard to understand: you have to look it up in a table.
So we need another approach. Object-oriented thinking suggests one: group related functions into different objects, then specify first the object and then the method. This matches the way programmers think, and EJB is one such solution. Once you have decided to use the object model to address a remote call, you need a way to specify the remote object. To do so, some information about the object must be passed from the callee (server side) to the caller (client side).
The simplest solution is for the client to supply a string as the "name" of the object and send it to the server, which looks up the object registered under that name. If it is found, the server uses some technique to "transfer" the object to the client, and the client can then call its methods. Of course, this transfer does not copy the whole object's data from the server to the client; instead, some symbol or token that represents the server-side object is sent to the client.
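The name lookup described above can be sketched in Python as follows. This is an illustrative model only: ObjectRegistry, RemoteProxy, and Counter are hypothetical names, and the direct method call inside the proxy stands in for what would be a network round trip.

```python
import itertools


class ObjectRegistry:
    """Server side: maps registered names to objects and hands out opaque handles."""

    def __init__(self):
        self._by_name = {}
        self._by_handle = {}
        self._next_handle = itertools.count(1)

    def register(self, name, obj):
        self._by_name[name] = obj

    def lookup(self, name):
        """Return a small integer handle for the named object, not the object itself."""
        obj = self._by_name[name]  # raises KeyError if the name is unknown
        handle = next(self._next_handle)
        self._by_handle[handle] = obj
        return handle

    def invoke(self, handle, method, args):
        return getattr(self._by_handle[handle], method)(*args)


class RemoteProxy:
    """Client side: behaves like the object, but only holds the handle."""

    def __init__(self, registry, handle):
        self._registry = registry
        self._handle = handle

    def __getattr__(self, method):
        def remote_method(*args):
            # In a real system this would be a network round trip.
            return self._registry.invoke(self._handle, method, args)
        return remote_method


class Counter:
    def __init__(self):
        self.value = 0

    def increment(self, step):
        self.value += step
        return self.value


server = ObjectRegistry()
server.register("counter", Counter())
counter = RemoteProxy(server, server.lookup("counter"))
```

Calling `counter.increment(2)` changes the state of the object held by the server; the client never receives a copy of the Counter, only the handle that represents it.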
If you are not using an object-oriented model, a remote function must still be located and transferred, because the function you call must first be found and then exposed as an interface on the client side before it can be called. How to represent "remote objects" (whether true objects or just functions) so that they can be located on the network, and in what form the client can call them once located, are the first important questions in designing remote invocation.
2. How should the interface form of a function be represented?
Due to the constraints of network communication, remote calls often cannot support all the features of a programming language. For example, pointer parameters in C functions cannot be passed over the network. The design must therefore specify which language features may be used in remotely callable function definitions and which may not.
If this rule is too strict, it hurts ease of use; if it is too loose, it may lead to poor remote call performance. How to describe a function written in a programming language as a remotely callable function is also a problem to consider. Many solutions take the general approach of configuration files, while others add special comments or annotations directly to the source code.
Generally speaking, compiled languages such as C/C++ can only use source code generated by a tool from a configuration file. Virtual machine languages such as C#/Java can use reflection combined with a configuration file (or special annotations in the source code instead of a configuration file). Scripting languages have it easier still: sometimes no configuration file is needed, because the script itself can serve as the description. In short, what constraints the remote calling interface must satisfy is also a question that needs careful consideration.
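As a rough illustration of the reflection approach, the Python sketch below derives an interface description from a function's own type annotations and rejects parameter types that the wire format does not support. The ALLOWED_TYPES table and the describe helper are assumptions made for this example, not part of any existing RPC library.

```python
import inspect

# Types this hypothetical RPC layer agrees to carry over the wire.
ALLOWED_TYPES = {int: "int", float: "float", str: "string", bool: "bool"}


def describe(func):
    """Build an interface description from a function's type annotations."""
    sig = inspect.signature(func)
    params = []
    for name, param in sig.parameters.items():
        if param.annotation not in ALLOWED_TYPES:
            raise TypeError(f"parameter {name!r} has an unsupported type")
        params.append((name, ALLOWED_TYPES[param.annotation]))
    if sig.return_annotation not in ALLOWED_TYPES:
        raise TypeError("return type is unsupported")
    return {
        "name": func.__name__,
        "params": params,
        "returns": ALLOWED_TYPES[sig.return_annotation],
    }


def transfer(account: str, amount: float) -> bool:
    # Acceptable: only basic value types cross the interface.
    return amount > 0


def bad(callback: object) -> int:
    # Not acceptable: an arbitrary object cannot be passed by value.
    return 0
```

Here `describe(transfer)` yields a description that a client could use to build a stub, while `describe(bad)` raises TypeError. No separate configuration file is involved, since the source code itself carries the interface definition.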
3. What method is used to realize network communication?
The most important implementation detail of remote calls is network communication. The question of how to carry remote calls splits into two sub-questions: what kind of service program provides the network functions, and what communication protocol is used?
A remote calling system can program TCP/IP directly, or it can delegate communication to other software, such as a Web server or a message queue server. It can also use a network communication framework such as Netty or Mina. Communication protocols generally have two layers. One is the transport protocol, such as TCP/UDP, the higher-level HTTP, or a custom transport protocol. The other is the encoding protocol, which defines how objects in a programming language are serialized into binary byte streams and deserialized back. Popular encoding schemes include JSON and Google Protocol Buffers, and many languages, such as Java and C#, also have their own serialization schemes. Which of these techniques is chosen directly affects the performance and environmental compatibility of the remote calling system.
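The two protocol layers can be sketched in Python as follows. This is one possible arrangement, assumed for illustration: JSON as the encoding protocol and a 4-byte length prefix as the framing on top of a stream socket. `socket.socketpair()` provides two connected endpoints inside one process, standing in for a client and server connected over TCP.

```python
import json
import socket
import struct


def send_message(sock, obj):
    """Encoding layer: JSON. Framing layer: 4-byte big-endian length prefix."""
    body = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def recv_exact(sock, n):
    """Read exactly n bytes, since a stream socket may return partial data."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# Two connected endpoints in one process, standing in for client and server.
client_sock, server_sock = socket.socketpair()

send_message(client_sock, {"method": "add", "args": [2, 3]})
request = recv_message(server_sock)
send_message(server_sock, {"result": sum(request["args"])})
reply = recv_message(client_sock)

client_sock.close()
server_sock.close()
```

Swapping JSON for another serialization scheme would change only send_message and recv_message, and swapping the socket for an HTTP request would change only the framing, which is why the two layers are usually chosen separately.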
These three problems are the core choices that every remote calling system has to make. Each scheme makes its own choice on these three issues according to the constraints it faces. So far there is no "universal" solution, because in such a complex system, the more features you try to cover, the higher the cost (in ease of use and in performance overhead).
Next, we can study the various existing remote calling schemes in the industry to see how they balance and choose in these three aspects.
Examples of industry solutions
1. CORBA
CORBA is an "old" and ambitious solution that tries to achieve cross-language communication along with remote calls, so it is the most complex, but its design ideas have been borrowed by many later schemes. To locate communication objects, it uses a URL to identify a remote object, which is easy to accept in the Internet age. Its data is limited to C language types and can only be passed by value, which is also easy to understand. To let programs in different languages communicate, a language used only to describe remote interfaces has to be designed independently of every programming language: the so-called IDL (Interface Description Language).
This way, interfaces are first defined in a language that is independent of all programming languages, and tools then generate code for each programming language automatically. For compiled languages this is almost the only option. CORBA makes no stipulation about communication, leaving it to the implementer for each language, which may be one of the reasons it never became widely popular.
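The idea of generating code from an interface description can be shown with a toy Python generator. The one-line-per-function "IDL" below is invented for this example and is not CORBA IDL or Thrift IDL; generate_stub and CalculatorStub are likewise hypothetical names.

```python
import re

# A made-up, minimal interface description: "return_type name(type arg, ...)".
IDL = """
int add(int a, int b)
string greet(string name)
"""


def generate_stub(idl_text, class_name="CalculatorStub"):
    """Turn each IDL line into a client-side method that forwards the call."""
    lines = [
        f"class {class_name}:",
        "    def __init__(self, send):",
        "        self._send = send  # callable(method_name, args) -> result",
    ]
    for line in idl_text.strip().splitlines():
        match = re.fullmatch(r"\s*(\w+)\s+(\w+)\((.*)\)\s*", line)
        if match is None:
            raise ValueError(f"cannot parse IDL line: {line!r}")
        _return_type, name, arg_list = match.groups()
        # Keep only the argument names; the types are ignored in this sketch.
        args = [a.split()[1] for a in arg_list.split(",") if a.strip()]
        params = ", ".join(["self"] + args)
        lines.append(f"    def {name}({params}):")
        lines.append(f"        return self._send({name!r}, [{', '.join(args)}])")
    return "\n".join(lines) + "\n"


source = generate_stub(IDL)
namespace = {}
exec(source, namespace)  # a real tool would write the source to a file instead

# A stand-in for the network: resolve calls against local implementations.
implementations = {"add": lambda a, b: a + b, "greet": lambda name: "hello " + name}
stub = namespace["CalculatorStub"](lambda method, args: implementations[method](*args))
```

A real IDL compiler would emit source files for each target language and would also use the declared types to generate serialization code, but the flow is the same: one neutral description in, language-specific calling code out.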
In fact, CORBA has a very famous successor: Facebook's Thrift framework. Thrift is also a remote call scheme that compiles IDL into multiple languages, and its communication layer is implemented in C++/Java and other languages, which makes it one of the most attractive open source frameworks. Another feature of Thrift's communication layer is that it can combine a variety of transport and encoding protocols, such as TCP/UDP/HTTP with JSON/BIN/PB. This lets it fit almost any network environment.
Thrift's model is similar to the following figure. The stub is the code the client uses directly to make calls; the skeleton is template code that the programmer completes to provide the remote service, usually by filling in the blanks or by inheriting from (extending) the template. This stub-skeleton model is standard in almost all remote invocation schemes.
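A hand-written Python sketch of the stub-skeleton model might look like the following. It does not reproduce Thrift's generated code; the class names and the JSON message format are assumptions chosen to keep the example short, and a direct function call stands in for the network.

```python
import json
from abc import ABC, abstractmethod


class CalculatorSkeleton(ABC):
    """Server-side template: the developer fills in the abstract methods."""

    @abstractmethod
    def add(self, a, b):
        ...

    def dispatch(self, request_bytes):
        """Decode a request, call the implementation, encode the reply."""
        request = json.loads(request_bytes)
        method = getattr(self, request["method"])
        return json.dumps({"result": method(*request["args"])}).encode("utf-8")


class CalculatorStub:
    """Client-side code: exposes the same methods and forwards them as messages."""

    def __init__(self, send_bytes):
        self._send_bytes = send_bytes  # callable(bytes) -> bytes

    def add(self, a, b):
        request = json.dumps({"method": "add", "args": [a, b]}).encode("utf-8")
        return json.loads(self._send_bytes(request))["result"]


class MyCalculator(CalculatorSkeleton):
    # The only code the service developer writes by hand.
    def add(self, a, b):
        return a + b


service = MyCalculator()
stub = CalculatorStub(service.dispatch)  # a direct call stands in for the network
```

The client calls `stub.add(4, 5)` as if it were local, and the developer of the service only writes MyCalculator. In a generated-code framework both CalculatorStub and CalculatorSkeleton would come out of the IDL compiler.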
2. Java RMI
Java RMI is a remote calling scheme that ships with the Java platform. It also uses a URL to locate remote objects and passes parameter values using Java's built-in serialization. For interface description, since the scheme is confined to the Java environment, Java's own interface type serves directly as the definition language. Users provide remote services by implementing this interface, and Java automatically generates the client-side calling code from the same interface. Its underlying communication uses TCP. Here the interface file is Java's IDL, and it also acts as the skeleton template that developers fill in with the remote service logic. The stub code is produced directly by the virtual machine, thanks to Java's reflection capability.
Thanks to the support of the Java virtual machine, this scheme is very easy to use and fits standard Java programming style, but it runs only in the Java environment, which limits its scope of application. As the saying goes, you cannot have both the fish and the bear's paw: ease of use and breadth of applicability often conflict. This is very different from CORBA/Thrift, which pursue the widest possible applicability, and it explains the difference in ease of use between the two approaches.
3. Windows RPC
Windows has supported RPC for a long time, and the support is mature. It first looks up the object by GUID and then passes parameter values using C language types. Since the Windows API is mainly in C, the RPC facility still needs an IDL to describe the interface, from which .h and .c files are generated to produce the stub and skeleton code. Because the communication mechanism is built into the operating system, the kernel LPC mechanism can carry the calls, which is convenient for users. However, this also restricts calls to those between Windows programs.
4. WebService & REST
In the Internet age, programs need to call each other over the Internet. The most popular protocol on the Internet is HTTP, with the WWW service built on it, so Web Services using HTTP have naturally become the most popular scheme for cross-system calls. Because most of the Internet infrastructure can be reused, developing and deploying a Web Service poses almost no difficulty. In general, it uses a URL to locate remote objects, and parameters are passed as a set of predefined types (mainly C basic types) and serialized objects. For interface generation, you can parse HTTP yourself or use specifications such as WSDL or SOAP. In the REST scheme, the operations are limited to just four, PUT/GET/DELETE/POST, and everything else is a parameter.
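The REST idea of four fixed operations with everything else as a parameter can be modeled in a few lines of Python. This sketch has no HTTP server in it: ResourceStore is a hypothetical in-memory class, and the status codes follow common HTTP conventions rather than any particular framework.

```python
class ResourceStore:
    """Maps the four REST verbs onto one collection of resources keyed by URL path."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def handle(self, verb, path, body=None):
        if verb == "POST":
            # Create a new resource under a collection path such as /users.
            new_path = f"{path}/{self._next_id}"
            self._next_id += 1
            self._items[new_path] = body
            return 201, new_path
        if verb == "PUT":
            # Create or replace the resource at exactly this path.
            self._items[path] = body
            return 200, body
        if path not in self._items:
            return 404, None
        if verb == "GET":
            return 200, self._items[path]
        if verb == "DELETE":
            del self._items[path]
            return 204, None
        return 405, None


store = ResourceStore()
status, user_path = store.handle("POST", "/users", {"name": "ann"})
```

The "function name" never varies: which remote object is addressed is carried entirely by the path, and what to do with it is one of the four verbs, so `store.handle("GET", user_path)` and `store.handle("DELETE", user_path)` differ only in the verb.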
Summing up the above RPC schemes, we find that the industry generally has the following options for the three core issues of remote invocation:
Remote object location: use a URL, or look the object up through a name service.
Parameter passing for remote invocation: use C's basic type definitions, or use some agreed serialization (deserialization) scheme.
Interface definition: use a specific file format and work directly from a pre-agreed interface definition file, or use some description protocol (IDL) to generate these interface files.
Communication bearer: some schemes use a dedicated TCP/UDP server, some let users develop their own customized communication model, and some use higher-level transport options such as HTTP or message queuing.
After we have identified several feasible options for the remote calling system, it is natural to make clear the advantages and disadvantages of each scheme, so that we can choose the design that is really suitable for the requirements:
1. Description of remote objects: URLs are a popular Internet standard that users understand easily, and they are easy to extend later, because a URL is itself a string made of several parts. A name service is more old-fashioned but still has its advantages: it can be equipped with features such as load balancing, disaster recovery, capacity expansion, and custom routing.
2. Interface description of remote invocation: if the scheme is confined to one language, operating system, or platform, it is most convenient to define the remote interface implicitly through the language's own interface description, or by marking up the source code with annotations. However, if compiled languages such as C/C++ must be supported, some kind of IDL is needed to generate source code for them.
3. Communication bearer: letting users customize the communication module provides the best applicability but adds complexity for them. HTTP or a message queue is relatively simple in deployment, operation and maintenance, and programming; the drawback is that there is less room to tune performance and transmission characteristics.
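To illustrate why a name service can offer load balancing and disaster recovery, here is a minimal Python sketch. NameService and its methods are hypothetical, the addresses are placeholders, and round-robin with a manual "down" list is just one simple policy among many.

```python
class NameService:
    """Resolves a service name to one healthy endpoint, rotating between instances."""

    def __init__(self):
        self._endpoints = {}  # name -> list of "host:port" strings
        self._cursors = {}    # name -> position of the next endpoint to try
        self._down = set()    # endpoints currently considered unavailable

    def register(self, name, endpoint):
        self._endpoints.setdefault(name, []).append(endpoint)
        self._cursors.setdefault(name, 0)

    def mark_down(self, endpoint):
        self._down.add(endpoint)

    def resolve(self, name):
        endpoints = self._endpoints.get(name, [])
        # Try each registered endpoint at most once, starting from the cursor.
        for _ in range(len(endpoints)):
            index = self._cursors[name] % len(endpoints)
            self._cursors[name] += 1
            if endpoints[index] not in self._down:
                return endpoints[index]
        raise LookupError(f"no healthy endpoint for {name!r}")


names = NameService()
names.register("orders", "10.0.0.1:9000")  # placeholder addresses
names.register("orders", "10.0.0.2:9000")
```

Successive calls to `names.resolve("orders")` alternate between the two instances, and after `names.mark_down("10.0.0.2:9000")` only the remaining instance is returned. A fixed URL cannot offer this, because the client would already have committed to one address.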
After analyzing the core issues, we also need to consider some applicability scenarios:
1. Object-oriented or procedure-oriented: for procedure-oriented remote calls, we only need to locate the "function"; for object-oriented calls, we need to locate the "object". Because a function is stateless, locating it can be as simple as using a name, while an object's ID or handle has to be found dynamically.
2. Cross-language or single-language: in a single-language scenario, header files or interface definitions in that language are enough; a cross-language scenario inevitably requires IDL.
3. Hybrid communication bearer or HTTP server bearer: a hybrid bearer can use low-level technologies such as TCP/UDP/shared memory to provide the best performance, but it is bound to be troublesome to use. An HTTP server is very simple to use, because there are many open source software packages and libraries for WWW services and the client can be debugged with a browser or a few JS pages, but its performance is lower.
If we are now designing a remote calling system for a domain where the business logic is very changeable, such as the enterprise business application domain, or the game server side domain, we might choose as follows:
1. Use a name service to locate remote objects: enterprise services require high availability, and a name service can identify and select available service objects when a name is queried. EJB (Enterprise JavaBean) in the J2EE scheme is located through a name service.
2. Use IDL to generate interface definitions: the development language of an enterprise or game service may not be uniform, or a high-performance language such as C/C++ may be required, so IDL is the only option.
3. Use a hybrid communication bearer: although enterprise services do not seem to need to run on a very complex network, the network environments of different enterprises may differ greatly. To build a general-purpose system, it is best to accept the extra effort and provide a hybrid communication bearer, so that users can choose among protocols such as TCP and UDP.
© 2024 shulou.com SLNews company. All rights reserved.