The Practice of Container SDN Technology and Micro-service Architecture

This article analyzes the practice of container SDN technology and micro-service architecture from a professional point of view: it first compares three open-source container network solutions, and then describes the design of Qiniu's own container platform network.
About SDN and containers
SDN has been a hot concept in recent years. As is well known, SDN is the abbreviation of Software Defined Networking, but different people understand it differently. In a broad sense, any approach that uses software to achieve flexible deployment and scalability of a network can be considered SDN. Below we analyze three solutions, Flannel, Calico and Weave, focusing on how each implements its control plane and forwarding plane, even though none of them markets itself as an SDN solution.
Container technology poses some new challenges to traditional virtualized networks, and many network solutions have emerged around Docker to make up for its shortcomings in this area.
Open source SDN solutions around containers
Docker's built-in network solution is relatively simple: each host runs a plain Linux bridge, which can be regarded as a layer 2 switch of limited capability that does only simple MAC learning and forwarding. Traffic leaving the bridge goes through iptables for NAT, and communication between hosts relies on route forwarding. When a more complex service is deployed on this network model, many problems appear: a container's IP changes after it restarts; each host is assigned a fixed network segment, so a container moved to a different host lands in a different segment and its IP changes; because of NAT, neither side of a connection sees the other's real address; and NAT itself carries a performance penalty. These problems are obstacles to using Docker's built-in network solution.
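To make the per-host segment problem concrete, here is a minimal, self-contained Python sketch (the subnet values are made up for illustration) showing why a container's IP must change when it is rescheduled onto a host that owns a different segment:

```python
import ipaddress

# Hypothetical per-host segments, mirroring Docker's default model where
# every host owns a fixed subnet for its local bridge.
HOST_SUBNETS = {
    "host-a": ipaddress.ip_network("172.17.1.0/24"),
    "host-b": ipaddress.ip_network("172.17.2.0/24"),
}

def assign_ip(host: str, index: int) -> ipaddress.IPv4Address:
    """Pick the index-th usable address from the host's fixed segment."""
    return list(HOST_SUBNETS[host].hosts())[index]

ip_before = assign_ip("host-a", 5)   # container scheduled on host-a
ip_after = assign_ip("host-b", 5)    # same container moved to host-b
print(ip_before, "->", ip_after)     # 172.17.1.6 -> 172.17.2.6: the IP changed
```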
Flannel
Figure 1
Flannel is an overlay network tool designed by the CoreOS team for Kubernetes, as shown in figure 1. Its control plane is actually very simple, as shown in figure 2.
Figure 2
The Flannel daemon on each machine watches etcd: it asks etcd for the IP range available to its own node and reads the subnet information of all the other hosts from etcd, so that it can set up routes to them. On the forwarding plane (figure 3), which shows the direction of the data flow, Flannel creates a new VXLAN device alongside the Docker bridge (figure 4). VXLAN is a tunneling scheme: it prepends an outer header to a layer 2 frame and then lets the physical network route the whole thing as an ordinary packet.
Figure 3
Figure 4
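As a rough illustration of this control plane, here is a Python sketch using the python-etcd3 client. The key prefix and the JSON fields follow the layout flannel's etcd backend is commonly documented to use; treat both as assumptions rather than a guaranteed schema:

```python
import json
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)
PREFIX = "/coreos.com/network/subnets/"  # assumed flannel lease prefix

# Initial sync: learn which host owns which container subnet.
for value, meta in client.get_prefix(PREFIX):
    subnet = meta.key.decode().rsplit("/", 1)[-1]   # e.g. "10.1.15.0-24"
    lease = json.loads(value)
    print(f"subnet {subnet} is behind host {lease.get('PublicIP')}")

# Then keep watching so routes can be added or removed as nodes come and go.
events, cancel = client.watch_prefix(PREFIX)
for event in events:
    print("lease changed:", event.key.decode())
```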
Why is such a tunnel needed? Because the IPs and MACs of the virtual network are normally unknown to the physical network, and having the physical network recognize them would require its cooperation. Encapsulation sidesteps that requirement, which is why this is a tunnel solution.
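For the forwarding side, the sketch below uses the pyroute2 library to create a VXLAN device attached to a physical interface, roughly what a VXLAN backend arranges via netlink. Interface names and the VNI are placeholders, the keyword names are the ones recent pyroute2 versions expose (verify against your installed version), and creating the device requires root:

```python
from pyroute2 import IPRoute

ipr = IPRoute()
phys = ipr.link_lookup(ifname="eth0")[0]   # underlay NIC (placeholder name)

# Create a VXLAN device: inner L2 frames get an outer UDP/IP header and
# travel across the physical network as ordinary packets.
ipr.link(
    "add",
    ifname="vxlan0",
    kind="vxlan",
    vxlan_id=1,          # VNI, placeholder value
    vxlan_link=phys,     # underlay device to carry the encapsulated traffic
    vxlan_port=8472,     # UDP port used in the outer header
)
ipr.link("set", ifname="vxlan0", state="up")
```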
Summing up the Flannel scheme: it provides no isolation, and it allocates IPs per host network segment, so when a container moves from one host to another its address necessarily changes.
Calico
Figure 5
Calico's idea is relatively new, as shown in figure 5. It regards the protocol stack of each operating system as a router and all containers as network endpoints attached to that router, runs the standard BGP routing protocol between the routers, and lets them learn the network topology and how to forward. The Calico solution is therefore a pure layer 3 solution: the layer 3 stack of each machine ensures connectivity between containers, including containers on different hosts. On the control plane (figure 6), it runs two main programs on each node. One, Felix (on the left in the figure), watches the central etcd store and reacts to events from it, such as a user assigning an IP to the machine or allocating a container. When a container is created on the machine, Felix sets up its network interface, IP and MAC, and writes an entry in the kernel routing table saying that traffic to this IP should go to that interface. The green part is a standard routing daemon: it picks up routes that have changed in the kernel and propagates them to all the other hosts via standard BGP, so the outside world learns that this IP lives here and routes traffic accordingly.
Figure 6
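A stripped-down illustration of what Felix effectively does for each new container: plumb a /32 host route that points the container's IP at its veth interface, so the kernel "router" knows where that endpoint lives. This is a sketch with placeholder names and addresses, not Calico's actual code:

```python
from pyroute2 import IPRoute

ipr = IPRoute()
veth = ipr.link_lookup(ifname="cali12345")[0]  # container's host-side veth (placeholder)

# "This IP lives behind this interface": the /32 route a Felix-style agent
# writes into the kernel, which the BGP daemon then advertises to other hosts.
ipr.route("add", dst="10.233.1.5/32", oif=veth)
```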
One question about Calico is worth discussing here: because it runs a pure layer 3 protocol, it is somewhat intrusive to the physical network architecture. Calico officially says it can run in a pure layer 2 network, meaning a network with no layer 3 gateways in which all hosts and physical machines are reachable at layer 2. This scheme has drawbacks: in the eyes of many experienced network engineers, a single large layer 2 domain is a single failure domain, meaning a hardware fault anywhere carries some risk of paralyzing the entire layer 2 network.
In addition, when Calico runs on a physical network that has layer 3 gateways, it needs to exchange BGP routes between all the machines and the layer 3 routers of the entire physical network. This brings a problem: there may be tens of thousands of container routes, and making all the physical routers learn them puts real pressure on BGP routing in the physical cluster. I have not measured this pressure, but according to professional network engineers, when the number of network endpoints is large enough, topology learning, discovery and convergence take considerable resources and time.
The forwarding plane (figure 7) is Calico's strength. Because it is pure layer 3 forwarding, with no NAT and no overlay in the middle, its forwarding efficiency may be the highest of all the schemes: packets go straight through the native TCP/IP protocol stack. Isolation also becomes easier for the same reason, because the stack provides a complete set of firewall hooks, so more complex isolation logic can be implemented with iptables rules.
Figure 7
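Because packets traverse the host's normal netfilter hooks, isolation policy can be compiled down to plain iptables rules. Below is a small Python sketch that renders allow/deny rules between container CIDRs as iptables command lines; appending directly to FORWARD is a simplification here, since real Calico manages its own chains:

```python
def isolation_rule(src_cidr: str, dst_cidr: str, allow: bool) -> str:
    """Render one FORWARD-chain rule between two container networks."""
    target = "ACCEPT" if allow else "DROP"
    return f"iptables -A FORWARD -s {src_cidr} -d {dst_cidr} -j {target}"

policy = [
    ("10.233.1.0/24", "10.233.2.0/24", True),    # tenant A may reach tenant B
    ("10.233.3.0/24", "10.233.1.0/24", False),   # tenant C may not reach tenant A
]
for src, dst, allow in policy:
    print(isolation_rule(src, dst, allow))
```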
Weave
Figure 8
The Weave scheme is interesting, as shown in figure 8. It runs a self-written Router program on each machine to act as a router, establishes a full mesh of TCP connections between the routers, and runs a routing protocol over this TCP mesh to form the control plane (figure 9). Its control plane is therefore similar to Calico's, while its forwarding plane (figure 10) is tunnel-based like Flannel's, so Weave can be seen as combining characteristics of Flannel and Calico.
Figure 9
Figure 10
Figure 11 shows Weave's simple solution for service discovery and load balancing. It puts two network interfaces in each container: one attaches to a bridge that connects to other hosts; the other attaches to the native Docker bridge, on which a DNS service listens. The DNS server is embedded in the Router, so it can learn the backend configuration of services from the Router. When a container issues a DNS query, the query is routed to the DNS server on the host, which answers it. Weave's official load balancing also relies on this mechanism, which is actually a weakness, because finer-grained layer 4 or layer 7 load balancing is usually preferable.
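The essence of this scheme is a name-to-container-IP map with round-robin answers. The sketch below models only that logic in plain Python (no real DNS wire protocol), which is enough to see both the convenience and the limitation, since balancing happens per name lookup rather than per connection or per request:

```python
import itertools

class EmbeddedDns:
    """Toy model of a router-embedded DNS: names map to backend container IPs."""

    def __init__(self):
        self._backends = {}   # service name -> list of container IPs
        self._cursors = {}    # service name -> round-robin iterator

    def register(self, name: str, ip: str) -> None:
        self._backends.setdefault(name, []).append(ip)
        self._cursors[name] = itertools.cycle(self._backends[name])

    def resolve(self, name: str) -> str:
        # One IP per query: this is the entire "load balancing" story, which
        # is why finer L4/L7 balancing is preferable for real traffic.
        return next(self._cursors[name])

dns = EmbeddedDns()
dns.register("imageworker.weave.local", "10.2.1.4")
dns.register("imageworker.weave.local", "10.2.7.9")
print(dns.resolve("imageworker.weave.local"))  # 10.2.1.4
print(dns.resolve("imageworker.weave.local"))  # 10.2.7.9
```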
In terms of isolation, Weave's scheme is coarse: it works only at the subnet level (figure 12). For example, if two containers are in the 10.0.1.0/24 segment, Weave adds a route to each container saying that traffic for that segment goes out through the left-hand bridge, while all traffic not in that segment goes through docker0. Since docker0 is not connected to the other containers' networks, this achieves an isolation effect.
Figure 11
Figure 12
Summary of three schemes
To sum up:
1. Flannel works well for interconnecting containers as a single tenant, but additional components are needed for more advanced functions such as service discovery and load balancing.
2. Calico has good performance and a good isolation policy, but because it is built on layer 3 forwarding it places certain requirements on, and is somewhat intrusive to, the physical network architecture.
3. Weave comes with DNS, which solves service discovery to some extent. However, its isolation is limited, so as a multi-tenant connectivity solution it is still lacking.
4. In addition, both Calico and Weave use routing protocols as the control plane, and the performance of autonomous route learning with a large number of network endpoints is unverified. The network engineers I have consulted say that topology calculation and convergence for large numbers of endpoints take considerable time and computing resources.
The concrete practice of Qiniu
Business requirements
Qiniu has been embracing the changes brought by containers and the new micro-service architecture, so we built a container platform. On the one hand, we want to simplify the development and release process by containerizing existing services; on the other hand, we also want to serve some of our users' computing needs this way; after all, the closer the computing is to the data, the better.
So our business requirements for the network are:
1. It must be able to run on the heterogeneous underlying networks we already have. This is very important for containerizing existing services; otherwise it would require large-scale changes to the basic network, which is unacceptable.
2. We try to construct a network structure that is friendly to container migration, so that containers can be rescheduled when necessary.
3. We believe that service discovery and load balancing are basic and universal business requirements, especially under today's micro-service architecture: a well-designed component should scale horizontally, so for the component's callers, service discovery and load balancing are essential. Some will say these functions belong in the application layer rather than the network layer, which is quite reasonable, but I will come back to the benefits of supporting them directly in the network layer.
4. To satisfy Qiniu's existing isolation requirements, and to support richer permission models and business logic in the upper layers, we try to make isolation more flexible.
Driven by these requirements, we decided to step outside the constraints of the traditional network model and construct a flatter, more controllable network structure.
Forwarding plane
First, on the forwarding plane, to accommodate heterogeneous underlying networks we chose Open vSwitch to build an L2 overlay model, connecting VXLAN tunnels between OVS instances to achieve layer 2 interworking of the virtual network, as shown in figure 13. Tunnels usually carry a computational cost: virtual layer 2 frames must constantly be encapsulated and decapsulated, which general-purpose CPUs are not good at. By offloading the VXLAN computation to hardware network cards, we raised the bandwidth utilization of a 10 Gigabit NIC from 40% to about 95%.
Another reason for choosing an overlay is that, as far as we know, hardware vendors generally favor the overlay model in their SDN support.
Figure 13
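Setting up such a mesh is mostly a matter of adding one VXLAN port per remote host to the local OVS bridge. The sketch below shells out to the standard ovs-vsctl tool from Python; the bridge name, port names and peer addresses are placeholders:

```python
import subprocess

BRIDGE = "br-ovs"  # placeholder local OVS bridge name

def add_vxlan_peer(port: str, remote_ip: str) -> None:
    """Add one VXLAN tunnel port toward a remote host."""
    subprocess.run(
        ["ovs-vsctl", "add-port", BRIDGE, port, "--",
         "set", "interface", port, "type=vxlan",
         f"options:remote_ip={remote_ip}"],
        check=True,
    )

# One tunnel per peer host builds the full L2 overlay mesh.
for i, peer in enumerate(["10.0.0.2", "10.0.0.3"]):
    add_vxlan_peer(f"vxlan{i}", peer)
```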
Control plane
On the control plane, we considered some differences between containers and traditional virtual machines.
As mentioned earlier, under a micro-service architecture the responsibility of each container is finer-grained and more fixed than that of a virtual machine, so the dependencies between containers are relatively fixed, and the outbound traffic that the containers on a given host may generate can be inferred in advance. Thinking one step further, this theoretically derived scope is usually much larger than the outbound traffic a container actually generates, so we chose to inject control instructions passively, on demand. For this we introduced OpenFlow as the control plane protocol. As the current protocol standard of the SDN control plane, OpenFlow is highly expressive: it can match almost any field in the packet header and supports a variety of flow aging strategies. Its extensibility is also very good: it supports third-party vendor extensions, which can implement functions the standard protocol does not provide. OpenFlow organizes flow tables as Tables with jumps between them (quite similar to iptables chains, but more semantic), and with this Table organization, relatively complex processing logic can be expressed, as shown in figure 14.
Figure 14
Having chosen OpenFlow, our control plane becomes very regular: logically centralized control, admittedly not as elegant as the P2P designs of Weave and Calico. Under this structure, when OVS encounters an unknown packet, it actively submits the packet information to the Controller; the Controller makes a judgment based on the packet information and pushes an appropriate flow table rule down to OVS. For load balancing and high availability, we configure multiple Controllers for each group of OVS instances, as shown in figure 15.
For example (a minimal controller sketch in this style follows figure 15):
1. For illegal traffic, the Controller instructs OVS to simply discard it and not to ask again for some time.
2. For legitimate traffic, the Controller tells OVS how to route the packet so that it eventually reaches the correct destination.
Figure 15
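Here is a minimal reactive controller in that style, written against the Ryu OpenFlow framework (OpenFlow 1.3). The legality check and the output port are stubs; a real controller would consult the platform's policy model. Note the idle timeout, which gives the flow-aging behavior described above, and that a drop decision is expressed as a flow with no actions, so OVS will not keep asking:

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

def is_legal(pkt_data) -> bool:
    return True  # stub: a real controller checks the platform's policy here

class ReactiveController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in(self, ev):
        msg, dp = ev.msg, ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        match = parser.OFPMatch(in_port=msg.match["in_port"])

        if is_legal(msg.data):
            # Legal traffic: install a forwarding flow (out port is a stub).
            actions = [parser.OFPActionOutput(2)]
        else:
            # Illegal traffic: a flow with no actions means drop, and the
            # idle timeout keeps OVS from consulting us again for a while.
            actions = []

        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(
            datapath=dp, priority=10, match=match,
            instructions=inst, idle_timeout=60,
        ))
```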
Service discovery and load balancing
With regard to service discovery and load balancing, we provide the following object models:
1. Container: a container instance; multiple Containers make up a Pod (entity).
2. Pod: the containers in a Pod share one network stack, that is, one IP address and port space (entity).
3. Service: multiple replicas of the same Pod form a Service, which has a Service IP (logical).
4. Security group: multiple Services form a security group (logical).
Among these, the dynamically scalable relationship is the mapping between a Service and its backend Pods, which is maintained by the platform's automatic service discovery. A caller simply accesses the Service IP, and the Service itself takes care of service discovery and load balancing; if the backend Pods change, the caller does not need to know at all.
In terms of implementation, we run this component on every host, where it directly proxies the Service traffic generated by that machine, avoiding extra internal network overhead.
Functionally, we implemented an IP-level load balancer. That is, the port reachable on a Service IP is exactly the port the backend Pods actually listen on: if a backend Pod listens on 12345, you can directly access port 12345 of the Service IP without any extra port mapping.
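The core of such a proxy is just a Service-IP-to-Pod map with per-connection backend selection that leaves the port untouched. A minimal sketch of that mapping logic (names and addresses are invented, not the platform's API):

```python
import itertools
from typing import Dict, Iterator, List, Tuple

class ServiceLB:
    """IP-level load balancer model: ports map 1:1, only the IP is rewritten."""

    def __init__(self):
        self._pods: Dict[str, List[str]] = {}
        self._rr: Dict[str, Iterator[str]] = {}

    def update(self, service_ip: str, pod_ips: List[str]) -> None:
        # Called by the platform's service discovery whenever Pods change;
        # callers of the Service IP never notice.
        self._pods[service_ip] = pod_ips
        self._rr[service_ip] = itertools.cycle(pod_ips)

    def pick(self, service_ip: str, port: int) -> Tuple[str, int]:
        return next(self._rr[service_ip]), port   # same port on the Pod

lb = ServiceLB()
lb.update("10.100.0.7", ["10.233.1.5", "10.233.2.8"])
print(lb.pick("10.100.0.7", 12345))   # ('10.233.1.5', 12345)
print(lb.pick("10.100.0.7", 12345))   # ('10.233.2.8', 12345)
```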
Compared with other common load balancers:
1. It is finer-grained than DNS-based balancing.
2. It is easier to use and less intrusive than port-level load balancers.
In addition, layer 7 load balancing leaves a lot of room for imagination. We have implemented most of the common Nginx configuration options, which users can configure flexibly; a business can even specify which backend serves a given access.
Security group
At the isolation level, we divide Services logically into security groups: multiple Services form a security group, and flexible access control can be applied between security groups. Containers in the same security group can access each other without restriction. The most common operation is to export certain Services in security group A to another security group B; after the export, containers in security group B can access the exported Services, but not the other Services in A. As shown in figure 16.
Figure 16
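The export rule is easy to state precisely: a caller may reach a Service if it is in the same group, or if the Service's owning group has exported that particular Service to the caller's group. A small Python model of that check (class and method names are ours, not the platform's API):

```python
class SecurityGroup:
    def __init__(self, name: str):
        self.name = name
        self.services: set = set()
        self.exports: dict = {}   # service name -> set of group names allowed in

    def export(self, service: str, to_group: "SecurityGroup") -> None:
        self.exports.setdefault(service, set()).add(to_group.name)

def may_access(caller: SecurityGroup, owner: SecurityGroup, service: str) -> bool:
    if caller.name == owner.name:
        return True                       # same group: unrestricted
    return caller.name in owner.exports.get(service, set())

a, b = SecurityGroup("A"), SecurityGroup("B")
a.services.update({"auth", "billing"})
a.export("auth", b)
print(may_access(b, a, "auth"))     # True: "auth" was exported to B
print(may_access(b, a, "billing"))  # False: "billing" was not exported
```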
Having introduced the basic functions of our network, we now analyze two real Qiniu cases to illustrate how this structure promotes the evolution of business architecture.
Case study 1: evolution of Qiniu's file processing (FOP) architecture
The first case is Qiniu's file processing (File OPeration, FOP) architecture, as shown in figure 17. File processing has always been a highly innovative, core feature of Qiniu: after uploading a file, a user can download the processed result simply by adding parameters to the resource URL. For example, by adding parameters to the URL of a video file, you can download an image that is a frame of the video, watermarked, rotated 90 degrees and cropped to 40×40.
Figure 17
The architecture that supported this business was quite clumsy in the early days. In figure 17, the business entrance is on the left, and on the right are the worker clusters that do the actual computation, including image processing, video processing, document processing and other kinds of instances. The problems were:
1. Cluster information was written into the entrance's configuration, so backend configuration changes were inflexible.
2. The service entrance was a component that all traffic had to pass through (the business's instruction flow was mixed with its data flow).
3. Under bursts of requests, responses might not be timely.
Later, the colleagues in charge of document processing evolved the architecture as follows (figure 18):
1. A Discovery component was added to automatically discover worker information in the cluster; each worker joining the cluster actively registers itself.
2. The business entrance obtains cluster information from Discovery and uses it to load-balance requests.
3. A new Agent component was added on each computing node. It reports heartbeats and node information to the Discovery component and caches processed result data (separating the data flow from the entrance). It is also responsible for load balancing requests within the node (a node may run multiple worker instances).
4. At this point the service entrance only distributes the instruction flow, but it still has to do node-level load balancing of requests.
Figure 18
Figure 19 depicts the early shape of the file processing architecture after it was migrated to the container platform, with the following changes from before the migration:
1. Each Agent corresponds to one computing worker, and workers are organized into independent Services by type of work, such as Image Service and Video Service.
2. The business's own Discovery service is removed in favor of the platform's built-in service discovery.
3. Each Agent's function is reduced:
It no longer needs to maintain a heartbeat with Discovery or report node information.
Since there is now only one worker behind it, it no longer needs intra-node load balancing logic.
4. The service entrance no longer needs its own load balancing; it simply sends requests to the entry address provided by the container platform.
Figure 19
Figure 20 shows another evolution that occurred after the migration. In the previous stage, each Agent was still tied to its computing instance, purely to make the business's migration painless, because the Agent code itself carried some logical assumptions.
In this figure, we further separate Agent and worker: the Agents become a Service of their own, and the workers are split into Services by type of work. The reason for this separation is that the Agent is a stateful service that may cache file contents, while the workers do the real work and are stateless. The benefit is that the number of workers can be adjusted and scaled at any time without affecting the state carried in the Agents.
Benefits:
1. Compared with the earliest architecture, the business side only needs to focus on developing the business itself, without reinventing the wheel for complex service discovery and load balancing code.
2. Since the system is deployed on the container platform, the platform scheduler automatically migrates instances away from nodes with high resource consumption, keeping resource consumption balanced across the computing cluster.
Case study 2: evolution of Qiniu's user-defined file processing (UFOP) architecture
Another case is Qiniu's user-defined file processing.
User-defined File OPeration (UFOP) is a framework Qiniu provides for running file-processing programs uploaded by users. Its role is the same as the FOP described above, except that the computing instances are customized by the user. For example, Qiniu's existing porn detection service is a third-party worker that identifies whether an image contains pornographic content. Precisely because user programs are introduced, the architectural difference between UFOP and the official FOP is that UFOP requires isolation.
Figure 20 shows the original UFOP architecture. Container technology was already used for resource isolation: every container mapped its ports to the physical machine through Docker port mapping, registered its address and port information with a central service through a centralized registration service, and the entrance distribution service obtained cluster information from that central service to load-balance requests.
For network isolation, because Docker's own isolation is weak, this architecture prohibited all communication between containers and allowed only traffic coming from the entrance. This isolation granularity limited the flexibility of user-defined programs to some extent.
Figure 20
After migrating to the container platform, thanks to flexible security group control, the processing programs uploaded by different users are naturally isolated from one another, and a user can create multiple Services with different responsibilities to build more complex processing logic, as shown in figure 21.
In addition, a migrated program gets a complete port space of its own, which further increases the flexibility of user-defined handlers.
Figure 21