Shulou (Shulou.com) — SLTechnology News & Howtos > Servers — updated 2025-01-20
Calico in Practice: a Kubernetes Network Solution
Case study: the K8s cluster was originally deployed with the flannel network plugin; here we switch it from flannel to the calico network plugin.
Calico is a pure layer-3 data center networking solution that supports a wide range of platforms, including Kubernetes and OpenStack. On each compute node, Calico uses the Linux kernel to implement an efficient virtual router (vRouter) responsible for data forwarding, and each vRouter advertises the routes of the workloads running on it to the rest of the Calico network over the BGP protocol. In other words, the kernel maintains the routing tables, each node acts as a virtual router, and the nodes exchange routes with one another via BGP, so every node in the cluster knows how to reach every workload. The Calico project also implements Kubernetes network policy, providing ACL functionality.
1. Overview of BGP
In fact, the network model of the Calico project is almost identical to Flannel's host-gw mode: Calico also forwards container packets based on routing tables. The difference lies in how routes are maintained. Flannel uses the flanneld process to maintain routing information, whereas the Calico project uses the BGP protocol to maintain routes automatically for the entire cluster. BGP stands for Border Gateway Protocol; it is a dynamic routing protocol between autonomous systems that exchanges network reachability information with other BGP systems.
Here BGP realizes route exchange for the whole network. This is almost the same as flannel's host-gw mode, except that routes are exchanged via BGP instead of being maintained by flanneld. Because BGP is a dynamic protocol designed for large networks, Calico can still deliver good performance when the number of routes, packets, and cluster nodes grows to a certain scale.
Data-center operators often say things like "our machine room uses BGP multi-homing" — that is this Border Gateway Protocol. Routing means choosing a path and forwarding packets according to a routing table. Routes can be static or dynamic: static routes are added by hand, while dynamic routes are learned automatically. When a network has many devices and many VLANs and needs cross-network communication, configuring every route manually would clearly take enormous effort. So at a certain scale, dynamic routing protocols are used. BGP is one of them; like OSPF and RIP, it learns routes dynamically and perceives the network topology. In the large interconnected networks of the Internet, routing devices learn routes from one another dynamically, and BGP is one of the protocols that makes this possible.
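To make the routing-table idea concrete, here is a small illustrative sketch (not real router code) of how a destination address is matched against a table using longest-prefix match; the prefixes and next hops are invented for the example:

```python
import ipaddress

# Hypothetical routing table: (prefix, next_hop); None means directly connected.
ROUTES = [
    ("10.244.1.0/24", None),         # local subnet, no next hop needed
    ("10.244.2.0/24", "10.4.7.12"),  # route learned for another node's subnet
    ("0.0.0.0/0", "10.4.7.1"),       # default gateway
]

def next_hop(dst: str):
    """Longest-prefix match: pick the most specific route containing dst."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, gw in ROUTES:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gw)
    return best[1] if best else None
```

Whether the table entries were typed in by hand (static) or learned over BGP (dynamic), the lookup at forwarding time is the same.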
To better understand BGP, consider an example:
In this figure there are two autonomous systems (AS): AS 1 and AS 2.
An autonomous system can be thought of as one company's network versus another company's. Each is built from its own switches and routers, runs independently, and does not depend on the other — a university or an enterprise can each be an autonomous system. By default there is no communication between autonomous systems: your network and your neighbor's network cannot talk to each other because they are not the same network. If they want to communicate, the exit routers at the edge of each network must learn each other's routes. Moreover, the computers on both sides use private IP addresses, which may even conflict. To connect two private networks, you must first ensure the IP addresses in use do not conflict, and then ensure the upstream routers can learn each other's routing tables — for example, your router must be able to learn the other side's current routes.
Simply put, BGP connects two autonomous systems so that they can communicate — that is its role. Given the two ASes — think of them as two schools or two companies on different networks — how does traffic get from 192.168 under AS 1 to 172.17 under AS 2?
A packet from 192.168.1.10 first reaches the switch, then the router, entering through the router's port A. For the two companies to communicate, the routers must be reachable from each other; once that link is established and the packet arrives at router 1, how does it know to forward to router 2? This is where the routing table comes in. Looking up the destination address 172.17.1.20, router 1 finds a local route that says to forward the packet to router 2. Router 2 knows which networks it serves; since the destination, for example 1.20, lies within its own jurisdiction, no further next hop is needed. It forwards the packet out the interface connected to the switch, and the switch delivers it at layer 2 to the destination host. In this way the two nodes can communicate.
So what role does BGP play in this environment?
Such routes could also be added manually, pointing the next hop at the peer router, and forwarding would still work as long as the networks are connected. But with many nodes and VLANs the routing table becomes very large, and there are many other routers on the Internet that may also need to reach other networks. BGP lets routers learn each other's routing tables automatically: the reason router 1 has information about router 2's networks is that it learned it from router 2. Likewise, if AS 2 wants to reach a node in AS 1, its routers can learn the route to the destination and forward traffic on to the destination server.
This dynamic routing setup is somewhat similar to the host-gw mode in flannel: think of the servers as the containers in K8s, and AS 1 and AS 2 as K8s nodes.
With host-gw each node is treated as a gateway; Calico introduces BGP and turns each node into a virtual router, and the nodes exchange routing information with one another over BGP. In flannel the routing table is maintained by a daemon and written to the local node; in Calico the routes are exchanged via BGP, but they are likewise written into each node's routing table.
On the Internet, an autonomous system (AS) is a unit that has the right to independently decide which routing protocol to use internally. It can be a simple network or a group of networks controlled by one or more network administrators — a single manageable network unit such as a university, an enterprise, or a company. An autonomous system is sometimes called a routing domain. Each autonomous system is assigned a globally unique 16-bit number, the autonomous system number (ASN). Under normal circumstances there is no communication between autonomous systems; if hosts in two autonomous systems want to communicate directly by IP address, a router must connect the two systems, and the BGP protocol is one way of connecting them.
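As a quick check on the numbers involved, two hypothetical helper functions (an illustration, not part of any BGP implementation): classic ASNs are 16-bit values, and RFC 6996 reserves 64512–65534 for private use, so the 63400 used later in this article is an ordinary 16-bit number rather than one from the private range:

```python
def is_valid_asn16(asn: int) -> bool:
    # Classic BGP AS numbers are 16-bit: 1..65535 (0 is reserved).
    return 0 < asn < 2 ** 16

def is_private_asn16(asn: int) -> bool:
    # RFC 6996 reserves 64512-65534 for private use.
    return 64512 <= asn <= 65534
```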
2. Calico BGP implementation
This is Calico's architecture diagram (the official picture). 10.0.0.1 is Container 1 and 10.0.0.2 is Container 2; each has a cali* interface, and these are veth devices. Calico treats the host as a virtual router that speaks the BGP protocol. Two components are involved: the BGP Client and Felix. Felix is mainly responsible for writing the local machine's routing table — previously the routes were written by the flannel daemon; in Calico they are written by Felix.
Felix is deployed as a DaemonSet to every node. Calico also uses etcd to store the network policies and configuration state it manages. The most important piece is the BGP client: every node runs one, the clients establish BGP sessions among themselves, and over those sessions they exchange their routing tables. In this way the containers on all nodes form a complete routing topology following BGP, whereas flannel simply uses its own TCP-based mechanism.
After learning about BGP, the architecture of the Calico project is easy to understand. Calico consists of three main parts:
Felix: deployed as a DaemonSet running on every node; mainly responsible for maintaining the routing rules and ACL rules on the host.
BGP Client (BIRD): mainly responsible for distributing the routes that Felix writes into the kernel to the rest of the Calico network.
Etcd: distributed key-value storage to save the policy and network configuration status of Calico.
Calicoctl: allows you to implement advanced policies and networks from a simple command line interface.
3. Calico deployment
git clone git@gitee.com:zhaocheng172/calico.git
(You need to register your public key with the repository before you can pull it; otherwise you won't have permission.)
After downloading, you still need to modify a few configuration items:
Calico stores its policies, network configuration, and attribute information in etcd. Since etcd is already deployed for the K8s cluster, we can reuse the existing etcd. If it uses HTTPS, we must configure certificates, and then choose the pod network CIDR as well as the working mode.
The specific steps are as follows:
Configure the connection etcd address, and if you use https, you also need to configure the certificate.
(ConfigMap, Secret) Modify the Pod CIDR (CALICO_IPV4POOL_CIDR) and select the operating mode (CALICO_IPV4POOL_IPIP) according to your actual network plan; both BGP and IPIP are supported.
Calico uses a ConfigMap to hold its configuration and a Secret to hold the etcd HTTPS certificates, which consist of three items:

etcd-key: null
etcd-cert: null
etcd-ca: null

Specify the etcd connection address:

etcd_endpoints: "http://:"

And specify which files the Secret should be mounted as when it is attached to the container:

etcd_ca: ""   # "/calico-secrets/etcd-ca"
etcd_cert: "" # "/calico-secrets/etcd-cert"
etcd_key: ""  # "/calico-secrets/etcd-key"
Now switch the network to calico.
1. Modify the etcd settings — three places in total
1. etcd's certificates
The certificates on my machine are under /opt/etcd/ssl, but they need to go into the Secret, which requires converting them to base64 first; the encoded output must be a single unbroken string with no line breaks.
[root@k8s-master1 ~]# cat /opt/etcd/ssl/ca.pem | base64 -w 0
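The effect of `base64 -w 0` (no line wrapping) can be mimicked in a few lines of Python, shown purely as an illustration: `base64.b64encode` never inserts line breaks, which is exactly the single unbroken string the Secret needs:

```python
import base64

def encode_cert(pem_bytes: bytes) -> str:
    """Equivalent of `base64 -w 0`: one unbroken line, no wrapping."""
    return base64.b64encode(pem_bytes).decode("ascii")

def decode_cert(encoded: str) -> bytes:
    """Reverse operation, to verify the round trip."""
    return base64.b64decode(encoded)
```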
Convert each certificate under ssl/ to base64, fill in the corresponding item, and remove the comment:
# etcd-key: null
# etcd-cert: null
# etcd-ca: null
2. The paths where the Secret items land inside the container — simply remove the comments:
etcd_ca: "/calico-secrets/etcd-ca"
etcd_cert: "/calico-secrets/etcd-cert"
etcd_key: "/calico-secrets/etcd-key"
3. The etcd connection string, which is the same one K8s uses to connect the API server.
On my setup it can be found in /opt/kubernetes/cfg/kube-apiserver.conf; since every cluster is deployed differently, the location may vary.
etcd_endpoints: "https://10.4.7.11:2379,https://10.4.7.12:2379,https://10.4.7.21:2379"
Put this certificate in the calico configuration
Second, modify the Pod CIDR according to your actual network plan
The default value in the manifest needs to be changed to your own:

- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"

The CIDR configured in the controller manager is 10.244.0.0/16:

[root@k8s-master1 ~]# cat /opt/kubernetes/cfg/kube-controller-manager.conf
--cluster-cidr=10.244.0.0/16

So change the manifest accordingly:

- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"
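A tiny sanity check, shown only as an illustration, for keeping the two settings in sync: the pool CIDR given to Calico should equal the --cluster-cidr given to kube-controller-manager, otherwise pods get addresses the cluster does not route:

```python
import ipaddress

def cidrs_match(controller_cidr: str, calico_pool_cidr: str) -> bool:
    """True when CALICO_IPV4POOL_CIDR covers the same range as --cluster-cidr."""
    return (ipaddress.ip_network(controller_cidr)
            == ipaddress.ip_network(calico_pool_cidr))
```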
Third, choose the working mode

# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"

This variable controls whether IPIP is enabled; there are two modes, IPIP and BGP. The most common choice in practice is BGP: changing Always to Never turns IPIP off.
Now delete the flannel network:
[root@k8s-master1 k8s]# kubectl delete -f kube-flannel.yaml
Then delete the virtual NIC generated by flannel and the cni bridge. It is best to delete them, because we use the same subnet as flannel did, and there would otherwise be conflicts.
[root@k8s-master1 calico]# ip route
default via 10.4.7.1 dev eth0 proto static metric 100
10.4.7.0/24 dev eth0 proto kernel scope link src 10.4.7.11 metric 100
10.244.0.0/24 via 10.4.7.21 dev eth0
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.4.7.12 dev eth0
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
Every node must have these routes and bridges deleted; they were left behind by the earlier flannel deployment and would conflict with calico.
[root@k8s-node1 ~]# ip link delete cni0
[root@k8s-node1 ~]# ip link delete flannel.1
[root@k8s-node1 ~]# ip route delete 10.244.0.0/24 via 10.4.7.21 dev eth0
[root@k8s-node1 ~]# ip route delete 10.244.1.0/24 dev eth0
Now deploy calico. The manifest starts two roles: calico-node, which is actually calico's BGP client plus Felix and runs one instance on every node; and calico-kube-controllers, which dynamically watches the network rules in etcd and handles policies.
[root@k8s-master1 calico]# kubectl create -f calico.yaml
[root@k8s-master1 calico]# kubectl get pod -o wide -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE     IP          NODE
calico-kube-controllers-f68c55884-q7bsl   1/1     Running   0          2m17s   10.4.7.21   k8s-node2
calico-node-g9tfw                         1/1     Running   0          2m17s   10.4.7.21   k8s-node2
calico-node-tskkw                         1/1     Running   0          2m17s   10.4.7.11   k8s-master1
calico-node-vldl8                         1/1     Running   0          2m17s   10.4.7.12   k8s-node1
At this point, if you inspect the network you will not yet find calico's routes, because the existing pods are not using the calico network. They must be recreated before the new network takes effect, so this step is disruptive and needs to be prepared for in advance.
After the pods are recreated, routes are generated according to calico's rules. You will find that pods still on the flannel network can no longer communicate with the recreated pods, so switching networks is a major operation that should be scheduled carefully.
[root@k8s-master1 ~]# ip route
default via 10.4.7.1 dev eth0 proto static metric 100
10.4.7.0/24 dev eth0 proto kernel scope link src 10.4.7.11 metric 100
10.244.113.128 dev calibe9d0ccbf7b scope link
blackhole 10.244.113.128/26 proto bird
10.244.203.64/26 via 10.4.7.21 dev eth0 proto bird
10.244.245.0/26 via 10.4.7.12 dev eth0 proto bird
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
4. Calico management tools
The calicoctl management tool is used here to manage calico configuration, for example switching to IPIP mode.
Download the tool from https://github.com/projectcalico/calicoctl/releases:
# wget -O /usr/local/bin/calicoctl https://github.com/projectcalico/calicoctl/releases/download/v3.9.1/calicoctl
# chmod +x /usr/local/bin/calicoctl
After installing the management tool, you can view the BGP status of the current node:
[root@k8s-master1 ~]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 10.4.7.12    | node-to-node mesh | up    | 2019-12-27 | Established |
| 10.4.7.21    | node-to-node mesh | up    | 2019-12-27 | Established |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
This tool mainly operates on etcd: it fetches data from etcd and formats the output — the table above is simply what it read out of etcd.
The following command shows the long-lived BGP connections:
[root@k8s-master1 ~]# netstat -anpt | grep bird
tcp   0   0 0.0.0.0:179     0.0.0.0:*          LISTEN        221854/bird
tcp   0   0 10.4.7.11:179   10.4.7.12:...      ESTABLISHED   221854/bird
tcp   0   0 10.4.7.11:179   10.4.7.21:51460    ESTABLISHED   221854/bird
To use calicoctl get node you need a configuration file, by default /etc/calico/calicoctl.cfg. It mainly sets the etcd endpoints and the certificates used to connect to it, since calicoctl operates on etcd.
# mkdir /etc/calico
# vim /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  datastoreType: "etcdv3"
  etcdEndpoints: "https://10.4.7.11:2379,https://10.4.7.12:2379,https://10.4.7.21:2379"
  etcdKeyFile: "/opt/etcd/ssl/server-key.pem"
  etcdCertFile: "/opt/etcd/ssl/server.pem"
  etcdCACertFile: "/opt/etcd/ssl/ca.pem"
With that in place, calicoctl get nodes can read the data out of etcd.
# calicoctl get nodes
NAME
k8s-master
k8s-node1
k8s-node2
View the IP address pool for IPAM:
[root@k8s-master1 ~]# calicoctl get ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Never      Never       false      all()
5. Analysis of the principle of Calico BGP
Let's see how the default BGP mode works.
This is again cross-node communication. The picture is similar to flannel's, but with a router icon in place of flannel's vxlan device — notice there is no bridge here; everything is achieved purely through routing. A packet leaves the pod through one end of the veth pair and arrives at the cali* virtual NIC on the host, entering the host's network protocol stack. (When a pod is created, the infra container is set up first, and then the calico CNI binary is invoked to configure the container's network.) The routing table then decides where the packet goes: ip route shows routes whose destinations are the pod subnets assigned to other nodes, with the destination node's host address as the next hop. For cross-host communication, the packet is forwarded to the next-hop address and leaves via the host's eth0 — a plain static route, the same next-hop form as host-gw. The biggest difference from host-gw is that calico exchanges these routes via BGP, while host-gw does its own route distribution; BGP is more mature and commonly used in large networks, so calico scales much better than flannel. The routing information is transmitted by the BGP client using the BGP protocol.
Why is it called the Border Gateway Protocol?
The working mode is basically the same as host-gw, except that routes are exchanged via BGP: each node becomes a border router, transmitting routes to the others at the edge of its autonomous system. The nodes form a fully meshed BGP network, and each peer in it is called a BGP Peer.
You can see the CNI binaries are installed in /opt/cni/bin; the yaml writes two images into that directory. The subnet configuration is here:
[root@k8s-master1 ~]# cat /etc/cni/net.d/10-calico.conflist
The general process for Pod 1 to access Pod 2 is as follows:
Packets go out of container 1 to the other end of the Veth Pair (on the host, starting with the cali prefix)
The host forwards the packet to the next hop (gateway) according to the routing rules
Arrive at the Node2 and forward the packet to the cali device according to the routing rules to reach Container 2.
Among them, the core "next-hop" routing rule here is maintained by Calico's Felix process. The routing rule information is transmitted through BGP Client, that is, BIRD component, using BGP protocol.
It is not difficult to find that the Calico project actually treats all the nodes in the cluster as border routers, and together they form a fully connected network, exchanging routing rules with each other through the BGP protocol. These nodes are called BGP Peer.
The only difference from host-gw is what happens when the packet reaches the node 2 container. From the veth device, the packet in the container reaches the host; after arriving at node 2 it matches a special route that records the pod's address on the cni network, and enters the container through the cali device — the device acts like a network cable delivering the packet into the container. This works over a layer-2 interconnected network; if layer-2 connectivity is not available, you can use IPIP mode.
How does calico get packets out without a bridge?
The pod's packet crosses the veth pair onto the host side (the cali device). The host then consults its routing table and forwards the traffic to the next-hop address given by the route, sending it out toward the destination host.
6. Route Reflector (RR) mode
https://docs.projectcalico.org/master/networking/bgp
By default Calico maintains a full node-to-node mesh: every node in the Calico cluster establishes a BGP session with every other node for route exchange. As the cluster grows, this mesh becomes huge and the number of sessions grows quadratically.
At that point you need Route Reflector mode to solve the problem.
Designate one or more Calico nodes as route reflectors and let the other nodes obtain routing information from these RR nodes.
calicoctl node status shows that BGP starts in node-to-node mesh mode, the fully interconnected mode. By default every K8s node acts as a BGP speaker, constantly announcing routes to every other node. As the node count grows, hundreds of nodes must maintain sessions with one another pairwise, so each added node multiplies the number of links needed to keep the network consistent, which consumes significant network resources. Route Reflector mode solves this: pick one or a few larger nodes as reflectors (RR) and have everyone connect to them — like a company chat: instead of messaging every colleague individually, which is very troublesome, you post in one group and everyone receives it. It is recommended to use at least 2-3 reflector nodes as backups, so that taking one down for maintenance does not affect the others.
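The scaling argument can be made concrete with a little arithmetic (an illustrative sketch of the formulas, not Calico code): a full mesh of n nodes needs n(n-1)/2 BGP sessions, while with r reflectors each ordinary node keeps only r sessions:

```python
def mesh_connections(n: int) -> int:
    # Full node-to-node mesh: one BGP session per pair of nodes.
    return n * (n - 1) // 2

def rr_connections(n: int, reflectors: int) -> int:
    # Each non-reflector node peers only with the reflectors,
    # plus the sessions among the reflectors themselves.
    return (n - reflectors) * reflectors + mesh_connections(reflectors)
```

For 100 nodes the mesh needs 4950 sessions, while two reflectors bring it down to 197 — which is why RR mode is recommended once a cluster passes roughly 100 nodes.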
The specific steps are as follows:
1. Disable the node-to-node BGP mesh
Add default BGP configuration, adjust nodeToNodeMeshEnabled and asNumber:
[root@k8s-master1 calico]# cat bgp.yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
Before applying this directly: disabling node-to-node mesh cuts off the pod network immediately, so assess the scope of impact in advance — switching requires an interruption. Node-to-node mesh is recommended only for clusters under about 100 nodes; beyond 100 nodes you must use Route Reflector (RR) mode.
[root@k8s-master1 calico]# calicoctl apply -f bgp.yaml
Successfully applied 1 'BGPConfiguration' resource(s)
Check the BGP configuration; MESHENABLED false means the mesh is disabled.
[root@k8s-master1 calico]# calicoctl get bgpconfig
NAME      LOGSEVERITY   MESHENABLED   ASNUMBER
default   Info          false         63400
Testing the pod network shows it is now broken, because we disabled node-to-node mesh in the calico configuration:
[root@k8s-master1 calico]# ping 10.244.245.2
PING 10.244.245.2 (10.244.245.2) 56(84) bytes of data.
The ASN can be obtained with calicoctl get nodes --output=wide. Each number — here 63400 — identifies one autonomous system:
[root@k8s-master1 calico]# calicoctl get nodes --output=wide
NAME          ASN       IPV4           IPV6
k8s-master1   (63400)   10.4.7.11/24
k8s-node1     (63400)   10.4.7.12/24
k8s-node2     (63400)   10.4.7.21/24
2. Configure the designated node to act as a route reflector
To make it easy for a BGPPeer to select nodes, they are matched by label selector; that is, you can use K8s labels for the association. Label whichever node will act as the route reflector.
Label the route reflector node; here I label node1:
[root@k8s-master1 calico]# kubectl label node k8s-node1 route-reflector=true
Check the node's BGP status: because the mesh is disabled, no peers are found and nothing works yet.
[root@k8s-master1 calico]# calicoctl node status
Calico process is running.

IPv4 BGP status
No IPv4 peers found.

IPv6 BGP status
No IPv6 peers found.
Then configure the route reflector node by adding a routeReflectorClusterID to it.
The node spec below can be exported with -o yaml:
[root@k8s-master1 calico]# calicoctl get node k8s-node2 -o yaml > node.yaml
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
    projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-node2","kubernetes.io/os":"linux"}'
  creationTimestamp: null
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: k8s-node2
    kubernetes.io/os: linux
  name: k8s-node2
spec:
  bgp:
    ipv4Address: 10.4.7.12/24
    routeReflectorClusterID: 244.0.0.1   # cluster ID
  orchRefs:
  - nodeName: k8s-node2
    orchestrator: k8s
Apply it.
[root@k8s-master1 calico]# calicoctl apply -f node.yaml
Now it is easy to use a label selector to peer the route reflector node with all non-reflector nodes — i.e., the other nodes connect to the node labeled as the route reflector:
[root@k8s-master1 calico]# cat bgp1.yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: peer-with-route-reflectors
spec:
  nodeSelector: all()                      # all nodes
  peerSelector: route-reflector == 'true'

That is, every node matched by all() connects to the nodes carrying the route-reflector label we just applied.
Check the node's BGP peering rules and connection status; a node-specific peer (the route reflector) is now shown:
[root@k8s-master1 calico]# calicoctl apply -f bgp1.yaml
Successfully applied 1 'BGPPeer' resource(s)
[root@k8s-master1 calico]# calicoctl get bgppeer
NAME                         PEERIP   NODE    ASN
peer-with-route-reflectors            all()   0
[root@k8s-master1 calico]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.4.7.12    | node specific | up    | 08:22:22 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
Verify container network connectivity:
[root@k8s-master1 calico]# ping 10.244.203.80
PING 10.244.203.80 (10.244.203.80) 56(84) bytes of data.
64 bytes from 10.244.203.80: icmp_seq=1 ttl=63 time=1.71 ms
Adding more route reflectors
Now add additional route reflectors; 2-3 reflectors are recommended for clusters of up to 100 nodes.
1) label the cluster nodes
[root@k8s-master1 calico]# kubectl label node k8s-node2 route-reflector=true
node/k8s-node2 labeled
2) Configure k8s-node2 as a route reflector as well (add the routeReflectorClusterID as before)
[root@k8s-master1 calico]# calicoctl get node k8s-node2 -o yaml
3) View node status
[root@k8s-master1 calico]# calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+---------------+-------+----------+-------------+
| PEER ADDRESS |   PEER TYPE   | STATE |  SINCE   |    INFO     |
+--------------+---------------+-------+----------+-------------+
| 10.4.7.12    | node specific | up    | 08:22:22 | Established |
| 10.4.7.21    | node specific | up    | 08:44:44 | Established |
+--------------+---------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
4) Test network connectivity
[root@k8s-master1 calico]# ping 10.244.203.81
PING 10.244.203.81 (10.244.203.81) 56(84) bytes of data.
64 bytes from 10.244.203.81: icmp_seq=1 ttl=63 time=12.7 ms
64 bytes from 10.244.203.81: icmp_seq=2 ttl=63 time=1.40 ms
This is how route reflectors reduce the overhead caused by growth in the number of BGP nodes.
7. IPIP mode
IPIP mode is similar to flannel's vxlan mode: both encapsulate packets.
As mentioned earlier, the main limitation of Flannel's host-gw mode is that it requires layer-2 connectivity between cluster hosts. Calico's BGP mode has the same limitation and cannot cross VLANs: the forwarded packets carry container (pod) source and destination IPs, and plain routing between hosts only works when they share a layer-2 segment. In principle, if static routes for the container destination IPs were added on the intermediate routers, communication across different VLANs could also be achieved, but that approach has not been tested here.
Another drawback is that calico has more routes than flannel, because it also adds routing entries for each pod's device — one routing-table entry per pod — so its routing table is much larger than flannel's.
Change to IPIP mode:
calicoctl get ippool -o yaml > ipip.yaml
vi ipip.yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  blockSize: 26
  cidr: 10.244.0.0/16
  ipipMode: Always
  natOutgoing: true
calicoctl apply -f ipip.yaml
Check the pool details after applying; IPIP is now enabled:
[root@k8s-master1 calico]# calicoctl get ippool -o wide
NAME                  CIDR            NAT    IPIPMODE   VXLANMODE   DISABLED   SELECTOR
default-ipv4-ippool   10.244.0.0/16   true   Always     Never       false      all()
Viewing the network devices, a tunl0 tunnel NIC has been added:
tunl0@NONE: mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.244.113.131/32 brd 10.244.113.131 scope global tunl0
       valid_lft forever preferred_lft forever
IPIP schematic diagram:
Consider this scenario: there are two different VLANs, reachable from each other only at layer 3 through a router. The two nodes of a K8s cluster may sit in these two VLANs; the network is connected and a cluster can still be formed, but calico's plain BGP mode cannot be used with only layer-3 reachability, because BGP mode relies on layer 2: both the source and destination IPs of the traffic are pod IPs, and the intermediate router does not know where to send them — those routes are not written into it. (If they were written in, nodes in different segments could in theory communicate, but without those routes plain BGP traffic cannot get through.) In this case you must enable IPIP mode. IPIP is a Linux kernel driver that tunnels packets: it encapsulates the original IP packet inside a new IP packet carried over the existing network. Because the existing network already provides layer-3 routing between the two VLANs, the encapsulated packet gets through, and the tunl0 device unpacks it on the other side — tunl0 plays a role similar to vxlan's tunnel device, so this mode works much like vxlan mode.
The general process for Pod 1 to access Pod 2 is as follows:
Packets go out of container 1 to the other end of the Veth Pair (on the host, starting with the cali prefix)
The packet enters the IP tunnel device (tunl0) and is encapsulated by the Linux kernel IPIP driver inside an IP packet of the host network. The new packet's destination is the next-hop address for the original packet, that is, 192.168.31.63. In this way it becomes a Node1-to-Node2 packet.
The packet at this point carries two headers. Inner (original) IP packet: source IP 10.244.1.10, destination IP 10.244.2.10. Outer IP packet: source IP 192.168.31.62, destination IP 192.168.31.63.
This is why IPIP, like VXLAN, works on top of layer 3: it uses the existing network for transport. Since the physical network between the router and the other VLAN is already reachable, the encapsulated packet can be routed across layer 3 to the destination VLAN.
The packet is forwarded to Node2 through layer 3 of the router
After Node2 receives the packet, its network protocol stack uses the IPIP driver to decapsulate it and recover the original IP packet.
Then, according to the routing rules, the packet is forwarded to the cali device and reaches container 2.
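The encapsulation and decapsulation steps above can be sketched as a toy Python model (not real kernel code; the addresses are the example values from the flow above):

```python
from dataclasses import dataclass

@dataclass
class IPPacket:
    src: str         # source IP of this header
    dst: str         # destination IP of this header
    payload: object  # inner packet, or application data

def ipip_encapsulate(inner: IPPacket, node_src: str, node_dst: str) -> IPPacket:
    """Wrap a pod-to-pod packet in a new IP header addressed node-to-node,
    as the kernel ipip driver does at the tunl0 device."""
    return IPPacket(src=node_src, dst=node_dst, payload=inner)

def ipip_decapsulate(outer: IPPacket) -> IPPacket:
    """Strip the outer header on the receiving node to recover the original packet."""
    return outer.payload

# Pod 1 (10.244.1.10 on Node1 192.168.31.62) sends to Pod 2 (10.244.2.10 on Node2 192.168.31.63).
inner = IPPacket(src="10.244.1.10", dst="10.244.2.10", payload=b"hello")
outer = ipip_encapsulate(inner, node_src="192.168.31.62", node_dst="192.168.31.63")
# The underlay network only ever sees node IPs; the pod IPs travel inside.
assert (outer.src, outer.dst) == ("192.168.31.62", "192.168.31.63")
recovered = ipip_decapsulate(outer)
assert (recovered.src, recovered.dst) == ("10.244.1.10", "10.244.2.10")
```

The outer header is all that the router between the VLANs routes on, which is why plain layer-3 reachability between the nodes is enough.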
Routing table:

[root@k8s-node1 ~]# ip route
default via 10.4.7.1 dev eth0 proto static metric 100
10.4.7.0/24 dev eth0 proto kernel scope link src 10.4.7.12 metric 100
10.244.113.128/26 via 10.4.7.11 dev tunl0 proto bird onlink
10.244.203.64/26 via 10.4.7.21 dev tunl0 proto bird onlink
blackhole 10.244.245.0/26 proto bird
10.244.245.1 dev calie1d6cd79d22 scope link
10.244.245.2 dev calid6a1fb2294e scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
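The kernel picks the most specific matching entry from this table (longest-prefix match). The following Python sketch models a few of the routes above to show how local pod traffic goes out a cali veth while remote pod traffic enters the tunl0 tunnel (the addresses are this article's example values):

```python
import ipaddress

# (destination prefix, next hop or None for directly attached, output device)
routes = [
    ("0.0.0.0/0",         "10.4.7.1",  "eth0"),             # default route
    ("10.244.113.128/26", "10.4.7.11", "tunl0"),            # remote pod block via IPIP tunnel
    ("10.244.203.64/26",  "10.4.7.21", "tunl0"),            # remote pod block via IPIP tunnel
    ("10.244.245.2/32",   None,        "calid6a1fb2294e"),  # local pod behind a cali veth
]

def lookup(dst: str):
    """Return (next_hop, device) of the longest matching prefix for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(net), nh, dev)
               for net, nh, dev in routes
               if addr in ipaddress.ip_network(net)]
    net, nh, dev = max(matches, key=lambda m: m[0].prefixlen)
    return nh, dev

# A packet for a local pod goes straight out its cali veth device ...
assert lookup("10.244.245.2") == (None, "calid6a1fb2294e")
# ... while a packet for a pod on another node enters the tunl0 IPIP tunnel.
assert lookup("10.244.113.140") == ("10.4.7.11", "tunl0")
```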
It is not difficult to see that when Calico runs in IPIP mode, cluster network performance degrades because of the extra encapsulation and decapsulation work. Therefore, it is recommended that you put all host nodes in one subnet and avoid using IPIP where possible.
8. Advantages, disadvantages and final choice of CNI network scheme.
Consider a few questions first:
Do you need fine-grained network access control? flannel does not support network policy, while calico does, so for ACL control in a multi-tenant network, choose calico.
Do you pursue network performance? Without doubt the routing schemes perform best: flannel's host-gw and calico's BGP mode.
Can the servers run the BGP protocol? Many public clouds do not allow running BGP, in which case calico's BGP mode cannot be used.
What is the cluster size? For small clusters, roughly under 100 nodes, flannel is sufficient and easy to maintain.
Are you able to maintain it? calico produces many routes and relies on the BGP protocol; once problems arise they are harder to troubleshoot, and checking hundreds of routing entries is tedious. This depends on your own situation.
Side topic: how to interconnect the office network and the K8s network
Microservice architectures are popular now: the test environment runs inside the K8s cluster while the developers sit on the office network. The two networks are clearly different; microservices communicate via pod IPs, which the office network naturally cannot reach, so the two networks need to be connected.
For example, when developing one microservice you do not want to start the whole stack with its registry, service discovery and so on; you only want to run a single module and call the registry and database in the test environment directly. For that you have to connect the networks.
Concretely, on one side is the office network where developers' machines sit, and on the other the K8s cluster running many microservices. Say the pod segment is 10.244.0.0/16, the Service IP is 10.0.0.10/24, the host segment is 172.17.0.0/24, and the office network is 192.168.1.0/24.
+-------------------+          +----------------------------+
|  "pc"    "pc"     |          |  "pod"    "pod"            |
|                   |          |  pod IP     10.244.0.0/16  |
|  Office network   |          |  Service IP 10.0.0.10/24   |
|  192.168.1.0/24   |          |  Host IP    172.17.0.0/24  |
+-------------------+          |  k8s cluster               |
                               +----------------------------+
The office network certainly cannot reach pod IPs directly. Unless routes are added, even if the office network and the K8s host network are connected, some special forwarding rules are still needed. The problem to solve is letting the office network access pod IPs and Service IPs, that is, exposing the K8s internal network so that the office network can reach it the way it reaches ordinary machines. There are two cases.
In the first case, the K8s test cluster is in a subnet of the office network. The implementation is relatively simple: you only need to add one rule on the router above that subnet with ip route add, where the destination segment is 10.244.0.0/16 and the next hop is a K8s node, for example k8s-node1, out of interface A:
ip route add 10.244.0.0/16 via <k8s-node1-IP> dev A
With such a rule added, the office network can reach the K8s nodes and, via that next hop, access pod IPs directly.
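The effect of that single rule can be illustrated with a toy model of the office router (a simplification: the segments are this article's example values, and everything without a matching route is reduced to a drop):

```python
import ipaddress

def office_router_forward(dst: str, pod_route_added: bool) -> str:
    """Toy model of the office router: without the extra rule it only knows
    the office segment; with the rule, pod traffic is handed to k8s-node1."""
    addr = ipaddress.ip_address(dst)
    if addr in ipaddress.ip_network("192.168.1.0/24"):
        return "deliver locally"
    if pod_route_added and addr in ipaddress.ip_network("10.244.0.0/16"):
        return "next hop k8s-node1"  # the node then routes on to the pod via calico
    return "drop (no route)"

# Before adding the rule, a pod IP is unreachable from the office network:
assert office_router_forward("10.244.1.10", pod_route_added=False) == "drop (no route)"
# After adding the 10.244.0.0/16 route, pod traffic is forwarded to the node:
assert office_router_forward("10.244.1.10", pod_route_added=True) == "next hop k8s-node1"
```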
In the second case, the K8s cluster and the office network are in different VLANs and different machine rooms.
The premise is that the K8s cluster and the office network are interoperable, i.e. reachable at layer 3.
There are two options: 1) add a routing entry for 10.244.0.0/16 on the router;
2) use BGP: if the layer-3 router supports the BGP protocol, peer it directly with calico's route reflector. The router then learns the routing table of the K8s cluster over BGP and can forward traffic via the next hop to the pod on the destination node.
Summary: across different VLANs there must be layer-3 reachability. For the cluster's segments, the pod segment or the service segment, be sure to tell the upstream router who the next hop is so it can forward the traffic; once the next hop, the destination node, receives the packet, it forwards it straight to the pod.
4.5 Network Policy
1. Why do I need network isolation?
The CNI plug-in solves Pod-to-Pod connectivity across different Nodes, forming a flat network. By default, the Kubernetes network allows all Pod-to-Pod traffic: any Pod in the cluster can reach, ping, and exchange packets with any other Pod. In some scenarios, we do not want Pods to be mutually accessible by default, for example:
Access control between applications. For example, microservice An allows access to microservice B, while microservice C cannot access microservice A.
The development environment namespace cannot access the test environment namespace Pod
When Pod is exposed to the outside world, you need to make a Pod whitelist
Multi-tenant network environment isolation
For example, one namespace may interconnect with other namespaces, or your pods may be exposed to the office network for convenience. To improve security you can whitelist exactly who may and may not access them; and as more microservices are deployed you may also want some isolation between them. Network policy provides this kind of pod network isolation.
Since this is network restriction, that is, ACL access control, there are naturally two directions: the ingress direction and the egress direction.
Take a virtual machine as an analogy: when a client accesses the machine, that traffic is ingress from the machine's point of view and egress from the client's; when the machine accesses the external network, that is egress. ACLs are written for both directions. For pods, we control who may access a pod (ingress) and what the pod may access (egress).
Therefore, we use network policy to isolate the Pod network, with support for both Pod-level and Namespace-level network access control.
Pod network entrance direction isolation
Pod-level network isolation: only specific objects (selected by label) may access the Pod, and IP addresses or IP segments on a whitelist may access the Pod
Namespace-level network isolation: Pods in namespaces A and B are completely isolated from each other
Pod network egress direction isolation
Deny all Pods in a Namespace access to the outside world
Network isolation based on destination IP: only Pod is allowed to access IP addresses or IP segments on the whitelist
Network isolation based on target ports: only Pod is allowed to access ports on the whitelist
2. Overview of network policy
An example of NetworkPolicy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978
Configuration resolution:
podSelector: selects the group of Pods the policy applies to, that is, which Pods get network isolation
policyTypes: may contain Ingress, Egress, or both. This field indicates whether the policy applies to the Pod's inbound traffic, outbound traffic, or both. If no value is specified, the default is Ingress; if the policy has egress rules, Egress must also be set. In short, it restricts the ingress direction, the egress direction, or both.
ingress: from is the whitelist of allowed sources, which can be IP segments, namespaces, Pod labels and so on; ports lists the ports that may be accessed. This is how the ingress policy is constrained.
egress: which external IP segments and ports this group of Pods may access when going out, whether some external IP, an IP segment, or another namespace.
Within the large 172.17.0.0/16 subnet, everything except 172.17.1.0/24 may access the Pods; namespaces can likewise be allowed or denied:
    cidr: 172.17.0.0/16
    except:
    - 172.17.1.0/24
The policy also defines which of the Pods' ports may be accessed; in the egress direction it defines whom the Pods may reach when going out, and which port on which IP segment:
    ports:
    - protocol: TCP
      port: 6379
According to the rules of the yaml above, the Pods in the default namespace carrying the label role: db are network-isolated, and in the ingress direction only sources in the 172.17.0.0/16 subnet, except 172.17.1.0/24, may access them.
    namespaceSelector:
      matchLabels:
        project: myproject
    podSelector:
      matchLabels:
        role: frontend
Pods in namespaces labeled project: myproject may access them, and Pods labeled role: frontend may also access them. All of these may only reach port 6379; the isolated Pods themselves may only access port 5978 on IPs in the 10.0.0.0/24 segment.
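As an illustration of these semantics, here is a toy Python evaluation of the example policy's ingress rule: the three entries under `from` are OR-ed together, and the `ports` list must also match. (The real evaluation is done by the CNI plug-in; this sketch only mirrors the logic.)

```python
import ipaddress

ALLOWED_CIDR = ipaddress.ip_network("172.17.0.0/16")
EXCEPT_CIDR = ipaddress.ip_network("172.17.1.0/24")

def ingress_allowed(src_ip=None, src_ns_labels=None, src_pod_labels=None, port=None):
    """Toy evaluation of the example policy's single ingress rule:
    the `from` entries are OR-ed, then the port must also match."""
    from_match = False
    if src_ip is not None:  # ipBlock: in the CIDR but not in the except block
        ip = ipaddress.ip_address(src_ip)
        from_match |= ip in ALLOWED_CIDR and ip not in EXCEPT_CIDR
    if src_ns_labels is not None:  # namespaceSelector
        from_match |= src_ns_labels.get("project") == "myproject"
    if src_pod_labels is not None:  # podSelector
        from_match |= src_pod_labels.get("role") == "frontend"
    return from_match and port == 6379

assert ingress_allowed(src_ip="172.17.2.5", port=6379)      # inside the allowed CIDR
assert not ingress_allowed(src_ip="172.17.1.5", port=6379)  # inside the except block
assert not ingress_allowed(src_ip="172.17.2.5", port=80)    # wrong port
assert ingress_allowed(src_pod_labels={"role": "frontend"}, port=6379)
```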
3. Inbound and outbound network traffic access control cases
Now make access restrictions on pod
Pod access restrictions
Prepare the test environment: one web pod and two client pods.

kubectl create deployment nginx --image=nginx
kubectl run client1 --generator=run-pod/v1 --image=busybox --command -- sleep 36000
kubectl run client2 --generator=run-pod/v1 --image=busybox --command -- sleep 36000
kubectl get pods --show-labels
Requirement: isolate the Pods with the app=nginx label in the default namespace, and only allow Pods with the run=client1 label in the default namespace to access port 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: default
    - podSelector:
        matchLabels:
          run: client1
    ports:
    - protocol: TCP
      port: 80
Quarantine policy configuration:
Pod object: Pods with the app=nginx label in the default namespace
Allow access to port: 80
Allow access to objects: Pod with run=client1 tags in the default namespace
Deny access to objects: all objects except allowed access to objects
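The intended reachability of this quarantine policy can be summarized as a small predicate (a toy model of the policy's intent, not of the CNI plug-in's actual evaluation):

```python
def allowed(src_labels: dict, src_namespace: str, port: int) -> bool:
    """Toy check of the quarantine policy's intent: only run=client1 Pods
    in the default namespace may reach the nginx Pods, and only on port 80."""
    return (src_namespace == "default"
            and src_labels.get("run") == "client1"
            and port == 80)

assert allowed({"run": "client1"}, "default", 80)        # client1 -> allowed
assert not allowed({"run": "client2"}, "default", 80)    # client2 -> denied
assert not allowed({"run": "client1"}, "default", 8080)  # wrong port -> denied
```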
Testing shows that the client pods can no longer reach the nginx pod. Note that network policy requires support from the CNI plug-in: calico supports it, while flannel does not.
Namespace isolation
Requirement: all Pods under the default namespace can access each other, but they cannot access Pods in other namespaces, and other namespaces cannot access Pods in the default namespace.
[root@k8s-master1 ~]# kubectl run client3 --generator=run-pod/v1 --image=busybox -n kube-system --command -- sleep 36000
Pods in default can reach each other, Pods in kube-system cannot reach Pods in default, and default cannot access Pods in kube-system; each namespace is isolated within its own network.
Now let's implement this requirement. Before the policy is created, since no network isolation has been applied, Pods in different namespaces can reach each other by default.
Create a pod named nginx under the default namespace:
[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
nginx-86c57db685-cv627   1/1     Running   0          5m57s   10.244.113.132   k8s-master1   <none>           <none>