A detailed introduction to Cilium multi-cluster



This article is an in-depth look at ClusterMesh, Cilium's multi-cluster implementation. In short, ClusterMesh provides:

Pod IP routing across multiple Kubernetes clusters at native performance, via tunneling or direct routing, without requiring any gateways or proxies.

Transparent service discovery using standard Kubernetes services and coredns/kube-dns.

Network policy enforcement across multiple clusters. The policy can be specified as a Kubernetes NetworkPolicy resource or an extended CiliumNetworkPolicy CRD.

Transparent encryption for all communications between nodes in the local cluster and across cluster boundaries.

The multi-cluster functionality is built in layers; you can adopt all of them, or pick only the layers you need.
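To make the layering concrete, each cluster participating in ClusterMesh is identified by a unique name and numeric ID in Cilium's configuration. The fragment below is a minimal sketch of the relevant cilium-config ConfigMap entries; the exact key names and values are assumptions that can differ between Cilium versions and installation methods.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Human-readable name of this cluster, unique within the mesh
  cluster-name: cluster1
  # Numeric ID that must not collide with any other cluster in the mesh
  cluster-id: "1"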

Use cases

Before delving into the implementation details, let's review some of the use cases for connecting multiple Kubernetes clusters.

Use case: high availability

For most people, high availability is the most obvious use case. It involves running Kubernetes clusters in multiple regions or availability zones, with replicas of the same services in each cluster. On failure, requests can fail over to the other cluster. The failure scenario covered here is not primarily an entire region or failure domain becoming completely unavailable; far more likely is that resources in one cluster are temporarily unavailable or misconfigured, so that a particular service cannot run or scale in that cluster.

Use case: shared services

The initial trend for Kubernetes-based platforms was to build large multi-tenant clusters. It is becoming more and more common to build a separate cluster per tenant, or for different categories of service, such as different levels of security sensitivity. However, some services, such as key management, logging, monitoring, or DNS, are usually still shared across all clusters. This avoids the operational overhead of maintaining these services in every tenant cluster.

The main motivation for this model is isolation between tenant clusters. To preserve that isolation, tenant clusters connect to the shared-services cluster but not to other tenant clusters.

Use case: split stateful and stateless services

The operational complexity of running stateful and stateless services is very different. Stateless services are easy to scale, migrate, and upgrade. Running a cluster entirely with stateless services keeps the cluster flexible and agile, and makes it easy to move from one cloud provider to another.

Stateful services can introduce complex dependency chains, and migrating them usually involves migrating storage as well.

Pod IP routing

Pod IP routing is the foundation of the multi-cluster capability. It allows pods across clusters to reach each other via their pod IPs. Cilium can run in several modes to perform pod IP routing, and all of them support multi-cluster pod IP routing.

Tunnel mode

Tunnel mode encapsulates all network packets leaving a pod in an encapsulation header, which can be a VXLAN or Geneve frame. The encapsulated frame is then transmitted over a standard UDP header. The concept is similar to a VPN tunnel. (A configuration sketch follows the pros and cons below.)

Pros: pod IPs are never visible on the underlying network, which only sees the IP addresses of the worker nodes. This simplifies installation and firewall rules.

Cons: the additional network headers reduce the theoretical maximum throughput of the network. The exact cost depends on the configured MTU and is more noticeable with a traditional MTU of 1500 than with jumbo frames at MTU 9000.

Cons: to avoid excessive CPU consumption, the entire network stack, including the underlying hardware, must support checksum and segmentation offload, so that checksums are computed and segmentation is performed in hardware, just as for "regular" network packets. These offload capabilities are widely available today.
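As a minimal sketch, tunnel mode is typically selected through Cilium's configuration. In older Cilium releases this was a single tunnel key in the cilium-config ConfigMap (newer releases use a routing-mode setting instead), so treat the exact key name below as a version-dependent assumption:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Encapsulate pod-to-pod traffic in VXLAN (alternatively "geneve")
  tunnel: vxlan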

Direct routing mode

In direct routing mode, all network packets are routed directly to the network. This requires the network to be able to route pod IPs. Several options are available to propagate pod IP routing information across nodes (a configuration sketch follows the pros and cons below):

Use the --auto-direct-node-routes option, an ultra-lightweight route propagation method via the kvstore that works as long as all worker nodes share a single layer 2 network. This requirement is usually met by all forms of cloud-provider virtual networks.

Use the kube-router integration to run a BGP routing daemon.

Use any other routing daemon to inject routes into the standard Linux routing table (bird, quagga, ...).

When the network does not understand the pod IPs, packets must be masqueraded.

Pros: the reduced network packet headers improve network throughput and latency.

Cons: the entire network must be able to route pod IPs, which increases operational complexity.
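A minimal sketch of a direct-routing configuration using the --auto-direct-node-routes option mentioned above. The key names follow the cilium-config ConfigMap style and the CIDR is a placeholder; both are assumptions that vary by Cilium version and environment:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  # Disable encapsulation and route pod IPs natively
  tunnel: disabled
  # Propagate per-node pod CIDR routes between nodes on the same L2 network
  auto-direct-node-routes: "true"
  # Traffic to this CIDR is not masqueraded because the network can route pod IPs
  ipv4-native-routing-cidr: 10.0.0.0/8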

Hybrid routing mode

Hybrid routing mode uses direct routing where available, typically within the local cluster or to other clusters in the same VPC, and falls back to tunnel mode when crossing VPCs or cloud providers. This limits operational complexity, and the optimization cost is paid only where needed.

Service discovery

Service discovery in Cilium's multi-cluster model is built on standard Kubernetes services and is designed to be completely transparent to existing Kubernetes application deployments:

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    io.cilium/global-service: "true"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base

Cilium watches Kubernetes services and endpoints and looks for services carrying the annotation io.cilium/global-service: "true". For such services, all services with the same name and namespace across clusters are automatically merged into a global service that is available in all clusters.

Any traffic to the ClusterIP of a global service is automatically load-balanced to endpoints in all clusters, following standard Kubernetes health-check logic.

Each cluster continues to maintain its own ClusterIP for each service, which means Kubernetes and kube-dns/coredns are unaware of the other clusters. The DNS server keeps returning a ClusterIP that is valid only in the local cluster, and Cilium transparently performs the cross-cluster load balancing.

There are several additional annotations for fine-grained control, such as one-way exposure or affinity policies.
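For example, the hedged sketch below combines the global-service annotation with a sharing annotation for one-way exposure. The io.cilium/shared-service key is an assumption based on common Cilium usage and should be verified against the documentation for your Cilium version:

apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    # Merge this service with same-named services in other clusters
    io.cilium/global-service: "true"
    # Consume remote endpoints but do not expose local endpoints to other clusters
    io.cilium/shared-service: "false"
spec:
  type: ClusterIP
  ports:
  - port: 80
  selector:
    name: rebel-base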

All traffic from frontend-1 to the ClusterIP 30.1.1.1 is automatically load-balanced across the backend pod IPs of cluster 1 [10.0.0.1, 10.0.0.2] and the backend pod IPs of cluster 2 [20.0.0.1, 20.0.0.2]. Each cluster health-checks its local backend instances and notifies the other clusters when a container is created, destroyed, or becomes unhealthy.

Transparent encryption

Transparent encryption, introduced in Cilium 1.4, is fully compatible with multi-cluster. Ensure that all nodes in all clusters are configured with the same key, and all communication between nodes, both within the local cluster and across cluster boundaries, is automatically encrypted.
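As a rough sketch, the shared key is typically distributed to every cluster as a Kubernetes Secret that the Cilium agents read. The secret name, field, and key format below follow the pattern used by Cilium's IPsec documentation but should be treated as assumptions and checked against your version; the key material is a placeholder:

apiVersion: v1
kind: Secret
metadata:
  # The same secret must exist in every cluster participating in the mesh
  name: cilium-ipsec-keys
  namespace: kube-system
stringData:
  # Format: <key-id> <algorithm> <hex-encoded key> <key-size>; replace the key material
  keys: "3 rfc4106(gcm(aes)) 0123456789abcdef0123456789abcdef01234567 128"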

Multi-cluster network policy

The short version: the policy enforcement you are familiar with from a single cluster simply extends to work across clusters. Because policies are specified using pod labels, a policy that allows frontend to talk to backend applies to cross-cluster traffic just as it does to traffic within a cluster.

Cilium does not automatically propagate NetworkPolicy or CiliumNetworkPolicy across clusters. It is the user's responsibility to import policies into all clusters. This is intentional: each cluster can decide for itself whether to allow traffic to be received from or sent to a remote cluster.

Allowing traffic between specific clusters

Policies can also be written to apply only to pods in a particular cluster. Cilium represents the cluster name as a label on each pod, so matching a cluster name is simply a matter of adding a matchLabels entry to the endpointSelector or to the toEndpoints and fromEndpoints sections:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-cross-cluster"
spec:
  description: "Allow x-wing in cluster1 to contact rebel-base in cluster2"
  endpointSelector:
    matchLabels:
      name: x-wing
      io.cilium.k8s.policy.cluster: cluster1
  egress:
  - toEndpoints:
    - matchLabels:
        name: rebel-base
        io.cilium.k8s.policy.cluster: cluster2

The example policy above allows x-wing in cluster1 to talk to rebel-base in cluster2. Unless an additional policy whitelists that traffic, x-wing will not be able to communicate with rebel-base in its local cluster.

Relationship with Istio multi-cluster

The two projects are independent but complement each other well. A common way to combine Cilium multi-cluster and Istio multi-cluster is to use Cilium's multi-cluster pod IP routing layer to satisfy the following requirement of the Istio multi-cluster guide:

All pod CIDRs in every cluster must be routable to each other.

In addition, Cilium's policy enforcement can be used to protect communication with the Istio control plane, to guard against sidecar bypass attempts over protocols the sidecar does not handle, such as UDP or IPv6, and to contain compromised sidecar proxies.
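A hedged sketch of how Cilium policy could be layered next to Istio follows. The labels, namespace, and port below are illustrative assumptions rather than values prescribed by either project; the idea is to restrict a meshed workload's egress to the Istio control plane over TCP only, so that attempts to bypass the sidecar over UDP are dropped:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "allow-istiod-tcp-only"
spec:
  description: "Illustrative: allow egress from meshed workloads to istiod over TCP only"
  endpointSelector:
    matchLabels:
      # Hypothetical label identifying sidecar-injected workloads
      "sidecar.istio.io/inject": "true"
  egress:
  - toEndpoints:
    - matchLabels:
        # Hypothetical selector for the Istio control plane
        app: istiod
        "k8s:io.kubernetes.pod.namespace": istio-system
    toPorts:
    - ports:
      - port: "15012"
        protocol: TCP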

You can also run Istio global services and Cilium global services side by side. All Istio-managed services can reach Cilium's global services, because these are discoverable through DNS like any regular service.

