How to Securely Isolate Multi-Tenant Clusters with Kubernetes

2025-01-17 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

This article discusses approaches to securely isolating tenants in a multi-tenant Kubernetes cluster. It should be a useful reference for readers facing this problem.

What is a multi-tenant cluster?

First, let's introduce the concept of a "tenant". A tenant is not just a user of the cluster; it can be a set of workloads composed of computing, network, storage, and other resources. In a multi-tenant cluster, different tenants need to be isolated from each other as securely as possible within a single cluster (possibly multiple clusters in the future), so as to prevent a malicious tenant from attacking other tenants, while also ensuring a fair distribution of shared cluster resources among tenants.

In terms of isolation strength, multi-tenancy can be divided into soft isolation (Soft Multi-tenancy) and hard isolation (Hard Multi-tenancy).

Soft isolation is oriented toward multi-tenancy within an enterprise: this model assumes there are no malicious tenants, and the purpose of isolation is to protect the businesses of internal teams from each other and to guard against accidental security incidents. Hard isolation, by contrast, is aimed at service providers: under this business form, the security background of users in different tenants cannot be guaranteed, so we assume by default that tenants may attack each other and the K8s system itself, and stricter isolation is needed as a security guarantee.

The different application scenarios of multi-tenancy will be described in more detail in the next section.

Multi-tenant application scenario

Here are two typical enterprise multi-tenant application scenarios and different isolation requirements:

Multi-tenancy of shared cluster within an enterprise

In this scenario, all users of the cluster come from within the enterprise, which is the current usage mode of many K8s cluster customers. Because the identities of service users are controllable, the security risk of this business form is relatively manageable; after all, the boss can simply fire a malicious employee :). Depending on the complexity of the enterprise's internal organization, we can logically isolate the resources of different departments or teams through namespaces, and define the following business roles:

- Cluster administrator: has cluster management capabilities (capacity expansion, adding nodes, etc.); responsible for creating and allocating namespaces to tenant administrators; performs CRUD operations on various policies (RAM / RBAC / NetworkPolicy / quota ...).
- Tenant administrator: has at least RAM read-only permission on the cluster; manages the RBAC configuration of the personnel within the tenant.
- Intra-tenant user: uses K8s resources within the permission scope of the tenant's namespace.

On top of this role-based access control, we also need to ensure network isolation between namespaces, allowing only those cross-tenant application requests between namespaces that are on a whitelist.

In addition, for application scenarios that require a high level of business security, we need to limit the kernel capabilities of the application container, which can be combined with policy tools such as seccomp / AppArmor / SELinux to limit the capabilities of the container.
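As a minimal sketch of restricting a container's kernel capabilities, the pod spec below (all names hypothetical) drops all Linux capabilities, forbids privilege escalation, and applies the container runtime's default seccomp profile via the pod SecurityContext:

```yaml
# Restricted pod: no added capabilities, no privilege escalation,
# default seccomp syscall filter, read-only root filesystem.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app
  namespace: tenant-a
spec:
  containers:
  - name: app
    image: nginx:stable
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      seccompProfile:
        type: RuntimeDefault
      readOnlyRootFilesystem: true
```

AppArmor and SELinux constraints can be layered on in the same way, via pod annotations or the `seLinuxOptions` field respectively.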

Of course, Kubernetes's existing single layer of logical isolation via namespaces is not enough to meet the isolation requirements of some complex business models in large enterprise applications. Projects such as Virtual Cluster, which abstract a higher-level tenant resource model, are worth watching: they enable more fine-grained multi-tenant management and make up for the limitations of native namespaces.

Multi-tenancy under SaaS & KaaS Service Model

In the SaaS multi-tenant scenario, the tenants of the Kubernetes cluster are the individual service application instances on the SaaS platform plus the SaaS control plane itself. In this scenario, each service application instance on the platform can be placed in its own namespace. The end users of the service, however, cannot interact with the Kubernetes control plane components: they only see and use the SaaS's own console, and they consume the service or deploy their business through the customized SaaS control plane built on top (shown in the figure on the left below).

For example, suppose a blog platform runs on a multi-tenant cluster. In this scenario, the tenants are each customer's blog instance and the platform's own control plane. The platform's control plane and each hosted blog run in different namespaces. Customers create and delete blogs and update blog software versions through the platform's interface, but have no visibility into how the cluster operates.

KaaS multi-tenant scenarios are common among cloud service providers: the services of the business platform are exposed directly to users in different tenants through the Kubernetes control plane. End users can use native K8s APIs or the CRD/controller-based APIs extended by the service provider. As a baseline, different tenants still need namespace-based logical isolation of access, along with network isolation and resource quotas between tenants.

Unlike a shared cluster within an enterprise, the end users here come from untrusted domains, and it is inevitable that some malicious tenant will execute malicious code on the service platform. We therefore need a higher standard of security isolation for multi-tenant clusters under the SaaS/KaaS service model, and the existing native capabilities of Kubernetes are not enough to meet these requirements. To this end, kernel-level isolation at the container runtime, such as secure (sandboxed) containers, is needed to strengthen tenant security in this business form.

Implementing a multi-tenant architecture

When planning and implementing a multi-tenant cluster, we can first make use of Kubernetes's own layered resource isolation: the cluster itself, namespaces, nodes, pods, and containers form resource isolation models at different levels. When application workloads of different tenants share the same resource model, they pose security risks to each other. We therefore need to control the resource domain each tenant can access, and at the scheduling level try to ensure that containers handling sensitive information run on relatively independent nodes. If, for reasons of resource cost, workloads from different tenants must share the same resource domain, runtime security measures and scheduling control strategies can reduce the risk of cross-tenant attacks.

Although the existing security and scheduling capabilities of Kubernetes are not enough to implement multi-tenant isolation completely, in scenarios such as shared clusters within an enterprise, isolating tenant resource domains through namespaces, restricting tenants' resource access scope and capabilities through policy models such as RBAC, PodSecurityPolicy, and NetworkPolicy, and combining these with existing scheduling capabilities can already provide considerable security isolation. For service platforms such as SaaS and KaaS, container kernel-level isolation can be achieved through the security sandbox containers recently launched by Aliyun CCS, which minimizes the risk of a malicious tenant attacking other tenants through container escape.

This section focuses on multi-tenancy practices based on Kubernetes's native security capabilities.

Access control: AuthN & AuthZ & Admission

Authorization in an ACK cluster is divided into two steps: RAM authorization and RBAC authorization. RAM authorization governs access to the cluster management interfaces, including CRUD permissions on the cluster itself (such as cluster visibility, scaling, and adding nodes). RBAC authorization governs access to the Kubernetes resource model within the cluster, and can grant fine-grained permissions on resources at namespace granularity.

ACK authorization management provides different levels of preset role templates for users within tenants, supports binding to multiple user-defined cluster roles, and supports authorization for batch users. For more information on cluster-related access control authorization on ACK, please refer to the relevant help documentation.
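As a sketch of namespace-granularity RBAC (all names hypothetical), a tenant administrator might grant a tenant user read/write access to common workload resources in the tenant's namespace like this:

```yaml
# Role scoped to the tenant's namespace: workload access only,
# nothing cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-a
  name: tenant-a-developer
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "configmaps", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
# Bind the role to a single tenant user.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: tenant-a
  name: tenant-a-developer-binding
subjects:
- kind: User
  name: alice@tenant-a.example
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-a-developer
  apiGroup: rbac.authorization.k8s.io
```

Because a Role (rather than a ClusterRole) is used, the binding cannot grant any permission outside `tenant-a`, which is what keeps tenants logically separated at the API level.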

NetworkPolicy

NetworkPolicy can control network traffic between the pods of different tenant businesses. In addition, it can restrict cross-tenant access to a whitelist.

You can configure NetworkPolicy on a container service cluster that uses the Terway network plug-in, and here are some examples of policy configuration.
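As an illustrative policy configuration (namespace and label names hypothetical), a common pattern is a default-deny ingress policy for the tenant's namespace plus a whitelist policy that admits traffic only from explicitly trusted namespaces:

```yaml
# Deny all ingress to every pod in tenant-a by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
---
# Whitelist: allow ingress only from namespaces carrying a
# trusted-partner label set by the cluster administrator.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-whitelisted-namespaces
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tenant-a.example/trusted: "true"
```

NetworkPolicies are additive, so any further whitelist entries can be added as separate policies without touching the deny-all baseline.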

PodSecurityPolicy

PSP (PodSecurityPolicy) is a native cluster-scoped resource model of K8s. During the admission phase of a pod creation request in the apiserver, it verifies whether the pod's runtime behavior satisfies the constraints of the matching PSP policy, for example whether the pod uses the host's network, file system, specified ports, or PID namespace. It can also prevent users within a tenant from launching privileged containers, restrict disk types, and force read-only mounts. Beyond that, PSP can inject a corresponding SecurityContext into a pod based on the bound policy, including the container runtime uid and gid and the kernel capabilities to add or drop.

For information on how to enable the use of PSP admission and related policies and permissions, see here.
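A sketch of a restrictive policy under the `policy/v1beta1` API discussed here (PSP has since been deprecated in newer Kubernetes versions in favor of Pod Security Admission; the policy name is hypothetical):

```yaml
# Restrictive PSP: no privileged containers, no host namespaces,
# non-root user, read-only root filesystem, all capabilities dropped.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: tenant-restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  hostNetwork: false
  hostPID: false
  hostIPC: false
  readOnlyRootFilesystem: true
  requiredDropCapabilities: ["ALL"]
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["configMap", "secret", "emptyDir", "persistentVolumeClaim"]
```

A PSP only takes effect for a tenant once that tenant's service accounts are granted `use` permission on it via RBAC, which is how different tenants can be held to different policies.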

OPA

OPA (Open Policy Agent) is a powerful policy engine that supports decoupled policy decision services, and the community already has a relatively mature solution for integrating it with Kubernetes. When RBAC's namespace-granularity isolation cannot meet the complex security requirements of enterprise applications, OPA can provide fine-grained access policy control at the object-model level.

At the same time, OPA supports seven-layer NetworkPolicy policy definition and labels/annotation-based cross-namespace access control, which can be used as an effective enhancement of K8s native NetworkPolicy.

Resource scheduling: Resource Quotas & Limit Range

In a multi-tenant scenario, when different teams or departments share cluster resources, resource contention is inevitable, so we need to cap each tenant's resource usage. ResourceQuota limits the total resource requests and limits of all pods in a tenant's namespace; LimitRange sets the default request and limit values for pods deployed in that namespace. In addition, we can also limit a tenant's storage resource quota and object counts.

Detailed instructions on resource quotas can be found here.
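A minimal sketch of the two objects working together (all names and figures hypothetical): the ResourceQuota caps the namespace's aggregate consumption, while the LimitRange fills in per-container defaults for pods that omit them, so every pod is counted against the quota:

```yaml
# Cap the total compute and object counts for tenant-a's namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "10"
---
# Default per-container requests/limits applied when a pod spec
# does not declare its own.
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-a-defaults
  namespace: tenant-a
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
```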

Pod Priority/Preemption

Since version 1.14, pod priority and preemption have graduated from beta to stable. Pod priority determines a pending pod's position in the scheduling queue; when a high-priority pod cannot be scheduled due to insufficient node resources, the scheduler tries to evict lower-priority pods so that the high-priority pod can be scheduled and deployed.

In a multi-tenant scenario, priority and preemption settings can be used to guarantee the availability of a tenant's important business applications. Pod priority can also be used together with ResourceQuota to limit a tenant's quota at a specified priority.
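A sketch of that combination (names and numbers hypothetical): a PriorityClass for a tenant's critical workloads, plus a quota scoped to that priority so the tenant cannot claim unlimited high-priority capacity:

```yaml
# Priority class for tenant-a's critical workloads; pods referencing
# it may preempt lower-priority pods when nodes are full.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tenant-a-critical
value: 100000
globalDefault: false
description: "Critical workloads for tenant-a."
---
# Quota that counts only pods at this priority, capping how much
# high-priority capacity the tenant can claim.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-critical-quota
  namespace: tenant-a
spec:
  hard:
    pods: "5"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["tenant-a-critical"]
```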

Dedicated Nodes

Note: malicious tenants can circumvent policies enforced through the node taint and toleration mechanism. The following instructions apply only to clusters of trusted tenants within an enterprise, or clusters where tenants have no direct access to the Kubernetes control plane.

By tainting some of the nodes in the cluster, you can reserve them for dedicated use by specific tenants. For example, the GPU nodes in the cluster can be reserved for the service teams whose business applications need GPUs. The cluster administrator applies a taint such as effect: "NoSchedule" to the node, and only pods configured with a matching toleration can be scheduled onto it.

Of course, a malicious tenant can gain access to such a node by adding the same toleration to its own pods, so taints and tolerations alone cannot guarantee the exclusivity of the target nodes on an untrusted multi-tenant cluster.

See here for information on how to use node taints to control scheduling.
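A sketch of the pattern (node and label names hypothetical): the administrator taints the dedicated node, and the tenant's pod both tolerates the taint and selects the node by label, since the taint alone only keeps other pods off the node, not the tenant's pods on it:

```yaml
# Administrator first taints and labels the dedicated node:
#   kubectl taint nodes gpu-node-1 dedicated=tenant-a:NoSchedule
#   kubectl label nodes gpu-node-1 dedicated=tenant-a
# This pod tolerates the taint (so it may land there) and uses a
# nodeSelector (so it lands nowhere else).
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
  namespace: tenant-a
spec:
  nodeSelector:
    dedicated: tenant-a
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "tenant-a"
    effect: "NoSchedule"
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
```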

Sensitive information protection: secrets encryption at rest

In a multi-tenant cluster, different tenant users share the same etcd storage. In scenarios where end users can access the Kubernetes control plane, we need to protect the data in secrets so that sensitive information is not disclosed when access control policies are misconfigured. For this, you can use the native secrets encryption capability of K8s, see here.
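A sketch of the native mechanism: an EncryptionConfiguration file passed to the kube-apiserver via its `--encryption-provider-config` flag (the key below is a placeholder, not a real value):

```yaml
# The apiserver encrypts secrets with AES-CBC before writing them to
# etcd. The trailing "identity" provider lets previously stored
# plaintext secrets still be read until they are rewritten.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources: ["secrets"]
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded 32-byte key>
  - identity: {}
```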

ACK also provides an open source secrets encryption solution based on Aliyun's KMS service, which can be found here.

Summary

When implementing a multi-tenant architecture, we first need to determine the applicable scenario, including assessing the trustworthiness of the users and application workloads within each tenant and the degree of security isolation required. On that basis, the following are the baseline requirements for security isolation:

- Enable the default security configuration of the Kubernetes cluster: enable RBAC authentication and implement namespace-based soft isolation; enable secrets encryption to strengthen the protection of sensitive information; apply security configuration based on the CIS Kubernetes benchmarks.
- Enable admission controllers such as NodeRestriction, AlwaysPullImages, and PodSecurityPolicy; restrict pod deployment through PSP and control pods' runtime SecurityContext.
- Configure NetworkPolicy.
- Use Resource Quota & Limit Range to cap tenants' resource usage.
- Follow the principle of least privilege for running applications, minimizing the system permissions of containers in pods.
- Log everything.
- Connect to a monitoring system to monitor at the container-application dimension.

For service models such as SaaS and KaaS, or when we cannot guarantee the trustworthiness of the users within the tenant, we need to adopt some more powerful isolation measures, such as:

- Use dynamic policy engines such as OPA for fine-grained access control at the network or object level.
- Use secure containers to achieve kernel-level security isolation at the container runtime.
- Implement complete multi-tenant isolation for monitoring, logging, storage, and other supporting services.

This concludes the discussion of how to securely isolate multi-tenant clusters with Kubernetes. I hope the above content is helpful. If you liked this article, feel free to share it with others.
