Analysis of Zero Trust Security Architecture under Kubernetes

2025-04-26 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 report


This article is excerpted from the book "Different Double 11 Technologies: Cloud Native Practice in the Alibaba Economy".

Authors

Yang Ning (Lintong), Senior Security Expert, Aliyun Basic Products Division

Liu Zixi (Luo Bai), Basic Security Expert, Ant Financial Services Group

Li Tingting (Hongshan), Senior Security Expert, Foundation Security, Ant Financial Services Group

Brief introduction

The zero-trust security model was first proposed in 2010 by John Kindervag, then a principal analyst at the well-known research firm Forrester. Zero trust re-examines the traditional perimeter-based security architecture and proposes a new approach to security architecture.

The core idea is that no person, device, or system inside or outside the network should be trusted by default; instead, the trust on which access control rests must be rebuilt on authentication and authorization. Attributes such as IP address, host, geographic location, and network cannot serve as trusted credentials. Zero trust overturns the access-control paradigm and moves security architecture from being network-centric to being identity-centric: its essential requirement is identity-centered access control.
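To make the contrast concrete, here is a minimal Python sketch. The `Request` type, the internal address range, and the grant table are all hypothetical. The perimeter check trusts anything coming from the internal subnet; the zero-trust check never looks at the source address and decides only on an authenticated identity plus an explicit grant.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical internal range that a perimeter model would treat as trusted.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical explicit grants: (identity, resource) pairs.
GRANTS = {("alice", "payroll-db"), ("billing-svc", "payroll-db")}


@dataclass
class Request:
    source_ip: str
    identity: Optional[str]  # None when the caller could not be authenticated
    resource: str


def perimeter_allows(req: Request) -> bool:
    """Network-centric check: being inside the internal network is enough."""
    return ipaddress.ip_address(req.source_ip) in INTERNAL_NET


def zero_trust_allows(req: Request) -> bool:
    """Identity-centric check: the source IP is never consulted."""
    if req.identity is None:
        return False
    return (req.identity, req.resource) in GRANTS


# A compromised internal host that holds no valid identity.
intruder = Request(source_ip="10.1.2.3", identity=None, resource="payroll-db")
print(perimeter_allows(intruder), zero_trust_allows(intruder))  # True False
```

The intruder passes the perimeter check purely by location, while the identity-centric check rejects it; conversely, an authenticated caller on an external address can still be allowed if it holds a grant.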

Existing zero-trust implementations include Google BeyondCorp, Google ALTS, and the Azure Zero Trust Framework. Zero trust in the cloud is still an emerging technology trend, and the same zero-trust model also applies to Kubernetes. This article focuses on a technical analysis of zero-trust security architecture under Kubernetes.

Traditional Zero Trust Concepts and Current Implementations

Microsoft Azure

Azure's zero-trust framework is relatively complete. Architecturally, it covers endpoints, the cloud, on-premises environments, SaaS, and other applications. Let's analyze its components:

User Identity: users are authenticated by an Identity Provider (the component that creates, maintains, and manages user identities). Authentication can use an account and password, or MFA (Multi-Factor Authentication) with software or hardware tokens, SMS, biometrics, and so on.

Device Identity: devices include both company-managed devices and devices that are not centrally managed. Information about these devices, including IP address, MAC address, installed software, operating system version, and patch status, is stored in the Device Inventory. Each device also has its own identity to prove what it is, and a device status and device risk level are determined for it.

Security Policy Enforcement (SPE): the SPE makes an overall policy decision from the collected user identity and status together with the device information, status, and identity, and can incorporate Threat Intelligence to broaden the scope and improve the accuracy of its decisions. Policies govern access to the following:

Data: policies that classify, label, and encrypt data (emails, documents).

Apps: adaptive access to the corresponding SaaS and on-premises applications.

Infrastructure: IaaS, PaaS, containers, Serverless, JIT (just-in-time access enabled on demand), and Git version control.

Network: policies for traffic in transit and for internal micro-segmentation.

The following Microsoft diagram explains this in more detail. Users (employees, partners, customers, etc.) are identified through Azure AD, ADFS, MSA, Google ID, and so on; devices (trusted, compliant devices) include Android, iOS, macOS, Windows, and Windows Defender ATP; clients (client apps and authentication methods) include browsers and client applications; and location (physical and virtual addresses) includes geographic location information, the corporate network, etc. Microsoft's machine learning, real-time evaluation engine, and policies combine these signals about users, clients, locations, and devices to grant continuous, adaptive access to on-premises applications, cloud SaaS apps, and Microsoft Cloud. Policy outcomes include allow, deny, restrict access, require MFA, force a password reset, and block or lock out illegitimate authentication attempts. As the figure below shows, Azure spans the on-premises, cloud, SaaS, and other layers, forming a large and comprehensive zero-trust system.
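The signals and outcomes listed above can be condensed into a small policy function. This Python sketch is not Azure's engine or API: the `Signals` fields, the risk levels, and the rule ordering are hypothetical, chosen only to show how user, device, and threat-intelligence signals combine into allow, deny, restrict, or step-up-MFA decisions. In this sketch, network location on its own grants nothing.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_MFA = "require_mfa"
    RESTRICT = "restrict"  # e.g. read-only or browser-only access


@dataclass
class Signals:
    # All fields are hypothetical stand-ins for the signals described above.
    user_authenticated: bool
    mfa_passed: bool
    device_managed: bool         # present in the device inventory
    device_compliant: bool       # OS version, patch level, etc. meet policy
    device_risk: str             # "low" | "medium" | "high"
    on_corporate_network: bool   # deliberately unused: location alone grants nothing
    threat_intel_flagged: bool   # e.g. sign-in from a known-bad address


def evaluate(s: Signals) -> Decision:
    """Hypothetical policy combining user, device and threat signals."""
    if not s.user_authenticated or s.threat_intel_flagged or s.device_risk == "high":
        return Decision.DENY
    if not s.mfa_passed:
        return Decision.REQUIRE_MFA
    if not (s.device_managed and s.device_compliant) or s.device_risk == "medium":
        return Decision.RESTRICT
    return Decision.ALLOW
```

Checking hard failures first means a flagged sign-in is denied even on the corporate network, while a merely unmanaged device is downgraded to restricted access rather than blocked outright.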

Google BeyondCorp

Google BeyondCorp is a network security solution designed to deal with new network threats. BeyondCorp does not rely on many new technologies; instead, it applies continuous verification, doing away with the VPN and no longer distinguishing between internal and external networks. Before 2014, Google anticipated that the intranet is as dangerous as the Internet, because once the intranet perimeter is breached, attackers can easily reach an enterprise's internal applications. Because of weak security awareness, enterprises tend to assume their internal applications are secure and give them low priority, which leads to a large number of internal security problems. Today, as more and more enterprises adopt mobile and cloud technologies, the perimeter becomes ever harder to protect. Google therefore treats both alike: regardless of whether a request is internal or external, it is defended with the same security measures.

Let's walk through Google's BeyondCorp model with an example. If you visit Google's internal application http://blackberry.corp.google.com, you are redirected to https://login.corp.google.com/, the Google Moma login system, where you must enter your account name and password. During this login, a combined decision is made on both device information and user information. Once the password is correct and the device information has passed the rules engine, you are redirected to a page that requires a YubiKey: every Google employee has a YubiKey, which is used for second-factor verification. Google believes YubiKeys can completely eliminate phishing. Amazon's Midway-Auth (https://midway-auth.amazon.com/login?next=%2F) works in a similar way.
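The login flow just described can be sketched as an access proxy plus a two-step login. This is a hypothetical model, not Google's implementation: the in-memory password table, device inventory, and enrolled-key table are illustrative, the `next` redirect parameter is an assumption, and a plain string comparison stands in for the cryptographic challenge-response a real security key performs.

```python
from dataclasses import dataclass
from urllib.parse import quote

LOGIN_URL = "https://login.corp.google.com/"  # login host from the example above


@dataclass
class Device:
    device_id: str
    os_patched: bool
    disk_encrypted: bool


# Hypothetical stand-ins for the user directory, device inventory and enrolled keys.
PASSWORDS = {"alice": "correct horse"}
DEVICE_INVENTORY = {"laptop-42"}
ENROLLED_KEYS = {"alice": "key-alice"}


def device_rules_pass(device: Device) -> bool:
    """Rules engine: the device must be inventoried and meet basic hygiene checks."""
    return device.device_id in DEVICE_INVENTORY and device.os_patched and device.disk_encrypted


def access(url: str, session: dict) -> str:
    """Access proxy: unauthenticated requests are redirected to the login service."""
    if not session.get("authenticated"):
        return f"redirect:{LOGIN_URL}?next={quote(url, safe='')}"
    return "ok"


def login(user: str, password: str, device: Device, key_response: str) -> dict:
    # Step 1: account password and device checks are decided together.
    if PASSWORDS.get(user) != password or not device_rules_pass(device):
        return {"authenticated": False, "reason": "credentials or device rejected"}
    # Step 2: second factor. A real security key answers a cryptographic
    # challenge; comparing a string here is only a placeholder.
    if ENROLLED_KEYS.get(user) != key_response:
        return {"authenticated": False, "reason": "security key check failed"}
    return {"authenticated": True, "user": user, "device": device.device_id}
```

A phished password alone fails at step 2, and a correct password used from an unknown device fails at step 1, which is why the decision combines user and device information.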

Zero Trust Model under Kubernetes

Network Zero Trust for Containers

We first introduce Calico, a network zero-trust component for containers. Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads. It supports a wide range of platforms, including Kubernetes, OpenShift, Docker EE, OpenStack, and bare-metal services. The greatest value of zero trust is that a zero-trust network stays resilient even if attackers compromise applications or infrastructure in various other ways: the architecture makes lateral movement difficult, and targeted pivoting activity is easier to detect.

For zero-trust container networking, Calico+Istio is currently a popular solution. Let's look at the differences between the two. They show that Istio provides access control for Pod-level workloads, while Calico provides access control at the Node level:

Aspect                     Istio        Calico
Layer                      L3-L7        L3-L4
Implementation             User space   Kernel space
Policy enforcement point   Pod          Node

The following focuses on some technical details of the Calico components and of Istio. Calico builds a layer-3 routable network through Felix, a daemon that runs on every Node. Felix is responsible for programming routes, ACL rules, and anything else required on the Node host to provide the network connectivity that workloads on that host need. Fine-grained access control is enforced through iptables on the Node. With Calico you can set a default-deny policy and then grant only the minimal access required through adaptive access-control policies, building a zero-trust system for containers. Dikastes/Envoy: optional Kubernetes sidecars that secure workload-to-workload communication with mutual TLS authentication and enforce additional control policies.
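The default-deny-plus-allow-rules idea can be sketched as a pure decision function at the L3-L4 level. This is only a model of the decision, under assumed names (`AllowRule`, `flow_allowed`) and hypothetical labels and ports; Calico's actual policy model (ordered policies and tiers, Allow/Deny/Pass actions, namespace and CIDR selectors) is richer, and Felix renders it into iptables rather than evaluating Python.

```python
from dataclasses import dataclass

Labels = dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """Equality-based label selector: every key/value pair must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class AllowRule:
    dst_selector: Labels  # workloads this rule protects
    src_selector: Labels  # workloads allowed to connect to them
    protocol: str
    port: int


def flow_allowed(rules: list[AllowRule], src: Labels, dst: Labels, protocol: str, port: int) -> bool:
    """Default deny: a flow passes only if at least one explicit allow rule matches it."""
    return any(
        selector_matches(r.dst_selector, dst)
        and selector_matches(r.src_selector, src)
        and r.protocol == protocol
        and r.port == port
        for r in rules
    )


# Hypothetical least-privilege rules: frontend -> backend, backend -> db.
RULES = [
    AllowRule({"app": "backend"}, {"app": "frontend"}, "TCP", 8080),
    AllowRule({"app": "db"}, {"app": "backend"}, "TCP", 5432),
]
```

With these rules, a compromised frontend cannot open a connection straight to the database even though both sit inside the same cluster, which is the lateral-movement restriction described above.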

Istio

Before discussing Istio, let's look at the security requirements and risks of microservices:

1. Once a microservice is compromised, an attacker can sniff its traffic and then mount man-in-the-middle attacks. To mitigate this risk, traffic must be encrypted.

2. To control access between microservices, mutual TLS and fine-grained access policies are needed.

3. Audit tools are needed to record who did what, and when.

Having analyzed these risks, let's look at how Istio implements a zero-trust architecture. First, the entire link is encrypted with mutual TLS (mTLS). Second, access between microservices is authenticated and authorized, and access is audited after it is granted. Istio separates the data plane from the control plane: the control plane distributes authorization policies and secure naming information to Envoy through Pilot, and on the data plane the microservices communicate through Envoy. An Envoy proxy is deployed alongside each microservice workload, and each proxy runs an authorization engine that authorizes requests at run time. When a request reaches the proxy, the authorization engine evaluates the request context against the current authorization policies and returns the result, ALLOW or DENY.
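The ALLOW/DENY evaluation can be sketched in Python. It follows the evaluation order Istio documents for `AuthorizationPolicy` (a matching DENY policy wins; if the workload has no ALLOW policies the request is allowed; otherwise the request must match at least one ALLOW policy), but it omits CUSTOM actions, prefix and wildcard matching, and most request attributes. The `principal` stands for the peer identity Envoy derives from the client's mTLS certificate; the service accounts and paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    # An empty list means "any value" for that attribute.
    principals: list[str] = field(default_factory=list)
    methods: list[str] = field(default_factory=list)
    paths: list[str] = field(default_factory=list)

    def matches(self, ctx: dict) -> bool:
        return (
            (not self.principals or ctx["principal"] in self.principals)
            and (not self.methods or ctx["method"] in self.methods)
            and (not self.paths or ctx["path"] in self.paths)
        )


@dataclass
class Policy:
    action: str  # "ALLOW" or "DENY"
    rules: list[Rule]

    def matches(self, ctx: dict) -> bool:
        # A policy with no rules matches nothing.
        return any(rule.matches(ctx) for rule in self.rules)


def authorize(policies: list[Policy], ctx: dict) -> str:
    """Evaluate DENY policies first, then ALLOW policies, for one workload."""
    if any(p.action == "DENY" and p.matches(ctx) for p in policies):
        return "DENY"
    allow_policies = [p for p in policies if p.action == "ALLOW"]
    if not allow_policies:
        return "ALLOW"  # no ALLOW policy: access is not restricted yet
    return "ALLOW" if any(p.matches(ctx) for p in allow_policies) else "DENY"


# Hypothetical policies for a "catalog" workload.
POLICIES = [
    Policy("ALLOW", [Rule(principals=["cluster.local/ns/shop/sa/frontend"],
                          methods=["GET"], paths=["/catalog"])]),
    Policy("DENY", [Rule(paths=["/admin"])]),
]
```

Note the zero-trust consequence of the last branch: as soon as one ALLOW policy exists, everything it does not explicitly match is denied, and an ALLOW policy with no rules denies all traffic.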

Zero Trust API Security for Microservices

42Crunch (https://42crunch.com/) extends API security from the enterprise edge to each individual microservice, protecting it with an ultra-low-latency micro API firewall that can be deployed at scale. In Kubernetes, the 42Crunch API firewall is deployed as a sidecar proxy in the Pod, with millisecond-level response times. This eliminates the work of writing and maintaining individual API security policies, implements a zero-trust security architecture, and improves API security for microservices. 42Crunch's API security capabilities include:

Auditing: running more than 200 security audit tests against OpenAPI specification definitions and producing detailed security scores, to help developers define and harden API security.

Scanning: scanning live API endpoints for potential vulnerabilities.

Protection: protecting APIs by deploying a lightweight, low-latency micro API firewall in front of applications.
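The core idea behind a contract-driven API firewall is a positive security model: only requests described by the API definition get through. This Python sketch does not reproduce 42Crunch's product or configuration format; the `CONTRACT` table, its paths, and its parameter patterns are hypothetical, and only the method and path (without query string) are checked.

```python
import re

# Hypothetical API contract, of the kind that could be derived from an OpenAPI
# definition: allowed (method, path template) pairs and a pattern per path parameter.
CONTRACT = {
    ("GET", "/users/{id}"): {"id": r"[0-9]{1,10}"},
    ("POST", "/orders"): {},
}


def path_matches(template: str, path: str, param_patterns: dict) -> bool:
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            # Path parameter: the value must fully match its declared pattern.
            if not re.fullmatch(param_patterns[t[1:-1]], p):
                return False
        elif t != p:
            return False
    return True


def firewall_allows(method: str, path: str) -> bool:
    """Positive security model: anything not described by the contract is rejected."""
    return any(
        m == method and path_matches(template, path, params)
        for (m, template), params in CONTRACT.items()
    )
```

Because the check is an allowlist, injection-style values (`/users/42;DROP`), undeclared methods, and unknown endpoints are all rejected without any attack signature having to be written for them.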

Best Practices from Ant's Zero Trust Architecture Rollout

As the Service Mesh architecture has evolved, Ant has begun rolling out service authentication between workloads internally. To build service authentication among workloads on top of Ant's architecture, we divided the problem into three sub-problems:

1. How to define workload identity and build a universal identity system.

2. How to implement an authorization model for access between workloads.

3. How to choose the access control enforcement point.

Workload identity definition & Authentication method

Internally, Ant uses the identity format defined by the SPIFFE project to describe workload identity, namely:

Spiffe:///cluster//ns/

However, during the rollout we found that this format was not fine-grained enough and was tightly coupled to how K8s partitions namespaces. Ant operates at a very large scale with many scenarios, and namespace partitioning rules differ from scenario to scenario. We therefore adjusted the format: for each scenario we identified the set of required attributes (such as application name and environment information) needed to identify a workload instance, and carried these attributes in the Pod's Labels. The adjusted format is as follows:

Spiffe:///cluster/
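The format lines above are truncated in this copy, so the following Python sketch does not reproduce Ant's exact layout. It assumes a hypothetical layout `spiffe://<trust-domain>/cluster/<cluster>/app/<app>/env/<env>` and hypothetical label keys `app` and `env`, and shows the two operations the text implies: deriving an identity from Pod labels and parsing it back into attributes.

```python
from urllib.parse import urlparse

# Hypothetical: the Pod label keys that carry identity attributes.
IDENTITY_LABELS = ("app", "env")


def identity_from_labels(trust_domain: str, cluster: str, labels: dict) -> str:
    """Build a SPIFFE-style ID from Pod labels (the path layout is illustrative only)."""
    missing = [key for key in IDENTITY_LABELS if not labels.get(key)]
    if missing:
        raise ValueError(f"missing identity labels: {missing}")
    path = f"cluster/{cluster}" + "".join(f"/{key}/{labels[key]}" for key in IDENTITY_LABELS)
    return f"spiffe://{trust_domain}/{path}"


def parse_identity(spiffe_id: str) -> dict:
    """Split a SPIFFE-style ID back into its trust domain and key/value attributes."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError("not a SPIFFE ID")
    parts = parsed.path.strip("/").split("/")
    if len(parts) % 2:
        raise ValueError("expected key/value path segments")
    return {"trust_domain": parsed.netloc, **dict(zip(parts[0::2], parts[1::2]))}
```

Keeping the attributes as key/value path segments means authorization rules can match on application or environment directly, without depending on how namespaces happen to be partitioned.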

To enforce this identity format, we added a Validating Webhook to the K8s API Server that checks the attributes that must be carried in the Labels above. If any required attribute is missing, the Pod cannot be created, as shown in the following figure:
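The webhook's decision logic can be sketched as a handler that receives an `admission.k8s.io/v1` AdmissionReview and returns one with `allowed` set. This is a minimal sketch: the HTTPS server and the ValidatingWebhookConfiguration that registers it are omitted, and the required label keys are hypothetical.

```python
# Hypothetical: label keys every Pod must carry for its identity to be derivable.
REQUIRED_LABELS = ("app", "env")


def review(admission_review: dict) -> dict:
    """Validate a Pod in an admission.k8s.io/v1 AdmissionReview and build the reply."""
    request = admission_review["request"]
    pod = request.get("object") or {}
    labels = (pod.get("metadata") or {}).get("labels") or {}
    missing = sorted(key for key in REQUIRED_LABELS if not labels.get(key))

    # The reply must echo the request uid so the API server can correlate it.
    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        # Rejected: the API server refuses to create the Pod and surfaces this message.
        response["status"] = {
            "code": 403,
            "message": "pod is missing identity labels: " + ", ".join(missing),
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

Rejecting at admission time guarantees that every running Pod already carries the attributes its identity is built from, so no workload can start without a derivable identity.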

With workload identity defined, what remains is to convert the identity into a verifiable format and pass it along the service invocation path between workloads. To support different usage scenarios, we chose both X.509 certificates and JWTs.

In the Service Mesh architecture, we store the identity information in the Subject field of an X.509 certificate to carry the workload's identity, as shown in the following figure:

In other scenarios, we store the identity information in JWT Claims, and a Secure Sidecar provides JWT issuance and verification services, as shown in the following figure:
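Carrying identity in JWT claims can be sketched with the standard library alone. The article does not say which signing algorithm the Secure Sidecar uses; this sketch assumes HS256 with a shared key purely for brevity (a real deployment would more likely use asymmetric signatures so verifiers cannot mint tokens), and the identity and audience strings are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: bytes) -> str:
    return _b64url(hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue(identity: str, audience: str, key: bytes, ttl: int = 300, now: Optional[int] = None) -> str:
    """Issue an HS256 JWT whose `sub` claim carries the workload identity."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": identity, "aud": audience, "iat": now, "exp": now + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    return signing_input + "." + _sign(signing_input, key)


def verify(token: str, audience: str, key: bytes, now: Optional[int] = None) -> dict:
    """Check signature, algorithm, audience and expiry; return the claims on success."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(f"{header_b64}.{claims_b64}", key)
    # Constant-time comparison of the encoded signatures.
    if not hmac.compare_digest(expected.encode("ascii"), sig_b64.encode("utf-8")):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims
```

The signature is checked before any claim is trusted, and the short `ttl` together with the `aud` check limits how far a leaked token can be replayed.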

Authorization model

In the initial stage of the rollout, we used the RBAC model to describe authorization policies for service invocation between workloads; for example, a service of application A may be invoked only by application B. This works in most scenarios, but during the project we found that it does not fit some special cases. Consider an application A in the production network that provides a centralized service for dynamic configuration that every application in the production network needs at run time. Its service definition, an RPC service through which applications fetch their dynamic configuration, is as follows:

syntax = "proto3";

message FetchResourceRequest {
  // The appname of the invoker
  string appname = 1;
  // The ID of the resource
  string resource_id = 2;
}

message FetchResourceResponse {
  string data = 1;
}

service DynamicResourceService {
  rpc FetchResource (FetchResourceRequest) returns (FetchResourceResponse) {}
}

In this scenario, the RBAC model cannot express application A's access control policy: because every application needs to call application A's service, the role-based policy has to admit all callers. This creates an obvious security problem, since a caller such as application B can then obtain other applications' resources through this service. We therefore upgraded from the RBAC model to the ABAC model to solve this problem. We describe the ABAC logic in a DSL and integrated it into the Secure Sidecar.
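The difference between the two models for `FetchResource` can be sketched in Python. Ant's DSL is not shown in the article, so this is a Python stand-in for the kind of rule it would express; the resource-ownership table and application names are hypothetical. The key assumption is that `caller_app` comes from the verified identity (certificate or JWT), not from the self-reported `appname` field.

```python
# Hypothetical ownership data: the application each dynamic-configuration resource belongs to.
RESOURCE_OWNER = {"cfg-b-001": "app-b", "cfg-c-001": "app-c"}

# RBAC: application A's service has to admit every application in the production network.
ALLOWED_CALLERS = {"app-b", "app-c"}


def rbac_allows(caller_app: str, request: dict) -> bool:
    """Role check only: the requested resource plays no part in the decision."""
    return caller_app in ALLOWED_CALLERS


def abac_allows(caller_app: str, request: dict) -> bool:
    """Attribute check: a caller may only fetch resources its own application owns.

    `caller_app` must come from the verified identity (certificate or JWT),
    never from the self-reported `appname` field in the request body.
    """
    owner = RESOURCE_OWNER.get(request.get("resource_id"))
    return owner is not None and owner == caller_app and request.get("appname") == caller_app
```

Under RBAC, application B asking for `cfg-c-001` is allowed because B is a legitimate caller; under ABAC the same request is denied because the resource's owner attribute does not match the caller's identity attribute.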

Selection of Access Control Enforcement Point

For the choice of enforcement point, since rolling out the Service Mesh architecture will take some time, we provide two approaches, so that both the Service Mesh architecture and current scenarios are supported.

In the Service Mesh architecture, an RBAC Filter and an ABAC Filter (Access Control Filters) are integrated into the Mesh Sidecar.

In current (non-mesh) scenarios, we provide a Java SDK that applications integrate to handle all authentication- and authorization-related logic. As in the Service Mesh scenario, identity issuance, verification, and authorization are all performed through interaction with the Secure Sidecar.
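Whether the checks run as filters in the Mesh Sidecar or behind an SDK call, the enforcement flow has the same shape: an ordered chain in which any filter can deny. This Python sketch assumes a hypothetical filter interface, hypothetical context keys (`caller`, `method`, `resource_id`), and hypothetical data tables; it is not Ant's filter API. `caller` stands for the identity already verified from the certificate or JWT.

```python
from typing import Callable, Optional

# A filter returns None to let the request continue, or a reason string to deny it.
Filter = Callable[[dict], Optional[str]]

# Hypothetical RBAC table: which caller applications may invoke which method.
METHOD_CALLERS = {"DynamicResourceService/FetchResource": {"app-b", "app-c"}}
# Hypothetical attribute data used by the ABAC filter.
RESOURCE_OWNER = {"cfg-b-001": "app-b"}


def rbac_filter(ctx: dict) -> Optional[str]:
    if ctx["caller"] not in METHOD_CALLERS.get(ctx["method"], set()):
        return "caller not permitted to invoke method"
    return None


def abac_filter(ctx: dict) -> Optional[str]:
    if RESOURCE_OWNER.get(ctx.get("resource_id")) != ctx["caller"]:
        return "caller does not own the requested resource"
    return None


def enforce(filters: list[Filter], ctx: dict) -> str:
    """Run filters in order; the first denial short-circuits the chain."""
    for f in filters:
        reason = f(ctx)
        if reason is not None:
            return "DENY: " + reason
    return "ALLOW"


CHAIN = [rbac_filter, abac_filter]
```

Running the cheap role check first rejects unknown callers before any attribute lookup, and keeping the chain independent of its host is what lets the same logic back both the sidecar and the SDK path.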

Conclusion

The core of zero trust is "Never Trust, Always Verify". Going forward, we will continue to deepen zero-trust practice across Alibaba, giving distinct identities to different roles such as employees, applications, and machines, and pushing access control points down into every part of the cloud native infrastructure to achieve global fine-grained control and establish a new security boundary. Moving from industry best practices for zero trust to zero-trust implementation on Kubernetes, this article has briefly described how zero trust can be implemented for cloud native infrastructure. We hope it sparks more discussion of zero-trust architecture for cloud native and that more excellent solutions and products emerge in the industry.

Highlights of this book

A detailed account of the problems encountered, and the solutions adopted, in running ultra-large K8s clusters for Double 11 (Shuang 11).

The best combination for going cloud native, Kubernetes + containers + Shenlong (X-Dragon), and the technical details of moving 100% of the core systems to the cloud.

The ultra-large-scale Service Mesh rollout solution used for Double 11.
