
The Discoverer of Kubernetes's First Serious Security Vulnerability on the Discovery Process and Mechanism


On November 26 (North American time), a serious Kubernetes security vulnerability discovered by Darren Shepherd, co-founder and chief architect of Rancher Labs, was disclosed. The vulnerability, CVE-2018-1002105 (also known as the Kubernetes privilege escalation vulnerability, https://github.com/kubernetes/kubernetes/issues/71411), carries a severity score of 9.8 out of 10: a malicious user can use the Kubernetes API server to open a connection to a back-end server and then send arbitrary requests over that connection, authenticated by the API server's own TLS credentials. What makes the vulnerability so serious is that it can be exploited remotely, the attack is not complex, and it requires no user interaction or special privileges.

After the vulnerability was discovered and verified, the Kubernetes team responded quickly and released the patched versions v1.10.11, v1.11.5, v1.12.3 and v1.13.0-rc.1. Users still running Kubernetes v1.0.x through v1.9.x are advised to stop and upgrade to a patched version immediately.

This article was written by Darren Shepherd, co-founder and chief architect of Rancher Labs and the discoverer of the vulnerability. In it he describes how he found the vulnerability, analyzes the mechanism behind the problem, and shares the corresponding fixes as well as his own views on Kubernetes and the open source community.

Darren Shepherd, co-founder and chief architect of Rancher Labs, is also one of only four top individual contributors to the Docker Governance Advisory Board (DGAB), the core governance body of the Docker ecosystem.

The problem with Amazon ALB

It all started in 2016, when Rancher Labs released Rancher 1.6. In mid-2016, Amazon released ALB, a new HTTP (layer 7) load balancer. ALB is much easier to set up than ELB, so we recommended that users use it. Soon afterwards we began to receive reports of broken setups with ALB in front of the backend: many requests randomly failed with 401, 403, 404 or 503 errors. The Rancher Labs team could not reproduce these errors, and none of the logs we got from community members helped. We could see the HTTP requests and responses, but could not correlate them with the code. At the time we concluded that, since ALB had only just been released, the product itself might be at fault; apart from ALB, we had never had problems with any other load balancer. So in the end we told users not to use ALB.

In August of this year, a member of the Rancher community reported the same issue against Rancher 2.1 (https://github.com/rancher/rancher/issues/14931). As before, using ALB produced odd 401 and 403 errors. This really caught my attention, because Rancher 1.x and 2.x share no common code, and ALB should be quite mature by now. After digging in repeatedly, I found that the problem came down to not handling non-101 responses combined with a reverse proxy that caches TCP connections. To really understand it, you need to understand TCP connection reuse, how websockets use TCP connections, and HTTP reverse proxies.

TCP connection reuse

In a very naive HTTP implementation, the client opens a TCP socket, sends the HTTP request, reads the HTTP response, and then closes the TCP socket. You quickly find that you are spending too much time opening and closing TCP connections. The HTTP protocol therefore has a built-in keep-alive mechanism so that clients can reuse TCP connections across requests.
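To make the reuse mechanism concrete, here is a minimal Go sketch (the URL is only a placeholder): the default http.Transport keeps idle connections open, and httptrace reports whether each request got a fresh connection or reused one.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptrace"
)

func main() {
	// Record whether each request got a brand-new or a reused connection.
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("reused connection: %v\n", info.Reused)
		},
	}

	client := &http.Client{} // uses http.DefaultTransport, keep-alives enabled

	for i := 0; i < 2; i++ {
		req, err := http.NewRequest("GET", "http://example.com/", nil)
		if err != nil {
			panic(err)
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		// The body must be fully read and closed, or the connection
		// cannot go back into the idle pool for reuse.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```

On the second iteration the trace typically reports a reused connection: the same TCP socket carried both requests.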

WebSockets

WebSockets provide two-way communication and work differently from the HTTP request/response flow. To use websockets, the client first sends an HTTP upgrade request, and the server responds with an HTTP "101 Switching Protocols" response. Once the 101 is received, the TCP connection is dedicated to the websocket. For the rest of its life cycle, that TCP connection is considered to belong to that websocket connection, which means it will never be reused for other HTTP requests.
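A minimal sketch of that handshake at the HTTP level, assuming a hypothetical server at ws.example.com speaking WebSocket on /ws (a real client would use a library such as gorilla/websocket; this only shows the Upgrade request and the expected 101 status):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"net/http"
)

func main() {
	conn, err := net.Dial("tcp", "ws.example.com:80") // placeholder host
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The client asks to upgrade the HTTP connection to the websocket protocol.
	fmt.Fprint(conn, "GET /ws HTTP/1.1\r\n"+
		"Host: ws.example.com\r\n"+
		"Connection: Upgrade\r\n"+
		"Upgrade: websocket\r\n"+
		"Sec-WebSocket-Version: 13\r\n"+
		"Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n\r\n")

	// Only a 101 means the TCP connection is now dedicated to websocket
	// traffic; any other status means this is still an ordinary HTTP exchange.
	resp, err := http.ReadResponse(bufio.NewReader(conn), nil)
	if err != nil {
		panic(err)
	}
	if resp.StatusCode == http.StatusSwitchingProtocols {
		fmt.Println("101 received: connection is now a websocket")
	} else {
		fmt.Println("no upgrade, got:", resp.Status)
	}
}
```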

HTTP reverse proxy

HTTP reverse proxies (load balancers are reverse proxies) receive requests from clients and forward them to different servers. For a standard HTTP request, the proxy simply writes the request, reads the response, and sends the response back to the client. The logic is fairly straightforward, and Go even ships a built-in reverse proxy: https://golang.org/pkg/net/http/httputil/#ReverseProxy.
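For the ordinary request/response case, that built-in proxy is all you need. A minimal sketch (the backend address and port are placeholders):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	backend, err := url.Parse("http://127.0.0.1:8080") // placeholder backend
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Every request arriving on :9090 is forwarded to the backend above,
	// and the backend's response is copied back to the client.
	log.Fatal(http.ListenAndServe(":9090", proxy))
}
```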

Websockets, by contrast, are a little more complicated. For a websocket, the proxy has to look at the request, see that it is an upgrade request, send the request on, read the 101 response, then hijack the TCP connection and start copying bytes back and forth. From that point on the proxy never looks at the contents of the connection again; it just becomes a "dumb pipe". This logic does not exist in the Go standard library, so many open source projects have written their own code to do it.
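Here is a deliberately simplified sketch of that pattern (placeholder backend address, no TLS, minimal error handling): forward the upgrade request, hijack the client's TCP connection, and copy bytes both ways. Note that, like a lot of real-world code, nothing in it verifies that the backend actually answered 101; the next section explains why that omission is the bug.

```go
package main

import (
	"io"
	"log"
	"net"
	"net/http"
)

const backendAddr = "127.0.0.1:8080" // placeholder backend

func upgradeProxy(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Upgrade") == "" {
		http.Error(w, "not an upgrade request", http.StatusBadRequest)
		return
	}

	// Forward the client's upgrade request to the backend over a raw TCP conn.
	backend, err := net.Dial("tcp", backendAddr)
	if err != nil {
		http.Error(w, "backend unreachable", http.StatusBadGateway)
		return
	}
	defer backend.Close()
	if err := r.Write(backend); err != nil {
		http.Error(w, "failed to forward request", http.StatusBadGateway)
		return
	}

	// Take over the raw client connection from Go's HTTP server.
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking unsupported", http.StatusInternalServerError)
		return
	}
	client, clientRW, err := hj.Hijack()
	if err != nil {
		return
	}
	defer client.Close()

	// The "dumb pipe": whatever the backend answered (101 or not!) is copied
	// straight to the client, and bytes flow blindly in both directions.
	go io.Copy(backend, clientRW)
	io.Copy(client, backend)
}

func main() {
	log.Fatal(http.ListenAndServe(":9090", http.HandlerFunc(upgradeProxy)))
}
```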

Where is the mistake?

The mistake, without staring at the code for too long, is that Kubernetes did not check the response code before starting the "dumb pipe". In the code's defense, it is common not to check for the 101. (This is also why, as mentioned above, Rancher users hit the problem in both Rancher 1.x and Rancher 2.x, even though the two share no code.) The failure scenario goes like this:

The client sends a websocket upgrade request

The reverse proxy forwards the upgrade request to the back-end server

The back-end server responds with a 404

The reverse proxy starts the copy loop anyway and writes the 404 back to the client

The client sees the 404 response and returns the TCP connection to its idle connection pool

Now, if the client reuses that TCP connection, whatever it writes to it goes down the "dumb pipe" in the reverse proxy straight to the earlier backend. Usually this is not too bad, for example with a plain load balancer, because all requests go to the same set of interchangeable backends. It becomes a problem when the reverse proxy is smart and performs authentication, authorization and routing, which is exactly what Kubernetes does.
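The essential guard looks something like the sketch below (this is not the literal Kubernetes patch, just an illustration of the missing check): read the backend's reply first, and only fall through to the dumb pipe on a genuine 101. Any other status is relayed as a normal HTTP response, so the client's connection is never left wired to the backend and can be pooled safely. The "backend" connection and names here are the hypothetical ones from the previous sketch.

```go
package proxy

import (
	"bufio"
	"io"
	"net"
	"net/http"
)

// pipeAllowed reports whether the proxy may hijack the client connection and
// start copying bytes. In a real proxy the bufio.Reader created here would be
// kept and used for the subsequent copy so no buffered bytes are lost.
func pipeAllowed(w http.ResponseWriter, r *http.Request, backend net.Conn) bool {
	resp, err := http.ReadResponse(bufio.NewReader(backend), r)
	if err != nil {
		http.Error(w, "bad backend response", http.StatusBadGateway)
		return false
	}
	if resp.StatusCode != http.StatusSwitchingProtocols {
		// Not an upgrade after all: send the backend's answer back through
		// the normal response path and stop. No dumb pipe, no poisoned
		// connection left in anyone's idle pool.
		for k, vals := range resp.Header {
			for _, v := range vals {
				w.Header().Add(k, v)
			}
		}
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
		resp.Body.Close()
		return false
	}
	return true // 101 received: now it is safe to hijack and pipe bytes
}
```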

The security vulnerability

Because the 101 is never checked, the client can end up holding a TCP connection that is a "dumb pipe" to some backend service it previously reached. That leads to privilege escalation, because for many requests Kubernetes only performs authorization in the reverse proxy. This means that if I send a websocket upgrade request that fails authorization and is routed to a kubelet, I keep a persistent connection to that kubelet and can then run any API command I choose, whether I am authorized or not; for example, I can run exec on any pod and copy out its secrets. In this situation, an authenticated user can essentially gain full API access to the kubelet (and the same applies to services exposed through the API aggregation layer).

Another problem arises when you add yet another reverse proxy, that is, when you put an HTTP load balancer in front of the Kubernetes API (as opposed to a layer 4 load balancer). In that case, the authenticated TCP connection now acting as a "dumb pipe" gets added to an idle pool that serves any user. User A creates the TCP connection, and user B's requests can then be sent over it. This can give unauthenticated users access to your Kubernetes cluster.

You may be panicking at this point, because of course everyone puts a load balancer in front of kube-apiserver. Well: first, you must be running an HTTP load balancer, not a TCP one; the load balancer has to understand HTTP semantics for this problem to occur. Second, fortunately, most reverse proxies do not even care whether a 101 comes back: after seeing an upgrade request, most load balancers will not reuse that TCP connection regardless of the response. This is why the problem went undiscovered for so long in so many open source projects. So if you are affected by this vulnerability, your Kubernetes setup is probably already flaky, and you should be seeing requests that randomly fail or never complete. At least, that is how I know ALB behaves, so do not use ALB until you have upgraded to a Kubernetes version that fixes this vulnerability.

In short, this Kubernetes security vulnerability allows any authenticated user with the right permissions to escalate to far greater privileges. If you run a hard multi-tenancy cluster (one with untrusted users), you should be genuinely worried and respond immediately. If you are not worried about your users actively attacking one another (which is the case in most multi-tenant clusters), don't panic; just upgrade to a Kubernetes version in which the vulnerability is fixed. The worst case, unauthenticated users getting into your cluster, may well be blocked by your load balancer anyway. And as long as you do not expose the API to the whole world and have reasonable ACLs in place, your cluster may be safe.

Rancher safeguards Kubernetes users

Users of the Rancher Kubernetes platform do not need to be nervous.

Users who deploy their clusters on an internal network also do not need to worry too much about this problem, because the cluster cannot be reached directly from the outside.

Users who deployed their Kubernetes clusters through Rancher 2.0 or RKE also do not need to worry too much, because clusters deployed through Rancher 2.0 or RKE disable anonymous user access by default. For the privilege escalation via pod exec/attach/portforward, the general fix released by Kubernetes is to upgrade to the specified patched versions, and Rancher has issued a corresponding fix as well. For details, see https://forums.rancher.com/t/rancher-security-advisory-kubernetes-cve-2018-1002105/12598.

Thanks to open source

I feel deeply that the discovery, the fix and the final delivery of this Kubernetes security vulnerability are a testament to the vitality of the open source community. I first found the problem because of feedback from Rancher's non-paying open source users. In fact, we had confirmed that the issue did not affect paying Rancher 2.x customers, because Rancher's HA architecture neutralizes the ALB behavior, but we dug into the problem anyway because we care so much about our open source users. It was while investigating and fixing that problem that I discovered the security flaw in Kubernetes itself and reported it to the Kubernetes community through the established security disclosure process.
