This article walks through the problems encountered while scaling Kubernetes to 2,500 nodes and how they were solved.
Since version 1.6, Kubernetes has claimed to support clusters of more than 5,000 nodes, but problems are hard to avoid on the way from tens of nodes to that scale.
Problems encountered and how they were solved
Problem 1: 1 ~ 500 nodes
Question:
kubectl sometimes times out (P.S. kubectl -v=6 displays the details of every API call)
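For reference, the verbosity flag works on any kubectl command (a sketch):
kubectl get nodes -v=6   # level 6 logs each API request URL, response code, and round-trip time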
Try to resolve:
At first we suspected the kube-apiserver was overloaded, so we tried adding a proxy in front of several replicas to spread the load.
However, even after growing to more than 10 master replicas, the problem clearly was not kube-apiserver load: GKE serves 500 nodes from a single 32-core VM.
Reason:
Having ruled that out, we started troubleshooting the remaining services on the master (etcd, kube-proxy).
We then turned to tuning etcd.
Viewing etcd throughput in Datadog revealed abnormal latency spikes (~100 ms).
A performance evaluation with the fio tool showed that only about 10% of the available IOPS (I/O operations per second) were being used; performance was instead degraded by write latency (~2 ms per write). A sample benchmark command is shown below.
We moved etcd's storage from a network-attached disk to a local temp drive (SSD) on each machine.
Result: latency dropped from ~100 ms to ~200 µs.
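The fio check can be reproduced with a command along these lines, a sketch in the style of the usual etcd disk benchmark (the directory and sizes are illustrative); it reports write and fsync latency percentiles alongside IOPS:
fio --name=etcd-disk-check --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300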
Problem 2: around 1,000 nodes
Question:
kube-apiserver was reading 500 MB per second from etcd.
Try to resolve:
We inspected the network traffic between containers with Prometheus.
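With the cAdvisor metrics that Prometheus scrapes from each kubelet, per-pod transmit traffic can be examined with a query along these lines (a sketch; the grouping label is pod or pod_name depending on the Kubernetes version):
sum(rate(container_network_transmit_bytes_total[5m])) by (pod)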
Reason:
Fluentd and Datadog were scraping data from each node too frequently.
After lowering the scraping frequency of both services, traffic dropped from 500 MB/s to almost zero.
etcd tip: with --etcd-servers-overrides, Kubernetes Event data can be split out and handled by separate etcd machines, as shown below:
--etcd-servers-overrides=/events#https://0.example.com:2381;https://1.example.com:2381;https://2.example.com:2381
Problem 3: 1,000 ~ 2,000 nodes
Question:
etcd stopped accepting writes, reporting a cascading failure.
kubernetes-ec2-autoscaler did not report the problem until every etcd instance had stopped and been shut down.
Try to resolve:
We guessed that etcd's disk was full, but checking the SSD showed plenty of free space.
We then checked whether a storage quota was configured and found a default 2 GB limit.
Solution:
Add --quota-backend-bytes to the etcd startup parameters to raise the quota (see the example after this list).
Modify the kubernetes-ec2-autoscaler logic: if more than 50% of nodes show problems, shut down the cluster.
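A minimal sketch of the quota change, using the standard etcd flag (the 8 GiB value is illustrative):
etcd --quota-backend-bytes=8589934592   # raise the backend quota from the 2 GB default to 8 GiB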
Optimizing high availability of the kube-master services
Generally speaking, our architecture is one kube-master (the node hosting the main Kubernetes control-plane components: kube-apiserver, kube-scheduler, and kube-controller-manager) plus multiple worker nodes. To achieve high availability, the following approach is used:
kube-apiserver should be run as multiple instances, restarted with the --apiserver-count parameter set to the number of replicas.
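For example, with three API server replicas behind a load balancer, each instance would be started along these lines (a sketch showing only the relevant flag):
kube-apiserver --apiserver-count=3   # matches the number of kube-apiserver replicas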
kubernetes-ec2-autoscaler can shut down idle resources automatically, but this works against the Kubernetes scheduler's default spreading behavior; the settings below make the scheduler pack workloads onto as few nodes as possible so that idle nodes can actually be reclaimed.
{"kind": "Policy", "apiVersion": "v1", "predicates": [{"name": "GeneralPredicates"}, {"name": "MatchInterPodAffinity"}, {"name": "NoDiskConflict"}, {"name": "NoVolumeZoneConflict"}, {"name": "PodToleratesNodeTaints"}], "priorities": [{"name": "MostRequestedPriority", "weight": 1} {"name": "InterPodAffinityPriority", "weight": 2}]}
The above is an example of adjusting the Kubernetes scheduler: increasing the weight of InterPodAffinityPriority helps pack pods together and achieves our goal. See the scheduler documentation for more examples.
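In the Kubernetes versions discussed here, one common way to load such a policy was the kube-scheduler --policy-config-file flag (the path below is illustrative; a ConfigMap could also be used):
kube-scheduler --policy-config-file=/etc/kubernetes/scheduler-policy.json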
Note that the Kubernetes scheduler policy does not support dynamic reloading, so kube-apiserver has to be restarted after a change (issue 41600).
The impact of adjusting scheduler policy
OpenAI used KubeDNS, but soon discovered problems:
Question:
DNS lookups frequently failed (at random).
Domain lookups exceeded ~200 QPS.
Try to resolve:
While investigating, we found more than 10 KubeDNS replicas running on some nodes.
Solution:
Because of the scheduler policy above, many pods were packed onto the same nodes.
KubeDNS is lightweight, so multiple replicas easily landed on the same node, concentrating all domain lookups there.
We needed a pod anti-affinity rule so that KubeDNS replicas are spread across different nodes as much as possible.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname

Slow Docker image pulls when creating new nodes
Question:
Every time a new node was brought up, pulling the Docker images took about 30 minutes.
Try to resolve:
There is a very large container image (Dota), almost 17 GB, which held up image pulling for the entire node.
We checked whether kubelet has other image-pull options.
Solution:
Add the --serialize-image-pulls=false option to the kubelet startup parameters so that images are pulled in parallel and other services can start pulling earlier (see the kubelet startup options; a combined example follows below).
This option requires the Docker storage driver to be overlay2 (see the Docker documentation).
Storing Docker images on SSD also makes pulls faster.
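A minimal sketch of those two settings together, assuming a Docker-based node (the daemon.json path is the usual default):
# kubelet: pull multiple images in parallel instead of one at a time
kubelet --serialize-image-pulls=false

# /etc/docker/daemon.json: parallel pulls should not be used with the aufs driver
{
  "storage-driver": "overlay2"
}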
Supplement: source trace
// serializeImagePulls when enabled, tells the Kubelet to pull images one
// at a time. We recommend *not* changing the default value on nodes that
// run docker daemon with version < 1.9 or an Aufs storage backend.
// Issue #10959 has more details.
SerializeImagePulls *bool `json:"serializeImagePulls"`

To increase the speed of docker image pull
In addition, pull speed can be improved in the following ways:
Increase the kubelet parameter --image-pull-progress-deadline to 30 minutes, and set the Docker daemon parameter max-concurrent-downloads to 10 so that layers are downloaded concurrently (see the example below).
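A sketch of those two settings (the values follow the text above):
# kubelet: allow up to 30 minutes without progress before an image pull is cancelled
kubelet --image-pull-progress-deadline=30m

# /etc/docker/daemon.json: download up to 10 image layers concurrently
{
  "max-concurrent-downloads": 10
}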
Network performance improvement
Flannel performance limit
Network traffic between OpenAI's nodes needs to reach 10-15 Gbit/s, but with Flannel it dropped to ~2 Gbit/s.
The solution was to remove Flannel and use the host network directly:
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
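In a pod spec these two fields sit directly under spec, for example (a minimal sketch; the names and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-heavy-worker
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: worker
    image: example.com/worker:latest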
That covers the problems and solutions encountered in scaling Kubernetes to 2,500 nodes. I hope the content above is helpful.