Author | Shengdong, Aliyun after-sales technical expert
Guide: You may not have noticed a new reality: most of the time, we no longer use a system through a command line or a graphical window on the machine in front of us, as we once did.
Preface
When we browse Weibo or shop online today, what we actually operate is not the device in front of us, but one cluster after another. A typical cluster has hundreds of nodes, each a physical or virtual machine. Clusters generally sit far away from their users, in data centers. To make these nodes cooperate and provide consistent, efficient service, the cluster needs an operating system. Kubernetes is such an operating system.
Comparing Kubernetes with a stand-alone operating system, Kubernetes plays the role of the kernel: it manages the cluster's software and hardware resources and provides a unified entrance through which users can access and communicate with the cluster.
Programs running on a cluster are very different from ordinary programs. They are programs "in a cage": unusual in how they are built, how they are deployed, and how they are used. Only by digging down to the roots can we understand their essence.
A Program "in a Cage"
Code
We use the Go language to write a simple web server program, app.go, which listens on port 2580. Accessing the root path of the service over HTTP returns the string "This is a small app for kubernetes...".
package main

import (
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

func about(w http.ResponseWriter, r *http.Request) {
	w.Write([]byte("This is a small app for kubernetes...\n"))
}

func main() {
	r := mux.NewRouter()
	r.HandleFunc("/", about)
	log.Fatal(http.ListenAndServe("0.0.0.0:2580", r))
}
Use the go build command to compile the program into the executable app. This is an ordinary executable that runs in the operating system and depends on the system's library files.
# ldd app
	linux-vdso.so.1 =>  (0x00007ffd1f7a3000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f554fd4a000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f554f97d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f554ff66000)

The Cage
To free this program from dependence on the operating system's own library files, we build a container image for it, that is, an isolated runtime environment. A Dockerfile is the "recipe" for building a container image. Our recipe has only two steps: start from a CentOS base image, and copy the executable app into the image's /usr/local/bin directory.
FROM centos
ADD app /usr/local/bin

Image Address
The finished image is stored locally, so we need to upload it to an image repository, which is the equivalent of an app store. We use Aliyun's image repository. After uploading, the image address is:
registry.cn-hangzhou.aliyuncs.com/kube-easy/app:latest
The image address can be split into four parts: repository address/namespace/image name:image version. The image above lives in Aliyun's Hangzhou image repository, under the namespace kube-easy, with image name:version app:latest. At this point, we have a "caged" little program that can run on a Kubernetes cluster.
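To make the four parts concrete, the Go sketch below splits a well-formed image address of exactly this shape. It is a toy for illustration only; real tools such as docker also handle default registries, missing tags, and digests.

package main

import (
	"fmt"
	"strings"
)

// splitImage breaks a full image address into its four parts: repository
// address, namespace, image name, and version. A toy sketch that assumes
// a well-formed "registry/namespace/name:version" input.
func splitImage(ref string) (registry, namespace, name, version string) {
	parts := strings.SplitN(ref, "/", 3)
	registry, namespace = parts[0], parts[1]
	nameVer := strings.SplitN(parts[2], ":", 2)
	name, version = nameVer[0], nameVer[1]
	return
}

func main() {
	reg, ns, name, ver := splitImage("registry.cn-hangzhou.aliyuncs.com/kube-easy/app:latest")
	fmt.Println(reg, ns, name, ver)
	// Output: registry.cn-hangzhou.aliyuncs.com kube-easy app latest
}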
Getting in Through the Door
Like an ordinary operating system, Kubernetes has the concept of an API. With an API, the cluster has an entrance; with an API, we have a way in. The Kubernetes API is implemented as a component called API Server that runs on cluster nodes. This component is a typical web server program that provides services by exposing an HTTP(S) interface.
Here we create an Aliyun Kubernetes cluster. After logging in to the cluster management page, we can see the public network endpoint of API Server.
API Server intranet connection endpoint: https://xx.xxx.xxx.xxx:6443

Two-way Digital Certificate Verification
The API Server component of an Aliyun Kubernetes cluster uses CA-signed two-way digital certificate authentication to secure communication between the client and API Server. This sentence is a mouthful and hard for beginners to parse, so let's unpack it in depth.
Conceptually, a digital certificate is a file used to verify a participant in network communication. It is similar to a diploma a school issues to a student: the school is the trusted third party, the CA, and the student is the communication participant. If society at large trusts a school's reputation, the diplomas it issues will also be recognized by society. The participant certificate and the CA certificate are analogous to the student's diploma and the school's license.
Here we have two types of participants, CAs and ordinary participants; correspondingly, two kinds of certificates, CA certificates and participant certificates; and two relationships, the certificate issuance relationship and the trust relationship. These two relationships are crucial.
Let's look at the issuance relationship first. As shown in the figure below, we have two CA certificates and three participant certificates.
The top CA certificate issues two certificates: the middle CA certificate and the participant certificate on the right. The middle CA certificate in turn issues the two participant certificates at the bottom. These five certificates are linked by the issuance relationship into a tree-shaped certificate issuance diagram.
However, certificates and issuance relationships by themselves do not guarantee that trusted communication can take place between participants. As an example, suppose the rightmost participant is a website and the leftmost participant is a browser. The browser trusts the website's data not because the website holds a certificate, nor because the website's certificate was issued by a CA, but because the browser trusts the top CA: that is the trust relationship.
Having understood CA certificates, participant certificates, issuance relationships, and trust relationships, we can return to "CA-signed two-way digital certificate authentication". As ordinary communication participants, the client and API Server each hold a certificate. These certificates are issued by two CAs, which we will simply call the cluster CA and the client CA. The client trusts the cluster CA, so it trusts API Server, which holds a certificate issued by the cluster CA. In the other direction, API Server trusts the client CA, and is therefore willing to communicate with a client holding a certificate issued by the client CA.
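To make two-way authentication concrete, here is a minimal Go sketch of such a client. It assumes the client certificate, key, and cluster CA certificate are in files named client.crt, client.key, and ca.crt (the same names used in the curl test later in this article), and uses a placeholder endpoint.

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// The client's own certificate and key: presented to API Server,
	// which verifies them against the client CA it trusts.
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		log.Fatal(err)
	}

	// The cluster CA certificate: used by the client to verify the
	// certificate that API Server presents.
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert}, // what we show API Server
				RootCAs:      caPool,                  // whom we trust
			},
		},
	}

	// Placeholder endpoint; substitute your own cluster's address.
	resp, err := client.Get("https://xx.xx.xx.xxx:6443/api/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}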
In Aliyun Kubernetes clusters, the cluster CA certificate and the client CA certificate are, in implementation, actually one and the same certificate, so the relationship diagram looks like this.
KubeConfig file
Logging in to the cluster management console, we can obtain the KubeConfig file. This file includes the client certificate and the cluster CA certificate, among other things. The certificates are base64-encoded, so we can decode them with the base64 tool and inspect the certificate text with openssl.
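The article does this inspection with command-line tools; the same can be done programmatically. The Go sketch below assumes its standard input is a base64-encoded PEM certificate copied from a KubeConfig field such as client-certificate-data, and prints the issuer and subject fields that the analysis below revolves around.

package main

import (
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

func main() {
	// Read a base64 string from standard input, e.g. piped from the
	// client-certificate-data field of a KubeConfig file.
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		log.Fatal(err)
	}
	// KubeConfig stores certificates as base64-encoded PEM.
	pemBytes, err := base64.StdEncoding.DecodeString(strings.TrimSpace(string(raw)))
	if err != nil {
		log.Fatal(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		log.Fatal("input is not PEM-encoded")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Issuer: ", cert.Issuer)
	fmt.Println("Subject:", cert.Subject)
}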
First, the issuer CN of the client certificate is the cluster id, c0256a3b8e4b948bb9c21e66b0e1d9a72, while the CN of the certificate itself is the sub-account, 252771643302762862:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 787224 (0xc0318)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=c0256a3b8e4b948bb9c21e66b0e1d9a72, OU=default, CN=c0256a3b8e4b948bb9c21e66b0e1d9a72
        Validity
            Not Before: Nov 29 06:03:00 2018 GMT
            Not After : Nov 28 06:08:39 2021 GMT
        Subject: O=system:users, OU=, CN=252771643302762862

Second, the client certificate above can pass API Server's verification only if API Server trusts the client CA certificate. The kube-apiserver process specifies the client CA certificates it trusts through the parameter client-ca-file; here it points to /etc/kubernetes/pki/apiserver-ca.crt. This file actually contains two client CA certificates: one is related to cluster control and is not discussed here; the CN of the other is the same as the issuer CN of the client certificate above.

Next, the certificate used by API Server itself is determined by the kube-apiserver parameter tls-cert-file, which points to /etc/kubernetes/pki/apiserver.crt. The CN of this certificate is kube-apiserver, and its issuer is c0256a3b8e4b948bb9c21e66b0e1d9a72, that is, the cluster CA certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 2184578451551960857 (0x1e512e86fcba3f19)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=c0256a3b8e4b948bb9c21e66b0e1d9a72, OU=default, CN=c0256a3b8e4b948bb9c21e66b0e1d9a72
        Validity
            Not Before: Nov 29 03:59:00 2018 GMT
            Not After : Nov 29 04:14:23 2019 GMT
        Subject: CN=kube-apiserver

Finally, the client needs to verify the API Server certificate above, so the KubeConfig file contains its issuer, the cluster CA certificate. Comparing the cluster CA certificate with the client CA certificate, we find that the two are exactly the same, which matches our expectation:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 786974 (0xc021e)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=CN, ST=ZheJiang, L=HangZhou, O=Alibaba, OU=ACS, CN=root
        Validity
            Not Before: Nov 29 03:59:00 2018 GMT
            Not After : Nov 24 04:04:00 2038 GMT
        Subject: O=c0256a3b8e4b948bb9c21e66b0e1d9a72, OU=default, CN=c0256a3b8e4b948bb9c21e66b0e1d9a72

Access
Now that we understand the principle, we can run a simple test: with the certificates as parameters, use curl to access API Server, and we get the expected result.
# curl --cert ./client.crt --cacert ./ca.crt --key ./client.key https://xx.xx.xx.xxx:6443/api/
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "192.168.0.222:6443"
    }
  ]
}

Two Kinds of Nodes, One Task
As mentioned at the beginning, Kubernetes is an operating system that manages the multiple nodes of a cluster. These nodes do not all play the same role in the cluster: a Kubernetes cluster has two types of nodes, master nodes and worker nodes.
This distinction of roles is really a division of labor: masters are responsible for managing the entire cluster, running the cluster management components, including API Server, the cluster's entrance; workers mainly carry the ordinary tasks.
In a Kubernetes cluster, a task is defined by the concept of a pod. A pod is the atomic unit in which the cluster carries tasks. Pod is sometimes translated as "container group", which is a loose translation, since one pod can actually encapsulate multiple containerized applications. In principle, the containers encapsulated in one pod should have a considerable degree of coupling.
Choosing the Best Place to Live
The problem the scheduling algorithm needs to solve is to choose a comfortable "residence" for a pod, so that the task the pod defines can be completed smoothly on that node.
To achieve the goal of "choosing the best place to live", the Kubernetes scheduling algorithm adopts a two-step strategy:
The first step is to exclude nodes that do not meet the conditions from all nodes, known as preselection; the second step is to score the remaining nodes, with the highest scorer winning, known as optimization.
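The shape of this two-step strategy can be illustrated with the Go sketch below. The types, rules, and numbers here are invented for illustration; this is not the kube-scheduler implementation.

package main

import "fmt"

// Node and Pod are highly simplified views of a cluster node and a pod.
type Node struct {
	Name    string
	FreeCPU int // millicores remaining
	FreeMem int // MiB remaining
}

type Pod struct {
	CPU, Mem int // requested millicores and MiB
}

// Predicate filters out unsuitable nodes (preselection).
type Predicate func(Pod, Node) bool

// Priority scores a node that survived preselection (optimization).
type Priority func(Pod, Node) int

// schedule runs preselection, then scoring, and returns the winner.
func schedule(p Pod, nodes []Node, preds []Predicate, prios []Priority) (Node, bool) {
	var best Node
	bestScore := -1
	found := false
	for _, n := range nodes {
		passed := true
		for _, pred := range preds { // step 1: exclude failing nodes
			if !pred(p, n) {
				passed = false
				break
			}
		}
		if !passed {
			continue
		}
		score := 0
		for _, prio := range prios { // step 2: score the survivors
			score += prio(p, n)
		}
		if score > bestScore {
			best, bestScore, found = n, score, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 500, FreeMem: 1024},
		{Name: "node-b", FreeCPU: 2000, FreeMem: 4096},
	}
	// Toy rules: the pod must fit, and more free CPU scores higher.
	fits := func(p Pod, n Node) bool { return n.FreeCPU >= p.CPU && n.FreeMem >= p.Mem }
	moreFree := func(p Pod, n Node) int { return (n.FreeCPU - p.CPU) / 100 }
	winner, ok := schedule(Pod{CPU: 1000, Mem: 512}, nodes, []Predicate{fits}, []Priority{moreFree})
	fmt.Println(winner.Name, ok) // node-b true
}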
Let's use the image we built at the beginning of the article to create a pod, and analyze through the logs how that pod gets scheduled onto a particular cluster node.
Pod configuration
First, we create a configuration file for the pod in JSON format. There are three key pieces of information in this file: the image address, the command, and the container port.
{"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "app"}, "spec": {"containers": [{"name": "app", "image": "registry.cn-hangzhou.aliyuncs.com/kube-easy/app:latest" "command": ["app"], "ports": [{"containerPort": 2580}]} log level
The cluster scheduling algorithm is implemented as a system component that, like API Server, runs on master nodes. Its process name is kube-scheduler. kube-scheduler supports log output at multiple levels, but the community does not provide detailed documentation of the log levels. To watch the scheduling algorithm filter and score nodes, we need to raise the log level to 10, that is, add the parameter --v=10.
kube-scheduler --address=127.0.0.1 --kubeconfig=/etc/kubernetes/scheduler.conf --leader-elect=true --v=10

Create the Pod
Using curl, with the certificates and the pod configuration file as parameters, we POST to the API Server interface to create the corresponding pod in the cluster.
# curl -X POST -H 'Content-Type: application/json;charset=utf-8' --cert ./client.crt --cacert ./ca.crt --key ./client.key https://47.110.197.238:6443/api/v1/namespaces/default/pods -d@app.json

Preselection
Preselection is the first stage of Kubernetes scheduling: it filters out nodes that do not meet the conditions according to predefined rules. The preselection rules vary considerably across Kubernetes versions, but the general trend is that they keep getting richer.
Two common preselection rules are PodFitsResourcesPred and PodFitsHostPortsPred. The former judges whether the remaining resources on a node can satisfy the pod's requests; the latter checks whether a host port requested by the pod is already used by another pod on the node. A sketch of the latter follows.
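As an illustration, the idea behind a rule like PodFitsHostPortsPred could be sketched in Go as follows. This is a simplification for this article, not the actual kube-scheduler code.

package main

import "fmt"

// podFitsHostPorts reports whether every host port a pod requests is
// still free on a node: a sketch of the idea behind PodFitsHostPortsPred.
func podFitsHostPorts(requested []int, usedOnNode map[int]bool) bool {
	for _, port := range requested {
		if usedOnNode[port] {
			return false // this port is already taken by another pod
		}
	}
	return true
}

func main() {
	used := map[int]bool{80: true, 443: true}        // ports in use on the node
	fmt.Println(podFitsHostPorts([]int{2580}, used)) // true: 2580 is free
	fmt.Println(podFitsHostPorts([]int{443}, used))  // false: 443 is taken
}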
The following figure shows the preselection-rule log output by the scheduling algorithm while processing our test pod. It records the execution of the preselection rule CheckVolumeBindingPred. Some types of storage volumes (PVs) can only be mounted on one node at a time, and this rule filters out nodes that cannot satisfy the pod's PV requirements.
As you can see from app's orchestration file, the pod has no requirement for storage volumes, so this rule does not filter out any nodes.
Optimization
The second stage of the scheduling algorithm is optimization. In this stage, kube-scheduler scores the remaining nodes according to their available resources and other rules.
At present, CPU and memory are the two main resources the scheduling algorithm considers, but the way they are considered is not simply "the more remaining CPU and memory, the higher the score".
Two calculation methods are recorded in the log: LeastResourceAllocation and BalancedResourceAllocation.
The former calculates the proportion of remaining CPU and memory to total CPU and memory after the pod is scheduled onto the node: the higher the proportion, the higher the score. The latter calculates the absolute difference between the node's CPU usage ratio and memory usage ratio: the larger the difference, the lower the score.
Of these two, one favors nodes with lower resource utilization, while the other favors nodes whose two resources are utilized in similar proportions. The two goals partly contradict each other, and in the end a weight is used to balance the two factors (a sketch of both scoring rules follows).
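The Go sketch below implements the two scoring rules as just described. The formulas, score ranges, and numbers are simplifications for illustration and do not reproduce kube-scheduler's exact arithmetic.

package main

import (
	"fmt"
	"math"
)

// nodeRes is a simplified view of a node's capacity and current usage,
// in CPU millicores and MiB of memory.
type nodeRes struct {
	cpuCap, memCap   float64
	cpuUsed, memUsed float64
}

// leastAllocated scores a node by the fraction of CPU and memory that
// would remain free after the pod lands: the more left over, the higher
// the score (0 to 10). A sketch of LeastResourceAllocation as described.
func leastAllocated(n nodeRes, podCPU, podMem float64) float64 {
	cpuFree := (n.cpuCap - n.cpuUsed - podCPU) / n.cpuCap
	memFree := (n.memCap - n.memUsed - podMem) / n.memCap
	return 10 * (cpuFree + memFree) / 2
}

// balancedAllocation scores a node by how close its CPU and memory usage
// fractions would be to each other: the bigger the gap, the lower the
// score (0 to 10). A sketch of BalancedResourceAllocation as described.
func balancedAllocation(n nodeRes, podCPU, podMem float64) float64 {
	cpuFrac := (n.cpuUsed + podCPU) / n.cpuCap
	memFrac := (n.memUsed + podMem) / n.memCap
	return 10 - math.Abs(cpuFrac-memFrac)*10
}

func main() {
	n := nodeRes{cpuCap: 4000, memCap: 8192, cpuUsed: 1000, memUsed: 2048}
	fmt.Printf("least allocated:     %.2f\n", leastAllocated(n, 500, 512))
	fmt.Printf("balanced allocation: %.2f\n", balancedAllocation(n, 500, 512))
}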
In addition to resources, the optimization stage considers other factors, such as the affinity between the pod and nodes, or, when a service consists of multiple pods, how widely those pods are spread across different nodes, which is a strategy for ensuring high availability.
Score
Finally, the scheduling algorithm multiplies each scoring item by its weight and sums them to get each node's final score. Since the test cluster uses the default scheduler, which sets the weight of every scoring item appearing in the log to 1, the final scores of the three nodes, calculated from the scoring items recorded in the log, should be 29, 28, and 29.
The reason the scores in the log output do not match our own calculation is that the log does not output every scoring item. The missing strategy is presumably NodePreferAvoidPodsPriority, which has a weight of 10000 and gives each node a score of 10; adding 10 × 10000 = 100000 to each node's score accounts for the figures in the final log output.
Concluding remarks
In this article, using a simple containerized web program as an example, we focused on how a client is authenticated by the Kubernetes cluster's API Server and how a container application is scheduled to an appropriate node.
In the process, we deliberately set aside convenient tools such as kubectl and the console, and used lower-level experiments instead, such as dismantling the KubeConfig file and analyzing scheduler logs, to examine how authentication and scheduling actually operate. I hope this helps you understand Kubernetes clusters a little more deeply.