This article introduces the new features of Kubernetes 1.3 and walks through the design considerations behind them, using situations you are likely to meet in practice.
Support more types of applications
1 Init container
Init container is an alpha feature in 1.3. It is designed to support applications that need initialization work to run before the Pod's normal containers start; the container that performs this initialization is called an init container. Typical uses are initializing a database, or waiting for a database to become available before starting the application. Consider a Pod with two init containers (1 and 2) and two normal containers (A and B).
Kubernetes runs a Pod of this type as follows:
The init containers run sequentially, i.e. container 1 -> 2.
If any init container fails, the entire Pod fails.
Only after all init containers have completed successfully are the normal containers started, i.e. containers A and B.
In the alpha version, init containers are declared through a Pod annotation. Below is an example adapted from the Kubernetes documentation (slightly trimmed):
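A minimal sketch of that example, reconstructed from memory of the 1.3 alpha documentation; the annotation key and exact values should be treated as illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    # alpha form: init containers are declared as a JSON array in a Pod annotation
    # (annotation key as recalled from the 1.3 docs; treat as an assumption)
    pod.alpha.kubernetes.io/init-containers: '[
        {
            "name": "install",
            "image": "busybox",
            "command": ["wget", "-O", "/work-dir/index.html", "http://kubernetes.io/index.html"],
            "volumeMounts": [{"name": "workdir", "mountPath": "/work-dir"}]
        }
    ]'
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html   # nginx serves the file fetched by the init container
  volumes:
  - name: workdir
    emptyDir: {}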
As you can see, before the normal nginx container starts, an init container first fetches index.html, so that visiting nginx afterwards returns that file directly. Once the init container feature stabilizes, Kubernetes will add an initContainers field directly to pod.spec, roughly as shown below:
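A sketch of the same Pod once the field is promoted into the spec (initContainers is the field name that eventually shipped; shown here for illustration only):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  initContainers:              # run to completion, in order, before the normal containers start
  - name: install
    image: busybox
    command: ["wget", "-O", "/work-dir/index.html", "http://kubernetes.io/index.html"]
    volumeMounts:
    - name: workdir
      mountPath: /work-dir
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: workdir
      mountPath: /usr/share/nginx/html
  volumes:
  - name: workdir
    emptyDir: {}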
Init container looks like a small feature, but its implementation raises quite a few issues. A few of the more important ones:
Resources: how should the resources required by a Pod with init containers be calculated? The two extreme approaches both have problems. Summing the resources of the init containers and the regular containers means that once initialization finishes, the extra requested resources are never used again, which is wasteful; ignoring the init containers' resources entirely means scheduling does not account for what they consume, which can destabilize the node. The current approach is a compromise: since init containers and regular containers never run at the same time, the Pod's effective request is the maximum of the two. For the init containers, which run one after another, take the maximum of their individual requests; for the regular containers, which run concurrently, take their sum. For example, if two init containers request 500m and 200m of CPU and two regular containers request 100m and 150m, the Pod's effective CPU request is max(500m, 100m + 150m) = 500m.
Pod status: Pod currently has states such as Pending, Running and Terminating. For a Pod with init containers, reusing the Pending state makes it hard to tell whether the Pod is running its init containers or its normal containers. Ideally a new state such as Initializing would be added; it is not present in the alpha version yet.
Health and readiness checks: with init containers, how should container checks work? The alpha version disables both checks for init containers, even though an init container really does run on a node and in principle should be checked. Disabling the readiness check is reasonable, since an init container only becomes "ready" when it has finished running. For the health (liveness) check, the node needs to know whether the Pod is in its initialization phase; if it is, the kubelet could health-check the init containers. It is therefore likely that Kubernetes will enable health checks for init containers once a state like Initializing is added.
There are still many open questions around init containers, such as QoS and Pod updates, which remain to be resolved.
2 PetSet
PetSet is a long-awaited community feature for supporting stateful, clustered applications; it is also in alpha. Its use cases include quorum-with-leader applications such as ZooKeeper and etcd, decentralized quorum systems such as Cassandra, and so on. In a PetSet, each Pod has a stable, unique identity consisting of its name, network identity and storage, created and maintained by a new component, the PetSet controller. Let's look at how Kubernetes maintains each part of this identity.
The name is easy to understand. After we create an RC, Kubernetes creates the specified number of Pod replicas. Listing them with kubectl gives output like the following:
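For illustration, for an RC named nginx with three replicas the listing looks roughly like this (the suffixes and ages are made up):

# illustrative output; Pod name suffixes and ages are invented
$ kubectl get pods
NAME          READY     STATUS    RESTARTS   AGE
nginx-3vk5m   1/1       Running   0          1m
nginx-8xq2p   1/1       Running   0          1m
nginx-zl9c7   1/1       Running   0          1m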
The 5-character suffix is generated automatically by Kubernetes, so a recreated Pod gets a different name. For a PetSet, a restarted Pod must keep the same name. The PetSet controller therefore maintains an identityMap in which every Pod of a PetSet has a unique name; when a Pod is recreated, the controller knows which identity it corresponds to and asks the API server to create a new Pod with the same name. The current mechanism is simple: the identityMap numbers the Pods starting from 0, and the sync loop works like a roll call, recreating whichever numbers are missing.
This index has another purpose: the PetSet controller uses it to enforce startup order, so Pod 1 is only started after Pod 0 is up.
Network identity is maintained mainly through a stable hostname and domain name, both specified in the PetSet configuration. For example, the following is a (trimmed) PetSet YAML, in which metadata.name determines the hostname prefix of each Pod (the suffix is the 0-based index described above) and spec.serviceName determines the domain name.
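A trimmed sketch of such a PetSet, matching the web-0/web-1 example discussed next; the apiVersion is my recollection of the 1.3 alpha API group and should be treated as an assumption:

apiVersion: apps/v1alpha1       # assumed 1.3 alpha API group for PetSet
kind: PetSet
metadata:
  name: web                     # hostname prefix: Pods become web-0, web-1, ...
spec:
  serviceName: "nginx"          # domain name, provided by a headless service (see below)
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: web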
The YAML above creates two Pods, web-0 and web-1. The full domain name of the first is web-0.nginx.default.svc.cluster.local, where web-0 is the hostname, nginx is the domain name specified in the YAML, and the rest is the same as for an ordinary service. When the creation request reaches the node, the kubelet sets the hostname via the container runtime's UTS namespace (components such as the apiserver are omitted from this description).
At this point the hostname is set at the container level; what remains is cluster-level resolution of the hostname and the domain name, which is of course the job of kube-dns. Readers familiar with Kubernetes will know that to get DNS records we need to create a service, and PetSet is no different. The difference is that for an ordinary service the backend Pods are interchangeable and a backend is chosen by round robin, client IP, and so on, whereas here every Pod is a Pet and we need to address each one individually, so the service we create must support that. PetSet therefore uses a Kubernetes headless service: a headless service allocates no cluster IP and does no load balancing across the backend Pods; it only adds records to the cluster DNS, and it is up to the creator to make use of those records. The headless service we need looks like the following; note that clusterIP is set to None, which marks it as headless.
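A sketch of that headless service, matching the selector used in the PetSet sketch above:

apiVersion: v1
kind: Service
metadata:
  name: nginx                   # becomes the domain part of web-0.nginx.default.svc.cluster.local
spec:
  clusterIP: None               # headless: no cluster IP, no load balancing, DNS records only
  selector:
    app: nginx
  ports:
  - port: 80
    name: web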
After kube-dns processes this, records like the following are generated:
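For illustration, the resulting records can be inspected from inside the cluster roughly like this (the Pod IPs are made up):

# illustrative lookups; IP addresses are invented
$ dig +short web-0.nginx.default.svc.cluster.local
10.244.1.5
$ dig +short nginx.default.svc.cluster.local
10.244.1.5
10.244.2.7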
As you can see, resolving web-0.nginx.default.svc.cluster.local returns that Pod's IP, while resolving nginx.default.svc.cluster.local returns the IPs of all Pets. A common pattern is to discover all peers through the domain name and then communicate with individual Pods directly.
Storage identity is implemented with PV/PVC. When creating a PetSet, we specify the data volumes to be assigned to each Pet, as shown below:
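A sketch of the relevant fragment, added to the PetSet spec shown earlier; the claim name and size are illustrative:

  # fragment of the PetSet spec above; each Pet gets its own PVC rendered from this template
  # (the nginx container would reference the claim through a volumeMount named www)
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi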
Here volumeClaimTemplates specifies the storage each Pet requires; note that at present every Pet gets a data volume of the same size and type. When the PetSet controller receives the request, it creates a PVC for each Pet and associates each Pet with its corresponding PVC.
From then on the PetSet controller only needs to make sure each Pet stays bound to its PVC; other work, such as provisioning and mounting the data volumes, is left to other components.
With stable names, network identity and storage, PetSet covers most cases, but there is still plenty of room for improvement; interested readers can refer to: https://github.com/kubernetes/kubernetes/issues/28718
3 Scheduled Job
Scheduled Job is essentially a cluster-level cron, similar to Mesos Chronos, and uses standard cron syntax. Unfortunately it did not make it into the 1.3 release as planned. Scheduled Job was actually proposed long ago, but at the time the Kubernetes focus was on the API level, so despite strong demand it was scheduled to land after Job (which went GA in 1.2). Once Scheduled Job ships in a later version, users will be able to run periodic jobs on Kubernetes with a single command, for example: kubectl run cleanup --image=cleanup --runAt="0 1 0 0 *" -- /scripts/cleanup.sh. Updates on Scheduled Job can be found at: https://github.com/kubernetes/kubernetes/pull/25595
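For reference, a sketch of what a ScheduledJob manifest looked like when the feature later landed; the API group and kind are my recollection of the post-1.3 alpha and should be treated as assumptions (the resource was eventually renamed CronJob):

apiVersion: batch/v2alpha1      # assumed alpha API group when ScheduledJob first shipped
kind: ScheduledJob              # later renamed CronJob
metadata:
  name: cleanup
spec:
  schedule: "0 1 * * *"         # standard cron syntax: 01:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cleanup
            image: cleanup
            command: ["/scripts/cleanup.sh"]
          restartPolicy: OnFailure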
4 Disruption Budget
Disruption Budget provides a feedback mechanism from applications to the cluster, ensuring that applications are not disrupted by changes the cluster itself initiates. For example, when the cluster needs to reschedule Pods, an application can use a Disruption Budget to indicate whether its Pods may be moved. Disruption Budget only covers changes initiated by the cluster; it does not cover emergencies such as a node suddenly going offline, nor problems in the application itself such as constant restarts. Disruption Budget also did not ship in 1.3.
Like most Kubernetes resources, a PodDisruptionBudget is created from a YAML file. For example, the budget below selects all Pods with the label app: nginx and requires that at least three of them be running at any time.
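A sketch of such a budget, including a status of the kind described next; the policy/v1alpha1 API group is my recollection of the 1.3-era version and should be treated as an assumption (later releases use policy/v1beta1 and policy/v1):

apiVersion: policy/v1alpha1     # assumed 1.3-era API group for PodDisruptionBudget
kind: PodDisruptionBudget
metadata:
  name: nginx-budget
spec:
  selector:
    matchLabels:
      app: nginx                # selects all Pods labeled app: nginx
  minAvailable: 3               # at least 3 of them must stay up
status:                         # maintained by the Disruption Budget controller
  currentHealthy: 4
  desiredHealthy: 3
  expectedPods: 5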
A new component in the controller manager, the Disruption Budget controller, is responsible for maintaining the status of every budget. For example, the status in the example above reports that 4 Pods are currently healthy (currentHealthy), the application requires at least 3 (desiredHealthy), and 5 Pods exist in total (expectedPods). To maintain this status, the Disruption Budget controller iterates over all budgets and all Pods. Given a budget's status, any component that wants to change the state of a Pod should consult it first: if the operation would push the number of available Pods below what the application requires, the operation is rejected. Disruption Budget also interacts closely with QoS. For example, what should the system do when an application with a very low QoS level declares a very strict Disruption Budget? Kubernetes does not handle this strictly at present; one feasible approach is to prioritize Disruption Budgets so that high-priority applications get high-priority budgets. Disruption Budget could also be tied into the quota system, so that high-priority applications receive more Disruption Budget quota. For more discussion, see: https://github.com/kubernetes/kubernetes/issues/12611
Support better cluster management
1 Cascading Deletion
Up to Kubernetes 1.2, deleting a controller object did not delete the resources underneath it. For example, after an RC is deleted through the API, the Pods it manages are not deleted (kubectl can delete them, but only because kubectl contains "reaper" logic that deletes the underlying Pods one by one; that is essentially client-side logic). Similarly, deleting a Deployment does not automatically delete its ReplicaSet, and the Pods are of course not reclaimed either.
Cascading deletion means that when a controller object is deleted, the objects it manages are reclaimed along with it. Cascading deletion in Kubernetes 1.3 does not simply copy the kubectl logic to the server side; it does something more general: garbage collection. In short, the garbage collector controller maintains a graph of almost all cluster resources and subscribes to resource-modification events; based on the event type it updates the graph and places affected resources into a Dirty Queue or an Orphan Queue. For the concrete implementation, see the design document and the garbage collector controller code: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/garbage-collection.md
2 Node eviction
Node (kubelet) eviction refers to removing Pods proactively before a node becomes overloaded, mainly with respect to memory and disk. Before Kubernetes 1.3, the kubelet did not anticipate node load; it only reacted to known problems. When memory ran short, Kubernetes relied on the kernel OOM killer; for disk, it periodically garbage-collected images and containers. Both approaches have limitations: the OOM killer itself consumes resources and its timing is unpredictable, and recycling containers and images cannot handle container logs: if an application keeps writing logs it can fill the disk without the kubelet ever intervening.
Node eviction addresses these problems through kubelet configuration. When starting the kubelet, we keep the node stable by specifying thresholds on signals such as memory.available, nodefs.available and nodefs.inodesFree. For example, memory.available < 200Mi means that when available memory drops below 200Mi, the kubelet starts evicting Pods (eviction can be configured as immediate or delayed, i.e. hard vs. soft). In Kubernetes 1.3 node eviction is opt-in and off by default; it is enabled via kubelet flags, for example as sketched below.
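A sketch of the relevant kubelet flags; the threshold values are illustrative:

# threshold values are illustrative, not recommendations
kubelet \
  --eviction-hard=memory.available<200Mi,nodefs.available<10%,nodefs.inodesFree<5% \
  --eviction-soft=memory.available<300Mi \
  --eviction-soft-grace-period=memory.available=1m30s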
Although node eviction happens at the kubelet level, its interaction with the rest of the cluster must also be considered. The most important point is feeding the information back to the scheduler, otherwise evicted Pods are likely to be scheduled right back onto the same node. For this, Kubernetes added two new node conditions: MemoryPressure and DiskPressure. When a node is in either of these states, the scheduler avoids placing new Pods on it. A further issue is that if a node's resource usage hovers around the threshold, its condition may oscillate between Pressure and Not Pressure. There are many ways to suppress such jitter, for example smoothing that weighs in historical data. Kubernetes currently takes a simpler approach: to transition from Pressure back to Not Pressure, the node's usage must stay below the threshold for a period of time (5 minutes by default). This can produce false positives: an application that periodically requests a chunk of memory and quickly releases it may keep the node stuck in the Pressure state. But in most cases this method handles jitter well.
Once eviction is triggered, the next problem is picking the unlucky Pod. Kubernetes defines a number of rules, which roughly boil down to: first, consider QoS, evicting low-QoS Pods first; second, consider usage, preferring Pods whose usage is large relative to their request. For details, see: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/kubelet-eviction.md
3 Network Policy
The purpose of Network Policy is to provide isolation between Pods: users can define communication rules between arbitrary Pods at port granularity. For example, the policy below can be read as: Pods labeled "db" may only be reached by Pods labeled "frontend", and only on TCP port 6379.
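A sketch of such a policy; the extensions/v1beta1 API group is what I recall from the 1.3/1.4 era and should be treated as an assumption, as should the exact label keys (later releases use networking.k8s.io/v1):

apiVersion: extensions/v1beta1    # assumed beta API group for NetworkPolicy in this era
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db                    # the Pods being protected ("db"); label key is an assumption
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend          # only "frontend" Pods may connect
    ports:
    - protocol: TCP
      port: 6379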
Network Policy is currently in beta and is API-only. In other words, Kubernetes itself does not enforce the isolation: if we submit the YAML above, nothing visibly happens; Kubernetes just stores the policy. Actually enforcing it requires other components. For example, Calico implements a controller that reads user-created policies and enforces the isolation; see https://github.com/projectcalico/k8s-policy/. For more on Network Policy, see: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/network-policy.md
4 Federation
Federation ("federated clusters") combines multiple Kubernetes clusters into one whole without changing how the individual clusters work. According to the Kubernetes design documents, federation is intended to address requirements such as service high availability and hybrid cloud. Before 1.3, Kubernetes implemented federation-lite, meaning the machines of a single cluster could come from different zones of the same cloud; in 1.3, federation-full is already in beta, meaning each member cluster can come from a different cloud (or the same one).
The core federation components are federation-apiserver and federation-controller-manager, which run as Pods in one of the member clusters. External requests talk directly to the Federation Control Plane, which analyzes each request and forwards it to the appropriate Kubernetes clusters.
At the application level, federation currently supports federated services, i.e. a single service that can be accessed across multiple clusters.
That concludes this overview of the new features in Kubernetes 1.3.