How to deeply understand Kubernetes micro-service platform

This article explains how to understand the Kubernetes micro-service platform in depth: its core concepts, its composition and working principles, how to build a PaaS platform on top of it, and finally a hands-on example of running Storm on Kubernetes.

The concept and function of Kubernetes

Architects generally share a vision like this: the system has three services, ServiceA, ServiceB and ServiceC, where ServiceA needs 3 instances and ServiceB and ServiceC each need 5 instances. They want a platform (or tool) that automatically completes the distributed deployment of these 13 instances and keeps monitoring them. When a server goes down or a service instance fails, the platform repairs itself so that the number of running service instances always matches expectations. The team then only needs to focus on developing the services themselves and no longer has to worry about infrastructure or operations monitoring.

Before Kubernetes appeared, no platform publicly claimed to achieve this vision. Kubernetes is the first platform in the industry to truly put the concept of the service first: in the world of Kubernetes, all concepts and components revolve around Service. It is this breakthrough design that lets Kubernetes solve many problems that had plagued distributed systems for years, giving teams more time to focus on the code that matters to the business and greatly improving the productivity and return on investment of the whole software team.

A Service in Kubernetes corresponds to the concept of a micro-service in a micro-service architecture, and it has the following distinctive characteristics.

Each Service is assigned a fixed virtual IP address, the Cluster IP.

Each Service is exposed on one or more ports (Service Ports) over TCP/UDP.

A client accesses a Service just like any remote TCP/UDP service: it simply establishes a connection to the Cluster IP, with a Service Port as the target port.

Since a Service has an IP address, it is natural to give it a DNS domain name so that clients are not affected when the IP address changes. The DNS component of Kubernetes automatically maintains a mapping between domain name and IP for each Service: the domain name is the Service's name, the IP is the corresponding Cluster IP, and the Kubernetes DNS server is configured as the DNS server inside every Pod (which is similar to a Docker container). In this way, the basic problem of service discovery in a micro-service architecture is solved elegantly: clients do not need to call any complex service-discovery API, and any distributed system that communicates over TCP/IP can easily be migrated to the Kubernetes platform. On this design alone, Kubernetes is far ahead of other products.
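
As a minimal illustration of this mechanism (the Service name my-service and the default cluster domain cluster.local below are assumptions for the example, not names defined in this article), any process inside a Pod whose image includes nslookup can resolve a Service by name and then connect to its Cluster IP and Service Port, without knowing any Pod addresses:

nslookup my-service                               # resolves via the cluster DNS configured in the Pod
nslookup my-service.default.svc.cluster.local     # fully qualified form: <service>.<namespace>.svc.<cluster-domain>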

We know that behind every micro-service there are multiple process instances providing the service. On the Kubernetes platform these process instances are encapsulated in Pods, and a Pod is roughly equivalent to a Docker container. The slight difference is that a Pod is actually a group of Docker containers that are tightly bound together and "live and die together"; these containers share the same network stack and file system, are not isolated from each other, and their processes can communicate directly. The most typical example is the Kubernetes SkyDNS Pod, which contains four Docker containers.

So how do Service and Pod in Kubernetes correspond to each other? How do we know which Pods provide the service behind a given Service? The following figure gives the answer: labeling.

Each Pod can carry one or more labels (Label), and each Service has a "label selector" (Label Selector) that determines which labels it matches. The following YAML defines a Service named ku8-redis-master whose label selector is "app: ku8-redis-master", meaning that all Pods carrying the label "app=ku8-redis-master" serve it:

apiVersion: v1
kind: Service
metadata:
  name: ku8-redis-master
spec:
  ports:
  - port: 6379
  selector:
    app: ku8-redis-master

Here is the definition of the corresponding Pod, ku8-redis-master, whose labels attribute exactly matches the Service's label selector:

apiVersion: v1
kind: Pod
metadata:
  name: ku8-redis-master
  labels:
    app: ku8-redis-master
spec:
  containers:
  - name: server
    image: redis
    ports:
    - containerPort: 6379
  restartPolicy: Never

What if we need a Service to be backed by N Pod instances at all times, so that when one Pod instance fails, the failure is detected and a new Pod instance is automatically created to fill the gap? The answer is to use a Deployment/RC, which tells Kubernetes how many replicas of a Pod with a particular label should exist in the cluster. The definition of a Deployment/RC includes the following two parts.

● The number of replicas of the target Pod (replicas).

● The creation template (Template) of the target Pod.

The following example defines an RC whose goal is to ensure that two Pods labeled "app: ku8-redis-slave" exist in the cluster at all times, running the redis-slave container image. Together with ku8-redis-master, these two Pods form a Redis master-slave cluster (one master and two slaves):

apiVersion: v1
kind: ReplicationController
metadata:
  name: ku8-redis-slave
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: ku8-redis-slave
    spec:
      containers:
      - name: server
        image: devopsbq/redis-slave
        env:
        - name: MASTER_ADDR
          value: ku8-redis-master
        ports:
        - containerPort: 6379

At this point, the YAML files above create a one-master, two-slave Redis cluster, in which the Redis master is defined as a micro-service that can be accessed by other Pods or Services, as shown in the following figure.

Note that in the figure above, the ku8-redis-slave container has an environment variable MASTER_ADDR, which is the address of the Redis master. The value given here is "ku8-redis-master", the name of the Redis master Service. As mentioned earlier, a Service's name is also its DNS domain name, so the Redis slave containers can reach the Redis master Service through this DNS name and carry out Redis master-slave synchronization.

The core concepts of Kubernetes are Service, Pod and RC/Deployment. Around these three concepts, Kubernetes has implemented the most powerful container-based micro-service architecture platform to date. For example, to turn the Redis cluster above into one master and three slaves, we only need to change replicas to 3 in the ReplicationController that controls the Redis slaves, or use the kubectl scale command to scale out. The command is as follows; horizontal scaling of a service becomes this convenient:

kubectl scale --replicas=3 rc/ku8-redis-slave
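
To confirm the result of the scale-out, the RC's replica count and the matching Pods can be checked with standard kubectl queries (a simple verification sketch, not part of the original example):

kubectl get rc ku8-redis-slave              # shows desired and current replica counts
kubectl get pods -l app=ku8-redis-slave     # lists the Pods selected by the RC's label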

Moreover, Kubernetes also implements the advanced feature of horizontal auto-scaling, HPA (Horizontal Pod Autoscaling), which automatically scales the Pods managed by an RC/Deployment based on Pod performance metrics (CPU utilization and custom metrics). For example, suppose the Pods of the Redis slave cluster above also serve queries, so their CPU utilization keeps changing while they are in service. We want the cluster to scale out automatically whenever the average CPU utilization of these Pods exceeds 80%, until utilization drops below 80% or a maximum of 5 replicas is reached, and to shrink back to 1 replica when the request pressure subsides. This can be achieved with the following HPA command:

kubectl autoscale rc ku8-redis-slave --min=1 --max=5 --cpu-percent=80
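
The same behaviour can also be described declaratively instead of with the autoscale command. Below is a minimal sketch of an equivalent HorizontalPodAutoscaler object, assuming the autoscaling/v1 API and targeting the ReplicationController above; the exact fields available depend on the Kubernetes version in use:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ku8-redis-slave
spec:
  scaleTargetRef:                     # the workload whose replicas are adjusted
    apiVersion: v1
    kind: ReplicationController
    name: ku8-redis-slave
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80  # scale out above 80% average CPU utilization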

Besides making horizontal scaling of micro-services easy, Kubernetes also provides a simple and powerful rolling-update capability for micro-services that can be triggered with a single command. For example, to upgrade the image of the Redis slave service above from devopsbq/redis-slave to leader/redis-slave, simply execute the following command:

kubectl rolling-update ku8-redis-slave --image=leader/redis-slave

The principle of the rolling update is shown in the following figure. When Kubernetes performs a rolling update, it creates a new RC that uses the new Pod image, then at regular intervals reduces the replica count of the old RC by one (so the number of old-version Pods drops by one) and increases the replica count of the new RC by one (so one more new-version Pod appears). The total number of Pods therefore stays roughly constant during the upgrade, and the upgrade finishes only when all replicas have become the new version.

The composition and principle of Kubernetes

The Kubernetes cluster itself, as a distributed system, also adopts the classic master-slave architecture. As shown in the following figure, one node in the cluster is the Master node, on which three main control programs are deployed: API Server, Controller Manager and Scheduler, together with the etcd process, which persistently stores the resource objects managed by Kubernetes (such as Service, Pod and RC/Deployment).

The other nodes in the cluster are called Nodes and act as workers (Worker nodes). They are directed by the Master node and are mainly responsible for looking after the Pod replicas assigned to them. The following diagram shows the interactions between the various Kubernetes processes more clearly.

As the figure above shows, the central process is the API Server: every other process interacts directly with it, and no other processes interact with each other directly. So what does the API Server do? It is essentially the data gateway of Kubernetes: all data entering Kubernetes is saved to the etcd database through this gateway, and changes in etcd are pushed in real time to the other relevant Kubernetes processes through the API Server. The API Server exposes REST interfaces, which fall broadly into the following two categories.

The CRUD API of all resource objects: resource objects are stored in etcd, with query interfaces provided, covering operations on Pod, Service, RC and so on.

The Watch API of resource objects: a client uses this API to be notified promptly of resource changes, such as the successful creation of a Pod instance behind a Service or a change in a Pod's status. The Watch API is the basis of the efficient automatic control logic inside Kubernetes.
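
A simple way to observe the watch mechanism from the command line is kubectl's --watch flag, which keeps the connection open and prints each change notification as it arrives (a minimal illustration, not from the original text):

kubectl get pods --watch        # stream Pod change events instead of a one-off listing
kubectl get services --watch    # the same for Service objects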

The following are the main functions of the other Kubernetes processes in the figure above.

Controller Manager: responsible for all automatic control tasks, such as the automatic control of RC/Deployment, the automatic horizontal scaling of HPA, periodic disk cleanup and so on.

Scheduler: responsible for the Pod scheduling algorithm. After a new Pod is created, the Scheduler finds the most suitable Node for it according to the algorithm, a process also known as Pod Binding.

Kubelet: responsible for creating, monitoring, restarting, deleting, updating the status of and collecting performance data for the Pod instances on its Node, and for regularly reporting Pod and Node information to the Master. Since Pod instances are ultimately realized as Docker containers, the Kubelet also interacts with Docker.

Kube-proxy: the load balancer for Services, responsible for setting up the NAT forwarding rules between a Service's Cluster IP and the corresponding Pod instances, implemented through Linux iptables.

Having understood the responsibilities of the various Kubernetes processes, let's look at what happens behind the scenes from the moment an RC is defined in YAML until it is finally deployed as multiple Pods and containers. To illustrate this fairly complex process clearly, a schematic diagram is given here.

First, when we create an RC (resource object) with the kubectl create command, kubectl submits the data to the API Server through the Create RC REST interface, and the API Server writes it to etcd for persistent storage. Meanwhile, the Controller Manager watches all RCs; as soon as an RC is written to etcd it is notified, reads the RC's definition, compares the actual number of Pod replicas controlled by the RC with the desired value, and acts accordingly. At this point the Controller Manager finds that there is no corresponding Pod instance in the cluster, so it creates a Pod according to the Pod template (Template) in the RC definition and saves it to etcd through the API Server. Similarly, the Scheduler process watches all Pods; once it finds that a new Pod has appeared, it runs its scheduling logic and selects a Node for the Pod. If all goes well the Pod is assigned, that is, bound (Binding), to that Node, and the Scheduler updates this information and the Pod's status to etcd. Finally, the Kubelet on the target Node learns of the new Pod, pulls the container image and creates the corresponding containers as defined in the Pod. Once the containers are successfully created, the Kubelet sets the Pod's status to Running and updates it to etcd through the API Server. If this Pod has a corresponding Service, the Kube-proxy process on every Node watches all Services and the Pod instances behind them; whenever it detects a change, it adds or removes the corresponding NAT forwarding rules in the Node's iptables, finally realizing the intelligent load-balancing feature of Service. All of this happens automatically, without human intervention.

So what happens if a Node goes down? If a Node is down for a period of time, there is no Kubelet process on it to report the status of its Pods regularly, so all Pod instances on that Node are judged to have failed. The Controller Manager then deletes these Pods and generates new Pod instances, which are scheduled onto other Nodes and created there, so the system recovers automatically.

The last part of this section talks about the evolution of Kube-proxy, as shown in the following figure.

In the beginning, Kube-proxy was a proxy server similar to HAProxy that implemented software load balancing: requests initiated by clients were proxied to a backend Pod, so it can be understood as the load balancer of a Kubernetes Service. Kube-proxy's initial implementation manipulated iptables rules to redirect traffic destined for a Cluster IP to the local Kube-proxy process via NAT. This involved copying network packets between kernel space and user space several times, so it was inefficient. Later versions of Kube-proxy changed the implementation: the generated iptables rules NAT traffic directly to the destination Pod address, no longer forwarding it through Kube-proxy, which is more efficient and faster. This approach is slightly less efficient than client-side load balancing, but it is simple, independent of the specific communication protocol, and much more widely applicable. At this point we can say that Kubernetes Service implements its routing and load-balancing mechanism on top of iptables, and Kube-proxy is no longer a real "proxy" but merely an agent that configures routing rules.

Although the iptables-based routing and load-balancing mechanism performs much better than an ordinary proxy, it has an inherent drawback: each Service produces a certain number of iptables rules, so with a large number of Services the rule count surges, which affects iptables forwarding efficiency and the stability of the Linux kernel. Many people have therefore tried to replace iptables with IPVS (IP Virtual Server). Kubernetes added IPVS support to Kube-proxy in version 1.8 and made it GA in version 1.11. Unlike iptables, IPVS is positioned in the Linux kernel as a load-balancing solution for TCP/UDP services, so it is well suited to replacing iptables for Service routing and load balancing.
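
As a rough sketch of how the IPVS mode is selected (the exact mechanism depends on the Kubernetes version and on how kube-proxy is deployed; in many clusters the same setting lives in kube-proxy's configuration file rather than on the command line), kube-proxy exposes a proxy-mode setting that can be switched from iptables to ipvs:

kube-proxy --proxy-mode=ipvs    # run kube-proxy in IPVS mode instead of iptables mode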

In addition, there are other mechanisms that can replace Kube-proxy; for example, the sidecar in a Service Mesh completely replaces Kube-proxy's functions. When a Service exposes an HTTP interface, we have even more choices, such as Ingress, Nginx and so on.

PaaS platform based on Kubernetes

PaaS has actually been a heavyweight but not very successful class of product, limited by rigid multi-language support and a constrained development model. Recently, with the development of container technology and cloud computing, it has attracted attention again, because container technology completely solves the problems of application packaging, deployment and automation. A PaaS platform redesigned and implemented on top of container technology not only raises the technical level of the platform but also makes up for the shortcomings of earlier PaaS platforms: hard to use, overly complex, poorly automated, and so on.

OpenShift is a PaaS cloud computing platform launched by Red Hat in 2011. OpenShift went through two versions (v1 and v2) before Kubernetes appeared; after the launch of Kubernetes, the third version of OpenShift, v3, abandoned its own container engine and container orchestration module and embraced Kubernetes.

Kubernetes has the following features.

Pods allow developers to deploy one or more containers as a single "atomic unit".

The service discovery mechanism built on fixed Cluster IPs and embedded DNS makes it easy for different Services to link to each other.

RC ensures that the number of Pod replicas we care about always matches our expectations.

A powerful network model allows Pods on different hosts to communicate with each other.

Both stateless and stateful services are supported, and persistent storage can be orchestrated into containers to support stateful services.

An easy-to-use orchestration model lets users compose a complex application with little effort.

Many companies at home and abroad have adopted Kubernetes as the kernel of their PaaS platform, so this section explains how to design and implement a powerful PaaS platform based on Kubernetes.

A PaaS platform should have the following key features.

Multi-tenant support: the tenant here can be the developer or the application itself.

Application lifecycle management: for example, support for application definition, deployment, upgrade, removal and so on.

Complete supporting infrastructure, such as a single sign-on service, role-based user permission service, application configuration service, log service and so on. At the same time, the PaaS platform integrates many common middleware services for applications to call, such as message queues, distributed file systems, caching middleware and so on.

Multilingual support: a good PaaS platform can support many common development languages, such as Java, Node.js, PHP, Python, C++, etc.

Next, let's look at how a PaaS platform designed and implemented on top of Kubernetes supports these key features.

How to realize multi-tenancy

Kubernetes supports multi-tenancy through the Namespace feature.

We can create multiple Namespace resource objects, one per tenant. Resource objects such as Pod, Service and RC created in one Namespace cannot be seen from another Namespace, which provides logical multi-tenant isolation. However, Namespace isolation alone does not provide network isolation between Namespaces: if we know the IP address of a Pod in another Namespace, we can still access it, as shown in the following figure.
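
As a minimal sketch of this logical isolation (tenant2 is one of the tenant Namespaces used later in this section; the Pod name and image are illustrative assumptions), a tenant's Namespace and a Pod inside it could be defined as follows, after which the Pod is only visible to queries scoped to that Namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: tenant2
---
apiVersion: v1
kind: Pod
metadata:
  name: web-server        # hypothetical example Pod
  namespace: tenant2      # created inside the tenant's Namespace
spec:
  containers:
  - name: web-server
    image: nginx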

To address the problem of multi-tenant network isolation, Kubernetes adds the Network Policy feature, which can be loosely compared to a network firewall. By defining NetworkPolicy resource objects, we can control which sources may access the Pods under a given Namespace (tenant). Suppose we have two Namespaces, tenant2 and tenant3, each with some Pods, as shown in the following figure.

Suppose we need to achieve the following network isolation goal: Pods labeled role:db in tenant3 may only be accessed by Pods labeled role:frontend in tenant3 (the same Namespace), or by any Pod in tenant2. We can then define a NetworkPolicy resource object as shown in the following figure and publish it to the Kubernetes cluster with the kubectl tool.
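
The referenced figure is not reproduced here, but a NetworkPolicy with roughly this intent might look like the sketch below. It uses the networking.k8s.io/v1 API; the label partition: tenant2 on the tenant2 Namespace is an assumption made purely so the namespaceSelector has something to match:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-access-policy
  namespace: tenant3              # the policy protects Pods in tenant3
spec:
  podSelector:
    matchLabels:
      role: db                    # applies to Pods labeled role:db
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend          # role:frontend Pods in the same Namespace
    - namespaceSelector:
        matchLabels:
          partition: tenant2      # any Pod in a Namespace carrying this (assumed) label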

Note that Kubernetes Network Policy only takes effect in combination with a CNI network plug-in that supports it. The main CNI plug-ins that currently support Network Policy include the following.

Calico: a container network scheme based on layer 3 routing.

Weave Net: a layer-2 container network solution based on packet encapsulation.

Romana: a container network scheme similar to Calico.

Network Policy is still in its infancy, and many problems remain to be studied and solved: how to define access policies for Services, how to resolve conflicts between Service access policies and Pod access policies, and how to define access policies for external services. In short, compared with compute virtualization and storage virtualization, many network virtualization technologies in the container field are still at an early stage.

A Kubernetes Namespace isolates different tenants logically, but the programs of multiple tenants may still be scheduled onto the same physical machine (Node). If we want the applications of different tenants to be scheduled onto different Nodes, achieving physical isolation, this can be done by partitioning the cluster. The approach is to first divide the whole cluster into partitions by tenant: as shown in the following figure, all Nodes in the same partition carry the same label, for example tenant a (tenanta) uses the label partition=tenanta and tenant b (tenantb) uses partition=tenantb. When scheduling a Pod, we can then use the nodeSelector attribute to specify the target Node label. For example, the following snippet indicates that the Pod must be scheduled onto a Node in tenant a's partition:

nodeSelector:
  partition: tenanta
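
Putting the two halves together, a minimal sketch could first label the Nodes of tenant a's partition and then reference that label from a Pod spec (the node name node1 and the Pod below are hypothetical examples):

kubectl label node node1 partition=tenanta

apiVersion: v1
kind: Pod
metadata:
  name: tenanta-app
spec:
  nodeSelector:
    partition: tenanta    # only Nodes carrying this label are scheduling candidates
  containers:
  - name: app
    image: nginx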

Kubernetes partitions and tenants can be mapped to each other in various ways. The one-partition-per-tenant design above is a typical one; tenants can also be divided into major customers and ordinary customers, where each major customer gets its own resource partition while ordinary customers share a partition in groups of N.

Domain Model Design of PaaS platform

We know that an application in a micro-service architecture is usually composed of multiple micro-services, and a Kubernetes cluster usually hosts multiple independent applications. So if we use Kubernetes to model micro-service applications, the PaaS platform's domain model needs a domain object called Application. An Application contains multiple micro-services, and the corresponding Pod, Deployment and Service objects are generated when it is published (deployed), as shown in the following figure.

The following is a more detailed domain model diagram. Kubernetes Node and Namespace are modeled as K8sNode and TenantNS respectively, and partitions are modeled as ResPartition objects; each partition can contain 1 to N TenantNS, that is, one or more tenants (Tenant). Each tenant contains a number of user accounts (User) that define and maintain the tenant's applications (Application). To separate permissions, user groups (User Group) can be used, and a standard role-based permission model can be added on top.

The Service domain object in the figure above is not the Kubernetes Service but a "composite structure" that includes a Kubernetes Service and its related RC/Deployment. The Service domain object holds only the necessary attributes, and the corresponding Kubernetes Service and RC/Deployment instances are generated when the application is deployed. The following figure shows the definition interface (prototype) of such a Service.

After defining an Application through the interface, we can publish it. Publishing first selects a partition, then the program calls the Kubernetes API to create all the Kubernetes resource objects related to this Application, and finally queries the status of the Pods to determine whether the release succeeded and, if not, the specific reason for failure. The following is a schematic design of the key modules involved in taking an Application from definition to release.

We know that Kubernetes is a micro-service architecture platform based on container technology, and the binaries of each micro-service are packaged into a standard Docker image. So the first step in an application's lifecycle management is packaging source code into a Docker image, and this process is easy to automate: we can implement it ourselves or rely on mature third-party open-source projects, Jenkins being the recommended choice here. The following figure is a schematic diagram of an image packaging pipeline implemented with Jenkins. Given the power of Jenkins and its wide user base, many PaaS platforms integrate Jenkins to implement application lifecycle management.

Basic Middleware of PaaS platform

A complete PaaS platform must integrate and provide some common middleware to support application development and hosted operation. The first type of important basic middleware is ZooKeeper. ZooKeeper is very easy to deploy to a Kubernetes cluster, and there is a reference YAML file in the Kubernetes GitHub repository. Besides being used directly by applications, ZooKeeper can also serve as a basic building block of the "centralized configuration service" that the PaaS platform provides for applications, as shown in the following figure.

In addition, considering that many open-source distributed systems use ZooKeeper to manage their clusters, we can also deploy a standard, well-known ZooKeeper Service to be shared among these clusters.

The second type of important middleware is caching middleware, such as the Redis and Memcached mentioned earlier, which can also be easily deployed to a Kubernetes cluster and offered as basic services to third-party applications. Among the Kubernetes getting-started examples there is a GuestBook example that demonstrates how a PHP page accesses a Redis master-slave cluster, and even a complex Codis cluster can be deployed successfully on Kubernetes. In addition, Red Hat's J2EE in-memory caching middleware Infinispan also has a documented Kubernetes deployment case.

The third type of important middleware is message-queue middleware. Whether it is the classic ActiveMQ and RabbitMQ or the newer Kafka, these message brokers can also be easily deployed as services in a Kubernetes cluster. The following figure is a schematic model of a 3-node RabbitMQ cluster on the Kubernetes platform. To form the RabbitMQ cluster we define three Pods, each fronted by its own Kubernetes Service, which map to three RabbitMQ server instances. In addition, we define a separate Service named ku8-rabbit-mq-server that serves clients and selects all three Pods, so each Pod carries two labels.
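
The figure itself is not reproduced, but the two-label idea can be sketched as follows: each Pod carries a per-node label plus a shared cluster label, so a per-node Service selects exactly one Pod while the shared ku8-rabbit-mq-server Service selects all three. The label names and the per-node Service name below are assumptions for illustration only:

apiVersion: v1
kind: Service
metadata:
  name: ku8-rabbitmq-node-1       # per-node Service, one of three (assumed name)
spec:
  selector:
    node: rabbitmq-node-1         # assumed per-node label, matches exactly one Pod
  ports:
  - port: 5672
---
apiVersion: v1
kind: Service
metadata:
  name: ku8-rabbit-mq-server      # shared Service named in the text
spec:
  selector:
    app: rabbitmq                 # assumed shared label carried by all three Pods
  ports:
  - port: 5672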

The fourth type of important middleware is distributed storage middleware. At present, the block storage services provided by Ceph clusters and the distributed file storage services provided by GlusterFS can both be used in Kubernetes clusters; GlusterFS is recommended by Red Hat's OpenShift platform as the standard file storage system. The following is a schematic diagram of this scheme.

In Red Hat's scheme, the GlusterFS cluster is deployed on a separate group of servers, which suits scenarios with a large cluster size and high performance and storage requirements. When machines are limited, we can also treat each Node of the Kubernetes cluster as a GlusterFS storage node and deploy GlusterFS into the Kubernetes cluster using a DaemonSet. The specific deployment method is described in detail on the Kubernetes GitHub site, and running the GlusterFS cluster as Pods also makes its deployment and operation very simple.

Elasticsearch clusters providing full-text search can also be deployed to Kubernetes with ease. The ELK stack for log collection and query analysis mentioned earlier is likewise usually deployed as Pods, providing unified collection, querying and analysis of Kubernetes cluster logs and user application logs.

In the currently hot field of big data, many systems can also be deployed to Kubernetes clusters in containerized form, such as Hadoop, HBase, Spark and Storm. The next section gives a modeling scheme for Storm on Kubernetes, deploys it to the Kubernetes cluster, and finally submits the WordCountTopology job from Chapter 6 and observes the results.

Storm on Kubernetes in practice

From Chapter 6 we know that a Storm cluster is composed of ZooKeeper, a Nimbus (master) node and a number of Supervisor (slave) nodes. The cluster's configuration file is conf/storm.yaml by default, and its most important configuration parameters are as follows.

storm.zookeeper.servers: the list of IP addresses of the nodes in the ZooKeeper cluster.

nimbus.seeds: the IP address of Nimbus.

supervisor.slots.ports: the list of Worker listening ports on a Supervisor.

Given the key configuration above and the way a Storm cluster works, we first need to model ZooKeeper as a Kubernetes Service so that it has a fixed domain name that Nimbus and the Supervisors can reach. The following is the modeling process for ZooKeeper (for simplicity we model only one ZooKeeper node).

First, define the Service corresponding to ZooKeeper, named ku8-zookeeper, whose associated Pods carry the label app=ku8-zookeeper:

apiVersion: v1
kind: Service
metadata:
  name: ku8-zookeeper
spec:
  ports:
  - name: client
    port: 2181
  selector:
    app: ku8-zookeeper

Second, define the RC corresponding to ZooKeeper:

apiVersion: v1
kind: ReplicationController
metadata:
  name: ku8-zookeeper-1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: ku8-zookeeper
    spec:
      containers:
      - name: server
        image: jplock/zookeeper
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 2181

Next, we model Nimbus as a Kubernetes Service, because Storm clients need to access the Nimbus service directly to submit topology jobs, which is why conf/storm.yaml has the nimbus.seeds parameter. Since Nimbus provides its Thrift-based RPC service on port 6627, the Nimbus Service is defined as follows:

apiVersion: v1
kind: Service
metadata:
  name: nimbus
spec:
  selector:
    app: storm-nimbus
  ports:
  - name: nimbus-rpc
    port: 6627
    targetPort: 6627

Since the storm.yaml configuration file contains many parameters, and we want any of them to be configurable, we can use a Kubernetes ConfigMap resource object to hold storm.yaml and map it into the Pod instances of the Nimbus (and Supervisor) nodes. The following is the storm.yaml file (storm-conf.yaml) used in this case:

storm.zookeeper.servers: [ku8-zookeeper]
nimbus.seeds: [nimbus]
storm.log.dir: "log"
storm.local.dir: "storm-data"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703

To create the above configuration file as the corresponding ConfigMap (storm-config), execute the following command:

kubectl create configmap storm-config --from-file=storm-conf.yaml
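
To confirm that the configuration really ended up in the ConfigMap before it is mounted anywhere, it can be inspected with standard kubectl queries (a simple verification step, not part of the original walkthrough):

kubectl get configmap storm-config -o yaml    # print the stored storm-conf.yaml content
kubectl describe configmap storm-config       # summary view of the same object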

The storm-config ConfigMap can then be mounted as a Volume by any Pod at any specified path inside the container. Next, we model the Pod corresponding to the Nimbus service. After searching and comparing the Storm images on Docker Hub, we chose the official storm:1.0 image. Compared with other Storm images, the officially maintained image has the following advantages.

The Storm version is recent.

Storm ships as a single image, and the container's command parameter determines which type of node is started: the Nimbus master node, the Nimbus UI manager, or a Supervisor slave node.

The standardized Storm startup process allows the conf/storm.yaml configuration file to be mapped in from outside the container, so Kubernetes's ConfigMap feature can be used.

The YAML file that defines the Nimbus Pod using the storm:1.0 image is as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nimbus
  labels:
    app: storm-nimbus
spec:
  volumes:
  - name: config-volume
    configMap:
      name: storm-config
      items:
      - key: storm-conf.yaml
        path: storm.yaml
  containers:
  - name: nimbus
    image: storm:1.0
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 6627
    command: ["storm", "nimbus"]
    volumeMounts:
    - name: config-volume
      mountPath: /conf
  restartPolicy: Always

Two details deserve attention here. The first is how the ConfigMap is used: the previously defined ConfigMap, storm-config, is mapped to a Volume of the Pod, and that Volume is then mounted at a specific path inside the container. The second is the container's command parameter: command: ["storm", "nimbus"] above means the container starts the nimbus process.

Similarly, we define the storm-ui service, a web management console that provides graphical Storm administration. Because we need to access it from outside the Kubernetes cluster, we map port 8080 to port 30010 on the host through NodePort. The YAML definition of the storm-ui service is as follows:

apiVersion: v1
kind: Service
metadata:
  name: storm-ui
spec:
  type: NodePort
  selector:
    app: storm-ui
  ports:
  - name: web
    port: 8080
    targetPort: 8080
    nodePort: 30010

Finally, let's model the Supervisors. At first glance a Supervisor does not need to be modeled as a Service because nothing calls it actively, but in fact Supervisor nodes communicate with each other, so the addresses that Supervisor nodes register with ZooKeeper must be mutually reachable. There are two ways to solve this problem on the Kubernetes platform.

In the first way, when a Supervisor node registers with ZooKeeper it uses the Pod's IP address rather than its hostname (the Pod name).

In the second way, the Headless Service mode is used: each Supervisor node is modeled as a Headless Service, and the container name (hostname) of the Supervisor node is kept identical to the Headless Service name, so that the address a Supervisor registers with ZooKeeper equals the Headless Service name and the Supervisor nodes can reach each other via these Headless Service domain names.

The first method requires modifying the Supervisor's startup script and parameters, which is troublesome, while the second method can be implemented without modifying the image, so we model with the second way. The following is the Service definition of a Supervisor node; note the special setting clusterIP: None:

apiVersion: v1
kind: Service
metadata:
  name: storm-supervisor
spec:
  clusterIP: None
  selector:
    app: storm-supervisor
  ports:
  - port: 8000

The Pod corresponding to the storm-supervisor node is defined as follows. Note that the Pod's name is storm-supervisor and the value of command is ["storm", "supervisor"]:

apiVersion: v1
kind: Pod
metadata:
  name: storm-supervisor
  labels:
    app: storm-supervisor
spec:
  volumes:
  - name: config-volume
    configMap:
      name: storm-config
      items:
      - key: storm-conf.yaml
        path: storm.yaml
  containers:
  - name: storm-supervisor
    image: storm:1.0
    imagePullPolicy: IfNotPresent
    command: ["storm", "supervisor"]
    volumeMounts:
    - name: config-volume
      mountPath: /conf
  restartPolicy: Always

We can define multiple Supervisor nodes; in this case we define two. After successful deployment to the Kubernetes cluster, we open the Storm management interface through the Storm UI on port 30010 and see the following screen.

The screenshot below verifies that both Supervisor nodes have successfully registered with the cluster. Each node has four slots, which is consistent with our configuration in storm.yaml.

At this point, the modeling and deployment of the Storm cluster on Kubernetes are complete. Next, let's look at how to submit the WordCountTopology job we studied earlier to this Storm cluster and observe how it runs.

First we download the compiled WordCountTopology JAR file, storm-starter-topologies-1.0.3.jar, from https://jar-download.com/, and then submit the topology job to the Storm cluster through the Storm client tool. The command to submit the job is as follows:

storm jar /userlib/storm-starter-topologies-1.0.3.jar org.apache.storm.starter.WordCountTopology topology

Since the Storm client tool is already included in the storm:1.0 image, the easiest way is to define a Pod, map the downloaded storm-starter-topologies-1.0.3.jar as a Volume into the Pod's /userlib/ directory, and set the container's startup command to the submit command above. Here is the YAML definition of this Pod:

apiVersion: v1
kind: Pod
metadata:
  name: storm-topo-example
spec:
  volumes:
  - name: user-lib
    hostPath:
      path: /root/storm
  - name: config-volume
    configMap:
      name: storm-config
      items:
      - key: storm-conf.yaml
        path: storm.yaml
  containers:
  - name: storm-topo-example
    image: storm:1.0
    imagePullPolicy: IfNotPresent
    command: ["storm", "jar", "/userlib/storm-starter-topologies-1.0.3.jar", "org.apache.storm.starter.WordCountTopology", "topology"]
    volumeMounts:
    - name: config-volume
      mountPath: /conf
    - name: user-lib
      mountPath: /userlib
  restartPolicy: Never

The above definition has the following key points.

Place storm-starter-topologies-1.0.3.jar in the /root/storm directory on the host.

The container's startup command runs the Storm client, which submits the topology job.

The Pod's restart policy is Never, because the Pod only needs to run once to submit the topology job.

After creating the above Pod, we check the Pod's log. If we see the following output, the WordCountTopology job has been submitted to the Storm cluster successfully.
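
A minimal way to do this check from the command line, using the Pod name defined above, is:

kubectl logs storm-topo-example    # print the submission output of the Storm client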

Next, let's open the Storm UI to see how the job is executing. The following figure is a summary of WordCountTopology: its status is Active, it has been running for 8 minutes, it occupies 3 Worker processes and runs 28 Tasks in total.

After the job has been submitted to the Storm cluster, we can go to a Supervisor node (Pod) to view the topology job's log output, which is under the directory /log/workers-artifacts; each topology job has its own folder for its logs. Searching the logs of WordCountTopology's last Bolt, the one that counts words and emits Tuples, we can see the following result: each word is counted and output.

The following screen shows the details of WordCountTopology: information about all the Spouts in the topology, such as how many Tasks were created, how many Tuples were emitted and how many failed, and information about all the Bolts, such as the number of Tuples processed and the processing latency. These statistics help us analyze the performance bottlenecks of a topology job and the opportunities for improvement.

In addition to the tabular information above, the Storm UI also provides a topology diagram showing how the stream flows. As shown in the following figure, the data flow originates at the spout node, takes 3.13 ms to be processed by the split node, and then arrives at the count node, whose processing time is 0.06 ms.

Once a Storm topology job is running it does not stop on its own, so the Tuple statistics in the following screen keep increasing, because the Spout node of WordCountTopology keeps generating Tuples. If we need to stop the job, we can click the Deactivate button in the figure to suspend it, or kill it to terminate it.

At this point, you should have a deeper understanding of the Kubernetes micro-service platform. The best next step is to try the examples above in practice.
