Dictated by / Authors:
Jiang Jiang, Head of Platform Architecture, New Oriental Information Management Department
Chen Boxuan, Senior Operations Engineer, New Oriental Information Management Department
Edited by:
Rancher Labs
In 2017, New Oriental began exploring containerization as a way to deliver middleware as a service. In 2019, building on its earlier Rancher 1.6-based ES service, New Oriental expanded the effort and moved to running Kafka, ES, and Redis on Kubernetes. Containerizing middleware as a service has markedly improved the operations team's efficiency and greatly shortened the software development cycle. This article shares New Oriental's experience with middleware servitization.
From kindergarten and primary school through middle school, university, and study abroad, New Oriental covers almost every field of education. Our educational product line is very long and complex. So what IT capability supports such a long product line? New Oriental Cloud.
We currently have 16 cloud data centers, including self-built and leased IDCs, and we connect directly to Aliyun and Tencent Cloud through cloud networking, forming a hybrid cloud architecture that spans multiple providers. New Oriental's cloud estate is quite unusual: you can see relatively traditional pieces in it, such as SQL Server, Windows, and counter-service programs, alongside newer things such as TiDB, containers, and microservices. You will also find more Internet-style applications such as dual-teacher video classrooms and interactive live streaming. An enterprise's IT architecture is closely tied to its business stage, and New Oriental, like thousands of enterprises moving from traditional business to "Internet +", is at a critical stage of digital transformation.
Next, let's talk about containerization at New Oriental. We have been working with containers for years. In 2016, New Oriental tried some business workloads on Docker Swarm, with results that were not ideal. 2017 was a year of change in container orchestration; we chose Rancher's Cattle engine to start our own containerization effort while waiting to see how the industry would settle. By 2018, our container platform had evolved again and we finally moved fully to Kubernetes (K8s).
So how does New Oriental view K8s? We see K8s as the middle layer between the PaaS layer and the IaaS layer: it defines interfaces and specifications toward the IaaS layer below and the PaaS layer above, but does not itself implement those functions. K8s alone is not enough to build a complete container cloud; other open-source components have to be introduced to fill the gaps.
As the figure above shows, we supplement the K8s ecosystem with various open-source components and combine them into New Oriental's current container cloud platform.
Our container runtime is Docker and the host operating system is Ubuntu. For the K8s network we chose Canal, combined with Mellanox NIC acceleration. Rancher 2.0 serves as our K8s management platform: it provides multi-tenant management, visualization, permission integration with our AD domain, and other important functions, sparing us a great deal of backend integration work and giving us a stable graphical management platform while saving manpower.
Let's look at our K8s practice. As the figure above shows, we run the unmodified community version of K8s, deployed as a three-node HA control plane with kubeadm behind an Nginx stream (TCP) load balancer.
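As a rough sketch of that HA layout (the load-balancer address and pod subnet below are placeholders, not our real values), the kubeadm configuration simply points controlPlaneEndpoint at the Nginx stream load balancer that forwards TCP 6443 to the three masters:

```yaml
# kubeadm-config.yaml -- illustrative only
apiVersion: kubeadm.k8s.io/v1beta1     # config schema used by the kubeadm 1.14 line
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
# All control-plane traffic goes through the Nginx stream (TCP) load balancer,
# which forwards port 6443 to the apiservers on the three master nodes.
controlPlaneEndpoint: "apiserver-lb.example.internal:6443"
networking:
  podSubnet: "10.244.0.0/16"           # a common default for Flannel/Canal setups
```

On the first master this would be applied with `kubeadm init --config kubeadm-config.yaml --experimental-upload-certs`, and the other two masters would join with `kubeadm join ... --experimental-control-plane` (flag names as in the 1.14 release of kubeadm).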
Key cluster components, such as the Ingress controller, run in host network mode; this reduces networking overhead and gives better performance. The overlay container network for the upper-layer applications is built with Flannel.
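A minimal sketch of what such a host-network component might look like, assuming an NGINX ingress controller pinned to dedicated ingress nodes (the namespace, node label, and image tag are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-ingress-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: nginx-ingress
  template:
    metadata:
      labels:
        app: nginx-ingress
    spec:
      hostNetwork: true                 # bind 80/443 directly on the node, bypassing the overlay
      dnsPolicy: ClusterFirstWithHostNet
      nodeSelector:
        role: ingress                   # only run on nodes labeled as ingress nodes
      containers:
      - name: controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1
        args:
        - /nginx-ingress-controller
```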
Using containers inevitably involves image management. New Oriental is an early Harbor user; we have run Harbor since version 1.2, with Ceph object storage as the backend. We are currently trialing image distribution with Dragonfly, Aliyun's open-source project. It turns north-south download traffic into east-west traffic, letting nodes replicate images between themselves, which relieves the load that image pulls put on the Harbor server when the cluster is very large.
Our K8s cluster runs entirely on physical machines. As the cluster grows, the number of physical machines grows too, so we have to use bare-metal management software to keep our operations costs down.
Here we use MAAS from Ubuntu, a bare-metal provisioning platform that can take a physical machine with no operating system and install it from a template into a standardized machine. We then initialize the node with Ansible playbooks, turning it into the kind of machine we need, and add it to the cluster.
As the figure above shows, a standard physical machine becomes a TiDB node by applying the TiDB role, and a K8s node by applying the K8s role. Every role also pushes Osquery and Filebeat to the machine; they collect machine information and report it to our CMDB for asset management.
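A minimal sketch of that role-based initialization, assuming hypothetical inventory group and role names (the real playbooks are of course larger):

```yaml
# node-init.yml -- illustrative only
- hosts: k8s_workers
  become: true
  roles:
    - common        # base OS settings on the standard MAAS-provisioned machine
    - osquery       # collects machine information and reports it to the CMDB
    - filebeat      # ships host logs
    - k8s-node      # installs Docker/kubelet and joins the machine to the K8s cluster

- hosts: tidb_nodes
  become: true
  roles:
    - common
    - osquery
    - filebeat
    - tidb          # turns the standard machine into a TiDB node instead
```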
Our CI/CD is split by business line: some teams integrate directly with New Oriental's own Jenkins, while others use Rancher's Pipeline feature for CI/CD.
For cluster monitoring, we now use the open-source Prometheus Operator. We started with plain Prometheus, but configuring target discovery and alert rules natively was particularly troublesome.
After introducing the Operator, the configuration process became much simpler and easier to use; we recommend it.
It is worth mentioning that cluster monitoring in Rancher 2.2 and later is also based on the Prometheus Operator; if you are interested, try out the new Rancher version.
Our logging is organized at two levels. Business logs are collected by Filebeat running as a sidecar, shipped into a Kafka cluster, and then consumed into ES by Logstash; this buffering reduces the load on ES.
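A minimal sketch of that business-log path, assuming the application writes its logs to a shared volume and the Filebeat sidecar ships them to Kafka (image names, paths, and the topic name are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  volumes:
  - name: app-logs
    emptyDir: {}                         # shared log directory between the two containers
  containers:
  - name: app
    image: example/demo-app:latest       # placeholder business image
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: filebeat
    image: docker.elastic.co/beats/filebeat:6.7.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
# Corresponding filebeat.yml (mounted from a ConfigMap, omitted here):
#   filebeat.inputs:
#   - type: log
#     paths: ["/var/log/app/*.log"]
#   output.kafka:
#     hosts: ["kafka-1:9092", "kafka-2:9092"]
#     topic: "business-logs"             # Logstash later consumes this topic into ES
```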
At the cluster level, we use the log collection feature provided by Rancher 2.2, which gathers cluster logs into the ES cluster with Fluentd.
We run five clusters in total: two for online business (one production, one testing); two Platform1 clusters running middleware applications such as ES, Redis, and Kafka (again split into production and testing); and one test cluster. K8s upgrades, new components, and new features are all tried out on that test cluster first.
Attentive readers may have noticed that our clusters run version 1.14.1, which is very new. Why? Because Kubernetes 1.14 ships a very important feature, local persistent volumes (local PV), which reached GA in that release. We are very interested in this feature, so we upgraded our clusters all the way to 1.14.
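For reference, a local PV as GA'd in Kubernetes 1.14 is declared roughly like this (the hostname, disk path, and size below are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: kubernetes.io/no-provisioner   # local PVs are statically provisioned
volumeBindingMode: WaitForFirstConsumer     # delay binding until the pod is scheduled
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-node1
spec:
  capacity:
    storage: 500Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-ssd
  local:
    path: /mnt/disks/ssd1                   # a disk that physically lives on this node
  nodeAffinity:                             # ties the volume to the node that owns the disk
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["node1"]
```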
At present, our business applications fall into two main areas:
The backend services of the Handheld Bubble APP and the New Oriental APP all run on the container cloud.
Service-oriented middleware, such as cluster-level Kafka, Redis, and ES services, also runs on our container cloud.
Why middleware as a service?
So why should we turn middleware into services?
In our view, middleware such as ES, message queues, and Redis caches share several traits; like the monster in the picture, they are all very large.
Let me make a comparison. A typical business virtual machine is 4C8G; ten of them add up to 40C/80G. By contrast, can 40C/80G run a single Elasticsearch node? It would struggle: in real production, a high-throughput ES node generally needs more than 100 GB of memory. This example shows that a single middleware workload consumes a very large amount of resources.
On top of that, middleware is used everywhere: practically every application uses Redis, MQ, and similar components, and a standalone deployment of any one of them occupies several virtual machines. Every project wants its own "small stove", an environment all to itself. These private environments consume more resources, and with the inevitable spread of middleware versions and configurations, we would have to hire many people just to maintain middleware. That is a big problem.
Of course, if a company has only a dozen or so projects in total, it can get by on virtual machines alone. But New Oriental now has three to four hundred projects, and middleware consumes a lot of resources; running it all on virtual machines would still be very expensive.
So how do we solve this problem? We fire three arrows: containerization, automation, and servitization.
Containerization is the easiest to understand. The assorted configurations I just mentioned are unified through standard containers: you must follow our standard. We deploy the containers directly onto physical machines for better performance and elasticity.
The step after containerization is automation, or more precisely, codification: expressing the infrastructure as code and managing it iteratively. We use Helm and Ansible for codification and automation.
With the first two steps in place, we can move to the third. Merely imposing our management norms and best practices on everyone would not achieve much; the simplest approach is to expose everything as a service and let everyone consume our services.
We gradually merge the "small stoves" into one "big pot", smoothing peaks and filling troughs and avoiding wasted resources. Every company has a few super-VIP projects; for these, we carve a separate "small stove" out of the big pot: the same shared mechanism, but with dedicated resource isolation and permission isolation.
Before servitization, operations staff were seen mostly as labor supplied to projects: very busy every day, with little visible achievement. After servitization, that labor turns into building the service platform, empowering front-line staff and letting second-line staff do more meaningful things.
Our practice: ELK/ES
Next, we will walk through how New Oriental orchestrates each piece of middleware.
Elastic has a product called ECE, the industry's first containerized management platform for ES. ECE is implemented on K8s 1.7 (possibly 1.8) and delivers ES instances of various versions to users as containers on physical machines. But it has a limitation: it can only manage ES, not other middleware.
This inspired us: could we imitate it and build our own service platform with Rancher + Docker? That is how the first version of our platform was born, using Rancher 1.6 to manage ELK.
The figure above shows our ELK cluster, which currently spans Rancher 1.6 and K8s and is in the middle of migrating to K8s.
We have two versions of the ELK orchestration: one for the UAT environment and one for production. In UAT we use the Rook (Ceph) scheme, and the ES nodes start as a StatefulSet. The advantage is that storage and compute are completely separated: whichever node fails, the pod can drift to any other node.
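A minimal sketch of that UAT layout, assuming a Rook-provisioned block storage class (the storage class name, image tag, and sizes are assumptions):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-data-uat
spec:
  serviceName: es-data-uat
  replicas: 3
  selector:
    matchLabels:
      app: es-data-uat
  template:
    metadata:
      labels:
        app: es-data-uat
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.7.0
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:                  # each ES pod gets its own RBD-backed volume
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: rook-ceph-block  # block storage provided by Rook/Ceph
      resources:
        requests:
          storage: 200Gi
```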
Production is different: we make each ES node a separate Deployment and do not allow it to drift, using taints and labels to pin the Deployment to a specific host. The pod's storage no longer uses RBD but is written directly to local disk via hostPath, and the pod uses the host network for the best performance.
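A sketch of that production layout, with one pinned Deployment per ES node (the labels, toleration, paths, and image tag below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es-data-01
spec:
  replicas: 1
  selector:
    matchLabels:
      app: es-data-01
  template:
    metadata:
      labels:
        app: es-data-01
    spec:
      hostNetwork: true                  # best network performance, no overlay hop
      nodeSelector:
        es-node: data-01                 # label that pins this pod to one physical machine
      tolerations:
      - key: dedicated                   # matches the taint placed on ES hosts
        operator: Equal
        value: elasticsearch
        effect: NoSchedule
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:6.7.0
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      volumes:
      - name: data
        hostPath:
          path: /data/es                 # local disk on the pinned host
          type: DirectoryOrCreate
```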
What if a node dies? It waits to be revived in place: we restart the machine, swap the disk, or replace hardware. What if it cannot be revived? Our machine management takes over: the machine is retired, a fresh one is pulled from the pool and brought online, and ES's replication copies the data back onto it.
You may wonder why we keep two schemes, and why the production orchestration still looks so crude.
We believe the most minimal architecture is the most beautiful: the fewer components in the middle, the fewer failure points and the more reliable the system. Local disk outperforms RBD, and the host network outperforms the K8s network stack. Most importantly, all the middleware we orchestrate is itself distributed (or has a built-in HA architecture) with its own replication mechanism, so we do not need to rely on protection at the K8s layer at all.
We also compared the two schemes experimentally. When a node dies, restarting in place takes far less time than drifting, and RBD occasionally has problems during a drift. The chance of a physical node failing completely is still very small, so in the end we chose the slightly more conservative scheme for the production environment.
Our practice: Redis
Our Redis currently runs the Sentinel scheme, again orchestrated as Deployments pinned to specific nodes. Our Redis does no persistence at all and is used purely as a cache. That creates a problem: if the master dies, K8s restarts it immediately, which is almost certainly faster than Sentinel can detect the failure. When it comes back up it is still the master, but an empty one, and the data on all the slaves is then wiped, which is unacceptable.
We did a lot of research and, following Ctrip's practice, we start Redis with Supervisord when the container starts. Even if the Redis process inside the pod dies, the pod is not restarted immediately, which gives Sentinel enough time to fail over from master to slave; the cluster is then restored through human intervention.
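A minimal sketch of that Supervisord approach, assuming a custom image that bundles Redis with Supervisord (the image name and the config shown in comments are assumptions): because Supervisord stays alive as the container's main process, a dead redis-server does not kill the pod, and Sentinel has time to promote a slave.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-master
  template:
    metadata:
      labels:
        app: redis-master
    spec:
      hostNetwork: true
      nodeSelector:
        redis-node: "true"                     # pin to the labelled/tainted Redis hosts
      containers:
      - name: redis
        image: example/redis-supervisord:5.0   # custom image bundling redis-server + supervisord
        command: ["supervisord", "-n", "-c", "/etc/supervisord.conf"]
        # /etc/supervisord.conf (baked into the image):
        #   [program:redis]
        #   command=redis-server /etc/redis/redis.conf
        #   autorestart=false   ; do not auto-restart; let Sentinel fail over first
```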
For Redis tuning, we bind CPUs for each Redis instance. Redis processes are sensitive to CPU context switches and NIC soft interrupts, so we restrict which nodes Redis instances land on and taint those nodes. We pin all the processes the operating system needs to the first N CPUs and leave the remaining CPUs to Redis; when Redis starts, each process is mapped one-to-one to a CPU for better performance.
Our practice: Kafka
As we all know, Kafka is a high-throughput distributed publish-subscribe messaging system. Compared with other middleware, it offers high throughput, data persistence, a distributed architecture, and so on.
So, how does New Oriental use Kafka? Are there any special requirements for Kafka clusters?
We group clusters by business application scenario into three categories; these include using Kafka as the message queue for transaction systems and using Kafka as middleware for business logs.
To serve these scenarios, our Kafka must meet security requirements: transaction data, for example, cannot be transmitted in plaintext, so it must be encrypted in transit.
Next, let's talk about Kafka's native security and encryption: how do we set it up, and how did we choose?
Outside the financial industry, most Kafka users do not enable its security protocols. Without them, Kafka cluster performance is excellent, but that clearly does not meet New Oriental's requirements for its Kafka clusters, so we turned on data encryption.
Using Kafka's native support, we encrypt the Kafka channel with SSL, turning plaintext transport into ciphertext; we authenticate users with SASL; and we control user permissions with ACLs.
Let's briefly compare the two SASL authentication mechanisms. With SASL_PLAIN, usernames and passwords are written in plaintext in a JAAS file, which is loaded into the Kafka process via startup parameters; when a Kafka client connects to the broker, it authenticates against the users defined in that JAAS file.
SASL_GSSAPI is based on Kerberos and a KDC. Anyone familiar with AD domains knows Kerberos, since AD also uses it: the client requests tickets from the KDC server and interacts with it to complete user authentication.
Each method has its pros and cons. In the end, New Oriental chose SASL_PLAIN, for a simple reason: it spares us from maintaining a separate KDC service and lowers the operational deployment cost. But this method has a problem: because Kafka usernames and passwords are loaded when the process starts, any change to the file, such as adding a user or changing a password, requires restarting the Kafka cluster.
Restarting the Kafka cluster inevitably affects the business, which is unacceptable. So we take a more flexible approach: we pre-provision a total of 150 users in the JAAS file, grouped by permission, and the administrator assigns different users to projects. This avoids the embarrassment of having to restart the cluster whenever a project is added.
As shown in the figure above, the Kafka cluster exposes two ports: a SASL_SSL port with both user authentication and SSL encryption, and a SASL_PLAINTEXT port with user authentication only and no SSL. Clients connecting to Kafka choose whichever port suits their needs.
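A rough sketch of the broker-side configuration behind those two ports, shown here as a ConfigMap (hostnames, passwords, and the two example users are placeholders; the real JAAS file holds the 150 pre-provisioned users):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-broker-config
data:
  server.properties: |
    # Two listeners: SASL over SSL (encrypted) and SASL over plaintext (auth only)
    listeners=SASL_SSL://:9093,SASL_PLAINTEXT://:9092
    advertised.listeners=SASL_SSL://kafka-1.example.internal:9093,SASL_PLAINTEXT://kafka-1.example.internal:9092
    sasl.enabled.mechanisms=PLAIN
    sasl.mechanism.inter.broker.protocol=PLAIN
    security.inter.broker.protocol=SASL_SSL
    ssl.keystore.location=/etc/kafka/ssl/kafka.server.keystore.jks
    ssl.keystore.password=changeit
    authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
  kafka_server_jaas.conf: |
    KafkaServer {
      org.apache.kafka.common.security.plain.PlainLoginModule required
      username="admin"
      password="admin-secret"
      user_admin="admin-secret"
      user_project001="project001-secret"
      user_project002="project002-secret";
    };
```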
Having covered the architecture, let's talk about the Kafka orchestration. Our Kafka and ZooKeeper clusters are deployed on the host network, and the data volumes land on the local physical disks via hostPath for better performance.
Both Kafka and ZK are deployed as individual Deployments pinned to their nodes; even if something goes wrong, we restart them on the original machine and prevent the containers from migrating at will.
For monitoring, we use the exporter + Prometheus scheme, running on the overlay container network.
Our practice: service-oriented platform
Our idea for the service platform was simple: don't reinvent the wheel; make the best use of the existing technology stack by combining Helm, Ansible, and K8s.
Take Kafka as an example. Ansible generates a Helm chart for the target environment: items such as the SSL certificate and the embedded user configuration are produced by Ansible from the user's input and inserted into the Helm chart, and Helm then creates the corresponding instance from that chart.
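A minimal sketch of that Ansible-drives-Helm flow (the playbook, template, chart path, and release names are hypothetical, and the Helm 2 style `--name` flag is assumed since the platform predates Helm 3):

```yaml
- hosts: localhost
  vars:
    project: project001
    kafka_version: "2.2.0"
  tasks:
    - name: Generate per-project values (SSL certs, pre-provisioned users, ports)
      template:
        src: kafka-values.yaml.j2          # rendered from the input collected by the platform
        dest: "/tmp/{{ project }}-values.yaml"

    - name: Create the Kafka instance from the chart
      command: >
        helm install ./charts/kafka
        --name {{ project }}-kafka
        --namespace {{ project }}
        -f /tmp/{{ project }}-values.yaml
```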
Here are some screenshots of our platform 1.0 Demo.
This is cluster management; the different cluster entries deploy to different clusters and maintain their state.
The steps for requesting a service are shown above. The whole flow is very simple: just select the cluster and the desired version.
In the management interface you can see your IP, the access endpoint, and the port used by your instance (the port is allocated automatically by the platform). For an SSL connection you can also download your certificate directly from the page. We will later connect the cluster's logs to the platform as well.
The backend is still quite complex; it is built on Ansible's AWX platform. Here you can see that creating a cluster actually requires many inputs, but these are generated for the user from the frontend interface.
This is a fully deployed Kafka cluster, including ZooKeeper, Kafka, the exporters for monitoring, and so on. We also configure a Kafka Manager for each cluster, a graphical management console from which you can administer Kafka directly.
Monitoring and alerting are essential. We preset alert rules based on the Prometheus Operator, for example whether a topic has consumer lag. When a cluster is created, the Operator automatically discovers its endpoints, that is, the exporters we just saw, and adds the alerts; no manual wiring is needed at all.
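A minimal sketch of such a preset rule as a PrometheusRule resource; the lag metric name assumes the commonly used kafka_exporter, and the selector labels and threshold depend on how the Prometheus custom resource is configured:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-lag-rules
  labels:
    prometheus: k8s              # must match the ruleSelector of the Prometheus CR
    role: alert-rules
spec:
  groups:
  - name: kafka.rules
    rules:
    - alert: KafkaConsumerLagHigh
      expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 10000
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Consumer group {{ $labels.consumergroup }} is lagging on {{ $labels.topic }}"
```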
We also generate a visualization dashboard for each project; when you need to check monitoring, just log in to Grafana and view it.
The figure above shows a simple load-test result: 512K messages with SSL + ACL enabled, five partitions and three replicas, reaching roughly one million messages per second; the setup was five containers with 16C and 140 GB of memory on SSD disks. We found that as message size grows, so does throughput.
Outlook for the service platform
I have just described some of our work this year; what do we want to do next year?
Starting in fiscal year 2020, New Oriental plans to offer Redis, ES, and other services in the same servitized way, and eventually to integrate the exposed APIs into a cloud portal for users within the group and for third-party systems to call.
Another thing worth mentioning is the Operator pattern. Last week Elastic released a new project called ECK, which is ES's official Operator.
With an Operator, you simply submit a CRD and the Operator automatically generates the cluster you need.
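As an illustration, an ECK cluster is described entirely by a custom resource along these lines; the field names follow the early (alpha) ECK releases and may differ in later versions:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: demo-es
spec:
  version: 7.2.0        # desired ES version; the Operator handles the rest
  nodes:
  - nodeCount: 3
    config:
      node.master: true
      node.data: true
```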
We believe the Helm-based approach greatly simplifies the YAML work, but it is not the end point; we believe the end point is the Operator.
Editor's note:
This article is based on the speeches given by Jiang Jiang and Chen Boxuan at the third Enterprise Container Innovation Conference (ECIC), held by Rancher in Beijing on June 20, 2019. This year's ECIC was large in scale, with 17 keynote speeches over the day, attracting nearly 1,000 container technology enthusiasts on site and more than 10,000 viewers online. You can read more transcripts of the conference speeches on the Rancher official WeChat account.