NetEase deploys operation and maintenance in OpenStack

2025-01-23 Update From: SLTechnology News&Howtos

Introduction to OpenStack

OpenStack is an open source IaaS implementation, which consists of a number of interrelated subprojects, including computing, storage, and networking.

Because it is released under the Apache license, more than 200 companies have joined the OpenStack project since its inception in 2010, including AT&T, AMD, Cisco, Dell, IBM, Intel, and Red Hat.

Currently, there are 17,000 developers working on the OpenStack project from 139 countries, and the number is growing.

OpenStack is compatible with some AWS interfaces, and to provide more powerful functionality it also offers its own OpenStack-style RESTful API.

Compared with other open source IaaS projects, its loosely coupled architecture, high scalability, distributed design, pure Python implementation, and friendly, active community have made it very popular; the semi-annual summit also attracts developers, vendors, and customers from all over the world.

The main subprojects of OpenStack are:

Compute (Nova), which provides computing virtualization services, is the core of OpenStack and is responsible for managing and creating virtual machines. It is designed to be easy to extend, supports a variety of virtualization technologies, and can be deployed on standard hardware.

Object Storage (Swift) provides object storage service, which is a distributed, scalable, multi-copy storage system.

Block Storage (Cinder) provides block storage services: persistent block-level storage devices for OpenStack virtual machines. It supports a variety of storage backends, including Ceph and EMC.

Networking (Neutron) provides network virtualization services and is a pluggable, extensible, API-driven service.

Dashboard provides a graphical console service that allows users to easily access, use and maintain resources in OpenStack.

Image (Glance) provides image services designed to discover, register, and deliver virtual machine disk images. Multiple backends are supported.

Telemetry (Ceilometer) provides a metering service, through which OpenStack billing can be easily implemented.

Orchestration (Heat) integrates many OpenStack components and, similar to AWS CloudFormation, lets users manage resources through templates.

Database (Trove) is database as a service built on OpenStack.

NetEase private cloud uses four components: Nova, Glance, Keystone and Neutron.

An overview of NetEase's private cloud platform

1. NetEase Private Cloud Architecture

NetEase private cloud platform is developed by NetEase Hangzhou Research Institute, which mainly provides infrastructure resources, data storage processing, application development and deployment, operation and maintenance management and other functions to meet the company's product testing / launch needs.

[Figure: overall architecture of NetEase's private cloud platform]

The whole private cloud platform can be divided into three categories of services: core infrastructure services (IaaS), basic platform services (PaaS), and operation and maintenance management support services. It currently includes 15 services: CVM (virtual machine), cloud network, cloud disk, object storage, object cache, relational database, distributed database, full-text retrieval, message queue, video transcoding, load balancing, container engine, cloud billing, cloud monitoring, and a management platform.

NetEase's private cloud platform makes full use of the latest achievements of open source cloud computing. We have developed and deployed cloud hosts and cloud network services based on the keystone, glance, nova and neutron components of the OpenStack community.

In order to deeply integrate with other services (cloud disk, cloud monitoring, cloud billing, etc.) of NetEase's private cloud platform, and to meet the specific needs of the company's product usage and OPS management, our team independently developed more than 20 new features based on the community OpenStack version, including CVM resource quality assurance (computing, storage, network QoS), image multipart storage, CVM heartbeat reporting, tenant private network isolation in flat-dhcp mode, and so on.

At the same time, our team has also distilled deployment and operation-and-maintenance specifications, as well as upgrade experience, from the day-to-day operation of OpenStack and from upgrading to new community versions.

For more than two years, the OpenStack team behind NetEase's private cloud platform has adhered to the concept of openness and open source, always following the principle of "from the community, give back to the community".

While benefiting from the new features and bug fixes the OpenStack community continuously delivers for free, our team has also actively contributed back, helping the OpenStack community grow.

Over the past two years, our team has submitted nearly 100 commits of new features and bug fixes to the community and reported more than 50 community bugs for repair. These contributions span the Essex, Folsom, Havana, Icehouse, and Juno releases of OpenStack.

Thanks to the increasing stability and maturity of OpenStack, the private cloud platform has been running steadily for more than 2 years, providing services for as many as 30 Internet and game products of NetEase.

From the perspective of application effect, NetEase private cloud platform developed based on OpenStack has achieved the following goals:

It improves the utilization rate of the company's infrastructure resources, thus reducing the hardware cost. Taking physical server CPU utilization as an example, private cloud platforms increase the average CPU utilization from less than 10% to 50%.

The automation level of infrastructure resource management and operation and maintenance is improved, thus the operation and maintenance cost is reduced. With the help of Web self-service resource application and allocation and automatic deployment of cloud platform services, the number of system operation and maintenance personnel has been reduced by 50%.

It improves the flexibility in the use of infrastructure resources, thereby enhancing the ability to adapt to product business fluctuations. Using virtualization technology to turn the physical infrastructure into a virtual resource pool, through effective capacity planning and on-demand use, the private cloud platform can well adapt to product burst business.

Introduction of NetEase's reference Scheme for OpenStack deployment

In the production environment, to strike a balance between performance and reliability, the keystone backend uses MySQL to store user information and memcache to store tokens.

To reduce the access pressure on keystone, the keystoneclient in all services (nova, glance, neutron) is configured to use memcache as the token cache.

Since NetEase's private cloud needs to be deployed in multiple data centers, each of which is geographically isolated naturally, this is a natural disaster recovery method for upper-level applications.

In addition, in order to meet the functions and operation and maintenance requirements of private cloud, NetEase private cloud needs to support two network models: nova-network and neutron.

1. Multi-area deployment method

To meet these requirements, we propose an enterprise-oriented multi-area deployment scheme.

On the whole, the multiple regions are deployed relatively independently but can be interconnected through the intranet. Each region includes a complete OpenStack deployment, so it can use an independent image service and an independent network model. For example, region A can use nova-network while region B uses neutron, without affecting each other. In addition, to achieve single sign-on, keystone is shared across regions. Regions are divided mainly by network model and geographical location.

2. Compute nodes and control-compute nodes

A typical OpenStack deployment divides hardware into compute nodes and control nodes. To make full use of hardware resources, we instead try to make the deployment symmetrical, so that taking any single node offline does not affect the overall service.

Therefore, we divide the hardware into two categories: compute nodes and control-compute nodes. A compute node deploys nova-network, nova-compute, nova-api-metadata, and nova-api-os-compute. A control-compute node deploys, in addition to the compute-node services, nova-scheduler, nova-novncproxy, nova-consoleauth, glance-api, glance-registry, and keystone.

The services that expose a public API are nova-api-os-compute, nova-novncproxy, glance-api, and keystone. These services are stateless and can be scaled out easily, so they are deployed behind the HAProxy load balancer and made highly available with Keepalived. To ensure service quality and ease of maintenance, we do not use the combined nova-api but run nova-api-os-compute and nova-api-metadata separately. For external dependencies, NetEase's private cloud deploys a highly available RabbitMQ cluster, master/slave MySQL, and a memcache cluster.

3. Network planning

In terms of network planning, NetEase's private cloud mainly uses nova-network's FlatDHCPManager + multi-host network model and divides a number of VLANs, used respectively for the virtual machines' fixed-IP network, the internal floating-IP network, and the external network.

An operation and maintenance platform independently developed by NetEase, similar to Nagios but more powerful, is used for monitoring and alerting. The most important alerts come from log monitoring and process monitoring: log monitoring ensures an exception is discovered as soon as it occurs, and process monitoring ensures the services are running normally. In addition, NetEase's private cloud uses Puppet for automated deployment and StackTach to help locate bugs.

Configuration of each component of OpenStack

There are hundreds of configuration items in OpenStack Havana, and most can be left at their default values; just understanding the meaning of so many items is enough to overwhelm operations staff, especially those unfamiliar with the source code. Below are some key configuration items in NetEase's private cloud, with explanations of how they affect the functionality, security, and performance of the service.

1.Nova key configuration

This item is used to generate the iptables rules on the host that forward nova metadata API requests. If it is not configured properly, the EC2/OpenStack metadata cannot be obtained inside the virtual machine through the IP 169.254.169.254; the generated rule is a DNAT from 169.254.169.254:80 to the metadata service address.

Its other use is as the address through which hosts communicate with the destination host during resize, cold migrate, and similar operations. The default value is the host's public network IP address; it is recommended to change it to a private network address to avoid potential security risks.

This item is the IP address the nova-api-metadata service listens on. As the iptables forwarding above shows, it is related to the my_ip configuration item; keeping the two consistent is the most sensible choice.
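The screenshots of the configuration items did not survive extraction, but the three items discussed above are almost certainly `my_ip`, `metadata_host`, and `metadata_listen` in nova.conf; the addresses below are illustrative, not NetEase's:

```ini
# nova.conf -- illustrative values, not NetEase's actual settings
[DEFAULT]
my_ip = 10.0.0.11           # use a private address, not the public one
metadata_host = 10.0.0.11   # target of the 169.254.169.254 DNAT rule
metadata_listen = 10.0.0.11 # keep consistent with my_ip
```

The forwarding rule nova-network generates is a DNAT of the form `-A nova-network-PREROUTING -d 169.254.169.254/32 -p tcp --dport 80 -j DNAT --to-destination 10.0.0.11:8775`.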

We only deploy novncproxy processes on some nodes and add them to the HAProxy service to achieve high availability of the novnc proxy. Multiple HAProxy processes use Keepalived for HAProxy's own high availability, so only the virtual IP address managed by Keepalived needs to be exposed.

The benefits of this deployment are:

1) implement the high availability of novnc proxy service

2) the public network addresses of the nodes related to the cloud platform will not be exposed

3) easy expansion of novnc proxy service

But there are also some shortcomings:

1) each virtual machine's VNC listens on the private network IP address of its compute node. If the network isolation between virtual machines and hosts fails, the VNC interfaces of all virtual machines are exposed.

2) live migration runs into problems, because the private network IP that VNC listens on does not exist on the destination compute node. However, the nova community is already solving this, and the fix should be merged in the Juno release.
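The HAProxy + Keepalived front end described above can be sketched as follows (addresses are illustrative assumptions; 6080 is novncproxy's default port):

```
# haproxy.cfg fragment -- illustrative
listen novncproxy
    bind 10.0.0.100:6080            # virtual IP managed by Keepalived
    balance source                  # keep a client on the same proxy
    server novnc1 10.0.0.11:6080 check
    server novnc2 10.0.0.12:6080 check
```

Adding another novncproxy node is then a one-line change to the backend list, which is what makes the service easy to scale out.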

This item controls whether, when the nova-compute process starts, it boots the virtual machines that should be running, that is, those whose records in the nova database are in the running state but which are not actually running on the hypervisor. It is of great use when a compute node is rebooted: all virtual machines on the node come back automatically, saving operations staff the time of manual handling.

By default there is no limit on API access frequency; once this item is enabled, the number of concurrent API requests is limited. Whether to enable it can be decided from the platform's API traffic and from the number and capacity of the API processes. If it is disabled, API requests take longer to process under heavy concurrency.

The maximum length of data returned by the nova-api-os-compute API. If set too small, part of the response data will be truncated.

Filters available to nova-scheduler: Retry skips compute nodes that were already attempted and failed, preventing rescheduling to them; AvailabilityZone enforces the user-specified AZ, preventing users' virtual machines from landing in an unspecified AZ; Ram filters out compute nodes that have run out of memory; Core filters out compute nodes with insufficient vCPUs; Ecu is a filter we developed alongside our CPU QoS feature to filter out compute nodes with an insufficient number of ECUs; ImageProperties filters out compute nodes that do not satisfy the image's requirements, for example an image built for QEMU virtual machines cannot be used on LXC compute nodes; Json matches custom node-selection rules, for example forbidding creation in certain AZs or requiring creation in the same AZ as given virtual machines. Other filters can be chosen as needed.
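Expressed as a nova.conf fragment, the filter chain above would look roughly like this (EcuFilter is NetEase's in-house filter, so its exact name here is an assumption):

```ini
# nova.conf -- illustrative scheduler filter chain
[DEFAULT]
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,EcuFilter,ImagePropertiesFilter,JsonFilter
```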

This item controls the nova-compute periodic task that handles virtual machines which have already been deleted from the database but still exist on the compute node's hypervisor (the audit of "orphaned" virtual machines); log or reap is recommended. With log, operations staff find the orphaned virtual machines from the log records and perform the follow-up actions manually, which is relatively safe and prevents user virtual machines from being cleaned up due to unknown exceptions or bugs in the nova service; reap saves operations staff the manual intervention time.
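The item described above is presumably `running_deleted_instance_action`; a hedged sketch of it and its companion polling interval:

```ini
# nova.conf -- illustrative
[DEFAULT]
# noop | log | reap: what to do with instances already deleted in the
# database but still present on the hypervisor
running_deleted_instance_action = log
# how often (seconds) the audit task runs
running_deleted_instance_poll_interval = 1800
```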

The synchronization threshold between user quota and actual usage in the instances table, that is, after how many quota modifications a synchronization of usage to the quota record is forced.

The synchronization interval between user quota and actual usage, that is, how many seconds after the quota record was last updated it is automatically re-synchronized with actual usage on the next update.

As is well known, many quota bugs remain unresolved in the open source nova project. The two configuration items above largely solve the mismatch between user quota usage and actual usage, but they also add some database overhead, so they need to be set reasonably for the actual deployment.
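The two items above are most likely `until_refresh` and `max_age` in nova.conf; the values below are illustrative only:

```ini
# nova.conf -- illustrative quota-resync settings
[DEFAULT]
# force usage to be resynchronized to the quota record after this many
# quota updates
until_refresh = 5
# ...or when the usage record is older than this many seconds
max_age = 86400
```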

# Compute node resource reservation #

The binding range of virtual machine vCPUs, which prevents virtual machines from competing with host processes for CPU. The recommended practice is to reserve the first few physical CPUs and allocate all subsequent CPUs to virtual machines, combined with cgroup or kernel boot parameters to ensure host processes do not occupy the CPUs used by virtual machines.

The physical CPU overcommit ratio, 16x by default; hyperthreads also count as physical CPUs. Set it according to the actual load and physical CPU capability.

The memory overcommit ratio, 1.5x by default. Enabling memory overcommit in production is not recommended.
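To make the effect of these two ratios concrete, here is a small back-of-the-envelope calculation mirroring how nova-scheduler's CoreFilter and RamFilter apply overcommit (the host size and reservation are made-up numbers, not NetEase's):

```python
def schedulable_capacity(pcpus, ram_mb, reserved_ram_mb,
                         cpu_ratio=16.0, ram_ratio=1.5):
    """Capacity the scheduler sees after applying overcommit ratios."""
    vcpus = int(pcpus * cpu_ratio)
    vram_mb = int((ram_mb - reserved_ram_mb) * ram_ratio)
    return vcpus, vram_mb

# A host with 16 hyperthreads and 128 GB RAM, 4 GB reserved for the host:
vcpus, vram = schedulable_capacity(16, 128 * 1024, 4 * 1024)
print(vcpus, vram)  # prints: 256 190464
```

With the default ratios, a single 16-thread host appears to the scheduler as 256 schedulable vCPUs, which is why the ratio must be chosen from measured load rather than left at the default.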

Memory reserved for the host, which virtual machines cannot use.

Disk space reserved for the host, which virtual machines cannot use.

The service-offline time threshold: if the nova service on a node has not reported a heartbeat to the database within this time, the API service considers it offline. Configured too short or too long, it leads to misjudgment.

The RPC call timeout. Because a single Python process cannot be truly concurrent, RPC requests may not be answered in time, especially when the target node is running a long periodic task, so the timeout must be weighed against the tolerable waiting time.

Whether to enable nova-network's multi-host mode; for a multi-host deployment this must be set to True.
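Pulling the reservation and service items above together into one nova.conf sketch (all values illustrative; the option names are the standard Havana-era ones the text appears to describe, and the 24-CPU host is an assumption):

```ini
# nova.conf -- compute node resource reservation, illustrative
[DEFAULT]
vcpu_pin_set = 4-23              # reserve CPUs 0-3 for host processes
cpu_allocation_ratio = 16.0      # physical CPU overcommit ratio
ram_allocation_ratio = 1.0       # no memory overcommit in production
reserved_host_memory_mb = 4096   # memory virtual machines may not use
reserved_host_disk_mb = 10240    # disk space virtual machines may not use
service_down_time = 60           # heartbeat threshold before "offline"
rpc_response_timeout = 60        # RPC call timeout
multi_host = True                # nova-network multi-host mode
```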

2.Keystone

There are fewer configuration items here; the main trade-off is which backend driver stores tokens, usually a SQL database or memcache. SQL persists tokens, while memcache is faster, especially when a user changes a password and all of that user's expired tokens must be deleted; in that case the speed difference between SQL and memcache is dramatic.
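In Havana-era keystone.conf the trade-off looks roughly like this (the driver paths are the community defaults of that era; server addresses are made up):

```ini
# keystone.conf -- token backend choices, illustrative
[token]
# SQL backend: persistent, but slow when purging expired tokens
driver = keystone.token.backends.sql.Token
# memcache backend: much faster, but tokens are lost on restart
# driver = keystone.token.backends.memcache.Token

[memcache]
servers = 10.0.0.11:11211,10.0.0.12:11211
```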

3.glance

It includes two parts, glance-api and glance-registry:

The number of child processes glance-api uses to handle requests. Configured as 0, there is only the main process; configured as 2, the main process plus 2 child processes handle requests concurrently. Decide based on the computing capacity of the node the process runs on and the platform's request volume.

It has the same meaning as the osapi_max_limit configuration item in nova.

The number of items returned in a response when the request parameters do not specify one; the default is 25. If set too small, response data may be truncated.
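The three glance items above presumably correspond to `workers`, `api_limit_max`, and `limit_param_default`; an illustrative sketch:

```ini
# glance-api.conf -- illustrative values
[DEFAULT]
workers = 2               # 0 = main process only; 2 = main + 2 children
api_limit_max = 1000      # hard upper bound on items in one response
limit_param_default = 25  # page size when the request gives no limit
```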

Underlying software for OpenStack: version selection, configuration, and performance tuning

1. Virtualization technology selection

In the architecture of the private cloud platform, OpenStack relies on some underlying software, such as virtualization software, virtualization management software and Linux kernel. The stability and performance of these software are related to the stability and performance of the whole cloud platform. Therefore, the version selection and configuration tuning of these software is also an important factor in NetEase's private cloud development.

In NetEase's private cloud platform, we chose KVM, the virtualization technology best integrated with the Linux kernel. Compared with Xen, KVM is more closely tied to the kernel and easier to maintain. With KVM selected, the virtualization management driver is the libvirt compute driver the OpenStack community provides for KVM; libvirt itself is a widely used, community-active open source virtualization management package that supports KVM among many virtualization technologies.

On the other hand, NetEase uses open source Debian for its hosts; the package source is Debian's wheezy stable branch, and KVM and libvirt use the package versions from the Debian community wheezy source.

2. Kernel selection

In terms of kernel selection, we mainly consider the following two factors:

Stability: from the beginning of the private cloud platform's development, stability has been a basic principle of NetEase's private cloud. We use Debian Linux, and Debian's native kernel is undoubtedly more stable, so it was our first choice.

Functional requirements: in NetEase's custom development, to guarantee virtual machine service performance we developed CPU QoS and disk QoS, which rely on the underlying cpu and blkio cgroup support, so the cgroup configuration options must be enabled in the kernel. On the other hand, NetEase's private cloud will support container-level virtualization such as LXC, which relies on the namespace features of the Linux kernel in addition to cgroup.

Considering the above factors, we chose the Linux 3.10.40 kernel source from the Debian community, enabled the cpu/mem/blkio cgroup configuration options and namespace options such as user namespace, and compiled a Linux kernel adapted to NetEase's private cloud. In practice, with the underlying software versions above, NetEase's private cloud runs stably, and we will update these components in due course.

3. Configuration optimization

After the stability of NetEase's private cloud was assured, we began performance tuning. On the one hand, we referred to some of IBM's excellent practices and made configuration optimizations in CPU, memory, I/O, and so on. On the whole, NetEase's private cloud actively learns from the industry's best practices to optimize the platform's overall performance while keeping stability in focus.

3.1 CPU configuration optimization

In order to guarantee the computing power of CVMs, NetEase's private cloud developed CPU QoS, which concretely consists of uniform scheduling of cfs time slices and process pinning (CPU binding).

Referring to IBM's analysis, we understood the advantages and disadvantages of process pinning, and tests also verified large performance differences between CVMs with different binding schemes. For example, the performance gap between two vCPUs bound to non-hyperthreaded cores on different NUMA nodes and the same two vCPUs assigned to a pair of adjacent hyperthreaded cores is 30% to 40% (measured with the SPEC CPU2006 tool). On the other hand, CPU0 carries a heavy load because it handles interrupt requests and is no longer suitable for CVM use. Considering these factors and several rounds of testing, we finally decided to reserve CPUs 0-3 and let CVMs be scheduled by the host kernel on the remaining CPUs. The final CPU configuration is as follows (libvirt XML configuration):
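The libvirt XML referenced here is not preserved; based on the description (reserve CPUs 0-3, let the host kernel schedule CVMs on the rest), it presumably looked roughly like this on a 24-CPU host (the CPU count is an assumption):

```xml
<!-- Confine all vCPUs to the non-reserved host CPUs; with no per-vCPU
     pinning, the host kernel still balances freely within the 4-23 set. -->
<vcpu placement='static' cpuset='4-23'>4</vcpu>
```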

3.2 memory configuration optimization

In terms of memory configuration, NetEase's private cloud disables KVM memory sharing (KSM) and enables transparent huge pages:
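The commands themselves are not preserved; on a typical Linux host they would be along these lines (sysfs paths as in mainline kernels, shown as an illustration):

```shell
# Disable KSM (kernel samepage merging, used for KVM memory sharing)
echo 0 > /sys/kernel/mm/ksm/run
# Enable transparent huge pages
echo always > /sys/kernel/mm/transparent_hugepage/enabled
```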

After SPEC CPU2006 testing, these configurations improve the CPU performance of CVM by about 7%.

3.3 I/O configuration optimization

1) The configuration optimization of disk I/O mainly includes the following aspects:

KVM's disk cache mode: drawing on IBM's analysis, NetEase's private cloud adopts the none cache mode.

Disk I/O scheduler: currently NetEase's private cloud uses cfq as the host disk scheduling policy. In practice we found that on low-end disks the cfq policy easily leads to long scheduling queues and %util stuck at 100%. NetEase's private cloud will also draw on IBM's practice to tune the cfq parameters and to test the deadline scheduling policy.

Disk I/O QoS: facing the increasingly prominent shortage of disk I/O resources, NetEase's private cloud developed disk I/O QoS, mainly based on setting the throttle parameters of the blkio cgroup. Because the libvirt-0.9.12 version implements disk I/O throttling through QEMU and suffers from fluctuation problems, our implementation writes to the cgroup directly via Nova command execution. We also developed and submitted a patch adding throttle interface settings to blkiotune (merged in libvirt-1.2.2) to the libvirt community, solving this problem completely.
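Writing a throttle parameter into the blkio cgroup directly, as described above, looks roughly like this (the cgroup path and device numbers vary by setup and are assumptions here):

```shell
# Cap read bandwidth of one VM's disk at 10 MB/s via the blkio cgroup.
# "253:0" is the backing device's major:minor; 10485760 = 10 * 1024 * 1024.
echo "253:0 10485760" > /sys/fs/cgroup/blkio/libvirt/qemu/instance-0001/blkio.throttle.read_bps_device
```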

2) The configuration optimization of network I/O.

We mainly enabled vhost_net mode to reduce network latency and increase throughput.
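Enabling vhost_net amounts to loading the kernel module on the host (shown here as an illustration; many distributions load it automatically when a vhost-backed NIC is created):

```shell
# Load the vhost_net kernel module on the host
modprobe vhost_net
```

In the guest's libvirt XML, the vhost backend is then selected on a virtio interface with `<driver name='vhost'/>`.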

Operation and maintenance experience

1. Use experience

Bugs in open source software are inevitable, but new versions are much better to use than old ones, especially for a fast-growing project like OpenStack. We experienced this across the Essex, Folsom, and Havana versions, so we recommend that OpenStack users of all kinds follow the community releases in time and keep pace with the community.

Do not lightly "optimize" the community version's functionality or performance, especially before exchanging views with community experts; such "optimizations" are very likely to become failure points or performance bottlenecks, and may eventually make it impossible to stay in sync with the community. After all, the capability and knowledge reserve of one company or team (especially a small company or team) can hardly compare with the hundreds of experts in the community.

Refer to the deployment architectures shared by large companies of all kinds, and try not to work behind closed doors. Especially for open source software, different companies and teams have very different usage scenarios and surrounding components, so referring to industry practice is the best approach.

There may be many ways to implement some details, but each method has its advantages and disadvantages and needs to be fully demonstrated, analyzed, tested and verified before it can be deployed to the production environment.

All deployment plans and functional designs should take smooth upgrades into account. Even if you are told that service can be stopped during an upgrade, try to avoid it, because the impact of a service outage is very hard to bound.

2. Operation and maintenance criteria

OpenStack is also a back-end system service, and all the basic principles related to system operation and maintenance are applicable. Here are some experiences summarized according to the problems encountered in the actual operation and maintenance process:

A mismatch between configuration defaults and the actual environment can cause all kinds of problems. Network-related configuration in particular is strongly tied to hardware, and production and development hardware are heterogeneous, so some defaults do not apply in production. Guideline: every version must be tested on the same hardware as the online environment before launch.

Do capacity planning well: the allocated quota should be less than the cloud platform's total capacity, otherwise various problems will occur and operations and development will spend a great deal of unnecessary energy locating and analyzing them.

With so many configuration items, mistakes are easy, so check them carefully with the developers. Before launch, first verify that the changes are correct with puppet's noop mode before actually applying them.

Network planning should be done well in advance (fixed IPs, floating IPs, VLANs, and so on); network expansion is difficult and risky, so planning ahead is safest. One principle: bigger beats smaller, and more beats fewer.

Network isolation should be done well, otherwise there is no way to guarantee the network security of users.

Pay attention to information security. It is a cliché, and every platform has the same problem, but it deserves extra attention here: once a security vulnerability appears, all virtual machines face serious threats.

Article source: Ma GE Education

