
How to build a continuous deployment platform in the traditional mode with SaltStack and Artifactory

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article explains how to build a continuous deployment platform in the traditional mode using SaltStack and Artifactory. The approach is practical, and it is shared here for reference.

I. Continuous deployment

1. Current situation

Without a standard continuous deployment process, version management and artifact management become chaotic, releases take a long time, online test coverage is incomplete, and as business traffic grows, failures become more frequent and harder to troubleshoot. Operations, testing, and development staff may have to work through the night for every version iteration, and many online failures still appear the day after launch.

2. Pain points

- Low coverage of the automated release system.

- No standardized release process:

A) Focusing only on agility while ignoring quality

B) Frequent changes increase the failure rate

C) Many development languages are in use, artifact management is chaotic, and release methods are complex

- Security issues are easily overlooked.

II. Tool introduction

1. SaltStack

SaltStack is an open-source configuration management tool based on ZeroMQ. The author chose SaltStack over Ansible because Ansible communicates over SSH: once more than 500 hosts are under management, issuing commands through a message queue is more stable and faster than SSH. A further reason is that the development teams run different technology stacks in parallel. Where Java and .NET coexist in particular, SaltStack's Windows support is noticeably better than Ansible's, which makes it easier to use as the underlying release tool across platforms.

An automated deployment platform based on SaltStack is built mainly on three features: grains, pillar, and state. Grains provide the default environment information collected from each host, pillar defines environment-specific data, and state files orchestrate the files and steps of a release.

2. Artifactory

Artifactory is an artifact repository manager for all languages, available in an open-source edition and an enterprise edition. The open-source edition supports Maven artifact management. The enterprise edition supports artifacts for all languages, supports metadata management, provides highly available deployment, and offers vulnerability scanning matched against the NVD and VulnDB databases.

III. Solutions to the pain points above

1. Low coverage of automated release

Release tooling is standardized by building a unified, multi-platform release tool that replaces the traditional practice of copying shell scripts around. SaltStack's state feature automates more than 90% of previously manual operations, including cross-platform base service releases, service start and stop, file distribution, configuration distribution, and remote host management. SaltStack state orchestration files execute remote commands, fetch artifacts and configuration from Artifactory, and publish the required version online.

The core of the solution lives in the deployment platform: the release process is described in JSON, the JSON is converted into a YAML configuration file with yaml.dump(sls_json), and the platform then schedules SaltStack to run the resulting tasks.

The converted YAML file has the following format:
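The original listing is not available in this copy. As a rough illustration only, the conversion step could look like the sketch below. The state IDs, file paths, Artifactory URL, and the `json_to_sls` helper are all assumptions made for this example; `file.managed` and `service.running` are standard SaltStack state functions. PyYAML is assumed for `yaml`, and the sketch falls back to printing JSON only so that it still runs without that library.

```python
import json

try:
    import yaml  # PyYAML, assumed from the article's yaml.dump(sls_json)
except ImportError:
    yaml = None  # fallback keeps this sketch runnable without PyYAML

# Hypothetical JSON release description as the platform might store it.
release_json = """
{
  "deploy_app_package": {
    "file.managed": [
      {"name": "/opt/app/app.war"},
      {"source": "https://artifactory.example.com/artifactory/libs-release/app/1.0.0/app.war"},
      {"skip_verify": true}
    ]
  },
  "restart_app_service": {
    "service.running": [
      {"name": "app"},
      {"watch": [{"file": "deploy_app_package"}]}
    ]
  }
}
"""


def json_to_sls(text):
    """Convert a JSON release description into SLS text (YAML)."""
    sls = json.loads(text)
    if yaml is None:
        # Not YAML block style, only a stand-in when PyYAML is missing.
        return json.dumps(sls, indent=2)
    return yaml.safe_dump(sls, default_flow_style=False)


if __name__ == "__main__":
    print(json_to_sls(release_json))
```

Keeping the release description in JSON lets the platform store and validate it like any other API payload, and the YAML is generated only at the moment SaltStack needs it.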

2. Standardized release process

- Backup

The first step in a scheduled release task is backup, which needs two mechanisms: local backup and remote backup. The local backup enables fast rollback, and the off-site backup enables rebuilding the environment.

- Traffic cut-over (blue-green deployment)

For services, especially stateful ones, take the node offline in the registry and make sure all in-flight processing on that node has finished before deploying.

For web pages, remove the node from the load balancer and deploy while it receives no traffic.

- Deployment

The sls feature of SaltStack orchestrates the deployment files so that multiple deployment tasks are released in a unified way.

During deployment, the deployment page should show information about the package such as the requirement ID it corresponds to, the commit information of its code, the pass rate of its automated tests, its code scan results, its security scan results, and the results of manual testing. Operations staff need this information during the release to judge whether the package has passed every quality gate and is ready to launch, and therefore whether the release can continue. Here we use the metadata feature of Artifactory to record information across the whole life cycle of the package and expose it to the release platform through the API, so that operations staff get a complete record for each package.
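A quality-gate check over that metadata might look like the sketch below. It assumes the platform has already fetched the artifact's properties through Artifactory's item properties endpoint (`GET /api/storage/{repo}/{path}?properties`), which returns a JSON object whose `properties` field maps each key to a list of values. The property names in `REQUIRED_GATES` and the `failed_gates` helper are hypothetical; a real pipeline would define its own.

```python
import json
from typing import List

# Hypothetical property names written by earlier pipeline stages.
REQUIRED_GATES = {
    "test.automation.passed": "true",
    "scan.code.passed": "true",
    "scan.security.passed": "true",
    "test.manual.passed": "true",
}


def failed_gates(storage_response: str) -> List[str]:
    """Return the gates that are missing or not passed for one artifact.

    storage_response is the JSON body of an item properties request.
    """
    props = json.loads(storage_response).get("properties", {})
    failed = []
    for key, expected in REQUIRED_GATES.items():
        # Each property holds a list of values.
        if expected not in props.get(key, []):
            failed.append(key)
    return failed


if __name__ == "__main__":
    sample = json.dumps({
        "uri": "https://artifactory.example.com/artifactory/api/storage/libs-release/app/1.0.0/app.war",
        "properties": {
            "test.automation.passed": ["true"],
            "scan.code.passed": ["true"],
            "scan.security.passed": ["true"],
        },
    })
    print(failed_gates(sample))
```

An empty result means the package has passed every gate; anything else can be shown on the deployment page as the reason the release is blocked.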

- Automated testing

Here, automated testing means checking that service ports respond normally, that existing online functionality still works (regression), that defects have been fixed, that new features have been deployed, and so on. Services and sites also need to be warmed up at this stage, with automated tests exercising the business flow end to end.
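The simplest of these checks, whether a service port accepts connections, can be sketched with the Python standard library. The `port_is_open` and `unreachable` helpers and the endpoint list are illustrative names, not part of the platform described above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable(endpoints, timeout=2.0):
    """Return the (host, port) pairs that did not accept a connection."""
    return [ep for ep in endpoints if not port_is_open(ep[0], ep[1], timeout)]


if __name__ == "__main__":
    # Hypothetical endpoints of a freshly deployed node.
    print(unreachable([("127.0.0.1", 8080), ("127.0.0.1", 8443)]))
```

A port check only proves that the process is listening; the regression and warm-up checks mentioned above still need requests that exercise real business flows.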

- Traffic return (canary)

A portion of real traffic is switched to the newly deployed application. Its health is judged initially from full-link log tracing or monitoring metrics, and the result decides whether to continue the release or fall back.
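One way to turn monitoring feedback into that decision is to compare the canary's error rate with a baseline. The sketch below is an assumption about how such a rule could be written; the thresholds, the minimum sample size, and the `canary_verdict` name are illustrative and do not come from the original platform.

```python
def canary_verdict(canary_errors, canary_requests, baseline_error_rate,
                   tolerance=0.01, min_requests=100):
    """Decide what to do with a canary based on its error rate.

    Returns "wait" until enough requests were observed, then "promote"
    if the canary error rate stays within baseline + tolerance,
    otherwise "rollback".
    """
    if canary_requests < min_requests:
        return "wait"
    canary_error_rate = canary_errors / canary_requests
    if canary_error_rate <= baseline_error_rate + tolerance:
        return "promote"
    return "rollback"


if __name__ == "__main__":
    print(canary_verdict(2, 1000, baseline_error_rate=0.002))
    print(canary_verdict(50, 1000, baseline_error_rate=0.002))
```

Error rate is only one possible signal; latency or business metrics from the monitoring system could be checked in the same way.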

- Deployment completion (rolling release)

During off-peak hours, shift traffic to the deployed applications while upgrading the remaining instances.

- Change management notification

After a successful launch, everyone must be told promptly that the online version has changed: product managers need to update the documentation, and operators need to inform users.

- Rollback

Every release needs a rollback plan. A single application is rolled back to a specified version. For multiple applications, define a rollback set and specify the rollback tasks in the orchestration at release time. For changes such as database updates, where rollback is complex, an explicit rollback plan must be defined, or the business logic made compatible across versions, before the upgrade plan is drawn up.
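A rollback set can be derived at release time from the release order and the versions that were live before the release. The sketch below assumes that applications are rolled back in the reverse of the order in which they were released; that ordering, the `rollback_plan` name, and the application names are assumptions for illustration.

```python
def rollback_plan(release_order, previous_versions):
    """Build the rollback set for a multi-application release.

    release_order: application names in the order they were released.
    previous_versions: mapping of application name to the version that
    was live before this release.
    Returns (application, version) steps in reverse release order.
    """
    return [(app, previous_versions[app]) for app in reversed(release_order)]


if __name__ == "__main__":
    order = ["order-service", "payment-service", "web-frontend"]
    before = {
        "order-service": "1.4.2",
        "payment-service": "2.0.1",
        "web-frontend": "3.7.0",
    }
    for app, version in rollback_plan(order, before):
        print("roll back", app, "to", version)
```

Recording this plan together with the release task means the rollback can be scheduled without anyone having to work out versions during an incident.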

3. Establish a unified artifact repository

Most Internet companies already manage source code in a unified repository, but artifact management is still primitive: FTP servers plus a separate open-source repository for each language. As a result, operations staff spend a lot of effort maintaining different package management platforms (FTP, Maven, NuGet, PyPI, Docker registries, and so on). Besides wasting the operations team's time, this greatly complicates the release process: publishers have to fetch release packages from different platforms, which confuses both the release process and the configuration of the release platform. Most open-source components also lack high availability, so a hardware or software failure seriously hurts release efficiency.

To solve this, we use Artifactory to manage artifact repositories for all languages. Just as GitLab unifies source code, Artifactory manages the whole company's artifacts in one place and becomes the only package source for the release platform, which standardizes the release process.

4. Vulnerability scanning

At present, most security teams scan only after a service has been deployed online. A vulnerability found at that stage can force the whole iteration to be scrapped: every package must be recompiled, retested, and relaunched, which wastes a lot of time and slows iteration.

The solution is to move vulnerability scanning earlier: scan external dependencies and internal shared libraries when the package is built and compiled, or even while developers are writing code, and interrupt the commit or build as soon as a high-risk vulnerability is matched. If the build must continue, record the scan results in the artifact's metadata so that testers and operations staff can see them. Security scanning products such as JFrog Xray provide this capability today. Open-source software such as cvechecker can also scan packages in the build pipeline, so that an entire iteration is not lost to a security vulnerability.
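The gate logic described above can be sketched independently of any particular scanner. The example assumes the scanner's findings have already been reduced to a list of records with an identifier and a CVSS score; that record shape, the metadata keys, and the `evaluate_scan` name are hypothetical. The 7.0 threshold follows CVSS v3, which rates scores of 7.0 and above as High or Critical.

```python
HIGH_RISK_CVSS = 7.0  # CVSS v3: 7.0 and above is High or Critical


def evaluate_scan(findings, force_continue=False):
    """Decide whether a build may proceed after a vulnerability scan.

    findings: list of dicts with "id" and "cvss" keys (assumed shape).
    Returns (proceed, metadata). The metadata is meant to be recorded
    on the artifact so testers and operations staff can see it.
    """
    high_risk = [f["id"] for f in findings if f["cvss"] >= HIGH_RISK_CVSS]
    proceed = force_continue or not high_risk
    metadata = {
        "scan.high_risk_count": str(len(high_risk)),
        "scan.high_risk_ids": ",".join(high_risk),
    }
    return proceed, metadata


if __name__ == "__main__":
    sample = [
        {"id": "EXAMPLE-0001", "cvss": 9.8},
        {"id": "EXAMPLE-0002", "cvss": 4.3},
    ]
    print(evaluate_scan(sample))
    print(evaluate_scan(sample, force_continue=True))
```

With `force_continue=True` the build goes on, but the high-risk findings still end up in the metadata, which matches the "record the scan results" path in the text.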

IV. Later-stage improvements

1. Establish a measurement system to improve release quality

In the agile development model, developers and testers often report to the same manager. To get features online quickly, some teams cut corners and release packages that have not been fully tested, in effect testing in production. The visible symptom is that fixing a single bug may take several hotfixes or new releases of the same application or page. This is dangerous for the stability of the online business. To prevent it, we can constrain development teams with measures or specifications based on metrics such as:

- Number of new bugs triggered after launch

- Number of releases for the same issue within a short period

- Number of P5 to P0 failures caused by releases

- Fault recovery time after launch

- Number of rollbacks after launch

- Number of emergency releases outside the release window

- ...

Collect the data above and review each team monthly or at another regular interval. Reviewing release status against agreed specifications makes it possible to evaluate each team's delivery quality and delivery capability and to uncover its release problems and pain points, which improves release quality and reduces the online failure rate.

2. Define metrics and assess release quality

Each team starts with a score of 100, which is reset every month and serves as the measure of that month's iteration quality. The score is not tied to KPI assessment; it is used only to drive development teams to improve.

The evaluation has two dimensions: the project team's release stability score and the service's (site, app, microservice, and so on) release quality score.

- Hotfix released outside the release window (project team minus 1 point, service minus 1 point)

- Code hotfixes: the same project is released more than 3 times in one day (project team minus 1 point, service minus 2 points)

- Hotfix release failure or rollback (project team minus 2 points, service minus 2 points); whether a release failed is determined by the operations team

- Exception or failure of a script, such as a database script (project team minus 1 point)

- Monthly number of releases per service (take the top 5; in rank order, services lose 5 to 1 points)

- Online incident of level P2 or above caused by a hotfix (project team minus 5 points, related services minus 5 points)

- If the project team's hotfixes this month exceed 30% of the average of the previous three months, 10 points are subtracted
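The arithmetic of these rules can be sketched as a small lookup table. The event names and the `monthly_scores` function are made up for this example, and it simplifies in three ways that are assumptions: one project team with one service, the 10-point rule charged to the project team only, and the top-5 ranking rule left out because it compares services with each other.

```python
# (project team deduction, service deduction) per occurrence.
DEDUCTIONS = {
    "hotfix_outside_window": (1, 1),
    "over_3_code_hotfixes_in_a_day": (1, 2),
    "hotfix_failed_or_rolled_back": (2, 2),
    "script_exception_or_failure": (1, 0),
    "p2_or_above_incident_from_hotfix": (5, 5),
    "hotfix_count_over_3_month_threshold": (10, 0),  # assumed team-only
}


def monthly_scores(events, initial=100):
    """Return (project team score, service score) for one month.

    events maps an event name from DEDUCTIONS to how often it occurred.
    Both scores start from `initial`, which is reset every month.
    """
    team_score = service_score = initial
    for name, count in events.items():
        team_points, service_points = DEDUCTIONS[name]
        team_score -= team_points * count
        service_score -= service_points * count
    return team_score, service_score


if __name__ == "__main__":
    month = {
        "hotfix_outside_window": 2,
        "hotfix_failed_or_rolled_back": 1,
        "script_exception_or_failure": 1,
    }
    print(monthly_scores(month))  # (95, 96)
```

In this example the project team loses 2 + 2 + 1 = 5 points and the service loses 2 + 2 + 0 = 4 points.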

3. Change management

In Google's SRE practice, change management is the most important part of the DevOps system. In past experience, 90% of online failures are caused by online changes, whether to software, hardware, the environment, or other factors. The purpose of a change management system is to locate online problems quickly, limit the losses caused by changes, and notify the relevant people in time so they can guard against failures. The change management system therefore deserves focused investment and continuous improvement.

Ways to put this into practice include, but are not limited to, the following:

- WeChat notifications to operations staff, the relevant developers and testers, product managers, and others

- Scrolling the record of recent changes on a large screen

- Synchronizing change records to the monitoring system

Although in the agile development model the product, development, and testing teams all move fast in small steps, operations staff must hold to their own principles: standardize the entire launch process and manage the DevOps toolchain in a unified way.

The above describes how to build a continuous deployment platform in the traditional mode with SaltStack and Artifactory. Some of these points are likely to come up in daily work, and hopefully the article is useful.


© 2024 shulou.com SLNews company. All rights reserved.
