
Explained in detail | How does Alibaba run full-link stress testing for Double 11?

2025-03-28 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/01 Report--

Guide: Full-link stress testing was pioneered by Alibaba. This article walks through a typical internal e-commerce campaign (preparation for Singles' Day, also called Double 11) in terms of work content, process, and lessons learned, presenting a complete stress testing process to help more enterprises and users carry out performance testing.

Preface

The importance and necessity of performance testing is a cliché. From a technical and business-strategy point of view, it can be summarized as follows:

The purpose of performance testing is to remove the uncertainty in system performance caused by peak traffic during large-scale marketing campaigns. An ideal marketing campaign cycle should follow a closed-loop process.

[Figure: closed-loop process of a marketing campaign cycle]

PS: there is an additional step between steps 1 and 2 of this process: environment transformation and basic data preparation. It must be stressed that the stress test has to run in the production environment.

Stress test environment preparation: the real online environment is reused, so the test results and the problems exposed reflect the real situation. Stress test traffic carries a marker that is recognized globally and passed along the call chain, and its data lands in a shadow area.

Basic data preparation: taking the e-commerce scenario as an example, construct the core basic data (such as buyers, sellers, and product information) needed for the major promotion scenario. Use online data as the source, apply sampling, filtering, and desensitization, and keep the same order of magnitude as the online data.
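
The basic data preparation step (sampling, filtering, desensitization) can be sketched roughly as follows. This is a minimal illustration, not Alibaba's actual pipeline: the record fields (`name`, `phone`, `active`), the function names, and the masking rules are all assumptions made for the example.

```python
import hashlib
import random


def desensitize(user):
    """Return a copy of the record with identifying fields masked (hypothetical rules)."""
    masked = dict(user)
    # Replace the real name with a stable pseudonym derived from a hash.
    digest = hashlib.sha256(user["name"].encode("utf-8")).hexdigest()
    masked["name"] = "user_" + digest[:8]
    # Keep only the last four digits of the phone number.
    phone = user["phone"]
    masked["phone"] = "*" * (len(phone) - 4) + phone[-4:]
    return masked


def build_stress_users(online_users, sample_rate, seed=0):
    """Sample online records, filter out unusable ones, then desensitize the rest."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = [u for u in online_users if rng.random() < sample_rate]
    usable = [u for u in sampled if u["active"]]  # filtering rule is an assumption
    return [desensitize(u) for u in usable]


if __name__ == "__main__":
    online = [
        {"id": i, "name": f"name{i}", "phone": f"1380000{i:04d}", "active": i % 2 == 0}
        for i in range(1000)
    ]
    users = build_stress_users(online, sample_rate=0.5)
    print(len(users), users[:1])
```

In practice the sample rate would be chosen so that the resulting data set stays at the same order of magnitude as the online data, as the text requires.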

It can be seen that performance testing uses realistic and efficient stress tests to assess capacity and to locate and resolve bottlenecks, and ultimately keeps the campaign running stably. Every link matters. Taking Alibaba's Double 11 as an example, beyond technical preparation, execution, and safeguards, there are also details of process and division of labor. Each is described below.

Process and management

Alibaba's full-link stress testing started in 2013 and is now in its seventh year. Over these years we have kept accumulating experience, summarizing, and optimizing: from more than 200 people taking part in all-night stress tests at the beginning, to only a few people running tests during the day with more intelligent methods. A project of this scale cannot succeed without effective process control and division of labor. Alibaba has a strict process-control and division-of-labor model for full-link stress test projects, refined over many years. It is summarized as follows (the time points in the diagram are simulated and only indicate the sequence).

Good process planning and management greatly improve the efficiency of team cooperation. Together with the intelligent functions of the tool platform, they reduced an overnight stress test involving 200 people to a daytime test with fewer than 10. Effective plan + sufficient preparation + reliable platform and products = successful stress test. Building on the earlier articles in this series, the following sections describe how Alibaba runs the Singles' Day stress test project in terms of data preparation, architecture transformation, traffic security policy (environment and traffic isolation), stress test implementation, and problem location analysis.

Stress test environment transformation

Before preparing data, we need to consider where the stress test environment (that is, the deployment environment of the system under test) is located, because different environments call for different preparation. Stress tests across the entire Alibaba economy, including the Singles' Day stress test, all run in the online environment. We therefore have to assess whether the existing environment can be used directly for a full-link stress test: whether repeated stress test calls to the same API will be intercepted, whether dirty data will be produced, and if so, how to transform the system to avoid it.

These issues fall into two categories: business problems and data-passing problems. Since the problems are fairly clear, we address them category by category. The transformation has two parts, business transformation and middleware transformation. Internally, both were completed in the full-link stress testing 1.0 era, and they can serve as a reference for external customers. We also have a mature product solution that provides one-stop capability and avoids the cost of complex transformation and maintenance.

Business transformation

Business transformation solves the problems of business anomalies during stress testing, or of stress test requests that cannot be executed normally. Examples:

Traffic differentiation and identification: stress test traffic must be distinguished from business traffic and be identifiable throughout the full-link system.

Single-use traffic: for example, an order placed repeatedly by the same person will fail on repeated execution.

Rate-limit interception: if rate limits apply in daily operation, they must be modified to plug into the traffic degradation capability so that the configuration can be adjusted in real time.

Impact on reports: the effect of stress test data on reports and dynamic checks must be eliminated.

The items involved in business transformation cannot be listed exhaustively. They have to be sorted out one by one according to the business model, business architecture, and configuration. After one overall review and transformation, all new applications are developed according to the specification, and a basic check for gaps before the annual stress test is enough.

Middleware transformation

As the component that connects business applications, middleware has a vital function in stress testing: passing the traffic marker all the way down to the database level. We have been upgrading middleware, starting from the core applications, since 2013, and we hit many pitfalls along the way, such as the completeness of the transformation, the cost of modifying business code, and version compatibility.

[Figure: model of stress test traffic after the transformation]

The environment transformation has to be analyzed and designed against the specific business scenario. The high-availability solution on the cloud provides the full-link stress test solution as a service.

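
The idea of passing the traffic marker down the call chain can be sketched as below. This is a minimal illustration under stated assumptions, not Alibaba's middleware: the header name `X-Stress-Test`, its value `"1"`, and all function names are hypothetical. It uses Python's standard `contextvars` module so that the marker follows the current request without being threaded through every function signature.

```python
import contextvars

# Hypothetical header carrying the stress-test marker between services.
STRESS_HEADER = "X-Stress-Test"

# Per-request flag; the default False means "normal business traffic".
_is_stress = contextvars.ContextVar("is_stress", default=False)


def enter_request(headers):
    """Read the marker from an incoming request and remember it for this context.

    `headers` is a plain dict here, so the lookup is case-sensitive.
    Returns a token that must be passed to leave_request().
    """
    flag = headers.get(STRESS_HEADER, "") == "1"
    return _is_stress.set(flag)


def leave_request(token):
    """Restore the previous flag value when the request is finished."""
    _is_stress.reset(token)


def is_stress_traffic():
    """True if the current request was identified as stress test traffic."""
    return _is_stress.get()


def outgoing_headers(base=None):
    """Build headers for a downstream call, re-attaching the marker if present."""
    headers = dict(base or {})
    if is_stress_traffic():
        headers[STRESS_HEADER] = "1"
    return headers
```

A service would call `enter_request` when a request arrives, use `outgoing_headers` for every downstream call, and consult `is_stress_traffic` at the data layer to decide where to read and write.
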
Data preparation

Once a promotion campaign is confirmed, the business model is reviewed to determine the applications in the technical architecture that correspond to it, the scope of business to be tested, the data volume, and the form of the data. Data preparation therefore has two parts: building the business model (business model data) and constructing the basic data (stress test traffic data).

Business model data

Business model data is the data that describes the business model under test: which APIs are involved, and what load level or proportional relationship holds between them. The accuracy of the business model directly determines how much the stress test results can be relied on.

The purpose of model design is to collect the business, abstract it into an executable stress test model, and predict and design the elements of each sub-model. Before the Singles' Day promotion, we determine the relevant business and classify the scenarios:

Existing business scenarios: collect and process historical data as prediction data to form a model prototype, then combine it with the new campaign mechanics to form the existing-business model.

New business scenarios: build the model directly from the new business to form the new-business model.

Finally, the two types of scenarios are combined into the final business model.

[Figure: example of a combined business model]

When assembling business model data, some key factors need attention. Taking an e-commerce business model as an example:

1 to N: whether one request causes the corresponding downstream business interface to be called multiple times.

Proportion of business attributes: calculate the proportion of each type of business from historical data.

Once assembled, the business model within a single transaction should form a funnel. The ratio between layers of the funnel varies with the level, the campaign mechanics, and the rules, but within one major promotion this proportional relationship does not change in theory.

The business model corresponds to the load level of the stress test. Taobao's promotion stress tests all use RPS mode: seen from the server side, the APIs stand in a funnel-shaped proportional relationship to one another, which maps well onto capacity planning. RPS mode is also well supported in the commercial product PTS (Performance Testing Service).

Basic data

If the business model corresponds to the interfaces/APIs to be tested, then the traffic data determines what those APIs are tested with: which users log in, which products and stores are viewed, which products are bought, and even what the payment amount is.

Part of the traffic data is the concrete RPS value for the business model above: the model expresses the proportional relationship, and the traffic data carries the concrete RPS value for each stress test.

The most important part of the traffic data is the real stress test data, which can be called basic data: buyers, sellers, products, and so on. The purpose of full-link stress testing is to simulate Double 11, so the realism of the simulation matters, and with it the realism of the basic data.

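
The funnel model and the 1-to-N factor described above can be turned into per-API RPS targets with a small calculation. The sketch below is only an illustration: the API names, ratios, and fan-out values are made up for the example and are not Taobao's real figures.

```python
def funnel_rps(entry_rps, stages):
    """Derive the target RPS of each API from a funnel model.

    stages: list of (api_name, ratio_from_previous_layer, fanout), where
    `ratio_from_previous_layer` is the share of requests that continue to
    this layer and `fanout` is how many calls one business request makes
    to this API (the "1 to N" factor).
    """
    plan = {}
    flow = entry_rps
    for name, ratio, fanout in stages:
        flow = flow * ratio         # business requests reaching this layer
        plan[name] = flow * fanout  # calls per second seen by this API
    return plan


if __name__ == "__main__":
    # Hypothetical funnel: view item -> add to cart -> create order -> pay.
    stages = [
        ("view_item", 1.0, 1),
        ("add_to_cart", 0.3, 1),
        ("create_order", 0.5, 2),  # one order request calls this API twice
        ("pay", 0.8, 1),
    ]
    for api, rps in funnel_rps(100_000, stages).items():
        print(f"{api}: {rps:.0f} RPS")
```

Because the ratios are fixed within one promotion, scaling the test up or down only requires changing `entry_rps`; the per-API targets follow from the model.
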
The full-link stress test takes online data as its source and turns it into usable stress test data through sampling, filtering, desensitization, and similar operations. When online data is extracted and used, especially where writes are involved, dirty data must be avoided. Data is therefore persisted and read through shadow tables: when a request is identified as stress test traffic, it reads and writes the shadow table; otherwise it reads and writes the formal online table. Shadow tables exist to keep stress test traffic safe.

In the stress testing system used inside Taobao, the data platform and the stress test platform are two separate platforms. The data platform manages and provides stress test data (both model data and traffic data). The stress test platform provides the load capability, ensuring that stress test requests can be sent from all over the country over a specified "protocol" and at a specified rate and volume.

The data factory capability in the commercial product PTS (Performance Testing Service) combines the internal data platform and stress test platform into a unified stress testing system. Users only need to construct the test data and define the parameters, through files or custom definitions, and can then configure them for use.

Traffic security policy

The main purpose of the traffic security policy is to ensure that stress test traffic is applied normally, that data is not mixed up, and that the test proceeds safely and as expected. This involves two considerations. The first is the strict isolation of test data from normal data, that is, a monitoring and protection mechanism against illegal traffic.

Means: shadow table data. A shadow table is a writable stress test data table with the same structure as the online table, kept in an isolated location.

Effect: data isolation, avoiding data mix-ups.

Security filtering of stress test traffic, that is, ensuring it is not identified as attack traffic.

Means: connect the security policies to the traffic control and degradation function; relax the security policy appropriately for the stress test, or recognize stress test traffic by its special marker.

Effect: stress test traffic is not judged to be attack traffic, so the stress test succeeds while the security of the online business is preserved.
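
The shadow table mechanism described above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions, not the production implementation: the `shadow_` prefix, the `orders` table, and the function names are hypothetical, and a real system would do this routing inside the data access middleware rather than in application code.

```python
import sqlite3

# Hypothetical naming convention: shadow tables share the schema of the
# online table and differ only by this prefix.
SHADOW_PREFIX = "shadow_"


def table_for(logical_name, is_stress):
    """Pick the shadow table for stress traffic, the formal table otherwise."""
    return SHADOW_PREFIX + logical_name if is_stress else logical_name


def create_schema(conn):
    """Create the formal table and its shadow twin with identical structure."""
    for stress in (False, True):
        # Table names come from fixed constants, never from user input.
        conn.execute(
            f"CREATE TABLE {table_for('orders', stress)} "
            "(id INTEGER PRIMARY KEY, buyer TEXT, amount REAL)"
        )


def insert_order(conn, buyer, amount, is_stress):
    """Write an order to whichever table matches the traffic type."""
    table = table_for("orders", is_stress)
    conn.execute(
        f"INSERT INTO {table} (buyer, amount) VALUES (?, ?)", (buyer, amount)
    )


def count_orders(conn, is_stress):
    """Count rows in the formal or the shadow table."""
    table = table_for("orders", is_stress)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    insert_order(conn, "real_buyer", 99.0, is_stress=False)
    insert_order(conn, "test_buyer", 1.0, is_stress=True)
    print(count_orders(conn, False), count_orders(conn, True))
```

The key property is that stress test writes never touch the formal table, so reports and real users only ever see business data.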

Third-party systems such as Alipay and SMS need special attention: because of the particular nature of these businesses, their stress test systems have to be connected with ours. Taobao ran its first full-link stress test in 2013 but did not manage to connect the downstream service links. In 2014, before the Singles' Day stress test, the stress test systems were connected with Alipay and logistics for a comprehensive test. For external customers, mock services for Alipay, SMS, and similar dependencies can be provided so that users can run full-link stress tests.

Stress test implementation

Following the process control introduced at the beginning, the full-link stress test can start once everything is ready. Besides the formal stress test as commonly understood, there are two additional preliminary operations: system warm-up and login preparation.

Note: the single-link stress test and debugging that follow the initial transformation are not covered here. That part is mostly verified by the developers themselves.

System warm-up: the warm-up here does not include what we call the prerun internally. Warm-up loads the data that should be cached into the cache in advance, so that the cache is well populated and can serve its purpose. Caches should be used to full effect, which is why warm-up is needed. External customers can warm up the system with a first round of low-volume full-link stress testing, and should likewise cache the data that needs caching before the real promotion begins.
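
The effect of warm-up can be illustrated with a toy cache. This is a sketch only: the `WarmableCache` class, its loader callback, and the hit/miss counters are assumptions made for the example, standing in for whatever caching layer a real system uses.

```python
class WarmableCache:
    """Toy read-through cache that can be pre-populated before load arrives."""

    def __init__(self, loader):
        self._loader = loader  # function that fetches a value from the slow backend
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Return the cached value, loading it from the backend on a miss."""
        if key in self._data:
            self.hits += 1
        else:
            self.misses += 1
            self._data[key] = self._loader(key)
        return self._data[key]

    def warm_up(self, hot_keys):
        """Load the expected hot keys ahead of time without counting hits or misses."""
        for key in hot_keys:
            if key not in self._data:
                self._data[key] = self._loader(key)


if __name__ == "__main__":
    cache = WarmableCache(loader=lambda item_id: {"item_id": item_id})
    cache.warm_up(range(100))        # pre-load the items expected to be hot
    for item_id in range(100):
        cache.get(item_id)           # peak traffic now hits the cache
    print(cache.hits, cache.misses)
```

Without the `warm_up` call, every first access would fall through to the backend at the moment of peak load, which is exactly what warm-up is meant to avoid.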

Login preparation: login preparation mainly applies to scenarios with long-lived connections and to flash sales, where users log in gradually and then perform business operations. If the volume is particularly large, logins can be prepared in advance, both to simulate the real user login scenario and to protect the login system.

Formal stress test: in general, the formal stress test applies several strategies according to the stress test plan. Taobao's Double 11 stress test generally consists of the following steps:

1) Peak pulse: fully simulate the target peak traffic at 00:00, run the stress test, and observe system performance.

2) Ceiling probing: disable the rate-limiting and degradation protection, raise the load above the current target value (only after the target value has already been reached), and observe the system's limit. Several rounds of increasing load can be run until the system becomes abnormal.

3) Rate-limiting and degradation verification: verify that the rate-limiting and degradation protection works properly. The commercial product AHAS (Application High Availability Service) provides comprehensive rate-limiting and degradation capabilities that can be used for full-link degradation protection.

4) Destructive testing: this mainly verifies the effectiveness of contingency plans, similar to executing plans during a disaster recovery drill. The load is held at a high level while the plans are executed, to verify that they work and to observe their impact on the system.

For external customers, different load levels can be configured to run multiple rounds of stress tests and observe system performance. Stress testing should not be a one-time operation but a repeated, multi-round verification.

Problem location analysis

After the stress test ends, the system behavior and monitoring data collected during the test are compiled and a review is held to analyze the current bottlenecks, the follow-up improvement and repair plan, and the timing of the next round. Because many systems are involved and each business subsystem takes a different form, problems have to be analyzed case by case, which inevitably requires front-line developers. The stress test report of the commercial product PTS (Performance Testing Service) includes detailed statistics and trend charts, sampled logs, and data from any monitoring that was added. PTS will later also provide architecture monitoring, to help the engineers running a performance test judge from the perspective of the system architecture whether the system is behaving normally during the test.

Intelligent stress testing

Alibaba's full-link stress testing is now in its seventh year. It started out feeling its way across the river and has developed into a more intelligent form. Some of these functions will also appear in the commercial products, so stay tuned.
Directions for intelligent stress testing include: support for more protocols; capacity evaluation; automatic problem discovery; full-link functional testing and stress test rehearsal; routine (normalized) stress testing; and elastic promotions, scaling out while the test is running.

Future

Alibaba's full-link stress testing has reached its seventh year and has been through a great deal of tempering and accumulation along the way. As new technologies emerge, we will keep improving and do better. We also hope to pass these years of experience on to external customers, so that they can get through every promotion smoothly and apply full-link stress testing to more day-to-day scenarios.
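
As a closing illustration of the multi-round approach from the formal stress test steps, the ceiling-probing strategy (raising the load round by round until the system becomes abnormal) can be sketched as a simple loop. Everything here is an assumption for the example: `run_round` stands in for one stress test round on a real platform, the 1% error-rate threshold is arbitrary, and `simulated_system` is a toy model used only to make the sketch runnable.

```python
def probe_ceiling(run_round, start_rps, step_rps, max_error_rate=0.01, max_rounds=50):
    """Raise the load round by round and return the last RPS that stayed healthy.

    run_round(rps) must execute one stress test round at the given RPS and
    return the observed error rate (0.0 to 1.0). Returns None if even the
    first round is unhealthy.
    """
    last_ok = None
    rps = start_rps
    for _ in range(max_rounds):
        error_rate = run_round(rps)
        if error_rate > max_error_rate:
            break  # the system is abnormal at this level: stop probing
        last_ok = rps
        rps += step_rps
    return last_ok


def simulated_system(capacity_rps):
    """Toy stand-in for a real system: requests beyond capacity all fail."""
    def run_round(rps):
        if rps <= capacity_rps:
            return 0.0
        return (rps - capacity_rps) / rps
    return run_round


if __name__ == "__main__":
    ceiling = probe_ceiling(simulated_system(12_500), start_rps=10_000, step_rps=1_000)
    print("last healthy load:", ceiling)
```

In a real test the step size trades precision against the number of rounds, and the health check would usually combine error rate with latency and resource metrics.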
