How to fix a flash-sale (seckill) system outage

2025-03-30 Update From: SLTechnology News&Howtos

This article explains how to recover a flash-sale (seckill) system after it goes down and how to keep the same failure from happening again. Many people run into this problem in day-to-day operations, so let's work through a real incident.

What caused it?

The flash sale was scheduled to start at exactly 00:00:

At 22:00, the operations staff put the products online through the admin console.

At 23:00, a backend developer loaded the products into the cache to pre-warm it.

Traffic is very heavy at the start of a flash sale, so the plan was to serve most user query requests from Redis and keep them from all falling on the database.

Cache hit

As expected, most requests hit the cache. However, when the backend pre-warmed the cache, every item was given the same expiration time of 2 hours.

As a result, all product entries expired at the same moment (01:00, two hours after the 23:00 warm-up). Every request instantly fell through to the database, the database collapsed under the load, and all user requests returned timeout errors.

In effect, all requests went straight to the database, as shown below:

Cache avalanche
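The timeline above can be reproduced with a small simulation. This is only a sketch under stated assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis, the item count of 1000 is made up, and times are expressed as seconds since midnight of the warm-up day.

```python
class TTLCache:
    """Tiny in-memory stand-in for Redis with per-key expiry (hypothetical)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or now >= entry[1]:
            return None  # missing or expired
        return entry[0]


def count_db_queries(cache, item_ids, now):
    """Cache-aside read path: every cache miss becomes one database query."""
    return sum(1 for item_id in item_ids if cache.get(item_id, now) is None)


HOUR = 3600
WARMUP_AT = 23 * HOUR  # 23:00
item_ids = [f"item:{i}" for i in range(1000)]  # assumed catalogue size

cache = TTLCache()
for item_id in item_ids:
    # Same 2-hour TTL for every item, as in the incident.
    cache.set(item_id, {"stock": 10}, ttl=2 * HOUR, now=WARMUP_AT)

misses_at_0030 = count_db_queries(cache, item_ids, now=24 * HOUR + 30 * 60)
misses_at_0100 = count_db_queries(cache, item_ids, now=25 * HOUR)
print(misses_at_0030, misses_at_0100)  # prints "0 1000"
```

At 00:30 nothing reaches the database; at 01:00 every single lookup does, which is the avalanche.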

When was it discovered?

At 01:02 in the morning, the SRE on duty received a system alarm, logged in to the operations management system, found that CPU and memory usage on the database node had soared past their thresholds, and quickly contacted the backend developer to locate and troubleshoot the problem.

Why wasn't it discovered earlier?

Because the cache expiration time was set to 2 hours, the cache served most requests until 01:00, and the database stayed healthy up to that point.

What measures were taken once it was discovered?

After the backend developer located the problem through the logs, he took the following steps:

First, he throttled most of the traffic at the API gateway.

Then he restarted the database service that had gone down.

Next, he re-warmed the cache.

After confirming that the cache and database services were healthy, he lifted the gateway limit, and the flash sale returned to normal at about 01:30.
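The article does not say how the gateway restricted traffic. One common approach is a token bucket, sketched below as an assumption rather than as what was actually deployed; the `TokenBucket` class and the rate and capacity values are hypothetical.

```python
class TokenBucket:
    """Admit at most `capacity` requests in a burst, refilled at `rate` per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        # Refill according to the time elapsed since the previous call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request is rejected (or queued) at the gateway


bucket = TokenBucket(rate=100, capacity=100)

# 1000 requests arrive at the same instant: only the bucket's capacity gets through.
admitted_burst = sum(bucket.allow(now=0.0) for _ in range(1000))

# One second later the bucket has refilled by `rate` tokens.
admitted_later = sum(bucket.allow(now=1.0) for _ in range(1000))

print(admitted_burst, admitted_later)  # prints "100 100"
```

Holding traffic down this way gives the database room to restart and the cache time to be re-warmed before the limit is lifted.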

How can it be prevented next time?

The root cause of this incident was a cache avalanche: with a huge volume of queries, requests fell directly on the database, overloading it and bringing it down.

The industry's solutions to cache avalanche are fairly mature, for example:

Uniform expiration

Add mutex

Cache never expires

Uniform expiration

Give keys different expiration times so that expirations are spread as evenly as possible over time. Typically you either add a random offset to each key's validity period or plan the validity periods so that they are staggered.

Uniform distribution of cache key expiration time
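A minimal sketch of the random-offset variant follows. The 2-hour base TTL comes from the incident; the 10-minute jitter window and the helper name `ttl_with_jitter` are assumptions chosen for illustration.

```python
import random

BASE_TTL = 2 * 3600  # 2 hours, as used in the warm-up
MAX_JITTER = 600     # assumed: spread expirations over an extra 10 minutes


def ttl_with_jitter(base_ttl=BASE_TTL, max_jitter=MAX_JITTER, rng=random):
    """Return the base TTL plus a random offset in [0, max_jitter] seconds."""
    return base_ttl + rng.randint(0, max_jitter)


rng = random.Random(42)  # seeded only to make the demo repeatable
ttls = [ttl_with_jitter(rng=rng) for _ in range(1000)]

print(min(ttls) >= BASE_TTL, max(ttls) <= BASE_TTL + MAX_JITTER)
print(len(set(ttls)) > 1)  # the keys no longer share one expiry moment
```

During warm-up, each key would be written with its own value from `ttl_with_jitter()` as its expiry, so the database sees a trickle of rebuilds over the jitter window instead of one spike.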

Add mutex

This is the same approach used for cache breakdown (a single hot key expiring under heavy load): only one thread at a time is allowed to rebuild the cache, while the other threads block and wait in a queue.

Mutually exclusive access
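The sketch below shows the idea inside a single process, using a plain dict as the cache and a hypothetical `load_from_db` function as the database. It is an illustration, not the system's actual code; across several application servers a distributed lock would be needed instead of `threading.Lock`.

```python
import threading

cache = {}
cache_lock = threading.Lock()
db_calls = 0


def load_from_db(key):
    """Hypothetical database read; counts how often the database is hit."""
    global db_calls
    db_calls += 1  # only ever called while holding cache_lock
    return f"value-for-{key}"


def get(key):
    value = cache.get(key)
    if value is not None:
        return value  # fast path: cache hit, no locking
    with cache_lock:  # only one thread rebuilds; the others wait here
        value = cache.get(key)  # re-check: another thread may have rebuilt it
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value


threads = [threading.Thread(target=get, args=("item:1",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(db_calls)  # prints "1": fifty concurrent misses caused one database query
```

The second lookup inside the lock is what keeps the waiting threads from each querying the database once they are let through.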

Cache never expires

This also mirrors the cache breakdown solution: cache entries are never physically expired, and an asynchronous thread updates the cache instead.

Update cache asynchronously
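One way to realize this is a logical expiry time stored next to the value: readers always get an answer, and a stale entry triggers a background refresh. The sketch below assumes a pre-warmed cache; the class `LogicalExpiryCache`, the `loader` callback, and the injectable `clock` are all hypothetical names introduced for illustration.

```python
import threading
import time


class LogicalExpiryCache:
    """Entries are never evicted; stale ones are refreshed in the background."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # hypothetical function that reads the database
        self._ttl = ttl
        self._clock = clock
        self._data = {}        # key -> (value, logical_expires_at)
        self._refreshing = {}  # key -> refresh thread currently running
        self._lock = threading.Lock()

    def warm(self, key):
        self._data[key] = (self._loader(key), self._clock() + self._ttl)

    def get(self, key):
        value, expires_at = self._data[key]  # assumes the key was pre-warmed
        if self._clock() >= expires_at:
            self._refresh_async(key)
        return value  # possibly stale, but never a miss

    def _refresh_async(self, key):
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already in flight
            thread = threading.Thread(target=self._refresh, args=(key,), daemon=True)
            self._refreshing[key] = thread
            thread.start()

    def _refresh(self, key):
        try:
            self._data[key] = (self._loader(key), self._clock() + self._ttl)
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def wait_for_refresh(self, key):
        """Demo helper: block until any in-flight refresh of `key` finishes."""
        with self._lock:
            thread = self._refreshing.get(key)
        if thread is not None:
            thread.join()


versions = iter(range(1, 100))
fake_now = [0.0]  # fake clock so the demo needs no sleeping
cache = LogicalExpiryCache(lambda key: f"{key}-v{next(versions)}",
                           ttl=60, clock=lambda: fake_now[0])

cache.warm("item:1")
fresh = cache.get("item:1")      # "item:1-v1"
fake_now[0] = 61.0               # the entry is now logically expired
stale = cache.get("item:1")      # still "item:1-v1", refresh starts in background
cache.wait_for_refresh("item:1")
refreshed = cache.get("item:1")  # "item:1-v2"
print(fresh, stale, refreshed)
```

The trade-off is that readers may briefly see stale data, which is usually acceptable for product details but needs care for values such as remaining stock.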

That concludes this look at how to fix a flash-sale system outage. I hope it has answered your questions. Theory is best learned together with practice, so go and try it out.
