A cache performance problem troubleshooting 07/09 Update SLTechnology News&Howtos

A cache performance problem troubleshooting

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/03 Report--

Overview

The following shares have skipped many pitfalls, including redis, tomcat environment configuration, machine hardware configuration and other issues (consistent with online, or hardware performance reduction factor, such as online: 8C16G, stress test: 4C8G, with a simple difference of 2 times), and directly move the main idea of mining bottlenecks out of the table.

Analysis of pressure test data

Global graph preview

Cdn.xitu.io/2018/11/29/1675ebca0b8bfcb2?w=1919&h=880&f=png&s=125839 ">

By testing a LVB viewing page with high concurrency, an interesting feature was found in APM (Pinpoint) monitoring:

The data in the two red boxes in the figure above (nearly 10s) occur about 30 minutes apart. Around 16:20, the system cannot support the abnormal unavailability of the service. With curiosity, trace the stack of method calls, as shown in the following figure:

How long does this method take? First of all, figure out some concepts in Call Tree:

It can be seen that this sql query method takes more than 14 seconds. Why? The sql statement has been shown in APM, and executing the query in mysql finds that the execution time is very fast, so what's the problem? We can only keep digging!

By comparing the same url, when the request response is in milliseconds, the data is found as follows:

After getting the data from redis, we no longer executed the sql query. Through this analysis, we decided to trace the truth of code recovery (tests that don't understand the code are not good developers):

You can see that after the cache expires, query the database directly.

Solution SQL Optimization: low priority

From the perspective of data analysis, sql optimization is of little use, not because it returns a large amount of data and lacks an index, which can be skipped this time.

Cache concurrency: high priority

Scenario: when the concurrent access of a website is high, if a cache fails, multiple processes may query DB at the same time and set the cache at the same time. If the concurrency is really high, this may also cause excessive DB pressure and frequent cache updates.

How to handle it: lock the cache query, lock it if KEY does not exist, then check DB to cache, and then unlock it; other processes wait if they find a lock, and then return data or enter DB query after unlocking.

Experience summary

1. Make good use of monitoring tools, such as APM, for link monitoring, server performance, and method call sequence observation.

2. Trace method stack and related logs

3. Go deep into the code and find out the essence.

Official account of Wechat: Le Shao Blackboard

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.