Analysis of the reasons why the database runs very slowly 07/01 Update SLTechnology News&Howtos

2025-07-01 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/03 Report

When database queries are unusually slow and you cannot trace the cause to hardware configuration, SQL tuning, or indexing, the bottleneck may lie in the performance of the database's computing engine itself.

How much does the computing engine's performance matter? Consider a car analogy. The server's hardware configuration is the infrastructure, like the road: driving on a highway is very different from driving on a dirt road in a mountain village. SQL query tuning corresponds to the driver's skill. The database's computing engine corresponds to the car's engine: it is the source of the database's performance and a core technical barrier for every vendor.

Next, we look at how the implementation techniques of the computing engine can improve database performance. The following figure shows a simplified flow, from the moment a client issues a SQL statement until the result is returned to that client.

If we think of the database kernel as an organization, the optimizer sits at the top and gives orders, like the head of the organization. The executor sits in the middle: it strictly carries out the plan issued by the optimizer, reads data from storage, processes it, and finally returns the result to the client.
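The article does not say which execution model the executor follows. As an illustration only, the sketch below assumes the classic tuple-at-a-time iterator (Volcano) model, in which the optimizer's plan is a tree of operators and the executor pulls rows through it one at a time. The operator classes (`Scan`, `Filter`, `Project`), the `execute` helper, and the sample `answers` table are all hypothetical.

```python
# A minimal sketch of a plan tree run in the tuple-at-a-time iterator
# (Volcano) style. Operator names and the in-memory table are hypothetical.
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class Scan:
    """Leaf operator: reads rows from 'storage' (here, a Python list)."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table

    def rows(self) -> Iterator[Row]:
        yield from self.table


class Filter:
    """Passes through only the rows for which the predicate is true."""

    def __init__(self, child, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def rows(self) -> Iterator[Row]:
        for row in self.child.rows():
            if self.predicate(row):
                yield row


class Project:
    """Keeps only the requested columns."""

    def __init__(self, child, columns: List[str]) -> None:
        self.child = child
        self.columns = columns

    def rows(self) -> Iterator[Row]:
        for row in self.child.rows():
            yield {c: row[c] for c in self.columns}


def execute(plan) -> List[Row]:
    """The 'executor': pulls every row through the plan the optimizer built."""
    return list(plan.rows())


answers = [
    {"id": 1, "likes": 12000, "author": "a"},
    {"id": 2, "likes": 300, "author": "b"},
    {"id": 3, "likes": 10000, "author": "c"},
]

# Plan for roughly: SELECT id FROM answers WHERE likes >= 10000
plan = Project(Filter(Scan(answers), lambda r: r["likes"] >= 10000), ["id"])
print(execute(plan))  # [{'id': 1}, {'id': 3}]
```

Every row passes through a method call at each operator, so this model spends much of its time on per-row overhead. The executor techniques discussed later in this article, code generation and vectorized execution, largely aim at reducing that overhead.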

Optimizer

How can we picture the optimizer? Take a query over Zhihu answers and their likes as an example. Through SQL, the user tells the database what it wants: "find me the answers with ten thousand likes." The optimizer turns this request into a strategy for how to find those answers, that is, a query plan.

A single SQL statement can have thousands of possible execution plans, and the performance gap between a good plan and a poor one can be very large.
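To make the gap between good and poor plans concrete, here is a minimal sketch, assuming a hypothetical in-memory `answers` table and a pre-built sorted index on the like count. The two plan functions (`plan_full_scan`, `plan_index_range_scan`) are illustrative names; both return the same ids for the "ten thousand likes" query, but they touch very different numbers of rows.

```python
# Two hypothetical plans for: SELECT id FROM answers WHERE likes >= 10000
# The table, the index, and both plan functions are illustrative only.
import bisect
import random

random.seed(0)
# Hypothetical table of 100,000 (id, likes) rows.
answers = [(i, random.randint(0, 10_500)) for i in range(100_000)]


def plan_full_scan(table, threshold):
    """Plan A: read every row and test the predicate on each one."""
    touched = 0
    ids = []
    for answer_id, likes in table:
        touched += 1
        if likes >= threshold:
            ids.append(answer_id)
    return sorted(ids), touched


# Plan B assumes an existing index: (likes, id) pairs sorted by likes.
likes_index = sorted((likes, answer_id) for answer_id, likes in answers)


def plan_index_range_scan(index, threshold):
    """Plan B: binary-search to the first qualifying entry, then read only the tail."""
    # (threshold,) sorts before every (threshold, id) pair, so this finds
    # the first entry whose likes >= threshold.
    start = bisect.bisect_left(index, (threshold,))
    matches = index[start:]
    return sorted(answer_id for _, answer_id in matches), len(matches)


a_ids, a_touched = plan_full_scan(answers, 10_000)
b_ids, b_touched = plan_index_range_scan(likes_index, 10_000)
print(a_ids == b_ids, a_touched, b_touched)
```

The index is assumed to exist already; building it costs a full pass plus a sort, which is one of the trade-offs an optimizer has to weigh.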

How do we choose the best of these thousands of plans? Early query optimizers typically used rule-based optimization (RBO), which applies heuristic rules and is often not accurate enough to find the optimal plan. Cost-based optimization (CBO) instead estimates the cost of candidate plans from statistics about the data and picks the cheapest, which efficiently finds the best plan in most scenarios.

As a result, high-performance database engines usually rely on cost-based optimizers.
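The core loop of a cost-based optimizer can be sketched as: estimate a cost for each candidate plan from table statistics, then pick the cheapest. Everything below is an assumption for illustration: the `TableStats` fields, the cost constants, the uniform-distribution selectivity estimate, and both cost formulas are simplified inventions, not any particular database's cost model.

```python
# A toy cost-based optimizer. The statistics, cost constants, and formulas
# are made-up illustrations, not a real database's cost model.
from dataclasses import dataclass


@dataclass
class TableStats:
    row_count: int
    rows_per_page: int
    min_value: float  # minimum of the filtered column
    max_value: float  # maximum of the filtered column


# Hypothetical cost units: a sequential page read is cheap, a random page
# read (typical of index lookups) is more expensive.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_ROW_COST = 0.01


def estimate_selectivity(stats: TableStats, threshold: float) -> float:
    """Fraction of rows with value >= threshold, assuming a uniform distribution."""
    if threshold <= stats.min_value:
        return 1.0
    if threshold > stats.max_value:
        return 0.0
    return (stats.max_value - threshold) / (stats.max_value - stats.min_value)


def cost_seq_scan(stats: TableStats, selectivity: float) -> float:
    # A sequential scan reads every page and tests every row,
    # so its cost does not depend on the selectivity.
    pages = stats.row_count / stats.rows_per_page
    return pages * SEQ_PAGE_COST + stats.row_count * CPU_ROW_COST


def cost_index_scan(stats: TableStats, selectivity: float) -> float:
    # Pessimistically assume one random page read per matching row.
    matching = stats.row_count * selectivity
    return matching * RANDOM_PAGE_COST + matching * CPU_ROW_COST


def choose_plan(stats: TableStats, threshold: float) -> str:
    sel = estimate_selectivity(stats, threshold)
    costs = {
        "seq_scan": cost_seq_scan(stats, sel),
        "index_scan": cost_index_scan(stats, sel),
    }
    return min(costs, key=costs.get)


stats = TableStats(row_count=1_000_000, rows_per_page=100, min_value=0, max_value=20_000)
print(choose_plan(stats, 19_990))  # index_scan (very selective predicate)
print(choose_plan(stats, 1_000))   # seq_scan (predicate matches most rows)
```

The same query shape gets different plans depending on the estimated selectivity, which a fixed heuristic such as "always use an index when one exists" cannot capture.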

Executor

The executor is one of the most important components of the database kernel. Because its speed largely determines the speed of the whole database, the major database vendors have invested heavily in executor research and development.

There are two main ways to speed up an executor: vectorized execution (vector computing) and code generation. Mainstream database vendors generally adopt at least one of them. For example, Snowflake uses vectorized execution, Impala uses code generation, Spark uses both, and OushuDB combines vectorized execution with SIMD optimizations. Some traditional databases, however, implement neither.
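Code generation can be illustrated with a toy predicate. The sketch below, assuming a hypothetical tuple-based expression tree (`EXPR`) and illustrative helper names (`interpret`, `to_source`, `generate`), contrasts an interpreter that walks the tree for every row with a generator that translates the tree into Python source once and compiles it with the built-in `compile` and `exec`. Engines such as Impala emit native machine code (Impala uses LLVM) rather than Python, so this only shows the shape of the idea.

```python
# Interpreting vs generating code for the predicate
#   likes >= 10000 AND author_level > 3
# The tuple-based expression format is hypothetical, for illustration only.
import operator

# Expression tree: ("col", name) | ("const", value) | (op, left, right)
EXPR = ("and",
        (">=", ("col", "likes"), ("const", 10000)),
        (">", ("col", "author_level"), ("const", 3)))

OPS = {">=": operator.ge, ">": operator.gt, "and": lambda a, b: a and b}


def interpret(expr, row):
    """Walk the tree for every row: generic, but pays dispatch cost per node per row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def to_source(expr):
    """Translate the tree into an equivalent Python expression string."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {kind} {to_source(expr[2])})"


def generate(expr):
    """'Code generation': compile the tree once into a specialized function."""
    src = f"def predicate(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["predicate"], src


rows = [
    {"likes": 12000, "author_level": 5},
    {"likes": 12000, "author_level": 1},
    {"likes": 50, "author_level": 9},
]
predicate, src = generate(EXPR)
print(src)
print([predicate(r) for r in rows])  # [True, False, False]
```

The tree walk happens once, at compile time, instead of once per row; the generated function contains only the operations this specific query needs.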

A natural question is which approach is better. Many studies and papers have addressed it: the two techniques have different strengths, both improve performance, and the size of the gain varies from query to query. Vectorized execution, however, is better suited to processing data in parallel with SIMD. Therefore, to push the engine's performance further on top of parallel computing, one can make full use of CPU hardware instructions such as SIMD to process data in parallel.
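The difference between tuple-at-a-time and vectorized (vector-at-a-time) execution can be sketched as follows, assuming a hypothetical column-stored `items` table (`price` and `qty` columns) and an arbitrary batch size of 1024. The vectorized version makes one operator call per batch instead of one per row, and its inner loops run over plain arrays with a selection vector. Python itself will not emit SIMD instructions here; in a native engine, tight per-batch loops like these are what a compiler can map onto SIMD.

```python
# Tuple-at-a-time vs vector-at-a-time for
#   SELECT SUM(price * qty) FROM items WHERE qty > 2
# Columns are stored separately; BATCH_SIZE is an illustrative choice.
from typing import Iterator, List, Tuple

BATCH_SIZE = 1024

price = [float(i % 100) for i in range(10_000)]
qty = [i % 7 for i in range(10_000)]


def row_at_a_time(price: List[float], qty: List[int]) -> Tuple[float, int]:
    """One operator invocation per row: interpretation overhead is paid per row."""
    calls = 0
    total = 0.0
    for p, q in zip(price, qty):
        calls += 1
        if q > 2:
            total += p * q
    return total, calls


def batches(col_a: list, col_b: list, size: int) -> Iterator[Tuple[list, list]]:
    """Slice two aligned columns into batches of at most `size` values."""
    for start in range(0, len(col_a), size):
        yield col_a[start:start + size], col_b[start:start + size]


def vector_at_a_time(price: List[float], qty: List[int]) -> Tuple[float, int]:
    """One operator invocation per batch; inner loops run over plain arrays."""
    calls = 0
    total = 0.0
    for p_batch, q_batch in batches(price, qty, BATCH_SIZE):
        calls += 1
        # Filter step: build a selection vector of qualifying positions.
        sel = [i for i, q in enumerate(q_batch) if q > 2]
        # Compute step: work only on the selected positions.
        total += sum(p_batch[i] * q_batch[i] for i in sel)
    return total, calls


r_total, r_calls = row_at_a_time(price, qty)
v_total, v_calls = vector_at_a_time(price, qty)
print(r_total == v_total, r_calls, v_calls)  # True 10000 10
```

Batching cuts the number of operator invocations from 10,000 to 10 here; the per-call overhead is amortized over up to 1024 values.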

SIMD

SIMD (Single Instruction, Multiple Data) applies the same instruction to multiple data elements simultaneously. With SISD (Single Instruction, Single Data), each instruction operates on a single data element; a SIMD instruction instead fetches a whole group of operands at once, which speeds up computation, especially for data-intensive workloads.

As shown in the figure above, a scalar operation multiplies only one pair of values at a time, whereas a SIMD multiply instruction multiplies four pairs at once. As a form of vector architecture, SIMD launches a whole group of operations with a single vector instruction, and the loading, computation, and storing of the data are pipelined.
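The figure's comparison can be modeled in plain Python. This is a simulation under stated assumptions, not real SIMD: it assumes 4 lanes (for example, four 32-bit floats in a 128-bit register) and simply counts how many "instructions" each approach issues, including the scalar tail loop that SIMD code commonly uses for leftover elements. The function names are hypothetical.

```python
# A pure-Python *model* of scalar vs 4-lane SIMD multiplication.
# It only counts "instructions"; no real SIMD hardware is used.
from typing import List, Tuple

LANES = 4  # assumed register width: e.g. four 32-bit floats in 128 bits


def scalar_multiply(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """SISD: one multiply instruction per pair of operands."""
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x * y)
        instructions += 1
    return out, instructions


def simd_multiply(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """SIMD model: one 'vector instruction' handles LANES pairs at once."""
    out = []
    instructions = 0
    n = len(a)
    main = n - n % LANES  # elements that fill whole registers
    for i in range(0, main, LANES):
        # Load LANES operands from each input, multiply lane by lane, store.
        out.extend(x * y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        instructions += 1
    # Scalar tail: leftover elements that do not fill a register.
    for i in range(main, n):
        out.append(a[i] * b[i])
        instructions += 1
    return out, instructions


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
b = [2.0] * 10
print(scalar_multiply(a, b)[1])  # 10
print(simd_multiply(a, b)[1])    # 4 (two vector ops + two scalar tail ops)
```

For long arrays the instruction count approaches one quarter of the scalar version, which is the speedup the figure illustrates for four lanes.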

In tests on the industry-standard TPC-H benchmark, we found OushuDB 4.x to be roughly an order of magnitude faster than SparkSQL 3.x, the latest version at the time.

Based on this analysis, the combination of cost-based optimization, vectorized execution, and SIMD is the preferred technical path for improving database performance.
