
How can RDS performance problems be avoided at the design and specification level?


[This article comes from neo.Wang, an author in the Yunqi Community.]

At start-up Internet companies, developers spend all day thinking about one thing: shipping features. When the company's business volume grows, sometimes several-fold, the problems baked into the early code are exposed one by one.

The most obvious of these is pressure on the database. All databases mentioned below refer to RDS for MySQL.

Scenario 1: wide tables

When designing a table structure, developers largely follow the product prototype: conditions that are queried together and fields that are displayed together in the prototype all end up in the same table. Meanwhile, Internet products iterate quickly and new features appear one after another, so developers have no time to reorganize the schema and simply keep adding redundant fields to existing tables to finish each feature as fast as possible. As a result, some "antique" tables grow wider and wider, and wide tables with 70 or 80 fields become more and more common.

Suggestion

(1) Keep the number of fields in a single InnoDB table under 50.

(2) Store large fields, such as TEXT and BLOB, in a separate table, as in the sketch below.
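A minimal sketch of this kind of split, using a hypothetical article table whose body is a large TEXT column; the narrow table carries the columns that list pages filter and display, and the TEXT moves to a companion table joined by primary key:

CREATE TABLE article (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    title      VARCHAR(200)    NOT NULL,
    author_id  BIGINT UNSIGNED NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE article_content (
    article_id BIGINT UNSIGNED NOT NULL,   -- same value as article.id
    body       TEXT            NOT NULL,   -- the large field, stored separately
    PRIMARY KEY (article_id)
) ENGINE = InnoDB;

-- List pages read only the narrow article table; the detail page joins in the body.
SELECT a.id, a.title, c.body
FROM article a
JOIN article_content c ON c.article_id = a.id
WHERE a.id = 42;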

Scenario 2: row-count queries for list pages

In back-office systems, and in some front-end features, there is often a list query similar to an MIS system. In general, it first counts the number of rows matching the selected query conditions, then fetches the first page of data according to the paging rules. Several problems show up here:

(1) The detail-list query itself is fine and genuinely needs a multi-table join. For convenience, developers take that multi-table join SQL, change the field list after SELECT to COUNT(*), and use it directly as the row-count SQL, sometimes without even removing the ORDER BY. From the business point of view, counting rows only needs to query one of the tables (see the sketch after this list).

(2) The code logic is poorly designed, so some developers simply reuse the "first page" logic and re-run the row-count query for every subsequent page.
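A minimal sketch of the difference, with hypothetical orders and users tables where the filter condition touches only the orders table:

-- What often ships: the detail-page join reused for counting, ORDER BY and all.
SELECT COUNT(*)
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'PAID'
ORDER BY o.created_at DESC;   -- pointless work for a row count

-- What the business actually needs: one table, no join, no ORDER BY.
SELECT COUNT(*)
FROM orders
WHERE status = 'PAID';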

Suggestion

(1) Add an SQL audit mechanism; SQL that does not meet the standard is not allowed to go online.

(2) Add a code review mechanism.

Scenario 3: queries that do not hit indexes

It is very common for a table to start small and then grow large as the product's focus shifts, while the code, SQL, and table structure are never adjusted accordingly. Many large-table queries miss the indexes and turn into slow queries; as slow queries pile up, the whole database instance stalls, and "a secondary business drags down the main business."

Suggestion

(1) Create appropriate indexes for the query scenarios, preferring composite indexes (see the sketch after this list).

(2) When a composite index and a single-column index on its leading column both exist, it is usually better to drop the single-column index so the optimizer is not left weighing redundant choices.
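A minimal sketch with a hypothetical orders table: the composite index matches the query's equality conditions first and its sort column last, and a pre-existing single-column index on the leading column becomes redundant:

-- The query the index is built for.
SELECT id, amount, created_at
FROM orders
WHERE user_id = 42 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;

-- Composite index: equality columns first, then the ORDER BY column.
ALTER TABLE orders ADD INDEX idx_user_status_created (user_id, status, created_at);

-- A single-column index on the leading column (assumed to exist here) can be dropped.
ALTER TABLE orders DROP INDEX idx_user_id;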

Scenario 4: there is no isolation between online and offline data

Online data here means the data served directly to end users; offline data is the data used by internal back-office systems for customer service and operations. Because of their different job responsibilities, back-office users have all kinds of query needs, some of them very large and very complex, such as exporting a whole month or a whole quarter of data. These queries put enormous pressure on the database, which in turn affects the entire instance and can bring down the online system.

Suggestion

The characteristics of online data are:

High traffic; each user queries only their own data, so queries can hit an index; query conditions are simple; few rows are returned; response-time requirements are strict; data freshness requirements are high.

The characteristics of offline data are:

Low traffic; back-office colleagues query data across the whole platform, which rarely hits an index; query conditions are complex; many rows are returned; response-time requirements are loose; data freshness requirements are not strict.

In short, the two kinds of data have completely different characteristics, so separate them. All writes go to the online data; online queries read the online data and offline queries read the offline copy. The offline copy is kept current from the online data through real-time data synchronization such as DTS.

Scenario 5: over-reliance on MySQL without considering other storage options

Some list-query scenarios involve 10 or 20 query conditions, and the amount of data scanned is usually large. MySQL struggles with this kind of query; an index can hardly cover all the conditions.

Suggestion

Besides relational databases, there are many other options for storing data: search engines, NoSQL stores, time-series databases, caches, and so on. Choose the storage that best fits each query scenario; trying to solve every problem with MySQL is unwise.

Scenario 6: not thinking from the database's perspective

Take subqueries as an example. When developers approach the problem the way a person naturally thinks, they write something like:

SELECT * FROM table1 WHERE id IN (SELECT id FROM table2)

When executing such a subquery, MySQL (especially older versions without semi-join optimization) may take each row from the outer table and scan the inner query for it, so the amount of data scanned is on the order of rows(table1) × rows(table2).

Suggestion

Avoid subqueries; join the tables through indexed columns instead, for example:
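A minimal sketch of the rewrite, keeping the table names from the example above; if table2.id is not unique, DISTINCT (or the EXISTS form) is needed to preserve the original semantics, and id should be indexed on both sides:

SELECT DISTINCT t1.*
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id;

-- Equivalent semi-join form:
SELECT t1.*
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id);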

Scenario 7: doing calculations directly in the database

Some developers put calculations such as CASE WHEN and GROUP BY + COUNT/SUM directly into SQL. MySQL is good at storing and querying data, but not at computation, even though it can do it, and the result is a lot of slow SQL. MySQL optimizes queries, not calculations.

Suggestion

(1) GROUP BY + COUNT/SUM aggregations can be pre-calculated (see the sketch after this list).

(2) CASE WHEN logic can be handled on the business side or in the front end.

(3) Use each tool for what it does best.
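A minimal sketch of pre-calculation, assuming a hypothetical orders table: a scheduled job refreshes a small summary table off-peak (or against the offline copy), and the online page reads the summary instead of aggregating raw rows on every request:

CREATE TABLE order_daily_stats (
    stat_date    DATE          NOT NULL,
    status       VARCHAR(20)   NOT NULL,
    order_count  INT UNSIGNED  NOT NULL,
    total_amount DECIMAL(14,2) NOT NULL,
    PRIMARY KEY (stat_date, status)
) ENGINE = InnoDB;

-- Refresh one day's numbers; run off-peak or against the offline copy.
REPLACE INTO order_daily_stats (stat_date, status, order_count, total_amount)
SELECT DATE(created_at), status, COUNT(*), SUM(amount)
FROM orders
WHERE created_at >= '2018-01-01' AND created_at < '2018-01-02'
GROUP BY DATE(created_at), status;

-- The online query becomes a cheap point read.
SELECT order_count, total_amount
FROM order_daily_stats
WHERE stat_date = '2018-01-01' AND status = 'PAID';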

Scenario 8: using functions on indexed fields

Some tables have an index on a BIGINT timestamp field, and the query condition only needs to be accurate to the day. Some developers then write the SQL as:

WHERE from_unixtime(create_timestamp) >= '2018-01-01' AND from_unixtime(create_timestamp) < '2018-02-01'

With a function applied to an indexed field, the index cannot be used: the scan is still a full table scan, and from_unixtime() is evaluated on create_timestamp for every row.

Suggestion

A condition such as:

WHERE from_unixtime(create_timestamp) >= '2018-01-01' AND from_unixtime(create_timestamp) < '2018-02-01'

can be rewritten as:

WHERE create_timestamp >= unix_timestamp('2018-01-01') AND create_timestamp < unix_timestamp('2018-02-01')

This way unix_timestamp() is evaluated only once, the condition matches the index directly, and the full table scan is avoided.

Scenario 9: not making full use of the cache

In some scenarios the data does not need to be highly real-time, yet the same query is executed frequently, sometimes even concurrently with identical conditions. Hitting the database every time is bound to waste resources.

Suggestion

Cache these query results in Redis and direct most of the requests to Redis.

Scenario 10: excessive amount of data in a single table

Because of the heavy dependence on MySQL, a lot of data that would be better suited to a NoSQL store also ends up in MySQL, and the row counts grow very large. For such a table, queries and updates are slow, and DDL operations have to be carried out after taking the service offline and still take a great deal of time.

Suggestion

(1) Keep a single table under 10 million rows and under 5 GB. Beyond that, consider splitting it across databases and tables (sharding). The sketch after this list shows how to check where a table stands.

(2) Depending on the scenario, consider other storage tools or changes to the business logic to solve the large-table problem.
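A minimal sketch for checking whether a table has crossed those thresholds, using information_schema (table_rows is an estimate for InnoDB; the schema and table names here are placeholders):

SELECT table_name,
       table_rows,   -- estimated row count
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.TABLES
WHERE table_schema = 'your_db'
  AND table_name = 'orders';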
