2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains why it is often said that a single MySQL table should not exceed 5 million rows. I hope you will gain something from reading it. Let's discuss it together.
A saying once spread widely through China's Internet technology circles: when a single MySQL table holds more than 20 million rows, performance degrades noticeably. The rumor is said to have originated at Baidu. As the story goes, when DBAs tested MySQL performance back then, they found that SQL operations slowed sharply once a single table reached the order of 20 million rows, and the conclusion came from that. Baidu engineers then reportedly carried the message with them when they moved to other companies, and so the saying spread through the industry.
Later, Alibaba's "Java Development Manual" recommended splitting databases and tables only when a single table exceeds 5 million rows or 2GB in size. With Alibaba's "golden rule" behind it, many people designing large-scale data storage treat this figure as the standard for splitting tables.
So, how many rows do you think is appropriate? Why 5 million rather than 3 million or 8 million? Maybe you would say this is simply the number Alibaba found to work best in practice. But that raises another question: how was the figure arrived at? Take a moment to think about it.
In fact, the figure is not really about the number of records; it depends on MySQL's configuration and the machine's hardware. To improve performance, MySQL loads a table's indexes into memory. When the InnoDB buffer pool is large enough, the indexes fit entirely in memory and queries run without trouble. But once a single table grows past a certain order of magnitude, memory can no longer hold its indexes, so subsequent SQL queries generate disk I/O and performance degrades. The threshold also depends on the design of the specific table structure, since that determines how soon the memory limit is reached. In this situation, upgrading the hardware may bring an immediate performance improvement.
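The memory argument above can be turned into a back-of-envelope check. The sketch below is only an illustration: the function names, the 400-byte average row size, and the 75% usable fraction are assumptions chosen for the example, not figures from MySQL or from the Alibaba manual. It treats the InnoDB clustered index (which stores the rows themselves) as the working set; `innodb_buffer_pool_size` is the MySQL setting whose value you would plug in for the buffer pool.

```python
GIB = 1024 ** 3


def table_bytes(rows, avg_row_bytes, overhead=1.0):
    """Rough size of a table's clustered index in bytes.

    `overhead` is a hypothetical multiplier for page fill and secondary
    indexes; 1.0 means "count raw row bytes only".
    """
    return int(rows * avg_row_bytes * overhead)


def max_rows_in_memory(buffer_pool_bytes, avg_row_bytes,
                       usable_fraction=0.75, overhead=1.0):
    """Rows that fit if only `usable_fraction` of the pool serves this table."""
    return int(buffer_pool_bytes * usable_fraction / (avg_row_bytes * overhead))


def fits(rows, avg_row_bytes, buffer_pool_bytes, usable_fraction=0.75):
    """True if the estimated working set fits in the usable part of the pool."""
    return table_bytes(rows, avg_row_bytes) <= buffer_pool_bytes * usable_fraction


# Assumed example: 400-byte rows, so 5 million rows is about 2 GB of row data.
five_million = table_bytes(5_000_000, 400)
print(five_million)                        # 2000000000
print(fits(5_000_000, 400, 4 * GIB))       # fits in a 4 GiB pool
print(fits(20_000_000, 400, 4 * GIB))      # does not fit in the same pool
```

With wider rows or a smaller buffer pool the same functions give a much lower ceiling, which is the point of the paragraph above: the safe row count is a property of the machine and the schema, not a constant.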
My view on splitting databases and tables is that it should follow actual needs and should not be over-designed. At the start of a project, do not split; as the business grows and further optimization is no longer possible, consider splitting databases and tables to improve system performance. On this point, Alibaba's "Java Development Manual" adds: if the data volume is not expected to reach this level within three years, do not split databases and tables when creating the tables. So, back to the original question: how many rows do you think is appropriate? My suggestion is to assess your own machine as a whole; if you have no standard in mind, taking 5 million rows as a provisional common standard is a reasonable compromise.
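The three-year rule quoted above comes down to simple arithmetic. The following is a minimal sketch, assuming linear growth; the function names and the sample growth rates are hypothetical, and the 5 million default is just the provisional threshold discussed in this article.

```python
def projected_rows(current_rows, rows_per_day, years=3):
    """Row count after `years`, assuming a constant daily insert rate."""
    return current_rows + rows_per_day * 365 * years


def should_plan_split(current_rows, rows_per_day,
                      threshold=5_000_000, years=3):
    """True if the projection crosses the threshold within the horizon."""
    return projected_rows(current_rows, rows_per_day, years) > threshold


# Assumed workloads: 1,000 new rows per day versus 10,000 new rows per day.
print(projected_rows(200_000, 1_000))      # 1295000
print(should_plan_split(200_000, 1_000))   # False: stay with one table
print(should_plan_split(200_000, 10_000))  # True: worth planning a split
```

Real growth is rarely linear, so a projection like this is only a first filter before measuring on your own hardware.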
Let's take a look at some notes on writing SQL, which should be helpful to you.
SQL statements need to be written with optimization in mind.
Use LIMIT to cap the number of rows a query returns
Avoid SELECT *; list only the columns you need
Use JOIN in place of subqueries
Split large delete or insert statements
You can find the slower SQL by opening the slow query log
Do not perform operations on columns: in SELECT id FROM t WHERE age + 1 = 10, any operation on the column, including database functions and computed expressions, can lead to a table scan. When querying, move the operation to the right of the equals sign as much as possible
Keep SQL statements as simple as possible: one statement can run on only one CPU; break large statements into small ones to reduce lock time; one large statement can block the entire database
Rewrite OR as IN: the cost of OR is O(n) and the cost of IN is O(log n); keep the number of values in an IN list under 200
Do not use functions and triggers; implement that logic in the application
Avoid leading-wildcard queries such as LIKE '%xxx'
Use less JOIN
Compare values of the same type, such as '123' with '123' and 123 with 123
Try to avoid the != or <> operators in the WHERE clause; otherwise the engine may give up on the index and do a full table scan
For consecutive values, use BETWEEN instead of IN: SELECT id FROM t WHERE num BETWEEN 1 AND 5
Do not fetch the whole table for list data; page with LIMIT, and keep each page from being too large
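Two of the notes above, keeping operations off the indexed column and splitting a large delete, can be tried in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server; SQLite is not MySQL, so treat the query plans as an illustration of the idea and confirm with `EXPLAIN` on your own MySQL instance. The `person` table, the `idx_age` index, and the helper names are all hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def delete_in_batches(conn, min_age, batch_size=1000):
    """Delete rows with age >= min_age one primary-key range at a time.

    Each range is committed separately, so no single statement holds
    locks over the whole table.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM person").fetchone()
    if lo is None:          # empty table, nothing to delete
        return 0
    deleted = 0
    while lo <= hi:
        cur = conn.execute(
            "DELETE FROM person WHERE id >= ? AND id < ? AND age >= ?",
            (lo, lo + batch_size, min_age),
        )
        conn.commit()
        deleted += cur.rowcount
        lo += batch_size
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("CREATE INDEX idx_age ON person (age)")
conn.executemany(
    "INSERT INTO person (id, age) VALUES (?, ?)",
    [(i, i % 100) for i in range(1, 5001)],
)
conn.commit()

# Operation on the column: the index cannot be used for a lookup.
scan_plan = plan(conn, "SELECT id FROM person WHERE age + 1 = 10")
# Same condition with the arithmetic moved to the right side (10 - 1 = 9).
seek_plan = plan(conn, "SELECT id FROM person WHERE age = 9")
print(scan_plan[0])
print(seek_plan[0])

deleted = delete_in_batches(conn, min_age=90, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(deleted, remaining)
```

The delete batches by primary-key range because that form is portable; MySQL also accepts `DELETE ... LIMIT n` in a loop, which SQLite does not support by default.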
After reading this article, I believe you have some understanding of why a single MySQL table should not exceed 5 million rows. If you want to know more, you are welcome to follow the industry information channel. Thank you for reading!