Shulou (shulou.com), SLTechnology News & Howtos — Database. Updated 2025-01-16.
This article covers common MySQL interview questions, focusing on sharding databases and tables. The explanations are meant to be simple, fast, and practical.
Tell me about sharding databases and tables.
As the number of users grows and data accumulates over time, the database holds more and more data, and bottlenecks appear: resource alarms, slow queries, and so on.
First, a single database server has hard limits on the number of connections it can carry, its disk I/O, and its network throughput, so once concurrency rises, the database gradually becomes the weak point.
Second, if a single table holds too much data, query performance degrades: the more data there is, the taller the B+ tree becomes, and a taller tree means more I/O per query and worse performance.
For these reasons, we have no choice but to shard.
Splitting the data that used to live in one database instance across multiple instances deployed on different servers is database sharding.
Splitting the data that used to live in one table across multiple tables is table sharding.
In general:
Table sharding solves slow queries caused by the data volume of a single table. A rough rule of thumb is to split at around 30-40 million rows, but it really depends on row width: tables with small rows may comfortably hold more, while tables with very wide rows may struggle even at 10 million.
Database sharding solves the problem that a single machine's resources cannot withstand high concurrent access: requests are spread across multiple servers to reduce the pressure on each one.
How do you usually shard databases?
Databases are generally split by business domain, for example an order database, a user database, and so on.
Sometimes special-purpose databases are split out as well; for example, activity-related tables may get their own database.
Concurrency can spike during a promotion, and to avoid affecting the existing core business, such workloads are split out separately even when they are related to existing data.
What problems do you think database sharding brings?
First, there is the transaction problem.
We use a relational database largely because it guarantees transactional integrity.
After sharding across databases, single-node transactions are no longer enough; you have to deal with distributed transactions, and distributed transaction solutions are all imperfect to some degree (I summarized distributed transactions in a previous article).
This is a very important point to consider.
The JOIN problem
Within one database we can use JOIN for multi-table queries, but across databases JOIN is no longer available.
The usual solution is to do the association in application code: query one table first, then query the other table using those results, and stitch the rows together in code to get the final result.
This is slightly more complex to implement, but acceptable.
Some fields can also be appropriately denormalized. For example, a table may store only an associated ID while the business often needs the corresponding Name or other fields; copying such fields into the current table removes the need for the association altogether.
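The application-side association can be sketched like this. It is a minimal sketch in Python; the table layout and the data access functions are hypothetical stand-ins for real queries against two separate databases:

```python
def fetch_orders(user_id):
    # Stand-in for a query against the order database:
    # SELECT order_id, user_id, amount FROM orders WHERE user_id = ?
    orders = [
        {"order_id": 1, "user_id": 42, "amount": 30},
        {"order_id": 2, "user_id": 42, "amount": 55},
    ]
    return [o for o in orders if o["user_id"] == user_id]

def fetch_users(user_ids):
    # Stand-in for a query against the user database:
    # SELECT id, name FROM user WHERE id IN (...)
    users = {42: {"id": 42, "name": "Alice"}}
    return {uid: users[uid] for uid in user_ids if uid in users}

def orders_with_user_name(user_id):
    orders = fetch_orders(user_id)
    # Batch-load the related users once, then stitch the rows
    # together in code: this is the "join" done in the application.
    users = fetch_users({o["user_id"] for o in orders})
    for o in orders:
        o["user_name"] = users[o["user_id"]]["name"]
    return orders
```

Note that the second query is batched (one IN query), not issued per row, to avoid an N+1 query pattern.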
So how do you shard a table?
There are actually two kinds of table sharding:
Vertical sharding
Horizontal sharding
Vertical sharding is very intuitive; see the figure:
Vertical sharding strips out large, rarely used columns into a separate table.
In the example above, the user name is a very common query result, and gender and age take little space, while the address and profile columns are relatively large. Since the space of a data page is limited, splitting out the rarely used large columns lets one page hold more rows.
The more useful data fits in memory, the fewer disk accesses are needed, and the better the performance.
Horizontal sharding is done because one table simply has too much data. As mentioned above, more data means a taller B+ tree and worse access performance, so the table is split horizontally.
Even setting the B+ tree aside, think of it simply: is it faster to find one row among a hundred, or among ten thousand?
Even with an index, a thicker catalog takes longer to leaf through.
What problems does table sharding bring?
Vertical sharding is largely fine (it just adds an association), but horizontal sharding is more troublesome.
Sorting, count, and paging problems
If one user's data is split across multiple tables, result paging can no longer be done directly as with a single table, and the same goes for count operations.
You can only implement it in application code, or have middleware gather the data from each table, then sort, page, and return it.
Results of operations like count can also be cached, with the cached count updated on every insert or delete.
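A minimal sketch of the merge that such middleware performs for cross-shard sorting, paging, and count (the shard contents below are made up):

```python
import heapq

def paged_query(shards, page, page_size, key):
    # Each shard returns its own rows sorted by `key`; merge the sorted
    # streams, then slice out the requested page. The total length of the
    # merged result gives the cross-shard count.
    merged = list(heapq.merge(*(sorted(s, key=key) for s in shards), key=key))
    start = (page - 1) * page_size
    return merged[start:start + page_size], len(merged)

shard_a = [{"id": 4}, {"id": 1}]
shard_b = [{"id": 3}, {"id": 2}]
rows, total = paged_query([shard_a, shard_b], page=1, page_size=3,
                          key=lambda r: r["id"])
```

Real middleware streams the merge instead of materializing every row, but the principle is the same.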
The routing problem
Routing for sharded tables generally falls into:
Hash routing
Range routing
Routing table
Hash routing: pick a column in the table, hash it, and take the hash modulo the number of sub-tables, so that the data is spread evenly across them.
It is the same principle HashMap uses to choose a bucket.
The advantage is the even distribution of data.
The disadvantage is that adding sub-tables is painful. Think of a HashMap resize: don't you have to move the data? Sharded tables are the same, and everyone knows data migration is a hassle!
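Hash routing fits in a few lines. This is a sketch; the `order_` table prefix and the table count of 4 are made-up examples:

```python
import zlib

def route_table(sharding_key, num_tables=4):
    # Hash the key, then take it modulo the table count - the same idea
    # as HashMap bucket selection. CRC32 is used because Python's built-in
    # hash() is randomized per process for strings, so it would route the
    # same key to different tables across restarts.
    h = zlib.crc32(str(sharding_key).encode("utf-8"))
    return f"order_{h % num_tables}"
```

The key point is that the hash must be stable across processes and deployments, otherwise reads would miss the shard the data was written to.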
Range routing is very simple: the range can be a time, an address, or anything that denotes an interval.
For example, the original User table can be split into User_HZ, User_BJ, and User_SH, dividing users by city.
Or a log table can be split into log_202103 and log_202104, dividing logs by year and month.
The advantage is that expansion is relatively easy: if Guangzhou users show up, add a User_GZ; when May arrives, create log_202105.
The disadvantage is that the data may be unevenly distributed, for example a disproportionate number of BJ users, or a monthly promotion producing a flood of logs.
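Range routing by month, matching the log_202103 / log_202104 naming above, is essentially a one-liner (a sketch; the `log_` prefix follows the text):

```python
from datetime import date

def log_table_for(day):
    # Range routing by year and month: log_YYYYMM. Expansion is trivial -
    # when May 2021 arrives, rows simply start landing in log_202105.
    return f"log_{day.year:04d}{day.month:02d}"
```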
A routing table is a dedicated table that records the routing information; the figure makes it clear.
From the figure we can see that the data of the user with UserID 2 lives in sub-table User_3.
The advantage is flexibility: to migrate data, just move it and update the routing table.
The disadvantage is the extra lookup: every query has to hit the routing table first, although this is usually cached.
The global primary key problem
With a single table, an auto-increment primary key was simple, but after sharding things get awkward.
Some mechanism is needed to keep the global primary key unique.
One option is still auto-increment, but with a step size. For example, with three tables, set the step to 3 and the initial IDs to 1, 2, and 3: the first table generates 1, 4, 7; the second 2, 5, 8; the third 3, 6, 9, so they never collide.
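The step-size scheme can be sanity-checked in a few lines (a sketch; in real MySQL deployments this is configured via the auto_increment_increment and auto_increment_offset system variables):

```python
def ids_for_table(offset, step, count):
    # IDs produced by the table whose auto-increment starts at `offset`
    # and advances by `step` (cf. MySQL's auto_increment_offset /
    # auto_increment_increment).
    return [offset + step * i for i in range(count)]

t1 = ids_for_table(1, 3, 3)  # first table:  1, 4, 7
t2 = ids_for_table(2, 3, 3)  # second table: 2, 5, 8
t3 = ids_for_table(3, 3, 3)  # third table:  3, 6, 9
```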
UUIDs are the simplest option, but non-sequential primary key inserts cause severe page splits and poor performance.
Distributed ID generators are another option; the best known is Twitter's open-source snowflake algorithm, which I won't expand on here (it would take another article). Alternatively, you can simply use a Redis auto-increment (INCR).
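A minimal sketch of the snowflake bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence; the epoch constant is Twitter's, everything else is a simplification with no clock-rollback handling):

```python
import time

class Snowflake:
    EPOCH = 1288834974657  # Twitter's epoch, in milliseconds

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024   # worker id must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0

    def next_id(self):
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence within one ms
            if self.seq == 0:                  # sequence exhausted this ms:
                while ms <= self.last_ms:      # spin until the next millisecond
                    ms = int(time.time() * 1000)
        else:
            self.seq = 0
        self.last_ms = ms
        # 41 bits of timestamp | 10 bits of worker | 12 bits of sequence
        return ((ms - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq
```

IDs from one worker are strictly increasing, and different workers can never collide because the worker ID is baked into the middle bits.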
How do you design the Sharding-Key mentioned in the routing problem above?
A sharded table is split according to one column, and that column is the Sharding-Key; queries must carry it.
For example, with log_202103 above, the query condition must include the date so that the correct table can be found.
Therefore the Sharding-Key has to be designed around the query conditions.
Take the often-asked example of an order table's Sharding-Key.
When you look up a single order, you find it by order number, so the order ID should be the Sharding-Key.
But think about it: when you open a takeout app to view your order history, you don't have an order ID, only your UserID, so you would have to scan every sub-table by UserID, which is very inefficient!
So you want UserID as the Sharding-Key instead!
But what about the merchants? A merchant cares about how many orders were sold today and only has a merchant ID, so querying orders would again mean scanning every sub-table, which is very inefficient!
So a single Sharding-Key cannot satisfy every query; we can only work around it.
The usual practice is to duplicate the data.
Synchronize the orders into another table sharded by merchant ID for merchant-side queries, or synchronize the data into ES. This synchronization is generally asynchronous, so it does not affect the main write path.
At this point, I believe you have a deeper understanding of these common MySQL interview questions; you might as well try the ideas out in practice.