2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
As a project grows, its database holds more and more data, and individual tables become too large. Queries slow down, and because of the table locking mechanism, application operations are seriously affected, so the database becomes a performance bottleneck.
When this happens, we can consider splitting databases and tables (sharding): a single database or table is split into several, and when users access the data, an algorithm routes each request to a particular database and table. The data is spread across multiple tables, which reduces the access pressure on any single table and improves database access performance.
Here is a summary of how databases and tables were split in our projects:
Single database, single table
A single database with a single table is the most common design. For example, the database db has one user table, and every user can be found in that table.
Single database, multiple tables
As the number of users grows, the user table holds more and more data. Once the data reaches a certain size, queries on the user table gradually slow down, which affects the performance of the whole DB. With MySQL there is a more serious problem: adding a column locks the table, and all reads and writes have to wait until it finishes.
The user table can be split horizontally by some rule into tables such as user_0000 and user_0001 that have exactly the same structure. The data in user_0000 + user_0001 + ... together makes up the complete data set.
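As a rough illustration of such a horizontal split, the sketch below distributes rows across two tables by user id. The sample rows, the `split_rows` helper, and the modulo rule are assumptions made for this example; the article does not prescribe a particular rule at this point.

```python
def split_rows(rows, table_count=2):
    """Distribute rows across tables named user_0000, user_0001, ... by id."""
    tables = {f"user_{i:04d}": [] for i in range(table_count)}
    for row in rows:
        # Hypothetical rule: the remainder of the id picks the table.
        tables[f"user_{row['id'] % table_count:04d}"].append(row)
    return tables


# Made-up sample data standing in for the original user table.
rows = [{"id": i, "name": f"user{i}"} for i in range(1, 7)]
tables = split_rows(rows)

# Every row lands in exactly one table, so merging the tables
# gives back the complete data set.
merged = sorted(
    (row for table in tables.values() for row in table),
    key=lambda row: row["id"],
)
```

Because each row goes to exactly one table, the tables do not overlap and nothing is lost.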
Multiple databases, multiple tables
As the data keeps growing, a single DB may run out of storage space, and as queries increase, a single database server can no longer keep up. At that point, the databases themselves can be split horizontally.
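One possible way to spread tables over several databases is sketched below. The database names (`db0`, `db1`), the counts, and the `locate` helper are hypothetical, and the modulo rule is only one of many possible choices.

```python
DB_COUNT = 2        # hypothetical number of database servers
TABLES_PER_DB = 2   # hypothetical number of user tables in each database


def locate(user_id):
    """Return the (database, table) pair that holds the given user."""
    shard = user_id % (DB_COUNT * TABLES_PER_DB)
    # Consecutive shards are grouped onto the same database:
    # shards 0-1 live in db0, shards 2-3 live in db1.
    db_index = shard // TABLES_PER_DB
    return f"db{db_index}", f"user_{shard:04d}"
```

With these settings, `locate(5)` resolves to `db0` and `user_0001`, while `locate(6)` resolves to `db1` and `user_0002`.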
Rules for splitting databases and tables
When designing a table, you need to decide which rule will be used to split the databases and tables. For example, when a new user signs up, the program has to decide which table the user's information goes into. Likewise, at login we have to find the matching record through the user's account. Both operations must follow the same rule.
Routing
Routing is the process of finding the right database and table by applying the splitting rule. For example, suppose the rule is user_id mod 4. When a user signs up and receives account id 123, 123 mod 4 = 3 tells us the account should be saved in the user_0003 table. When user 123 logs in, the same calculation, 123 mod 4, tells us the record is in user_0003.
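The routing rule from the example can be sketched as follows. The function names and the query text are illustrative assumptions, and the `%s` placeholder style depends on the database driver in use.

```python
TABLE_COUNT = 4  # the article's example rule: user_id mod 4


def route_table(user_id, table_count=TABLE_COUNT):
    """Map a user id to its table name, e.g. 123 -> user_0003."""
    return f"user_{user_id % table_count:04d}"


def build_lookup(user_id):
    """Build a parameterized lookup against the routed table.

    The table name comes from route_table, never from user input;
    the id itself is passed as a bound parameter.
    """
    table = route_table(user_id)
    return f"SELECT * FROM {table} WHERE id = %s", (user_id,)
```

Sign-up and login both call the same `route_table` function, which is what guarantees that a record is looked up in the table it was written to.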
Problems caused by splitting databases and tables, and points to note
1. Choosing the dimension to split on
Suppose a user buys a commodity and the transaction record has to be saved. If the tables are split by the user dimension, each user's transaction records are stored in the same table, so it is easy to look up what a given user bought. However, the purchases of a given commodity are likely to be spread over multiple tables, which makes them harder to look up. Conversely, if the tables are split by the commodity dimension, it is easy to find the purchases of a commodity but harder to find a buyer's transaction records.
The common solutions are:
a. Scan all the tables. This is basically unworkable because it is far too inefficient.
b. Store two copies of the data, one split by the user dimension and one split by the commodity dimension.
c. Use a search engine. If the real-time requirement is very high, though, this in turn requires real-time search.
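Option b can be sketched with in-memory stand-ins for the two sets of tables. The table names, counts, and helper functions below are hypothetical; a real system would write to two sharded tables and would have to handle the case where only one of the two writes succeeds.

```python
from collections import defaultdict

USER_TABLES = 4   # hypothetical number of tables split by user
ITEM_TABLES = 4   # hypothetical number of tables split by commodity

by_user = defaultdict(list)   # stand-in for the trade_by_user_000N tables
by_item = defaultdict(list)   # stand-in for the trade_by_item_000N tables


def record_trade(user_id, item_id, amount):
    """Store the same trade twice, once per dimension."""
    trade = {"user_id": user_id, "item_id": item_id, "amount": amount}
    by_user[f"trade_by_user_{user_id % USER_TABLES:04d}"].append(trade)
    by_item[f"trade_by_item_{item_id % ITEM_TABLES:04d}"].append(trade)


def trades_of_user(user_id):
    """Read one table only: the one this user's trades are routed to."""
    table = by_user[f"trade_by_user_{user_id % USER_TABLES:04d}"]
    return [t for t in table if t["user_id"] == user_id]


def trades_of_item(item_id):
    """Read one table only: the one this commodity's trades are routed to."""
    table = by_item[f"trade_by_item_{item_id % ITEM_TABLES:04d}"]
    return [t for t in table if t["item_id"] == item_id]


record_trade(user_id=123, item_id=10, amount=5.0)
record_trade(user_id=7, item_id=10, amount=2.0)
record_trade(user_id=123, item_id=11, amount=1.0)
```

Each lookup touches a single table, at the cost of storing every trade twice.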
2. The problem of join queries
Join queries are almost impossible, because the tables being joined may not be in the same database.
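One workaround that is often used, though the article does not describe it, is to query each database separately and combine the results in application code. The sketch below uses in-memory stand-ins; the data and the `orders_with_user_names` helper are made up for illustration.

```python
# Stand-ins for two tables that live in different databases.
users_in_db0 = {
    1: {"id": 1, "name": "alice"},
    2: {"id": 2, "name": "bob"},
}
orders_in_db1 = [
    {"order_id": 100, "user_id": 1},
    {"order_id": 101, "user_id": 2},
    {"order_id": 102, "user_id": 1},
]


def orders_with_user_names(orders, users):
    """Join orders to users in application code instead of in SQL."""
    # Step 1: collect the keys needed from the other database.
    user_ids = {order["user_id"] for order in orders}
    # Step 2: fetch only those users (a dict lookup stands in for a query).
    found = {uid: users[uid] for uid in user_ids if uid in users}
    # Step 3: merge the two result sets, dropping orders with no known user.
    return [
        {**order, "user_name": found[order["user_id"]]["name"]}
        for order in orders
        if order["user_id"] in found
    ]


joined = orders_with_user_names(orders_in_db1, users_in_db0)
```

This costs two round trips instead of one and moves the join work into the application.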
3. Avoid cross-database transactions
Avoid modifying tables in db0 and tables in db1 within the same transaction: the operation is more complex, and efficiency also suffers.
4. Try to put the same group of data on the same DB server
For example, put the merchandise and transaction information of seller a in db0. When db1 goes down, everything related to seller a still works normally. In other words, avoid having data in one database depend on data in another database.
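A minimal sketch of this idea is to route every table that belongs to a seller by the same key, the seller id. The database count, the table names `goods` and `trades`, and both helpers are assumptions for illustration.

```python
DB_COUNT = 2  # hypothetical number of databases


def db_for_seller(seller_id):
    """All data belonging to one seller is routed by the seller id alone."""
    return f"db{seller_id % DB_COUNT}"


def locate_seller_data(seller_id):
    """Goods and trades of one seller always resolve to the same database."""
    db = db_for_seller(seller_id)
    return {"goods": f"{db}.goods", "trades": f"{db}.trades"}


def seller_available(seller_id, down_dbs):
    """A seller is unaffected by an outage unless its own database is down."""
    return db_for_seller(seller_id) not in down_dbs
```

With seller 4 routed to `db0`, an outage of `db1` leaves `seller_available(4, {"db1"})` true.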
One master, multiple slaves
In practice, reads usually far outnumber writes. MySQL supports read/write separation: all writes must go to the Master, while reads can be served by either the Master or a Slave. A Slave has exactly the same structure as the Master. One Master can have multiple Slaves, and a Slave can even have its own Slaves. This effectively raises the QPS of the DB cluster.
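At the application level, read/write separation might look roughly like the sketch below. The `ReadWriteRouter` class and the server names are hypothetical, and classifying a statement by its first keyword is a deliberate simplification (for instance, `SELECT ... FOR UPDATE` should really go to the Master).

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._readers = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Pick a server for the statement based on its first keyword."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._readers)  # round-robin over the slaves
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
target = router.route("INSERT INTO user_0003 (id) VALUES (123)")  # goes to the master
```

Reads that must see a write made a moment ago may still need to go to the Master, because a Slave can lag behind it.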
All writes are applied on the Master first and then replicated to the Slaves, so there is some delay before a Slave catches up with the Master. The delay gets worse when the system is very busy, and adding more Slaves aggravates it too.
In addition, the Master is the bottleneck of the cluster. Too many writes seriously affect the stability of the Master, and if the Master goes down, the whole cluster stops working properly.
So:
1. When read pressure is high, consider adding Slave machines to spread the load. Once the number of Slaves reaches a certain point, however, you have to consider splitting the database.
2. When write pressure is high, the database must be split.
Why do databases and tables need to be split when using MySQL?
It is fair to say that anyone who uses MySQL with a large amount of data will soon run into the question of splitting databases and tables.
That raises a question: why split at all? Can't MySQL handle a big table?
In fact, it can. In projects I have worked on, a single table had a physical file size of more than 80 GB and more than 500 million records, and it was a very core table: the friend relationship table.
But this is not the best approach. File systems such as ext3 have many problems handling very large files; that layer can be replaced with the XFS file system. However, one problem cannot be solved when a single MySQL table is too large: operations that adjust the table structure become basically impossible. That is why large projects end up having to split databases and tables.
From the point of view of InnoDB itself, there are only two locks on the B-tree of the data file: the leaf node lock and the child node lock. As you can imagine, while a page split happens or a new leaf is added, data cannot be written to the table. So splitting databases and tables is the better choice.
So how many databases and tables should you split into?
Testing shows that read and write performance is good when a single table holds fewer than 10 million records. Leaving some buffer, keep a table that has only numeric columns below 8 million records, and a table that has character columns below 5 million.
If you plan for 100 databases with 100 tables each, for example for the user business:
5 million * 100 * 100 = 50,000,000,000 = 50 billion records.
With a figure like this in mind, it is relatively easy to plan according to the business.
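The capacity arithmetic can be checked with a few lines of Python. The `tables_needed` helper is an added illustration for planning in the other direction, from an expected row count to a table count.

```python
ROWS_PER_TABLE = 5_000_000   # per-table limit for tables with character columns
TABLES_PER_DB = 100
DB_COUNT = 100

total_rows = ROWS_PER_TABLE * TABLES_PER_DB * DB_COUNT
print(f"{total_rows:,}")  # 50,000,000,000, i.e. 50 billion


def tables_needed(expected_rows, rows_per_table=ROWS_PER_TABLE):
    """Smallest number of tables that keeps each one at or under the limit."""
    return -(-expected_rows // rows_per_table)  # ceiling division
```

For example, 10 million expected rows fit in 2 tables at 5 million rows each, while one more row pushes the plan to 3 tables.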
A practical problem
Suppose a website has 10 million users but only 1% of them are active. How can you optimize the database to speed up access for active users?
A:
You can use MySQL partitioning to place active users in one partition and inactive users in another. The active partition holds relatively little data, so access for active users is faster.
You can also split horizontally, putting active users in one table and inactive users in another, which likewise speeds up access for active users.
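A lookup against such a split might work as sketched below, with dictionaries standing in for the two tables. The table names `user_active` and `user_inactive` and the `find_user` helper are assumptions; how a user is classified as active is left open, as in the article.

```python
def find_user(user_id, active_table, inactive_table):
    """Check the small active table first, then fall back to the large one.

    Returns a (row, table_name) pair; both are None when the user is unknown.
    """
    user = active_table.get(user_id)
    if user is not None:
        return user, "user_active"
    user = inactive_table.get(user_id)
    if user is not None:
        return user, "user_inactive"
    return None, None


# Made-up sample data: one active user and one inactive user.
active = {1: {"id": 1, "name": "alice"}}
inactive = {2: {"id": 2, "name": "bob"}}
```

Most requests come from active users, so most lookups finish after the first, small table.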
Summary
That is all for this article. I hope it is a useful reference for your study or work. Thank you for your support.