How to solve the problem of distributed database data storage for hundreds of millions of users 07/01 Update SLTechnology News&Howtos

How to solve the problem of distributed database data storage for hundreds of millions of users

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces how to solve the problem of distributed database data storage for 100 million users. The content is very detailed. Interested friends can use it for reference. I hope it will be helpful to you.

I. MySQL replication

Master-slave replication of 1.MySQL

Master-slave replication of MySQL is to copy the data from the MySQL master database to the slave database.

The main purpose is to achieve the separation of database read and write, write operation to access the master database, read operation to access the slave database, so that the database has a stronger access load capacity to support more user access.

Its main replication principle is that when the application client sends an update command to the database, the database will synchronously record the update command into Binlog, then another thread will read the log from Binlog, and then copy it to the server by remote communication. After getting the update log from the server, add it to its own Relay log. Another SQL execution thread then reads the new log from the Relay log and executes it again in the local database.

In this way, when the client application executes a update command, the command is executed synchronously on the master database and the slave database, thus realizing the replication from the master database to the slave database, so that the slave database and the master database keep the same data.

One-master and multi-slave replication of 2.MySQL

The master-slave replication of MySQL is a kind of data synchronization mechanism, which can replicate not only the data from one master database to one slave database, but also the data from one master database to multiple slave databases, which is the so-called one-master and multi-slave replication of MySQL.

After multiple slave databases are associated with the master database, the Binlog logs on the master database are replicated to multiple slave databases synchronously. By executing the log, the data in each slave database is consistent with the data on the master database. The data update operation represents all database update operations. Except for query read operations such as SELECT, other DML write operations such as INSERT, DELETE, UPDATE, and DDL operations such as CREATE TABLE, DROPT ABLE, ALTER TABLE, etc., can also be replicated to the slave database synchronously.

3. The advantages of one master and multi-slave replication

One-master-multi-slave replication has four advantages, namely, load sharing, dedicated aircraft, easy cold backup and high availability.

a. Share the load

Distribute read-only operations across multiple slave databases, thus spreading the load across multiple servers.

b. Special plane for special use

You can use different slave servers for different types of queries.

c. It is convenient for cold standby

Even if the database is replicated with one master and multiple slaves, in some extreme cases. It may also result in the loss of data servers throughout the data center. So generally speaking, many companies will do cold backup for data, but a difficulty in cold backup is that if the database is writing, the cold backup data may be incomplete and the data file may be corrupted. Zero downtime backup can be achieved by using one master and multiple slaves. Only need to close the data replication process, the file is closed, and then copy the data file, and then reopen the data replication after the copy is completed.

d. High availability

If a server goes down, there will be no problem as long as you don't send a request to the server. When the server is restored, resend the request to the server. Therefore, in the case of one master and multiple slaves, a slave server is down and unavailable, and the impact on the whole system is very small.

Master replication of 4.MySQL

However, one master and multiple slaves can only realize these advantages on the slave server. When the master database is down and unavailable, the data still cannot be written, because the data cannot be written to the slave server, and the slave server is read-only.

To solve the availability problem of the primary server, we can use MySQL's primary primary replication scheme. The so-called master master replication scheme means that both servers act as primary servers, and writes received on either server will be replicated to the other server.

As the master copy schematic above, when the client program updates the data of master server A, master server A will write the update operation to the Binlog log. Then Binlog synchronizes the data log to the primary server B, writes it to the Relay log of the primary server, then holds the Relay log, obtains the update log in Relay log, and performs a SQL operation to write to the local database of database server B. The updates on server B are also replicated to the Relay log of server A through Binlog, and then the data is updated to server A through Relay log.

In this way, any server An or B receives the write operation of the data will be synchronously updated to another server, realizing the database master master replication. Master master replication can improve the write availability of the system and achieve high availability of write operations.

Master master failure recovery of 5.MySQL

How to deal with the failure of the database server when using the MySQL server to implement the primary master replication?

Normally, the user writes to primary server A, and then the data is copied from A to primary server B. When master server A fails, the write operation is sent to master server B, and the data is copied from server B to server A.

The maintenance process of master failure is as follows:

At the beginning, all the master servers can be used normally. When the master server A fails, the application detects the failure of the master server A, and it may take a few seconds or minutes to detect the failure. Then the application needs to fail over and send the write operation to the backup master server B. Send the read operation to the slave server corresponding to the B server.

When the failure ends after a period of time, the A server needs to rebuild the data lost during the failure, that is, to synchronize the data from the server from the B server. The system will not return to normal until the synchronization is complete. At this time, server B is the main access server for the user, and server An is used as the backup server.

Considerations for 5.MySQL replication

Note the following when using MySQL for primary master replication:

a. Do not write data to both databases at the same time, as this situation can lead to data conflicts.

b. Replication only increases the read concurrent processing ability of the data, but does not increase the write concurrent ability and the system storage capacity.

c. Updating the structure of the data table can cause significant synchronization delays.

Operations that need to update the table structure, do not write to the Binlog, and close the Binlog that updates the table structure. If you want to update the table structure, the operation and maintenance engineer DBA should update the table structure manually on all master and slave databases.

2. data slicing

Data replication can only improve the ability of data reading and concurrent operation, but can not improve the ability of concurrent data writing and the whole storage capacity of data, that is, it can not increase the total number of database storage records. If the write operation of our database also has a large number of concurrent requests to meet, or if our data table is so large that a single server or even a table cannot be stored. The solution is data slicing.

1. Introduction to data slicing

a. Main goal: divide a data table into smaller slices, store different slices on different servers, and use multiple servers to store a data table in a sliced way. avoid the storage and access pressure caused by one server record storage processing the whole data table.

b. Main features: database servers are independent of each other, do not share any information, even if there are some server failures, it does not affect the availability of the whole system. The second feature is to locate the shard through the shard key, that is to say, which server a shard is stored on and which server to look up, which is calculated by the slicing key routing partition algorithm. In the SQL statement, as long as the sharding key is included, you can access a specific server without having to connect to all the servers and communicate with other servers.

. The main principle: the data is sliced in some way, usually using the slicing key routing algorithm just mentioned. Through the slicing key, it is calculated according to a certain routing algorithm, so that each server only stores part of the data.

two。 Hard coding to realize data slicing

As shown in the figure example, data slicing is achieved by hard-coding the application. Suppose our database splits the data table according to the user ID. The logic of slicing is that the data with odd user ID is stored in server 2, and the data with even user ID is stored in server 1. Then, when the application is coding, it can hash directly through the user ID, usually the remainder. If the remainder is odd, connect to server 2, and if the remainder is even, connect to server 1, so that a user table is fragmented on two servers.

The main disadvantage of this hard coding is that the database sharding logic is realized by the application itself, and the application needs to couple the database sharding logic, which is not conducive to the maintenance and expansion of the application. A simple solution is to store the mapping relationship on the outside.

3. External storage of mapping table

When the application connects to the database for SQL operation, it looks up the external data storage to query which server it should connect to, and then connects to the corresponding server to perform the corresponding operation according to the number of the returned server. In this example, the user ID=33 check is 2, and the user ID=94 check is also 2. They connect to the corresponding server according to the number of the user server found, and write the data to the corresponding server shard.

4. Challenge and solution of data slicing

Database fragmentation faces the challenge as shown in the figure:

Now there are some special distributed database middleware to solve these problems, the more famous one is Mycat. Mycat is a special distributed database middleware, the application connects to Mycat like the database, and the operation of data slicing is completely left to Mycat to complete.

In the following example, there are three sharding database servers, database servers dn1, dn2, and dn3, whose sharding rules are based on the prov field. So, when we execute a query operation "select * from orders where prov='wuhan'", Mycat will route the SQL operation to the server node dn1 according to the sharding rule. After dn1 performs a data query operation to return the result, Mycat returns it to the application. By using distributed database middleware such as Mycat, applications can use sharded databases transparently and imperceptibly. At the same time, Mycat also supports federated join queries and database transactions of fragmented databases to some extent.

5. Expansion and expansion of sharded database

At first, the amount of data is not too much, two database servers are enough. But as the data grows, it may be necessary to add a third, a fourth, a fifth or more servers. In the process of adding servers, the sharding rules need to be changed. After the sharding rules are changed, the data previously written to the original database may have to access the new server according to the new sharding rules, so data migration is also needed.

Whether it is to change the rules of the routing algorithm or to migrate data, it is a troublesome and complicated thing. Therefore, in practice, the common practice is to use logical database for data slicing, that is to say, although only two servers are needed to complete data slicing storage at the beginning, it is still logically divided into multiple logical databases. The specific method of operation, this article does not need to elaborate at great length.

III. Database deployment plan

1. Single service and single database

This is the simplest deployment scenario. There may be multiple application servers, but they perform a single function. Multiple servers that complete a single function provide services through load balancing. They are only connected to a single database server, which is an architectural method when the number of early users of the application system is relatively low.

two。 Master-slave replication to achieve scalability

If you put forward higher requirements for the availability of the system and the access performance of the database, you can initially scale through the master-slave replication of the database. Through master-slave replication, one master and multiple slaves are realized. The write operation of the application server connects to the master database, and the read operation is read from the slave server.

3. Two Web services and two databases

As the business becomes more complex, business sub-libraries of data can be carried out in order to provide higher database processing capacity. The business sub-database of data is a kind of logic and a division based on function, which stores data tables for different purposes in different physical databases.

In this example, there are product category service and user service, two application server clusters, and the corresponding database is also split into two, one is called category database, the other is called user database. Each database still uses master-slave replication. Through the way of business sub-database, it not only provides more database storage in the same system, but also provides more powerful data access ability. at the same time, the system becomes simpler and the coupling of the system becomes lower.

4. Integrated deployment plan

According to the access characteristics of different data, different solutions are used to deal with it. For example, the category database may be able to meet all the access requirements through master-slave replication. However, if the number of users is particularly large, master-slave replication or master-master replication still can not meet the access pressure of data storage and write operations, then the user database can be stored in pieces. At the same time, each shard database is also deployed by master-slave replication.

This is the end of how to solve the problem of distributed database data storage for hundreds of millions of users. I hope the above content can be helpful to you and learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.