How to Implement Distributed Storage of Massive Data in MySQL

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to achieve distributed storage of massive data in MySQL. It covers the concept of a distributed database, data sharding, read-write separation, and MySQL Cluster.

1. The concept and advantages of distributed applications

A distributed database uses a high-speed network to connect physically dispersed data storage units into a logically unified database. The basic idea is to spread the data of an originally centralized database across multiple storage nodes connected over the network, in order to obtain larger storage capacity and support higher concurrent access. In recent years, as data volumes have grown, distributed database technology has developed rapidly, and traditional relational databases have begun to move from a centralized model to distributed storage, and from centralized computing to distributed computing.

The main purposes of a distributed database system are disaster recovery and remote data backup. In addition, following the nearest-access principle, users can access the closest database node, which achieves load balancing across sites. Meanwhile, data is synchronized between the databases so that the distributed copies stay consistent. This synchronization also serves as the data backup: because the data is stored at different sites, a single point of failure does not interrupt the service, and access traffic only needs to be switched to a remote mirror.

The advantages of distributed database applications are as follows:

(1) It suits distributed data management and can effectively improve system performance.

(2) The system is economical and flexible.

(3) The system is highly reliable and available.

2. The main technologies of MySQL distributed applications

(1) MySQL data sharding

Data sharding means dispersing the data stored in one database across multiple databases (hosts) according to specific conditions, so that the load on any single machine is reduced. Sharding can also improve the overall availability of the system, because when a single host crashes only part of the data becomes unavailable, not all of it.

Depending on the type of splitting rule, there are two sharding modes. The first splits data across different databases (hosts) by table (or schema); this is called vertical segmentation. The second splits the rows of a single table across multiple databases (hosts) according to some condition based on the logical relationships in the data; this is called horizontal segmentation. Vertical segmentation has simple rules and is easy to implement, and it is especially suitable for systems with low coupling, little interaction between modules, and clear business logic. In such a system it is easy to move the tables used by different business modules into different databases. Because the split follows table boundaries, the impact on the application is small and the rules remain simple and clear. Horizontal segmentation is somewhat more complex: because rows of the same table are placed in different databases, the splitting rules are more complicated for the application, and later data maintenance is also harder.
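The two modes can be sketched as routing functions. This is a minimal illustration only: the table names, database names, host names, and the modulo rule are all hypothetical, chosen to show the difference between routing by table (vertical) and routing by row (horizontal).

```python
# Hypothetical names throughout: the table-to-database map and the shard
# hosts are illustrative and not taken from any real deployment.

# Vertical segmentation: each table (or schema) lives in exactly one database.
VERTICAL_MAP = {
    "users": "db_user",
    "orders": "db_order",
    "products": "db_catalog",
}

# Horizontal segmentation: rows of one table are spread over several hosts.
ORDER_SHARDS = ["order_host_0", "order_host_1", "order_host_2", "order_host_3"]


def route_vertical(table: str) -> str:
    """Return the database that owns the whole table."""
    return VERTICAL_MAP[table]


def route_horizontal(user_id: int) -> str:
    """Return the host holding this user's rows, using a simple modulo rule."""
    return ORDER_SHARDS[user_id % len(ORDER_SHARDS)]


if __name__ == "__main__":
    print(route_vertical("orders"))   # db_order
    print(route_horizontal(10))       # order_host_2, because 10 % 4 == 2
```

The vertical rule needs only the table name, which is why it has little impact on the application. The horizontal rule needs a value from the row itself, so every query against the table has to supply or derive that key.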

(2) Why split the data?

1) Mature, stable databases such as Oracle can store and query massive amounts of data, but not everyone can afford the price.

2) Under high load, the Master-Slave model hits a bottleneck. Existing setups rely on the replication mechanism to raise read and write throughput under high load, but this mechanism has two bottlenecks. First, its effectiveness depends on the proportion of read operations, and the Master often becomes the bottleneck: writes must be executed through a sequential queue, the Master cannot withstand overload, and the replication delay on the Slaves can become very large. Second, it consumes CPU capacity, because every write performed on the Master must be executed again on each Slave. Sharding, by contrast, distributes computation, storage, and I/O across multiple machines in parallel, which makes full use of their combined processing power, avoids a single point of failure, improves system availability, and provides good fault isolation.

3) Clusters built from free MySQL and inexpensive servers, or even PCs, can achieve the effect of a minicomputer plus a large commercial database while greatly reducing capital investment and operating costs.
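The replication bottleneck in point 2) can be illustrated with a rough capacity model. This is a back-of-the-envelope sketch under simplifying assumptions that are not from the article: every node can perform the same number of operations per second, a read and a write cost the same, writes are spread evenly over shards, and the master also serves reads. The numbers are made up.

```python
def replication_read_capacity(node_capacity: float, writes: float, slaves: int) -> float:
    """Read capacity left on a master plus `slaves` replicas.

    Every node must apply the full write stream, so each node has only
    (node_capacity - writes) operations per second left for reads.
    """
    per_node = max(node_capacity - writes, 0.0)
    return per_node * (slaves + 1)


def sharding_read_capacity(node_capacity: float, writes: float, shards: int) -> float:
    """Read capacity left on `shards` independent shards.

    Each shard applies only its own share of the writes.
    """
    per_node = max(node_capacity - writes / shards, 0.0)
    return per_node * shards


if __name__ == "__main__":
    # Four machines, 1000 ops/s each, 800 writes/s in total (hypothetical).
    print(replication_read_capacity(1000, 800, slaves=3))  # 800
    print(sharding_read_capacity(1000, 800, shards=4))     # 3200.0
```

With a write-heavy load, adding replicas helps little because each one repeats all the writes, whereas each shard repeats only a quarter of them in this example.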

Table partitioning is supported in MySQL 5.1 and later. Once data has been split vertically or horizontally and stored on different database hosts, the application system faces the problem of how to integrate these data sources. There are two approaches:

1) Each application module configures and manages the one or more data sources it needs, accesses each database directly, and performs the data integration inside the module.

2) All data sources are managed uniformly through an intermediate proxy layer, so the back-end database cluster is transparent to front-end applications.

Although the second approach may cost more in the short term, it greatly helps the scalability of the whole system. It can be implemented in the following ways:

1) Using MySQL Proxy for data segmentation and integration.

MySQL Proxy maintains a connection pool between clients and the MySQL servers. All client requests are sent to MySQL Proxy, which analyzes each one to determine whether it is a read or a write and then forwards it to the appropriate MySQL server. With a multi-node slave cluster, it can also provide load balancing.

2) Using Amoeba for data segmentation and integration.

Amoeba is an open-source, Java-based framework: a proxy program focused on integrating the data sources of a distributed database. It already provides query routing, query filtering, read-write separation, load balancing, and an HA mechanism. Amoeba mainly solves the following problems:

① Integrating complex data sources after data segmentation

② Providing data segmentation rules and reducing their impact on the database

③ Reducing the number of connections between the database and clients

④ Routing reads and writes separately

3) Using HiveDB for data segmentation and integration.
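Whichever proxy is used, the "integration" part means sending a query to several shards and combining the partial results. The following is a minimal sketch of that idea, not the mechanism of MySQL Proxy, Amoeba, or HiveDB. The shard names, the row shape, and the in-memory result sets are hypothetical stand-ins for real MySQL servers.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (order_id, customer): a hypothetical row shape

# In-memory stand-ins for per-shard result sets. A real proxy layer would send
# the same query to each MySQL server and receive rows sorted by order_id.
SHARDS: Dict[str, List[Row]] = {
    "shard_0": [(4, "dave"), (8, "heidi")],
    "shard_1": [(1, "alice"), (5, "erin")],
    "shard_2": [(2, "bob"), (6, "frank")],
}


def query_shard(name: str) -> List[Row]:
    """Pretend to run a query ordered by order_id on one shard."""
    return SHARDS[name]


def query_all(limit: int) -> List[Row]:
    """Send the query to every shard, then merge the sorted partial results."""
    partials = [query_shard(name) for name in SHARDS]
    merged = heapq.merge(*partials)  # k-way merge of already-sorted lists
    return list(itertools.islice(merged, limit))


if __name__ == "__main__":
    print(query_all(4))
    # [(1, 'alice'), (2, 'bob'), (4, 'dave'), (5, 'erin')]
```

The application sees one ordered result set and does not need to know how many shards exist, which is what makes the back-end cluster transparent to it.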

3. Read-write separation of MySQL

Read-write separation uses database replication to distribute reads and writes across different nodes, improving availability and scalability. The master database handles write operations and the slave databases handle read operations; in many systems reads are the majority. When the master is written to, the data must be synchronized to the slaves to keep the databases consistent. MySQL has its own synchronization technology: it replicates data through the binary log. The slave side usually consists of multiple databases that receive the master's changes, which spreads the read load. Read operations should be assigned to servers according to each server's load rather than simply at random. MySQL Proxy can be used to implement read-write separation.
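Assigning reads by server load rather than at random can be sketched as a "least busy replica" rule. This is an illustrative sketch only: the `Replica` fields and host names are hypothetical, and a real system would obtain the load figures from monitoring rather than from hard-coded values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    host: str              # hypothetical host name
    active_queries: int    # current load, for example running statements
    healthy: bool = True   # result of a health check


def pick_replica(replicas: List[Replica]) -> Replica:
    """Choose the healthy slave with the fewest active queries."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available")
    return min(candidates, key=lambda r: r.active_queries)


if __name__ == "__main__":
    pool = [
        Replica("slave1", active_queries=12),
        Replica("slave2", active_queries=3),
        Replica("slave3", active_queries=1, healthy=False),
    ]
    print(pick_replica(pool).host)  # slave2
```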

At present, common approaches to MySQL read-write separation fall into two types.

① Implementation inside the application code.

The code routes statements according to their type, such as select or insert. This approach is also the most widely used in production environments.

② Implementation based on an intermediate proxy layer.

The proxy sits between the client and the server; it receives each client request, decides where it belongs, and forwards it to a back-end database. The structure diagram of eBay's read-write separation illustrates this approach: data is replicated to other data nodes in near real time through SharePlex, and specific modules then check the database status and perform load balancing and read-write separation, which greatly improves the availability of the system.
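Both approaches rest on the same decision: classify each statement as a read or a write and pick a server accordingly. The following is a minimal sketch of that decision under stated assumptions. The keyword list, class name, and server names are hypothetical, the classifier looks only at the first keyword, and round-robin is used for brevity instead of load-aware selection.

```python
import itertools
from typing import List

# Statements starting with these keywords are treated as reads (an assumption
# for this sketch; a real router needs a proper SQL parser).
READ_PREFIXES = ("select", "show", "describe", "explain")


def is_read(sql: str) -> bool:
    """Classify a statement by its first keyword (deliberately simplistic)."""
    stripped = sql.lstrip().lower()
    # SELECT ... FOR UPDATE takes locks, so send it to the master.
    if stripped.startswith("select") and "for update" in stripped:
        return False
    return stripped.startswith(READ_PREFIXES)


class ReadWriteRouter:
    """Send writes to the master and rotate reads over the slaves."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        if is_read(sql) and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["slave1", "slave2"])
    print(router.route("SELECT * FROM orders"))          # slave1
    print(router.route("INSERT INTO orders VALUES (1)"))  # master
```

A sketch like this ignores transactions and replication delay: a read issued right after a write may need to go to the master to see the new data.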

4. MySQL Cluster

MySQL Cluster provides redundancy for MySQL data in a distributed system and improves safety, so that the failure of a single MySQL server does not have a major negative effect on the system and its stability is preserved.

MySQL Cluster uses a shared-nothing architecture and is implemented mainly by the NDB storage engine. NDB is an in-memory storage engine, which requires all data to be loaded into memory. Data is automatically distributed among the storage nodes in the cluster, and each storage node holds only one fragment of the complete data set. Users can also configure the same data to be stored on multiple storage nodes, so that a single point of failure does not cause data loss.

MySQL Cluster requires a group of computers, each of which may play a different role. Its nodes fall into three types: management nodes (which manage the other nodes), data nodes (which store the cluster's data; there can be several), and SQL nodes (MySQL servers that hold the table definitions; there can be several). A computer in the cluster can host one node or a combination of two or three node types. The three node types are a logical division only, so they do not necessarily correspond one-to-one with physical computers. Nodes can also be placed in different geographical locations, so MySQL Cluster is another way to build a distributed database.

MySQL Cluster achieves good database load balancing and reduces the pressure on the central data node when processing big data. When the central database node fails, the cluster switches to a backup node according to its failover strategy, which effectively masks the failure, so the failure of a single node does not affect the service provided by the database as a whole. With the cluster architecture, data is continuously synchronized and kept redundant between the master and slave databases, and the database is multi-point and distributed, which backs up the data well and avoids data loss.
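The combination of fragments and redundant copies can be sketched as follows. This is a simplified illustration and not NDB's actual distribution algorithm: the node names, the grouping into two groups of two, and the CRC32 hash are assumptions chosen to keep the example short and deterministic.

```python
import zlib
from typing import List, Set

# Hypothetical layout: four data nodes arranged as two groups, where both
# nodes of a group store a copy of the same fragment.
NODE_GROUPS: List[List[str]] = [["ndb1", "ndb2"], ["ndb3", "ndb4"]]


def node_group_for(primary_key: str) -> int:
    """Map a primary key to a group index with a stable hash."""
    return zlib.crc32(primary_key.encode("utf-8")) % len(NODE_GROUPS)


def nodes_for(primary_key: str) -> List[str]:
    """Return all nodes that hold a copy of the row."""
    return NODE_GROUPS[node_group_for(primary_key)]


def readable_node(primary_key: str, failed: Set[str]) -> str:
    """Return a surviving node that holds the row, or raise if none is left."""
    for node in nodes_for(primary_key):
        if node not in failed:
            return node
    raise RuntimeError("all copies of this fragment are unavailable")


if __name__ == "__main__":
    key = "user:42"
    copies = nodes_for(key)
    print(copies)
    # The row stays readable when one of its two nodes fails.
    print(readable_node(key, failed={copies[0]}))
```

Each node stores only the fragment of its own group, and a row is lost only if every node in that group fails at the same time.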

That concludes this overview of how to achieve distributed storage of massive data in MySQL.
