How to achieve data segmentation in MySQL

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

How do you achieve data segmentation in MySQL? This article analyzes the problem and its solution in detail, in the hope of helping readers who face it find a simpler, workable approach.

What is MySQL data segmentation

The word "shard" means "fragment" in English. As a database term, it seems to have first appeared in massively multiplayer online role-playing games. Here we will call "sharding" "fragmentation". Sharding is not a new technology but a relatively simple software concept. Table partitioning only became available in MySQL 5.1, so before that many potential MySQL users had concerns about its scalability, and whether a database supported partitioning became a key indicator (though not the only one) of whether it could scale.

Database scalability is a perennial topic, and MySQL advocates are often asked what to do when application data outgrows a single database and has to be partitioned. The answer is sharding. Sharding is not a feature attached to any particular database product; it is an abstraction built on top of specific technical details. It is a horizontal scaling (scale-out) solution whose main purpose is to break through the I/O limits of a single-node database server and solve the database scalability problem.

Data is distributed horizontally across different databases or tables according to a set of segmentation rules, and a query finds the specific database or table it needs through the corresponding DB routing or table routing rules. The term "sharding" here usually refers to horizontal segmentation, which is the focus of this article. What do segmentation and routing look like in practice? Readers will inevitably ask at this point, so let's take a simple example: the posts in a blog application. Suppose the article table has the following fields: article_id (int), title (varchar(128)), content (varchar(1024)), user_id (int).

Faced with such a table, how do we split it? How do we distribute its data across tables in different databases? Analyzing the blog application, it is not hard to see that it has two kinds of users: visitors and blog owners. A visitor browsing a blog is always browsing within a specific user's blog, and an owner managing a blog is also operating within a specific user's blog (their own space). That "specific user" is represented in the database by the user_id field. This user_id is exactly the basis and rule we need for splitting the data across databases. We can put all article rows with a user_id from 1 to 10,000 into the article table in DB1, all article rows with a user_id from 10,001 to 20,000 into the article table in DB2, and so on, all the way up to DBn.
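The range rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `shard_index` and `shard_range` are made up for this example, and it assumes every database covers an equally sized block of 10,000 user IDs.

```python
USERS_PER_SHARD = 10_000  # assumed block size, following the DB1 example


def shard_index(user_id: int, users_per_shard: int = USERS_PER_SHARD) -> int:
    """Return n such that rows for this user_id belong in DBn (1-based)."""
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    return (user_id - 1) // users_per_shard + 1


def shard_range(n: int, users_per_shard: int = USERS_PER_SHARD) -> tuple[int, int]:
    """Return the inclusive (lowest, highest) user_id stored in DBn."""
    if n < 1:
        raise ValueError("shard numbers start at 1")
    return ((n - 1) * users_per_shard + 1, n * users_per_shard)


if __name__ == "__main__":
    for uid in (1, 10_000, 10_001, 20_000, 20_001):
        print(f"user_id {uid} -> DB{shard_index(uid)}")
```

A fixed block size keeps the rule trivial to compute, but it also means the newest block receives all new users; whether that matters depends on the application.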

In this way, the article data is naturally divided among the databases, and the goal of data segmentation is achieved. The next problem is how to find the right database for a given request. The answer is simple and obvious: since we split the data on the user_id field, database routing naturally relies on user_id as well. Consider the blog application again. Whether I am visiting someone else's blog or managing my own, I always need to know whose blog it is, which means I know that blog's user_id. With this user_id and the splitting rule, we can work backwards to the specific database. For example, if user_id is 234, the rule above locates DB1; if user_id is 12343, it locates DB2. Applying the splitting rule in reverse to route to a specific DB is the process we call "DB routing".
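To make DB routing concrete, here is a minimal runnable sketch. It is not a production design: the `ArticleRouter` class is hypothetical, in-memory SQLite databases stand in for separate MySQL servers, and the block size of 10,000 user IDs per database is assumed from the example.

```python
import sqlite3

USERS_PER_SHARD = 10_000  # assumed block size, following the article's example


class ArticleRouter:
    """Routes article reads and writes to DB1..DBn by user_id range."""

    def __init__(self, shard_count: int) -> None:
        # In-memory SQLite databases stand in for separate MySQL servers.
        self.shards: dict[int, sqlite3.Connection] = {}
        for n in range(1, shard_count + 1):
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE article ("
                "article_id INTEGER PRIMARY KEY, title VARCHAR(128), "
                "content VARCHAR(1024), user_id INTEGER)"
            )
            self.shards[n] = conn

    def shard_for(self, user_id: int) -> int:
        """DB routing: map a user_id back to the number of its database."""
        n = (user_id - 1) // USERS_PER_SHARD + 1
        if user_id < 1 or n not in self.shards:
            raise KeyError(f"no shard configured for user_id {user_id}")
        return n

    def add_article(self, article_id: int, title: str, content: str, user_id: int) -> None:
        conn = self.shards[self.shard_for(user_id)]
        conn.execute(
            "INSERT INTO article VALUES (?, ?, ?, ?)",
            (article_id, title, content, user_id),
        )
        conn.commit()

    def titles_for(self, user_id: int) -> list[str]:
        conn = self.shards[self.shard_for(user_id)]
        rows = conn.execute(
            "SELECT title FROM article WHERE user_id = ? ORDER BY article_id",
            (user_id,),
        )
        return [row[0] for row in rows]


router = ArticleRouter(shard_count=2)
router.add_article(1, "Hello", "First post", 234)             # routed to DB1
router.add_article(2, "Sharding notes", "Second post", 12343)  # routed to DB2
```

Because every read and write already carries a user_id, the application never has to search more than one database for a given blog.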

Of course, a DB design that takes data segmentation into account is necessarily an unconventional, unorthodox DB design. So what counts as an orthodox DB design?

It is basically the design we use by default. Usually we consciously design our database according to the normal forms, and under high load we may consider a replication mechanism to improve read and write throughput and performance. That may already meet many needs, but the shortcomings of this mechanism are still obvious (they are discussed below). A DB design that takes data segmentation into account breaks these usual rules and constraints: in order to split the data, we have to add redundant fields to the tables to serve as the distinguishing field, or tag field, for the split, such as user_id in the article example above. (Admittedly, that example does not show the redundancy of user_id very well, because the user_id field would be there even without splitting, so we got it for free.) Redundant fields do not appear only in split-database scenarios. Many large-scale applications need redundancy as well, but that is a matter of efficient DB design, which this article will not go into.

Why MySQL data segmentation

The section above gave a brief description of what data segmentation is, and readers may wonder why it is needed at all. Isn't a mature, stable database like Oracle enough to support the storage and querying of massive data? Why split the data? Oracle is indeed mature and stable, but not every company can afford its high licensing costs and the high-end hardware it needs. Imagine usage fees in the tens of millions a year plus minicomputers costing tens of millions of yuan as hardware support: can an ordinary company afford this? And even if it can, if there is a better, cheaper solution that scales out well, why not choose it?

However, things rarely go as we wish. As noted above, we usually design the database according to the normal forms and turn to replication under high load to improve read and write throughput and performance. That may already meet many needs, but the shortcomings of this mechanism are still obvious. First, its effectiveness depends on the proportion of read operations. The master often becomes the bottleneck, because write operations have to be queued and executed in sequence. If the master is overloaded, data synchronization to the slaves can lag considerably, and a great deal of CPU is consumed, because every write executed on the master still has to be run again on every slave. At that point replication may become a "chicken rib": of little value, yet hard to give up.

Replication can't handle it, so why does sharding work? The reason is simple: it scales well. Every machine, however well configured, has a physical upper limit. Once our application reaches or far exceeds the limit of a single machine, we can either seek help from other machines or keep upgrading our hardware, and the common solution is to scale out, sharing the load by adding more machines. We also have to ask whether, as the business keeps growing, our machines can keep up through linear growth. Sharding makes it easy to spread computation, storage, and I/O across multiple machines in parallel, so it makes full use of their combined processing power while avoiding a single point of failure, improving system availability, and isolating errors well.
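A rough back-of-the-envelope model shows why the read/write mix matters. The sketch below is an idealized assumption, not a benchmark: it supposes reads are spread evenly over all nodes, users are spread evenly over all shards, and the operation counts are made-up inputs chosen only for illustration.

```python
def replication_load(reads: float, writes: float, nodes: int) -> float:
    """Operations per node with replication.

    Reads are shared by all nodes, but every node must apply every write.
    """
    return writes + reads / nodes


def sharding_load(reads: float, writes: float, nodes: int) -> float:
    """Operations per node with sharding.

    Each node only sees the reads and writes for its own slice of the data.
    """
    return (reads + writes) / nodes


if __name__ == "__main__":
    # Hypothetical workloads, in operations per second.
    for reads, writes in ((8000, 2000), (2000, 8000)):
        rep = replication_load(reads, writes, nodes=4)
        shd = sharding_load(reads, writes, nodes=4)
        print(f"reads={reads} writes={writes}: replication={rep}, sharding={shd}")
```

In this model, adding replicas never reduces the write load on any node, while adding shards divides both reads and writes. Real systems deviate from it, for instance when some users are far more active than others.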

Given these factors, data segmentation is necessary, and the data segmentation discussed here takes MySQL as its background. For cost reasons, many companies choose the free and open-source MySQL. Developers who know MySQL may be aware that table partitioning only became available in MySQL 5.1, so before that many potential MySQL users had concerns about its scalability, and whether a database supported partitioning became a key indicator (though not the only one) of whether it could scale. Database scalability is a perennial topic, and MySQL advocates are often asked what to do when application data outgrows a single database and has to be partitioned. The answer is sharding, which is what we call data segmentation.

That is the answer to the question of how to achieve data segmentation in MySQL. I hope the above content is of some help to you.
