In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
basic concepts
It's too boring to introduce the concept directly. Let's take an example similar to the background passage.
Business scenario: The customer completes the order, the courier accepts and updates the waybill status, and the customer can check the waybill status at any time. A ticket express may have multiple sub-pieces. At the same time, we need to mark the status of each waybill, and the interpretation and meaning of the waybill status are stored in the waybill status dictionary table.
Therefore, we need to establish the following table:
We now vertically split the database into waybill database (single table 2000tps, 6000W data), courier database (single table 1500tps, 100W data), customer database (single table 1500tps, 1000W data records) according to business; assuming that each MySQL database single table cannot exceed 2000W data, single table cannot exceed 1000tps. Then the waybill library needs to be divided into 3 pieces, and the customer library needs to be divided into 2 pieces, which are uniformly managed by MyCat. As shown below:
1. logical bank
MyCat, as a middleware, should be application-insensitive.
The application accesses MyCat. According to the above, the application perceives that there is only one database (or multiple databases, just like accessing MySQL instances) in the background (assuming there is only one database, this database is called SF, which has waybill related tables, courier related tables and customer related tables); here MyCat's database is the logic library. Visit MyCat and the results should look like this
Although the tables may exist in different libraries, ostensibly they belong to the same logical library in the same MyCat instance. So, although the architecture diagram above shows that they are not in the same database, in MyCat they are in the same logical repository.
2. logic table
The tables under the logical library are logical tables. Logical tables can be fragmented or not.
The orders table is obviously a table to be sharded, but in MyCat's view, although they are distributed on different sharded nodes (distributed on different MySQL databases), they are still considered to be the same logical table, in the same logical library.
2.1 Fragmentation table
Fragmented tables refer to those tables with large data that need to be split into multiple databases, so that each fragment has a part of data, and all fragments constitute complete data. Each fragmentation table has its own fragmentation rules, according to which fragmentation is determined.
The configuration is as follows:
This means that the mod-long rule is used to divide the waybill table orders into test1 and test2 databases (fragmentation nodes) according to the primary key id.
Request case 1:
select * from orders where id = 1;
For queries to a shard table, if the query is by shard column, the request will only be sent to one shard.
Request Case 2:
select * from orders where id
< 100 and id >0;
For a query of a shard table, if the query is based on the shard column range (if the field type supports the range), the request calculates two boundary values according to the shard rules, then sends the request to the shard corresponding to the result, and merges the results of each shard.
Request Case 3:
select * from orders where initialpoint = 'Beijing';
As in the case of queries based on non-shard columns, the request is sent to all shards and the results of each shard are merged.
Request case 4:
The request is an SQL statement of update type, which is handled in the same way as the three cases of query.
2.2 non-slice table
Not all tables in a database are very large, some tables can be cut without fragmentation, non-fragmentation is relative to fragmentation tables, that is, those tables that do not need data fragmentation.
For example:
This means that the courier table does not need to be fragmented and is stored on the fragmentation node test3.
The operations for non-sharded tables are the same as for regular databases, since distributed databases are not involved.
2.3 ER table
Relational databases are based on the Entity-Relationship Model, which describes things and relationships in the real world, from which the ER table in Mycat comes. According to this idea, a data fragmentation strategy based on E-R relationship is proposed. The records of child tables and the records of parent tables associated with them are stored in the same data fragment, that is, child tables depend on parent tables, and table groups ensure that data Join will not operate across databases.
Table Group is a good idea to solve cross-fragmentation data join, and it is also an important rule for data fragmentation planning.
As follows:
The waybill table is a fragment table, and the waybill table and the waybill sub-child table are one-to-many relations, which can be made into a parent-child table.
SQL requests for child tables are processed according to the rules of the previous fragmented tables after they are mapped to the corresponding fields of the parent table through joinKey.
2.4 global table
In a real business system, there are often a large number of tables similar to dictionary tables, which basically change very little. Dictionary tables have the following characteristics:
Changes are not frequent. The overall change of data volume is not large. The data scale is not large. There are rarely more than hundreds of thousands of records.
For this kind of table, in the case of fragmentation, when the business table is fragmented due to its size, the association between the business table and these affiliated dictionary tables becomes a thorny problem. Therefore, Mycat solves the join of this kind of table through data redundancy, that is, all fragments have a copy of data, and all dictionary tables or some tables conforming to the characteristics of dictionary tables are defined as global tables.
Data redundancy is a good idea to solve cross-fragmentation data join, and it is also another important rule of data fragmentation planning.
For example:
The waybill status information dictionary table is only a comment for each waybill status, which is a typical dictionary table, and has a many-to-one relationship with the fragment table orders.
For global tables, all query requests are sent to only one of the global table shards for execution, and all update requests are executed on each global table shard.
2.5 How to decide?
From the previous description, we can infer that for modifications and queries to the shard table, if the lookup is by shard field, the request will be forwarded to a shard. If not according to the fragment field, the request will be sent to each fragment to search. Therefore, the choice of fragmentation field is more important! For global tables, this is equivalent to having an identical copy on each shard, where the modification request is executed on each shard, but the query falls on only one shard. Therefore, the global table will not change as much as possible, and it is necessary to do Join operation with the fragment table. If it changes frequently or does not need to do join, it is best to make it a non-fragment table.
First throw out the concept of these several logic tables, we first have an impression. Now let's discuss how to determine the type of table in practical terms.
First, the orders table can be defined as a fragmented table. The orders_cargo table is a child-child table, and an order may have multiple child-child parts, so it is best to use orders_cargo as a child table of orders.
In this case, orders and orders_cargo are fragmented according to the corresponding key (that is, the child table is fragmented according to which key corresponds to which key of the main table). For example, orders_cargo is the order_id corresponding to the id of orders. This is the case with order_id and orders id (join result is correct).
One-to-n scene architecture
Like this simple membership relationship a pair of n tables, we deal with it very simply, generally they need to do join the key set as a parent-child table.
But the following scenario is very troublesome, such as courier and waybill is a many-to-many relationship, customers for waybill is also a many-to-many relationship (a receiver, a sender). We have both couriers needing to view all their waybills scenarios and customers needing to view all their waybills scenarios. On the other hand, we also have a scene where we check the courier and customer involved in a waybill.
The customer table and courier table are not used as public tables because they do not do join operations with the fragment table orders.
First, relational tables can be treated as common tables, so that there are no restrictions involved in join operations with sharded tables, since common tables are complete at each shard. However, relational tables are updated so frequently that we may not be able to tolerate updating each shard every time we update the relational tables (performance, reliability considerations).
What about the sub-table as waybill? Then it is relatively simple to find the courier and customer involved in a waybill. Because according to the waybill number (i.e. fragment id) query, MyCat will locate the specific fragment according to the fragmentation rules, rather than searching by fragment.
Write picture description here
However, correspondingly, it is slower for couriers to view all their waybills, because the request is sent to each fragment.
Write picture description here
The child table as a courier has the same flaw.
Another way is to use this relationship table as a child table of both waybills and couriers. But now, you need to apply yourself to do double writing. MyCat hasn't done that yet. Of course, I think it's an area where we can improve ourselves as needed. MyCat middleware performs double-writing according to the relationship redundancy table relationship
In addition, what kind of method to take, are from the business to consider. Here, if the business pressure from the courier to find and from the waybill to find is about the same, then it is best to use the relationship table as a child table of both the waybill and the customer. Then the courier and waybill services are independent, each service application maintains its own relationship table, and at the same time maintains consistency between relationship tables through message queues. This is also a method.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.