2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains MyCat's strategies for splitting data across databases and tables (sharding). They are practical in day-to-day work, so they are collected here for reference.
1. Introduction to configuration format
Before explaining MyCat's sub-database and sub-table strategies, let's first introduce the format of its configuration files. MyCat has two main configuration files: schema.xml and rule.xml. As the names imply, schema.xml describes the database cluster that MyCat represents, and rule.xml describes the sub-database and sub-table policies. A typical schema.xml configuration is as follows:
(schema.xml example; the XML markup was lost in extraction. Only the heartbeat statement select user() survives, repeated three times, presumably once per database host.)
As you can see, schema.xml specifies the relationship between each database node and the virtual database and tables in MyCat, as well as the partitioning strategy for the current table, such as mod-long here. In rule.xml, the specific sub-table strategy and its algorithm implementation class are specified. The following is a typical configuration of rule.xml:
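(rule.xml example; the XML markup was lost in extraction. The surviving values are the sharding column id, the algorithm name mod-long, and the node count 3.)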
Taking schema.xml and rule.xml together, we can see that MyCat first uses schema.xml to declare the virtual database on the current server and the tables in it, such as mydb and t_goods here. In fact, when we connect to MyCat with a database client, the table definitions we see come from this configuration file; MyCat does not obtain them by reading the real database nodes. Once the virtual database and virtual tables are declared, the table-level configuration in schema.xml specifies which data nodes the table is associated with and which rule is used to split it across databases and tables. The implementation class behind that rule is configured in rule.xml. The configuration also shows that MyCat does not support creating tables through the client connection tool: all tables must be defined in the configuration file in advance.
2. Sub-database and sub-table strategy
1. Take the remainder
The remainder (modulo) strategy was already introduced above: the value of the specified field is taken modulo the number of database nodes, and the row is inserted into the node with that index. It will not be discussed further here.
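The remainder rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not MyCat's implementation; the function name mod_long_node is hypothetical, and the node count of 3 is taken from the mod-long example in this article.

```python
def mod_long_node(value, node_count=3):
    """Return the index of the database node for a sharding-column value.

    Hypothetical helper: the value is taken modulo the number of nodes.
    """
    return value % node_count


# Consecutive ids rotate through the three nodes.
nodes = [mod_long_node(i) for i in range(1, 7)]
print(nodes)
```

Because the node index depends on the node count, changing the number of nodes changes where most existing rows would be routed, which is the migration cost that the range-based strategies try to avoid.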
2. Slice according to range
Slicing by range, as the name implies, first divides the overall data into value ranges and then assigns each range to a database node. When a row is inserted, the value of the specified field determines which range it belongs to, and the row goes to the node assigned to that range. Note that a default node is configured as well: when the inserted value does not fall within any configured range, the row is inserted on the default node. The following is the configuration of slicing by range:
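(rule.xml example and mapping file; the markup was lost in extraction and the mapping entries are partly garbled.) Surviving values: sharding column members, algorithm range-members-count, mapping file files/company-range-partition.txt, and what appears to be a default node of 0. The mapping file appears to assign six ranges to nodes 0, 1, 2, 0, 1, 2 in turn: a first range starting at 0 (its upper bound is unreadable), then 11-50, 51-100, 101-1000, 1001-9999, and a last range starting at 10000. Any unit suffix on the bounds is unclear.
3. Slice according to date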
Slicing by date is a little harder to understand, so we start directly from a configuration example:
(rule.xml example; the XML markup was lost in extraction.) Surviving values: sharding column order_time, algorithm sharding-by-date, date format yyyy-MM-dd, sBeginDate 2019-01-01, sEndDate 2019-02-02, sPartitionDay 20.
In this configuration, sBeginDate and sEndDate are easy to understand: together they specify the overall time range to be partitioned. sPartitionDay specifies the length of each partition in days. Here sPartitionDay is 20, sBeginDate is 2019-01-01 and sEndDate is 2019-02-02, so the whole period is cut into 20-day pieces: 2019-01-01 to 2019-01-21, and 2019-01-21 to 2019-02-02. Two points about the resulting number of partitions need to be explained:
Here, after we cut the entire time period, we can only get two partitions, but our database nodes are configured with three. At this point, only the first node and the second node will be used, and the third node will never be used.
If the number of database nodes we provide is smaller than the number of partitions cut, then some partitions will not have the specified database nodes, and an exception will be thrown.
A fixed end time also raises a practical problem. Sooner or later real dates will pass the end time, but choosing a very distant end time makes the number of partitions exceed the number of database nodes, which throws an exception. MyCat therefore allows the value of the partition field to be later than the end time, for example a date after 2019-02-02. In that case the partition for the target date is calculated as follows:
int targetPartition = ((endTimeMills - sBeginDateMills) / partitionDurationMills) % nPartitions;
endTimeMills is the timestamp of the target date being routed
sBeginDateMills is the configured start timestamp
partitionDurationMills is the length of one partition period in milliseconds, for example 20 * 24 * 60 * 60 * 1000
nPartitions is the number of partitions between sBeginDate and sEndDate, which is 2 in the example above
The formula is fairly simple. It divides the difference between the target time and the start time by the length of one partition, which gives the number of whole partition periods between the two, and then takes that number modulo the number of configured partitions to get the target partition. Let's take four rows as an example and see which data node each falls on:
insert into t_order (`id`, `order_time`) values (1, '2019-01-05'); # Partition 0, order_db1
insert into t_order (`id`, `order_time`) values (1, '2019-01-25'); # Partition 1, order_db2
insert into t_order (`id`, `order_time`) values (1, '2019-02-05'); # Partition 1, order_db2
insert into t_order (`id`, `order_time`) values (1, '2019-02-15'); # Period index 2, and 2 % 2 = 0, so partition 0, order_db1
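The four placements can be checked with a short Python sketch. It is a minimal illustration of the formula as this article states it, not MyCat's code; the function name date_partition and its parameters are hypothetical, and the partition count of 2 is taken from the example above rather than computed from sEndDate. The fourth date lies in period index 2, which wraps around to partition 0 because 2 % 2 = 0.

```python
from datetime import date


def date_partition(target, begin, days_per_partition, n_partitions=None):
    """Return the partition index for a target date.

    Hypothetical helper. If n_partitions is given (an end date is
    configured), the period index wraps around; otherwise it is returned
    as-is. Dates earlier than begin are not handled here.
    """
    # Whole partition periods elapsed between the start date and the target.
    index = (target - begin).days // days_per_partition
    if n_partitions is None:
        return index
    return index % n_partitions


begin = date(2019, 1, 1)
targets = [date(2019, 1, 5), date(2019, 1, 25), date(2019, 2, 5), date(2019, 2, 15)]
partitions = [date_partition(t, begin, 20, n_partitions=2) for t in targets]
print(partitions)
```

Calling the same function without n_partitions returns the raw period index, which keeps growing as dates move further from the start date.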
In the above configuration, we have configured the end time of the partition. In fact, under this partition policy, we may not configure the end time. If we do not configure the end time, it is important to note that the partition in which the target time is located is calculated as follows:
int targetPartition = (endTimeMills - sBeginDateMills) / partitionDurationMills;
This simply counts the number of whole partition periods between the target time and the start time and uses that count directly as the target partition, that is, as the database node the data falls on. As time goes on the count keeps growing, and once it exceeds the number of database nodes available, an exception is thrown.
4. Slice according to month
Slicing by month, as the name implies, determines which month the target time falls in and assigns the data to the data node for that month. The following is an example of the configuration of sharding by month:
(rule.xml example; the XML markup was lost in extraction.) Surviving values: sharding column create_time, algorithm partbymonth, date format yyyy-MM-dd.
Based on the above configuration, we need to explain the following points:
If the start time and end time are not configured, the number of nodes in the database must be greater than or equal to 12, because there are 12 months in a year, and the data will be allocated to the corresponding data nodes according to the month in which they are located.
If only the start time is configured (field name is sBeginDate), there are two points to note at this time:
If the target time is less than the start time, an exception is thrown because the partition value between the target time and the start time is negative
If the target time is later than the start time, the number of months between the target time and the start time is calculated and used directly as the target data node (the count does not wrap around after 12). Therefore, if the calculated value is larger than the number of database nodes, an exception is thrown
If both the start time and the end time (field name sEndDate) are configured, data is no longer placed simply by calendar month. Instead, the number of months between the start time and the end time is calculated first (nPartitions), then the number of months between the target time and the start time (targetPartitions), and the database node is targetPartitions % nPartitions. In addition, if the target time is earlier than the start time, targetPartitions is negative and the formula becomes nPartitions - (|targetPartitions| % nPartitions), that is, the target partition cycles downward from the largest partition value.
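The three cases above can be sketched in Python as follows. This is an illustration under stated assumptions, not MyCat's implementation: the names months_between and month_partition are hypothetical, the no-start-time case is assumed to map January to node 0, the negative case uses the absolute value of the month offset, and n_partitions is passed in directly rather than derived from sEndDate.

```python
from datetime import date


def months_between(begin, target):
    """Number of calendar months from begin to target (negative if earlier)."""
    return (target.year - begin.year) * 12 + (target.month - begin.month)


def month_partition(target, begin=None, n_partitions=None):
    """Return the node index for a target date under month-based slicing."""
    if begin is None:
        # No start time: one node per calendar month (assumed January -> 0).
        return target.month - 1
    offset = months_between(begin, target)
    if n_partitions is None:
        # Start time only: the month offset is the node index, no wrapping.
        if offset < 0:
            raise ValueError("target date is earlier than the start date")
        return offset
    if offset < 0:
        # Start and end time: earlier dates cycle down from the top.
        return n_partitions - (-offset) % n_partitions
    return offset % n_partitions


begin = date(2019, 1, 1)
print(month_partition(date(2019, 3, 15)))
print(month_partition(date(2019, 5, 1), begin=begin))
print(month_partition(date(2019, 5, 1), begin=begin, n_partitions=3))
print(month_partition(date(2018, 12, 1), begin=begin, n_partitions=3))
```

In the start-time-only case the caller still has to make sure the returned index does not exceed the number of nodes, which is the exception case described above.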
5. Slice according to enumerated value
Slicing by enumerated values is more suitable for situations where a field has only a fixed number of values, such as provinces. The database node corresponding to each enumerated value is mapped through the configuration file, so that data of the specified type is assigned to the same database instance. The following is an example of sharding according to enumerated values:
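(rule.xml example and mapping file; the markup was lost in extraction and the entries are partly garbled.) Surviving values: sharding column province, algorithm sharding-by-province-func, mapping file files/sharding-by-province.txt, and two values of 0 whose property names did not survive. The mapping file appears to read 1001=0, 1002=1, 1003=2, 1004=0.
6. Range modulo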
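Range-modulo slicing combines two advantages: like range slicing, data in fixed ranges does not need to be migrated, and like modulo slicing, hot data is distributed evenly. Let's explain it with an example first:
(rule.xml example and mapping file; the markup was lost in extraction and the entries are partly garbled.) Surviving values: sharding column id, algorithm rang-mod, the value 0 (presumably the default node), and mapping file files/partition-range-mod.txt with the entries 0-5=1, 6-10=2, and a third entry for 11-15 whose right-hand side is garbled (it appears to be 1).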
A few points about range-modulo slicing need to be highlighted:
In the partition-range-mod.txt file, each line specifies a value range before the equal sign. This is the range that the value of the partition field will fall into
There is a number after the equal sign. Note that this number is not a database node id but the number of database nodes that the current range occupies. For example, data in the range 0-5 is assigned to one database node, while data in the range 6-10 is assigned to two database nodes
Since the number after the equal sign is the number of shards used by the range, the data inside a range is distributed across those nodes by taking the modulo. In other words, the overall data is first divided into ranges, and then within each range the data is allocated to different data nodes by modulo
This is the origin of the name range-modulo slicing. Its advantage is that during expansion and data migration, data in unrelated ranges does not need to move. For example, if the range 0-5 holds so much data that one database instance cannot cope, we can add a database instance, change the configuration to 0-5=2, export that range's data from the original database, and re-import it so that it is spread evenly across the two nodes. Expanding this way has no effect on the database instances serving the other two ranges. Finally, because the number after the equal sign is the number of database instances required, the sum of these numbers must not exceed the number of real database instances we provide.
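The two-level lookup can be sketched in Python. This is an illustration only, with several assumptions: the function name range_mod_node is hypothetical, the third range is assumed to occupy one node, each range's nodes are assumed to be numbered consecutively in file order, and values outside every range go to an assumed default node of 0.

```python
# (low, high, node_count) per range, following the example mapping file.
RANGES = [(0, 5, 1), (6, 10, 2), (11, 15, 1)]


def range_mod_node(value, ranges=RANGES, default_node=0):
    """Pick the range first, then take the modulo inside that range's nodes."""
    first_node = 0  # index of the first node owned by the current range
    for low, high, node_count in ranges:
        if low <= value <= high:
            return first_node + value % node_count
        first_node += node_count
    return default_node


for v in (3, 6, 7, 12, 99):
    print(v, range_mod_node(v))
```

Under these assumptions, changing the first entry to (0, 5, 2) changes routing only for values 0-5; the modulo results inside the other ranges stay the same, although their node numbering shifts in this simplified consecutive layout.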
7. Binary mode range slicing
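Binary mode slicing is very similar to range-modulo slicing, but not identical. It decides which shard a row belongs to from the lowest 10 bits of the value of the target slicing field, which gives 1024 possible slots. Let's start with an example:
(rule.xml example; the XML markup was lost in extraction.) Surviving values: sharding column id, algorithm func1, partition counts 2,1 and partition lengths 256,512.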
With this configuration there are 2 + 1 = 3 shards in total, and the slot ranges allocated to them are 0-255, 256-511 and 512-1023 respectively.
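A small Python sketch of this lookup, assuming the layout from the example (two segments of 256 slots followed by one segment of 512). The function name binary_mode_partition is hypothetical and the code illustrates the idea rather than MyCat's implementation.

```python
def binary_mode_partition(value, counts=(2, 1), lengths=(256, 512)):
    """Map a value to a partition using its lowest 10 bits.

    counts[i] segments of lengths[i] slots each are laid out in order;
    together they are expected to cover all 1024 slots.
    """
    slot = value & 1023  # keep only the lowest 10 bits (0..1023)
    partition = 0
    upper = 0  # exclusive upper bound of the current segment
    for count, length in zip(counts, lengths):
        for _ in range(count):
            upper += length
            if slot < upper:
                return partition
            partition += 1
    raise ValueError("segments do not cover all 1024 slots")


for v in (0, 255, 256, 511, 512, 1023, 1300):
    print(v, binary_mode_partition(v))
```

Values above 1023 wrap around because only the low 10 bits are used: 1300 has the same low bits as 276, so it lands in the second segment.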
(Figure: the 1024 slots are split into consecutive segments of 256, 256 and 512 slots, assigned to partition0, partition1 and partition2.)
8. Consistent hash fragmentation
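Consistent hash sharding is very similar to binary mode slicing, but the concept of virtual slots is more prominent, and the number of virtual slots is configurable. The following is a typical configuration of consistent hash shards:
(rule.xml example; the XML markup was lost in extraction.) Surviving values: sharding column id, algorithm murmur, followed by the values 0, 3 and 160 and the file path /Users/zhangxufeng/xufeng.zhang/mycat/bucketMap.txt. The property names did not survive; 3 matches the number of database nodes used elsewhere in this article.
9. Partition as specified by the target field prefix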
Sharding by the prefix of the target field is easier to understand: it takes a prefix of the specified partition field and converts it to a decimal number, which is used as the partition value. If that number exceeds the number of partitions, the row is placed in the default partition. The following is an example of the configuration of partitioning by string prefix:
(rule.xml example; the XML markup was lost in extraction.) Surviving values: sharding column id, algorithm sharding-by-substring, followed by the four property values 0, 2, 3 and 0. Their property names did not survive.
10. Range slicing on the modulo of the prefix's ASCII code sum
This strategy takes a prefix of the field, adds up the ASCII codes of its characters, takes that sum modulo a configured base, and finally assigns the row to a database node according to which range in the mapping file the remainder falls into.
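The steps can be sketched in Python. This is an illustration with assumed values: a prefix length of 5, a modulo base of 256, and ranges 0-100, 101-200 and 201-256 mapped to nodes 0, 1 and 2, which is one plausible reading of this article's partly garbled example. The function name prefix_pattern_node is hypothetical.

```python
# (low, high, node) per remainder range; assumed from the example mapping file.
RANGES = [(0, 100, 0), (101, 200, 1), (201, 256, 2)]


def prefix_pattern_node(value, prefix_length=5, pattern_value=256, ranges=RANGES):
    """Sum the character codes of the prefix, take the modulo, look up the range."""
    total = sum(ord(ch) for ch in value[:prefix_length])
    remainder = total % pattern_value
    for low, high, node in ranges:
        if low <= remainder <= high:
            return node
    raise ValueError("remainder %d is not covered by any range" % remainder)


for key in ("AAAAA-001", "MMMMM-001", "abcde-001"):
    print(key, prefix_pattern_node(key))
```

Only the first five characters influence the result, so keys that share a prefix always land on the same node.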
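(rule.xml example and mapping file for this strategy; the markup was lost in extraction and the entries are partly garbled.) Surviving values: sharding column id, algorithm sharding-by-prefixpattern, the values 256 and 5 (presumably the modulo base and the prefix length), and the mapping file files/partition-pattern.txt, whose entries appear to read 0-100=0, 101-200=1 and 201-256=2.
The above is an overview of MyCat's sub-database and sub-table strategies. Some of these points may come up in day-to-day work, and I hope this article helps you learn more about them.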