In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article is about Mysql master-slave replication, read-write separation, sub-table sub-database strategy and practice example analysis, the editor thinks it is very practical, so share it for you to learn, I hope you can get something after reading this article, not to say much, follow the editor to have a look.
I. the specific implementation of the MySQL extension
With the continuous expansion of the scale of business, it is necessary to choose an appropriate scheme to cope with the growth of data scale, in order to cope with the increasing pressure of access and the amount of data.
The expansion of the database mainly includes: business split, master-slave replication, read-write separation, database sub-database and sub-table and so on. This article is mainly about database sub-database and sub-table.
(1) Business split
In the summary of massive data and high concurrency solutions for large-scale website applications, one or two articles also specifically describe why the business should be split.
At the beginning of the business, in order to speed up the launch of applications and rapid iteration, many applications adopt a centralized architecture. With the expansion of the business system, the system becomes more and more complex, more and more difficult to maintain, the development efficiency becomes lower and lower, and the consumption of resources becomes larger and larger. The cost of improving system performance through hardware is also getting higher and higher.
Therefore, in the initial stage of selection, an excellent architecture design is an important guarantee for the expansion of the system in the later stage.
For example: e-commerce platform, including users, goods, evaluation, orders and other modules, the simplest way is to create users, shops, comment, order four tables in a database.
However, with the increase of the scale of the business and the increase of the number of visitors, we have to split the business. Each module uses a separate database for storage, and different businesses access different databases, dividing the original dependence on one database into dependence on four databases, so that four databases are under pressure at the same time, and the throughput of the system is naturally improved.
(2) Master-slave replication
Generally speaking, it is a master to write and read from, and one master to follow.
1. Detailed explanation of synchronous installation and configuration of MySQL5.6 database master-slave (Master/Slave)
2. Summary of the common topology and principle of MySQL master-slave replication and how to improve the efficiency of master-slave replication.
3. Use mysqlreplicate command to quickly build Mysql master-slave replication.
The above three articles, described how to configure the master-slave database, and how to achieve the separation of read and write database, here will not repeat, there is a need for selective click to view.
The picture above is a process diagram of data synchronization between MySQL's Master and Slave on the Internet.
This paper mainly describes the principle of MySQL master-slave replication: the actual data replication is that Slave obtains the Binary log file from Master, and then performs the operation recorded in the log in the local mirror. Because the process of master-slave replication is asynchronous, there may be a delay in the data between Slave and Master, which can only ensure the final consistency of the data.
(3) Database sub-database and sub-table
We know that every machine, no matter how well configured, has its own physical upper limit, so when our application has reached or far exceeded a certain upper limit of a single machine, we can only seek the help of other machines or continue to upgrade our hardware, but the common solution is to share the pressure by adding more machines.
We also have to consider whether our machines can meet the demand through linear growth as our business logic continues to grow. Therefore, the use of database sub-database sub-table, can immediately improve the performance of the system, on why to use the database sub-database sub-table of other reasons here will not repeat, mainly talk about the specific implementation strategy. Please see the following chapter.
Second, the implementation strategy of sub-table
Keywords: user ID, table capacity
For most of the database design and business operations are basically related to the user's ID, so the use of user ID is the most commonly used sub-database routing strategy. The user's ID can be used as an important field throughout the system. Therefore, using the user's ID, we can not only facilitate our query, but also distribute the data evenly to different databases. (of course, you can also divide the table according to categories, etc., and there are many ways to divide the routing strategy of the table.)
Then the above e-commerce platform assumes that the order table order stores the user's order data, and the sql script is as follows (just for demonstration, omitting some details):
CREATE TABLE `order` (`user_ id` bigint (32) primary key auto_increment, `user_ id` bigint (32),...)
When the data is relatively large, when the data is divided into tables, the first step is to determine how many tables the data needs to be evenly distributed, that is, table capacity.
Assuming that there are 100 tables for storage, we first perform a modeling operation on the user ID, and then store and query the corresponding tables according to user_id0. The diagram is as follows:
For example, user_id = 101 so what we do when we get the value can be done through the following sql statement:
Select * from order_1 where user_id= 101
Among them, order_1 is based on the calculation of 1010, which represents the first chapter order table after the sub-table.
Note:
In actual development, if you use MyBatis as the persistence layer, MyBatis already provides a good feature to support database sub-tables. For example, if the above sql is implemented in MyBatis, it should be:
Interface definition:
/ * get user-related order details * @ param tableNum specific table number * @ param userId user ID * @ return order list * / public List getOrder (@ Param ("tableNum") int tableNum,@Param ("userId") int userId)
Xml configuration Mapping File:
Select * from order_$ {tableNum} where user_id = # {userId}
Where ${tableNum} means to add parameters directly to sql, which is a feature supported by MyBatis.
Note:
In addition, in the actual development, our user ID is more likely to be generated through UUID, so that we can first hash the UUID to get the integer value, and then take the module operation. Third, the implementation strategy of sub-database.
Database table division can solve the efficiency problem of data query when there is a large amount of data in a single table, but it can not improve the efficiency of the concurrent operation of the database, because the essence of the table division is still the operation carried out on a database, which is easily limited by the IO performance of the database.
Therefore, how to evenly distribute the problem of database IO performance, it is obvious that the data sub-database operation can well solve the performance problem of a single database.
The implementation of sub-library strategy is very similar to that of sub-table strategy, and the simplest one is that it can be routed by modularization.
In the previous example, the user ID is modeled, so that a specific database is obtained. The same keywords are:
User ID, library capacity
The schematic diagram of the route is as follows:
The capacity of the library in the above figure is 100.
Similarly, if the user ID is UUID, please hash first and then take the module.
IV. Implementation strategy of sub-database and sub-table.
In the above configuration, the database sub-table can solve the query performance problem of single table massive data, and the sub-database can solve the problem of concurrent access pressure of a single database.
Sometimes, we need to consider these two problems at the same time, so we not only need to divide the single table, but also need to split the database, in order to expand the concurrent processing capacity of the system and improve the query performance of the single table at the same time. It is the sub-database and sub-table we use.
The strategy of sublibrary and table is more complex than the previous two. A common routing strategy is as follows:
1. Intermediate variable = user_id% (number of libraries * number of tables per library); 2. Library serial number = rounded (intermediate variable / number of tables per library); 3. Table serial number = intermediate variable% number of tables per library
For example, there are 256databases with 1024 data tables in each database. According to the above routing strategy, the user's user_id=262145 can be obtained:
1. Intermediate variable = 262145% (256-1024) = 1-position 2, library number = rounded (1-bank 1024) = 0-position 3, table number = 1% 1024 = 1
In this case, for user_id=262145, it will be routed to the first table of the 0th database.
The schematic diagram is as follows:
Summary of sub-database and sub-table
There are many choices of sub-database and sub-table strategies, and the above should be a relatively simple one according to the user ID. Other methods such as using number range for partitioning or directly using hash for routing. Those who are interested can find and learn on their own.
As mentioned above, if the user's ID is generated through UUID, we need to perform a separate hash operation, and then perform a module operation. In fact, hash itself is a strategy of sub-database and sub-table. When using hash for routing strategy, we need to know the advantages and disadvantages of hash routing strategy. The advantage is: data is evenly distributed. The disadvantage is that it is troublesome to migrate the data and cannot apportion the data according to the performance of the machine.
The query performance and concurrency ability of the above sub-database and sub-table operations have been improved, but there are also some points to pay attention to, for example, things that originally span tables have become distributed things; because records are divided into different databases and different data tables, it is difficult to query multiple tables, and it is impossible to query the data without specifying routing fields. After the sub-database and sub-table, if we need to further expand the lineup of the system (routing strategy change), it will become very inconvenient and we need to re-migrate the data.
Finally, it should be pointed out that there are many middleware to choose from at present, and the most common one is to use Taobao's middleware Cobar.
GitHub address: https://github.com/alibaba/cobara
The document address is: https://github.com/alibaba/cobar/wiki
The middleware Cobar on Taobao is not specifically introduced in this article, but will be introduced in the later study.
In addition, Spring can also achieve the read-write separation of the database, which will be further studied in the following article.
VI. Summary
In the above, we have learned how to separate read and write from the database and separate tables from the database, so is it possible to achieve a scalable, high-performance, high-concurrency website? Obviously not yet! A large website uses much more than these technologies, it can be said that these are the most basic links, because there are still many specific details that we have not mastered, such as: database cluster control, cluster load balancing, disaster recovery, automatic failover, transaction management and so on. Therefore, there are still a lot of places to learn and study.
In short:
My way ahead is long; I see no ending; yet high and low I'll search with my will unbending.
The road ahead is beautiful and bright, the new journey in 2017, do not let go!
The practice of Mycat to achieve master-slave replication, read-write separation, and sub-library and sub-table. What is Mycat?
A thoroughly open source large database cluster for enterprise application development
Enhanced database that supports transactions, ACID, and can replace MySQL
An enterprise database that can be regarded as a MySQL cluster to replace the expensive Oracle cluster
A new SQL Server that integrates memory cache technology, NoSQL technology and HDFS big data
A New Generation of Enterprise Database products combining traditional Database and New distributed data Warehouse
A novel database middleware product
The above content is from the official website of Mycat. To put it simply, Mycat is a database middleware. For our development, it is like an agent. When we need to use multiple databases and separate databases and tables, we only need to configure the relevant rules in mycat, and the program does not need to make any changes. We just need to link the original data sources to mycat, of course, if there are multiple data sources before. We need to switch the data source to a single data source, which has the advantage that when we have a large amount of data, we need to start dividing the database and tables or do read-write separation, there is no need to modify the code (just change the link address of the data source)
Practice of using Mycat to divide tables and databases
Haha, first of all, this is not an entry to Mycat blog, but the editor feels very entry-level blog! This blog focuses on the knowledge of data fragmentation in Mycat. At the same time, the editor will test and verify it on the local database and show it in both pictures and texts.
Database partition table, I hear very tall, there is always a sense of high above, out of reach, but the editor wants to say that, in fact, as a developer, should always come, you still have to learn, the difference is just a matter of time order.
I. Partition subtable
Partitioning is to store the files and indexes of a data table in different physical files.
The partition types supported by mysql include Range, List, Hash and Key, among which Range is more commonly used:
RANGE partition: assigns multiple rows to a partition based on column values that belong to a given contiguous interval.
LIST partitions: similar to partitioning by RANGE, except that LIST partitions are selected based on column values matching a value in a set of discrete values.
HASH partition: a partition that is selected based on the return value of a user-defined expression that uses the column values of the rows to be inserted into the table. This function can contain any valid expression in MySQL that produces a non-negative integer value.
KEY partitions: similar to partitioning by HASH, except that KEY partitions only support computing one or more columns, and the MySQL server provides its own hash function. One or more columns must contain integer values.
Sub-table refers to the logical division of a table into multiple logical tables. On the whole, the sub-table is divided into two types: horizontal split and vertical split. For example, a large table storing merchant information is divided into tables according to the range of merchant numbers, and records of different ranges are distributed into different tables.
2. Types of Mycat data fragmentation
Mycat fragmentation is actually similar to sub-table, that is, when the database is too large, especially when it is written too frequently and it is difficult to be supported by a host, then the database will face a bottleneck. We distribute the data stored in the same database instance to multiple database instances (hosts), access multiple devices to improve performance, and improve the integrity of the system while segmenting the data.
Data fragmentation refers to the global division of data into related logical fragments, which can be divided into three types: horizontal segmentation, vertical segmentation and mixed segmentation. The following is mainly about the horizontal and vertical segmentation of Mycat. It is important that the Mycat is distributed, so it is normal for the data slices to be distributed to different physical machines, and to collaborate on network communication.
Horizontal syncopation
It is distributed into multiple node libraries according to some rules of a field, and each node contains part of the data. Horizontal data segmentation can be simply understood as split according to data rows, that is, some rows in the table are divided into one node, and some other rows are divided into other nodes. From a distributed point of view, they are a whole table.
Vertical slicing
A database is composed of many tables, and each table corresponds to a different business. Vertical segmentation means that the tables are classified according to the business and can not be divided into different nodes. The vertical split is simple and clear, the split rules are clear, and the application module is clear, clear and easy to integrate, but it is difficult to expand when the amount of data in a table reaches a certain degree.
Mixed segmentation
It is a combination of horizontal and vertical segmentation.
Third, Mycat vertical segmentation, horizontal segmentation actual combat 1, vertical segmentation
As mentioned above, vertical sharding is mainly split according to specific business, so we can imagine such a scenario, suppose we have a very large e-commerce system, then we need to separate order table, pipeline table, user table, user comment table and so on into different databases to improve throughput. The architecture diagram is roughly as follows:
Because the editor is tested on a single machine, there is only host1, but different tables still correspond to different databases, but all databases belong to the same database instance (host). Later, different hosts only need to add nodes.
The mycat configuration file is as follows:
Server.xml
Root / / corresponds to four logical libraries order,trade,user,comment
Schema.xml
Select user ()
Log in to native mysql and create order,trade,user,comment4 databases:
Create database database1 character set utf8;create database database2 character set utf8;create database database3 character set utf8;create database database4 character set utf8
Execute the startup_nowrap.bat file in the bin directory. If you output the following, mycat has been started successfully. If not, check whether order,trade,user,comment4 databases have been created.
Log in to the Mycat server with the following statement:
Mysql-uroot-proot-P8066-h227.0.0.1
Create a Comment table in the comment database and insert a piece of data
Create a new Comment table in figure 1 above, insert a record in 2, and see which data node the record is inserted into, that is, database4.
2. Horizontal segmentation
Server.xml
Root TESTDB
Schema.xml
Select user ()
Rule.xml
Id rang-long autopartition-long.txt
Autopartition-long.txt under the conf directory
# range start-end, data node index# Kwon 1000pm 10000.0-500M=0500M-1000M=11000M-1500M=2
The above configuration creates a logical library called TESTDB, and specifies the table label that needs to be split, and the table name is travelrecord. The partitioning strategy uses the rang-long algorithm, that is, it is divided according to the range of id data column values. The specific rules are defined in the autopartition-long.txt file, that is, records in the range of 0-500 to 10000 of id are stored in the travelrecord table of db1, and records of id in the range of 500 to 10000 are stored in the travelrecord table of db2 database. Let's insert two pieces of data to verify that it is consistent with the sharding rule.
Create a db1,db2,db3 database
Create database db1 character set utf8;create database db2 character set utf8;create database db3 character set utf8
This is indeed the case, so we have completed the horizontal sharding of the mycat database, this example is just to demonstrate the value of sharding according to the range of id column, mycat also supports a lot of sharding algorithms, such as modular, consistent hash algorithm, slicing algorithm by date, etc. You can read the book "distributed database architecture and enterprise practice-based on Mycat middleware" for in-depth study.
Why do you need to separate reading and writing?
As for why it is necessary to separate reading and writing, it has been introduced in my previous article. I believe those who read this article also know why it is necessary to separate reading and writing. Of course, if you also need to know about it, welcome to check out my previous article SpringBoot Mybatis read-write separation configuration. By the way, you can also learn how to separate reading and writing through code.
MySQL master-slave replication
Master-slave replication is the key to the separation of read and write. No matter how the read-write separation is carried out, the premise is that MySQL has master-slave replication, and the current dual master-slave is fine, but the key point is to ensure that the data of the two databases are consistent (excluding the situation that the data just written to the master database has not been reflected in time). If the data of the two libraries are inconsistent, then does the separation of read and write make any sense? Specific MySQL how to master-slave replication can see my previous article MySQL master-slave replication build, based on log (binlog)
Mycat read-write separation settings configure Mycat users
Mycat users have the same meaning as MySQL users. They mainly configure user names and passwords linked to Mycat, as well as logical libraries that can be used. User information is mainly configured in server.xml, as shown below.
Druidparser rayewang separate
A user named raye is defined, the rayewang in the tag user defines the user's password, and the separate defines the logical library that the user can use.
Configure the Mycat logic library
There are many configurations for Mycat, but because we only use the read-write classification function of Mycat, we only need to configure some basic ones. Of course, this article will only introduce the configurations related to read-write separation. Readers are advised to check the documentation for other configurations, or to understand that the logic library is configured in schema.xml.
First introduce some configuration tags in the Mycat logic library
Schema
Schema tags are used to define logical libraries, and schema has four attributes, dataNode,checkSQLschema,sqlMaxLimit,name.
The dataNode tag attribute is used to bind the logic library to a specific database. If dataNode is configured in version 1.3, the sharding table cannot be configured. 1.4 you can configure the default shard. You only need to configure the table that requires sharding.
Name defines the name of the current logical library, making it easy to define references when defining users in server.xml
CheckSQLschema when this value is set to true, MyCat will change the statement to select from users; if we execute the statement select from separate.users;. That is, remove the characters that represent schema and avoid sending them to the back-end database Executive Times (ERROR 1146 (42S02): Table 'separate.users' doesn't exist). However, even if the value is set to true, if the statement contains a name other than the name specified by schema, for example: select * from db1.users;, then MyCat will not delete the db1 field, and if the library is not defined, it will report an error, so it is best not to use this field in the SQL statement.
SqlMaxLimit when the value is set to a numerical value. For each executed SQL statement, if no limit statement is added, the MyCat will automatically add the corresponding value. For example, if you set the value to 100, the effect of executing select from users; is the same as executing select from users limit 100;. If you set this value, MyCat will display all the information queried by default, resulting in too much output. Therefore, in normal use, it is recommended to add a value to reduce excessive data returns. Of course, the size of the limit is also explicitly specified in the SQL statement, which is not constrained by this attribute. It is important to note that this property will not take effect if the schema you are running is a non-split library. You need to add the limit statement manually.
There is a tag table in the schema tag to define different table shard information, but we only do read-write separation and will not use it, so we will not introduce it here.
DataNode
The dataNodedataNode tag defines the data node in MyCat, which is what we usually call data shredding. A dataNode tag is an independent data shard, and dataNode has three attributes: name,dataHost,database.
Name defines the name of the data node, which needs to be unique and is used for references in table tags and schema tags
DataHost this attribute is used to define which database instance the shard belongs to. The attribute value refers to the name attribute defined on the dataHost tag.
Database this attribute is used to define the specific library on the specific database instance of the shard attribute, because here two latitudes are used to define the shard, namely: instance + concrete library. Because the table and table structure created on each library are the same. So this makes it easy to split the table horizontally.
DataHost
DataHost is a tag that defines a real database connection, which also exists as the lowest tag in the mycat logic library, directly defining the specific database instance, read-write separation configuration and heartbeat statements. DataHost has seven attributes: name,maxCon,minCon,balance,writeType,dbType,dbDriver, two tags heartbeat,writeHost, and a readHost tag in the writeHost tag.
Name uniquely identifies dataHost tags for use by dataNode tags
MaxCon specifies the maximum connection for each read-write instance connection pool. That is, the writeHost and readHost tags nested within the tag will use the value of this attribute to instantiate the maximum number of connections in the connection pool.
MinCon specifies the minimum connection for each read-write instance connection pool and initializes the size of the connection pool
Balance read load balancer type
Balance= "0", the read-write separation mechanism is not enabled, and all read operations are sent to the currently available writeHost.
Balance= "1", all readHost and stand by writeHost participate in the load balancing of select statements. To put it simply, when the double master and double slave mode (M1-> S1J M2-> S2, and M1 and M2 are the master and standby of each other), under normal circumstances, M2M S1jue S2 is involved in the load balancing of select statements.
Balance= "2", all read operations are randomly distributed on writeHost and readhost.
Balance= "3". All read requests are randomly distributed to the readhost corresponding to wiriterHost for execution. WriterHost does not bear the read pressure.
WriteType writes the type of load balancer. Currently, there are three values:
WriteType= "0", all write operations are sent to the first writeHost of the configuration, and the first one hangs the second writeHost that is still alive. After restarting, the switch is recorded in the configuration file: dnindex.properties.
WriteType= "1", all write operations are randomly sent to the configured writeHost
DbType specifies the type of database for the back-end connection, and currently supports the binary mysql protocol, as well as other databases that use JDBC connections. For example: mongodb, oracle, spark, etc.
DbDriver specifies the Driver used to connect to the backend database. Currently, the available values are native and JDBC. If you use native, because this value implements the binary mysql protocol, you can use mysql and maridb. Other types of databases need to be supported by JDBC drivers. The original native protocol of postgresql has been supported since version 1.6. If you use JDBC, you need to put the driver JAR package that conforms to the JDBC 4 standard into the MYCAT\ lib directory and check that the driver JAR package includes the following directory structure files: META-INF\ services\ java.sql.Driver. Write the specific Driver class name in this file, for example: com.mysql.jdbc.Driver.
The heartbeat tag indicates the statement used for heartbeat checking with the back-end database. For example, MYSQL can use select user (), Oracle can use select 1 from dual, and so on. This tag also has a connectionInitSql attribute, which is mainly put into the initialization SQL statement that needs to be executed when using the Oracla database. For example: alter session set nlsdateformat='yyyy-mm-dd hh34:mi:ss'
Both the writeHost,readHost tags specify the relevant configuration of the back-end database to mycat to instantiate the back-end connection pool. The only difference is that writeHost specifies write instances, readHost specifies read instances, and groups these read and write instances to meet the requirements of the system. Multiple writeHost and readHost can be defined within a dataHost. However, if the back-end database specified by writeHost goes down, then all readHost bound to this writeHost will be unavailable. On the other hand, because the writeHost downtime system will automatically detect and switch to the backup writeHost, the two tag attributes are consistent, with attributes such as host,url,password,user,weight,usingDecrypt.
Host is used to identify different instances. Generally, we use M1 writeHost and we use S1.
The link address of the instance of the real database of url, if it is the dbDriver of native, it is generally in the form of address:port. If you use JDBC or other dbDriver, you need a special specification. When using JDBC, you can write: jdbc:mysql://localhost:3306/
Link user name of the real database instance of user
Link password of password real database instance
Weight weight is configured in readhost as the weight of the read node. It is mainly used for different configurations of multiple read database instance machines, and the number of visits can be adjusted according to the weight.
Whether usingDecrypt encrypts the password by default 0 No if you need to enable configuration 1, use the encryption program to encrypt the password
Note that readHost is inside the writeHost tag, not alone
The following is my read-write separation profile
Select user ()
It's almost explained before, because I'm just using basic master-slave replication, so I set the balance of dataHost to 3.
Start mycat, and then use the database connection tool to connect to mycat, you can test whether the configuration is successful. The easiest thing is to modify the data from the slave library, so it is easy to see which library you are running to. In addition, since I started mycat based on docker, if you run mycat directly in the system, you can look at the official documentation to see how to start mycat.
The above is the example analysis of Mysql master-slave replication, read-write separation, sub-table sub-library strategy and practice. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.