2025-04-11 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
First, why do we need distributed databases?
With the rapid development of computer and information technology, industry application systems have grown rapidly in scale, and the data they generate has increased explosively, easily reaching hundreds of terabytes or even hundreds of petabytes. This far exceeds the processing capacity of traditional computing technology and information systems, and centralized databases increasingly show their limitations when processing data at this scale. There is therefore an urgent need for a way to process data quickly and respond to user requests in a timely manner, while still allowing the data to be analyzed, managed and maintained centrally.
Distributed databases evolved from centralized databases and are the product of combining computer technology with network technology. A distributed database is a database system in which the data is physically distributed but logically managed as a whole. Physical distribution means that the data resides on nodes or sites in different physical locations that are connected by a network; logical centralization means that all the database nodes form one logical whole managed by a unified database management system. The nodes can be spread across different machine rooms, cities and even countries.
2. Characteristics of distributed databases
In addition to transparency, data redundancy, easy scalability and autonomy, distributed databases offer cost-effectiveness, good performance, fast response, a flexible architecture, and easy integration with existing systems.
Despite these strengths, a distributed database depends heavily on the network, and its transaction processing is far less mature than that of a traditional database. For a long time to come, distributed data storage will coexist with traditional data storage.
III. A brief introduction to the MyCat database middleware
MyCat is a fully open-source, large-scale database cluster for enterprise application development that supports transactions and ACID. It is an enhanced database that can replace MySQL, and it is regarded as an enterprise-grade MySQL cluster intended to replace expensive Oracle clusters. It is a new kind of SQL server that integrates in-memory caching, NoSQL technology and HDFS big data, a new-generation enterprise database product that combines traditional databases with new distributed data warehouses, and also an excellent database middleware.
MyCat evolved from Cobar. It supports Oracle and PostgreSQL, has supported NoSQL databases (SequoiaDB and MongoDB) since version 1.3, and introduced the Druid parser. MyCat 1.5 was released in 2016 and version 1.6.6 in 2018. The MyCat 2.0 project has since been launched, and its core code has been submitted (https://github.com/MyCatApache/MyCat2.git).
Fourth, the core concepts of MyCat in detail
4.1 Logical library (schema)
In practice, business developers usually do not need to know that the middleware exists; they only deal with the database. The database middleware can therefore be regarded as a logical library made up of one or more database clusters.
4.2 Logical tables (table)
Where there is a logical library, there are logical tables. In a distributed database, the tables that an application reads and writes are logical tables. A logical table can be distributed across one or more shard databases, or not be sharded at all.
1) Sharded table
A sharded table is a table with a large amount of data that has been divided across multiple database instances; together, all the shards make up the complete table. For example, if t_node is configured as a sharded table in MyCat, its data is divided between the dn1 and dn2 nodes according to the sharding rule.
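To make the t_node example concrete, here is a minimal sketch of how a sharding rule might spread rows between dn1 and dn2. It assumes a simple modulo rule on a hypothetical integer column id; MyCat's own sharding functions are configured separately and differ in detail, so this only illustrates the idea.

```python
from collections import defaultdict
from typing import Dict, List

# Data nodes from the t_node example; their order matters for the modulo rule.
DATA_NODES = ["dn1", "dn2"]


def route_by_mod(shard_value: int, data_nodes: List[str] = DATA_NODES) -> str:
    """Return the data node for a shard-key value using a simple modulo rule."""
    return data_nodes[shard_value % len(data_nodes)]


def split_rows(rows: List[dict], shard_column: str = "id") -> Dict[str, List[dict]]:
    """Group the rows of a sharded table by the data node each one is routed to."""
    placement: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        placement[route_by_mod(row[shard_column])].append(row)
    return dict(placement)


if __name__ == "__main__":
    t_node_rows = [{"id": i, "name": f"node-{i}"} for i in range(1, 6)]
    for node, node_rows in sorted(split_rows(t_node_rows).items()):
        print(node, [r["id"] for r in node_rows])
    # dn1 [2, 4]
    # dn2 [1, 3, 5]
```

Together, dn1 and dn2 still hold every row exactly once, which is what lets the middleware present them as one complete table.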
2) Non-sharded table
Not every table is large enough to need sharding. A non-sharded table, as opposed to a sharded table, is a table whose data does not need to be split. For example, a non-sharded t_node table can be configured to exist only on node dn1.
3) ER table
Relational databases are based on the entity-relationship model (Entity Relationship Model), which is where MyCat's ER tables come from. Building on this idea, MyCat offers an E-R-based sharding strategy in which child-table records are stored on the same shard as the parent-table records they are associated with; that is, the child table depends on the parent table. This table grouping (Table Group) ensures that join queries on related data do not have to cross databases.
Table grouping is not only a good way to solve cross-shard join queries, but also an important rule of data partitioning.
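The co-location rule can be sketched as a toy router in which child rows simply follow their parent. The parent/child tables (think of a hypothetical orders table and an order_items table joined on the parent id) and the modulo rule are illustrative assumptions, not MyCat's implementation.

```python
from typing import Dict, List


class ERRouter:
    """Toy router that keeps each child row on the same shard as its parent row."""

    def __init__(self, data_nodes: List[str]) -> None:
        self.data_nodes = data_nodes
        self.parent_location: Dict[int, str] = {}  # parent id -> data node

    def insert_parent(self, parent_id: int) -> str:
        # The parent table is sharded by a modulo rule on its primary key.
        node = self.data_nodes[parent_id % len(self.data_nodes)]
        self.parent_location[parent_id] = node
        return node

    def insert_child(self, parent_id: int) -> str:
        # A child row follows its parent: reuse the node where the parent lives.
        if parent_id not in self.parent_location:
            raise KeyError(f"parent row {parent_id} does not exist")
        return self.parent_location[parent_id]


if __name__ == "__main__":
    router = ERRouter(["dn1", "dn2"])
    router.insert_parent(11)        # hypothetical orders row 11 goes to dn2
    print(router.insert_child(11))  # its order_items rows go to dn2 as well
```

A dictionary lookup is used here for clarity; when the child's join key is the parent's shard column, the same modulo rule could be applied directly instead. Either way, a join between a parent row and its children touches a single node.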
4) Global table
In real business scenarios there are often many dictionary-like tables. Their data changes infrequently and their size is small, rarely exceeding a few hundred thousand records.
When the business tables are sharded because of their size, join queries between the business tables and these auxiliary dictionary tables become a thorny problem. MyCat solves this through data redundancy: every shard keeps a full copy of the data, and tables stored redundantly in this way are defined as global tables.
Data redundancy is not only a good way to solve cross-shard join queries, but also another important rule of data partitioning.
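A minimal sketch of the redundancy idea, assuming a hypothetical dictionary table keyed by integer ids: writes go to every copy, and a read can be served by any single node. This is a toy model, not MyCat's replication code.

```python
import random
from typing import Dict, List, Optional


class GlobalTable:
    """Toy model of a global table: every data node holds a full copy."""

    def __init__(self, data_nodes: List[str]) -> None:
        self.copies: Dict[str, Dict[int, str]] = {node: {} for node in data_nodes}

    def write(self, key: int, value: str) -> List[str]:
        # Every write is applied to all copies so that they stay identical.
        for replica in self.copies.values():
            replica[key] = value
        return list(self.copies)

    def read(self, key: int, node: Optional[str] = None) -> str:
        # Any single copy can answer, so a join with a sharded business table
        # can use the copy that sits on the same node as the business rows.
        if node is None:
            node = random.choice(list(self.copies))
        return self.copies[node][key]
```

The trade-off is visible in write(): each change costs one write per node, which is acceptable only because dictionary tables are small and rarely change.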
4.3 Sharding node (dataNode)
After the data is sharded, a large table is split across different shard databases, and the database in which each table shard resides is a sharding node.
4.4 Node host (dataHost)
After sharding, each sharding node does not necessarily have a machine of its own: several shard databases can live on the same machine, and the machine on which one or more sharding nodes reside is the node host. To avoid hitting the concurrency limit of a single host, sharding nodes with heavy read and write load should be spread as evenly as possible across different node hosts.
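In MyCat the assignment of data nodes to hosts is written by hand in the configuration. The sketch below only illustrates the placement guideline with a greedy "least-loaded host first" heuristic; the per-node load figures and host names are hypothetical.

```python
import heapq
from typing import Dict, List


def place_data_nodes(loads: Dict[str, float], hosts: List[str]) -> Dict[str, List[str]]:
    """Greedily assign data nodes to hosts, heaviest node first.

    Each data node goes to the host whose accumulated load is currently the
    smallest, so heavily loaded nodes tend to end up on different hosts.
    """
    heap = [(0.0, host) for host in hosts]  # (accumulated load, host name)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for node, load in sorted(loads.items(), key=lambda item: item[1], reverse=True):
        total, host = heapq.heappop(heap)
        placement[host].append(node)
        heapq.heappush(heap, (total + load, host))
    return placement


if __name__ == "__main__":
    loads = {"dn1": 90, "dn2": 80, "dn3": 10, "dn4": 5}
    print(place_data_nodes(loads, ["hostA", "hostB"]))
```

With these numbers the two busiest nodes, dn1 and dn2, land on different hosts, and the light nodes fill in around them.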
5. Introduction to MyCat's working principle
The most important verb in MyCat's working principle is "intercept": MyCat intercepts the SQL statements sent by the user and first analyzes them (sharding analysis, routing analysis, read-write splitting analysis, cache analysis and so on). It then sends the statements to the real backend databases, processes the returned results appropriately, and finally returns them to the user.
When MyCat receives a SQL statement, it first parses out the tables involved and looks up their definitions. If a table has a sharding rule, MyCat extracts the value of the shard field from the statement and applies the sharding function to obtain the list of shards the statement maps to. It then sends the statement to those shards for execution, and finally processes all the data returned by the shards and returns it to the client.
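The routing steps above can be sketched as follows. This is a toy illustration: MyCat uses a full SQL parser (the Druid parser mentioned earlier), whereas this sketch assumes a regex that only understands "SELECT ... FROM table [WHERE col = int]", a hypothetical TABLES definition, and a modulo rule.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical table definitions: logical table -> (shard column, data nodes).
TABLES: Dict[str, Tuple[str, List[str]]] = {
    "t_node": ("id", ["dn1", "dn2"]),
}

# Toy parser that only understands "SELECT ... FROM <table> [WHERE <col> = <int>]".
SQL_PATTERN = re.compile(
    r"select\s+.+?\s+from\s+(\w+)(?:\s+where\s+(\w+)\s*=\s*(\d+))?",
    re.IGNORECASE,
)


def route(sql: str) -> List[str]:
    """Return the data nodes that a statement must be sent to."""
    match = SQL_PATTERN.search(sql)
    if match is None:
        raise ValueError(f"unsupported SQL: {sql}")
    table, column, value = match.groups()
    shard_column, nodes = TABLES[table]
    if column == shard_column and value is not None:
        # The shard value is known, so the modulo rule picks exactly one node.
        return [nodes[int(value) % len(nodes)]]
    # No usable shard value: the statement must go to every node of the table.
    return list(nodes)


def execute(sql: str, run_on_node: Callable[[str, str], list]) -> list:
    """Send the statement to each routed node and concatenate the partial results."""
    merged: list = []
    for node in route(sql):
        merged.extend(run_on_node(node, sql))
    return merged
```

A statement such as select * from t_node where id = 3 is routed to dn2 only, while select * from t_node is broadcast to both nodes. Real result processing also has to handle ORDER BY, GROUP BY, LIMIT and aggregate functions, which this sketch ignores.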
VI. MyCat configuration
schema.xml is one of MyCat's most important configuration files; it manages the logical libraries, sharded tables, sharding nodes, node hosts and other information.
server.xml is the configuration file for system parameters. To master MyCat tuning, you must be familiar with its configuration items.
The sequence configuration files are used to configure global sequences.
6.1 The server.xml configuration file
server.xml contains MyCat's system configuration; the corresponding source code is SystemConfig.java. Its two most important tags are user and system, and mastering the attributes of the system tag is the key to tuning MyCat.
[server.xml example listing omitted: only bare property values survived extraction, without their XML tags.]
6.2 The schema.xml configuration file
As one of MyCat's most important configuration files, schema.xml covers MyCat's logical libraries, tables, sharding rules, sharding nodes and data sources.
1) schema tag
The schema tag defines a logical library in a MyCat instance. MyCat can have multiple logical libraries, each with its own configuration, and schema tags are used to separate them. If no schema tags are configured, all table configurations belong to the same default logical library.
A logical library is equivalent to a Database in MySQL: when, for example, two different logical libraries are configured, querying the tables in one of them requires switching to that logical library first.
checkSQLschema attribute: when it is set to true and we execute select * from TESTDB.travelrecord;, MyCat removes the schema qualifier and rewrites the statement to select * from travelrecord;, so that the backend database does not return an error.
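The rewrite can be approximated in a few lines. This is only a regex sketch of the behavior described for checkSQLschema, not MyCat's code; the function name strip_schema is hypothetical, and the pattern ignores quoting, comments and string literals.

```python
import re


def strip_schema(sql: str, schema: str) -> str:
    """Remove "<schema>." qualifiers for one logical schema from a statement.

    Regex approximation only: it ignores quoting, comments and string literals.
    """
    pattern = re.compile(rf"\b{re.escape(schema)}\.(?=\w)")
    return pattern.sub("", sql)


if __name__ == "__main__":
    print(strip_schema("select * from TESTDB.travelrecord;", "TESTDB"))
    # select * from travelrecord;
```

Qualifiers that name a different schema are left untouched, so only the logical library's own prefix is removed.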
sqlMaxLimit attribute: when it is set to a value, MyCat automatically appends a LIMIT clause with that value to any executed statement that has no LIMIT clause of its own. If it is not set, MyCat returns all rows matched by the query.
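A minimal sketch of the sqlMaxLimit behavior, assuming (as an illustration choice, since LIMIT is meaningless on INSERT) that only SELECT statements are rewritten. The function name apply_sql_max_limit is hypothetical.

```python
import re
from typing import Optional

LIMIT_PATTERN = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def apply_sql_max_limit(sql: str, sql_max_limit: Optional[int]) -> str:
    """Append LIMIT <sql_max_limit> to a SELECT that has no LIMIT clause."""
    if sql_max_limit is None:
        return sql  # attribute not set: the statement passes through unchanged
    body = sql.strip().rstrip(";").rstrip()  # the trailing semicolon is dropped
    if not body.lower().startswith("select") or LIMIT_PATTERN.search(body):
        return sql  # not a SELECT, or it already limits itself
    return f"{body} LIMIT {sql_max_limit}"
```

With sqlMaxLimit set to 100, select * from travelrecord becomes select * from travelrecord LIMIT 100, while a query that already says limit 5 is left alone.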
2) table tag
The table tag defines the logical tables in MyCat; every table that needs to be split must be defined with a table tag.
The main attributes of the table tag are listed in the table below, as described on the official MyCat website http://www.MyCat.io.
Attribute name | Value type | Quantity limit
name | String | 1
dataNode | String | 1..*
rule | String | 0..1
ruleRequired | boolean | 0..1
primaryKey | String | 1
type | String | 0..1
autoIncrement | boolean | 0..1
subTables | String | 1
needAddLimit | boolean | 0..1
3) childTable tag
The childTable tag defines a child table in E-R sharding; attributes on the tag associate it with its parent table.
4) dataNode tag
The dataNode tag defines a data node in MyCat, which is what is commonly called a data shard. Each dataNode tag represents one independent data shard.
5) dataHost tag
The dataHost tag is the lowest-level tag in MyCat's logical library configuration; it directly defines the concrete database instances, read-write splitting and the heartbeat statement.
The heartbeat tag configures the heartbeat check statement: MySQL can use select user(), Oracle can use select 1 from dual, and so on.
The writeHost and readHost tags configure the write and read instances, and multiple of each can be configured within one dataHost. However, if the backend database specified by a writeHost goes down, all readHosts bound to that writeHost also become unavailable. On the other hand, MyCat automatically detects that the writeHost is down and switches to a standby writeHost.
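The failover rules can be modeled with a small state machine. This is a toy sketch, not MyCat's implementation: the host names (hostM1, hostS1 and so on) are hypothetical, heartbeat results are fed in by hand, and only readHosts are treated as read targets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WriteHost:
    name: str
    read_hosts: List[str] = field(default_factory=list)
    alive: bool = True  # updated from the result of the heartbeat statement


class DataHost:
    """Toy model of a dataHost with a primary and standby writeHosts."""

    def __init__(self, write_hosts: List[WriteHost]) -> None:
        self.write_hosts = write_hosts
        self.current = 0  # index of the writeHost currently receiving writes

    def record_heartbeat(self, name: str, ok: bool) -> None:
        # e.g. the outcome of running "select user()" against a MySQL backend
        for host in self.write_hosts:
            if host.name == name:
                host.alive = ok
        if not self.write_hosts[self.current].alive:
            self._switch()

    def _switch(self) -> None:
        # Move writes to the first writeHost that is still alive.
        for index, host in enumerate(self.write_hosts):
            if host.alive:
                self.current = index
                return
        raise RuntimeError("no writeHost available")

    def write_target(self) -> str:
        return self.write_hosts[self.current].name

    def read_targets(self) -> List[str]:
        # readHosts bound to a dead writeHost become unavailable with it.
        targets: List[str] = []
        for host in self.write_hosts:
            if host.alive:
                targets.extend(host.read_hosts)
        return targets
```

After a failed heartbeat on hostM1, writes move to hostM2 and hostS1 drops out of the read targets together with its writeHost.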
7. MyCat sharding
In MyCat, tables fall into two broad categories. Tables with a small amount of data that do not need to be split are called non-sharded tables. Tables whose data volume is too large for a single database to handle, in terms of both performance and capacity, and which must therefore be distributed evenly across different databases through horizontal sharding, are called sharded tables. Ultimately, what the middleware has to do is split the data and aggregate the results.
7.1 ER relationship sharded tables
The ER model is the entity-relationship model, a widely used conceptual modeling method whose basic elements are entities, relationships and attributes. MyCat brings it into its sharding rules so that interdependent tables are placed on the same node according to a given rule, which avoids cross-database join queries. The configuration details are not covered here; please refer to the official documentation. https://www.jianshu.com/p/fc56f6221728
8. Functions and advantages
8.1 Common commands
MyCat provides a MySQL-like management and monitoring interface: you can log in to the management port (9066) with the MySQL command-line client and execute the corresponding SQL-style commands, or manage it remotely over JDBC.
The reload @@config command reloads the configuration files, so that updated settings take effect without a restart.
reload @@sqlstat turns SQL monitoring and analysis on or off.
show @@database lists MyCat's databases; the result corresponds to the schema child nodes of schema.xml.
show @@datanode lists MyCat's data nodes; the result corresponds to the dataNode nodes of schema.xml.
show @@heartbeat reports the heartbeat status.
show @@connection shows the status of MyCat's front-end connections.
kill @@connection id,id,id closes the specified connections.
show @@cache shows the cache.
show @@datasource shows the status of the data sources, which can be switched if master-slave or multi-master is configured.
switch @@datasource name:index switches the data source.
show @@syslog limit displays the system log.
show @@sql shows the statements that have been executed in MyCat.
show @@sql.slow shows slow SQL statements.
show @@sql.sum shows overall SQL execution statistics, such as the read/write ratio.
8.2 Limitations
DELETE operations are not supported on tables without a primary key. The rows of such tables may be stored in a different order on different nodes, so executing select ... limit ... can return different result sets.
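The effect is easy to reproduce in a toy model: two hypothetical nodes hold the same key-less rows in different physical order, and a LIMIT without ORDER BY returns whatever comes first on the node that serves the query.

```python
from typing import Dict, List

# Hypothetical copies of the same key-less table, stored in a different
# physical order on each node.
NODE_COPIES: Dict[str, List[str]] = {
    "node1": ["apple", "banana", "cherry"],
    "node2": ["cherry", "apple", "banana"],
}


def select_limit(node: str, limit: int, order_by: bool = False) -> List[str]:
    """Emulate SELECT ... [ORDER BY ...] LIMIT n served by a single node."""
    rows = list(NODE_COPIES[node])  # without ORDER BY: storage order
    if order_by:
        rows.sort()
    return rows[:limit]


if __name__ == "__main__":
    print(select_limit("node1", 2))  # ['apple', 'banana']
    print(select_limit("node2", 2))  # ['cherry', 'apple']
```

With order_by=True both nodes return the same rows. Ordering by a column that contains duplicates would still not be fully deterministic; an ORDER BY on a unique key, such as a primary key, is what makes the result stable.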
XA transactions are not supported and may be rolled back on commit.
Because the cluster uses optimistic concurrency control, a transaction may be aborted at commit time: if two transactions write to the same row on different nodes of the cluster and both commit, the one that fails is aborted. For such cluster-level aborts, the cluster returns a deadlock error.
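A simplified model of "first committer wins" certification shows why the second transaction fails. Per-row version numbers and the DeadlockError class are illustrative assumptions; this is not the cluster's actual certification protocol.

```python
from typing import Dict, List


class DeadlockError(Exception):
    """Stands in for the deadlock error returned on a cluster-level abort."""


class OptimisticCluster:
    """Toy first-committer-wins certification over per-row version numbers."""

    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}  # row key -> committed version

    def begin(self, keys: List[str]) -> Dict[str, int]:
        # A transaction remembers the row versions it saw when it started.
        return {key: self.versions.get(key, 0) for key in keys}

    def commit(self, snapshot: Dict[str, int]) -> None:
        # Certification: abort if any row was committed by someone else meanwhile.
        for key, seen in snapshot.items():
            if self.versions.get(key, 0) != seen:
                raise DeadlockError(f"row {key!r} changed since the transaction began")
        for key in snapshot:
            self.versions[key] = self.versions.get(key, 0) + 1
```

If t1 and t2 both begin on row1 (say on different nodes) and t1 commits first, t2's commit raises DeadlockError: no lock was ever waited on, but the conflict is only detected at commit time.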
The write throughput of the entire cluster is limited by the weakest node, and if one node becomes slow, the entire cluster becomes slow.
Within the cluster, ids are generated by an auto-increment mechanism with a fixed step. For example, with three machines in the cluster, the ids on one machine may increase in steps of 3, such as 3, 6 and 9.
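The interleaving corresponds to MySQL's auto_increment_increment and auto_increment_offset system variables. The sketch below assumes three nodes with increment 3 and offsets 1, 2 and 3; node_ids is a hypothetical helper that simply generates each node's id sequence.

```python
from itertools import count, islice
from typing import List


def node_ids(offset: int, increment: int, how_many: int) -> List[int]:
    """Ids generated by one node: offset, offset + increment, offset + 2 * increment, ..."""
    return list(islice(count(offset, increment), how_many))


if __name__ == "__main__":
    for offset in (1, 2, 3):
        print(f"node {offset}:", node_ids(offset, 3, 3))
    # node 1: [1, 4, 7]
    # node 2: [2, 5, 8]
    # node 3: [3, 6, 9]
```

Because every node uses the same step but a different offset, the sequences never collide, at the cost of ids that are not consecutive on any single node.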
8.3 Comparison with Sharding-JDBC
MyCat is standalone third-party middleware, while Sharding-JDBC is a jar package. Because MyCat is deployed separately, using it is just like accessing a database, whereas Sharding-JDBC's sharding logic has to be written into the project.
The lightweight Sharding-JDBC suits standalone applications, while MyCat is more appropriate when multiple services need to access the database: Sharding-JDBC requires the sharding and related logic to be configured in every project, whereas MyCat only needs to be configured once, in its own separate deployment.
© 2024 shulou.com SLNews company. All rights reserved.