What is the principle of read-write separation in high-performance databases? 07/11 Update SLTechnology News&Howtos

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains read-write separation in high-performance databases: how it works, the problems it introduces, and the common ways to implement it.

Although storage technologies of many kinds have developed rapidly over the past decade, relational databases remain the core storage system in most business systems because of their ACID guarantees and powerful SQL querying. In many scenarios, high-performance design comes down to the design of the relational database.

Whether to meet the needs of business growth or to stay competitive, relational database vendors (Oracle, DB2, MySQL, etc.) have put a great deal of work into optimizing the performance of a single database server. However, business and data grow far faster than vendors can optimize, especially since the rise of Internet businesses with massive user bases and massive data volumes, and a single database server can rarely meet business needs. Database clusters must be considered to improve performance.

The first approach to a high-performance database cluster is "read-write separation", which distributes access pressure across multiple nodes in the cluster but does not distribute storage pressure. The second is "database and table sharding", which distributes both access pressure and storage pressure. This article looks at read-write separation.

Principle of read-write separation

The basic principle of read-write separation is to distribute database read and write operations to different nodes. The following is the basic architecture diagram.

The basic implementation of read-write separation is:

The database servers form a master-slave cluster, with either one master and one slave or one master and multiple slaves.

The database master handles both read and write operations; the slaves handle only read operations.

The database master synchronizes data to the slaves through replication, and every database server stores all of the business data.

The business server sends write operations to the database master and read operations to the database slaves.
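The four points above can be sketched as a small router. This is only an illustration: `FakeNode` and `ReadWriteRouter` are hypothetical names, the nodes record SQL instead of talking to a real database, and round-robin is just one possible way to spread reads over the slaves.

```python
import itertools


class FakeNode:
    """Stand-in for a database connection; records the SQL it receives."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql):
        self.log.append(sql)
        return self.name


class ReadWriteRouter:
    """Sends writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def write(self, sql):
        return self.master.execute(sql)

    def read(self, sql):
        return next(self._readers).execute(sql)


master = FakeNode("master")
slaves = [FakeNode("slave-1"), FakeNode("slave-2")]
router = ReadWriteRouter(master, slaves)

wrote_on = router.write("INSERT INTO users (name) VALUES ('alice')")
read_on = [router.read("SELECT * FROM users") for _ in range(4)]
print(wrote_on, read_on)
```

The write lands on the master only, while the four reads alternate between the two slaves.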

Note that the term used here is "master-slave cluster", not "master-standby cluster". A "slave" can be understood as a servant that helps the master with its work, so a slave has to serve read requests, while a "standby" is generally understood to provide only backup and not serve access. Whether "master-slave" or "master-standby" is the right term therefore depends on the scenario, and the two are not interchangeable.

The implementation logic of read-write separation is not complex, but two details introduce design complexity: master-slave replication delay and the distribution mechanism.

Replication delay

Taking MySQL as an example, master-slave replication delay can reach 1 second, and when a large amount of data is being synchronized, a delay of 1 minute is possible. This delay causes a problem: if the business server reads data immediately (within 1 second) after writing it to the master, and the read goes to a slave that has not yet received the data, the slave cannot return the latest data and business problems may follow. For example, a user who logs in immediately after registering may be told "you haven't registered yet" even though the registration has just succeeded.
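The registration example can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: `LaggingSlave` and `Master` are hypothetical in-memory classes, time is passed in explicitly as `now`, and the delay is a fixed 1 second rather than the variable delay of a real replication stream.

```python
class LaggingSlave:
    """Slave that applies each replicated write only after `delay` seconds."""

    def __init__(self, delay):
        self.delay = delay
        self.data = {}
        self._pending = []  # (time the write becomes visible, key, value)

    def replicate(self, now, key, value):
        self._pending.append((now + self.delay, key, value))

    def get(self, now, key):
        # Apply every write whose delay has elapsed, then answer the read.
        still_pending = []
        for visible_at, k, v in self._pending:
            if visible_at <= now:
                self.data[k] = v
            else:
                still_pending.append((visible_at, k, v))
        self._pending = still_pending
        return self.data.get(key)


class Master:
    """Master that stores a write and forwards it to the slave."""

    def __init__(self, slave):
        self.data = {}
        self.slave = slave

    def put(self, now, key, value):
        self.data[key] = value
        self.slave.replicate(now, key, value)


slave = LaggingSlave(delay=1.0)
master = Master(slave)

master.put(now=0.0, key="alice", value="registered")  # user registers at t=0
login_at_once = slave.get(now=0.2, key="alice")        # login 0.2 s later
login_later = slave.get(now=1.5, key="alice")          # login after the delay
print(login_at_once, login_later)
```

The read at 0.2 s finds nothing on the slave, which is the "you haven't registered yet" case; the read at 1.5 s sees the record.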

There are several common ways to resolve master-slave replication delays:

1. Send reads that follow a write to the database master

For example, after an account is registered, the read of that account during login is also sent to the database master. This approach is tightly coupled to the business and is intrusive to business code. If a new programmer does not know that the code has to be written this way, a bug results.
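A minimal sketch of this approach shows why it is intrusive. The `UserStore` class, the `use_master` parameter and the `just_registered` flag are hypothetical names chosen for illustration, and the slave is deliberately left stale to stand in for replication delay.

```python
class UserStore:
    """Hypothetical data access object whose callers choose the node explicitly."""

    def __init__(self):
        self.master = {}
        self.slave = {}  # replication is not simulated, so the slave stays stale

    def register(self, name, password):
        self.master[name] = password

    def find(self, name, use_master=False):
        node = self.master if use_master else self.slave
        return node.get(name)


def login(store, name, password, just_registered=False):
    # The caller must remember to force the master on read-after-write paths.
    stored = store.find(name, use_master=just_registered)
    if stored is None:
        return "you haven't registered yet"
    return "ok" if stored == password else "wrong password"


store = UserStore()
store.register("alice", "s3cret")
print(login(store, "alice", "s3cret"))                        # stale slave read
print(login(store, "alice", "s3cret", just_registered=True))  # forced master read
```

The first call forgets the flag and hits the stale slave; the second passes it and succeeds. Every business path that reads after a write needs the same care.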

2. Read from the master again after a read from the slave fails

This is commonly known as "secondary reading". Secondary reading is not coupled to the business; it only requires wrapping the underlying database access API. The implementation cost is low, but the drawback is that many secondary reads greatly increase the read pressure on the master. For example, if an attacker brute-forces accounts, the result is a large number of secondary reads, and the master may crash under the read load.
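Secondary reading can be sketched as a thin wrapper around the data access API. `SecondaryReadClient` is a hypothetical name, the two nodes are plain dictionaries, and a missing key stands in for a failed slave read.

```python
class SecondaryReadClient:
    """Reads the slave first and retries on the master when nothing is found."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave
        self.master_reads = 0  # counts how often the fallback hits the master

    def get(self, key):
        value = self.slave.get(key)
        if value is None:
            self.master_reads += 1
            value = self.master.get(key)
        return value


master = {"alice": "registered"}
slave = {}  # replication has not caught up yet
client = SecondaryReadClient(master, slave)

found = client.get("alice")  # missing on the slave, found on the master
for guess in range(1000):    # lookups for accounts that do not exist
    client.get(f"no-such-user-{guess}")
print(found, client.master_reads)
```

Every lookup for a nonexistent account falls through to the master, which is how a flood of bad requests turns into master read load. Note also that a fallback of this kind only triggers when the slave returns nothing; a slave that returns an outdated row is not detected.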

3. Send all reads and writes of critical business to the master, and use read-write separation only for non-critical business

For example, in a user management system, all reads and writes for registration and login go to the master, while services such as user introductions, hobbies and levels can use read-write separation. Even if a user changes their self-introduction and a query still shows the old one, the business impact is far smaller than being unable to log in, and it can be tolerated.
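The routing rule for this approach is short enough to write down directly. The operation names and the `CRITICAL_OPERATIONS` set below are assumptions made for illustration; a real system would decide the list per business.

```python
CRITICAL_OPERATIONS = {"register", "login"}  # assumed list of critical business


def choose_node(operation, is_write):
    """Return which node type should serve the given business operation."""
    if is_write or operation in CRITICAL_OPERATIONS:
        return "master"
    return "slave"


print(choose_node("login", is_write=False))
print(choose_node("view_profile", is_write=False))
print(choose_node("edit_profile", is_write=True))
```

Login reads stay on the master, profile reads go to a slave, and every write goes to the master regardless of how critical the business is.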

Distribution mechanism

There are generally two ways to tell read and write operations apart and route them to different database servers: program code encapsulation and middleware encapsulation.

1. Program code encapsulation

Program code encapsulation means abstracting a data access layer in the code (which is why some articles call this approach "middle layer encapsulation") that separates read and write operations and manages the database server connections. For example, a simple wrapper around Hibernate can implement read-write separation. The basic architecture is as follows:

Program code encapsulation has several characteristics:

It is simple to implement, and more customization can be done to fit the business.

It has to be implemented once per programming language and cannot be shared. If a business contains multiple subsystems written in different programming languages, the repeated development work is considerable.

After a failure, if a master-slave switch occurs, every system may need to modify its configuration and restart.
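As a rough sketch of such a data access layer, the code below classifies each statement by its first keyword and picks a node from a configuration loaded at start-up. The `DataAccessLayer` class, the `is_read` helper and the host names are hypothetical, and the classification rule is deliberately naive.

```python
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def is_read(sql):
    """Classify a statement by its first keyword (a deliberately naive rule)."""
    stripped = sql.strip()
    if not stripped:
        return False
    verb = stripped.split(None, 1)[0].upper()
    # SELECT ... FOR UPDATE takes locks, so treat it as a write.
    if verb == "SELECT" and stripped.upper().rstrip(" ;").endswith("FOR UPDATE"):
        return False
    return verb in READ_VERBS


class DataAccessLayer:
    """Business code calls route(); the layer picks the node."""

    def __init__(self, config):
        # The node list is fixed at start-up, so a master-slave switch
        # means editing this configuration and restarting.
        self.master = config["master"]
        self.slaves = list(config["slaves"])
        self._next = 0

    def route(self, sql):
        if not is_read(sql) or not self.slaves:
            return self.master
        node = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return node


dal = DataAccessLayer({"master": "db-master:3306",
                       "slaves": ["db-slave-1:3306", "db-slave-2:3306"]})
print(dal.route("SELECT name FROM users WHERE id = 1"))
print(dal.route("UPDATE users SET name = 'bob' WHERE id = 1"))
print(dal.route("select * from users where id = 1 for update;"))
```

Because the layer lives inside the application, the same logic would have to be rewritten for each language in use, and the fixed configuration illustrates why a master-slave switch can force a restart.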

Among current open source implementations, Taobao's TDDL (Taobao Distributed Data Layer, nicknamed "big head") is well known. It is a general data access layer, with all functions packaged in jar files for business code to call. Its basic principle is a JDBC DataSource implementation based on centralized configuration, with features such as master/standby switching, read-write separation and dynamic database configuration. The basic architecture is as follows:

(http://1.im.guokr.com/0Y5YjfjQ8eGOzeskpen2mlNIYA_b7DBLbGT0YHyUiLFZAgAAgwEAAFBO.png)

2. Middleware encapsulation

Middleware encapsulation means a separate system that implements the separation of read and write operations and the management of database server connections. The middleware exposes a SQL-compatible protocol to the business server, so the business server does not need to separate reads and writes itself. For the business server, accessing the middleware is no different from accessing the database; from its point of view, the middleware is a database server. Its basic architecture is as follows:

The characteristics of database middleware are:

It can support multiple programming languages, because database middleware provides a standard SQL interface to the business server.

Database middleware has to support the complete SQL syntax and the database server protocol (for example, the MySQL client-server connection protocol). The implementation is complex and full of details, bugs appear easily, and it takes a long time to become stable.

Database middleware does not perform the actual read and write operations itself, but every database request passes through it, so its own performance requirements are very high.

The business server is unaware of a database master-slave switch, because the middleware can detect the master-slave state of each database server. For example, it can write a row to a test table: the node where the write succeeds is the master, and the node where it fails is a slave.
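The test-table probe in the last point can be sketched with Python's built-in `sqlite3` module. This is an analogy rather than a MySQL implementation: a normal SQLite connection stands in for the master, a connection opened with `mode=ro` stands in for a slave, and the `probe` table and `is_master` function are hypothetical names. The technique only works when slaves actually reject writes (in MySQL, for example, when `read_only` is enabled on them).

```python
import pathlib
import sqlite3
import tempfile


def is_master(conn):
    """Probe a node by writing to a test table: success means master."""
    try:
        conn.execute("INSERT INTO probe (ts) VALUES (1)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.rollback()  # the probe row itself is never kept


# Two SQLite connections stand in for a writable master and a read-only slave.
db_file = pathlib.Path(tempfile.mkdtemp()) / "node.db"
writable = sqlite3.connect(db_file)
writable.execute("CREATE TABLE probe (ts INTEGER)")
writable.commit()
read_only = sqlite3.connect(db_file.as_uri() + "?mode=ro", uri=True)

print(is_master(writable), is_master(read_only))
```

Rolling back after the probe keeps the test table from growing; a middleware would run a check like this periodically against every node and update its routing when the result changes.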

Because database middleware is an order of magnitude more complex than program code encapsulation, the general recommendation is to use program code encapsulation or a mature open source database middleware. A large company can afford to invest staff in building its own database middleware, because once the system exists, the more business systems connect to it, the more development effort is saved and the greater its value.

Among current open source database middleware, MySQL officially provided MySQL Proxy, but MySQL Proxy never had a formal GA release, and MySQL now officially recommends MySQL Router. The main functions of MySQL Router include read-write separation, automatic failover, load balancing and connection pooling. Its basic architecture is as follows:

(https://dev.mysql.com/doc/mysql-router/2.1/en/images/mysql-router-positioning.png)

Qihoo 360 has also open-sourced its own database middleware, Atlas, which is built on MySQL Proxy. The basic architecture is as follows:

The following is the official introduction:

Atlas is middleware located between the application and MySQL. From the point of view of the back-end DB, Atlas is a client connecting to it; from the point of view of the front-end application, Atlas is a DB. Atlas implements both the client and the server side of the MySQL protocol: it communicates with the application as a server and with MySQL as a client. It shields the application from the details of the DB and maintains a connection pool to reduce the load on MySQL.

