SLTechnology News & Howtos (shulou.com) · Development · Updated 2025-01-18
This article explains how MySQL can support traffic at the scale of hundreds of millions of requests. It walks through read-write separation, master-slave replication, replication delay, and database access middleware.
1 Master-Slave Read-Write Separation
Most Internet workloads read far more than they write, so the first priority is letting the database serve a higher query volume. The first step is to distinguish read traffic from write traffic, which makes it possible to scale reads independently; this is master-slave read-write separation.
If a sudden surge in front-end traffic overloads the slave libraries, the DBA's first response is to add slave capacity so that read traffic spreads across more slaves and the load on each one drops; developers then try to absorb as much traffic as possible above the database layer.
Cache vs. MySQL read-write separation
Introducing a cache adds development and maintenance complexity: you must handle cache consistency, cache penetration, and avalanche prevention, and you must operate one more class of component. It is therefore advisable to try read-write separation first, and reach for a cache only when that is no longer enough.
1.1 Core idea
Master-slave read-write separation copies the data of one database into one or more replicas on other database servers:
The original database is the master, responsible for writes
The copy targets are the slaves, responsible for queries
The keys to read-write separation are therefore:
Copying the data
That is, master-slave replication
Hiding the change in database access that the master/slave split introduces
So that developers feel they are still using a single database
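The "single database" illusion above can be sketched as a tiny routing helper. Everything here is illustrative: the DSN strings and the route_statement function are invented for this sketch, not part of any real middleware.

```python
# Minimal sketch of read/write routing. The DSN strings below are
# hypothetical placeholders, not real servers.
PRIMARY_DSN = "mysql://primary:3306/app"        # all writes go here
REPLICA_DSNS = [
    "mysql://replica1:3306/app",                # reads are spread
    "mysql://replica2:3306/app",                # across the slaves
]

def route_statement(sql: str, replica_index: int = 0) -> str:
    """Return the address a statement should be sent to.

    SELECTs go to a slave; everything else (INSERT/UPDATE/DELETE/DDL)
    must go to the master so its binlog stays the single source of truth.
    """
    head = sql.lstrip().split(None, 1)[0].upper()
    if head == "SELECT":
        return REPLICA_DSNS[replica_index % len(REPLICA_DSNS)]
    return PRIMARY_DSN

assert route_statement("SELECT * FROM moments WHERE id = 1") == REPLICA_DSNS[0]
assert route_statement("UPDATE moments SET status = 1") == PRIMARY_DSN
```

Real middleware (see section 3) layers connection pooling, failover, and force-to-master hints on top of this kind of routing.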
2 Master-Slave Replication
MySQL's master-slave replication depends on the binlog, which records all changes on the master and is saved to disk as binary log files.
Master-slave replication transfers the binlog data from the master to the slaves, generally asynchronously: writes on the master do not wait for binlog synchronization to complete.
2.1 The replication process
When a slave connects to the master, it creates an I/O thread that requests updated binlog from the master and writes the binlog it receives into a relay log file; the master, for its part, creates a log dump thread to send binlog to the slave.
The slave also creates a SQL thread that reads the relay log and replays it locally, eventually bringing slave and master into consistency.
Using a dedicated log dump thread keeps the sending of binlog asynchronous, so replication does not slow the master's update path. On the slave side, received events are first written to a relay log rather than applied directly to storage, avoiding the cost of synchronous writes to the slave's actual storage, which would otherwise widen the delay between slave and master.
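As a purely illustrative model of these three threads, the following toy simulation uses Python lists in place of the binlog and relay log (real replication ships binary events over a network connection and runs the threads concurrently):

```python
# Toy simulation of asynchronous replication: the master appends to its
# binlog and returns immediately; the slave's "I/O thread" copies new
# events into a relay log, and its "SQL thread" replays them later.

primary_binlog = []   # events on the master (stand-in for the binlog)
relay_log = []        # events fetched by the slave's I/O thread
replica_data = {}     # slave state after the SQL thread replays

def primary_write(key, value):
    """Commit on the master: append to binlog, do NOT wait for the slave."""
    primary_binlog.append((key, value))

def io_thread_fetch():
    """Copy events the slave has not seen yet into the relay log."""
    relay_log.extend(primary_binlog[len(relay_log):])

def sql_thread_replay():
    """Replay relay-log events against the slave's storage."""
    for key, value in relay_log:
        replica_data[key] = value

primary_write("moment:1", "hello")
# Before the I/O and SQL threads catch up, the slave lags behind:
assert "moment:1" not in replica_data
io_thread_fetch()
sql_thread_replay()
assert replica_data["moment:1"] == "hello"
```

The gap between primary_write returning and sql_thread_replay finishing is exactly the master-slave delay discussed below.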
(Figure: the process of asynchronous master-slave replication)
For performance, the master's write path returns without waiting for master-slave synchronization to complete. In extreme cases, for example a disk failure or power loss before the binlog is flushed on the master, binlog is lost and master and slave diverge. The probability is low, however, and is usually tolerated.
If the master goes down and binlog is lost, the resulting master-slave inconsistency can only be repaired manually.
After master-slave replication is set up, you can:
Write only to the master
Read only from the slaves
This way, even if a write request locks a table or a row, it does not block read requests. Under high concurrency, multiple slaves can be deployed to share the read traffic; one master with multiple slaves supports highly concurrent reads.
A slave can also act as a backup, guarding against data loss if the master fails.
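Spreading reads across several slaves can be as simple as round-robin selection. The replica names below are hypothetical; a real system would load them from configuration and skip replicas that are down or lagging badly.

```python
from itertools import cycle

# Hypothetical slave addresses, cycled in round-robin order.
replicas = cycle(["replica1", "replica2", "replica3"])

def next_read_node() -> str:
    """Pick the next slave so read traffic is shared evenly."""
    return next(replicas)

# Four consecutive reads wrap around the three slaves:
assert [next_read_node() for _ in range(4)] == [
    "replica1", "replica2", "replica3", "replica1"]
```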
Can adding slaves without limit support ever higher concurrency?
No! The more slaves there are, the more I/O threads connect from them, and the master must create an equal number of log dump threads to serve replication. This consumes master resources and is also limited by the master's network bandwidth, so in practice a single master drives at most about three slaves.
2.2 Side effects of master-slave replication
Take posting a moment (a social-feed update) as an example. The operation involves:
Synchronous work
Such as updating the database
Asynchronous work
Such as forwarding the moment's content to the audit system
So after updating the master, the moment's ID is written to a message queue (MQ); a consumer then fetches the full moment from the database by that ID and sends it to the audit system.
If there is master-slave delay at this point, the moment is not yet visible on the slave, and the consumer hits an exception.
(Figure: impact of master-slave delay on the business)
2.3 Avoiding master-slave replication delay
What can we do? There are many solutions, but the core idea is the same: avoid querying the slave for freshly written data. For the case above, the options are:
2.3.1 Data redundancy
When sending to the MQ, include not just the moment's ID but all the moment data the consumer needs, so the consumer never has to re-query the database.
This scheme is recommended because it is simple, but it can make individual messages large, increasing transmission bandwidth and latency.
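The trade-off can be shown by comparing an ID-only message with a self-contained one. The record and its field names are invented for this illustration.

```python
import json

# Hypothetical moment record as it exists on the master at publish time.
moment = {"id": 42, "author": "alice", "text": "hello", "created_at": "2025-01-18"}

# ID-only message: small, but the consumer must query a (possibly lagging)
# slave for the rest, risking a "row not found" under replication delay.
msg_id_only = json.dumps({"moment_id": moment["id"]})

# Redundant message: everything the consumer (e.g. the audit step) needs,
# so it never has to touch the database. Larger, but immune to lag.
msg_full = json.dumps({"moment": moment})

assert len(msg_full) > len(msg_id_only)   # the trade-off: a bigger payload
consumed = json.loads(msg_full)["moment"]
assert consumed["text"] == "hello"        # no DB query needed on consume
```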
2.3.2 Using a cache
While synchronously writing the database, also write the moment's data to a cache; the consumer then reads the cache first, which keeps this flow consistent.
This scheme suits scenarios that only insert new data. In update scenarios, writing the cache first can cause inconsistency. For example, two threads update the same record concurrently:
Thread A updates the cache value to 1
Thread B updates the cache value to 2
Thread B then updates the DB value to 2
Thread A then updates the DB value to 1
The final DB value (1) disagrees with the cache value (2)!
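The race above can be replayed step by step in a deliberately simplified model: cache and db are plain dicts standing in for the cache and MySQL, and the "threads" are just this unlucky ordering of four writes.

```python
# Reproduce the update-cache-then-DB race from the text. Writers A
# (value 1) and B (value 2) interleave exactly as described; no locking.
cache = {}
db = {}

cache["moment:9"] = 1   # thread A updates the cache first ...
cache["moment:9"] = 2   # ... then thread B overwrites the cache ...
db["moment:9"] = 2      # ... B reaches the database first ...
db["moment:9"] = 1      # ... and A's slower DB write lands last.

# Result: cache says 2, database says 1 -- permanently inconsistent.
assert cache["moment:9"] == 2
assert db["moment:9"] == 1
```

Nothing later reconciles the two values, which is why this pattern is unsafe for updates.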
2.3.3 Querying the master
Instead of querying the slave, the consumer can query the master.
Use this carefully: make sure the query volume stays small enough for the master to bear, otherwise it puts serious pressure on the master.
Avoid this scheme unless there is no alternative: once an interface that queries the master exists, it is hard to guarantee that others will not abuse it.
Master-slave delay is also easy to overlook when troubleshooting.
Sometimes you hit the odd problem that data just written cannot be read back, and you start wondering whether some code path deleted it, only to find the data readable again a little later. That is almost always master-slave delay.
The slave's lag behind the master should therefore be treated as a key database metric, monitored and alerted on: normal lag is at the millisecond level; once it reaches the second level, an alert should fire.
So which metric, on which node, detects the delay? On the slave, run SHOW SLAVE STATUS\G and check the Seconds_Behind_Master value: it is the difference between the timestamp of the event the SQL thread is currently executing and the timestamp of the event the I/O thread has most recently copied.
However, if the I/O thread replicating the master's binlog is overloaded, Seconds_Behind_Master can stay at 0 and no alert fires, so judging delay by this value alone is not reliable. In practice you can also compare the binlog positions of master and slave.
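A lag check based on this value can be sketched as follows. The sample SHOW SLAVE STATUS\G output and the one-second alert threshold are illustrative; a production check would read the value through a MySQL client library and, per the caveat above, also compare binlog positions.

```python
# Sketch of a replication-lag check that parses `SHOW SLAVE STATUS\G`
# text output. The sample below is a hypothetical excerpt of that output.
SAMPLE_STATUS = """\
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 3
"""

def seconds_behind_master(status_text: str):
    """Extract Seconds_Behind_Master; None means it is absent or NULL
    (NULL indicates replication is broken, not that there is no lag)."""
    for line in status_text.splitlines():
        key, _, value = line.strip().partition(": ")
        if key == "Seconds_Behind_Master":
            return None if value == "NULL" else int(value)
    return None

def should_alert(lag, threshold_s: int = 1) -> bool:
    """Alert when lag is unknown (broken) or above the threshold."""
    return lag is None or lag > threshold_s

lag = seconds_behind_master(SAMPLE_STATUS)
assert lag == 3 and should_alert(lag)
```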
3 How to Access the Database
With master-slave replication copying data to multiple nodes and read-write separation achieved, the way the database is used also changes:
Previously, a single database address was enough
Now you need one master address plus several slave addresses, and must distinguish write operations from queries; combined with sharding ("splitting libraries and tables"), the complexity grows sharply
To reduce this implementation complexity, many database middleware products have emerged to solve the access problem. They fall roughly into two categories:
3.1 Embedded in the application
TDDL (Taobao Distributed Data Layer), for example, runs embedded in the application as code. It can be seen as a data-source proxy: its configuration manages multiple data sources, each corresponding to one database, which may be a master or a slave.
When a database request arrives, the middleware routes the SQL statement to the appropriate data source and returns the result.
Advantages
Simple to use and cheap to deploy: it is embedded in the application and runs with it, so it suits small teams with limited operations capacity.
Disadvantages
No multi-language support: TDDL is written in Java and cannot serve applications in other languages. Version upgrades also depend on users updating their dependency.
3.2 Independently deployed proxy layer
Examples include Mycat, Atlas, and DBProxy.
This kind of middleware is deployed on separate servers; to the business code it looks like a single database, while internally it manages many data sources. When a request arrives, it rewrites the SQL statement as necessary and forwards it to the designated data source.
Advantages
It generally speaks the standard MySQL wire protocol, so it can support clients in any language
Deployed independently, it is easy to maintain and upgrade, suiting medium and large teams with operations capability
Disadvantages
Every SQL statement crosses the network twice, from application to proxy and from proxy to data source, so there is some performance loss.