This article explains how to use LIST, RANGE, and HASH partitioning to disperse hot data. It aims to be concise and easy to follow; I hope you gain something from it.
Hot data is data that is accessed very frequently, such as trending events. Because of the amplifying effect of the network, traffic can reach hundreds of thousands or even millions of concurrent requests in a short time. For such scenarios, we need to analyze where the system bottleneck lies and choose an appropriate technical response.
Solution overview
High-concurrency architecture evolution
1. The difference between Figure 1 and Figure 2 is that the latter adds a web cache layer in the middle. This layer can be built with nginx + Lua + Redis. How hot data is dispersed at the cache layer will be covered in a later 'high concurrency' section.
2. Hot data can be intercepted at the web-layer cache, preventing a flood of requests from reaching the application servers. Non-hot data, however, penetrates the cache and its requests land on the DB, and thousands of QPS still put heavy pressure on the DB. We therefore need a way to keep requests timely, that is, to reduce the number of IO operations at the DB level.
Scenario classification
Hot-data concurrency falls into two scenarios: reads and writes. Most everyday high-concurrency scenarios are read-heavy. Whatever architecture is adopted, hot data must be dispersed at both the cache layer and the DB layer; this chapter focuses on the latter.
Principle analysis
We all know that dispersing hot data noticeably improves system performance. Why? Let's walk through some key facts about DB storage. MyISAM and InnoDB are the two MySQL storage engines we use most often, especially the latter; almost every table the author has encountered at work is defined with the InnoDB engine. What are the differences between the two, and when should each be used?
1. Reading data
MyISAM: its B-tree implementation is similar to InnoDB's, but MyISAM uses a non-clustered index: the index file and data file are separate. Reads are very efficient, and this is determined by the storage structure: data is stored sequentially and the tree's leaf nodes point to physical file addresses, so queries are fast.
InnoDB: implemented with a clustered index, organized by primary key, so an InnoDB table must have a column that uniquely identifies each row. The leaf nodes of the clustered index store the row data itself, while the leaf nodes of an InnoDB secondary index store primary-key values. A query through a secondary index therefore needs a second lookup back into the clustered index. Querying directly by the clustered index avoids this, but in most business scenarios we still have to rely on secondary indexes.
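To make the second lookup concrete, here is a minimal sketch with hypothetical table and column names (not from the original article):

-- Hypothetical InnoDB table illustrating clustered vs. secondary index lookups.
CREATE TABLE t_user (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- clustered index key
    username VARCHAR(64)  NOT NULL,
    email    VARCHAR(128) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_username (username)                        -- secondary index
) ENGINE=InnoDB;

-- One lookup: the clustered-index leaf already holds the whole row.
SELECT * FROM t_user WHERE id = 42;

-- Two lookups: idx_username's leaf stores only (username, id); InnoDB must
-- then re-read the clustered index by id to fetch the remaining columns.
SELECT * FROM t_user WHERE username = 'alice';

-- No second lookup: the secondary index alone covers these columns.
SELECT id, username FROM t_user WHERE username = 'alice';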
2. Writing data
MyISAM: transactions are not supported, and writes take priority over reads. Multi-threaded reads can run concurrently, and reads and inserts can run concurrently with parameter tuning, but reads and updates cannot. Locking is table-level.
InnoDB: supports transactions, allows concurrent reads and writes, and locks at row level; write performance is better than MyISAM's.
3. Data pages
The data page is InnoDB's basic storage unit. Its size is controlled by the innodb_page_size parameter, 16K by default (it can only be chosen when the MySQL data directory is first initialized). Given the clustered-index principle above, the index size and the size of a single row determine how many records fit in one page: the more records per page, the less often a query must read additional pages, and the fewer IO operations are needed.
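To check the page size on a running instance:

SHOW VARIABLES LIKE 'innodb_page_size';  -- returns 16384 (16K) by default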
Implementations
Vertical table splitting
Vertical splitting breaks a table apart by columns, either along business-function lines or by hot/cold usage. For example, a user table can be split according to hot and cold usage scenarios, as sketched below.
The point of vertical splitting is to slim down the hot table: when a single row is too long, a query touches more pages and causes excessive IO, and splitting reduces that.
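The original schematic is not reproduced here; as a hedged sketch under assumed column names, a hot/cold vertical split of a user table might look like this:

-- Hot part: short, frequently accessed columns stay together.
CREATE TABLE t_user_hot (
    user_id    BIGINT UNSIGNED NOT NULL,
    nickname   VARCHAR(64) NOT NULL,
    last_login DATETIME    NOT NULL,
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;

-- Cold part: long, rarely read columns are moved out of the hot row,
-- joined 1:1 with t_user_hot on user_id only when actually needed.
CREATE TABLE t_user_cold (
    user_id  BIGINT UNSIGNED NOT NULL,
    bio      TEXT,
    settings TEXT,
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;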
Horizontal table splitting
Horizontal splitting breaks a table apart by rows; after splitting, each table holds less data.
For example, take a table with 50 million rows: split it horizontally into 10 tables, and each holds 5 million rows.
Each index page is 16K by default, so with a fixed page size, the more records a table has, the more index pages it needs, and the more often a query must traverse additional pages.
Horizontal splitting addresses exactly this. It can be implemented via partitioning or via separate tables/databases (sharding), as introduced below; a partitioning sketch follows.
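As a sketch of the partition route named in the title (table and column names are hypothetical), the 50-million-row example could be split into 10 HASH partitions; RANGE and LIST use the same DDL shape:

-- HASH partitioning spreads rows evenly, which suits dispersing hot IDs.
CREATE TABLE t_order (
    id         BIGINT UNSIGNED NOT NULL,
    user_id    BIGINT UNSIGNED NOT NULL,
    created_at DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY HASH (id)
PARTITIONS 10;  -- ~5 million rows per partition for a 50M-row table

-- RANGE variant (groups rows by value ranges; MySQL requires the partition
-- key to be part of every unique key, so the PK becomes (id, created_at)):
-- PARTITION BY RANGE (YEAR(created_at)) (
--     PARTITION p2023 VALUES LESS THAN (2024),
--     PARTITION p2024 VALUES LESS THAN (2025),
--     PARTITION pmax  VALUES LESS THAN MAXVALUE
-- );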
Best practices
1. Separation of hot and cold data
Take an article content system as an example, with fields: title, author, category, creation time, like count, reply count, and last reply time.
1.1 Cold data: essentially static data that is read frequently but rarely or never changed. This kind of data demands high read performance and can be stored with the MyISAM engine.
1.2 Hot data: content that changes frequently and demands high concurrent read/write performance; we can store it with the InnoDB engine.
Using different storage engines for different usage scenarios yields relatively optimal performance. The article system's table is split into hot and cold parts; the structure after splitting is sketched below.
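The original figure of the split structure is not reproduced; the following is a hedged sketch consistent with the benchmark commands below (cms_blog_static appears in the test command; cms_blog_hot and the column names are assumptions):

-- Cold part: article content, read-heavy and rarely changed -> MyISAM.
CREATE TABLE cms_blog_static (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    title       VARCHAR(200) NOT NULL,
    author      VARCHAR(64)  NOT NULL,
    category    VARCHAR(64)  NOT NULL,
    create_time DATETIME     NOT NULL,
    PRIMARY KEY (id)
) ENGINE=MyISAM;

-- Hot part: counters updated on every like/reply -> InnoDB row locks.
CREATE TABLE cms_blog_hot (
    blog_id         BIGINT UNSIGNED NOT NULL,  -- matches cms_blog_static.id
    like_count      INT UNSIGNED NOT NULL DEFAULT 0,
    reply_count     INT UNSIGNED NOT NULL DEFAULT 0,
    last_reply_time DATETIME     NULL,
    PRIMARY KEY (blog_id)
) ENGINE=InnoDB;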
1.3 Performance comparison before and after splitting
Insert 100,000 rows and query the article table before and after splitting. We simulate 50 concurrent clients, 2,500 queries in total (50 per thread), with random IDs, which is close to a real query pattern. The split version clearly performs better:
Single table test after splitting:
mysqlslap -h127.0.0.1 -uroot -P3306 -p --concurrency=50 --iterations=1 --engine=myisam --number-of-queries=2500 --query='select * from cms_blog_static where id=RAND()*1000000' --create-schema=test
Test before splitting:
mysqlslap -h127.0.0.1 -uroot -P3306 -p --concurrency=50 --iterations=1 --engine=innodb --number-of-queries=2500 --query='select * from cms_blog where id=RAND()*1000000' --create-schema=test
2. Reduce the size of a single row
After splitting, can we shrink a single row further? Common techniques are summarized below:
2.1 Set reasonable field lengths
Different field types occupy different amounts of storage, as the following table shows:
Type        Length (bytes)                    Fixed/Variable
TINYINT     1                                 fixed
SMALLINT    2                                 fixed
MEDIUMINT   3                                 fixed
INT         4                                 fixed
BIGINT      8                                 fixed
FLOAT(m)    4 (m <= 24), 8 (25 <= m <= 53)    fixed
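As a hedged illustration (names are hypothetical), trimming oversized types lets more rows fit in each 16K page:

-- Right-sized columns: the same information in far fewer bytes per row
-- than BIGINT/VARCHAR(255) everywhere, so each 16K page holds more rows.
CREATE TABLE t_stat (
    metric_id INT UNSIGNED     NOT NULL,  -- 4 bytes instead of BIGINT's 8
    status    TINYINT UNSIGNED NOT NULL,  -- 1 byte for a small set of states
    label     VARCHAR(32)      NOT NULL,  -- bounded by the real max length
    PRIMARY KEY (metric_id)
) ENGINE=InnoDB;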