What is the practice of using Redis to optimize query performance

2025-02-25 Update From: SLTechnology News&Howtos


Many newcomers are unclear about how to use Redis to optimize query performance. To help with that, this article walks through a practical case in detail; I hope you get something out of it.

Application background

An application needs to upload a set of IDs to the server and query the data corresponding to those IDs. The database stores 70 million records, and each request usually carries several hundred to several thousand IDs.

Previous solution

The data was stored in Oracle, with an index on ID.

At query time, the uploaded IDs were first inserted into a temporary table, and the result was then obtained with a join against that table.

The advantage of this is that it reduces the number of queries (each ID does not have to be queried individually) and reduces SQL parsing time (only one query SQL is executed, although inserting the IDs adds some SQL processing time of its own).

However, there is still considerable room for improvement in this design: as concurrent queries increase, database response time grows. Even with the index, each ID lookup costs O(log n), so querying m IDs costs roughly O(m log n) in total. I don't know what optimizations Oracle applies to the join, but they are unlikely to change this asymptotic complexity.

Solution method

When database reads become the bottleneck, the first thing that comes to mind is an in-memory database or cache. Redis is the preferred choice: it is a key-value database with rich data structures, and a value can be a STRING (string), HASH (hash), LIST (list), SET (set), or ZSET (sorted set).

First, the data needs to be reshaped into a key-value schema. The simplest approach maps each ID to a string value. But one ID can correspond to multiple records, and each record can contain multiple fields, so the data has to be reassembled: either concatenated into a single string, or stored as a list, hash, or set.

Redis implements its top-level key space as a hash table, which carries noticeable overhead. In my scenario there are tens of millions of IDs; if each ID becomes a top-level key as above, memory consumption is huge. Redis's own bookkeeping for each key-value pair is on the order of 80+ bytes, so even if the value is a bare number stored as a long integer, 10 million entries will still consume roughly 1 GB of memory.

Optimize memory with two levels of Hash

Following the memory-optimization advice in the official documentation and in the well-known Instagram engineering post on Redis, the IDs are segmented: a truncated ID serves as the first-level key, whose value is a hash. Each hash should hold only a small number of entries (around 1000 is recommended), so the last 3 digits of the ID serve as the second-level key (the hash field).
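The key scheme above can be sketched in a few lines. This is a minimal illustration, not the article's actual code: the `item:` prefix, bucket size of 1000, and helper name are assumptions.

```python
# Sketch of the two-level hash scheme: the ID minus its last 3 digits picks
# the first-level key (a Redis hash), and the last 3 digits pick the field.
BUCKET_SIZE = 1000  # entries per first-level hash (recommended ~1000)

def split_id(item_id: int):
    """Split an ID into a first-level key and a second-level hash field."""
    bucket = item_id // BUCKET_SIZE   # ID without its last 3 digits
    field = item_id % BUCKET_SIZE     # last 3 digits of the ID
    return f"item:{bucket}", str(field)
```

With a redis-py client `r`, storing and fetching would then look like `r.hset(*split_id(1234567), value)` and `r.hget(*split_id(1234567))`, keeping each hash small enough to stay ziplist-encoded.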

To save memory, Redis by default uses ziplist (compressed list) to encode small HASH (hash), LIST (list), and ZSET (sorted set) structures, automatically converting them to hashtable (hash table), linkedlist (doubly linked list), or skiplist (skip list) once certain thresholds are exceeded.

A ziplist is a sequentially allocated structure that emulates a doubly linked list. As the name implies, it reduces the storage of a doubly linked list mainly by eliminating the pointer storage: storing a pointer to the previous node and another to the next costs 8 bytes each, whereas storing the previous entry's length and the current entry's length saves a lot of space in most cases (as little as 2 bytes at best). The trade-off is that every time an element is added, memory must be reallocated. (Paraphrased from the description quoted here.)

For more information on ziplist, please see the Redis book ziplist section.

The conversion thresholds can be seen in Redis's .conf file:

# Hashes are encoded using a memory efficient data structure when they have a
# small number of entries, and the biggest entry does not exceed a given
# threshold. These thresholds can be configured using the following directives.
hash-max-ziplist-entries 512
hash-max-ziplist-value 64

# Similarly to hashes, small lists are also encoded in a special way in order
# to save a lot of space. The special representation is only used when
# you are under the following limits:
list-max-ziplist-entries 512
list-max-ziplist-value 64

# Similarly to hashes and lists, sorted sets are also specially encoded in
# order to save a lot of space. This encoding is only used when the length and
# elements of a sorted set are below the following limits:
zset-max-ziplist-entries 128
zset-max-ziplist-value 64

Lookup in a ziplist is O(N), but since each second-level hash holds only a small amount of data, query speed remains effectively O(1).

Re-encode the data stored in the second level Hash

In my application scenario, the data behind each ID can have many fields, and many of those fields are themselves typed data referenced by numeric IDs. To save further memory, these numeric ID fields are base62-encoded (using the characters 0-9, a-z, and A-Z), which makes them shorter and further reduces the amount of data the second-level hash has to store in Redis, thereby reducing Redis's memory footprint.
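A minimal base62 encoder/decoder pair, sketching the re-encoding step described above. The alphabet order is an assumption; any fixed order works as long as encoding and decoding agree.

```python
# Base62 re-encoding of numeric ID fields: a decimal number is rewritten in
# base 62, shortening it (e.g. 10 decimal digits become ~6 characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def b62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def b62_decode(s: str) -> int:
    """Decode a base62 string back to the original integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Since the stored values are just strings to Redis, the application encodes before HSET and decodes after HGET.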

Using Lua scripts to handle batch operations

Each query uploads hundreds to thousands of IDs; calling HGET separately for each of them would mean that many TCP round trips per query, which is very slow. It is far better to send the whole batch to the Redis server at once and have the server execute the HGET commands in turn. Redis offers two mechanisms for such batch operations: pipelining and Lua scripting (Redis 2.6 or above). Since Lua scripts are easier to work with here, I chose the Lua script.

A Redis Lua script executes atomically: while it runs, the Redis server is effectively locked, so all commands in the script complete before any other request is handled, and there is no need to worry about concurrent reads and writes to shared cached data. In fact, every Redis command is atomic in this sense; because the server executes commands one at a time, any command, including INFO, occupies the server while it runs.

However, it should be noted that:

To prevent a script from making Redis unavailable (for example by falling into an infinite loop), Redis provides the lua-time-limit parameter, which caps a script's maximum running time and defaults to 5 seconds (see the .conf file). When a script exceeds this limit, Redis starts accepting other commands but does not execute them (to preserve the script's atomicity, the script itself is not terminated); instead it returns a "BUSY" error. (Paraphrased from the description quoted here.)

In that case, the SCRIPT KILL command is needed to terminate the Lua script. So take care that Lua scripts contain no infinite loops and are not used for time-consuming operations.
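The batch lookup can be sketched as one Lua script that runs every HGET on the server side in a single round trip. The key scheme (bucket of 1000, `item:` prefix), helper names, and script text below are illustrative assumptions, not the article's actual code.

```python
# One EVAL call replaces hundreds of individual HGET round trips.
BUCKET_SIZE = 1000

def to_key_field(item_id: int):
    """Map an ID to its first-level key and second-level hash field."""
    return f"item:{item_id // BUCKET_SIZE}", str(item_id % BUCKET_SIZE)

# KEYS[i] is the first-level key and ARGV[i] the hash field for the i-th ID.
BATCH_HGET_LUA = """
local results = {}
for i = 1, #KEYS do
    results[i] = redis.call('HGET', KEYS[i], ARGV[i])
end
return results
"""

def batch_get(r, ids):
    """Fetch many IDs in one EVAL call (r is a redis-py client)."""
    keys, fields = zip(*(to_key_field(i) for i in ids))
    return r.eval(BATCH_HGET_LUA, len(keys), *keys, *fields)
```

Because the whole loop runs inside one script, it should stay short; a request of a few hundred HGETs is fine, but a script looping over millions of entries would risk hitting lua-time-limit.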

Performance analysis

Test environment:

Memory: 1333MHz

CPU:Intel Core i3 2330M 2.2GHz

Hard disk: Samsung SSD

The basic setup of the experiment:

The 70 million records were stored in Redis using the two-level hash and data re-encoding described above.

Data requests were simulated by direct function calls (no HTTP requests), querying the data and generating the JSON response.

(The figures are for reference only, since the test did not go through a real web server.)

With the above method, the memory savings in Redis are substantial.

Lab setup:

Each request queries 500 IDs, with requests issued continuously in batches, to simulate query performance under concurrent load.

Response time is almost linear in the number of IDs queried: 2,000 requests totaling 1,000,000 (100W) ID lookups complete in about 30 seconds. Oracle was too slow to be worth including in this test.

Lab setup:

10,000 (1W) IDs were queried continuously, 500 at a time, in 20 batches, to test how the amount of data stored in Redis affects query performance.

Query speed is only mildly affected by the amount of data stored: with more data, each second-level hash holds more entries and query time rises slightly, but it remains very fast.
