How to save time series data in Redis

2025-02-23 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/02

This article explains how to save time series data in Redis and how to choose between the available approaches.

When building Internet products, we often need to record users' click behavior on a website or app in order to analyze it. Each record generally includes the user's ID, the type of behavior (such as browsing, logging in, or placing an order), and the timestamp of the behavior:

UserID, Type, TimeStamp

The data access requirements of an Internet of Things (IoT) project I worked on were very similar. We needed to periodically collect the real-time status of nearly 10,000 devices, including the device ID, pressure, temperature, humidity, and the corresponding timestamp:

DeviceID, Pressure, Temperature, Humidity, TimeStamp

Data like this, tied to the time at which it occurred, is time series data. It has no strict relational model, and each record can be expressed as a key-value relationship (for example, a device ID maps to a record), so it does not need to be stored in a relational database such as MySQL. The key-value data model of Redis fits this access pattern well, and Redis offers two solutions: one based on its built-in data structures and one based on an extension module.

In this lesson, I will use the collection of device status metrics in an IoT scenario as an example to discuss how each solution works and what its advantages and disadvantages are.

As the saying goes, "know yourself and know your enemy, and you will win a hundred battles." Let's start with the read and write characteristics of time series data and see which data types suit it.

Reading and writing characteristics of time series data

In practical applications, time series data is usually written continuously and with high concurrency; for example, the real-time status values of tens of thousands of devices need to be recorded continuously. Writes are mainly inserts of new data rather than updates of existing data: once a time series record is written it usually does not change, because it represents the state of a device at a particular moment (for example, a temperature measurement taken at a certain time does not change after it is recorded).

Therefore, the write pattern of this kind of data is simple: data must be inserted quickly. This requires us to choose data types whose insert operations have low complexity and, as far as possible, do not block. You may immediately think of the String and Hash types of Redis, because their insertion complexity is O(1), which makes them good candidates. However, as I said in Lecture 11, when the String type stores small values (such as the device temperature in this example), the per-key metadata takes up a disproportionate amount of memory, so it is not suitable for storing large amounts of data.

Let's take a look at the characteristics of the "read" operation of time series data.

When we query time series data, there are queries for a single record (such as the running status of a device at a certain time, which corresponds to one record of that device) and queries for data within a time range (for example, the status of all devices from 8:00 to 10:00 every morning).

In addition, there are more complex queries, such as aggregating the data within a time range. Aggregation here means computing over all the data that meets the query conditions: the mean, the maximum or minimum, the sum, and so on. For example, we may need to calculate the maximum pressure of a device over a period of time to determine whether a fault has occurred.
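To make the aggregate query concrete, here is a minimal Python sketch that computes the maximum, minimum, mean, and sum over a time range. The function name `aggregate` and the pressure readings are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Hypothetical pressure readings for one device: (timestamp, value) pairs.
readings = [
    (202008030905, 101.2),
    (202008030906, 101.9),
    (202008030907, 103.4),
    (202008030908, 102.7),
    (202008030909, 101.5),
]


def aggregate(samples, start, end):
    """Return max, min, avg and sum of the values with start <= timestamp <= end."""
    values = [value for ts, value in samples if start <= ts <= end]
    if not values:
        return None  # nothing recorded in the requested range
    return {
        "max": max(values),
        "min": min(values),
        "avg": mean(values),
        "sum": sum(values),
    }


summary = aggregate(readings, 202008030906, 202008030908)
```

A monitoring job could compare `summary["max"]` with a threshold to decide whether a fault has occurred.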

In short, the "read" side of time series data is characterized by many query patterns.

Now that we understand the read and write characteristics of time series data, let's look at how to save it in Redis. The "write fast" requirement is met directly by Redis's high-performance writes. For the "many query patterns" requirement, that is, support for point queries, range queries, and aggregate calculations, Redis provides two schemes: one based on Hash and Sorted Set, and one based on the RedisTimeSeries module.

Next, let's learn about the first scheme.

Saving time series data based on Hash and Sorted Set

The combination of Hash and Sorted Set has one obvious benefit: both are built-in Redis data types, with mature code and stable performance. A system that stores time series data with these two types can therefore be expected to be stable.

However, in the scenarios we studied earlier we always used a single data type to store data, so why use two types at once for time series data? This is the first question we have to answer.

With regard to the Hash type, we all know that one of its features is that it can quickly query a single key. This meets the demand of single-key query for time series data. We can use the timestamp as the key of the Hash collection and the recorded device status value as the value of the Hash collection.

(Figure: device temperature values recorded in a Hash collection, with timestamps as keys and temperature values as values.)

When we want to query the temperature at one point in time or at several points in time, we can use the HGET command or the HMGET command to get the values of one key or of multiple keys in the Hash collection, respectively.

For instance, we use the HGET command to query the temperature at 202008030905, and the HMGET command to query the temperatures at 202008030905, 202008030907 and 202008030908, as shown below:

    HGET device:temperature 202008030905
    "25.1"
    HMGET device:temperature 202008030905 202008030907 202008030908
    1) "25.1"
    2) "25.9"
    3) "24.9"

You see, it's easy to use the Hash type to implement a single-key query. However, the Hash type has a drawback: it does not support range querying of data.

Although the time series data is inserted into the Hash collection in increasing time order, the underlying structure of the Hash type is a hash table, which does not index the data in order. To run a range query against a Hash, you therefore need to scan all the data in the collection, return it to the client, sort it there, and only then pick out the data within the queried range. Obviously, this is very inefficient.
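The difference between the two access paths can be sketched in Python, with a plain dict standing in for the Redis Hash. This is only a model of the behavior, not Redis client code; the function names are hypothetical and the values follow the temperature examples in this article.

```python
# A plain dict stands in for the Redis Hash: field = timestamp, value = temperature.
# Fields are deliberately stored out of order, as a hash table gives no ordering.
device_temperature = {
    "202008030908": "24.9",
    "202008030905": "25.1",
    "202008030910": "25.2",
    "202008030907": "25.9",
    "202008030909": "25.3",
}


def point_query(hash_map, timestamp):
    """Comparable to HGET: a single hash lookup."""
    return hash_map.get(timestamp)


def range_query_via_full_scan(hash_map, start, end):
    """What a client has to do after fetching the whole Hash:
    filter every field, then sort the survivors by timestamp.
    String comparison is safe here because all timestamps have the same width."""
    selected = [(ts, value) for ts, value in hash_map.items() if start <= ts <= end]
    return sorted(selected)


in_range = range_query_via_full_scan(device_temperature, "202008030907", "202008030909")
```

The point query touches one entry, while the range query inspects every entry regardless of how narrow the range is.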

In order to support queries by timestamp range at the same time, you can use Sorted Set to save time series data because it can be sorted according to the weight score of the element. We can take the timestamp as the element score of the Sorted Set collection and the data recorded at the point in time as the element itself.

I will take device temperature data as an example.

(Figure: device temperature data saved in a Sorted Set collection, with timestamps as scores and temperature values as elements.)

After saving the data using Sorted Set, we can use the ZRANGEBYSCORE command to query the temperature values in this time range according to the maximum and minimum timestamps entered. As shown below, let's look at all the temperature values between 09:07 and 09:10 on August 3, 2020:

    ZRANGEBYSCORE device:temperature 202008030907 202008030910
    1) "25.9"
    2) "24.9"
    3) "25.3"
    4) "25.2"
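Why a score-ordered structure makes range queries cheap can be sketched with Python's bisect module. The class `TinySortedSet` is a hypothetical toy: it keeps entries ordered by score and finds both ends of the range by binary search. It does not reproduce the real Sorted Set implementation, and unlike Redis it does not enforce unique members.

```python
import bisect


class TinySortedSet:
    """Toy stand-in for a Sorted Set: members kept in score order."""

    def __init__(self):
        self._scores = []   # sorted list of scores
        self._members = []  # members, parallel to self._scores

    def zadd(self, score, member):
        # Insert at the position that keeps the scores sorted.
        index = bisect.bisect_right(self._scores, score)
        self._scores.insert(index, score)
        self._members.insert(index, member)

    def zrangebyscore(self, min_score, max_score):
        # Binary search for both ends of the range, then slice.
        low = bisect.bisect_left(self._scores, min_score)
        high = bisect.bisect_right(self._scores, max_score)
        return self._members[low:high]


temperatures = TinySortedSet()
for score, member in [
    (202008030909, "25.3"),
    (202008030905, "25.1"),
    (202008030910, "25.2"),
    (202008030907, "25.9"),
    (202008030908, "24.9"),
]:
    temperatures.zadd(score, member)

in_range = temperatures.zrangebyscore(202008030907, 202008030910)
```

Only the entries inside the range are touched once the two boundaries are found, in contrast to the full scan needed with a Hash.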

Now we know that using Hash and Sorted Set at the same time can meet the data query requirements of a single point in time and within a time range, but we will be faced with a new question, that is, the second question we want to answer: how to ensure that writing Hash and Sorted Set is an atomic operation?

An "atomic operation" here means that when we perform multiple write commands (such as writing data to the Hash and the Sorted Set with the HSET command and the ZADD command, respectively), either all of them are completed or none of them is.

Only by ensuring the atomicity of the write operation can we ensure that the same time series data is either saved or not saved in Hash and Sorted Set. Otherwise, it is possible that there is time series data in the Hash collection, but not in Sorted Set, so there is no way to meet the query requirements when making a range query.

So how does Redis guarantee atomicity? This involves the MULTI and EXEC commands, which Redis uses to implement simple transactions. When all the commands and their arguments are correct, MULTI and EXEC guarantee that the commands are executed atomically. I will cover Redis transaction support and its atomicity guarantees in Lecture 30. In this lesson, we only need to know how to use the MULTI and EXEC commands.

The MULTI command: represents the beginning of a series of atomic operations. After receiving this command, Redis knows that the next commands to be received need to be placed in an internal queue and executed together to ensure atomicity.

The EXEC command: indicates the end of a series of atomic operations. Once Redis receives this command, it means that all command operations to ensure atomicity have been sent. At this point, Redis begins to execute all the command operations that have just been placed in the internal queue.

(Figure: commands 1 to N are sent after the MULTI command and before the EXEC command, and they are executed together to ensure atomicity.)

Taking the need to save device status information as an example, we execute the following commands to write the temperature of the device at 09:11 on August 3, 2020 into the Hash collection and the Sorted Set collection with the HSET command and the ZADD command, respectively. Because a Redis key can hold only one data type, the two collections must use different keys; the names device:temperature:hash and device:temperature:zset below are illustrative.

    127.0.0.1:6379> MULTI
    OK
    127.0.0.1:6379> HSET device:temperature:hash 202008030911 26.8
    QUEUED
    127.0.0.1:6379> ZADD device:temperature:zset 202008030911 26.8
    QUEUED
    127.0.0.1:6379> EXEC
    1) (integer) 1
    2) (integer) 1

As you can see, first, Redis received the MULTI command executed by the client. Then, after the client executes the HSET and ZADD commands, the result returned by Redis is "QUEUED", indicating that the two commands are temporarily queued and not executed; after the EXEC command is executed, the HSET command and ZADD command are actually executed and the successful result is returned (the result value is 1).
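From application code, the same MULTI/EXEC sequence is usually issued through a client library. The sketch below assumes a client object that behaves like a redis-py `redis.Redis` instance, whose `pipeline(transaction=True)` wraps the queued commands in MULTI and EXEC. The function name `save_reading` and the key names are hypothetical. Prefixing the Sorted Set member with the timestamp is an assumption added here, not part of the article's scheme: Sorted Set members are unique, so two equal readings taken at different times would otherwise collapse into one member.

```python
def save_reading(client, device, timestamp, value):
    """Queue HSET and ZADD in one MULTI/EXEC block and execute them together.

    `client` is expected to behave like a redis-py `redis.Redis` instance.
    """
    # Hypothetical key names; the Hash and the Sorted Set need different keys
    # because one Redis key holds a single data type.
    hash_key = f"{device}:temperature:hash"
    zset_key = f"{device}:temperature:zset"
    # Assumption: keep members unique by prefixing the value with its timestamp.
    member = f"{timestamp}:{value}"

    pipe = client.pipeline(transaction=True)  # commands are queued, not yet run
    pipe.hset(hash_key, str(timestamp), str(value))
    pipe.zadd(zset_key, {member: timestamp})
    return pipe.execute()  # sends the queue and returns one reply per command
```

With a running Redis server and redis-py installed, a call could look like `save_reading(redis.Redis(), "device", 202008030911, 26.8)`. A reader of the Sorted Set would then split each member on the colon to recover the value.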

At this point, we have solved the problem of point queries and range queries over time series data, and we use the MULTI and EXEC commands to ensure that Redis saves the data to the Hash and the Sorted Set atomically. Next, we need to solve the third problem: how to aggregate time series data?

Aggregate computing is generally used to count the data summary status in the time window periodically, and it will be performed frequently in real-time monitoring and early warning scenarios.

Because Sorted Set only supports range query and cannot perform aggregation calculation directly, we can only retrieve the data within the time range to the client, and then complete the aggregation calculation on the client. Although this method can complete the aggregate calculation, it will bring some potential risks, that is, a large amount of data is frequently transferred between Redis instances and clients, which will compete with other operation commands for network resources and cause other operations to become slow.

In our Internet of things project, we need to count the temperature status of each device every 3 minutes, and once the equipment temperature exceeds the set threshold, we will give an alarm. This is a typical aggregate computing scenario, and we can take a look at the data volume in this process.

Suppose we need to calculate the maximum value of each metric for all devices every 3 minutes. Each device records a metric value every 15 seconds, so 4 values are recorded per minute and 12 values every 3 minutes. We track 33 metrics per device, so a single device records nearly 400 values every 3 minutes (33 * 12 = 396), and there are 10,000 devices in total. As a result, nearly 4 million values (396 * 10,000 = 3.96 million) need to be transferred between the client and the Redis instance every 3 minutes.

In order to avoid frequent large data transfers between clients and Redis instances, we can use RedisTimeSeries to save time series data.

RedisTimeSeries supports aggregate computing directly on Redis instances. Let's take the maximum value calculated every 3 minutes as an example. If the aggregation runs directly on the Redis instance, the 12 values recorded every 3 minutes for one metric of a single device are reduced to one value. Only 33 aggregate values then need to be transmitted every 3 minutes for a single device, and only 330,000 values for 10,000 devices. That is 1/12 (roughly one tenth) of the data transferred when the aggregation is done on the client, which clearly reduces the impact of large data transfers on the network performance of the Redis instance.
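The two transfer volumes can be checked with a few lines of Python. The constant names are hypothetical; the figures are the ones used in this example.

```python
DEVICES = 10_000
METRICS_PER_DEVICE = 33
SAMPLE_INTERVAL_S = 15
WINDOW_S = 3 * 60

# Values recorded per metric in one 3-minute window.
samples_per_window = WINDOW_S // SAMPLE_INTERVAL_S

# Client-side aggregation: every raw value crosses the network.
client_side_values = DEVICES * METRICS_PER_DEVICE * samples_per_window

# Server-side aggregation: one aggregate per metric per device crosses the network.
server_side_values = DEVICES * METRICS_PER_DEVICE

reduction_factor = client_side_values // server_side_values

print(samples_per_window, client_side_values, server_side_values, reduction_factor)
```

The reduction factor equals the number of samples per window, so a longer window or a shorter sampling interval makes server-side aggregation even more attractive.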

Therefore, if we only need to query a single point-in-time or a certain time range, it is suitable to use the combination of Hash and Sorted Set, which are the inherent data structures of Redis with good performance and high stability. However, if we need to do a lot of aggregation calculations, and the network bandwidth conditions are not too good, the combination of Hash and Sorted Set is not very suitable. At this point, it is more appropriate to use RedisTimeSeries.

All right, next, let's learn about RedisTimeSeries in detail.

Saving time series data based on RedisTimeSeries module

RedisTimeSeries is an extension module of Redis. It provides data types and access interfaces specifically for time series data, and supports the aggregation of data by time range directly on Redis instances.

Because RedisTimeSeries is not a built-in Redis module, we need to compile its source code into the dynamic library redistimeseries.so and then load it with the loadmodule configuration directive, as shown below:

    loadmodule redistimeseries.so

When used for time series data access, RedisTimeSeries has five main operations:

Create a time series data set with the TS.CREATE command

Insert data with the TS.ADD command

Use the TS.GET command to read the latest data

Use the TS.MGET command to filter the query data collection by label

Use the TS.RANGE command to run range queries with aggregate calculations

Next, I'll show you how to use these five operations.

1. Create a time series data collection with the TS.CREATE command

In the TS.CREATE command, we need to set the key of the time series data set and the expiration time of the data in milliseconds. In addition, we can label the data collection to represent the properties of the data collection.
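As a rough sketch of how these three settings map onto the command, the hypothetical helper below assembles the argument list for TS.CREATE in Python. It only builds the arguments and does not talk to a server.

```python
def ts_create_args(key, retention_ms, labels):
    """Build the argument list for: TS.CREATE key RETENTION ms LABELS name value ..."""
    args = ["TS.CREATE", key, "RETENTION", str(retention_ms)]
    if labels:
        args.append("LABELS")
        for name, value in labels.items():
            args.extend([name, str(value)])
    return args


# 600 s expressed in milliseconds, plus one label describing the collection.
args = ts_create_args("device:temperature", 600 * 1000, {"device_id": 1})
```

With redis-py, such a list could be sent with `client.execute_command(*args)`, assuming the module is loaded on the server.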

For example, we execute the following command to create a time series data collection with the key device:temperature and a retention period of 600 s. That is, samples in this collection are removed automatically once they are more than 600 s older than the latest sample. Finally, we set a label {device_id:1} on this collection, indicating that it records the data of the device whose ID is 1.

    TS.CREATE device:temperature RETENTION 600000 LABELS device_id 1
    OK

2. Insert data with the TS.ADD command and read the latest data with the TS.GET command

We can use the TS.ADD command to insert data into the time series collection, including timestamps and specific values, and use the TS.GET command to read the latest piece of data in the data collection.

For example, when we execute the following TS.ADD command, we insert a piece of data into the device:temperature collection that records the temperature of the device at 09:05 on August 3, 2020; when we execute the TS.GET command, the latest data just inserted is read out.

    TS.ADD device:temperature 1596416700 25.1
    1596416700
    TS.GET device:temperature
    1) (integer) 1596416700
    2) 25.1

3. Use the TS.MGET command to filter the query data collection by label

When saving time series data from multiple devices, we usually save the data of each device in its own collection. We can then use the TS.MGET command to query the latest data of a subset of collections by label. When creating a collection with TS.CREATE, we can set label attributes on it; when querying, we match those label attributes in the query conditions, and the result contains only the latest data of each matching collection.

For instance, suppose we use four collections to save the time series data of four devices whose IDs are 1, 2, 3 and 4. When we create each collection, we set device_id as its label. We can then use the following TS.MGET command together with the FILTER option (which sets the filter condition on collection labels) to query the collections of all devices whose device_id is not equal to 2 and return the latest piece of data in each of them.
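The selection logic of this query can be modeled in a few lines of Python. This is a toy model rather than RedisTimeSeries client code: the function name `mget_not_equal` is hypothetical and the sample values are made up.

```python
# Toy model: each series key maps to its labels and its (timestamp, value) samples.
series = {
    "device:temperature:1": {
        "labels": {"device_id": "1"},
        "samples": [(1596416940, 25.0), (1596417000, 25.5)],
    },
    "device:temperature:2": {
        "labels": {"device_id": "2"},
        "samples": [(1596417000, 28.0)],
    },
    "device:temperature:3": {
        "labels": {"device_id": "3"},
        "samples": [(1596417000, 29.0)],
    },
    "device:temperature:4": {
        "labels": {"device_id": "4"},
        "samples": [(1596416940, 30.5), (1596417000, 30.1)],
    },
}


def mget_not_equal(all_series, label, excluded):
    """Return {key: latest sample} for every series whose label differs from `excluded`."""
    result = {}
    for key, entry in all_series.items():
        if entry["labels"].get(label) != excluded and entry["samples"]:
            # Tuples compare by timestamp first, so max() yields the latest sample.
            result[key] = max(entry["samples"])
    return result


latest = mget_not_equal(series, "device_id", "2")
```

Only one sample per matching collection is returned, which keeps the reply small even when each collection holds many samples.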

    TS.MGET FILTER device_id!=2
    1) 1) "device:temperature:1"
       2) (empty list or set)
       3) 1) (integer) 1596417000
          2) (value missing in the source)
    2) 1) "device:temperature:3"
       2) (empty list or set)
       3) 1) (integer) 1596417000
          2) (value missing in the source)
    3) 1) "device:temperature:4"
       2) (empty list or set)
       3) 1) (integer) 1596417000
          2) "30.1"

4. Use TS.RANGE to support range queries that require aggregate computation

Finally, when aggregating time series data, we can use the TS.RANGE command to specify the time range of the data to be queried and the AGGREGATION parameter to specify the type of aggregate calculation to be performed. RedisTimeSeries supports a variety of aggregate computing types, including mean (avg), maximum / minimum (max/min), sum (sum) and so on.

For example, the following command averages the data between 09:05 and 09:12 on August 3, 2020, using a time window of 180 s. (RedisTimeSeries expects timestamps and the window size in the same unit, milliseconds by default; the window 180000 corresponds to 180 s when timestamps are in milliseconds.)

    TS.RANGE device:temperature 1596416700 1596417120 AGGREGATION avg 180000
    1) 1) (integer) 1596416700
       2) (value missing in the source)
    2) 1) (integer) 1596416880
       2) "25.8"
    3) 1) (integer) 1596417060
       2) "26.1"
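What a windowed average computes can be sketched in plain Python. This is a model of the idea, not of the module's implementation: the function name `bucketed_average` is hypothetical, the temperatures are made up, and aligning buckets to the start of the queried range is an assumption chosen to match the bucket timestamps shown above. The bucket size is 180 because the sample timestamps here are in seconds.

```python
from collections import defaultdict


def bucketed_average(samples, start, end, bucket):
    """Average (timestamp, value) samples per bucket; buckets are aligned to `start`."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running sum, count]
    for ts, value in samples:
        if start <= ts <= end:
            bucket_start = start + ((ts - start) // bucket) * bucket
            sums[bucket_start][0] += value
            sums[bucket_start][1] += 1
    return [(b, total / count) for b, (total, count) in sorted(sums.items())]


# Made-up temperatures, one sample per minute from 09:05 to 09:12.
samples = [
    (1596416700, 25.0), (1596416760, 26.0), (1596416820, 27.0),
    (1596416880, 24.0), (1596416940, 25.0), (1596417000, 26.0),
    (1596417060, 26.0), (1596417120, 27.0),
]

averages = bucketed_average(samples, 1596416700, 1596417120, 180)
```

Eight raw samples are reduced to three averages, which is the reduction that server-side aggregation delivers before anything is sent to the client.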

Compared with using Hash and Sorted Set to save time series data, RedisTimeSeries is an extension module specially designed for time series data access, which can support aggregate calculation directly on Redis instances and filter query data sets by tag attributes. When we need frequent aggregate calculations and filter data sets of specific devices or users from a large number of collections, RedisTimeSeries can give full play to its advantages.

Summary

In this lesson, we learned how to save time series data with Redis. The writing characteristic of time series data is to be able to write quickly, while the query has three characteristics:

Point query, query the data at the corresponding time according to a timestamp

Range query, query data within the range of start and end timestamps

Aggregate calculation, which calculates all the data within the range of the start and end timestamps, such as maximum / minimum, mean, etc.

Redis's high-performance write features are sufficient for fast writes, while Redis offers two options for a variety of query requirements.

The first is to use a combination of Redis's built-in Hash and Sorted Set types, saving the data in both the Hash collection and the Sorted Set collection. This scheme uses the Hash type for fast single-key queries and the Sorted Set type for efficient range queries, meeting two major query requirements of time series data at once.

However, the first scheme also has two disadvantages: one is that when performing aggregation calculation, we need to read the data to the client and then aggregate, and when there is a large amount of data to aggregate, the data transmission overhead is high; the other is that all the data will be saved in each of the two data types, resulting in a large amount of memory overhead. However, we can free up memory and reduce memory pressure by setting the appropriate data expiration time.

The second scheme we learned is to use the RedisTimeSeries module, an extension module designed specifically for time series data access. Compared with the first scheme, RedisTimeSeries supports multiple aggregate calculations directly on the Redis instance, avoiding the transfer of a large amount of data between the instance and the client. However, the underlying data structure of RedisTimeSeries uses linked lists, and its range query complexity is O(N). In addition, its TS.GET query can only return the latest data; it cannot return the data at an arbitrary point in time the way the Hash type of the first scheme can.

Therefore, using a combination of Hash and Sorted Set, or using RedisTimeSeries, has its own advantages and disadvantages in supporting time series data access. My advice to you is:

If your deployment environment has high network bandwidth and large Redis instance memory, you can give priority to the first option.

If your deployment environment has limited network and memory resources, a large amount of data and frequent aggregate computing, and you need to query by the attributes of the data set, you can give priority to the second option.
