2025-04-06 · SLTechnology News & Howtos > Servers
Source: antirez
Translation: Kevin (official account: middleware)
Redis 5 introduced a new data structure called Streams, which generated a lot of interest in the community. Sooner or later I want to run a community survey, talk with users about their actual production use cases, and blog about it.
Today I want to address a different problem: I suspect that many users think of Streams only as a Kafka-like tool for solving messaging problems. This data structure was indeed designed with producer/consumer message communication in mind, and it is very good at that, with a concise usage. Streaming is a good pattern and "mental model" that can be used to design systems with great success, but Redis Streams, like most Redis data structures, is a more general structure that can be used to model many different problems. In this post, I will focus on Streams as a pure data structure, completely ignoring its blocking operations, consumer groups, and everything related to messaging.
Streams as an enhanced version of CSV files
If you want to record a series of structured data items and you think a database is overkill, you might say: let's open a file in append-only mode and log each record as a line of CSV (comma-separated values):
(open the data.csv file in append-only mode)
time=1553096724033,cpu_temp=23.4,load=2.3
time=1553096725029,cpu_temp=23.2,load=2.1
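The append-only CSV pattern above can be sketched in a few lines of Python (a minimal illustration; the file path and field names are just for the example):

```python
import os
import tempfile
import time

# Append-only telemetry log, as described above: records are only ever
# added at the end of the file, never modified in place.
path = os.path.join(tempfile.mkdtemp(), "data.csv")

def log_sample(cpu_temp, load):
    with open(path, "a") as f:  # "a" = append-only mode
        f.write(f"time={int(time.time() * 1000)},cpu_temp={cpu_temp},load={load}\n")

log_sample(23.4, 2.3)
log_sample(23.2, 2.1)

with open(path) as f:
    lines = f.readlines()
assert len(lines) == 2 and all(l.startswith("time=") for l in lines)
```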
Looks simple, right? People have done this at scale for ages: it's a solid pattern, if you know what you're doing. But what would the in-memory equivalent be? Memory is far more capable than an append-only file, yet the CSV model itself has several limitations:
It is hard to do range queries (and inefficient when you do).
There is too much redundant information: the timestamp barely changes from one entry to the next, and the field names repeat in every record. Yet stripping the redundancy would make the format less flexible the moment you want to switch to a different set of fields.
The only handle on an entry is its byte offset in the file: if the structure of the file changes, every offset becomes wrong, so there is no real concept of a primary ID.
Entries cannot be removed: without garbage collection, they can only be marked as invalid unless the log is rewritten, and log rewriting usually performs poorly for several reasons, so it is best avoided.
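The byte-offset problem in particular is easy to demonstrate (a toy sketch using the sample records from above):

```python
# Why byte offsets make poor record IDs: they change as soon as the log
# is rewritten, e.g. to garbage-collect dead entries.
log = ("time=1553096724033,cpu_temp=23.4,load=2.3\n"
       "time=1553096725029,cpu_temp=23.2,load=2.1\n")

offset = log.index("time=1553096725029")  # byte offset of the second record

# Rewrite the log dropping the first record: the saved offset no longer
# points at the record it used to identify.
rewritten = "time=1553096725029,cpu_temp=23.2,load=2.1\n"
assert rewritten.index("time=1553096725029") != offset
```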
On the other hand, these CSV entries also have a good side: there is no fixed structure, the fields can change, they are trivial to generate, and after all they are fairly compact. Redis Streams was designed to keep the good parts and fix the bad ones, and the result is a hybrid data structure very similar to Redis sorted sets: it looks like a fundamental data structure, but it uses several underlying representations to achieve that effect.
Streams 101
(you can skip this section if you already know the basics of Redis Streams)
Redis Streams is represented as a chain of delta-compressed macro nodes linked together by a radix tree. The effect is that you can do very fast random seeks, fetch ranges on demand, delete old entries, create capped streams, and so on. Yet the interface exposed to the programmer is very similar to a CSV file:
> XADD mystream * cpu-temp 23.4 load 2.3
"1553097561402-0"
> XADD mystream * cpu-temp 23.2 load 2.1
"1553097568315-0"
As the example above shows, the XADD command auto-generates and returns the entry ID. The ID is monotonically increasing and consists of two parts, <milliseconds>-<counter>: the time expressed in milliseconds, plus a counter that is incremented for entries generated within the same millisecond.
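The ID generation just described can be sketched as a toy model (this is not the real Redis implementation, just an illustration of the <milliseconds>-<counter> scheme):

```python
import time

class MiniStream:
    """Toy sketch of Redis Streams ID generation, not the real thing."""

    def __init__(self):
        self.last_ms = 0
        self.last_seq = 0
        self.entries = []  # (id_string, fields) in insertion order

    def xadd(self, fields):
        ms = int(time.time() * 1000)
        if ms > self.last_ms:
            # New millisecond: reset the counter.
            self.last_ms, self.last_seq = ms, 0
        else:
            # Same (or earlier) millisecond: bump the counter so IDs
            # remain strictly increasing.
            self.last_seq += 1
        entry_id = f"{self.last_ms}-{self.last_seq}"
        self.entries.append((entry_id, dict(fields)))
        return entry_id

s = MiniStream()
s.xadd({"cpu-temp": "23.4", "load": "2.3"})
s.xadd({"cpu-temp": "23.2", "load": "2.1"})
```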
So the first new abstraction built on top of the "append-only CSV file" idea is this: since we pass an asterisk as the ID argument of XADD, we get an entry ID for free from the server. This ID is useful not only to point at a specific entry inside a stream, but also to tell when the entry was added to the stream. In fact, the XRANGE command can query ranges as well as fetch single entries.
> XRANGE mystream 1553097561402-0 1553097561402-0
1) 1) "1553097561402-0"
   2) 1) "cpu-temp"
      2) "23.4"
      3) "load"
      4) "2.3"
In this example, to fetch a single element I used the same ID as both the start and the end of the range. However, I can use any range, plus a COUNT argument to limit the number of results. Similarly, there is no need to specify full IDs as range bounds: by using just the Unix millisecond timestamp part of the ID, I can get all the elements in a given time range.
> XRANGE mystream 1553097560000 1553097570000
1) 1) "1553097561402-0"
   2) 1) "cpu-temp"
      2) "23.4"
      3) "load"
      4) "2.3"
2) 1) "1553097568315-0"
   2) 1) "cpu-temp"
      2) "23.2"
      3) "load"
      4) "2.1"
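The range semantics described above (full IDs or bare millisecond timestamps as bounds) can be sketched over an in-memory list of entries (a simplified model, not the real command):

```python
# Sketch of XRANGE semantics. A bound given as a bare millisecond
# timestamp is completed with the lowest possible counter for the start
# and the highest for the end, so the whole millisecond is included.
entries = [
    ("1553097561402-0", {"cpu-temp": "23.4", "load": "2.3"}),
    ("1553097568315-0", {"cpu-temp": "23.2", "load": "2.1"}),
]

def _id_key(id_str, is_end=False):
    # "1553097561402-0" -> (1553097561402, 0)
    if "-" in id_str:
        ms, seq = id_str.split("-")
        return (int(ms), int(seq))
    return (int(id_str), float("inf") if is_end else 0)

def xrange(entries, start, end, count=None):
    lo, hi = _id_key(start), _id_key(end, is_end=True)
    out = [(eid, fields) for eid, fields in entries
           if lo <= _id_key(eid) <= hi]
    return out[:count] if count is not None else out

# Time-range query with bare timestamps, as in the example above:
assert xrange(entries, "1553097560000", "1553097570000") == entries
# Single-element fetch: same full ID as start and end.
assert xrange(entries, "1553097561402-0", "1553097561402-0") == [entries[0]]
```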
There is no need to show more of the Streams API here; the details are in the Redis documentation. For now, let's focus on this usage pattern: XADD to add entries, XRANGE (but also XREAD) to fetch ranges (depending on your goal), and let's see why I claim Streams is such a powerful data structure.
If you want to know more about Streams and its API, be sure to read this tutorial: https://redis.io/topics/streams-intro
Tennis players
A few days ago, I was modeling an application together with a friend who has been studying Redis lately: an app to keep track of local tennis courts, local players, and matches. The way to model a player is obvious: a player is a small object, so a hash with a player:<id> key is all you need. When you use Redis as the primary tool for data modeling, you immediately realize that you also need a way to record the matches played at a given tennis club. If player 1 and player 2 play a match, and player 1 wins, we can record it in a stream like this:
> XADD club:1234.matches * player-a 1 player-b 2 winner 1
"1553254144387-0"
With this single operation, we get:
A unique identifier for the match: the ID in the stream.
No need to create a separate object just to identify the match.
Free range queries, to paginate the match history or to check the matches played at a given moment in the past.
Before Streams, we would have needed to create a sorted set ordered by time, where each element in the sorted set is the ID of a match, stored as a hash under a different key. This is not only more work: it also wastes an incredible amount of memory. More than you might guess, as we will see shortly.
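The old pattern's double bookkeeping can be sketched in plain Python (a sorted list standing in for the Redis sorted set, and one dict per match standing in for a hash; names are illustrative):

```python
import bisect

# Pre-Streams pattern: every match requires TWO structures,
# a per-match "hash" plus a sorted-by-time index for range queries.
match_index = []    # (timestamp_ms, match_key), kept sorted by time
match_objects = {}  # match_key -> field dict

def record_match(ts_ms, key, fields):
    bisect.insort(match_index, (ts_ms, key))  # index entry
    match_objects[key] = fields               # separate object

record_match(1553254144387, "match:1",
             {"player-a": "1", "player-b": "2", "winner": "1"})

def matches_between(t0, t1):
    # Range query needs the index first, then one lookup per match.
    return [match_objects[k] for ts, k in match_index if t0 <= ts <= t1]
```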
For now, one thing you can see is that Redis Streams is, in a way, a sorted set in append-only mode, keyed by time, where each element is a small hash. And in the context of modeling, this is revolutionary for Redis because of its simplicity.
Memory usage
The use case above is not just a matter of a handier pattern. Compared to the old sorted set + hash approach, the memory cost of the Stream solution is so much lower that things which were simply not feasible before are now entirely feasible.
These are the figures for storing one million matches with the configurations discussed previously:
Sorted Set + Hash memory usage = 220 MB (242 MB RSS)
Stream memory usage = 16.8 MB (18.11 MB RSS)
This is more than an order of magnitude (13x, to be exact), and it means that use cases that were previously too expensive in memory are now perfectly viable. The magic lies in the representation of Redis Streams: macro nodes can contain several elements, encoded very compactly in a data structure called listpack. For instance, listpack will encode integers in binary form even when they are semantically strings. On top of that, we apply delta compression and same-fields compression. And yet we can still seek by ID or time, because the macro nodes are linked together in a radix tree, which is designed to use very little memory. All of this together accounts for the tiny memory usage; interestingly, semantically, the user sees none of the implementation details that make Streams so efficient.
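The intuition behind the delta compression is easy to demonstrate (a toy sketch, not the listpack encoding itself): consecutive entry timestamps share almost all of their bits, so storing one base value plus small differences is much more compact than storing full 64-bit timestamps.

```python
# Consecutive stream-entry timestamps, in milliseconds.
ids = [1553097561402, 1553097561402, 1553097568315, 1553097570001]

# Store the first value once, then only the (small) differences.
base = ids[0]
deltas = [b - a for a, b in zip(ids, ids[1:])]

# The encoding is lossless: the original IDs can be rebuilt from
# the base plus a running sum of the deltas.
rebuilt = [base]
for d in deltas:
    rebuilt.append(rebuilt[-1] + d)
assert rebuilt == ids

# Each delta fits in a couple of bytes, versus 8 bytes per raw timestamp.
assert all(0 <= d < 2**16 for d in deltas)
```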
Now, let's do some simple math. If I can store 1 million entries in about 18 MB of memory, then 10 million fit in 180 MB, and 100 million in 1.8 GB. With 18 GB of memory, I can store 1 billion entries.
Time series
In my opinion, an important thing to note is that the tennis match use case above is semantically very different from using Streams for a time series. Yes, logically we are still recording some kind of event, but one essential difference is that in one case we log in order to create entries that render objects, while in the time-series case we merely measure something happening externally, something that never represents an object. You may think this distinction is minor, but it is not: it is important for Redis users to build the idea that Redis Streams can be used to create small objects with a total order, each with an assigned ID.
Yet time series is the most basic use case, and obviously a very important one, and before Streams appeared, Redis was somewhat helpless in this scenario. The memory characteristics and flexibility of Streams, plus the ability to create capped streams (see the MAXLEN option of the XADD command), make it a very powerful tool in the hands of developers.
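The effect of a capped stream can be sketched with a bounded queue (a toy model of the eviction behavior, not the real MAXLEN implementation, which trims whole macro nodes):

```python
from collections import deque

# Capped-stream sketch: keep at most 1000 entries; the oldest entries
# are evicted as new ones arrive, so memory stays bounded.
capped = deque(maxlen=1000)
for i in range(1500):
    capped.append((f"{1553097561402 + i}-0", {"sample": str(i)}))

assert len(capped) == 1000                 # never grows past the cap
assert capped[0][1]["sample"] == "500"     # the oldest 500 were evicted
assert capped[-1][1]["sample"] == "1499"   # the newest entry is kept
```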
Conclusion
Streams is flexible and has many use cases. I wanted to keep this post short, so, needless to say, one of the key messages in the examples above is the memory-usage analysis, which may be obvious to many readers; but the conversations I have had in recent months gave me the impression that Streams is strongly associated with the streaming use case, as if the data structure were only good at that. That is not the case. :-)