Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use the HyperLogLog algorithm of Redis

2025-02-23 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the Redis HyperLogLog algorithm how to use the relevant knowledge, the content is detailed and easy to understand, the operation is simple and fast, has a certain reference value, I believe that we read this Redis HyperLogLog algorithm how to use the article will have a harvest, let's take a look at it.

Today is Friday, you are happy to fish, the product manager e-mailed you a requirements document. The demand is probably: the company needs to count the daily IP of visitors to the site, and this statistics is a long-term behavior, ranging from months to years.

After reading the requirements, you think that using the collection type of Redis, you can easily achieve this function: generate a collection type key every day, use SADD to store the daily visitor IP, and use the SCARD command to easily get the number of daily visitor IP.

You will soon finish typing the code and pass the test, and this feature will be online. After running online for a period of time, it is found that the server where Redis is located begins to alarm because some keys take up too much memory. You can find that these keys are collection keys for storing guest IP. You just patted your head and knew you had dug a big hole for yourself.

Assuming that it takes up to 15 bytes to store an IP address in IPv4 format, the site has a maximum of 1 million visitors per day. These collection keys will use 0.45 GB of memory in a month and 5.4 GB a year, which is only estimated that in the case of IPv4 format, IPv6 format will take up more memory. Although the time complexity of SADD and SCARD is O (1), their memory consumption is unacceptable.

You looked through the official website of Redis and found that Redis also provides a data type, HyperLogLog, which can meet the requirements of the product and take up less memory.

HyperLogLog algorithm

HyperLogLog is a probabilistic algorithm created specifically to calculate the cardinality of a set. It can calculate the approximate cardinality of a given set.

The approximate cardinality is not the actual cardinality of the set, it may be a little smaller or larger than the actual cardinality, but the error between the estimated cardinality and the actual cardinality will be within a reasonable range, for those statistics that do not require very accurate statistics can be used HyperLogLog algorithm.

The advantage of HyperLogLog is that the memory it needs to calculate the approximate cardinality does not change according to the size of the collection. No matter how many elements the collection contains, the memory needed for HyperLogLog to calculate is always fixed and very small.

Each HyperLogLog type of Redis only needs to use 12KB memory space to count close to: 264 elements, while the standard error of the algorithm is only 0.81%.

If you use the HyperLogLog type to achieve the above functions, if you have 1 million visitors a day, you will only take up 360KB memory for one month.

PFADD

The PFADD command allows you to count one or more elements of a given collection.

PFADD key element [element...]

Depending on whether a given element has been counted, the PFADD command may return 0 or 1:

If all the elements given have been counted, the PFADD command returns 0, indicating that the approximate cardinality calculated by HyperLogLog has not changed.

If there is at least one element in a given element that has not been counted before, resulting in a change in the approximate cardinality calculated by HyperLogLog, the PFADD command returns 1.

For example:

Redis > PFADD letters a b c-- first add (integer) 1redis > PFADD letters a-- second add (integer) 0

If you only specify key and no elements when invoking the command, there will be no action if key exists, and if not, a data structure will be created (return 1).

PFCOUNT

The approximate cardinality calculated by HyperLogLog for the collection can be obtained through the PFCOUNT command. Returns 0 if the given key does not exist.

PFCOUNT key [key...]

For example:

Redis > PFCOUNT letters (integer) 3

When multiple HyperLogLog is passed into PFCOUNT, the PFCOUNT command first joins all HyperLogLog and then returns the approximate cardinality.

Redis > PFADD letters1 a b c (integer) 1redis > PFADD letters2 c d e (integer) 1redis > PFCOUNT letters1 letters2 (integer) 5PFMERGE

The PFMERGE command can perform union calculations on multiple HyperLogLog, and then save the calculated union HyperLogLog to a specified key.

PFMERGE destKey sourceKey [sourceKey...]

If the specified key already exists, the PFMERGE command overwrites the existing key.

Redis > PFADD letters1 a b c (integer) 1redis > PFADD letters2 c d e (integer) 1redis > PFMERGE res letters1 letters2OKredis > PFCOUNT res (integer) 5

You can see that the PFMERGE and PFCOUNT commands are very similar, and in fact the PFCOUNT command does the following when calculating the approximate cardinality of multiple HyperLogLog:

Call the PFMERGE command internally, calculate the union of all given HyperLogLog, and store the union in a temporary HyperLogLog.

Execute the PFCOUNT command on the temporary HyperLogLog to get its approximate cardinality.

Delete temporary HyperLogLog.

Returns the resulting approximate cardinality.

When the program needs to call the PFCOUNT command on multiple HyperLogLog, and this call may be repeated multiple times, consider replacing this call with the corresponding PFMERGE command call: by storing the result of the union calculation in the specified HyperLogLog instead of recalculating the union each time, the program can minimize unnecessary union calculation.

Business scenario

The features of HyperLogLog are very suitable for scenarios such as counting (monthly and annual statistics) and deduplication (spam detection).

This is the end of the article on "how to use Redis's HyperLogLog algorithm". Thank you for reading! I believe you all have a certain understanding of the knowledge of "how to use Redis's HyperLogLog algorithm". If you want to learn more, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report