How to use the Redis HyperLogLog data structure

2025-01-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how to use HyperLogLog, one of the data structures in Redis: its commands, a typical use case, the caveats, and the idea behind its implementation.

HyperLogLog (hereinafter HLL) is a data structure added in Redis 2.8.9. It provides high-performance cardinality (distinct-count) statistics using very little memory. The trade-off is that the result is an approximation with a small error rate.

The HLL commands all start with pf, the initials of Philippe Flajolet, the inventor of the HyperLogLog data structure.

pfadd

pfadd key element [element ...]

pfadd adds one or more elements to the HyperLogLog stored at key. It returns 1 if the HyperLogLog was altered by the addition (at least one internal register changed), and 0 otherwise.

127.0.0.1:6379> pfadd 2019-04-29:unique:ids u1 u2 u3 u4
(integer) 1

pfcount

pfcount key [key ...]

pfcount returns the approximate number of distinct elements in one HyperLogLog, or in the union of several.

127.0.0.1:6379> pfcount 2019-04-29:unique:ids
(integer) 4
127.0.0.1:6379> pfadd 2019-04-29:unique:ids u1 u2 u3 u5
(integer) 1
127.0.0.1:6379> pfcount 2019-04-29:unique:ids
(integer) 5

pfmerge

pfmerge destkey sourcekey [sourcekey ...]

pfmerge computes the union of several HyperLogLogs and stores the result in destkey.

127.0.0.1:6379> pfadd 2019-04-30:unique:ids u4 u2 u3 u6 u7
(integer) 1
127.0.0.1:6379> pfmerge 2019-04:unique:ids 2019-04-29:unique:ids 2019-04-30:unique:ids
OK
127.0.0.1:6379> pfcount 2019-04:unique:ids
(integer) 7
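The same three commands can be called from application code. The following is a minimal sketch, assuming a client object that exposes pfadd, pfcount and pfmerge methods taking their arguments in the same order as the commands (the redis-py client does). The helper names and the client parameter are hypothetical and only illustrate the calls.

```python
def day_key(day):
    """Build the key naming used in this article, e.g. '2019-04-29:unique:ids'."""
    return f"{day}:unique:ids"


def add_visitors(client, day, *user_ids):
    """PFADD: returns 1 if the HyperLogLog was altered, otherwise 0."""
    return client.pfadd(day_key(day), *user_ids)


def count_visitors(client, *days):
    """PFCOUNT: approximate distinct count for one day, or the union of several."""
    return client.pfcount(*(day_key(d) for d in days))


def merge_days(client, dest, *days):
    """PFMERGE the given days into dest, then return the merged count."""
    client.pfmerge(day_key(dest), *(day_key(d) for d in days))
    return client.pfcount(day_key(dest))
```

With redis-py, client would typically be a redis.Redis instance connected to a running server. Calling merge_days(client, "2019-04", "2019-04-29", "2019-04-30") writes to the key 2019-04:unique:ids.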

A HyperLogLog uses very little memory, but its counts carry some error. The official figure given by Redis is a standard error of 0.81%. Before choosing it during development, confirm the following two points:

You only need the total number of distinct elements and never need to retrieve the individual elements.

You can tolerate a certain error rate, which is the price paid for HyperLogLog's very small memory footprint.
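The 0.81% figure is consistent with the usual HyperLogLog error estimate of roughly 1.04 / sqrt(m) for m registers. The short check below assumes m = 16384, the number of registers Redis uses. The function name is illustrative.

```python
import math


def hll_standard_error(m):
    """Approximate relative standard error of HyperLogLog with m registers."""
    return 1.04 / math.sqrt(m)


REDIS_REGISTERS = 16384  # assumed register count, sqrt(16384) = 128
error = hll_standard_error(REDIS_REGISTERS)
print(f"{error:.4%}")  # prints 0.8125%
```

Quadrupling the number of registers only halves the error, which is why the structure settles for a fixed, modest size.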

Data deduplication statistics

Counting PV (page views) is easy: give each web page its own Redis counter, with the day's date appended as the key suffix. Each request then issues one incrby, and at the end of the day the counters hold all the PV data.

UV (unique visitors) is different because it has to be deduplicated: multiple requests from the same user within one day count only once. This means every web request must carry a user ID, and both logged-in and anonymous users need a unique ID to identify them.

A simple solution probably comes to mind: keep a separate set for each page that stores the IDs of all users who visited the page that day. When a request comes in, sadd the user ID into the set. scard then returns the size of the set, which is the page's UV. This is indeed a very simple scheme.
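The PV counter and the set-based UV scheme described so far can be sketched as follows. This is only an illustration, assuming a client object with incrby, sadd and scard methods named after the Redis commands (as in redis-py). The key layout and the function names are hypothetical.

```python
def record_request(client, page, day, user_id):
    """Record one request for a page on a given day."""
    client.incrby(f"pv:{page}:{day}", 1)      # PV: every request counts
    client.sadd(f"uv:{page}:{day}", user_id)  # UV: the set ignores repeat visitors


def page_uv(client, page, day):
    """Exact UV for the page on that day: the size of the visitor set."""
    return client.scard(f"uv:{page}:{day}")
```

The set gives an exact answer, but it has to hold every visitor ID for every page and day.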

However, if a page gets very heavy traffic, for example a popular page with tens of millions of UV, you need a very large set to count it, which wastes a lot of space. With many such pages, the storage required becomes enormous. Is it worth spending that much storage on a deduplication feature? In practice the numbers the boss wants do not need to be exact: 1.05 million and 1.06 million make little difference to them. So is there a better solution?
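A rough back-of-the-envelope calculation shows the scale of the problem. All inputs below are assumptions chosen for illustration: 10 million visitors to one page, 8 bytes per stored user ID, and no allowance for the set's own bookkeeping overhead, so a real set would be larger. The 12 KB figure is the size of a dense HyperLogLog in Redis.

```python
VISITORS = 10_000_000   # assumed UV of one popular page
BYTES_PER_ID = 8        # assumed size of one user ID, overhead ignored
HLL_BYTES = 12 * 1024   # dense HyperLogLog in Redis: 12 KB

set_bytes = VISITORS * BYTES_PER_ID
print(f"set lower bound: {set_bytes / 1024 / 1024:.1f} MiB")
print(f"HyperLogLog:     {HLL_BYTES / 1024:.0f} KiB")
print(f"ratio:           about {set_bytes // HLL_BYTES}x")
```

Even under these generous assumptions, the set for a single page is thousands of times larger than the approximate counter.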

This is where HyperLogLog comes in: Redis provides the HyperLogLog data structure to solve exactly this kind of counting problem. It gives an approximate deduplicated count that is not exact but not far off either, with a standard error of 0.81%. That is accurate enough for the UV statistics described above.

HyperLogLog is one of Redis's advanced data structures and is very useful, yet surprisingly few people use it.

Caveats

HyperLogLog is not free. This does not mean it costs money to use. It means that each HyperLogLog can occupy up to 12 KB of storage, so it is not suited to keeping statistics for each individual user. With hundreds of millions of users, the total space cost would be staggering. Compared with the set-based scheme, though, the space HyperLogLog uses is tiny.

Still, there is no need to worry too much, because Redis optimizes how HyperLogLog is stored. While the count is small, it uses a sparse representation that takes very little space. Only once the count grows and the sparse representation exceeds a size threshold (the hll-sparse-max-bytes setting) is it converted, in one step, to the dense representation, which takes 12 KB.
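The 12 KB figure and the per-user warning can be checked with simple arithmetic. The sketch assumes the dense layout of 16384 registers of 6 bits each (register data only), and uses 100 million users purely as an example figure.

```python
REGISTERS = 16384       # registers in a dense HyperLogLog
BITS_PER_REGISTER = 6   # each register stores a small integer

dense_bytes = REGISTERS * BITS_PER_REGISTER // 8
print(dense_bytes)  # 12288 bytes, i.e. 12 KB

USERS = 100_000_000                # example figure, not from a real system
total_bytes = USERS * dense_bytes  # worst case: every user's HyperLogLog is dense
print(f"{total_bytes / 1024**4:.2f} TiB")
```

In practice most per-user HyperLogLogs would stay in the sparse representation, so the worst case above is an upper bound rather than a prediction.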

Principle of HyperLogLog implementation

HLL uses bucketing and the harmonic mean to bring its estimate closer to the true value.

Bucketing means the input is split roughly evenly into m parts (buckets) according to each element's hash. An estimate is made for each bucket, and the average of those estimates is multiplied by m to get the total. This reduces the error caused by chance and improves the accuracy of the estimate. Put simply, one data set is divided into many parts and estimated over many rounds.

The harmonic mean is used to combine the per-bucket estimates in place of the plain arithmetic mean, because it is much less affected by a few unusually large values.
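The two ideas above can be seen together in a toy implementation. This is a teaching sketch of the textbook HyperLogLog estimator, not the code Redis runs: the class name, the use of SHA-1 as the hash function, and the small-range correction threshold are choices made for this example.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Toy HyperLogLog with m = 2**p buckets (registers)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                # number of buckets
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash value
        width = 64 - self.p
        bucket = h >> width                    # top p bits choose the bucket
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the first 1 bit
        if rank > self.registers[bucket]:
            self.registers[bucket] = rank      # keep the maximum per bucket

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)       # bias constant, valid for m >= 128
        # m / sum(2**-r) is the harmonic mean of the per-bucket values 2**r
        harmonic_sum = sum(2.0 ** -r for r in self.registers)
        estimate = alpha * m * m / harmonic_sum
        empty = self.registers.count(0)
        if estimate <= 2.5 * m and empty:
            # small counts: fall back to linear counting on empty buckets
            estimate = m * math.log(m / empty)
        return round(estimate)


hll = ToyHyperLogLog()
for i in range(50_000):
    hll.add(f"user:{i}")
print(hll.count())  # close to 50000, not necessarily exact
```

Adding the same element again never changes the result, because a register only moves when a larger rank is seen.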
