
How to count unique visitors (UV) with Redis

2025-03-30 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article explains how to use Redis to count unique visitors (UV) to a page. The three methods introduced here are simple, fast, and practical.

Today, let's look at a real question from a Pinduoduo backend interview. It is a simple architecture question: Pinduoduo has hundreds of millions of users, so for a given web page, how would you use Redis to count the number of unique users who visit it?

Use Hash

Hash is one of Redis's basic data structures. Under the hood, Redis maintains a hash table with separate chaining (open hashing): each key is mapped to a slot in the table, and when two keys collide in the same slot, the entries are chained in a linked list.

When a user visits, we use the user's id if the user is logged in. If the user is not logged in, the front-end page can generate a random key to identify the user. On each visit we call HSET: the key is built from the URI and the date, the field is the user's id or the random identifier, and the value can simply be set to 1.

To get the number of unique visitors to a page on a given day, we call HLEN on that day's key and read the result directly.
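The HSET/HLEN flow above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the key layout `uv:<uri>:<date>`, the function names, and the `InMemoryHashes` class are all assumptions made for this example. The stand-in class only mimics the two commands so the sketch runs without a server; with redis-py, a `redis.Redis(...)` client exposes `hset(name, key, value)` and `hlen(name)` with the same call shape.

```python
from datetime import date


def uv_key(uri: str, day: date) -> str:
    """Build one hash key per page per day (hypothetical key layout)."""
    return f"uv:{uri}:{day.isoformat()}"


def record_visit(client, uri: str, day: date, visitor_id: str) -> None:
    # HSET key field value: the field is the visitor, the value is a dummy 1.
    # A repeat visit overwrites the same field, so the visitor is counted once.
    client.hset(uv_key(uri, day), visitor_id, 1)


def unique_visitors(client, uri: str, day: date) -> int:
    # HLEN returns the number of fields, i.e. the number of distinct visitors.
    return client.hlen(uv_key(uri, day))


class InMemoryHashes:
    """Hypothetical stand-in for HSET/HLEN so the sketch runs without Redis."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, key, value):
        fields = self._hashes.setdefault(name, {})
        added = key not in fields
        fields[key] = value
        return int(added)  # 1 if the field is new, 0 if it was overwritten

    def hlen(self, name):
        return len(self._hashes.get(name, {}))


client = InMemoryHashes()  # assumption: swap in redis.Redis(...) for real use
today = date(2025, 3, 30)
for visitor in ["user:1001", "user:1002", "anon:9f3c", "user:1001"]:
    record_visit(client, "/item/42", today, visitor)

uv_today = unique_visitors(client, "/item/42", today)
print(uv_today)  # 3: user:1001 visited twice but is stored as a single field
```

Putting the date in the key means each day gets its own hash, so old days can be expired or deleted as a whole.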

Advantages: simple and easy to implement, convenient to query, and the count is exact.

Disadvantages: it takes up too much memory, because every visitor id is stored in full. As the number of keys grows, performance degrades as well. This is fine for small websites, but a site like Pinduoduo, with hundreds of millions of page views (PV), cannot afford it.

Use Bitset

A 32-bit int that stores a user id records only one user. If we instead treat it as 32 binary digits, each bit representing one user, the same int can represent 32 users, a 32-fold saving in space. For scenarios with a lot of data, a bitset (Redis calls it a bitmap) saves a great deal of memory. For users who have not logged in, we can use a hash function to map the random identifier to a numeric id. Bitset is very memory-efficient: assuming there are 100 million users, only 100,000,000 / 8 / 1024 / 1024 ≈ 12 MB is needed.
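The memory estimate can be checked with a few lines of arithmetic. The function name below is made up for this illustration, and the comparison assumes one 4-byte integer per stored id, which ignores any per-entry overhead a real data structure would add.

```python
def bitmap_size_bytes(user_count: int) -> int:
    """Bytes needed for one bit per user, rounded up to whole bytes."""
    return (user_count + 7) // 8


users = 100_000_000
size_bytes = bitmap_size_bytes(users)
size_mib = size_bytes / 1024 / 1024

# Storing each id as a raw 32-bit int instead (assumption: no overhead at all).
int_array_mib = users * 4 / 1024 / 1024

print(f"bitmap:     {size_bytes:,} bytes = {size_mib:.1f} MiB")  # 12,500,000 bytes = 11.9 MiB
print(f"32-bit ids: {int_array_mib:.1f} MiB")                    # 381.5 MiB
```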

Redis provides the SETBIT command, which is very convenient to use. On each visit to a product page we call SETBIT to mark that the user has visited it, and we can call GETBIT to check whether a particular user has visited. Finally, BITCOUNT gives the number of unique visitors to the page for each day.
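A rough sketch of how these three commands could fit together is shown below. Everything named here is an assumption for illustration: the key layout, the `ID_SPACE` bound, the use of CRC32 to hash anonymous tokens, and the `InMemoryBitmaps` class, which only mimics SETBIT, GETBIT, and BITCOUNT so the example runs without a server. With redis-py, `setbit(name, offset, value)`, `getbit(name, offset)`, and `bitcount(key)` have the same call shape.

```python
import zlib
from datetime import date

ID_SPACE = 100_000_000  # assumed upper bound on bit offsets (hypothetical)


def visitor_offset(visitor) -> int:
    """Map a visitor to a bit offset: numeric ids as-is, tokens via a hash."""
    if isinstance(visitor, int):
        return visitor
    # Anonymous tokens are hashed into the id space; distinct tokens may collide.
    return zlib.crc32(visitor.encode("utf-8")) % ID_SPACE


def page_key(uri: str, day: date) -> str:
    return f"uv:bits:{uri}:{day.isoformat()}"


def mark_visit(client, uri, day, visitor) -> None:
    client.setbit(page_key(uri, day), visitor_offset(visitor), 1)


def has_visited(client, uri, day, visitor) -> bool:
    return client.getbit(page_key(uri, day), visitor_offset(visitor)) == 1


def page_uv(client, uri, day) -> int:
    return client.bitcount(page_key(uri, day))


class InMemoryBitmaps:
    """Hypothetical stand-in for SETBIT/GETBIT/BITCOUNT, backed by bytearrays."""

    def __init__(self):
        self._data = {}

    def setbit(self, name, offset, value):
        buf = self._data.setdefault(name, bytearray())
        byte_index, bit = divmod(offset, 8)
        if len(buf) <= byte_index:  # grow with zero bytes up to the offset
            buf.extend(bytes(byte_index + 1 - len(buf)))
        mask = 0x80 >> bit  # offset 0 is the most significant bit of byte 0
        previous = 1 if buf[byte_index] & mask else 0
        if value:
            buf[byte_index] |= mask
        else:
            buf[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, name, offset):
        buf = self._data.get(name, b"")
        byte_index, bit = divmod(offset, 8)
        if byte_index >= len(buf):
            return 0
        return 1 if buf[byte_index] & (0x80 >> bit) else 0

    def bitcount(self, name):
        return sum(bin(b).count("1") for b in self._data.get(name, b""))


client = InMemoryBitmaps()  # assumption: swap in redis.Redis(...) for real use
today = date(2025, 3, 30)
for user_id in [7, 12, 7]:
    mark_visit(client, "/item/42", today, user_id)

uv = page_uv(client, "/item/42", today)
print(uv)                                         # 2: user 7 sets the same bit twice
print(has_visited(client, "/item/42", today, 7))  # True
print(has_visited(client, "/item/42", today, 8))  # False
```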

Advantages: low memory use and convenient queries, including checking whether a specific user has visited. The data may be slightly inaccurate, though: for non-logged-in users, different keys may be hashed to the same id. Avoiding that requires maintaining a mapping for non-logged-in users, which adds overhead.

Disadvantages: if the users are very sparse, it may take up more memory than method 1 (Hash).
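The sparse case follows from how a bitmap is laid out: the string has to span every offset up to the highest bit that is set, so its size depends on the largest id rather than on how many visitors there are. The helper below is a hypothetical illustration of that arithmetic, not a Redis API.

```python
def bitmap_bytes(visitor_ids) -> int:
    """Bytes a bitmap must span to hold the highest offset (offsets start at 0)."""
    ids = list(visitor_ids)
    return max(ids) // 8 + 1 if ids else 0


few_but_sparse = [5, 40_000_000, 99_999_999]  # three visitors, widely spread ids
few_and_dense = [0, 1, 2]                     # three visitors, adjacent ids

print(bitmap_bytes(few_but_sparse))  # 12500000 bytes for just three visitors
print(bitmap_bytes(few_and_dense))   # 1 byte
```

With only a handful of widely spread ids, storing the ids themselves in a hash is far smaller than a bitmap sized for the whole id range.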

Use a probabilistic algorithm

For a website like Pinduoduo, where many pages each receive a large number of visits, a probabilistic algorithm is a good fit when the count does not have to be exact. For UV statistics, 100 million and 103 million are, in practice, about the same. Redis ships with HyperLogLog, a cardinality estimation algorithm. Its defining characteristic is that it does not store the actual values, only a small amount of data from which the count can be estimated.

When a user visits the website, we use the PFADD command to add the user's id to the corresponding key, and PFCOUNT then returns the final result. Because this is a probabilistic algorithm, the result has a standard error of about 0.81%.
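The PFADD/PFCOUNT calls could be wired up along these lines. The key layout and function names are assumptions for this sketch. Note that the `InMemoryCardinality` stand-in keeps an exact Python set purely so the example runs without a server; it is not HyperLogLog, and a real Redis server would return an estimate instead. With redis-py, `pfadd(name, *values)` and `pfcount(name)` have the same call shape.

```python
from datetime import date


def hll_key(uri: str, day: date) -> str:
    return f"uv:hll:{uri}:{day.isoformat()}"  # hypothetical key layout


def record_visit(client, uri: str, day: date, visitor_id: str) -> None:
    # PFADD key element: adding the same visitor again does not change the count.
    client.pfadd(hll_key(uri, day), visitor_id)


def estimated_uv(client, uri: str, day: date) -> int:
    # PFCOUNT key: on a real server this is an approximation, not an exact count.
    return client.pfcount(hll_key(uri, day))


class InMemoryCardinality:
    """Exact, set-based stand-in for PFADD/PFCOUNT (NOT a HyperLogLog)."""

    def __init__(self):
        self._sets = {}

    def pfadd(self, name, *values):
        seen = self._sets.setdefault(name, set())
        before = len(seen)
        seen.update(values)
        return int(len(seen) != before)  # 1 if the stored state changed

    def pfcount(self, name):
        return len(self._sets.get(name, set()))


client = InMemoryCardinality()  # assumption: swap in redis.Redis(...) for real use
today = date(2025, 3, 30)
for visitor in ["user:1001", "user:1002", "anon:9f3c", "user:1001"]:
    record_visit(client, "/item/42", today, visitor)

uv_estimate = estimated_uv(client, "/item/42", today)
print(uv_estimate)
```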

Advantages: it takes up very little memory; each key needs at most 12 KB. It is especially suitable for sites with a huge user base like Pinduoduo.

Disadvantages: it cannot reliably answer whether a specific user has visited, because the individual values are not stored. The total also carries some error.
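Both the 12 KB and the 0.81% figures follow from the register layout Redis uses: 16384 registers of 6 bits each is 12288 bytes (plus a small header), and the standard error of HyperLogLog is about 1.04 / sqrt(16384) ≈ 0.81%. The toy estimator below is a simplified sketch of the idea, written for illustration only. It is not Redis's implementation: it uses SHA-1 as the hash, one byte per register, and omits the bias corrections a production version applies.

```python
import hashlib
import math

PRECISION = 14              # 2**14 = 16384 registers, as in Redis
REGISTERS = 1 << PRECISION


class TinyHyperLogLog:
    """Toy HyperLogLog: keeps one small counter per register, never the values."""

    def __init__(self):
        self.registers = bytearray(REGISTERS)  # 1 byte each here; Redis packs 6 bits

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")           # 64-bit hash
        index = h >> (64 - PRECISION)                   # top bits pick a register
        rest = h & ((1 << (64 - PRECISION)) - 1)        # remaining 50 bits
        rank = (64 - PRECISION) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        m = REGISTERS
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # small-range correction
        return round(raw)


dense_bytes = REGISTERS * 6 // 8                   # 12288 bytes = 12 KB
std_error_pct = 1.04 / math.sqrt(REGISTERS) * 100  # about 0.81

hll = TinyHyperLogLog()
for i in range(100_000):
    hll.add(f"user:{i}")
for i in range(1_000):      # repeat visitors do not change the estimate
    hll.add(f"user:{i}")

estimate = hll.count()
print(dense_bytes, f"{std_error_pct:.2f}%", estimate)
```

The estimate should land close to 100,000, while the structure stays the same size no matter how many visitors are added.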

At this point, you should have a deeper understanding of how to use Redis to count unique visitors. The best way to consolidate it is to try the three methods in practice.
