In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article mainly explains "introduction and usage of BloomFilter in Redis". Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn the introduction and usage of BloomFilter in Redis.
Introduction and scenario usage
The Chinese name of BloomFilter is Bloom filter. As a filter, does it feel like Bloom's E skill in LOL (indestructible)?
The Bloom filter was proposed by a man named Bloom. It is implemented through a large array of digits and several different hash functions. We can understand the Bloom filter as an imprecise set. We all know that set can be deduplicated, and using set can help us determine whether certain elements already exist in the collection and help us implement the deduplication function.
However, while set provides precise deduplication, it also brings us a bigger problem-space consumption.
For example, when we do web crawlers, we need to de-weight the crawled url to avoid climbing to the already crawled website. If we use set, it means that we need to put all the crawled url into the collection. Suppose a url 64 bytes, then 100 million url means we need to occupy 6GB, billion is about 60GB.
Please note that it is memory.
For example, when we want to filter spam or spam messages, we need to determine whether the emails or messages are junk from the list of billions of spam mails or phone calls. If we use set at this time, then it takes up space, needless to say, at the level of 100 GB.
In the above interview, I mentioned cache traversal. Users deliberately request that the database does not exist (for example, ID =-1). If it is not processed at this time, it will certainly penetrate the cache to query the database. A query is fine. What if thousands or tens of thousands come in at the same time? Can your database hold up? So at this time, we use set for processing, taking up so much memory space, do you think it's worth it? Or is there a better way?
The three typical scenarios mentioned above, website deduplication, spam filtering, and cache penetration, can be solved perfectly by using BloomFilter.
Have you found that the above three scenarios are not very precise, especially for spam filtering? it doesn't matter if you receive a few spam messages occasionally. Like cache penetration, it also coincides with a feature of BloomFilter. He said that some may not have, but he said that if there is no ID, it really does not exist. I said that if your cache does not exist in the database, then it really does not exist. I am so confident that I have filtered you. What, you hit me?
Probe into the principle
After talking about the concepts and application scenarios for such a long time, are you still confused about how BloomFilter can be removed? Let's talk about the implementation principle of BloomFilter. First of all, I will show you a structure diagram.
F, G, H are several unbiased Hash functions, and below is a large array of digits. When we add data to BloomFilter, it will first do several hash operations on our data (key) (here is FGH), and each hash operation will get an unused index subscript of the array of digits. At this time, we will change the values of several subscript positions to 1. If you determine whether the element exists, just determine that all indexes in which the subscript has a value of 1 is fine.
In fact, you have also found that the subscript calculated by different key duplicates in BloomFilter. As shown in the figure above, this is the source of the error (you can configure the initial size and error rate to control the error). It is also the root cause of what he said is not necessarily there. He said that there is no such feature, because if all are zeros or there are zeros, then they certainly do not exist. If it is all 1s, it is also possible that several other key put in 1s.
Basic use
Because BloomFilter is an extension module of Redis, you need to download it and you can use Docker to pull it. I won't explain the installation steps in detail. You can go to its github to learn how to install.
After installation, we can use it happily.
Bf.add key element add
Bf.exists key element judges whether it exists or not.
Bf.madd key element1 element2... Batch add
Bf.mexists key element1 element2... Batch judgment
The order is simple. You can try it yourself.
HyperLogLog
Introduction and scenario usage
There is also an error-prone data structure HyperLogLog in Redis.
Let's first think about a scenario. What should we do when the boss asks us to calculate the UV of the page?
If the number of visits is not large, it is fine to use set to remove users, but if there are millions or tens of millions of visits, then you will encounter the waste of space mentioned above. If only we had a data structure that can be deduplicated and counted at this time.
That's when HyperLogLog made his debut! It can provide inaccurate deduplication counting scheme (error value is about 0.81%), inaccurate is not accurate wow, UV want you to be more accurate? 0.81% we can accept. The most important thing is that HyperLogLog takes up only 12KB memory.
Usage and scene practice
Pfadd key element add
Pfcount key calculation
Pfmerge destkey sourcekey1 sourcekey2... Merge
Commands begin with pf because it was invented by a professor named Philippe Flajolet.
You can see that these three basic commands are very simple and easy to master. Then let's put it into practice.
BitMap
Introduction and usage scenarios
First of all, let's think about a more interesting scenario. The boss wants you to count the number of days that multiple users are online at the same time in a year. What do you do at this time?
You may think of using hash storage, which is a waste of space. Is there a better way? The answer is yes, bitmap bitmaps are used in Redis.
We know that a character in a string is represented by 8 bits (as shown above). In Redis, the bottom layer of bitmap is string, or the bottom layer of string is bitmap.
If we have this, can we use it to calculate the number of times a user checks in within a specified time? That is, a location represents a day, 0 for non-check-in and 1 for check-in. In the image above, the user checked in four times in eight days.
Bitmap in Redis also provides multiple commands for bitmap to perform and, OR, XOR operations, as well as non-operations for a single bitmap. Does this give you some idea of what we need in the first place?
Use of basic commands
Setbit key index 0ax 1 sets the value of a bit
Getbit key index gets the value of a bit
Bitcount key start end gets the number of 1s in the specified range
It should be noted that start and end here refer to the character position, not the bit position! Including the following bitpos
Bitpos key bit start end gets the position of the first character index range from start to end whose value is bit
Bitop and/or/xor/not destkey key1 key2 performs logical operations on multiple bitmap.
There is also a fun instruction for bitmap is bitfield, here I do not introduce too much, interested students can learn about it.
Hands-on practice
First of all, let's implement the function of counting the number of user check-ins.
Remember our first question? Counting the number of days that multiple users are online at the same time in a year, we have nothing to be afraid of with bitmap.
GeoHash
Introduction and scene application
GeoHash is often used to calculate nearby people and nearby stores.
Imagine if we use a relational database to store the address (id, longitude, latitude) of an element. How should we calculate the people in the neighborhood at this time? Do we have to traverse the positions of all the elements and calculate the distance? This is obviously impossible.
Of course, you can use partition and SQL statements to circle areas, and then build a two-way composite index to improve performance, but the concurrency of the database is limited, can we use Redis to do it?
The answer is yes, GeoHash is used in Redis to provide a good solution. The specific principle is to regard the earth as a plane and map two-dimensional coordinates to one-dimensional (the reason for the loss of accuracy). If you are interested in the algorithm, you can learn more about it yourself. The space is limited and you don't want to explain too much.
Basic commands and use of actual combat
Geoadd key longitude latitude element (multiple triples can be configured later) to add elements
Geodist key element1 element2 unit calculates the distance between two elements
Geopos key element [element] gets the location of the element
Geohash key element gets the element hash
Georadiusbymember key element distanceValue unit count countValue ASC/DESC [withdist] [withhash] [withcoord] to get an element near an element, you can append the following option [distance] [hash] [coordinate]
Georadius key longitude latitude distanceValue unit count countValue ASC/DESC [withdist] [withhash] [withcoord] is the same as above, but the element is changed to the specified coordinate value.
At this point, I believe you have a deeper understanding of the "introduction and use of BloomFilter in Redis". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.