Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to realize the automatic completion function by using Redis

2025-04-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Forget the version from which redis is opened and can give prompts according to some of the command prefixes entered, that is, automatic completion. Next, the author introduces the implementation of this cool feature based on redis.

About sorted set

Suppose there is mara,marabel,marcela in the result. Now we type mar, and we get these three names, and the output is sorted by dictionary. Before implementing this requirement, let's briefly introduce sorted set.

Everyone knows that sorted set is sorted by score:

127.0.1 sida127.0.0.1:6380 > zadd test 80 xiaolang127.0.0.1:6380 > zadd test 60 afei127.0.0.1:6380 > zadd test 90 chenssy127.0.0.1:6380 > zadd test 98 yunaiv127.0.0.1:6380 > zrange test 0-11) "afei" 2) "xiaolang" 3) "sida" 4) "chenssy" 5) "yunaiv"

But if the score are all the same, what sort of sorted set is it? Is sorted by dictionary:

127.0.1 zadd exam 0 sida127.0.0.1:6380 > zadd exam 0 xiaolang127.0.0.1:6380 > zadd exam 0 chenssy127.0.0.1:6380 > zadd exam 0 yunaiv127.0.0.1:6380 > zadd exam 0 afei127.0.0.1:6380 > zrange exam 0-11) "afei" 2) "chenssy" 3) "sida" 4) "xiaolang" 5) "yunaiv"

This is a very important feature of sorted set and a key point of our automatic completion requirements. But that's not enough. there's still a long way to go before our final needs. Fortunately, sorted set has another cool command: ZRANK. This command will know the location of the key you want to query in the sorted set:

127.0.0.1 zrank exam sida 6380 > zrank exam sida (integer) 2127.0.0.1 zrank exam yunaiv (integer) 4127.0.1 zrank exam yunaiv 6380 > zrange exam 2 41) "sida" 2) "xiaolang" 3) "yunaiv"

At this point, it feels very close to the first version of our automatic completion, and we can get any member and N member in sorted set sorted by dictionary.

Simple implementation

In order to achieve the final automatic completion, we need to pay some price: space.

It means that for a member to be added to the sorted set, such as afei, we not only add the complete word (afei) to the sorted set, but also add all possible prefixes (a, af, afe, afei). In order to figure out whether a word is a real member or a prefix of a member (for example, bar is both a complete word and a possible prefix for a member such as bark), we add a trick to add a special character, such as "$", after the real member, so the member of afei adds the following words:

A, af, afe, afei, afei$

Now suppose we need to add three words: foo, bar, and foobar to do some testing, then sorted set will have the following words:

127.0.1 zrange autoc 0-11) "b" 2) "ba" 3) "bar" 4) "bar$" 5) "f" 6) "fo" 7) "foo" 8) "foo$" 9) "foob" 10) "fooba" 11) "foobar" 12) "foobar$"

It's a lot closer to our ultimate goal. Now suppose the user enters "fo", then in order to achieve automatic completion, we need to execute the following command to examine the results carefully. Foo$ and foobar$ are the results we need. We just need to remove the special suffix $(all other words that do not end in $are ignored):

127.0.0.1 zrange autoc 6380 > zrank autoc fo (integer) 5127.0.1 zrange autoc 5-11) "fo" 2) "foo" 3) "foo$" 4) "foob" 5) "fooba" 6) "foobar" 7) "foobar$"

A test for more words

The website http://antirez.com/misc/female-names.txt provides nearly 5000 female names. Add all possible prefixes for all names to the sorted set according to the rules mentioned earlier. Assuming the user enters member, you only need to use the following two commands to get the 50 words after the member entered by the user sorted by the dictionary, and then take only the words ending with $:

127.0.0.1 zrange autoc 6380 > zrank autoc member (integer) 8127.0.1

Time-space complexity

According to the official documentation, the event complexity of both zrank and zrange is O (log (N)). Therefore, even if the amount of data is relatively large, this scheme is feasible.

ZRANK key member

Starting version: 2.0.0

Time complexity: O (log (N))

ZRANGE key start stop [WITHSCORES]

Starting version: 1.2.0

Time complexity: O (log (N) + M) with N being the number of elements in the sorted set and M the number of elements returned.

So how much memory do you need?

Suppose that in the worst case, a word of length M needs to be added to the sorted set. So if there are N words, a total of N * (Ma+1) words need to be added to the sorted set, and Ma is the average length of these N words.

Fortunately, the actual situation is much better than this. For nearly 5000 female names, for example, only about 15000 words need to be added to sorted set, because these nouns have a lot of repetitive prefixes. Take three names as an example: marci,marcia,marcile. In a worst-case scenario, you need to add 3 * (6: 1) = 21 words to the sorted set, but in reality, you only need to add 11 words to the sorted set:

M, ma, mar, marc, marci, marci$, marcia, marcia$, marcil, marcile, marcile$ .

Moreover, as the sample becomes larger, the repeated prefixes become larger and more.

Therefore, the memory required is also OK. Do you feel good? Ha

Query and forecast

Our automatic completion is sorted according to the dictionary, which is what we need most of the time. But there are also cases where we want to sort them by similarity. For example, in google search, search results are sorted by dimensions such as frequency and heat. For example, when the user enters ne, the user should want to see these popular keywords such as Netflix,news,new york times.

This is not so easy, and if we are to be able to sort by frequency, we need a special sorted set that can update each prefix in real time in a non-blocking manner:

When the user enters a query such as "foo", because all its prefixes are: "f", "fo", "foo". Then execute the command for each prefix: ZINCRBY 1 foo; if the user enters "foobar", execute for each prefix: ZINCRBY 1 foobar

Another problem with this method is that many automatic completion systems only need to display TOP N keywords, assuming that N is 10. But in our method, if we want to calculate TOP 10, we need to get much more than 10 keywords, and theoretically we want this prefix to be taken out of all the member in the sorted set of key.

Fortunately, statistically speaking, if there are 300 member prefixes in each sorted set, you get the TOP 5 keyword. The more frequent the query, the higher its score, and the higher the probability that it will eventually be selected. So, what we need to do is to search for each prefix of the string:

If the number of member in the sorted set with the prefix as key has not reached 300, you only need to simply zincrby it. If the number of member in the sorted set with the prefix as key has reached 300, we will delete those member with low scores and add new member, so as to achieve the effect of continuous iteration of keyword frequency.

This method may not be understood at once, it doesn't matter, give me a chestnut. Suppose the user has now typed next, and there are already netflix (100), news (120), new york (80), near (23), nequ (1) in the sorted set whose prefix n is key. Since the number of member in the sorted set corresponding to this prefix is not yet 300, execute: zincrby n 1 next. Where n is the prefix and next is the keyword entered by the user. Suppose that the user now enters next, and the sorted set with the prefix n of key already has netflix (100,120), new york (80), near (23), omitting 295keywords with score greater than 1, nequ (1). Since the number of member in the sorted set corresponding to this prefix reaches 300, delete the lower-scoring nequ first, and then execute: zincrby n 1 next.

This method is closely related to the distribution of keywords entered by users. If the frequency of all keywords entered by users is close, then the data obtained by this method is not very reliable. But we know that this is not a problem, because search is that the vast majority of people are searching for that small set of keywords.

Clean-up phase

Because of the long-tailed search effect mentioned above, we can talk about the removal of member with a score of 1, because users pay little attention to these keywords. Cleanup operations also reduce the use of memory, allowing redis to save more and more useful data.

Knowledge summary

In the sorted set data structure, if the score of the member is the same, it is sorted by dictionary. The zrank command gets the position of the specified member in the result, and can take N member after it. Zincrby can add points to the member in the specified sorted set.

Reference: http://oldblog.antirez.com/post/autocomplete-with-redis.html

Summary

The above is the whole content of this article. I hope the content of this article has a certain reference and learning value for everyone's study or work. Thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report