How to use Redis+Bitmap to realize 100 million-level massive data statistics

2025-01-21 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article shows how to use Redis and Bitmap to compute statistics over data sets on the order of 100 million records. The editor found it practical and is sharing it here for reference.

Send a message

Share more and give more. Creating value for others early on, without counting the return, pays off many times over in the long run.

Especially when you start working with others, don't worry about short-term returns; they matter little. What matters more is training your vision, perspective and problem-solving skills.

Binary state statistics

Brother Ma, what is binary state statistics?

It means each element in the collection can only take the value 0 or 1. For check-ins or login status, you only need to record checked in (1) or not (0), logged in (1) or not (0).

Suppose we track whether a user is logged in with Redis's String type (key -> userId, value -> 0 for offline, 1 for logged in). Storing the login status of 1 million users then takes 1 million strings, which costs too much memory.

Brother Ma, why does the String type cost a lot of memory?

In addition to recording the actual data, the String type also requires additional memory to record data length, space usage and other information.

When the saved data contains strings, the String type is saved using a simple dynamic string (SDS) structure, as shown in the following figure:

len: 4 bytes, the used length of buf.

alloc: 4 bytes, the length actually allocated for buf, usually > len.

buf: a byte array that holds the actual data. Redis automatically appends a "\0" to the end of the array, which costs one extra byte.

So, in addition to buf saving the actual data in SDS, len and alloc are extra overhead.

In addition, there is an overhead of the RedisObject structure, because Redis has many data types, and different data types have some of the same metadata to record (such as the time of the last access, the number of references, and so on).

Therefore, Redis uses a RedisObject structure to uniformly record the metadata and point to the actual data.

For the binary state scenario, we can use Bitmap instead. For example, login status is represented by one bit per user, so 100 million users occupy only 100 million bits of memory ≈ (100000000 / 8 / 1024 / 1024) ≈ 12 MB.

The approximate formula for calculating the space occupation is: ($offset/8/1024/1024) MB
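The formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name bitmap_size_mb is an illustrative assumption, and the real memory footprint in Redis will be slightly higher because of the SDS and RedisObject overhead described earlier.

```python
def bitmap_size_mb(max_offset: int) -> float:
    """Approximate size in MB of a bitmap whose highest used offset is max_offset.

    Hypothetical helper: one bit per offset, 8 bits per byte,
    1024 * 1024 bytes per MB. Metadata overhead is ignored.
    """
    return max_offset / 8 / 1024 / 1024


if __name__ == "__main__":
    for users in (1_000_000, 100_000_000):
        print(f"{users:,} offsets -> {bitmap_size_mb(users):.2f} MB")
```

Because the size depends on the largest offset rather than on how many bits are set, a single very large user ID allocates the whole range below it.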

What is Bitmap?

The underlying data structure of Bitmap is a String-type SDS that holds an array of bits. Redis uses all eight bits of each byte in the byte array, and each bit represents the binary state of one element (either 0 or 1).

Think of Bitmap as an array whose unit is the bit. Each cell of the array can only store 0 or 1, and the array index is called the offset in Bitmap.

To visualize it, picture each byte of the buf array as a row with 8 cells, one for each of the 8 bits in that byte, as shown in the following figure:

Eight bits make up one byte, so Bitmap saves a lot of storage space. This is the advantage of Bitmap.

Judge the user's login status

How to use Bitmap to determine whether a user is online among a large number of users?

Bitmap provides the GETBIT and SETBIT operations, which read and write the bit at a given offset in the bit array. Note that offset starts at 0.

A single key, login_status, is enough to store the login status of all users. Use the user ID as the offset, set the bit to 1 when the user is online and to 0 when offline, and check whether a user is online with GETBIT. 50 million users need only about 6 MB of space.

SETBIT command

SETBIT key offset value

Sets or clears the bit value of key's value at offset (only 0 or 1).

GETBIT command

GETBIT key offset

Gets the bit at offset in the value stored at key, and returns 0 if key does not exist.
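The semantics of the two commands can be modelled in memory as follows. This is a minimal sketch, not Redis's implementation: the class name Bitmap and its method names are assumptions made for illustration. It does follow Redis's bit ordering, where offset 0 is the most significant bit of the first byte, and it returns the previous bit from setbit as SETBIT does.

```python
class Bitmap:
    """In-memory model of a Redis bitmap (illustrative only)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset to 0 or 1 and return the previous bit."""
        if value not in (0, 1):
            raise ValueError("bit value must be 0 or 1")
        byte_index, bit_index = divmod(offset, 8)
        # Grow the buffer with zero bytes so the target byte exists.
        if byte_index >= len(self.buf):
            self.buf.extend(bytes(byte_index + 1 - len(self.buf)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        previous = 1 if self.buf[byte_index] & mask else 0
        if value:
            self.buf[byte_index] |= mask
        else:
            self.buf[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset: int) -> int:
        """Return the bit at offset; bits beyond the buffer read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.buf):
            return 0
        return 1 if self.buf[byte_index] & (0x80 >> bit_index) else 0
```

With this model, calling setbit(10086, 1) on an empty Bitmap grows the buffer to 1261 bytes, which shows why the largest offset, not the number of set bits, determines the memory used.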

Suppose we want to track the login status of the user with ID = 10086.

The first step is to execute the following command to mark the user as logged in.

SETBIT login_status 10086 1

The second step is to check whether the user is logged in, and the return value of 1 indicates that the user is logged in.

GETBIT login_status 10086

The third step is to log the user out by setting the bit at that offset to 0.

SETBIT login_status 10086 0

Monthly check-in status of users

For check-in statistics, each day's check-in is represented by 1 bit, so a year of check-ins needs only 365 bits. A month has at most 31 days, so 31 bits are enough.

For example, how do we record and count the check-ins of the user with ID 89757 in May 2021?

The key can be designed as uid:sign:{userId}:{yyyyMM}, and the day of the month minus 1 is used as the offset (because offset starts at 0, offset = day - 1).

The first step is to execute the following command to record that the user signed in on May 16, 2021.

SETBIT uid:sign:89757:202105 15 1

The second step is to check whether user 89757 signed in on May 16, 2021.

GETBIT uid:sign:89757:202105 15

The third step is to count how many times the user signed in in May with the BITCOUNT command, which counts the number of bits set to 1 in a given bit array.

BITCOUNT uid:sign:89757:202105

This gives us each user's monthly sign-in record. Isn't it great?
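The key design and the three steps can be sketched in Python as below. A plain dictionary stands in for Redis here, so the store variable and all function names are hypothetical; only the key pattern uid:sign:{userId}:{yyyyMM} and the rule offset = day - 1 come from the text.

```python
from datetime import date

# Hypothetical in-memory stand-in for Redis: key -> bytearray of bits.
store = {}


def sign_key(user_id: int, day: date) -> str:
    """Build the key uid:sign:{userId}:{yyyyMM} for the month of `day`."""
    return f"uid:sign:{user_id}:{day:%Y%m}"


def sign_offset(day: date) -> int:
    """Offsets start at 0, so the 1st of the month maps to offset 0."""
    return day.day - 1


def sign_in(user_id: int, day: date) -> None:
    """Model of SETBIT key offset 1."""
    # 4 bytes = 32 bits, enough for the 31 days of the longest month.
    buf = store.setdefault(sign_key(user_id, day), bytearray(4))
    offset = sign_offset(day)
    buf[offset // 8] |= 0x80 >> (offset % 8)


def has_signed_in(user_id: int, day: date) -> bool:
    """Model of GETBIT key offset; a missing key reads as 0."""
    buf = store.get(sign_key(user_id, day))
    if buf is None:
        return False
    offset = sign_offset(day)
    return bool(buf[offset // 8] & (0x80 >> (offset % 8)))


def sign_count(user_id: int, month: date) -> int:
    """Model of BITCOUNT key: number of bits set to 1 in the month."""
    buf = store.get(sign_key(user_id, month), b"")
    return sum(bin(byte).count("1") for byte in buf)
```

Putting the month in the key keeps every Bitmap at 4 bytes and makes it straightforward to expire old months independently.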

How do we find the date of the first sign-in this month?

Redis provides the BITPOS key bitValue [start] [end] command, which returns the offset of the first bit in the Bitmap whose value is bitValue.

By default the command scans the entire Bitmap; the optional start and end parameters (given in bytes by default) restrict the range to scan.

So we can get the date of the first sign-in of userId = 89757 in May 2021 by executing the following command:

BITPOS uid:sign:89757:202105 1

Note that we need to add 1 to the returned value to get the day of the month, because offset starts at 0.
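The lookup and the +1 adjustment can be sketched as follows. This models only the case of searching for a 1 bit over the whole Bitmap, where BITPOS returns -1 if no bit is set; the function names first_set_bit and first_sign_in_day are assumptions for illustration.

```python
from typing import Optional


def first_set_bit(buf: bytes) -> int:
    """Return the offset of the first bit equal to 1, or -1 if none is set."""
    for byte_index, byte in enumerate(buf):
        if byte:
            # Bits are numbered from the most significant bit of each byte,
            # so the count of leading zeros is the position inside the byte.
            return byte_index * 8 + (8 - byte.bit_length())
    return -1


def first_sign_in_day(buf: bytes) -> Optional[int]:
    """Convert the 0-based offset into a 1-based day of the month."""
    offset = first_set_bit(buf)
    return None if offset == -1 else offset + 1
```

For a month in which the only sign-in is on the 16th, the monthly Bitmap is the bytes 00 01 00 00, the first set bit is at offset 15, and the function reports day 16.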

Total number of consecutive check-in users

After recording the check-in data of 100 million users over 7 consecutive days, how do we count the users who signed in on all 7 days?

We use each day's date as the Bitmap key and the userId as the offset, and set the bit at that offset to 1 when the user signs in.

Each bit in the Bitmap stored at a key is then one user's sign-in record for that date.

There are seven such Bitmaps, and we perform an AND operation on their corresponding bits.

The same userId always maps to the same offset. When the bit at a userId's offset is 1 in all 7 Bitmaps, that user has signed in for 7 consecutive days.

The result is saved to a new Bitmap, and we count its bits equal to 1 with BITCOUNT to get the total number of users who signed in for 7 consecutive days.

Redis provides the BITOP operation destkey key [key ...] command, which performs bitwise operations on the Bitmaps stored at one or more keys.

operation can be AND, OR, NOT or XOR. When BITOP handles strings of different lengths, the missing part of the shorter string is treated as 0. A nonexistent key is likewise treated as a string of zeros.

It is easy to understand, as shown in the following figure:

The corresponding bits of the 3 Bitmaps are ANDed, and the result is saved to a new Bitmap.

The commands below AND the three Bitmaps, save the result to destmap, and then run BITCOUNT on destmap.

// AND operation
BITOP AND destmap bitmap:01 bitmap:02 bitmap:03
// count the number of bits = 1
BITCOUNT destmap

A quick estimate of the memory cost: one Bitmap of 100 million bits takes about 12 MB (10^8 / 8 / 1024 / 1024), so 7 daily Bitmaps take about 84 MB. It is also best to set an expiration time on each Bitmap so that Redis deletes expired check-in data and frees memory.
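The AND-then-count approach can be modelled in Python on raw bytes. This is a sketch under assumptions, not Redis's code: the function names are illustrative, and the model reproduces only the behaviours stated above, namely that shorter inputs are treated as zero-padded and that the result is then counted bit by bit.

```python
from typing import Sequence


def bitop_and(bitmaps: Sequence[bytes]) -> bytes:
    """AND the bitmaps byte by byte; shorter ones are padded with zeros."""
    if not bitmaps:
        return b""
    length = max(len(bitmap) for bitmap in bitmaps)
    result = bytearray(b"\xff" * length)
    for bitmap in bitmaps:
        padded = bitmap.ljust(length, b"\x00")
        for i in range(length):
            result[i] &= padded[i]
    return bytes(result)


def bitcount(bitmap: bytes) -> int:
    """Count the bits set to 1."""
    return sum(bin(byte).count("1") for byte in bitmap)


def users_signed_in_every_day(daily_bitmaps: Sequence[bytes]) -> int:
    """Number of offsets (user IDs) whose bit is 1 in every daily bitmap."""
    return bitcount(bitop_and(daily_bitmaps))
```

For example, with three daily bitmaps f0, b0 80 and 30 (hex), the AND is 30 00, so two users (offsets 2 and 3) signed in on all three days. A user whose ID lies beyond the end of a shorter bitmap counts as absent that day because of the zero padding.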
