What is the way to quickly find an integer among 10 million integers? 07/19 Update SLTechnology News&Howtos

What is the way to quickly find an integer among 10 million integers?

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/02 Report--

This article mainly explains "what is the way to quickly find an integer among 10 million integers". Interested friends might as well take a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "what is the way to quickly find an integer in 10 million integers?"

An interesting interview question:

Suppose that when we need to quickly find whether an integer exists in 10 million integers (integers range from 100 million to 100 million), how to quickly find and judge will be more convenient?

Ps: data of int type takes up four bytes of space when it is stored. Assuming that the stored data ranges from 100 to 100 million, the storage space is approximately 100,100,100,000,000 byte. The consumption of storage space is about 40000 kb 40mb.

1. Hash table structure scheme

To achieve this function through the hash table is of course possible, but the hash table in addition to the need to store the corresponding numbers, but also need to store the corresponding linked list pointers, each number are stored in the int type, the memory size of the number alone is about 40mb, if the storage space is added to the pointer, then the memory size is probably 80mb or so.

two。 Boolean array structure scheme

Store by using an array of Boolean types. First create a 10 million-length Boolean array, and then locate the corresponding subscript assignment true according to the integer, then later in the traversal, according to the corresponding subscript to find the specified variable, if the value of the variable is true, then prove the existence of the number.

Boolean type variables are stored only by true and false, but they actually take up 1 byte when stored in an array. After being compiled, this type of variable is actually stored with int type data 0000 0000 and 0000 0001, so it still takes up a lot of memory space.

3. Use bitmap for storage

For example, for a bitmap with a length of 10, each bit bit stores ten shaping digits from 0 to 9, and the bit corresponding to each bitmap is 0. When the number we deposit is 3, the position at 3 becomes 1.

Picture

When we store multiple numbers, for example, when we store 4, 3, 2, and 1, the bitmap structure will look like this:

Picture

So at this point you may be wondering, how on earth can you calculate the index location of a number on the code?

First of all, let's post the code of the core idea:

Public void set (int number) {

/ / it is equivalent to moving a number 3 digits to the right, which is equivalent to dividing by 8

Int index = number > > 3

/ / equivalent to the position where number% 8 gets the byte [index]

Int position = number & 0x07

/ / perform | or operate as long as one of the two objects participating in the operation has a value of 1.

Bytes [index] | = 1 > 3

Int position = number & 0x07

Return (bytes [index] & (1 3)

Int position = number & 0x07

Return (bytes [index] & (1 setbit bitmap-01 999 0)

(integer) 0

127.0.0.1 setbit bitmap-01 6379 > 9991

(integer) 0

127.0.0.1 6379 > setbit bitmap-01 1003 1

(integer) 0

127.0.0.1 6379 > setbit bitmap-01 1003 0

(integer) 1

127.0.0.1 6379 > setbit bitmap-01 1004 0

(integer) 0

Note:

The 1.offset parameter must be greater than or equal to 0 and less than 2 ^ 32 (bit mapping is limited to 512 MB).

two。 Because the size of the Redis string is limited to 512 megabytes (megabytes), the maximum offset that a user can use is 2 ^ 29-1 (536870911). If you need to use more space than this, use multiple key.

3. When generating a very long string, Redis needs to allocate memory space, which can sometimes cause server block. On the 2010 Macbook Pro, setting the offset to 536870911 (512MB memory allocation) will take about 300ms, setting the offset to 134217728 (128MB memory allocation) will take about 80ms, setting the offset 33554432 (32MB memory allocation) will take about 30ms, and setting the offset to 8388608 (8MB memory allocation) will take about 8ms.

Getbit instruction

Syntax: getbit key offset

Return value:

127.0.0.1 setbit bm 6379 > 0 1

(integer) 0

127.0.0.1 getbit bm 6379 > 0

(integer) 1 bitcount instruction

Syntax: bitcount key [start] [end], where the start and end values are optional

Return value: number of bits set to 1

127.0.0.1 purl 6379 > bitcount user18

(integer) 4 bitop instruction

Syntax: bitop operation destkey key [key...]

Operation can be any of the four operations: AND, OR, NOT, XOR:

BITOP AND destkey key [key...], logically merge one or more key and save the result to destkey.

BITOP OR destkey key [key...], logically OR one or more key, and save the result to destkey.

BITOP XOR destkey key [key...], logically XOR one or more key, and save the result to destkey.

BITOP NOT destkey key, logically negate the given key, and save the result to destkey. Actual combat case

Count the number of check-ins of a user in the xx forum. It is assumed that the user's id is U18.

127.0.0.1 setbit user18 6379 > 1 1

(integer) 0

127.0.0.1 setbit user18 6379 > 1 1

(integer) 0

127.0.0.1 setbit user18 6379 > 1 1

(integer) 0

127.0.0.1 setbit user18 6379 > 15 1

(integer) 0

127.0.0.1 purl 6379 > bitcount user18

(integer) 4

The check-in days of user18 can be quickly calculated from redis by using the bitcount instruction.

(ps: but if you want to calculate the number of consecutive check-in days, or record more check-in information, and take into account the reliable stability of data storage, then bitmap is not very applicable at this time. Here I just simulate a case for you to learn the use of this instruction.)

Daily statistics of active users of the website

The design idea is to use days as the statistical key, and then use the user ID as the Offset. Once the user logs in to visit the website, the corresponding offset is calculated according to the user id and set to 1.

/ / 11034 of users visited the website on October 21, 2020

127.0.1 6379 > setbit 20201021 11034 1

(integer) 0

/ / 11089 of users visited the website on October 21, 2020

127.0.1 6379 > setbit 20201021 11089 1

(integer) 0

/ / 11034 of users visited the website on October 21, 2020

127.0.1 6379 > setbit 20201022 11034 1

(integer) 0

/ / 11032 of users visited the website on October 21, 2020

127.0.1 6379 > setbit 20201023 11032 1

(integer) 0

/ / the intersection of multiple sets is calculated by using bitop or to calculate the period from October 21, 2020 to October 22, 2020.

Id of the user who has logged in continuously within / / and put it into the bitmap named active_user

127.0.0.1 6379 > bitop or active_user 20201021 20201022

(integer) 1387

/ / extract active user information

127.0.0.1 purl 6379 > bitcount active_user

(integer) 2

127.0.0.1 6379 > user online status, online population statistics / / simulated user 98321 access to the system

127.0.0.1 6379 > setbit online_member 98321 1

(integer) 0

127.0.0.1 6379 > setbit online_member 13284 1

(integer) 0

127.0.0.1 6379 > setbit online_member 834521 1

(integer) 0

127.0.0.1 purl 6379 > bitcount online_member

(integer) 3

/ / 13284 of the simulated users leave the system, and the number of online users is reduced by 1.

127.0.0.1 6379 > setbit online_member 13284 0

(integer) 1

127.0.0.1 purl 6379 > bitcount online_member

(integer) 2

127.0.0.1purl 6379 >

This instruction case takes the member id as the incoming offset location and stores it in the incoming bitmap. By setting the corresponding to bitmap, when the corresponding member logs in and visits, the corresponding offset location is set to 1. Finally, the number of items in the bitmap is counted as 1 through bitcount, thus the number of users online is calculated.

Bloom filter provided by Guava

Corresponding dependencies:

Com.google.guava

Guava

22.0

After introducing related dependencies, you can understand the Bloom filter component provided in guava by writing a simple case:

Packageorg.idea.guava.filter

Import com.google.common.hash.BloomFilter

Import com.google.common.hash.Funnels

/ * *

* @ author idea

* @ date created in 11:05 2020-10-25

, /

Public class GuavaBloomFilter {

Public static void main (String [] args) {

/ / the number of elements expected to be processed and the most expected probability of false positives to be passed in when creating the Bloom filter.

BloomFilter bloomFilter = BloomFilter.create (

Funnels.integerFunnel ()

100, / / the number of elements you want to store

0.01 / / probability of false positives of hope

);

BloomFilter.put (1)

BloomFilter.put (2)

BloomFilter.put (3)

BloomFilter.put (4)

BloomFilter.put (5)

/ / determine whether there is a corresponding element in the filter

System.out.println (bloomFilter.mightContain (1))

System.out.println (bloomFilter.mightContain (2))

System.out.println (bloomFilter.mightContain (1000))

}

In this code, you may wonder, why mightContain instead of contain?

This actually has something to do with the misjudgment mechanism of the Bloom filter itself.

Common scenarios for Bloom filters are:

Some scenarios with cache penetration can be used as a precaution.

Its own advantages:

Less storage space is consumed.

The time efficiency is high, and the time complexity of query and insertion is O (h). (h is the number of hash functions)

Disadvantages:

The probability of querying whether an element exists does not guarantee 100% accuracy, and there will be some misjudgments.

Elements can only be inserted and queried, and deletions are not allowed.

At this point, I believe you have a deeper understanding of "what is the way to quickly find an integer in 10 million integers". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.