Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to resolve server hash conflicts

2025-01-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly explains "how to resolve server hash conflicts". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn how to resolve server hash conflicts.

I. Overview of hash tables

The hash function of the hash table enters a key and returns an index of the hash table. The set of possible keys is large, but the set of hash function values is only the size of the table.

Other uses of hash functions include cryptography, message digest systems, and digital signature systems. In order for these applications to work as expected, the probability of conflicts must be very low, so a hash function with a very large set of possible values is required.

Cryptosystem: given a user's password, the operating system calculates its hash and compares it with the user's hash stored in the file. Don't let the password be easily guessed and hash to the same value.

Message digest system: given an important message, calculate its hash and publish it separately from the message itself. Readers who want to check the validity of the message can also use the same algorithm to calculate its hash and compare it with the published hash. Don't expect to falsify messages easily and still get the same hash.

Popular hash function algorithms for these applications are:

Md5: 2 ^ 128 values (to find a conflict key, you need to hash about 2 ^ 64 values)

Sha-1: 2 ^ 160 values (about 2 ^ 80 values are needed to find a conflict key)

II. Hash conflict

Let's look at a simple example. Suppose we use the hash function: h (K) = K mod M, and insert these values: 217,701,19,30,145

H (K) = 217% 7 = 0

H (K) = 701% 7 = 1

H (K) = 19% 7 = 2

H (K) = 30% 7 = 2

H (K) = 145% 7 = 5

In the above example, it is obvious that 19 and 30 clashed.

III. Conflict resolution strategy

Unless you want to do a perfect hash, you must have a conflict resolution strategy to handle conflicts in the table.

At the same time, the policy must allow finding, inserting, and deleting operations that run correctly!

Conflict resolution techniques can be divided into two categories: open hash method (open hashing, also known as zipper method, separate chaining) and closed hash method (closed hashing, also known as open address method, open addressing). The difference between the two methods is that the open hash method stores the conflicting keys outside the main table of the hash table, while the closed hash method stores the conflicting keys in another slot in the table.

Here are some popular hash conflict resolution strategies in the industry:

Linear detection (Linear probing)

Double hash (Double hashing)

Random hash (Random hashing)

Detach Link (Separate chaining)

The above linear detection, double hashing and random hashing are all closed hashing methods, while separated links are open hashing methods.

1. Linear detection (Linear probing)

Insert a value

When inserting key K in a table of size M using the hash function H (K):

Set indx = H (K)

If the table location indx already contains a key, you do not need to insert it. Over

Otherwise, if the table location indx is empty, the key is inserted into it. Over

Other collisions. Set indx = (indx + 1) mod M.

If indx = = H (K), the table is full! We can only expand the capacity of the hash table.

Therefore, linear detection is basically a linear search for empty slots when a collision occurs.

Pros: easy to implement; always find a location (if any); average performance is very good when the table is not very full.

Cons: "cluster" or "cluster" keys are formed in adjacent slots of the table; when these clusters fill most of the entire array, performance degrades seriously, because the work performed by the probe sequence is actually an exhaustive search of most of the array.

Simple example

Such as hash table size M = 7, hash function: h (K) = K mod M. Insert these values: 701,145,217,19,13,749

H (K) = 701% 7 = 1

H (K) = 145% 7 = 5

H (K) = 217% 7 = 0

H (K) = 19% 7 = 2

H (K) = 13% 7 = 1 (conflict)-> 2 (already has a value)-> 3 (insert position 3)

H (K) = 749% 7 = 2 (conflict)-> 3 (already has a value)-> 4 (insert position 4)

It can be seen that if the hash table is not very large, with the data insertion, the conflict will also occur, the probe traversal times will gradually become lower, and the retrieval process will become exhaustive.

Retrieve a value

If you use linear probes to insert keys into the table, linear probes will find them!

When you use the hash function H (K) to search for key K in a table of size N:

Set indx = H (K)

Returns FOUND if the table location indx contains keys.

Otherwise, if the table location indx is empty, NOT FOUND is returned.

Otherwise, set indx = (indx + 1) modM.

If indx = = H (K), NOT FOUND is returned. We can only expand the capacity of the hash table.

Question: how do I remove keys from a table that uses linear probes?

Can I make a "delayed deletion" and just mark the slot of the deleted key as empty?

Obviously, linear detection is difficult to do, if the position is set to empty, then if the subsequent values are also hash conflicts, linear probe insertion, then these values can no longer be traversed.

2. Double hash (Double hashing)

Linear detection conflict resolution results in clusters in the table because if two keys collide, the next position detected is the same for both keys.

The idea of double hashing: offset to the next detected location depends on the key value, so it can be different for different keys.

The second hash function H 2 (K) needs to be introduced as the offset in the probe sequence (linear probes are treated as double hashes of H 2 (K) = = 1).

For a hash table of size M, the value of H2 (K) should be in the range of 1 to Mmur1; if M is a prime, a common choice is H2 (K) = 1 + ((K / M) mod (Mmur1)).

Then, the insertion algorithm for double hashes is:

Set indx = H (K); offset = H 2 (K)

If the table location indx already contains a key, you do not need to insert it. Over

Otherwise, if the table location indx is empty, the key is inserted into it. Over

Other collisions. Set indx = (indx + offset) mod M.

If indx = = H (K), the table is full! We can only expand the capacity of the hash table.

The hash table is prime, and double hash is very effective in practice.

Double Hash can also be seen: https://blog.csdn.net/chenxuegui1234/article/details/103454285

3. Random hash (Random hashing)

Like double hashes, random hashes avoid clustering by making the probe sequence depend on the key.

When using random hashes, the probe sequence is generated by the output of the pseudo-random number generator seeded by the key (possibly with another seed component, which is the same for each key, but different for different tables).

Then, the insertion algorithm for random hashing is:

Create a RNG with K as the seed. Set indx = RNG.next () mod M.

If the table location indx already contains a key, you do not need to insert it. Over

Otherwise, if the table location indx is empty, the key is inserted into it. Over

Other collisions. Set indx = RNG.next () mod M.

If all M locations have been detected, discard. We can only expand the capacity of the hash table.

Random hashing is easy to analyze, but it is not often used because of the "cost" generated by random numbers. Double hashes are still often used in practice.

4. Separate links (Separate chaining)

When inserting key K into a table with a hash function H (K)

Set indx = H (K)

Inserts a keyword into the list of links titled indx. (search the list first to avoid repetition. )

When searching for key K in a table with a hash function H (K)

Set indx = H (K)

Use a linear search to search for keywords in a linked list with the title indx.

When using the hash function H (K) to delete the key K in the table

Set indx = H (K)

Delete the key with the title indx in the link list

Advantages: as the number of entries increases, the average case performance remains good. Even more than M; deletion is easier to achieve than open addresses.

Disadvantages: the need for dynamic data, in addition to data also need to store pointers, poor locality, resulting in poor cache performance.

Obviously, Java7's HashMap is a way to split links.

Split chain hash analysis

Remember the load factor measure of the fill degree of the table: α = N / M.

Where M is the size of the table and N is the number of keys inserted in the table.

Through separate links, we can make α > 1 given load factor α, and we want to know the time cost in the best, average and worst cases.

Successfully found

The new key insert and lookup failed (these are the same), at best O (1) and at worst O (N). Let's analyze the average.

Average cost of split links

A table assuming that the load coefficient is α = N / M

A total of N items are assigned in the M link list (some of which may be empty), so the average number of items per link list is:

If the find / insert fails, one of the linked lists in the table must be exhaustively searched, and the average length of the linked list in the table is α. Therefore, the average number of times to insert or find unsuccessfully using a separate link is

After a successful search, the list of links containing the target key is searched. In addition to the target keys, there are an average of (Nmur1) / M keys in the list; on average, half of them will be searched before the target is found. Therefore, the average number of successful comparisons found using individual links is

When α

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report