What are the knowledge points about hashing in Java?

2025-04-28 Update From: SLTechnology News&Howtos

What is a hash?

Hashing is the process of turning input of arbitrary length into output of fixed length through some algorithm. The output is called the hash value or hash code, the algorithm is called the hash algorithm or hash function, and the process itself is generally called hashing, or computing the hash.

Since the output has a fixed length, the set of possible inputs is unbounded while the set of possible outputs is finite, so different inputs will inevitably produce the same output at times. For this reason, a hash algorithm is generally irreversible.

So what is the use of the Hash algorithm?

The use of hash algorithm

A hash algorithm is a general concept, or an idea, rather than a fixed formula: any algorithm that satisfies the definition above can be called a hash algorithm.

Generally speaking, it has the following uses:

Protecting passwords, for example storing an MD5 + salt hash of the password instead of the password itself (strictly speaking this is hashing, not encryption, and MD5 is no longer considered safe for passwords)

Fast lookup, for example a hash table, which lets you find elements quickly

Digital signatures, for example signing calls between systems so that tampering with the data can be detected

File verification: for example, a Tencent Games download usually comes with an MD5 value. After downloading the installation package, you compute its MD5 value and compare it with the official one to learn whether the file was damaged or tampered with during the download.
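The file verification case can be sketched in a few lines of Java. This is only an illustration: the class name Md5Check and the helper matches are hypothetical, and the "file" here is just the bytes of the string "hello" rather than a real download.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Check {
    // Returns the MD5 digest of the given bytes as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected on a standard JDK; treated as fatal in this sketch.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: compares downloaded bytes with a published checksum.
    static boolean matches(byte[] downloaded, String publishedMd5) {
        return md5Hex(downloaded).equalsIgnoreCase(publishedMd5);
    }

    public static void main(String[] args) {
        byte[] file = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(file)); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

Whatever the input length, the output is always 32 hex characters, which is the "fixed-length output" property described earlier.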

Speaking of hash algorithms, or hash functions: in Java, Object, the parent class of all classes, has a hash function of its own, the hashCode() method. Why does the Object class need to define such a method?

Strictly speaking, a hash algorithm and a hash function are not the same thing; the context should make clear which one is meant.

Let's see what the comments on the JDK source code say:

The method returns a hash value for the object, and it exists to better support hash tables such as HashMap. Put simply, this method is there for hash tables like HashMap.

// by default, typically derived from the object's internal address
public native int hashCode();

At this point we have to mention another method in the Object class: equals().

// by default, directly compares whether the two references point to the same object
public boolean equals(Object obj) {
    return (this == obj);
}

What is the relationship between hashCode() and equals()?

Generally speaking, hashCode() can be seen as a weak comparison. Going back to the nature of hashing, which maps different inputs to fixed-length outputs, the following cases arise:

If the inputs are the same, the outputs must be the same.

If the inputs are different, the outputs may be the same or different.

If the outputs are the same, the inputs may be the same or different.

If the outputs are different, the inputs must be different.
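A small sketch of the "different inputs, same output" case: the strings "Aa" and "BB" are a well-known pair with identical hash codes under String.hashCode(). The class name WeakComparison and the helper mayBeEqual are made up for this illustration.

```java
final class WeakComparison {
    // A hash comparison can only rule equality out;
    // equal hash codes still have to be confirmed with equals().
    static boolean mayBeEqual(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());        // 2112
        System.out.println("BB".hashCode());        // 2112
        System.out.println(mayBeEqual("Aa", "BB")); // true
        System.out.println("Aa".equals("BB"));      // false
    }
}
```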

equals() is the strict way to compare whether two objects are equal, so if equals() returns true for two objects, their hashCode() values must be equal too. What if they are not?

If equals() returns true but the hashCode() values differ, then when the two objects are used as HashMap keys they will most likely land in different slots, and the HashMap ends up holding two equal keys, which is not allowed. That is why, whenever you override equals(), you must also override hashCode().
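The problem can be reproduced with a minimal sketch. BrokenKey and GoodKey are hypothetical classes written for this illustration; the only difference between them is whether hashCode() is overridden.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class EqualsHashCodeDemo {
    // Hypothetical key that overrides equals() but NOT hashCode().
    static final class BrokenKey {
        final int id;
        BrokenKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).id == id;
        }
    }

    // Hypothetical key that overrides both methods consistently.
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    // Adds two equal BrokenKey objects; the set will most likely keep both.
    static int distinctBrokenKeys() {
        Set<BrokenKey> set = new HashSet<>();
        set.add(new BrokenKey(1));
        set.add(new BrokenKey(1));
        return set.size();
    }

    // Adds two equal GoodKey objects; the set keeps exactly one.
    static int distinctGoodKeys() {
        Set<GoodKey> set = new HashSet<>();
        set.add(new GoodKey(1));
        set.add(new GoodKey(1));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("broken: " + distinctBrokenKeys());
        System.out.println("good:   " + distinctGoodKeys());
    }
}
```

With BrokenKey the two objects fall back to the default hashCode(), so the set almost always reports two entries even though equals() says they are the same key.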

Take the String class as an example. We all know that its equals() method compares the contents of two strings rather than their addresses. Here is its equals() method (as in JDK 8, where the characters are stored in a char[] named value):

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

So, for the following two String objects, equals() reports them as equal even though their memory addresses differ:

String a = new String("123");
String b = new String("123");
System.out.println(a.equals(b)); // true
System.out.println(a == b); // false

If String did not override hashCode(), a and b would return different hash codes, which would badly interfere with the very common practice of using String as a HashMap key. So String overrides hashCode() as well:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The algorithm is also very simple. Expressed as a formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where s[i] is the i-th character of the string and n is its length.
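As a quick check of the formula, here is a standalone re-implementation of the same loop. The class name PolyHash is made up for this sketch; the point is only that the result agrees with String.hashCode().

```java
final class PolyHash {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // using Horner's rule, with the same int overflow behaviour as String.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97*31^2 + 98*31 + 99 = 96354
        System.out.println(polyHash("abc"));  // 96354
        System.out.println("abc".hashCode()); // 96354
    }
}
```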

Well, since hash tables are mentioned repeatedly here, let's take a look at how hash tables evolve step by step.

Hash table evolution history

Array

Before we talk about hash tables, let's take a look at arrays, the ancestor of data structures.

Arrays are simple enough that they need little explanation.

The index of an array usually starts at 0 and elements are stored one after another. To find a given element, you can only search from the beginning (or from the end).

For example, to find the element 4, you have to compare three times starting from the beginning.
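A minimal sketch of that sequential search. The array contents {3, 5, 4, 1} are an assumption (the original figure is not available); they are chosen so that the element 4 is found on the third comparison, as described above.

```java
final class LinearSearch {
    // Scans from the start and returns how many comparisons were needed
    // to find target, or -1 if the array does not contain it.
    static int comparisonsToFind(int[] array, int target) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == target) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] array = {3, 5, 4, 1}; // assumed contents
        System.out.println(comparisonsToFind(array, 4)); // 3
    }
}
```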

Early hash tables

The drawback of the array described above is that finding an element means scanning from the beginning or from the end until a match is found, so the average time complexity is O(n).

So, is there any way to find elements quickly with arrays?

Clever programmers came up with a way: compute a value from the element with a hash function and use that value to determine the element's position in the array. That brings the time complexity down to O(1).

For example, take the elements 3, 5, 4, and 1. Before each one is put into the array, its position is computed by the hash function and it is placed exactly there, instead of being stored one after another as in a plain array (which finds positions by index rather than by element value).

If the array has length 8, we can use a hash function such as hash(x) = x % 8, and each element ends up at the index equal to its hash value: 1 at position 1, 3 at position 3, 4 at position 4, and 5 at position 5.

Now, to look up the element 4, first compute its hash value, hash(4) = 4 % 8 = 4, and simply return the element at position 4.
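The early hash table described here can be sketched as follows. The class name SimpleHashTable is hypothetical, and the sketch deliberately has no collision handling: a second element with the same hash would simply overwrite the first.

```java
final class SimpleHashTable {
    private final Integer[] slots = new Integer[8];

    // hash(x) = x % 8, as in the text; assumes x is non-negative.
    static int hash(int x) {
        return x % 8;
    }

    // Stores x directly at the index given by its hash value.
    void put(int x) {
        slots[hash(x)] = x;
    }

    // One hash computation and one array access: O(1).
    boolean contains(int x) {
        Integer v = slots[hash(x)];
        return v != null && v == x;
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable();
        for (int x : new int[] {3, 5, 4, 1}) {
            table.put(x);
        }
        System.out.println(hash(4));           // 4
        System.out.println(table.contains(4)); // true
        System.out.println(table.contains(2)); // false
    }
}
```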

Evolved hash tables

Everything looks perfect. But now an element 13 needs to be inserted into the hash table. Its hash value is hash(13) = 13 % 8 = 5, so it also maps to position 5, which is already occupied. What now?

This is a hash collision (hash conflict).

Why do hash collisions happen?

Because the array we allocated has a finite length, and mapping an unbounded set of numbers into a finite array will sooner or later produce a collision, that is, several elements mapped to the same position.

Now that there is a collision, it has to be resolved.

How?

Linear probing

Position 5 is already taken, so the new element moves one position further, to position 6. This is linear probing (the linear detection method): on a collision, move forward one slot at a time until an empty position is found.

Now a new element 12 arrives, and its hash value is hash(12) = 12 % 8 = 4. It has to move three times, to position 7, before it finds a free slot, which makes insertion inefficient. Lookup suffers in the same way: first go to position 4, find that it is not the element we are looking for, and keep moving until position 7.
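A minimal sketch of linear probing that reproduces the walk-through above. The class name LinearProbingTable is hypothetical; deletion and resizing are left out to keep the idea visible.

```java
final class LinearProbingTable {
    private final Integer[] slots;

    LinearProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Inserts x and returns the index used, or -1 if the table is full.
    int insert(int x) {
        int n = slots.length;
        int start = Math.floorMod(x, n);
        for (int i = 0; i < n; i++) {
            int pos = (start + i) % n; // step forward one slot, wrapping around
            if (slots[pos] == null) {
                slots[pos] = x;
                return pos;
            }
        }
        return -1;
    }

    // Returns the index of x, or -1 if it is not in the table.
    int indexOf(int x) {
        int n = slots.length;
        int start = Math.floorMod(x, n);
        for (int i = 0; i < n; i++) {
            int pos = (start + i) % n;
            if (slots[pos] == null) {
                return -1; // an empty slot ends the probe sequence
            }
            if (slots[pos] == x) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(8);
        for (int x : new int[] {3, 5, 4, 1}) {
            table.insert(x);
        }
        System.out.println(table.insert(13)); // 6 (position 5 is taken)
        System.out.println(table.insert(12)); // 7 (positions 4, 5, 6 are taken)
    }
}
```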

Quadratic probing

Linear probing has a big drawback: colliding elements tend to pile up. For example, 12 went to position 7; if 14 comes next, it runs off the end of the array and wraps around to position 0. The colliding elements cluster together, which hurts both lookup and insertion.

So a clever programmer proposed a new idea: quadratic probing (the secondary detection method). On a collision, instead of trying the next slot each time, try the original hash value plus i squared (i = 1, 2, 3, ...), and continue until an empty position is found.

Take the example above again and insert element 12: position 4 is occupied, 4 + 1^2 = 5 is occupied, and (4 + 2^2) % 8 = 0 is free, so 12 goes to position 0.

This way an empty slot is found quickly and colliding elements do not pile up.

However, a new element 20 arrives. Where does it go?

It turns out it cannot be placed anywhere: 20 % 8 = 4, and the probe sequence only ever visits positions 4, 5, and 0, all of which are occupied.

Research shows that once a hash table using quadratic probing is more than half full, it may fail to find a position for a new element.
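The failure with element 20 can be reproduced with a short sketch. The class name QuadraticProbingTable is hypothetical, and giving up after as many probes as there are slots is a simplification chosen for this example.

```java
final class QuadraticProbingTable {
    private final Integer[] slots;

    QuadraticProbingTable(int capacity) {
        slots = new Integer[capacity];
    }

    // Probes hash, hash + 1^2, hash + 2^2, ... (mod capacity).
    // Returns the index used, or -1 if no free slot was reached.
    int insert(int x) {
        int n = slots.length;
        int start = Math.floorMod(x, n);
        for (int i = 0; i < n; i++) {
            int pos = (start + i * i) % n;
            if (slots[pos] == null) {
                slots[pos] = x;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        QuadraticProbingTable table = new QuadraticProbingTable(8);
        for (int x : new int[] {3, 5, 4, 1}) {
            table.insert(x);
        }
        System.out.println(table.insert(13)); // 6: 5 is taken, 5 + 1 is free
        System.out.println(table.insert(12)); // 0: 4 and 5 are taken, (4 + 4) % 8 is free
        System.out.println(table.insert(20)); // -1: only 4, 5, 0 are ever probed
    }
}
```

Positions 2 and 7 are still empty when 20 is rejected, which is exactly the wasted space the text complains about.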

This leads to a new concept: capacity expansion.

What is capacity expansion?

When the number of stored elements reaches x% of the total capacity, the table needs to be expanded. This x% is called the expansion factor (more commonly known as the load factor).

Obviously, the larger the expansion factor the better, since it means higher space utilization of the hash table.

Unfortunately, quadratic probing falls short here: its expansion factor is too small, only 0.5, so half of the space is wasted.
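The expansion check itself is a one-line comparison. The sketch below uses a hypothetical helper needsResize; for reference, java.util.HashMap uses a default load factor of 0.75.

```java
final class LoadFactor {
    // True once the number of stored elements has reached
    // capacity * loadFactor, i.e. the table should be expanded.
    static boolean needsResize(int size, int capacity, double loadFactor) {
        return size >= capacity * loadFactor;
    }

    public static void main(String[] args) {
        System.out.println(needsResize(3, 8, 0.5)); // false
        System.out.println(needsResize(4, 8, 0.5)); // true: half full
        System.out.println(needsResize(8, 8, 1.0)); // true: completely full
    }
}
```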

Time for the programmers to show how clever they are again. After brainstorming on a 996 schedule, they came up with a new way to implement hash tables: linked lists.

Linked list method

It is all about resolving collisions. If there is a collision, the element is not placed in the array directly; instead, a linked list connects all elements that map to the same array index. That way the space is fully used.

Perfect.

Is it really perfect? Suppose I am an attacker and keep inserting elements for which x % 8 = 4. Almost all elements end up in the same linked list, the hash table degrades into a linked list, and the cost of looking up or inserting an element becomes O(n).
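A minimal sketch of the linked list method, including the attack described above. The class name ChainedTable and the method chainLength are hypothetical and exist only to make the degenerate case visible.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ChainedTable {
    private final List<LinkedList<Integer>> buckets = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<Integer> bucketFor(int x) {
        return buckets.get(Math.floorMod(x, buckets.size()));
    }

    // Appends x to the chain of its bucket unless it is already there.
    void insert(int x) {
        LinkedList<Integer> chain = bucketFor(x);
        if (!chain.contains(x)) {
            chain.add(x);
        }
    }

    // Walks a single chain, so the cost is proportional to its length.
    boolean contains(int x) {
        return bucketFor(x).contains(x);
    }

    int chainLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(8);
        // Every value satisfies x % 8 == 4, so all land in bucket 4.
        for (int x : new int[] {4, 12, 20, 28, 36}) {
            table.insert(x);
        }
        System.out.println(table.chainLength(4)); // 5
        System.out.println(table.chainLength(0)); // 0
    }
}
```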

Of course there is a remedy. Remember the expansion factor?

For example, set the expansion factor to 1. When the number of elements reaches 8, the capacity is doubled to 16: half of the elements stay at position 4 and the other half move to position 12, which relieves the pressure on the hash table.
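The split can be checked by counting how the same eight values are distributed before and after doubling. The values below are an assumption (eight multiples chosen so that x % 8 = 4), and ResizeDemo is a hypothetical name.

```java
final class ResizeDemo {
    // Counts how many of the values land in each bucket for a given capacity.
    static int[] bucketCounts(int[] values, int capacity) {
        int[] counts = new int[capacity];
        for (int v : values) {
            counts[Math.floorMod(v, capacity)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] values = {4, 12, 20, 28, 36, 44, 52, 60}; // all satisfy x % 8 == 4
        System.out.println(bucketCounts(values, 8)[4]);   // 8
        System.out.println(bucketCounts(values, 16)[4]);  // 4
        System.out.println(bucketCounts(values, 16)[12]); // 4
    }
}
```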

Still, this is not quite perfect: one linked list has merely turned into two.

The clever programmers started another brainstorm, this time upgraded to a 9127 schedule, and finally came up with a new structure: the linked list tree method.

Linked list tree method

Expansion as described above solves part of the problem: while the number of elements is relatively small, overall lookup and insertion efficiency will not be too bad.

However, the attacker keeps going and the number of elements keeps growing. At some point lookup and insertion inevitably become very slow.

So, change the approach: since a long linked list is inefficient, how about upgrading it to a red-black tree once it gets long?

Sounds good. Let's do it.

Not bad: no more worrying about being attacked. Lookup in a red-black tree is O(log n), far better than the O(n) of a linked list.
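The idea of upgrading a long chain to a tree can be sketched for a single bucket. This is not how java.util.HashMap is implemented internally; it is a simplified stand-in that switches from a LinkedList to a TreeSet (which is backed by a red-black tree). The class name TreeifyingBucket is hypothetical, and the threshold of 8 is an assumption borrowed from HashMap's TREEIFY_THRESHOLD constant.

```java
import java.util.Collection;
import java.util.LinkedList;
import java.util.TreeSet;

final class TreeifyingBucket {
    // Assumed threshold for this sketch.
    static final int TREEIFY_THRESHOLD = 8;

    private Collection<Integer> items = new LinkedList<>();

    // Adds x; once the chain grows past the threshold it is
    // converted to a TreeSet so that lookups become O(log n).
    void add(int x) {
        if (items.contains(x)) {
            return;
        }
        items.add(x);
        if (!(items instanceof TreeSet) && items.size() > TREEIFY_THRESHOLD) {
            items = new TreeSet<>(items);
        }
    }

    boolean contains(int x) {
        return items.contains(x);
    }

    boolean isTree() {
        return items instanceof TreeSet;
    }

    public static void main(String[] args) {
        TreeifyingBucket bucket = new TreeifyingBucket();
        for (int i = 0; i < 8; i++) {
            bucket.add(4 + 8 * i); // 4, 12, ..., 60
        }
        System.out.println(bucket.isTree()); // false: exactly 8 elements
        bucket.add(68);
        System.out.println(bucket.isTree()); // true: the 9th element triggers the switch
    }
}
```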

So, is this the end of it?

Not quite. Every expansion still has to move half of the elements, and one tree is split into two. Is that really good?

Life is hard for programmers. This time, after a 12127 brainstorm, they finally came up with something new: consistent hashing.

Consistent Hash

Consistent hashing is used mostly in distributed systems. For example, say a Redis cluster is deployed with four nodes. We define the range of all hash values as 0 to 2^32, with each node holding 1/4 of the elements.

This is only an illustration: a real Redis cluster works on a similar principle, but the specific values are different.

Now suppose a node needs to be added, say node5 between node3 and node4. Only the elements in the segment between node3 and node4 that now belong to node5 have to move from node4 to node5; all other elements stay where they are.

This makes expansion faster and affects fewer elements, so most requests barely notice it.
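A minimal sketch of such a hash ring, built on java.util.TreeMap. The class name HashRing, the node positions, and the rule that a key belongs to the first node at or after its hash value are all assumptions made for this illustration; they are not taken from Redis.

```java
import java.util.Map;
import java.util.TreeMap;

final class HashRing {
    // Maps a position on the ring (0 to 2^32) to the node placed there.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String name, long position) {
        ring.put(position, name);
    }

    // A key belongs to the first node at or after its hash value,
    // wrapping around to the first node on the ring if there is none.
    String nodeFor(long hash) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("node1", 1L << 30); // each node covers a quarter of the range
        ring.addNode("node2", 2L << 30);
        ring.addNode("node3", 3L << 30);
        ring.addNode("node4", 4L << 30);

        long key = (3L << 30) + 1;             // just past node3
        System.out.println(ring.nodeFor(key)); // node4

        ring.addNode("node5", 7L << 29);       // halfway between node3 and node4
        System.out.println(ring.nodeFor(key)); // node5
        System.out.println(ring.nodeFor(5L));  // node1, unaffected
    }
}
```

Only keys whose hash falls between node3 and node5 change owner; keys on the rest of the ring resolve to the same node as before.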
