In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article mainly explains "how the Hash conflict is going on". The explanation in the article is simple and clear and easy to learn and understand. Please follow the editor's train of thought to study and learn "how the Hash conflict is going on".
What is the Hash conflict?
Before this article officially begins, let's make this question clear in a few words: what exactly is the Hash conflict that we often talk about?
Go straight to the previous picture:
What did you think of when you saw this picture?
Have you thought of the structure of HashMap's array plus linked list?
By the way, I'm here to take HashMap as a starting point to tell you about Hash conflicts.
Then let's look at the following picture:
Suppose we now have a key with the value of [why technology]. After the Hash algorithm, the value is calculated to be 1, which means that the value should be placed at the subscript 1 of the array.
But as shown in the figure, there is already a value of eat in the place with the subscript 1. This pothole has been occupied.
So at this moment, we call this phenomenon Hash conflict.
How does HashMap resolve Hash conflicts?
Chain address method, also known as zipper method.
When there is a Hash conflict in the array, the data structure of the linked list comes in handy.
How do you use a linked list? Look at the picture:
So we can solve the problem.
In fact, hash conflicts are just like this: different objects get the same HashCode after the same Hash algorithm.
So when I wrote here, I suddenly thought of an interview question:
Excuse me, what version of HashMap is my above drawing based on JDK?
Why did you think of this interview question?
Because I hesitated for about 0.3 seconds when I drew the picture, and when I hung it on the linked list, did I use head insertion or tail insertion?
As we all know, HashMap in JDK 7 uses head insertion, that is, [why technology] before [eat], HashMap in JDK 8 used tail insertion.
What do you say about this question? it's really boring. But what can I do? I still have to memorize the eight-part essay.
Interview, memorize it, not shabby.
Build a String like HashCode
As we know earlier, the root cause of Hash conflicts is that different objects get the same HashCode after the same Hash algorithm.
At first glance, this sentence: well, it makes a lot of sense, that's what it is, no problem.
For example, in our commonly used HashMap, most of the key is of the String type. For Hash conflicts to occur, at least two String classes with the same HashCode are required.
So let me ask you: how can I quickly get two String with the same HashCode?
How's it going? are you a little confused?
It only takes one question from being reasonable to being a little confused.
Come on, let me show you a wave of analysis.
Let me ask you first: will two different String of length 1, such as the following code, have the same HashCode?
String a = "a"
String b = "b"
It's definitely not, is it?
If you don't know, I suggest you look for the answer in the ASCII code.
Let's comb down to see if a String of length 2 will have the same HashCode?
To answer this question, we need to first look at String's hashCode calculation method. Let me take JDK 8 as an example:
Let's assume that these two String of length 2 are xy and ab, respectively.
Note that the xy and ab here are placeholders, not strings.
Similar to the unknowns x and y in the unary quadratic equation in primary school textbooks, we need to bring them into the hashCode method above to calculate.
The most important part of the hashCode algorithm is the for loop.
There are three things in the for loop that we don't know what they are: h _ value. Length and val [I]. Let's take a look at debug:
H equals 0 in the initial case.
The underlying structure of the String type is the char array, which you should know.
So, value.length is the length of the string. Val [] is the char array.
Bring xy into the for loop, and the for loop will loop twice.
The first cycle: hints 0 minutes Val [0] = x, so hashes 31 minutes 0 cycles x, that is, hags x.
The second loop: hcorrecxMagery Val [1] = y, so h=31*x+y.
So, after calculation, the hashCode of xy is 31*x+y.
Similarly, the hashCode of ab is 31*a+b.
Since we want to build a string like hashCode, we can get the equation:
31x+y=31a+b
Then the question arises: how much are the xmeme yreageme b respectively?
Can you figure it out?
You can figure out a hammer! You don't want to untie the arrangement and combination on the blackboard, you just can't untie it.
But I can unravel it and show you how to do it.
The math class has begun. Attention, I'm going to change shape.
31x+y=31a+b can be transformed into:
31x-31a=b-y .
That is, 31 (xmura) = bMury.
It is much clearer at this time, and it is obvious that the above equation has a special solution:
XMui axiom 1 ~ m b muryboy 31.
Because, it can be obtained from above: for any two strings xy and ab, if they satisfy x-a=1, that is, the difference between the ASCII code value of the first character is 1 and that of b-y=31 is satisfied, that is, the difference of ASCII code value of the second character is-31. Then the hashCode of these two characters must be equal.
It has been made so clear that such a combination is compared with the ASCII code table, isn't it?
Aa and BB, is that right?
Ab and BC, aren't they?
Ac and BD, do you?
Yes. Now we can generate two strings that are the same as HashCode.
We're making it a little more difficult. What if I want to build more than two HashCode-like strings?
Let's analyze it first.
Aa's HashCode is the same as BB's. Let's arrange it one by one, isn't it still the same?
Like this: AaBB,BBAa.
Or, for example, I was shocked! Is there an endless cycle in ConcurrentHashMap? "the example that appeared in this article, AaAa,BBBB:
You see, amazing things happen.
We have four strings like hashCode.
With these four strings, let's combine them with Aa,BB, such as AaBBAa,BBAaBB.
With 8 combinations, we can get 8 hashCode-like strings again.
Wait a minute. I think I found a pattern.
If we take Aa,BB as the seed data, after many permutations and combinations, we can get any number of hashCode-like strings. The length of the string increases as the number increases.
I'm not quite clear about the text. Just show you code it, as follows:
Public class CreateHashCodeSomeUtil {
/ * *
* seed data: two hashCode strings of length 2
, /
Private static String [] SEED = new String [] {"Aa", "BB"}
/ * *
* generate a set of 2 to the nth power of HashCode-like strings
, /
Public static List hashCodeSomeList (int n) {
List initList = new ArrayList (Arrays.asList (SEED))
For (int I = 1; I < n; iTunes +) {
InitList = createByList (initList)
}
Return initList
}
Public static List createByList (List list) {
List result = new ArrayList ()
For (int I = 0; I < SEED.length; + + I) {
For (String str: list) {
Result.add (SEED [I] + str)
}
}
Return result
}
}
With the above code, we can generate as many hashCode-like strings as we want.
It's like this:
So stop asking questions like this:
With these hashCode-like strings, we put all these strings into HashMap with the following code:
Public class HashMapTest {
Public static void main (String [] args) {
Map hashMap = new HashMap ()
HashMap.put ("Aa", "Aa")
HashMap.put ("BB", "BB")
HashMap.put ("AaAa", "AaAa")
HashMap.put ("AaBB", "AaBB")
HashMap.put ("BBAa", "BBAa")
HashMap.put ("BBBB", "BBBB")
HashMap.put ("AaAaAa", "AaAaAa")
HashMap.put ("AaAaBB", "AaAaBB")
HashMap.put ("AaBBAa", "AaBBAa")
HashMap.put ("AaBBBB", "AaBBBB")
HashMap.put ("BBAaAa", "BBAaAa")
HashMap.put ("BBAaBB", "BBAaBB")
HashMap.put ("BBBBAa", "BBBBAa")
HashMap.put ("BBBBBB", "BBBBBB")
}
}
Finally, the length of the HashMap will be expanded twice. After the expansion, the array length is 64:
But only three positions are occupied inside, which is the place with the subscript 0pr 31pr 32:
The drawing is as follows:
See, the thorn is not exciting, the length of the array is 64, 14 pieces of data are stored, and only 3 positions are occupied.
The utilization rate of this space is too low.
So, that's hack, HashMap. Congratulations, you have mastered a hacker attack technology: hash conflict Dos.
If you want to know more. You can take a look at Brother Stone's article: "I didn't expect that Hash conflicts can still be played in this way. did you get hit by your service?" ".
Seeing the picture above, I wonder if there is anything wrong with it.
If not, let me give you another hint: under the position marked 32 in the array, there is a linked list of length 8.
Didn't you suddenly realize it? In JDK 8, what is the threshold for linked lists to turn trees?
Therefore, in the current case, the position under the array subscript 32 should not be a linked list, but a red-black tree.
Right?
Yes, a hammer. Yes! Some people are misled if they are not careful.
This is not right. The threshold for a linked list to a red-black tree is when the number of nodes is greater than 8, not equal to 8.
In other words, the tree operation will only be triggered when there is another key with subscript 32 after hash calculation, and the value is different from the previous value.
Believe it or not, I'll show you what node it is now:
Are you kidding me? As you can see clearly from the image above, the 8th node is still a normal node.
And if it's a treed node, it should look like this:
Believe it or not, let's bring in one more hash conflict and take a look at it with your own eyes. The code won't cheat.
So how to create one more conflict?
The simplest thing is to write:
Won't there be one more conflict? I am such a genius that I can't help applauding myself.
All right, let's take a look at what the current node state looks like:
So, did you become TreeNode, didn't lie to you?
What? You asked me why I didn't draw the picture.
Don't ask, but I can't draw a red-black tree. Serious people who draw that thing.
In addition, I would like to say one more word about a pit in an interview question for HashMap.
The interviewer asked: what are the conditions for turning JDK 8's HashMap linked list into a red-black tree?
Most of the friends who have memorized the eight-part essay of the interview will be able to answer: when the length of the linked list is greater than 8.
Is that the right answer?
It's right, but it's only half right.
Another condition is that the array will turn to a red-black tree when the length of the array is greater than 64.
It is clearly written in the source code that the length of the array is less than 64, so you can directly expand the capacity instead of turning to a red-black tree:
It seems that many people ignore the condition that the length of the array is greater than 64.
If you recite eight-part essays, you still have to memorize them all.
For example, the following test case:
They all fall to the position marked 0 under the array.
When the ninth element BBBBAa falls in, it goes to the treeifyBin method, but the tree operation is not triggered, only the expansion operation is carried out.
Because the current length is the default length, which is 16. Do not meet the conditions for turning red and black trees.
So, from the screenshot below, we can see that where the label is ①, the length of the array becomes 32 and the length of the linked list becomes 9, but the node is still normal node:
Well, it's kind of interesting. I think it's much more interesting to learn HashMap this way.
Entity class as key
In the above example, we use the String type as the key in HashMap.
This scenario can cover 95% of our development scenarios.
But every once in a while, entity classes may be put into HashMap as key.
Notice, here comes the interview question again: can entity classes be used as objects in HashMap?
You can do that if you have to. But there's a hole, so be careful not to step into it.
Let me give you an example of a news I saw some time ago:
Suppose I want to collect students' family information and save it with HashMap.
So my key is the student object, and the value is the student family information object.
They are respectively like this:
Public class HomeInfo {
Private String homeAddr
Private String carName
/ / omit the modification method and toString method
}
Public class Student {
Private String name
Private Integer age
/ / omit the modification method and toString method
}
Then our test case is as follows:
Public class HashMapTest {
Private static Map hashMap = new HashMap ()
Static {
Student student = new Student ("why", 7)
HomeInfo homeInfo = new HomeInfo ("Great South Street", "bicycle")
HashMap.put (student, homeInfo)
}
Public static void main (String [] args) {
UpdateInfo ("why", 7, "Binjiang Road", "Motorcycle")
For (Map.Entry entry: hashMap.entrySet ()) {
System.out.println (entry.getKey () + "-" + entry.getValue ())
}
}
Private static void updateInfo (String name, Integer age, String homeAddr, String carName) {
Student student = new Student (name, age)
HomeInfo homeInfo = hashMap.get (student)
If (homeInfo = = null) {
HashMap.put (student, new HomeInfo (homeAddr, carName))
}
}
}
In the initial state, there is already a 7-year-old child named why in HashMap. His family lives on Da Nan Street and his family's means of transportation is a bicycle.
Then, one day he told the teacher that he had moved to Binjiang Road, and his bike was replaced by a motorcycle.
So the teacher modified the family information of the why children through the page.
Finally, the updateInfo method is called.
Hey, guess what?
Let me show you the output:
After the update, two 7-year-old children named why appeared in their class, one living in Da Nan Street and the other in Binjiang Road.
The update has been added, do you think it is magical?
The phenomenon has come out, so it is not easy to locate the problem code according to the phenomenon?
Obviously, this is where the problem lies:
The homeInfo taken out here is empty, so a new data will be put in.
So let's see why this place is empty.
Follow the hashMap.get () source code to take a look:
The place labeled ① is the computed key, which is the hashCode of the student object. Our student object does not override hashCode, so we call the default hashCode method.
The student here is from new:
Therefore, the hashCode of this student is bound to be different from the previous student in HashMap.
Therefore, where the label is ③, the subscript of the tab array calculated by hash is null. It will not enter the if judgment, which is returned as null.
Then the solution is on the horizon: override the object's hashCode method.
Really?
Wait, you come back, don't run away with half of it. I'm not finished yet.
Then look at the source code:
When the HashMap put method is executed, the equals method is used to determine whether the current key is the same as the key that exists in the table.
We don't override the equals method here, so we return false here.
So, if our hashCode and equals methods are not overridden, then the following diagram occurs:
If we rewrite the hashCode and not the equals method, then the following diagram occurs:
In a nutshell: in HashMap, if you use an object for key, be sure to override the object's hashCode and equals methods. Otherwise, not only can not achieve the desired results, but also may lead to memory overflow.
For example, in the above example, we put it into a loop and add-Xmx10m to the startup parameters. The running result is as follows:
Because each time the student object from new is different, the hashCode is different, so the expansion operation will be triggered constantly, and finally an OOM exception will be thrown in the method of resize.
Strange knowledge added. I flipped through the book "Java programming ideas (4th Edition)" when I wrote this article.
Two more strange knowledge has been added.
The first is in this book. For the example of putting objects in HashMap, it looks like this:
Groundhog: woodchuck, marmot.
Prediction: prophecy, prediction, prediction.
Consider a weather forecasting system that links marmots to forecasts.
What kind of immortal need is this TM that you can't read?
Fortunately, Brother why was so knowledgeable that he closed his eyes and searched my knowledge warehouse.
So that's what happened.
In the American state of Pennsylvania, Groundhog Day falls on February 2 every year.
According to folklore, if the marmot sees his shadow when he comes out of the hole on February 2, then the little thing will return to the hole to continue hibernating, indicating that spring will not come for another six weeks. If there is no shadow, it will come out to find food or courtship, indicating that the cold winter is coming to an end.
This echoes by judging whether the marmot can see a shadow when it comes out of the hole, so as to judge whether the winter is over.
In this way, the demand makes sense.
The second strange knowledge goes like this.
With regard to the HashCode method, "Java programming ideas (4th Edition)" reads as follows:
I saw something wrong at a glance: result=37*result+c.
As we said earlier, the base number should be 31?
The author said that the formula was taken from the book Effective Java (1st Edition).
These two books are both java Bibles. I suggest you type the dream linkage in the message area.
"Effective Java (1st edition)" is too old. I only have physical books in 2nd and 3rd editions.
So I looked for the first edition of the ebook on the Internet, and finally found the corresponding description:
As you can see, the formula given in the book is indeed based on 37.
After turning over the third edition, in the same place, the formula is as follows:
And, you go to the Internet to search: String's hashCode calculation method.
It's all about why it's 31. The number 37 is rarely mentioned.
In fact, I guess that in the early version of JDK, String's hashCode method should have used 37, and later changed to 31.
I tried to download the earliest version of JDK to verify it, but I turned the Internet upside down and couldn't find a suitable one.
Why did it change from 37 to 31 in the book?
The author explains it as follows: above is version 1, and below is version 2:
The part framed in the box wants to express exactly the same thing, except that the object has changed from 37 to 31.
And why it changed from 37 to 31, the author explained in the second edition, which is the part I underlined.
31 there is a good license to use displacement and subtraction instead of multiplication for better performance:
31 examples = (I)
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.