Many readers are unfamiliar with the topics covered in this "HashMap case Analysis" article, so the editor has put together the following summary, with detailed content and clear steps, which should be a useful reference. I hope you gain something from reading it. Let's get into "HashMap case Analysis".
Scene: a mock interview
Interviewer: introduce yourself first.
Angela: I am Angela, one of the three pranksters in the bushes, the strongest solo laner (Zhong Kui begs to differ)! Oh, no, no. I'm currently working on system development.
Interviewer: I see on your resume that you are familiar with the Java collections framework. Have you used HashMap?
Angela: I have. (Familiar territory.)
Interviewer: can you tell me something about the internal data structure of HashMap?
Angela: I currently use JDK 1.8. Internally, HashMap uses an array plus linked lists / red-black trees.
Angela: would it help if I drew you a diagram of the data structure?
Interviewer: do you understand how HashMap inserts data?
Angela: hmm, [thinking it over] I think a diagram makes this clearest. The flow is as follows:
1. Check whether the array is empty; if so, initialize it.
2. If it is not empty, compute the hash of the key and derive the target array index via (n - 1) & hash.
3. Check whether table[index] already holds data. If not, construct a Node and store it at table[index].
4. If data exists, there is a hash collision. Check whether the keys are equal; if they are, replace the original value with the new one (when onlyIfAbsent is false).
5. If the keys are not equal, check whether the current node is a tree node. If so, create a tree node and insert it into the red-black tree.
6. Otherwise, create an ordinary Node and append it to the linked list; if the list length then becomes greater than 8, convert the list to a red-black tree.
7. After the insertion completes, check whether the current number of entries is greater than the threshold; if so, resize the array to twice its original size.
Interviewer: you just mentioned HashMap's initialization. How does HashMap set its initial capacity?
Angela: [is that even a question?] Generally speaking, if you call new HashMap() without arguments, the default capacity is 16 and the load factor is 0.75. If you pass an initial size k, the capacity is rounded up to the smallest power of two greater than or equal to k; for example, passing 10 gives a capacity of 16. (Additional note: the implementation code is as follows.)
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Note: the algorithm ORs cap - 1 with copies of itself shifted right by 1, 2, 4, 8, and 16 bits, smearing the leading 1 bit into every lower position, so everything below the highest 1 becomes 1. For example, for cap = 50: 49 is 0011 0001, which becomes 0011 1111 after the shifts, and 0011 1111 + 1 = 0100 0000 = 64, a power of two greater than 50.
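(Editor's note: here is a minimal, runnable sketch of the same logic for readers who want to try it; the class and method layout are illustrative, with MAXIMUM_CAPACITY set to 1 << 30 as in the JDK source.)
public class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // same bit-smearing logic as the JDK's tableSizeFor
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(50)); // 64
        System.out.println(tableSizeFor(16)); // 16: already a power of two
    }
}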
Interviewer: you mentioned the hash function. Do you know how HashMap's hash function is designed?
Angela: [quite detailed] the hash function first obtains the hashcode from the key, a 32-bit int value, and then XORs the high 16 bits of the hashcode with its low 16 bits.
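(Editor's note: in the JDK 1.8 source this is the short method below; a null key always maps to hash 0.)
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}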
Interviewer: do you know why it is designed this way?
Angela: [you have to ask?] This is called a perturbation function. There are two reasons for this design:
1. It must reduce hash collisions as much as possible; the more dispersed the distribution, the better.
2. It must be as efficient as possible, because this is a high-frequency operation, which is why it uses bit operations.
Interviewer: why does XORing the high 16 bits of the hashcode with the low 16 bits reduce hash collisions? Couldn't the hash function simply use the key's hashcode directly?
[this question is a bit tricky] Angela nearly exploded on the spot, itching to fire off her 2-1-3 skill combo, biubiubiu.
Angela: because key.hashCode() calls the hash function of the key's own type, which returns an int hash value. An int ranges from -2147483648 to 2147483647, a mapping space of roughly 4 billion values. As long as the hash function maps fairly uniformly and loosely, collisions are rare in ordinary applications. But the problem is that an array 4 billion elements long cannot fit in memory. Remember, HashMap's initial array size is only 16, so the hash must first be reduced modulo the array length before the remainder can be used as an array index.
The modulo in the source code is implemented as a bitwise AND between the hash value and (array length - 1), since a bit operation is faster than the % operator.
bucketIndex = indexFor(hash, table.length);

static int indexFor(int h, int length) {
    return h & (length - 1);
}
By the way, this also explains why HashMap's array length is always a power of two: (array length - 1) acts exactly as a "low-bit mask". The AND operation zeroes out the high bits of the hash value, leaving only the low bits for array indexing. Take the initial length 16 as an example: 16 - 1 = 15, whose binary representation is 00000000 00000000 00000000 00001111. ANDing this with a hash value truncates it to the lowest four bits:
  10100101 11000100 00100101
& 00000000 00000000 00001111
----------------------------
  00000000 00000000 00000101   // high bits zeroed; only the last four bits remain
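(Editor's note: a tiny sketch of the same masking, including the equivalence with the % operator for power-of-two lengths; it holds for non-negative hash values, and the names are illustrative.)
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // power-of-two table length
        int h = 0b10100101_11000100_00100101; // the example hash from the text
        System.out.println(Integer.toBinaryString(h & (n - 1))); // 101, i.e. bucket 5
        // for non-negative h and power-of-two n, masking equals taking the remainder
        System.out.println((h & (n - 1)) == (h % n)); // true
    }
}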
But here comes the problem: even if my hash distribution is loose, taking only the last few bits makes collisions severe. What is even more fatal is that if the hash function itself is flawed and its values fall into an arithmetic progression, so that the low bits repeat in a regular pattern, it becomes extremely painful.
This is where the value of the hash function (the "perturbation function") shows, and you can probably guess it by now. See the diagram below.
The right shift is 16 bits, exactly half of 32 bits. The purpose of the XOR is to mix the high and low halves of the original hash code, which increases the randomness of the low bits while blending in some features of the high bits, so the high-bit information is preserved in disguise as well.
Finally, let's look at an experiment from Peter Lawrey's article "An introduction to optimising a hashing strategy": he randomly selected 352 strings whose hash values did not collide at all, and masked them down to array indices.
The results show that with a HashMap array length of 512 (2^9), i.e. masking down to the low 9 bits, there were 103 collisions without the perturbation function, close to 30%. With the perturbation function there were only 92 collisions, a reduction of roughly 10%. The perturbation function does seem to have an effect.
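(Editor's note: a rough, self-contained way to rerun the spirit of that experiment. It uses randomly generated keys rather than Peter Lawrey's data set, so the exact counts will differ from 103 and 92; the point is to compare the two columns.)
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class PerturbationDemo {
    public static void main(String[] args) {
        int buckets = 512;           // mask down to the low 9 bits, as in the article
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        Set<Integer> plainUsed = new HashSet<>();
        Set<Integer> mixedUsed = new HashSet<>();
        int plainCollisions = 0, mixedCollisions = 0;
        for (int i = 0; i < 352; i++) {
            int h = ("key-" + rnd.nextInt()).hashCode();
            if (!plainUsed.add(h & (buckets - 1)))    // raw hashcode, low bits only
                plainCollisions++;
            int g = h ^ (h >>> 16);                   // JDK 1.8 style perturbation
            if (!mixedUsed.add(g & (buckets - 1)))
                mixedCollisions++;
        }
        System.out.println("collisions without perturbation: " + plainCollisions);
        System.out.println("collisions with perturbation:    " + mixedCollisions);
    }
}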
In addition, Java 8 adjusted this relative to 1.7, which performed four shifts and four XORs. The Java 8 designers evidently felt that one round of perturbation is enough, and that doing it four times would add little marginal benefit, so it was reduced to a single round for efficiency.
Here is the hash code for 1.7:
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Interviewer: it seems you have done your homework; I'm learning something here. Did you secretly read Angela's official blog account? You just said that 1.8 optimized the hash function. Are there any other optimizations?
Angela: 1.8 has four main optimizations:
1. Array + linked list became array + linked list or red-black tree.
2. Linked-list insertion changed from head insertion to tail insertion: when the target array slot already holds elements, 1.7 puts the new element at the head of the list, making the original node the new node's successor, while 1.8 traverses the list and appends the element at its tail.
3. When resizing, 1.7 re-hashes every element into its position in the new array, while 1.8 uses simpler logic: each element either keeps its index or moves to index + old capacity.
4. When inserting, 1.7 first checks whether a resize is needed and then inserts; 1.8 inserts first and then checks whether to resize.
Interviewer: tell me why these optimizations were made.
Angela: [ahem, it really is a chain of follow-up questions]
1. To prevent the linked list from growing too long when hash collisions pile up; converting to a tree reduces lookup time complexity from O(n) to O(log n).
2. Because with 1.7's head insertion, resizing reverses the order of a linked list, which can produce a cycle in a multithreaded environment.
For example, thread A and thread B are both inserting. When capacity runs short, thread A starts to resize: it re-hashes and places elements using head insertion, so a node it has already traversed gets placed back at the head of its bucket, and a loop can form.
1.7's resize calls transfer, shown below:
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            e.next = newTable[i]; // if thread A is suspended on this line, thread B may begin resizing
            newTable[i] = e;
            e = next;
        }
    }
}
3. Why can 1.8 locate a node directly in the new array during a resize, without re-hashing?
Because the new array is twice the size of the old one, the mask (n - 1) used to compute the index simply gains one extra high bit. For example, before resizing the length is 16 and the binary of (n - 1) used in (n - 1) & hash is 0000 1111; after resizing to 32, it becomes 0001 1111.
4. Since this is an AND operation, and ANDing a bit with 1 preserves it, there are two cases, as shown in the figure below: the newly included bit of the element's hashcode is either 0 or 1.
If that bit is 0, the re-hashed index is unchanged; if it is 1, the re-hashed index is the old index plus 16 (the old array's capacity). A quick check of this rule is sketched below.
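(Editor's note: a minimal sketch verifying the rule with a few hand-picked hashes that all share old bucket 5; the decisive bit is (hash & oldCap), and the names are illustrative.)
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53}; // all land in bucket 5 of the old table
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            int newIndex = h & (newCap - 1);
            // if the newly included bit is 0 the index stays; if it is 1, add the old capacity
            int predicted = (h & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
            System.out.printf("hash=%2d old=%2d new=%2d predicted=%2d%n",
                    h, oldIndex, newIndex, predicted);
        }
    }
}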
Interviewer: is HashMap thread-safe?
Angela: no. In a multithreaded environment, 1.7 can run into infinite loops and data loss, and 1.8 can still overwrite data.
As an example, in the putVal code below: thread A executes the marked null check, finds the index position empty, and is suspended; thread B then executes the next line and writes its node into that index. When thread A resumes and performs its own assignment, it overwrites thread B's data.
The ++size line marked near the bottom has the same problem: multiple threads can reach it together and trigger repeated resizes, among other issues.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null) // multiple threads can pass this null check at the same time
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold) // multiple threads may reach here together and resize repeatedly
        resize();
    afterNodeInsertion(evict);
    return null;
}
Interviewer: how do you usually deal with this thread-safety problem?
Angela: Hashtable, Collections.synchronizedMap, and ConcurrentHashMap are the usual ways to get a thread-safe Map in Java.
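(Editor's note: a minimal sketch constructing all three; the variable names are illustrative.)
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static void main(String[] args) {
        Map<String, String> table = new Hashtable<>();                              // synchronized methods, whole-table lock
        Map<String, String> wrapped = Collections.synchronizedMap(new HashMap<>()); // wrapper guarded by one object lock
        Map<String, String> concurrent = new ConcurrentHashMap<>();                 // CAS + per-node locking in 1.8
        table.put("k", "v");
        wrapped.put("k", "v");
        concurrent.put("k", "v");
        System.out.println(table.get("k") + wrapped.get("k") + concurrent.get("k"));
    }
}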
Hashtable simply adds the synchronized keyword to each operation method, locking the entire table, so the granularity is coarse. Collections.synchronizedMap uses an inner class of the Collections utility: it wraps the Map you pass in as a SynchronizedMap object and guards every call with an internal object lock. ConcurrentHashMap uses segmented locking to shrink the lock granularity, which greatly improves concurrency.
Interviewer: do you know how ConcurrentHashMap's segmented locking is implemented?
Angela: [Oh, my God! Russian nesting dolls, one inside another] ConcurrentHashMap's member variables are declared volatile, which prevents instruction reordering and guarantees memory visibility. On top of that, assignments are implemented with a combination of CAS operations and synchronized, so a multithreaded write only locks the node at the index currently being operated on.
As shown in the figure below, thread A locks the linked list at node A while thread B locks the linked list at node B, and the two operations do not interfere with each other.
Interviewer: you mentioned earlier that a linked list is converted to a red-black tree when its length reaches a threshold. What is the threshold?
Angela: the threshold is 8, and the threshold for converting a red-black tree back into a linked list is 6. (Treeification also requires the table length to be at least 64; below that, HashMap resizes instead.)
Interviewer: why 8, and not 16, 32, or even 7? And why is the convert-back threshold 6 instead of 8?
Angela: [ask the author! Oh, my God, biubiubiu, she really wants to fire off that 2-1-3 combo]
Because that's how the author designed it. Oh, no. Because, by calculation, with a reasonably designed hash function the probability of eight keys landing in the same bin is about six in a hundred million (the JDK source comment lists 0.00000006), so 8 is more than enough. As for why it converts back at 6 rather than 8: if the list length hovered right around 8, the structure would flip back and forth between linked list and red-black tree constantly, so a gap is left to prevent that.
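(Editor's note: the figure comes from the Poisson model in the JDK source comment, which assumes an average bin load of 0.5 under the default load factor; this little sketch recomputes the probabilities.)
public class PoissonDemo {
    public static void main(String[] args) {
        double lambda = 0.5;          // expected bin size assumed by the JDK comment
        double p = Math.exp(-lambda); // P(0)
        for (int k = 1; k <= 8; k++) {
            p = p * lambda / k;       // P(k) = e^-lambda * lambda^k / k!
            System.out.printf("P(bin size = %d) = %.10f%n", k, p);
        }
        // P(8) prints as about 0.0000000588, i.e. roughly 0.00000006: six in a hundred million
    }
}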
Interviewer: are HashMap's entries stored in order?
Angela: no, it is unordered; entries land wherever their hash values dictate.
Interviewer: is there an ordered Map?
Angela: LinkedHashMap and TreeMap.
Interviewer: tell me, how does LinkedHashMap achieve its ordering?
Angela: LinkedHashMap maintains a doubly linked list with head and tail pointers. Besides inheriting HashMap's Node attributes, each LinkedHashMap node (Entry) also has before and after references identifying its predecessor and successor nodes. It can order entries by insertion order or by access order.
/** The head (eldest) of the doubly linked list. */
transient LinkedHashMap.Entry<K,V> head;

/** The tail (youngest) of the doubly linked list. */
transient LinkedHashMap.Entry<K,V> tail;

// link the newly added node p to the tail of the linked list
private void linkNodeLast(LinkedHashMap.Entry<K,V> p) {
    LinkedHashMap.Entry<K,V> last = tail;
    tail = p;
    if (last == null)
        head = p;
    else {
        p.before = last;
        last.after = p;
    }
}

// LinkedHashMap node class
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;
    Entry(int hash, K key, V value, Node<K,V> next) {
        super(hash, key, value, next);
    }
}
Sample code:
public static void main(String[] args) {
    Map<String, String> map = new LinkedHashMap<>();
    map.put("1", "Angela");
    map.put("2", "of");
    map.put("3", "blog");
    for (Map.Entry<String, String> item : map.entrySet()) {
        System.out.println(item.getKey() + ": " + item.getValue());
    }
}
// console output:
// 1: Angela
// 2: of
// 3: blog
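(Editor's note: access order is what makes LinkedHashMap a convenient LRU cache. A minimal sketch, assuming a hypothetical capacity of 3: accessOrder = true in the three-argument constructor moves each accessed entry to the tail, and overriding removeEldestEntry evicts from the head.)
import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo {
    public static void main(String[] args) {
        int capacity = 3;
        Map<String, String> lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict the head, i.e. the least recently used
            }
        };
        lru.put("1", "a");
        lru.put("2", "b");
        lru.put("3", "c");
        lru.get("1");      // touch "1" so it becomes most recently used
        lru.put("4", "d"); // evicts "2", the least recently used entry
        System.out.println(lru.keySet()); // [3, 1, 4]
    }
}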
Interviewer: and how does TreeMap achieve its ordering?
Angela: TreeMap orders entries by the natural ordering of the keys or by a Comparator's ordering, implemented internally with a red-black tree. So either the key's class implements the Comparable interface, or you define a comparator that implements the Comparator interface and pass it to the TreeMap constructor, for example:
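(Editor's note: a minimal sketch of both orderings; names are illustrative.)
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // natural ordering: String already implements Comparable
        Map<String, String> natural = new TreeMap<>();
        natural.put("b", "2");
        natural.put("a", "1");
        natural.put("c", "3");
        System.out.println(natural.keySet()); // [a, b, c]

        // custom ordering: pass a Comparator to the constructor
        Map<String, String> reversed = new TreeMap<>(Comparator.reverseOrder());
        reversed.putAll(natural);
        System.out.println(reversed.keySet()); // [c, b, a]
    }
}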
That is the content of this article on "HashMap case Analysis". I believe you now have some understanding of it, and I hope what the editor has shared is helpful to you. If you want to learn more, please follow the industry information channel.