Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Classical data structure HashMap and line-by-line analysis of each key point

2025-01-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article introduces the classic data structure HashMap and line-by-line analysis of each key point, the content is very detailed, interested friends can refer to, hope to be helpful to you.

Talk about the classical data structure HashMap, analyze each key point line by line

Source code analysis based on JDK-8u261

1 introduction

   HashMap is a very frequently used tool class in the form of key-value pairs, which is very convenient to use. However, it should be noted that HashMap is not thread-safe, it is ConcurrentHashMap that is thread-safe (not to mention the outdated utility class of Hashtable), and HashMap and ConcurrentHashMap are also used to do various caches in the Spring framework. Starting with Java 8, the source code of HashMap has been modified to improve its performance. Let's first take a look at the data structure of HashMap:

   as a whole can be thought of as an array + linked list. The array is for quick retrieval, and if the hash function conflicts, the linked list will be hung at the end of the same location. In other words, the hash values of the nodes on the same linked list are all the same. However, if there are more hash conflicts, the generated linked list will also be pulled longer, and the retrieval will degenerate into a traversal operation, and the performance will be relatively low. In order to improve this situation, red and black trees were introduced in Java 8.

   red-black tree is an advanced balanced binary tree structure, which can guarantee that the time complexity of searching, inserting and deleting is O (logn) at worst. In the scenario with a large amount of data, the insertion and deletion performance of the red-black tree is higher than that of the AVL tree. When the number of nodes in the linked list is greater than or equal to 8, and the length of the current array is greater than or equal to MIN_TREEIFY_CAPACITY (note that this is the test point! So don't say anything in the future that when the length of the linked list is greater than 8, it will turn into a red-black tree, this will only make others think that you are not seriously looking at the source code), all nodes in the linked list will be converted into a red-black tree, and if the current number of linked list nodes is less than or equal to 6, the red-black tree will be degraded into a linked list. The value of MIN_TREEIFY_CAPACITY is 64, which means that the length of the current array (that is, the number of bucket bin) must be greater than or equal to 64, and the current linked list can only be converted when the length of the linked list is greater than or equal to 8. If the length of the current array is less than 64, even if the length of the current linked list is greater than 8, it will not be converted. This requires special attention. The following treeifyBin method is used to convert a linked list into a red-black tree operation:

1 * 2 * Replaces all linked nodes in bin at index for given hash unless 3 * table is too small, in which case resizes instead. 4 * / 5final void treeifyBin (Node [] tab, int hash) {6 int n, index; Node e; 7 if (tab = = null | (n = tab.length)

< MIN_TREEIFY_CAPACITY) 8 resize(); 9 else if ((e = tab[index = (n - 1) & hash]) != null) {10 TreeNode hd = null, tl = null;11 do {12 TreeNode p = replacementTreeNode(e, null);13 if (tl == null)14 hd = p;15 else {16 p.prev = tl;17 tl.next = p;18 }19 tl = p;20 } while ((e = e.next) != null);21 if ((tab[index] = hd) != null)22 hd.treeify(tab);23 }24}   从上面的第7行和第8行代码处可以看出,如果当前数组的长度也就是桶的数量小于MIN_TREEIFY_CAPACITY的时候,会选择resize扩容操作,此时就不会走转成红黑树的逻辑了。这里的意思就是说如果当前的hash冲突达到8的时候,根本的原因就是因为桶分配的太少才产生那么多冲突的。那么此时我选择扩容操作,以此来降低hash冲突的产生。等到数组的长度大于等于MIN_TREEIFY_CAPACITY的时候,如果当前链表的长度还是8的话,才会去转化成红黑树。   由此可以看出加入MIN_TREEIFY_CAPACITY这个参数的意义就是在于要保证hash冲突多的原因不是因为数组容量少才导致的;还有一个意义在于,假如说当前数组的所有数据都放在了一个桶里面(或者类似于这种情况,绝大部分的节点都挂在了一个桶里(hash函数散列效果不好,一般不太可能出现)),此时如果没有MIN_TREEIFY_CAPACITY这个参数进行限制的话,那我就会去开开心心去生成红黑树去了(红黑树的生成过程以及后续的维护还是比较复杂的,所以原则上是能不生成就不生成,后面会有说明)。而有了MIN_TREEIFY_CAPACITY这个参数进行限制的话,在上面的第8行代码处就会触发扩容操作。这里的扩容更多的意义在于把这个hash冲突尽量削减。比如把链表长度为8的八个节点再平分到扩容后新的两倍数组的两处新的桶里面,每个桶由原来的八个节点到现在的四个节点(也可能是一个桶5个另一个桶3个,极端情况下也可能一个桶8个另一个桶0个。但不管怎样,从统计学上考量的话,原来桶中的节点数大概率会被削减),这样就相当于减少了链表的个数,也就是说减少了在同一个位置上的hash冲突的发生。还有一点需要提一下,源码注释中说明MIN_TREEIFY_CAPACITY的大小要至少为4倍的转成红黑树阈值的数量,这么做的原因也是更多的希望能减少hash冲突的发生。   **那么为什么不直接用红黑树来代替链表,而是采用链表和红黑树来搭配在一起使用呢?**原因就在于红黑树虽然性能更好,但是这也仅是在大数据量下才能看到差异。如果当前数据量很小,就几个节点的话,那么此时显然用链表的方式会更划算。因为要知道红黑树的插入和删除操作会涉及到大量的自旋,以此来保证树结构的平衡。如果数据量小的话,插入删除的性能高效根本抵消不了自旋操作所带来的成本。   **还有一点需要留意的是链表转为红黑树的阈值是8,而红黑树退化成链表的阈值是6。**为什么这两个值会不一样呢?可以试想一下,如果这两个值都为8的话,而当前链表的节点数量为7,此时一个新的节点进来了,计算出hash值和这七个节点的hash值相同,即发生了hash冲突。于是就会把这个节点挂在第七个节点的后面,但是此时已经达到了变成红黑树的阈值了(MIN_TREEIFY_CAPACITY条件假定也满足),于是就转成红黑树。但是此时调用了一次remove操作需要删掉这个新加的节点,删掉之后当前红黑树的节点数量就又变成了7,于是就退化成了链表。然后此时又新加了一个节点,正好又要挂在第七个节点的后面,于是就又变成红黑树,然后又要remove,又退化成链表…可以看到在这种场景下,会不断地出现链表和红黑树之间的相互转换,这个性能是很低的,因为大部分的执行时间都花费在了转换数据结构上面,而我仅仅是做了几次连续的增删操作而已。所以为了避免这种情况的发生,将两个阈值错开一些,以此来尽量避免在阈值点附近可能存在的、频繁地做转换数据结构操作而导致性能变低的情况出现。   这里之所以阈值会选择为8是通过数学统计上的结论得出的,在源码中也有相关注释: 其中中间的数字表示当前这个位置预计发生指定次数哈希冲突的概率是多少。可以看到当冲突概率为8的时候,概率已经降低到了0.000006%,几乎是不可能发生的概率。从这里也可以看出,HashMap作者选择这个数作为阈值是不希望生成红黑树的(红黑树的维护成本高昂)。而同样负载因子默认选择为0.75也是基于统计分析出来的,以下是源码中对负载因子的解释:   负载因子衡量的是数组在扩容前的填充程度,也就是说一个数组真正能存进去的实际容量 = 数组的长度 * 负载因子(比如当前数组的长度为16(桶的个数),负载因子为0.75,那么当数组存进了16 * 0.75 = 12个桶的时候,就会进行扩容操作,而不是等到数组空间满了的时候)。如果为0.5表示的就是数组填充一半后就进行扩容;为1就表示的是数组全部填满后再进行扩容。之所以默认值选择为0.75是在时间和空间成本上做的一个折中方案,一般不建议自己更改。这个值越高,就意味着数组中能存更多的值,减少空间开销,但是会增加hash冲突的概率,增加查找的成本;这个值越低,就会减少hash冲突的概率,但是会比较费空间。   而数组的默认容量为16也是统计上的结果。值得一说的是,如果事先知道了HashMap所要存储的数量的时候,就可以将数组容量传进构造器中,以此来避免频繁地扩容操作。比如我现在要往HashMap中大约放进200个数据,如果不设置初始值的话,默认容量就是16,当存进16 * 0.75 = 12个数据的时候就会扩容一次,扩容到两倍容量32,然后等再存进32 * 0.75 = 24个数据的时候再继续扩容…直到扩容到能存进200个数据为止。所以说,如果能提前先设置好初始容量的话,就不需要再扩容这么多次了。 2 构造器 1/** 2 * HashMap: 3 * 无参构造器 4 */ 5public HashMap() { 6 //负载因子设置为默认值0.75,其他的属性值也都是走默认的 7 this.loadFactor = DEFAULT_LOAD_FACTOR; 8} 910/**11 * 有参构造器12 */13public HashMap(int initialCapacity) {14 //初始容量是自己指定的,而负载因子是默认的0.7515 this(initialCapacity, DEFAULT_LOAD_FACTOR);16}1718public HashMap(int initialCapacity, float loadFactor) {19 //initialCapacity非负校验20 if (initialCapacity < 0)21 throw new IllegalArgumentException("Illegal initial capacity: " +22 initialCapacity);23 //initialCapacity如果超过了设定的最大值(2的30次方),就重置为2的30次方24 if (initialCapacity >

MAXIMUM_CAPACITY) 25 initialCapacity = MAXIMUM_CAPACITY;26 / / load factor non-negative checksum illegal numeric check (when the divisor is 0 or 0.0 and the divisor is 0.0, the result is NaN) 27 if (loadFactor > > 1) 53 / * 54 after two right-shift operations and bitwise OR, n becomes 111101 (110001 | 001100) 55 General interpretation: this time becomes 01111xxxxxxxxxxxxxxxxxxxxxxxxxxx56 * / 57 n | = n > 2 position 58 / * 59 after four right-shift operations and bitwise OR, n becomes 111111 (111101 | 000011) 60 General interpretation: this time becomes 011111111xxxxxxxxxxxxxxxxxxxxxxx61 * / 62 n | = n > 4 63 / * 64 after eight right-shift operations and bit-by-bit OR, for the above example data, it has become a case where all bits are 1, 65 then there is no difference between the following two right-shift operations (because the result of the right-shift must be 0, and the original bit-by-bit or later has not changed) 66 actually after so many times to the right and bit or In order to finally come up with a general interpretation of the number 67 which is all 1: at this time it becomes 01111111111111111xxxxxxxxxxxxxxx68 * / 69 n | = n > > 8 70 / * 71 after 16 right-shift operations and bitwise OR, the general explanation: at this time, it becomes 0111111111111111111111111111111111111111111111111172, so you only need to move 16 bits to the right to stop (of course, you can also continue to move 32 bits to the right, 64 bits. 73, but then it doesn't make any sense, because the result of moving right is 0. The result of bitwise OR will not change) 74 and the maximum value of int MAX_VALUE is 2 to the 31st power-1, which translates to binary that 31 175 have changed this value to the 30th power of 2 at the previous 25th line of code, and 1 is followed by 30 zeros, that is, 010000.00076, so the maximum cap passed into this method can only be this number, and after-1 has been moved to the right several times, it becomes 00111. 111. That is, 30 177s will finally be revised to the 30th power of 2 after + 1 at line 90, that is, MAXIMUM_CAPACITY78 * / 79 n | = n > > 16 80 / * 81 n if it is less than 0, it corresponds to the case where the incoming cap itself is 0. After moving to the right, n becomes-1 (in fact, not moving to the right will not change the result at all, 82 because the binary number of-1 is 32 1s, and any digit will not change). At this time, the result 1 (to the power of 2) is 8384, which shows that the final effect is that the original data starts from the position where the first highest bit is 1. All the following bits, whether 0 or 1, are changed to 185. finally, after adding one more bit at the 90th line of code, it becomes a number in which the highest bit is 1 and the remaining bits are all 0, but the number of bits is one more than the original data, that is, the minimum second power of the original data is 8687. Now you can consider the situation mentioned earlier if the cap itself is the power of 2. If there is no 47th line of code operation, then after constantly moving to the right, 88 gets a binary number that is all 1, that is, the result of this number * 2-1, and finally adds 1 to become twice the original data, which is obviously wrong 89 * / 90 return (n

< 0) ? 1 : (n >

= MAXIMUM_CAPACITY)? MAXIMUM_CAPACITY: n + 1 key 91} 3 put method 1 * HashMap: 3 * / 4public V put (K key, V value) {5 return putVal (hash (key), key, value, false, true) 6} 7 8Universe * 9 * calculate the hash value of key 10 * Note here the hashCode method of key is called directly, and its high 16 bits and low 16 bits are XOR as the final hash (the int value in Java is 32 bits) 11 * so why do you do so? Because the method used in the subsequent determination of the position of the barrel bin is (table.length-1) & hash 12 * where the length of the array must be a power of 2 (if not, it will be converted to a minimum power of 2 greater than this value, see the tableSizeFor method for details), then the value of the length of the array minus 1 is 13 * 11111 in binary. The low bits are all a number of ones. In this way, if the hash value calculated by this method is bitwise summed, the result is that a 14 * preserves all these low bits of the hash value. To put it bluntly, it is the same result as hash% table.length, that is, 15% of the array length is left. But it is not efficient to use% as the remainder directly, and this bitwise sum method can only be applied to the case where the array length is a power of 2. If this is not the case, the equivalent exchange cannot be done. 16 * 17 * it can be seen from the above that the bit-by-bit method will only use information with a low hash value, and it doesn't matter what the high-level information is, anyway, it won't be recorded in the final hash calculation. 18 * that would be a pity and a waste. It would be even better if high-level information is also recorded. So at line 26 below, 19 * XOR its high 16-bit and low 16-bit XOR, in order to merge the high 16-bit feature information into the hash value, make the hash as hash as possible and reduce the occurrence of hash conflicts 20 * it is also simple and efficient to use an XOR operation at the same time. Unlike some hash functions, it will not do a lot of computation before generating a hash value (for example, the implementation of this block 21 * in Java 7 is to move to the right many times) 22 * 23 * / 24static final int hash (Object key) {25 int h 26 return (key = = null)? 0: (h = key.hashCode ()) ^ (h > > 16); 27} 28 29 Universe * 30 * at line 5: 31 * / 32final V putVal (int hash, K key, V value, boolean onlyIfAbsent, 33 boolean evict) {34 Node [] tab; 35 Node p; 36 int n, I 37 if ((tab = table) = = null | | (n = tab.length) = = 0) 38 / / initialize the current array if it is not already initialized. As you can see, the initialization of HashMap is delayed to 39 n = (tab = resize ()) .length in the put method. 40 if ((p = tab [I = (n-1) & hash]) = null) 41 / * 42 find the location of the bucket in which the data is inserted by (n-1) & hash (as to why it is detailed in the notes to the hash method) 43 if there is no data on the bucket, just create a new Node node and insert it into the bucket. That is to say, quickly judge 44 * / 45 tab [I] = newNode (hash, key, value, null) 46 else {47 / * 48 otherwise if there is data on the bucket, execute the following logic 49 50 e is used to determine whether the newly inserted node can be inserted, if not null means to find the location of the new node to be inserted 51 * / 52 Node e; 53 K k 54 if (p.hash = = hash & & 55 ((k = p.key) = = key | | (key! = null & & key.equals (k) 56 / * 57 if the hash value of the first node on the bucket is the same as the hash value to be inserted, and the key is the same, 58 record this location e The overlay of the value will be done later (quick judgment mode) 59 * / 60 e = p 61 / / this means that the node to be inserted is not the same as the first node in the current bucket, but their calculated hash value is the same 62 else if (p instanceof TreeNode) 63 / / if the node is a red-black tree 64 e = ((TreeNode) p) .putTreeVal (this, tab, hash, key, value) to execute the insertion node logic of the red-black tree (this article will not expand the red-black tree analysis) 65 else {66 / / here means that there are multiple nodes on the linked list, traversing the nodes on the linked list (the first node does not need to be judged, because it has been judged at line 54) 67 for (int binCount = 0; + + binCount) {68 if ((e = p.next) = = null) {69 / * 70 if the next position of the current linked list is null, it means that it has been cycled to the last node and the same one has not been found 71 the new node to be inserted at this time is inserted into the last 72 * / 73 p.next = newNode (hash, key, value, null) 74 / / if the number of linked lists has reached the threshold of converting to a red-black tree by this time, it will be analyzed before converting 75 if (binCount > = TREEIFY_THRESHOLD-1) 76 / * 77 to see if it will really be converted to a red-black tree. You need to see whether the number of buckets in the current array 78 is greater than or equal to MIN_TREEIFY_CAPACITY. If it is less than, it will only expand 79 * / 80 treeifyBin (tab, hash). 81 break 82} 83 if (e.hash = = hash & & 84 ((k = e.key) = = key | | (key! = null & & key.equals (k) 85 / / if this node already exists in the linked list, you can exit the loop (e has been assigned at line 68) 86 break 87 p = e; 88} 89} 90 if (e! = null) {91 / * 92 if you find a location to insert, overwrite 93 94 to record the old value, and finally return 95 * / 96 V oldValue = e.value 97 / / if onlyIfAbsent is false, or if the old value is null, the new value overrides the old value 98 if (! onlyIfAbsent | | oldValue = = null) 99 e.value = value;100 / / hook method, empty implementation 101afterNodeAccess (e); 102return oldValue 103} 104} 105 / * 106 walked here to explain that before walking to line 45, a new node was added. 107108 number of modifications + 1 mod Count is used to do quick failure. If a change is made in the iterator, modCount! = expectedModCount,109 indicates that the HashMap has been modified by another thread and a ConcurrentModificationException exception will be thrown (I explained this process in detail in my article analyzing the source code of ArrayList110, which is also similar in HashMap) Since a new node has been added, it is necessary to determine whether capacity expansion is needed at this time. 115If the current array capacity has exceeded the set threshold threshold, perform the expansion operation 116117if (+ + size > threshold) 118resize (); 119 / hook method, empty implementation 120 afterNodeInsertion (evict); 121return null;122} 4 resize method

   calls the resize method to initialize or expand capacity at lines 39 and 118 of the putVal method above. And the resize method is also a method that compares the essence and bright spot in the HashMap source code. The specific implementation can be divided into two parts: setting the expansion flag bit and the specific data migration process. Let's first take a look at the first half of the source code of the resize method:

1 * HashMap: 3 * capacity expansion operation (if the current array is empty, it becomes an initialization operation for the array) 4 * / 5final Node [] resize () {6 Node [] oldTab = table; 7 / / record the capacity of the old array (current array). If null, it is 0 8 int oldCap = (oldTab = = null)? 0: oldTab.length; 9 / * 101. When calling the parameter constructor, threshold stores the least second power greater than or equal to the current array capacity and assigns it to oldThr 11 2. When calling the no-parameter constructor, threshold=0 123. Before the array is no longer empty, now when expanding, threshold stores the capacity of the old array * load factor 13 * / 14 int oldThr = threshold; 15 / / newCap refers to the number of the array after expansion, and newThr refers to the result of newCap * load factor (take the integer part if there is a decimal number) 16 int newCap, newThr = 0 17 / / the following is the analysis of various situations, and then the process of assigning newCap and newThr tag bits 18 if (oldCap > 0) {19 if (oldCap > = MAXIMUM_CAPACITY) {20 / * 21 if the current array capacity has exceeded the set maximum, set threshold to the maximum value of int And then return the current array capacity 22, which means that there is no expansion operation 23 * / 24 threshold = Integer.MAX_VALUE in this case. 25 return oldTab 26} else if ((newCap = oldCap = DEFAULT_INITIAL_CAPACITY) 28 / * 29 if the current array capacity * 2 does not exceed the set maximum value, and the current array capacity is greater than or equal to the initial capacity 16, 30 sets newCap to oldCap * 2 The significance of the condition that newThr is set to oldThr * 231 oldCap > = DEFAULT_INITIAL_CAPACITY will be explained later that 32 * / 33 newThr = oldThr 0) 35 / * 36 explain oldCap=0 here, that is, it is time to initialize the array. We just saw that if it is the default no-parameter constructor, 37 threshold will not be assigned, that is, 0. But if a parameterized constructor is called, the threshold is assigned at the initial stage of the constructor, 38 and this if condition corresponds to this case. This is set to oldThr because you can see in line 14 above that oldThr points to threshold, 39 which means that the value of oldThr is greater than or equal to the least second power of "current array capacity" (note that I put quotation marks on "current array capacity", which means that 40 is not the actual physical existing array capacity (the current physical capacity is 0). But the capacity set through the constructor), 41 must be a number greater than 0. Since this oldThr now represents the new capacity I want to set, just assign newCap to this number 42 * / 43 newCap = oldThr 44 else {45 / * 46 as mentioned above, this corresponds to the invocation of a no-parameter constructor. When threshold=0 47 assigns newCap to 16 * 0.75 = 12, the default value is 48 * / 49 newCap = DEFAULT_INITIAL_CAPACITY; 50 newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY). 51} 52 if (newThr = = 0) {53 / * 54 there are two situations in which the program can go this far: 55 the first: if newThr is not assigned in the if condition at line 43 code, then ft = new array capacity * load factor is calculated. 56 if the array capacity and ft do not exceed the set maximum, assign newThr to ft Otherwise, the maximum value assigned to int 57 58 is the second: note the condition at line 27 above, and newThr is not assigned if oldCap is smaller than 16. 59 the root cause of this situation is not the call to the resize method this time, but the call to the last initialization. For example, it is clear: 60 at first I called the constructor new HashMap (2), and after calculation, threshold=2. Then calling the put method to trigger initialization will jump into the method 61 at this time oldCap=0,oldThr=2. Then you go to the if condition at line 34, newCap=2, and then you go to line 52, 62, which is the if condition: newThr=1, and then modify threshold=1. After that, the resize method will take a specific expansion operation, 63 but the flag bits of these settings will not be changed, and the method will be exited after the expansion. This is the first time the procedure is called. 64 then after I successfully inserted the node at this time, I called the put method again. At this point, the node will still be successfully inserted, 65 but at line 117 of the above putVal method, it is found that my current size=2 is greater than threshold=1. Then the method resize will be called to expand the capacity 66, and at this time oldCap=2,oldThr=1. You will go to the if condition at line 26, newCap=4, and when the oldCap=2 is less than 16, 67 will jump out of the if condition, and then go to line 52, the if condition: newThr=3, and then modify threshold=3 68, which is the correct situation, because the newThr itself is the result of the newCap * load factor. That is, 4 * 0.75 = 369, then if there is no judgment condition at line 27 in the source code, it will jump to line 33 of the code, the oldThr=1 at this time, so newThr=2 70 can see that newCap=4 and newThr=2 at this time, an error has occurred, 4 * 0.75 is not equal to 2. So the emergence of the condition 71 oldCap > = DEFAULT_INITIAL_CAPACITY at line 27 changed the operation 72 of updating newThr at line 33 to line 97 to solve the problem of inaccurate newThr updates 73 74. Of course, this is just a demonstration of a possible error, not the essence. In fact, I demonstrated this result by comparing some other data and found that 75 there will be this difference if the number of buckets exceeds 16. In fact, the above errors can be generalized: one is rounded after the original capacity * load factor, and then rounded by * 2,76 and the other is the original capacity * 2 * load factor. The difference lies in the timing of rounding. The difference is 77 after the original capacity * load factor is a decimal number (if it is an integer there will be no difference, and not all numbers with decimals will be different), 78, this decimal number is rounded and then * 2, the difference is magnified, resulting in the final difference. Another clue is 79 oldCap > = DEFAULT_INITIAL_CAPACITY at line 27. Note that it is greater than or equal to, not less than or equal to, that is, 80 will enter this if condition only if it is greater than or equal to 16 (quick calculation of threshold results), and less than 16 will enter this if condition 81 (accurate calculation of threshold results). So if the number of buckets is greater than 16 and the threshold is one more and one less, it may not be so important. 82 for example, 1024 buckets, whether I expand when 819 buckets are full or at 818 does not seem to make a big difference. In this case, 83 because I subtract the threshold threshold by one more, the hash conflict will become higher? Or is the space more wasted? In fact, 84 is not necessarily 84 after all, the amount of data is placed here, and the load factor is generally less than 1, so the difference is 1 at most. This difference will also become less and less important as the 85 data becomes larger and larger. But if, as in the previous example, I expand the capacity of 4 buckets when 2 buckets are full or 3 buckets are full, 86 there will be a big difference. In these two cases, the comparison of the probability of hash conflict is definitely relatively large. Maybe one is 20%, the other is 40%, and when the number of buckets is relatively large, the difference may be 1.2% and 1.3% (this number is casually cited by me). In this way, when the amount of data is 88 and the expansion method is called frequently (for example, I want to deposit a very large capacity but do not specify the initial capacity), I sacrifice the accuracy of the calculation threshold 89 (if the load factor is set properly, there is only one difference), but in exchange for efficient execution speed (note that the left shift operation is performed at line 33. 90 and at line 96 is multiplication, followed by a ternary operator, and then rounded) However, when the amount of data is small, it is obviously more important to calculate accurately, 91 and when the amount of data is small, there is no performance difference. After all, the threshold set here is 16. So in the comment at line 14 above, 92 threshold has a fourth value: the old array capacity * load factor-1 (depending on the value set by the load factor and the array capacity), 93 but this can be counted as the third special case. So to sum up: the significance of adding the 27th line of code is to sacrifice some accuracy slightly when the number of buckets is relatively large and the 94 expansion method is called frequently, but it can make the threshold threshold calculation faster, and it is an optimization method 95 * / 96 float ft = (float) newCap * loadFactor; 97 newThr = (newCap)

< MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ? 98 (int) ft : Integer.MAX_VALUE); 99 }100 //做完上述操作后,将threshold的值也更新一下101 threshold = newThr;102103 //...104}   上面在把newCap、newThr和threshold标记位都设置好了后,下面就是具体的数据迁移的过程,也就是resize方法后半部分的实现: 1/** 2 * HashMap: 3 */ 4final Node[] resize() { 5 //... 6 7 //此时newCap和newThr标记位都已经设置好了,将根据newCap新的容量来创建一个新的Node数组,以此来替代旧数组 8 @SuppressWarnings({"rawtypes", "unchecked"}) 9 Node[] newTab = (Node[]) new Node[newCap]; 10 table = newTab; 11 //如果旧数组为null,也就是说现在是对数组做初始化的操作。那么直接就返回创建的新数组newTab就行了 12 if (oldTab != null) { 13 //遍历旧数组上的每一个桶 14 for (int j = 0; j < oldCap; ++j) { 15 Node e; 16 //如果旧数组的这个桶上没有数据的话,就跳过它,不进行扩容 17 if ((e = oldTab[j]) != null) { 18 //旧数组上的这个节点赋值为null,便于快速GC 19 oldTab[j] = null; 20 if (e.next == null) 21 /* 22 第一个节点后面没有后续节点,也就意味着这个桶上只有一个节点, 23 那么只需要通过计算找出新位置放进去就行了,这里也就是在做快速迁移 24 */ 25 newTab[e.hash & (newCap - 1)] = e; 26 else if (e instanceof TreeNode) 27 //如果是红黑树,就执行红黑树的迁移逻辑(红黑树的分析本文不做展开) 28 ((TreeNode) e).split(this, newTab, j, oldCap); 29 else { 30 /* 31 走到这里说明当前这个桶里面有不止一个的节点,此时就会做链表上多个节点的迁移工作 32 首先来说一下大前提:现在旧数组上桶的位置是j,而准备放进新数组的桶位置有两个:一个是j, 33 也就是说新数组上也会放在j这个位置上;另一个是j+旧数组的容量。比方说当前桶的位置15, 34 而旧数组的容量是16,那么新数组上第二个将要插入的桶的位置就是15 + 16 = 31 35 36 说完了大前提,再来看下面的代码。以下定义了四个指针位置, 37 分别就对应了上面说的两个新插入桶位置的头尾指针 38 */ 39 Node loHead = null, loTail = null; 40 Node hiHead = null, hiTail = null; 41 Node next; 42 do { 43 //next指向当前节点的下一个节点 44 next = e.next; 45 /* 46 那么现在的问题就是通过什么办法来判断到底是插入到j位置还是j+旧数组容量的位置呢? 47 其实也很简单,就是通过节点的哈希值和旧数组的容量按位与的方式来判断的。旧数组容量 48 经过上面的分析后可以知道,肯定是一个2的幂,也就是1000...000,最高位为1,而剩余位都是0的形式 49 这样按位与的结果就是取出了节点hash上的那个与旧数组所对应的1的那个位置上的数。 50 比如说节点的hash值是1010110,旧数组容量是16(1000),那么按位与的结果就是 51 取出了hash值中从右往左第四位的值,即0。也就是说,存在新数组哪个位置上,取决于hash值 52 所对应旧数组容量为1的那个位置上到底是0还是1。从这里也可以看出,除了 53 (table.length - 1) & hash这种方式用来判断插入桶的位置,是必须要求数组容量是2的幂之外, 54 在扩容做迁移的时候,也必须要求了这点 55 56 按位与的结果只有两种,不是0就是1,所以如果为0的话最后就会插入到新数组的j位置, 57 为1就插入到j+旧数组容量的位置(后面会解释如果换一下的话,到底行不行) 58 */ 59 if ((e.hash & oldCap) == 0) { 60 if (loTail == null) 61 //如果当前是第一个插进来的节点,就将loHead头指针指向它 62 loHead = e; 63 else 64 /* 65 不是第一个的话,就将loTail尾指针的下一个next指针指向它,也就是把链拉上 66 loTail在之前已经指向了最后一个节点处 67 */ 68 loTail.next = e; 69 //更新一下loTail尾指针,重新指向此时的最后一个节点处 70 loTail = e; 71 } else { 72 //(e.hash & oldCap) == 1 73 if (hiTail == null) 74 //如果当前是第一个插进来的节点,就将hiHead头指针指向它 75 hiHead = e; 76 else 77 /* 78 不是第一个的话,就将hiTail尾指针的下一个next指针指向它,也就是把链拉上 79 hiTail在之前已经指向了最后一个节点处 80 */ 81 hiTail.next = e; 82 //更新一下hiTail尾指针,重新指向此时的最后一个节点处 83 hiTail = e; 84 } 85 //如果当前遍历链表上的节点还没有到达最后一个节点处,就继续循环去判断 86 } while ((e = next) != null); 87 /* 88 走到这里说明已经将原来的旧数组上的链表拆分完毕了,现在分成了两个链表,low和high 89 接下来需要做的工作就很清楚了:将这两个链表分别插入到j位置和j+旧数组容量的位置就可以了 90 */ 91 if (loTail != null) { 92 /* 93 如果low链表有节点的话(没节点说明之前的按位与的计算结果都是1),就重新更新一下low链表上 94 最后一个节点的next指针指向null。这步操作很重要,因为如果之前这个节点不是 95 旧数组桶上的最后一个节点的话,next是有值的。不改成null的话指针指向就乱了 96 */ 97 loTail.next = null; 98 //将链表插入到j位置 99 newTab[j] = loHead;100 }101 if (hiTail != null) {102 /*103 如果high链表有节点的话(没节点说明之前的按位与的计算结果都是0),就重新更新一下high链表上104 最后一个节点的next指针指向null。这步操作很重要,因为如果之前这个节点不是105 旧数组桶上的最后一个节点的话,next是有值的。不改成null的话指针指向就乱了106 */107 hiTail.next = null;108 //将链表插入到j+旧数组容量的位置109 newTab[j + oldCap] = hiHead;110 }111 }112 }113 }114 }115 return newTab;116}   在Java 7的HashMap源码中,transfer方法是用来做扩容时的迁移数据操作的。其实现就是通过遍历链表中的每一个节点,重新rehash实现的。在这其中会涉及到指针的修改,在高并发的场景下,可能会使链表上的一个节点的下一个指针指向了其前一个节点,也就是形成了死循环,死链(具体形成过程不再展开):   这样再去遍历链表的时候就永远不会停下来,出现了bug。而Java 8中通过形成两个链表,节点hash值在数组容量二进制数为1的那个位置处去按位与判断是0还是1,以此来选择插入的方式很好地解决了这个问题,而且不用每一个节点rehash,提高了执行速度。   既然说到了不用rehash,那么这里想要探究一下在数组扩容时,选择新插入数组的位置是原位置和原位置+旧数组容量,为什么这里加上的是旧数组容量呢?加别的值不可以吗?其实这里加旧数组容量是有原因的。我们都知道,数组容量必须是2的幂,即100…000,而新数组的容量是原数组的2倍,也就是把原来值中的"1"往左移了一位,而我们在判断插入桶位置时用的方式是(数组容量 - 1)& hash值。把这些信息都整合一下,我们就知道在新数组中计算桶位置和在旧数组中计算桶位置的差异,其实就在于旧数组二进制为1上的这位上。如果该位是0,那就是和原来旧数组是同一个位置,如果是1,就是旧数组位置处+旧数组的容量。下面举个例子:   两个节点此时计算出来桶的位置都是1010,即第10个桶。它们都会被放在第10个桶中的链表中。   而现在数组扩容了,数组容量变为了32,我们再来看看结果会怎样:   这时候发现(n - 1) & hash的结果不一样了,节点1是01010,节点2是11010。也就是说,我们在get方法执行的时候(get方法也是通过(n - 1) & hash的方式来找到桶的位置),找到节点1是在第10个桶,节点2是在第26个桶,这两个节点之间相差16个桶,这不就是旧数组的容量吗?现在是不是恍然大悟了,我们当初在扩容时,将节点的hash值和旧数组容量进行按位与,其实也就是在找上图红框框中的那个位置。如果为0,就将节点1放在新数组中第10个桶中(1010),也就是原位置处;如果为1,就将节点2放在新数组中第26个桶中(1010+10000),也就是原位置+旧数组容量处。这样做的话,我在get方法执行的时候也能保证正确执行,能正确找到节点在新数组中桶的位置。同时也说明了,这个放入的策略是不能换的。也就是说,不能是如果为1的话最后就会插入到新数组的原位置,为0就插入到原位置+旧数组容量的位置。如果这么做的话,最后get方法在查找该节点的时候,就找不到了(而实际上还存在)。所以通过Java 8中的这种扩容方式,既能计算出正确的新桶位置,又能避免每一个节点的rehash,节省计算时间,实在是妙哉。 5 get方法 1/** 2 * HashMap: 3 */ 4public V get(Object key) { 5 Node e; 6 //如果获取不到值,或者本身插入的value就是null的话,就返回null,否则返回具体的value 7 return (e = getNode(hash(key), key)) == null ? null : e.value; 8} 910final Node getNode(int hash, Object key) {11 Node[] tab;12 Node first, e;13 int n;14 K k;15 //如果数组没有初始化,或者计算出来的桶的位置为null(说明找不到这个key),就直接返回null16 if ((tab = table) != null && (n = tab.length) >

0 & & 17 (first = tab [(n-1) & hash])! = null) {18 if (first.hash = = hash & & 19 ((k = first.key) = = key | | (key! = null & & key.equals (k) 20 / * 21 if the hash value of the first node on the bucket is the same as the hash value you are looking for, and key is the same 22, return directly (quick judgment mode) 23 * / 24 return first 25 / * 26 if the next node is null, that is, when there is only one node on the current bucket, 27 and the previous node is not, then directly return null, that is, 28 * / 29 if ((e = first.next)! = null) {30 if (first instanceof TreeNode) 31 / / if the node is a red-black tree 32 return ((TreeNode) first) .getTreeNode (hash, key) to execute the lookup node logic of the red-black tree (this article will not expand the red-black tree analysis) 33 / * 34 come here to show that there are multiple nodes on the linked list, traversing each node on the linked list to find (the first node does not need to be judged 35 because it has been judged at line 18) 36 * / 37 do {38 if (e.hash = = hash & & 39 ((k = e.key) = = key | | (key! = null & & key.equals (k) 40 return e 41} while ((e = e.next)! = null); 42} 43} 44 return null;45} 6 remove method 1 * 2 * HashMap: 3 * / 4public V remove (Object key) {5 Node e 6 / / return null if the node to be deleted cannot be found, otherwise return the deleted node 7 return (e = removeNode (hash (key), key, null, false, true)) = = null? 8 null: e.value; 9} 1011final Node removeNode (int hash, Object key, Object value,12 boolean matchValue, boolean movable) {13 Node [] tab;14 Node pacif15 int n, index 16 / / if the array is not initialized, or if the position of the calculated bucket is null (indicating that the key cannot be found), return null17 if ((tab = table)! = null & & (n = tab.length) > 0 & 18 (p = tab [index = (n-1) & hash])! = null) {19 Node node = null, eposition 20K KX 21V v 22 if (p.hash = = hash & & 23 ((k = p.key) = = key | | (key! = null & & key.equals (k) 24 / / if the hash value of the first node on the bucket is the same as the hash value you are looking for, and the key is the same, record the location 25 node = p 26 else if ((e = p.next)! = null) {27 / / e is the next node of the first node on the bucket. If not, it means that the node to be deleted cannot be found, so return null28 if (p instanceof TreeNode) 29 / / if the node is a red-black tree. 30 node = ((TreeNode) p) .getTreeNode (hash, key) to execute the lookup node logic of the red-black tree (this article will not expand the red-black tree) 31 else {32 / * 33 come here to show that there are multiple nodes on the linked list. Traverse each node on the linked list to find it, and record the location 34 if you find it. (the first node does not need to be judged. Because it has been judged at line 22) 35 * / 36 do {37 if (e.hash = = hash & & 38 ((k = e.key) = = key | | 39 (key! = null & & key.equals (k) {40 Node = e 41 break;42} 43 / / at this time p records the previous node of the current node 44 p = escape 45} while ((e = e.next)! = null) 46} 47} 48 / * 49 if the node to be deleted is found, and if matchValue is true (matchValue is true means it can be deleted only if value is equal) 50 and value is equal (if matchValue is false, you only need to determine whether the first condition node is not null) 51 of course, if it is not equal, return null directly That is, 52 * / 53 if (node! = null & &! matchValue | | (v = node.value) = = value | | 54 (value! = null & & value.equals (v) {55 if (node instanceof TreeNode) 56 / / if the node is a red-black tree. To execute the delete node logic of the red-black tree (this article will not expand the analysis of the red-black tree) 57 ((TreeNode) node) .removeTreeNode (this, tab, movable) 58 else if (node = = p) 59 / * 60 if the node to be deleted is the first node on the bucket, directly assign the first position of the current bucket to the next node 61 (null if next is null) 62 * / 63 tab [index] = node.next 64 else65 / / not the first node on the bucket points the next of the previous node to the next node, that is, remove the node node from the linked list and wait for GC66 p.next = node.next;67 / / number of modifications + 1 (rapid failure mechanism) 68 + + modCount 69 / / because the node is to be deleted, if it is found, size will use the size;71 / / hook method, with 72 afterNodeRemoval (node) empty; 73 return node;74} 75} 76 return null;77} 7 clear method 1 HashMap * 2 * HashMap: 3 * / 4public void clear () {5 Node [] tab 6 / / number of modifications + 1 (fast failure mechanism) 7 modCount++; 8 if ((tab = table)! = null & & size > 0) {9 / / size records the number of buckets that currently have data, because the data is to be emptied here, so reset size to 010 size = 0 size 11 / / and set each bucket in table to null at the same time 12 for (int I = 0; I < tab.length) + + I) 13 tab [I] = null;14} 15}

The longer I work in this business, the more I feel: ten thousand tall buildings rise from the ground, this is absolutely B is the truth! Many things at the bottom of the application business are easy to be overlooked for too long. The plan at the beginning of this year is to summarize the commonly used JDK source code tools. As the end of the year is approaching, take advantage of the recent free time, hurry to make up.

ArrayList, do you really understand? Talk about the difference of remove between foreach and iterator

Have you ever wondered why Internet companies always ask about collections? Talk about the classic data structure HashMap

An exclusive Mode for in-depth Analysis of AQS Source Code-detailed explanation of ReentrantLock Lock characteristics

Shared mode for in-depth analysis of AQS source code-Why is there a PROPAGATE status in AQS?

Conditional queue for in-depth analysis of AQS source code-how is the blocking queue in Java implemented?

CountDownLatch, an application tool for in-depth analysis of AQS source code (under planning)

CyclicBarrier, an application tool for in-depth analysis of AQS source code (under planning)

ConcurrentHashMap source code analysis-the implementation of ConcurrentHashMap in Java 8 and bug? And there's more than one! This pit is still relatively large, and we will focus on summarizing it later!

ThreadPoolExecutor source code analysis-ask the broken Java thread pool execution process, in fact, if asked in detail, many people still look confused?

ScheduledThreadPoolExecutor source code analysis-focus on how repeatedly timed thread pools achieve delayed execution and periodic execution!

ThreadLocal source code analysis-focus on summary, memory leaks, soft references, weak references, false references, interviews often like to ask, I also like to ask someone else

Red-black tree TreeMap, LinkedHashMap (not sure if you want to write, have time to write, depending on the project)

Ordered and threaded Map container ConcurrentSkipListMap (jump table) in-depth understanding

LinkedList (not sure if you want to write, have time to write, depending on the project)

If the project is not in a hurry at the end of the year, I will be writing an article on a commonly used sorting algorithm when I have time.

Each summary is an examination of the degree of mastery of knowledge points, the technology is not easy, improve a little bit every day, share with you.

On the classic data structure HashMap and line-by-line analysis of each key point to share here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report