How to understand the underlying implementation, load factor, capacity value and infinite loop of HashMap in Java

2025-04-04 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/02 Report--

This article explains how to understand the underlying implementation, load factor, capacity value and infinite-loop problem of HashMap in Java. Many people have doubts about these topics in daily work, so the material below collects simple and practical explanations. I hope it helps you answer these questions. Let's get started!

Introduction to HashMap

HashMap is an unordered key-value container based on a hash table. Both its keys and its values may be null, and it is not thread-safe.
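As a quick illustration of these properties, here is a minimal sketch (the class name is just for the example): HashMap accepts a single null key and any number of null values.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapNullDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // one null key is allowed
        map.put("k", null);                  // null values are allowed too

        System.out.println(map.get(null)); // value-for-null-key
        System.out.println(map.get("k"));  // null
        System.out.println(map.size());    // 2
    }
}
```

Note that a null value from get() is ambiguous: it may mean "no mapping" or "mapped to null", which is why containsKey() exists.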

HashMap underlying implementation

In jdk 1.7, HashMap is implemented as array + linked list.

The red-black tree was introduced in jdk1.8, and the underlying implementation of HashMap became array + linked list + red-black tree.

A brief introduction to the red-black tree

The red-black tree is a self-balancing binary search tree with the following properties:

Every node is either red or black

The root node is black

All leaves are black (the leaves are NIL nodes)

Both children of every red node are black (no path from a leaf to the root contains two consecutive red nodes)

All paths from any node to each of its leaves contain the same number of black nodes.

So the time complexity of red-black tree operations is O(log n).
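For reference, the JDK itself ships a red-black tree: java.util.TreeMap. A short sketch (not part of the original article) showing the sorted-order behavior the balanced structure guarantees:

```java
import java.util.TreeMap;

public class RedBlackTreeDemo {
    public static void main(String[] args) {
        // java.util.TreeMap is backed by a red-black tree, so keys are kept
        // in sorted order and get/put/remove all run in O(log n)
        TreeMap<Integer, String> tree = new TreeMap<>();
        tree.put(3, "c");
        tree.put(1, "a");
        tree.put(2, "b");

        System.out.println(tree.firstKey()); // 1
        System.out.println(tree.keySet());   // [1, 2, 3]
    }
}
```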

Jdk1.8: array + linked list + red-black tree

At the bottom, HashMap is an array, and an element's array index is determined by the hash value of its key. This creates a special case: different keys can produce the same hash.

To handle this case, a linked list is introduced. When keys hash to the same index, a linked list is stored in that array slot; the list holds all entries whose key hashes collide, which resolves the hash conflict.

However, if a large number of entries share the same hash, the linked list becomes very long, which seriously hurts the performance of HashMap, because querying a linked list requires traversing its nodes. So the red-black tree was introduced in jdk1.8: when the length of a linked list reaches 8 and the capacity of the HashMap is at least 64, the linked list is converted into a red-black tree.

The relevant jdk1.8 code:

    // jdk1.8, HashMap#putVal
    // binCount is the length counter of the linked list; when the list
    // reaches TREEIFY_THRESHOLD (8), the treeify method is executed
    if (binCount >= TREEIFY_THRESHOLD - 1)
        treeifyBin(tab, hash);

    // HashMap#treeifyBin
    final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        // MIN_TREEIFY_CAPACITY = 64
        // if the table is smaller than 64, only resize; do not treeify
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            TreeNode<K,V> hd = null, tl = null;
            do {
                TreeNode<K,V> p = replacementTreeNode(e, null);
                if (tl == null)
                    hd = p;
                else {
                    p.prev = tl;
                    tl.next = p;
                }
                tl = p;
            } while ((e = e.next) != null);
            if ((tab[index] = hd) != null)
                hd.treeify(tab);
        }
    }

Why is the load factor 0.75?

The load factor, also called the expansion factor, is used to decide when to resize. Suppose the load factor were 0.5 with an initial capacity of 16: once the HashMap held 16 * 0.5 = 8 elements, it would resize.

HashMap's load factor of 0.75 balances performance against space. From its definition, its value lies in the range (0, 1].

If the load factor is too small, the resize threshold is low and resizing happens frequently. Elements are stored more sparsely, which effectively avoids hash collisions and keeps operations fast, but it takes up more space.

If the load factor is too large, the resize threshold is high and resizing is rare. Less space is used, but elements are stored densely, the probability of hash collisions rises sharply, the data structures used to resolve those collisions become more complex, and operation performance ultimately drops.

Another consideration is resize efficiency. The capacity of a HashMap (the initialCapacity parameter of the constructor) is required to be a power of 2, so choosing 0.75 as the load factor guarantees that its product with the capacity is an integer.

    // constructor
    public HashMap(int initialCapacity, float loadFactor) {
        // ……
        this.loadFactor = loadFactor; // load factor
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Returns a power of two size for the given target capacity.
     * MAXIMUM_CAPACITY = 1 << 30
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
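To see tableSizeFor in action, here is a standalone copy of the method. The constant mirrors the JDK source; the wrapper class is just for illustration. It rounds any requested capacity up to the next power of two.

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Copy of jdk1.8's HashMap#tableSizeFor: smallest power of two >= cap.
    // The cascading shifts smear the highest set bit into every lower
    // position, producing 0111...1, and the final +1 yields 1000...0.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Note the `cap - 1` at the start: without it, an input that is already a power of two (such as 16) would be doubled.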

Why is the capacity of HashMap a power of 2?

The default initial capacity of HashMap is 16, and each resize doubles the capacity. The 16 and the doubling together ensure that the capacity of the HashMap is always a power of 2. What is the reason for this design?

Reason 1: the AND operation is efficient

The AND operation (&) works on binary values: a result bit is 1 only when both operand bits are 1, otherwise it is 0. For example, 1 & 1 = 1, 1 & 0 = 0, 0 & 0 = 0. AND is used because it is extremely cheap for the computer to execute.

Reason 2: it is advantageous to fully hash elements and reduce Hash collisions

In the putVal function that adds elements to HashMap, there is a piece of code like this:

    // n is the capacity, hash is the hash value of the key
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

When an element is added, its position in the HashMap is computed as i = (n - 1) & hash.

When the capacity of HashMap is a power of 2, its binary value is 1000...0 (a single 1 followed by zeros), so n - 1 is 0111...1 (all 1s in the low bits), and (n - 1) & hash spreads the values evenly across the table.

For example, suppose the capacity is 16 and four keys with hashes 1111, 1110, 1011 and 1001 are added. ANDing each with n - 1 (15, binary 01111) gives 1111, 1110, 1011 and 1001 respectively: four distinct slots.

Now suppose the capacity were 10 instead of a power of 2. Then n - 1 = 9 (binary 1001), and ANDing it with the same four hash values gives 1001, 1000, 1001 and 1001 respectively.

You can see that three of the four values collide, which shows that a non-power-of-2 capacity increases the probability of hash collisions. So the capacity of HashMap is kept at a power of 2 so that elements hash as evenly as possible.
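The power-of-two trick can be checked directly: when n is a power of two, (n - 1) & hash equals hash % n for any non-negative hash, so the cheap AND replaces a division. A small sketch (class and variable names are illustrative):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // power-of-two capacity
        int[] hashes = {0b1111, 0b1110, 0b1011, 0b1001};
        for (int h : hashes) {
            int index = (n - 1) & h; // the HashMap index computation
            // for power-of-two n this always matches the modulo result
            System.out.println(index + " " + (index == h % n));
        }
    }
}
```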

How HashMap causes an infinite loop

HashMap can get stuck in an infinite loop in jdk1.7. Its resize operation uses head insertion, so in a multi-threaded environment a circular linked list may be produced, which leads to an infinite loop. jdk1.8 switched to tail insertion, which avoids this problem.

At this point, the study of the underlying implementation, load factor, capacity value and infinite-loop problem of HashMap in Java is over. I hope it resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it out!
