How to choose the default capacity of HashMap

2025-01-31 Update From: SLTechnology News&Howtos


How should the default capacity of HashMap be chosen? This article analyzes the question in detail, hoping to help readers who want to understand it find a simple and clear answer.

Collections are used constantly in daily Java development, and as a typical key-value (K-V) data structure, HashMap is no stranger to any Java developer. In daily development, we often create a HashMap as follows:

Map<String, String> map = new HashMap<>();

However, have you ever thought about it: in the code above, we did not specify a capacity for the HashMap, so what is the default capacity of a newly created HashMap, and why? This article analyzes that question.

What is capacity

In Java, there are two relatively simple data structures for storing data: arrays and linked lists. Arrays are easy to index into but hard to insert into and delete from, while linked lists are the opposite: hard to index into, easy to insert into and delete from. HashMap is a combination of the two, playing to the strengths of both; it can be understood as an array of linked lists.

In HashMap there are two easily confused fields: size and capacity. capacity is the capacity of the Map, and size is the number of elements in the Map. A simple way to think about it: a HashMap is a "bucket"; capacity is the maximum number of slots the bucket currently has, while size is how many elements the bucket already contains. Consider the following code:

Map<String, String> map = new HashMap<>();

map.put("hollis", "hollischuang");

Class<?> mapType = map.getClass();

Method capacity = mapType.getDeclaredMethod("capacity");

capacity.setAccessible(true);

System.out.println("capacity : " + capacity.invoke(map));

Field size = mapType.getDeclaredField("size");

size.setAccessible(true);

System.out.println("size : " + size.get(map));

Output result: capacity: 16, size: 1

Above we defined a new HashMap, put one element into it, and then printed capacity and size via reflection: the capacity is 16 and the number of stored elements is 1. So when we create a HashMap without specifying its capacity, we get a Map with a default capacity of 16. Where does this capacity come from, and why this number?

Capacity and hash

To explain the reason for this default capacity, we first need to know what the capacity is used for. The capacity is the number of buckets in a HashMap, so when we want to put an element into a HashMap, some algorithm must figure out which bucket it should go into. This process is called hashing, and it corresponds to the hash method in HashMap.

The job of the hash method is to locate the position of a K-V pair in the array of linked lists based on its Key. That is, the input to the hash method is a Key of type Object, and the output is an array index of type int. What would you do if you were asked to design this method? It seems simple: call the Object's hashCode() method, which returns an integer, and then take that integer modulo the capacity of the HashMap. If it were really that simple, setting the capacity of HashMap would be much easier; but for efficiency and other reasons, the actual implementation of HashMap's hash method is somewhat more complex.

The implementation of hash

Next we introduce how the hash method is implemented in HashMap. (The following section draws on my earlier article analyzing hash() in Map.) It is realized by two methods, int hash(Object k) and int indexFor(int h, int length).

hash: this method mainly converts the Object into an integer.
indexFor: the main purpose of this method is to convert the integer generated by hash into an index into the array of linked lists.

To stay focused on the topic of this article, let us look only at the indexFor method. Here is the implementation in Java 7 (Java 8 no longer has a separate method for this, but the index-lookup algorithm is the same):

static int indexFor(int h, int length) {

    return h & (length - 1);

}

The indexFor method maps a hashcode to an index in the array of linked lists. Of its two parameters, h is the element's hashcode and length is the capacity of the HashMap. So what does return h & (length - 1) mean? It is in fact a modulo operation. Java uses a bitwise AND (&) instead of the modulo operator (%) mainly for efficiency: a bitwise AND is a single cheap CPU instruction, while modulo is implemented with a much slower division. So why can a bitwise AND (&) implement the modulo operation (%)? The principle is as follows:

X % 2^n = X & (2^n - 1)

If n is 3, then 2^3 = 8, which is 1000 in binary, and 2^3 - 1 = 7, which is 0111. At this point X & (2^3 - 1) keeps the last three bits of X's binary representation. Viewed in binary, X / 8 is equivalent to X >> 3: shifting X right by 3 bits gives the quotient of X / 8, and the bits shifted out (the last three) are X % 8, the remainder. If the explanation above is not clear, it does not matter; just remember the trick, or try a few examples:

6 % 8 = 6, 6 & 7 = 6

10 % 8 = 2, 10 & 7 = 2


Therefore, return h & (length - 1); implements the modulo operation, as long as length is a power of 2 (2^n).
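As a quick sanity check, here is a minimal, self-contained sketch (the class name ModVsMask is my own, not from the JDK) verifying that the mask and the modulo agree whenever length is a power of 2:

```java
// Illustrative check, not JDK source: for any non-negative h and a
// power-of-two length, h % length equals h & (length - 1).
public class ModVsMask {
    public static void main(String[] args) {
        int[] lengths = {2, 4, 8, 16, 32, 64};
        int[] hashes = {0, 6, 10, 12345, Integer.MAX_VALUE};
        for (int length : lengths) {
            for (int h : hashes) {
                // The mask keeps only the low log2(length) bits of h.
                if (h % length != (h & (length - 1))) {
                    throw new AssertionError(h + " % " + length + " mismatch");
                }
            }
        }
        // The two examples from the text:
        System.out.println("6 & 7  = " + (6 & 7));   // 6, same as 6 % 8
        System.out.println("10 & 7 = " + (10 & 7));  // 2, same as 10 % 8
    }
}
```

Note that the equivalence breaks as soon as length is not a power of 2, which is exactly why HashMap must keep its capacity at 2^n.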

So, because a bitwise AND is much cheaper than a modulo operation, HashMap uses the bitwise AND instead of the modulo when computing the array index at which an element is to be stored. And the reason the two are equivalent is that the capacity of a HashMap must be a power of 2 (2^n).

So, since it must be 2^n, why exactly 16? Why not 4, 8 or 32? The JDK gives no official explanation for this choice of default capacity, and the author has found no valuable information about it online. (If anyone has authoritative material or ideas, please leave a comment.) The author's inference is that it is an empirical value (Experience Value): given that some default 2^n must be chosen as the initial value, there is a tradeoff between efficiency and memory usage. The value can be neither too small nor too large: too small, and frequent resizing hurts efficiency; too large, and space is wasted. So 16 is adopted as an empirical value. In JDK 8, the default capacity is defined as:

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Specifying the capacity (5 -> 8, 9 -> 16)

JDK 1.7 and JDK 1.8 initialize this capacity at different times. In JDK 1.8, the capacity is set when the HashMap constructor is called to define the HashMap. In JDK 1.7, it is not done until the first put operation. Let us see how the JDK finds the first power of 2 greater than or equal to the specified value passed in:

int n = cap - 1;

n |= n >>> 1;

n |= n >>> 2;

n |= n >>> 4;

n |= n >>> 8;

n |= n >>> 16;

return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;

The purpose of the above algorithm is quite simple: given the capacity value passed by the user (cap in the code), compute and return the first power of 2 greater than or equal to it.
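Putting the pieces together, here is a runnable sketch of this sizing logic, modeled on tableSizeFor in JDK 8's java.util.HashMap (the standalone class TableSize is my own wrapper):

```java
// A sketch of JDK 8's HashMap.tableSizeFor; MAXIMUM_CAPACITY is
// 1 << 30 in the JDK.
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so an exact power of 2 is not doubled
        n |= n >>> 1;      // smear the highest set bit rightward...
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;     // ...until all lower bits are 1
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(5));   // 8
        System.out.println(tableSizeFor(9));   // 16
        System.out.println(tableSizeFor(19));  // 32
        System.out.println(tableSizeFor(37));  // 64
        System.out.println(tableSizeFor(16));  // 16 (exact powers stay put)
    }
}
```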

Looking at the examples above, you may spot a pattern. 5 -> 8, 9 -> 16, 19 -> 32 and 37 -> 64 each go through two main stages: Step 1: 5 -> 7, Step 2: 7 -> 8; Step 1: 9 -> 15, Step 2: 15 -> 16; Step 1: 19 -> 31, Step 2: 31 -> 32. Corresponding to the code above, Step 1 is:

n |= n >>> 1;

n |= n >>> 2;

n |= n >>> 4;

n |= n >>> 8;

n |= n >>> 16;

And Step 2 is:

return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;

Step 2 is relatively simple: bound the value against the limits, then take the value obtained in Step 1 plus 1. But how should Step 1 be understood? It shifts a binary number right by successive amounts and ORs each result with the original value. The purpose is, starting from the highest set bit, to set all the bits after it to 1. Take a binary number at random and apply the formula to see:

1100 1100 1100 >>> 1 = 0110 0110 0110

1100 1100 1100 | 0110 0110 0110 = 1110 1110 1110

1110 1110 1110 >>> 2 = 0011 1011 1011

1110 1110 1110 | 0011 1011 1011 = 1111 1111 1111

1111 1111 1111 >>> 4 = 0000 1111 1111

1111 1111 1111 | 0000 1111 1111 = 1111 1111 1111
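The worked example above can be replayed in a few lines of Java (the class name SmearDemo is my own):

```java
// Reproduces the worked example: smearing 1100 1100 1100's highest
// set bit rightward yields 1111 1111 1111; adding 1 then gives the
// next power of two, 1 0000 0000 0000.
public class SmearDemo {
    public static void main(String[] args) {
        int n = 0b1100_1100_1100;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;   // for a 12-bit value, already all ones here
        n |= n >>> 8;
        n |= n >>> 16;
        System.out.println(Integer.toBinaryString(n));      // 111111111111
        System.out.println(Integer.toBinaryString(n + 1));  // 1000000000000
    }
}
```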

Through several unsigned right shifts and bitwise OR operations, we turn 1100 1100 1100 into 1111 1111 1111; adding 1 to 1111 1111 1111 then gives 1 0000 0000 0000, which is the first power of 2 greater than 1100 1100 1100. We have now explained the code of Step 1 and Step 2: it turns a number into the first power of 2 greater than or equal to itself. But there is one special case to which the shifts alone seem not to apply: numbers that are already a power of 2. If the number 4 were fed through the shifts directly, the result would be 8; this is exactly what the initial int n = cap - 1 handles. The detailed verification can be found in my earlier article on hash() in Map, so it will not be expanded here. In short, given the initial capacity passed by the user, HashMap uses unsigned right shifts and bitwise ORs to compute the first power of 2 greater than or equal to that number.

Capacity expansion

Besides being specified at initialization, the capacity of a HashMap may also change when it expands. HashMap has an expansion mechanism: when the number of elements in the HashMap (size) exceeds the critical value (threshold), the capacity is expanded automatically. In HashMap, threshold = loadFactor * capacity. loadFactor is the load factor, indicating how full the HashMap is; its default value is 0.75f. The advantage of 0.75f is that it is exactly 3/4, and capacity is a power of 2, so the product of the two is an integer. For a HashMap with default settings, expansion is triggered when its size exceeds 12 (16 * 0.75). The following is a fragment of the expansion method (resize) in JDK 8's HashMap:

else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
         oldCap >= DEFAULT_INITIAL_CAPACITY)
    newThr = oldThr << 1; // double threshold
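To see the threshold arithmetic concretely, here is a tiny sketch (the class name ThresholdDemo is mine) that prints capacity and threshold across a few doublings, using HashMap's default load factor of 0.75f:

```java
// Threshold arithmetic for HashMap's defaults: threshold = capacity *
// loadFactor, so a default map resizes once size exceeds 16 * 0.75 = 12,
// and each doubling of the capacity doubles the threshold as well.
public class ThresholdDemo {
    public static void main(String[] args) {
        final float LOAD_FACTOR = 0.75f; // HashMap's default load factor
        int capacity = 16;               // HashMap's default capacity
        for (int i = 0; i < 4; i++) {
            int threshold = (int) (capacity * LOAD_FACTOR);
            System.out.println("capacity=" + capacity + ", threshold=" + threshold);
            capacity <<= 1;              // a resize doubles the table
        }
    }
}
```

Because 0.75f is exactly 3/4 and the capacity is always a power of 2, the product is always a whole number, as the article notes.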
