This article explains how HashMap works internally and what happens when you call its put and get methods. I think it is very practical, so I am sharing it with you; I hope you get something out of it. Let's take a look.
The principle of HashMap
The underlying data structure of HashMap is an array plus linked lists. It stores data in key-value form: values are written with put and read back with get.
Internally it maintains an Entry array. To place a key, it takes the key's hashCode value, perturbs it with a bit shift and an XOR, and then ANDs the result with (array length - 1) to obtain an index, which determines where the data is stored in the Entry array; hash conflicts are resolved with linked lists.
When a collision occurs, the object is stored in the next node of the linked list.
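As a concrete illustration, here is a minimal sketch of that index calculation as JDK 8 performs it (the key "hello" and the table length 16 are example values chosen for illustration):

int h = "hello".hashCode();  // every Java object has a hashCode() method
int hash = h ^ (h >>> 16);   // perturb: fold the high 16 bits into the low 16
int index = (16 - 1) & hash; // table length 16 -> bucket index in [0, 15]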
The underlying principle of HashMap (what happens inside when you call put and get?)
Anyone who has worked with HashMap will have used put and get often, so let's explain HashMap's internal storage in detail, from a beginner's perspective.
When a program puts multiple key-value pairs into a HashMap, take the following code snippet as an example.
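The snippet itself is not reproduced here, so below is a minimal sketch consistent with the figures used in the rest of the article: an initial capacity of 8, a load factor of 0.9f, and twelve entries (the keys and values are assumptions for illustration).

import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        // capacity 8 and load factor 0.9f, matching the debugging
        // session described below
        HashMap<String, String> map = new HashMap<>(8, 0.9f);
        for (int i = 1; i <= 12; i++) {
            map.put("key" + i, "value" + i);
        }
        System.out.println(map.get("key4")); // prints "value4"
    }
}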
The code above creates a HashMap object, specifies its capacity (capacity) and load factor (loadFactor), and then stores values with put as key-value pairs. Capacity is easy to understand (the default capacity is 16): it is the initialized length of the internal array. So what is the load factor?
Load factor: indicates how full the HashMap is allowed to get. The default value is 0.75f, meaning that by default the capacity is automatically expanded once the number of elements in the HashMap reaches 3/4 of the capacity. Here I set the load factor to 0.9f, purely to make the effect more obvious. As for the expansion itself, the source code contains the following:
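A sketch of the relevant lines, as they appear in JDK 8's resize() method:

// computing the new threshold inside java.util.HashMap.resize() (JDK 8)
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
          (int)ft : Integer.MAX_VALUE);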
From this we can read off the formula for calculating the threshold:
threshold = loadFactor * capacity
For example, with capacity 8 and load factor 0.9f, the threshold is (int)(8 * 0.9) = 7, which matches what the debugger will show us below.
Next, the constructor source is as follows:
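From JDK 8's java.util.HashMap:

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}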
Look at the last line: it computes the threshold with the tableSizeFor(initialCapacity) method.
The source code for this method is as follows:
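Again from JDK 8:

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}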
The cap parameter is the given initial capacity. The algorithm returns the smallest power of two that is not less than cap: pass in 10, for example, and it returns 16. Magical!
Above we learned about HashMap's expansion mechanism and what goes on when a HashMap object is created. Now let's dissect put, the method that adds a key-value pair.
We know that HashMap saves data in key-value form and that the get() method looks up a key to obtain the corresponding value. When we debug a HashMap, we can see that put stores entries in an array underneath, and that their positions are hashed and unordered, unlike a plain array that keeps elements in insertion order.
After the fourth value was put in, only three elements were visible in the table; checking them one by one, the fourth element turned up in the next attribute of another element.
Continuing to put in the rest: 12 values were stored in total, yet only 9 elements appear in table. The threshold has also grown from the initial 7 to 15, and the capacity from 8 to 16, so the expansion mechanism was triggered (the source code shows the capacity doubles on expansion). And as before, some values have slipped into other values' next attribute. Let's open the next attribute of those elements and see whether the missing ones are there.
Sure enough, the three had moved onto someone else's turf: the missing elements were found in the next attributes of the elements at indices 7, 13, and 14.
Looking closer, every element has such a next attribute; some are empty, some are not, and a non-empty next holds another element. Doesn't it feel as if the elements are strung into a chain by their next attributes? Picture it like this:
The diagram models the internal structure: several elements sit at the same index at once, and each arrow in the linked-list picture represents an element's next attribute. Seeing this raises some curious questions: why is the storage unordered? And why do entries that land together turn into a linked-list structure? The source code below answers both:
/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with key, or
 *         null if there was no mapping for key.
 *         (A null return can also indicate that the map
 *         previously associated null with key.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
Since it is called HashMap, of course it has to have something to do with Hash.
HashMap uses a so-called "hash algorithm" to decide where each element is stored. When the program executes the map.put(String, Object) method, the system calls the hashCode() method of String to get its hashCode value; every Java object has a hashCode() method through which its hashCode value can be obtained. Once the object's hashCode value is known, the system determines the element's storage location from it. Try calling hashCode() on a few objects yourself and see what comes out.
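Concretely, JDK 8 does not use the raw hashCode directly: it first perturbs it in the static hash() helper, which is the shift-and-XOR mentioned at the start:

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits into the low 16 bits so that small tables
    // still feel the influence of the high bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}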
Now we know why storage is unordered: a key's position is determined by the value the hash algorithm produces, and the overlaps show that different keys can yield the same result (this situation is called a collision, or conflict), so several elements can appear at one index, forming a linked-list structure. As for why linked lists: they save space, since linked-list nodes need not be contiguous in memory, letting us use memory more fully.
(Below we refer to the slot at each index as a bucket and each node in it as an Entry, that is, a key-value pair.)
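For reference, the next attribute we kept meeting in the debugger lives in JDK 8's Node class; here it is in abridged form (the constructor and the Map.Entry methods are omitted):

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;   // cached hash of the key
    final K key;
    V value;
    Node<K,V> next;   // links entries that collided into the same bucket
    // ... constructor and Map.Entry methods omitted
}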
So does chaining hurt query efficiency? To look something up, you first find the bucket, then traverse the chain, which sounds tedious just thinking about it. Chaining does resolve collisions, but it brings a big problem: lookup efficiency drops. That won't do, as Comrade HashMap is famous for its speed. So JDK 8 optimizes this by introducing a tree structure: once a chain grows beyond length 8, its data is stored in a red-black tree instead, speeding up retrieval and fixing the poor performance caused by overly long chains.
To back this up with code, let's continue into the source of putVal(hash(key), key, value, false, true):
/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
After reading the comments we should have a rough idea of it. Next, look at treeifyBin(), the method (new in JDK 8) that converts a linked list into a red-black tree:
/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // if the table is null or its length is less than 64 (MIN_TREEIFY_CAPACITY),
    // just resize: re-hashing makes the linked lists shorter
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // if there is a hash conflict and the table length is at least 64,
    // only then is the red-black tree structure used
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // build the chain of new red-black tree nodes
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}
The above gives a brief introduction to how HashMap stores data. We already know that collisions reduce query efficiency, so how can we effectively avoid hash collisions?
Let's think backwards first: what do you think leads to more hash collisions in a HashMap?
There are really only two causes:
1. The capacity is too small. The smaller the capacity, the higher the probability of collision: when there are more wolves than meat, fights break out.
2. The hash algorithm is not good enough. If the algorithm is unreasonable, elements may be crowded into the same bucket or just a few buckets; such an uneven distribution also means contention.
Therefore, resolving hash collisions in HashMap starts from these same two aspects.
Both points are handled well in HashMap, and the two measures work together: expanding the array at the right moment, then using a suitable hash algorithm to decide which array slot each element goes to, greatly reduces the probability of conflict. Even so, when the amount of data is large, collisions grow along with it, which is why the red-black tree structure was introduced to avoid poor query performance.
Finally, let's look at the load factor, the balance point that affects performance. What the load factor is was explained above.
It measures how full the elements in the hash table are.
The larger the load factor, the more elements get packed in. The advantage is higher space utilization, but the chance of collision rises: the linked lists grow longer and longer, and lookup efficiency falls.
Conversely, the smaller the load factor, the fewer elements get packed in. The advantage is a lower chance of collision, but more space is wasted and the data in the table becomes sparse (the table starts expanding before much of its space has been used).
The greater the chance of collision, the higher the cost of a lookup.
Therefore, we must find a balance, a compromise, between the "chance of collision" and "space utilization". This balance is essentially the classic "time versus space" trade-off familiar from data structures.
A test program is written here as follows:
import java.util.HashMap;

public class HashTest {
    public static void main(String[] args) {
        // threshold = capacity * loadFactor; after an expansion the
        // capacity is twice what it was before (per the source code)
        int N1 = 10;          // number of runs (control group)
        int N2 = 1000000;     // how many entries to put/get per run
        long t0 = 0;          // total time
        float lf = 0.9f;      // load factor
        int capacity = 100;   // initial capacity
        HashMap<Integer, Integer> map = null;
        // control loop (the original listing was cut off here; the body
        // below is a plausible reconstruction of the timing test)
        for (int j = 1; j <= N1; j++) {
            map = new HashMap<>(capacity, lf);
            long start = System.nanoTime();
            for (int i = 0; i < N2; i++) {
                map.put(i, i);
            }
            t0 += System.nanoTime() - start;
        }
        System.out.println("average put time per run (ms): "
                + t0 / N1 / 1_000_000.0);
    }
}