This article explains the key points of the Java collection HashMap. The content is simple, clear, and easy to learn; follow along step by step to study HashMap in depth.
1. What is a hash table?
Before discussing hash tables, let's look at how other data structures perform basic operations such as insertion and lookup.
Array: uses a contiguous block of storage units to hold its data. Lookup by a given index takes O(1); lookup by a given value requires traversing the array and comparing each element against the key, which takes O(n). For a sorted array, binary search, interpolation search, or Fibonacci search can reduce lookup to O(log n). Ordinary insertions and deletions involve shifting array elements, with an average complexity of O(n).
Linked list: insertion and deletion (once the target position has been found) only require rewiring references between nodes, taking O(1); lookup requires traversing the list node by node, taking O(n).
Binary tree: for a reasonably balanced ordered binary tree, insertion, lookup, and deletion all have an average complexity of O(log n).
Hash table: compared with the structures above, insertion, deletion, and lookup in a hash table are all very fast. Setting aside hash collisions (discussed later), a single probe completes the operation, giving O(1) time. Let's see how a hash table achieves this remarkable constant-time O(1) performance.
We know that data structures have only two physical storage layouts: sequential storage and linked storage. (Stacks, queues, trees, graphs, and so on are abstractions at the logical level that are ultimately mapped onto memory in one of these two physical forms.) As noted above, locating an element in an array by its index takes a single step, and a hash table exploits exactly this property: the backbone of a hash table is an array.
For example, to add or find an element, we map the element's key to a position in the array through a function, and then locate it in a single step using the array index.
This function can be described simply as: storage location = f(key). The function f is generally called the hash function, and its design directly determines the quality of the hash table. For example, suppose we want to perform an insertion into a hash table:
The insertion process is shown in the following figure
The lookup operation works the same way: first compute the actual storage address with the hash function, then read the element out of the corresponding position in the array. A minimal sketch of this idea follows.
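To make the idea concrete, here is a minimal sketch (illustrative only; the class name, the fixed capacity, and the modulo-style index computation are assumptions for the example, not HashMap's real implementation):

class NaiveHashTable {
    private final String[] slots = new String[16]; // fixed capacity, for simplicity

    // storage location = f(key): map the key's hashCode into the array range
    private int indexOf(String key) {
        return (key.hashCode() & 0x7fffffff) % slots.length;
    }

    void add(String key) {
        slots[indexOf(key)] = key;              // one hash, one array write: O(1)
    }

    boolean contains(String key) {
        return key.equals(slots[indexOf(key)]); // one hash, one array read: O(1)
    }
}

Note that this sketch simply overwrites on a collision; the next section discusses why collisions arise and how they are handled properly.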
Hash collisions
However, nothing is perfect. What if two different elements are assigned the same storage address by the hash function? In other words, when we hash an element and try to insert it at the computed address, we may find the slot already occupied by another element. This is the so-called hash collision (also known as a hash conflict). As mentioned earlier, the design of the hash function matters a great deal: a good hash function keeps the computation simple and distributes the addresses as evenly as possible. But the array is a contiguous, fixed-length block of memory, and no hash function, however good, can guarantee that the computed addresses never collide. So how are hash collisions resolved? There are several approaches: open addressing (on a collision, keep probing for the next unoccupied address), rehashing with additional hash functions, and separate chaining. HashMap uses separate chaining, i.e., an array plus linked lists; a rough sketch of the idea follows.
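Here is a rough sketch of separate chaining (illustrative only; the ChainedTable and Node names are assumptions for the example — HashMap's real Entry class is shown in the next section):

class ChainedTable<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] buckets = (Node<K, V>[]) new Node[16];

    void put(K key, V value) {  // null keys not handled, for brevity
        int i = (key.hashCode() & 0x7fffffff) % buckets.length;
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // key exists: overwrite
        }
        // collision or empty slot: prepend the new node at the chain's head
        buckets[i] = new Node<>(key, value, buckets[i]);
    }
}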
2. The implementation principle of HashMap
The backbone of HashMap is an array of Entry objects. Entry is the basic unit of HashMap, and each Entry holds one key-value pair. (In fact, a Map is essentially a collection that maintains a mapping between pairs of objects.)
// The backbone array of HashMap; as you can see, it is an Entry array.
// Its initial value is the empty array {}, and its length must always be a
// power of 2 (the reason is analyzed in detail later).
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
Entry is a static inner class in HashMap. The code is as follows
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // reference to the next Entry: a singly linked list structure
    int hash;        // the hash computed from the key's hashCode, stored here to avoid recomputation

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
}
Therefore, the overall structure of HashMap is as follows:
To put it simply, HashMap is made up of an array plus linked lists. The array is the body of HashMap, while the linked lists exist mainly to resolve hash collisions. If the array slot you land on contains no chain (the current entry's next points to null), then lookup, insertion, and similar operations are very fast, requiring only a single probe. If the slot does contain a chain, insertion is O(n): the chain is traversed first, and the entry is overwritten if the key already exists or appended otherwise; lookup likewise has to traverse the chain, comparing candidates one by one through the key object's equals method. For performance, therefore, the fewer and shorter the chains in a HashMap, the better.
Several other important fields
/** The number of key-value pairs actually stored */
transient int size;

/**
 * Threshold. When table == {}, this holds the initial capacity (default 16).
 * Once table is filled, i.e. once memory has been allocated for table,
 * threshold is generally capacity * loadFactor. HashMap consults threshold
 * when deciding whether to resize.
 */
int threshold;

/**
 * The load factor, discussed in detail later, represents how full the table
 * is allowed to become. It defaults to 0.75 in order to mitigate hash
 * collisions: if the initial bucket count is 16 and resizing happened only
 * after 16 elements, some buckets would likely already hold more than one
 * element. With the default of 0.75, a HashMap of capacity 16 is resized to
 * 32 when the 13th element is inserted.
 */
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified. HashMap
 * is not thread-safe: if another thread changes the structure of the map
 * during iteration (e.g. via put or remove), a
 * ConcurrentModificationException must be thrown.
 */
transient int modCount;
HashMap has four constructors. If the caller does not pass in initialCapacity and loadFactor, the other constructors fall back to the default values:
initialCapacity defaults to 16 and loadFactor defaults to 0.75.
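As a quick usage illustration (standard java.util.HashMap constructors; assumes import java.util.*):

Map<String, Integer> a = new HashMap<>();         // capacity 16, load factor 0.75
Map<String, Integer> b = new HashMap<>(64);       // given capacity, load factor 0.75
Map<String, Integer> c = new HashMap<>(64, 0.5f); // given capacity and load factor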
Let's take a look at one of them
public HashMap(int initialCapacity, float loadFactor) {
    // Validate the initial capacity passed in; it must not exceed
    // MAXIMUM_CAPACITY = 1 << 30.
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

Note that the constructor does not allocate the table array; allocation is deferred until the first element is inserted. When a key-value pair is put into the map, the key's hashCode is first scrambled by HashMap's hash function (JDK 1.7):

final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    h ^= k.hashCode();
    // Mix the high bits of the hashCode into the low bits so that the high
    // bits also influence the index when the table length is small.
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
The values calculated by the above hash function are further processed by indexFor to obtain the actual storage location.
/**
 * Returns the array index.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}
h & (length - 1) guarantees that the computed index stays within the bounds of the array. For example, with the default capacity of 16, length - 1 = 15; for h = 18, the binary computation 10010 & 01111 yields index = 2. Bit operations are also much cheaper for the machine than arithmetic like modulo (HashMap uses bit operations heavily).
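A small demonstration of this equivalence (assumed example values; for a power-of-two length, h & (length - 1) matches h % length for non-negative h):

public class MaskDemo {
    public static void main(String[] args) {
        int length = 16;                    // power of 2, so length - 1 = 0b01111
        for (int h : new int[]{18, 21, 39}) {
            System.out.println(h + " & 15 = " + (h & (length - 1))
                    + ", " + h + " % 16 = " + (h % length));
        }
        // prints: 18 & 15 = 2,  18 % 16 = 2
        //         21 & 15 = 5,  21 % 16 = 5
        //         39 & 15 = 7,  39 % 16 = 7
    }
}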
So the final process for determining the storage location is as follows:
Let's take a look at the implementation of addEntry:
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // When size exceeds the threshold and a collision is about to occur
        // at the target bucket, double the length of the array.
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
As the code shows, when a hash collision occurs and size exceeds the threshold, the array must be resized. Resizing means creating a new array twice the length of the old one and transferring every element of the current Entry array into it. Because the whole table has to be rebuilt at twice the length, resizing is a relatively expensive operation.
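Because of that cost, a common practice (a usage sketch, assuming an expected entry count of 1000) is to presize the map so that capacity * loadFactor exceeds the expected size and no resize is ever triggered:

// 1000 / 0.75 ≈ 1334; HashMap rounds the capacity up to the next power of 2
// (2048), and 2048 * 0.75 = 1536 > 1000, so no resize happens while filling.
Map<String, String> map = new HashMap<>((int) (1000 / 0.75f) + 1);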
3. Why must the array length of HashMap be the power of 2?
Let's continue to look at the resize method mentioned above.
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
When the array is resized, its length changes, and since the storage location is index = h & (length - 1), the index may change as well, so it must be recomputed. Let's first look at the transfer method:
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // The loop below walks each linked list in the old array one entry at a
    // time, recomputes the index, and copies the entry into the new array.
    // (The array holds references rather than the data itself, so only
    // references are copied.)
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // Point the current entry's next at the new slot's head.
            // newTable[i] may be null or an existing chain; either way the
            // entry is inserted at the head of the list.
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
This method walks through the entries of the old array one by one and places them into the new, expanded array. As described earlier, the array index is obtained by scrambling the key's hashCode with the hash function and then ANDing the result with length - 1.
The array length of HashMap must remain a power of 2. For example, 16 in binary is 10000, so length - 1 = 15 is 01111; after resizing, the length is 32 (binary 100000) and length - 1 = 31 is 011111. Keeping the length a power of 2 guarantees that the low bits of length - 1 are all 1s, and that after resizing the new mask differs from the old one in only a single bit, the new leftmost 1. Consequently, when computing h & (length - 1), as long as the bit of h at that new position is 0, the element's index in the new array is the same as in the old one, which greatly reduces the reshuffling of elements that were already well distributed before the resize (this is my own reading of the design).
In addition, keeping the array length a power of 2 means the low bits of length - 1 are all 1s, which makes the resulting array indices distribute more uniformly.
Looking again at the & operation above, the high bits do not affect the result (the hash function applies its various bit operations partly so that the high bits also get mixed into the low bits); only the low bits matter. If the low bits of the mask are all 1s, then every low bit of h influences the result; in other words, for a given storage location such as index = 21, there is exactly one combination of low bits of h that produces it. This is why the array length must be designed as a power of 2.
If the length is not a power of 2, the low bits of length - 1 are not all 1s. Several different low-bit patterns of h would then map to the same index (such as index = 21), so hash collisions become more likely; moreover, any index with a 1 at a position where the mask has a 0 can never be produced at all, and the corresponding array slots are simply wasted.
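The resize behavior can be checked directly (a small demonstration with assumed hash values, not JDK code):

public class ResizeIndexDemo {
    public static void main(String[] args) {
        // 5 = 00101 and 21 = 10101 differ only in the bit the new mask adds.
        for (int h : new int[]{5, 21}) {
            System.out.println("h=" + h
                    + " old index=" + (h & 15)    // mask 01111 (length 16)
                    + " new index=" + (h & 31));  // mask 11111 (length 32)
        }
        // h=5:  old index=5, new index=5   (new bit of h is 0: index unchanged)
        // h=21: old index=5, new index=21  (new bit of h is 1: index = old + 16)
    }
}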
Get method:
public V get(Object key) {
    // If key is null, look it up directly in table[0].
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}
The get method returns the value corresponding to a key. If the key is null, the lookup goes straight to table[0]. Let's look further at the getEntry method:
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // Compute the hash value from the key's hashCode.
    int hash = (key == null) ? 0 : hash(key);
    // indexFor (hash & (length - 1)) gives the final array index; then walk
    // the chain and find the matching entry by comparing with equals.
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
As you can see, the implementation of get is fairly simple: key (hashCode) -> hash -> indexFor -> final index position; look up table[i], and if a chain is present, traverse it and locate the matching entry through the key's equals method. One detail deserves attention: some people think that once the array slot has been located and the chain is being traversed, the e.hash == hash check is unnecessary and equals alone would suffice. But consider a key object that overrides equals without overriding hashCode, and suppose such an object happens to land in this slot: judged by equals alone it might compare equal, yet its hashCode differs from the stored entry's. According to Object's hashCode contract, the current entry must not be returned in that case; the correct result is null. The example below explains this further.
4. Overriding equals requires overriding hashCode as well
Finally, let's revisit an old question that is mentioned in all kinds of material: "when you override equals, also override hashCode". A small example shows what happens when equals is overridden but hashCode is not.
import java.util.HashMap;

public class MyTest {
    private static class Person {
        int idCard;
        String name;

        public Person(int idCard, String name) {
            this.idCard = idCard;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            Person person = (Person) o;
            // Two Person objects are considered equivalent if their idCard matches.
            return this.idCard == person.idCard;
        }
    }

    public static void main(String[] args) {
        HashMap<Person, String> map = new HashMap<>();
        Person person = new Person(1234, "Qiao Feng");
        // put into the HashMap
        map.put(person, "The Demi-Gods & Semi-Devils");
        // get it back; logically this should print "The Demi-Gods & Semi-Devils"
        System.out.println("result: " + map.get(new Person(1234, "Xiao Feng")));
    }
}

Actual output: result: null
If we already have some understanding of how HashMap works, this result is not hard to explain. Although the keys used in the put and get operations are logically equal (they compare equal via equals), hashCode was not overridden, so the put operation follows key (hashcode1) -> hash -> indexFor -> final index position, while the get follows key (hashcode2) -> hash -> indexFor -> final index position. Because hashcode1 is not equal to hashcode2, the lookup never reaches the right array slot and returns the logically wrong value null. (It is also possible to land on the same slot by chance, but the entry's stored hash value is then compared as well, as described for the get method above, so the mismatch is still detected.)
Therefore, whenever you override equals, be sure to override hashCode too, ensuring that if two objects compare equal via equals, calling hashCode on each returns the same integer. If two objects compare unequal via equals, their hashCodes may still coincide (this merely produces a hash collision, which should be kept as rare as possible).
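A sketch of the fix, grounded in the example above (only the hashCode override is new):

@Override
public int hashCode() {
    // Equal idCard implies equal hash code, as the equals/hashCode contract
    // requires, so put and get now land on the same bucket.
    return idCard;
}

With this method added to Person, map.get(new Person(1234, "Xiao Feng")) locates the entry stored under new Person(1234, "Qiao Feng") and prints "The Demi-Gods & Semi-Devils".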
5. Performance optimization of HashMap in JDK1.8
What happens if too many entries accumulate in a single array slot (i.e., the chain grows too long) and performance degrades?
JDK1.8 optimizes the JDK1.7 design by adding red-black trees: when a chain grows beyond 8 nodes, the linked list is converted into a red-black tree, and the tree's fast insertion, deletion, and lookup (O(log n)) are used to improve the performance of HashMap.
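A quick way to see why long chains matter is to force every key into one bucket (an illustrative stress sketch; the BadKey class is an assumption for the demo, not JDK code):

import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Every BadKey has the same hashCode, so all entries share one bucket.
    private static class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int hashCode() { return 42; } // constant: force collisions
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) map.put(new BadKey(i), i);
        // On JDK1.7 this lookup scans a 100,000-node chain (O(n)); on JDK1.8
        // the bucket has been treeified, so the lookup is O(log n).
        System.out.println(map.get(new BadKey(99_999))); // prints 99999
    }
}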
Thank you for reading. That concludes this detailed explanation of the key points of the Java collection HashMap. After studying this article, I believe you have a deeper understanding of the topic, though the specifics still need to be verified in practice.