How to understand the Java hash table 04/18 Update SLTechnology News&Howtos

How to understand the Java hash table

2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to understand the hash table: its design idea, how Hash functions are constructed, how conflicts are resolved, and how lookup cost is measured.

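In Java, the default hashCode of an object (Object.hashCode) is an identity-based value, often described as derived from the object's address. It is usually different for different objects, but it is not guaranteed to be unique. Why, then, override hashCode and equals? Because hash-based containers use hashCode to locate a key and equals to confirm the match, two objects that equals considers equal must return the same hashCode; overriding one without the other breaks lookups.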

A hash table is a special data structure that differs markedly from arrays, linked lists and binary search trees. It can locate the record you want directly, rather than by comparing the keyword against the records in the table one by one. This comes from its design: a function maps the keyword of each record to the record's storage location, so lookups are very fast.

An array is fast to index but slow to insert into or delete from; a linked list is the opposite, cheap to insert into and delete from but slow to search.

1. The design idea of the Hash table

For general linear tables, such as linked lists, to store contact information:

Zhang San 13980593357
Li Si 15828662334
Wang Wu 13409821234
Zhang Shuai 13890583472

One option is to design a structure that holds a name and a mobile phone number, and to store the four contacts in a linked list. To find out whether the record "Li Si 15828662334" is in the list, or to get Li Si's phone number, you start from the head node and compare the name in each node with "Li Si" in turn until the search succeeds or fails, which takes O(n) time. Even if a balanced binary search tree is used for storage, a lookup still takes O(log n). If the storage location of the record could be obtained directly from the keyword "Li Si", the intermediate comparisons could be skipped and the complexity reduced to O(1). Hash tables achieve this effect.
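The linear search described above can be sketched as follows. This is a minimal illustration, not code from the article: the Contact structure and the find_phone function are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* One contact record in a singly linked list (hypothetical layout). */
typedef struct Contact {
    const char *name;
    const char *phone;
    struct Contact *next;
} Contact;

/* Walk the list from the head, comparing names one by one: O(n).
   Returns the phone number, or NULL when the name is not present. */
const char *find_phone(const Contact *head, const char *name)
{
    for (const Contact *node = head; node != NULL; node = node->next) {
        if (strcmp(node->name, name) == 0) {
            return node->phone;
        }
    }
    return NULL;
}
```

With the four contacts linked in the order listed, find_phone(head, "Li Si") compares against "Zhang San" first and only then reaches "Li Si"; the cost grows with the position of the record in the list.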

The Hash table uses a mapping function f: key -> address to map a keyword to the storage location of its record in the table. To find a record, you compute its storage location directly from the keyword and this mapping, which is usually called the Hash function. The storage location computed from the keyword by the Hash function is called the Hash address (note that it is only a position in the table, not an actual physical address). In the example above, if the contacts are stored in a Hash table, then finding the information for "Li Si" means computing the Hash address directly from "Li Si" with the Hash function. Let's discuss several key issues in the design of Hash tables.

1. The design of Hash function.

The design of Hash function directly affects the operation efficiency of Hash table. The following examples are given:

Suppose the contact information above is stored with the following Hash function: the sum of the ASCII codes of the uppercase first letter of each word in the name.

This gives:

address(Zhang San) = ASCII(Z) + ASCII(S) = 90 + 83 = 173

address(Li Si) = ASCII(L) + ASCII(S) = 76 + 83 = 159

address(Wang Wu) = ASCII(W) + ASCII(W) = 87 + 87 = 174

address(Zhang Shuai) = ASCII(Z) + ASCII(S) = 90 + 83 = 173
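The calculation above can be reproduced with a few lines of C. This is only a sketch: name_hash is a hypothetical function name, and the code assumes that the words of a name are separated by spaces.

```c
#include <ctype.h>

/* Sum the ASCII codes of the uppercase first letter of each word.
   Words are assumed to be separated by spaces. */
int name_hash(const char *name)
{
    int sum = 0;
    int at_word_start = 1;
    for (const char *p = name; *p != '\0'; p++) {
        if (*p == ' ') {
            at_word_start = 1;
        } else if (at_word_start) {
            sum += toupper((unsigned char)*p);
            at_word_start = 0;
        }
    }
    return sum;
}
```

Calling name_hash on "Zhang San" and on "Zhang Shuai" returns 173 in both cases, which is exactly the kind of clash a Hash function should try to avoid.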

If only these four contacts need to be stored, this Hash function is poorly designed. First, it wastes a lot of storage space: the largest address is 174, so an array used to store the contacts needs at least 174 units (174 × 12 bytes if each unit takes 12 bytes), while only 4 of them are used, a utilization of 4/174, less than 5%. Second, address(Zhang San) and address(Zhang Shuai) are the same, 173; this phenomenon is called a conflict. A function that needs 174 units to store 4 records and still produces a conflict is clearly unreasonable. Therefore, when constructing a Hash function, the distribution of the keywords should be taken into account so that the Hash addresses are spread randomly and evenly over the whole address space. There are usually several ways to construct Hash functions:

1) Direct addressing method

Take the keyword itself or a linear function of it as the Hash address, that is, address(key) = a * key + b. For example, if student numbers are known to start at 2000 and go up to 4000, address(key) = key - 2000 can be used as the Hash address.

2) Mid-square method

Square the keyword, and then take the middle digits of the result as the Hash address. For the keyword sequence {421, 423, 436}, the squares are {177241, 178929, 190096}, so {72, 89, 00} can be taken as the Hash addresses.
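A minimal sketch of the mid-square idea follows. The function name mid_square is hypothetical, and the code assumes three-digit keys whose squares have six digits, as in the example, so that the two middle digits are the thousands and hundreds digits.

```c
/* Mid-square sketch: square the key and keep the two middle digits
   of the six-digit result (the thousands and hundreds digits). */
unsigned mid_square(unsigned key)
{
    unsigned long square = (unsigned long)key * key;
    return (unsigned)((square / 100) % 100);
}
```

For 421 the square is 177241, dividing by 100 leaves 1772, and the remainder modulo 100 is 72. Keys whose squares have a different number of digits would need a different divisor.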

3) Folding method

The keyword is split into parts, and these parts are then combined in a specific way to form the Hash address. For example, if the ISBN number of a book is 8903-24123, the digits can be cut into groups of two and added: address(key) = 89 + 03 + 24 + 12 + 3 = 131.
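One possible way to code the folding idea is sketched below. It is an illustration under stated assumptions rather than the article's implementation: fold_hash is a hypothetical name, non-digit characters are skipped, and the digits are cut into groups of two from the left and summed.

```c
#include <ctype.h>

/* Folding sketch: drop non-digits, cut the digit string into groups
   of two from the left, and add the groups together. */
unsigned fold_hash(const char *key)
{
    unsigned sum = 0;
    unsigned group = 0;
    int digits_in_group = 0;
    for (const char *p = key; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p)) {
            continue;
        }
        group = group * 10 + (unsigned)(*p - '0');
        if (++digits_in_group == 2) {
            sum += group;
            group = 0;
            digits_in_group = 0;
        }
    }
    return sum + group; /* add a trailing one-digit group, if any */
}
```

For "8903-24123" the groups are 89, 03, 24, 12 and 3, so the function returns 131. Other group widths or combination rules are equally valid folding schemes.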

4) Division-remainder method

If the maximum length of the Hash table is known to be m, take the largest prime p not greater than m and use the remainder of the keyword divided by p as the address: address(key) = key % p.

The choice of p is critical here: a well-chosen p keeps conflicts to a minimum. Generally, p is the largest prime not greater than m.
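The division-remainder method can be sketched as below. The helper names (is_prime, largest_prime_le, division_hash) are hypothetical, and trial division is used only because it is short; it is adequate for small table sizes.

```c
/* Returns 1 when n is prime, 0 otherwise (trial division). */
static int is_prime(unsigned n)
{
    if (n < 2) {
        return 0;
    }
    for (unsigned d = 2; d * d <= n; d++) {
        if (n % d == 0) {
            return 0;
        }
    }
    return 1;
}

/* Largest prime not greater than m; returns 0 when m < 2. */
unsigned largest_prime_le(unsigned m)
{
    for (unsigned p = m; p >= 2; p--) {
        if (is_prime(p)) {
            return p;
        }
    }
    return 0;
}

/* Division-remainder hash: address(key) = key % p. p must be nonzero. */
unsigned division_hash(unsigned key, unsigned p)
{
    return key % p;
}
```

For a table of length m = 30, largest_prime_le(30) gives p = 29, and division_hash(100, 29) places the key 100 at address 13.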

2. Determining the Hash table size

Choosing the size of the Hash table is also critical. If the table is much larger than the number of records finally stored, a lot of space is wasted; if it is too small, conflicts become likely. In practice, the size is generally chosen according to the final number of records and the distribution of the keywords. When the final number of records is not known in advance, the capacity of the Hash table has to be maintained dynamically, and the Hash addresses may need to be recalculated when the capacity changes.
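The recalculation mentioned above can be sketched as follows. Everything here is an assumption made for illustration: keys are non-negative integers, -1 marks an empty unit, address(key) = key % size, conflicts are resolved by stepping to the next unit, and the function names probe_insert and rehash are hypothetical.

```c
#include <stddef.h>

#define EMPTY (-1)  /* marks an unused unit; keys are assumed non-negative */

/* Insert key starting at address(key) = key % size and step forward
   until an empty unit is found.
   Returns the unit used, or -1 when the table is full. */
long probe_insert(int *table, size_t size, int key)
{
    size_t start = (size_t)key % size;
    for (size_t step = 0; step < size; step++) {
        size_t slot = (start + step) % size;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return (long)slot;
        }
    }
    return -1;
}

/* Move every key from old_table into a larger, caller-provided
   new_table. Each Hash address is recomputed because it depends
   on the table size. */
void rehash(const int *old_table, size_t old_size,
            int *new_table, size_t new_size)
{
    for (size_t i = 0; i < new_size; i++) {
        new_table[i] = EMPTY;
    }
    for (size_t i = 0; i < old_size; i++) {
        if (old_table[i] != EMPTY) {
            probe_insert(new_table, new_size, old_table[i]);
        }
    }
}
```

In a table of size 4 the keys 5 and 9 both hash to address 1; after rehash into a table of size 8 they land at addresses 5 and 1, so the conflict disappears.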

3. The resolution of conflicts

In the above example, a conflict occurs and needs to be resolved, otherwise the records cannot be stored correctly. There are usually two solutions:

1) Open addressing method

When a keyword conflicts with another keyword, a probing technique is used to form a probe sequence in the Hash table; the sequence is searched in turn, and the keyword is inserted into the first empty unit found. The most commonly used technique is linear probing. For example, take the keywords {12, 13, 25, 23, 38, 34, 84}, a Hash table of length 14, and the Hash function address(key) = key % 11. The keywords 12, 13 and 25 can be inserted directly at addresses 1, 2 and 3. When 23 is inserted, address 1 is already occupied, so probing continues downward from address 1 (the probe step can be chosen to suit the situation) until address 4 is reached; it is empty, so 23 is inserted there.
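Linear probing for this example can be sketched as below. The names lp_init and lp_insert are hypothetical, -1 is assumed to mark an empty unit, and the modulus 11 is assumed because it is the value that places 12, 13 and 25 at addresses 1, 2 and 3 and makes 23 collide at address 1.

```c
#define TABLE_LEN  14    /* table length from the example */
#define HASH_MOD   11    /* assumed: address(key) = key % 11 */
#define EMPTY_SLOT (-1)  /* marks an unused unit; keys are non-negative */

/* Mark every unit of the table as empty. */
void lp_init(int table[TABLE_LEN])
{
    for (int i = 0; i < TABLE_LEN; i++) {
        table[i] = EMPTY_SLOT;
    }
}

/* Insert key with a probe step of 1.
   Returns the address where key was stored, or -1 if the table is full. */
int lp_insert(int table[TABLE_LEN], int key)
{
    int start = key % HASH_MOD;
    for (int step = 0; step < TABLE_LEN; step++) {
        int slot = (start + step) % TABLE_LEN;
        if (table[slot] == EMPTY_SLOT) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

After lp_init, inserting 12, 13 and 25 returns addresses 1, 2 and 3, and inserting 23 then returns 4 after passing the three occupied units.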

2) Chain address method

This method combines an array with linked lists: records with the same Hash address are stored in one linear list, and the index of each list head in the array is the calculated Hash address. With the keywords of the example above and address(key) = key % 11, the keywords 12, 23 and 34 all map to address 1 and are linked into the list headed at that address.

Although some measures can be taken to reduce conflicts, conflicts cannot be completely avoided. Therefore, it is necessary to choose the solution to the conflict according to the actual situation.
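A chained table along these lines might look like the sketch below. The names (ChainNode, chain_insert, chain_contains, chain_free) are hypothetical, keys are assumed to be non-negative, and address(key) = key % 11 is an assumption carried over from the open addressing example.

```c
#include <stdlib.h>

#define BUCKETS 11  /* assumed: address(key) = key % 11 */

typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

/* Append key to the list at its Hash address.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(ChainNode *buckets[BUCKETS], int key)
{
    ChainNode *node = malloc(sizeof *node);
    if (node == NULL) {
        return -1;
    }
    node->key = key;
    node->next = NULL;

    ChainNode **link = &buckets[key % BUCKETS];
    while (*link != NULL) {
        link = &(*link)->next;   /* walk to the end of the chain */
    }
    *link = node;
    return 0;
}

/* Returns 1 when key is stored in the table, 0 otherwise. */
int chain_contains(ChainNode *buckets[BUCKETS], int key)
{
    for (const ChainNode *n = buckets[key % BUCKETS]; n != NULL; n = n->next) {
        if (n->key == key) {
            return 1;
        }
    }
    return 0;
}

/* Release every node and reset the list heads. */
void chain_free(ChainNode *buckets[BUCKETS])
{
    for (int i = 0; i < BUCKETS; i++) {
        while (buckets[i] != NULL) {
            ChainNode *next = buckets[i]->next;
            free(buckets[i]);
            buckets[i] = next;
        }
    }
}
```

Starting from an array of NULL heads, inserting 12, 23 and 34 builds the chain 12 -> 23 -> 34 at address 1, while 13 sits alone at address 2. A lookup only walks the one chain selected by the Hash address.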

4. Average lookup length of the Hash table

The average lookup length of the Hash table includes the average lookup length when the lookup succeeds and the average lookup length when the lookup fails.

Average lookup length when the lookup succeeds = the sum of the number of comparisons needed to find each element in the table / the number of elements in the table

Average lookup length when the lookup fails: the average number of comparisons needed to establish that an element is not in the table. It can be understood as inserting a new element whose Hash address could be any location: count the comparisons needed to reach an empty unit from each location, add them up, and divide by the table length.

Here's an example:

Given the set of keywords {23, 12, 14, 2, 3, 5}, a table length of 14, the Hash function key % 11, and open addressing with linear probing as the conflict resolution method, the keywords are stored in the table as follows:

Address:      0   1   2   3   4   5   6   7   8   9   10  11  12  13
Keyword:          23  12  14  2   3   5
Comparisons:      1   2   1   3   3   2

Addresses 0 and 7 to 13 remain empty.

When the lookup succeeds, the average lookup length is (1 + 2 + 1 + 3 + 3 + 2) / 6 = 12/6 = 2.

When the lookup fails, the average lookup length is (1 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 1 + 1 + 1 + 1 + 1 + 1) / 14 = 35/14 = 2.5.

Each term in the failed-lookup sum is a number of comparisons: 1 means the unit at the computed address is empty, so a single comparison is enough; 7 means the search starts at address 1 and passes six occupied units before reaching the empty unit at address 7. The remaining terms can be derived in the same way.
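Both averages can be checked mechanically. The sketch below assumes the setup of the example (table length 14, address(key) = key % 11, linear probing with step 1, and division by the table length for failed lookups); the function names and the use of -1 for an empty unit are hypothetical.

```c
#define ASL_LEN   14    /* table length */
#define ASL_MOD   11    /* assumed: address(key) = key % 11 */
#define ASL_EMPTY (-1)  /* marks an unused unit */

/* Insert key with linear probing; returns the number of units examined,
   or 0 when the table is full and the key was not stored. */
static int asl_insert(int table[ASL_LEN], int key)
{
    int start = key % ASL_MOD;
    for (int step = 0; step < ASL_LEN; step++) {
        int slot = (start + step) % ASL_LEN;
        if (table[slot] == ASL_EMPTY) {
            table[slot] = key;
            return step + 1;
        }
    }
    return 0;
}

/* Build the table from keys and return the total number of comparisons;
   dividing it by count gives the successful average lookup length. */
int asl_build(int table[ASL_LEN], const int *keys, int count)
{
    int total = 0;
    for (int i = 0; i < ASL_LEN; i++) {
        table[i] = ASL_EMPTY;
    }
    for (int i = 0; i < count; i++) {
        total += asl_insert(table, keys[i]);
    }
    return total;
}

/* Failed average lookup length: from every address, count the units
   examined until an empty one is reached, then divide by the length. */
double asl_failure(const int table[ASL_LEN])
{
    int total = 0;
    for (int start = 0; start < ASL_LEN; start++) {
        int probes = 1;
        int slot = start;
        while (table[slot] != ASL_EMPTY && probes < ASL_LEN) {
            slot = (slot + 1) % ASL_LEN;
            probes++;
        }
        total += probes;
    }
    return (double)total / ASL_LEN;
}
```

For the keys {23, 12, 14, 2, 3, 5} inserted in that order, asl_build returns 12 comparisons in total, so the successful average is 12/6 = 2, and asl_failure returns 35/14 = 2.5.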

One more concept is needed here:

Load factor = number of records in the table / length of the Hash table

The smaller the load factor, the more empty units the table has and the less likely a conflict is.

The larger the load factor, the more likely a conflict is and the longer a lookup takes.

Therefore, the average lookup length of the Hash table depends on the load factor. Some literature reports that a Hash table performs best when the load factor is about 0.5, so 0.5 is commonly used as an empirical value.

5. Advantages and disadvantages of the Hash table

The advantages of the Hash table are obvious: lookups take constant time on average, and inserting and deleting data is easy. It also has disadvantages: it does not support sorting, it generally needs more space than a linear table, and the keywords of the records cannot be repeated.

6. Hash table demo

    /* Hash table, using array implementation */
    #include <stdio.h>   /* header name is missing in the source; stdio.h is assumed */

    #define DataType int
    #define M 30

    typedef struct HashNode {
        DataType data;   /* stored value */
        int isNull;      /* marks whether the location is still empty (1) or filled (0) */
    } HashTable;

    HashTable hashTable[M];

    /* Initialize the hash table: mark every location as empty. */
    void initHashTable(void)
    {
        int i;
        for (i = 0; i < M; i++) {
            hashTable[i].isNull = 1;
        }
    }
