In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly introduces "what are the hash table knowledge points in the java data structure and algorithm". In the daily operation, I believe that many people have doubts about the hash table knowledge points in the java data structure and algorithm. Xiaobian consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the doubts about "what are the hash table knowledge points in the java data structure and algorithm?" Next, please follow the editor to study!
1. Brief introduction of hash table
Hash table (hash table) is a kind of data structure, which provides very fast insert and search operations (sometimes even delete operations). The time complexity is O (1). Comparing the time complexity, we can know that hash tables are much faster than trees, and the implementation of hash tables is relatively easy. However, no data structure is perfect, and so is hash tables. The biggest drawback of the hash table is based on the array, because the size of the array is determined when the array is initialized, and it is difficult to expand the array after it is created.
When the hash table is full, the data will be transferred to a larger hash table, which will take a lot of time, and the hash table does not support sequential traversal, because traversing the data from the hash table is random; so the premise that we use the hash table is that we do not need to traverse the data in order, we can roughly know the amount of data; if we meet these two points, we can use the hash table.
Then someone is going to ask, what does a hash table look like when you say it so badly? Here are just two random names.
A very classic example is the English dictionary. When we look it up in the dictionary, we can find page xxx according to this word, where the word corresponds to the number of pages, which can be said to be a hash table.
To take another real example, when you are at school, everyone will have a student number in the school, and you, as a person, will correspond to this student number in the school. If the headmaster has a table recording all the students in the school, and then find a student according to the student number, you can quickly lock in the student's name, gender, class and other information. Have you ever thought that if there is no student number, the headmaster can only find a student according to his name, but there are so many people with the same name that it is not easy to find the target student.
Ok, here the hash table can be regarded as the table in the principal's hand (it is actually an array). We generate the number of the location in the table based on the information we want to store (in this case, this number is the subscript of the array). According to this number, we know where the data exists in the array, and then save the data in it. If there is an array of size 20 and I want to save "aaa", we can think of a great way to turn this string into a smaller number, such as 10, then store this string in the 10th position of the array. The advantage of this is that the next time you query (or delete) the string "aaa" from the hash table, you only need to calculate the number 10 from the "aaa" string. Then go directly to the 10th position in the array to find one to see if there is this string, is not very simple ah!
So what we need to solve now is to come up with a great way to turn the string into a smaller number (this process is called hashing) and to make sure that the number does not exceed the maximum boundary of the array!
2 hashing
Hashing is to find a way to match the data we want to save to an array subscript and save the data at that location in the array; we can talk about this process professionally: convert the data to be saved through the hash function into the corresponding array subscript; now our goal is how to write a hash function that can turn a string into an array subscript.
Here we can assume that the size of a string t array is 30 str string [] string = new String [30] The easiest way to save the word "cats" is to use ASCII codes, but because there are too many ASCII codes to remember, we can set up a set of rules by ourselves. I assume that a to z corresponds to 1 to 26, plus spaces correspond to 0, and now a set of the most humble rules come out. I have the word "cats": C = 3 cats a = 1, t = 20, s = 19. Now "cats" has two ways to become the subscript of an array
An additional addition: if we want to save 50 strings, then our new array must be twice its size, that is, new String [100];, which will be discussed later.
2.1 Hash function implements one
How to achieve better? Don't think so much, just add it up directly. 3-1-20-19 = 43, there is a small problem at this time. The size of our array is 30, that is, the maximum subscript of the array is 29, and here our number is 43. How can 43 be changed into a number within 29 (including 29)? Because the remainder of any number divided by 30 is only between 0 and 29, we divide 43 by 30 to get the remainder 13, so we put "cats" in the array subscript 13, str [13] = "cats".
The implementation of this kind of hash function is very easy, but the easier it is, the greater the disadvantage is. The biggest defect is that many words become the same. For example, more than 100 words such as was,tin,give are all 43 when they are changed into numbers. Then we happen to add these words. Now the problem is that the array subscript calculated by multiple words is very likely to be the same. That is, the array has to put multiple data in one place, how to solve this problem? We can change the implementation of hash function to reduce this probability.
2.2 Hash function implementation two
From 2.1, we can know that the result of turning too many words into numbers is the same, so we have to find a way to correspond to a unique integer for each word, and then divide this integer by the size of the array to take the remainder. You can know where the word is stored in the array.
So, we can use the concatenated multiplication of powers to get this unique integer. For example, "cats" uses this method of calculation: 3 "273" 1 "272" 20 "271" 19 "270, which is a bit similar to the decimal system. Through this algorithm, we can get a unique integer, and the results of any other words calculated by this method are almost impossible to be equal. Those who are interested can try. Then divide the result by 30 to take the remainder, you can get the position of an array, and then throw the string into it.
I don't know if you have found a problem with this method, because the size of the array is fixed, and we get the position of the array by taking the remainder, so the problem is that even if two different integers are divided by 30, the final remainder is equal.
For example, there are two strings that get 32 and 62 through the concatenated multiplication of powers (of course, we certainly won't get these two integers here, and take two numbers randomly for good understanding). Although these two numbers are unique, but divided by the remainder of 30 is 2, then there must be a conflict between the two data to be saved in the hash table. Next, let's resolve this conflict.
There is a simple hash function implementation to take a look (although it can still be modified, but this is almost)
3. Conflict
The reason for the conflict is that two unique integers are divided by the size of the array, and the remainder is equal, while only one data can be stored in one position in the array, which leads to the conflict. There are two ways to resolve the conflict.
3.1 solution 1 (Open address method)
Remember when I said that the size of the array should be twice the actual number? It is used for this time, if a word has been placed in the 15th position of the array, the other word should have been placed in the 15th position, because this position has been occupied by someone else. Then put it in another position of the array, anyway, there are many arrays that are relatively large, which is called the open address method.
3.2 solution II (chain address method)
Since there are two data to be placed in one location of the array, find a way to connect the second data to the first data, through which the second data can be found, and only the address of the first data is saved in the array; in fact, there is a linked list for each position in the array.
The advantage of this method is obvious. The above conflicts are resolved perfectly, and no fancy operations are required. The defect is that when the linked list is too long, if we want to query the last data of the linked list, we can only slowly traverse the linked list, and we know that the advantage of the linked list is to insert and delete, but for the query, this operation is relatively deceptive. and we used a structure like a red-and-black tree to solve the shortcomings of the linked list perfectly. Finally, we almost thought of a more practical method: every position of the array stores a linked list, when the linked list has few nodes, then use the linked list! But when the linked list grows longer and longer, and when the number of nodes reaches a limit, we turn the linked list into a red-black tree, a more perfect solution, which is also called the chain address method.
By the way, the HashMap of jdk7 is the linked list in the array, even if the linked list is very long, it will not become a red-black tree; HashMap in jdk8 has added the operation of turning red-black tree.
4. Open address method
The so-called open address method is that the location of the array subscript calculated from the data we want to save has already stored the data, so at this time we have to find another empty location, and then throw the saved data into it. So how to find it better? Here are three ways, linear detection, secondary detection and re-hashing. Let's take a look at how these three methods work.
4.1 Thread probe
If you look at the linearity of the name, you can see that you are looking for an empty position after going. To take a very simple example, when a string is calculated to correspond to an array subscript 52, but at this time there is already data in 52, try to put it in the position of 53. If the position of 53 has also put the data, then put it in position 54. Keep looking for it slowly until you find an empty position and put the data in it. At this time, the more times you look for it, if you have found the location of 56, then so many positions from 53 to 56 are called filling sequences. when the filling sequence is very long, we call it original aggregation, as shown in the following figure:
There are five filling units in the filling sequence here, and we can also say that the number of steps is 1, and each detection is a step forward. We can know that the more the number of probes, the more serious the aggregation. The less efficient the data you want to add to this location next time.
In addition, the more full the hash table, the lower the efficiency, so when the hash table is almost full, it is necessary to expand, and the array in java cannot be expanded directly, so you need to create a new array, and then find a way to copy the data in this hash table into the new array. Note that it cannot be copied directly here, because the capacity of the new array is not the same as the original array. Then all the data in the original hash table must be re-hashed and then put into a new array, which is very time-consuming.
4.2 Secondary detection
According to our previous linear detection, we can know that if the subscript of the original array calculated by the hash function is x, then the position of the linear detection is x ~ (12) ~ (12) ~ (22) ~ (32), x ~ (42), then the position of the linear detection is x ~ (12) ~ (10), x ~ (22), x ~ (32) and x ~ (42). In fact, it is to detect according to the square of the number of steps to see if there is any data in it, and then put in the new data if not. The second detection can prevent the decline in efficiency caused by too long aggregation.
For the secondary detection, if the current calculated position is x, the position after x will be detected first. If there is data in this position, then there will be more data in the next 4 positions to see if there is any data. If there is still data, then the secondary probe may feel that your aggregation is particularly long, so this time you will jump farther, the position of 16 behind the current position, and so on, until you finally skip the entire array. In this way, the lower efficiency of slow detection of one location can be avoided, as shown in the following figure:
The second detection also has some problems, which will lead to the second gathering, so what is the second gathering? In fact, it is similar to the original gathering! For example, the integers of 184302420544 should be put into the hash table, and the array subscript calculated by the hashing algorithm is 7302, which needs to be detected with 1 step, while 420 should be detected with 1 as step first, and then with 4 step, while 544 should be detected with 1 as step, then 4 as step, and finally with 16 step, if the array subscript corresponding to data is 7. Then we still have to repeat this step, and it is getting longer and longer. This is also a kind of gathering, personal feeling is similar to the nature of the original gathering in a sense.
Secondary detection is not commonly used, because there is a better way to solve it, that is, re-hashing.
4.3 re-hash method
The re-hashing method can be used to eliminate the original aggregation and secondary aggregation, so what is the re-hashing method? We can know that the reasons for the original aggregation and the secondary aggregation are almost the same, both because multiple data are added to the same location in the hash table, and then detect a location according to the step size until an empty location is found. if you need to find a lot of locations, then this is aggregation, and the efficiency of adding will be greatly reduced.
Then we have to think of a way that even if multiple data are to be placed in the same location of the hash table, there is no need to detect each location from scratch. It would be nice if each data could produce a unique step size. Then directly detect the location according to this step size and throw the data into the ok.
So we prepared two hash functions, one hash function that we mentioned above can generate the corresponding array subscript, and the other hash function can generate step size, that is, multiple data are placed in the same location to produce conflicts. Use this hash function to hash again to generate a step size, and detect it according to this step size, instead of starting from the first step size every time. For example, there is a hash function that produces the step size. We can know that the range of the step size is 1-constant. Note that the step size cannot be 0, otherwise it will stand still.
In the figure above, if the data we add to the hash table is a number, then we can directly take the data and the array size to get the array subscript. The key here is our data. Constant can be anything as long as it is a prime number less than the capacity of the array.
By the way, the premise of using the hash method is to ensure that the capacity of the array is a prime number, because only in this way can all the positions be detected. If the capacity of the array is 15, the step size is 5, and the subscript of an array of data is calculated to be 0, then the location of the detection should be: (0: 5) = 5, (5: 5) = 10, (10: 5) = 0, only 0, 5, 10 will be detected. However, if the capacity of the array is prime 13, the step size is 5, and the first data subscript is 0, then the detection position is as follows: (0: 5) = 5, (5: 5) = 10, (10: 5) = 2, (2: 5) = 7, (7: 5) = 12, (12: 5) = 4, (4: 5) = 9, etc., you can see that the position of each detection is different, and you can detect all the positions in the array as long as the data is available.
If the open address method is used, then the detection sequence is generated by this re-hash method, which is actually very easy!
5. Chain address method
You can see the above open address method is a bit troublesome, need to find the detection sequence is really a day dog, trouble I do not want to see, if not so troublesome that it would be nice, ok, then use the chain address method! In a structure like the following, the data is not directly saved in the original array, each location is just a reference to the first data, and all the subsequent data can be fetched through the first reference in that location! If the linked list is too long to traverse, it will be much more efficient if it can be converted to a red-black tree.
There is nothing to say here, because the use of arrays and linked lists is very familiar, nothing particularly difficult, the basic logic: just need to create a new MyHashTable class, this class has several attributes: an array, an attribute of int type identifies the true capacity of the array; it is better to have a node class as a static inner class, which implements the operation of adding, deleting, changing and querying the linked list. Then write a hash function method in the MyHashTable class, according to the array subscript obtained by this hash function, and finally change the addition and deletion of the array!
At this point, the study of "what are the hash table knowledge points in the java data structure and algorithm" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.