In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article mainly introduces the relevant knowledge of how to write the Java hash code, the content is detailed and easy to understand, the operation is simple and fast, and has a certain reference value. I believe you will gain something after reading this Java hash code. Let's take a look at it.
1. The introduction of hash function
We have all used dictionaries. The advantage of dictionaries is that we can quickly locate the words we are looking for through the previous directory. If we want to write every word in an English dictionary, from a to zyzzyva (the last word in the Oxford dictionary), into computer memory so that we can read and write quickly, then a hash table is a good choice.
Here we narrow it down a bit, such as wanting to store 5000 English words in memory. We may think that each word will occupy an array unit, so the size of the array is 5000, and we can use the array subscript to access the word, which is perfect, but how do the array subscript relate to the word?
First, we need to establish the relationship between words and numbers (array subscript):
We know that ASCII is a kind of code, in which a means 97 and b represents 98, and so on, until 122represents z, and each word is made up of these 26 letters. We can design a set of codes similar to ASCII instead of ASCII coding, such as a for 1, for 2, z for 26, and so on.
Then how do you combine the numbers of a single letter into numbers that represent the whole word?
①, add the numbers
First, the first simple way is to add the numbers represented by each letter of the word, and the sum is the subscript of the array.
For example, the word cats is converted into a number:
Cats = 3 + 1 + 20 + 19 = 43
Then the subscript of the word cats stored in the array is 43, and all English words can be converted into array subscripts in this way. But is this really feasible?
Suppose we agree that a word has at most 10 letters, then the last word in the dictionary is zzzzzzzzzz, which is converted to a number:
Zzzzzzzzzz = 26,10 = 260
Then we can get that the range of word coding is from 1 to 260. Obviously, this range is not enough to store 5000 words, so there must be a location for storing multiple words, with an average of 5000 words per array of data items (5000 divided by 260).
How can we solve the above problems?
The first method: consider that each array item contains a subarray or a sublinked list, which is really fast to store data items, but it is still slow if we want to find one of the 192 words.
The second method: why let so many words occupy the same data item? In other words, we do not divide the words enough, the array can represent too few elements, we need to expand the subscript of the array so that only one word is stored in each position.
For the second method above, the question arises, how do we extend the subscript of the array?
The joint multiplication of ② and power
We split the number represented by the word into a series, multiply these digits by the appropriate power of 27 (because there are 26 possible characters and spaces, a total of 27), and then add the product to get a unique number for each word.
For example, convert the word cats into a number:
Cats = 3273 + 1272 + 20271 + 19,270 = 59049 + 729 + 540 + 19 = 60337
This process creates a unique number for each word, but note that we only calculate the 4-letter word here. If the word is very long, such as the longest 10-letter word zzzzzzzzzz, only 279 results in excess of 7000000000000. The result is so huge that it is impossible to allocate so much space for an array in real memory.
So the problem with this scheme is that although a unique subscript is assigned to each word, only a small number of words are stored, and a large number of them are empty. So now we need a way to compress the huge range of integers obtained in the digital power multiplication system into an acceptable range of arrays.
For English dictionaries, suppose there are only 5000 words, and here we choose an array space with a capacity of 10000 to store (we will explain why we need twice as much space later). Then we need to compress the range from 0 to more than 7000000000000 to the range from 0 to 10000.
The first method is to take the remainder and get the remainder after one number is divided by another integer. First of all, let's assume that we want to compress the number from 0-199 (represented by largeNumber) to a number from 0-9 (represented by smallNumber), which has 10 numbers, so the value of the variable smallRange is 10, and the expression for this conversion is:
SmallNumber = largeNumber% smallRange
When a number is divisible by 10, the remainder must be between 0 and 9, so we compress the number from 0-199 to 0-9 with a compression ratio of 20: 1.
We can also use a similar method to compress the unique number representing the word into the subscript of the array:
ArrayIndex = largerNumber% smallRange
This is the hash function. It hashes a large range of numbers into a small range of numbers that corresponds to the subscript of the array. After inserting data into an array using a hash function, the array is the hash table.
2. Conflict
If you compress a large range of numbers into a smaller range of numbers, it is certain that several different words will be hashed to the same array subscript, resulting in a conflict.
Conflicts may prevent the hashing scheme from being implemented. As we said earlier, the specified array range is twice the size of the actual stored data, so half of the space may be free, so when conflicts occur, one method is to find an empty space in the array through a systematic method and fill in the word instead of using the hash function to get the subscript of the array, which is called the open address method. For example, if the hash result of adding the word cats is 5421, but its position is already occupied by the word parsnip, then we will consider storing the word cats in a position 5422 after parsnip.
Another method, as we mentioned earlier, is that each data item in the array creates a sublinked list or subarray, so words are not directly stored in the array, and when conflicts occur, the new data items are directly stored in the linked list represented by the subscript of the array, which is called the chain address method.
3. Open address method
In the development address method, if the data item cannot be directly stored in the array subscript calculated by the hash function, it is necessary to look for other locations. There are three methods: linear detection, secondary detection and re-hashing.
①, linear detection
In linear detection, it will linearly look for blank cells. For example, if 5421 is the location where the data is to be inserted, but it is already occupied, then use 5422, if 5422 is also occupied, use 5423, and so on, the array subscript is incremented in turn until a blank location is found. This is called linear detection because it looks for blank cells step by step along the array subscript.
Complete code:
It is important to note that when the hash table becomes too full, we need to extend the array, but note that the data item cannot be placed in the same position in the new array as the old array, but the insertion position is recalculated based on the size of the array. This is a time-consuming process, so generally we have to determine the scope of the data, give the size of a good array, and no longer expand.
In addition, when the hash table becomes full, we frequently detect the insertion position every time we insert new data, because many locations may be occupied by the previously inserted data, which is called aggregation. The more full the array, the more likely it is that aggregation will occur.
This is like a crowd, when someone faints in the mall, the crowd will slowly gather. The first crowd gathered because they saw the fallen man, while the people gathered behind because they wanted to know what they were looking at. The larger the crowd, the more people will be attracted.
②, loading factor
The ratio of data items to table length that has been filled in a hash table is called a loading factor. For example, a hash table with 10000 units filled with 6667 data has a loading factor of 2max 3. When the loading factor is not too large, the aggregation distribution is more coherent, while when the loading factor is relatively large, the aggregation occurs greatly.
We know that linear detection is a step-by-step backward detection, and when the loading factor is relatively large, aggregation will occur frequently, so if we detect larger units instead of step-by-step detection, this is the second detection to be discussed below.
③, secondary detection
Second detection is a way to prevent aggregation. The idea is to detect units that are far away from each other, rather than units adjacent to the original location.
In linear detection, if the original subscript calculated by the hash function is x, the linear detection is x, and so on, while in the second detection, the process of detection is x, 9, x, 16, and so on, the distance to the original position is the square of steps. Although the secondary detection eliminates the original aggregation problem, it produces another more detailed aggregation problem, called secondary aggregation: for example, 184302420 and 544 are inserted into the table in turn, and their mapping is 7. As long as there is a keyword mapped to 7, a longer step is required, a phenomenon called secondary aggregation. Secondary aggregation is not a serious problem, but secondary detection is not often used because there are good solutions, such as re-hashing.
④, re-hash method
In order to eliminate the original aggregation and secondary aggregation, we use another method: the re-hashing method.
We know that the reason for secondary aggregation is that the step size of the detection sequence generated by the second detection algorithm is always fixed: 1, 4, 9, 16, and so on. So what we think of is that we need to generate a detection sequence that depends on keywords, instead of every keyword is the same, then different keywords can use different probe sequences even if they are mapped to the same array subscript.
The method is to hash the keyword again with a different hash function, using this result as the step size. For the specified keyword, the step size is constant throughout the probe, but different keywords use different step sizes.
The second hash function must have the following characteristics:
1. Different from the first hash function
Second, can not output 0 (otherwise, there will be no step, each detection will be standing still, the algorithm will fall into a dead loop).
Experts have found that the following form of hash function works very well: stepSize = constant-key% constant;, where constant is a prime number and is less than the array capacity.
The re-hashing method requires that the capacity of the table is a prime number. If the table length is 15 (0-14), the non-prime number, there is a specific keyword mapped to 0, and the step size is 5, then the detection sequence is 0, 5, 5, 10, 10, 10, and so on. The algorithm only tries these three units, so it is impossible to find some blank units, and eventually the algorithm crashes. If the array capacity is 13, prime, the probe sequence will eventually access all units. That is to say, 0pr 5pr 10je 2jr 7jr 12je 4jr 9jol 1pr 6pr 11pr 3, go on, as long as there is an empty space in the table, it can be detected.
Complete re-hash code:
Package com.ys.hash;public class HashDouble {private DataItem [] hashArray; / / DataItem class, indicating how many items of data are actually stored in the initial size of the private int arraySize;// array private DataItem nonItem;// () {this.arraySize = 13; hashArray = new DataItem [arraySize]; nonItem = new DataItem (- 1) / / the deleted data item is subscript-1} / / to determine whether the array is full of public boolean isFull () {return (itemNum = = arraySize);} / / determine whether the array is empty public boolean isEmpty () {return (itemNum = = 0);} / / print array contents public void display () {System.out.println ("Table:"); for (int j = 0) J < arraySize; jacks +) {if (hashArray [j]! = null) {System.out.print (hashArray [j] .getKey () + ");} else {System.out.print (" * * ") } / / the array subscript public int hashFunction1 (int key) {return key%arraySize;} public int hashFunction2 (int key) {return 5-key%5 is obtained by hash function conversion. } / / insert data item public void insert (DataItem item) {if (isFull ()) {/ / extended hash table System.out.println ("Hash table is full, re-hash..."); extendHashTable ();} int key = item.getKey (); int hashVal = hashFunction1 (key); int stepSize = hashFunction2 (key) / / use the second hash function to calculate the number of detection steps while (hashArray [hashVal]! = null & & hashArray [hashVal] .getKey ()! =-1) {hashVal + = stepSize; hashVal% = arraySize;// to detect backward with the specified number of steps} hashArray [hashVal] = item; itemNum++ The array has a fixed size and cannot be extended, so the extended hash table can only create a larger array, and then insert the data from the old array into the new array. * but the hash table calculates the location of the given data based on the size of the array, so these data items can no longer be placed in the same position as the old array in the new array. * therefore, you cannot copy directly, you need to traverse the old array sequentially and use the insert method to insert each data item into the new array. * this process is called re-hashing. This is a time-consuming process, but it is necessary if the array is to be extended. * / public void extendHashTable () {int num = arraySize; itemNum = 0 arraySize / recount, as the following is to transfer the original data to the new expanded array arraySize * = 2 arraySize / the array doubles the size of the array [] oldHashArray = hashArray; hashArray = new DataItem [arraySize]; for (int I = 0; I < HashArray [I]) {insert (OldHashArray [I]) }} / / Delete data item public DataItem delete (int key) {if (isEmpty ()) {System.out.println ("Hash Table isEmpty!"); return null;} int hashVal = hashFunction1 (key); int stepSize = hashFunction2 (key) While (hashArray [hashVal]! = null) {if (hashArray [hashVal] .getKey () = = key) {DataItem temp = hashArray [hashVal]; hashArray [hashVal] = nonItem;//nonItem indicates empty Item with key of-1 itemNum--; return temp;} hashVal + = stepSize; hashVal% = arraySize } return null;} / / find the data item public DataItem find (int key) {int hashVal = hashFunction1 (key); int stepSize = hashFunction2 (key); while (hashArray [hashVal]! = null) {if (hashArray [hashVal] .getKey () = key) {return hashArray [hashVal];} hashVal + = stepSize HashVal% = arraySize;} return null;} public static class DataItem {private int iData; public DataItem (int iData) {this.iData = iData;} public int getKey () {return iData;}} 4, chain address method
In the open address method, we use the re-hash method to find an empty space to solve the conflict problem, and another method is to set a linked list in each cell of the hash table (that is, the chain address method). The key value of a data item is still mapped to the cell of the hash table as usual, and the data item itself is inserted into the linked list of this unit. Other data items that are also mapped to this location only need to be added to the linked list and do not need to look for gaps in the original array.
Ordered linked list:
Package com.ys.hash;public class SortLink {private LinkNode first; public SortLink () {first = null;} public boolean isEmpty () {return (first = = null);} public void insert (LinkNode node) {int key = node.getKey (); LinkNode previous = null; LinkNode current = first; while (current! = null & & current.getKey () < key) {previous = current Current = current.next;} if (previous = = null) {first = node;} else {previous.next = node;} node.next = curent;} public void delete (int key) {LinkNode previous = null; LinkNode current = first If (isEmpty ()) {System.out.println ("Linked isEmpty!!"); return;} while (current! = null & & current.getKey ()! = key) {previous = current; current = current.next;} if (previous = = null) {first = first.next } else {previous.next = current.next;}} public LinkNode find (int key) {LinkNode current = first; while (current! = null & & current.getKey ()
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.