In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article will explain in detail the example analysis of hash table in c language. The editor thinks it is very practical, so I share it with you as a reference. I hope you can get something after reading this article.
The concept of hash table 1. Search algorithm
when we look for the existence of a data element in a linked or ordered list, the only way is to traverse the entire table, which is called linear enumeration.
if the order table is ordered at this time, we can find it by halving it. This method is called binary enumeration.
The time complexity of linear enumeration is O (n). The time complexity of binary enumeration is O (log2n). Is there a faster way to find it? This is a new data structure to be introduced-- hash table.
2. Hash table
is not a sequential structure, so many data structures are called hash tables in books. The following will be explained in the form of a hash table. As a reader, you only need to know that the two are the same data structure.
We call the process of finding the location of the data stored by a function mapping the hash. Several concepts are involved here:
a) the data that needs to be found is itself called a keyword
b) in the process of changing a keyword to a hash value through function mapping, the function here is called a hash function
c) the process of generating hash values may produce conflicts and need to be resolved
d) after the conflict is resolved, the actual location where the data is stored is called the hash address, which is, in popular terms, an array subscript
e) the data structure that stores all this data is a hash table, which is generally implemented in an array, so it is also called a hash array. The whole process is shown in the following figure:
3. Hash array
to facilitate subscript indexing, the underlying implementation structure of the hash table is an array, the array type can be of any type, and each location is called a slot. As shown in the following figure, it represents a hash table of length 8, also known as a hash array.
4. Keywords
The keyword is an element in a hash array and can be of any type. It can be an integer, a floating point, a character, a string, or even a structure or class. The following A, C, M can all be keywords
Int A = 5 C [100] = "Hello World!"; struct Obj {}; Obj M
In the process of implementing the hash table, we need to convert a non-integer keyword into an array subscript, that is, a hash value, so as to quickly index to its corresponding position through O (1) time.
, the means to convert a non-integer keyword to an integer is a hash function.
5. Hash function
The hash function can be simply understood as the function in the primary school textbook, that is,
Y = f (x)
Where f (x) is the hash function, x is the keyword and y is the hash value. A good hash function should have the following two characteristics:
a) injective
b) Avalanche effect: a change of 1 bit of the input value x can cause a change of at least half of the output value y
monojection is easy to understand. When the hash value y is known in figure (a) (a) (a), the key x may have two cases, not a monojection, while when the hash value y is known in graph (b), the key x must be unique, so it is monojective. Since x and y correspond one to one, the conflict is reduced in essence.
The avalanche effect is to make the hash value more in line with the principle of random distribution. The more random the key distribution in the hash table is, the higher the utilization and efficiency is.
The hash functions commonly used in are: direct address method, division residue method, digital analysis method, square center method, folding method, random number method and so on. The content of the hash function will be explained in detail below.
6. Hash conflict
In the process of generating hash value, hash function is called hash conflict if it generates different keywords and gets the same hash value.
is for the hash function ytransif (x), when the keyword x1 ≠ x2, but there is f (x1) = f (x2), we need to resolve the conflict.
There are many ways to resolve conflicts, such as open address method, rehash function method, chain address method, common overflow zone method and so on. The content of conflict resolution will be explained in detail below.
7. Hash address
The hash address is an array subscript, that is, the subscript of the hash array. The data is obtained by subscript, which is called index. The subscript is obtained from the data, which is called hash. At work, one of the words used to communicate with colleagues is hash.
Second, commonly used hash function 1. Direct addressing method
The direct addressing method means that the keyword itself is a hash value, indicating that the function value is
F (x) = x
for example, we need to count the number of occurrences of each character in a string. The range of any character is [0255], so you can store the number of occurrences of each character with a hash array of 256 in length, and this problem can be solved with a traversal enumeration. The C code is implemented as follows:
Int I, hash [256]; for (I = 0; str [I]; + I) {+ + hash [str [I]];}
is the implementation of the most basic direct addressing method. Hash [c] represents the number of occurrences of the character c in the string str.
2. Square centering method
square middle method is to square the keyword, and then take some bits in the middle as the hash value.
For , for example, for the keyword 1314, the square is 1726596, and the middle three bits are taken as the hash value, that is, 265.
The square middle method is more suitable for cases where the distribution of keywords is not clear and the number of digits is not very large.
3. Folding method
folding method divides the keyword into equal parts (note that the last part can be shorter if the number of digits is not enough), and then sums it up to get a hash.
, for example, for the keyword 5201314, divide it into four groups and add it up to: 52, 01, 31, 4, 88, which is the hash.
folding method is more suitable for situations where the distribution of keywords is not clear, but the number of keywords is large.
4. The method of division and residue
The division residue method is the length of the hash table on the keyword module, indicating that the value of the function is
F (x) = x mod m
where m represents the length of the hash table, this method can not only directly take the mode of the keyword, but also take the module after the square middle method and folding method.
for example, for a hash array of length 4, we can get a hash value from the keyword module 4, as shown in the figure:
5. Position and law
The length of hash array generally chooses the power of 2, because we know that modular operation is more time-consuming, and bit operation is relatively efficient.
chooses the power of 2 as the length of the array to convert modular operations into binary bits and.
makes m = 2 ^ k, then its binary representation is:
The m on any mathematical model is equivalent to taking the binary low k bit of m, and
So the effect of bit and m − 1 is the same That is:
In addition to the direct addressing method, the other three methods of may lead to hash conflicts. Next, let's discuss some commonly used solutions to hash conflicts.
Third, common hash conflict resolution 1, open addressing method 1) principle explanation
open addressing method is to find the next empty address once there is a conflict. As long as the hash table is large enough, you can always find an empty location and record it as its hash address. The formula is as follows:
The di here in is a sequence of numbers, which can be a constant sequence (1 − 1) or an equidifference sequence (1).
2) Animation demonstration
In the figure above , the hash function algorithm is the division remainder method, and the hash conflict solution is the open addressing method. Each data in the hash table is a keyword, which needs to be looked up before insertion. If the location found has not been inserted, insert is performed; otherwise, find the next uninserted location to insert. A total of 6 data were inserted, namely: 11, 12, 13, 20, 19, 28.
The method needs to note that when the inserted data exceeds the length of the hash table, the insert can no longer be performed.
this paper uses the open addressing method of constant series when explaining the reality of hash table in the fourth chapter.
2. Rehash function method 1) principle explanation
rehash function method is that in case of conflict, another hash function is used, which can be a combination of square middle method, folding method, division residue method and so on. Generally, two hash functions are used, and the probability of conflict is very small.
rehash function method can make keywords do not produce aggregation, of course, it will also increase the computing time of a lot of hash functions.
2) Animation demonstration
To be replenished
3. Chain address method 1) principle explanation
of course, after the conflict, we can also choose not to change the position, or in the original position, but to concatenate the same hash value with a linked list. This method is called chain address method.
2) Animation demonstration
In the figure above , the hash function algorithm is the division remainder method, and the hash conflict solution is the chain address method. Each data of the hash table retains a link header node and a tail node, which needs to be searched before insertion. If the location found is not empty, the tail node is inserted and the tail node is updated; otherwise, a new list head node and tail node are generated. A total of 6 data were inserted, namely: 11, 12, 13, 20, 19, 28.
4. Public spillover zone method 1) principle explanation
Once generates conflicting data, it is uniformly put into another order table. Every time you look up the data, the keyword in the hash array is equal to the given keyword, then the search is considered successful; otherwise, it goes to the common overflow area to look for it sequentially, which is called the public overflow zone method.
The method is suitable for situations with fewer conflicts.
2) Animation demonstration
To be replenished
Implementation of hash table 1. Definition of data structure
since the underlying storage of the hash table is still an array, we can define a structure in which we define a member of an array type, and we can record a size field if we need to record the number of hash table elements.
The C language implementation is as follows:
# define maxn (1data [* addr]! = key) {/ / (3) * addr = HashGetAddr (* addr + 1); / / (4) if (ht- > data [* addr] = = NULLKEY) / / (5) return 0; if (* addr = startaddr) / / (6) return 0 } return 1; / / (7)}
(1) A hash value startaddr is calculated according to the hash function HashGetAddr.
(2) addr is required as the return value, so dereferencing is needed.
(3) search the addr correspondence in the hash table. If it is not empty, continue (4). Otherwise, jump out of the loop.
(4) find a position back
(5) if a vacancy is found, the keyword has no corresponding data in the hash table, and 0 is returned directly, which means the search failed.
(6) indicates that the entire hash table has been traversed and no suitable keyword has been found. A direct return of 0 indicates that the search failed.
(7) otherwise, return 1 indicates that the search is successful.
5. Hash table insertion
When there is a hash conflict (that is, when there is no suitable position), the next adjacent position is found, that is, the addressing number is listed as a constant sequence (1, 1, 1, 1, and 1). Insert needs to be aware that when the hash table is slow, the insert operation can no longer be performed. The C language code is implemented as follows:
Int HashInsert (HashTable * ht, DataType key) {int addr = HashGetAddr (key); / / (1) int retaddr; if (HashSearchKey (ht, key, & retaddr)) {/ / (2) return retaddr;} while (ht- > data [addr]! = NULLKEY) / / (3) addr = HashGetAddr (addr + 1) / / (4) ht- > data [addr] = key; / / (5) return addr; / / (6)}
(1) A hash value addr is calculated according to the hash function HashGetAddr.
(2) you need to find out whether it exists before inserting. If it already exists, do not perform the insertion.
(3) look up the addr correspondence in the hash table, and if it is not empty, continue (3); otherwise, jump out of the loop
(4) find a position back and continue to judge whether it is empty or not.
(5) jumping out of the loop means that the addr position of the current hash table is not occupied by other elements, so it can be inserted as the current key position.
(6) return addr as the hash address of key
6. Hash table deletion
has the basis to find, delete operation is relatively simple, if can not find the location of a keyword, then no operation on the hash table, return an empty keyword; otherwise, the location found will be assigned as an empty keyword, and return the deleted location
Int HashRemove (HashTable * ht, DataType key) {int addr; if (! HashSearchKey (ht, key, & addr)) {/ / (1) return NULLKEY;} ht- > data [addr] = NULLKEY; / / (2) return addr;}
(1) perform a lookup first
(2) for the location found, clear the location keyword
7. Complete implementation of hash table
Finally, gives a complete implementation of the hash table of the open addressing method, as follows:
/ * Hash table open addressing method * / # define maxn (1data [addr]! = NULLKEY) addr = HashGetAddr (addr + 1); ht- > data [addr] = key; return addr;} int HashRemove (HashTable * ht, DataType key) {int addr If (! HashSearchKey (ht, key, & addr)) {return NULLKEY;} ht- > data [addr] = NULLKEY; return addr } / * Hash table open addressing method * / this is the end of the article on "sample analysis of hash tables in c language". I hope the above content can be helpful to you, so that you can learn more knowledge. if you think the article is good Please share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.