Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to define the hash table of C # algorithm

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "how to define the hash table of C# algorithm". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

If all keys are small integers, we can use an array to implement an unordered symbol table, using the key as the index of the array and storing its corresponding value at key I in the array. The hash table is used to deal with this situation, it is an extension of the easy method and can handle more complex types of keys. We need to use an arithmetic operation to convert the key to the index of the array to access the key-value pairs in the array.

The search algorithm that uses a hash table is divided into two steps. The first step is to use the hash function to convert the searched key to an index of the array. Ideally, different keys can be converted into different index values. Of course, this is only ideal, so we need to face a situation where two or more keys are hashed to the same index value. So the second step in hash lookup is a process of dealing with collision conflicts. The methods to solve the collision: zipper method and linear detection method.

Hash table is a classic example of how algorithms make trade-offs in time and space. If there is no memory limit, we can directly index the key as (possibly oversized) array, then all lookup operations only need to be accessed once. On the other hand, if there is no time limit, we can use unordered arrays and do sequential lookups, which requires very little memory. The hash table uses a moderate amount of space and time and finds a balance between the two extremes. We only need to adjust the parameters of the hash algorithm to make a choice between space and time.

A hash table can be used to achieve a symbol table for constant-level search and insertion operations in general applications (after sharing). This makes it the best choice for implementing simple symbol tables in many cases.

1. Hash function

The first problem with the hash algorithm is the calculation of the hash function, which converts the key into the index of the array. If we have an array that can hold M key-value pairs, then we need a hash function that converts any key into an index in the array range (an integer in the range of [0, Mmur1]). The hash function we are looking for should be easy to calculate and be able to distribute all keys evenly, that is, for any key, every integer between 0 and Mmur1 has an equal possibility to correspond to it (independent of the key). To understand hashing, you must first think about how to implement a hash function.

The hash function is related to the type of key. Strictly speaking, a hash function is required for each type of key. If the key is a number, such as a social security number, we can use it directly; if the key is a string, such as a person's name, we need to convert the string into a number; if the key contains multiple parts, such as an e-mail address, we need to combine these parts in some way.

Suppose we have an application whose key is the Social Security number of the United States. Social security numbers such as 123-45-6789 are nine digits divided into three fields. The first field identifies the location of the number issued by the geographic area (for example, the number with field 035 in the first field is from Rhode Island and the number with field 214 in the first field is from Maryland), and the other two fields identify the individual. There are a billion different social security numbers, but assuming our application only needs to process a few hundred keys, we can use a hash table of size M = 1000. One possible way to implement a hash function is to use numbers from three keys. Using the three digits in the right field may be preferable to using the three digits in the left field (because customers may be unevenly distributed geographically), but it is better to use an int value of all nine digits, and then consider the hash function of the integer.

Positive integer

The most common method of hashing integers is the method of dividing the remainder. We choose an array whose size is prime M. for any positive integer k, we calculate the remainder of k divided by M. The calculation of this function is simple and can effectively spread the bonds in the range of 0 to Mmur1. If M is not a prime, we may not be able to take advantage of all the information contained in the key, which may result in a non-uniform hash of values. For example, if the key is a decimal number and M is 10 ^ k, then we can only use the last k bit of the key. But if you use the prime number 97, the distribution of hash values is obviously better (a prime number farther away from 100 is better).

Floating point number

If the key is a real number between 0 and 1, we can multiply by M and round to the nearest integer to get an index between 0 and Mmur1. Although intuitive, this approach is flawed because it gives more weight to the most significant bits of the key. The least significant bit does not work. One way to solve this situation is to represent the key as a bit binary number and then use the division remainder method.

String

In addition to the residue method, you can also handle longer keys, such as strings, which we just need to treat as large integers:

Int hash = 0 for (int I = 0 for I)

< s.Length;i++){ hash = (R * hash + s.CharAt(i)) % M; } 如果 R 比任何字符的值都大,这种计算相当于将字符串当作一个 N 位的 R 进制值,将它除以 M 并取余。一种叫做 Horner 方法的经典算法用 N 次乘法,加法和取余来计算一个字符串的散列值。只要 R 足够小(如 31),不造成溢出,那么结果就能落在 0 至 M-1 之内。 组合键 如果键类型具有多个整数字段,则通常可以按照刚才针对String值所述的方式将它们混合在一起。 将 HashCode() 的返回值转化为一个数组索引 由于我们的目标是数组索引,而不是32位整数,因此我们在实现中将 HashCode() 和除留余数法结合,以产生0到M-1之间的整数: private int Hash(Key x){ return (x.HashCode() & 0x7fffffff) % M; } 这段代码会将符号位屏蔽(将一个 32 位整数变为一个 31 位非负整数),然后用除留余数法。在使用这样的代码时我们一般会将数组的大小 M 取为素数以充分利用原散列值的所有位。 自定义的 HashCode 自定义的 HashCode() 需要将键平均地散布为所有可能的 32 位整数。也就是说,对于任意对象 x ,调用 x.HashCode() 有均等的机会得到 2^32 个不同整数中的任意一个 32 位整数值。更简单的方法:对实例变量使用hashCode()方法将每个实例变量转换为32位int值,然后进行算术运算。 public class Transaction { private string who; private string when; private double amount; public int HashCode() { int hash = 17; hash = 31 * hash + who.GetHashCode(); hash = 31 * hash + when.GetHashCode(); hash = 31 * hash + amount.GetHashCode(); return hash; } } 系数的具体值(这里是 31)并不是很重要。 软缓存 如果散列值的计算很耗时,那么我们可以将每个键的散列值缓存起来。第一次调用 HashCode() 时,我们需要计算对象的散列值,但之后可以直接返回缓存。 总的来说,要为一个数据类型实现一个优秀的散列方法需要满足三个条件: 一致性:等价的键必然产生相等的散列值; 高效性:计算简便; 均匀性:均匀地散列所有的键。 在有性能要求时应该谨慎使用散列,因为糟糕的散列函数经常是性能问题的罪魁祸首。保证均匀性的最好办法也许就是保证键的每一位都在散列值的计算中起到了相同的作用。实现散列函数最常见的错误也许就是忽略了键的高位。无论散列函数的实现是什么,当性能很重要时应该测试所使用的散列函数: 计算散列函数和比较两个键,哪个耗时更多? 你的散列函数能够将一组键均匀地散布在 0到 M-1之间吗? 用简单的实现测试这些问题能够预防未来的悲剧。 这些讨论的背后是我们在使用散列时作出一个重要的假设(均匀散列假设),我们使用的散列函数能够均匀并独立地将所有键散布于 0 到 M-1 之间。这个假设是一个我们实际上无法达到的理想模型,但它是我们实现散列函数时的指导思想。原因有两点:一是设计散列函数时尽量避免随意指定参数以防止大量的碰撞,这是我们的重要目标;二是它提示我们使用数学分析来预测散列算法的性能并在实验中进行验证。 2.基于拉链法的散列表 一个散列函数能够将键转化为数组索引。散列算法的第二步是碰撞处理,也就是处理两个或多个键的散列值相同的情况。一种直接的方法是将大小为 M 的数组中的每个元素指向一条链表,链表中的每个结点都存储了散列值为该元素的索引的键值对,这种方法称为拉链法。 这个方法的基本思想就是选择足够大的 M ,使得所有链表都尽可能短以保证高效的查找。查找分两步:首先根据散列值找到对应的链表,然后沿着链表顺序查找对应的键。 拉链法的一种简单实现方法是,为 M 个元素分别构建符号表来保存散列到这里的键,可以使用之前查找树的代码。 因为我们要用 M 条链表保存 N 个键,无论键在各个链表中额分布如何,链表的平均长度肯定是 N/M。 public class SeparateChainingHashST { private int N;//键值总对数 private int M;//散列表的大小 private SequentialSearchST[] ST;//存放链表对象的数组 public SeparateChainingHashST(int M) { this.M = M; ST = new SequentialSearchST()[M]; for (var i = 0; i < M; i++) { ST[i] = new SequentialSearchST(); } } private int Hash(Key key) { return (key.GetHashCode() & 0x7fffffff) % M; } public Value Get(Key key) { return ST[Hash(key)].Get(key); } public void Put(Key key, Value value) { ST[Hash(key)].Put(key,value); } } 当你能预知所需要的符号表的大小时,这段短小的方案能够得到不错的性能。一种更可靠的方案是动态调整数组的大小。 在一张含有 M 条链表和 N 个键的散列表中,未命中查找和插入操作所需的比较次数为 ~N/M。 散列表的大小 在实现基于拉链法的散列表时,我们的目标是选择适当的数组大小 M,既不会因为空链表而浪费大量内存,也不会因为链表太长而在查找上浪费太多时间。而拉链法的一个好处就是这并不是关键性的选择。如果存入的键多于预期,查找所需的时间只会比选择更大的数组稍长;如果少于预期,虽然空间浪费但查找会非常快。当内存不是很紧张时,可以选择一个足够大的 M,使得查找需要的时间变为常数;当内存紧张时,选择尽量大的 M 仍然能够将性能提高 M倍。另一种方法是动态调整数组的大小以保持短小的链表。 删除操作 要删除一个键值对,先用散列值找到含有该键的SequentialSearchST 对象,然后调用该对象的 Delete 方法。 有序性相关的操作 散列最主要的目的在于均匀地将键散布开来,因此在计算散列后键的顺序信息就丢失了。基于拉链法的散列表实现简单,在键的顺序不重要的应用中,他可能是最快的,也是使用最广泛的符号表实现。 3.基于线性探测法的散列表 实现散列表的另一种方式就是用大小为 M 的数组保存 N 个键值对,其中 M >

N . We need to rely on spaces in the array to resolve collision conflicts. All methods based on this strategy are collectively referred to as open address hash tables.

The simplest method in an open address hash table is called linear detection: when a collision occurs (when the hash value of a key is already occupied by a different key), we directly check the next position of the hash table (add the index value by one). Such a linear probe may produce three results:

Hit: the key at this location is the same as the key found

Missed: the key is empty (there is no key at this position)

Continue to search: the key in this location is different from the key being found.

We use the hash function to find the index of the key in the array and check whether the key and the key being found are the same. If not, continue to search (enlarge the index and fold back to the beginning of the array when you reach the end of the array) until you find the key or encounter an empty element. The operation that we will check whether an array location contains the key being looked up is called a probe.

The core idea of the hash table of an open address class is that instead of using memory as a linked list, it is better to use them as empty elements in the hash table, which can be used as a sign for the end of the search. We used a parallel array, a save key and a save value in our implementation.

Total number of key-value pairs in the public class LinerProbingHashST {private int Nstrike / symbol table private int M = 16 new Key / size of the linear probe table private Key [] keys;// key private Value [] values;// value public LinerProbingHashST () {keys = new Key [M]; values = new Value [M] } private int Hash (Key key) {return (key.GetHashCode () & 0x7ffffff)% M;} public void Put (Key key, Value value) {if (N > = M / 2) Resize (2cm); int i; for (I = Hash (key); keys [I]! = null I = (I + 1)% M) {if (Keys [I] .equals (key)) {values [I] = value; return;}} keys [I] = key; values [I] = value; equation + } public Value Get (Key key) {for (int I = Hash (key); keys [I]! = null;i = (iTun1)% M) {if (keys [I] .equals (key)) return values [I];} return default (Value) } / resize the array / private void Resize (int v) {throw new NotImplementedException ();}}

Like the zipper method, the performance of the hash table of an open address class also depends on the ratio of α = Nhammer M, but the meaning is different. We call alpha the usage of the hash table. For hash tables based on zipper method, α is the length of each linked list, so it is generally greater than 1; for hash tables based on linear detection, α is the proportion of space occupied in the table, and it cannot be greater than 1. In fact, we do not allow alpha to reach 1 in LinerProbingHashST (the hash table is full), because a missed lookup at this time can lead to an infinite loop. To ensure performance, the array will be dynamically resized to ensure that the usage is between 1max 8 and 1max 2.

Delete operation

How do I remove a key from a hash table based on linear probes? Setting the location of the key directly to null will make the element after that location unsearchable. So we need to reinsert all the keys on the right side of the cluster that were deleted into the list.

Public void Delete (Key key) {if (! keys.Contains (key)) return; int i = Hash (key); while (! key.Equals (Keys [I])) I = (I + 1)% M; keys [I] = default (Key); values [I] = default (Value) I = (I + 1)% M; while (keys [I]! = null) {Key keyToRedo = keys [I]; Value valueToRedo = values [I]; keys [I] = default (Key); values [I] = default (Value) / / reinsert Put (keyToRedo,valueToRedo); I = (I + 1)% M;} N Murray; if (N > 0 & N > = M / 8) Resize (M hand 2);} bond cluster

The average cost of linear detection depends on a continuous set of entries, also known as key clusters, after elements are re-inserted into an array. Obviously, only short bond clusters can ensure high efficiency. As more and more keys are inserted, this requirement is difficult to meet, and there will be more and more longer bond clusters. In addition, based on the assumption of uniformity, every position of the array has the same possibility to insert a new key, and the long key cluster is more likely to be longer than the short key cluster, because the hash value of the new key increases the length of the cluster no matter where it falls in the cluster.

Performance Analysis of Linear Detection method

Although the form of the final result is relatively simple, it is very difficult to accurately analyze the performance of the linear detection method.

In a hash table based on linear detection, which is M in size and contains N = α M bonds, based on the assumption of uniformity, the detection times required for hits and misses are: ~ 1max 2 (1 + (1 / (1-α)) and ~ 1max 2 (1 + (1 / (1-α) ^ 2), respectively. Especially when α is about 1 beat 2, the number of times needed to find a hit is about 3 hand 2, and the number of times it takes to miss is about 5 hand 2. When α approaches 1, the accuracy of these estimates decreases, and we will ensure that the utilization rate of the hash table is less than 1 prime 2. Let's take a look at dynamically resizing the array.

Resize the array private void Resize (int cap) {LinearProbingHashST t = new LinearProbingHashST (cap); for (int I = 0 / M = 8 / M, Resize (2 / M); when N > 0 & N

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report