What are the collection classes that implement the hash table data structure in C # 07/04 Update SLTechnology News&Howtos

What are the collection classes that implement the hash table data structure in C #

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/02 Report--

This article mainly introduces what are the collection classes that realize the data structure of hash table in C#. It is very detailed and has certain reference value. Friends who are interested must finish reading it.

The collection classes that implement hash table data structures in C # are:

(1) System.Collections.Hashtable

(2) System.Collections.Generic.Dictionary

The former is a general type of hash table, while the latter is a generic version of the hash table. Dictionary and Hashtable are not just a simple difference between generics and non-generics, they use a completely different hash conflict resolution. Dictionary I have done a dynamic demo program, using the Window application. Although Dictionary is more beautiful and beautiful than Hashtable, it is also a pity that Hashtable is not equipped with a dynamic demo program. This time using Silverlight to make, the reason is very simple, it can be hung on the Internet for everyone to easily watch.

First, let's take a look at the effect. It should be noted here that Silverlight 2.0 RTW must be installed to run the game properly. Download address: http://www.microsoft.com/silverlight/resources/install.aspx?v=2.0

Only integers are accepted in the key edit box in the program, because the hash code of the integer is the integer itself, which allows you to see the changes in the hash table more intuitively. If an illegal character is entered, an integer is randomly selected from 0 to 999 for addition or deletion.

Hash conflict resolution

The goal of hash function is to minimize conflicts, but conflicts are unavoidable in practical applications, so when conflicts occur, there must be a corresponding solution. The possibility of conflict is related to the following two factors:

(1) loading factor α: the so-called loading factor refers to the ratio of the number of records n stored in the contract table to the size of the hash address space m, that is, α = n / m, the smaller α is, the smaller the possibility of conflict is; the larger α is, the greater the possibility of conflict is. This is easy to understand, because the smaller α is, the larger the proportion of free units in the hash table is, so the less likely it is that the record to be inserted will conflict with the inserted record; on the contrary, the larger α, the smaller the proportion of free units in the hash table, so the greater the possibility of conflict between the record to be inserted and the inserted record; on the other hand, the smaller α, the lower the utilization of storage storage. Conversely, the higher the utilization of storage storage. In order to reduce the occurrence of conflicts and improve the utilization of storage space, α is usually controlled within the range of 0.6-0.9, and the HashTable class of C # sets the * * value of α at 0.72.

(2) it is related to the hash function used. If the hash function is selected properly, the hash addresses can be evenly distributed in the hash address space, thus reducing the occurrence of conflicts; otherwise, the hash addresses may be concentrated in certain areas, thus increasing the possibility of conflicts.

Conflict resolution techniques can be divided into two categories: open hash (also known as chain address method) and closed hash (also known as open address method). A hash table is a contiguous address space implemented by an array. The difference between the two conflict resolution techniques is whether the conflicting elements are stored outside or within the space of the array:

(1) the conflicting elements of the open hash method are stored outside the array space. The word "open" can be understood as the need to "open" additional space to store conflicting elements.

(2) the conflicting elements of the closed hash method are stored in the array space. The word "closed" can be understood as all elements, whether there is a conflict or not, are "closed" in the array. The closed hash method is also known as the open address method, and the meaning index group space is open to all elements, regardless of conflict or not.

Closed hash method (open address method)

The closed hash method stores all elements in an array of hash tables. When a conflict occurs, look for an empty unit that can store records near the conflict location. The process of finding the "next" vacancy is called probing. The above method can be expressed by the following formula:

Hi= (h (key) + di)% Miyama 1, 2, … , k (k ≤ MMI 1)

Where h (key) is a hash function, m is a hash table length, and di is an incremental sequence. According to the value of di, it can be divided into several detection methods. Here is only the double hash method used by Hashtable.

Double hash method

The double hash method, also known as the second degree hash, is a better method in the closed hash method, which takes the value of another hash function of the keyword as an increment. If the two hash functions are H1 and H2, the detection sequence is:

(H1 (key) + H2 (key))% m, (H1 (key) + 2h2 (key))% m, (H1 (key) + 3h2 (key))% m, …

Where m is the hash table length. Thus, the formula for detecting the next open address by the double hash method is:

(H1 (key) + I * H2 (key))% m (1 ≤ I ≤ Mmur1)

There are many ways to define H2, but without any method, it is necessary to make the value of H2 (key) and m prime (also called coprime, which means that the * common divisor of two numbers is 1, or the two numbers have no common factor, except 1) in order to make the conflicting synonym addresses evenly distributed in the whole hash table, otherwise it may result in circular calculation of synonym addresses. If m is a prime, then any number between H2 and mmur1 is interprime with m, so H2 can be simply defined as:

H2 (key) = key% (m-2) + 1

Dissecting System.Collections.Hashtable

A GetHashCode () method is defined in the Mother of everything object class, and the default implementation of this method is to return a unique integer value to ensure that it is not modified during the lifetime of the object. Since each type derives directly or indirectly from object, all objects can access the method. Naturally, strings or other types can be represented by unique numeric values. In other words, the GetHashCode () method unifies the hash function construction methods for all objects. Of course, since the GetHashCode () method is a virtual method, you can also construct your own hash function by overriding this method.

The realization principle of Hashtable

Hashtable uses a closed hash method to resolve conflicts, which represents a single element in the hash table through a structure bucket, which has three members:

(1) key: indicates the key, that is, the keyword in the hash table.

(2) val: indicates the value, that is, the value corresponding to the keyword.

(3) hash_coll: it is an int type that represents the hash code corresponding to the key.

The int type occupies 32 bits of storage space, and its * bit is the symbol bit, which is a positive integer when it is "0" and a negative integer when it is "1". Hash_coll uses the * * bit to indicate whether there is a conflict in the current position. If it is "0", that is, if it is a positive number, there is no conflict; if it is "1", it means that there is a conflict in the current position. The reason why a bit is specially used to store the hash code and mark whether conflicts occur is mainly to improve the efficiency of the hash table. This point will be mentioned later.

Hashtable uses a double hash method to resolve conflicts, but it is slightly different from the double hash method mentioned earlier. It detects addresses in the following ways:

H (key, I) = H1 (key) + I * H2 (key)

The formulas of hash functions H1 and H2 are as follows:

H1 (key) = key.GetHashCode ()

H2 (key) = 1 + ((H1 (key) > 5) + 1)% (hashsize-1))

Due to the use of second-degree hash, the final value of h (key, I) may be greater than hashsize, so it is necessary to perform modular operation on h (key, I). The final calculated hash address is:

Hash address = h (key, I)% hashsize

[note]: the hash_coll field of the bucket structure stores the value of h (key, I) instead of the hash address.

All the elements of the hash table are stored in a bucket array called buckets (also known as a data bucket). The following shows the process of inserting and deleting data from a hash table, where the data elements are represented by (keys, values, hash codes). Note that this example assumes that the length of Hashtable is 11, that is, hashsize = 11, and only the first five elements are shown here.

(1) insert elements (K1meng v1pl 1) and (k2pr v2p2).

Because there is no conflict between the two inserted elements, the value of H1 (key)% hashsize is directly used as its hash code while H2 (key) is ignored. The effect is shown in figure 8.6.

(2) insert elements (k3pr v3pl 12)

The hash code of the newly inserted element is 12, and because the hash table length is 11 key 12% 11 = 1, the new element should be inserted at index 1, but because index 1 is already occupied by K1, the hash code needs to be recalculated using H2 (hash).

H2 (key) = 1 + ((H1 (key) > 5) + 1)% (hashsize-1))

H2 (key) = 1 + ((12 > > 5) + 1)% (11-1)) = 2

The new hash address is H1 (key) + I * H2 (key) = 1 + 1 * 2 = 3, so K3 is inserted at index 3. Because there is a conflict at index 1, you need to set its * bit to "1".

(1000000000000000000000000000000000000001) 2 = (- 2147483647) 10

The final effect is shown in figure 8.7.

(3) insert elements (k4menv4pence14)

The hash code of K4 is 14 14% 11 = 3, and index 3 has been occupied by K3, so the address is recalculated using the second hash, and the new address is 14. There is a conflict at index 3, so you need to set the high order to "1".

(12) 10 = (0000000000000000000000000000000000001100) 2 after high position "1"

(1000000000000000000000000000001100) 2 = (- 2147483636) 10

The final effect is shown in figure 8.8.

(4) delete elements K1 and K2

When Hashtable deletes a conflicting element (hash_coll is negative), it points the key of the element to the array buckets, and sets all the lower 31 bits of the element's hash_coll to "0" and retains the * * bit. Because the original hash_coll is negative, the * bit is "1".

(10000000000000000000000000000000) 2 = (- 2147483648) 10

It is not possible to determine whether an index is empty just by determining whether the value of hash_coll is-2147483648, because when there is a conflict at index 0, the value of its hash_coll is also-2147483648, which is why key points to buckets. Here, spaces with key pointing to buckets and a hash_ value of-2147483648 are called "conflicting spaces". As shown in figure 8.8, when K1 is deleted, the space at index 1 is a conflicting space.

When Hashtable deletes an element that does not have a conflict (hash_coll is positive), it sets both the key and the value to null,hash_coll to 0. This conflict-free vacancy is called "conflict-free vacancy". As shown in figure 8.9, there are conflict-free spaces at index 2 after K2 is deleted, and when a Hashtable is initialized, all positions in the buckets array are conflict-free spaces.

When the hash table looks for elements through keywords, it first calculates the hash address of the key, and then directly accesses the corresponding location of the array through this hash address and compares the two key values. If the same, the search succeeds and returns; if different, the next step is determined according to the value of hash_coll. When hash_coll is 0 or positive, it indicates that there is no conflict, and the search fails at this time; if hash_coll is negative, it indicates that there is a conflict. At this time, you need to continue to calculate the hash address through the second-degree hash, and repeat until the corresponding key value is found to indicate that the search is successful. If you encounter a positive number of hash_coll or the number of times of calculating the second-degree hash is equal to the length of the hash table, the lookup fails. It can be seen that the main purpose of setting the high bit of hash_coll to the conflict bit is to improve the search speed and avoid calculating the second hash meaninglessly.

Code implementation of Hashtable

The implementation of the hash table is more complex. In order to simplify the code, this example ignores part of the error judgment. Please do not set the key value to null when testing.

Using System

2 public class Hashtable

3 {

4 private struct bucket

5 {

6 public Object key; / / key

7 public Object val; / / value

8 public int hash_coll; / / hash code

10 private bucket [] buckets; / / an array of hash table data (data buckets)

11 private int count; / / number of elements

12 private int loadsize; / / the number of elements currently allowed to be stored

13 private float loadFactor; / / fill factor

14 / / default construction method

15 public Hashtable (): this (0,1.0f) {}

16 / / Construction method of specified capacity

17 public Hashtable (int capacity, float loadFactor)

18 {

19 if (! (loadFactor > = 0.1f & & loadFactor 0.72f? 0.72f: loadFactor

23 / / calculate the table length based on capacity

24 double rawsize = capacity / this.loadFactor

25 int hashsize = (rawsize > 11)? / / Primes with table length greater than 11

26 HashHelpers.GetPrime ((int) rawsize): 11

27 buckets = new bucket [hashsize]; / / initialize the container

28 loadsize = (int) (this.loadFactor * hashsize)

29}

30 public virtual void Add (Object key, Object value) / / add

31 {

32 Insert (key, value, true)

33}

34 / / Hash code initialization

35 private uint InitHash (Object key,int hashsize

36 out uint seed,out uint incr)

37 {

38 uint hashcode = (uint) GetHash (key) & 0x7FFFFFFF; / / take the absolute value

39 seed = (uint) hashcode; / / H2

40 incr = (uint) (1 + ((seed > > 5) + 1)% ((uint) hashsize-1); / / h3

41 return hashcode; / / return hash code

42}

43 public virtual Object this [Object key] / / indexer

44 {

45 get

46 {

47 uint seed; / / h2

48 uint incr; / / h3

49 uint hashcode = InitHash (key, buckets.Length

50 out seed, out incr)

51 int ntry = 0; / / used to represent the I value in h (key,i)

52 bucket b

53 int bn = (int) (seed% (uint) buckets.Length) / / h (key,0)

54 do

55 {

56 b = buckets [bn]

57 if (b.key = = null) / / b when there is no conflict vacancy

58 {/ / cannot find the corresponding key, return empty

59 return null

60}

61 if (b.hash_coll & 0x7FFFFFFF) = = hashcode) & &

62 KeyEquals (b.key, key))

63 {/ / found successfully

64 return b.val

65}

66 bn = (int) ((long) bn + incr)%

67 (uint) buckets.Length); / / h (key+i)

68} while (b.hash_coll

< 0 && ++ntry < buckets.Length); 69 return null; 70 } 71 set 72 { 73 Insert(key, value, false); 74 } 75 } 76 private void expand() //扩容 77 { //使新的容量为旧容量的近似两倍 78 int rawsize = HashHelpers.GetPrime(buckets.Length * 2); 79 rehash(rawsize); 80 } 81 private void rehash(int newsize) //按新容量扩容 82 { 83 bucket[] newBuckets = new bucket[newsize]; 84 for (int nb = 0; nb < buckets.Length; nb++) 85 { 86 bucket oldb = buckets[nb]; 87 if ((oldb.key != null) && (oldb.key != buckets)) 88 { 89 putEntry(newBuckets, oldb.key, oldb.val, 90 oldb.hash_coll & 0x7FFFFFFF); 91 } 92 } 93 buckets = newBuckets; 94 loadsize = (int)(loadFactor * newsize); 95 return; 96 } 97 //在新数组内添加旧数组的一个元素 98 private void putEntry(bucket[] newBuckets, Object key, 99 Object nvalue, int hashcode) 100 { 101 uint seed = (uint)hashcode; //h2 102 uint incr = (uint)(1 + (((seed >

> 5) + 1)%

103 ((uint) newBuckets.Length-1)); / / h3

104 int bn = (int) (seed% (uint) newBuckets.Length); / / Hash address

105 do

106 {/ / New elements can be added when the current position is a conflict vacancy or a non-conflict vacancy.

If ((newBuckets [bn] .key = = null) | |

108 (newBuckets [bn] .key = = buckets))

109 {/ / assignment

110 newBuckets [bn] .val = nvalue

111 newBuckets [bn] .key = key

112 newBuckets[ bn] .hash _ coll | = hashcode

113 return

114}

115 / / when other elements already exist in the current location

116 if (newBuckets [bn] .hash _ coll > = 0)

117 {/ / set the high bit of hash_coll to 1

118 newBuckets [bn] .hash _ coll | =

119 unchecked ((int) 0x80000000)

120}

Hash H2 (key) + h3 (key)

Bn = (int) (long) bn + incr)% (uint) newBuckets.Length)

123} while (true)

124}

125 protected virtual int GetHash (Object key)

126 {/ / get hash code

127 return key.GetHashCode ()

128}

129 protected virtual bool KeyEquals (Object item, Object key)

130 {/ / used to determine whether two key are equal

131return item = = null? False: item.Equals (key)

132}

133 / / used to add elements when add is true, and to modify element values when add is false

134private void Insert (Object key, Object nvalue, bool add)

135 {/ / expand if you exceed the upper limit of the number of elements allowed to be stored

136 if (count > = loadsize)

137 {

138expand ()

139}

140 uint seed; / / h2

141 uint incr; / / h3

142 uint hashcode = InitHash (key, buckets.Length,out seed, out incr)

143The int ntry = 0; / / is used to indicate the I value in h (key,i)

144 int emptySlotNumber =-1; / / used to record vacancies

145 int bn = (int) (seed% (uint) buckets.Length); / / Index mark

146 do

147 {/ / if there is a conflict vacancy, continue to look backward to determine if the same key exists

If (emptySlotNumber = =-1 & & (Buckets [bn] .key = = buckets) & &

149 (buckets [bn] .hash _ coll

< 0)) 150 { 151 emptySlotNumber = bn; 152 } 153 if (buckets[bn].key == null) //确定没有重复键才添加 154 { 155 if (emptySlotNumber != -1) //使用之前的空位 156 bn = emptySlotNumber; 157 buckets[bn].val = nvalue; 158 buckets[bn].key = key; 159 buckets[bn].hash_coll |= (int)hashcode; 160 count++; 161 return; 162 } 163 //找到重复键 164 if (((buckets[bn].hash_coll & 0x7FFFFFFF)==hashcode) && 165 KeyEquals(buckets[bn].key, key)) 166 { //如果处于添加元素状态，则由于出现重复键而报错 167 if (add) 168 { 169 throw new ArgumentException("添加了重复的键值！"); 170 } 171 buckets[bn].val = nvalue; //修改批定键的元素 172 return; 173 } 174 //存在冲突则置hash_coll的***位为1 175 if (emptySlotNumber == -1) 176 { 177 if (buckets[bn].hash_coll >

= 0)

178 {

179 buckets [bn] .hash _ coll | = unchecked ((int) 0x80000000)

180}

181}

182 bn = (int) ((long) bn + incr)% (uint) buckets.Length); / / second hash

While (+ + ntry < buckets.Length)

184 throw new InvalidOperationException ("add failed!")

185}

186 public virtual void Remove (Object key) / / remove an element

187 {

188 uint seed; / / h2

189 uint incr; / / h3

190 uint hashcode = InitHash (key, buckets.Length,out seed, out incr)

I in 191 int ntry = 0; / / h (key,i)

192 bucket b

193 int bn = (int) (seed% (uint) buckets.Length); / / Hash address

194 do

195 {

196b = buckets [bn]

197 if ((b.hash_coll & 0x7FFFFFFF) = = hashcode) & &

198 KeyEquals (b.key, key) / / if the corresponding key value is found

199 {/ / keep * * bits, the rest are clear 0

200 buckets [bn] .hash _ coll & = unchecked ((int) 0x80000000)

201 if (Buckets [bn] .hash _ coll! = 0) / / if there was a conflict

202 {/ / make key point to buckets

203 buckets [bn] .key = buckets

204}

205 else / / originally there is no conflict

206 {/ / leave key empty

207 buckets [bn] .key = null

208}

Val = null; / / release the corresponding "value".

210 count--

211 return

212} / / second-degree hash

213 bn = (int) (long) bn + incr)% (uint) buckets.Length)

While (b.hash_coll < 0 & & + + ntry < buckets.Length)

215}

216 public override string ToString ()

217 {

218 string s = string.Empty

219 for (int I = 0; I < buckets.Length; iTunes +)

220 {

221 if (buckets [I] .key! = null & & buckets.key! = buckets)

222 {/ / print index, key, value, hash_coll when not empty

223 s + = string.Format ("{0mlle talk 5} {1m m le 8} {2m m m 8} {3m m m 8}\ r\ n")

224 i.ToString (), buckets [I] .key.ToString ()

225 buckets.val.ToString ()

226 buckets [I] .hash _ coll.ToString ()

227}

228 else

229 if the space is empty, print the index and hash_coll

230s + = string.Format ("{0mam talk 21} {1m m 8}\ r\ n", i.ToString ()

231 buckets [I] .hash _ coll.ToString ()

232}

233}

234 return s

235}

236 public virtual int Count / / attribute

237 {/ / get the number of elements

238 get {return count;}

239}

The implementations of Hashtable and ArrayList are similar, for example, both are further abstracted from arrays, and both can automatically scale capacity exponentially.

These are all the contents of the article "what are the collection classes that implement hash table data structures in C#?" Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.