What is the principle of HashSet in .NET? 07/06 Update SLTechnology News&Howtos

What is the principle of HashSet in .NET?

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

This article mainly introduces "what is the principle of HashSet in .NET". In daily operation, I believe that many people have doubts about the principle of HashSet in .NET. The editor consulted all kinds of materials and sorted out simple and easy-to-use methods of operation. I hope it will be helpful for you to answer the question of "what is the principle of HashSet in .NET?" Next, please follow the editor to study!

Hash table principle

HashSet is based on the principle of hash table. To learn HashSet, you must first understand the hash table.

Hash table (hash table, also known as hash table) is a data structure that directly accesses the storage location according to key. It maps the data of the query to a location in the table through a function of key value, which speeds up the search speed.

The above functions are hash functions, which should be calculated as simply as possible to improve the efficiency of insertion and retrieval; the calculated addresses should be evenly distributed as far as possible to reduce hash conflicts; and they should have greater compressibility to save memory. The common construction methods of hash function include direct address method, division remainder method, digital analysis method and so on. HashSet uses the division residue method to divide the HashCode of an element by the remainder of a constant (hash table Size) as the address, and the constant usually chooses a prime.

The hash value of two equal objects is the same, but the hash value of two different objects is likely to be the same, this is the hash conflict. The methods to deal with conflicts include open address method, linked list method, double hash method and so on. HashSet uses the linked list method to put conflicting elements in the linked list.

A hash table is a data structure for high-performance collection operations, which has the following characteristics:

Disorder and no repetition, and the time complexity of insertion and search is O (1).

Do not use index

Automatic capacity expansion when capacity is insufficient, but the cost of capacity expansion is high.

Can provide many high-performance collection operations, such as merge, clipping, etc.

HashSet implementation

HashSet has two built-in arrays, as follows. The index value calculated by the hash function is stored in _ buckets, and the value in _ buckets starts at 1, so you need-1 when using it. This value is the relative index of the _ entries array, and if there is no conflict, the value pointed to is the relative index of the element to be found. If a conflict occurs, the element can also be quickly located according to the conflict linked list. _ entries holds the Entry object, and the Entry type is shown below. HashCode is the hash value of the element, which is used in search, insert, delete, expansion and other operations. Value stores data. Next has different functions at different times. When Entry is in the list, it forms a conflicting linked list, whose Next points to the next element of the conflicting linked list, and the next value of the last element of the linked list is-1; if Entry has been deleted by the list, it forms a vacant linked list, its Next points to the next element of the vacant linked list, and the last element of the vacant linked list has a value of-2.

HashSet also has several key members: _ count, _ freeList, _ freeCount. _ count indicates the number of elements added, and note that it is not the actual number of elements stored, because it is not updated when the element is deleted. _ freeList is the empty chain header, its value points to the deleted _ entries index, and _ entries [_ freeList] .Next points to the relative position of the next empty space. _ freeCount represents the number of vacancies, and the actual number of elements stored in the list is _ count-_ freeCount.

Private int []? _ buckets; / / _ entries index array private Entry [] _ entries; / / entity array private int _ count; / / actual number of stored elements private int _ freeList; / / vacancy chain header index, initial value-1private int _ freeCount; / / number of vacancies private struct Entry {public int HashCode; public int Next; public T Value;} public int Count = > _ count-_ freeCount / / _ count does not record deleted elements, so the actual number of elements is _ count-_ freeCount

HashSet provides multiple constructor overloads, and if no arguments are passed, _ buckets and _ entries are not initialized. When an element is added, Initialize (0) is called. The Initialize method accepts an int parameter that represents the list capacity that needs to be initialized. The actual initialized list capacity is the minimum prime number greater than or equal to this value. The prime number is taken as the list length because the value is used as the divisor of the hash function constructed by the division residue method, which makes the distribution of the prime remainder more uniform and reduces the occurrence of conflicts.

Private int Initialize (int capacity) {int size = HashHelpers.GetPrime (capacity); / / get the minimum prime of > = capacity var buckets = new int [size]; var entries = new Entry [size]; / /. Return size;}

When looking for an element, first call the element's GetHashCode method to calculate the hash value, and then call the GetBucketRef method to perform the hash function operation to get the index. The return value of GetBucketRef-1 is the real index I, but if I is-1, the element is not found. If I > = 0, it means that there is an element in the list that has the same hash value as the element to be found, but an equal hash value does not necessarily mean that the element is equal. It is necessary to further judge the HashCode. If the HashCode is equal, then determine whether the element is equal. If it is satisfied, the element is found and the index I of _ entries is returned.

Private int FindItemIndex (T item) {/ /. Int hashCode = item! = null? Item.GetHashCode (): 0; if (typeof (T) .IsValueType) {int I = GetBucketRef (hashCode)-1; / / _ buckets element starts from 1 while (I > = 0) / / has the same hash value as item, not necessarily item {ref Entry entry = ref entries [I] If (entry.HashCode = = hashCode & & EqualityComparer.Default.Equals (entry.Value, item)) {return I; / / HashCode is equal and the elements are equal, then the element is found and the _ entries index} I = entry.Next; collisionCount++; /.}} / /. Return-1;} private ref int GetBucketRef (int hashCode) {int [] buckets = _ bucketsfunctions; return ref buckets [(uint) hashCode% (uint) buckets.Length]; / / construct a hash function using the division residue method

When inserting an element, it will first find out whether the element to be inserted exists. HashSet is not duplicated, so if the inserted element already exists, it will directly return false. If no element exists, it looks for the index where the element is stored. If there is a deleted vacancy, the element is placed on the space pointed to by _ freeList, and if there is no vacancy, the element is inserted in the order of _ entries. After finding the index, you can assign the element's HashCode and the element to the corresponding field of _ entries.If there is no conflict, the next value is-1; if there is a conflict, a linked list is formed and added to the chain header, and the Next points to the next location of the conflict.

Private bool AddIfNotPresent (T value, out int location) {bucket = ref GetBucketRef (hashCode); / / bucket is of type ref int. If there is no conflict, int should be 0 because the default value is 0 / /. Int index; if (_ freeCount > 0) / / there are deleted spaces. Place the elements on the deleted spaces {index = _ freeList; _ freeCount--; / / update the number of deleted spaces _ freeList = StartOfFreeList-entries [_ freeList] .Next; / update the vacancy index} else / / store the elements {int count = _ count in the order of _ entries If (count = = entries.Length) / / insufficient capacity, expand {Resize (); bucket = ref GetBucketRef (hashCode);} index = count; _ count = count + 1; entries = _ entries;} {/ / assign ref Entry entry = ref entries! [index]; entry.HashCode = hashCode Entry.Next = bucket-1; / / if there is no conflict, it is-1, otherwise it forms a linked list and points to the index of the next element in conflict entry.Value = value; bucket = index + 1; / / here the bucket is assigned, that is, the corresponding element of _ buckets is changed, that is, the value of _ buckets indexed by the hash value of the inserted element points to the index + 1 _ version++ of the entries where the inserted element is stored. Location = index;} / /. Return true;}

If the capacity of the list is insufficient during insertion, the Resize method is called to expand the capacity. The size after expansion is the minimum prime number which is greater than or equal to 2 times the original size. After obtaining the size to be expanded, reallocate the entries memory with the new size, and call the Array.Copy method to copy the original content to the new location. Because the length of the list changes, the hash value changes, so the contents of _ buckets (the _ entries index) need to be updated, as does the value of entry.Next.

Private void Resize () = > Resize (HashHelpers.ExpandPrime (_ count), forceNewHashCodes: false); public static int ExpandPrime (int oldSize) {int num = 2 * oldSize; if ((uint) num > 2146435069u & & 2146435069 > oldSize) {return 2146435069;} return GetPrime (num); / / return the minimum prime of twice the original size} private void Resize (int newSize, bool forceNewHashCodes) {var entries = new Entry [newSize]; Array.Copy (_ entries, entries, count) / /... _ buckets = new int [newSize]; for (int I = 0; I

< count; i++) { ref Entry entry = ref entries[i]; if (entry.Next >

=-1) / update _ buckets content {ref int bucket = ref GetBucketRef (entry.HashCode); / / get the hash value entry.Next = bucket-1 obtained by the hash function divisor by the new size; / / update Next bucket = I + 1 / / Update the element of _ buckets to point to the recalculated _ entry index + 1}} _ entries = entries;}

When deleting an element, first look for the existence of the element to be deleted. If the hash value conflicts, the previous index of the conflicting linked list is recorded. After finding the element, you need to update the pointer to the conflicting linked list. After deleting the element, the number of freeCount vacancies is updated, the index of the deleted element is assigned to _ freeList, and the record deletes the vacancy, adding it to the header of the vacancy chain, whose Next points to the relative position of the next vacancy. When you insert an element, the element is inserted at the vacancy index of the _ freeList record and the value of _ freeList is updated based on the Next of the vacancy.

Public bool Remove (T item) {int last =-1; int hashCode = item! = null? (_ comparer?.GetHashCode (item)?? Item.GetHashCode (): 0; ref int bucket = ref GetBucketRef (hashCode); int I = bucket-1; while (I > = 0) {ref Entry entry = ref entries [I]; if (entry.HashCode = = hashCode & & (_ comparer?.Equals (entry.Value, item)? EqualityComparer.Default.Equals (entry.Value, item)) {/ / find the element to be deleted if (last < 0) / / the element to be deleted is at the head of the linked list. Update the value of the _ buckets element to point to the next location {bucket = entry.Next + 1 in the linked list. } else / / non-linked header of the element to be deleted. Update the next value of the element on the linked list [last] .Next = entry.Next; entry.Next = StartOfFreeList-_ freeList / / empty linked list, record the relative position of the next vacant index, update _ freeList if (RuntimeHelpers.IsReferenceOrContainsReferences ()) entry.Value = defaultdeleted according to this value; _ freeList = I; / record delete element position. The next time you insert an element, it will insert the number of vacancies return true after deletion in this _ freeCount++; / / update } last = I; / / there is a conflict, record a position on the linked list I = entry.Next;} return false;} at this point, the study of "what is the principle of HashSet in .NET" is over, hoping to solve everyone's doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.