
Case Analysis of Common Web Data Structures and Their Complexity

2025-03-31 Update From: SLTechnology News&Howtos


This article introduces common web data structures and their complexity through case analysis. Many people have questions about how to choose among these structures and what their operations cost, so the material below collects simple, practical guidance. I hope it helps answer those questions; please follow along with the editor to study.

How to choose a data structure

Array (T[])

When the number of elements is fixed and subscripts need to be used.

Linked list (LinkedList<T>)

When elements need to be added or removed at both ends of the list. Otherwise, use List<T>.

Resizable array list (List<T>)

When the number of elements is not fixed and subscripts are required.

Stack (Stack)

When you need to implement LIFO (Last In First Out).

Queue (Queue)

When you need to implement FIFO (First In First Out).

Hash table (Dictionary<TKey, TValue>)

When you need to use key-value pairs (Key-Value) to quickly add and find, and the elements do not have a specific order.

Tree-based dictionary (SortedDictionary<TKey, TValue>)

When you need to use key-value pairs (Key-Value) to quickly add and find, and the elements are sorted by Key.

Hash table based set (HashSet<T>)

When you need to save a unique set of values and the elements do not have a specific order.

Tree based set (SortedSet<T>)

When you need to save a unique set of values and the elements need to be sorted.
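
To summarize the list above, here is a minimal declaration sketch (the variable names are illustrative) using the System.Collections.Generic types:

using System.Collections.Generic;

class CollectionChoices
{
    static void Main()
    {
        int[] scores = new int[10];                                   // fixed size, index access
        LinkedList<string> recent = new LinkedList<string>();         // add/remove at both ends
        List<string> names = new List<string>();                      // resizable, index access
        Stack<int> undoStack = new Stack<int>();                      // LIFO
        Queue<int> jobQueue = new Queue<int>();                       // FIFO
        Dictionary<string, int> ages = new Dictionary<string, int>(); // key-value pairs, unordered
        SortedDictionary<string, int> ranks = new SortedDictionary<string, int>(); // key-value pairs, sorted by Key
        HashSet<int> uniqueIds = new HashSet<int>();                  // unique values, unordered
        SortedSet<int> sortedIds = new SortedSet<int>();              // unique values, sorted
    }
}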

Array

In computer programming, the Array is one of the simplest and most widely used data structures. In any programming language, arrays share the following characteristics:

The contents of the array are stored using continuous memory (Contiguous Memory).

All elements in the array must be of the same type or a type derived from it. Arrays are therefore considered homogeneous data structures (Homogeneous Data Structures).

The elements of the array can be accessed directly. For example, to access the i-th element of the array, you can use arrayName[i].

General operations for arrays include:

Allocate space (Allocation)

Data access (Accessing)

In C#, you can declare array variables in the following ways.

int allocationSize = 10;
bool[] booleanArray = new bool[allocationSize];
FileInfo[] fileInfoArray = new FileInfo[allocationSize];

The above code allocates a contiguous block of memory in the CLR managed heap to hold allocationSize elements of type arrayType. If arrayType is a value type, allocationSize unboxed arrayType values are created. If arrayType is a reference type, allocationSize references of type arrayType are created.

If we assign values to some positions in the FileInfo [] array, the reference relationship is shown in the following figure.

Arrays are fast but fixed in size. The non-generic ArrayList removes that limitation by storing its elements as object references, so it can grow as needed and hold elements of any type. But this flexibility comes at the expense of performance. From the description of Array above, we know that Array stores value types in unboxed form. Because the Add method of ArrayList accepts a parameter of type object, adding a value of a value type causes a boxing operation. This adds overhead when an ArrayList is read and written frequently, degrading performance.
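
A minimal sketch of where the boxing occurs (the class and variable names are illustrative):

using System;
using System.Collections;

class BoxingDemo
{
    static void Main()
    {
        ArrayList list = new ArrayList();

        int value = 42;
        list.Add(value);             // the int is boxed: a new object is allocated on the heap

        int readBack = (int)list[0]; // reading it back requires a cast, which unboxes the value

        Console.WriteLine(readBack); // 42
    }
}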

List

When generics were introduced in .NET, the performance cost of ArrayList described above could be eliminated. .NET provides a new array type, List<T>.

Generics allow developers to defer the selection of data types when creating data structures until they decide which type to choose when they use them. The main advantages of Generics include:

Type safety (Type Safety): a type defined with generics can only be used with the specified type or types derived from it.

Performance (Performance): generics remove the need for runtime type checking and eliminate the overhead of boxing and unboxing.

Reusability: generics break the tight coupling between data structures and stored data types. This improves the reusability of the data structure.

List<T> is essentially a homogeneous, self-resizing one-dimensional array (homogeneous self-redimensioning array). It can read elements as quickly as Array while keeping the flexibility of a variable length.

// Create a list of int
List<int> myFavoriteIntegers = new List<int>();

// Create a list of string
List<string> friendsNames = new List<string>();

List<T> is also implemented internally using an array, but it hides that implementation complexity. You don't need to specify an initial length when creating a List<T>, and you don't have to worry about resizing when adding elements.

List<int> powersOf2 = new List<int>();

powersOf2.Add(1);
powersOf2.Add(2);
powersOf2.Add(4);

powersOf2[1] = 10;

int sum = powersOf2[1] + powersOf2[2];

The asymptotic running time (Asymptotic Running Time) complexity of List<T> is the same as that of Array.

Queue

When we need a first-in-first-out (FIFO) data structure, .NET provides the Queue class. The Queue class provides the Enqueue and Dequeue methods to access its elements.

Internally, Queue maintains a circular array of T objects, with head and tail variables pointing to the head and tail of the array.

By default, the initial capacity of a Queue is 32; it can also be specified through the constructor.

The Enqueue method first determines whether there is enough capacity in the Queue to hold the new element. If so, it stores the element and increments the tail index, using a modulo operation so that tail wraps around rather than running past the end of the array. If the capacity is not enough, the Queue expands its internal array according to a specific growth factor.

By default, the value of the growth factor (growth factor) is 2.0, so the length of the internal array doubles. You can also specify the growth factor in the constructor. The capacity of Queue can also be reduced by the TrimExcess method.

The Dequeue method returns the element at the head index, sets that slot to null (so the reference can be collected), and then increments the value of head.
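
As a usage illustration, here is a minimal sketch with the generic Queue<T> (the job names are made up) showing the FIFO behavior described above:

using System;
using System.Collections.Generic;

class QueueDemo
{
    static void Main()
    {
        // Optionally pass an initial capacity to the constructor to avoid early resizes.
        Queue<string> jobs = new Queue<string>(8);

        jobs.Enqueue("parse request");   // added at the tail
        jobs.Enqueue("query database");
        jobs.Enqueue("render response");

        while (jobs.Count > 0)
        {
            // Dequeue removes from the head, so items come out in the order they went in.
            Console.WriteLine(jobs.Dequeue());
        }
        // Output: parse request, query database, render response
    }
}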

Stack

When you need a last-in-first-out (LIFO) data structure, .NET provides the Stack class. The Stack class provides the Push and Pop methods to access its elements.

The elements stored in a Stack can be pictured as a vertical collection: when a new element is pushed (Push), it is placed on top of all the other elements; when an element is popped (Pop), it is removed from the top.

The default capacity of Stack is 10. Like Queue, the initial capacity of Stack can be specified in the constructor. The capacity of Stack can be automatically expanded according to the actual use, and the capacity can be reduced by TrimExcess method.

If the number of elements in the Stack (Count) is less than its capacity, the Push operation is O(1). If the capacity needs to be expanded, the Push operation becomes O(n). The Pop operation is always O(1).
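
A minimal usage sketch with the generic Stack<T> (the page names are illustrative) showing the LIFO behavior:

using System;
using System.Collections.Generic;

class StackDemo
{
    static void Main()
    {
        Stack<string> history = new Stack<string>();

        history.Push("page1");   // bottom of the stack
        history.Push("page2");
        history.Push("page3");   // top of the stack

        // Pop removes from the top, so items come out in reverse order: page3, page2, page1.
        while (history.Count > 0)
            Console.WriteLine(history.Pop());
    }
}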

Hashtable

Now suppose we want to store employee records using the employee's social security number as a unique identifier. The format of the social security number is DDD-DD-DDDD (each D is a digit from 0 to 9).

If you use an Array to store the employee information, querying for the employee with social security number 111-22-3333 means potentially traversing every element of the array, a query operation with complexity O(n). A better approach is to sort by social security number, which reduces the query complexity to O(log(n)). Ideally, though, we would prefer a query complexity of O(1).

One option is to create a huge array with one slot for every possible number, from 000-00-0000 to 999-99-9999.

The disadvantage of this scheme is that it wastes space. If we only need to store information for 1000 employees, only 0.0001% of the space is used.

The second scheme is to compress the sequence with a hash function (Hash Function).

We choose to use the last four digits of the social security number as the index to reduce the span of the interval. This will range from 0000 to 9999.

Mathematically, this conversion from 9 digits to 4 digits is called hashing. It compresses the index space (indexer space) of an array into a corresponding hash table (Hash Table).

In the above example, the input of the hash function is a 9-digit social security number, and the output is the last 4 digits.

H(x) = last four digits of x
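
A minimal sketch of this particular hash function (an illustrative helper, not part of the .NET library):

using System;

class SsnHash
{
    // Maps a 9-digit SSN such as "333-33-1234" to an index in the range 0-9999,
    // i.e. H(x) = last four digits of x.
    static int HashSsn(string ssn)
    {
        string digits = ssn.Replace("-", "");                    // "333331234"
        return int.Parse(digits.Substring(digits.Length - 4));   // 1234
    }

    static void Main()
    {
        Console.WriteLine(HashSsn("333-33-1234"));   // 1234
        Console.WriteLine(HashSsn("444-44-1234"));   // 1234 -> a hash collision with the line above
    }
}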

The figure also illustrates a common issue with hash functions: hash collisions (Hash Collisions). In other words, it is possible for two different social security numbers to have the same last four digits.

Hash collisions complicate adding a new element to a Hashtable. If no collision occurs, the element is inserted directly; if a collision occurs, it must be resolved. Collisions therefore increase the cost of operations, and the Hashtable is designed to minimize how often they happen.

One way to avoid hash collisions is to choose an appropriate hash function. The probability of collisions depends on the distribution of the data. For example, if the last four digits of a social security number are randomly distributed, using the last four digits is appropriate. However, if the last four digits were assigned according to the employee's year of birth, then because birth years are clearly not evenly distributed, choosing the last four digits would cause many collisions.

Selecting an appropriate hash function in this way is called collision avoidance (Collision Avoidance).

When a collision does occur, there are many strategies that can be applied, collectively called collision resolution (Collision Resolution) mechanisms. One approach is to place the element being inserted into another slot, since its hashed position is already occupied.

For example, one of the simplest implementations is linear probing (Linear Probing), with the following steps:

When inserting a new element, use the hash function to determine the element's position in the hash table.

Check whether that position already holds an element. If it is empty, insert the new element and return; otherwise go to step 3.

If the position is i, check whether i + 1 is empty; if it is occupied, check i + 2, and so on, until a position is found whose content is empty.

Now if we want to insert the information of five employees into the hash table:

Alice (333-33-1234)

Bob (444-44-1234)

Cal (555-55-1237)

Danny (000-00-1235)

Edward (111-00-1235)

The inserted hash table may be as follows:

The insertion process of the element:

The social security number of Alice is hashed to 1234, so it is stored in location 1234.

Bob's social security number also hashes to 1234, but since Alice's information is already stored at location 1234, the next location, 1235, is checked; it is empty, so Bob's information is placed at 1235.

Cal's social security number hashes to 1237, which is empty, so Cal is placed at 1237.

Danny's social security number hashes to 1235, which is already occupied, so position 1236 is checked; 1236 is empty, so Danny is placed at 1236.

Edward's social security number also hashes to 1235, which is occupied; 1236 is checked and is also occupied, then 1237, until 1238 is reached, which is empty, so Edward is placed at 1238.
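
The walk-through above can be reproduced with a small linear probing sketch (illustrative only; this is not how the .NET Hashtable is implemented):

using System;

class LinearProbingDemo
{
    const int TableSize = 10000;                 // slots 0000-9999
    static string[] table = new string[TableSize];

    // H(x) = last four digits of the 9-digit SSN.
    static int Hash(string ssn) =>
        int.Parse(ssn.Replace("-", "").Substring(5));

    static void Insert(string ssn, string name)
    {
        int slot = Hash(ssn);
        // Step through slot, slot + 1, slot + 2, ... until an empty slot is found.
        while (table[slot] != null)
            slot = (slot + 1) % TableSize;
        table[slot] = name;
    }

    static void Main()
    {
        Insert("333-33-1234", "Alice");   // goes to 1234
        Insert("444-44-1234", "Bob");     // 1234 taken, goes to 1235
        Insert("555-55-1237", "Cal");     // goes to 1237
        Insert("000-00-1235", "Danny");   // 1235 taken, goes to 1236
        Insert("111-00-1235", "Edward");  // 1235, 1236, 1237 taken, goes to 1238

        Console.WriteLine(table[1238]);   // Edward
    }
}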

Linear probing (Linear Probing) is simple, but it is not the best collision resolution strategy, because it causes elements with similar hashes to cluster together. The result is that searches keep running into the same collisions. For example, in the hash table above, to find Edward's information we start at his hash, 1235, but find Bob there, so we probe 1236 and find Danny, and so on, until we finally reach Edward.

An improved method is quadratic probing (Quadratic Probing), in which the step between probed positions grows quadratically. That is, if position s is occupied, first check s + 1², then s - 1², then s + 2², then s - 2², and so on, instead of stepping through s + 1, s + 2, ... as in linear probing. Even so, quadratic probing still leads to its own kind of clustering.
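
A minimal sketch of how such a quadratic probe sequence could be generated (illustrative only):

using System;
using System.Collections.Generic;

class QuadraticProbeDemo
{
    // Yields s, s + 1*1, s - 1*1, s + 2*2, s - 2*2, ... wrapped into the table range.
    static IEnumerable<int> ProbeSequence(int s, int tableSize, int maxSteps)
    {
        yield return s % tableSize;
        for (int k = 1; k <= maxSteps; k++)
        {
            yield return ((s + k * k) % tableSize + tableSize) % tableSize;
            yield return ((s - k * k) % tableSize + tableSize) % tableSize;
        }
    }

    static void Main()
    {
        // For a starting slot of 1235 in a table of 10000 slots, the first probes are:
        // 1235, 1236, 1234, 1239, 1231, 1244, 1226
        foreach (int slot in ProbeSequence(1235, 10000, 3))
            Console.Write(slot + " ");
    }
}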

The Hashtable implementation in .NET requires that when you add an element you supply not only the element (Item) but also a Key for it. For example, the Key is the employee's social security number and the Item is the employee information object. The Key is then used as an index to look up the Item.

Hashtable employees = new Hashtable();

// Add some values to the Hashtable, indexed by a string key
employees.Add("111-22-3333", "Scott");
employees.Add("222-33-4444", "Sam");
employees.Add("333-44-5555", "Jisun");

// Access a particular key
if (employees.ContainsKey("111-22-3333"))
{
    string empName = (string)employees["111-22-3333"];
    Console.WriteLine("Employee 111-22-3333's name is: " + empName);
}
else
    Console.WriteLine("Employee 111-22-3333 is not in the hash table...");

The hash function in the Hashtable class is more complex than the social security number example above. The hash function must return an ordinal value (Ordinal Value). In the social security number example this was easy: take the last four digits. But the Hashtable class can accept a value of any type as the Key, thanks to the GetHashCode method defined on System.Object. The default implementation of GetHashCode returns a unique integer that is guaranteed not to change during the lifetime of the object.

The hash function in the Hashtable class is defined as follows:

H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize

Here GetHash(key) by default calls the key's GetHashCode method to obtain a hash value, and hashsize is the length of the hash table. Because of the final modulo operation, the result H(key) falls in the range 0 to hashsize - 1.

A hash collision can occur when adding or retrieving an element in a hash table. Earlier, we briefly introduced two collision resolution strategies:

Linear probing (Linear Probing)

Quadratic probing (Quadratic Probing)

The Hashtable class uses a completely different technique, called rehashing (referred to in some materials as double hashing).

Rehashing works as follows:

There is a set of hash functions H1...Hn. When you need to add an element to, or retrieve an element from, the hash table, hash function H1 is used first. If it causes a collision, H2 is tried, and so on, up to Hn. All of the hash functions are very similar to H1; they differ only in the multiplicative factor (multiplicative factor) they use.

In general, the hash function Hk is defined as follows:

Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize

With rehashing it is important that, after hashsize probes, every slot in the hash table has been visited exactly once; that is, for a given key, no two functions Hi and Hj map to the same slot. The rehashing formula used by the Hashtable class guarantees this by keeping (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))) and hashsize relatively prime (two numbers are relatively prime when they share no common factors).
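
A minimal sketch of the probe sequence produced by this family of hash functions (illustrative only; the real Hashtable implementation has additional details):

using System;

class RehashDemo
{
    // Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
    static int Hk(int hash, int k, int hashsize)
    {
        int increment = 1 + ((hash >> 5) + 1) % (hashsize - 1);
        return (int)(((long)hash + (long)k * increment) % hashsize);
    }

    static void Main()
    {
        int hashsize = 11;                                    // a prime number of slots
        int hash = "111-22-3333".GetHashCode() & 0x7FFFFFFF;  // keep the hash non-negative for this sketch

        // The probe sequence H1, H2, ..., H11 visits each of the 11 slots exactly once.
        for (int k = 1; k <= hashsize; k++)
            Console.Write(Hk(hash, k, hashsize) + " ");
    }
}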

Rehashing provides a better collision avoidance strategy than the linear probing and quadratic probing approaches introduced earlier.

The Hashtable class contains a private member variable, loadFactor, which specifies the maximum ratio between the number of elements in the hash table and the number of slots. For example, if loadFactor equals 0.5, at most half of the slots in the hash table may contain element values, while the other half remain empty.

The Hashtable constructor allows you to specify a loadFactor value between 0.1 and 1.0. However, no matter what value you provide, the effective ratio never exceeds 72%: even if you pass 1.0, the actual loadFactor of the Hashtable class is 0.72. Microsoft considers 0.72 the best balance between speed and space, so although the default loadFactor is 1.0, it is automatically scaled to 0.72 internally. It is therefore recommended to simply use the default value of 1.0 (which is really 0.72).

When you add a new element to the Hashtable, a check ensures that the ratio of elements to slots does not exceed this maximum ratio. If it does, the hash table is expanded, in two steps:

The number of slots in the hash table is approximately doubled; more precisely, the slot count increases from the current prime number to the next largest prime number.

Because every element's position in the table depends on the number of slots (hashsize appears in the rehashing formula), all values in the table must be rehashed after the expansion.

From this we can see that expanding the hash table comes at a performance cost. We should therefore estimate in advance how many elements the hash table is most likely to contain and construct it with appropriate values, to avoid unnecessary expansion.
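
For example, a minimal sketch of pre-sizing a Hashtable using the constructor that takes a capacity and a load factor (the numbers are just an example):

using System.Collections;

class PreSizedHashtable
{
    static void Main()
    {
        // Expecting roughly 1000 employees: reserve enough capacity up front
        // so the table does not have to expand (and rehash) while loading.
        // A loadFactor of 1.0 is scaled internally to the effective 0.72 described above.
        Hashtable employees = new Hashtable(1000, 1.0f);

        employees.Add("111-22-3333", "Scott");
        // ... add the remaining employees ...
    }
}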

Dictionary

The Hashtable class is a loosely typed data structure: developers can use any type as the Key or the Item. When .NET introduced generics, the type-safe Dictionary class appeared. Dictionary uses strong typing to constrain the Key and the Item, and when creating a Dictionary instance you must specify the types of both.

Dictionary<TKey, TValue> variableName = new Dictionary<TKey, TValue>();

If we continue to use the social security number and employee example described above, we can create an instance of Dictionary:

Dictionary<int, Employee> employeeData = new Dictionary<int, Employee>();

So we can add and delete employee information.

// Add some employees
employeeData.Add(455110189, new Employee("Scott Mitchell"));
employeeData.Add(455110191, new Employee("Jisun Lee"));

// See if employee with SSN 123-45-6789 works here
if (employeeData.ContainsKey(123456789))

Strong typing is not the only difference between Dictionary and Hashtable. Dictionary also uses a different collision resolution strategy (Collision Resolution Strategy): a technique called chaining.

With the probing technique used earlier, a collision is handled by trying the next position in the sequence; with rehashing, the hash is recomputed with another hash function. Chaining instead uses an additional data structure to handle collisions: each slot (bucket) in the Dictionary maps to a list of the elements that hash to it, and when a collision occurs the colliding element is added to that bucket's list.

The following diagram shows how each bucket in the Dictionary holds a linked list of the elements with the same hash.

In the image above, the Dictionary contains eight buckets, shown top to bottom with a yellow background, and a number of Employee objects have already been added. When a new Employee is added to the Dictionary, it goes into the bucket corresponding to the hash of its Key; if another Employee is already stored at that position, the new element is added to the front of that bucket's list.

Adding an element to a Dictionary involves hashing and linked-list operations, but it is still constant time, with a complexity of O(1).

For queries and deletions, the average time depends on the number of elements in the Dictionary and the number of buckets. Specifically, the running time is O(n/m), where n is the total number of elements and m is the number of buckets. But Dictionary is almost always implemented so that n is no greater than m, that is, the total number of elements never exceeds the total number of buckets, so O(n/m) is also a constant, O(1).
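
A minimal sketch of the chaining idea (a simplified illustration, not the actual Dictionary implementation):

using System;
using System.Collections.Generic;

class ChainedTable
{
    // Each bucket holds a linked list of the entries that hash to it.
    private readonly LinkedList<KeyValuePair<int, string>>[] buckets;

    public ChainedTable(int bucketCount)
    {
        buckets = new LinkedList<KeyValuePair<int, string>>[bucketCount];
    }

    private int BucketFor(int key) =>
        (key & 0x7FFFFFFF) % buckets.Length;

    public void Add(int key, string value)
    {
        int b = BucketFor(key);
        if (buckets[b] == null)
            buckets[b] = new LinkedList<KeyValuePair<int, string>>();
        // On a collision the new element is simply prepended to the bucket's list.
        buckets[b].AddFirst(new KeyValuePair<int, string>(key, value));
    }

    public bool TryGet(int key, out string value)
    {
        int b = BucketFor(key);
        if (buckets[b] != null)
            foreach (var pair in buckets[b])
                if (pair.Key == key) { value = pair.Value; return true; }
        value = null;
        return false;
    }

    static void Main()
    {
        var table = new ChainedTable(8);
        table.Add(455110189, "Scott Mitchell");
        table.Add(455110191, "Jisun Lee");

        if (table.TryGet(455110191, out string name))
            Console.WriteLine(name);   // Jisun Lee
    }
}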

This concludes our study of common web data structures and their complexity. I hope it has helped resolve your doubts. Combining theory with practice is the best way to learn, so go and try it out! If you want to keep learning more, please continue to follow the site; the editor will keep working to bring you more practical articles!
