Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to implement hash table by JavaScript

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article will explain in detail how to implement the hash table in JavaScript. The editor thinks it is very practical, so I share it for you as a reference. I hope you can get something after reading this article.

I. the principle of hash table

Hash table is a very important data structure, almost all programming languages have direct or indirect application of this data structure, it is usually based on arrays, at that time, it has more advantages over arrays:

It can provide very fast insert-delete-find operations.

The speed of the hash table is even faster than the number, and you can basically find the desired elements in an instant.

Hash tables are much easier to code than numbers.

But hash tables also have some disadvantages compared to arrays:

The arrays in the hash table are unordered, so you cannot traverse the elements in a fixed way (for example, from small to large).

In general, the key in the hash table does not allow repetition, and the same key cannot be placed to hold different elements.

So, what exactly is a hash table?

Its structure is an array, but the magic lies in a transformation of subscript values, which we can call a hash function, through which we can get the HashCode.

Next, we can look at an example:

Use a data structure to store word information, such as 50000 words, each word has its own titled application after finding the word, and so on. How should we operate?

Maybe we can try to convert the letters to the appropriate subscript. But how do you convert a character to the subscript value of an array? Is there a way to convert words to the subscript values of an array? If the word is converted to the subscript of the array, if we want to find information about a word later, we can directly follow the subscript value to access the desired element. So how do you convert a string to a subscript value?

In fact, computers have many coding schemes that use numbers instead of word characters, that is, character coding. Of course, we can design our own coding system, such as an is 1, b is 2, c is 3, and so on. But with the coding system, how can a word be converted into a number?

Here, we have two options:

Option 1: addition of numbers

A simple way to convert a word is to sum the coding of each character of the word.

For example, if the word cats is converted into a number: 3 "1" 20 "19" 43, then 43 is stored in the array as the subscript of the cats word.

But one obvious problem with this scheme is that many words may end up with a subscript of 43.

We know that only one data can be stored in a subscript location in an array, and if it is stored in later data, it will inevitably result in data overwriting, so it is obviously unreasonable for a subscript to store so many words.

Scheme 2: continuous multiplication of powers

In fact, we usually use numbers greater than 10, we can use the power of the multiplication to express its uniqueness, for example: 6543 = 6 * 10 ³+ 5 * 10 ²+ 4 * 10 + 4

Our words can also be expressed in this scheme, such as cats = 3 * 27 ³+ 1 * 27 ²+ 20 * 27 * 17 = 60337

The number obtained in this way can basically guarantee its uniqueness and will not repeat with other words.

But there is a problem if a word is zzzzzzzzzz. Then the number obtained is more than 7000000000000. Arrays may not be able to represent such a large subscript value, even if you can create such a large array, in fact, there are a lot of invalid words, which are meaningless.

Summary of the two scenarios:

The first scheme (adding and summing the numbers) produces too few array subscripts, and the second scheme (multiplying the power of 27) produces too many array subscripts.

So now we need a compression method to compress the large range of integers obtained in the power continuous multiplication scheme system into the range of acceptable arrays. A simple way is to use the remainder operator, whose function is to get the remainder of a number divided by another.

Implementation of the remainder operation: (a number between 0-199)

Suppose you compress the number from 0 to 199 (large) to the number from 0 to 9 (small).

Result of subscript index = large% small

When a number is divisible by 10, the remainder must be between 0 and 9

For example, 16 = 6, 23 = 3.

There will still be repetition, but the number of repetitions is significantly smaller. For example, if you take five numbers between 0 and 199 and put them in an array of length 10, they will also repeat, but the probability of repetition is very small.

Second, the concept of hash table

Now that we understand the principle of hashing, we can look at several concepts:

Hashing: the process of converting a large number to an array-wide subscript is called hashing

Hash function: for example, convert a word into a big number, and the code implementation of the big number in hashing is placed in a function, which is the hash function.

Hash table: finally, the data is inserted into this array, and the encapsulation of the entire structure can be called a hash table.

III. The problem of Hash conflict

As mentioned earlier, the hashed subscript values may still be repeated, which will lead to conflicts. For example, if we choose five numbers from 0 to 199 and put them in cells of length 10, if we randomly select 33recovery45pr 27re86pr 92, then eventually their position will be 3-5-7-6-2, without conflict, but what if there is an 86 and a 66 among them? At this point, a conflict will occur. How to solve this problem?

There are two common solutions:

1. Chain address method

Chain address method is a common conflict resolution method (also known as zipper method).

As shown in the following figure:

Now that you have created an array with a memory of 10, you need to store some numbers inside the array, which may be repeated after being hashed. The method of linking numbers with the same subscript value through a linked list or array is called chain addressing. When we want to find a value, we can first find the corresponding linked list or array according to its subscript, and then look inside it.

From the picture, we can see that the solution to the conflict is that what is stored in each array unit is no longer a single data but a chain, and the common structure of this chain is array or chain. Which way should be adopted in the specific application? In fact, both methods are fine, and the efficiency is about the same. of course, in some implementations, the newly inserted data is placed at the front of the array or linked list, in which case the linked list is best used. Because inserting data in the first place in an array requires all other items to move back, while linked lists do not have such a problem.

2. Open address method

The main way of open address method is to find blank cells to add duplicate data. As shown in the following figure:

If there is a number 32, and now we want to insert it into the array, our solution is:

The newly inserted 32 should have been inserted into the position of 52, but that location already contains data

It can be found that the positions of 3, 5 and 9 do not have any content.

At this time, you can find the corresponding blank space to put the data.

But there are three different ways to explore this position:

1. Linear exploration

That is, a linear search blank unit.

Insert 32

The index=2 is hashed, but at the time of insertion, it is found that the location is already 52.

At this point, start from the location of the index+1 to find the right location to place 32.

The first empty position detected is the position where the value is inserted.

Query 32

First, index= 2 is obtained by hashing. Compare whether the location result of 2 is the same as the query value. If the same is the same, it will be returned directly.

If it is different, look for it linearly, from the index location + 1 to find the same as 32.

It is important to note that if the position of 32 has not been inserted before, you do not need to query the entire hash table to determine whether the value exists, but stop if the query reaches an empty position. Because it is impossible to skip the empty position to other positions before 32.

There is also a problem with linear detection: if the previously inserted data is inserted continuously, the newly inserted data requires a long detection distance.

2. Second exploration

The second exploration is optimized on the basis of linear exploration.

Linear detection can be thought of as a detection with a step size of 1, for example, starting from the subscript value x, and then from x to 1, x to 2, and 3 in turn.

The second probe optimizes the step size, for example, starting with the subscript value x, and then detecting it in sequence: xcircle 1 ², xtrees 2 ², and xtrees 3 ².

However, there are still some problems in the secondary detection: for example, if we insert 32-112-82-42-52 continuously, then the step size is the same when they add up in turn, that is, in this case, it will cause a kind of aggregation with different step sizes, or it will affect the efficiency. How to solve this problem? Let's take a look at the Hashfa.

3. Re-hash method

The way to hash the keyword is to hash the keyword once with another hash function, and use the result of this hashing as the step size. For the specified keyword, the step size is the same throughout the detection. Different keywords use different step sizes.

The second hashing requires the following characteristics:

Different from the first hash function

Cannot output to 0 (otherwise, there will be no step size, each interjection will stand still, and the algorithm will enter an endless loop)

Computer experts have designed a hash function that works well.

StepSize = constant-(key% constant)

Where constant is a prime number and is less than the capacity of the array, and key is the value obtained by the first hashing.

For example: stepSize = 5-(key%5), the requirement is met, and the result cannot be 0.

Fourth, the realization of hash function

The main advantage of the hash table is its speed. One way to improve the speed is to have as few multiplication and division as possible in the hash function. The designed hash function needs to have the following advantages:

(1) Fast calculation

(2) uniform distribution

Let's make it happen:

First of all, the main operation of the hash function we implemented is to convert the character creation into larger numbers and compress the large numbers into the array range. As follows:

Function hashFunc (str,size) {/ / define the hashCode variable var hashCode = 0; / / according to the Horner algorithm, calculate the value of hashCode / / first convert the string to the numeric code for (var I = 0 * * I)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report