Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the MAP structure?

2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article introduces the relevant knowledge of "what is the structure of MAP". In the operation of actual cases, many people will encounter such a dilemma. Then let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

MAP structure

There are two main ways to implement Map: hash table and search tree (search tree). For example, hashMap in Java is based on a hash table, while Map in C++ is based on a balanced search binary tree, the red-black tree. The following is a comparison of the time complexity of different implementations.

As you can see, for element lookups, the average and worst efficiency of the binary search tree is O (log n), and the average efficiency of the hash table implementation is O (1), but at worst it can reach O (n), but if the hash table is well designed, the worst-case scenario will not happen (so readers don't want to know how Go designed Map). In addition, the key returned by the binary search tree is ordered, while the hash table is out of order.

Hash table

Because of the hash table-based (also known as hash table) implementation of map in Go, this article does not explore the map implementation of the search tree. The following is the description of map on Go's official blog.

One of the most useful data structures in computer science is the hash table. Many hash table implementations exist with varying properties, but in general they offer fast lookups, adds, and deletes. Go provides a built-in map type that implements a hash table.

To learn a hash table, you must first understand two concepts: a hash function and a hash conflict.

Hash function

Hash functions (often called hash functions) are functions that can be used to map arbitrary size data to fixed size values, including MD5, SHA series, and so on.

A well-designed hash function should include the following features:

Uniformity: a good hash function should be mapped as evenly as possible within its output range, that is, each hash value in the output range should be generated with roughly the same probability.

High efficiency: the hash efficiency is high, even if the input parameters are very long, the hash value can be calculated quickly.

Deterministic: the hash process must be deterministic, which means that it must always generate the same hash value for a given input value.

Avalanche effect: small changes in input values can also make huge changes in output values.

Irreversible: the original data cannot be derived from the output value of the hash function.

Hash conflict

Again, a hash function is a function that maps data of any size to a fixed size value. Then, we can foresee that even if the hash function is well designed, almost every input value can be mapped to a different hash value. However, when the input data is large enough to exceed the maximum number that a combination of fixed size values can express, conflicts will be inevitable! (see drawer principle)

Drawer principle: there are ten apples on the table. Put these ten apples in nine drawers. No matter how you put them, we will find at least one drawer with no less than two apples. The drawer principle is sometimes called the pigeon nest principle.

Map source code parsing

"you are also the protagonist of today.

Data structure?

It seems to be very similar to java, array linked list. It is useless to say more, nothing is clearer than the source code.

/ / A header for a Go map.type hmap struct {count int / / represents the number of elements in the hash table, and this is the field value returned when len (map) is called. Flags uint8 / / status flag, four state bit meanings are explained in the following constants. The logarithmic log_2 of B uint8 / / buckets (bucket) (the maximum number of hash table elements can reach the load factor * 2 ^ B) noverflow uint16 / / the approximate number of overflow buckets. Hash0 uint32 / / hash seed. Buckets unsafe.Pointer / / A pointer to the buckets array, which is 2 ^ B in size and nil if the number of elements is 0. Oldbuckets unsafe.Pointer / / if expansion occurs, oldbuckets is the pointer to the old buckets array, and the size of the old buckets array is 1 buckets 2 of the new buckets. In the non-expansion state, it is nil. Nevacuate uintptr / / indicates the progress of capacity expansion. A buckets smaller than this address indicates that the relocation has been completed. The extra * mapextra / / field is designed to optimize GC scanning. Used when neither key nor value contains pointers and both can be inline. Extra is a pointer to the mapextra type. }

And then there's a bmap structure that stores kv pairs in each bucket. Buckets is a pointer to the first address of the actual stored bucket array. The structure of bucket is as follows:

Type bmap struct {tophash [bucketCnt] uint8 / / an 8-length uint8 composition}

In fact, the above data structure is not the structure of golang runtime, the compiler will dynamically create a new structure for it at compile time, as follows:

Type bmap struct {topbits [8] uint8 keys [8] keytype values [8] valuetype pad uintptr overflow uintptr}

Bmap is what we often call a "bucket" structure. A maximum of 8 key are stored in each bucket. These key fall into the same bucket because the hash results are "the same" after hashing. In the bucket, the high 8 bits of the hash value calculated by key will determine where the key falls into the bucket (there are up to 8 positions in a bucket, ps: a type of calculation).

If it corresponds to a picture.

Use

Use golang, directly use make.

Make (map [K] V) make (map [K] V, len) / / shit, it was a mistake of me to know that map could be assigned the initial size today.

The make function is actually located by the compiler to call runtime.makemap (), and the main job is to initialize the fields of the hmap structure, such as calculating the size of B, setting the hash seed hash0, and so on.

/ / if the compiler thinks that map and the first bucket can be created directly on the stack, h and bucket may both be non-empty / / if h! = nil, then map can create / / if h.buckets! = nil directly in h, then the bucket pointed to by h can use func makemap (t * maptype, hint int, h * hmap) * hmap {/ / math.MulUintptr returns the product of hint and t.bucket.size as the first bucket of map. And determine whether the product overflows. Mem, overflow: = math.MulUintptr (uintptr (hint), t.bucket.size) / / maxAlloc, which varies depending on the platform system For more information, please see src/runtime/malloc.go if overflow | | mem > maxAlloc {hint = 0} / / initialize Hmap if h = = nil {h = new (hmap)} / / get the hash seed h.hash0 = fastrand () / / based on the number of elements entered by fastrand. Find the B value that can hold these elements B: = uint8 (0) for overLoadFactor (hint, B) {Bread +} h.B = B / / assign the initial hash table / / if B is 0, then the buckets field will then lazily assign if h.B! = 0 {var nextOverflow * bmap / / makeBucketArray to create an array of map's underlying buckets. It allocates at least the size of H.B ^ 2. H.buckets, nextOverflow = makeBucketArray (t, h. B, nil) if nextOverflow! = nil {h.extra = new (mapextra) h.extra.nextOverflow = nextOverflow}} return h}

Based on the above code, we can determine that under normal circumstances, the storage space of normal buckets and overflow buckets in memory is contiguous, only referenced by different fields in hmap.

Note that the result returned by this function is: * hmap is a pointer, while the makeslice function we talked about earlier returns a Slice structure object. This is also the difference between the return values of makemap and makeslice: when map and slice are function parameters, the operation on map within the function parameters will affect map itself, while it will not on slice (as mentioned in the previous article on slice).

The main reason: one is the pointer (* hmap), the other is the structure (slice). The function parameters in Go language are all value passes, and inside the function, the parameters are copy to local. * after the hmap pointer copy is finished, it still points to the same map, so the operation on map inside the function will affect the argument. After slice is copy, it will become a new slice, and the operation on it will not affect the argument.

Hash function

The algorithm of the hash function corresponds to the type of key one by one. Depending on the type of key, the alg field of the key field of the maptype structure is set to the hash and equal functions of the corresponding type.

When initializing the go program running environment (schedinit in src/runtime/proc.go), you need to initialize the hash through the alginit method.

Allocation of hash key buckets

For hashmap, the most important thing is to locate the actual storage location according to the key. After hashing, key gets the hash value, which is 64 bit bits (for 64-bit computers). Determine which bucket the key falls on based on the last B bit bit of the hash value. If B = 5, then the number of barrels, that is, the length of the buckets array, is 2 ^ 5 = 32.

Suppose, there is now a key calculated by the hash function, and the hash result is:

| 10010111 | 00001111011011001000111100101010001001011001010101010 │ 00110 |

Locate key, and if it is not found in bucket and overflow is not empty, continue to search in overflow bucket until you find or find all the key slots, including all overflow bucket. (here you need to traverse all the bucket of the bucket linked list of a slot in the bucket array)

Expand capacity

Using the hash value of key, you can quickly locate the target key. However, as more and more key are added to the map, the probability of key collision is also increasing. The eight cell in bucket will gradually fill up, and the efficiency of finding, inserting, and deleting key will become less and less. Ideally, only one key is installed in a bucket, so that the efficiency of O (1) can be achieved, but the space consumption is too large and the cost of exchanging space for time is too high.

Go language uses a bucket to load 8 key, after locating to a certain bucket, but also need to locate to the specific key, which actually takes time for space.

Of course, to do so, there must be a degree, otherwise all the key will fall into the same bucket and directly degenerate into a linked list, and the efficiency of various operations will not be reduced to O (n) directly.

Therefore, there needs to be an indicator to measure the situation described above, which is the loading factor. The load factor is defined in the Go source code as follows:

LoadFactor: = count / (2 ^ B)

Count is the number of elements of map, and 2 ^ B represents the number of bucket.

Let's talk about the timing of triggering map expansion: when a new key is inserted into map, a conditional test will be performed. If the following two conditions are met, the expansion will be triggered:

The loading factor exceeds the threshold, and the threshold defined in the source code is 6.5.

There are two situations where there are too many bucket in overflow:

When B is greater than 15:00, that is, when the total number of bucket is greater than 2 ^ 15, if the number of bucket of overflow is greater than 2 ^ 15, capacity expansion is triggered.

When B is less than 15:00, capacity expansion will also be triggered if the number of bucket of overflow is greater than 2 ^ B.

How to reduce the influence of map expansion

Because map capacity expansion consumes a lot of performance (bucket creation and replication), we need to minimize capacity expansion and allocate data as expected during initialization.

How sync.Map reduces the granularity of locks

From the map design, we can see that it is not a concurrency secure data structure. When reading and writing map at the same time, the program is easy to make mistakes. Therefore, if you want to use map in concurrency, add a lock (sync.Mutex or sync.RwMutex). In fact, concurrency security map--sync.Map has been implemented for us in the Go standard library.

Sync.Map introduction

Go 1.9 officially provides sync.Map to optimize thread-safe concurrent read and write map. The implementation is also based on the built-in map keyword.

This implementation is similar to a thread-safe map [interface {}] interface {}. The optimization of this map is mainly applicable to the following scenarios:

(1) the key-value pair of a given key is written only once, but read many times, such as in a growing cache; (2) when the key values of multiple goroutine reads, writes, and overwrites do not intersect.

In both cases, using Map may significantly reduce lock contention than Go Map, which uses separate mutexes or RWMutex.

For the rest of the case, it is best to use RWMutex to ensure thread safety.

/ / encapsulated thread-safe maptype Map struct {/ / lock mu Mutex / / is actually a readOnly structure / / a read-only data structure, because it is read-only, so there are no read-write conflicts. / / readOnly contains a portion of map data for concurrent secure access. (redundancy, memory for performance) / / access to this part does not require a lock. The read atomic.Value / / readOnly / / dirty data contains the entries contained in the current map, and it contains the latest entries (including the undeleted data in the read, although it is redundant, but it is very fast to upgrade the dirty field to read, instead of copying it one by one, but directly using this data structure as part of the read field), and some data may not be moved to the read field. / / the operation on dirty needs to be locked because there may be read-write competition for its operation. / / when dirty is empty, such as initialization or promotion, the next write operation will copy the undeleted data in the read field to this data. Dirty map [interface {}] * entry / / when reading the entry from Map, if the entry is not included in the read, it will try to read from the dirty. At this time, the misses will be incremented by one, and / / when the misses accumulates to the length of the dirty, the dirty will be promoted to read to avoid missfrom the dirty too many times. Because the operation of dirty needs to be locked. Misses int} / / readOnly is an immutable struct stored atomically in the Map.read field.type readOnly struct {m map [interface {}] * entry / / if some data of Map.dirty is not in m, the value is true amended bool} / An entry is a slot in the map corresponding to a particular key.type entry struct {/ * interface {} p unsafe.Pointer}

As you can see from the source code, this lock holds two map, which takes up extra space, but not much (typical space for time).

Read data flow

Write data flow

Data deletion process

Summarize the optimization point

Unlocked reading and separation of reading and writing

Write locking and delayed lifting

Pointers and lazy deletions, the value saved by map are pointers. Lazy deletion, the actual deletion is to check and then delete when Store.

The principle of implementation is suitable for scenarios where map+Mutex uses Mutex mutexes to realize serialized access to map by multiple goroutine. Both Mutex locking and releasing locks are suitable for scenarios where the read-write ratio is close. Map+RWMutex uses RWMutex to achieve read-write lock separation and locking for map reading and writing, so as to improve the read concurrency performance compared with Mutex. Compared with Mutex, sync.Map is suitable for reading more and writing less scenarios. The underlying layer separates read and write map and atomic instructions to achieve approximately lock-free reading. And by delaying the update to ensure that there are more unlocked reads and fewer modifications, elements are added and deleted less frequently, in most cases, instead of the above two ways to achieve "what is the MAP structure", this is the end of the introduction, thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report