Common implementation methods of MySQL dynamic hash structure 04/18 Update SLTechnology News&Howtos

Common implementation methods of MySQL dynamic hash structure

2025-04-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "the common implementation of MySQL dynamic hash structure". In daily operation, I believe that many people have doubts about the common implementation of MySQL dynamic hash structure. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the doubts about "common implementation methods of MySQL dynamic hash structure". Next, please follow the editor to study!

MySQL dynamic hash structure

1. Common ways of implementation

Some time ago, I have been studying the hash structure in mysql. I probably figured out the hash structure of this no empty slot. I read several articles about the hash structure in mysql. I found that many articles were not clear enough to explain the key points of this dynamic hash. I would like to write about the experience of reading this code of hash in mysql these days.

The hash structure in mysql is different from the general hash structure that uses linked lists to resolve conflicts. The hash structure for conflict resolution in linked lists is used in memcached,jdk. The most common hash structure is shown below:

This hash structure is very simple to implement. Assign an array of 2 ^ n size in advance, then take the remainder of the key-value pair 2 ^ n, and then put the data into the corresponding array subscript according to the remainder. If this position element happens to be occupied by other elements, then a single linked list is used to solve the key-value conflict. If the number of key values in the hash structure exceeds a certain threshold. It is considered that there are too many data elements in this structure, and after a high probability of hash, we must find the key value by traversing the conflict linked list, then increase the size of the hash array by a multiple of 2, and then rehash each element.

There is also a hash structure that pre-allocates an array. If the element conflicts, it does not take the linked list to resolve the conflict, but takes the next free location as the real storage address of the element.

In any case, the advantages of the above two hash structures are easy to understand, easy to implement, and the disadvantage is a waste of space. It can be found that the hash array is pre-allocated, then the space of the elements has been determined, for example: if according to the above structure, if there are never elements in the 4 position, then the space occupied by this position will always be wasted.

two。 Dynamic hash structure with no free space

The characteristic of the hash structure in mysql is that there is no wasted free space, and the array is allocated dynamically. At any time, the space opened by this array is always the same as the number of elements in the current hash structure.

The focus of the implementation is to calculate the hash value of an element and then use a formula to calculate the mask to find the location of the element's real hash array. In the previous two hash structures, this formula is generally: hash mod 2 ^ n, but the formula for calculating the mask of this dynamic hash structure is:

Code:

182 static uint my_hash_mask (my_hash_value_type hashnr, size_t buffmax

183 size_t maxlength)

184 {

185 if ((hashnr & (buffmax-1)

< maxlength) return (hashnr & (buffmax-1)); 186 return (hashnr & ((buffmax >

> 1)-1))

187}

Where hashnr is the hash value represented by a string, buffmax is 2 ^ n, maxlength is the number of records in the current array (it is the length of the current array, allocated space), and you can see that maxlength is between buffmax/2 and buffmax through the code. The above code means that the mask is calculated according to the original way of taking the remainder, and the remainder is divided by hashnr by buffmax. There will be a situation here, that is, it is possible that the key value of the remainder will be greater than the current number equal to the current number of record. According to the original way, as long as the remainder is calculated from the buffmax, then the range from the corresponding hash array is [0MaxbuffMax1], which is all allocated memory. However, in the dynamic hash structure, arrays that exceed records are not allocated, that is, no memory is allocated from (records,buffmax-1]. The size of the array is records, not buffmax. Here, the method is to divide buffmax by 2, and then get the remainder to get the corresponding hash key value.

There is another mystery why we divide by 2, why not divide by 3, we can know that the difference between a number divided by an and this number divided by 2 prime an is 2max a, which is inevitable, for example, 39 picks 8 prime 7, then the difference between 39 pedigree 3, 7 and 3 is 4, that is 4 8 hash 2, so it shows that if the hash value is the remainder of buffmax, if it is greater than or equal to records, then it will be halved and then take the remainder. The difference between this remainder and the real remainder is buffmax/2.

You can see that when the remainder of the dynamic hash table is greater than or equal to records, a compromise is chosen, that is, the hash value is obtained by buffmax/2 to obtain a temporary hash mask.

In this dynamic hash table, records is added by 1 for each element inserted, and if records==buffmax is used, the buffmax is tripled again. We will find a problem with this rule. Suppose that last time we successfully inserted the element and the mask fell within [0RecordsMur1], then records will be added 1 because of the successful insertion of the new element. At this time, if the original mask calculated by buffmax is larger than records, it falls within [0RecordsMask], and now records has increased one bit, so there is a problem with the location where the previous record was stored. He should be in the current records position.

So the characteristic of this dynamic hash structure is that before inserting a new element, try to restore the location element that should belong to the current newly opened array, then the subscript calculation method of the element belonging to this new place:

386 halfbuff= info- > blength > > 1

three hundred and eighty seven

388 idx=first_index=info- > records-halfbuff

The meaning of this code is to divide blength (records is between [blength/2,blength]) by 2, and then the current records minus halfbuff, which can calculate the subscript of the element that should belong to the current position in the previous step. This process is the inverse process of finding the remainder mentioned earlier, for example: if the current records=5 blength=8, then if the hash value of an element is 13, then by finding the mask, the remainder is 5, but at this time records=5 So by that compromise, 130.4x1, then 1 is the location of the final mask. Well, after inserting data in the previous step, records=6, then the original insert data 13 in the previous step should be a mask of 5. Then the element in the position of 1 in the current records=6,6-5 mask 1 is the element in which the mask is 5 in the previous step. Due to the limitation of records, it is placed in the position of 1, so it is necessary to re-enlarge the position in which the mask is 5.

How to know whether the current hash value is placed in the position of 1 due to the limitation of records, or whether the location 1 that should belong to him is obtained by calculating the mask directly. The result can be obtained through the bit operator &

398 if (! (hash_nr & halfbuff))

This code means to determine whether the hash value belongs to a low bit (it should belong to a low bit or is restricted by records, which is bounded by records), or whether the hash value of 13 belongs to the high bit, not the position with the original mask 1, or whether the low bit is bounded by records), then the hash value of 13 belongs to the position where the mask is 1, and 9 belongs to the position where the mask is 1.

Through the above analysis, the dynamic hash structure assigns the location of an element each time a new element is inserted. The first step is to move the element that was placed in the lower position in the previous step and restore it to its original position. You only need to deal with the element in the records-halfbuff position, because it has been processed before, and any processing larger than this element cannot be moved, because the records is not large enough to move the data with these large masks. Draw a picture to explain the knowledge mentioned earlier.

As shown in the figure, the hash structure has stored five elements, now records=5,blength=8, the blue space (index= [0jin4]) represents the allocated space, each allocated space is occupied by data, there is no free space, the green space of index=5 is newly allocated, used to insert new data, red space, index= [6prime7] does not exist, it is drawn in order to show blength. So before inserting new data, because the newly allocated space may originally belong to the element whose hash mask is 5, you need to find this element before inserting the new element, put it in the place of 5, and insert the new element as an unused place. To know that the element that originally belongs to index=5 must be hash to the place of index=1 (because the remainder of blength=8 is 5, then the remainder of 4 must be 1), then see what the hash value of the element of index=1 is. If it is 1, then the element of index=1 does not have to move, otherwise the element of index=1 is adjusted to the position of 5.

In other words, this dynamic hash structure adjusts the original structure before inserting an element, and moves the elements that were originally inserted into other index back to its original index. This is the essence of the dynamic hash structure.

3. Movement of elements

According to the above analysis, before inserting new data, we need to adjust the position of the existing elements, and the goal is to restore the linked list of high-order elements while correcting the linked list of low-order elements, because the current linked list is a low-level linked list, which contains high-order elements. To restore the starting point to the high-order linked list of records elements, the starting point of the current linked list is the records-blength/2 element, if the hash mask of this element is equal to records. Then it means that this element belongs to index=records, then you need to move this element to this location, and the movement of this element will lead to confusion of the next pointer of other nodes (because the position of this element has moved), so the purpose of element movement is to restore the high-order element to its original position, and to restore the next pointer of both low-order and high-order elements.

Mobile element logic reference source code, need to distinguish between low and high,low indicates that there is an element that already belongs to the current hash mask, high indicates that this element does not belong to the current hash mask value, the real mask value is plus blength/2, in the case of the same hash mask, several important representation bits, I say the meaning of my understanding: (there may be deviation)

LowFind: indicates that an element belongs to the current hash mask

LowUsed: whether the low-order element still occupies the old free position (false represents the previous low-order element occupies the new free position, and true represents the old position)

HighFind: indicates that there is an element that does not belong to the current hash mask, equal to the current mask + blength/2

HighUsed: whether the high-order element occupies the old free position (false represents the previous high-order element occupies the new free position, and true represents the old position)

Important variables:

Empty: represents the idle index in the hash structure

Gpos: represents the index of the previous element that belongs to the low order (current mask)

Ptr_to_rec: represents the data of the previous element that belongs to the current mask

Gpos2: represents the index of the previous element that belongs to the current mask + blength/2

Ptr_to_rec2: represents the data of the previous element that belongs to the high order (current mask + blength/2)

For the movement of an element, it starts from the element of the current records-blength/2 and starts to adjust the position of the element with the same hash mask (see the previous explanation, because the element belonging to the current position must be records-blength/2 because the element belonging to the current position is recalculated according to 2/blength). Generally speaking, there are two cases, one is that the element of the current position is recalculated according to the new records, the hash mask still belongs to the original mask. It is considered that this is a low bit, and the other is that the element of the current position re-calculates that the hash mask belongs to the records position according to records, and that this is a high bit.

The adjustment of the element position and the change code of the next pointer:

385 data=dynamic_element (& info- > array,0,HASH_LINK*); / / calculate the position of the first element of the hash array

386 halfbuff= info- > blength > > 1; / / in order to get the index where the element may belong to the records location

three hundred and eighty seven

388 idx=first_index=info- > records-halfbuff; / / minus halfbuff to get the index that may belong to records position

/ / if you are not satisfied, you need to recover the data on the misplaced index.

389 if (idx! = info- > records) / * If some records * /

390 {

391 do

392 {

393 pos=data+idx; / / get the first index, low bit

394 hash_nr=rec_hashnr (info,pos- > data); / / recalculate the hash value

395 if (flag = = 0) / * First loop; Check if ok * /

396 if (my_hash_mask (hash_nr, info- > blength, info- > records)! = first_index)

397 break

398 if (! (hash_nr & halfbuff)) / / do and operate to determine whether it really belongs to the records position according to the hash value

399 {/ * Key will not move * /

/ / if the result of the operation is equal to 0, it means that the element of the hash value belongs to the current position. Enter case1.

400 if (! (flag & LOWFIND)) / / if the element belongs to a low level and does not appear, enter case1-a

401 {

402 if (flag & HIGHFIND)

/ / if the element belongs to the low position, but the element belonging to the high position has been found, then the current element must belong to the low position due to the position conflict, but is squeezed to another location.

403 {/ / enter case1-a-1

404 flag=LOWFIND | HIGHFIND

405 / * key shall be moved to the current empty position * /

406 gpos=empty; / / the current location pos becomes empty

407 ptr_to_rec=pos- > data

408 empty=pos; / * This place is now free * /

409}

410 else

411 {/ / enter case1-a-2

412 flag=LOWFIND | LOWUSED; / * key isn't changed * /

413 gpos=pos

414 ptr_to_rec=pos- > data

415}

416}

417 else

418 {/ / enter case1-b

419 if (! (flag & LOWUSED))

420 {

421 / * Change link of previous LOW-key * /

422 gpos- > data=ptr_to_rec

423 gpos- > next= (uint) (pos-data)

424 flag= (flag & HIGHFIND) | (LOWFIND | LOWUSED)

425}

426 gpos=pos

427 ptr_to_rec=pos- > data

428}

429}

430 else / / enter case2

431 {/ * key will be moved * /

432 if (! (flag & HIGHFIND)) / / enter case2-a

433 {

434 flag= (flag & LOWFIND) | HIGHFIND

Key shall be moved to the last (empty) position * /

436 gpos2 = empty; empty=pos

437 ptr_to_rec2=pos- > data

438}

439 else

440 {/ / enter case2-b

441 if (! (flag & HIGHUSED))

442 {

443 / * Change link of previous hash-key and save * /

444 gpos2- > data=ptr_to_rec2

445 gpos2- > next= (uint) (pos-data)

446 flag= (flag & LOWFIND) | (HIGHFIND | HIGHUSED)

447}

448 gpos2=pos

449 ptr_to_rec2=pos- > data

450}

451}

452}

/ / Recurse to the last element of the hash mask conflict linked list

453while ((idx=pos- > next)! = NO_RECORD)

four hundred and fifty four

455 if ((flag & (LOWFIND | LOWUSED)) = = LOWFIND)

456 {

/ / if there is no LowUsed, the last element of the current linked list is not in the original position, so set the next pointer to null

457 gpos- > data=ptr_to_rec

458 gpos- > next=NO_RECORD

459}

460 if ((flag & (HIGHFIND | HIGHUSED)) = = HIGHFIND)

461 {

462 gpos2- > data=ptr_to_rec2

463 gpos2- > next=NO_RECORD

464}

Note: note that the movement of the element is only the movement of the data,next pointer.

Case1: the hash mask of the current element is low. Theoretically, this part of the element should not be moved, but if the key value conflicts with the element, it should be moved to its original location, and the next pointer should be updated.

1Mura: the same hash mask has not appeared in the low bit yet.

1-a-1: it does not appear in the low order, but it appears in the high position, so the element that appears in the high position must restore the high element to the records position. At this time, you just need to restore the element to the free position (the position that the high element gives way) and mark the current position as empty.

1-a-2: means that the low bit does not appear and the high bit does not appear, so the current element maintains its current position, and the low element maintains the position of the original hash mask.

1murb: indicates that the previous low-order element of the current element occupies a new free position, when the next pointer of the new free position is still the same, and the next pointer of the previous low-order element is needed to point to the low-order element of the current position.

For example: if the current element is low, the previous element belongs to low, and the previous element has changed the position of the low element due to the high-order element giving way to the position (because the high-low-order mask is the same, the low-order is squeezed to the position of other masks), then the pointer of the next of the low-order element needs to be reconstructed.

Suppose that element b and element a have the same hash mask, both belong to the low position, the position of element a does not belong to the position of the current mask and needs to be adjusted to the green position, at the same time, the original position of a becomes the free position, and the position a needs to reset the next pointer of a to b

There is a detail in the code:

Flag= (flag & HIGHFIND) | (LOWFIND | LOWUSED)

This code means that LOWFIND | LOWUSED is changed in flag, and the status HIGHUSED is removed, indicating that the previous element is not a high-level element. This is because after setting the next pointer of the current element b and the previous element a (a-> next=b), if the next element c is high, then according to the original logic b-> next=c, then the next linked list will have an error, so set HIGHUSED to false at this time This causes the next pointer from the previous high-order element to the current element c to be reset when recursive to the c element.

Case2: indicates that the current element is a high-order element

2mura: if there has not been a high-order element before, then put the current element into an idle position, if it is not the first high-order element, it does not need to be moved, because the linked list of the latter element is complete, and the next pointer from the second element to the later high-order element is correct, but the next pointer from the first element to the second high-order element is incorrect, because the first high-order element is moved to the new empty position.

2Mub: this situation is similar to 1murb, where the next pointer of the first high-order element is set to the second element, and the subsequent next pointer is correct. Never mind, this situation will cause the latter element to adjust the next pointer of the previous low-order element to the next low-order element if it is a low-order element, so the expression flag= (flag & LOWFIND) | (HIGHFIND | HIGHUSED) masks the state LOWUSED.

4. Insertion of new elements

The number of records of the hash structure increased by 1, causing the hash structure to reposition elements whose masks are equal to records and records-blength/2 (high and low elements). This is a new element and can be inserted. Calculate the mask of the new element and find the corresponding position. if that position is the same as the position of the empty pointer, then the element is the first element of the mask and can be inserted directly. If it is not equal to the position of the empty pointer, then there is an element that belongs to the new element, which may be an element with the same hash mask or an element with a different mask. If the mask is the same, the current element is placed in the position of empty, while the next pointer of the element in the original position points to the position of empty. If the mask is different, place the current element in the empty location, and link the last link in the list equal to the element mask to the new element, which requires finding the last element of the mask.

Code:

/ / calculate the mask of this records

468 idx= my_hash_mask (rec_hashnr (info, record), info- > blength, info- > records + 1)

469 pos=data+idx

470 if (pos = = empty) / / if the location of the mask is free, insert it directly

471 {

472 pos- > data= (uchar*) record

473 pos- > next=NO_RECORD

474}

475 else

476 {

477 / * Check if more records in same hash-nr family * /

478 empty [0] = pos [0]

/ / calculate the mask of the pos location element

479 gpos= data + my_hash_rec_mask (info, pos, info- > blength, info- > records + 1)

480 if (pos = = gpos) / / if equal, insert the element directly into the free position and point the next pointer of the pos position to the new element

481 {

482 pos- > data= (uchar*) record

Pos- > next= (uint) (empty-data)

484}

485 else

/ / if the mask is different, find the last element whose mask is equal to gpos, and point the next pointer of the last element to the new element

486 {

487 pos- > data= (uchar*) record

488 pos- > next=NO_RECORD

Movelink (data, (uint) (pos-data), (uint) (gpos-data), (uint) (empty-data))

490}

491}

492 if (+ + info- > records = = info- > blength)

493 info- > blength+= info- > blength

494 return (0)

At this point, the study of "the common implementation of MySQL dynamic hash structure" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.