
Analyzing Database Implementation Principles: Hash Join

2025-03-26 Update From: SLTechnology News&Howtos


This article looks at one piece of how a database is implemented: the hash join. It describes the two variants of the algorithm (in memory and on disk) and then walks through a small C program that simulates both.

Hash join: if there is enough memory, first scan the inner table to build a hash table on the join key, then scan the outer table and compute the hash code of each row's join key. If that hash code has a bucket in the hash table, walk the bucket's linked list of entries with the same hash code and return every entry whose join-key value equals the outer row's.
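The build and probe phases described above can be sketched with a chained hash table. This is only an illustrative sketch, not the article's program: it joins two in-memory integer arrays instead of files, and the names NBUCKETS, Node, bucket_of, build, probe and free_table are assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 128  /* assumed bucket count, chosen for illustration */

/* One entry in a bucket's linked list. */
typedef struct Node {
    int key;
    struct Node *next;
} Node;

/* Map a join key to a bucket index. */
static unsigned bucket_of(int key)
{
    return (unsigned)key % NBUCKETS;
}

/* Build phase: insert every inner key at the head of its bucket's list.
 * The table must start out with all buckets NULL. Returns 0 on allocation failure. */
static int build(Node *table[NBUCKETS], const int *inner, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        Node *node = malloc(sizeof *node);
        if (node == NULL)
            return 0;
        unsigned b = bucket_of(inner[i]);
        node->key = inner[i];
        node->next = table[b];
        table[b] = node;
    }
    return 1;
}

/* Probe phase: for each outer key, walk the matching bucket and record
 * every equal inner key. Returns the number of matches found; at most
 * cap of them are stored in out. */
static size_t probe(Node *const table[NBUCKETS], const int *outer, size_t n,
                    int *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        for (const Node *p = table[bucket_of(outer[i])]; p != NULL; p = p->next) {
            if (p->key == outer[i]) {
                if (found < cap)
                    out[found] = outer[i];
                found++;
            }
        }
    }
    return found;
}

/* Release every list node and reset the buckets. */
static void free_table(Node *table[NBUCKETS])
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        while (table[b] != NULL) {
            Node *next = table[b]->next;
            free(table[b]);
            table[b] = next;
        }
    }
}
```

The equality check inside the bucket walk matters: two different keys can share a bucket, so matching hash codes alone do not prove that the join keys are equal.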

If there is not enough memory, scan both tables and use the same hash function to split each of them into N hash "partitions". Then, for each partition of the inner table, scan it together with the corresponding partition of the outer table and return every row whose join-key values match. Because both tables are split with the same function, equal keys always land in partitions with the same number, so only same-numbered partitions need to be compared.
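A minimal sketch of the partitioned variant, under the assumption that small fixed-size arrays stand in for the partition files. The names NPART, PART_CAP, Partitions, part_of, split and join_partitions are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NPART 4      /* assumed number of partitions */
#define PART_CAP 64  /* assumed capacity of one partition */

/* In-memory stand-in for the per-partition files. */
typedef struct {
    int vals[NPART][PART_CAP];
    size_t len[NPART];
} Partitions;

/* The same function must be used for both tables. */
static unsigned part_of(int key)
{
    return (unsigned)key % NPART;
}

/* Split a table into partitions. Returns 0 if a partition overflows. */
static int split(Partitions *p, const int *vals, size_t n)
{
    assert(p != NULL);
    for (size_t k = 0; k < NPART; k++)
        p->len[k] = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned k = part_of(vals[i]);
        if (p->len[k] == PART_CAP)
            return 0;
        p->vals[k][p->len[k]++] = vals[i];
    }
    return 1;
}

/* Join partition k of the outer table only with partition k of the inner
 * table. Returns the number of matches; at most cap are stored in out. */
static size_t join_partitions(const Partitions *outer, const Partitions *inner,
                              int *out, size_t cap)
{
    size_t found = 0;
    for (size_t k = 0; k < NPART; k++) {
        for (size_t i = 0; i < outer->len[k]; i++) {
            for (size_t j = 0; j < inner->len[k]; j++) {
                if (outer->vals[k][i] == inner->vals[k][j]) {
                    if (found < cap)
                        out[found] = outer->vals[k][i];
                    found++;
                }
            }
        }
    }
    return found;
}
```

Matches come out grouped by partition number rather than in outer-table order, which is also why the disk run in the article lists its results in a different order from the memory run.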

The details are in the code comments.

    #include <stdio.h>
    #include <string.h>
    #include "hash_join.h"  /* expected to provide HASH_BUCKET, MAX_BYTES, read_int, write_int */

    #define MAX_ELEMENTS 1024

    // generate hash code
    static int generate_hashcode(int n)
    {
        return n % HASH_BUCKET;
    }

    // generate hash buckets (written to files; each file simulates one bucket)
    static int generate_bucket(FILE *file, char *tag)
    {
        printf("- generate_bucket -\n");
        char buf[MAX_BYTES];
        FILE *fd = NULL;
        for (; !feof(file);) {
            int x = read_int(file, buf);
            if (x == 0)
                break;
            int hashcode = generate_hashcode(x);
            // sized with MAX_BYTES: the full path does not fit in 30 bytes
            char filename[MAX_BYTES];
            snprintf(filename, sizeof filename, "/cygdrive/d/tmp/hash/%s_%d.csv", tag, hashcode);
            fd = fopen(filename, "a");
            if (fd == NULL) {
                printf("Can not open file %s.\n", filename);
                return 0;
            }
            // write the value to its bucket file
            write_int(fd, x);
            fclose(fd);
        }
        return 1;
    }

    // load the hash table into memory; suitable when there is enough memory
    // a two-dimensional array simulates the hash table: D1 = hash bucket, D2 = data in the bucket
    static int load_hashtable(int ht[][MAX_ELEMENTS])
    {
        printf("- load_hashtable -\n");
        for (int i = 0; i < HASH_BUCKET; i++) {
            // loop over the bucket numbers and read each bucket file
            char filename[MAX_BYTES];
            snprintf(filename, sizeof filename, "/cygdrive/d/tmp/hash/inner_%d.csv", i);
            FILE *fd = fopen(filename, "r");
            if (fd == NULL)
                continue;  // no file means no value hashed to this bucket
            int j = 0;
            char buf[MAX_BYTES];
            for (; !feof(fd) && j < MAX_ELEMENTS;) {
                // put the contents of the file into the array
                int x = read_int(fd, buf);
                ht[i][j++] = x;
            }
            fclose(fd);
        }
        return 1;
    }

    // build the hash table in memory and run the hash join
    static void hash_join_onmemory(FILE *outerfile, FILE *innerfile)
    {
        printf("- hash_join_onmemory -\n");
        // static keeps this large array off the stack; it is cleared so that unused slots read as 0
        static int ht[HASH_BUCKET][MAX_ELEMENTS];
        memset(ht, 0, sizeof ht);
        char buffer[MAX_BYTES];
        int flag = 0;
        // create the hash bucket files
        flag = generate_bucket(innerfile, "inner");
        if (!flag) {
            printf("Can not generate bucket file!\n");
            return;
        }
        // load them into the hash table (simulated by the 2D array)
        flag = load_hashtable(ht);
        if (!flag) {
            printf("Can not load hashtable!\n");
            return;
        }
        // scan the second file and execute the JOIN
        for (; !feof(outerfile);) {
            int outer = read_int(outerfile, buffer);
            if (outer == 0)
                continue;  // 0 marks "no value", so it must not match empty slots
            // calculate the hash code
            int hashcode = generate_hashcode(outer);
            for (int i = 0; i < MAX_ELEMENTS; i++) {
                // scan the data in the hash bucket to find matching values
                if (ht[hashcode][i] == outer) {
                    printf("Found one,hash bucket is %d,value is: %d.\n", hashcode, outer);
                }
            }
        }
    }

    // use the disk (bucket files) for the hash join
    static void hash_join_ondisk(FILE *outerfile, FILE *innerfile)
    {
        printf("- hash_join_ondisk -\n");
        char buffer[MAX_BYTES];
        int flag = 0;
        // create the hash "bucket" files for both inputs
        flag = generate_bucket(innerfile, "inner");
        if (!flag) {
            printf("Can not generate inner bucket file!\n");
            return;
        }
        flag = generate_bucket(outerfile, "outer");
        if (!flag) {
            printf("Can not generate outer bucket file!\n");
            return;
        }
        // scan the files with the same hash value and execute the join, starting from bucket 0
        for (int i = 0; i < HASH_BUCKET; i++) {
            char innerfname[MAX_BYTES];
            char outerfname[MAX_BYTES];
            snprintf(innerfname, sizeof innerfname, "/cygdrive/d/tmp/hash/%s_%d.csv", "inner", i);
            snprintf(outerfname, sizeof outerfname, "/cygdrive/d/tmp/hash/%s_%d.csv", "outer", i);
            FILE *fd_inner = fopen(innerfname, "r");
            if (fd_inner == NULL)
                continue;
            FILE *fd_outer = fopen(outerfname, "r");
            if (fd_outer == NULL) {
                fclose(fd_inner);
                continue;
            }
            for (; !feof(fd_outer);) {
                int v_out = read_int(fd_outer, buffer);
                if (v_out == 0)
                    continue;
                for (; !feof(fd_inner);) {
                    int v_in = read_int(fd_inner, buffer);
                    if (v_in == 0)
                        continue;
                    if (v_out == v_in) {
                        printf("Found one,hash bucket is %d,value is: %d.\n", i, v_out);
                    }
                }
                // start the inner bucket from the top for the next outer value
                rewind(fd_inner);
            }
            fclose(fd_outer);
            fclose(fd_inner);
        }
    }

    // execute the hash join
    void hash_join(char *file1, char *file2, char *flag)
    {
        printf("- hash join -\n");
        // open the first file
        FILE *outerfile = fopen(file1, "r");
        if (outerfile == NULL) {
            printf("Can not open file %s.\n", file1);
            return;
        }
        // open the second file
        FILE *innerfile = fopen(file2, "r");
        if (innerfile == NULL) {
            printf("Can not open file %s.\n", file2);
            fclose(outerfile);
            return;
        }
        // execute the JOIN
        if (strcmp(flag, "memory") == 0)
            hash_join_onmemory(outerfile, innerfile);
        else
            hash_join_ondisk(outerfile, innerfile);
        // close
        fclose(outerfile);
        fclose(innerfile);
    }
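The listing calls read_int and write_int, which come from hash_join.h and are not shown in the article. The following is a hypothetical sketch of what such helpers might look like, assuming one decimal integer per line and 0 as the "nothing read" value (the listing treats a 0 return as the end of the data). The MAX_BYTES value here is also an assumption; the real one is defined in hash_join.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_BYTES 64  /* assumed line-buffer size; the real value lives in hash_join.h */

/* Read one integer from the next line of the file.
 * Returns 0 at end of file or when the line holds no number. */
static int read_int(FILE *file, char *buf)
{
    assert(file != NULL && buf != NULL);
    if (fgets(buf, MAX_BYTES, file) == NULL)
        return 0;
    return (int)strtol(buf, NULL, 10);
}

/* Append one integer to the file as its own line. */
static void write_int(FILE *file, int x)
{
    assert(file != NULL);
    fprintf(file, "%d\n", x);
}
```

With helpers of this shape, a loop of the form `for (; !feof(file);)` performs one extra read after the last line, because feof only becomes true once a read has already failed. That extra read returns 0, which is why the join code has to treat 0 as "no value" rather than as data.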

Running output

    $ cat file1.csv
    1234512342939900220
    $ cat file2.csv
    1120340555023433901
    $ /cygdrive/d/tmp/test.exe file1.csv file2.csv
    - use memory -
    - hash join -
    - hash_join_onmemory -
    - generate_bucket -
    - load_hashtable -
    Found one,hash bucket is 1,value is: 1.
    Found one,hash bucket is 3,value is: 3.
    Found one,hash bucket is 1,value is: 1.
    Found one,hash bucket is 106,value is: 234.
    Found one,hash bucket is 20,value is: 20.
    - use disk -
    - hash join -
    - hash_join_ondisk -
    - generate_bucket -
    - generate_bucket -
    Found one,hash bucket is 1,value is: 1.
    Found one,hash bucket is 1,value is: 1.
    Found one,hash bucket is 3,value is: 3.
    Found one,hash bucket is 20,value is: 20.
    Found one,hash bucket is 106,value is: 234.

This concludes the analysis of how a database implements the hash join. Thank you for reading.
