Shulou (Shulou.com), 06/01. Updated 2025-01-15.
This section briefly walks through how PostgreSQL writes tuples out to temporary files (in a temporary tablespace) while executing a NOT IN query.
The test data are as follows:
[local]:5432 pg12@testdb=# select count(*) from tbl;
 count
-------
     1
(1 row)

Time: 6.009 ms
[local]:5432 pg12@testdb=# select count(*) from t_big_null;
  count
----------
 10000001
(1 row)

[local]:5432 pg12@testdb=#

I. Data structures
Tuplestorestate
Private state of a tuplestore operation.
/*
 * Possible states of a Tuplestore object.  These denote the states that
 * persist between calls of Tuplestore routines.
 */
typedef enum
{
    TSS_INMEM,                  /* Tuples still fit in memory */
    TSS_WRITEFILE,              /* Writing to temp file */
    TSS_READFILE                /* Reading from temp file */
} TupStoreStatus;

/*
 * Private state of a Tuplestore operation.
 */
struct Tuplestorestate
{
    TupStoreStatus status;      /* enumerated value as shown above */
    int         eflags;         /* capability flags (OR of pointers' flags) */
    bool        backward;       /* store extra length words in file? */
    bool        interXact;      /* keep open through transactions? */
    bool        truncated;      /* tuplestore_trim has removed tuples? */
    int64       availMem;       /* remaining memory available, in bytes */
    int64       allowedMem;     /* total memory allowed, in bytes */
    int64       tuples;         /* number of tuples added */
    BufFile    *myfile;         /* underlying file, or NULL if none */
    MemoryContext context;      /* memory context for holding tuples */
    ResourceOwner resowner;     /* resowner for holding temp files */

    /*
     * These function pointers decouple the routines that must know what kind
     * of tuple we are handling from the routines that don't need to know it.
     * They are set up by the tuplestore_begin_xxx routines.
     *
     * (Although tuplestore.c currently only supports heap tuples, I've copied
     * this part of tuplesort.c so that extension to other kinds of objects
     * will be easy if it's ever needed.)
     *
     * Function to copy a supplied input tuple into palloc'd space. (NB: we
     * assume that a single pfree() is enough to release the tuple later, so
     * the representation must be "flat" in one palloc chunk.) state->availMem
     * must be decreased by the amount of space used.
     */
    void       *(*copytup) (Tuplestorestate *state, void *tup);

    /*
     * Function to write a stored tuple onto tape.  The representation of the
     * tuple on tape need not be the same as it is in memory; requirements on
     * the tape representation are given below.  After writing the tuple,
     * pfree() it, and increase state->availMem by the amount of memory space
     * thereby released.
     */
    void        (*writetup) (Tuplestorestate *state, void *tup);

    /*
     * Function to read a stored tuple from tape back into memory. 'len' is
     * the already-read length of the stored tuple.  Create and return a
     * palloc'd copy, and decrease state->availMem by the amount of memory
     * space consumed.
     */
    void       *(*readtup) (Tuplestorestate *state, unsigned int len);

    /*
     * This array holds pointers to tuples in memory if we are in state INMEM.
     * In states WRITEFILE and READFILE it's not used.
     *
     * When memtupdeleted > 0, the first memtupdeleted pointers are already
     * released due to a tuplestore_trim() operation, but we haven't expended
     * the effort to slide the remaining pointers down.  These unused pointers
     * are set to NULL to catch any invalid accesses.  Note that memtupcount
     * includes the deleted pointers.
     */
    void      **memtuples;      /* array of pointers to palloc'd tuples */
    int         memtupdeleted;  /* the first N slots are currently unused */
    int         memtupcount;    /* number of tuples currently present */
    int         memtupsize;     /* allocated length of memtuples array */
    bool        growmemtuples;  /* memtuples' growth still underway? */

    /*
     * These variables are used to keep track of the current positions.
     *
     * In state WRITEFILE, the current file seek position is the write point;
     * in state READFILE, the write position is remembered in writepos_xxx.
     * (The write position is the same as EOF, but since BufFileSeek doesn't
     * currently implement SEEK_END, we have to remember it explicitly.)
     */
    TSReadPointer *readptrs;    /* array of read pointers */
    int         activeptr;      /* index of the active read pointer */
    int         readptrcount;   /* number of pointers currently valid */
    int         readptrsize;    /* allocated length of readptrs array */

    int         writepos_file;  /* file# (valid if READFILE state) */
    off_t       writepos_offset;    /* offset (valid if READFILE state) */
};

#define COPYTUP(state,tup)  ((*(state)->copytup) (state, tup))
#define WRITETUP(state,tup) ((*(state)->writetup) (state, tup))
#define READTUP(state,len)  ((*(state)->readtup) (state, len))
#define LACKMEM(state)      ((state)->availMem < 0)
#define USEMEM(state,amt)   ((state)->availMem -= (amt))
#define FREEMEM(state,amt)  ((state)->availMem += (amt))
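The USEMEM, FREEMEM and LACKMEM macros amount to simple budget bookkeeping on availMem. The following is a minimal standalone sketch of that idea, not PostgreSQL code: the MemBudget type and the budget_* function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two memory fields of Tuplestorestate. */
typedef struct MemBudget
{
    int64_t allowedMem;   /* total memory allowed, in bytes */
    int64_t availMem;     /* remaining memory; negative means over budget */
} MemBudget;

/* Start with the whole budget available. */
void budget_init(MemBudget *b, int64_t allowed)
{
    assert(allowed >= 0);
    b->allowedMem = allowed;
    b->availMem = allowed;
}

/* Counterpart of USEMEM: charge amt bytes against the budget. */
void budget_use(MemBudget *b, int64_t amt)
{
    assert(amt >= 0);
    b->availMem -= amt;
}

/* Counterpart of FREEMEM: give amt bytes back to the budget. */
void budget_free(MemBudget *b, int64_t amt)
{
    assert(amt >= 0);
    b->availMem += amt;
}

/* Counterpart of LACKMEM: true once more has been charged than allowed. */
bool budget_lacking(const MemBudget *b)
{
    return b->availMem < 0;
}
```

Note that the budget is allowed to go negative: the tuple that overshoots is still stored, and the caller reacts afterwards by checking the "lacking" condition.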
TSReadPointer
Tuplestore read pointer
/*
 * State for a single read pointer.  If we are in state INMEM then all the
 * read pointers' "current" fields denote the read positions.  In state
 * WRITEFILE, the file/offset fields denote the read positions.  In state
 * READFILE, inactive read pointers have valid file/offset, but the active
 * read pointer implicitly has position equal to the temp file's seek position.
 *
 * Special case: if eof_reached is true, then the pointer's read position is
 * implicitly equal to the write position, and current/file/offset aren't
 * maintained.  This way we need not update all the read pointers each time
 * we write.
 */
typedef struct
{
    int         eflags;         /* capability flags */
    bool        eof_reached;    /* read has reached EOF */
    int         current;        /* next array index to read */
    int         file;           /* temp file# */
    off_t       offset;         /* byte offset in file */
} TSReadPointer;
BufFile
This data structure represents a buffered file containing one or more physical files (each accessed through a virtual file descriptor managed by fd.c)
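Because a BufFile is split into segments of MAX_PHYSICAL_FILESIZE (0x40000000 bytes), a position in it is a pair of segment index and offset. The sketch below illustrates that addressing under stated assumptions: SegPos, seg_pos_from_logical, logical_from_seg_pos and tell_sketch are hypothetical names, and only the last function mirrors arithmetic that appears in this article (BufFileTell reporting curFile and curOffset + pos).

```c
#include <assert.h>
#include <stdint.h>

/* Same value as MAX_PHYSICAL_FILESIZE: one segment is 1 GB. */
#define SEGMENT_BYTES INT64_C(0x40000000)

/* Hypothetical (segment, offset) pair describing a position. */
typedef struct SegPos
{
    int     fileno;   /* index of the segment file */
    int64_t offset;   /* byte offset inside that segment */
} SegPos;

/* Split a logical byte offset into segment index and in-segment offset. */
SegPos seg_pos_from_logical(int64_t logical)
{
    SegPos p;

    assert(logical >= 0);
    p.fileno = (int) (logical / SEGMENT_BYTES);
    p.offset = logical % SEGMENT_BYTES;
    return p;
}

/* Inverse mapping: rebuild the logical byte offset. */
int64_t logical_from_seg_pos(SegPos p)
{
    return (int64_t) p.fileno * SEGMENT_BYTES + p.offset;
}

/* Same arithmetic as BufFileTell: buffer start plus position in buffer. */
SegPos tell_sketch(int curFile, int64_t curOffset, int pos)
{
    SegPos p;

    p.fileno = curFile;
    p.offset = curOffset + pos;
    return p;
}
```

With the values seen later in the gdb session (curFile = 0, curOffset = 58687488, pos = 8156), tell_sketch yields segment 0 at offset 58695644.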
/*
 * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
 * The reason is that we'd like large BufFiles to be spread across multiple
 * tablespaces when available.
 */
#define MAX_PHYSICAL_FILESIZE   0x40000000
#define BUFFILE_SEG_SIZE        (MAX_PHYSICAL_FILESIZE / BLCKSZ)

/*
 * This data structure represents a buffered file that consists of one or
 * more physical files (each accessed through a virtual file descriptor
 * managed by fd.c).
 */
struct BufFile
{
    int         numFiles;       /* number of physical files in set */
    /* all files except the last have length exactly MAX_PHYSICAL_FILESIZE */
    File       *files;          /* palloc'd array with numFiles entries */

    bool        isInterXact;    /* keep open over transactions? */
    bool        dirty;          /* does buffer need to be written? */
    bool        readOnly;       /* has the file been set to read only? */

    SharedFileSet *fileset;     /* space for segment files if shared */
    const char *name;           /* name of this BufFile if shared */

    /*
     * resowner is the ResourceOwner to use for underlying temp files.  (We
     * don't need to remember the memory context we're using explicitly,
     * because after creation we only repalloc our arrays larger.)
     */
    ResourceOwner resowner;

    /*
     * "current pos" is position of start of buffer within the logical file.
     * Position as seen by user of BufFile is (curFile, curOffset + pos).
     */
    int         curFile;        /* file index (0..n) part of current pos */
    off_t       curOffset;      /* offset part of current pos */
    int         pos;            /* next read/write position in buffer */
    int         nbytes;         /* total # of valid bytes in buffer */
    PGAlignedBlock buffer;
};

II. Source code interpretation
tuplestore_puttupleslot
Appends the received tuple to the tuplestore.
/*
 * Accept one tuple and append it to the tuplestore.
 *
 * Note that the input tuple is always copied; the caller need not save it.
 *
 * If the active read pointer is currently "at EOF", it remains so (the read
 * pointer implicitly advances along with the write pointer); otherwise the
 * read pointer is unchanged.  Non-active read pointers do not move, which
 * means they are certain to not be "at EOF" immediately after puttuple.
 * This curious-seeming behavior is for the convenience of nodeMaterial.c and
 * nodeCtescan.c, which would otherwise need to do extra pointer repositioning
 * steps.
 *
 * tuplestore_puttupleslot() is a convenience routine to collect data from
 * a TupleTableSlot without an extra copy operation.
 */
void
tuplestore_puttupleslot(Tuplestorestate *state,
                        TupleTableSlot *slot)
{
    MinimalTuple tuple;
    MemoryContext oldcxt = MemoryContextSwitchTo(state->context);

    /*
     * Form a MinimalTuple in working memory
     */
    tuple = ExecCopySlotMinimalTuple(slot);
    USEMEM(state, GetMemoryChunkSpace(tuple));

    tuplestore_puttuple_common(state, (void *) tuple);

    MemoryContextSwitchTo(oldcxt);
}
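The read-pointer rule in the comment above (the active pointer stays "at EOF", non-active pointers stop being "at EOF") can be illustrated for the in-memory case with a small model. This is only a sketch: ToyReadPtr and toy_fix_read_pointers are hypothetical names, and the model keeps just the two TSReadPointer fields relevant to the rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced read pointer: only the fields the rule touches. */
typedef struct ToyReadPtr
{
    bool eof_reached;   /* read has reached EOF */
    int  current;       /* next array index to read */
} ToyReadPtr;

/*
 * Called before a tuple is appended at index memtupcount.  Every non-active
 * pointer that was at EOF is pinned to the index of the incoming tuple, so
 * it will see that tuple on its next read.  The active pointer is left
 * alone and therefore stays at EOF if it was there.
 */
void toy_fix_read_pointers(ToyReadPtr *ptrs, int nptrs,
                           int activeptr, int memtupcount)
{
    assert(activeptr >= 0 && activeptr < nptrs);

    for (int i = 0; i < nptrs; i++)
    {
        if (ptrs[i].eof_reached && i != activeptr)
        {
            ptrs[i].eof_reached = false;
            ptrs[i].current = memtupcount;
        }
    }
}
```

With a single read pointer that is also the active one, as in the query traced later in this article, the loop changes nothing.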
tuplestore_puttuple_common
The routine that does the actual work for tuplestore_puttupleslot.
static void
tuplestore_puttuple_common(Tuplestorestate *state, void *tuple)
{
    TSReadPointer *readptr;
    int         i;
    ResourceOwner oldowner;

    state->tuples++;

    switch (state->status)
    {
        case TSS_INMEM:

            /*
             * Update read pointers as needed; see API spec above.
             */
            readptr = state->readptrs;
            for (i = 0; i < state->readptrcount; readptr++, i++)
            {
                if (readptr->eof_reached && i != state->activeptr)
                {
                    /* non-active pointer at EOF: pin it to the new tuple */
                    readptr->eof_reached = false;
                    readptr->current = state->memtupcount;
                }
            }

            /*
             * Grow the array as needed.  Note that we try to grow the array
             * when there is still one free slot remaining - if we fail,
             * there'll still be room to store the incoming tuple, and then
             * we'll switch to tape-based operation.
             */
            if (state->memtupcount >= state->memtupsize - 1)
            {
                (void) grow_memtuples(state);
                Assert(state->memtupcount < state->memtupsize);
            }

            /* Stash the tuple in the in-memory array */
            state->memtuples[state->memtupcount++] = tuple;

            /*
             * Done if we still fit in available memory and have array slots.
             */
            if (state->memtupcount < state->memtupsize && !LACKMEM(state))
                return;

            /*
             * Nope; time to switch to tape-based operation.  Make sure that
             * the temp file(s) are created in suitable temp tablespaces.
             */
            PrepareTempTablespaces();

            /* associate the file with the store's resource owner */
            oldowner = CurrentResourceOwner;
            CurrentResourceOwner = state->resowner;

            state->myfile = BufFileCreateTemp(state->interXact);

            CurrentResourceOwner = oldowner;

            /*
             * Freeze the decision about whether trailing length words will be
             * used.  We can't change this choice once data is on tape, even
             * though callers might drop the requirement.
             */
            state->backward = (state->eflags & EXEC_FLAG_BACKWARD) != 0;
            state->status = TSS_WRITEFILE;
            dumptuples(state);
            break;
        case TSS_WRITEFILE:

            /*
             * Update read pointers as needed; see API spec above. Note:
             * BufFileTell is quite cheap, so not worth trying to avoid
             * multiple calls.
             */
            readptr = state->readptrs;
            for (i = 0; i < state->readptrcount; readptr++, i++)
            {
                if (readptr->eof_reached && i != state->activeptr)
                {
                    readptr->eof_reached = false;
                    BufFileTell(state->myfile,
                                &readptr->file,
                                &readptr->offset);
                }
            }

            WRITETUP(state, tuple);
            break;
        case TSS_READFILE:

            /*
             * Switch from reading to writing.
             */
            if (!state->readptrs[state->activeptr].eof_reached)
                BufFileTell(state->myfile,
                            &state->readptrs[state->activeptr].file,
                            &state->readptrs[state->activeptr].offset);
            if (BufFileSeek(state->myfile,
                            state->writepos_file, state->writepos_offset,
                            SEEK_SET) != 0)
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not seek in tuplestore temporary file: %m")));
            state->status = TSS_WRITEFILE;

            /*
             * Update read pointers as needed; see API spec above.
             */
            readptr = state->readptrs;
            for (i = 0; i < state->readptrcount; readptr++, i++)
            {
                if (readptr->eof_reached && i != state->activeptr)
                {
                    readptr->eof_reached = false;
                    readptr->file = state->writepos_file;
                    readptr->offset = state->writepos_offset;
                }
            }

            WRITETUP(state, tuple);
            break;
        default:
            elog(ERROR, "invalid tuplestore state");
            break;
    }
}

void
BufFileTell(BufFile *file, int *fileno, off_t *offset)
{
    *fileno = file->curFile;
    *offset = file->curOffset + file->pos;
}

III. Tracking analysis
Run the SQL:
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
Start gdb and set a breakpoint
(gdb) b tuplestore_puttupleslot
Breakpoint 1 at 0xab9134: file tuplestore.c, line 712.
(gdb) c
Continuing.

Breakpoint 1, tuplestore_puttupleslot (state=0x1efec78, slot=0x1efd4e0) at tuplestore.c:712
712         MemoryContext oldcxt = MemoryContextSwitchTo(state->context);
(gdb)
Input parameters
(gdb) n
717         tuple = ExecCopySlotMinimalTuple(slot);
(gdb)
718         USEMEM(state, GetMemoryChunkSpace(tuple));
(gdb)
720         tuplestore_puttuple_common(state, (void *) tuple);
(gdb) p *state
$1 = {status = TSS_INMEM, eflags = 2, backward = false, interXact = false, truncated = false, availMem = 4177840, allowedMem = 4194304, tuples = 0, myfile = 0x0, context = 0x1efce00, resowner = 0x1e5d308, copytup = 0xaba7bd, writetup = 0xaba811, readtup = 0xaba9d9, memtuples = 0x1f18ed0, memtupdeleted = 0, memtupcount = 0, memtupsize = 2048, growmemtuples = true, readptrs = 0x1f056a0, activeptr = 0, readptrcount = 1, readptrsize = 8, writepos_file = 0, writepos_offset = 0}
(gdb) p *slot
$2 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 0, tts_ops = 0xc3e780, tts_tupleDescriptor = 0x7f16f33f5378, tts_values = 0x1efd550, tts_isnull = 0x1efd558, tts_mcxt = 0x1efce00, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, tts_tableOid = 49155}
(gdb) p slot->tts_values[0]
$3 = 0
(gdb)
Enter tuplestore_puttuple_common
(gdb) step
tuplestore_puttuple_common (state=0x1efec78, tuple=0x1f05ce8) at tuplestore.c:771
771         state->tuples++;
(gdb)
The current status is TSS_INMEM
(gdb) p state->status
$4 = TSS_INMEM
(gdb)
Update the read pointers if necessary (nothing to update here: the only read pointer is the active one)
(gdb) n
773         switch (state->status)
(gdb)
780                 readptr = state->readptrs;
(gdb)
781                 for (i = 0; i < state->readptrcount; readptr++, i++)
(gdb) p *readptr
$5 = {eflags = 2, eof_reached = true, current = 0, file = 2139062143, offset = 9187201950435737471}
(gdb) n
783                     if (readptr->eof_reached && i != state->activeptr)
(gdb) p state->readptrcount
$6 = 1
(gdb) p state->activeptr
$7 = 0
(gdb) n
781                 for (i = 0; i < state->readptrcount; readptr++, i++)
(gdb)
Grow the array if necessary (not needed here)
(gdb)
796                 if (state->memtupcount >= state->memtupsize - 1)
(gdb) p state->memtupcount
$8 = 0
(gdb) p state->memtupsize - 1
$9 = 2047
(gdb) n
803                 state->memtuples[state->memtupcount++] = tuple;
(gdb)
Store the tuple in memory and return
(gdb) n
808                 if (state->memtupcount < state->memtupsize && !LACKMEM(state))
(gdb)
809                     return;
(gdb)
Exit the function
(gdb)
892     }
(gdb)
tuplestore_puttupleslot (state=0x1efec78, slot=0x1efd4e0) at tuplestore.c:722
722         MemoryContextSwitchTo(oldcxt);
(gdb)
723     }
(gdb)
ExecMaterial (pstate=0x1efd1b8) at nodeMaterial.c:149
149                 ExecCopySlot(slot, outerslot);
(gdb)
After skipping N breakpoint hits with ignore, state->status has changed to TSS_WRITEFILE
(gdb) ignore 4 4194303
Will ignore next 4194303 crossings of breakpoint 4.
(gdb) c
Continuing.

Breakpoint 3, tuplestore_puttuple_common (state=0x160ba38, tuple=0x7f2cd90cc0b0) at tuplestore.c:771
771         state->tuples++;
(gdb)
...
tuplestore_puttupleslot (state=0x160ba38, slot=0x160a2a0) at tuplestore.c:722
722         MemoryContextSwitchTo(oldcxt);
(gdb) c
Continuing.

Breakpoint 3, tuplestore_puttuple_common (state=0x160ba38, tuple=0x7f2cd90cc0e8) at tuplestore.c:771
771         state->tuples++;
(gdb) p *state
$9 = {status = TSS_WRITEFILE, eflags = 2, backward = false, interXact = false, truncated = false, availMem = 3669944, allowedMem = 4194304, tuples = 4192545, myfile = 0x162ad80, context = 0x1609bc0, resowner = 0x1579170, copytup = 0xaba7bd, writetup = 0xaba811, readtup = 0xaba9d9, memtuples = 0x7f2cd914a050, memtupdeleted = 0, memtupcount = 0, memtupsize = 65535, growmemtuples = false, readptrs = 0x1627590, activeptr = 0, readptrcount = 1, readptrsize = 8, writepos_file = 0, writepos_offset = 0}
(gdb) n
773         switch (state->status)
(gdb)
841                 readptr = state->readptrs;
(gdb)
842                 for (i = 0; i < state->readptrcount; readptr++, i++)
(gdb)
844                     if (readptr->eof_reached && i != state->activeptr)
(gdb)
842                 for (i = 0; i < state->readptrcount; readptr++, i++)
(gdb)
853                 WRITETUP(state, tuple);
(gdb)
854                 break;
(gdb) p *state->myfile
$10 = {numFiles = 1, files = 0x7f2cd934c008, isInterXact = false, dirty = true, readOnly = false, fileset = 0x0, name = 0x0, resowner = 0x1579170, curFile = 0, curOffset = 58687488, pos = 8156, nbytes = 8156, buffer = {data = "\000\t\030\000\335\366?\016\000\000\000\001\000\t\030\000\000\000\001\000\t\030\000\337\366?\016\000\000\001\000\t\030\000\340\366?\000\000\016\000\000\001\000\t < pad > <... 016\000\000\001\000\t\030\000\346\366?\000\016\000\000\001\000\t\030\000\347\366?\000\016\000\000\000\030\000\000\350?\000\016\000\000\001\000\t\030\000\351\366?\000\016\000\000\001\000 000\t\030\000\352\366?\000\016\000\000\001\000\000\t\030\000"..., force_align_d = 1.7780737478550286e-307, force_align_i64 = 18004352582551808}}
DONE
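The transition seen in the trace (TSS_INMEM until either the array or the memory budget runs out, then TSS_WRITEFILE with memtupcount back at 0) can be modelled with a toy store. This is a simplified sketch under loud assumptions: all Toy* names are hypothetical, the array has a fixed size instead of growing, every tuple has the same size, and the temp file is replaced by a plain counter, so no real I/O happens.

```c
#include <assert.h>

typedef enum { TOY_INMEM, TOY_WRITEFILE } ToyStatus;

#define TOY_SLOTS 4   /* fixed array size; the real array can grow */

/* Hypothetical, heavily reduced version of the tuplestore state. */
typedef struct ToyStore
{
    ToyStatus status;
    long availMem;                /* remaining budget, may go negative */
    long tupleSize;               /* assumed identical for every tuple */
    int  memtupcount;             /* tuples currently held in memory */
    int  memtuples[TOY_SLOTS];    /* in-memory tuples (ints stand in) */
    long spilled;                 /* tuples "written" to the pretend file */
} ToyStore;

void toy_init(ToyStore *s, long allowedMem, long tupleSize)
{
    assert(tupleSize > 0);
    s->status = TOY_INMEM;
    s->availMem = allowedMem;
    s->tupleSize = tupleSize;
    s->memtupcount = 0;
    s->spilled = 0;
}

void toy_put(ToyStore *s, int tuple)
{
    if (s->status == TOY_INMEM)
    {
        /* Charge the tuple and stash it in the array. */
        s->availMem -= s->tupleSize;
        s->memtuples[s->memtupcount++] = tuple;

        /* Done if a slot is still free and the budget is not exceeded. */
        if (s->memtupcount < TOY_SLOTS && s->availMem >= 0)
            return;

        /* Otherwise switch state and dump every in-memory tuple. */
        s->status = TOY_WRITEFILE;
        s->spilled += s->memtupcount;
        s->availMem += s->memtupcount * s->tupleSize;
        s->memtupcount = 0;
        return;
    }

    /* TOY_WRITEFILE: each new tuple goes straight to the pretend file. */
    s->spilled++;
}
```

As in the real routine, the tuple that triggers the switch is stored first and dumped together with the others, so nothing is lost at the transition.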