2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to understand the InnoDB engine, analyzing its file structure from a professional point of view. We hope you take something useful away from it.
I. Summary
InnoDB's physical files include the system tablespace file (ibdata), user tablespace files (ibd), log files (ib_logfile), the temporary tablespace file (ibtmp), independent undo tablespaces, and so on.
System tablespace is the most important file in innodb, which records key information including metadata information, transaction system information, ibuf information, double write and so on.
Log files mainly record the redo log. InnoDB writes the redo log before any data change. To keep redo log writes atomic, the log is normally written in 512-byte blocks. However, modern file systems usually use a block_size of 4k, so InnoDB also provides an option to write the redo log in 4k units.
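As a rough illustration of writing in block units, the sketch below pads a write length up to a whole number of blocks. The helper name padded_write_size is hypothetical and is not an InnoDB function; only the 512-byte and 4k block sizes come from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an InnoDB function): number of bytes that reach
// the log file when `len` bytes are padded up to whole blocks.
std::size_t padded_write_size(std::size_t len, std::size_t block_size) {
    assert(block_size > 0);
    return (len + block_size - 1) / block_size * block_size;
}
```

With 512-byte blocks a 513-byte write occupies two blocks, while with 4k blocks the same write occupies a single 4096-byte block.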
The temporary tablespace file stores all uncompressed temporary tables, and the rollback segments dedicated to temporary tables (the 1st to the 32nd) are also stored in this file. Because of the nature of temporary tables, the file is recreated every time the server restarts.
Undo independent tablespaces are optional for innodb and are configured by innodb_undo_tablespaces. By default, this value is 0, which means that undo data is stored in ibdata. If innodb_undo_tablespaces is set to non-0, the undo rollback segment can be allocated to different files. Currently, undo tablespace can only be enabled in the install phase.
Apart from the log files, all of the files above share a fairly uniform physical structure. Every physical file is made up of pages (page or block); when uncompressed, the size of a page is UNIV_PAGE_SIZE (16384 bytes, i.e. 16K). Pages for different purposes share the same header (38 bytes) and footer (8 bytes) format, which records the page checksum, page number, tablespace number, LSN and other general information, as detailed in the following table. All pages are organized in a certain way. Below we look at the InnoDB file structure in detail from three angles: physical structure, logical structure, and the file management process.
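The page sizes quoted above translate into simple arithmetic. The following is a minimal sketch, assuming a single data file in which page N starts at byte N times the page size; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kPageSize   = 16384; // UNIV_PAGE_SIZE for uncompressed pages
const std::uint64_t kHeaderSize = 38;    // common page header
const std::uint64_t kFooterSize = 8;     // common page footer

// Byte offset of a page inside its data file (assumes one file per tablespace).
std::uint64_t page_file_offset(std::uint64_t page_no,
                               std::uint64_t page_size = kPageSize) {
    return page_no * page_size;
}

// Bytes left between the common header and footer for page-specific content.
std::uint64_t page_body_size(std::uint64_t page_size = kPageSize) {
    assert(page_size > kHeaderSize + kFooterSize);
    return page_size - kHeaderSize - kFooterSize;
}
```

For a 16K page this leaves 16338 bytes for the page-type-specific content.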
II. File physical structure
2.1 Basic physical structure
Each data file in InnoDB belongs to a tablespace, and each tablespace is identified by a unique space id. Note that although the system tablespace ibdata may consist of several files (ibdata1, ibdata2, ...), these files are logically concatenated and all belong to the tablespace whose space_id is 0.
Within a tablespace, pages are divided and managed in physical units called extents. All pages within an extent are physically adjacent. The extent size depends on the page size, as follows:
Typically, an extent consists of 64 physically contiguous pages, and a tablespace can be thought of as a sequence of physically adjacent extents. To organize these extents, each extent has a 40-byte XDES entry. The XDES entry tells us whether each page of the extent is free, as well as the current state of the extent. The format is as follows:
All XDES entries are stored together in extent description pages. One extent description page stores up to 256 XDES entries and manages the 256 physically adjacent extents that follow it (256 * 64 = 16384 pages), as shown in the following figure:
As can be seen from the figure, each XDES entry corresponds to a fixed range of pages, whose lower and upper bounds can be described as follows:
min_scope = extent description page page_no + xdes number * 64
max_scope = (extent description page page_no + xdes number * 64) + 63
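The boundary formulas above, and their inverse, can be sketched as follows. The struct and function names are hypothetical; the constants 64 and 256 are the ones given in the text for the default page size.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kPagesPerExtent     = 64;  // pages in one extent
const std::uint32_t kExtentsPerXdesPage = 256; // XDES entries per description page
const std::uint32_t kPagesPerXdesPage   = kPagesPerExtent * kExtentsPerXdesPage; // 16384

struct PageRange {
    std::uint32_t min_page;
    std::uint32_t max_page;
};

// Pages governed by entry `xdes_no` of the description page `xdes_page_no`.
PageRange xdes_scope(std::uint32_t xdes_page_no, std::uint32_t xdes_no) {
    assert(xdes_page_no % kPagesPerXdesPage == 0);
    assert(xdes_no < kExtentsPerXdesPage);
    std::uint32_t min_page = xdes_page_no + xdes_no * kPagesPerExtent;
    return PageRange{min_page, min_page + kPagesPerExtent - 1};
}

struct XdesLocation {
    std::uint32_t xdes_page_no; // description page that covers the page
    std::uint32_t xdes_no;      // entry number inside that description page
};

// Inverse mapping: which description page and entry describe `page_no`.
XdesLocation xdes_for_page(std::uint32_t page_no) {
    std::uint32_t xdes_page_no = page_no / kPagesPerXdesPage * kPagesPerXdesPage;
    return XdesLocation{xdes_page_no, (page_no - xdes_page_no) / kPagesPerExtent};
}
```

For example, entry 1 of the description page at page 0 covers pages 64 to 127, and page 16500 is described by entry 1 of the description page at page 16384.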
Note that the extent description page at page 0 also records tablespace-level information (FSP HEADER), and its type is FIL_PAGE_TYPE_FSP_HDR. All other extent description pages have the type FIL_PAGE_TYPE_XDES.
2.2 System data pages
The system tablespace (ibdata) stores not only the data of system tables such as SYS_TABLE / SYS_INDEX, but also rollback information (undo), insert buffer index pages (IBUF bitmap), system transaction information (trx_sys), the doublewrite buffer (double write), and other information.
The core data in innodb is stored in the system data page in ibdata. The system data page mainly includes: FIL_PAGE_TYPE_FSP_HDR, FIL_PAGE_IBUF_BITMAP, FIL_PAGE_TYPE_SYS, IBUF_ROOT_PAGE, FIL_PAGE_TYPE_TRX_SYS, FIL_PAGE_TYPE_SYS, DICT_HDR_PAGE and so on.
FIL_PAGE_TYPE_FSP_HDR/FIL_PAGE_TYPE_XDES
Extent description page (page 0 / 16384 / 32768 / ...), as described above, so it is not repeated here.
FIL_PAGE_IBUF_BITMAP
The second page of ibdata is of type FIL_PAGE_IBUF_BITMAP and is mainly used to track the change buffer information of each subsequent page. Because the space in a bitmap page is limited, a new ibuf bitmap page is also created after every 256 extents.
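A minimal sketch of locating the bitmap page that tracks a given page, assuming bitmap pages repeat with the same period as the extent description pages (every 256 extents, i.e. 16384 pages) and sit directly after them. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t kPagesPerGroup = 256 * 64; // pages covered by one bitmap page

// Page number of the ibuf bitmap page assumed to track `page_no`.
std::uint32_t ibuf_bitmap_page_for(std::uint32_t page_no) {
    std::uint32_t group_start = page_no / kPagesPerGroup * kPagesPerGroup;
    return group_start + 1; // the page right after the extent description page
}
```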
FIL_PAGE_INODE
The third page of ibdata, of type FIL_PAGE_INODE, is used to manage the segments in the data file. Each inode page can store FSP_SEG_INODES_PER_PAGE (85 by default) records. A segment is the logical unit of tablespace management; each index occupies 2 segments, which manage its leaf nodes and non-leaf nodes respectively. Segments are described in detail in section III.
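As a back-of-the-envelope sketch, the numbers above give the minimum count of inode pages needed for a set of indexes. This deliberately ignores segments used for other purposes (for example rollback segments), and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kInodesPerPage    = 85; // FSP_SEG_INODES_PER_PAGE (default)
const std::uint64_t kSegmentsPerIndex = 2;  // leaf segment + non-leaf segment

// Minimum number of inode pages needed to hold the segments of `n_indexes`.
std::uint64_t inode_pages_needed(std::uint64_t n_indexes) {
    assert(n_indexes <= UINT64_MAX / kSegmentsPerIndex);
    std::uint64_t segments = n_indexes * kSegmentsPerIndex;
    return (segments + kInodesPerPage - 1) / kInodesPerPage;
}
```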
FSP_IBUF_HEADER_PAGE_NO and FSP_IBUF_TREE_ROOT_PAGE_NO
The two pages above are the fourth and fifth pages of ibdata, respectively. The change buffer is essentially a btree as well, and its root page is fixed at the fifth page, FSP_IBUF_TREE_ROOT_PAGE_NO. Because the fields in FSP_IBUF_TREE_ROOT_PAGE_NO that would normally record the leaf inode entry are used to maintain the free page linked list, ibdata has to use the fourth page, FSP_IBUF_HEADER_PAGE_NO, for the space management of the ibuf.
FSP_TRX_SYS_PAGE_NO
The sixth page of ibdata is of type FSP_TRX_SYS_PAGE_NO. It records important transaction system information, including the persisted maximum transaction ID, the addresses of the 128 rsegs (rollback segments), the double write location, and so on. Of these 128 rsegs, rseg0 is fixed in ibdata, rseg1 to rseg32 are used for temporary tables, and rseg33 to rseg127 remain in ibdata when independent undo tablespaces are not enabled (innodb_undo_tablespaces = 0); otherwise they are placed in the independent undo tablespaces. Each rseg records 1024 slots, and each slot can serve one transaction, managing the undo records of that transaction. Because each slot also needs to request and release pages, each slot corresponds to a segment (the logical unit of space management).
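The placement rules for the rollback segments can be summarized in a small sketch. It assumes the 128 segments are numbered from 0, so the last one is rseg127; the enum and function names are hypothetical.

```cpp
#include <cassert>

enum class RsegHome { SystemTablespace, TemporaryTablespace, UndoTablespace };

// Where a rollback segment lives, following the description above.
// `undo_tablespaces` stands for the value of innodb_undo_tablespaces.
RsegHome rseg_home(unsigned rseg_id, unsigned undo_tablespaces) {
    assert(rseg_id < 128);
    if (rseg_id == 0) {
        return RsegHome::SystemTablespace;    // rseg0 is fixed in ibdata
    }
    if (rseg_id <= 32) {
        return RsegHome::TemporaryTablespace; // dedicated to temporary tables
    }
    return undo_tablespaces == 0 ? RsegHome::SystemTablespace
                                 : RsegHome::UndoTablespace;
}
```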
FSP_DICT_HDR_PAGE_NO
The eighth page of ibdata is of type FSP_DICT_HDR_PAGE_NO and stores data dictionary information. This page stores the root pages of SYS_TABLES, SYS_TABLE_IDS, SYS_COLUMNS, SYS_INDEXES and SYS_FIELDS, as well as the current maximum TABLE_ID / ROW_ID / INDEX_ID / SPACE_ID. When operating on a user table, we must first obtain the table's tablespace and the page_no of its index root page from the data dictionary tables. Only with the data dictionary in hand can we find the tablespace and the page no of the table's clustered index, locate the actual data, and then insert, delete, update or query it.
Double write buffer
InnoDB uses the double write buffer to guard against partial writes of data pages: a page is always written to the double write buffer first and only then to the data file. During crash recovery, if a page in the data file is corrupted, InnoDB tries to restore it from the dblwr. The double write buffer has 128 pages in total, divided into two blocks. Because the dblwr is initialized when the instance is installed, the two blocks have fixed positions in ibdata: pages 64 to 127 belong to the first block, and pages 128 to 191 belong to the second block.
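The write-twice-then-recover idea can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the two maps stand in for the dblwr area and the data file, the checksum is a plain FNV-1a hash rather than InnoDB's page checksum, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct PageImage {
    std::string body;
    std::uint32_t checksum;
};

// Toy checksum (FNV-1a); InnoDB uses its own page checksum algorithms.
std::uint32_t toy_checksum(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h = (h ^ c) * 16777619u;
    }
    return h;
}

bool is_intact(const PageImage& p) {
    return p.checksum == toy_checksum(p.body);
}

struct ToyStorage {
    std::map<std::uint32_t, PageImage> dblwr;     // copy written first
    std::map<std::uint32_t, PageImage> data_file; // final location

    // Write the page to the dblwr first, then to the data file.
    // `torn` simulates a crash that leaves only half of the page on disk.
    void write_page(std::uint32_t page_no, const std::string& body,
                    bool torn = false) {
        assert(!body.empty());
        PageImage img{body, toy_checksum(body)};
        dblwr[page_no] = img;
        if (torn) {
            img.body = body.substr(0, body.size() / 2);
        }
        data_file[page_no] = img;
    }

    // Crash recovery for one page: returns true if the page is usable afterwards.
    bool recover_page(std::uint32_t page_no) {
        auto it = data_file.find(page_no);
        if (it != data_file.end() && is_intact(it->second)) {
            return true; // the data file copy is fine
        }
        auto copy = dblwr.find(page_no);
        if (copy == dblwr.end() || !is_intact(copy->second)) {
            return false; // no good copy to restore from
        }
        data_file[page_no] = copy->second; // restore from the dblwr copy
        return true;
    }
};
```

A torn write leaves a page whose checksum no longer matches; recover_page then copies the intact image back from the dblwr map.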
When innodb_file_per_table is in off state, all user tables are stored in ibdata like system tables such as SYS_TABLE / SYS_INDEX. When innodb_file_per_table is enabled, innodb creates a separate ibd file for each user table. The ibd file stores the index data of the corresponding user table and the insert buffer bitmap. The rollback data (undo) of the table is still recorded in ibdata.
III. File logical structure
3.1 Basic logical structure
To organize the extents, InnoDB maintains three extent linked lists in the first page of the tablespace: FSP_FREE, FSP_FREE_FRAG, and FSP_FULL_FRAG. They link the XDES entries of extents that are completely unused, partially used, and completely used, respectively, as shown in the following figure:
A segment (segment or inode) is the logical unit used to manage physical files. It can request and release pages or extents from the tablespace, and it is the basic building block of indexes and rollback segments. To save space, each segment first allocates 32 pages (FSEG_FRAG_ARR) from the tablespace's FREE_FRAG list. When these 32 pages are not enough, space is expanded according to the following rules: if the current size is less than 1 extent, expand to 1 full extent; when the tablespace is smaller than 32MB, expand by one extent at a time; when it is larger than 32MB, expand by 4 extents at a time.
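The three expansion rules can be written as a small function. This is a sketch under assumptions: an extent is taken to be 64 pages of 16K (1 MiB), a size of exactly 32MB is treated like the larger case because the text does not say which rule applies there, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kMiB         = 1024 * 1024;
const std::uint64_t kExtentBytes = 64 * 16384; // 64 pages of 16K = 1 MiB

// Number of bytes to add, given the current size in bytes.
std::uint64_t extension_bytes(std::uint64_t current_bytes) {
    if (current_bytes < kExtentBytes) {
        return kExtentBytes - current_bytes; // fill up to one full extent
    }
    if (current_bytes < 32 * kMiB) {
        return kExtentBytes;                 // one extent at a time
    }
    return 4 * kExtentBytes;                 // four extents at a time
}
```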
When a free extent is allocated to a segment and the tablespace's FSP_FREE list has no free extent left, some new free extents are initialized for FSP_FREE. Extent allocation works like a loan-and-repay mechanism: a segment borrows an extent from the tablespace, and the extent can reappear on FSP_FREE / FSP_FREE_FRAG / FSP_FULL_FRAG only after the segment returns the space.
The segment manages the extents allocated to it internally. It also keeps three extent linked lists, FSEG_FREE, FSEG_NOT_FULL, and FSEG_FULL, which correspond to XDES entries that are completely unused, partially used, and fully used, respectively. The structure of a segment is shown in the following figure:
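Which of the three segment lists an extent belongs on follows directly from how many of its pages are in use. The sketch below models the per-page state as a 64-bit set in which a set bit means "used" (the inverse of a free bit); the enum and function names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

const std::size_t kPagesPerExtent = 64;

enum class SegmentList { Free, NotFull, Full }; // FSEG_FREE, FSEG_NOT_FULL, FSEG_FULL

// `used` has one bit per page of the extent; a set bit means the page is in use.
SegmentList list_for_extent(const std::bitset<kPagesPerExtent>& used) {
    assert(used.size() == kPagesPerExtent);
    if (used.none()) {
        return SegmentList::Free;
    }
    if (used.all()) {
        return SegmentList::Full;
    }
    return SegmentList::NotFull;
}
```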
The inode entry is the structure used to manage a segment; one inode entry corresponds to one segment. The inode entry records the segment's 32 fragment pages (FSEG_FRAG_ARR), FSEG_FREE, FSEG_NOT_FULL, FSEG_FULL and other information. The specific structure of the inode entry is shown in the following table:
An inode page that holds inode entries may become full, so the header page (FIL_PAGE_TYPE_FSP_HDR) maintains two inode page linked lists, FSP_SEG_INODES_FULL and FSP_SEG_INODES_FREE. The former links inode pages that have no free inode entry, while the latter links inode pages that have at least one free inode entry, as shown in the following figure:
3.2 Index
The structure that actually holds user data in an ibd file is the btree. Each index of a table corresponds to one btree. The primary key (clustered index) stores all columns of the row (plus the transaction id and rollback ptr columns) on the leaf nodes of its btree. When a table has no primary key, InnoDB assigns a unique rowID to each row and builds the btree on it. If the table has a secondary index, the leaf nodes of its btree store the secondary key value plus the clustered index key value.
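The relationship between a secondary index and the clustered index can be illustrated with a toy model in which ordered maps stand in for the two btrees. This is a sketch, not InnoDB code: the table layout, the column names and all identifiers are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Row {
    int id;           // primary key
    std::string name;
    std::string city; // column covered by the secondary index
};

struct ToyTable {
    std::map<int, Row> clustered;             // primary key -> full row
    std::multimap<std::string, int> idx_city; // secondary key -> primary key

    bool insert(const Row& r) {
        if (!clustered.emplace(r.id, r).second) {
            return false; // duplicate primary key
        }
        idx_city.emplace(r.city, r.id);
        return true;
    }

    // Secondary index lookup: find the primary keys first, then fetch the
    // full rows from the clustered index.
    std::vector<Row> find_by_city(const std::string& city) const {
        std::vector<Row> out;
        auto range = idx_city.equal_range(city);
        for (auto it = range.first; it != range.second; ++it) {
            // Every index entry must point at an existing row.
            assert(clustered.count(it->second) == 1);
            out.push_back(clustered.at(it->second));
        }
        return out;
    }
};
```

The second step of find_by_city is why the secondary index leaf has to carry the clustered index key: without it there would be no way to reach the full row.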
Each btree uses two segments to manage its data pages: one manages the leaf nodes (leaf segment) and one manages the non-leaf nodes (non-leaf segment). The inode entry addresses of these two segments are recorded in the root page of the btree. The root page is allocated on the first fragment page of the non-leaf segment (FSEG_FRAG_ARR).
When we insert into, delete from, update or query a table, we first load the table's metadata through FSP_DICT_HDR_PAGE_NO, the eighth page of ibdata, obtain the root page no of each index of the table from the SYS_INDEXES table, and then operate on the table's user data btree through the root page. The logical structure of the tablespace is shown in the following figure:
3.3 Index page data
The most basic page type of an index is FIL_PAGE_INDEX, which is structured as shown in the following table. The Index Header records page-level information such as the btree level of the page, the ID of the index it belongs to, and the number of page directory slots. The Fseg Header records the inode entries of the index's leaf segment and non-leaf segment. The System Records include infimum and supremum, which represent the smallest and largest virtual records of the page, respectively. The Page Directory is an index over the records within the page. A btree search can only locate the page that contains a record; the search inside the page relies on a binary search over the page directory.
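The in-page search can be sketched as a binary search over directory slots followed by a short scan of the records owned by the matching slot. The model below is a simplification: a sorted vector stands in for the record list, each slot stores the index of the last record it owns, and the grouping factor and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyPage {
    std::vector<int> records;       // record keys in ascending order
    std::vector<std::size_t> slots; // index of the last record owned by each slot
};

// Build a directory with one slot per `group` records.
ToyPage build_page(const std::vector<int>& sorted_keys, std::size_t group) {
    assert(group > 0);
    ToyPage p;
    p.records = sorted_keys;
    for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
        bool last_in_group = (i + 1) % group == 0 || i + 1 == sorted_keys.size();
        if (last_in_group) {
            p.slots.push_back(i);
        }
    }
    return p;
}

// Returns the position of `key` within the page, or -1 if it is absent.
long find_in_page(const ToyPage& p, int key) {
    // Binary search for the first slot whose last owned record is >= key.
    std::size_t lo = 0;
    std::size_t hi = p.slots.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (p.records[p.slots[mid]] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == p.slots.size()) {
        return -1; // key is larger than every record on the page
    }
    // Scan only the records owned by that slot.
    std::size_t first = (lo == 0) ? 0 : p.slots[lo - 1] + 1;
    for (std::size_t i = first; i <= p.slots[lo]; ++i) {
        if (p.records[i] == key) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Keeping one slot per small group of records keeps the directory compact while still bounding the scan to a handful of records.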
InnoDB stores data by row. MySQL currently supports the antelope row formats (compact and redundant) and the barracuda row formats (dynamic and compressed). The main difference between barracuda and antelope is how off-page data is handled: barracuda stores only a pointer to the off-page data in the row, whereas antelope also stores a 768-byte prefix of the content in the row. The record header stores the delete mark, the total number of columns, the relative offset of the next record and other information; the system columns include rowID, transactionID, the rollback pointer and so on.
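The difference in off-page handling can be expressed as the number of bytes that stay in the row for a column already chosen for off-page storage. This is a sketch: the 768-byte prefix comes from the text above, while the 20-byte pointer size is an assumption not stated in this article, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class RowFormat { Redundant, Compact, Dynamic, Compressed };

const std::size_t kAntelopePrefixBytes = 768; // prefix kept in the row (from the text)
const std::size_t kExternPointerBytes  = 20;  // assumed size of the off-page pointer

bool is_barracuda(RowFormat f) {
    return f == RowFormat::Dynamic || f == RowFormat::Compressed;
}

// Bytes kept in the row for one column that is stored off-page.
std::size_t in_row_bytes_for_off_page_column(RowFormat f) {
    if (is_barracuda(f)) {
        return kExternPointerBytes;                     // pointer only
    }
    return kAntelopePrefixBytes + kExternPointerBytes;  // prefix plus pointer
}
```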
IV. File management process
The following is a brief introduction to the management process of innodb files with the simplified source code.
4.1 The process of creating a btree
Creating a btree can be summarized as follows: first create a non_leaf segment and use its first page (that is, the first of its 32 fragment pages) as the root page; then create a leaf segment; finally perform the necessary initialization of the root page. For the detailed procedure, refer to the following code:
ulint
btr_create(
    ulint                   type,
    ulint                   space,
    const page_size_t&      page_size,
    index_id_t              index_id,
    dict_index_t*           index,
    const btr_create_t*     btr_redo_create_info,
    mtr_t*                  mtr)
{
    /* The segment headers of an index tree are stored in the newly
    allocated root page, whereas the segment headers of the ibuf tree
    are placed in a separate ibuf header page. The ibuf tree creation
    logic is omitted below to focus on how an index tree is created */

    /* local variables */
    ...

    /* Create the non_leaf segment and store its segment header at
    offset PAGE_HEADER + PAGE_BTR_SEG_TOP on the first page of the
    segment. block refers to the first page of the non_leaf segment;
    it will be used as the root page of the btree */
    block = fseg_create(space, 0, PAGE_HEADER + PAGE_BTR_SEG_TOP, mtr);

    if (block == NULL) {
        return(FIL_NULL);
    }

    /* Record the information of the root page */
    page_no = block->page.id.page_no();
    frame = buf_block_get_frame(block);

    /* Create the leaf segment and store its segment header on the
    root page at offset PAGE_HEADER + PAGE_BTR_SEG_LEAF */
    if (!fseg_create(space, page_no, PAGE_HEADER + PAGE_BTR_SEG_LEAF, mtr)) {
        /* Not enough space to allocate a new segment; the root page
        that was already allocated must be freed */
        btr_free_root(block, mtr);
        return(FIL_NULL);
    }

    /* Initialize the index page on the root page; the handling
    differs depending on whether the page is compressed */
    page_zip = buf_block_get_page_zip(block);

    if (page_zip) {
        /* other logic */
        page = page_create_zip(block, index, 0, 0, NULL, mtr);
    } else {
        /* other logic */
        page = page_create(block, mtr,
                           dict_table_is_comp(index->table),
                           dict_index_is_spatial(index));
    }

    /* Set the index id on the root page */
    btr_page_set_index_id(page, page_zip, index_id, mtr);

    /* Set the next and previous pages of the root page to FIL_NULL */
    btr_page_set_next(page, page_zip, FIL_NULL, mtr);
    btr_page_set_prev(page, page_zip, FIL_NULL, mtr);

    /* other logic */

    /* Return the page number of the root page */
    return(page_no);
}

4.2 Segment creation process
Creating a segment is relatively simple: first allocate an inode entry for the segment in an inode page, then initialize the inode entry and update the maximum segment id in the space header. Note that when the page passed in is 0, an independent segment is created, and the address of the current inode entry is recorded in the first page of the segment, which is then returned; when the page passed in is non-zero, the address of the current inode entry is recorded at the specified position on the specified page. For details, refer to the code:
buf_block_t*
fseg_create_general(
    ulint   space_id,   /*!< in: space id */
    ulint   page,       /*!< in: page where the segment header is
                        placed: if this is != 0, the page must belong
                        to another segment, if this is 0, a new page
                        will be allocated and it will belong to the
                        created segment */
    ulint   byte_offset, /*!< in: byte offset of the created segment
                        header on the page */
    ibool   has_done_reservation, /*!< in: TRUE if the caller has
                        already done the reservation for the pages with
                        fsp_reserve_free_extents (at least 2 extents:
                        one for the inode and the other for the
                        segment) then there is no need to do the check
                        for this individual operation */
    mtr_t*  mtr)        /*!< in/out: mini-transaction */
{
    /* local variables */
    ...

    /* If the page passed in is 0, an independent segment is created
    and the segment header is stored in the first page of the segment.
    If the page passed in is non-zero, the segment is not independent
    and the segment header is stored at the specified position on the
    specified page */
    if (page != 0) {
        /* Get the specified page */
        block = buf_page_get(page_id_t(space_id, page), page_size,
                             RW_SX_LATCH, mtr);
        header = byte_offset + buf_block_get_frame(block);
    }

    /* other logic */

    /* Get the space header and an inode entry */
    space_header = fsp_get_space_header(space_id, page_size, mtr);
    inode = fsp_alloc_seg_inode(space_header, mtr);

    if (inode == NULL) {
        goto funct_exit;
    }

    /* Read the current maximum segment id of the tablespace and
    update it */
    seg_id = mach_read_from_8(space_header + FSP_SEG_ID);
    mlog_write_ull(space_header + FSP_SEG_ID, seg_id + 1, mtr);

    /* Initialize the segment id and FSEG_NOT_FULL_N_USED of the
    inode entry */
    mlog_write_ull(inode + FSEG_ID, seg_id, mtr);
    mlog_write_ulint(inode + FSEG_NOT_FULL_N_USED, 0, MLOG_4BYTES, mtr);

    /* Initialize the three extent lists of the inode entry */
    flst_init(inode + FSEG_FREE, mtr);
    flst_init(inode + FSEG_NOT_FULL, mtr);
    flst_init(inode + FSEG_FULL, mtr);

    /* Initialize the 32 fragment pages of the inode entry */
    mlog_write_ulint(inode + FSEG_MAGIC_N, FSEG_MAGIC_N_VALUE,
                     MLOG_4BYTES, mtr);
    for (i = 0; i < FSEG_FRAG_ARR_N_SLOTS; i++) {
        fseg_set_nth_frag_page_no(inode, i, FIL_NULL, mtr);
    }

    /* If the page passed in is 0, allocate the first page of the
    segment */
    if (page == 0) {
        block = fseg_alloc_free_page_low(space, page_size,
                                         inode, 0, FSP_UP, RW_SX_LATCH,
                                         mtr, mtr
#ifdef UNIV_DEBUG
                                         , has_done_reservation
#endif /* UNIV_DEBUG */
                                         );

        header = byte_offset + buf_block_get_frame(block);
        mlog_write_ulint(buf_block_get_frame(block) + FIL_PAGE_TYPE,
                         FIL_PAGE_TYPE_SYS, MLOG_2BYTES, mtr);
    }

    /* Record the segment header at the specified position on the
    page. The segment header consists of the space id and page no of
    the inode page, plus the in-page offset of the inode entry within
    the inode page */
    mlog_write_ulint(header + FSEG_HDR_OFFSET,
                     page_offset(inode), MLOG_2BYTES, mtr);
    mlog_write_ulint(header + FSEG_HDR_PAGE_NO,
                     page_get_page_no(page_align(inode)),
                     MLOG_4BYTES, mtr);
    mlog_write_ulint(header + FSEG_HDR_SPACE, space_id, MLOG_4BYTES, mtr);

funct_exit:
    DBUG_RETURN(block);
}

4.3 Extent allocation process
The logic by which the tablespace allocates an extent is fairly simple: it checks whether FSP_FREE still has an extent left, and if not, it initializes some new extents for FSP_FREE. The detailed logic is as follows:

static
xdes_t*
fsp_alloc_free_extent(
    ulint               space_id,
    const page_size_t&  page_size,
    ulint               hint,
    mtr_t*              mtr)
{
    /* local variables */
    ...

    /* Get the space header */
    header = fsp_get_space_header(space_id, page_size, mtr);

    /* Get the xdes entry that covers the hint page */
    descr = xdes_get_descriptor_with_space_hdr(
        header, space_id, hint, mtr, false, &desc_block);

    fil_space_t* space = fil_space_get(space_id);

    /* If the xdes entry covering the hint page is in the XDES_FREE
    state, take it directly and return it; otherwise try to allocate
    an extent for the segment from FSP_FREE. If FSP_FREE is empty,
    new extents must first be added to FSP_FREE from the space that
    has not been initialized yet, and the first extent of the
    refreshed FSP_FREE is returned */
    if (descr && (xdes_get_state(descr, mtr) == XDES_FREE)) {
        /* Ok, we can take this extent */
    } else {
        /* Take the first extent in the free list */
        first = flst_get_first(header + FSP_FREE, mtr);

        if (fil_addr_is_null(first)) {
            fsp_fill_free_list(false, space, header, mtr);
            first = flst_get_first(header + FSP_FREE, mtr);
        }

        /* Allocation failed */
        if (fil_addr_is_null(first)) {
            return(NULL);   /* No free extents left */
        }

        descr = xdes_lst_get_descriptor(
            space_id, page_size, first, mtr);
    }

    /* Remove the allocated extent from FSP_FREE */
    flst_remove(header + FSP_FREE, descr + XDES_FLST_NODE, mtr);
    space->free_len--;

    return(descr);
}
Allocating an extent to a segment is slightly more complicated: first check whether FSEG_FREE has any extent left, and if not, use fsp_alloc_free_extent to request an extent from the tablespace. In the latter case FSEG_FREE is running short of extents, so a further attempt is made to allocate more extents to FSEG_FREE. The detailed process is as follows:
static
xdes_t*
fseg_alloc_free_extent(
    fseg_inode_t*       inode,
    ulint               space,
    const page_size_t&  page_size,
    mtr_t*              mtr)
{
    /* local variables */
    ...

    /* If FSEG_FREE is not empty, allocate an extent for the segment
    from it; if FSEG_FREE is empty, call fsp_alloc_free_extent to
    allocate an extent for the current segment */
    if (flst_get_len(inode + FSEG_FREE) > 0) {
        first = flst_get_first(inode + FSEG_FREE, mtr);
        descr = xdes_lst_get_descriptor(space, page_size, first, mtr);
    } else {
        descr = fsp_alloc_free_extent(space, page_size, 0, mtr);

        if (descr == NULL) {
            return(NULL);
        }

        /* Set the extent obtained from the space to the segment
        private state (XDES_FSEG) and add it to FSEG_FREE */
        seg_id = mach_read_from_8(inode + FSEG_ID);
        xdes_set_state(descr, XDES_FSEG, mtr);
        mlog_write_ull(descr + XDES_ID, seg_id, mtr);
        flst_add_last(inode + FSEG_FREE, descr + XDES_FLST_NODE, mtr);

        /* FSEG_FREE is running short of extents; try to allocate
        more physically adjacent extents to the current segment */
        fseg_fill_free_list(inode, space, page_size,
                            xdes_get_offset(descr) + FSP_EXTENT_SIZE,
                            mtr);
    }

    return(descr);
}

4.4 Page allocation process
A page is allocated from the tablespace as follows: first check whether the extent containing hint_page is suitable for allocating a free page; if not, try to find a free page through the FSP_FREE_FRAG linked list. If FSP_FREE_FRAG is empty, a new extent is allocated, added to FSP_FREE_FRAG, and the free page is allocated from it.
static MY_ATTRIBUTE((warn_unused_result))
buf_block_t*
fsp_alloc_free_page(
    ulint               space,
    const page_size_t&  page_size,
    ulint               hint,
    rw_lock_type_t      rw_latch,
    mtr_t*              mtr,
    mtr_t*              init_mtr)
{
    /* local variables */
    ...

    /* Get the tablespace header and the xdes entry of the extent
    where the hint page is located */
    header = fsp_get_space_header(space, page_size, mtr);
    descr = xdes_get_descriptor_with_space_hdr(header, space, hint, mtr);

    /* If the xdes entry is in the XDES_FREE_FRAG state, allocate the
    page directly from that extent; otherwise look for a free page in
    FSP_FREE_FRAG */
    if (descr && (xdes_get_state(descr, mtr) == XDES_FREE_FRAG)) {
        /* Ok, we can take this extent */
    } else {
        /* Else take the first extent in free_frag list */
        first = flst_get_first(header + FSP_FREE_FRAG, mtr);

        /* Try to find a free page in FSP_FREE_FRAG. When the
        FSP_FREE_FRAG list is empty, allocate a new extent with
        fsp_alloc_free_extent, add it to FSP_FREE_FRAG and allocate
        the free page from it */
        if (fil_addr_is_null(first)) {
            descr = fsp_alloc_free_extent(space, page_size, hint, mtr);

            if (descr == NULL) {
                /* No free space left */
                return(NULL);
            }

            xdes_set_state(descr, XDES_FREE_FRAG, mtr);
            flst_add_last(header + FSP_FREE_FRAG,
                          descr + XDES_FLST_NODE, mtr);
        } else {
            descr = xdes_lst_get_descriptor(space, page_size, first, mtr);
        }

        /* Reset the hint */
        hint = 0;
    }

    /* Allocate a free page from the extent that was found */
    free = xdes_find_bit(descr, XDES_FREE_BIT, TRUE,
                         hint % FSP_EXTENT_SIZE, mtr);

    if (free == ULINT_UNDEFINED) {
        ut_print_buf(stderr, ((byte*) descr) - 500, 1000);
        putc('\n', stderr);
        ut_error;
    }

    page_no = xdes_get_offset(descr) + free;

    /* other logic */

    /* fsp_alloc_from_free_frag sets the XDES_FREE_BIT of the
    allocated page to false, marking it as used, and increments the
    FSP_FRAG_N_USED field of the header page. If the extent becomes
    full, it is removed from FSP_FREE_FRAG and added to the
    FSP_FULL_FRAG list, and FSP_FRAG_N_USED is updated */
    fsp_alloc_from_free_frag(header, descr, free, mtr);

    /* Initialize the page content and return */
    return(fsp_page_create(page_id_t(space, page_no), page_size,
                           rw_latch, mtr, init_mtr));
}
To keep logically adjacent nodes of a segment physically adjacent while using the tablespace as efficiently as possible, the logic for allocating a page within a segment is more complex. The detailed process is as follows:
static
buf_block_t*
fseg_alloc_free_page_low(
    fil_space_t*        space,
    const page_size_t&  page_size,
    fseg_inode_t*       seg_inode,
    ulint               hint,
    byte                direction,
    rw_lock_type_t      rw_latch,
    mtr_t*              mtr,
    mtr_t*              init_mtr
#ifdef UNIV_DEBUG
    , ibool             has_done_reservation
#endif /* UNIV_DEBUG */
)
{
    /* local variables */
    ...

    /* Compute the number of pages the segment currently uses (used)
    and reserves (reserved). used is the number of used fragment
    pages (out of 32) plus the pages used in FSEG_FULL/FSEG_NOT_FULL;
    reserved is the number of used fragment pages plus the total
    number of pages in the three lists
    FSEG_FULL/FSEG_NOT_FULL/FSEG_FREE */
    reserved = fseg_n_reserved_pages_low(seg_inode, &used, mtr);

    /* Get the tablespace header and the xdes entry of the extent
    where the hint page is located */
    space_header = fsp_get_space_header(space_id, page_size, mtr);
    descr = xdes_get_descriptor_with_space_hdr(space_header, space_id,
                                               hint, mtr);

    if (descr == NULL) {
        /* The hint page is beyond the free limit: set hint to 0,
        which disables the hint */
        hint = 0;
        descr = xdes_get_descriptor(space_id, hint, page_size, mtr);
    }

    /* In the big if-else below we look for ret_page and ret_descr */
    /*-------------------------------------------------------------*/
    if ((xdes_get_state(descr, mtr) == XDES_FSEG)
        && mach_read_from_8(descr + XDES_ID) == seg_id
        && (xdes_mtr_get_bit(descr, XDES_FREE_BIT,
                             hint % FSP_EXTENT_SIZE, mtr) == TRUE)) {
take_hinted_page:
        /* 1. The extent containing the hint page belongs to the
        current segment and the hint page is free: this is the ideal
        case */
        ret_descr = descr;
        ret_page = hint;
        goto got_hinted_page;
        /*---------------------------------------------------------*/
    } else if (xdes_get_state(descr, mtr) == XDES_FREE
               && reserved - used < reserved / FSEG_FILLFACTOR
               && used >= FSEG_FRAG_LIMIT) {
        /* 2. The space utilization of the segment is above the
        threshold (7/8, FSEG_FILLFACTOR) and the extent containing
        the hint page is in the XDES_FREE state: remove the extent
        from FSP_FREE, add it to the FSEG_FREE of the segment, and
        return the hint page */
        ret_descr = fsp_alloc_free_extent(space_id, page_size, hint, mtr);

        xdes_set_state(ret_descr, XDES_FSEG, mtr);
        mlog_write_ull(ret_descr + XDES_ID, seg_id, mtr);
        flst_add_last(seg_inode + FSEG_FREE,
                      ret_descr + XDES_FLST_NODE, mtr);

        /* If the utilization permits, add several physically
        adjacent extents to the FSEG_FREE of the segment */
        fseg_fill_free_list(seg_inode, space_id, page_size,
                            hint + FSP_EXTENT_SIZE, mtr);
        goto take_hinted_page;
        /*---------------------------------------------------------*/
    } else if ((direction != FSP_NO_DIR)
               && ((reserved - used) < reserved / FSEG_FILLFACTOR)
               && (used >= FSEG_FRAG_LIMIT)
               && (!!(ret_descr = fseg_alloc_free_extent(
                          seg_inode, space_id, page_size, mtr)))) {
        /* 3. The hinted extent cannot be used, but the utilization
        is above the threshold and a direction is given: obtain a
        free extent through fseg_alloc_free_extent and allocate the
        new page at its lower end (or at its upper end when the
        direction is FSP_DOWN) */
        ret_page = xdes_get_offset(ret_descr);

        if (direction == FSP_DOWN) {
            ret_page += FSP_EXTENT_SIZE - 1;
        }
    } else if ((xdes_get_state(descr, mtr) == XDES_FSEG)
               && mach_read_from_8(descr + XDES_ID) == seg_id
               && (!xdes_is_full(descr, mtr))) {
        /* 4. The extent containing the hint page belongs to the
        current segment and still has a free page: return a free
        page from that extent */
        ret_descr = descr;
        ret_page = xdes_get_offset(ret_descr)
            + xdes_find_bit(ret_descr, XDES_FREE_BIT, TRUE,
                            hint % FSP_EXTENT_SIZE, mtr);
    } else if (reserved - used > 0) {
        /* 5. The segment reserves more pages than it uses, so it
        still has free pages. First check whether the FSEG_NOT_FULL
        list has an extent that is not full; if not, check whether
        the FSEG_FREE list has a completely free extent */
        fil_addr_t first;

        if (flst_get_len(seg_inode + FSEG_NOT_FULL) > 0) {
            first = flst_get_first(seg_inode + FSEG_NOT_FULL, mtr);
        } else if (flst_get_len(seg_inode + FSEG_FREE) > 0) {
            first = flst_get_first(seg_inode + FSEG_FREE, mtr);
        } else {
            return(NULL);
        }

        ret_descr = xdes_lst_get_descriptor(space_id, page_size,
                                            first, mtr);
        ret_page = xdes_get_offset(ret_descr)
            + xdes_find_bit(ret_descr, XDES_FREE_BIT, TRUE, 0, mtr);
    } else if (used < FSEG_FRAG_LIMIT) {
        /* 6. The segment has not yet used up its 32 fragment pages:
        allocate an individual page from the tablespace's
        FSP_FREE_FRAG with fsp_alloc_free_page and add it to the
        fragment page array of the inode */
        buf_block_t* block = fsp_alloc_free_page(
            space_id, page_size, hint, rw_latch, mtr, init_mtr);

        if (block != NULL) {
            /* Put the page in the fragment page array of the
            segment */
            n = fseg_find_free_frag_page_slot(seg_inode, mtr);
            fseg_set_nth_frag_page_no(
                seg_inode, n, block->page.id.page_no(), mtr);
        }

        return(block);
    } else {
        /* 7. None of the cases above applies: allocate a free extent
        directly with fseg_alloc_free_extent and return a page from
        it */
        ret_descr = fseg_alloc_free_extent(seg_inode, space_id,
                                           page_size, mtr);

        if (ret_descr == NULL) {
            ret_page = FIL_NULL;
            ut_ad(!has_done_reservation);
        } else {
            ret_page = xdes_get_offset(ret_descr);
        }
    }

    /* Page allocation failed */
    if (ret_page == FIL_NULL) {
        return(NULL);
    }

got_hinted_page:
    /* Mark the page that was found as used */
    if (ret_descr != NULL) {
        fseg_mark_page_used(seg_inode, ret_page, ret_descr, mtr);
    }

    /* Initialize the page content and return */
    return(fsp_page_create(page_id_t(space_id, ret_page), page_size,
                           rw_latch, mtr, init_mtr));
}
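The utilization test that guards cases 2 and 3 in the code above can be isolated into a small predicate. This is a sketch: FSEG_FILLFACTOR is taken to be 8 (matching the 7/8 threshold mentioned in the comments), FSEG_FRAG_LIMIT is assumed to equal the 32 fragment pages, and the function name is hypothetical.

```cpp
#include <cassert>

const unsigned long kFillFactor = 8;  // FSEG_FILLFACTOR (7/8 threshold)
const unsigned long kFragLimit  = 32; // FSEG_FRAG_LIMIT, assumed to be 32 pages

// True when the segment uses more than 7/8 of its reserved pages and has
// already consumed its fragment pages, so taking a whole new extent is justified.
bool may_take_new_extent(unsigned long reserved, unsigned long used) {
    assert(used <= reserved);
    return reserved - used < reserved / kFillFactor && used >= kFragLimit;
}
```

For example, a segment with 64 reserved pages of which 60 are used passes the test, while one with only 50 used pages does not.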
From the bottom up, the InnoDB file structure consists of pages, extents, segments and tablespaces. The page is the most basic physical unit, and all pages share the same header and footer. An extent usually consists of 64 consecutive pages, and a tablespace consists of consecutive extents. A segment is the logical unit used to manage physical files; it can request and release pages or extents from the tablespace and is the basic building block of indexes and rollback segments. The tablespace is a more macro-level concept: when innodb_file_per_table is ON, each user table corresponds to one tablespace.
That concludes this overview of how to understand the InnoDB engine. If you have similar questions, the analysis above may serve as a reference.
© 2024 shulou.com SLNews company. All rights reserved.