In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article is to share with you about how to insert data in InnoDB. The editor thinks it is very practical, so I share it with you. I hope you can get something after reading this article.
Tablespace / Tablespace
If I want to be a writer and make up my mind to write a long masterpiece, then I need to write it down on paper. The first step is to buy two large rolls of uncut white paper from the paper mill. Accordingly, in the computer, all data also need to be recorded on disk, magnetic tape, optical disc and other storage media for long-term preservation.
These media are divided into files, which are the physical space in which data is stored.
Since I bought two rolls of paper, and any one of them could store text, whenever I started writing, I struggled to decide which volume should be recorded. This is painful for me, who has difficulty in choice.
Therefore, I plan to hire a secretary to dictate to him what I want to write, and use him to transcribe my words into a specific paper roll. I don't care which volume I write on.
By the same token, when you manipulate a file with a program, you first need to specify the file path. However, in the database, the table is oriented to development, while the storage device is oriented to operation and maintenance. When a developer creates a table, it is difficult to determine which file a table corresponds to. And the operation and maintenance staff will dynamically add files to the database according to the actual situation.
The tight coupling between tables and files seriously restricts the convenience of database use, so it makes sense to add a layer of tablespace between files and tables, which docks tables up and down; developers only need to operate tables in tablespaces, and the specific storage is automatically maintained by Innodb storage engine according to tablespaces.
Tablespaces are the highest level of logical structure in the InnoDB storage engine, and all data is logically stored in tablespaces.
Tablespaces mainly include the following types:
The system tablespace stores metadata for change buffer, doublewrite buffer, and all objects related to innodb. Such as: tablespace and database information, table structure and field information and so on. The .frm file that was originally used to store table structure information is removed from mysql8.0, and all metadata is stored in this system tablespace. The relevant core views in the system tablespace information_schema library are as follows:
If there is a corresponding table user in the database world, the test is as follows: query the tablespace information of the table: select * from information_schema.innodb_tablespace where name='world/user'; (space: tablespace id,name: tablespace name)
Query the data file information corresponding to the table space: select * from information_schema.files where tablespace_name='world/user'; (file_name: relative path of data file)
Query the id corresponding to the table: select * from information_schema.innodb_tables where name='world/user'
Query the page number (root page no) select * from information_schema.innodb_indexes where table_id=1269 and name='primary'; of the root node corresponding to the primary key index (page_no:B+ tree root page no;name='primary' represents the primary key index)
The system tablespace also has a corresponding data file, which defaults to xxx\ MySQL Server 8.0\ Data\ ibdata1 (under windows). Only system tablespaces can specify multiple files, and other types of tablespaces can specify only one data file.
Independent tablespace each table corresponds to a separate tablespace. Start the independent tablespace by configuring the parameter in my.ini: innodb_file_per_table=1; otherwise, it defaults to the system tablespace. After 5.6.6, this configuration is enabled by default, so it defaults to independent tablespaces.
When you create a table, a tablespace corresponding to the table name is automatically created for the table, and a tablespace file with the table name .ibd is generated under the database directory. For example, the result of creating a user table in the database world is as follows
A regular tablespace is a tablespace that is created manually with the create tablespace tablespace name.
Temporary tablespaces store temporary tables and rollback segments corresponding to temporary table changes. The default temporary file is (under windows) xxx\ MySQL Server 8.0\ Data\ ibtmp1
Zone / cluster / Extent
Because a roll of original paper is too large, it may cover more than ten or even dozens of floors of my mansion floor, which is very inconvenient to use. after all, my 9-square-meter mansion still needs space to meet visitors. The best way is to cut the paper into A4-sized data pages.
Similarly, the capacity of a disk or file is very considerable and extremely difficult to manage, so innodb divides the file into blocks of equal size, also known as pages.
For a literary story, you can find the next page in turn through the page number, so as to read the story completely. Usually when we finish reading the first page, we will immediately go on to read the second page, but if the corresponding pages are scattered in the bedroom, toilet and living room, the reading experience will be greatly reduced. If the cost of binding these scattered pages can be put together, the convenience of reading can be greatly improved.
According to the locality principle, when using data, cpu will use logically adjacent data in the next step. Therefore, in order to improve the performance of data read operations, innodb stores logically adjacent data in adjacent pages as much as possible, and in order to achieve this goal, Innodb introduces the concept of region / cluster.
An area / cluster is a physically continuously allocated piece of space, and the extent is divided into contiguous pages to store data from the same logical unit (such as the index segment, data segment below). A range / cluster consisting of 64 consecutive pages (Page) by default, with a default size of 16K for each page.
In fact, innodb first divides the file into continuous regions / clusters, and then divides the files into continuous pages within the area / cluster. On the whole, a file is not only a series of continuous pages on the micro level, but also a series of continuous areas / clusters on the macro level. Knowing the page number and page size of a page can calculate the specific location of the page on disk, in the same way, knowing a page number can calculate the size of an area / cluster and how many regions / clusters the page is in (it does not have a number itself, but assuming that the first area / cluster is 0, you can know how many it logically is).
If you think of a page as a page in a real book, then extent can be regarded as a real book.
The purpose of a zone is to allocate contiguous space to logical units and to manage the state of storage space in the zone (such as which pages are full, which are not used, and which contain fragments). Specific through different zone / cluster chain tables to indicate the spatial state of the area itself, and through the XDES_BITMAP in XDES Entry to indicate the spatial state of pages in the area).
# paragraph / Segment
At that time, after Da Liu finished writing the first book of three-body, it was not updated for a long time, but because the content was too wonderful, Obama not only wrote an email, but also urged change through diplomatic means. In order to avoid damage to Sino-US relations, Da Liu did the same and wrote two more books in succession.
These three books which are logically coherent are generally called three-body, so we call this kind of related multiple books a set. Similarly, innodb classifies logically related regions / clusters as a segment.
In order to make the same logical unit physically have continuous storage space, Innodb put forward the concept of zone, but the minimum operating unit of io is page, io can not fill one area at a time, and the data can be erased (deleted) and rewritten, so it is necessary to record the space status of the area itself and the area: which areas are full, which areas have not been used, and which areas have debris space.
In innodb, the management information that records the storage space status of related areas is called segment entities, and the sum of the areas managed by segment entities is called segments. The purpose of the segment is to manage the usage of the area and to provide space storage state when space is allocated to data.
The paragraph can be similarly seen as a set of books in reality.
The data in innodb is organized in the way of B + tree, leaf nodes store keywords and row data, and non-leaf nodes store keywords (index data) and page numbers. Index data and business row data have different data structures, so they are stored separately, the index data of non-leaf nodes are stored in one segment, and the business data of leaf nodes are stored in another segment. The corresponding data are also stored in areas and pages of different structures.
The logical structure of data is as follows:
The physical storage structure is as follows:
Segments are logical components of tablespaces and are used to store data with the same meaning, such as non-leaf nodes in B + pairs or leaf nodes in B + trees. The common segments are data segment, index segment, rollback segment and so on.
Each time an index is created, two segments are created: one is the data segment (the leaf node of the B+ tree), and the other is the index segment (non-leaf node). For clustered indexes (usually primary key indexes) data segments store index keywords and business rows (all fields); for nonclustered indexes, data segments store index keywords and primary keys; if you query through nonclustered indexes, you need to find out the primary key through the B+ tree first, and then query the specific rows from the clustered index twice through the primary key, which is called back to the table. Below: secondary index (nonclustered index) on the left and primary key index (clustered index) on the right
Table data is organized and stored through a clustered index, that is, a B+ tree created by a primary key index, so a primary key should be specified when the table is created. If no primary key is specified and no unique index is created, the table creates a self-increasing hidden field by default: row_id as the key field of the clustered index B+ tree. Because it is a hidden field, this field can only be used when querying back to the table.
Page / Page
As mentioned above, pages, like the pages of a real book, are the smallest unit of io operations in innodb. A page in innodb is similar to a page in a real book.
The default size of the page is 16KB; it can be specified through the innodb_page_size parameter, and the optional values are: 4KB, 8KB, 16KB, 32KB, 64KB; when page size is 4, 8, 16KB, the number of page corresponding to an extent changes synchronously to ensure that the extent (zone / cluster) size remains unchanged at 1m. When page size is 32KB or 64KB, the number of page in extent is guaranteed to remain unchanged, and extent synchronization becomes 2m and 4m.
Each page has a corresponding number starting with 0, which is called the page number. Because the data file in the tablespace is divided into pages of equal size, you can calculate the exact address of the page on disk by knowing the page number and then according to the initial location of the file.
Similarly, a table corresponds to a clustered index, and the page number of root page is specified in the clustered index metadata, so the Innodb engine can calculate the exact address of the index B+ tree root page based on the page number and page size, thus manipulating the entire table data.
Page is mainly used to store business-related data, but page storage is also needed for extent and segment information that exists to manage memory allocation. Innodb is divided into the following categories according to the content of page storage:
FSP HDR pages: a tablespace may correspond to multiple data files, each with its own number. The tablespace is the top-level structure in the database. The meta-information such as the corresponding tablespace files can be queried through the metadata in the system tablespace, but the corresponding segments and regions of the current tablespace can not be queried. Therefore, the storage status of pages in the tablespace can not be obtained.
In order to make the physical storage of the tablespace have an entry for external access, it is specified that the tablespace information and the pointer to the segment linked list owned by the current tablespace are stored in the page page 0 of file 0 in the tablespace.
Any page consists of a header, a header, and a footer.
A page defaults to 16KB, and the pointer data for segments and regions is small, so only part of the header information is needed to maintain it. Most of the remaining space is used to store part of the entity information of the sending area owned by the current tablespace.
Header: indicates the current page number, type, and tablespace. Footer: mainly used for data checking. Page body: this is the main part of the page used to store data.
The page body is divided into the table space header information area and the business data area. FSP HEADER: (1): tablespace information: corresponding space id, total number of tablespace pages, etc. (2): segment information: the linked list pointer of the page where the segment entity is filled with data, and the linked list pointer of the page where the segment entity is not filled with data (pointing to the page where the segment entity is not, but the page where the segment entity is located, 85 segment entities are stored in one page). (3): fragment area / cluster information: free fragment area / cluster (XDES entity itself, not the page where the XEDS entity is located) linked list pointer, underwritten fragment list pointer, full fragment list. These zone / cluster information does not belong to any segment, but belongs to tablespaces and is used to allocate space to the segment the next time it requests space.
In theory, a region / cluster will be completely allocated to a segment, but some regions / clusters will be directly assigned to the table space and used as fragment areas. To reduce waste, only some of the pages in these areas are assigned to a specified segment.
For example, when you proudly announce that you are going to write a masterpiece and ask your secretary to give you 500 pages of paper, the secretary will probably have seen through everything. On the one hand, he will respond to you, and on the other hand, he will only give you three pages of paper. because he thinks you probably can't hold back six words in seven days. Similarly, when innodb allocates space to a newly created segment, it does not allocate a zone / cluster at the beginning, but allocates 32 pages from the fragment area first. only when these 32 pages are used up, innodb considers this segment to be a big data segment and officially begins to allocate a complete zone / cluster to it.
Data section:
FSP HEADER points to segment linked lists and fragment linked lists, but these are just linked list pointers, and the real area information nodes are stored in the data area of the current page. A zone / cluster information entity is called a XDES Entry (eXtent DEScript); a page stores 256 XDES Entry.
The XDES Entry contains the segment id (if assigned to a segment), the next node pointer in the fragment linked list, and so on. It does not contain page information, because the area / cluster has a corresponding physical space, and the page in its space is the page owned, so it does not need to be specified in the entry.
Careful friends will find that although XDES Entry is a description zone / cluster, it does not specify the number or address of the zone / cluster, so which region / cluster in the physical space does it correspond to?
The zone / cluster itself does not have a number, but the zone / cluster, like a page, is allocated continuously from the first byte of the file. At the same time, the first page of the first area of every 256zone / cluster is the index page of the 256zone / cluster, namely XDES page.
XDES page has a page No, so you can calculate the address of this XDES page, that is, the address of all the zones / clusters of this page. The next 255regions / clusters have a corresponding XDES Entry stored in the XDES page. The offset of the position of these XDES Entry in this page is the offset of the next 255regions / clusters. From the current location of all XDES page regions / clusters and the corresponding offset, the physical location of a XDES Entry corresponding region / cluster can be calculated.
The FSP HDR page is like the cover page of a table space, which is the entry page of the entire table space.
XDES page: the XDES page is the abbreviation of the eXtent DEScript area / cluster description page, the page used to store the area / cluster information entity, that is, the page that stores the XDES Entry. It is exactly the same except that it is different from FSP HEADER in the FSP page. In essence, the home page is also a XDES page, but the home page is the first page of the entire table space, so it records the tablespace information part-time.
XDES Entry: a logical block that stores the information of the zone itself.
Because a page XDES can only store 256 entry, corresponding to 256 extents, logically one xdex page is needed to store the next series of 256 extents for every 256 extents.
INODE page: like the Entry information corresponding to the same area / cluster, the tablespace only points to the linked list of segment pages (non-segment entities) in various states, but does not store the segment information itself. An inode page is a page that stores the inode entry that describes the segment information.
An inode page stores 85 segment entities by default, and each entity points to the area / cluster linked list of different states corresponding to this segment: unused area / cluster linked list, full area / cluster linked list, underwritten area / cluster linked list.
Pages above the Index page are pages that store physical space usage and are used to manage areas / clusters and segments themselves. The index page is used to finally store business data. In innodb, table data is organized and stored through a clustered index, while leaf nodes are stored in one segment, and non-leaf nodes are stored in another segment, but eventually they are stored in pages of type Index.
The details of the index page are shown below:
The storage structure in the index page is shown below:
The business data on the page is an one-way linked list logically arranged in order. There are two virtual rows in the page, which represent the row with the lowest index value and the largest row in the entire page, that is, the first and last rows in the linked list, to define the scope of the linked list.
In addition, for index segments, there are about 16250B on a page to store user data. One line contains a 4-byte int type key, a 6-byte page number pointing to the leaf node, and about 6 bytes of row header, for a total of about 16 bytes. Then a page of rough calculations can store 16,250thers and 16 is about 1000. To optimize the query, the addresses of the first rows of these rows are stored in a 2-byte space called slot every 4-8 rows of data, and the slot together form an array called Page directory.
As shown in the figure: the last slot of the array stores the first row of infimum, the second slot stores row4, and the first slot stores the last row of data supremum. In this way, the page directory array is an ordered array, and we can quickly locate the data block through a binary search algorithm, and then traverse the block to find the final data that meets the requirements.
Note: because there is free space between the user line and the footer, and the number of slot is not fixed by the number of lines in the page, that is, the length of the page dirctory array is not fixed, so the slot is allocated by appending forward in reverse order.
Whole structure
The above is the structure and data information of different objects in the tablespace. Let's take a look at how the components are related from an overall point of view.
Microscopically, tablespace files are physically separated into pages of equal size and contiguous size.
Macroscopically, tablespace files are physically separated into flood-multiplied and continuous regions / clusters.
Page 0 of document 0 is called the FSP page, the front page, and can be assumed to be the cover page of the tablespace. It stores the linked list pointers of other components of the entire tablespace and is the entry page of the entire tablespace.
Logically, the FSP page points to different components through two lines. (1): point to segment information through FSP_SEG_INODES_FULL (full segment page linked list) and FSP_SEG_INODES_FREE (underwritten segment page linked list). The segment entity points to the region / cluster belonging to this paragraph through FSEG_FREE (idle region / cluster chain table), FSEG_FULL (full region / cluster) and FSEG_NOT_FULL (underwritten region / cluster). (2): point to regions / clusters that do not belong to any segment through FSP_FREE (idle zone / cluster chain table), FSP_FREE_FRAG (underwritten fragment / cluster), and FSP_FULL_FRAG (full fragment / cluster).
The first page of the first area / cluster of every 256 regions / clusters stores the management information of the 256 regions / clusters. Page 0 is called a FSP page because it is special, and the other pages are called XDES pages. With this page number and the Entry location offset stored in it, the location of the 256extents on disk can be easily calculated. So even if there is no record zone / cluster number or address in the XDES Entry, you can know which zone / cluster each Entry manages.
When inserting a piece of data into an index page, if the page is full, you need to apply for space from the area / cluster where the page is located. If the area / cluster is also full, apply to the segment. If the segment is also full, the tablespace will apply to the table space. The tablespace will apply to the disk for 3 regions / clusters through the operating system and add them to the FSP_FREE list in the FSP. Then it is allocated at one level and stored in its corresponding linked list.
Line / Row
All of the objects described above are designed to allocate a piece of physical space for storage to the business data, so that the business data can finally be recorded on the specified page. While innodb is stored on a row-based basis, let's take a simple look at the storage structure in line Compact format.
Each record contains a series of header information describing the storage status of the current record as shown in the figure. However, in addition to the header information, the data stored varies according to the node where the record is located.
The clustered cable leaf node stores the business rows in the table, including the transaction id, rollback segment pointer, and a hidden row_id when no primary key and unique index are specified.
The non-leaf node is searched for the B+ tree, so the minimum record value of the child node and the page number of the child node are recorded.
The relationship between B+ Tree Node and page
Innodb page is just a physical storage space, equivalent to a page of a book, just a carrier of data. B+ tree nodes are the logical structure of data, and they have no inevitable relationship in theory. You can store a complete Btree in a page page, or you can store a complete Btree together with multiple page pages, and even correspond the page page to the nodes in the B+ tree one by one.
In fact, for the sake of simplicity in Innodb, the B+ tree node corresponds to the page page one by one, and the following is a simple extension process.
Suppose you have a clustered index B+ tree that starts like this:
Insert 16, 17, and 18 rows of data into the B+ tree as follows (green part):
Continue to insert 19 rows of data into the B+ tree, and the original space has been fully expanded as follows (blue):
If the clustered index uses a self-incrementing primary key, the data is stored in each page as an append, and if the page is full, you only need to reallocate one page to continue to append.
If the clustered index uses unordered columns such as uuid, because B+ is a logically ordered collection, inserting data into the B+ tree is likely to be inserted into the already full page page, causing the original page to split. It moves first like inserting data into an array to make room for new data. Therefore, it is recommended to use ordered columns as a clustered index.
How to store a piece of data step by step
After a lot of hard work, I was finally able to insert a piece of data from beginning to end to explore how innodb stores the data into a file step by step. My sisters are probably fascinated by it and happy to think about it. I'm really a smart guy.
Stretch and look at your sisters sweetly.
Hey, where is it? Did I cross into the equal space?
Forget it, from the beginning to the end, I will tell it to myself. I can't fascinate others. I still don't believe that I can't charm myself.
Create table user in database world
CREATE TABLE user (id int (11) NOT NULL AUTO_INCREMENT, name varchar (10) DEFAULT NULL, age int (11) DEFAULT NULL, gender smallint (6) DEFAULT NULL, create_time date DEFAULT NULL, PRIMARY KEY (id) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Innodb stores table structure information in the tables and columns of the information_ schema library in the system tablespace
Create a tablespace and create a separate tablespace world/user and the corresponding data file world/user.ibd, and update it to information_schema.innodb_tablespaces
Synchronize the file information corresponding to the updated tablespace to information_schema.files
Page 0 of tablespace 0, that is, page 0 of world/user.ibd file, is specified as the cover page of tablespace.
Create a clustered index if you specify a primary key or unique index, create a clustered index with the specified column, otherwise create a clustered index with hidden column row_id and store it in information_schema.innodb_indexes
Create two segments for the index: the index segment (non-leaf node) and the data segment (leaf node), and store the segment information in the segment linked list of the tablespace cover page.
Create the first page of the index, namely Root Page, and record the segment information in the segment linked list of Root Page, so as to manage the segment information of this B+ tree. At the same time, record the Root PageNo in information_schema.innodb_indexes, as shown above. From the page to associate the logical table with the physical storage, this Root Page is equivalent to the cover of the index.
Insert data insert a piece of data into the table as follows
Insert into world.user (name,age,gender,create_time) values ('Mu Ye Xiaoxiao', 18jin1 now ())
Extract the database name and table name from sql, and find the table id from information_schema.innodb_tables
According to the table id, the Root Page No of the clustered index corresponding to the table found from information_schema.innodb_indexes is 4.
Calculate the physical address of the Root Page through Root Page No 4. According to the segment information specified in Root Page, insert index data into Root Page, insert data rows into the pages corresponding to the data segments, and associate two types of pages.
If there is not enough space on a page, the zone / cluster in which the current page is located will be calculated and applied for space, and the zone / cluster will query the free pages and allocate them according to the bitmap in the XDES Entry. If the zone / cluster also has no free space, the required space will be applied to the upper segment, table space, and operating system level by level.
The applied tablespaces are stored in their respective linked lists (for example, the applied tablespaces are stored in the corresponding FSP_FREE linked lists).
When the page is allocated or expanded, in order to ensure that it can be found through the Root Page No in innodb_indexes, the Root Page physical space and the Root node corresponding to the B+ tree remain unchanged, that is, the page number remains the same, and it is always the space with page number 4.
When the physical page corresponding to B+ is constantly changing, in order to ensure the balance of the tree, a new Root node will be generated. In order to keep the Root page unchanged, innodb will copy and exchange the new Root node data to the original Root Page page by way of exchange, which can ensure that the Root Page will never change, that is, the relationship between the table and the physical space will never be broken.
Summary
Tablespace is a logical structure in a database, which decouples the association of tables, indexes, etc., with files.
Segment is also a logical structure, which groups the data with the same logical meaning and storage structure, which is easy to manage.
An area is a physical storage structure that corresponds to the real physical space on a large disk. It starts with the first byte of the file and is divided into the same size and logically concatenates the extents through XDES Entry. The relationship between XDES Entry and the physical space area it manages can be calculated from the page on which XDES Entry is located and the intra-page offset.
A page is the smallest unit of physical storage for IO operations. It is also divided by the same size starting from the first byte of the file. A table organizes data by index, and the corresponding Root page No of the table is stored in the clustered index metadata. Pages are numbered and can be associated with physical space by numbering.
Segments and regions are all for managing the storage state of the space and allocating space for the page. The real query only needs to operate the data on the physical space of the whole table through the association of all levels of nodes in the Page No and B+ tree.
The row is the physical unit that ultimately stores the business data. The default page is 16K, which can store about 1000 rows of index data (non-leaf nodes), or 20 rows or more of business data (leaf nodes). Between pages through the B+ tree "binary search (assuming multi-division)" algorithm to quickly locate data, the page is through Page Directory, multiple rows are divided into a group, a group corresponding to a slot in the Page Directory ordered array, so that a "binary search" optimization can be carried out within the page.
To record the state of the row itself, a record innodb adds additional record header information. If it is a leaf node, additional fields such as row_id (hidden primary key), trx_id (transaction id), rollback pointer, and so on will be added.
The above is how to insert data into InnoDB, and the editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.