Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Analysis on the principle of oracle b-tree index search

2025-02-23 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

An index, like a table, is a kind of segment. The user's data is stored in it, which takes up disk space as well as tables. However, the form of data storage in the index is very different from that in the table. When you understand the index, you can imagine a book in which the content of the book is equivalent to the data in the table, and the catalog in front of the book is equivalent to the index of the table. At the same time, in general, the disk space occupied by the index is much smaller than that of the table, and its main function is to speed up the search of data.

However, as an optional data structure, you can choose to create an index for a table or not. This is because once the index is created, it means that when oracle DML the table (including INSERT, UPDATE, DELETE), it must handle the additional workload (that is, the maintenance of the index structure) and storage overhead. So when creating an index, you need to consider whether the improvement in query performance brought about by creating the index is worth compared to the additional overhead.

B-tree index is a typical tree structure, which is always balanced, that is, any path from Root node to Leaf node is equidistant. The main components it contains are:

Leaf node (Leaf node): contains entries that point directly to rows of data in the table.

Branch node: contains entries that point to other branch nodes or leaf nodes in the index.

Root node (Branch node): a B-tree index has only one root node, which is actually the branch node at the top of the tree.

For branch node blocks (including root node blocks), the index entries they contain are arranged in order (reverse can be specified, in reverse order). Each index entry (also known as each record) has two fields. The first field represents the minimum key value contained in the index block currently linked under the branch node block (according to the B+tree minimum key value copy principle); the second field is four bytes, indicating the address of the linked index block, which points to the next index block. The number of record rows that can be held in a branch node block is determined by the size of the data block and the length of the index key value.

For leaf node blocks, the index entries are arranged in the same order as branch nodes (the default is ascending order, or it can be specified as descending order when creating the index). Each index entry (also known as each record) also has two fields. The first field represents the key value of the index, which is a value for a single-column index, and multiple values for a multi-column index. The second field represents the ROWID of the record row corresponding to the key value, and the ROWID is the physical address of the record row in the table. In the leaf node, each index entry takes up a row of space in the data block. Each line uses 2 to 3 bytes as the header, which is used to store information such as tags and lock types. At the same time, in the first field that represents the key value of the index, each index column has 1 byte to represent the data length, followed by the specific value of the column.

Next, dump the index structure of the branch node and the index information of the leaf node.

Create test data

[sql]

Sys@ORCL > select * from v$version where rownum=1

BANNER

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0-Prod

Sys@ORCL > drop table tt purge

Drop table tt purge

*

ERROR at line 1:

ORA-00942: table or view does not exist

Sys@ORCL > create table tt as select * from dba_objects

Table created.

Sys@ORCL > select count (*) from tt

COUNT (*)

-

50356

Sys@ORCL > insert into tt select * from tt

50356 rows created.

Sys@ORCL > commit

Commit complete.

Sys@ORCL > select count (*) from tt

COUNT (*)

-

100712

Sys@ORCL > create index btree_tt on tt (object_name)

Index created.

Check the Blevel, height (blevel: depth of the node) of the index. Root is on layer 0, and so on. Height=blevel+1)

[sql]

Sys@ORCL > select index_name,blevel from dba_indexes where index_name='BTREE_TT'

INDEX_NAME BLEVEL

BTREE_TT 2

Sys@ORCL > analyze index btree_tt validate structure

Index analyzed. Sys@ORCL > select name,height from index_stats where name='BTREE_TT'

NAME HEIGHT

BTREE_TT 3

Get the object number of btree_tt and dump the index structure

[sql]

Sys@ORCL > select object_id from dba_objects where owner='SYS' and object_name='BTREE_TT'

OBJECT_ID

-

52614

Sys@ORCL > oradebug setmypid

Statement processed.

Sys@ORCL > alter session set events' immediate trace name treedump level 52614'

Session altered.

Sys@ORCL > oradebug tracefile_name

/ u01/app/oracle/admin/orcl/udump/orcl_ora_5234.trc

View the treedump trc file

[sql]

-begin tree dump

Branch: 0x40efaa 4255658 (0: nrow: 2, level: 2)

Branch: 0x40f603 4257283 (- 1: nrow: 247, level: 1)

Leaf: 0x40efab 4255659 (- 1: nrow: 182 rrow: 182)

Leaf: 0x40efac 4255660 (0: nrow: 182 rrow: 182)

Leaf: 0x40efad 4255661 (1: nrow: 186 rrow: 186)

Leaf: 0x40efae 4255662 (2: nrow: 189 rrow: 189)

Leaf: 0x40efaf 4255663 (3: nrow: 186 rrow: 186)

Leaf: 0x40efb0 4255664 (4: nrow: 190 rrow: 190)

Leaf: 0x40efb1 4255665 (5: nrow: 185 rrow: 185)

Leaf: 0x40efb2 4255666 (6: nrow: 179 rrow: 179)

Leaf: 0x40efb3 4255667 (7: nrow: 187 rrow: 187)

Leaf: 0x40efb4 4255668 (8: nrow: 181rrow: 181)

....

....

Branch: 0x40f6fb 4257531 (0: nrow: 248, level: 1)

Leaf: 0x40f602 4257282 (- 1: nrow: 228 rrow: 228)

Leaf: 0x40f604 4257284 (0: nrow: 226 rrow: 226)

Leaf: 0x40f605 4257285 (1: nrow: 224 rrow: 224)

Leaf: 0x40f606 4257286 (2: nrow: 223 rrow: 223)

Leaf: 0x40f607 4257287 (3: nrow: 217 rrow: 217)

Leaf: 0x40f608 4257288 (4: nrow: 253 rrow: 253)

Leaf: 0x40f609 4257289 (5: nrow: 232 rrow: 232)

....

....

Leaf: 0x40f6f8 4257528 (244: nrow: 191 rrow: 191)

Leaf: 0x40f6f9 4257529 (245: nrow: 181 rrow: 181)

Leaf: 0x40f6fa 4257530 (nrow: 99 rrow: 99)

-end tree dump

Interpret trc files

The first column of each row indicates: node type, branch is the branch node (including the root node), and leaf is the leaf node

The second column indicates: node address, hexadecimal

The third column indicates: node address, decimal

The fourth column indicates that the position relative to the previous node: the root node starts from 0, and the other branch nodes and leaf nodes start from 1.

The fifth column indicates: (nrow) the number of index entries contained in the current node (including entries for delete)

The sixth column says: (level) the level of the branch node. In the index of oracle, the level number is reversed, that is, if an index has N layer, the level number of the root node is N, while the level number of the branch node of the next layer under the root node is Nchor 1.

The seventh column indicates: (rrow) the number of valid index entries, because if the index entry is deleted, it will not be immediately purged out of the index block. So the number of nrow minus rrow indicates the number of index entries that have been deleted

The above method dumps the entire index in a tree form. At the same time, we can dump an index node to see what is stored in it.

The index block contents of the root node are dumped below.

As can be seen from the trc file: root node branch: 0x40efaa 4255658 (0: nrow: 2, level: 2)

[sql]

Sys@ORCL > select dbms_utility.data_block_address_file (4255658) fno

2 dbms_utility.data_block_address_block (4255658) bno

3 from dual

FNO BNO

--

1 61354

Sys@ORCL > alter system dump datafile 1 block 61354

System altered.

Sys@ORCL > oradebug tracefile_name

/ u01/app/oracle/admin/orcl/udump/orcl_ora_5234.trc

View the trc content of the root node

[sql]

Header address 230057028=0xdb66444

Kdxcolev 2

KDXCOLEV Flags =-

Kdxcolok 0

Kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y

Kdxconco 2

Kdxcosdc 0

Kdxconro 1

Kdxcofbo 30=0x1e

Kdxcofeo 8026=0x1f5a

Kdxcoavs 7996

Kdxbrlmc 4257283=0x40f603

Kdxbrsno 0

Kdxbrbksz 8056

Kdxbr2urrc 0

Row#0 [8026] dba: 4257531=0x40f6fb

Col 0; len 18; (18): 41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45

Col 1; len 6; (6): 00 40 ee c5 00 2c

-end of branch block dump-

Kdxcolev said: index level number. In our example, the level of the root node is 2 and the leaf should be 0.

Kdxcolok indicates whether there are DML active transactions on the index

Kdxconco indicates the number of columns in the index entry

Kdxcosdc said: the number of index structure changes, when you modify an index key value, the value is increased by 1

Kdxconro indicates the number of index entries in the current Inode

Kdxcofbo indicates that the current Inode starts recording from the number of bytes.

Kdxcofeo indicates which byte is the end of the available space of the current index node.

Kdxcoavs indicates the total amount of free space available on the current Inode. That is, the value of kdxcofeo-kdxcofbo

Kdxbrlmc indicates the location of the branch node

Kdxbrsno: the last modified index entry number, which is 0 here, indicates that it is a new index.

Kdxbrbksz said: available block size, from this we can know that even if pctfree is 0, for 8k blocks, we can not completely use up

[sql]

Row#0 [8026] dba: 4257531=0x40f6fb

Col 0; len 18; (18): 41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45

Col 1; len 6; (6): 00 40 ee c5 00 2c

This part of the content is the index entry recorded in the root node, a total of 1 line (in the definition of B+ tree, if you follow the principle of minimum key replication, then m sub-trees in each non-leaf node in the tree must have an m key). Each index entry points to a branch node, where col 1 represents the address of the linked branch node, and if there are no other branch nodes under the root node, col 1 is TERM;col 0 indicating the minimum key value linked by the branch node. Note that here col 0; len 18; (18):-- the row number of the column, starting with 0, followed by the length of the column and the value of the column, then this value is called separator key, and this separator key can distinguish between the real index values, so we also know that branch block does not store complete index values, as long as it can be distinguished. That is, Oracle records only the prefixes of index key values in Branch block, not all values, because this saves space and allows more index entries to be stored. At the same time, we can understand why the query uses like'% xxx' instead of Btree indexes, because Branch block stores prefixes.

The contents of the leaf node block are dumped below

Choose any leaf: leaf: 0x40f6fa 4257530 (nrow: 99 rrow: 99)

[sql]

Sys@ORCL > select dbms_utility.data_block_address_file (4257530) fno

2 dbms_utility.data_block_address_block (4257530) bno

3 from dual

FNO BNO

--

1 63226

Sys@ORCL > oradebug setmypid

Statement processed.

Sys@ORCL > alter system dump datafile 1 block 63226

Sys@ORCL > oradebug tracefile_name

/ u01/app/oracle/admin/orcl/udump/orcl_ora_6177.trc

Some of the contents of the leaf node are summarized as follows:

[sql]

Block header dump: 0x0040f6fa

Object id on Block? Y

Seg/obj: 0xcd86 csc: 0x00.a3506 itc: 2 flg:-typ: 2-INDEX

Fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0000.000.00000000 0x00000000.0000.00-0 fsc 0x0000.00000000

0x02 0xffff.000.00000000 0x00000000.0000.00 Cmurmuri-0 scn 0x0000.000a3506

Leaf block dump

=

Header address 221234268=0xd2fc45c

Kdxcolev 0

KDXCOLEV Flags =-

Kdxcolok 0

Kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y

Kdxconco 2

Kdxcosdc 0

Kdxconro 99

Kdxcofbo 234=0xea

Kdxcofeo 4692=0x1254

Kdxcoavs 4458

Kdxlespl 0

Kdxlende 0

Kdxlenxt 0=0x0

Kdxleprv 4257529=0x40f6f9

Kdxledsz 0

Kdxlebksz 8032

Row#0 [7992] flag: -, lock: 0, len=40

Col 0; len 30; (30):

73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 53 77 69 74 63 68 53 74 61 74

65 6d 65 6e 74

Col 1; len 6; (6): 00 40 f3 25 000a

Row#1 [7953] flag: -, lock: 0, len=39

Col 0; len 29; (29):

73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 73 45 78 70 72 65 73

73 69 6f 6e

Col 1; len 6; (6): 00 40 f 0 74 00 31

Row#2 [7914] flag: -, lock: 0, len=39

Col 0; len 29; (29):

73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 73 45 78 70 72 65 73

73 69 6f 6e

Col 1; len 6; (6): 00 40 f 0 74 00 32

..

..

Row#97 [4727] flag: -, lock: 0, len=35

Col 0; len 25; (25):

79 43 62 43 72 53 75 62 53 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54

Col 1; len 6; (6): 00 40 F1 f 1 000c

Row#98 [4692] flag: -, lock: 0, len=35

Col 0; len 25; (25):

79 43 62 43 72 53 75 62 53 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54

Col 1; len 6; (6): 00 40 f4 a2 00 10

-end of leaf block dump-

Values that are different from branch nodes are resolved as follows:

Kdxlespl indicates the number of uncommitted transactions when the leaf node is split

Kdxlende indicates the number of index entries deleted

Kdxlenxt: the address of the next leaf node of the current leaf node

Kdxlprv: the address of the last leaf node of the current leaf node

Kdxledsz: deleted space

The next part of the dump file is the index entry section. Lock: 0 indicates that the lock information 0 in the ITL indicates that it is not locked; len: indicates the length of the index value; and flag represents the tag, such as the delete tag. Col represents the column number, starting at 0, then comes the key value of the index and the last three parts of the rowid (relative file number, block number, line number): col 0 is the key value, and col 1 is rowid.

In other words, the Leaf node mainly stores the complete index key value and the partial rowid of the relevant index key value (this rowid removes the data object number part), while the leaf node also stores two pointers (DBA), which point to the previous leaf node and the next leaf node respectively. So the leaf node is the structure of the two-way linked list. From the previous description of the architecture of the B-tree index, we can know that it is a tree-like three-dimensional structure. But the arrangement corresponding to the data file is, of course, a flat form, as shown below. Therefore, when oracle needs to access an index block, it is bound to jump over the structure.

/ root / branch / branch / leaf / … / leaves / branches / leaves / leaves / … / leaves / branches / leaves / leaves / … / leaves / branches /.

When oracle needs to get an index block, it starts from the root node, according to the key value to find, so that it knows the branch node of the next layer, and then visits the branch node of the next layer, and again accesses the branch node of the next layer according to the key value, and finally accesses the lowest leaf node. As you can see, when it gets the physical Iamp O block, it is carried out one after another, sequentially. In the process of getting the final physical block, we can't read multiple blocks at the same time, because we don't know which block to access next without getting the current block. Therefore, when a data block is accessed on an index, it corresponds to a db file sequential read wait event, which is rooted in that we jump from one index block to another in order to find the final index block.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report