In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-31 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
For this article, refer to realkid4's blog:
Http://blog.itpub.net/17203031/viewspace-681311/
For more information about B-Tree, Bitmap, and functional indexing, please refer to another blog post:
Summary of use cases of B-Tree, Bitmap and functional indexing in Oracle
Generally speaking, the data columns selected by the index are best with high dispersion and good selectivity. From the point of view of the index tree structure, the column values are distributed in the position of the leaf node. In this way, the number of leaf nodes obtained by tree structure search is more efficient.
In practice, we often encounter the skew of data column values. That is, the value of the entire column data is limited. However, most of the data values are concentrated in a few values, and the proportion of other values is very small. For example, a data column has values of "N", "B", "M", "P" and "Q", of which 55% of the data rows are "N", 40% of the data rows are "B", and the remaining values are distributed in 5% of the data rows. There are some problems with indexing data columns of this structure.
First of all, the default database is to index all column values (non-null). This means that both high-frequency and low-frequency values will appear on the leaf nodes of the index structure. Of course, most of these leaf nodes are these duplicate values.
Secondly, under the action of CBO (cost-based optimizer), the search for high frequency values generally does not choose the index as the search path, because full table scanning may be more efficient. We have indexed the data columns, but queries for high-frequency words never go to the index path (as the following experiment will prove).
Finally, the established index consumes a lot of space and time. The established index covers all values, and the space occupied by massive data tables is bound to be large. At the same time, in the low-frequency data query, although the index path will be taken, but the logical and physical reading also has some loss.
Let's introduce a solution
Idea: since the high frequency value will not go to the index path when querying, you can consider removing it from the process of building the index and establishing the index structure only for those low frequency data values. In this way, the index tree structure is relatively small, and the efficiency of index query can be improved.
The specific method is to use the decode function. Decode (a ~ ~ b ~ c ~ ~ d ~ e … F) meaning: if aquib, return c, equal to d, return e, and finally return f if there is no match. For the above example, you can use decode (column name, 'column name, null,'B',null, column name), which means that if the column value is N or B, set it directly to null, otherwise the column value will be returned, and the function index will be established.
This is done with the help of two functions of Oracle: 1, do not generate an index on the null value; 2, functional index
The following experiments are used to prove this method:
1. Construct the experimental test environment.
-- create a test data table tb_wjq
SEIANG@seiang11g > create table tb_wjq as select * from dba_objects where owner in ('SEIANG','PUBLIC','HR','SYSMAN','XDB','BI','SYS')
Table created.
-- use scripts to insert large amounts of data
Begin
For i in 1..8 loop
Insert / * + append * / into tb_wjq select * from tb_wjq
Commit
End loop
End
/
SEIANG@seiang11g > select count (*) from tb_wjq
COUNT (*)
-
9804160
Elapsed: 00:00:01.54
The distribution of the amount of data used in the experiment
SEIANG@seiang11g > select owner,count (*) from tb_wjq group by owner
OWNER COUNT (*)
SEIANG 3072
PUBLIC 4352256
HR 4352
SYSMAN 454912
XDB 149760
BI 1024
SYS 4838784
7 rows selected.
Elapsed: 00:00:02.44
It can be seen that most of the more than 90, 000 pieces of data are concentrated on PUBLIC, SYSMAN and SYS, and the frequency of other data is relatively small. The tilting trend of the data is obvious.
2. Build an index
-- build general and functional indexes on owner columns, respectively.
SEIANG@seiang11g > create index idx_tb_wjq_owner_normal on tb_wjq (owner)
Index created.
Elapsed: 00:00:24.72
SEIANG@seiang11g > create index func_idx_tb_wjq_owner on tb_wjq (decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner))
Index created.
Elapsed: 00:00:12.34
Index func_idx_tb_wjq_owner converts PUBLIC, SYSMAN, Sys values into null, excluding the process of indexing. It can be seen not only from the time taken to create the index above, but also from the index segment information below, the space difference between the two indexes is relatively large, which also proves this point.
SEIANG@seiang11g > select owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from dba_segments where segment_name='IDX_TB_WJQ_OWNER_NORMAL'
OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS EXTENTS
SEIANG IDX_TB_WJQ_OWNER_NORMAL INDEX 184 23552 94
Elapsed: 00:00:00.01
SEIANG@seiang11g > select owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from dba_segments where segment_name='FUNC_IDX_TB_WJQ_OWNER'
OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS EXTENTS
SEIANG FUNC_IDX_TB_WJQ_OWNER INDEX 3 384 18
As can be seen from above, a data column is also indexed. The common index type IDX_TB_WJQ_OWNER_NORMAL occupies 94 extents and 23552 blocks, accounting for about 184m of space. However, the space of the functional index FUNC_IDX_TB_WJQ_OWNER only uses the initial allocation of 18 regions, 384 data blocks, accounting for about 3m of space. As a result, the space advantage is intuitively reflected!
-- collecting statistical data. Because of the experimental nature and large amount of data, high sampling rate is used to collect statistical information.
SEIANG@seiang11g > exec dbms_stats.gather_table_stats ('SEIANG',' TB_WJQ', cascade = > true, estimate_percent = > 100pr methodological opt = > 'for all indexed columns')
PL/SQL procedure successfully completed.
Elapsed: 00:00:49.67
3. Analysis of retrieval efficiency
The XDB value of 149760 of owner data is analyzed.
-- Direct index search:
SEIANG@seiang11g > select * from tb_wjq where owner='XDB'
149760 rows selected.
Elapsed: 00:00:01.89
Execution Plan
Plan hash value: 3735191644
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 149k | 13m | 4708 (1) | 00:00:57 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 149k | 13m | 4708 (1) | 00:00:57 |
| | * 2 | INDEX RANGE SCAN | IDX_TB_WJQ_OWNER_NORMAL | 149k | | 348 (1) | 00:00:05 |
-
Predicate Information (identified by operation id):
2-access ("OWNER" = 'XDB')
Statistics
8 recursive calls
0 db block gets
26000 consistent gets
6349 physical reads
0 redo size
16482673 bytes sent via SQL*Net to client
110336 bytes received via SQL*Net from client
9985 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
149760 rows processed
It is found that when BI is used as the search value, the index search is carried out. The following is a comparison with a functional index search.
SEIANG@seiang11g > select * from tb_wjq where decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner) = 'XDB'
149760 rows selected.
Elapsed: 00:00:01.54
Execution Plan
Plan hash value: 3652333940
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 149k | 13m | 7091 (1) | 00:01:26 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 149k | 13m | 7091 (1) | 00:01:26 |
| | * 2 | INDEX RANGE SCAN | FUNC_IDX_TB_WJQ_OWNER | 149k | | 316 (1) | 00:00:04 |
-
Predicate Information (identified by operation id):
2-access (DECODE ("OWNER", 'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL, "OWNER") =' XDB')
Statistics
0 recursive calls
0 db block gets
25998 consistent gets
0 physical reads
0 redo size
9017261 bytes sent via SQL*Net to client
110336 bytes received via SQL*Net from client
9985 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
149760 rows processed
After comparison, we can find that the method of using functional index has some differences in execution time, physical logic reading, and CPU usage.
General index
Functional index
Execution time
00: 00: 01.89
00: 00: 01.54
CPU usage
4708
7091
Consistent gets
26000
25998
Physical reads
6349
0
Conclusion: the functional index processing bias method is used to optimize the query efficiency and index structure in a certain length. The data in the above table shows that the consumption of logical physical reads is greatly reduced (the index structure is simplified), and the execution time is reduced in turn. Because the use of functions is calculated, CPU usage is relatively high, within an acceptable range.
However, this method has some limitations and must be carefully planned before application.
First of all, the data table data should be large. Because after all, the establishment and search of the functional index consumes more than the ordinary index, if the data table is small, the degree of optimization can not make up for the cost of consumption, the loss may outweigh the gain. In a series of experiments, the author also found that this performance advantage can not be highlighted when the amount of data is medium to small.
Secondly, the tilt trend of column value is obvious. Through the discussion at the beginning, it is not difficult to find that the higher the tilt of the column value, the larger the amount of data removed by the functional index, and the smaller and more optimized the generated index tree structure. This is the core of this method!
Finally, when using the functional index to search, the higher the frequency of the search, the better the optimization effect. In this case, the column with the value XDB has 149760 rows, and you can see the obvious performance optimization. However, when we select the BI value with 1024 pieces of data, this optimization trend can be seen, but the degree is significantly reduced (the experimental results are shown below). The reason here may be that the amount of data is small, and the difference between the logical and physical readings of the two methods is reduced.
-- Direct index
SEIANG@seiang11g > select * from tb_wjq where owner='BI'
1024 rows selected.
Elapsed: 00:00:00.03
Execution Plan
Plan hash value: 3735191644
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 1024 | 98K | 35 (0) | 00:00:01 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 1024 | 98K | 35 (0) | 00:00:01 |
| | * 2 | INDEX RANGE SCAN | IDX_TB_WJQ_OWNER_NORMAL | 1024 | | 5 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
2-access ("OWNER" = 'BI')
Statistics
0 recursive calls
0 db block gets
280 consistent gets
151 physical reads
0 redo size
98579 bytes sent via SQL*Net to client
1271 bytes received via SQL*Net from client
70 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1024 rows processed
-- functional index
SEIANG@seiang11g > select * from tb_wjq where decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner) = 'BI'
1024 rows selected.
Elapsed: 00:00:00.01
Execution Plan
Plan hash value: 3652333940
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 1024 | 98K | 50 (0) | 00:00:01 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 1024 | 98K | 50 (0) | 00:00:01 |
| | * 2 | INDEX RANGE SCAN | FUNC_IDX_TB_WJQ_OWNER | 1024 | | 3 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
2-access (DECODE ("OWNER", 'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL, "OWNER") =' BI')
Statistics
0 recursive calls
0 db block gets
279 consistent gets
3 physical reads
0 redo size
33969 bytes sent via SQL*Net to client
1271 bytes received via SQL*Net from client
70 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1024 rows processed
However, if you are querying SYS, PUBLIC, or SYSMAN, all indexes are skipped and the diameter is scanned throughout the table.
SEIANG@seiang11g > select * from tb_wjq where owner='SYS'
4838784 rows selected.
Elapsed: 00:00:45.85
Execution Plan
Plan hash value: 1501781665
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 4838K | 452m | 39893 (1) | 00:07:59 |
| | * 1 | TABLE ACCESS FULL | TB_WJQ | 4838K | 452m | 39893 (1) | 00:07:59 |
Predicate Information (identified by operation id):
1-filter ("OWNER" = 'SYS')
Statistics
8 recursive calls
0 db block gets
457638 consistent gets
139684 physical reads
0 redo size
255169095 bytes sent via SQL*Net to client
3548958 bytes received via SQL*Net from client
322587 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4838784 rows processed
Author: SEian.G (hard practice changes in 72, but it is difficult to laugh at 81)
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.