How to Optimize an Oracle Database
This article explains how to optimize an Oracle database. The methods introduced here are simple, fast, and practical; let's walk through them.
Contents
One: The essence of SQL optimization
Two: The SQL optimization roadmap
2.1 Set SQL optimization goals
2.2 Check the execution plan
2.3 Check statistics
2.4 Check for efficient access structures
2.5 Check the parameters that affect the optimizer
2.6 Problems with how the SQL statement is written
2.7 Poor execution plans due to SQL optimizer limitations
Three: SQL optimization case
Four: Obtaining SQL execution plans
4.1 How to obtain an accurate execution plan
4.2 Understanding the execution order of an execution plan
One: The essence of SQL optimization
In general, there are many ways to make SQL run faster, such as improving index efficiency or using parallel query. All of them can be summarized by one formula:

execution time T = S / V

Execution time is directly proportional to the total amount of resources (S) needed to complete the SQL, and inversely proportional to the amount of resources (V) that can be accessed per unit of time. The larger S is, the lower the efficiency; the larger V is, the higher the efficiency. Parallel query, for example, increases the amount of resources accessed per unit of time.
Of course, this considers only execution time. SQL optimization must not merely reduce execution time; it must strike a balance between resource usage and response time. Otherwise, blind parallelism may not improve efficiency at all, but simply exhaust system resources.
In short, the essence of SQL optimization is: 1. shorten response time; 2. improve system throughput; 3. improve system load capacity. We must use a variety of means to balance raising system throughput and load capacity against improving the efficiency of individual SQL statements, and to minimize the total amount of resources a SQL needs to access; for example, if an index works better, do not use a full table scan.
Two: The SQL optimization roadmap
The overall SQL optimization roadmap (figure omitted) proceeds through the following steps:
2.1 Set SQL optimization goals
Obtain the SQL to be optimized and establish optimization goals. Problem SQL can be discovered proactively from AWR, ASH, ORA tools, and so on, or the DBA may step in after users report performance problems. Based on an understanding of how the SQL executes, set an initial optimization goal.
2.2 Check the execution plan
Tools include EXPLAIN PLAN, SQL*Plus AUTOTRACE, DBMS_XPLAN, events 10046 and 10053, awrsqrpt.sql, and so on. The execution plan is the core of SQL optimization: no plan, no optimization. There are many ways to obtain an execution plan, each with its own tricks and differences.
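As a quick illustration (emp_a is a hypothetical table), the classic EXPLAIN PLAN workflow in SQL*Plus looks like this; note that it produces an estimated plan only:

EXPLAIN PLAN FOR
  SELECT * FROM emp_a WHERE department_id = 10;   -- statement is parsed, not executed

SELECT * FROM table(DBMS_XPLAN.DISPLAY);          -- show the estimated plan from PLAN_TABLE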
2.3 Check statistics
Oracle manages statistics with the DBMS_STATS package, covering system statistics and statistics on tables, columns, indexes, partitions, and other objects. Statistics are what allow SQL to get a correct execution plan: the Oracle CBO uses them to choose access paths and join methods, so accurate statistics are the first precondition for producing a correct plan.
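For example (schema and table names hypothetical), gathering statistics and checking what the optimizer currently sees might look like this:

-- gather table, column and index statistics
exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP_A', cascade => TRUE);

-- row count and last analysis date as seen by the optimizer
SELECT num_rows, last_analyzed
  FROM dba_tables
 WHERE owner = 'SCOTT' AND table_name = 'EMP_A';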
Looking at how a plan is generated (figure omitted), a SQL statement goes through several steps before an execution plan is produced. In my view, (1) correct query transformation and (2) accurate statistics are the key guarantees of a correct execution plan. Of course, bugs and optimizer limitations can also leave SQL inefficient and unable to get a correct plan.
2.4 Check for efficient access structures
Important access structures, such as indexes and partitions, can quickly improve SQL execution efficiency. How the data itself is stored also matters: excessive fragmentation, severe data skew, and widely scattered data all affect efficiency.
2.5 Check the parameters that affect the optimizer
Parameters such as optimizer_index_cost_adj, optimizer_dynamic_sampling, _optimizer_mjc_enabled, _optimizer_cost_based_transformation, hash_join_enabled, and so on, strongly influence SQL execution plans. For example, we sometimes improve efficiency by disabling _optimizer_mjc_enabled so that plans do not use merge join Cartesian products; this parameter causes enough problems that production databases are generally required to disable it.
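A minimal sketch of disabling that feature (an underscore parameter, hence the double quotes; such changes should be tested and ideally confirmed with Oracle Support first):

-- stop the optimizer from considering merge join Cartesian
ALTER SYSTEM SET "_optimizer_mjc_enabled" = FALSE SCOPE = BOTH;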
What else can affect execution plans? New features. Every release introduces new features with good intentions, but in practice they may trigger bugs: for example, 11g's ACS (Adaptive Cursor Sharing), automatic serial direct path reads, extended statistics, the SQL query result cache, and so on. Some new features cause problems, so use them with caution.
For example, 11g Adaptive Cursor Sharing was introduced to generate different execution plans for different bind values on skewed data. The purpose of bind variables is to share one plan, but with skewed data some values want an index and others want a full table scan, which conflicts with sharing. In the past this could be addressed with cursor_sharing=similar, but that setting had many bugs and could drive the version count too high; alternatively, one could write separate SQL statements for different values (if the values were few). Neither is a good solution, so 11g ACS was introduced to let Oracle handle it. But things do not always go as hoped: when you encounter an execution plan that flips between fast and slow, one thing to check is whether ACS is involved and whether it should be turned off:
ALTER SYSTEM SET "_optimizer_extended_cursor_sharing_rel" = 'NONE';
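To check whether ACS is at work for a statement, the bind-sensitivity flags in V$SQL can be inspected (substitute the SQL_ID of interest):

SELECT sql_id, child_number, is_bind_sensitive, is_bind_aware, executions
  FROM v$sql
 WHERE sql_id = '&sql_id';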
2.6 Problems with how the SQL statement is written
Complex SQL statements and unreasonable constructs, such as using UNION where UNION ALL would do, can lead to poor performance. It is wrong to assume the Oracle optimizer is so powerful that we can write SQL any way we like. SQL is a programming language, and there are general rules for making it execute quickly: follow the characteristics of the language and simplify the statements, and you can write good programs. If a SQL statement is badly written, we need to rewrite it, adjust the business logic, change the design, and so on.
2.7 Poor execution plans due to SQL optimizer limitations
This one matters. The statistics are accurate, the SQL is not complex, and the indexes exist; everything is satisfied, yet the SQL still performs badly. Then you have to consider optimizer limitations. Here is a common one: when a semi join is combined with OR, that is, EXISTS (subquery) OR ..., or IN (subquery) OR ..., and the execution plan contains a FILTER operator because of the OR, be aware that the slowness may be related to the OR. In that case we have to rewrite the SQL, typically with UNION or UNION ALL, as shown in the sketch below.
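A minimal schematic of that rewrite, with hypothetical tables t1 and t2 (the full production case appears in section three):

-- problem form: the OR blocks subquery unnesting, so the plan may show FILTER
SELECT *
  FROM t1 a
 WHERE EXISTS (SELECT 1 FROM t2 b WHERE b.id = a.id)
    OR a.flag = 'X';

-- rewritten: each branch can be unnested into a semi join
-- (UNION deduplicates; use UNION ALL if the branches cannot overlap)
SELECT *
  FROM t1 a
 WHERE EXISTS (SELECT 1 FROM t2 b WHERE b.id = a.id)
UNION
SELECT *
  FROM t1 a
 WHERE a.flag = 'X';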
OK, after all of the checks above, the system is still poor, the function is still slow, or there is simply nothing left to gain from the SQL itself. What then? Optimize the design; this is the last resort. Some things cannot be solved without design changes. For example, at the business peak the CPU is already stretched by the SQL that must run, a resource-hungry new business feature is launched on top of it, and none of the other SQL can be tuned further. Then the only option is to optimize the design, for instance by scheduling the resource-consuming work into a different time window.
The points above are the areas we need to consider in optimization, and they can be checked step by step. Of course, 80 to 90 percent of pure SQL performance tuning can be solved by adding indexes, collecting correct statistics, and rewriting around optimizer limitations.
Three: SQL optimization case
Look at the first step: obtain the SQL to be optimized. When optimizing proactively, we generally find poorly performing SQL from AWR, ASH, and similar sources, and then optimize it.
In this case, SQL accounting for 72% of CPU comes from the same module, and the first entry is a stored procedure. Comparing the other SQL statements with that first entry, mainly by EXECUTIONS, we can basically conclude that they belong to the stored procedure; this can also be confirmed with the business side. The execution frequency of these SQL statements is very high, because this is a marketing application, so they are the ones to optimize.
Each of these SQL statements takes only a bit over 1,000 buffer gets per execution, so individually they are quite efficient; but because they execute so frequently, they consume a lot of resources in total. So we have to ask: can they be made better?
Take the first SQL, SQL_ID 58q183atbusat, as an example:
SELECT B.ACT_ID,
       B.ACT_NAME,
       B.TASK_ID,
       B.MKT_DICTION,
       B.CUST_GROUP_ID,
       NVL(B.ATTEST_FLAG, 'N'),
       NVL(B.DOUWIN_FLAG, 'N'),
       B.CHN_DESC,
       NVL(B.SIGN_FLAG, 'N'),
       B.MAX_EXECUTE_NUM
  FROM (SELECT DISTINCT ACT_ID
          FROM MK_RULECHN_REL
         WHERE CHN_STATUS = '04'
           AND CHN_TYPE = :B1) A,
       TABLE(CAST(:B2 AS TYPE_MK_ACTIONINFO_TABLE)) B
 WHERE A.ACT_ID = B.ACT_ID
The SQL is actually very simple: an inline view A joined to a TABLE function B. The TABLE function, if you are not familiar with it, converts a collection into a table; it is a PL/SQL construct. The collection part here is the TABLE function, and the execution plan showed the join as a HASH JOIN with a full table scan.
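If the TABLE function is new to you, here is a minimal standalone sketch (type name hypothetical): it turns a collection into a row source that can be queried like a table:

-- a SQL-level nested table type
CREATE OR REPLACE TYPE num_tab IS TABLE OF NUMBER;
/
-- query a collection instance as if it were a table
SELECT COLUMN_VALUE FROM TABLE(num_tab(10, 20, 30));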
Checking step by step turns up no obvious problem, but we know the issue may be the HASH JOIN with a full table scan: would NESTED LOOPS plus an index be better? Obviously, we should check roughly how many rows the TABLE function returns.
After confirmation: at most 200-300 rows are returned, and the final result set is only a few hundred rows.
So the guess is that the problem lies with the TABLE function: in the execution plan above, the ROWS column for the TABLE function step is empty, and the optimizer chose a HASH JOIN.
Checking the TABLE function on its own: the optimizer estimates 8,168 rows, and returning more than 8,000 rows is enough to steer the plan toward a HASH JOIN. In reality it returns at most 200-300 rows:
So the rows returned by each step are an important factor in the choice of join method. You can Google it: 8,168 is a fixed default estimate for TABLE functions, derived from block_size = 8K; it is fair to call this an Oracle limitation.
As long as you use a TABLE function, the optimizer will prefer a HASH JOIN.
If you are interested, see http://www.oracle-developer.net/display.php?id=427 for more on this.
There are several solutions. Since we want NESTED LOOPS plus an index, and 8,168 is far too large, we need to let the optimizer know that the TABLE function returns few rows, only about 100.
Any of the following can accomplish that (hints such as use_nl are another option); a sketch follows the list:
CARDINALITY hint (9i+), undocumented
OPT_ESTIMATE hint (10g+), undocumented
DYNAMIC_SAMPLING hint (11.1.0.7+)
Extensible Optimiser (10g+)
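For instance, a minimal sketch of the CARDINALITY hint (undocumented, so behavior should be verified on your version); the bind and type come from the case above:

SELECT /*+ CARDINALITY(b 100) */ b.act_id
  FROM TABLE(CAST(:b2 AS type_mk_actioninfo_table)) b;
-- the hint tells the optimizer to assume ~100 rows from the collection
-- instead of the 8,168 default, nudging it toward NESTED LOOPS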
Because the SELECT list only accesses B, which comes entirely from the TABLE function, the join to A can be rewritten as a subquery. With a subquery the DISTINCT becomes unnecessary, because it is a semi join.
Finally, the rewrite uses a CARDINALITY hint to tell the optimizer that B returns only about 100 rows, pushing it to NESTED LOOPS plus the index, and the problem is solved.
The original SQL is the statement shown above (plan screenshot omitted).
The modified SQL (the production text was in a screenshot) follows the shape sketched below.
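A hedged reconstruction of the rewrite, based on the description above (EXISTS semi join replacing the DISTINCT inline view, plus the CARDINALITY hint); the exact production text may differ:

SELECT /*+ CARDINALITY(b 100) */
       b.act_id, b.act_name, b.task_id, b.mkt_diction, b.cust_group_id,
       NVL(b.attest_flag, 'N'), NVL(b.douwin_flag, 'N'), b.chn_desc,
       NVL(b.sign_flag, 'N'), b.max_execute_num
  FROM TABLE(CAST(:b2 AS type_mk_actioninfo_table)) b
 WHERE EXISTS (SELECT 1
                 FROM mk_rulechn_rel r
                WHERE r.chn_status = '04'
                  AND r.chn_type = :b1
                  AND r.act_id = b.act_id);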
Efficiency improved by dozens of times. For an application accounting for 72% of CPU, a several-dozen-fold improvement obviously benefits overall system performance. In the end, even though the execution count rose by 50 percent, CPU usage on the w4sd08pa host fell from a peak of 47 percent to an average of 23 percent.
This problem was solved from two directions: 1. guessing and testing the optimizer limitation (the TABLE function is always estimated at 8,168 rows); 2. knowing the actual return is 200-300 rows. Both were indispensable: if the actual returns were in the tens of thousands of rows, SQL-level optimization alone could not have achieved such good results.
Four: Obtaining SQL execution plans
The execution plan is the core of SQL tuning; the case above likewise began with noticing that the HASH JOIN might be a problem.
So first we need to solve two problems:
1. How to get the execution plan I want (an accurate plan).
2. How to read the plan and find the problems in it.
4.1 How to obtain an accurate execution plan
Ways to obtain a SQL execution plan:

EXPLAIN PLAN
- Estimated; ignores bind variables; does not execute the statement
SQL_TRACE (event 10046)
- Real plan; the trace needs to be parsed with the TKPROF tool
- Can capture the values of bind variables
EVENT 10053
- Real plan
- Used to study why the optimizer produced the plan
AUTOTRACE
- Uses EXPLAIN PLAN internally
DBMS_XPLAN
- dbms_xplan.display_cursor
- dbms_xplan.display_awr
- Real plans
OTHERS
- awrsqrpt, SQLT, PL/SQL Developer, SQL Developer, Toad, etc.
How do people usually get an execution plan? I mostly use dbms_xplan.display_cursor. The advantages are obvious: 1. it returns the plan that was actually executed; 2. it accepts a variety of format parameters, and it can even show bind variable values, which makes verification easy.
Event 10053 examines the optimizer's behavior: when we cannot understand why a particular plan was produced, we can look at the 10053 trace. It is used less often.
With 10046 you can examine wait events and also capture bind variables; it too is used relatively rarely.
SET AUTOTRACE TRACEONLY and SET AUTOTRACE TRACEONLY EXPLAIN get their plans from the same source: remember, both come from PLAN_TABLE, are estimates, may not be the real execution plan, and may not be accurate.
So if something looks wrong, question its accuracy. The advantage of AUTOTRACE TRACEONLY is that the statistics it prints, consistent gets, physical reads, rows returned, and so on, are real, and consistent gets and physical reads can be used to verify the optimization effect.
Others, such as awrsqrpt, can also fetch execution plans, but I seldom use them. In particular, the PL/SQL Developer F5 plan I almost never use: it too is an estimated plan from PLAN_TABLE, and for long SQL it is hard to analyze.
I suggest looking at the real plan. One thing I do often: run ALTER SESSION SET statistics_level = ALL (or add the gather_plan_statistics hint), execute the SQL, and then view what actually happened with:

SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));

The benefit is obvious: for every step of the plan you can see E-ROWS (estimated rows), A-ROWS (actual rows), STARTS, BUFFERS, A-TIME (actual elapsed time), and so on. Comparing the gap between estimated and actual tells us which table's statistics may be problematic and whether the plan is wrong, and saves us from hand-computing how many rows each predicate returns.
Note that if a SQL runs for a long time, we can cancel it and still view the plan this way. For example, given a SQL that would run for 2 hours, I usually have no patience: I cancel it within 5 minutes at most, and after cancelling, the execution information so far can still be seen through display_cursor.
For example, suppose one step must execute 1 million times and the whole SQL would take 3 hours. After letting it run for a few minutes and cancelling, what I see is a proportional picture: from the ratios of STARTS and A-TIME we can judge which step takes the longest time and where the problem is, and then solve it.
The optimizer has many limitations, such as the fixed 8,168 estimate for TABLE functions, and algorithmic restrictions; many estimates are not correct, and when an estimate differs greatly from reality, problems follow. Sometimes statistics cannot be collected accurately: histograms in particular have many problems, which is why 12c added several new kinds of histogram; before that there were only height-balanced and frequency histograms.
The statistics_level approach just shown also prints the query result; we can suppress that (see the sketch after the list):
1. Put the SQL in a script file, preceded by SET TERMOUT OFF (so the result rows are not displayed).
2. Then call dbms_xplan.display_cursor after the script runs.
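A minimal SQL*Plus sketch, assuming the SQL under study lives in a script file test.sql (SET TERMOUT OFF suppresses output only when the statement runs from a script):

-- test.sql
set termout off
alter session set statistics_level = all;
select b.act_id from some_table b;   -- hypothetical: the SQL under study
set termout on
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

Run it with @test.sql: the result rows are swallowed, and only the plan with its STARTS/E-ROWS/A-ROWS statistics is printed.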
Looking at execution plans this way makes it very convenient to find problems; otherwise we would have to write SQL by hand to compute the actual rows returned by each step's predicates and compare. With this, Oracle does it all for us.
4.2 Understanding the execution order of an execution plan
How do I usually read an execution plan?

Copy it into an editor such as UltraEdit.

Use the cursor to find the entry point, the step that executes first: place the cursor at ID=0 and follow the indentation down; where the indentation can go no deeper, that part is the entry. For example, the indentation of ID=10 would continue but is blocked by ID=11, so step 10 is the entry.

After finding the entry, move the cursor back, applying the "same indentation level: listed first executes first; deeper (right-most) executes first" rules to see each parent operation and its child operations. For example, for step 13 here, I just position the cursor before the P of PARTITION and move up, and I immediately see that its driving table is the VIEW at ID=5, because they are aligned.

Then check whether there are problems with the join relationships, the estimated rows returned, and so on.
Did you know there is an exception to the "right-most, top-most executes first" rule of execution plans? Applying the rule above naively gives the wrong answer.
Consider a scalar subquery:

SELECT a.employee_id,
       a.department_id,
       (SELECT COUNT(*)
          FROM emp_b b
         WHERE a.department_id = b.department_id) cnt
  FROM emp_a a
Here ID=2 comes first, but it is actually driven by ID=3, that is, by emp_a, which violates the usual ordering rule of execution plans; just keep this in mind. A bind variable (the :B1 here) appears in the scalar subquery's predicate, because each outer row's value is used to drive the subquery.
Once you can work out the order of an execution plan, what should you look at in it?
1. The join methods.
2. The access paths of the tables: full table scan or index.
3. Operations that often hurt performance, such as FILTER.
4. The gap between estimated cardinality (rows) and reality.
Do not pay too much attention to COST: COST is an estimate; a big COST is not necessarily slow and a small one is not necessarily fast. Of course, when COST is tiny and the estimated rows are tiny yet the SQL is very slow, we may have to consider whether the statistics are stale.
Statistics are important. Just one example:
The plan uses the index, COST is small, everything looks perfect, yet AWR shows it consuming 80% of resources. What is the usual situation? Viewed purely from the SQL, the plan's estimates are wrong; testing it yourself, it is very slow. COST very small, ROWS very small, index used, a "perfect" plan that is wrong: then it is obviously, almost always, caused by statistics.
In fact, step 4 uses the SENDTIME index and really returns 1,689,393 rows, but the execution plan estimates 1 row: the statistics are inaccurate. Checking further, the statistics were last gathered before May.
SQL> SELECT COUNT(1) FROM MSP.T_MS_MEDIA_TASK WHERE SENDTIME >= TRUNC(SYSDATE,'dd') AND MONTHDAY = TO_CHAR(SYSDATE,'mmdd');

  COUNT(1)
----------
   1689393
Gather statistics; for all columns size repeat keeps the original histogram information:

exec DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MSP', tabname => 'T_MS_MEDIA_TASK', method_opt => 'for all columns size repeat', no_invalidate => false, cascade => true, degree => 10);
1.68 million rows are actually returned, but the existing statistics made the CBO think it was 1: a huge difference.
About method_opt => 'for all columns size repeat': when refreshing statistics it is usually best to use for all columns size repeat. The benefit of repeat is that columns that already have histograms keep getting histograms, while columns without statistics are collected as for all columns size 1; everything else is collected the way it was before. Using for all columns size 1, size skewonly, or leaving it unwritten (size auto) may change how statistics were originally collected, and may affect SQL execution efficiency.
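Before choosing a method_opt, it helps to see which columns already have histograms; for the table in this case, a query like this works:

SELECT column_name, histogram, num_distinct
  FROM dba_tab_col_statistics
 WHERE owner = 'MSP'
   AND table_name = 'T_MS_MEDIA_TASK';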
Efficient access structures make SQL faster; that mainly means indexes, and how to build an index is itself a complex question. One rule of thumb: in a composite index, high-frequency equality conditions are better placed as leading columns, because equality access is more direct and efficient than a range condition such as > SYSDATE. Now for another optimizer-limitation case; the problem SQL contains the following predicate:
AND (EXISTS (SELECT 1
               FROM DBPRODADM.pd_prc_dict c
              WHERE a.element_idb = c.prod_prcid
                AND c.prod_prc_type = '1')
     OR a.element_idb = 'X')
AND a.relation_type = '10'
When OR and a semi join are put together, the subquery cannot be unnested; that is, a FILTER may be generated, making the SQL very slow, sometimes running for days or even tens of days.
Steps 5 and 6 executed more than 920,000 times, which is bound to be slow; the problem is the FILTER.
A FILTER behaves like a loop over a subquery that could not be unnested, much like a scalar subquery, and bind variables likewise appear in its predicates. Its one saving grace is that it builds a hash table of probed values internally: if there are many duplicate matching values, few probes are needed and efficiency is acceptable; but most of the time, with few duplicates, it is a disaster. This optimizer limitation generally forces a rewrite, because the SQL structure itself rules out an efficient plan: even if each probe is cheap, the execution count here is far too high; with a small execution count it would not matter.
The SQL to be rewritten:
SELECT ELEMENT_TYPEA,
       ELEMENT_IDA,
       ELEMENT_TYPEB,
       ELEMENT_IDB,
       RELATION_TYPE,
       EFF_RULE_ID,
       EXP_RULE_ID,
       CTRL_CODE,
       EFF_DATE,
       EXP_DATE,
       GROUP_ID,
       BASE_TIME_TYPE,
       POWER_RIGHT,
       POSITIVE_TYPE,
       BOTHWAY_FLAG
  FROM DBPRODADM.pd_prc_rel a
 WHERE EXISTS (SELECT 1
                 FROM DBPRODADM.pd_prc_dict b
                WHERE a.element_ida = b.prod_prcid
                  AND b.prod_prc_type = '1')
   AND a.exp_date > SYSDATE
   AND (EXISTS (SELECT 1
                  FROM DBPRODADM.pd_prc_dict c
                 WHERE a.element_idb = c.prod_prcid
                   AND c.prod_prc_type = '1')
        OR a.element_idb = 'X')
   AND a.relation_type = '10'
Clearly the condition here is EXISTS ... OR ..., so rewrite it with UNION or UNION ALL. To avoid duplicate rows, UNION is used:
SELECT ELEMENT_TYPEA, ELEMENT_IDA, ELEMENT_TYPEB, ELEMENT_IDB, RELATION_TYPE,
       EFF_RULE_ID, EXP_RULE_ID, CTRL_CODE, EFF_DATE, EXP_DATE, GROUP_ID,
       BASE_TIME_TYPE, POWER_RIGHT, POSITIVE_TYPE, BOTHWAY_FLAG
  FROM DBPRODADM.pd_prc_rel a
 WHERE EXISTS (SELECT 1
                 FROM DBPRODADM.pd_prc_dict b
                WHERE a.element_ida = b.prod_prcid
                  AND b.prod_prc_type = '1')
   AND a.exp_date > SYSDATE
   AND EXISTS (SELECT 1
                 FROM DBPRODADM.pd_prc_dict c
                WHERE a.element_idb = c.prod_prcid
                  AND c.prod_prc_type = '1')
   AND a.relation_type = '10'
UNION
SELECT ELEMENT_TYPEA, ELEMENT_IDA, ELEMENT_TYPEB, ELEMENT_IDB, RELATION_TYPE,
       EFF_RULE_ID, EXP_RULE_ID, CTRL_CODE, EFF_DATE, EXP_DATE, GROUP_ID,
       BASE_TIME_TYPE, POWER_RIGHT, POSITIVE_TYPE, BOTHWAY_FLAG
  FROM DBPRODADM.pd_prc_rel a
 WHERE EXISTS (SELECT 1
                 FROM DBPRODADM.pd_prc_dict b
                WHERE a.element_ida = b.prod_prcid
                  AND b.prod_prc_type = '1')
   AND a.exp_date > SYSDATE
   AND a.element_idb = 'X'
   AND a.relation_type = '10'
After the rewrite, both branches do a HASH JOIN with STARTS = 1. Although everything is still a full table scan, execution efficiency improved markedly: execution time fell from 12s to 7s, and logical reads from roughly 2.22 million to 45 thousand. Is there still room for optimization? Follow-up ideas:
1) Since the rewrite uses UNION, can it be changed to UNION ALL to avoid the sort (provided the branches cannot produce duplicates)?
2) With so many full table scans, can some of them use indexes? Certainly, but that is no longer the main work. This case shows that the optimizer has many limitations; it cannot do everything.
Correct statistics, a SQL structure that allows the optimizer to perform the right query transformations, and the right access structures such as indexes are all prerequisites for efficient execution. Complex != inefficient, and simple != efficient. Making your intent clear to the optimizer, with the appropriate access structures to support it, is what wins. Simple SQL is no guarantee of speed, and complex SQL is not necessarily slow; an efficient execution plan is what matters most, and in index-based SQL optimization the essential task is turning a bad execution plan into a better one. Work from many angles, and the optimization goal is finally reached.