This article explains how to optimize SQL query statements. The material is straightforward and easy to understand; follow the editor's train of thought below and study "how to optimize SQL query statements" in depth.
1. First, understand what an execution plan is
An execution plan is the query scheme the database works out for a SQL statement, produced automatically by the query optimizer based on the statement and the statistics of the tables involved. For example, if a SQL statement looks up one record from a table of 100,000 records, the query optimizer will choose an "index seek". If the table has since been archived and only 5,000 records remain, the query optimizer will change the scheme and use a "full table scan" instead.
It can be seen that the execution plan is not fixed; it is "personalized". Producing a correct execution plan depends on two points:
(1) Does the SQL statement clearly tell the query optimizer what it wants?
(2) Are the statistics available to the query optimizer up to date and correct?
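For instance, SQL Server can display the plan it would choose without executing the statement. A minimal sketch, assuming a hypothetical orders table:

SET SHOWPLAN_ALL ON;
GO
-- Returns the estimated plan rows instead of running the query.
SELECT * FROM orders WHERE orderid = 1001;
GO
SET SHOWPLAN_ALL OFF;
GO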
2. Write SQL statements in a uniform way
Programmers consider the following two SQL statements identical, but the database query optimizer considers them different:
Select * from dual
select * From dual
In fact they differ only in case, yet the query analyzer treats them as two different SQL statements that must be parsed twice, generating two execution plans. So as a programmer, make sure the same query statement is written identically everywhere; not even one extra space!
3. Don't make the SQL statement too complicated
I often see SQL statements captured from the database that run to two sheets of A4 paper when printed. Generally speaking, a statement that complex is usually problematic. I took one two-page SQL statement to its original author, who said it was so long that even he could not understand it for a while. It is easy to imagine that if even the original author is confused by a SQL statement, the database will be confused too.
In general it is common to take the result of a SELECT statement as a subset and then query from that subset, but experience shows that beyond three levels of nesting the query optimizer easily produces a wrong execution plan, because it gets dizzy. Something like artificial intelligence is, after all, inferior to human judgment: if a person is made dizzy by the statement, I can guarantee the database will be dizzy too.
In addition, execution plans can be reused, and the simpler a SQL statement is, the more likely its plan is to be reused. A complex SQL statement must be reparsed whenever a single character in it changes, and the resulting pile of garbage is stuffed into memory. You can imagine how inefficient the database becomes.
4. Use "temporary table" to temporarily store intermediate results
An important way to simplify SQL statements is to use temporary tables to hold intermediate results. But the benefits of temporary tables go far beyond that: with intermediate results parked in a temporary table, subsequent queries run in tempdb, which avoids scanning the main table multiple times in the program and greatly reduces "shared lock" blocking of "update lock" during execution, reducing blocking and improving concurrency.
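A minimal sketch of the pattern, assuming a hypothetical orderheader table; the intermediate result is materialized once, and later queries read the temporary table in tempdb instead of rescanning the main table:

-- Store the intermediate result in a temporary table (it lives in tempdb).
SELECT orderid, contactid, changetime
INTO #recent_orders
FROM orderheader
WHERE changetime > '2010-10-20';

-- Later queries hit the small temporary table, not the main table.
SELECT contactid, COUNT(*) AS order_count
FROM #recent_orders
GROUP BY contactid;

DROP TABLE #recent_orders;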
5. Bind variables must be used in OLTP-system SQL statements
select * from orderheader where changetime > '2010-10-20 00:00:01'
select * from orderheader where changetime > '2010-09-22 00:00:01'
The query optimizer treats the two statements above as different SQL statements and parses each of them. If a bind variable is used instead:
select * from orderheader where changetime > @chgtime
The @chgtime variable can take any value, so a large number of similar queries reuse a single execution plan, greatly reducing the database's burden of parsing SQL statements. Parse once, reuse many times: that is the principle for improving database efficiency.
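A minimal sketch of binding in T-SQL via sp_executesql, reusing the orderheader example; because the statement text stays constant, one plan serves every value passed in:

DECLARE @chgtime datetime;
SET @chgtime = '2010-10-20 00:00:01';

-- The statement text never changes, so it is parsed once and the plan reused.
EXEC sp_executesql
    N'select * from orderheader where changetime > @chgtime',
    N'@chgtime datetime',
    @chgtime = @chgtime;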
6. Bind variable peeking
Everything has two sides. Bind variables apply to most OLTP processing, but there are exceptions, for example when the field in the WHERE condition is a "skewed field".
"tilted field" means that most of the values in the column are the same, such as a population questionnaire, in which more than 90% of the "nationality" column is Han nationality. So if a SQL statement is to query the population of the Han nationality at the age of 30, then the column of "nationality" must be included in the where condition. There will be a big problem if you use the binding variable @ nation at this time.
Imagine that the first value passed in for @nation is "Han": the whole execution plan will inevitably choose a table scan. The second value passed in is then "Bouyei"; in theory the Bouyei may account for only 1/10,000 of the rows and should be found with an index seek. But because the plan from the first parse, built for "Han", is reused, the second query also uses a table scan. This problem is known as "bind variable peeking", and the advice is not to use bind variables on skewed fields.
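If a bind variable on a skewed field cannot be avoided, one hedged workaround (available from SQL Server 2005 on) is to force a fresh compilation per execution so the optimizer sees the actual value instead of the first sniffed one; a sketch with a hypothetical person table:

DECLARE @nation nvarchar(20);
SET @nation = N'Bouyei';

-- Recompiled on each run: 'Han' can get a table scan, 'Bouyei' an index seek.
SELECT * FROM person
WHERE nation = @nation AND age = 30
OPTION (RECOMPILE);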
7. Use begin tran only when necessary
In SQL Server, a single SQL statement is a transaction by default and is committed automatically when the statement finishes. This is really a minimized form of begin tran, as if a begin tran were implied at the start of each statement and a commit at the end.
In some cases we need to declare begin tran explicitly, for example when an insert, delete, or update operation must modify several tables at once and requires that the modifications to all of them either succeed together or fail together. begin tran plays exactly that role: it executes several SQL statements together and commits them together. The benefit is data consistency, but nothing is perfect; the price begin tran pays is that none of the resources locked by those SQL statements can be released until the commit.
It follows that if begin tran wraps too many SQL statements, database performance suffers badly. Before such a big transaction commits, other statements are inevitably blocked, producing a great deal of blocking.
The rule for begin tran is: on the premise of guaranteeing data consistency, the fewer SQL statements wrapped by begin tran, the better! In some cases a trigger can synchronize the data, and begin tran is not necessarily needed.
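A minimal sketch, assuming hypothetical orderheader and orderdetail tables that must change together:

BEGIN TRAN;

UPDATE orderheader SET status = 'shipped' WHERE orderid = 1001;
UPDATE orderdetail SET status = 'shipped' WHERE orderid = 1001;

-- Locks taken by both updates are held until this commit.
COMMIT TRAN;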
8. Nolock should be added to some SQL queries
Adding nolock to SQL statements is an important means of improving SQL Server's concurrency, but it is unnecessary in Oracle, whose architecture is more reasonable: an undo tablespace keeps the "before image" of the data. If a modification has not yet been committed, what you read is the pre-modification copy held in the undo tablespace, so reads and writes in Oracle do not affect each other, a point for which Oracle is widely praised. In SQL Server, reads and writes block each other. To improve concurrency, nolock can be added to some queries so that writing is allowed during reading; the downside is that uncommitted dirty data may be read. There are three principles for using nolock:
(1) nolock must not be added if the query result will be used for insert, delete, or update.
(2) If the queried table suffers frequent page splits, use nolock with caution!
(3) A temporary table can also preserve the "before image" of the data, serving a function similar to Oracle's undo tablespace. If a temporary table can be used to improve concurrency, do not use nolock.
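A minimal sketch of a nolock read against the orderheader example; readers no longer block writers, at the cost of possibly seeing dirty data:

-- Report-style read that must not block concurrent writers.
SELECT orderid, changetime
FROM orderheader WITH (NOLOCK)
WHERE changetime > '2010-10-20';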
9. If the clustered index is not built on the table's sequential field, the table is prone to page splits
For example, an order table has an order number orderid and a customer number contactid; which field should the clustered index go on? For this table, order numbers are added sequentially. If the clustered index is placed on orderid, new rows are appended at the end and page splits are unlikely to occur. However, since most queries are by customer number, it only makes sense to put the clustered index on contactid; and for an order table, contactid is not a sequential field.
For example, if the "contactid" of "Zhang San" is 001, then the order information of "Zhang San" must be placed on the * page of the table. If "Zhang San" places a new order today, the order information cannot be placed on the * * page of the table, but on the * page! What if the page is full? I'm sorry, all the data in the table has to be moved back to make room for this record.
SQL Server's indexes differ from Oracle's. SQL Server's clustered index actually sorts the table itself in the order of the clustered index fields, equivalent to Oracle's index-organized table. The clustered index is a form of organization of the table itself, so it is very efficient; and precisely for this reason, when a record is inserted its position is not arbitrary but is fixed by the sort order, and if the data page it belongs on has no free space, the page splits. So obviously, if the clustered index is not built on the table's sequential field, the table is prone to page splits.
I once encountered a case where a friend's insert performance dropped sharply after he rebuilt the indexes on a table. Presumably the situation was this: the table's clustered index was not built on the table's sequential field, and the table was frequently archived, so the table's data existed in a sparse state. For example, Zhang San had placed 20 orders, with only 5 in the last 3 months; under an archiving policy that keeps 3 months of data, Zhang San's 15 past orders were archived, leaving 15 vacant slots that could be reused when inserts occurred. In that state, because free space was available, page splits did not occur; but query performance was low, because queries had to scan over those empty slots holding no data.
The situation changed after the clustered index was rebuilt, because rebuilding the clustered index repacks the data in the table: the former empty slots were gone, the pages' fill rate was very high, and page splits happened frequently on insert, so performance degraded badly.
For tables whose clustered index is not built on a sequential field, should a lower page fill rate be used? Should rebuilding the clustered index be avoided? These are questions worth considering!
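If a lower fill rate is chosen, a sketch of the rebuild (SQL Server 2005 syntax, with a hypothetical index name):

-- FILLFACTOR = 70 leaves roughly 30% free space per page for later inserts.
ALTER INDEX ix_orderheader_contactid ON orderheader
REBUILD WITH (FILLFACTOR = 70);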
10. After adding nolock, queries on tables with frequent page splits are prone to skipped or repeated reads
With nolock, queries can run at the same time as insert, delete, and update operations. But because those modifications happen concurrently, in some cases, once a data page is full, a page split is unavoidable, and a nolock query may be running at exactly that moment. For example, records already read on page 100 may be moved to page 101 by the split, so when the nolock query reads page 101 it reads the same data again, producing a "repeated read". By the same token, if data on page 100 is moved to page 99 before it has been read, the nolock query may miss those records, producing a "skipped read".
The friend mentioned above saw some operations report errors after nolock was added. Presumably a nolock query produced a repeated read, and inserting the two identical records into another table naturally caused a primary key conflict.
11. Take care when using like for fuzzy queries
Sometimes fuzzy queries are needed, for example:
select * from contact where username like '%yue%'
Because of the "%" in front of the keyword yue, the query is forced into a full table scan. Do not put "%" in front of the keyword unless it is truly necessary.
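If the business allows matching from the start of the string, a trailing wildcard keeps an index on username usable; a minimal sketch:

-- Can use an index seek on username; '%yue%' cannot.
select * from contact where username like 'yue%'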
12. The impact of implicit data-type conversion on query efficiency
In a SQL Server 2000 database, if the program does not submit the value of this field as a strongly-typed parameter in the SQL statement, SQL Server 2000 converts the data type automatically, which leaves the input parameter inconsistent with the type of the primary key field; in that case SQL Server 2000 may use a full table scan. The problem was not found on SQL Server 2005, but it is still worth noting.
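A hedged illustration, assuming a hypothetical orderheader table whose orderid primary key is varchar; passing an integer forces the column itself to be converted, which can defeat the index:

-- Parameter type does not match the varchar column: possible full table scan.
select * from orderheader where orderid = 1001
-- Types match: the primary key index can be used.
select * from orderheader where orderid = '1001'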
13. The three ways SQL Server joins tables
(1) Merge Join
(2) Nested Loop Join
(3) Hash Join
SQL Server 2000 has only one join method: Nested Loop Join. If result set A is small, it is used as the outer table by default; each record in A is matched by scanning B once, so the actual number of rows scanned is roughly A's row count × B's row count. If both result sets are large, the join performs badly.
SQL Server 2005 added Merge Join. If the join fields of tables A and B happen to be the fields their clustered indexes are built on, the rows on both sides are already sorted and only need to be merged together. The cost of the join is A's row count plus B's row count: addition instead of multiplication, so Merge Join performs far better than Nested Loop Join.
If the joined fields have no index, SQL Server 2000 is quite inefficient, whereas SQL Server 2005 provides Hash Join, which is equivalent to building a temporary index on table B's result set; so SQL Server 2005 is much more efficient than SQL Server 2000, and I think this is an important reason.
To sum up, pay attention to the following points when joining tables:
(1) For the join fields, try to choose the fields the clustered index is built on.
(2) Consider the WHERE conditions carefully and minimize the result sets of tables A and B.
(3) If many join fields lack indexes and you are still using SQL Server 2000, upgrade.
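To check which physical join the optimizer actually picked, SQL Server can return the plan rows along with the results; a minimal sketch with hypothetical orderheader and orderdetail tables:

SET STATISTICS PROFILE ON;

-- The plan output shows Nested Loops, Merge Join, or Hash Match.
SELECT h.orderid, d.productid
FROM orderheader h
JOIN orderdetail d ON h.orderid = d.orderid;

SET STATISTICS PROFILE OFF;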
Thank you for reading. That concludes "how to optimize SQL query statements". After studying this article, I believe you have a deeper understanding of how to optimize SQL query statements; the specifics still need to be verified in practice. The editor will push more articles on related knowledge points for you; welcome to follow!