In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Editor to share with you how to write a high-performance sql, I believe that most people do not know much about it, so share this article for your reference, I hope you can learn a lot after reading this article, let's go to know it!
First of all, we have to figure out what is meant by carrying out the plan.
The execution plan is a query scheme made by the database based on SQL statements and statistics of related tables, which is automatically analyzed by the query optimizer. For example, if a SQL statement is used to look up a record from a table of 100000 records, the query optimizer will choose the "index lookup" mode. If the table is archived and there are only 5000 records left, the query optimizer will change the scheme. The "full table scan" mode is adopted.
It can be seen that the execution plan is not fixed, it is "personalized". It is important to produce a correct "execution plan":
Does the SQL statement clearly tell the query optimizer what it wants to do?
Is the database statistics obtained by the query optimizer up-to-date and correct?
The writing of unified SQL sentence
The programmer thinks it is the same for the following two SQL statements, and the database query optimizer thinks it is different.
Select*from dual select*From dual
In fact, because of the difference in case, the query analyzer thinks that they are two different SQL statements that must be parsed twice. Generate 2 execution plans. So as a programmer, you should make sure that the same query statements are consistent everywhere, not even one more space!
Don't make the SQL statement too complicated
I often see a SQL statement captured from the database printed as long as two sheets of A4 paper. Generally speaking, such a complex sentence is usually problematic. I took the 2-page SQL sentence to consult the original author, and he said that it was too long for him to understand it for a while. It is conceivable that even the original author may be confused about the SQL statement, and the database will also be confused.
In general, it is common to take the results of an Select statement as a subset and then query from that subset, but according to experience, more than three layers of nesting, the query optimizer is easy to give the wrong execution plan. Because it was fainted. Things like artificial intelligence, after all, are worse than people's resolution. If people are dizzy, I can guarantee that the database will also be dizzy.
In addition, execution plans can be reused, and the simpler the SQL statement, the more likely it is to be reused. Complex SQL statements have to be reparsed as long as one character changes, and then this pile of garbage is stuffed into memory. You can imagine how inefficient the database will be.
Use temporary Table to temporarily store intermediate results
An important way to simplify SQL statements is to use temporary tables to temporarily store intermediate results, but the benefits of temporary tables are far more than these. Temporary results are temporarily stored in temporary tables, and the subsequent query is in tempdb, which can avoid scanning the main table many times in the program, and greatly reduce the "shared lock" blocking "update lock" in program execution, reducing blocking and improving concurrent performance.
Binding variables must be used in OLTP system SQL statements
Select*from orderheader where changetime > '2010-10-20 00 01' select*from orderheader where changetime >' 2010-09-22 00 VR 01'
The above two statements, which the query optimizer thinks are different SQL statements, need to be parsed twice. If you use binding variables
Select*from orderheader where changetime > @ chgtime
The @ chgtime variable can pass any value so that a large number of similar queries can reuse the execution plan, which greatly reduces the burden on the database to parse the SQL statement. One parsing and multiple reuse is the principle to improve the efficiency of the database.
Binding variable snooping
There are two sides to everything, and binding variables are applicable to most OLTP processing, but there are exceptions. For example, when the field in the where condition is a "skewed field".
"tilted field" means that most of the values in the column are the same, such as a population questionnaire, in which more than 90% of the "nationality" column is Han nationality. So if a SQL statement is to query the population of the Han nationality at the age of 30, then the column of "nationality" must be included in the where condition. There will be a big problem if you use the binding variable @ nation at this time.
Imagine that if the first value passed in by @ nation is "Han", then the entire execution plan must choose a table scan. Then, the second value passed in is the "Buyi family", in theory, the "Buyi family" may only account for 1/10000, and should be looked up by index. However, since the execution plan of the "Han" parsed for the first time is reused, the table scan method will also be used for the second time. This problem is known as "binding variable snooping", and it is recommended that you do not use binding variables for "skewed fields".
Use begin tran only when necessary
A SQL statement in SQL Server is a transaction by default, and it is also the default commit after the execution of the statement. In fact, this is a minimized form of begin tran, like implying a begin tran at the beginning of each sentence and an commit at the end.
In some cases, we need to explicitly declare begin tran, for example, to do "insert, delete, change" operations, you need to modify several tables at the same time, requiring either several tables to be modified successfully or not. Begin tran can play this role, it can execute several SQL statements together, and then commit together. The advantage is that the data is consistent, but nothing is perfect. The price paid by Begin tran is that all resources locked by SQL statements cannot be released until commit is dropped before committing.
It can be seen that if Begin tran traps too many SQL statements, the performance of the database will be poor. Before the big transaction commits, other statements are bound to be blocked, resulting in a lot of block.
The principle used by Begin tran is that under the premise of ensuring data consistency, the fewer SQL statements trapped by begin tran, the better! In some cases, triggers can be used to synchronize data, not necessarily begin tran.
Nolock should be added to some SQL queries
Adding nolock to the SQL statement is an important means to improve the concurrency performance of SQL Server, but it is not necessary to do so in oracle, because the structure of oracle is more reasonable, and there is a undo table space to keep the "data foreground". If the data is not commit in the modification, then what you read is the copy before it was modified, which is placed in the undo table space. In this way, the reading and writing of oracle can not affect each other, which is also widely praised by oracle. The reading and writing of SQL Server will block each other. In order to improve the concurrency performance, nolock can be added to some queries, so that writing is allowed when reading, but the disadvantage is that uncommitted dirty data may be read. There are three principles for using nolock.
(1) nolock cannot be added if the query result is used for "insert, delete, modify".
(2) the query table belongs to frequent page splits, so use nolock carefully!
(3) using temporary tables can also save "data foreground", which is similar to the function of undo table space of oracle.
Do not use nolock if you can use temporary tables to improve concurrency performance.
The clustered index is not built on the order field of the table, and the table is prone to page splitting.
For example, if the order table has order number orderid and customer number contactid, which field should the clustered index be added to? For this table, the order number is added sequentially, and if a clustered index is added on the orderid, the new rows are added at the end, so it is not easy to generate page splits. However, since most queries are based on customer numbers, it makes sense to add a clustered index to the contactid. Contactid is not a sequential field for an order table.
For example, if the "contactid" of "Zhang San" is 001, then the order information of "Zhang San" must be placed on the first data page of the table. If "Zhang San" places a new order today, the order information cannot be placed on the last page of the table, but on the first page! What if the first page is full? I'm sorry, all the data in the table has to be moved back to make room for this record.
The index of SQL Server is different from that of Oracle. The clustered index of SQL Server actually sorts the table according to the order of the clustered index fields, which is equivalent to the index of oracle. SQL Server's clustered index is a form of organization of the table itself, so it is very efficient. Also because of this, insert a record, its position is not randomly placed, but in order to put on the data page that should be placed, if that data page has no space, it will cause the page to split. So it is obvious that the clustered index is not built on the order fields of the table, and the table is prone to page splitting.
Once encountered a situation, a buddy of a table after re-indexing, the efficiency of the insert dropped significantly. It is estimated that the situation is like this. The clustered index of the table may not be built on the order field of the table, and the table is often archived, so the data of the table exists in a sparse state. For example, if Zhang San has placed 20 orders, but only 5 orders in the last 3 months, and the archiving strategy is to retain the data for 3 months, then the past 15 orders of Zhang San have been archived, leaving 15 vacancies, which can be reused when insert occurs. In this case, because there is space available, the page split will not occur. However, query performance is low because the query must scan for spaces that do not have data.
The situation changed after rebuilding the clustered index, because rebuilding the clustered index is to rearrange the data in the table, the original empty space is gone, and the filling rate of the page is very high, and page splitting often occurs when inserting data, so the performance is greatly degraded.
Do you want to give a low page fill ratio for tables where the clustered index is not built on sequential fields? Do you want to avoid rebuilding the clustered index? Is a question worth considering!
Query tables with frequent page splits after adding nolock, which is easy to skip or repeat reads.
After adding nolock, you can query "insert, delete, change" at the same time, but because "insert, delete, change" occur at the same time, in some cases, once the data page is full, the page split is inevitable, and the nolock query is happening at this time. For example, the records that have been read on page 100 may be divided to page 101 because of the page split, which may cause the nolock query to read the data repeatedly when reading page 101. Produce "repeat reading". By the same token, if the data on page 100 is divided into 99 pages before it is read, the nolock query may miss the record, resulting in a "skip read".
The above mentioned buddy, after adding nolock, some operations reported errors, it is estimated that it is possible because the nolock query produced a repeat read, 2 identical records to insert other tables, of course, there will be primary key conflicts.
Attention should be paid when using like for fuzzy query
Sometimes you need to make some fuzzy queries, such as
Select*from contact where username like'% yue%'
Keyword% yue%, due to the use of "%" in front of yue, the query must take a full table scan, do not add% in front of keywords unless necessary
The influence of implicit conversion of data types on query efficiency
Sql server2000's database, our program does not use strong typing to submit the value of this field when submitting the sql statement. Sql server2000 automatically converts the data type, which will cause the input parameters to be inconsistent with the primary key field type. At this time, sql server2000 may use a full table scan. This problem is not found on Sql2005, but it should be noted.
These are all the ways to write a high-performance sql. Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.