2025-01-18 Update From: SLTechnology News&Howtos
1.2. Access data only through indexes
Sometimes we access only a few small fields of a table. In that case we can create a dedicated composite index on those fields so that the data can be returned directly from the index. Since an index usually occupies far less disk space than its table, this approach can greatly reduce disk IO overhead.
Such as: select id,name from company where type='2'
If this SQL is used frequently, we can create a composite index on type,id,name
create index my_comb_index on company (type, id, name)
With this composite index, SQL can return data directly through the my_comb_index index without having to access the company table.
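As an illustration of index-only access, here is a small sketch using SQLite's in-memory database rather than Oracle (with made-up data), since SQLite's EXPLAIN QUERY PLAN reports when a query is satisfied entirely by a covering index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table company (id integer, name text, type text)")
conn.execute("create index my_comb_index on company (type, id, name)")
conn.executemany("insert into company values (?, ?, ?)",
                 [(i, "co%d" % i, str(i % 3)) for i in range(100)])

# The plan's detail string mentions a COVERING INDEX:
# the table itself is never touched to answer the query.
plan = conn.execute(
    "explain query plan select id, name from company where type='2'"
).fetchall()
print(plan[0][-1])
```

The same check exists in most databases (Oracle's plan output shows `INDEX RANGE SCAN` with no table access step when the index covers the query).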
Let's take a dictionary as an example: suppose we need to count all the Chinese characters in a dictionary. If the dictionary has no catalogue index, we can only count character by character through the whole dictionary. If it has a pinyin catalogue, we need only scan the characters listed in that catalogue. If the dictionary has 1,000 pages and the pinyin catalogue has 20 pages, the data access cost is one fiftieth of a full scan.
Keep in mind that performance optimization is endless; once performance meets the requirements, do not over-optimize. In a real database it is impossible to put every field requested by SQL into an index, so accessing data only through indexes is generally reserved for core applications, that is, the most frequently executed queries on core tables that request only a small number of fields.
1.3. Optimization of the SQL execution plan
SQL execution plan is one of the core technologies of relational database, which represents the data access algorithm when SQL is executed. As the business requirements are becoming more and more complex, the amount of table data is getting larger and larger, programmers are becoming more and more lazy, and SQL also needs to support very complex business logic, but the performance of SQL still needs to be improved. Therefore, an excellent relational database not only needs to support complex SQL syntax and more functions, but also needs a set of excellent algorithm libraries to improve SQL performance.
At present, ORACLE has about 300 algorithms for SQL execution plans, and the number keeps growing, so the SQL execution plan is a very complex subject. An ordinary DBA who masters 50 of them is doing well, and even a senior DBA cannot describe every execution-plan algorithm clearly. Although there are so many algorithms, it doesn't mean we can't optimize execution plans, because only a dozen or so SQL execution-plan algorithms are commonly used. A programmer who understands those will know 80% of SQL execution-plan tuning.
Due to space constraints, SQL execution plans deserve a dedicated write-up, so I won't say more here.
2. Return less data
2.1. Data paging processing
General data paging methods are as follows:
2.1.1. Client (application or browser) paging
Download all the data from the application server to the local application or browser, and page it there with local code.
Advantages: simple coding; reduces the number of network interactions between client and application server.
Disadvantages: long first-load time; occupies client memory.
Suitable scenarios: the network latency between client and application server is large, but subsequent operations need to be smooth, such as mobile GPRS or ultra-long-distance (transnational) access.
2.1.2. Application server paging
Download all the data from the database server to the application server, and then filter the data inside the application server. The following is an example of application server-side Java program paging:
List list = executeQuery("select * from employee order by id");
int count = list.size();
List subList = list.subList(10, 20);
Advantages: simple coding, only one SQL interaction, and better performance when the total data is about the same as the paged data.
Disadvantages: the performance is poor when the total amount of data is large.
Suitable scenarios: the database does not support paging, or the amount of data is small and controllable.
2.1.3. Database SQL paging
Database SQL paging requires two SQL statements:
One SQL computes the total row count
One SQL returns the page of data
Advantages: good performance
Disadvantages: complex coding, different database syntax, need two SQL interactions.
Oracle databases generally use ROWNUM for paging. There are two common paging syntaxes:
Paging directly through ROWNUM:
select * from (
  select a.*, rownum rn
    from (select * from product where company_id=? order by status) a
   where rownum <= 20
)
where rn > 10
Data access cost = index IO + table data IO for every record in the result set
Paging through ROWID:
The optimization principle is to find the ROWIDs of the paged records through the index alone, and then fetch those rows from the table by ROWID. This requires that the inner query and sort fields are all in the index.
create index myindex on product (company_id, status)
select b.* from (
  select * from (
    select a.*, rownum rn
      from (select rowid rid, status from product where company_id=? order by status) a
     where rownum <= 20
  )
  where rn > 10
) a, product b
where a.rid = b.rowid
Data access cost = index IO + table data IO for the paged records only
Example:
A company's products have 1,000 records, and one page of 20 records is requested. Assume that accessing the index costs 50 IO and that every 2 records cost one table data IO.
With the first (ROWNUM) paging method this takes 550 IO (50 + 1000/2), while the second (ROWID) paging method needs only 60 IO (50 + 20/2).
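The arithmetic in this example can be checked with a few lines (the 50-IO index cost and 2-records-per-IO ratio are the assumptions stated above):

```python
index_io = 50          # IO needed to read the index (assumption from the example)
total_rows = 1000      # records matching the company
page_rows = 20         # records actually returned in the page
rows_per_data_io = 2   # table records retrieved per data-block IO

# ROWNUM paging touches the table for every matching record
rownum_cost = index_io + total_rows // rows_per_data_io
# ROWID paging touches the table only for the paged records
rowid_cost = index_io + page_rows // rows_per_data_io

print(rownum_cost, rowid_cost)  # 550 60
```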
2.2. Return only the required fields
Performance can be improved by removing unnecessary return fields, for example:
Before adjustment: select * from product where company_id=?
Adjusted: select id,name from product where company_id=?
Advantages:
1. Reduce the overhead of data transmission on the network
2. Reduce the overhead of server data processing
3. Reduce client memory footprint
4. Surface problems earlier when a field definition changes, reducing program bugs
5. If all the fields accessed happen to be in one index, you can use index-only access to improve performance.
Disadvantages: increase coding workload
Because it adds some coding workload, programmers are generally required to do this through the development specification; otherwise the rectification effort after the project goes live will be even greater.
If the queried table has large fields or many fields, such as memo information or file contents, you must pay attention to these issues, or they may cause serious performance problems. If the table is queried frequently but the large content field is rarely requested, we can split the table into two tables with a one-to-one relationship and put the rarely used large field in a separate table. Take a table that stores uploaded files:
T_FILE (ID,FILE_NAME,FILE_SIZE,FILE_TYPE,FILE_CONTENT)
We can split it into two one-to-one relationship tables:
T_FILE (ID,FILE_NAME,FILE_SIZE,FILE_TYPE)
T_FILECONTENT (ID, FILE_CONTENT)
With this split, both the single-record size and the total size of the T_FILE table are greatly reduced, so queries against T_FILE perform better, and the T_FILECONTENT table is accessed only when the FILE_CONTENT field is actually needed.
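A minimal sketch of this one-to-one split, using SQLite and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t_file (id integer primary key, file_name text,
                     file_size integer, file_type text);
create table t_filecontent (id integer primary key, file_content blob);
""")
conn.execute("insert into t_file values (1, 'a.txt', 5, 'txt')")
conn.execute("insert into t_filecontent values (1, ?)", (b"hello",))

# Common listing queries never touch the wide content column
meta = conn.execute(
    "select file_name, file_size from t_file where id=1").fetchone()

# The content is joined in only when it is actually requested
content = conn.execute(
    "select c.file_content from t_file f "
    "join t_filecontent c on c.id = f.id where f.id = 1"
).fetchone()[0]
```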
3. Reduce the number of interactions
3.1. Batch DML
Database access frameworks generally provide a batch-submit interface, and JDBC supports batch processing. Suppose you want to insert 10 million rows into a table. With ordinary executeUpdate processing, the number of interactions with the server is 10 million; assuming the database server can handle about 10,000 submissions per second, the whole job takes 1,000 seconds. With batch submission at 1,000 rows per batch, the number of interactions drops to 10,000. Generally speaking, batch operations do not reduce much of the database server's physical IO, but they greatly reduce the number of client-server interactions, which cuts the accumulated network latency and also reduces database CPU overhead.
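In Python's DB-API the same batching idea looks like this; sqlite3's executemany plays the role of JDBC's addBatch/executeBatch (table name and row count are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_data (id integer, payload text)")

rows = [(i, "x" * 16) for i in range(10000)]

# One batched call instead of 10,000 individual execute() round trips
conn.executemany("insert into t_data values (?, ?)", rows)
conn.commit()

n = conn.execute("select count(*) from t_data").fetchone()[0]
```

With an in-process database the saving is mostly per-statement overhead; in a client/server setup the batched form also eliminates one network round trip per row.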
Suppose you want to insert 10 million data into a regular table, each record size is 1K bytes, there is no index on the table, and the client and database server network is 100Mbps. The following is the performance ratio of various batch sizes estimated based on current general computer capabilities:
Unit: ms

                                    No batch   Batch=10   Batch=100   Batch=1000   Batch=10000
Server transaction time             0.1        0.1        0.1         0.1          0.1
Server IO processing time           0.02       0.2        2           20           200
Network interaction initiation time 0.1        0.1        0.1         0.1          0.1
Network data transmission time      0.01       0.1        1           10           100
Subtotal                            0.23       0.5        3.2         30.2         300.2
Average processing time per record  0.23       0.05       0.032       0.0302       0.03002
As can be seen above, batching the Insert operation can improve performance by nearly 8 times. Primary-key-based Update or Delete operations may also improve by 2-3 times, though not as markedly as Insert, because Update and Delete may incur greater physical IO overhead. The figures above are only theoretical calculations; the actual behavior must be measured in the specific environment.
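The subtotals in the table follow a simple cost model: a fixed 0.2 ms per round trip (0.1 ms server transaction + 0.1 ms network initiation) plus 0.03 ms per record (0.02 ms server IO + 0.01 ms transmission). A short sketch reproduces the table's per-record figures:

```python
def avg_ms_per_record(batch, round_trip_ms=0.2, per_record_cost_ms=0.03):
    """Average cost per record when `batch` records share one round trip."""
    return (round_trip_ms + per_record_cost_ms * batch) / batch

for b in (1, 10, 100, 1000, 10000):
    print(b, round(avg_ms_per_record(b), 5))
```

As the batch grows, the fixed round-trip cost is amortized and the average converges toward the 0.03 ms per-record floor, which is why the gain flattens beyond batch sizes of a few hundred.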
3.2. In List
In many cases, we need to query database records by a set of IDs. We can send one request to the database per ID, as shown below:
for var in ids[] do begin
    select * from mytable where id = :var
end
We can make a small optimization and instead write the SQL as an ID IN-list:
select * from mytable where id in (:id1, :id2, ..., :idn)
By doing this, the number of SQL requests can be greatly reduced, improving performance. If there are 10,000 IDs, should they all be processed in one SQL? The answer is definitely no. First of all, most databases limit SQL length and the number of IN values; for example, Oracle's IN does not allow more than 1,000 values.
In addition, the current database generally adopts cost-based optimization rules, when the number of IN reaches a certain value, it is possible to change the SQL execution plan, from index access to full table access, which will lead to a sharp change in performance. As the number of values in the IN in the SQL increases, the execution plan of the SQL will become more complex and take up more memory, which will increase the server CPU and memory costs.
To evaluate how many values to put in the IN at a time, you also need to consider the cost of the local memory of the application server. If there is concurrent access, you need to calculate the upper limit of concurrency within the life cycle of the local data, otherwise it may lead to memory overflow.
All things considered, performance generally changes little once the number of values in an IN list exceeds 20, and as a rule it should not exceed 100; going beyond that may destabilize the execution plan and increase database CPU and memory costs, and requires evaluation by a professional DBA.
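A chunked IN-list helper along these lines can enforce such a limit (SQLite sketch with an invented table; the chunk size of 100 follows the guideline above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer primary key, name text)")
conn.executemany("insert into mytable values (?, ?)",
                 [(i, "n%d" % i) for i in range(1000)])

def fetch_by_ids(conn, ids, chunk=100):
    """Query in IN-list chunks so no single statement exceeds `chunk` values."""
    rows = []
    for i in range(0, len(ids), chunk):
        part = ids[i:i + chunk]
        placeholders = ",".join("?" * len(part))
        rows += conn.execute(
            "select id, name from mytable where id in (%s)" % placeholders,
            part).fetchall()
    return rows

result = fetch_by_ids(conn, list(range(250)))
```

Here 250 IDs become 3 statements instead of 250, while every statement stays within the IN-list limit.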
3.3. Set Fetch Size
When we query data from the database with a select, the rows are not all returned to the client at once by default. Instead, the client's fetch_size parameter controls the transfer: only fetch_size rows are returned per round trip, and the next batch is fetched from the server whenever the client cursor reaches the end of the current one, until the whole result set has been transferred. So when we need to fetch a large amount of data from the server, increasing the fetch_size reduces the number of result-transfer round trips and the server's per-batch preparation time, improving performance.
The following is JDBC test code. It uses a local database, and the table is held in the database cache, so there is no network or disk IO overhead; the client only traverses the cursor without any other processing, which better isolates the impact of the fetch parameter:
String vsql = "select * from t_employee";
PreparedStatement pstmt = conn.prepareStatement(vsql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
pstmt.setFetchSize(1000);
ResultSet rs = pstmt.executeQuery();
int cnt = rs.getMetaData().getColumnCount();
Object o;
while (rs.next()) {
    for (int i = 1; i <= cnt; i++) {
        o = rs.getObject(i);
    }
}
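For comparison, Python's DB-API exposes the same knob as the cursor's arraysize, consumed through fetchmany (an SQLite sketch; with an in-process database this only illustrates the call pattern, not the network savings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t_employee (id integer, name text)")
conn.executemany("insert into t_employee values (?, ?)",
                 [(i, "e%d" % i) for i in range(5000)])

cur = conn.cursor()
cur.arraysize = 1000          # analogous to JDBC setFetchSize(1000)
cur.execute("select * from t_employee")

batches = 0
total = 0
while True:
    rows = cur.fetchmany()    # returns up to arraysize rows per call
    if not rows:
        break
    batches += 1
    total += len(rows)
```

With 5,000 rows and an arraysize of 1,000, the result set is consumed in 5 fetch calls rather than 5,000 single-row fetches.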