Updated 2025-01-19. Source: SLTechnology News&Howtos (shulou)
Shulou (Shulou.com), 05/31
This article covers methods for optimizing a MySQL database that holds tens of millions of rows. The methods introduced here are simple, quick to apply, and practical.
1. To optimize queries, avoid full table scans as far as possible; first consider creating indexes on the columns used in where and order by.
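The effect of tip 1 can be sketched as follows. This is a minimal illustration, assuming Python's bundled sqlite3 module as a stand-in so that it runs without a MySQL server; the index name idx_t_num and the sample data are hypothetical. SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN, so read it as a demonstration of the principle only.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (num, name) VALUES (?, ?)",
    [(i % 100, f"name{i}") for i in range(1000)],
)

query = "SELECT id FROM t WHERE num > 50 ORDER BY num"

# Without an index on num the engine scans the table and sorts separately.
before = plan(conn, query)

# Hypothetical index name; it covers the column used in WHERE and ORDER BY.
conn.execute("CREATE INDEX idx_t_num ON t (num)")

# With the index, the filter becomes an index search and no sort step is needed.
after = plan(conn, query)

print("before:", before)
print("after: ", after)
```

On MySQL the comparable check would be to run EXPLAIN on the query before and after creating the index.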
2. Try to avoid testing a field for null in the where clause; this can cause the engine to give up the index and do a full table scan:
Sql code: select id from t where num is null
You can set the default value of 0 on num to ensure that there is no null value for the num column in the table, and then query like this:
Sql code: select id from t where num=0
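A minimal sketch of tip 2, assuming Python's bundled sqlite3 module as a stand-in for MySQL and a hypothetical two-column version of table t. It shows a column declared NOT NULL with a default of 0, so that "no value" is queried with num=0 instead of num is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# num can never be NULL; rows inserted without a value get 0.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO t (id) VALUES (1)")          # num omitted, becomes 0
conn.execute("INSERT INTO t (id, num) VALUES (2, 7)")  # explicit value

unset_ids = [r[0] for r in conn.execute("SELECT id FROM t WHERE num = 0")]
null_count = conn.execute(
    "SELECT COUNT(*) FROM t WHERE num IS NULL"
).fetchone()[0]

# An explicit NULL is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO t (id, num) VALUES (3, NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(unset_ids, null_count, null_rejected)
```

One design caveat: this only works when 0 is not also a meaningful value for num, since the two cases can no longer be told apart.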
3. Try to avoid using the != or <> operators in the where clause; otherwise the engine may abandon the index and perform a full table scan.
4. Try to avoid using or to join conditions in the where clause; otherwise the engine may give up the index and do a full table scan:
Sql code: select id from t where num=10 or num=20
You can query it like this:
Sql code: select id from t where num=10 union all select id from t where num=20
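The rewrite in tip 4 returns the same rows only when the two conditions cannot match the same row. The sketch below checks that, assuming Python's bundled sqlite3 module as a stand-in for MySQL and hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_t_num ON t (num)")  # hypothetical index name
# Each value 0..49 appears exactly four times.
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i % 50,) for i in range(200)])

or_sql = "SELECT id FROM t WHERE num = 10 OR num = 20"
union_sql = (
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL SELECT id FROM t WHERE num = 20"
)
or_ids = sorted(r[0] for r in conn.execute(or_sql))
union_ids = sorted(r[0] for r in conn.execute(union_sql))

# Overlapping conditions: UNION ALL returns rows matched by both branches twice.
overlap_or = conn.execute(
    "SELECT COUNT(*) FROM t WHERE num = 10 OR num <= 10"
).fetchone()[0]
overlap_union = conn.execute(
    "SELECT COUNT(*) FROM ("
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL SELECT id FROM t WHERE num <= 10)"
).fetchone()[0]

print(or_ids == union_ids, overlap_or, overlap_union)
```

When the branches can overlap, union (without all) removes the duplicates at the cost of an extra de-duplication step.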
5. in and not in should also be used with caution, otherwise they can lead to a full table scan, for example:
Sql Code: select id from t where num in (1, 2, 3)
For consecutive values, use between instead of in:
Sql Code: select id from t where num between 1 and 3
6. A like pattern with a leading wildcard will also cause a full table scan:
Sql code: select id from t where name like '%abc%'
To improve efficiency, consider full-text retrieval.
7. Using parameters in the where clause can also result in a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan until then; it must choose at compile time. At compile time the value of the variable is still unknown, so it cannot be used as input for index selection. The following statement performs a full table scan:
Sql code: select id from t where num=@num
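You can force the query to use the index instead. The hint below is SQL Server syntax; in MySQL the equivalent is force index (index name) written after the table name: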
Sql code: select id from t with (index (index name)) where num=@num
8. Avoid applying expressions to fields in the where clause as far as possible; doing so causes the engine to abandon the index and perform a full table scan:
Sql code: select id from t where num/2=100
You can query it like this:
Sql code: select id from t where num=100*2
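The difference between the two forms in tip 8 can be observed in a query plan. This is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for MySQL; the index name and sample data are hypothetical, and SQLite's plan text is not MySQL's EXPLAIN output.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.execute("CREATE INDEX idx_t_num ON t (num)")  # hypothetical index name
# Even values only (0, 2, ..., 998), so both queries match the same row.
conn.executemany("INSERT INTO t (num) VALUES (?)", [(2 * i,) for i in range(500)])

expr_sql = "SELECT id FROM t WHERE num / 2 = 100"    # expression on the column
const_sql = "SELECT id FROM t WHERE num = 100 * 2"   # expression on the constant

expr_plan = plan(conn, expr_sql)    # every row must be evaluated: a scan
const_plan = plan(conn, const_sql)  # the index can be searched directly

expr_ids = [r[0] for r in conn.execute(expr_sql)]
const_ids = [r[0] for r in conn.execute(const_sql)]

print(expr_plan, const_plan, expr_ids, const_ids)
```

Check that a rewrite like this is really equivalent on the target engine: where / is integer division (as in SQLite and SQL Server), num/2=100 also matches num=201, whereas MySQL's / operator produces a fractional result.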
9. Avoid applying functions to fields in the where clause as far as possible; doing so causes the engine to abandon the index and perform a full table scan. For example:
Sql code: select id from t where substring(name,1,3)='abc'; # ids whose name begins with abc
It should be changed to:
Sql code: select id from t where name like 'abc%'
10. Do not apply functions, arithmetic, or other expressions to the left of the "=" in the where clause, or the system may not be able to use the index correctly.
11. When an indexed field is used as a condition and the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. The order of the fields should also match the order of the index as far as possible.
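The leading-column rule in tip 11 can be sketched as follows, assuming Python's bundled sqlite3 module as a stand-in for MySQL. The columns col1 and col2 and the index name are hypothetical, and the plan text is SQLite's, not MySQL's.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)"
)
# Composite index: col1 is the leading column.
conn.execute("CREATE INDEX idx_t_col1_col2 ON t (col1, col2)")
conn.executemany(
    "INSERT INTO t (col1, col2) VALUES (?, ?)",
    [(i % 20, i % 7) for i in range(500)],
)

# Both columns, and the leading column alone, can be searched via the index.
both_plan = plan(conn, "SELECT id FROM t WHERE col1 = 1 AND col2 = 2")
lead_plan = plan(conn, "SELECT id FROM t WHERE col1 = 1")

# Only the second column: the index cannot be searched, so all entries are read.
no_lead_plan = plan(conn, "SELECT id FROM t WHERE col2 = 2")

print(both_plan, lead_plan, no_lead_plan)
```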
12. Don't write meaningless queries, such as one that only generates an empty table structure:
Sql code: select col1,col2 into #t from t where 1=0
This kind of code returns no result set but still consumes system resources; it should be changed to:
Sql code: create table #t(…)
13. In many cases, using exists instead of in is a good choice:
Sql code: select num from a where num in (select num from b)
Replace it with the following statement:
Sql Code: select num from a where exists (select 1 from b where num=a.num)
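A quick check that the two forms in tip 13 return the same rows, assuming Python's bundled sqlite3 module as a stand-in for MySQL and hypothetical sample data for tables a and b. It compares results only; which form is faster depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (num INTEGER)")
conn.execute("CREATE TABLE b (num INTEGER)")
conn.executemany("INSERT INTO a (num) VALUES (?)", [(i,) for i in range(1, 11)])
# b contains a duplicate (4) and a value missing from a (20).
conn.executemany("INSERT INTO b (num) VALUES (?)", [(2,), (4,), (4,), (6,), (20,)])

in_rows = sorted(
    r[0] for r in conn.execute("SELECT num FROM a WHERE num IN (SELECT num FROM b)")
)
exists_rows = sorted(
    r[0]
    for r in conn.execute(
        "SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)"
    )
)

print(in_rows, exists_rows)
```

The subquery condition is written as b.num = a.num here to make explicit which table each column belongs to.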
14. Not every index is useful to a query. SQL optimizes the query based on the data in the table, and when the indexed column contains a large number of duplicate values, the query may not use the index at all. For example, if a table has a sex field whose values are split almost evenly between male and female, building an index on sex does little for query efficiency.
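One way to judge whether a column is worth indexing is its selectivity: the number of distinct values divided by the number of rows. The sketch below assumes Python's bundled sqlite3 module as a stand-in for MySQL; the helper selectivity and the sample table are hypothetical.

```python
import sqlite3


def selectivity(conn, table, column):
    """Distinct values divided by total rows (1.0 means every value is unique).

    Hypothetical helper: table and column names are interpolated directly,
    so it must only be called with trusted identifiers.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, sex TEXT)")
conn.executemany(
    "INSERT INTO t (sex) VALUES (?)",
    [("male" if i % 2 == 0 else "female",) for i in range(1000)],
)

sex_selectivity = selectivity(conn, "t", "sex")  # two values over 1000 rows
id_selectivity = selectivity(conn, "t", "id")    # unique per row

print(sex_selectivity, id_selectivity)
```

A value close to 0, as for sex here, means each index entry points at a large share of the table, which is the situation tip 14 describes.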
15. More indexes are not always better. An index improves the efficiency of the corresponding select but reduces the efficiency of insert and update, because an insert or update may have to rebuild the index. How to index therefore needs careful thought and depends on the specific situation. It is best to keep a table to no more than 6 indexes; beyond that, consider whether indexes on infrequently used columns are really necessary.
16. Updating clustered index data columns should be avoided as much as possible, because the order of clustered index data columns is the physical storage order of table records. Once the value of this column changes, it will lead to the adjustment of the order of the whole table records, which will consume a lot of resources. If the application system needs to update the clustered index data column frequently, it needs to consider whether the index should be built as a clustered index.
17. Use numeric fields where possible. Fields that hold only numeric information should not be designed as character fields, since that reduces query and join performance and increases storage overhead. The engine compares a string character by character when processing queries and joins, whereas a numeric type needs only one comparison.
18. Use varchar/nvarchar instead of char/nchar as much as possible: variable-length fields take less storage space, and for queries, searching within a smaller field is clearly more efficient.
19. Don't use select * from t anywhere, replace "*" with a specific list of fields, and don't return any fields that you don't need.
20. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, note that the index is very limited (only the primary key index).
21. Avoid creating and deleting temporary tables frequently to reduce the consumption of system table resources.
22. Temporary tables are not off-limits; used appropriately they can make some routines more efficient, for example when you need to repeatedly reference a dataset from a large or commonly used table. For one-time events, however, it is best to use a derived table.
23. When creating a temporary table, if a large amount of data is inserted at once, use select into instead of create table to avoid generating a large amount of log and to improve speed; if the amount of data is small, create table first and then insert, to ease the load on system table resources.
24. If temporary tables are used, be sure to explicitly delete all temporary tables at the end of the stored procedure, first truncate table, and then drop table, to avoid prolonged locking of system tables.
25. Avoid using cursors as much as possible, because cursors are inefficient, and if cursors operate on more than 10,000 rows of data, you should consider rewriting them.
26. Before resorting to a cursor-based or temporary-table approach, look for a set-based solution to the problem; set-based approaches are usually more efficient.
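The contrast between row-by-row and set-based processing in tips 25 and 26 can be sketched as follows. This assumes Python's bundled sqlite3 module as a stand-in for MySQL; the application-side loop plays the role of a cursor, and the table names t_rows and t_set are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("t_rows", "t_set"):  # two identical copies of a hypothetical table
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, num INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} (num) VALUES (?)", [(i,) for i in range(100)]
    )

# Row-by-row, cursor style: fetch every row, decide in application code,
# and issue one UPDATE per matching row.
for row_id, num in conn.execute("SELECT id, num FROM t_rows").fetchall():
    if num % 2 == 0:
        conn.execute(
            "UPDATE t_rows SET num = ? WHERE id = ?", (num + 1000, row_id)
        )

# Set-based: a single statement describes the whole change.
conn.execute("UPDATE t_set SET num = num + 1000 WHERE num % 2 = 0")

rows_result = conn.execute("SELECT id, num FROM t_rows ORDER BY id").fetchall()
set_result = conn.execute("SELECT id, num FROM t_set ORDER BY id").fetchall()

print(rows_result == set_result)
```

Both versions end in the same state; the set-based one does it in one statement instead of one statement per row.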
27. Like temporary tables, cursors are not off-limits. Using FAST_FORWARD cursors on small datasets is often better than other row-by-row processing methods, especially when several tables must be referenced to get the data you need. Routines that include "totals" in the result set usually execute faster than cursor-based ones. If development time allows, try both the cursor-based and the set-based approach and see which works better.
28. Set SET NOCOUNT ON at the beginning of every stored procedure and trigger, and SET NOCOUNT OFF at the end. This avoids sending a DONE_IN_PROC message to the client after each statement in the stored procedure or trigger.
29. Try to avoid large transactions, to improve the concurrency of the system.
SQL optimization method: use indexes to traverse tables more quickly. The index built by default is a non-clustered index, but that is not always optimal. Under a non-clustered index, the data is stored on the data pages in no particular physical order. Reasonable index design should be based on analysis and prediction of the various queries. Generally speaking:
a. For columns that have many duplicate values and often appear in range queries (>, <, >=, <=) or in order by and group by, consider a clustered index.
b. Multiple columns are often accessed at the same time, and each column contains duplicate values. Consider establishing a combined index.
c. A composite index should, as far as possible, cover the key queries, and its leading column must be the most frequently used column.
Although indexes help performance, more indexes are not always better. On the contrary, too many indexes make the system less efficient, because every index added to a table has to be maintained whenever the table's data changes.
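Index coverage, as mentioned in point c, means a query can be answered from the index alone without reading the table rows. A minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for MySQL; the columns and index name are hypothetical, and the plan wording is SQLite's (MySQL's EXPLAIN reports this case as "Using index" in its Extra column).

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER, col3 TEXT)"
)
conn.execute("CREATE INDEX idx_t_col1_col2 ON t (col1, col2)")
conn.executemany(
    "INSERT INTO t (col1, col2, col3) VALUES (?, ?, ?)",
    [(i % 20, i % 7, f"payload{i}") for i in range(500)],
)

# Every selected column is in the index: the table rows are never read.
covered_plan = plan(conn, "SELECT col1, col2 FROM t WHERE col1 = 5")

# col3 is not in the index: each match needs an extra lookup in the table.
uncovered_plan = plan(conn, "SELECT col1, col3 FROM t WHERE col1 = 5")

print(covered_plan, uncovered_plan)
```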
Additional tips:
1. Use format conversion as little as possible in massive queries.
2. ORDER BY and GROUP BY: any index on the columns used in ORDER BY and GROUP BY clauses can help improve the performance of SELECT.
3. Any operation on a column will result in a table scan, including database functions, computed expressions, and so on. When querying, move the operation to the right of the equals sign as much as possible.
4. IN and OR clauses often use work tables, which makes the index ineffective. If they do not produce a large number of duplicate values, consider splitting the clause apart; the split clauses should include the indexed column.
5. As long as you can meet your needs, use smaller data types as much as possible: for example, use MEDIUMINT instead of INT
6. Try to set all columns to NOT NULL. If you want to save NULL, set it manually instead of setting it as the default value.
7. Use VARCHAR, TEXT and BLOB types as little as possible
8. If a column holds only a small, known set of values, it is best to use the ENUM type.
9. Build an index, as graymice said.
At this point, you should have a deeper understanding of how to optimize a MySQL database holding tens of millions of rows. Try these methods out in practice.