MySQL database optimization tips


2025-04-25 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article shares a collection of MySQL database optimization tips that many developers are not familiar with. We hope you find it a useful reference.

1. Optimize your query for query caching

Many older MySQL servers have query caching turned on (the query cache was removed in MySQL 8.0). Where it is available, it is one of the most effective ways to improve performance, and it is handled by the MySQL server itself. When the same query is executed many times, its result is placed in a cache so that subsequent identical queries read the cached result directly without touching the table.

The main problem is that programmers easily overlook it: some query statements prevent MySQL from using the cache. Take a look at the following example:
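The example itself is missing from this copy of the article. A minimal sketch of the kind of pair the text describes, assuming a hypothetical `user` table with a `signup_date` column, might look like this:

```sql
-- Not cached: CURDATE() returns a different value over time,
-- so MySQL does not store the result in the query cache
SELECT username FROM `user` WHERE signup_date >= CURDATE();

-- Cacheable: the application computes today's date
-- and sends it as a plain literal
SELECT username FROM `user` WHERE signup_date >= '2025-04-25';
```

The second statement is byte-for-byte identical on every call made during the same day, which is what the query cache needs in order to match it.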

The difference between the above two SQL statements is CURDATE(): MySQL's query cache does not work for queries that use this function. The same applies to NOW(), RAND(), and other functions whose return values vary, so queries that use them are not cached. All you need to do is compute the value in your application and pass it in as a constant so that the cache can be used.

2. EXPLAIN your SELECT query

Use the EXPLAIN keyword to see how MySQL handles your SQL statement. This can help you analyze the performance bottlenecks of your query or table structure.

EXPLAIN's output also tells you how your indexes and primary keys are used, how your tables are searched and sorted, and so on.

Pick one of your SELECT statements (the most complex one with multiple table joins is recommended) and put the keyword EXPLAIN in front of it. You can use phpMyAdmin to do this. The result is displayed as a table. In the original example, the query joins two tables and the index on group_id had been forgotten.

Without the index, EXPLAIN shows that 7883 rows are searched. After the group_id field is indexed, only 9 and 16 rows of the two tables are searched. Looking at the rows column lets us find potential performance problems.
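The workflow can be sketched as follows. The table and column names (`users`, `user_groups`, `group_id`, `name`) are assumptions for illustration, since the original screenshots are not preserved:

```sql
-- Step 1: inspect the plan of the join before the index exists
EXPLAIN
SELECT u.id, g.name
FROM users AS u
LEFT JOIN user_groups AS g ON g.id = u.group_id
WHERE u.group_id = 3;

-- Step 2: add the missing index on the join/filter column
ALTER TABLE users ADD INDEX idx_group_id (group_id);

-- Step 3: run the same EXPLAIN again and compare the `rows` column
EXPLAIN
SELECT u.id, g.name
FROM users AS u
LEFT JOIN user_groups AS g ON g.id = u.group_id
WHERE u.group_id = 3;
```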

3. Use LIMIT 1 when only one row of data is needed

Sometimes when you query a table you already know there will be only one result, for example when you fetch a single record or only check whether any matching record exists.

In this case, adding LIMIT 1 can increase performance, because the MySQL database engine stops searching after it finds the first matching row instead of continuing to scan for further matches.

The following example just checks whether there are any users from "China". The second query is clearly more efficient than the first. (Note that the first uses SELECT * and the second uses SELECT 1.)
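The two queries are not preserved in this copy. A sketch of what they could look like, assuming a hypothetical `user` table with a `country` column:

```sql
-- Less efficient: reads every column of every matching row
SELECT * FROM `user` WHERE country = 'China';

-- More efficient: returns a constant and stops at the first match
SELECT 1 FROM `user` WHERE country = 'China' LIMIT 1;
```

The application then only has to check whether the second query returned a row.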

4. Index the search field

An index is not necessarily a primary key or a unique field. If there is a field in your table that you will always use to search, please index it.

The original screenshot compares the search "last_name LIKE 'a%'" on an indexed and an unindexed column: without the index, the query is about 4 times slower.

You should also know which kinds of searches cannot use a normal index. For example, when you need to search for a word inside a large article, such as "WHERE post_content LIKE '%apple%'", the index is of no use. You may need to use a MySQL full-text index or build your own index (for example, on search keywords or tags).
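Both cases can be sketched in a few statements. The `users` and `posts` tables and the index names are hypothetical:

```sql
-- A normal index helps a prefix search
CREATE INDEX idx_last_name ON users (last_name);
SELECT id FROM users WHERE last_name LIKE 'a%';

-- A word inside a long text needs a full-text index instead
ALTER TABLE posts ADD FULLTEXT INDEX ft_post_content (post_content);
SELECT id FROM posts WHERE MATCH(post_content) AGAINST('apple');
```

Running EXPLAIN on each SELECT is a quick way to confirm which index, if any, MySQL actually chooses.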

5. Use the same column types when joining tables

If your application has many JOIN queries, make sure that the join fields in both tables are indexed. This lets MySQL's internal join optimization work for you.

The fields used for the join should also be of the same type. For example, if you join a DECIMAL field with an INT field, MySQL may be unable to use at least one of the indexes. For string types, the fields also need to use the same character set (the character sets of the two tables may differ).
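As a rough sketch, assuming a hypothetical `orders` table that joins to `customers.id` (INT UNSIGNED) and to `customers.code` (a utf8mb4 string), the join columns could be aligned like this:

```sql
-- Give the join column the same numeric type as customers.id and index it
ALTER TABLE orders MODIFY customer_id INT UNSIGNED NOT NULL;
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

-- For string join columns, align the character set and collation as well
ALTER TABLE orders
  MODIFY customer_code VARCHAR(20)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
```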

6. Never ORDER BY RAND ()

Want to shuffle the returned rows, or pick a row at random? I don't know who invented this usage, but many beginners like it very much, without understanding the terrible performance problems it causes.

If you really want to shuffle the returned rows, there are many ways to do it. ORDER BY RAND() only makes performance degrade sharply as the table grows. The problem is that MySQL has to execute the RAND() function (which takes CPU time) for every row and then sort all the rows. Even LIMIT 1 does not help, because the sort has to happen first.

The following example is to pick a record at random:
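The original example is not preserved here. One possible alternative, sketched with a hypothetical `user` table, is to count the rows, choose a random offset, and fetch a single row at that offset:

```sql
-- Slow: RAND() is evaluated for every row, then all rows are sorted
SELECT username FROM `user` ORDER BY RAND() LIMIT 1;

-- Alternative sketch: choose a random offset, then read one row
SET @n = (SELECT COUNT(*) FROM `user`);
SET @off = CAST(FLOOR(RAND() * @n) AS UNSIGNED);
PREPARE pick_one FROM 'SELECT username FROM `user` LIMIT ?, 1';
EXECUTE pick_one USING @off;
DEALLOCATE PREPARE pick_one;
```

This avoids the per-row RAND() call and the sort, although MySQL still has to skip over the rows before the offset. The same two steps can equally be done from application code.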

7. Avoid SELECT *

The more data is read from the database, the slower the query becomes. And if your database server and web server are two separate machines, it also increases network traffic. Make a habit of selecting only the columns you need.
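A small illustration, assuming a hypothetical `user` table where only the user name is needed:

```sql
-- Not recommended: transfers every column of the row
SELECT * FROM `user` WHERE user_id = 1;

-- Better: transfers only the column the application uses
SELECT username FROM `user` WHERE user_id = 1;
```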

8. Always set an ID for each table

Every table in the database should have an ID column as its primary key, preferably an INT (UNSIGNED is recommended) with the AUTO_INCREMENT flag set.

Even if your users table has a unique field such as "email", don't make it the primary key. Using a VARCHAR column as the primary key degrades performance. In addition, your program should use the table's ID to build its data structures.

Moreover, some operations in the MySQL engine rely on the primary key, for example clustering and partitioning, so the performance and definition of the primary key become very important in those cases.

There is only one exception: the "association table", whose primary key is formed from the primary keys of several other tables. These columns are "foreign keys". For example, if there is a "student" table with a student ID and a "course" table with a course ID, then the "score" table is an association table that relates students to courses. In the score table, the student ID and the course ID are foreign keys that together form the primary key.
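The student/course/score example can be sketched as table definitions. The column names and sizes are assumptions, and the foreign key constraints assume the InnoDB engine:

```sql
CREATE TABLE student (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);

CREATE TABLE course (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(100) NOT NULL
);

-- Association table: the two foreign keys together form the primary key
CREATE TABLE score (
  student_id INT UNSIGNED NOT NULL,
  course_id  INT UNSIGNED NOT NULL,
  score      TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (student_id, course_id),
  FOREIGN KEY (student_id) REFERENCES student (id),
  FOREIGN KEY (course_id)  REFERENCES course (id)
);
```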

9. Use ENUM instead of VARCHAR

The ENUM type is very fast and compact. Internally it is stored as a small integer (like TINYINT), but it appears as a string on the outside. This makes it a perfect fit for fields that hold a list of options.

If you have a field such as "gender", "country", "nationality", "status" or "department", and you know that its values are limited and fixed, you should use ENUM instead of VARCHAR.

MySQL can also give "suggestions" (see tip 10) on how to reorganize your table structure. When you have a VARCHAR field, it may suggest changing it to ENUM. You can get these suggestions with PROCEDURE ANALYSE().
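A minimal sketch of an ENUM column, using a hypothetical `employee` table and an assumed set of status values:

```sql
CREATE TABLE employee (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status ENUM('active', 'inactive', 'pending') NOT NULL DEFAULT 'pending'
);

-- Values are written and read as strings
INSERT INTO employee (status) VALUES ('active');
SELECT id, status FROM employee WHERE status = 'active';
```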

10. Get advice from PROCEDURE ANALYSE()

PROCEDURE ANALYSE() asks MySQL to analyze your fields and their actual data and to give you some useful advice. The suggestions are only useful when the table contains real data, because data is needed to make these decisions.

For example, if you create an INT field as your primary key but there is not much data, PROCEDURE ANALYSE() will advise you to change the type of this field to MEDIUMINT. Or if you use a VARCHAR field with only a few distinct values, you may get a suggestion to change it to ENUM. Such suggestions may be inaccurate simply because there is not enough data.

In phpMyAdmin, when you look at a table, click "Propose table structure" to view these suggestions.

Keep in mind that these are only suggestions, and they become accurate only as the amount of data in your table grows. You are the one who makes the final decision.
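On the command line the call is appended to a SELECT. The sketch below assumes a hypothetical `user` table. Note that PROCEDURE ANALYSE() was removed in MySQL 8.0, so this only applies to older servers:

```sql
-- Ask MySQL to suggest an optimal column type for every column in the table
SELECT * FROM `user` PROCEDURE ANALYSE();
```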

11. Use NOT NULL whenever possible

Unless you have a very specific reason to use NULL values, you should always declare your fields NOT NULL. This may seem a little controversial, so please read on.

First, ask yourself what the difference is between an empty value and NULL (for an INT field, between 0 and NULL). If you think there is no difference, then you should not use NULL. (Did you know? In Oracle, NULL and the empty string are the same!)

Don't assume that NULL takes no space: it needs extra space, and your program becomes more complex when you make comparisons. Of course, this does not mean you can never use NULL. Reality is complicated, and there will still be cases where you need NULL values.

12. Prepared Statements

A prepared statement, much like a stored procedure, is a set of SQL statements that runs on the server. Prepared statements bring many benefits, both for performance and for security.

Prepared statements check the variables you bind, which protects your program from "SQL injection" attacks. You can of course check your variables manually, but manual checks are error-prone and easily forgotten. The problem is less severe when you use a framework or an ORM.

In terms of performance, prepared statements give you a considerable advantage when the same query is used many times. You define parameters for the statement, and MySQL parses it only once.

Recent versions of MySQL also transmit prepared statements in a binary format, which makes network transmission very efficient.

There are cases where you may want to avoid prepared statements, because they do not work with the query cache. Reportedly, this is supported as of version 5.1.

To use prepared statements in PHP, check the manual for the mysqli extension, or use a database abstraction layer such as PDO.
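The same idea can be tried directly in SQL with MySQL's server-side PREPARE syntax. This is only a sketch; the `user` table and its `state` column are assumptions:

```sql
-- Parse the statement once, with a placeholder for the parameter
PREPARE find_users FROM 'SELECT username FROM `user` WHERE state = ?';

-- Execute it as often as needed with different bound values
SET @state = 'CA';
EXECUTE find_users USING @state;
SET @state = 'NY';
EXECUTE find_users USING @state;

-- Release the statement when it is no longer needed
DEALLOCATE PREPARE find_users;
```

Because the value is bound rather than concatenated into the SQL text, it cannot change the structure of the statement.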

13. Unbuffered query

Normally, when you execute a SQL statement in your script, your program waits there until the statement has returned its complete result, and only then continues. You can use unbuffered queries to change this behavior.

mysql_unbuffered_query() sends a SQL statement to MySQL without automatically fetching and buffering the results as mysql_query() does. This can save a lot of memory, especially for queries that produce large result sets, and you don't have to wait for all the results: as soon as the first row is returned, you can start working on it.

However, there are some limitations. You must either read all the rows or call mysql_free_result() to discard the results before issuing the next query. Also, mysql_num_rows() and mysql_data_seek() cannot be used. So think carefully about whether to use unbuffered queries.

14. Save the IP address as UNSIGNED INT

Many programmers create a VARCHAR(15) field to hold the IP address as a string instead of as an integer. If you store it as an integer, it takes only 4 bytes and the field has a fixed length. It also gives you an advantage in queries, especially with range conditions such as: ip BETWEEN ip1 AND ip2.

We have to use UNSIGNED INT because an IPv4 address uses the full range of a 32-bit unsigned integer.

In your queries, you can use INET_ATON() to convert a string IP into an integer and INET_NTOA() to convert an integer back into a string IP. PHP has the similar functions ip2long() and long2ip().
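A short sketch of the round trip, using a hypothetical `access_log` table:

```sql
CREATE TABLE access_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  ip INT UNSIGNED NOT NULL,
  KEY idx_ip (ip)
);

-- Store the address as an integer
INSERT INTO access_log (ip) VALUES (INET_ATON('192.168.1.10'));

-- Range search on the integer column, converted back to text for display
SELECT INET_NTOA(ip) AS ip_text
FROM access_log
WHERE ip BETWEEN INET_ATON('192.168.1.0') AND INET_ATON('192.168.1.255');
```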

15. Fixed-length tables are faster

If all the fields in a table are "fixed length", the entire table is considered "static" or "fixed-length". That means the table has no fields of types such as VARCHAR or TEXT. As soon as you include one of these fields, the table is no longer a fixed-length static table, and the MySQL engine handles it differently.

Fixed-length tables improve performance because MySQL can search them faster: with fixed-length rows, the offset of the next row is easy to calculate, so reads are naturally fast. If rows are not fixed in length, finding the next row requires looking it up through the primary key.

Fixed-length tables are also easier to cache and to rebuild. The only side effect is that fixed-length fields waste some space, because they take up their full allocated size whether you use it or not.

Using the "vertical splitting" technique (see the next tip), you can split your table into two, one with fixed-length fields and one with variable-length fields.

16. Vertical splitting

"Vertical splitting" means splitting a table into several tables by columns. This reduces the complexity of each table and its number of fields, which helps optimization. (I once worked on a bank project and saw a table with more than 100 fields, which was terrifying.)

Example 1: the users table has a home address field. This field is optional, and apart from the personal information page you rarely need to read or rewrite it. So why not put it in another table? This gives your table better performance. Think about it: most of the time, only the user ID, user name, password, user role and so on are used from the users table. Smaller tables always perform better.

Example 2: you have a field called "last_login" that is updated every time the user logs in. However, each update causes the query cache for the table to be emptied. So you can put this field in another table, so that it does not affect your constant reads of the user ID, user name, and user role, which then keep benefiting from the query cache.

In addition, make sure that you do not need to join the tables formed by these separated fields regularly. Otherwise, performance will be even worse than before the split, and the drop can be severe.
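Both examples can be sketched as one split. All table and column names here are assumptions; the frequently read, fixed-length columns stay in `users`, while the optional and frequently updated columns move to `user_profile`:

```sql
-- Hot, fixed-length columns that are read on almost every request
CREATE TABLE users (
  id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username      CHAR(32) NOT NULL,
  password_hash CHAR(60) NOT NULL,
  role          TINYINT UNSIGNED NOT NULL
);

-- Rarely read or frequently updated columns, one row per user
CREATE TABLE user_profile (
  user_id      INT UNSIGNED NOT NULL PRIMARY KEY,
  home_address VARCHAR(255) NULL,
  last_login   DATETIME NULL,
  FOREIGN KEY (user_id) REFERENCES users (id)
);

-- Updating last_login no longer touches the users table
UPDATE user_profile SET last_login = NOW() WHERE user_id = 1;
```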

17. Split large DELETE or INSERT statements

If you need to run a large DELETE or INSERT query on a live site, be very careful not to make the whole site stop responding. These two operations lock the table, and once the table is locked, no other operation can get in.

Apache runs many child processes or threads. This makes it work quite efficiently, but the server cannot afford too many child processes, threads, and database connections, because they consume a lot of server resources, especially memory.

If you lock your table for a period of time, say 30 seconds, then on a high-traffic site the processes/threads, database connections, and open files that pile up in those 30 seconds may crash not only your web service but possibly the whole server.

So if you have a large operation, split it up. Using a LIMIT clause is a good way to do this. Here is an example:
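The original example is not preserved in this copy. A sketch of the idea, assuming a hypothetical `logs` table with a `log_date` column and an arbitrary batch size of 1000:

```sql
-- Delete one small batch; the table is locked only briefly
DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000;

-- Number of rows removed by the statement above; 0 means the job is done
SELECT ROW_COUNT();
```

The application repeats the DELETE in a loop until ROW_COUNT() returns 0, ideally with a short pause between batches so that other queries get a chance to run.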

18. The smaller the column, the faster.

For most database engines, hard disk operation is probably the most significant bottleneck. So making your data compact can be very helpful in this situation because it reduces access to the hard drive.

See MySQL's documentation Storage Requirements to see all the data types.

If a table will only ever have a few rows (for example, a dictionary table or a configuration table), there is no reason to use INT as the primary key. MEDIUMINT, SMALLINT, or the even smaller TINYINT is more economical. If you don't need to record the time of day, DATE is much better than DATETIME.

Of course, you also need to leave enough room to grow, or you will regret it later. See the Slashdot example (November 06, 2009): a simple ALTER TABLE statement took more than 3 hours because the table held 16 million rows.

19. Choose the right storage engine

MySQL has two main storage engines, MyISAM and InnoDB, each with its pros and cons. The earlier article "MySQL: InnoDB or MyISAM?" discusses this.

MyISAM suits applications that are read-heavy, but it does not handle heavy writes well. Even if you only update one field, the entire table is locked, and no other process can operate on it, not even to read, until the update completes. On the other hand, MyISAM is extremely fast for calculations like SELECT COUNT(*).

InnoDB is a more complex storage engine, and for some small applications it is slower than MyISAM. It supports row-level locking, so it performs better when there are many writes. It also supports more advanced features such as transactions.
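To see which engine each table currently uses and to switch one over, something like the following can be used. The `user` table name is an assumption, and converting a large table rewrites it, so it is best tried on a copy first:

```sql
-- List the storage engine of every table in the current database
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- Convert one table to InnoDB
ALTER TABLE `user` ENGINE = InnoDB;
```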

20. Use an object relational mapper (Object Relational Mapper)

With an ORM (Object Relational Mapper) you can gain certain performance benefits. Everything an ORM does can also be written by hand, but that takes a senior expert.

The most important feature of an ORM is "lazy loading": values are fetched only when they are actually needed. But be aware of the side effect of this mechanism: it can degrade performance by creating many small queries.

An ORM can also batch your SQL statements into a transaction, which is much faster than executing them individually.

At present, my favorite ORM for PHP is Doctrine.

21. Beware of persistent connections

Persistent connections are meant to reduce the cost of re-creating connections to MySQL. Once a persistent connection is created, it stays open even after the database operation is finished. And since Apache reuses its child processes, the next HTTP request handled by the same child process reuses the same MySQL connection.

In theory, this sounds very good. But in my personal experience (and most people's), this feature creates more trouble than it is worth, because you only have a limited number of connections, and you can run into memory problems, file handle limits, and so on.

Moreover, Apache runs in a highly parallel environment and creates many child processes, which is why persistent connections do not work well there. Before you decide to use them, think carefully about the architecture of your whole system.

22. SQL optimization: index optimization

1. Independent column

When making a query, the index column cannot be part of an expression or an argument to a function, otherwise the index cannot be used.

For example, the following query cannot use the index of the actor_id column:

# Wrong: the index on actor_id cannot be used
SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

Optimization: you can move expressions and function operations to the right side of the equal sign. As follows:

SELECT actor_id FROM sakila.actor WHERE actor_id = 5 - 1;

2. Multi-column index

When you need to use multiple columns as criteria for a query, using multiple column indexes performs better than using multiple single column indexes.

For example, in the following statement, it is best to put actor_id and film_id in one multi-column index. A related Yuanfudao interview question, linked from the original article, explores this in more depth.

SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 AND film_id = 1;

3. Order of index columns

Put the most selective index column first.

The selectivity of the index refers to the ratio of the index value that is not repeated to the total number of records. The maximum value is 1, where each record has a unique index corresponding to it. The higher the selectivity, the higher the differentiation of each record and the higher the query efficiency.

For example, in the results shown below, customer_id is more selective than staff_id, so it is best to put the customer_id column in front of the multi-column index.

SELECT COUNT(DISTINCT staff_id) / COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
       COUNT(*)
FROM payment;

# Results:
staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
COUNT(*): 16049

4. Prefix index

For BLOB and TEXT columns, and for very long VARCHAR columns, you must use a prefix index, which indexes only the first part of the value.

The selection of prefix length needs to be determined according to index selectivity.
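One way to choose the prefix length is to compare the selectivity of a few candidate prefixes with the selectivity of the full column. The sketch below assumes a hypothetical `user` table with an `email` column; the candidate lengths 5 and 10 are arbitrary:

```sql
-- Compare the selectivity of candidate prefix lengths with the full column
SELECT COUNT(DISTINCT LEFT(email, 5))  / COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT email)           / COUNT(*) AS sel_full
FROM `user`;

-- Index only the first 10 characters once that prefix is selective enough
ALTER TABLE `user` ADD INDEX idx_email_prefix (email(10));
```

A prefix is usually long enough when its selectivity is close to that of the full column.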

5. Covering index

A covering index contains the values of all the fields that the query needs. It has the following advantages:

1. Indexes are usually much smaller than data rows, so reading only the index greatly reduces the amount of data accessed.

2. Some storage engines, such as MyISAM, cache only indexes in memory and leave data caching to the operating system. Accessing only the index therefore avoids system calls (which are usually time-consuming).

3. For the InnoDB engine, if a secondary index covers the query, there is no need to access the primary (clustered) index.
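Whether a query is covered can be checked with EXPLAIN. A sketch, assuming a hypothetical `orders` table with `customer_id` and `status` columns:

```sql
-- The index contains every column the query reads
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);

-- When the index covers the query, the Extra column of the
-- EXPLAIN output shows "Using index"
EXPLAIN SELECT customer_id, status FROM orders WHERE customer_id = 42;
```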

6. Give priority to using indexes to avoid full table scans

When using LIKE for fuzzy queries, put the % at the end of the pattern and avoid patterns that start with a wildcard.

MySQL can use the index for a LIKE query only when the wildcard % comes at the end.

For example, '%ptd_' and '%ptd_%' do not use the index, while 'ptd_%' does.

# Full table scan, the index is not used
EXPLAIN SELECT * FROM `user` WHERE username LIKE '%ptd_%';
EXPLAIN SELECT * FROM `user` WHERE username LIKE '%ptd_';
# The index is used
EXPLAIN SELECT * FROM `user` WHERE username LIKE 'ptd_%';

Another example is the common query for everyone in the database whose surname is Zhang:

SELECT * FROM `user` WHERE username LIKE 'Zhang%';

7. Avoid IN and NOT IN where possible, because they can cause the database engine to abandon the index and perform a full table scan.

For example:

SELECT * FROM t WHERE id IN (2, 3);
SELECT * FROM t1 WHERE username IN (SELECT username FROM t2);

Optimization method: if it is a continuous value, you can use between instead. As follows:

SELECT * FROM t WHERE id BETWEEN 2 AND 3

If it is a subquery, you can use exists instead. As follows:

SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.username = t2.username);

8. Avoid OR where possible, because it can cause the database engine to abandon the index and perform a full table scan.

Such as:

SELECT * FROM t WHERE id = 1 OR id = 3

Optimization: union can be used instead of or. As follows:

SELECT * FROM t WHERE id = 1
UNION
SELECT * FROM t WHERE id = 3;

9. Avoid testing for NULL values where possible, because this can cause the database engine to abandon the index and perform a full table scan.

SELECT * FROM t WHERE score IS NULL;

Optimization method: you can add the default value of 0 to the field and judge the value of 0. As follows:

SELECT * FROM t WHERE score = 0;

10. Avoid applying expressions or functions to the left side of the equals sign in a WHERE condition, because this causes the database engine to abandon the index and perform a full table scan.

This is the same as tip 1 (independent column).

SELECT * FROM t2 WHERE score/10 = 9;
SELECT * FROM t2 WHERE SUBSTR(username,1,2) = 'li';

Optimization: you can move expressions and function operations to the right side of the equal sign. As follows:

SELECT * FROM t2 WHERE score = 10*9;
SELECT * FROM t2 WHERE username LIKE 'li%';

11. Avoid the WHERE 1=1 condition when there is a large amount of data. This condition is often added by default to make it easier to assemble query conditions, but it can cause the database engine to give up the index and perform a full table scan.

SELECT * FROM t WHERE 1=1;

Optimization method: when assembling the SQL in code, check whether a WHERE clause already exists. If not, add WHERE; if so, add AND.

The benefit of an index: once the index is built, a query does not scan the whole table. It first looks up the index and then goes directly to the matching rows.

The disadvantage of an index: when the database performs DML operations, it must maintain the index as well as the data table, so the cost of maintenance increases.

Application scenario: tables with a relatively large amount of data that are queried on many fields.

Index rules:

1. Choose highly selective fields for indexes. A unique field generally has the highest selectivity.

2. Composite index: the more selective the field, the earlier it should come (leftmost prefix principle).

3. If two query conditions are both highly selective, it is best to index both.

23. SQL optimization: query optimization

1. Use EXPLAIN for analysis

EXPLAIN is used to analyze SELECT statements, and developers can optimize queries by analyzing its output.

The more important fields are:

select_type: the query type, such as simple query, union query, or subquery

key: the index used

rows: the number of rows scanned

2. Optimize data access

1. Reduce the amount of data requested

Return only the necessary columns: it is best not to use the SELECT * statement.

Return only the necessary rows: use the LIMIT statement to restrict the data returned.

Cache repeatedly queried data: a cache avoids hitting the database at all, and when the same data is queried over and over, the performance gain from caching is very noticeable.

2. Reduce the number of rows scanned on the server side

The most efficient way is to use an index to cover the query.

3. Restructure the query

1. Split large queries

If a large query is executed at once, it may lock a lot of data at once, occupy the entire transaction log, deplete system resources, and block many small but important queries.

2. Decompose large join queries

Decomposing a large join query into single-table queries and then associating the results in the application has the following benefits:

Caching becomes more efficient. For a join query, if one of the tables changes, the cached result of the whole query can no longer be used. After decomposition, even if one table changes, the cached queries on the other tables remain usable.

The cached results of the single-table queries are more likely to be reused by other queries, which reduces queries for redundant records.

Lock contention is reduced.

Joining at the application layer makes it easier to split the database, and therefore easier to achieve high performance and scalability.

The queries themselves may also be more efficient. In the following example, using IN() instead of a join lets MySQL look up rows in ID order, which may be more efficient than a random join.

# Original join query
SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id=tag.id
JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';

# Decomposed into single-table queries
SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN (123,456,567,9098,8904);

Analyze query statements

By analyzing a query statement, you can understand how it is executed and find its bottlenecks, and then optimize it. MySQL provides the EXPLAIN and DESCRIBE statements for analyzing queries.

The basic syntax of the EXPLAIN statement is as follows:

EXPLAIN [EXTENDED] SELECT select_options

With the EXTENDED keyword, the EXPLAIN statement produces additional information. select_options is the body of the SELECT statement, including the FROM and WHERE clauses, and so on.

Executing this statement analyzes how the SELECT statement after EXPLAIN is executed, and also reveals some characteristics of the queried tables.

For example: EXPLAIN SELECT * FROM user

The columns of the EXPLAIN output are explained below:

A, id: the SELECT identifier, which is the sequence number of the SELECT within the query.

B, select_type: the type of the SELECT statement.

It can have the following values:

B1, SIMPLE: a simple query that contains no UNION and no subquery.

B2, PRIMARY: the main query, that is, the outermost SELECT.

B3, UNION: the second or a later SELECT statement in a UNION.

B4, DEPENDENT UNION: the second or a later SELECT statement in a UNION, dependent on the outer query.

B5, UNION RESULT: the result of a UNION.

B6, SUBQUERY: the first SELECT in a subquery.

B7, DEPENDENT SUBQUERY: the first SELECT in a subquery, dependent on the outer query.

B8, DERIVED: the SELECT of a derived table (a subquery in the FROM clause).

C, table: the table that this row of output refers to.

D, type: the join type of the table.

The join types are listed below, from best to worst.

D1, system: the table has only one row (a system table). This is a special case of the const join type.

D2, const: the table has at most one matching row, which is read at the start of the query and treated as a constant by the rest of the optimizer. const tables are very fast because they are read only once. const is used when all parts of a PRIMARY KEY or UNIQUE index are compared to constant values.

For example: EXPLAIN SELECT * FROM user WHERE id=1

D3, eq_ref: one row is read from this table for each combination of rows from the previous tables. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE index. eq_ref can be used for indexed columns compared with the = operator. The comparison value can be a constant or an expression that uses columns from tables read before this table.

For example: EXPLAIN SELECT * FROM user,db_company WHERE user.company_id = db_company.id

D4, ref: all rows with matching index values are read from this table for each combination of rows from the previous tables. It is used when the join uses only a leftmost prefix of the key, or when the key is neither a PRIMARY KEY nor a UNIQUE index. ref can be used for indexed columns compared with the = or <=> operator.

D5, ref_or_null: like ref, but MySQL additionally searches for rows that contain NULL values. This join type optimization is most often used in resolving subqueries.

D6, index_merge: the index merge optimization is used. In this case, the key column contains a list of the indexes used, and key_len contains a list of the longest key parts for the indexes used.

D7, unique_subquery: replaces ref for some IN subqueries that select a primary key or unique index column from a single table. It is an index lookup function that replaces the subquery completely for better efficiency.

D8, index_subquery: similar to unique_subquery. It also replaces IN subqueries, but works for non-unique indexes.

D9, range: only rows in a given range are retrieved, using an index to select the rows. The key column shows which index is used, and key_len contains the longest key part that was used. When using =, >, > =
