This article summarizes some common operations (and some unreasonable ones) from day-to-day work, along with useful facts collected while optimizing slow queries. It is aimed at developers with a basic knowledge of MySQL.
I. Index-related
Index cardinality: the cardinality is the number of distinct values in a data column. For example, if a column contains the values 1, 3, 7, 4, 7, 3, its cardinality is 4. An index works best when its cardinality is high relative to the number of rows in the table (that is, the column contains many distinct values with few duplicates). If a column records many different ages, an index can quickly narrow down the matching rows. If a column records gender (only "M" and "F"), an index is of little use: whichever value you search for, you get roughly half of the rows. In such cases it is better not to use an index at all, because when the query optimizer finds that a value appears in a high percentage of the table's rows, it generally ignores the index and performs a full table scan. The commonly quoted threshold is about 30%.
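A quick way to see the cardinality the optimizer is working with is SHOW INDEX; the names below (a user_info table with an age column) are hypothetical and just follow the example above:
SHOW INDEX FROM user_info;                  -- the Cardinality column is an estimate from statistics
ANALYZE TABLE user_info;                    -- refresh the statistics if the estimate looks stale
SELECT COUNT(DISTINCT age) FROM user_info;  -- the actual number of distinct values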
Causes of an index not being used:
Performing operations on the indexed column, including arithmetic and other operators (+, -, *, /, !, %), LIKE with a leading wildcard ('%...'), and type mismatches, e.g. a varchar column compared against a number in the WHERE condition.
Applying a function to the indexed column, e.g. select * from template t where ROUND(t.logicdb_id) = 1. In this case a function-based index on ROUND(t.logicdb_id) is needed. MySQL 8.0 supports functional indexes natively; on 5.7 the same effect can be achieved with a virtual (generated) column; before that, the only option was to add and maintain a separate column storing ROUND(t.logicdb_id) (see the sketch after this list).
If the condition contains OR, the index is not used even if some of the OR'd columns are indexed (which is why OR is best used sparingly). For the index to be effective with OR, every column in the OR condition must be indexed.
If the column type is a string, the value in the condition must be quoted, otherwise the index is not used. A B-tree index is not used for IS NULL but is used for IS NOT NULL; a bitmap index is used for both IS NULL and IS NOT NULL. Composite indexes follow the leftmost-prefix principle.
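As mentioned above for the ROUND(t.logicdb_id) example, here is a minimal sketch of the two approaches, reusing the template table and logicdb_id column from that example (exact syntax may vary by minor version):
-- MySQL 8.0.13+: functional index (note the double parentheses around the expression)
ALTER TABLE template ADD INDEX idx_logicdb_round ((ROUND(logicdb_id)));
-- MySQL 5.7: index a virtual generated column instead
ALTER TABLE template ADD COLUMN logicdb_id_round INT AS (ROUND(logicdb_id)) VIRTUAL,
                     ADD INDEX idx_logicdb_round (logicdb_id_round);
-- Either way, the query keeps the same expression:
SELECT * FROM template t WHERE ROUND(t.logicdb_id) = 1;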
Creating indexes
The most important thing is to choose highly selective columns as indexes, based on the statements the business queries most often. The selectivity formula is COUNT(DISTINCT col) / COUNT(*), i.e. the proportion of non-duplicated values in the column; the higher the ratio, the fewer rows need to be scanned. If a column is unique in the business, it is best to create a unique key: on the one hand it guarantees data correctness, and on the other it greatly improves index efficiency.
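A minimal sketch of checking selectivity before creating an index; user_info and the mobile column are hypothetical names:
SELECT COUNT(DISTINCT mobile) / COUNT(*) AS selectivity FROM user_info;  -- the closer to 1, the better
ALTER TABLE user_info ADD INDEX idx_mobile (mobile);
-- If the business guarantees the column is unique, prefer a unique key instead:
ALTER TABLE user_info ADD UNIQUE KEY uk_mobile (mobile);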
II. Useful information in EXPLAIN
Basic usage
DESC or EXPLAIN followed by the SQL statement.
EXPLAIN EXTENDED followed by the SQL statement, then SHOW WARNINGS to see the statement that is actually executed. This is very useful: many statements that are written differently turn out, after analysis, to be executed identically.
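For example (the user_info table and mobile column are hypothetical):
EXPLAIN SELECT * FROM user_info WHERE mobile = '13800000000';
SHOW WARNINGS;  -- prints the statement as rewritten by the optimizer
-- On MySQL 5.6 use EXPLAIN EXTENDED; from 5.7 on, a plain EXPLAIN is enough.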
Features that improve performance
Covering index: all the data the query needs can be found in the index, so there is no need to go back to the table; the EXTRA column shows "Using index" (see the sketch after this list).
ICP (Index Condition Pushdown): originally the index was only an access path; rows fetched through the index and the table were passed up to the MySQL server layer for WHERE filtering. Since version 5.6, if part of the WHERE condition can use indexed columns, the server pushes that part down to the engine layer, so the storage engine filters rows using the indexed WHERE conditions. EXTRA shows "Using index condition". (It helps to know MySQL's architecture, which is split into a server layer and a storage engine layer.)
Index merge: several indexes are scanned separately with their own conditions and the results are then merged (intersect/union). It is generally used for OR; for AND conditions, consider creating a composite index instead. EXPLAIN shows the index_merge type, and EXTRA shows the specific merge algorithm and the indexes used.
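A minimal sketch of a covering index, using hypothetical names (a user_info table with room_id and create_time columns):
ALTER TABLE user_info ADD INDEX idx_room_create (room_id, create_time);
-- Only indexed columns are selected, so no table lookup is needed;
-- EXPLAIN should show "Using index" in the Extra column.
EXPLAIN SELECT room_id, create_time FROM user_info WHERE room_id = 1;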
Extra field
Using filesort: MySQL sorts the data with an external sort instead of reading rows in index order. A sort that cannot be satisfied by an index is called a "file sort", although it does not necessarily sort in a file; internally quicksort is used.
Using temporary: a temporary table is used to hold intermediate results; MySQL uses it when sorting or grouping query results, commonly with ORDER BY and GROUP BY.
Using index: the SELECT uses a covering index, avoiding access to the table's data rows.
Impossible WHERE: the WHERE clause is always false, so no rows can be returned.
Select tables optimized away: MIN/MAX is resolved from the index without a GROUP BY clause, or COUNT(*) is resolved for the MyISAM storage engine, so nothing needs to be computed at execution time; the optimization is completed while generating the execution plan.
Distinct: the DISTINCT operation stops looking for more rows as soon as the first matching row is found.
Note: Using filesort and Using temporary both deserve attention, as they are very costly. When GROUP BY is used without an index, both may appear even though there is no ORDER BY, because GROUP BY sorts within the groups first. If that ordering is not needed, adding ORDER BY NULL avoids the sort, removing Using filesort and gaining a little performance.
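For example (hypothetical user_info table; on MySQL 8.0 GROUP BY no longer sorts implicitly, so this only matters on 5.7 and earlier):
SELECT room_id, COUNT(*) FROM user_info GROUP BY room_id ORDER BY NULL;  -- skips the sort, removes Using filesort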
Type field
system: the table has only one row (a system table). A special case of const that rarely appears.
const: the row is found by a single index lookup; const is used when comparing against a primary key or unique index. Because at most one row can match, it is very fast. If the primary key is placed in the WHERE list, MySQL can convert the query into a constant.
eq_ref: unique index scan; for each index key there is exactly one matching row in the table. Common with primary key or unique index scans.
ref: non-unique index scan that returns all rows matching a single value. It is an index access that may find several qualifying rows, so it is a mix of lookup and scan.
range: only rows in a given range are retrieved, using an index to select them. The key column shows which index is used. It appears with queries such as BETWEEN and IN in the WHERE clause. A range scan is better than a full table scan because it only needs to start at one point in the index and stop at another, without scanning the whole index.
index: full index scan. The difference between index and ALL is that index only traverses the index tree, which is usually faster than ALL because the index file is usually smaller than the data file. (Both read the whole table, but index reads from the index while ALL reads the data from disk.)
ALL: full table scan; the whole table is traversed to find matching rows.
III. Field types and character sets
Getting string length in MySQL: CHARACTER_LENGTH (same as CHAR_LENGTH) returns the number of characters, while LENGTH returns the number of bytes; under utf8 a Chinese character takes three bytes.
Choosing a prefix-index length for varchar and similar fields: select count(distinct left(test, 5)) / count(*) from t; the result should be as close to 1 as possible.
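A minimal sketch of turning that check into a prefix index; the template table, the test column and the prefix length of 8 are hypothetical:
SELECT COUNT(DISTINCT LEFT(test, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(test, 8)) / COUNT(*) AS sel8,
       COUNT(DISTINCT test) / COUNT(*)          AS sel_full
FROM template;
-- Pick the shortest prefix whose selectivity is close to sel_full, then:
ALTER TABLE template ADD INDEX idx_test_prefix (test(8));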
MySQL's utf8 uses at most 3 bytes per character and cannot store emoji; utf8mb4 must be used instead. The client character set needs to be set to utf8mb4 in the MySQL configuration file. The JDBC connection string does not support characterEncoding=utf8mb4, so the best approach is to specify an initialization SQL in the connection pool, e.g. for the Hikari pool: spring.datasource.hikari.connection-init-sql=set names utf8mb4 (other pools are similar). Otherwise set names utf8mb4 has to be executed before every SQL statement.
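A minimal sketch, assuming a hypothetical room_remarks table that still uses utf8:
SET NAMES utf8mb4;  -- what the connection-init-sql above executes for each pooled connection
ALTER TABLE room_remarks CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;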
MySQL collations (generally _bin and _general_ci are used):
utf8_general_ci is case-insensitive; ci is short for case insensitive.
utf8_general_cs is case-sensitive; cs stands for case sensitive. However, current MySQL versions no longer support collations like *_general_cs, and utf8_bin is used instead.
utf8_bin compares each character of a string by its binary encoding and is case-sensitive.
So if both are case-sensitive, what is the difference between utf8_general_cs and utf8_bin?
cs stands for case sensitive; bin means binary, i.e. comparison by binary encoding.
Under the utf8_general_cs collation, even though case is distinguished, some Western European characters are not distinguished from their Latin base characters, e.g. ä = a. Sometimes ä = a is not what you want, which is why utf8_bin exists.
utf8_bin operates on the binary encoding of characters, and any difference in encoding makes values unequal, so under utf8_bin: ä ≠ a (see the example below).
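A small demonstration, assuming the connection character set is utf8 (for utf8mb4, use utf8mb4_general_ci / utf8mb4_bin):
SELECT 'ä' = 'a' COLLATE utf8_general_ci;  -- 1: general_ci folds ä to its Latin base character
SELECT 'ä' = 'a' COLLATE utf8_bin;         -- 0: binary comparison keeps them distinct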
In SQLyog, the encoding for the initial connection can be specified via the connection settings' initialization command.
IV. Summary of SQL statements
Commonly used but easy to forget:
If there is a primary key or unique key conflict, do not insert: insert ignore into
Update on primary key or unique key conflict (note that this affects the auto-increment counter): INSERT INTO room_remarks (room_id, room_remarks) VALUES (1, "sdf") ON DUPLICATE KEY UPDATE room_remarks = "234"
Replace the row if it exists, otherwise insert (note that if VALUES does not include the auto-increment column, the auto-increment value will change): REPLACE INTO room_remarks (room_id, room_remarks) VALUES (1, "sdf")
Back up a table: CREATE TABLE user_info_bak SELECT * FROM user_info
Copy a table structure: CREATE TABLE user_v2 LIKE user
Import from a query: INSERT INTO user_v2 SELECT * FROM user, or INSERT INTO user_v2 (id, num) SELECT id, num FROM user
Join update: UPDATE user a, room b SET a.num = a.num + 1 WHERE a.room_id = b.id
Join delete: DELETE user FROM user, black WHERE user.id = black.id
Lock-related (rarely used; good to know; see the sketch after these examples):
Shared lock: select id from tb_test where id = 1 lock in share mode
Exclusive lock: select id from tb_test where id = 1 for update
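A minimal sketch of how these locks are typically used; row locks only take effect inside a transaction and are released on COMMIT/ROLLBACK (tb_test is taken from the examples above, the num column is hypothetical, InnoDB assumed):
START TRANSACTION;
SELECT id FROM tb_test WHERE id = 1 FOR UPDATE;  -- other transactions now block on this row
UPDATE tb_test SET num = num + 1 WHERE id = 1;   -- update based on what was just read
COMMIT;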
Used in optimization:
Force the use of an index: select * from table force index (idx_user) limit 2
Prohibit the use of an index: select * from table ignore index (idx_user) limit 2
Disable the query cache (to remove its effect while testing): select SQL_NO_CACHE * from table limit 2
View status
View character set SHOW VARIABLES LIKE 'character_set%'
View collation SHOW VARIABLES LIKE 'collation%'
Notes on SQL Writing
WHERE conditions are parsed from right to left; put filter conditions in WHERE rather than HAVING whenever possible.
Use deferred join (delayed association) to optimize deep paging scenarios such as LIMIT with an offset of 10000 or more; a deferred join reduces the number of table lookups (see the sketch after this list).
DISTINCT is expensive and can often be optimized by using GROUP BY instead.
Try not to join more than three tables.
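A minimal sketch of the deferred join mentioned above, assuming a hypothetical user table with primary key id and an index on create_time:
-- Naive deep paging reads and discards a huge number of full rows:
SELECT * FROM user ORDER BY create_time LIMIT 100000, 10;
-- Deferred join pages over the index first and fetches only the final 10 rows from the table:
SELECT u.* FROM user u
JOIN (SELECT id FROM user ORDER BY create_time LIMIT 100000, 10) AS t ON u.id = t.id;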
V. Pitfalls
If the table has an auto-increment column, TRUNCATE resets the auto-increment counter, so the next inserted row starts from the initial value again. Pay close attention to this in scenarios where the auto-increment column is used as a business id.
Aggregate functions skip NULL values. For example, if column a is of type int and all its values are NULL, SUM(a) returns NULL rather than 0.
NULL cannot be tested with "a = NULL"; the result of that comparison is always UNKNOWN. In WHERE and HAVING, UNKNOWN is treated as false; in CHECK constraints it is treated as true. Use "a IS NULL" instead (see the sketch below).
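A small demonstration of both NULL pitfalls; user_info, last_login and score are hypothetical names:
SELECT * FROM user_info WHERE last_login = NULL;   -- always returns no rows: the comparison is UNKNOWN
SELECT * FROM user_info WHERE last_login IS NULL;  -- the correct test
SELECT IFNULL(SUM(score), 0) FROM user_info;       -- returns 0 instead of NULL when every score is NULL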
VI. Modifying a table with 60 million rows online
When a MySQL table contains a lot of data, altering the table structure locks the table and blocks business requests. MySQL 5.6 introduced online DDL, but in some cases the table is still locked, so the pt tool (Percona Toolkit) is generally used.
For example, to add an index to a table:
pt-online-schema-change --user='root' --host='localhost' --ask-pass --alter "add index idx_user_id (room_id,create_time)"
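For reference, a fuller sketch of the same command: pt-online-schema-change also needs a DSN naming the database and table, plus either --dry-run or --execute; D=mydb,t=room_remarks below are placeholders.
pt-online-schema-change --user='root' --host='localhost' --ask-pass \
  --alter "ADD INDEX idx_user_id (room_id, create_time)" \
  D=mydb,t=room_remarks --dry-run
Replace --dry-run with --execute to actually perform the change.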
VII. Slow query log
Sometimes, when online requests time out, you should look at the slow query log. Analyzing it is simple: first find the location of the slow query log file, then analyze it with mysqldumpslow. The slow-log configuration can be checked by querying the related variables directly; the common statements are:
-- View the slow query configuration
-- slow_query_log: whether the slow query log is enabled
-- slow_query_log_file: the file the slow query log is written to
-- long_query_time: the threshold (in seconds) above which a query counts as slow
-- log_queries_not_using_indexes: whether to also log queries that do not use an index
SHOW VARIABLES LIKE '%quer%'
-- Check whether slow queries are logged to a file or to a table
SHOW VARIABLES LIKE 'log_output'
-- Check the number of slow queries recorded
SHOW GLOBAL STATUS LIKE 'slow_queries'
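The slow query log can also be turned on at runtime; a minimal sketch (the 1-second threshold is just an example, and settings made this way are lost on restart unless also written to my.cnf):
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;                 -- seconds; applies to new connections
SET GLOBAL log_queries_not_using_indexes = ON;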
mysqldumpslow is very simple to use. I mainly use the following options:
-t: limit the number of lines output; I usually take the first ten.
-s: what to sort by. The default is at (average query time). I also often use c (query count), because queries that run very frequently need optimizing even if each execution is not slow, and t (total query time) to find the statements that are particularly slow.
-v: verbose output
Example: mysqldumpslow -v -s t -t 10 mysql_slow.log.2018-11-20-0500
VIII. Viewing the SQL process list and killing processes
If a SQL statement you ran does not return, you can check its actual execution state in the process list; if it is taking too long, you can kill it with the KILL command to avoid affecting the online system. The process list also gives a visual picture of the SQL currently being executed: when the database load is high, it may show a large number of long-running processes. The commands are as follows:
-- View the list of processes
SHOW PROCESSLIST
-- kill a process
Kill 183665
If you use SQLyog, there is also a graphical page: menu bar, Tools, Show, Processlist. On the process list page you can right-click a process to kill it.
IX. Thoughts on database performance
When optimizing the company's slow query log, most cases were simply missing indexes; that kind of problem is easy to fix by adding an index. But two kinds of situations cannot be solved simply by adding an index:
Business code that loops over the database: consider fetching a user's follower list with a page size of ten. The SQL is actually very simple, and a join query performs well. But many developers instead fetch a list of ids and then query each id in a loop; with many ids this puts great pressure on the database and performs very poorly (see the sketch after this list).
Statistical SQL: business features such as ranking lists are often computed directly in the database, and aggregate queries over large tables frequently take more than five seconds. These statements are usually long and hard to optimize. In such scenarios, if the business allows it (e.g. consistency requirements are low, or statistics only need to be refreshed from time to time), the statistics can be run on a dedicated replica; beyond that, I suggest handling this kind of workload with a redis cache.
Extremely deep paging: some slow queries with very large LIMIT offsets show up in the slow query log. Because MySQL handles paging at the server layer, a deferred join can be used to reduce table lookups. However, such requests do not appear in the normal business logic of the related code, so they are most likely malicious users scraping the interface; it is best to add validation to the interface during development to intercept these requests.
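As referenced above, a minimal sketch of replacing the id-by-id loop with a single paged join; the user and user_fans tables and their columns are hypothetical:
SELECT u.id, u.name
FROM user_fans f
JOIN user u ON u.id = f.fan_id
WHERE f.user_id = 1
ORDER BY f.create_time DESC
LIMIT 0, 10;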