2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the underlying data structure of MySQL indexes in a concise, easy-to-understand way, and then shows how to analyze index usage with EXPLAIN.
Case study:
CREATE TABLE `employees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(24) NOT NULL DEFAULT '' COMMENT 'name',
  `age` int(11) NOT NULL DEFAULT '0' COMMENT 'age',
  `position` varchar(20) NOT NULL DEFAULT '' COMMENT 'position',
  `hire_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'entry time',
  PRIMARY KEY (`id`),
  KEY `idx_name_age_position` (`name`, `age`, `position`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COMMENT='employee record table';
INSERT INTO employees (name, age, position, hire_time) VALUES ('LiLei', 22, 'manager', NOW());
INSERT INTO employees (name, age, position, hire_time) VALUES ('HanMeimei', 23, 'dev', NOW());
INSERT INTO employees (name, age, position, hire_time) VALUES ('Lucy', 23, 'dev', NOW());
Analyze the index usage of the following SQL statements:
SELECT * FROM employees WHERE name = 'LiLei';
SELECT * FROM employees WHERE name = 'LiLei' AND age = 22 AND position = 'manager';
SELECT * FROM employees WHERE age = 22 AND position = 'manager';
SELECT * FROM employees WHERE name = 'LiLei' AND age > 22 AND position = 'manager';
SELECT * FROM employees WHERE name != 'LiLei';
Index analysis in MySQL
MySQL's official definition: an index is a data structure that helps MySQL retrieve data efficiently. In essence, an index is a data structure that implements an advanced search algorithm. Indexes are generally stored on disk as files, so index retrieval requires disk I/O operations.
Disk access principle
Seek time (slow, time-consuming).
Rotational latency (faster). Read-ahead: data is read in lengths that are an integral multiple of a page, because main memory and disk exchange data page by page (an operating-system page is typically 4 KB).
Structure of the index
Binary tree
Red-black tree
Hash
B-Tree
Index underlying data structure and algorithm
Hash index
For equality queries a hash index has a clear advantage, provided the key values are unique.
Hash indexes cannot serve range queries or sorting, nor prefix (fuzzy) queries such as LIKE 'xxx%', and they cannot apply the leftmost-prefix rule of multi-column (composite) indexes. When there are many duplicate keys, hash collisions also make hash indexes inefficient.
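The trade-off above can be sketched with a toy Python model. The HashIndex class is hypothetical and says nothing about how InnoDB's adaptive hash index or the MEMORY engine is actually implemented. An equality lookup touches one bucket. A range query has to visit every key, because hashing discards key order.

```python
from collections import defaultdict


class HashIndex:
    """Toy hash index: each key maps to the list of row ids that hold it."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, key, row_id):
        self.buckets[key].append(row_id)

    def lookup(self, key):
        # Equality lookup: one hash computation, no tree traversal.
        return list(self.buckets.get(key, []))

    def range_lookup(self, low, high):
        # Hashing discards key order, so a range query must visit every key.
        visited = 0
        result = []
        for key, row_ids in self.buckets.items():
            visited += 1
            if low <= key <= high:
                result.extend(row_ids)
        return sorted(result), visited


index = HashIndex()
for row_id, name in enumerate(["LiLei", "HanMeimei", "Lucy"], start=1):
    index.insert(name, row_id)

print(index.lookup("Lucy"))          # [3]
print(index.range_lookup("L", "M"))  # ([1, 3], 3): all 3 keys were visited
```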
B-Tree
Degree: the number of data entries a node can store
All leaf nodes are at the same depth
Pointers in leaf nodes are null
Keys within a node are sorted in ascending order from left to right.
B+Tree
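Non-leaf nodes store only keys, not data, so each node can hold more keys (the degree can be larger)
Leaf nodes store no pointers to child nodes; all the data is kept in the leaf nodes
Sequential access pointers link adjacent leaf nodes to improve the performance of range access.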
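A back-of-the-envelope Python sketch shows why a larger degree matters. It uses InnoDB's default 16 KB page. The other sizes are illustrative assumptions, not values from the text: an 8-byte BIGINT key, a 6-byte child pointer, and about 1 KB per row. The sequential-access pointers are modeled with a hypothetical Leaf class, so a range scan walks the leaf chain instead of revisiting the upper levels.

```python
import bisect

PAGE_SIZE = 16 * 1024   # InnoDB's default page size (innodb_page_size = 16384)
KEY_SIZE = 8            # assumption: BIGINT primary key
POINTER_SIZE = 6        # assumption: size of a child-page pointer
ROW_SIZE = 1024         # assumption: about 1 KB per row in a leaf page


def bplus_tree_capacity(height):
    """Rough number of rows a B+Tree of `height` levels can hold."""
    fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)  # keys per non-leaf page
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    return fanout ** (height - 1) * rows_per_leaf


class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keys stored in this leaf page
        self.next = None    # sequential-access pointer to the next leaf


def link_leaves(key_lists):
    leaves = [Leaf(keys) for keys in key_lists]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(start_leaf, low, high):
    """Collect keys in [low, high] by following the leaf chain.

    The root-to-leaf descent that would locate `start_leaf` is omitted.
    """
    result = []
    leaf = start_leaf
    while leaf is not None:
        i = bisect.bisect_left(leaf.keys, low)
        for key in leaf.keys[i:]:
            if key > high:
                return result
            result.append(key)
        leaf = leaf.next
    return result


print(bplus_tree_capacity(3))  # 21902400
leaves = link_leaves([[1, 5, 9], [12, 15, 20], [25, 30, 41]])
print(range_scan(leaves[0], 9, 25))  # [9, 12, 15, 20, 25]
```

Under these assumptions, each non-leaf page holds 1170 keys, and a three-level tree holds roughly 21.9 million rows. A primary key lookup therefore needs only about three page reads.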
MyISAM index implementation (non-clustered)
The MyISAM index file and data file are separate: index leaf nodes store the address of the data record.
InnoDB index implementation (clustered)
The data file itself is the index file.
The table data file itself is an index structure file organized as a B+Tree.
Clustered index: the leaf nodes contain the complete data records.
Why do InnoDB tables require a primary key, and why is an auto-incrementing integer primary key recommended? (InnoDB organizes the table by its primary key. Integer keys are compact and fast to compare, and increasing values are appended to the rightmost leaf instead of splitting pages in the middle of the tree.)
Why do the leaf nodes of non-primary-key (secondary) indexes store primary key values? (Consistency and storage savings.)
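The difference between the two layouts can be sketched with plain dictionaries standing in for the on-disk B+Trees. The sketch also shows the extra "back to the table" step that an InnoDB secondary index lookup needs. All names here are hypothetical, and the returned lists only record which structures a lookup touches.

```python
# Toy dictionaries standing in for on-disk B+Trees; all names are hypothetical.
ROWS = [(1, "LiLei", 22), (2, "HanMeimei", 23), (3, "Lucy", 23)]

# MyISAM-style (non-clustered): the data file is separate, and each index
# leaf stores the address (offset) of the row in that file.
myisam_data_file = dict(zip((0, 64, 128), ROWS))
myisam_name_index = {row[1]: offset for offset, row in myisam_data_file.items()}

# InnoDB-style (clustered): primary key index leaves hold complete rows, and
# a secondary index leaf stores the primary key value instead of the row.
innodb_clustered = {row[0]: row for row in ROWS}
innodb_name_index = {row[1]: row[0] for row in ROWS}


def myisam_lookup(name):
    offset = myisam_name_index[name]                 # index leaf -> address
    return myisam_data_file[offset], ["idx_name", "data file"]


def innodb_lookup(name):
    primary_key = innodb_name_index[name]            # secondary index probe
    return innodb_clustered[primary_key], ["idx_name", "PRIMARY"]


def innodb_lookup_by_id(primary_key):
    # A primary key lookup finds the full row in the clustered index directly.
    return innodb_clustered[primary_key], ["PRIMARY"]


print(innodb_lookup("Lucy"))  # ((3, 'Lucy', 23), ['idx_name', 'PRIMARY'])
```

innodb_name_index stores primary key values rather than physical positions. Reorganizing innodb_clustered (for example, after a page split) would therefore leave it valid, which is the consistency argument in the question above.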
Composite (multi-column) index data structure
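A composite index such as idx_name_age_position orders its entries by name first, then age, then position. The sketch below models that ordering with sorted Python tuples and binary search. The entries list and the seek_prefix helper are illustrative assumptions, not InnoDB internals.

```python
import bisect

# Hypothetical idx_name_age_position entries: (name, age, position, id).
# A composite B+Tree keeps them sorted by name, then age, then position.
entries = sorted([
    ("LiLei", 22, "manager", 1),
    ("HanMeimei", 23, "dev", 2),
    ("Lucy", 23, "dev", 3),
])


def seek_prefix(prefix):
    """Return ids of entries whose leading columns equal `prefix` (a tuple)."""
    lo = bisect.bisect_left(entries, prefix)
    hi = lo
    while hi < len(entries) and entries[hi][:len(prefix)] == prefix:
        hi += 1
    return [entry[-1] for entry in entries[lo:hi]]


print(seek_prefix(("LiLei", 22)))       # [1]
print(seek_prefix(("Lucy",)))           # [3]
# Ages in index order are not sorted, so age alone cannot be binary-searched.
print([entry[1] for entry in entries])  # [23, 22, 23]
```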
EXPLAIN execution plan
Use the EXPLAIN keyword to simulate how the optimizer executes SQL statements. This shows how MySQL processes your SQL and helps you analyze performance bottlenecks in your queries or table structure.
Syntax: EXPLAIN + SQL statement
When you add the EXPLAIN keyword before a SELECT statement, MySQL sets a flag on the query. Executing it then returns information about the execution plan instead of running the SQL. (If the FROM clause contains a subquery, that subquery is still executed and its results are placed in a temporary table.)
What the execution plan tells you
The order in which tables are read
The access type of each read operation
Which indexes can be used
Which indexes are actually used
References between tables
How many rows of each table the optimizer expects to examine.
EXPLAIN examples
DROP TABLE IF EXISTS `actor`;
CREATE TABLE `actor` (
  `id` int(11) NOT NULL,
  `name` varchar(45) DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `actor` (`id`, `name`, `update_time`) VALUES …;
CREATE TABLE `film` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `film` (`id`, `name`) VALUES (3,'film0'), (1,'film1'), (2,'film2');
DROP TABLE IF EXISTS `film_actor`;
CREATE TABLE `film_actor` (
  `id` int(11) NOT NULL,
  `film_id` int(11) NOT NULL,
  `actor_id` int(11) NOT NULL,
  `remark` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_film_actor_id` (`film_id`, `actor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `film_actor` (`id`, `film_id`, `actor_id`) VALUES (1,1,1), (2,1,2), (3,2,1);
Explain select * from actor
EXPLAIN outputs one row for each table in the query; if two tables are joined, two rows are output. "Table" is meant broadly here: it can also be a subquery, a UNION result, and so on.
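Two variants of EXPLAIN (both keywords are deprecated in MySQL 5.7, whose plain EXPLAIN already includes this information, and removed in MySQL 8.0)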
Explain extended
Provides additional query-optimization information on top of EXPLAIN. You can then run SHOW WARNINGS to get the optimized query statement and see what the optimizer changed. There is also a filtered column, a percentage value: rows * filtered / 100 estimates the number of rows that will be joined with the previous table in the EXPLAIN output. (The previous table is the one whose id value is smaller than the current table's id.)
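As a worked example, take hypothetical EXPLAIN values of rows = 1000 and filtered = 10.00. The estimate is 1000 * 10 / 100 = 100 rows joined with the previous table:

```python
def estimated_join_rows(rows, filtered):
    """Rows expected to be joined with the previous table: rows * filtered / 100."""
    return rows * filtered / 100


# Hypothetical EXPLAIN values: rows = 1000, filtered = 10.00 (percent).
print(estimated_join_rows(1000, 10.00))  # 100.0
```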
Explain extended select * from film where id = 1
Show warnings
Explain partitions
Compared with plain EXPLAIN, this adds a partitions column. If the query is against a partitioned table, it shows which partitions the query will access.
Columns in explain
Next, we describe each column of the EXPLAIN output.
1. Id
The id column is the sequence number of the SELECT: each SELECT gets one id, and ids increase in the order in which the SELECTs appear. MySQL divides SELECT queries into simple queries (SIMPLE) and complex queries (PRIMARY). Complex queries fall into three categories: simple subqueries, derived tables (subqueries in the FROM clause), and UNION queries. The larger the id, the higher the execution priority. Rows with the same id are executed from top to bottom, and a row whose id is NULL is executed last.
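The id rules can be expressed as a sort key. This Python helper is hypothetical, and the plan rows passed to it are illustrative rather than captured MySQL output.

```python
def execution_order(plan_rows):
    """Order EXPLAIN rows by the id rules: larger id first, equal ids top to
    bottom, NULL (None) ids last. plan_rows is a list of (id, table) pairs."""

    def sort_key(item):
        position, (row_id, _table) = item
        if row_id is None:
            return (1, 0, position)      # NULL ids run after everything else
        return (0, -row_id, position)    # larger id first, then original order

    ordered = sorted(enumerate(plan_rows), key=sort_key)
    return [table for _position, (_row_id, table) in ordered]


# Illustrative plan rows, not captured MySQL output.
print(execution_order([(1, "<derived3>"), (3, "film"), (2, "actor")]))
# ['film', 'actor', '<derived3>']
```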
1) simple subquery
Explain select (select 1 from actor limit 1) from film
2) Sub-query in from clause
Explain select id from (select id from film) as der
When this query runs, the subquery result is stored in a temporary table with the alias der, which the outer SELECT then references.
3) union query
Explain select 1 union all select 1
The union result is always placed in an anonymous temporary table, and the temporary table does not appear in the SQL, so its id is NULL.
2. select_type column
select_type indicates whether the row is a simple or a complex query and, for a complex query, which of the three complex query types it is.
1) simple: a simple query that contains no subqueries or UNION.
Explain select * from film where id = 2
2) primary: the outermost SELECT in a complex query
3) subquery: a subquery contained in the SELECT list (not in the FROM clause)
4) derived: a subquery contained in the FROM clause. MySQL stores its results in a temporary table, also called a derived table.
Use this example to learn about the primary, subquery, and derived types:
Explain select (select 1 from actor where id = 1) from (select * from film where id = 1) der
5) union: the second and subsequent SELECTs in a UNION
6) union result: the SELECT that retrieves results from the UNION temporary table
Use this example to learn about the union and union result types:
Explain select 1 union all select 1
3. Table column
This column indicates which table the EXPLAIN row is accessing. When there is a subquery in the FROM clause, the table column has the form <derivedN>. This means the current query depends on the query with id=N, so the query with id=N is executed first. When there is a UNION, the table column of the UNION RESULT row is <union1,2>, where 1 and 2 are the ids of the SELECT rows that participate in the UNION.
4. Type column
This column shows the join type or access type, that is, how MySQL decides to look up rows in the table and the approximate range of rows it examines.
The complete result values from the best to the worst are system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL.
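To check plans programmatically, the ranking can be encoded as an ordered list. This is a hypothetical helper, not part of MySQL. The NULL type described in the text is left out for simplicity.

```python
# Access types from best to worst, as listed above.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
                "index_merge", "unique_subquery", "index_subquery", "range",
                "index", "ALL"]


def is_at_least(access_type, threshold="range"):
    """True if `access_type` is as good as or better than `threshold`."""
    return ACCESS_TYPES.index(access_type) <= ACCESS_TYPES.index(threshold)


def worst_access(plan_types):
    """Return the worst access type among the rows of an EXPLAIN plan."""
    return max(plan_types, key=ACCESS_TYPES.index)


print(is_at_least("ref"))                     # True
print(worst_access(["const", "ref", "ALL"]))  # ALL
```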
The ones to remember are system > const > eq_ref > ref > range > index > ALL. Generally, a query should reach at least the range level, and preferably ref.
NULL: MySQL can decompose the query during the optimization phase, so no table or index needs to be accessed during execution. For example, the minimum value of an indexed column can be found by looking up the index alone, without accessing the table at execution time.
Explain select min (id) from film
const, system: MySQL can optimize part of a query and convert it into a constant (see the output of SHOW WARNINGS). When all columns of a primary key or unique key are compared with constants, the table has at most one matching row. That row is read only once, so access is fast. system is a special case of const that applies when the table contains exactly one row.
Explain extended select * from (select * from film where id = 1) tmp
Show warnings
eq_ref: every part of a primary key or unique key index is used in the join, so at most one matching row is returned. This is probably the best join type apart from const, and it does not occur in simple SELECT queries.
Explain select * from film_actor left join film on film_actor.film_id = film.id
ref: unlike eq_ref, the lookup does not use a unique index. It uses a normal index or a leftmost prefix of a unique index, so comparing the index with a value may return multiple matching rows.
1. Simple SELECT query, where name has a normal (non-unique) index:
Explain select * from film where name = "film1"
2. Join query: idx_film_actor_id is the composite index on film_id and actor_id, and the join uses its leftmost prefix, the film_id column of film_actor.
Explain select film_id from film left join film_actor on film.id = film_actor.film_id
range: range scans usually occur with IN(), BETWEEN, >, <, >=, <= and similar operators, using an index to retrieve rows within a given range.
Explain select * from actor where id > 1
index: a scan of the full index. This is usually faster than ALL because the index is generally smaller than the table data.
Explain select * from film
ALL: a full table scan, meaning MySQL reads the table from beginning to end to find the rows it needs. This usually calls for adding an index.
Explain select * from actor
5. possible_keys column
This column shows which indexes the query might use. possible_keys may list indexes while key shows NULL. This can happen when the table holds little data and MySQL decides that an index would not help much, so it chooses a full table scan. If possible_keys is NULL, there is no relevant index. In that case, check whether the WHERE clause allows an appropriate index to be created, then use EXPLAIN to see the effect.
6. Key column
This column shows which index MySQL actually uses to access the table. If no index is used, the column is NULL. To force MySQL to use or ignore an index listed in possible_keys, add FORCE INDEX or IGNORE INDEX to the query.
Explain select * from film ignore index (idx_name)
7. key_len column
This column shows how many bytes of the index MySQL uses, from which you can work out which index columns are used. For example, film_actor's composite index idx_film_actor_id consists of two int columns, film_id and actor_id, each 4 bytes. key_len=4 in the result shows that the query uses only the first column, film_id, for the index lookup.
Explain select * from film_actor where film_id = 2
The key_len calculation rules are as follows:
String types
char(n): n bytes (3n bytes with utf8)
varchar(n): 2 extra bytes store the string length; with utf8 the length is 3n + 2 bytes
Numeric types
tinyint: 1 byte
smallint: 2 bytes
int: 4 bytes
bigint: 8 bytes
Date and time types
date: 3 bytes
timestamp: 4 bytes
datetime: 8 bytes (5 bytes in MySQL 5.6.4 and later)
If a column allows NULL, 1 additional byte is needed to record whether the value is NULL.
The maximum index length is limited (767 bytes for InnoDB with the older COMPACT and REDUNDANT row formats). When a string is too long, MySQL does something similar to a prefix index and indexes only the leading part of the characters.
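The rules above can be turned into a small calculator for cross-checking EXPLAIN output. This sketch assumes utf8 (3 bytes per character) and covers only the listed types. The function and its column tuples are hypothetical.

```python
def key_len(columns, charset_bytes=3):
    """Estimate EXPLAIN key_len for the index columns a query uses.

    columns: list of (type_name, length, nullable) tuples; length matters only
    for char/varchar. charset_bytes=3 assumes utf8 (utf8mb3).
    """
    fixed = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8,
             "date": 3, "timestamp": 4,
             "datetime": 8}  # 8 per the list above; MySQL 5.6.4+ reports 5
    total = 0
    for type_name, length, nullable in columns:
        if type_name == "char":
            size = charset_bytes * length        # n bytes with a 1-byte charset
        elif type_name == "varchar":
            size = charset_bytes * length + 2    # +2 bytes for the length prefix
        else:
            size = fixed[type_name]
        if nullable:
            size += 1                            # flag byte for NULL-able columns
        total += size
    return total


name, age, position = ("varchar", 24, False), ("int", None, False), ("varchar", 20, False)
print(key_len([name]))                 # 74
print(key_len([name, age]))            # 78
print(key_len([name, age, position]))  # 140
```

For idx_name_age_position, a key_len of 74 means only name was used, 78 means name and age, and 140 means all three columns. The value in EXPLAIN therefore tells you how far along the index a query gets.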
8. ref column
This column shows the columns or constants compared with the index named in the key column, such as const (a constant) or a column name (for example, film.id).
9. Rows column
This column is the number of rows that mysql estimates to read and detect. Note that this is not the number of rows in the result set.
10. Extra column
This column shows additional information. Common, important values are:
Using index: the queried columns are covered by the index and the WHERE filter is on the leading column of the index, which performs well. This generally means a covering index is used (the index contains every queried column). For InnoDB, a covering secondary index can improve performance considerably because no lookup into the clustered index is needed.
Explain select film_id from film_actor where film_id = 1
Using where: the queried columns are not covered by the index, and the WHERE filter is not on the leading column of the index.
Explain select * from actor where name ='a'
Using where; Using index: the queried columns are covered by the index, but the WHERE filter is on an index column other than the leading one. Matching rows therefore cannot be found directly by an index lookup.
Explain select film_id from film_actor where actor_id = 1
NULL: the queried columns are not covered by the index, but the WHERE filter is on the leading column of the index. The index is used, but some columns are not in it and must be fetched by going back to the table. The index is thus neither used purely nor ignored entirely.
Explain select * from film_actor where film_id = 1
Using index condition: similar to Using where. The queried columns are not fully covered by the index, and the WHERE condition is a range on the leading column of the index.
Explain select * from film_actor where film_id > 1
Using temporary: MySQL needs to create a temporary table to process the query. This situation usually needs optimization, and the first thing to consider is an index.
1. actor.name has no index, so a temporary table is created to perform the DISTINCT.
Explain select distinct name from actor
2. film.name has the idx_name index, so Extra shows Using index and no temporary table is used.
Explain select distinct name from film
Using filesort: MySQL sorts the results with an external sort instead of reading rows from the table in index order. It browses all qualifying rows according to the join type and saves the sort keys and row pointers. It then sorts the keys and retrieves the rows in order. This usually calls for optimizing with an index.
1. actor.name has no index, so MySQL scans the entire actor table and saves the sort key name with the corresponding id. It then sorts by name and retrieves the rows.
Explain select * from actor order by name
2. film.name has the idx_name index, so Extra shows Using index.
Explain select * from film order by name
Indexing best practices
Table used: the employees table defined in the case study at the beginning of this article.
Best practices
1. Full value matching
EXPLAIN SELECT * FROM employees WHERE name= 'LiLei'
EXPLAIN SELECT * FROM employees WHERE name= 'LiLei' AND age = 22
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' AND age = 22 AND position = 'manager'
2. Leftmost prefix rule
If an index covers multiple columns, follow the leftmost prefix rule: a query must start from the leftmost column of the index and must not skip columns in the index.
EXPLAIN SELECT * FROM employees WHERE age = 22 AND position = 'manager'
EXPLAIN SELECT * FROM employees WHERE position = 'manager'
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei'
3. Do not perform any operation on an index column (calculation, function, or automatic or manual type conversion). Doing so makes the index unusable and turns the query into a full table scan.
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei'
EXPLAIN SELECT * FROM employees WHERE left(name,3) = 'LiL'
4. The storage engine cannot use the index columns to the right of a range condition
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' AND age = 22 AND position = 'manager'
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' AND age > 22 AND position = 'manager'
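Rules 2 and 4 can be condensed into a simplified model. Walk the index columns from the left, stop at the first column without a condition, and stop right after a range condition. The function is hypothetical. It ignores refinements such as index condition pushdown and the index skip scan added in MySQL 8.0.13.

```python
def usable_index_columns(index_columns, conditions):
    """Return the leading index columns a lookup can use.

    conditions maps a column name to "eq" or "range". Columns are taken from
    the left until one has no condition; a range condition is used but stops
    every column to its right.
    """
    used = []
    for column in index_columns:
        op = conditions.get(column)
        if op is None:
            break               # a gap in the prefix: later columns are unusable
        used.append(column)
        if op == "range":
            break               # columns right of a range condition are unusable
    return used


IDX = ["name", "age", "position"]
print(usable_index_columns(IDX, {"name": "eq", "age": "range", "position": "eq"}))
# ['name', 'age']
print(usable_index_columns(IDX, {"age": "eq", "position": "eq"}))  # []
```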
5. Prefer covering indexes (queries that access only the index, i.e. the index columns include all queried columns) and minimize SELECT * statements
EXPLAIN SELECT name,age FROM employees WHERE name = 'LiLei' AND age = 23 AND position = 'manager'
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' AND age = 23 AND position = 'manager'
6. MySQL cannot use an index with != or <>, which results in a full table scan.
EXPLAIN SELECT * FROM employees WHERE name != 'LiLei'
7. IS NULL and IS NOT NULL also cannot use the index
EXPLAIN SELECT * FROM employees WHERE name is null
8. If LIKE begins with a wildcard ('%abc...'), MySQL cannot use the index and performs a full table scan
EXPLAIN SELECT * FROM employees WHERE name like '%Lei'
EXPLAIN SELECT * FROM employees WHERE name like 'Lei%'
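The cost of a leading wildcard follows from the index being sorted. Entries matching 'prefix%' sit in one contiguous run that a binary search can find, while matches for '%suffix' are scattered. The Python sketch below uses hypothetical names and compares case-sensitively, unlike MySQL's default collations.

```python
import bisect

# Hypothetical values of `name`, kept in index (sorted) order.
names = sorted(["HanMeimei", "LiLei", "Lucy", "Leo", "ZhangLei"])


def like_prefix(sorted_names, prefix):
    """LIKE 'prefix%': matches form one contiguous run found by binary search."""
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = lo
    while hi < len(sorted_names) and sorted_names[hi].startswith(prefix):
        hi += 1
    return sorted_names[lo:hi]


def like_suffix(sorted_names, suffix):
    """LIKE '%suffix': matches are scattered, so every entry must be checked."""
    return [name for name in sorted_names if name.endswith(suffix)]


print(like_prefix(names, "L"))    # ['Leo', 'LiLei', 'Lucy']
print(like_suffix(names, "Lei"))  # ['LiLei', 'ZhangLei']
```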
Question: how can LIKE '%string%' be made to use an index?
a) Use a covering index: the queried fields must all belong to the covering index.
EXPLAIN SELECT name,age,position FROM employees WHERE name like '%Lei%'
b) When the columns of the covering index are varchar(380) or longer, the covering index no longer works!
9. Omitting the single quotes around a string makes the index unusable
EXPLAIN SELECT * FROM employees WHERE name = '1000'
EXPLAIN SELECT * FROM employees WHERE name = 1000
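One way to see rule 9: when a string column is compared with a number, MySQL converts the column values to numbers, and many different strings convert to the same number. The sketch below uses a rough, hypothetical model of that cast; real MySQL conversion handles more formats. It shows that the matching entries are not contiguous in the string-ordered index, so no single range seek can find them.

```python
import re


def to_number_like_mysql(text):
    """Rough model of MySQL's string-to-number cast: skip leading spaces,
    use the longest numeric prefix, and fall back to 0 if there is none."""
    match = re.match(r"\s*[+-]?\d+(\.\d+)?", text)
    return float(match.group()) if match else 0.0


# Hypothetical values of `name`, in the order a string index would keep them.
index_order = sorted(["1000abc", "999", "10000", "1000", "01000"])
matches = [i for i, s in enumerate(index_order) if to_number_like_mysql(s) == 1000]

print(index_order)  # ['01000', '1000', '10000', '1000abc', '999']
print(matches)      # [0, 1, 3]: not one contiguous range
```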
10. Use OR sparingly: in many cases the index cannot be used when conditions are joined with OR
EXPLAIN SELECT * FROM employees WHERE name = 'LiLei' or name = 'HanMeimei'
11. IN and EXISTS optimization
Principle: small tables drive large tables, that is, small datasets drive large datasets
IN: when table B's dataset is smaller than table A's, IN is better than EXISTS
Select * from A where id in (select id from B)
Explain select * from film where id in (select film_id from film_actor)
EXISTS: when table A's dataset is smaller than table B's, EXISTS is better than IN. Each row of the main query A is passed into subquery B for condition checking. The result (TRUE or FALSE) decides whether that row of the main query is kept.
select * from A where exists (select 1 from B where B.id = A.id) # the id columns of tables A and B should be indexed
Explain select * from film where exists (select 1 from film_actor where film_actor.film_id = film.id)
EXISTS (subquery) returns only TRUE or FALSE, so the SELECT * in the subquery can be replaced with SELECT 1 or SELECT 'X'. According to the official documentation, the SELECT list is ignored during execution, so there is no difference. The actual execution of an EXISTS subquery may be optimized rather than compared row by row as described above. An EXISTS subquery can often be replaced with a JOIN; the best choice depends on the specific case.
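The "small table drives large table" principle can be illustrated with a simplified nested-loop cost model. In this model, each row of the driving table does one B+Tree lookup, about log2(n) comparisons, in the other table's index. This is a hypothetical model, not MySQL's optimizer cost model, and it ignores caching and semi-join rewrites.

```python
import math


def probe_cost(driving_rows, probed_rows):
    """Comparisons for a nested loop in which each driving row does one
    B+Tree lookup (about log2(n) comparisons) in the probed table's index."""
    return driving_rows * math.ceil(math.log2(probed_rows))


def cheaper_form(rows_a, rows_b):
    """Compare `... WHERE id IN (SELECT id FROM B)` (B drives, A is probed)
    with `... WHERE EXISTS (SELECT 1 FROM B WHERE B.id = A.id)` (A drives)."""
    in_cost = probe_cost(rows_b, rows_a)
    exists_cost = probe_cost(rows_a, rows_b)
    return "IN" if in_cost < exists_cost else "EXISTS"


print(probe_cost(1_000, 1_000_000))   # small drives large: 1000 * 20 = 20000
print(probe_cost(1_000_000, 1_000))   # large drives small: 1000000 * 10 = 10000000
print(cheaper_form(1_000_000, 1_000)) # IN (B is the small table)
```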
Summary:
① MySQL supports two ways of sorting: filesort and index. Using index means MySQL completes the sort by scanning the index itself. Index sorting is efficient; filesort is not.
② ORDER BY can use the index (Using index) in two situations:
The ORDER BY clause uses the leftmost column(s) of the index.
The WHERE clause conditions combined with the ORDER BY columns form a leftmost prefix of the index.
③ Sort on index columns whenever possible, following the leftmost prefix rule (the column order in which the index was created).
④ Using filesort occurs if the ORDER BY columns are not on the index.
⑤ GROUP BY is similar to ORDER BY: it essentially sorts first and then groups, following the leftmost prefix rule of the index column order. Note that WHERE is applied before HAVING, so conditions that can be written in WHERE should not be put in HAVING.
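Summary point ② can be sketched as a check on column lists. The function is hypothetical and simplified. It handles only equality conditions in WHERE and assumes all ORDER BY columns sort in the same direction. It reports only that the index order can be used, not that the optimizer will choose it.

```python
def order_by_uses_index(index_columns, where_eq_columns, order_by_columns):
    """True if the leading columns pinned by WHERE col = const, followed by the
    ORDER BY columns, form a leftmost prefix of the index."""
    remaining = list(index_columns)
    # Skip leading index columns fixed to a single value by equality filters.
    while remaining and remaining[0] in where_eq_columns:
        remaining.pop(0)
    # ORDER BY columns already fixed by WHERE do not affect the order.
    order_by = [c for c in order_by_columns if c not in where_eq_columns]
    return order_by == remaining[:len(order_by)]


IDX = ["name", "age", "position"]
print(order_by_uses_index(IDX, [], ["name"]))          # True
print(order_by_uses_index(IDX, ["name"], ["age"]))     # True
print(order_by_uses_index(IDX, ["name"], ["position"]))  # False: age is skipped
```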
© 2024 shulou.com SLNews company. All rights reserved.