Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is a MySQL connection query

2025-02-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

What is a MySQL connection query? I believe that many inexperienced people are at a loss about this, so this article summarizes the causes and solutions of the problem. Through this article, I hope you can solve this problem.

1 effect

In the database, the join operation is called join, and its function is to connect the data of multiple tables (through the join condition), get the data from multiple tables and merge them together and return them to the client as a result set. For example:

Table A:

Idnameage1A182B193C20

Table B:

Iduidgender11F22M

The data for merging the two tables can be obtained through the connection:

Select A. gender from A left join B on A.id=B.uididnameagegender1A18F2B19M3C20null2 connection keyword

To join two tables, we can use two keywords: on,using. On can specify specific conditions, using specifies columns of the same name and data type as equivalence criteria, and multiple columns are separated by commas.

It is as follows:

On: select * from A join B on A.id=B.id and B.name=''using: select * from A join B using (id,name) = select * from A join B on A.id=B.id and A.name=B.name3 connection type 3.1 intra-connection and cross-connection syntax: a join | inner join | cross join B representation: an and B satisfy the intersection of connection condition records. If there is no connection condition, it is the Cartesian product of An and B. in MySQL, cross join Inner join and join implement the same function. Therefore, in the official documentation of MySQL, it is indicated that the three are equivalent. Implicit join syntax: from A Magi BJI C expression: equivalent to the join features of on and using that cannot be used: a comma is an implicit join operator. Implicit connections are standard content in SQL92, while explicit connections are standard in SQL99. Although many people still use private connections, it has been removed from the standard. From a usage point of view, it is recommended to display joins, which can more clearly show the join relationships between multiple tables and the properties of join dependencies. 3.2Outer join left outer join syntax: A left join B performance: all the data in the left table is retained, the records in the right table meet the join conditions, and the records that do not meet the conditions are all null right outer join syntax: A right join B performance: all the data in the right table is retained, the records in the left table meet the connection conditions, and the records that do not meet the conditions are all null full outer connections.

MySQL does not support full external connections, only left outer connections and right outer connections. If you want to obtain fully connected data, you can obtain it by merging left and right connected data, such as select * from A left join B on A.name = B.name union select * from A right join B on B.name = B.name;.

Here the union will be automatically deduplicated, so that the data obtained is all externally connected.

3.3.Natural connection syntax: A natural join B = A natural left join B = A natural right join B expression: it is equivalent to a connection for which the connection condition cannot be specified. MySQL uses the same name and type fields in the left and right table as the connection condition. Features: natural connection is also divided into natural inner connection, left outer connection, right outer connection, its performance is consistent with the above mentioned, but the connection conditions are automatically determined by MySQL. 4 execution sequence

During the connection process, the order in which the MySQL keywords are executed is as follows:

From-> on | using-> where-> group by-> having-> select-> order by-> limit

As you can see, the condition of join is prior to where, that is, the result set is filtered by where after the result set is obtained by joining. Therefore, when using join, we should provide the condition of connection as much as possible, and use less condition of where, so as to improve the query performance.

5 connection algorithm

There are three algorithms for join, which are Nested Loop Join,Hash join,Sort Merge Join. According to the official documentation of MySQL, MySQL only supports Nested Loop Join.

Specifically, Nested Loop Join is divided into three subdivision algorithms:

SNLJBNLJINLJ

Let's take a look at how these three algorithms are joined for the join statement select * from A left join B on A.id=B.tid.

5.1 Simple Nested Loop Join (SNLJ)

Without using the index, SNLJ scans the connected two tables through a two-layer loop to get the output of two records that meet the criteria. That is, let two tables do Cartesian product scan, is a more violent algorithm, will be more time-consuming. The process is as follows:

For (an in A) {for (b in B) {if (a.id = = b.tid) {output;}

Of course, even if there is no index available, or MySQL decides that a full table scan may be faster than using an index, MySQL will not choose to use the overly rough SNLJ algorithm, but instead use the following algorithm.

5.2 Block Nested Loop Join (BNLJ)

INLJ is the join algorithm used when MySQL cannot use indexes. The rows of the outer loop are stored in join buffer, and each line of the inner loop is compared with the record in the entire buffer, thus reducing the number of inner loops. The logic is as follows:

For (blockA in A.blocks) {for (b in B) {if (b.tid in blockA.id) {output;}

Compared with the SNLJ algorithm, the BNLJ algorithm can effectively reduce the number of inner loops by dividing the result set of the outer loop.

Principle

For example, the result set of the outer loop is 100 rows, and the inner table needs to be scanned 100 times using the SNLJ algorithm. If the BNLJ algorithm is used, assuming that the number of each shard is 10, the 10 rows of records read from the Outer Loop table (external table) are first put into join buffer, and then the 10 rows of data are directly matched in each loop in the InnerLoop table (internal table), so that the inner loop only needs 10 times, and the scanning of the internal table is reduced by 910. Therefore, the BNLJ algorithm can significantly reduce the number of inner loop table scans.

Of course, here, regardless of the SNLJ or BNLJ algorithm, their total number of comparisons is the same, comparing each line of the outer loop with each line of the inner loop.

The BNLJ algorithm reduces the total number of scanning rows, while the SNLJ algorithm scans the data of table A row by row in the outer loop, and then takes A.id to table B to scan row by row to see if it matches. The BNLJ algorithm is that the outer loop scans the data of table A row by row, then puts it into memory block, then scans table B row by row, and compares the row data of B with the data block of An in memory block. Here, you can compare the data of many rows of A with the data of B at a time, and compare them in memory, which is faster.

Influencing factors

Here, the total number of scan rows of the BNLJ algorithm is determined by the amount of data N of the outer loop, the number of blocks K and the amount of data M of the inner loop. Among them, the number of blocks K is closely related to the amount of data N of the outer loop, which can be expressed as λ N, where the value of λ is (0,1). Then the total number of scans is Cronn + λ NM.

It can be seen that in this formula, the size of both N and λ will affect the number of scan rows, but λ is the key factor affecting the number of scan rows, and the smaller the value, the better (unless the difference between N and M is very large, then N will become the key factor).

So what will affect the size of λ? That's the size of the join_buffer_size setting for MySQL. λ and join_buffer_size are reciprocal. The larger the join_buffer_size, the larger the block, and the smaller the λ, the less the number of blocks, that is, the less the number of outer loops. Therefore, when the index is not used, we should give priority to expanding the size of the join_buffer_size, so that the optimization effect will be more obvious. When the upper index can be used, MySQL uses the following algorithm for join.

5.3 Index Nested Loop Join (INLJ)

INLJ is the algorithm used by MySQL to determine that the index of the driven table can be used. Assuming that the data behavior of table An is similar to that of table B, and B.tid has established an index, then Index Nested Loop Join will be used for select * from A left join B on A.identicesB.tidjol. The process is as follows:

For (an in A) {if (a.id in B.tid.Index) {output;}}

A total of 10 loops are required, and the data of B is queried through the index once in each loop. If we turn to B left join A, we have to loop B a total of 100 times, so if we use join, we need to use the small table as the driver table, so as to effectively reduce the number of loops. It is important to note, however, that the premise of this conclusion is that the index of the driven table can be used.

The inner loop of INLJ reads the index, which can reduce the number of memory loops and improve the efficiency of join, but there is also a disadvantage, that is, if the scanned index is non-clustered index and needs to access non-index data, there will be an operation to read the data back to the table, which will lead to one more random Icano operation. For example, the above match to tid in the index, but also to find the location of the tid line on the disk, see my previous article: how to store the detailed index of the MySQL index.

6 pay attention to increase the join conditions as much as possible, reduce the size of the dataset after join, use small result sets to drive large result sets, first join the tables with small filter results, and then join the join fields of the driven tables with larger result sets, and use the upper index. Using the index above includes using this field, and there will be no failure of the index. Common problems with setting large enough join_buffer_size7 external connections

Q: if you want to filter the data that drives the table, such as the left join filter the data of the left table, should you filter in the join condition or where?

A: to pass where filtering, the connection condition only affects the connection process and does not affect the number of results returned by the connection (in some cases, the connection condition will affect the number of results returned by the connection, for example, in the left connection, the matching data on the right side is not unique)

Q: what if the data rows matched by the driven table are not unique, resulting in the final connection data exceeding the amount of data in the driven table? For example, for a left join, the matching rows in the right table are not unique.

A:join first deweights the driven table, for example, by group by: A lef join (select * from B group by name).

After reading the above, have you mastered what is the method of MySQL join query? If you want to learn more skills or want to know more about it, you are welcome to follow the industry information channel, thank you for reading!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report