Optimization Analysis of big data query in Mysql 04/27 Update SLTechnology News&Howtos

Optimization Analysis of big data query in Mysql

2025-04-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

This article mainly introduces Mysql big data query optimization analysis, hoping to supplement and update some knowledge, if you have other questions to understand, you can continue to follow my updated article in the industry information.

Mysql big data query optimization for many webmasters will not be carefully analyzed, for this problem Xiaobian recently encountered a 100W data optimization problem, the following sorted out some mysql related query optimization tests and related analysis hope to help you.

Related mysql video tutorials

First, an optimization of a simple correlation subquery.

In many cases, the performance of subqueries implemented on mysql is poor, which sounds a little sad. In particular, when using the IN () subquery statement, it is difficult to estimate the time spent on a table of a certain order of magnitude. My knowledge of mysql is not deep, so I can only figure out the mystery slowly.

Suppose there is an exists query statement like this:

Select * from table1 where exists (select * from table2 where id > = 30000 and table1.uuid=table2.uuid)

Table1 is a 100, 000-row table, table2 is a million-row table, and the local test results take 2.40 seconds.

Through the explain, you can see that the subquery is a related subquery (DEPENDENCE SUBQUERY); Mysql first scans the table of the external table1, and then executes the subquery one by one based on the returned uuid. If the outer table is a large table, we can imagine that the query performance will be worse than this test.

A simple optimization solution is to replace the subquery with the inner join method, and the query statement can be changed to:

Select * from table1 innner join table2 using (uuid) where table2.id > = 30000

The test result of this machine takes 0.68s.

Through explain, you can see that mysql uses the SIMPLE type (subquery or query other than union); the Mysql optimizer first filters table2, then makes a Cartesian product of table1 and table2 to get the result set, and then filters the data through on conditions.

2. efficiency analysis and optimization of multi-table joint query

1. Multi-table connection type

1. The Cartesian product (cross join) in MySQL can be CROSS JOIN or omit CROSS that is JOIN, or use', 'such as:

01.SELECT * FROM table1 CROSS JOIN table2 02.SELECT * FROM table1 JOIN table2 03.SELECT * FROM table1,table2 SELECT * FROM table1 CROSS JOIN table2 SELECT * FROM table1 JOIN table2 SELECT * FROM table1,table2

Because the result returned is the product of two joined tables, it is generally not recommended when there are WHERE, ON or USING conditions, because it is very slow when there are too many items in the table. Generally use LEFT [OUTER] JOIN or RIGHT [OUTER] JOIN

two。 Inner join INNER JOIN calls INNER JOIN an equivalent join in MySQL, that is, you need to specify an equivalent join condition in which CROSS and INNER JOIN are divided together in MySQL. Join_table: table_reference [INNER | CROSS] JOIN table_factor [join_condition]

3. The outer join in MySQL is divided into left outer join and right join, that is, in addition to returning the results that meet the connection conditions, you should also return the results in the left table (left join) or right table (right join) that do not meet the connection conditions, and the corresponding NULL is used.

Example:

User table:

Id | name-1 | libk 2 | zyfon 3 | daodao

User_ Action table:

Sql:

Analysis:

Notice that there is another record of user_id=4 and action=swim in user_action, but it does not appear in the result.

The users of id=3 and name=daodao in the user table do not have a corresponding record in user_action, but appear in the result set.

Since it is now left join, all work shall be based on left.

Results 1 the records of both the left table and the right table are both on the left table and on the right table, and 5 is the record on the left table but not on the right table.

How it works:

Read an entry from the left table and select all the right table records (n entries) that match on to join to form n records (including duplicate rows, such as result 1 and result 3). If there is no table matching the on condition on the right, the joined fields are null. Then move on to the next one.

Extension:

We can use the rule of displaying null without on matching in the right table to find out all the records in the left table and not in the right table. Note that the column used to judge must be declared as not null.

Such as:

Sql:

01.select id, name, action from user as u 02.left join user_action an on u.id = a.user_id 03.where a.user_id is NULL select id, name, action from user as u left join user_action an on u.id = a.user_id where a.user_id is NULL

(note:

1. Column value is null should use is null instead of = NULL

two。 Here the a.user_id column must be declared as NOT NULL.)

The result of the above sql:

Id | name | action-3 | daodao | NULL--

General usage:

A. LEFT [OUTER] JOIN:

In addition to returning the results that meet the join conditions, you also need to display the data columns in the left table that do not meet the join conditions, corresponding to the use of NULL

01.SELECT column_name FROM table1 LEFT [OUTER] JOIN table2 ON table1.column=table2.column SELECT column_name FROM table1 LEFT [OUTER] JOIN table2 ON table1.column=table2.column b. RIGHT [OUTER] JOIN:

RIGHT is similar to LEFT JOIN except that in addition to displaying the results that meet the join conditions, you also need to display the data columns in the right table that do not meet the join conditions, using the corresponding NULL.

01.SELECT column_name FROM table1 RIGHT [OUTER] JOIN table2 ON table1.column=table2.column SELECT column_name FROM table1 RIGHT [OUTER] JOIN table2 ON table1.column=table2.columnTips:

1. On a.c1 = b.c1 equals using (C1)

2. INNER JOIN and (comma) are semantically equivalent

3. When MySQL is retrieving information from a table, you can prompt it to select which index.

This feature is useful if EXPLAIN shows that MySQL uses the wrong index in the possible index list.

By specifying USE INDEX (key_list), you can tell MySQL to use the most appropriate of the possible indexes to find record rows in the table.

Optionally, one of the two syntax IGNORE INDEX (key_list) can be used to tell MySQL not to use a specific index. Such as:

01.mysql > SELECT * FROM table1 USE INDEX (key1,key2) 02.- > WHERE key1=1 AND key2=2 AND key3=3; 03.mysql > SELECT * FROM table1 IGNORE INDEX (key3) 04.- > WHERE key1=1 AND key2=2 AND key3=3; mysql > SELECT * FROM table1 USE INDEX (key1,key2)-> WHERE key1=1 AND key2=2 AND key3=3; mysql > SELECT * FROM table1 IGNORE INDEX (key3)-> WHERE key1=1 AND key2=2 AND key3=3

two。 Constraints for table joins

Add display conditions WHERE, ON, USING

1.WHERE clause mysql >

01.SELECT * FROM table1,table2 WHERE table1.id=table2.id; SELECT * FROM table1,table2 WHERE table1.id=table2.id

2. ON

Mysql >

01.SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id; 02. 03.SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id 04.LEFT JOIN table3 ON table2.id=table3.id; SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id;SELECT * FROM table1 LEFT JOIN table2 ON table1.id=table2.id LEFT JOIN table3 ON table2.id=table3.id

3. USING clause, you can use USING if the two columns of the join condition of two tables are joined with the same name

For example:

SELECT FROM LEFT JOIN USING ()

An example of joining more than two tables:

Mysql >

01.SELECT artists.Artist, cds.title, genres.genre 02. 03.FROM cds 04. 05.LEFT JOIN genres N cds.genreID = genres.genreID 06. 07.LEFT JOIN artists ON cds.artistID = artists.artistID; SELECT artists.Artist, cds.title, genres.genreFROM cdsLEFT JOIN genres N cds.genreID = genres.genreIDLEFT JOIN artists ON cds.artistID = artists.artistID

Or mysql >

01.SELECT artists.Artist, cds.title, genres.genre 02. 03.FROM cds 04. 05.LEFT JOIN genres ON cds.genreID = genres.genreID 06. 07. LEFT JOIN artists-> ON cds.artistID = artists.artistID 08. 09. WHERE (genres.genre = 'Pop'); SELECT artists.Artist, cds.title, genres.genre

FROM cds

LEFT JOIN genres ON cds.genreID = genres.genreID LEFT JOIN artists-> ON cds.artistID = artists.artistID WHERE (genres.genre = 'Pop')

In addition, it should be noted that when it comes to multi-table queries in MySQL, you need to figure out which join method is more efficient according to the situation of the query.

1. Cross-connect (Cartesian product) or internal connection [INNER | CROSS] JOIN

two。 Left outer connection LEFT [OUTER] JOIN or right outer connection RIGHT [OUTER] JOIN pay attention to specify the connection condition WHERE, ON,USING.

3. How MySQL optimizes LEFT JOIN and RIGHT JOIN

In MySQL, the A LEFT JOIN B join_condition execution process is as follows:

1) set up Table B based on all tables that Table An and A depend on.

2) set table A based on all tables used in the LEFT JOIN condition (except B).

3) the LEFT JOIN condition is used to determine how to search for rows from Table B. In other words, no conditions in the WHERE clause are used.

4) all standard joins can be optimized, with the exception of tables read from all tables on which it depends. If a circular dependency occurs, MySQL prompts for an error.

5) perform all standard WHERE optimizations.

6) if one row in A matches the WHERE clause, but no row in B matches the ON condition, another B row is generated, where all columns are set to NULL.

7) if you use LEFT JOIN to find rows that do not exist in some tables, and do the following test: col_name IS NULL in the WHERE section, where col_name is a column declared as NOT NULL, and MySQL finds a row that matches the LEFT JOIN condition and stops searching for other rows (for specific keyword combinations).

The execution of RIGHT JOIN is similar to LEFT JOIN, except that the role of the table is reversed.

The join optimizer calculates the order in which tables should be joined. The table read order enforced by LEFT JOIN and STRAIGHT_JOIN can help the join optimizer work faster because fewer table swaps are checked. Note that this indicates that if the following type of query is executed, MySQL performs a full scan b because LEFT JOIN forces it to read before d:

01.SELECT * 02.FROM a c.key=a.key b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) 03.WHERE b. Keyword d. Keyword; SELECT * FROM a r r b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key

In this case, the repair is in the reverse order of a, and b is listed in the FROM clause:

01.SELECT * 02.FROM BJI a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) 03.WHERE b.keyword d.key.SELECT * FROM BJI a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key

MySQL can do the following LEFT JOIN optimization: if the WHERE condition is always false for the resulting NULL row, the LEFT JOIN becomes a normal join.

For example, if t2.column1 is NULL,WHERE clause in the following query, it will be false:

01.SELECT * FROM T1 LEFT JOIN T2 ON (column1) WHERE t2.column2=5

SELECT * FROM T1 LEFT JOIN T2 ON (column1) WHERE T2 column 2 join 5; therefore, you can safely convert the query to a normal join:

01.SELECT * FROM T1, T2 WHERE t2.column2=5 AND t1.column1=t2.column1

SELECT * FROM T1, T2 WHERE t2.column2=5 AND T1, column1 column 1.column1 column 1; this is faster, because if you can make the query better, MySQL can use table T2 before table T1. To force the use of table order, use STRAIGHT_JOIN.

Third, use cache to realize

Now that community sharing sites are very popular, take Fangwei Shopping sharing website as an example. It is also a summary of the secondary development of Fang Wei Shopping sharing website, the master can fly by.

The key tables of shopping sharing are: sharing table, picture table, document table, comment table, label table, classification table and so on.

There are so many watches around sharing. Wow, that's a lot. When we view the details of a picture, we need to display the information in the table above. Show the category to which the picture belongs, the label to the picture, the comment on the picture, and the download information of the file if there is a file. Let us 6 tables to associate the query, of course, not so many associations to query data, we can query only one table, how to say? Here the sharing table is the main table, and we can create a cached field in the main table. For example, we call the cache_data field and give it the text type, so that we can store very long strings without exceeding the maximum storage of the field.

How do I use this cache field? After adding a sharing message, a sharing ID is generated. If the user publishes a picture or file, the picture information is entered into the picture table, the file information is entered into the file table, and then the newly generated picture or file information is written to the cache field. Similarly, if the user chooses to classify and tag, the corresponding information is also written to the cache field. For comments, it is not necessary to save all comments in the cache field, because you do not know how many records he has, and you can store the latest 10 in the cache field for display, so that the cache field becomes a two-dimensional or three-dimensional array and serialized and stored in the sharing table.

Array ('img' = array, url = >' http://tech.42xiu.com/123.jpg', width = > 800, width = > 600,), 'file' = array (name = >' abc.zip', download_url = > 'http: / / tech.42xiu.com/abc.zip', size = > 1.2Mb,),' category' = array (1 = > array (id = > 5) Name = > PHP Lezhi blog), 2 = > array (id = > 6, name = > PHP technology blog),), 'tag' = > array (tag1 tag2. ), 'message' = > array (1 = > array (id, uid, name, content, time), 2 = > array (id, uid, name, content, time), 3 = > array (id, uid, name, content, time), 4 = > array (id, uid, name, content, time),) / / for example, the above array structure, serialization is stored in the database.

UPDATE share SET cache_data=mysql_real_escape_string (serialize ($cache_data)) WHERE id=1; makes it easy to query just one item, take the cache field, deserialize it, extract the array information, and display it to the page. If it was the previous structure, it would have collapsed with hundreds of thousands of data. The method of data caching may not be the best, if you have a better way, you can learn from each other and discuss with each other.

The above are the details of analyzing and optimizing the efficiency of Mysql multi-table federation query, please pay more attention to other related articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.