What is the reason why SELECT * is not recommended in MySQL 07/11 Update SLTechnology News&Howtos

What is the reason why SELECT * is not recommended in MySQL

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

This article introduces the knowledge of "what is the reason why SELECT * is not recommended in MySQL". In the operation of actual cases, many people will encounter such a dilemma. Next, let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

"Don't use SELECT *" has almost become a golden rule used by MySQL, and even the Ali Java Development Manual makes it clear that you cannot use * as a list of fields to be queried, giving this rule an authoritative blessing.

However, I often use SELECT * directly in the development process for two reasons:

Because of its simplicity, the development efficiency is very high, and if fields are added or modified frequently later, the SQL statement does not need to be changed.

I think it's a bad habit to optimize prematurely, unless you can determine what field you actually need in the first place and build an appropriate index for it; otherwise, I choose to optimize SQL when you are in trouble, as long as the trouble is not fatal.

But we always need to know why it is not recommended to use SELECT * directly. This article gives four reasons.

1. Unnecessary disk Ibank O

We know that MySQL essentially stores user records on disk, so a query operation is an act of IO on disk (provided that the record to be queried is not cached in memory).

The more fields you query, the more you have to read, thus increasing the disk IO overhead. The effect is especially obvious when some fields are of type TEXT, MEDIUMTEXT, or BLOB.

Will using SELECT * make MySQL take up more memory?

Not in theory, because for the Server layer, instead of passing the complete result set to the client all at once after storing the complete result set in memory, but each row fetched from the storage engine is written to a memory space called net_buffer, which is controlled by the system variable net_buffer_length. The default is 16KB. When the net_buffer is full, the data is sent to the client in socket send buffer, the memory space of the local network stack. After it is successfully sent (the client reads it), the net_buffer is emptied, and then the next line is read and written.

That is, by default, the result set takes up no more memory than net_buffer_length, and does not take up extra memory just because there are a few more fields.

two。 Aggravate network delay

To continue with the previous point, although the data in socket send buffer is sent to the client every time, it seems that the amount of data is small at a time, but someone can really use * to find out the fields of TEXT, MEDIUMTEXT or BLOB. The total amount of data is large, which directly leads to more network transmissions.

This overhead is obvious if the MySQL and the application are not on the same machine. Even if the MySQL server and client are on the same machine and the protocol is still TCP, communication takes extra time.

3. Cannot use override index

In order to illustrate this problem, we need to build a table

CREATE TABLE `PRIMARY KEY innodb` (`id` int NOT NULL AUTO_INCREMENT, `name` varchar (255) DEFAULT NULL, `gender` tinyint (1) DEFAULT NULL, `phone` varchar (11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `IDX_NAME_ PHONE` (`name`, `phone`) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

We created a table user _ innodb with the storage engine InnoDB, set id as the primary key, created a joint index for name and phone, and randomly initialized 500W + pieces of data to the table.

InnoDB automatically creates a B + tree for the primary key id called primary key index (also known as clustered index). The most important feature of this B+ tree is that the leaf node contains the complete user record, which looks like this.

If we execute this statement,

SELECT * FROM user_innodb WHERE name = 'cicada bathing in the wind'

Use EXPLAIN to view the execution plan of the statement:

It is found that this SQL statement uses the IDX_NAME_PHONE index, which is a secondary index. The leaf node of the secondary index looks like this:

According to the search conditions, the InnoDB storage engine will find the record that name is a cicada in the leaf node of the secondary index, but only name, phone and primary key id fields are recorded in the secondary index (who told us to use SELECT *), so InnoDB needs to take the primary key id to look for this complete record in the primary key index, a process called returning the table.

Think about it, if the leaf node of the secondary index has all the data we want, will there be no need to return to the table? Yes, this is the overlay index.

For example, we just want to search for name, phone, and primary key fields.

SELECT id, name, phone FROM user_innodb WHERE name = "cicada bathing in the wind"

Use EXPLAIN to view the execution plan of the statement:

You can see that the Extra column shows Using index, which means that our query list and search criteria only contain columns belonging to a certain index, that is, the overlay index is used, which can directly abandon the back table operation and greatly improve the query efficiency.

4. May slow down JOIN connection query

We created two tables T1 and T2 to join to illustrate the next problem, and inserted 100 pieces of data into the T1 table and 1000 pieces of data into T2.

CREATE TABLE `t1` (`id` int NOT NULL, `m` int DEFAULT NULL, `n` int DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT;CREATE TABLE `t2` (`id` int NOT NULL, `m` int DEFAULT NULL, `n` int DEFAULT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT

If we execute the following statement

SELECT * FROM T1 STRAIGHT_JOIN T2 ON T1 = t2.m

Here I use the STRAIGHT_JOIN mandatory T1 table as the driven table and the T2 table as the driven table

For join queries, the driven table will only be accessed once, while the driven table will be accessed many times, depending on the number of records in the driven table that match the query records. Now that the driven and driven tables have been forcibly determined, let's talk about the nature of the join between the two tables:

T1, as the driver table, executes the query to the T1 table according to the filter conditions of the driver table. Because there is no filter condition, that is, to get all the data of the T1 table.

For each record in the result set obtained in the previous step, go to the driven table separately, and find the matching record according to the connection filter condition.

If expressed in pseudo code, the whole process goes like this:

/ / t1Res is for the filtered result set for (t1Row: t1Res) {/ / T2 is the complete driven table for (t2Row: T2) {if (satisfies the join condition & & meets the filtering condition of T2) {send to the client}

This method is the simplest, but also the worst performance, which is called nested loop join (Nested-LoopJoin,NLJ). How to speed up the connection?

One way is to create an index, preferably on the fields involved in the join condition of the driven table (T2). After all, the driven table needs to be queried many times. and access to the driven table is essentially a single-table query (because the T1 result set is set, the query condition for each connection T2 is also dead).

Now that indexes are used, in order to avoid repeating the mistake of not being able to use overridden indexes, we should try not to SELECT * directly, but instead use the fields actually used as query columns and build appropriate indexes for them.

But if we don't use indexes, does MySQL really join queries in the same way as nested loop queries? Of course not, after all, this kind of nested loop query is too slow!

Before MySQL8.0, MySQL provided block-based nested loop join (Block Nested-Loop Join,BLJ) method, and MySQL8.0 introduced hash join method, both of which were proposed to solve a problem, that is, to minimize the number of visits to driven tables.

Both methods use a fixed-size memory area called join buffer, in which several records in the result set of the drive table are stored (the difference between the two methods is that the form of storage is different). In this way, when the records of the driven table are loaded into memory, they are matched with the records in multiple drive tables in join buffer at one time, because the matching process is done in memory. So this can significantly reduce the cost of the driven table, and greatly reduce the cost of repeatedly loading the driven table from disk. The process of using join buffer is shown in the following figure:

Let's take a look at the execution plan of the join query above and find that hash join is indeed used (provided that no index is created on the join query field of the T2 table, otherwise the index will be used instead of join buffer).

In the best case, the join buffer is large enough to hold all the records in the result set of the driven table, so that you only need to access the driven table once to complete the join operation. We can configure it using the join_buffer_size system variable, and the default size is 256KB. If it doesn't fit, you have to put the result set of the driver table into join buffer in batches, and after comparing in memory, empty the join buffer and load the next batch of result sets until the join is complete.

Here comes the point! Not all the columns that drive the table records will be put into the join buffer, only the columns in the query list and the columns in the filter conditions will be put into the join buffer, so remind us again that it is best not to use * as the query list, just put the columns we care about into the query list, so that you can put more records in the join buffer, reduce the number of batches, and naturally reduce the number of visits to the driven table.

This is the end of the content of "Why SELECT * is not recommended in MySQL". Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.