Shulou (Shulou.com) — SLTechnology News & Howtos > Database — 05/31 report, updated 2025-01-17
This article covers practical optimization methods commonly used in MySQL.
When the number of records in a single MySQL table grows too large, the performance of inserts, deletes, updates, and queries degrades sharply. You can optimize by following the steps below:
Single table optimization
Unless the table's data will keep growing indefinitely, do not consider splitting at the outset: splitting brings all kinds of complexity in logic, deployment, and operations. As a rough guide, a table dominated by integer columns is fine below the tens-of-millions range, and a string-heavy table below about 5 million rows. In fact, a single MySQL table often still has plenty of room for optimization and can comfortably support more than 10 million rows:
Prefer TINYINT, SMALLINT, and MEDIUMINT over INT for integer columns, and add UNSIGNED for non-negative values
Allocate only the space a VARCHAR really needs
Use enumerations or integers instead of string types
Prefer TIMESTAMP over DATETIME
Do not put too many columns in a single table; fewer than 20 is recommended
Avoid NULL columns: they are hard to optimize for queries and consume extra index space
Store IP addresses as integers
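As a sketch of the last point, MySQL's built-in INET_ATON() and INET_NTOA() convert IPv4 addresses to and from an UNSIGNED integer, so a 4-byte column replaces a VARCHAR(15). The table and column names here are hypothetical:

```sql
-- Hypothetical table: store the IPv4 address in 4 bytes instead of VARCHAR(15)
CREATE TABLE access_log (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    client_ip INT UNSIGNED NOT NULL
);

INSERT INTO access_log (client_ip) VALUES (INET_ATON('192.168.1.10'));

-- Convert back when reading
SELECT INET_NTOA(client_ip) AS ip FROM access_log;
```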
More indexes are not always better. Create targeted indexes based on the queries; consider indexing the columns used in WHERE and ORDER BY clauses. Use EXPLAIN to check whether a query uses an index or scans the whole table.
Avoid testing a column for NULL in the WHERE clause, otherwise the engine abandons the index and does a full table scan
Columns with few distinct values are poor index candidates, such as "gender", which has only two or three values
Index character columns with a prefix index only
A character column is best not used as the primary key
Foreign keys are unnecessary; enforce the constraint in the application
Avoid UNIQUE where possible; enforce the constraint in the application
When using a multi-column index, keep the column order consistent with the query conditions, and drop single-column indexes it makes redundant
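A minimal sketch of the last point, against a hypothetical orders table: a composite index whose column order matches the query, checked with EXPLAIN, with the single-column index it makes redundant dropped (index and column names are illustrative):

```sql
-- Composite index matching the WHERE ... ORDER BY column order
CREATE INDEX idx_user_created ON orders (user_id, created_at);

-- Verify the index is used (check the "key" column of the output)
EXPLAIN SELECT id FROM orders
WHERE user_id = 42
ORDER BY created_at DESC;

-- The leftmost prefix (user_id) makes a separate single-column index redundant
DROP INDEX idx_user_id ON orders;
```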
Query SQL
Find slow statements by enabling the slow query log
Do not compute on columns: SELECT id FROM t WHERE age + 1 = 10 forces a full table scan, as does any operation on a column, including built-in functions and expressions. Move the computation to the constant side of the comparison where possible
Keep SQL statements as simple as possible: one statement can only run on one CPU; break large statements into smaller ones to reduce lock time; one huge statement can block the entire database
No SELECT *
Rewrite OR as IN: OR is O(n) while IN is O(log n); keep the number of IN values under about 200
Avoid stored functions and triggers; implement the logic in the application
Avoid LIKE '%xxx' queries: a leading wildcard prevents index use
Use less JOIN
Compare values of the same type, e.g. '123' with '123' and 123 with 123
Avoid the != or <> operators in the WHERE clause, otherwise the engine abandons the index and does a full table scan
For consecutive values, use BETWEEN instead of IN: SELECT id FROM t WHERE num BETWEEN 1 AND 5
Do not fetch whole tables for list views; page with LIMIT, and keep the page size moderate
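The rewrites above can be sketched against the article's example table t (its other columns are assumed):

```sql
-- A column operation defeats the index ...
SELECT id FROM t WHERE age + 1 = 10;
-- ... so move the arithmetic to the constant side
SELECT id FROM t WHERE age = 10 - 1;

-- OR rewritten as IN (keep the list under ~200 values)
SELECT id FROM t WHERE num IN (1, 3, 5);

-- BETWEEN for contiguous values
SELECT id FROM t WHERE num BETWEEN 1 AND 5;

-- Page list results instead of fetching the whole table
SELECT id FROM t ORDER BY id LIMIT 20 OFFSET 40;
```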
Engine
At present, MyISAM and InnoDB engines are widely used:
MyISAM
The MyISAM engine is the default engine for MySQL 5.1 and earlier, with the following features:
No row-level locks: reads take shared locks on all tables involved, and writes take exclusive table locks
No transactions
No foreign keys
No safe recovery after a crash
New records can be inserted into a table while read queries are running on it (concurrent insert)
Supports indexing the first 500 characters of BLOB and TEXT columns, and supports full-text indexes
Supports delayed index updates, which greatly improves write performance
Tables that will never be modified can be compressed, greatly reducing disk usage
InnoDB
InnoDB became the default engine as of MySQL 5.5. Its characteristics:
Supports row-level locks, with MVCC for high concurrency
Supports transactions
Supports foreign keys
Supports safe recovery after a crash
Does not support full-text indexes
Note: InnoDB gained full-text index support in MySQL 5.6.4.
Broadly speaking, MyISAM suits SELECT-heavy tables, while InnoDB suits INSERT- and UPDATE-heavy tables.
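To check a table's current engine and convert it, something like the following works (the table name is hypothetical; ALTER TABLE rebuilds the table, so schedule it carefully for large tables):

```sql
-- Inspect the engine of an existing table
SHOW TABLE STATUS LIKE 'orders';

-- Convert the table to the engine that matches its workload
ALTER TABLE orders ENGINE = InnoDB;
```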
System tuning parameters
You can use the following tools for benchmarking:
sysbench: a modular, cross-platform, multithreaded performance testing tool
iibench-mysql: a Java-based index insert performance testing tool for MySQL/Percona/MariaDB
tpcc-mysql: a TPC-C testing tool developed by Percona
There are many specific tuning parameters. For more information, please refer to the official documentation. Here are some important parameters:
back_log: how many connection requests can be queued before MySQL temporarily stops answering new ones. In other words, once connections reach max_connections, new requests wait in this backlog for a connection to free resources; requests beyond back_log are refused. Can be raised from the default 50 to 500
wait_timeout: idle connection timeout; idle connections consume memory. Can be lowered from the default 8 hours to half an hour
max_user_connections: maximum number of connections per user. The default 0 means unlimited; it is best to set a reasonable cap
thread_concurrency: number of concurrent threads; set to twice the number of CPU cores
skip_name_resolve: disables DNS resolution for client connections, eliminating lookup time, but all remote hosts must then connect by IP
key_buffer_size: size of the index block cache; increasing it speeds up index handling and has the biggest impact on MyISAM performance. On a machine with about 4GB of memory it can be set to 256M or 384M. Check with SHOW STATUS LIKE 'key_read%'; ideally key_reads / key_read_requests stays below 0.1%
innodb_buffer_pool_size: caches data and index pages and has the biggest impact on InnoDB performance. Check with SHOW STATUS LIKE 'Innodb_buffer_pool_read%' and keep (Innodb_buffer_pool_read_requests - Innodb_buffer_pool_reads) / Innodb_buffer_pool_read_requests as high as possible
innodb_additional_mem_pool_size: memory InnoDB uses for the data dictionary and some internal structures. With many database objects, raise it so everything fits in memory; when it is too small, MySQL writes warnings to the error log, which signals that it needs resizing
innodb_log_buffer_size: buffer for the InnoDB transaction log; generally not recommended to exceed 32MB
query_cache_size: caches result sets of SELECT statements. Any change to a table invalidates every cached result that references it, so when data changes frequently the Query Cache can cost more than it saves. Tune it by the hit rate (Qcache_hits / (Qcache_hits + Qcache_inserts) * 100). It should generally not be too large; 256MB is usually plenty, and mostly static datasets can go somewhat higher
Check current Query Cache usage with SHOW STATUS LIKE 'Qcache%'
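For example, the counters behind the hit-rate formula above can be inspected directly:

```sql
-- Inspect Query Cache counters
SHOW STATUS LIKE 'Qcache%';
-- Hit rate = Qcache_hits / (Qcache_hits + Qcache_inserts) * 100
```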
read_buffer_size: the buffer allocated per sequential table scan request. If a table is scanned sequentially and frequently, increasing this value can improve performance
sort_buffer_size: the buffer MySQL uses for sorting. To speed up ORDER BY, first see whether MySQL can use an index instead of an extra sort phase; if not, try increasing sort_buffer_size
read_rnd_buffer_size: the buffer for reading rows in non-sequential order (for example, in sort order); MySQL reads from this buffer to avoid disk seeks. Increase it when sorting large result sets, but it is allocated per client connection, so keep it moderate to avoid excessive memory use
record_buffer: each thread doing a sequential scan allocates a buffer of this size per scanned table. If you run many sequential scans, consider increasing it
thread_cache_size: keeps threads that are not attached to a connection ready to serve new connections, so a new connection can be served without creating a fresh thread
table_cache: analogous to thread_cache_size but caches table file handles; it matters mainly for MyISAM and has little effect on InnoDB
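Several of these variables are dynamic and can be adjusted at runtime with SET GLOBAL. The values below are the article's suggestions, not universal defaults; tune them for your workload:

```sql
-- Runtime tuning of dynamic variables (values from the suggestions above)
SET GLOBAL wait_timeout = 1800;          -- 8 hours -> 30 minutes
SET GLOBAL max_user_connections = 500;   -- cap instead of unlimited
SET GLOBAL key_buffer_size = 268435456;  -- 256M, for a ~4GB MyISAM-heavy host

-- Others, such as back_log, skip_name_resolve, and (on older versions)
-- innodb_buffer_pool_size, require editing my.cnf and restarting the server
```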
Upgrade hardware
Scaling up is the obvious option: depending on whether MySQL is CPU-bound or I/O-bound, upgrading the CPU and memory or switching to SSDs can improve performance significantly.
Read/write splitting
This is also a common optimization: writes go to the primary and reads go to replicas. Generally avoid dual-master or multi-master setups, which introduce a lot of complexity; prefer the other approaches in this article first. Many sharding solutions also incorporate read/write splitting.
Caching
Caching can occur at these levels:
MySQL internal: the relevant settings are introduced in the system tuning parameters
Data access layer: for example, MyBatis can cache SQL results, and Hibernate can cache down to individual records; the cached object here is mainly the persistent object (PO)
Application service layer: here you can control caching more precisely and implement richer strategies in code; the cached object is the data transfer object (DTO)
Web layer: caching for web pages
Browser client: client-side cache
You can join the cache at one or more levels according to the actual situation. This article focuses on the caching implementation of the service layer. Currently, there are two main ways:
Write Through: after data is written to the database, the cache is updated at the same time, keeping cache and database consistent. This is how most application caching frameworks, such as Spring Cache, work. It is simple and consistent, but only moderately efficient.
Write Back: writes update only the cache, which is then flushed to the database asynchronously in batches. This is more complex to implement, requires more application logic, and can leave the database and cache temporarily out of sync, but it is very efficient.
Table partitioning
The partition introduced by MySQL in version 5.1 is a simple horizontal split. Users need to add partition parameters when creating the table, which is transparent to the application without modifying the code.
To users, a partitioned table is a single logical table, but underneath it consists of multiple physical sub-tables. The partitioning code is essentially a wrapper over a set of underlying tables; to the SQL layer it is a black box that fully encapsulates them. This also means indexes are defined per sub-table; there is no global index.
SQL against a partitioned table needs to be tuned accordingly: include the partitioning column in the conditions so the query can be confined to a few partitions, otherwise every partition is scanned. Use EXPLAIN PARTITIONS to see which partitions a statement touches. For example, in the original article's figure, five records fall on two partitions.
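For instance, against the members table defined in the partitioning example below, EXPLAIN PARTITIONS lists only the partitions the query can touch:

```sql
-- The "partitions" column of the output lists only partitions covering 1985+
EXPLAIN PARTITIONS
SELECT username FROM members
WHERE joined >= '1985-01-01';
```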
The benefits of partitioning are:
You can make a single table store more data.
Partitioned data is easier to maintain: large amounts of data can be removed in bulk by clearing whole partitions, new data can be absorbed by adding partitions, and individual partitions can be optimized, checked, or repaired independently
Some queries can be confined to a few partitions based on their conditions, making them very fast
Partitioned data can also be spread across physical devices, making efficient use of multiple pieces of hardware
Partitioned tables can avoid certain special bottlenecks, such as mutually exclusive access on a single InnoDB index or inode lock contention on ext3 file systems
Individual partitions can be backed up and restored
Limitations and disadvantages of partitions:
A table can only have up to 1024 partitions
If the table has a primary key or unique index, the partitioning columns must be part of every such key
Partitioned tables cannot use foreign key constraints
NULL values defeat partition pruning
All partitions must use the same storage engine
Type of partition:
RANGE partition: assigns multiple rows to a partition based on column values that belong to a given contiguous interval
LIST partition: similar to partition by RANGE, except that LIST partition is selected based on column values matching a value in a set of discrete values
HASH partitioning: selects a partition by the return value of a user-defined expression over the column values of the row being inserted. The expression may be any valid MySQL expression that yields a non-negative integer
KEY partitioning: like HASH, except that it takes only one or more column names and the MySQL server supplies its own hashing function, so the columns are not required to be integers
Scenarios well suited to partitioning:
The best fit is data with a strong time dimension, which can be partitioned by time, as below:
CREATE TABLE members (
    firstname VARCHAR(25) NOT NULL,
    lastname VARCHAR(25) NOT NULL,
    username VARCHAR(16) NOT NULL,
    email VARCHAR(35),
    joined DATE NOT NULL
)
PARTITION BY RANGE (YEAR(joined)) (
    PARTITION p0 VALUES LESS THAN (1960),
    PARTITION p1 VALUES LESS THAN (1970),
    PARTITION p2 VALUES LESS THAN (1980),
    PARTITION p3 VALUES LESS THAN (1990),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);
Queries with a time-range condition become very efficient, and unneeded historical data can be deleted in bulk.
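Dropping historical data then becomes a near-instant metadata operation rather than a row-by-row DELETE, and new periods can be added by reorganizing the MAXVALUE partition:

```sql
-- Discard all pre-1960 members at once
ALTER TABLE members DROP PARTITION p0;

-- Split the catch-all partition to make room for a new decade
ALTER TABLE members REORGANIZE PARTITION p4 INTO (
    PARTITION p4 VALUES LESS THAN (2000),
    PARTITION p5 VALUES LESS THAN MAXVALUE
);
```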
If the data has obvious hot spots and the rest is rarely accessed, the hot data can go in its own partition so it can be cached in memory; queries then touch only that small partition, making effective use of indexes and cache.
MySQL also has an older, simple partitioning implementation, Merge tables, which is limited and poorly optimized; it is not recommended and should be replaced by the newer partitioning mechanism.
Vertical split
Vertical database splitting divides a database according to the relatedness of its tables: for example, if one database holds both user data and order data, vertical splitting puts the user data in a user database and the order data in an order database. Vertical table splitting divides a single table by columns, commonly separating a wide table's frequently used columns from its rarely used ones. The resulting tables hold the same number of rows but different columns, linked by the primary key.
For example, the original user table is split into Users and UserExtras (the original article illustrates the before and after with figures).
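A sketch of such a split, with all column names beyond the shared primary key hypothetical: frequently used columns stay in Users, rarely used ones move to UserExtras, linked by the same key.

```sql
-- Hot, frequently queried columns
CREATE TABLE Users (
    id       INT UNSIGNED NOT NULL PRIMARY KEY,
    username VARCHAR(16)  NOT NULL,
    email    VARCHAR(35)
);

-- Rarely accessed columns, joined back via the shared primary key
CREATE TABLE UserExtras (
    user_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    bio      TEXT,
    settings TEXT
);
```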
The advantages of vertical splitting are:
Row data becomes smaller, so one block holds more rows and queries need fewer block I/Os (each query reads fewer blocks)
Cache usage is maximized: when splitting vertically, group rarely changing columns together and frequently changing columns together
Data maintenance is simple
The disadvantages are:
The primary key is duplicated, and the redundant columns must be managed
JOINs may be required (extra CPU overhead); database load can be reduced by joining in the application server instead
There is still the problem of excessive amount of data in a single table (horizontal split is required)
Transaction processing is complex
Overview of horizontal split
Horizontal splitting distributes data across multiple tables or databases according to some strategy, so each slice lands on a different MySQL table or instance, achieving a distributed layout that can support very large data volumes. Table partitioning, discussed earlier, is essentially a special case of splitting tables within one database.
Splitting tables within one database only addresses the single-table size problem; because the data stays on one machine, it does little to relieve pressure on the MySQL server. The tables still compete for IO, CPU, and network on the same host, which is what database splitting is for.
If the previously vertically split user table is then split horizontally, the result is tables such as Users_A_M and Users_N_Z (illustrated by a figure in the original article).
In practice, vertical and horizontal splitting are often combined: Users_A_M and Users_N_Z are each further split into Users and UserExtras, giving four tables.
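Those four tables can be sketched as follows, assuming Users and UserExtras already exist as in the vertical split above:

```sql
-- Same structures, rows split by the first letter of the username
CREATE TABLE Users_A_M      LIKE Users;
CREATE TABLE Users_N_Z      LIKE Users;
CREATE TABLE UserExtras_A_M LIKE UserExtras;
CREATE TABLE UserExtras_N_Z LIKE UserExtras;
```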
The advantages of horizontal splitting are:
Removes the single-database performance bottlenecks of large data volumes and high concurrency.
Requires relatively little modification on the application side.
Improves system stability and load capacity.
The disadvantages are:
Transactional consistency across shards is hard to achieve.
Cross-node JOINs perform poorly and complicate the logic.
Repeated data expansion (resharding) is difficult and maintenance-heavy.
Sharding principles: do not shard if you can avoid it; where possible, optimize the single table instead, as described earlier.
Keep the number of shards a query touches as small as possible, and distribute shards evenly across data nodes: the more shards one SQL statement spans, the worse its overall performance. Expand capacity by increasing the number of shards only when it becomes necessary.
Choose sharding rules carefully and plan ahead: consider the sharding rule itself, data growth patterns, access patterns, shard affinity, and shard expansion. Common strategies such as range sharding, list (enumeration) sharding, and consistent hashing all ease later expansion.
Try to keep a transaction's SQL within a single shard; distributed transactions have always been hard to get right.
Optimize query conditions and avoid SELECT *: large result sets consume a lot of bandwidth and CPU. Avoid returning huge result sets, and index the frequently used query patterns.
Reduce the possibility of cross-library Join through data redundancy and table partitioning
The choice of sharding rule deserves special emphasis. If a table's data has a clear time dimension, such as orders or transaction records, time-range sharding usually fits: the data ages, we mostly care about recent rows, and query conditions usually filter on a time column. A good arrangement is to shard currently active data over short time spans while storing historical data over longer spans.
In general, choose the shard key from the conditions of the most frequent queries: a query whose WHERE clause does not touch the shard key must visit every shard and performs worst of all, so the more such queries there are, the bigger the impact on the system. Avoid them as much as possible.
Solution
Because of the complexity involved in horizontal splitting, many mature solutions exist. They fall into two main categories: client architectures and proxy architectures.
Client architecture
It works by modifying the data access layer, e.g. JDBC DataSource or MyBatis: multiple data sources are managed through configuration, the application connects to the databases directly, and sharding and result merging are done inside the module, usually shipped as a JAR.
An example of a client architecture (shown as a figure in the original article):
Here sharding runs inside the application server, implemented by modifying the Spring JDBC layer.
The advantages of the client architecture are:
The application connects to the database directly, reducing the downtime risk of depending on extra systems
Low integration cost, with no additional components to operate
The disadvantages are:
Limited to the database access layer, so extensibility is modest; it may fall short for more complex systems
The sharding logic runs on the application servers, adding load and risk there
Proxy architecture
Independent middleware manages all data sources, sharding, and result merging; the back-end database cluster is transparent to applications, but the proxy components must be deployed and operated separately.
An example of a proxy architecture (shown as a figure in the original article):
To spread load and avoid a single point of failure, proxy components usually run as a cluster and may need a coordination service such as ZooKeeper.
The advantages of the proxy architecture are:
It can handle very complex requirements, is not constrained by the original database access layer, and is highly extensible
It is transparent to the application servers and adds no load to them
The disadvantages are:
Independent proxy middleware must be deployed and operated, which is costly
Applications reach the database through the proxy, adding a network hop, some performance loss, and extra risk
Comparison of different schemes
(The original article compares the schemes in a table, omitted here.)
With so many options, how to choose? It can be considered according to the following ideas:
Decide between the proxy architecture and the client architecture: small or relatively simple systems tend toward the client architecture; complex scenarios or large-scale systems tend toward the proxy architecture.
Check whether the specific features you need are covered; if you need cross-node ORDER BY, for example, prioritize solutions that support it.
Rule out products not updated in the past year: stalled development means little maintenance or technical support.
Prefer, in this order: big company > community > small company > individual.
Choose products with a good reputation, judged by GitHub stars, the number and quality of users, and user feedback.
Prefer open source: projects often have special needs that can only be met by changing the source code.
Following these guidelines, the recommendations are:
Client architecture: ShardingJDBC
Proxy architecture: MyCat or Atlas
MySQL-compatible and horizontally scalable database
There are also some open source databases that are compatible with MySQL protocols, such as:
TiDB
Cubrid
However, their industrial maturity still lags MySQL's, and they demand significant operational investment. Migrating an existing MySQL deployment to a new horizontally scalable database should be evaluated carefully.
NoSQL
Sharding on MySQL is dancing in shackles. In fact, many large tables do not really need an RDBMS like MySQL or its ACID guarantees; consider migrating such tables to NoSQL to solve horizontal scaling outright, for example:
Logs, monitoring, and statistics
Unstructured or semi-structured data
Data with no strong transactional requirements and little need for joins