2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
What does single-server MySQL database optimization look like in practice? Many newcomers find the topic daunting, so this article summarizes the common problems and their solutions. I hope it helps you work through them.
There is a great deal to say about database optimization. By data volume, it can be divided into two stages: single-server optimization and sharding (splitting databases and tables). A single server can generally handle up to about 5 million rows or 10 GB of data; beyond that, you need to consider sharding. In addition, interviews at large companies often start with single-server optimization and work step by step toward sharding, with many optimization questions interspersed along the way. This article describes some practices for single-server optimization, based on MySQL. If anything is unreasonable, please point it out.
1. Optimization of table structure
The table structure designed at the beginning of an application often determines performance later on, especially once the number of users grows. Table structure optimization is therefore a very important step.
1.1. Character set
Generally speaking, choose UTF-8. Although GBK uses less storage space than UTF-8 for Chinese text, UTF-8 is compatible with every language, so there is no need to sacrifice scalability to save that space. Converting from GBK to UTF-8 later is very expensive and requires data migration, whereas storage space can be solved simply by buying bigger disks.
1.2. Primary key
MySQL's InnoDB engine stores tables as B+ trees: the primary key serves as the clustered index, and the row data lives in the leaf nodes. A record can be located quickly by looking up its primary key, so every table should have one, preferably an auto-increment primary key. With an auto-increment key, new rows are appended to the leaf nodes of the B+ tree in primary-key order; because the order is sequential, an insert almost never has to move existing data, so insertion is very efficient. If the primary key is not auto-incrementing, each new key value is roughly random, and large amounts of data may have to be moved to maintain the B+ tree's properties, adding unnecessary overhead.
1.3. Fields
1.3.1. Indexed fields should be declared NOT NULL and given a default value.
1.3.2. Avoid float or double for decimals, to prevent precision loss; use decimal instead.
1.3.3. Avoid text/blob columns for large data. Reading and writing large text incurs significant overhead and occupies MySQL's caches, which greatly reduces throughput under high concurrency. Store large text in a dedicated file storage system and keep only the file's access address in MySQL. For example, a blog post can be saved as a file, with only the file's relative path stored in MySQL.
1.3.4. Keep varchar columns under 8K in length.
1.3.5. Prefer datetime over timestamp for time columns. Although datetime takes 8 bytes and timestamp only 4, timestamp must be non-null by default and is sensitive to the server's time zone.
1.3.6. Add gmt_create and gmt_modified columns to every table to record when each row was created and last modified. These two fields make it much easier to track down problems.
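A minimal sketch of a table that follows the conventions above (the table and column names are hypothetical; the DATETIME defaults assume MySQL 5.6.5 or later):

```sql
-- Hypothetical user table illustrating the field guidelines above
CREATE TABLE user (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,   -- auto-increment primary key (1.2)
  name         VARCHAR(64)   NOT NULL DEFAULT '',         -- NOT NULL with a default (1.3.1)
  balance      DECIMAL(12,2) NOT NULL DEFAULT 0.00,       -- decimal, not float/double (1.3.2)
  avatar_path  VARCHAR(255)  NOT NULL DEFAULT '',         -- file path instead of a blob (1.3.3)
  gmt_create   DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- creation time (1.3.5, 1.3.6)
  gmt_modified DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
               ON UPDATE CURRENT_TIMESTAMP,               -- last-modified time (1.3.6)
  PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```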
1.4. Index creation
1.4.1. At this stage you still know little about the actual query patterns, so do not add indexes blindly; create plain indexes only on fields that will certainly be queried.
1.4.2. A single-column InnoDB index should not exceed 767 bytes; if it does, only a prefix is indexed.
1.4.3. In an InnoDB composite index, each column should likewise not exceed 767 bytes, and the columns together should not exceed 3072 bytes.
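For example, with a utf8mb4 column (up to 4 bytes per character), 767 bytes fits at most 191 characters, so long string columns are usually indexed by prefix (table and column names here are hypothetical):

```sql
-- Index only the first 191 characters of a long utf8mb4 column
ALTER TABLE article ADD INDEX idx_title (title(191));

-- Composite index: each column under 767 bytes, all columns under 3072 bytes total
ALTER TABLE article ADD INDEX idx_author_title (author_id, title(191));
```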
2. SQL optimization
Generally speaking, SQL comes in only a few kinds: basic inserts, updates, and deletes; paged queries; range queries; fuzzy (LIKE) queries; and multi-table joins.
2.1. Basic queries
In general, a query needs an index. If there is none, consider rewriting the query to filter on an indexed field. If that is impossible for the business scenario, look at how often the query runs: if the volume is large, say 100,000 calls a day, a new index is worth adding; if it is small, say 100 calls a day, you can leave it as is. Also, use select * as little as possible: list only the fields the SQL actually needs, and do not fetch unnecessary columns, which wastes I/O and memory.
2.2. Efficient paging
limit m, n essentially scans m + n rows and then returns the n rows starting at row m, so the further back you page, the larger m becomes and the worse the performance. For example:

select * from A limit 100000, 10

This statement performs very poorly, so it is recommended to rewrite it as:

select id, name, age from A where id >= (select id from A order by id limit 100000, 1) limit 10
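An equivalent rewrite, sometimes called a deferred join, pages over the narrow primary-key index first and then joins back for the full rows (table and column names follow the example above):

```sql
-- Walk only the primary-key index to find the page, then fetch the full rows
SELECT a.id, a.name, a.age
FROM A a
JOIN (SELECT id FROM A ORDER BY id LIMIT 100000, 10) t ON a.id = t.id;
```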
2.3. Range queries
Range queries include between, greater-than, less-than, and in. The number of values in a MySQL in list is limited: with a small list the index is used, but with a very large list the query degenerates into a full table scan. As for between, greater-than, and less-than, a range condition stops the use of any columns that follow it in a composite index, so place range conditions after the equality conditions that use the index.
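A sketch of that column-order rule, with hypothetical names: equality conditions first, the range condition last.

```sql
-- Good: equality column first, range column last; both use idx_status_created
ALTER TABLE orders ADD INDEX idx_status_created (status, gmt_create);

SELECT id FROM orders
WHERE status = 1
  AND gmt_create > '2024-01-01';   -- range condition last, index still fully used

-- If the index were (gmt_create, status), the range on gmt_create would
-- prevent the status column of the index from being used.
```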
2.4. Fuzzy queries (like)
A condition such as like '%name%' cannot use an index and is equivalent to a full table scan. This is not a big problem with small data volumes, but performance drops sharply as data grows. Once the volume is large, it is recommended to replace such fuzzy searches with a search engine.
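Note that only a leading wildcard defeats the index; a prefix match can still use one (hypothetical names):

```sql
-- Cannot use an index: the leading wildcard forces a full table scan
SELECT id FROM user WHERE name LIKE '%john%';

-- Can use an index on name: the prefix is fixed
SELECT id FROM user WHERE name LIKE 'john%';
```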
2.5. Multi-table joins
Both subqueries and joins can fetch data across tables, but subqueries often perform poorly, so rewrite them as joins where possible. MySQL executes joins with the Nested Loop Join algorithm: it probes the second table once for each row in the first table's result set. For example, if the driving table yields 100 rows and the second table holds 100,000 rows, then without an index on the join column MySQL may examine on the order of 100 × 100,000 = 10,000,000 rows to produce the result. Therefore, let the table with the smaller result set drive the join against the larger table, and build an index on the join column. If you cannot build that index, configure a sufficiently large join_buffer_size. If none of these techniques solves the performance problem, drop the join and split it into two simple queries instead. Finally, avoid joining more than three tables; beyond that, performance is usually very poor, and the SQL should be split.
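A sketch of the recommended shape, with hypothetical names: the small, filtered result set drives the join, and the join column on the large table is indexed so each probe is an index lookup rather than a scan.

```sql
-- Index the join column on the large table
ALTER TABLE orders ADD INDEX idx_user_id (user_id);

-- The small filtered user set drives the join into the large orders table
SELECT u.id, u.name, o.id AS order_id
FROM user u
JOIN orders o ON o.user_id = u.id
WHERE u.city = 'Hangzhou';
```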
3. Database connection pool optimization
A database connection pool is essentially a cache of connections and a means of withstanding high concurrency. Optimizing it is mainly a matter of tuning parameters. Taking the commonly used DBCP pool as an example, the main parameters are as follows:
3.1. initialSize
The initial number of connections, created on the first getConnection call rather than at application startup. A reasonable initial value is the historical average concurrency.
3.2. minIdle
The minimum number of idle connections to retain. DBCP runs a background thread that reclaims idle connections, and it always keeps at least minIdle of them. This is commonly set to around 5; with very low concurrency it can be set to 1.
3.3. maxIdle
The maximum number of idle connections to retain; set it according to the business's peak concurrency. For example, if the peak is 20, these connections are not reclaimed immediately after the peak passes, so if another peak arrives shortly afterwards the pool can reuse the idle connections instead of repeatedly creating and closing them.
3.4. maxActive
The maximum number of active connections; set it to the highest concurrency the machine can tolerate. For example, if a single server can tolerate at most 100 concurrent requests, set maxActive to 100: the pool then serves at most 100 requests at a time, and excess requests are rejected after the maximum wait time. This value must be set, to protect the database against malicious concurrency spikes.
3.5. maxWait
The maximum time to wait for a connection. Keep it short, say 3 seconds, so that requests fail fast: a thread cannot be released while its request waits for a connection, and the number of threads on a single machine is limited. If the wait is set too long, such as the 60 seconds often recommended online, then every such request pins a thread for up to 60 seconds; with enough of them, few threads remain available to the application and the service becomes unavailable.
3.6. minEvictableIdleTimeMillis
How long a connection may stay idle before it is reclaimed; the default is 30 minutes.
3.7. validationQuery
The SQL statement used to check whether a connection is still valid; it is usually something trivial, such as select 1.
3.8. testOnBorrow
Validate the connection when it is borrowed from the pool. Not recommended: it seriously hurts performance.
3.9. testOnReturn
Validate the connection when it is returned to the pool. Also not recommended, for the same reason.
3.10. testWhileIdle
When enabled, the background eviction thread periodically validates idle connections and clears any that fail, without affecting request performance. Recommended.
3.11. numTestsPerEvictionRun
The number of connections checked on each eviction run. Setting it as large as maxActive lets every connection be checked in a single pass.
3.12. Warm up the connection pool
It is recommended to warm up the connection pool at application startup by running a few simple SQL queries before serving public traffic, so that the pool is already filled with the necessary number of connections.
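Pulling the parameters above together, a DBCP configuration might look like the following sketch. The concrete numbers are illustrative assumptions, not recommendations for every workload:

```properties
initialSize=10
minIdle=5
maxIdle=20
maxActive=100
maxWait=3000
minEvictableIdleTimeMillis=1800000
validationQuery=select 1
testOnBorrow=false
testOnReturn=false
testWhileIdle=true
numTestsPerEvictionRun=100
# the eviction thread must run for testWhileIdle to take effect
timeBetweenEvictionRunsMillis=60000
```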
4. Index optimization
When the data volume grows to a certain point, SQL tuning alone can no longer improve performance, and it is time for the big gun: indexes. Indexes can be applied at three levels, and mastering these three is generally sufficient. In addition, for any field you index, consider its selectivity.
4.1. First-level index
Build indexes on the fields in the where conditions: a plain index for a single column, a composite index for several. Composite indexes must respect the leftmost-prefix principle.
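The leftmost-prefix rule in a nutshell, with hypothetical names:

```sql
-- Composite index on (a, b, c)
ALTER TABLE t ADD INDEX idx_abc (a, b, c);

-- Use the index: each matches a leftmost prefix of (a, b, c)
SELECT id FROM t WHERE a = 1;
SELECT id FROM t WHERE a = 1 AND b = 2;
SELECT id FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- Cannot use the index: the leading column a is skipped
SELECT id FROM t WHERE b = 2 AND c = 3;
```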
4.2. Second-level index
If a field appears in order by or group by, consider indexing it. Because an index is inherently ordered, this avoids the sort that order by or group by would otherwise trigger, improving performance.
4.3. Third-level index
If the two tricks above are not enough, also index the fields being selected, forming what is called a covering index. This saves an I/O operation: normally MySQL looks up the secondary index, obtains the primary key, and then fetches the full record from the clustered index. If everything the query needs is already in the secondary index, the third step is unnecessary. Of course, indexing this way is extreme and not suitable for most scenarios.
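A sketch with hypothetical names (in InnoDB, secondary index entries already carry the primary key, so id comes for free):

```sql
-- Plain index: the query below must still visit the clustered index for name
ALTER TABLE user ADD INDEX idx_age (age);
-- SELECT id, name FROM user WHERE age = 30;

-- Covering index: every selected column lives inside the index itself,
-- so the query is answered entirely from idx_age_name
ALTER TABLE user ADD INDEX idx_age_name (age, name);
-- SELECT id, name FROM user WHERE age = 30;
```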
4.4. Index selectivity
Build indexes on highly selective fields wherever possible. High selectivity means that filtering on the field yields few rows. For example, looking up a person by name usually returns very little data, whereas filtering by gender may return half the table; name is therefore a highly selective field and gender a poorly selective one.
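Selectivity can be measured directly as the ratio of distinct values to total rows; the closer to 1, the better the index candidate (hypothetical names):

```sql
SELECT COUNT(DISTINCT name)   / COUNT(*) AS name_selectivity,    -- close to 1
       COUNT(DISTINCT gender) / COUNT(*) AS gender_selectivity   -- close to 0
FROM user;
```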
5. Historical data archiving
When data grows by about 5 million rows a year, indexes eventually become powerless, and the usual next step is sharding. But if the business is not growing explosively and the data is merely accumulating slowly, there is a simpler alternative: archive historical data instead of taking on the complexity of splitting databases and tables. Archive data whose life cycle has ended, such as rows older than six months. For example, a Quartz scheduled job can run in the early morning, select the rows older than six months, and move them to a remote HBase cluster. You should also provide a query interface for the archived data, just in case it is needed.
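The relational side of such a job boils down to a copy-then-delete pair. A sketch with hypothetical table names (the original moves data to HBase, but an archive table illustrates the same idea):

```sql
-- Copy rows older than six months into the archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH);

-- Then remove them from the hot table
-- (in production, run both steps in small batches, e.g. with LIMIT in a loop,
--  to avoid long-held locks and huge transactions)
DELETE FROM orders
WHERE gmt_create < DATE_SUB(NOW(), INTERVAL 6 MONTH);
```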
Having read the above, have you got a grasp of single-server MySQL optimization in practice? If you want to learn more, you are welcome to follow the industry information channel. Thank you for reading!
© 2024 shulou.com SLNews company. All rights reserved.