This article analyzes the key points of MySQL database optimization from an operations perspective. The content is concise and easy to follow; I hope you get something out of the detailed introduction below.
Overview
A mature database architecture is not designed with high availability, high scalability, and similar characteristics from day one; rather, the infrastructure is improved gradually as the number of users grows.
1. Database table design
After a project is approved, the development team builds it according to the product requirements, and part of a development engineer's job is designing the table structure. This matters a great deal for the database: a poorly designed schema directly hurts access speed and user experience. Contributing factors include slow queries, inefficient query statements, improper indexing, database blocking (deadlocks), and so on. A team with test engineers will run stress tests and hunt for bugs; on teams without them, most development engineers do not think much about whether the database design is reasonable, focusing instead on finishing features and delivering as quickly as possible. Once the project sees a certain amount of traffic, the hidden problems surface, and they are not so easy to fix at that point.
2. Database deployment
Now the operations engineer takes the stage. The project's initial traffic will not be very large, so a single-server deployment is enough to handle roughly 1500 QPS (queries per second). For high availability, you can use MySQL master/slave replication plus Keepalived for dual-node hot standby. Common HA cluster software includes Keepalived and Heartbeat.
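As a reference, here is a minimal keepalived.conf sketch for the VIP failover side; the interface name and VIP are illustrative, and the MySQL health-check script is omitted:

vrrp_instance VI_1 {
    state MASTER              # set to BACKUP on the standby node
    interface eth0            # illustrative NIC name
    virtual_router_id 51
    priority 100              # use a lower priority on the standby node
    advert_int 1
    virtual_ipaddress {
        192.168.1.100         # the VIP that applications connect to
    }
}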
3. Database performance optimization
If MySQL is deployed on a commodity x86 server, without any tuning it can normally handle about 2000 QPS; after tuning, that may rise to around 2500 QPS. Otherwise, once traffic reaches about 1500 concurrent connections, database processing slows down even though hardware resources are still plentiful, and it is time to look at the software side. So how do we get the most performance out of the database?
On the one hand, you can run multiple MySQL instances on a single server to make fuller use of its resources; a mysqld_multi sketch follows below.
On the other hand, tune the database configuration: the default settings of the operating system and the database are usually conservative and impose certain limits on the database. These settings can be adjusted appropriately so that the database handles as many connections as possible.
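For the multi-instance approach, a minimal mysqld_multi sketch in my.cnf; the ports, sockets, and data directories are illustrative:

[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin

[mysqld1]
port    = 3306
socket  = /var/lib/mysql1/mysql.sock
datadir = /var/lib/mysql1

[mysqld2]
port    = 3307
socket  = /var/lib/mysql2/mysql.sock
datadir = /var/lib/mysql2

# mysqld_multi start 1,2   # start both instances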
Specifically, optimization happens at the following three levels:
3.1 Database configuration optimization
MySQL has two commonly used storage engines. One is MyISAM, which does not support transactions, offers fast read performance, and uses table-level locking. The other is InnoDB, which supports transactions (ACID), is designed for maximum performance when processing large volumes of data, and uses row-level locking.
Table lock: low overhead, coarse locking granularity, high probability of lock contention, low concurrency.
Row lock: higher overhead, fine locking granularity, low probability of lock contention, high concurrency.
Why do table locks and row locks exist? Mainly to ensure data integrity. For example, if one user is operating on a table and other users want to operate on it too, they must wait for the first user to finish before they can proceed; table locks and row locks serve exactly this purpose. Otherwise, if multiple users operated on the same table simultaneously, the data would end up with conflicts or anomalies.
From the above, the InnoDB storage engine is the best choice, and it is the default storage engine as of MySQL 5.5. Many parameters are associated with each storage engine; the following lists the main ones that affect database performance (for reasons of space, only InnoDB is analyzed here).
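To check which engine a table currently uses and convert it, a minimal sketch (the table name is illustrative):

mysql > SHOW TABLE STATUS LIKE 'orders'\G        # the Engine field shows the current engine
mysql > ALTER TABLE orders ENGINE=InnoDB;        # convert the table, e.g. from MyISAM to InnoDB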
InnoDB parameters and default values:

innodb_buffer_pool_size = 128M  # index and data buffer size; generally set to about 60% of physical memory
innodb_buffer_pool_instances = 1  # number of buffer pool instances; 4 or 8 is recommended
innodb_flush_log_at_trx_commit = 1  # a key parameter. 0: write the log and sync it to disk once per second; if the database fails, about one second of transaction data is lost. 1: write the log and sync it to disk after every SQL statement, which costs the most and is inefficient, since every statement waits on the log flush. 2: write the log only to the system cache and sync it to disk once per second, which is very efficient, but transaction data is lost if the server fails. For data whose safety requirements are not very high, 2 is recommended: performance is high, and the effect of the change is obvious.
innodb_file_per_table = OFF  # shared tablespace by default; the shared-tablespace ibdata file keeps growing, which hurts I/O performance. Enabling independent tablespaces is recommended: each table's indexes and data then live in their own tablespace, so a single table can be moved between databases.
innodb_log_buffer_size = 8M  # log buffer size; since the log is flushed to disk at least once per second, it generally does not need to exceed 16M.
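These settings belong in the [mysqld] section of my.cnf; some can also be changed at runtime. A quick check sketch:

mysql > SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
mysql > SET GLOBAL innodb_flush_log_at_trx_commit = 2;   # dynamic, but lost on restart unless also set in my.cnf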
3.2 system kernel optimization
Most MySQL deployments run on Linux, so some operating system parameters also affect MySQL performance. Below are some appropriate tweaks to the Linux kernel.
net.ipv4.tcp_fin_timeout = 30  # TIME_WAIT timeout; the default is 60s
net.ipv4.tcp_tw_reuse = 1  # 1: enable reuse, allowing TIME_WAIT sockets to be reused for new TCP connections; 0: disable
net.ipv4.tcp_tw_recycle = 1  # 1: enable fast recycling of TIME_WAIT sockets; 0: disable
net.ipv4.tcp_max_tw_buckets = 4096  # maximum number of TIME_WAIT sockets the system keeps; beyond this, the system randomly clears some TIME_WAIT sockets and prints a warning
net.ipv4.tcp_max_syn_backlog = 4096  # maximum length of the SYN queue; increase it to accommodate more waiting connections
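Assuming the lines above are appended to /etc/sysctl.conf, they can be applied with:

# sysctl -p   # reload kernel parameters from /etc/sysctl.conf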
On Linux, if the number of file handles opened by a process exceeds the system default of 1024, a "too many open files" error is raised, so raise the open file handle limit (in /etc/security/limits.conf):
* soft nofile 65535
* hard nofile 65535
# ulimit -SHn 65535   # takes effect immediately in the current session
3.3 hardware configuration
Increase physical memory to improve file system performance. The Linux kernel allocates cache from memory (system cache and data cache) to hold hot data; through the file system's write-delay mechanism, the cache is synchronized to disk only when certain conditions are met (for example, when the cache reaches a certain percentage of its size, or when the sync command is executed). In other words, the larger the physical memory, the larger the allocated cache and the more data it can hold. Of course, a server failure will lose whatever cached data has not yet been flushed.
Replace SAS hard disks with SSDs, and adjust the RAID level to RAID 1+0, which has better read/write performance (IOPS) than RAID 1 and RAID 5. After all, the pressure on a database comes mainly from disk I/O.
4. Database schema extension
The main idea here is to decompose the load of a single database, break through the disk I/O bottleneck, store hot data in a cache, and reduce the frequency of disk I/O access.
4.1 Master-slave replication and read-write separation
Because most database operations in production are reads, deploy a one-master, multi-slave architecture: the master handles writes (with dual-node hot standby), while multiple slaves sit behind a load balancer and handle reads. Mainstream load balancers include LVS, HAProxy, and Nginx.
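A minimal sketch of attaching one slave; the host, credentials, and binlog coordinates are illustrative. On the master:

mysql > GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.%' IDENTIFIED BY 'replpass';
mysql > SHOW MASTER STATUS;   # note the File and Position values

On each slave:

mysql > CHANGE MASTER TO MASTER_HOST='192.168.1.10', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154;
mysql > START SLAVE;
mysql > SHOW SLAVE STATUS\G   # Slave_IO_Running and Slave_SQL_Running should both be Yes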
How is read/write separation achieved? Most companies implement it at the code level, which is more efficient. Another way is through a proxy program, which is rarely used in practice; common proxies include MySQL Proxy and Amoeba. With such a database cluster architecture, the database's capacity for high concurrency increases greatly and the single-node performance bottleneck is solved. If one slave can handle 2000 QPS, then five can handle 10,000 QPS (in theory), and the database scales out easily.
Sometimes, facing a write-heavy application, a single master's write performance cannot meet business requirements. If you set up dual masters, you run into data inconsistency: users in different applications may operate on both databases at once, and simultaneous updates cause conflicts or inconsistencies between the two. Within a single database, MySQL's storage engine uses table locks and row locks to guarantee data integrity, but how do you solve this across multiple masters? There is a Perl-based master-master replication management tool called MySQL-MMM (Master-Master replication manager for MySQL). Its great advantage is that it allows writes on only one database at a time, effectively ensuring data consistency.
4.2 Add a cache
Add a caching layer in front of the database to keep hot data in memory: if the requested data is in the cache, the result is returned from there instead of from the database, improving read performance. Caching comes in two forms, local and distributed. A local cache stores data in the local server's memory or files; a distributed cache can hold large amounts of data and scales well. The mainstream distributed caches are memcached and redis. Memcached offers stable performance, keeps data in memory, and is very fast, with QPS around 80k. If you need data persistence, choose redis; its performance is not lower than memcached's.
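A minimal cache-aside sketch using redis-cli; the key name, value, and TTL are illustrative:

# redis-cli GET user:1001                            # cache hit: serve the result straight from redis
# redis-cli SET user:1001 '{"name":"foo"}' EX 300    # cache miss: query MySQL, then cache the row for 300 seconds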
4.3 Sub-database
Sub-database means splitting related tables into different databases by business line, for example web, bbs, blog, and so on. If a business is very large, each split database can itself use a master-slave architecture, further relieving pressure on any single database.
4.4 Sub-table
As data grows day by day, a single table in the database may come to hold millions of rows, making queries and inserts take too long. How do we relieve the pressure on a single table? Consider splitting it into multiple smaller tables to reduce the load on each and improve processing efficiency; this is called sub-table.
Sub-table is relatively troublesome: you must modify the SQL statements in the application code and manually create the other tables. You can also use the MERGE storage engine to implement it, which is comparatively simple (a sketch follows below). After splitting, the program operates on a master table that stores no data itself; it only defines the relationships to the sub-tables and how data is routed on update. The master table spreads the load across the different small tables according to the query, improving concurrency and disk I/O performance.
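A minimal MERGE-engine sketch, assuming identically structured MyISAM sub-tables; the table and column names are illustrative:

mysql > CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, msg CHAR(20)) ENGINE=MyISAM;
mysql > CREATE TABLE t2 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, msg CHAR(20)) ENGINE=MyISAM;
mysql > CREATE TABLE t_all (id INT NOT NULL AUTO_INCREMENT, msg CHAR(20), INDEX(id)) ENGINE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;  # queries on t_all span both sub-tables; inserts go to t2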
Sub-table splitting comes in two forms, vertical and horizontal:
Vertical split: split the original table with many fields into multiple tables to solve the problem of table width. You can put less commonly used fields in a separate table, large fields in a separate table, or closely related fields in a table.
Horizontal split: split the original table into multiple tables, each table has the same structure, to solve the problem of large amount of data in a single table.
4.5 Partitioning
Partitioning divides a table's data into multiple blocks according to fields in the table structure (by range, list, hash, and so on). These blocks can sit on one disk or be spread across different disks. After partitioning, it still looks like a single table from the outside, but the data is distributed across multiple locations, so multiple disks can handle different requests at the same time, improving disk I/O read/write performance. Partitioning is relatively simple to implement.
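A minimal RANGE-partitioning sketch; the table and column names are illustrative:

mysql > CREATE TABLE access_log (id INT NOT NULL, created DATE NOT NULL)
        PARTITION BY RANGE (YEAR(created)) (
            PARTITION p2022 VALUES LESS THAN (2023),
            PARTITION p2023 VALUES LESS THAN (2024),
            PARTITION pmax  VALUES LESS THAN MAXVALUE );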
Note: adding a cache, sub-database, sub-table, and partitioning are mainly implemented by the application developers.
5. Database monitoring
Database monitoring and maintenance are the main work of operations engineers and DBAs, including performance monitoring, performance analysis, performance tuning, database backup and recovery, and so on.
5.1 performance status key indicators
QPS (Queries Per Second): the number of queries the database can handle per second.
TPS (Transactions Per Second): the number of transactions the database can handle per second.
If you check the running state with show status, there are more than 300 status records; a few of those values help us calculate QPS and TPS.
Calculate the QPS based on Questions:
mysql > show global status like 'Questions';
mysql > show global status like 'Uptime';
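That is, QPS = Questions / Uptime.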
Calculate the TPS based on Com_commit and Com_rollback:
mysql > show global status like 'Com_commit';
mysql > show global status like 'Com_rollback';
mysql > show global status like 'Uptime';
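That is, TPS = (Com_commit + Com_rollback) / Uptime.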
5.2 Enable the slow query log
Enable MySQL's slow query log to analyze which SQL statements are slow. Variables changed with set take effect at runtime but are lost when the service restarts; to make them permanent, add the parameters to my.cnf (see the sketch after the commands below).
mysql > set global slow_query_log=on;  # enable the slow query log
mysql > set global slow_query_log_file='/var/log/mysql/mysql-slow.log';  # specify the slow log file location
mysql > set global log_queries_not_using_indexes=on;  # also log queries that use no index
mysql > set global long_query_time=1;  # only log queries that take more than 1 second
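To survive restarts, the equivalent my.cnf entries, as a sketch (the file path is illustrative):

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
log_queries_not_using_indexes = 1
long_query_time = 1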
To analyze slow query logs, you can use the mysqldumpslow tool that comes with MySQL, and the log analysis is relatively simple.
# mysqldumpslow -t 3 /var/log/mysql/mysql-slow.log   # view the three slowest queries
5.3 Database backup
Backing up the database is the most basic and also the most important task; otherwise, the consequences can be severe. But because databases can be large, hundreds of gigabytes, backups are often very time-consuming, so choose an efficient backup strategy; for databases with large data volumes, incremental backup is generally used. Common backup tools include mysqldump, mysqlhotcopy, xtrabackup, and so on. mysqldump suits small databases: it is a logical backup, so backup and recovery take a long time. mysqlhotcopy and xtrabackup are physical backups with fast backup and recovery, and they copy hot data without affecting database service; xtrabackup is recommended, and it supports incremental backup.
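A minimal sketch, assuming Percona XtraBackup 2.4 or later; the paths and credentials are illustrative:

# xtrabackup --backup --user=root --password=xxx --target-dir=/data/backup/full   # full backup
# xtrabackup --backup --user=root --password=xxx --target-dir=/data/backup/inc1 --incremental-basedir=/data/backup/full   # incremental backup based on the full one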
5.4 Database repair
Sometimes a sudden power failure or an abnormal shutdown of the MySQL server corrupts tables and makes their data unreadable. In that case, you can use two tools that ship with MySQL to repair them: myisamchk and mysqlcheck.
myisamchk: can repair only MyISAM tables, and the database must be stopped first.
Common parameters:
-f --force: force repair, overwriting old temporary files; generally not used
-r --recover: recovery mode
-q --quick: quick recovery
-a --analyze: analyze the table
-o --safe-recover: old recovery mode; if -r cannot repair the table, try this option
-F --fast: check only tables that were not closed properly
mysqlcheck: works on both MyISAM and InnoDB tables, and the database does not need to be stopped. To repair a single table, for example, append the table name after the database name, separated by a space.
Common parameters:
-A --all-databases: check all databases
-r --repair: repair tables
-c --check: check tables (the default option)
-a --analyze: analyze tables
-o --optimize: optimize tables
-q --quick: the fastest check or repair
-F --fast: check only tables that were not closed properly
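Minimal usage sketches for both tools; the database, table, and path names are illustrative:

# myisamchk -r /var/lib/mysql/db_name/tbl_name.MYI   # repair a MyISAM table with mysqld stopped
# mysqlcheck -r -uroot -p db_name tbl_name           # repair a single table online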
Because of the original design limitations of relational databases, companies that put huge volumes of data into them cannot get good performance for massive-data query and analysis. This is why NoSQL has become popular: non-relational databases handle large data volumes with high performance and make up for some of the shortcomings of relational databases. Most companies have gradually moved part of their business data into NoSQL stores such as MongoDB and HBase, use distributed file systems such as HDFS and GFS for data storage, and use Hadoop, Spark, Storm, and so on to compute and analyze massive data.
These are the key points of MySQL database optimization from an operations perspective. I hope you have picked up some useful knowledge or skills.