A mature database architecture is not designed with high availability and high scalability from day one; those qualities are added gradually as the user base grows. This article looks at the problems a MySQL database faces over the development life cycle and the corresponding optimization schemes. Leaving aside the front-end application, the life cycle divides roughly into the following five stages:
Stage 1: database table design
Once the project is approved, the development department builds it according to the product department's requirements.
Development engineers design the table structure at the start of the project. For the database, table structure design is critical: a poor design directly slows users' access to the site and degrades the user experience. Many specific factors contribute, such as slow queries (inefficient SQL statements), improper indexing, database blocking (locks), and so on. Of course, the testing department has a team that tests the product and finds bugs.
Because development engineers have other priorities, they rarely consider at the initial stage whether the database design is reasonable, concentrating instead on completing and delivering functionality as quickly as possible. Once the project goes live and gains a certain volume of traffic, the hidden problems are exposed, and fixing them at that point is not so easy!
Stage 2: database deployment
At this point the operations engineer steps in to put the project online.
Traffic is usually minimal at the start of a project, and at this stage a single deployment of web server plus database is enough to handle a QPS (queries per second) of around 1,000. Considering single points of failure, the deployment should be highly available: MySQL master-slave replication plus Keepalived can provide dual-machine hot standby. Mainstream HA software: Keepalived (recommended), Heartbeat.
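As a minimal sketch of the replication half of that setup, run something like the following on the slave. The host, account, and log coordinates below are hypothetical; read the real file and position from SHOW MASTER STATUS on the master, which must already have binary logging enabled and a replication account:

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='192.168.1.10',          -- hypothetical master address
    ->   MASTER_USER='repl',                  -- hypothetical replication account
    ->   MASTER_PASSWORD='repl_pass',
    ->   MASTER_LOG_FILE='mysql-bin.000001',  -- from SHOW MASTER STATUS
    ->   MASTER_LOG_POS=154;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should both be Yes

Keepalived then floats a virtual IP between the two machines, so the application always connects to a single address.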
Stage 3: database performance optimization
If MySQL is deployed on an ordinary x86 server, it can normally handle about 1,500 QPS without any optimization; after optimization this may rise to around 2,000 QPS. Otherwise, once traffic reaches roughly 1,500 concurrent connections, database responses may slow down even while hardware resources are still plentiful, so it is time to consider performance optimization. How, then, do you get the most performance out of the database? The main angles are hardware configuration, database configuration, and architecture, broken down as follows:
3.1 hardware configuration
If conditions permit, definitely use SSD solid-state drives instead of SAS mechanical drives, and set the RAID level to RAID 1+0, which has better read and write performance than RAID 1 or RAID 5. After all, the pressure on a database comes mainly from disk I/O.
The Linux kernel carves cache areas (system cache and data cache) out of physical memory to hold hot data. Through the file system's write-delay mechanism, data is not synchronized to disk until certain conditions are met (for example, the cache reaches a certain percentage of its size, or the sync command is executed). In other words, the larger the physical memory, the larger the allocated cache and the more data cached. Of course, a server failure loses whatever cached data has not yet been flushed. It is recommended to provision physical memory with at least 50% headroom.
3.2 Database configuration optimization
MySQL has two widely used storage engines. One is MyISAM, which does not support transactions, has fast read performance, and uses table-level locking. The other is InnoDB, which supports transaction processing (the ACID properties) and is designed for large data volumes with row-level locking.
Table lock: low overhead, large locking granularity, high probability of deadlock, low concurrency.
Row lock: high overhead, small locking granularity, low probability of deadlock, relatively high concurrency.
Why do table locks and row locks exist at all? Mainly to ensure data integrity. For example, if one user is operating on a table and other users want to operate on the same data, they must wait until the first user finishes. Table locks and row locks serve exactly this purpose; otherwise, multiple users writing to the same table at once would produce conflicting or corrupted data.
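To make the row-lock behavior concrete, here is a sketch against a hypothetical InnoDB table `accounts` whose primary key is `id`, so InnoDB locks only the matching row:

-- session 1
mysql> BEGIN;
mysql> SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- locks the row id = 1

-- session 2, at the same time
mysql> UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- blocks until session 1 commits or rolls back; an UPDATE on id = 2 would not
-- block, since only the row id = 1 is locked

-- session 1
mysql> COMMIT;  -- releases the lock; session 2's UPDATE now proceeds

With a table-level engine such as MyISAM, any write would lock the whole table instead.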
On these grounds, the InnoDB storage engine is the best choice, and it is also the default storage engine in MySQL 5.5 and later. Each storage engine has many associated runtime parameters; the following lists the ones most likely to affect database performance.
Default values of common parameters:
max_connections = 151
# Maximum number of connections handled at the same time. It is recommended
# that actual peak connections stay at about 80% of this limit.
sort_buffer_size = 2M
# Sort buffer size; only applies to ORDER BY and GROUP BY. Can be raised to 16M.
open_files_limit = 1024
# Open-file limit. If the value shown by
# show global status like 'open_files' reaches or exceeds open_files_limit,
# programs will be unable to connect to the database, or will hang.
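A small runtime check against these limits (the status and variable names are standard; raising max_connections is shown only as an example of a dynamic change):

mysql> SHOW GLOBAL STATUS LIKE 'Open_files';       -- files currently open
mysql> SHOW VARIABLES LIKE 'open_files_limit';     -- the configured ceiling
mysql> SET GLOBAL max_connections = 1000;          -- dynamic parameters can be raised without a restart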
Default values of MyISAM parameters:

key_buffer_size = 16M
# Index cache size; generally set to 30-40% of physical memory.
read_buffer_size = 128K
# Read buffer size; 16M or 32M is recommended.
query_cache_type = ON
# Enable the query cache.
query_cache_limit = 1M
# Query cache limit: only results smaller than 1M are cached, so that one
# oversized result cannot overwrite the cache pool.
query_cache_size = 16M
# Query cache size, used to cache SELECT results; the next identical SELECT
# returns its result directly from the cache pool. Increase in multiples as
# appropriate.
Default values of InnoDB parameters:

innodb_buffer_pool_size = 128M
# Index and data buffer size; about 70% of physical memory is recommended.
innodb_buffer_pool_instances = 1
# Number of buffer pool instances; 4 or 8 is recommended.
innodb_flush_log_at_trx_commit = 1
# Key parameter. 0: write to the log and sync to disk once per second; if the
# database fails, about one second of transaction data is lost. 1: write to
# the log and sync to disk after every SQL statement; safest, but expensive,
# since every statement waits on log I/O. 2: write the log only to the system
# cache on commit and sync to disk once per second; very efficient, but
# transaction data may be lost if the server fails. If data-safety
# requirements are not very high, 2 is recommended: performance is high and
# the improvement after the change is obvious.
innodb_file_per_table = OFF
# Shared versus independent tablespace; 5.7+ defaults to ON. With a shared
# tablespace the ibdata file keeps growing, which hurts I/O performance.
# Independent tablespaces are recommended: each table's indexes and data live
# in their own tablespace, and a single table can be moved between databases.
innodb_log_buffer_size = 8M
# Log buffer size. Since the log is flushed at most once per second, this
# generally does not need to exceed 16M.
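One rough way to judge whether innodb_buffer_pool_size is large enough is the buffer pool hit ratio, computed from standard status counters (a sketch; the 0.99 threshold is a common rule of thumb, not an official figure):

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_read_requests = logical read requests
-- Innodb_buffer_pool_reads         = requests that missed the pool and hit disk
-- hit ratio = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
-- a ratio persistently below about 0.99 suggests the pool may be too small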
3.3 System kernel parameter optimization

Most MySQL instances are deployed on Linux, so some operating-system parameters also affect MySQL performance. Below are some appropriate optimizations of Linux kernel parameters:
net.ipv4.tcp_fin_timeout = 30
# TIME_WAIT timeout; the default is 60s.
net.ipv4.tcp_tw_reuse = 1
# 1: enable reuse, allowing TIME_WAIT sockets to be reused for new TCP
# connections; 0: disable.
net.ipv4.tcp_tw_recycle = 1
# 1: enable fast recycling of TIME_WAIT sockets; 0: disable.
net.ipv4.tcp_max_tw_buckets = 4096
# Maximum number of TIME_WAIT sockets the system keeps; beyond this the system
# randomly clears some of them and prints a warning.
net.ipv4.tcp_max_syn_backlog = 4096
# Maximum length of the SYN queue; increasing it accommodates more waiting
# connections.

On Linux, if the number of file handles opened by a process exceeds the system default of 1024, the "too many files open" message appears, so the open-file handle limit should be raised.

Permanently (takes effect after restart):
# vi /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
Immediately, for the current user session:
# ulimit -SHn 65535

Stage 4: database architecture expansion
With ever-growing business volume, a single database server can no longer meet business needs, and it is time to consider scaling out by adding servers. The main ideas are to spread the load of the single database, break through the disk I/O bottleneck, store hot data in a cache, and reduce the frequency of disk I/O access.
4.1 Add a cache
Add a cache system in front of the database so hot data is cached in memory; if requested data is already in the cache, the request never reaches MySQL, which reduces the database load. Cache implementations split into local caches and distributed caches: a local cache stores data in the local server's memory or files, while a distributed cache can hold large amounts of data and scales well. The mainstream distributed caches are memcached and redis. memcached offers stable performance, keeps data in memory, is very fast, and can theoretically reach around 80,000 QPS. If data persistence is needed, choose redis; its performance is not lower than memcached's.
Working process: the application queries the cache first; on a hit it returns the data directly, and on a miss it falls back to MySQL and writes the result into the cache for subsequent requests.
4.2 Master-slave replication and read-write separation
In a production environment, business systems usually read far more than they write. You can deploy a one-master, multiple-slave architecture: the master database handles write operations (with dual-machine hot standby), while multiple slave databases sit behind load balancing and handle read operations. Mainstream load balancers: LVS, HAProxy, Nginx.
How is read-write separation achieved? Most enterprises implement it at the code level, which is the most efficient approach. The alternative is to implement it through a proxy, which is less common in practice because it adds middleware overhead; mainstream proxy middleware includes MyCat, Atlas, and so on.
In such a MySQL master-slave replication topology, the load on any single server is spread out, and the database's concurrency capacity improves greatly. If one slave server can handle 1,500 QPS, then three can handle 4,500 QPS, and the tier is easy to scale horizontally.
Sometimes, for write-heavy applications, a single writer cannot meet the business requirements, and bidirectional replication (dual master) can be used. One problem needs attention: if both master servers accept reads and writes, data inconsistency can occur, because the application may operate on both databases at the same time, and simultaneous updates will conflict or leave the two databases inconsistent.
You can make each table's auto-increment ID field unique per server via auto_increment_increment and auto_increment_offset, or write your own algorithm to generate globally unique IDs.
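For example, with two masters the two parameters can be staggered so each server generates non-overlapping IDs (shown here with SET GLOBAL; in practice these settings usually go into each server's my.cnf):

-- on master 1: auto-increment IDs 1, 3, 5, ...
mysql> SET GLOBAL auto_increment_increment = 2;
mysql> SET GLOBAL auto_increment_offset = 1;

-- on master 2: auto-increment IDs 2, 4, 6, ...
mysql> SET GLOBAL auto_increment_increment = 2;
mysql> SET GLOBAL auto_increment_offset = 2;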
The officially released MGR (MySQL Group Replication, multi-master) cluster of the past couple of years is also worth considering.
4.3 Sub-database
Sub-database means splitting related tables out into separate databases by business, such as web, bbs, blog, and so on. If business volume is very large, each split-out database can itself use a master-slave replication architecture, further avoiding excessive pressure on any single database.
4.4 Sub-table
As data grows day by day, a single table in the database can end up holding millions of rows, making queries and inserts take too long. How do we relieve the pressure on a single table? Consider splitting it into multiple smaller tables, reducing the pressure on each and improving processing efficiency. This is called sub-table (table splitting).
Sub-table work is fairly intrusive: SQL statements in the program code must be modified, and the extra tables must be created by hand. Alternatively, the MERGE storage engine can implement table splitting relatively simply: the program operates on a parent table that stores no data itself and only defines the relationship to the sub-tables and how data is routed on writes. The parent table spreads the load across the different small tables according to the query, improving concurrency and disk I/O performance. A minimal sketch follows below.
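A minimal MERGE-engine sketch, assuming a hypothetical user table split in two (MERGE only works over MyISAM sub-tables with identical structure):

mysql> CREATE TABLE tbl_user1 (id INT NOT NULL, name VARCHAR(50), INDEX(id)) ENGINE=MyISAM;
mysql> CREATE TABLE tbl_user2 (id INT NOT NULL, name VARCHAR(50), INDEX(id)) ENGINE=MyISAM;
mysql> CREATE TABLE tbl_user (id INT NOT NULL, name VARCHAR(50), INDEX(id))
    ->   ENGINE=MERGE UNION=(tbl_user1, tbl_user2) INSERT_METHOD=LAST;
-- SELECTs on tbl_user read across both sub-tables;
-- INSERT_METHOD=LAST routes new rows to the last table in the UNION list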
Table splitting divides into vertical splitting and horizontal splitting:
Vertical split: split the original table with many fields into multiple tables to solve the problem of table width. You can put less commonly used fields in a separate table, large fields in a separate table, or closely related fields in a table.
Horizontal split: split the original table into multiple tables, each table has the same structure, to solve the problem of large amount of data in a single table.
4.5 Partitioning
Partitioning divides one table's data into multiple blocks according to fields in the table structure (by range, list, hash, and so on). These blocks can live on one disk or be spread across different disks. After partitioning, the data still looks like a single table on the surface, but it is physically stored in multiple locations, so multiple disks can serve different requests at the same time, improving disk I/O read-write performance.
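A minimal RANGE-partitioning sketch over a hypothetical logs table, split by year:

mysql> CREATE TABLE logs (
    ->   id INT NOT NULL,
    ->   created DATE NOT NULL
    -> )
    -> PARTITION BY RANGE (YEAR(created)) (
    ->   PARTITION p2023 VALUES LESS THAN (2024),
    ->   PARTITION p2024 VALUES LESS THAN (2025),
    ->   PARTITION pmax  VALUES LESS THAN MAXVALUE
    -> );
-- a query filtering on created, e.g. WHERE created >= '2024-01-01', is pruned
-- to the matching partitions instead of scanning the whole table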
Note: adding a cache, sub-databases, sub-tables, and partitions is mainly implemented by programmers or DBAs.
Stage 5: database maintenance
Database maintenance is the work of database engineers or operations engineers, covering system monitoring, performance analysis, performance tuning, database backup and recovery, and other core tasks.
5.1 Key performance status indicators
Terminology: QPS (Queries Per Second) and TPS (Transactions Per Second).
Checking the running state via show status returns more than 300 status records, several of which help us calculate QPS and TPS, as follows:
Uptime: number of seconds the server has actually been running
Questions: number of queries sent to the database
Com_select: number of SELECT operations
Com_insert: number of INSERT operations
Com_delete: number of DELETE operations
Com_update: number of UPDATE operations
Com_commit: number of transaction commits
Com_rollback: number of rollbacks
This gives a way to calculate QPS based on Questions:
mysql> show global status like 'Questions';
mysql> show global status like 'Uptime';

QPS = Questions / Uptime
Calculate the TPS based on Com_commit and Com_rollback:
mysql> show global status like 'Com_commit';
mysql> show global status like 'Com_rollback';
mysql> show global status like 'Uptime';

TPS = (Com_commit + Com_rollback) / Uptime
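As a quick worked example: if Questions is 864,000 with an Uptime of 86,400 seconds (one day), then QPS = 864000 / 86400 = 10; if over the same period Com_commit is 43,200 and Com_rollback is 0, then TPS = (43200 + 0) / 86400 = 0.5. Note that both figures are averages over the server's entire uptime, not the current load, which is why the interval-sampling method below is often preferred.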
Another method of calculation:
Calculate QPS based on Com_select, Com_insert, Com_delete, and Com_update:

mysql> show global status where Variable_name in ('com_select','com_insert','com_delete','com_update');

Wait one second and execute the statement again to get the interval difference; each variable's second value minus its first value, summed, is the QPS.
TPS calculation method:
mysql> show global status where Variable_name in ('com_insert','com_delete','com_update');

To calculate TPS, leave out the query (select) counter and use only the insert, delete, and update values, sampled the same way.
Based on community testing of these two methods: when the database has mostly MyISAM tables, the Questions-based calculation is more accurate; when it has mostly InnoDB tables, the Com_*-based calculation is more accurate.
5.2 Enable the slow query log
MySQL's slow query log helps analyze which SQL statements are slow, and it can be enabled dynamically:
mysql> set global slow_query_log=ON;  # enable the slow query log
mysql> set global slow_query_log_file='/var/log/mysql/mysql-slow.log';  # slow log file location
mysql> set global log_queries_not_using_indexes=ON;  # log queries that use no index
mysql> set global long_query_time=1;  # only log queries that take longer than 1 second

The log can be analyzed with the mysqldumpslow tool that ships with MySQL:

mysqldumpslow -t 3 /var/log/mysql/mysql-slow.log  # the three slowest queries

You can also use percona's pt-query-digest tool, whose log analysis is more comprehensive and covers the slow log, binlog, and general log.

Analyze the slow log:
pt-query-digest /var/log/mysql/mysql-slow.log
Analyze the binlog:
mysqlbinlog mysql-bin.000001 > mysql-bin.000001.sql
pt-query-digest --type=binlog mysql-bin.000001.sql
Analyze the general log:
pt-query-digest --type=genlog localhost.log

5.3 Database backup
Backing up the database is the most basic and also the most important task, otherwise the consequences are severe, as you can imagine! For a high-frequency backup strategy, choosing a stable and fast tool is essential. For databases under about 2GB, the official logical backup tool mysqldump is recommended; above 2GB, percona's physical backup tool xtrabackup is recommended, since logical backups at that size are as slow as a snail. Both tools support hot backup under the InnoDB storage engine, without affecting business reads and writes.
5.4 Database repair
Sometimes a sudden power outage or an abnormal shutdown of the MySQL server corrupts tables so their data cannot be read. MySQL ships with two tools to repair them, myisamchk and mysqlcheck. The former repairs only MyISAM tables and requires the database to be stopped; the latter works online and repairs both MyISAM and InnoDB tables.
Note: it is best to back up the database before repairing.
Common myisamchk parameters:
-f --force         force repair, overwriting old temporary files
-r --recover       recovery mode
-q --quick         quick recovery
-a --analyze       analyze the table
-o --safe-recover  old recovery mode; if -r cannot repair the table, try this
-F --fast          check only tables that were not closed properly
Example: myisamchk -r -q *.MYI

Common mysqlcheck parameters:
-A --all-databases check all databases
-r --repair        repair tables
-c --check         check tables (the default option)
-a --analyze       analyze tables
-o --optimize      optimize tables
-q --quick         fastest check or repair
-F --fast          check only tables that were not closed properly
Example: mysqlcheck -r -q -uroot -p123456 weibo

5.5 MySQL server performance analysis
Focus on the following indicators (as reported by tools such as top and iostat -x):
id: CPU utilization percentage; an average below 60% is normal, though the system is already fairly busy at that level.
wa: CPU time spent waiting for disk I/O; a value above 5 generally indicates heavy disk read-write traffic.
KB_read/s, KB_wrtn/s: kilobytes read and written per second, evaluated mainly against the disk's maximum read-write throughput.
r/s, w/s: read and write requests per second (IOPS), one of the main indicators of disk performance.
await: average I/O response time in milliseconds; a value above 5 generally means the disk is responding slowly and running beyond its capacity.
%util: disk utilization percentage; an average below 60% is normal, though the disk is already busy at that level.
Summary
Because of the inherent design limits of relational databases, they struggle with very large data sets. That is why NoSQL (non-relational) databases have become popular: they are distributed, high-performance, and highly reliable by design, make up for certain innate shortcomings of relational databases, and are well suited to storing unstructured data. Mainstream NoSQL databases include MongoDB, HBase, Cassandra, and so on.
Optimization at the database level alone brings limited gains; what matters most is choosing the right database for the business scenario!