Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to learn MYSQL Advanced

2025-04-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces MYSQL advanced how to learn, the article is very detailed, has a certain reference value, interested friends must read it!

Article catalogue

1 several aspects that affect performance

1.1 hardware aspect

1.2 Server system

1.3 selection of database storage engine

1.4 Database parameter configuration

1.5 Database structure design and SQL statements (focus)

2 hardware aspect

2.2.1 using traditional machine hard drives

2.2.2 using RAID to enhance the performance of traditional machine hard drives

2.2.3 using solid-state storage SSD and PCI cards

2.2.4 using network storage NAS and SAN

2.2.2.2.1 RAID 0

2.2.2.2.2 RAID 1

2.2.2.2.3 RAID 5-the common RAID group

2.2.2.2.4 RAID 10-commonly used RAID group

2.2.2.1 what is RAID

2.2.2.2 RAID level

2.2.2.3 selection of RAID level

2.2.4.1 scenarios used by network storage

2.2.4.2 limitations on network performance

2.2.4.3 impact of Network on performanc

2.1.1 how to choose CPU

2.1.2 memory

2.1.2.1 Common MySQL storage engines

2.1.2.2 Tip

2.1.2.3 how to select memory

2.1 CPU resources and available memory size

2.2 configuration and selection of disks

2.3 Summary

3 influence of operating system on performance

3.1 Optimization of CentOS system parameters

4 impact of file system on performance

5 MySQL architecture

1 several aspects that affect performance

1.1 hardware aspect

Usually the personal computer is slow, we will say that it is because of the computer hardware problems, usually CPU, memory, disk IO and other factors, so this problem will also occur on the server.

1.2 Server system

In general, the operating system of personal computers is windows, different versions of windows systems have different performance, or some parameters are configured to lead to different performance. This is the same for the server system, the setting of parameters will also affect the performance of the server.

1.3 selection of database storage engine

MySQL has a plug-in storage engine, which can choose different storage engines according to different business needs.

Different storage engines have different characteristics:

MyISAM: transactions and table-level locks are not supported.

InnoDB: transaction-level storage engine, perfect support for row-level locks, transaction ACID features.

1.4 Database parameter configuration

For different storage engines, its parameter configurations are different, some parameters have little impact on the storage engine, but some parameters play a decisive role in performance. Therefore, it is also important to optimize the parameters according to the selected storage engine and different business requirements.

1.5 Database structure design and SQL statements (focus)

When we design the database structure, we should consider what kind of sql statements we should execute on the database in the future to query and update the table structure. Only in this way can we design a table structure that meets the requirements.

For slow query, it is the main culprit for poor performance, and it is due to our unreasonable design of database table structure. For this type of sql, it is also the most difficult to optimize, because once the project is online, it is difficult to modify the database table structure.

Therefore, our focus on optimizing database performance is:

Design of database table structure

The compilation and Optimization of SQL statement

Each aspect is described in detail below.

2 hardware aspect

2.1 CPU resources and available memory size

2.1.1 how to choose CPU

Usually when choosing CPU, we all want both the frequency of CPU and the number of cores to be as high as possible, but due to cost or various factors, we can only choose one of them. So how should we choose the best plan? Therefore, there are a few things we need to pay attention to when purchasing CPU:

Is our application CPU-intensive?

If our application is CPU-intensive, to speed up the processing of sql, we obviously need better CPU, not more CPU.

For the current MySQL, duoCPU is not supported for concurrent processing of the same SQL.

What is the concurrency of our system?

If our system needs more throughput, then the more CPU we have, the better. Suppose we have 40 CPU, can we handle 40 SQL at the same time?

A measure of database processing capacity: QPS, which refers to the number of SQL processed at the same time. But this metric is the number of SQL processed in 1s, but the simultaneous processing described in the previous point is in the nanosecond dimension.

MySQL is usually used in web applications, and the concurrency is often large, so the number of CPU is more important than the frequency of CPU.

The version of MySQL that we use

Before version 5.0, MySQL's support for multicore CPU was not good, and the restriction on the system was very serious. in the current version 5.6, version 5.7, the support for multicore CPU has been greatly improved. Therefore, the latest version of MySQL is recommended to achieve better performance.

Choose 32-bit or 64-bit CPU?

At present, the CPU of the server is 64-bit architecture by default, but it is important to check whether the system has installed a 32-bit server version on 64-bit, which will seriously affect server performance.

2.1.2 memory

The size of memory directly affects the performance of the database. At present, the efficiency of memory is much higher than that of disk. Therefore, caching data in memory can greatly improve server performance.

2.1.2.1 Common MySQL storage engines

There are two commonly used storage engines: MyISAM and InnoDB.

MyISAM:

The index is stored in memory and the data is stored in the hard disk.

InnoDB:

The index and data are stored in memory, thus improving the running efficiency of the database.

2.1.2.2 Tip

Although the more memory, the better, but the impact on system performance is limited.

If the data in our database has 100g, then the memory choice is about 128g to achieve the maximum performance. at this time, if all the data is hot data, then it will be cached in memory, and there is no need for 256GB of memory. However, if you choose larger memory, the performance of other services such as the operating system will be improved accordingly, and you do not have to consider upgrading memory in the short term.

For in-memory cache write operations, write can be deferred to reduce the pressure on the database.

Memory read operations have been well supported, and write operations can also be done in memory. We all need to write data to disk at the end, although we cannot avoid writing to disk. However, we can delay the write operation and merge multiple writes into one write, reducing the pressure on the database. The database provides a similar feature where multiple writes can be merged into one in the cache pool and eventually written to disk.

2.1.2.3 how to select memory

Try to use the memory of the motherboard to support the maximum frequency.

Component purchase upgrade, each channel memory as far as possible the same brand, particle, frequency, voltage, verification technology and model.

Select memory based on database size.

2.2 configuration and selection of disks

Although memory plays an important role in database performance, we can not ignore the impact of IO subsystem on performance. At present, there are four commonly used disk choices:

2.2.1 using traditional machine hard drives

Features: large storage space, low price, the most widely used, the most common, slow reading and writing

How to choose a traditional machine hard disk?

Storage capacity

Transmission speed

Access time

Spindle speed

Physical size

2.2.2 using RAID to enhance the performance of traditional machine hard drives

2.2.2.1 what is RAID

RAID is the abbreviation of disk redundancy queue (Redundant Arrays of Independent Disks). To put it simply, the function of RAID is to make multiple disks with smaller capacity into a group of disks with larger capacity, and to provide data redundancy to ensure data integrity.

2.2.2.2 RAID level

2.2.2.2.1 RAID 0

RAID 0 is the earliest RAID pattern, also known as data stripe. It is the simplest form of component disk array, it only needs more than 2 hard disks, the cost is low, and it can improve the performance and throughput of the whole disk. RAID 0 does not provide redundancy or error repair capabilities, but the implementation cost is the lowest. However, taking into account data recovery and reliability factors, RAID 0 becomes the most expensive configuration because there is no redundancy in RAID 0, and the probability of data corruption is higher than that on a disk. Because data damage in any disk will result in data loss. For example, RAID 0, which is made up of three disks, is three times more likely to be damaged than a single hard disk.

Therefore, RAID 0 is suitable for situations where there is no single loss of data, such as standby libraries that can be cloned from other databases at any time, or some databases that only need to be used at one time.

Simply put, RAID 0 is about concatenating hard drives together to form larger disks, such as:

And in the process of concurrency, it can achieve 3 times the performance of a single hard disk.

2.2.2.2.2 RAID 1

RAID 1, also known as disk mirroring, is based on mirroring the data from one disk to another, that is, when the data is written to another disk, an image file will be generated on another limited disk to maximize the reliability and repairability of the system without affecting performance.

It's different from RAID 0 with an equals sign in the middle. The data of the two disks are the same, with good redundancy, but the cost will increase accordingly, when there is a disk failure, it can also run normally, but even if the failed disk needs to be replaced, otherwise the system will crash.

After replacing the new disk, data synchronization takes a lot of time, although it will not affect data access, but the performance of the system will decline.

In many cases, RAID 1 provides good read performance and redundant data between different disks, so data redundancy is good. RAID 1 is better at reading than RAID 0, so it is more suitable for storing logs or similar work.

2.2.2.2.3 RAID 5-the common RAID group

RAID 5 is also called distributed parity disk array. Data is distributed over multiple disks by distributed parity blocks, so that if any of the disk data fails, it can be rebuilt from the parity block. However, if two disks fail, the data of the entire volume cannot be recovered.

It can be seen that there is a Dp,Cp,Bp,Ap in each disk, and if there is a problem with one of the disks, the data of the disk can be recalculated from the data and parity values of the other three disks.

For RAID 0 and RAID 1, this is the most economical redundant configuration because the entire array configuration requires only 1 disk capacity.

Writing on RAID 5 is slow because each write takes 2 reads and 2 writes between disks to calculate the value of stored parity bits, but random and sequential reads are fast because parity bits do not need to be calculated during reading, so RAID 5 is more suitable for read-based database business.

The biggest problem with RAID 5 is when the disk fails, because the data needs to be redistributed to other disks, which will seriously affect the performance of the disk, so using RAID 5 is best used in the case of rereading.

2.2.2.2.4 RAID 10-commonly used RAID group

RAID 10 is also known as the mirror image of a fragment. It is to do RAID 1 on the disk and then do RAID 0 on the two groups of RAID 1 disks, so it has good performance for reading and writing, and it is easier and faster to rebuild than RAID 5.

On RAID 10, if a hard disk is damaged, it has a serious impact on performance, because in the process of reading and writing, it can be read on two adjacent disks at the same time, but if one is damaged, it can only be read from a single disk, so in the worst case, our performance will be reduced by 50%.

2.2.2.3 whether the selection level of RAID level features redundant disk number read / write RAID 0 cheap, fast, dangerous N fast RAID 1 high speed read, simple, safe 2 fast and slow RAID 5 safe, cost compromise 1 fast depends on the slowest disk RAID 10 expensive, high speed, safe has 2N fast

2.2.3 using solid-state storage SSD and PCI cards

Solid-state storage is also called flash memory.

Features:

Compared with mechanical disk, solid state disk has better random read and write performance.

Solid state disks have better support for concurrency than mechanical disks.

Solid state disks are more likely to be damaged than mechanical disks

SSD features:

Using the SATA interface, you can replace traditional disks without any changes

SSD of SATA interface also supports RAID technology.

Solid-state storage PCIe card features:

Unable to use SATA interface, unique driver and configuration are required

The price is more expensive than SSD, but the performance is better than SSD.

Usage scenarios of solid-state storage

Suitable for scenarios with a large number of random Istroke O

Use Istroke O bottleneck to solve single-threaded load

2.2.4 using network storage NAS and SAN

SAN (Strorage Area Network) and NAS (Network-Attached Storage) are two methods of loading external file storage devices onto the server.

SAN:

The SAN device is connected to the server through optical fiber, the device is accessed through the block interface, and the server can use it as a hard disk.

Characteristics of SAN:

NAS:

NAS devices use a network connection and are accessed through a file-based protocol such as NFS or SMB.

2.2.4.1 scenarios used by network storage

Suitable for database backup.

2.2.4.2 limitations on network performance

The main limitations of network performance are delay and bandwidth.

2.2.4.3 impact of Network on performanc

The effect of Network bandwidth on performance

The influence of Network quality on performance

Recommendations:

Use high-performance and high-bandwidth network interface devices and switches

Bind multiple network cards to enhance availability and bandwidth

Isolate the network as much as possible

2.3 Summary

CPU:

64-bit CPU must work under 64-bit system

For scenarios with high concurrency, the number of CPU is more important than frequency.

For CPu-intensive scenarios and complex SQL, the higher the frequency, the better.

Memory:

Select the highest frequency of memory that can be used by the motherboard

The size of memory is important for performance, so be as large as possible

Ipaw O subsystem:

PCIe-> SSD-> RAID10-> disk-> SAN

3 influence of operating system on performance

Suitable operating system for MySQL: Windows,FreeBSD,Solaris,Linux

3.1 Optimization of CentOS system parameters

Kernel related parameters (/ etc/sysctl.conf)

Net.core.somaxconn = 65535

For ports in a listening state, each has its own listening queue, and this parameter determines the maximum length of the listening queue for each port. The default value for this parameter may be small, but not enough for large servers, and will generally be changed to a value of 2048 or greater.

Net.core.netdev_max_backlog=65535

Net.ipv4.tcp_max_syn_backlog=65535

The parameter backlog determines the maximum number of packets allowed to be sent to the queue when each network interface receives packets faster than the kernel processor, while the other parameter determines the maximum number of requests that can be kept in the queue for which these requests have not yet been connected. Connections that exceed this value may be discarded, so make them larger at the same time.

Net.ipv4.tcp_fin_timeout = 10

This parameter is used to control the timeout of the wait state for tcp connection processing. For systems with frequent connections, usually a large number of connections are waiting. The setting of this parameter is to reduce the connection timeout time and speed up the recovery speed of tcp. There are also two parameters that have an impact on tcp connections:

Net.ipv4.tcp_tw_reuse = 1, net.ipv4.tcp_tw_recycle = 1

These three parameters are mainly to speed up the recovery of tcp. In a high-load system, if the tcp connection is full, there will be an error of connecting to database 500. therefore, these three parameters play an important role.

Net.core.wmem_default = 87380, net.core.wmem_max = 16777216, net.core.r0mem_default = 87380, net.core.rmem_max = 16777216

The above four parameters determine the default and maximum values of the receive and send buffer sizes for tcp connections. For the database, you should adjust the values of these parameters to a little larger.

Net.ipv4.tcp_keepalive_time = 120, net.ipv4.tcp_keepalive_intvl = 30, net.ipv4.tcp_keepalive_probes = 3

The above three parameters are used to reduce the number of tcp system resources occupied by failed connections and speed up the efficiency of resource recovery. Net.ipv4.tcp_keepalive_time is the interval between tcp sending tcp_keepalive probe messages, in seconds, to confirm whether the tcp connection is valid. Net.ipv4.tcp_keepalive_intvl is used to resend the probe message (in seconds) when the tcp connection does not respond. Net.ipv4.tcp_keepalive_probes indicates how many tcp_keepalive probe messages need to be sent before the tcp connection is determined to be invalid. The default values of these three parameters are a little too large for a normal system, so they are changed to a little smaller here.

Kernel.shmmax = 4294967295

This parameter is one of the most important of the Linux kernel parameters and is used to define the maximum value of a single shared memory segment.

Note:

This parameter should be set large enough to accommodate the entire size of the Innodb buffer pool under one shared memory segment.

For a 64 Linux system, the maximum recommended value is physical memory-1 byte, and the recommended value is more than half of the physical memory segment. Generally, the value is larger than the size of the Innodb buffer pool. You can take the physical memory-1 byte.

Vm.swappiness = 0

This parameter will have a significant impact on performance when there is insufficient memory. This parameter tells the Linux system kernel not to use the swap area unless the virtual memory is completely full.

Linux system memory swap partition:

When a Linux system is installed, there is a special disk partition called the system swap partition. If we use free-m to look in the system, we can see something similar to the following, where swap is the swap partition. When the operating system does not have enough memory, it writes some virtual memory to the swap area of the disk so that memory swapping occurs.

Completely disabling swap partitions on the Linux system where the MySQL service resides poses the following two risks:

Degrade the performance of the operating system

Easy to cause memory overflow, crash, or be dropped by the operating system Kill

Increase resource limits (/ etc/security/limit.conf)

The limit.conf file is actually the configuration file of the plug-in authentication module Linx PAM.

One of the more important parameter configurations is the limit on the number of open files.

Conclusion: increase the number of files that can be opened to 65535 to ensure that enough file handles can be opened.

Note: changes to this file will take effect after the server is restarted.

Disk scheduling policy (/ sys/block/devname/queue/scheduler)

You can use the command cat / sys/block/sda/queue/scheduler to view the scheduling policy used by the current disk. The following noop anticipatory deadline [cfq] is the default cfq scheduling policy for the system.

Under MySQL database service, cfq is not appropriate because cfq inserts some unnecessary requests into the queue during MySQL work, resulting in poor response time.

In addition to the cfq scheduling policy, there are the following policies:

Noop (elevator scheduling strategy):

Deadline (deadline scheduling policy):

Anticipatory (expected Imax O scheduling policy):

We can change the disk scheduling policy by entering the following command:

Echo schedulerName > / sys/block/sda/queue/scheduler

For example: echo deadline > / sys/block/sda/queue/scheduler

4 impact of file system on performance

The XFS file system is recommended. The following parameters need to be configured under EXT3 and EXT4:

Mount parameters of the EXT3/4 system (/ etc/fstab):

Data=writeback | ordered | journal

There are three optional values for this parameter. Writeback means that only metadata is written to the log, and metadata writing and data writing are not synchronized. This is the fastest configuration, because InnoDB originally has its own transaction log, so it is usually the best choice for InnoDB. Ordered only records metadata, but provides some assurance of consistency. Data is written to keep it consistent before it is written. This option is slightly slower than writeback, but it is more secure in the event of a crash. Journal provides the behavior of the atomic log, which is recorded in the log before the data is written to the final log. This option is obviously not necessary for InnoDB and is the slowest of the three.

Noatime 、 nodiratime

These two options are used to record the access time of the file and the time the directory was read. Setting these two parameters can reduce some write operations. The system does not have to write to record the above two times when reading files and directories.

Here are some configurations in the file / dev/sda1/ext4:

Noatime,nodiratime,data=writeback 1 1

5 MySQL architecture

The top layer of the architecture is called the client. This layer represents the clients that can connect to the mysql through the mysql connection protocol, such as PHP,JAVA,C API,.Net and ODBC,JDBC. From this, we can see that this layer is not unique to the mysql architecture. Most of the services of the CS architecture adopt this architecture. This layer mainly completes some functions such as connection processing, authorization authentication and security. Each client connected to mysql has a thread in the server's process, and the query for this connection will only be executed in that thread, that is, as we mentioned earlier, the query for each connection uses only one core of CPU.

So the second layer of the system, most of the mysql core services are in this layer, as shown in the following figure.

Our commonly used DDL or DML statements are defined at this level. But we just have to remember that all the functions across the storage engine are implemented in this layer, because this layer is also called the service layer.

The third layer of our architecture is the storage engine layer. Mysql is a very excellent open source database, in which a series of storage engine interfaces are defined. As long as we meet the requirements of the storage engine, we can develop a storage engine for mysql that fully meets our needs, such as our commonly used InnoDB. At present, there are many storage engines supported by mysql, as shown below:

Note: the storage engine is for tables, not libraries (different tables in a library can use different storage engines)

The above is all the contents of the article "how to learn MYSQL Advanced". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report