2025-04-08 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
MySQL Database Learning Series 4
IV. Application Optimization of MySQL
4.1 MySQL Index Optimization and Design
What is an index?
The point of an index: quickly locate the data being searched for
Database index lookup
full table scan vs. index lookup
like finding an entry in a dictionary by its first letter
binary search
B+tree
InnoDB clustered index
A secondary index stores only the sorted field values and the primary-key ID
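The lookup path that a B+tree index provides can be sketched with a plain binary search over sorted (key, primary-key ID) pairs. This is an illustrative Python sketch, not MySQL internals; the `index_lookup` helper and the sample rows are invented for the demo.

```python
import bisect

# Sorted (name, id) pairs stand in for the entries of a secondary
# index: the indexed value plus the primary-key ID it points to.
index = [("alice", 7), ("bob", 3), ("carol", 12), ("dave", 5)]
keys = [k for k, _ in index]

def index_lookup(name):
    """Binary-search the sorted keys, like descending a B+tree."""
    pos = bisect.bisect_left(keys, name)
    if pos < len(keys) and keys[pos] == name:
        return index[pos][1]  # return the primary-key ID
    return None

print(index_lookup("carol"))  # 12
```

A full table scan would touch every row; the sorted structure finds the entry in O(log n) comparisons, which is why indexes matter at scale.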
Create an index
Single-column index
CREATE INDEX idx_test1 ON tb_student (name);
Composite index
CREATE INDEX idx_test2 ON tb_student (name, age);
Within the index, rows are sorted by name first; rows with the same name are then sorted by age
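The name-then-age ordering of a composite index behaves like Python's tuple sort; a small illustrative sketch (the sample rows are invented):

```python
# Rows ordered the way a composite index (name, age) stores them:
# by name first, then by age among rows with the same name.
rows = [("li", 30), ("li", 18), ("chen", 25), ("wang", 22)]
ordered = sorted(rows)  # tuple comparison mirrors (name, age) index order
print(ordered)
# [('chen', 25), ('li', 18), ('li', 30), ('wang', 22)]
```

This is why the index can answer `WHERE name=?` alone or `WHERE name=? AND age=?`, but not an `age`-only predicate: age is only sorted within each name group.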
Index maintenance
Index maintenance is done automatically by the database
Every insert/update/delete must also maintain each index row as part of the same internally encapsulated transaction
The more indexes, the longer the transaction and the higher its cost
The more indexes, the slower inserts and modifications of indexed fields become
Control the number of indexes on a table; do not add useless indexes indiscriminately
How to use an index
Build indexes based on WHERE query conditions
SELECT a, b FROM tab_a WHERE c=?  →  idx_c (c)
SELECT a, b FROM tab_a WHERE c=? AND d=?  →  idx_cd (c, d)
Add indexes on ORDER BY, GROUP BY, and DISTINCT fields
SELECT * FROM tb_a ORDER BY a;  SELECT a, COUNT(*) FROM tb_a GROUP BY a
→ idx_a (a)
SELECT * FROM tb_a ORDER BY a, b
→ idx_a_b (a, b)
SELECT * FROM tb_a WHERE c=? ORDER BY a
→ idx_c_a (c, a)
Index and field selectivity
Selectivity reflects how repetitive a field's values are
Fields with poor selectivity are usually not suitable for single-column indexes
O a gender column with a roughly even male/female ratio is not suitable for a single-column index
O if the male/female ratio is extremely skewed, an index can still help when querying the minority side (e.g. female students at a polytechnic college)
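Selectivity can be quantified as distinct values divided by total rows; a small Python sketch (the `selectivity` helper and the sample data are invented for illustration):

```python
def selectivity(values):
    """Distinct values / total rows: closer to 1 means more selective."""
    return len(set(values)) / len(values)

genders = ["M", "F"] * 500                  # two distinct values in 1000 rows
names = [f"user{i}" for i in range(1000)]   # all distinct

print(selectivity(genders))  # 0.002 — poor candidate for a single-column index
print(selectivity(names))    # 1.0   — good candidate
```

A predicate on a low-selectivity column still matches roughly half the table, so the index saves little IO; that is the intuition behind the gender example above.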
Put the more selective fields first in a composite index
SELECT * FROM tab_a WHERE gender=? AND name=?
→ idx_a1 (name, gender)
Composite indexes and prefix queries
A composite index can serve prefix queries on one or more of its leading columns
idx_smp (a, b, c): WHERE a=? ok; WHERE a=? AND b=? ok; WHERE a=? AND c=? (partially ok, only the a prefix is used)
Create composite indexes sensibly and avoid redundancy: keeping (a), (a, b), and (a, b, c) together is redundant X; (a, b, c) alone covers all three ok
Indexes on long fields
Indexes on very long fields hurt performance
An InnoDB single-field index (utf8) can only cover the first 767 bytes
How to handle long fields:
O email-type fields: prefix indexing
Mail_addr varchar (2048)
idx_mailadd (Mail_addr (30)) ok
O address-type fields: split the field
Home_address varchar (2048)
idx_homeadd (Home_address (30))? — the first 30 characters may all be the same province/city/district and street-name prefix
Province varchar (1024), City varchar (1024), District varchar (1024), Local_address varchar (1024)... — then create a composite index or single-column indexes
Covering index scan
Core SQL should consider index covering
SELECT Name FROM tb_user WHERE UserID=?
KEY idx_uid_name (UserID, Name)
The query does not need to go back to the table to fetch the Name field: minimal IO, highest efficiency
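The covering-index effect can be observed even in SQLite, whose query planner reports when an index alone answers a query; a minimal sketch using the table and index names from the example above (SQLite stands in for MySQL here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tb_user (UserID INTEGER, Name TEXT, Other TEXT)")
con.execute("CREATE INDEX idx_uid_name ON tb_user (UserID, Name)")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
# the detail string notes that the index covers the query.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM tb_user WHERE UserID = ?", (1,)
).fetchall()
print(plan[0][3])  # mentions "COVERING INDEX": no table lookup needed
```

In MySQL the analogous signal is `Using index` in the Extra column of EXPLAIN output.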
When an index cannot be used
Math or function operations on an index column:
WHERE id+1=10 X;  WHERE id=10-1 ok
WHERE YEAR(col) < 2007 X;  WHERE col < '2007-01-01' ok
Missing the leading prefix of a composite index — idx_abc (a, b, c): WHERE b=? AND c=? X; an index on (b, c) ok
Leading wildcards: LIKE '%xxx%' X;  LIKE 'xxx%' ok
WHERE conditions using NOT, <>, !=
Mismatched field types — not absolute, but it causes problems unpredictably, so avoid it: a int(11), idx_a (a): WHERE a = '123' X;  WHERE a = 123 ok
Using an index to sort — idx_a_b (a, b) can help sort these queries:
ORDER BY a;  WHERE a=3 ORDER BY b;  ORDER BY a, b;  ORDER BY a DESC, b DESC;  WHERE a>5 ORDER BY a
Queries that cannot use the index to help sort:
ORDER BY b
WHERE a>5 ORDER BY b
WHERE a IN (1,3) ORDER BY b;  ORDER BY a ASC, b DESC
How to determine whether a query uses an index, and which one?
EXPLAIN is the simplest and most effective way to see how a query uses indexes: EXPLAIN SELECT * FROM tb_test
Key fields in the output:
O type: how the query accesses the data
O key: which index the query finally chose; NULL means no index was used
O key_len: the prefix length (or full length) of the chosen index actually used
O rows: roughly the query's logical reads — the number of rows that must be scanned
O Extra: additional information, mainly describing how rows are fetched
4.2 MySQL Database Design
What is Schema Design?
Designing the tables, indexes, and relationships between tables in the database
O transform the relational model into database tables based on the data model
O optimize table structures for the database's and application's characteristics, while meeting the needs of the business model
Why Schema design matters
Schema is related to application functionality and performance
O meet business functional needs
O is closely related to performance
O Database scalability
O meet surrounding needs (statistics, migration, etc.)
Modifying a Schema in a relational database is often a high-risk operation
O Schema design should therefore be forward-looking
Schema design led entirely by developers
Focuses on implementing current features
A purely feature-driven design may hide risks:
O unreasonable table structures or index designs cause performance problems
O data growth is not properly assessed, so space runs out and maintenance becomes difficult
O frequently changing requirements cause frequent table-structure changes
O major business adjustments force frequent refactoring and data rework
Performance-oriented table design
Design indexes around query needs
Adjust the table structure appropriately for core queries
Adjust the implementation for special business requirements
Indexes
Use indexes correctly
Updates should use the primary key or a unique index whenever possible
Prefer an auto-increment ID field as the primary key
Make core queries covering-index scans
O user login needs to return the password by user name for authentication: CREATE INDEX idx_uname_passwd ON tb_user (username, password)
O the composite index avoids going back to the table to fetch data
Denormalize: keep necessary redundant fields
Keep the redundant fields that core SQL needs in its results, to avoid frequent joins
O example: the nickname field that every message read must return is duplicated into the message table, so reading messages does not become a join each time; the cost is that changing a user's nickname becomes more expensive
Split large fields
Split large fields into a separate table to avoid costly range scans
O example: split the blog table in two — a title table keeps only titles and content summaries, for quickly returning title lists in batches, while a body table stores the large blog content, read one row at a time when an article is opened
Avoid too many fields or overly long rows
Design fields around the SQL that reads them, and split the table if necessary to avoid too many fields
There is usually no need to fetch that many columns at once
Overly long rows mean fewer records per data page and worse range-scan performance
The cost of updating data also increases
A 16K page must still hold at least 2 rows, so row migration may occur
Paging query
Avoid large LIMIT + OFFSET values
Use the auto-increment primary-key ID to emulate paging instead
O page 1: query directly
O take max(id) = 123 from page 1 (usually the last record)
O page 2: query with id > 123: WHERE id > 123 LIMIT 100
O this way only 100 rows need to be scanned per page
Ask the business side to forbid querying beyond page XX
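The keyset-paging steps above can be sketched against SQLite (standing in for MySQL; the `msg` table and `page_after` helper are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO msg (body) VALUES (?)",
                [(f"m{i}",) for i in range(1, 1001)])

PAGE = 100

def page_after(last_id):
    # Seek past the last seen ID instead of OFFSET-scanning from row 1.
    return con.execute(
        "SELECT id, body FROM msg WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE)).fetchall()

first = page_after(0)              # page 1: query directly
second = page_after(first[-1][0])  # page 2 starts after max(id) of page 1
print(second[0])  # (101, 'm101')
```

With `LIMIT 100 OFFSET 900000` the engine must walk 900100 rows; the `id > ?` form scans only the 100 rows returned.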
Special handling of hot read data
Treat hot data specially, according to how often or how much of it is read
O example 1: pinned posts and announcements in a forum system can be split out and stored separately — every visit must read them and stitch them in anyway, so this avoids hunting for them in the main table each time
Special handling of hot write data
Treat hot data specially, according to how often or how much of it is written
O example 2: in the Weibo system, messages from hot accounts with many followers are switched from "push" to "pull" to avoid excessive insert operations
Quasi-real-time statistics
For counts that do not need exact results, maintain a periodically refreshed results table
O example: the home page must show the total transaction volume; maintain a counter table and update it every minute from the source table's registration timestamps, so that every user refresh does not scan the full transaction table
Real-time statistics improvement 1 — trigger-based real-time statistics
For counts that must be exact, maintain the counter table with database triggers
O example: a "race to 100 million users" campaign needs real-time statistics; add a trigger to the user table so that every new user insert adds +1 to the counter table
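The trigger-maintained counter can be sketched with SQLite (standing in for MySQL; table and trigger names are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_count (total INTEGER);
INSERT INTO user_count VALUES (0);
-- Every insert into tb_user bumps the counter inside the same statement.
CREATE TRIGGER trg_user_ins AFTER INSERT ON tb_user
BEGIN
    UPDATE user_count SET total = total + 1;
END;
""")
con.executemany("INSERT INTO tb_user (name) VALUES (?)",
                [("u%d" % i,) for i in range(5)])
total = con.execute("SELECT total FROM user_count").fetchone()[0]
print(total)  # 5
```

The count stays exact because the trigger fires in the same transaction as the insert; the trade-off is that every insert now also pays for the counter update.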
Real-time statistics improvement 2 — cache-based real-time statistics
For counts that must be exact, maintain the count in real time in a front-side cache
O example: the 100-million-user campaign needs real-time statistics; keep the registration count in the cache, incrementing by 1 on each registration, which avoids database reads and writes entirely. If the cache fails, it can be rebuilt from an overall count in the database
Real-time statistics improvement 3 — use the max auto-increment ID as the total
Much of this logic can simply use the maximum value of the auto-increment primary key as the total
O example: the 100-million-user campaign needs real-time statistics; with an auto-increment ID primary key on the user table, SELECT MAX(id) gives the total number of users
Designing for scalability
Extensibility
O keep serving the online business for as long as possible given limited hardware growth
Classify data; archive and purge cold data
O this continually frees space for new data
Prepare for data distribution
O sharding (splitting databases and tables)
O horizontal splits
O at the cost of some relational-model support
Partitioned tables and data expiry
Range partitioning
Suitable for large tables whose data must be expired periodically
Scanning a single partition when migrating data to the history database avoids the IO overhead of a full-table scan
Dropping a single partition is very efficient
Partitioned tables and vertical splits
List partitioning
Suitable when data may later be split vertically by region, category, etc.
Very efficient for cleaning out data no longer needed on a node
Partitioned tables and horizontal splits
Hash partitioning
Suitable for tables that will need horizontal splitting in the future
Very efficient for cleaning out data no longer needed on a node
Limitations of MySQL partitioned tables
The partition field must be part of the primary key or any unique key
The partition field must be an integer type, or be wrapped in a function that returns an integer
Meeting surrounding needs
Add extra table design for surrounding requirements
O add special indexes for background statistics tasks
O add timestamps for data-migration or statistics requirements
Statistics and background demands
Statistics SQL often behaves very differently from online queries
O exploit MySQL's one-master-many-slaves replication: master and slaves can carry different indexes, diverting statistics queries to a dedicated slave
O this includes batch queries from special users; any query that puts IO pressure on the online path should be split off via read/write separation
Automatically update timestamps
Statistics jobs often need to pull incremental data offline
If a table's first TIMESTAMP field is written without an explicit value, the system timestamp is filled in automatically
That first TIMESTAMP field is also updated automatically each time the record is updated
Index the update_time field to export incremental data on a regular schedule
Schema design and forward planning
Use historical experience and lessons to anticipate and avoid similar problems
Record, analyze, and summarize the causes of every Schema change that was painful for the DBA:
Example: a business had to heavily modify tables to encrypt user information
O large structural changes: adding encrypted fields, a validation-policy table, re-migrating data in all affected tables, and so on
O should every application that handles user information use ciphertext from the day it launches?
Example: a program bug deleted data by mistake, and online deletes are risky
O change the business process: stop physically deleting data, add an is_deleted flag bit — a change that keeps being retrofitted onto table after table
O should similar tables in the future carry the flag bit, plus a modification-reason field, from launch?
Example: a payment application had risk control retrofitted late
O the large online order table had to be altered to add quota, terminal type, and other fields
O for payment applications, should the business be reminded at launch to consider risk control and reserve the relevant fields?
4.3 MySQL Capacity Evaluation
Performance capacity evaluation
Analyze the online business scenario
Evaluate the performance metrics the database server must meet
Estimate which server resources may become bottlenecks
This also helps with database performance tuning
Database server hardware performance index
disk IO performance
memory capacity
CPU
network throughput
disk capacity
Database business characteristics keywords
OLTP/OLAP type
concurrent request
read-write ratio
data volume
hot and cold data ratio
Hierarchical storage of data
OLTP/OLAP
T = Transaction
Serves the mass of end users: high concurrency, short transactions
Most Internet applications are OLTP
OLTP stresses server CPU and memory; heavy write transactions, or insufficient memory, fall back on disk IO
A = Analytical
Usually serves internal users, with large, complex queries
OLAP stresses disk-scan IO capability and partly relies on memory for sorting
Concurrent requests — a measure of how busy the online business is
How many concurrent requests per second hit the database at business peak?
Judge from the number of application servers and their connection-pool configuration
Judge by estimating the initial user base at launch and its growth rate
Judge from the type of the actual business
concurrency related resources: CPU
Read-write ratio-describes how the application uses the database
Ratio of select read-only to update/delete/insert write operations for online services
delete/update usually reads before writing.
insert needs to distinguish between persistent insert and bulk imported data when data is written.
analysis based on the actual business scenario
multi-read scenario related resources: memory
multi-write scenario related resources: disk IO
Data volume — total
Bounded by the scalable capacity of the database server's storage devices
Estimate data growth from the projected business volume and write pattern
Estimate the total data stored over one hardware-upgrade cycle, and leave headroom at launch
Related resource for total data: disk capacity
Hot data and cold data — the useful data accessed in real time
Hot data: the most recent online data that is accessed repeatedly within some window
Cold data: data stored online that online users will not touch in the near term
Estimate the hot data volume from active-user counts and data growth
Memory should be as large as practical, to hold the live hot data
Related resource for hot data: memory
Hierarchical storage of online data — relieving online disk-space pressure
Keep the latest hot data in memory
Keep earlier data that may still be accessed on the online database's disks
Periodically migrate earlier data that is no longer routinely accessed to a history database
Identify which data is time-bounded enough to be migrated
Server resource selection — the options
Options per resource metric:
Disk IO performance: single disk -> disk array; SATA -> SAS; HDD -> SSD
Memory capacity: smaller -> larger
CPU: ordinary -> multi-core, hyper-threaded
Network throughput: gigabit -> 10 gigabit; single NIC -> multiple NICs
Disk capacity: single disk -> disk array; single disk -> LVM
Case 1: NetEase Cloud Music song-library database server evaluation
Stores tens of millions of songs online
Classified as an OLTP online database
Number of concurrent requests:
O 50 application servers, at most 100 connections each
O possible peak of 5000 QPS — a large number of concurrent requests
High demand for CPU
read-write ratio
O access is dominated by users listing playlists and querying song info during playback; user traffic is read-only queries
O writes happen in back-office operations when new songs are entered or song info is edited; the write share is small and is bulk-imported
O read/write ratio about 100:1
Total data volume:
O each song's info is estimated at 8K; 50 million songs, about 400G in total
O total data grows relatively slowly
Hot and cold data:
O about 40% of the 50 million songs may ever be accessed, and 10% are hot songs
O hot data is therefore about 40G
Moderate demands on disk IO capability
Network traffic: 8K * 2500 / 1024 ≈ 20MB/s — moderate
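The throughput figure can be checked with the arithmetic behind it; a quick sketch (the variable names are invented, and the 2500 requests/s input is the estimate the line above assumes):

```python
row_kb = 8        # estimated size of one song record, in KB
peak_qps = 2500   # read requests per second assumed for this estimate

mb_per_s = row_kb * peak_qps / 1024  # KB/s -> MB/s
print(round(mb_per_s, 1))  # 19.5, i.e. roughly 20 MB/s
```

At ~20 MB/s sustained, a single gigabit NIC (~125 MB/s theoretical) has ample headroom, matching the "moderate" verdict.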
Chosen resources:
Disk IO performance: two SAS disks in RAID 1
Memory capacity: 96 GB
CPU: 2 sockets x 8 cores, 32 threads with hyper-threading
Network throughput: dual gigabit NICs, bonded
Disk capacity: 900G
Case 2: NetEase wealth-management sales database server evaluation
Stores financial users' online orders
Classified as an OLTP online database
The business pattern has distinctive characteristics:
O specific high-interest products draw a flood of concurrent order writes inside the sales window
O the rest of the time there are only a few order queries and a lower rate of regular product purchases
The evaluation must be sized for the most critical business peak
Number of concurrent requests:
O the flash-sale window is short, but with 30 application servers concurrency is estimated around 2000 TPS
read-write ratio
O writing orders during the peak is the dominant cost
Demanding on CPU
Very demanding on disk IO
Total data volume:
O business analysis shows orders arrive in large bursts but the total volume is small, with a high value per transaction
O total volume is estimated at about one million orders a year, growing fairly steadily
O data storage requirement determined to be under 200G
Hot and cold data:
O peak writes dominate; memory must hold the dirty pages generated during the hot spot
Hierarchical storage requirements:
O the order page contractually shows the last half-year of orders; anything older requires a separate history-query page
O so hierarchical storage works: migrate all orders older than half a year to the history database
Memory requirement: moderate, >= 30G
Disk space requirement: moderate, >= 200G
High demand on disk IO capability
Higher demands on the network:
O high concurrent traffic
O fast response times required
Chosen resources:
Disk IO performance: two SSDs in RAID 1
Memory capacity: 64GB
CPU: 2 sockets x 8 cores, 32 threads with hyper-threading
Network throughput: dual 10-gigabit NICs, bonded
Disk capacity: 600G
4.4 MySQL Performance Testing
Why performance testing is needed
To form a quantified expectation of how the online product behaves
To reproduce online anomalies
To plan for future business growth
To test different hardware and software configurations
Classification of performance tests
Testing of device layer
Testing of Business layer
Testing of database tier
Testing of device layer
Metrics of interest:
O server and disk performance
O disk bad-block rate
O server lifespan
Business layer testing
Tests built around the business itself
Database layer testing
When should MySQL itself be tested?
O to compare different MySQL branches
O to compare different MySQL versions
O to compare different MySQL parameter combinations
MySQL test classification
CPU Bound
IO Bound
write-only test / update test / read-only test / mixed mode
Commonly used testing tools
Open-source MySQL performance testing tools:
O sysbench
O tpcc-mysql
O mysqlslap
Performance testing tools written in-house for the business:
O blogbench
Performance test metrics
service throughput (TPS, QPS)
service response time
service concurrency
Sysbench
A well-known performance testing tool in the industry
Can test disk, CPU, and databases
Supports multiple databases: Oracle, DB2, MySQL
Must be downloaded, compiled, and installed manually
Recommended version: sysbench 0.5
Compile and install Sysbench
Download sysbench:
O git clone https://github.com/akopytov/sysbench.git
Compile and install:
O ./autogen.sh
O ./configure
O make && make install
Sysbench process
The usual workflow:
Initialize data -> run test -> clean up data
Prepare syntax
sysbench --test=parallel_prepare.lua --oltp_tables_count=1 --rand-init=on --oltp-table-size=500000000 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-requests=0 prepare
Parameter meanings:
--test=parallel_prepare.lua: the script that imports the data
--oltp_tables_count: how many tables the test uses
--oltp-table-size: the size of each table
--mysql-host: MySQL host
--mysql-port: MySQL port
--mysql-db: MySQL database
--mysql-user: MySQL user
--mysql-password: MySQL password
--rand-init: whether to initialize the data randomly
--max-requests: stop after this many requests have been executed
Running prepare imports the data
Sysbench table structure
CREATE TABLE `sbtest1` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `k` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  `c` CHAR(120) NOT NULL DEFAULT '',
  `pad` CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=3000000001 DEFAULT CHARSET=utf8 MAX_ROWS=1000000
Run syntax
sysbench --test=oltp.lua --oltp_tables_count=1 --num-threads=100 --oltp-table-size=500000000 --oltp-read-only=off --report-interval=10 --rand-type=uniform --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-time=1000 --max-requests=0 run
Parameter meanings:
--test=oltp.lua: the lua script to run
--oltp_tables_count: how many tables the test uses
--oltp-table-size: the size of each table
--num-threads: the number of concurrent test threads
--oltp-read-only: whether this is a read-only test
--report-interval: interval between result outputs
--rand-type: data-distribution pattern — hot-spot or uniformly random data
--max-time: maximum elapsed time
--max-requests: stop after this many requests have been executed
Running run starts the test
Special cases
Write tests
Write the test data -> clean up the data
Cleanup
Either manually DROP the test tables and database
Or use the cleanup command provided by sysbench:
sysbench --test=parallel_prepare.lua --oltp_tables_count=1 --rand-init=on --oltp-table-size=500000000 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-requests=0 cleanup
Tpcc-mysql
TPC-C is a benchmark specification aimed at online transaction processing (OLTP) systems
tpcc-mysql is Percona's implementation of that specification
Download tpcc-mysql:
O bzr branch lp:~percona-dev/perconatools/tpcc-mysql
Compilation and installation
To use Tpcc-mysql
Create table structures and indexes -> import data -> run test -> clean up data
Create a table structure
create_table.sql
add_fkey_idx.sql
Tpcc-load
tpcc_load [server] [DB] [user] [pass] [warehouse]
Parameter meanings:
server: database IP
DB: database name
user: user name
pass: password
warehouse: number of warehouses
Tpcc-start
tpcc_start -h server_host -P port -d database_name -u mysql_user -p mysql_password -w warehouses -c connections -r warmup_time -l running_time -i report_interval -f report_file
Parameter meanings:
warehouses: number of warehouses
connections: number of concurrent threads
warmup_time: warm-up time
running_time: run time
report_interval: output interval
report_file: output file
Summary
For IO-bound tests the data set should be much larger than memory; for CPU-bound tests it should fit in memory
Run tests for at least 60 minutes to reduce error
sysbench leans toward measuring raw MySQL performance; TPC-C is closer to real business workloads
While the test program runs, monitor machine load and MySQL metrics at the same time