2025-04-08 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
MySQL Database Learning Series 4
IV. Application Optimization of MySQL
4.1 MySQL Index Optimization and Design
What is an index?
The point of an index: quickly locate the data being searched for
Database index lookup
full table scan vs. index lookup
like finding an entry in a dictionary by its first letter
binary search
B+tree
InnoDB clustered index
A secondary index stores only the sorted field values and the primary-key ID
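The lookup path that a B+tree index provides can be sketched with a plain binary search over sorted (key, primary-key ID) pairs. This is an illustrative Python sketch, not MySQL internals; the `index_lookup` helper and the sample rows are invented for the demo.

```python
import bisect

# Sorted (name, id) pairs stand in for the entries of a secondary
# index: the indexed value plus the primary-key ID it points to.
index = [("alice", 7), ("bob", 3), ("carol", 12), ("dave", 5)]
keys = [k for k, _ in index]

def index_lookup(name):
    """Binary-search the sorted keys, like descending a B+tree."""
    pos = bisect.bisect_left(keys, name)
    if pos < len(keys) and keys[pos] == name:
        return index[pos][1]  # return the primary-key ID
    return None

print(index_lookup("carol"))  # 12
```

A full table scan would touch every row; the sorted structure finds the entry in O(log n) comparisons, which is why indexes matter at scale.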
Create an index
Single-column index
CREATE INDEX idx_test1 ON tb_student (name);
Composite index
CREATE INDEX idx_test2 ON tb_student (name, age);
Within the index, rows are sorted by name first; rows with the same name are then sorted by age
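The name-then-age ordering of a composite index behaves like Python's tuple sort; a small illustrative sketch (the sample rows are invented):

```python
# Rows ordered the way a composite index (name, age) stores them:
# by name first, then by age among rows with the same name.
rows = [("li", 30), ("li", 18), ("chen", 25), ("wang", 22)]
ordered = sorted(rows)  # tuple comparison mirrors (name, age) index order
print(ordered)
# [('chen', 25), ('li', 18), ('li', 30), ('wang', 22)]
```

This is why the index can answer `WHERE name=?` alone or `WHERE name=? AND age=?`, but not an `age`-only predicate: age is only sorted within each name group.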
Index maintenance
Index maintenance is done automatically by the database
Every insert/update/delete must also maintain each index row as part of the same internally encapsulated transaction
The more indexes, the longer the transaction and the higher its cost
The more indexes, the slower inserts and modifications of indexed fields become
Control the number of indexes on a table; do not add useless indexes indiscriminately
How to use an index
Build indexes based on WHERE query conditions
SELECT a, b FROM tab_a WHERE c=?  →  idx_c (c)
SELECT a, b FROM tab_a WHERE c=? AND d=?  →  idx_cd (c, d)
Add indexes on ORDER BY, GROUP BY, and DISTINCT fields
SELECT * FROM tb_a ORDER BY a;  SELECT a, COUNT(*) FROM tb_a GROUP BY a
→ idx_a (a)
SELECT * FROM tb_a ORDER BY a, b
→ idx_a_b (a, b)
SELECT * FROM tb_a WHERE c=? ORDER BY a
→ idx_c_a (c, a)
Index and field selectivity
Selectivity reflects how repetitive a field's values are
Fields with poor selectivity are usually not suitable for single-column indexes
O a gender column with a roughly even male/female ratio is not suitable for a single-column index
O if the male/female ratio is extremely skewed, an index can still help when querying the minority side (e.g. female students at a polytechnic college)
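Selectivity can be quantified as distinct values divided by total rows; a small Python sketch (the `selectivity` helper and the sample data are invented for illustration):

```python
def selectivity(values):
    """Distinct values / total rows: closer to 1 means more selective."""
    return len(set(values)) / len(values)

genders = ["M", "F"] * 500                  # two distinct values in 1000 rows
names = [f"user{i}" for i in range(1000)]   # all distinct

print(selectivity(genders))  # 0.002 — poor candidate for a single-column index
print(selectivity(names))    # 1.0   — good candidate
```

A predicate on a low-selectivity column still matches roughly half the table, so the index saves little IO; that is the intuition behind the gender example above.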
Put the more selective fields first in a composite index
SELECT * FROM tab_a WHERE gender=? AND name=?
→ idx_a1 (name, gender)
Composite indexes and prefix queries
A composite index can serve prefix queries on one or more of its leading columns
idx_smp (a, b, c): WHERE a=? ok; WHERE a=? AND b=? ok; WHERE a=? AND c=? (partially ok, only the a prefix is used)
Create composite indexes sensibly and avoid redundancy: keeping (a), (a, b), and (a, b, c) together is redundant X; (a, b, c) alone covers all three ok
Indexes on long fields
Indexes on very long fields hurt performance
An InnoDB single-field index (utf8) can only cover the first 767 bytes
How to handle long fields:
O email-type fields: prefix indexing
Mail_addr varchar (2048)
idx_mailadd (Mail_addr (30)) ok
O address-type fields: split the field
Home_address varchar (2048)
idx_homeadd (Home_address (30))? — the first 30 characters may all be the same province/city/district and street-name prefix
Province varchar (1024), City varchar (1024), District varchar (1024), Local_address varchar (1024)... — then create a composite index or single-column indexes
Covering index scan
Core SQL should consider index covering
SELECT Name FROM tb_user WHERE UserID=?
KEY idx_uid_name (UserID, Name)
The query does not need to go back to the table to fetch the Name field: minimal IO, highest efficiency
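The covering-index effect can be observed even in SQLite, whose query planner reports when an index alone answers a query; a minimal sketch using the table and index names from the example above (SQLite stands in for MySQL here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tb_user (UserID INTEGER, Name TEXT, Other TEXT)")
con.execute("CREATE INDEX idx_uid_name ON tb_user (UserID, Name)")

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
# the detail string notes that the index covers the query.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM tb_user WHERE UserID = ?", (1,)
).fetchall()
print(plan[0][3])  # mentions "COVERING INDEX": no table lookup needed
```

In MySQL the analogous signal is `Using index` in the Extra column of EXPLAIN output.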
When an index cannot be used
Math or function operations on an index column:
WHERE id+1=10 X;  WHERE id=10-1 ok
WHERE YEAR(col) < 2007 X;  WHERE col < '2007-01-01' ok
Missing the leading prefix of a composite index — idx_abc (a, b, c): WHERE b=? AND c=? X; an index on (b, c) ok
Leading wildcards: LIKE '%xxx%' X;  LIKE 'xxx%' ok
WHERE conditions using NOT, <>, !=
Mismatched field types — not absolute, but it causes problems unpredictably, so avoid it: a int(11), idx_a (a): WHERE a = '123' X;  WHERE a = 123 ok
Using an index to sort — idx_a_b (a, b) can help sort these queries:
ORDER BY a;  WHERE a=3 ORDER BY b;  ORDER BY a, b;  ORDER BY a DESC, b DESC;  WHERE a>5 ORDER BY a
Queries that cannot use the index to help sort:
ORDER BY b
WHERE a>5 ORDER BY b
WHERE a IN (1,3) ORDER BY b;  ORDER BY a ASC, b DESC
How to determine whether a query uses an index, and which one?
EXPLAIN is the simplest and most effective way to see how a query uses indexes: EXPLAIN SELECT * FROM tb_test
Key fields in the output:
O type: how the query accesses the data
O key: which index the query finally chose; NULL means no index was used
O key_len: the prefix length (or full length) of the chosen index actually used
O rows: roughly the query's logical reads — the number of rows that must be scanned
O Extra: additional information, mainly describing how rows are fetched
4.2 MySQL Database Design
What is Schema Design?
Designing the tables, indexes, and relationships between tables in the database
O transform the relational model into database tables based on the data model
O optimize table structures for the database's and application's characteristics, while meeting the needs of the business model
Why Schema design matters
Schema is related to application functionality and performance
O meet business functional needs
O is closely related to performance
O Database scalability
O meet surrounding needs (statistics, migration, etc.)
Modifying a Schema in a relational database is often a high-risk operation
O Schema design should therefore be forward-looking
Schema design led entirely by developers
Focuses on implementing current features
A purely feature-driven design may hide risks:
O unreasonable table structures or index designs cause performance problems
O data growth is not properly assessed, so space runs out and maintenance becomes difficult
O frequently changing requirements cause frequent table-structure changes
O major business adjustments force frequent refactoring and data rework
Performance-oriented table design
Design indexes around query needs
Adjust the table structure appropriately for core queries
Adjust the implementation for special business requirements
Indexes
Use indexes correctly
Updates should use the primary key or a unique index whenever possible
Prefer an auto-increment ID field as the primary key
Make core queries covering-index scans
O user login needs to return the password by user name for authentication: CREATE INDEX idx_uname_passwd ON tb_user (username, password)
O the composite index avoids going back to the table to fetch data
Denormalize: keep necessary redundant fields
Keep the redundant fields that core SQL needs in its results, to avoid frequent joins
O example: the nickname field that every message read must return is duplicated into the message table, so reading messages does not become a join each time; the cost is that changing a user's nickname becomes more expensive
Split large fields
Split large fields into a separate table to avoid costly range scans
O example: split the blog table in two — a title table keeps only titles and content summaries, for quickly returning title lists in batches, while a body table stores the large blog content, read one row at a time when an article is opened
Avoid too many fields or overly long rows
Design fields around the SQL that reads them, and split the table if necessary to avoid too many fields
There is usually no need to fetch that many columns at once
Overly long rows mean fewer records per data page and worse range-scan performance
The cost of updating data also increases
A 16K page must still hold at least 2 rows, so row migration may occur
Paging query
Avoid large LIMIT + OFFSET values
Use the auto-increment primary-key ID to emulate paging instead
O page 1: query directly
O take max(id) = 123 from page 1 (usually the last record)
O page 2: query with id > 123: WHERE id > 123 LIMIT 100
O this way only 100 rows need to be scanned per page
Ask the business side to forbid querying beyond page XX
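The keyset-paging steps above can be sketched against SQLite (standing in for MySQL; the `msg` table and `page_after` helper are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE msg (id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO msg (body) VALUES (?)",
                [(f"m{i}",) for i in range(1, 1001)])

PAGE = 100

def page_after(last_id):
    # Seek past the last seen ID instead of OFFSET-scanning from row 1.
    return con.execute(
        "SELECT id, body FROM msg WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE)).fetchall()

first = page_after(0)              # page 1: query directly
second = page_after(first[-1][0])  # page 2 starts after max(id) of page 1
print(second[0])  # (101, 'm101')
```

With `LIMIT 100 OFFSET 900000` the engine must walk 900100 rows; the `id > ?` form scans only the 100 rows returned.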
Special handling of hot read data
Treat hot data specially, according to how often or how much of it is read
O example 1: pinned posts and announcements in a forum system can be split out and stored separately — every visit must read them and stitch them in anyway, so this avoids hunting for them in the main table each time
Special handling of hot write data
Treat hot data specially, according to how often or how much of it is written
O example 2: in the Weibo system, messages from hot accounts with many followers are switched from "push" to "pull" to avoid excessive insert operations
Quasi-real-time statistics
For counts that do not need exact results, maintain a periodically refreshed results table
O example: the home page must show the total transaction volume; maintain a counter table and update it every minute from the source table's registration timestamps, so that every user refresh does not scan the full transaction table
Real-time statistics improvement 1 — trigger-based real-time statistics
For counts that must be exact, maintain the counter table with database triggers
O example: a "race to 100 million users" campaign needs real-time statistics; add a trigger to the user table so that every new user insert adds +1 to the counter table
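The trigger-maintained counter can be sketched with SQLite (standing in for MySQL; table and trigger names are invented for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tb_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_count (total INTEGER);
INSERT INTO user_count VALUES (0);
-- Every insert into tb_user bumps the counter inside the same statement.
CREATE TRIGGER trg_user_ins AFTER INSERT ON tb_user
BEGIN
    UPDATE user_count SET total = total + 1;
END;
""")
con.executemany("INSERT INTO tb_user (name) VALUES (?)",
                [("u%d" % i,) for i in range(5)])
total = con.execute("SELECT total FROM user_count").fetchone()[0]
print(total)  # 5
```

The count stays exact because the trigger fires in the same transaction as the insert; the trade-off is that every insert now also pays for the counter update.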
Real-time statistics improvement 2 — cache-based real-time statistics
For counts that must be exact, maintain the count in real time in a front-side cache
O example: the 100-million-user campaign needs real-time statistics; keep the registration count in the cache, incrementing by 1 on each registration, which avoids database reads and writes entirely. If the cache fails, it can be rebuilt from an overall count in the database
Real-time statistics improvement 3 — use the max auto-increment ID as the total
Much of this logic can simply use the maximum value of the auto-increment primary key as the total
O example: the 100-million-user campaign needs real-time statistics; with an auto-increment ID primary key on the user table, SELECT MAX(id) gives the total number of users
Designing for scalability
Extensibility
O keep serving the online business for as long as possible given limited hardware growth
Classify data; archive and purge cold data
O this continually frees space for new data
Prepare for data distribution
O sharding (splitting databases and tables)
O horizontal splits
O at the cost of some relational-model support
Partitioned tables and data expiry
Range partitioning
Suitable for large tables whose data must be expired periodically
Scanning a single partition when migrating data to the history database avoids the IO overhead of a full-table scan
Dropping a single partition is very efficient
Partitioned tables and vertical splits
List partitioning
Suitable when data may later be split vertically by region, category, etc.
Very efficient for cleaning out data no longer needed on a node
Partitioned tables and horizontal splits
Hash partitioning
Suitable for tables that will need horizontal splitting in the future
Very efficient for cleaning out data no longer needed on a node
Limitations of MySQL partitioned tables
The partition field must be part of the primary key or any unique key
The partition field must be an integer type, or be wrapped in a function that returns an integer
Meeting surrounding needs
Add extra table design for surrounding requirements
O add special indexes for background statistics tasks
O add timestamps for data-migration or statistics requirements
Statistics and background demands
Statistics SQL often behaves very differently from online queries
O exploit MySQL's one-master-many-slaves replication: master and slaves can carry different indexes, diverting statistics queries to a dedicated slave
O this includes batch queries from special users; any query that puts IO pressure on the online path should be split off via read/write separation
Automatically update timestamps
Statistics jobs often need to pull incremental data offline
If a table's first TIMESTAMP field is written without an explicit value, the system timestamp is filled in automatically
That first TIMESTAMP field is also updated automatically each time the record is updated
Index the update_time field to export incremental data on a regular schedule
Schema design and forward planning
Use historical experience and lessons to anticipate and avoid similar problems
Record, analyze, and summarize the causes of every Schema change that was painful for the DBA:
Example: a business had to heavily modify tables to encrypt user information
O large structural changes: adding encrypted fields, a validation-policy table, re-migrating data in all affected tables, and so on
O should every application that handles user information use ciphertext from the day it launches?
Example: a program bug deleted data by mistake, and online deletes are risky
O change the business process: stop physically deleting data, add an is_deleted flag bit — a change that keeps being retrofitted onto table after table
O should similar tables in the future carry the flag bit, plus a modification-reason field, from launch?
Example: a payment application had risk control retrofitted late
O the large online order table had to be altered to add quota, terminal type, and other fields
O for payment applications, should the business be reminded at launch to consider risk control and reserve the relevant fields?
4.3 MySQL Capacity Evaluation
Performance capacity evaluation
Analyze the online business scenario
Evaluate the performance metrics the database server must meet
Estimate which server resources may become bottlenecks
This also helps with database performance tuning
Database server hardware performance index
disk IO performance
memory capacity
CPU
network throughput
disk capacity
Database business characteristics keywords
OLTP/OLAP type
concurrent request
read-write ratio
data volume
hot and cold data ratio
Hierarchical storage of data
OLTP/OLAP
T = Transaction
Serves the mass of end users: high concurrency, short transactions
Most Internet applications are OLTP
OLTP stresses server CPU and memory; heavy write transactions, or insufficient memory, fall back on disk IO
A = Analytical
Usually serves internal users, with large, complex queries
OLAP stresses disk-scan IO capability and partly relies on memory for sorting
Concurrent requests — a measure of how busy the online business is
How many concurrent requests per second hit the database at business peak?
Judge from the number of application servers and their connection-pool configuration
Judge by estimating the initial user base at launch and its growth rate
Judge from the type of the actual business
concurrency related resources: CPU
Read-write ratio-describes how the application uses the database
Ratio of select read-only to update/delete/insert write operations for online services
delete/update usually reads before writing.
insert needs to distinguish between persistent insert and bulk imported data when data is written.
analysis based on the actual business scenario
multi-read scenario related resources: memory
multi-write scenario related resources: disk IO
Data volume — total
Bounded by the scalable capacity of the database server's storage devices
Estimate data growth from the projected business volume and write pattern
Estimate the total data stored over one hardware-upgrade cycle, and leave headroom at launch
Related resource for total data: disk capacity
Hot data and cold data — the useful data accessed in real time
Hot data: the most recent online data that is accessed repeatedly within some window
Cold data: data stored online that online users will not touch in the near term
Estimate the hot data volume from active-user counts and data growth
Memory should be as large as practical, to hold the live hot data
Related resource for hot data: memory
Hierarchical storage of online data — relieving online disk-space pressure
Keep the latest hot data in memory
Keep earlier data that may still be accessed on the online database's disks
Periodically migrate earlier data that is no longer routinely accessed to a history database
Identify which data is time-bounded enough to be migrated
Server resource selection — the options
Options per resource metric:
Disk IO performance: single disk -> disk array; SATA -> SAS; HDD -> SSD
Memory capacity: smaller -> larger
CPU: ordinary -> multi-core, hyper-threaded
Network throughput: gigabit -> 10 gigabit; single NIC -> multiple NICs
Disk capacity: single disk -> disk array; single disk -> LVM
Case 1: NetEase Cloud Music song-library database server evaluation
Stores tens of millions of songs online
Classified as an OLTP online database
Number of concurrent requests:
O 50 application servers, at most 100 connections each
O possible peak of 5000 QPS — a large number of concurrent requests
High demand for CPU
read-write ratio
O access is dominated by users listing playlists and querying song info during playback; user traffic is read-only queries
O writes happen in back-office operations when new songs are entered or song info is edited; the write share is small and is bulk-imported
O read/write ratio about 100:1
Total data volume:
O each song's info is estimated at 8K; 50 million songs, about 400G in total
O total data grows relatively slowly
Hot and cold data:
O about 40% of the 50 million songs may ever be accessed, and 10% are hot songs
O hot data is therefore about 40G
Moderate demands on disk IO capability
Network traffic: 8K * 2500 / 1024 ≈ 20MB/s — moderate
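The throughput figure can be checked with the arithmetic behind it; a quick sketch (the variable names are invented, and the 2500 requests/s input is the estimate the line above assumes):

```python
row_kb = 8        # estimated size of one song record, in KB
peak_qps = 2500   # read requests per second assumed for this estimate

mb_per_s = row_kb * peak_qps / 1024  # KB/s -> MB/s
print(round(mb_per_s, 1))  # 19.5, i.e. roughly 20 MB/s
```

At ~20 MB/s sustained, a single gigabit NIC (~125 MB/s theoretical) has ample headroom, matching the "moderate" verdict.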
Chosen resources:
Disk IO performance: two SAS disks in RAID 1
Memory capacity: 96 GB
CPU: 2 sockets x 8 cores, 32 threads with hyper-threading
Network throughput: dual gigabit NICs, bonded
Disk capacity: 900G
Case 2: NetEase wealth-management sales database server evaluation
Stores financial users' online orders
Classified as an OLTP online database
The business pattern has distinctive characteristics:
O specific high-interest products draw a flood of concurrent order writes inside the sales window
O the rest of the time there are only a few order queries and a lower rate of regular product purchases
The evaluation must be sized for the most critical business peak
Number of concurrent requests:
O the flash-sale window is short, but with 30 application servers concurrency is estimated around 2000 TPS
read-write ratio
O writing orders during the peak is the dominant cost
Demanding on CPU
Very demanding on disk IO
Total data volume:
O business analysis shows orders arrive in large bursts but the total volume is small, with a high value per transaction
O total volume is estimated at about one million orders a year, growing fairly steadily
O data storage requirement determined to be under 200G
Hot and cold data:
O peak writes dominate; memory must hold the dirty pages generated during the hot spot
Hierarchical storage requirements:
O the order page contractually shows the last half-year of orders; anything older requires a separate history-query page
O so hierarchical storage works: migrate all orders older than half a year to the history database
Memory requirement: moderate, >= 30G
Disk space requirement: moderate, >= 200G
High demand on disk IO capability
Higher demands on the network:
O high concurrent traffic
O fast response times required
Chosen resources:
Disk IO performance: two SSDs in RAID 1
Memory capacity: 64GB
CPU: 2 sockets x 8 cores, 32 threads with hyper-threading
Network throughput: dual 10-gigabit NICs, bonded
Disk capacity: 600G
4.4 MySQL Performance Testing
Why performance testing is needed
To form a quantified expectation of how the online product behaves
To reproduce online anomalies
To plan for future business growth
To test different hardware and software configurations
Classification of performance tests
Testing of device layer
Testing of Business layer
Testing of database tier
Testing of device layer
Metrics of interest:
O server and disk performance
O disk bad-block rate
O server lifespan
Business layer testing
Tests built around the business itself
Database layer testing
When should MySQL itself be tested?
O to compare different MySQL branches
O to compare different MySQL versions
O to compare different MySQL parameter combinations
MySQL test classification
CPU Bound
IO Bound
write-only test / update test / read-only test / mixed mode
Commonly used testing tools
Open-source MySQL performance testing tools:
O sysbench
O tpcc-mysql
O mysqlslap
Performance testing tools written in-house for the business:
O blogbench
Performance test metrics
service throughput (TPS, QPS)
service response time
service concurrency
Sysbench
A well-known performance testing tool in the industry
Can test disk, CPU, and databases
Supports multiple databases: Oracle, DB2, MySQL
Must be downloaded, compiled, and installed manually
Recommended version: sysbench 0.5
Compile and install Sysbench
Download sysbench:
O git clone https://github.com/akopytov/sysbench.git
Compile and install:
O ./autogen.sh
O ./configure
O make && make install
Sysbench process
The usual workflow:
Initialize data -> run test -> clean up data
Prepare syntax
sysbench --test=parallel_prepare.lua --oltp_tables_count=1 --rand-init=on --oltp-table-size=500000000 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-requests=0 prepare
Parameter meanings:
--test=parallel_prepare.lua: the script that imports the data
--oltp_tables_count: how many tables the test uses
--oltp-table-size: the size of each table
--mysql-host: MySQL host
--mysql-port: MySQL port
--mysql-db: MySQL database
--mysql-user: MySQL user
--mysql-password: MySQL password
--rand-init: whether to initialize the data randomly
--max-requests: stop after this many requests have been executed
Running prepare imports the data
Sysbench table structure
CREATE TABLE `sbtest1` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `k` INT(10) UNSIGNED NOT NULL DEFAULT '0',
  `c` CHAR(120) NOT NULL DEFAULT '',
  `pad` CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=3000000001 DEFAULT CHARSET=utf8 MAX_ROWS=1000000
Run syntax
sysbench --test=oltp.lua --oltp_tables_count=1 --num-threads=100 --oltp-table-size=500000000 --oltp-read-only=off --report-interval=10 --rand-type=uniform --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-time=1000 --max-requests=0 run
Parameter meanings:
--test=oltp.lua: the lua script to run
--oltp_tables_count: how many tables the test uses
--oltp-table-size: the size of each table
--num-threads: the number of concurrent test threads
--oltp-read-only: whether this is a read-only test
--report-interval: interval between result outputs
--rand-type: data-distribution pattern — hot-spot or uniformly random data
--max-time: maximum elapsed time
--max-requests: stop after this many requests have been executed
Running run starts the test
Special cases
Write tests
Write the test data -> clean up the data
Cleanup
Either manually DROP the test tables and database
Or use the cleanup command provided by sysbench:
sysbench --test=parallel_prepare.lua --oltp_tables_count=1 --rand-init=on --oltp-table-size=500000000 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=sys --mysql-password=netease --mysql-db=sbtest --max-requests=0 cleanup
Tpcc-mysql
TPC-C is a benchmark specification aimed at online transaction processing (OLTP) systems
tpcc-mysql is Percona's implementation of that specification
Download tpcc-mysql:
O bzr branch lp:~percona-dev/perconatools/tpcc-mysql
Compilation and installation
To use Tpcc-mysql
Create table structures and indexes -> import data -> run test -> clean up data
Create a table structure
create_table.sql
add_fkey_idx.sql
Tpcc-load
tpcc_load [server] [DB] [user] [pass] [warehouse]
Parameter meanings:
server: database IP
DB: database name
user: user name
pass: password
warehouse: number of warehouses
Tpcc-start
tpcc_start -h server_host -P port -d database_name -u mysql_user -p mysql_password -w warehouses -c connections -r warmup_time -l running_time -i report_interval -f report_file
Parameter meanings:
warehouses: number of warehouses
connections: number of concurrent threads
warmup_time: warm-up time
running_time: run time
report_interval: output interval
report_file: output file
Summary
For IO-bound tests the data set should be much larger than memory; for CPU-bound tests it should fit in memory
Run tests for at least 60 minutes to reduce error
sysbench leans toward measuring raw MySQL performance; TPC-C is closer to real business workloads
While the test program runs, monitor machine load and MySQL metrics at the same time