Those things about the database (all practical information)

Updated 2025-07-13. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

When databases come up, the first thought is usually optimization: how to make queries faster. I think the best approach is to be as thorough as possible at design time. If you have inherited an old project, optimization is the only place to start. Below I share some of my understanding and experience of both design and optimization.

1: database design

A good database design often yields twice the performance for half the effort. Database design covers both the database architecture and the business tables.

1) Database architecture

Different architectures suit different data volumes and traffic levels. The best one is the one that fits your situation.

Single instance: all reads and writes go to one database instance (backup instances are not counted). This suits small internal enterprise systems. The drawback is that it only works for small amounts of data; the advantage is strong data consistency.

Vertical split, multiple instances: different businesses use different instances. This suits cases where each business holds a small amount of data and the businesses are relatively independent, with no queries that join across them.

Read-write separation, master-slave architecture: the master absorbs the write load and the slaves share the read load. This suits applications that read much more than they write and do not need strict real-time data consistency.

Master-slave, cluster structure: this suits applications that both write and read heavily. It is the most complex to implement, because you have to decide how to route the data, and later expansion is also relatively troublesome. Choose a routing strategy such as range or hash at the initial design stage, based on a good understanding of the business.
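As a rough illustration of the two routing strategies named above, here is a minimal Python sketch. The instance names, the choice of `user_id` as routing key and the range boundaries are all assumptions made for the example, not part of any particular middleware.

```python
import bisect
import zlib

# Hypothetical instance names; a real deployment would map these to connections.
SHARDS = ["db0", "db1", "db2", "db3"]

def route_by_hash(user_id: int) -> str:
    """Hash routing: spreads keys evenly, but changing the shard count moves most keys."""
    # crc32 gives the same value in every process, so routing is repeatable.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return SHARDS[digest % len(SHARDS)]

# Exclusive upper bounds of the first three ranges (assumed values);
# ids at or above the last bound go to the last shard.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def route_by_range(user_id: int) -> str:
    """Range routing: easy to extend with a new range, but recent ids can become a hot spot."""
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, user_id)]

print(route_by_range(999_999), route_by_range(1_000_000), route_by_hash(42))
```

Range routing keeps neighbouring ids together, which helps range scans, while hash routing trades that away for an even spread. Which one fits depends on the access pattern of the business.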

Cloud database: Aliyun RDS and similar services. These suit companies without a professional database operations team, and are convenient and easy to use.

2) Business table design

Common business table categories:

(1) configuration table

This kind of table usually stores basic configuration or dictionary information for the business. It generally holds little data and is rarely modified; most operations are SELECT queries.

(2) status table

This kind of table usually stores the state of entity objects in the business system, such as a user information table or an order information table. Its size is directly tied to the number of entities: an app's user table usually has as many records as the app has registered users. Status tables change frequently and see INSERT, UPDATE and SELECT operations; whether DELETE is used depends on the business.

(3) Log table

This kind of table usually records changes in the state of entities in the business system, such as a user login table or a recharge record table. Such tables are usually large, and if business state changes frequently and a lot of change information is recorded, they demand high capacity and insert performance. Operations are mostly INSERT, and only a few businesses query the log tables. Of the five special-requirement architectures of MySQL, the high-performance write architecture mainly serves tables like these.

(4) Archive table

This kind of table holds data archived from the three kinds of OLTP business tables above, or separated from them as cold data. Archiving and hot/cold separation keep the online business tables small and protect their performance; they also allow archived historical data to be queried and served more effectively. The size of an archive table depends on the size of the corresponding online table and on the archiving cycle. Apart from the data loading done by the archiving process, the operations on an archive table are mainly SELECT queries, and the table can even be made read-only.
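The archiving step described above can be sketched as a copy followed by a delete inside one transaction. This is only an illustration: it runs against an in-memory SQLite database as a stand-in for MySQL, and the table names `login_log` and `login_log_archive`, the columns and the cutoff date are all hypothetical.

```python
import sqlite3

def archive_before(conn, cutoff):
    """Move rows older than `cutoff` from the hot table to the archive table."""
    with conn:  # one transaction: both statements commit together, or neither does
        conn.execute(
            "INSERT INTO login_log_archive "
            "SELECT * FROM login_log WHERE created_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM login_log WHERE created_at < ?", (cutoff,))
    return cur.rowcount  # number of rows removed from the hot table

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_log (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)")
conn.execute("CREATE TABLE login_log_archive (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO login_log (user_id, created_at) VALUES (?, ?)",
    [(1, "2024-11-03"), (2, "2024-12-25"), (1, "2025-02-14")],
)
conn.commit()

moved = archive_before(conn, "2025-01-01")
hot = conn.execute("SELECT COUNT(*) FROM login_log").fetchone()[0]
cold = conn.execute("SELECT COUNT(*) FROM login_log_archive").fetchone()[0]
print(moved, hot, cold)
```

On a large production table the same idea is usually run in small batches, so that each transaction stays short.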

(5) Statistical data table

A statistical data table holds the raw data that the ETL process transfers from the online tables and archive tables into the OLAP statistical analysis system when the business needs offline statistical analysis. Such tables are usually very large, and one OLAP platform often gathers data from several online business systems. Apart from the data transfer itself, the operations are mainly reads and calculations by the statistical analysis programs.

(6) Statistical result table

A statistical result table holds the output produced when the statistical analysis processes read the statistical data tables and aggregate them according to some logic. This data is usually relatively small. Apart from writing the results, the operations are mainly SELECT queries from the access interface.

Sorting business tables into these types lets you roughly classify the tables of any business system and gives you a clear picture.

Here are some lessons learned from designing tables:

Split tables with many fields into multiple tables: if some fields of a wide table are rarely used, move them into a separate table. When the table holds a lot of data, the rarely used fields slow queries down.

Add intermediate tables: for tables that are frequently joined in queries, an intermediate table can improve query efficiency.

Add redundant fields: sensible redundancy can speed up queries. Redundant fields also cause problems: if the value is modified in one table, the copies in associated tables must be updated too, otherwise the data becomes inconsistent. Weigh database performance against this cost when designing redundant fields.
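The synchronization problem with redundant fields can be made concrete with a small sketch. It assumes a hypothetical `orders.user_name` column that duplicates `users.name` to avoid a join on read, and uses in-memory SQLite as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    user_name TEXT NOT NULL  -- redundant copy of users.name, saves a join on read
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO orders VALUES (10, 1, 'alice'), (11, 1, 'alice');
""")

def rename_user(conn, user_id, new_name):
    """Update the source column and its redundant copy in the same transaction."""
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute("UPDATE orders SET user_name = ? WHERE user_id = ?", (new_name, user_id))

rename_user(conn, 1, "alicia")
names = {row[0] for row in conn.execute("SELECT user_name FROM orders WHERE user_id = 1")}
print(names)
```

Every write path that touches the source column has to go through a function like this; that maintenance burden is the cost to weigh against the faster reads.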

Define every field as NOT NULL unless you really need to store NULL.

Estimate the data volume in advance and design the table split up front. Do not wait until a split is forced on you; as a rule, keep each table at around the 10 million row level. Once a single table grows past a certain size (in the MySQL 5.x era the performance inflection point was around 10 to 20 million rows, which should be verified by testing against your own workload), the most common way to improve performance is to split the table. The split can be vertical (for example, moving rarely used columns into a separate table) or horizontal (for example, splitting orders into different tables by month or by order status). If the business layer does the splitting itself, the logic becomes complex and scattered. Sharding middleware such as Mycat can hide the details, so that the business layer queries the split tables as if they were a single table. (For tables with little traffic but a lot of data, partitioned tables are an option and are relatively simple to implement.)
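To show why splitting at the business layer scatters logic, here is a minimal sketch of the monthly split mentioned above. The `orders_YYYYMM` naming scheme is an assumption for illustration; middleware such as Mycat would hide this mapping from the application.

```python
from datetime import date

def orders_table_for(order_date: date) -> str:
    """Name of the monthly table that holds an order created on `order_date`."""
    return f"orders_{order_date:%Y%m}"

def tables_for_range(start: date, end: date) -> list:
    """All monthly tables that a query over [start, end] has to touch."""
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        tables.append(f"orders_{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return tables

print(orders_table_for(date(2025, 7, 13)))
print(tables_for_range(date(2024, 11, 20), date(2025, 2, 3)))
```

A single-day lookup hits one table, but a query spanning several months has to be issued against each table and merged, which is exactly the logic that is easier to keep in one place.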

Choose one character set and use it everywhere. MySQL sets character set defaults in a "ladder": each database and each table has its own default, inherited layer by layer, and the most specific default is what applies to the objects you create. Conversion between different character sets and collations can add overhead and hurt database performance.

Set the primary key and index reasonably.

Primary keys are either auto-increment keys or business keys.

Auto-increment primary key: write efficiency, query efficiency and disk utilization are all high, but every query needs two index lookups (a secondary index, then the primary key index), because online business queries do not filter directly on the primary key column.

Business primary key: write efficiency, query efficiency and disk utilization are all lower, but queries can use the primary (first-level) index directly. With a covering index, some queries can also be completed in a single lookup on a non-primary-key index.

In general, business primary keys are used.

Indexes are roughly divided into three categories.

Primary key index: InnoDB automatically creates an index on the primary key of the table, and the data structure uses B+Tree.

Non-primary-key index: an index on a non-primary-key column is a secondary index (a query has to search two index trees: the secondary index first, then the primary key index).

Composite index: also called a joint or multi-column index. The key of the index structure contains multiple fields. Entries are sorted by the first column; ties are broken by the second column, and so on.
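The column-by-column ordering of a multi-column index can be imitated with sorted Python tuples, which compare in the same way. This is an analogy rather than how InnoDB stores a B+Tree; the two-column keys `(a, b)` are made-up sample data.

```python
import bisect

# Sample (a, b) keys, deliberately out of order.
rows = [(2, 50), (1, 300), (2, 10), (1, 20), (3, 5)]

# Tuples compare on the first element, then the second on ties,
# which is the ordering a two-column index keeps.
index = sorted(rows)

def lookup_by_first_column(index, a):
    """Binary search works on the first column because the list is ordered by it."""
    lo = bisect.bisect_left(index, (a,))
    hi = bisect.bisect_left(index, (a + 1,))
    return index[lo:hi]

second_column = [b for _, b in index]
print(index)
print(lookup_by_first_column(index, 2))
print(second_column == sorted(second_column))  # the second column alone is not ordered
```

Because the second column is only ordered within each value of the first, a search on the second column alone cannot use binary search. The same reasoning is behind the leftmost prefix rule for multi-column indexes.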

2: database optimization (MySQL)

To optimize MySQL, first understand how MySQL works, so that the reasons behind the SQL rules are clear. The following figure shows the logical architecture of MySQL.

The MySQL client/server communication protocol is "half-duplex": at any moment either the server is sending data to the client or the client is sending data to the server, never both. Once one end starts sending a message, the other end must receive the whole message before it can respond, so a message cannot (and need not) be cut into small pieces and sent independently, and there is no flow control. The client sends a query to the server as a single packet, so very long query statements require raising the max_allowed_packet parameter. When the server responds, the client must receive the entire result; it cannot take the first few rows and then ask the server to stop sending. In practice it is therefore a very good habit to keep queries simple, return only the data you need, and reduce the size and number of packets exchanged. This is one of the reasons to avoid SELECT * and to add LIMIT to queries.

1) SQL optimization. Analyze the execution plan with EXPLAIN and SHOW PROFILES to find the problems, then optimize them. Creating efficient indexes is one of the most effective means.

Multi-column index and index order:

When a query combines several conditions with AND, one index containing all the relevant columns is generally better than several independent single-column indexes.

Indexing highly selective fields lets MySQL filter out more rows when querying. In a multi-column index, column order depends on selectivity: putting the most selective column first helps query efficiency. For example, in an index on (user_group_id, trade_amount), user_group_id comes first on the assumption that a user's group is more selective than an order's transaction amount.
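Selectivity is commonly measured as the number of distinct values divided by the number of rows; in MySQL this can be computed as COUNT(DISTINCT col) / COUNT(*). The sketch below does the same calculation in Python on made-up sample rows, so the resulting numbers say nothing about real data.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0

def order_columns_by_selectivity(rows, columns):
    """Return the columns sorted from most to least selective, plus the scores."""
    scores = {col: selectivity(row[col] for row in rows) for col in columns}
    ordered = sorted(columns, key=lambda col: scores[col], reverse=True)
    return ordered, scores

# Hypothetical sample; measure on your own table before choosing an order.
sample = [
    {"user_group_id": 1, "trade_amount": 100},
    {"user_group_id": 2, "trade_amount": 100},
    {"user_group_id": 3, "trade_amount": 100},
    {"user_group_id": 4, "trade_amount": 250},
]
ordered, scores = order_columns_by_selectivity(sample, ["trade_amount", "user_group_id"])
print(ordered, scores)
```

Measuring on a representative sample is safer than guessing, since which column is more selective depends entirely on the data.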

Covering index: if an index contains (covers) the values of all the fields a query needs, the query does not have to go back to the table. Such an index is called a covering index. Covering indexes are a very useful tool that can greatly improve performance, because the query only needs to scan the index.
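The effect can be observed in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in, because it runs without a server; in MySQL the equivalent signal is "Using index" in the Extra column of EXPLAIN. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_group_id INTEGER, "
    "trade_amount INTEGER, note TEXT)"
)
conn.execute("CREATE INDEX idx_group_amount ON orders (user_group_id, trade_amount)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every column the query needs is in the index, so the table is never read.
covered = plan("SELECT trade_amount FROM orders WHERE user_group_id = 7")

# `note` is not in the index, so each match needs an extra lookup in the table.
not_covered = plan("SELECT note FROM orders WHERE user_group_id = 7")

print(covered)
print(not_covered)
```

The first plan mentions a covering index and the second does not, even though both queries filter on the same column.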

Optimize join queries: let the small table drive the large table.

Try to rewrite subqueries as joins. With a join, MySQL does not need to create a temporary table in memory to satisfy the query.

Make sure that the expressions in GROUP BY and ORDER BY refer only to columns of a single table, so that MySQL can use an index to optimize them.

Optimize LIMIT pagination: a common problem is a very large offset. For a query with LIMIT 10000, 20, MySQL has to read 10020 rows and return only 20; the first 10000 are discarded, which is very expensive. One of the simplest optimizations is to use a covering index scan wherever possible instead of querying all columns, and then join back as needed to return the full columns. When the offset is large, this improves efficiency considerably. Consider the following query:

Before:

SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

After:

SELECT film.film_id, film.description
FROM film INNER JOIN (
    SELECT film_id FROM film ORDER BY title LIMIT 50, 5
) AS tmp USING (film_id);
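A quick way to convince yourself that the rewrite returns the same page is to run both forms side by side. The sketch below does so on in-memory SQLite with generated rows, as a stand-in for MySQL. The index on `title` and the outer ORDER BY are additions made here so that the comparison is deterministic; they are not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO film (title, description) VALUES (?, ?)",
    [(f"title-{i:04d}", f"description {i}") for i in range(200)],
)
conn.execute("CREATE INDEX idx_title ON film (title)")

# Original form: sorts and skips full rows.
before = conn.execute(
    "SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5"
).fetchall()

# Deferred join: the inner query pages over the index only,
# then the outer query fetches full rows for just those five ids.
after = conn.execute("""
    SELECT film.film_id, film.description
    FROM film INNER JOIN (
        SELECT film_id FROM film ORDER BY title LIMIT 50, 5
    ) AS tmp USING (film_id)
    ORDER BY film.title
""").fetchall()

print(before == after, before[0])
```

The saving comes from the inner query touching only index entries while it skips the offset, which matters more the larger the offset and the wider the rows.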

Optimize UNION: unless you really need the server to deduplicate, be sure to use UNION ALL. If there is no ALL keyword, MySQL will add the DISTINCT option to the temporary table, which will cause the data of the entire temporary table to be checked for uniqueness, which is very expensive.
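The difference in results is easy to see on a small example. This sketch uses in-memory SQLite as a stand-in for MySQL, with two hypothetical yearly order tables that share one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_2024 (user_id INTEGER);
CREATE TABLE orders_2025 (user_id INTEGER);
INSERT INTO orders_2024 VALUES (1), (2), (2);
INSERT INTO orders_2025 VALUES (2), (3);
""")

# UNION ALL concatenates the two result sets without any uniqueness check.
union_all = conn.execute(
    "SELECT user_id FROM orders_2024 UNION ALL SELECT user_id FROM orders_2025"
).fetchall()

# UNION has to deduplicate the combined rows, which is the extra work to avoid.
union = conn.execute(
    "SELECT user_id FROM orders_2024 UNION SELECT user_id FROM orders_2025"
).fetchall()

print(len(union_all), sorted(union))
```

If the duplicates are harmless, or the two sources cannot overlap, UNION ALL returns the rows without paying for the uniqueness check.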

Avoid patterns that prevent index use:

(1) Negative conditions generally cannot use an index (NOT IN / NOT EXISTS is a bad habit).

(2) Fuzzy matches with a leading wildcard cannot use an index (LIKE '%XX'); a trailing wildcard such as LIKE 'XX%' still can.

(3) Indexes are not suitable for fields with low cardinality (few distinct values).

(4) Calculations on a column prevent the index from being hit.

(5) Conditions that do not satisfy the leftmost prefix of a composite index cannot use it.

Type conversion (CAST) on a column causes a full table scan.
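Two of these cases, a calculation on the column and a missing leftmost prefix, can be reproduced in a query plan. The sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, so the plan wording differs from MySQL and other rules in the list may behave differently there. The table `t` and index `idx_a_b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (a, b)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Plain comparison on the leading column: an index search.
uses_index = plan("SELECT * FROM t WHERE a = 5")

# Same condition written as a calculation on the column: a full scan.
calculated = plan("SELECT * FROM t WHERE a + 1 = 6")

# Condition on the second index column only: the leftmost prefix is missing.
no_prefix = plan("SELECT * FROM t WHERE b = 5")

print(uses_index)
print(calculated)
print(no_prefix)
```

Moving the arithmetic to the constant side (a = 6 - 1) restores the index search, and the same applies to function calls and casts wrapped around a column.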

LIMIT 1 can improve efficiency when you know for certain that only one row will be returned.

2) Set MySQL parameters sensibly for the best efficiency.

thread_pool_size: if the primary storage engine is InnoDB, the best thread_pool_size setting is probably between 16 and 36, and the most common optimal values tend to be 24 to 36.

thread_pool_stall_limit: this matters for handling blocked and long-running statements, and ensures that the server does not become completely blocked. Setting it too high can leave threads blocked and cause performance problems.

tmp_table_size: this option raises the size limit of in-memory temporary tables, such as those generated by ORDER BY and GROUP BY operations. Increasing it also increases the size allowed for heap (in-memory) tables, which can speed up join queries. It is better to optimize the queries as far as possible so that the temporary tables they generate stay in memory, and to avoid temporary tables that grow too large and are converted into on-disk MyISAM tables.
