MySQL database design specification 04/16 Update SLTechnology News&Howtos

MySQL database design specification

2025-04-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Catalogue

Background and purpose of standardization

Design specification

2.1 Database Design

2.1.1 Library name

2.1.2 Table structure

2.1.3 column data type optimization

2.1.4 Index design

2.1.5 Sub-database sub-table and partition table

2.1.6 character set

2.1.7 suggestions for designing the DAO layer of the program

2.1.8 an example of a canonical table-building statement

2.2 written by SQL

2.2.1 DML statement

2.2.2 Multi-table join

2.2.3 transaction

2.2.4 sorting and grouping

2.2.5 SQL statements that are prohibited online

1. Background and purpose of standardization

Compared with Oracle, SQL Server and other databases, MySQL database has its advantages and disadvantages in kernel. When using MySQL database, we need to follow certain norms to enhance our strengths and circumvent our weaknesses. The purpose of this specification is to help or guide RD, QA, OP and other technicians to make database design suitable for online business. Standardize the database change and processing process, database table design, SQL compilation and other aspects, so as to provide a guarantee for the stable and healthy operation of the company's business system.

two。 Design specification

2.1 Database Design

All of the following specifications will be marked according to the three levels of [high risk], [mandatory] and [recommended], with compliance priority from high to low.

For designs that do not meet the [high risk] and [mandatory] levels, DBA will force callback requirements to be modified.

2.1.1 Library name

[mandatory] the name of the library must be controlled within 32 characters, and the join relationship between the table name and the table name of the relevant module should be extracted as much as possible, such as user table and user_ log table. [mandatory] the name format of the library: business system name _ subsystem name. The table name used by the same module uses a uniform prefix as far as possible. [mandatory] the general naming format of sub-library name is library common name _ number, and the number increases from 0. For example, the name format of wenda_001 dividing library by time is "library common name _ time" [mandatory] when you create a database, you must explicitly specify the character set, and the character set can only be utf8 or utf8mb4. Create a database SQL for example: create database db1 default character set utf8;.

2.1.2 Table structure

[mandatory] the names of tables and columns must be limited to 32 characters, and table names can only use letters, numbers and underscores, all in lowercase. [mandatory] the list name requires the module name to be strongly related, such as the teacher system uses "sz" as the prefix, and the channel system uses "qd" as the prefix. [force] when creating a table, you must explicitly specify the character set as utf8 or utf8mb4. [mandatory] when creating a table, you must explicitly specify the type of table storage engine. If there is no special requirement, it will always be InnoDB. When a storage engine other than InnoDB/MyISAM/Memory is required, it must be audited by DBA before it can be used in a production environment. Because Innodb table supports important features of relational database, such as transaction, row lock, downtime recovery, MVCC and so on, it is the most widely used MySQL storage engine in the industry. This is not available in most other storage engines, so InnoDB is the first choice. [mandatory] A table must have a comment [recommendation] about the primary key when creating the table: (1) the primary key is required to be id, the type is int or bigint, and the field that identifies the body of each row in the table for auto_increment (2) should not be set as the primary key. It is recommended to set it to other fields such as user_id,order_id, etc., and establish a unique key index (see cdb.teacher table design). Because if the primary key is set as the primary key and the primary key value is randomly inserted, it will result in internal page splitting and a large number of random page O, resulting in performance degradation. [recommended] Core tables (such as user tables, money-related tables) must have row data creation time field create_time and last update time field update_time to facilitate problem checking. [recommended] all fields in the table must be NOT NULL attributes, and business can define default values as needed. Because of the use of null values, there are some problems, such as extra storage space for each row, error prone to data migration, deviation in the calculation results of aggregate functions, and so on. [recommended] it is recommended to split the blob, text and other large fields in the table vertically into other tables, and go to select only when you need to read these objects. [suggestion] Anti-paradigm design: make a redundant copy of the fields that often need join queries in other tables. For example, the user_name attribute is redundant in the user_account,user_login_log and other tables to reduce join queries. [force] the intermediate table is used to retain the intermediate result set and the name must start with tmp_. Backup tables are used to back up or take snapshots of source tables, and the name must start with bak_. Intermediate tables and backup tables are cleaned regularly. [mandatory] alter table for large tables with more than 100W rows must be audited by DBA and executed during the business trough. Because alter table generates table locks, blocking all writes to the table during the period can have a significant impact on the business.

2.1.3 column data type optimization

[recommended] self-increment column (autoincrement attribute) in the table, it is recommended to use the bigint type. Because the unsigned int storage range is-2147483648mm 2147483647 (about 2.1 billion), the overflow will cause an error. [recommended] for fields with few choices in business, such as status status and type type, it is recommended to use tinytint or smallint type to save storage space. [recommended] int type is recommended for IP address field in business, but char (15) is not recommended. Because int occupies only 4 bytes, it can be converted to each other with the following functions, while char (15) occupies at least 15 bytes. Once the number of rows in the table reaches 100 million, use an extra 1.1 gigabytes of storage. SQL:select inetaton ('192.168.2.12'); select inetntoa (3232236044); PHP: ip2long ('192.168.2.12'); long2ip (3530427185); [recommended] enum,set is not recommended. Because they waste space, and the enumeration values are written dead, it is not convenient to change. Tinyint or smallint is recommended. [recommended] blob,text and other types are not recommended. They are all a waste of hard disk and memory space. When loading table data, large fields will be read into memory, thus wasting memory space and affecting system performance. It is recommended to communicate with PM and RD to see if such a large field is really needed. In Innodb, when a row of records exceeds 8098 bytes, the longest field of the record is placed in the original page, and the rest of the field is placed in the overflow-page. Unfortunately, in the compact line format, both the original page and overflow-page are loaded. [suggestion] for fields that store money, it is recommended to use int, program side multiplied by 100 and divided by 100 for access. Because int takes up 4 bytes, while double takes 8 bytes, space is wasted. [recommended] text data should be stored in varchar as far as possible. Because varchar is variable-length storage, it saves more space than char. The MySQL server layer specifies a maximum of 65535 bytes for all text on a line, so a maximum of 21844 characters are stored under the utf8 character set, which is automatically converted to a mediumtext field. Text stores up to 21844 characters under the utf8 character set, mediumtext stores up to 2 ^ 24 / 3 characters, and longtext stores up to 2 ^ 32 characters. It is generally recommended to use the varchar type, with no more than 2700 characters. [recommended] choose timestamp as far as possible for time type. Because datetime occupies 8 bytes, timestamp consumes only 4 bytes, but the range is from 1970-01-01 00:00:01 to 2038-01-01 00:00:00. In a more advanced approach, int is used to store time, and SQL functions unixtimestamp () and fromunixtime () are used for conversion.

Detailed storage size is shown in the following figure: *

2.1.4 Index design

[mandatory] the primary key of the InnoDB table must be id int/bigint auto_increment, and the primary key value must not be updated. [recommended] the name of the primary key begins with "contention", the unique key begins with "uk" or "uq", and the ordinary index starts with "idx", all in lowercase format with the name or abbreviation of the table name / field as the suffix. [force] InnoDB and MyISAM store engine tables, and the index type must be BTREE;MEMORY table. You can select HASH or BTREE type indexes as needed. [force] the length of each index record in a single index cannot exceed 64KB. [recommended] the number of indexes on a single table cannot exceed 7. [suggestion] when building an index, more consideration should be given to establishing a joint index, and the fields with the highest degree of differentiation should be put first. For example, the discrimination of the column userid can be calculated by select count (distinct userid). [suggestion] in the SQL of multi-table join, ensure that there is an index on the join column of the driven table, so that join is the most efficient. [suggestion] when building a table or indexing, make sure that there are no redundant indexes in the table. For MySQL, if key (a) already exists in the table, then key (a) is a redundant index and needs to be deleted.

2.1.5 Sub-database sub-table and partition table

[force] the partition field (partition-key) of a partitioned table must have an index, or the first column of a combined index. [mandatory] the number of partitions (including subpartitions) in a single partition table cannot exceed 1024. [mandatory] before launching, RD or DBA must specify the creation and cleaning policy of the partition table. [mandatory] the SQL that accesses the partition table must contain a partition key. [recommended] A single partition file should not exceed 2G, and the total size should not exceed 50G. It is recommended that the total number of zones not exceed 20. [force] alter table operations on partitioned tables must be performed during the business trough. [mandatory] if a split-database policy is adopted, the number of databases cannot exceed 1024. [mandatory] if a split-table policy is adopted, the number of tables cannot exceed 4096 [it is recommended that a single sub-table should not exceed 500W rows, and the ibd file size should not exceed 2G. Only in this way can the performance of data distribution be better. [suggestion] horizontal sub-table should be modeled as much as possible, and date should be used for log and report data.

2.1.6 character set

[mandatory] all character sets of database native library, table and column must be consistent, which is utf8 or utf8mb4. [force] the character set of the front-end program or environment variables must be consistent with the character set of the database and table, and be unified as utf8.

2.1.7 suggestions for the design of program layer DAO

[recommended] do not use model for new code. It is recommended to manually spell SQL+ binding variables to pass parameters. Because although model can operate db in an object-oriented way, its improper use can easily cause the generated SQL to be very complex, and the cast performance of the model layer is poor, which eventually leads to the decline of database performance. [recommended] when the front-end program connects to MySQL or redis, there must be a connection timeout and failure reconnection mechanism, and there must be an interval between failed retries. [suggestion] the error report of the front-end program can prompt the original error message of MySQL or redis as far as possible, so as to facilitate the troubleshooting of errors. [recommended] for front-end programs with connection pooling, initial, minimum and maximum connections, timeout and connection recovery mechanism must be configured according to business needs, otherwise database connection resources will be exhausted and online accidents will occur. [recommended] for tables of log or history type, it is easy to become larger and larger over time, so RD or DBA must establish a table data cleaning or archiving plan before launch. [suggestion] in the application design phase, RD must consider and avoid the impact of master-slave delay in the database on the business. Try to avoid the impact on the business caused by the short-term delay of the slave database (less than 20 seconds). It is recommended to force the consistent read to open the transaction to go to the master database, or to read the slave database after a period of update. [recommended] when multiple concurrent business logic accesses the same piece of data (innodb table), row locks or even table locks will occur on the database side, resulting in a decrease in concurrency. Therefore, it is recommended that the update class SQL should be updated based on the primary key. [recommended] the locking order between business logic should be kept consistent as far as possible, otherwise it will lead to deadlock. [recommended] for rows or columns with a single table read-to-write ratio greater than 10:1, hot spot data can be cached (such as mecache or redis) to speed up access and reduce MySQL pressure.

2.1.8 an example of a canonical table-building statement

A more standardized statement for building a table is:

CREATE TABLE user (`id` bigint (11) NOT NULL AUTO_INCREMENT, `user_ id` bigint (11) NOT NULL COMMENT 'user id' `username` varchar (45) NOT NULL COMMENT' real name', `email` varchar (30) NOT NULL COMMENT 'user email', `nickname` varchar (45) NOT NULL COMMENT 'nickname', `avatar` int (11) NOT NULL COMMENT 'avatar`, `birthday day`date NOT NULL COMMENT', `sex` tinyint (4) DEFAULT'0' COMMENT 'gender' Introduce yourself in the sentence `short_ introduce`varchar (150) DEFAULT NULL COMMENT' Up to 50 Chinese characters', 'user_ resume` varchar (300) NOT NULL COMMENT' address of resume submitted by user', 'user_register_ ip` timestamp NOT NULL COMMENT' source ip', 'create_ time` timestamp NOT NULL COMMENT' when the user registered', 'update_ time`time` timestamp NOT NULL COMMENT', `user_review_ status` tinyint NOT NULL COMMENT', 1: passed, 2: under review, 3: failed 4: audit has not been submitted yet', PRIMARY KEY (`id`), UNIQUE KEY `idx_user_ id` (`user_ id`), KEY `idx_ username` (`username`), KEY `idx_create_ time` (`create_ time`, `user_review_ status`) basic information of ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT=' website users' 2.2 written by SQL

2.2.1 DML statement

[mandatory] the SELECT statement must specify a specific field name, which is prohibited. Because select will also read the data that should not be read from the MySQL, causing pressure on the network card. And once the table field is updated, but the model layer does not have time to update, the system will report an error. [force] insert statement to specify specific field names, do not write as insert into T1 values (…) The reason is the same as above. [suggestion] insert into... Values (XX), (XX), (XX)... The value of XX here should not exceed 5000. Too much value can be launched quickly, but it will cause master-slave synchronization delay. [recommended] do not use UNION for SELECT statements, but UNION ALL is recommended, and the number of UNION clauses is limited to 5. Because union all does not need to be duplicated, it saves database resources and improves performance. [recommended] in value list is limited to 500. For example, select... Where userid in (… Within 500.) In order to reduce the underlying scanning and reduce the pressure on the database to speed up the query [recommended] to update data in batches in a transaction, you need to control the quantity and perform the necessary sleep to achieve a small number of times. [mandatory] the tables involved in the transaction must all be innodb tables. Otherwise, if it fails, it will not all be rolled back, and it is easy to cause the master-slave library to synchronize the terminal. [force] writes and transactions are sent to the master database, and read-only SQL is sent to the slave library. [force] except for static tables or small tables (less than 100 rows), DML statements must have where conditions and use index lookups. [mandatory] the use of hint, such as sql_no_cache,force index,ignore key,straight join, is prohibited in the production environment. Because hint is used to force SQL to execute according to a certain execution plan, but as the amount of data changes, we cannot guarantee that our predictions are correct, so we have to trust the MySQL optimizer! [mandatory] the field type around the equal sign in the where condition must be the same, otherwise the index cannot be utilized. [recommended] SELECT | UPDATE | DELETE | REPLACE must have a WHERE clause, and the condition of the WHERE clause must be looked up by index. [mandatory] full table scans on large tables are strongly not recommended in production databases, but full table scans are allowed for static tables with less than 100 rows. The amount of query data should not exceed 25% of the number of table rows, otherwise the index will not be used. [mandatory] it is forbidden to use only fully fuzzy LIKE conditions for lookups in the WHERE clause. There must be other equivalent or range query conditions, otherwise the index cannot be utilized. [recommended] do not use functions or expressions for index columns, otherwise you cannot take advantage of indexes. For example, where length (name) = 'Admin' or where user_id+2=10023. [recommended] reduce the use of the or statement, optimize the or statement to union, and then index on each where condition. For example, where aversion 1 or baggage 2 is optimized to where aura 1. Union... Where bread2, key (a), key (b). [recommended] pagination query. When the starting point of limit is high, filter conditions can be used to filter first. For example, select a from c from T1 limit 10000 select 20; optimized as follows: select a recital b journal c from t 1 where id > 10000 limit 20;

2.2.2 Multi-table join

[force] prohibit join statements across db. Because this can reduce the coupling between modules and lay a solid foundation for database splitting. [force] prohibit the use of join in business update-like SQL statements, such as update T1 join T2. [recommendation] it is not recommended to use subquery. It is recommended to disassemble subquery SQL and combine program to query multiple times, or to use join instead of subquery. [recommended] in online environment, the join of multiple tables should not exceed 3 tables. It is recommended to use aliases for multi-table join queries, and aliases should be used to refer to fields and databases in the SELECT list. Table format, such as select a from db1.table1 alias1 where... [suggestion] in multi-table join, try to select the table with small result set as the driver table to join other tables.

2.2.3 transaction

[recommended] the number of rows operated by the INSERT | UPDATE | DELETE | REPLACE statement in the transaction is limited to 2000, and the number of passed parameters of the IN list in the WHERE clause is limited to 500. [recommended] when operating data in batches, you need to control the transaction interval and perform the necessary sleep. The recommended value is generally 5-10 seconds. [recommended] for inserting tables with auto_increment attribute fields, the concurrency should be controlled within 200. [mandatory] programming must consider the impact of the "database transaction isolation level", including dirty reading, non-repeatable reading, and phantom reading. It is recommended online that the transaction isolation level is repeatable-read. [recommended] there are no more than 5 SQL in the transaction (except payment business). Because too long transactions will lead to avalanche problems such as locking data for a long time, MySQL internal cache, excessive connection consumption and so on. [suggestion] update statements in transactions should be based on primary keys or unique key, such as update. Otherwise, where id=XX; will produce a gap lock, which expands the locking range internally, resulting in a decline in system performance and a deadlock. [suggestion] try to remove some typical external calls from the transaction, such as calling webservice, accessing file storage, etc., so as to avoid too long transactions. [recommended] for select statements that are strictly sensitive to the master / slave delay of MySQL, enable transactions to force access to the master library.

2.2.4 sorting and grouping

* [recommendation] reduce the use of order by, communicate with business without sorting, or put sorting to the program side to do. The statements such as order by, group by and distinct consume more CPU, and the CPU resources of the database are extremely valuable. [recommended] SQL such as order by, group by and distinct try their best to use the index to directly retrieve the sorted data. For example, where axiom 1 order by can make use of key (afield b). [recommended] contains statements for queries such as order by, group by and distinct. The result set filtered by where condition should be kept within 1000 rows, otherwise SQL will be slow.

2.2.5 SQL statements that are prohibited online

[high risk] disable update | delete T1 … Where a=XX limit XX; is an update statement with limit. Because it will lead to the inconsistency of the principal, leading to data confusion. It is recommended to add order by competition. [high risk] prohibit the use of related subqueries, such as update T1 set. Where name in (select name from user where …) ; the efficiency is extremely low. [force] disable procedure, function, trigger, views, event, foreign key constraints. Because they consume database resources and reduce the scalability of database instances. Recommendations are implemented on the program side. [force] disable insert into... On duplicate key update... In a high concurrency environment, it will cause the principal to be inconsistent. [force] forbids linked table update statements, such as update T1 and T2 where t1.id=t2.id.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.