A Summary of Database Knowledge Points


This article summarizes the main knowledge points of databases, with a focus on MySQL. The treatment is simple, fast, and practical; interested readers may wish to follow along.

Storage engine

InnoDB

InnoDB is MySQL's default transactional storage engine; consider other storage engines only when you need features it does not support.

InnoDB uses MVCC to support high concurrency and implements all four standard isolation levels (read uncommitted, read committed, repeatable read, serializable). Its default level is REPEATABLE READ, at which phantom reads are prevented through MVCC + Next-Key Locking.

The primary index is a clustered index: the row data is stored inside the index itself, which avoids an extra lookup, so primary-key queries are very fast.

Many optimizations are built into InnoDB, including read-ahead (prefetching) from disk, an adaptive hash index that automatically builds hash indexes in memory to speed up reads, and an insert buffer that speeds up insert operations.

InnoDB supports true online hot backup, while the other MySQL storage engines do not. With those engines, obtaining a consistent view requires stopping writes to all tables, and in a mixed read-write scenario stopping writes may also mean stopping reads.

MyISAM

Its design is simple and data is stored in a compact format. It is still useful for read-only data, or for tables small enough that repair operations are tolerable.

It provides a number of features, including compressed tables and spatial data indexes.

Transactions are not supported.

Row-level locks are not supported; only the entire table can be locked: shared locks are taken on all tables being read, and an exclusive lock is taken on a table when writing. However, while a table has reads in progress, new records can still be inserted into it; this is called CONCURRENT INSERT.

Check and repair operations can be performed manually or automatically, but unlike transaction recovery or crash recovery, some data may be lost, and repair is very slow.

If the DELAY_KEY_WRITE option is specified, modified index data is not written to disk immediately after each modification, but to an in-memory key buffer; the corresponding index blocks are written to disk only when the key buffer is flushed or the table is closed. This can greatly improve write performance, but if the database or host crashes, the index will be corrupted and need repair.

Comparison between InnoDB and MyISAM

Transactions: InnoDB is transactional and can use Commit and Rollback statements.

Concurrency: MyISAM only supports table-level locks, while InnoDB also supports row-level locks.

Foreign keys: InnoDB supports foreign keys.

Backup: InnoDB supports online hot backup.

Crash recovery: MyISAM is much more likely to be damaged after a crash than InnoDB, and it recovers more slowly.

Other features: MyISAM supports compressed tables and spatial data indexes.
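A quick sketch of checking and choosing a storage engine (the table name is illustrative):

-- Inspect the engine of an existing table
SHOW TABLE STATUS LIKE 'mytable';

-- Choose the engine explicitly at creation time
CREATE TABLE mytable (id INT PRIMARY KEY) ENGINE = InnoDB;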

Indexes

B+ Tree principles

Data structure

B Tree stands for Balanced Tree: a search tree in which all leaf nodes are on the same level.

B+ Tree is a variant of the B Tree that adds sequential access pointers between leaf nodes. It is commonly used in databases and operating-system file systems.

A B+ tree has two types of node: internal nodes (also called index nodes) and leaf nodes. Internal nodes are the non-leaf nodes; they store only keys (the index), not data. All data lives in the leaf nodes.

The keys in an internal node are arranged in ascending order. For a given key in an internal node, all keys in its left subtree are less than it and all keys in its right subtree are greater than or equal to it. The records in the leaf nodes are also arranged in ascending order.

Each leaf node has a pointer to the adjacent leaf node.

Operations

Search

Search proceeds much as in a binary search tree: starting from the root, traverse the tree top-down, at each node choosing the child pointer whose key range contains the value being sought. Within a node, binary search is typically used to locate this position.

Insert

Perform a search to determine which bucket (leaf node) the new record should go into.

If the bucket is not full (at most b - 1 entries after the insertion, where b is the node capacity, usually an integer multiple of the page size), add the record.

Otherwise, split the bucket before inserting the new record:

The original node keeps the first half, roughly (b + 1) / 2 items rounded down.

The new node takes the remaining (b + 1) / 2 items, rounded up.

Move the (b + 1) / 2-th key up to the parent, and insert a pointer to the new node into the parent.

Repeat upward until a parent is found that does not need to split.

If the root splits, treat it as if it had an empty parent and split as outlined above, creating a new root.

B+ trees therefore grow at the root, not at the leaves.

Delete

Deletion is similar to insertion, except that underfull nodes are merged bottom-up.

Characteristics of common trees

AVL tree

A balanced binary search tree, generally kept balanced by tracking the balance factor and rotating: the height difference between the left and right subtrees is at most 1. Compared with the red-black tree, it is strictly balanced, with a very strict balance condition (height difference of at most 1); whenever an insertion or deletion violates this condition, rotations restore the balance. Because rotations are time-consuming, the AVL tree suits scenarios with few insertions/deletions but many lookups.

Red-black tree

By constraining the colors of the nodes on every path from the root to the leaves, it guarantees that no path is more than twice as long as any other, so the tree is approximately balanced. Compared with the strictly balanced AVL tree, it needs fewer rotations, and so suits scenarios with fewer lookups and more insertions/deletions. (Some systems now use skip lists instead of red-black trees; search for "Why does Redis use skiplist instead of red-black tree?")

B / B+ tree

Multi-way search trees with high fan-out and low disk IO; generally used in database systems.

Comparison between B + tree and red-black tree

Balanced trees such as the red-black tree can also be used to implement indexes, but B+ Tree is the common index structure in file systems and database systems, for the following two reasons:

(1) Number of disk IOs

A B+ tree stores multiple elements per node, so its height is lower and it needs fewer disk IOs than a red-black tree.

(2) Disk pre-read characteristics

To reduce disk I/O, disks are often not read strictly on demand but pre-read each time. During pre-reading the disk is read sequentially, which requires no seek, and an integer multiple of a page is read each time.

The operating system generally divides memory and disk into fixed-size blocks, each called a page, and the memory and disk exchange data on a page-by-page basis. The database system sets the size of one node of the index to the size of the page, so that one node can be fully loaded at a time.

Comparison between B + Tree and B Tree

The disk IO of the B+ tree is lower

The internal nodes of a B+ tree carry no pointers to the actual record data, so they are smaller than a B tree's internal nodes. A disk block of the same size can therefore hold more keys, and more keys can be searched per read into memory; relatively speaking, the number of IO reads and writes is reduced.

The query efficiency of B+ tree is more stable.

Because non-leaf nodes do not point to the actual record contents but only index the keys in the leaf nodes, every key lookup must follow a path from the root to a leaf. All key lookups have the same path length, so every piece of data is queried with the same efficiency.

Element traversal is more efficient in a B+ tree

While improving disk IO performance, the B tree did not solve the problem of inefficient element traversal; the B+ tree was introduced precisely to solve it. A B+ tree can be traversed in full just by walking its leaf nodes. Range queries are very frequent in databases, and B trees do not support them (or support them only very inefficiently).

MySQL index

Indexing is implemented at the storage engine layer, not at the server layer, so different storage engines have different index types and implementations.

B+ Tree index

It is the default index type for most MySQL storage engines.

Because you no longer need to do a full table scan, you only need to search the tree, so the search speed is much faster.

Because of the ordered nature of B + Tree, it can be used for sorting and grouping in addition to lookup.

You can specify multiple columns as index columns, and multiple index columns make up the key.

It is suitable for full key-value, key-range, and key-prefix lookups, where prefix lookup applies only to the leftmost prefix. The index cannot be used if the lookup does not follow the order of the index columns.

InnoDB's B+Tree indexes are divided into the primary index and secondary indexes. The leaf-node data field of the primary index stores the complete data record; this is called a clustered index. Because data rows cannot be stored in two different places, a table can have only one clustered index.

The leaf-node data field of a secondary index stores the primary key value, so a lookup through a secondary index finds the primary key value first and then searches the primary index; this process is known as going back to the table (a back-to-table lookup).
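A minimal sketch of the two lookup paths, using a hypothetical user table:

-- Hypothetical table for illustration
CREATE TABLE user (
    id INT PRIMARY KEY,           -- clustered (primary) index: leaves store the full row
    name VARCHAR(50),
    KEY idx_name (name)           -- secondary index: leaves store the primary key value
) ENGINE = InnoDB;

SELECT * FROM user WHERE id = 1;        -- one pass over the clustered index
SELECT * FROM user WHERE name = 'Tom';  -- idx_name yields id first, then goes back to the primary index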

Hash indexing

A hash index can be searched in O(1) time, but ordering is lost:

Cannot be used for sorting and grouping

Only precise lookups are supported, and cannot be used for partial lookups and range lookups.
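For example, the MEMORY engine supports explicit hash indexes; a minimal sketch:

-- MEMORY tables use hash indexes by default; USING HASH makes it explicit
CREATE TABLE lookup (
    id INT,
    name VARCHAR(20),
    INDEX USING HASH (id)
) ENGINE = MEMORY;

SELECT * FROM lookup WHERE id = 5;              -- exact match: the hash index is used
SELECT * FROM lookup WHERE id BETWEEN 1 AND 5;  -- range query: a hash index cannot help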

The InnoDB storage engine has a special feature called "adaptive hash index". When an index value is used very frequently, another hash index is created on top of the B+Tree index, so that the B+Tree index has some of the advantages of a hash index, such as fast hash lookup.

Full-text index

The MyISAM storage engine supports full-text indexing, which is used to find keywords in text, rather than directly comparing equality.

Queries use MATCH ... AGAINST instead of a normal WHERE comparison.

Full-text indexing is implemented using an inverted index, which records the mapping of keywords to the document in which they are located.

The InnoDB storage engine also supports full-text indexing, as of MySQL 5.6.4.
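A brief sketch, assuming a hypothetical articles table:

-- Works on MyISAM, and on InnoDB as of MySQL 5.6.4
CREATE TABLE articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(200),
    body TEXT,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE = InnoDB;

SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('database');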

Spatial data index

The MyISAM storage engine supports spatial data indexing (R-Tree), which can be used for geographic data storage. The spatial data index indexes the data from all dimensions and can effectively use any dimension for combined queries.

You must use GIS-related functions to maintain the data.
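A minimal sketch of a spatial index queried through GIS functions (the ST_* function names assume MySQL 5.7+):

CREATE TABLE geom (
    g GEOMETRY NOT NULL,
    SPATIAL INDEX idx_g (g)
) ENGINE = MyISAM;

INSERT INTO geom VALUES (ST_GeomFromText('POINT(1 1)'));

SELECT COUNT(*) FROM geom
WHERE MBRContains(ST_GeomFromText('POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'), g);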

Index optimization

Independent column

In a query, an index column cannot be part of an expression or an argument to a function; otherwise the index cannot be used.

For example, the following query cannot use the index of the actor_id column:

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;
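Moving the arithmetic to the constant side keeps the column independent, so the index can be used:

SELECT actor_id FROM sakila.actor WHERE actor_id = 4;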

Multi-column index

When you need to use multiple columns as criteria for a query, using multiple column indexes performs better than using multiple single column indexes. For example, in the following statement, it is best to set actor_id and film_id to multi-column indexes.

SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 AND film_id = 1;
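A sketch of creating such a multi-column index (the index name is illustrative):

ALTER TABLE sakila.film_actor
    ADD INDEX idx_actor_film (actor_id, film_id);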

Order of index columns

Put the most selective index column first.

Index selectivity is the ratio of distinct index values to the total number of records. Its maximum is 1, when every record has a unique index value. The higher the selectivity, the better each record is distinguished and the more efficient the query.

For example, in the results shown below, customer_id is more selective than staff_id, so it is best to put the customer_id column first in the multi-column index:

SELECT COUNT(DISTINCT staff_id) / COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
       COUNT(*)
FROM payment;

staff_id_selectivity: 0.0001
customer_id_selectivity: 0.0373
COUNT(*): 16049

Prefix index

For columns of type BLOB, TEXT, and VARCHAR, you must use a prefix index, indexing only the first portion of the characters.

The selection of prefix length needs to be determined according to index selectivity.
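A sketch of choosing a prefix length by comparing prefix selectivity with full-column selectivity (table and column names are illustrative):

-- Compare the selectivity of a 7-character prefix against the full column
SELECT COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS prefix7_selectivity,
       COUNT(DISTINCT city) / COUNT(*)          AS full_selectivity
FROM city_demo;

-- Once the prefix is selective enough, index only the first 7 characters
ALTER TABLE city_demo ADD KEY idx_city (city(7));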

Covering index

The index contains the values of all fields that need to be queried.

It has the following advantages:

Indexes are usually much smaller than the size of data rows, and read-only indexes can greatly reduce the amount of data access.

Some storage engines, such as MyISAM, only cache indexes in memory, while data is cached by the operating system. Therefore, accessing only the index can be done without using system calls (which are usually time-consuming).

For the InnoDB engine, if a secondary index can cover the query, there is no need to access the primary index.
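A sketch: with a composite index on (actor_id, film_id), the query below reads only the index, and EXPLAIN reports "Using index" in the Extra column:

SELECT actor_id, film_id
FROM sakila.film_actor
WHERE actor_id = 1;  -- all selected columns are in the index, so no back-to-table lookup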

Advantages of indexing

Greatly reduces the number of rows of data that the server needs to scan.

Help the server avoid sorting and grouping, and avoid creating temporary tables (B+Tree indexes are ordered and can be used for ORDER BY and GROUP BY; temporary tables are mainly created while sorting and grouping, so if no sorting or grouping is needed, no temporary table is needed).

Turn random I/O into sequential I/O (the index is ordered, so adjacent data is stored together).

Conditions for the use of indexes

For very small tables, simple full table scans are more efficient than indexing in most cases

For medium to large tables, the index is very efficient

But for very large tables, the cost of establishing and maintaining indexes will increase. In this case, you need to use a technique that can directly distinguish the set of data that needs to be queried, rather than matching record by record, for example, you can use partitioning techniques.

Why is a simple full table scan more efficient than indexing for very small tables in most cases?

If a table is small, it is obviously faster to traverse the table directly than to walk the index (because you need to go back to the table).

Note: first, the implicit assumption of this answer is that the queried data is not fully contained in the index, so a back-to-table lookup is needed. Second, the query condition is not the primary key; otherwise the data could be fetched directly from the clustered index.

Query performance optimization

Analyzing SELECT statements with EXPLAIN

EXPLAIN is used to analyze SELECT statements; developers can optimize queries by inspecting the EXPLAIN output.
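For example (the expected output assumes actor_id is the table's primary key):

EXPLAIN SELECT * FROM sakila.actor WHERE actor_id = 1;
-- expected: type = const, key = PRIMARY, rows = 1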

select_type

Common values include SIMPLE (simple query), UNION (union query), SUBQUERY (subquery), and so on.

table

The table to be queried

possible_keys

The candidate indexes that could be chosen

key

The index actually chosen and used

rows

The estimated number of rows to be examined

type

The index access type. Common values, from best to worst:

const: at most one row matches, queried via a primary key or unique index

ref: a non-unique index is used

range: a range query using a primary key, a single-column secondary index, or the last column of a multi-column secondary index

index: the same as ALL except that only the index tree is scanned

ALL: full table scan

System

Trigger condition: the table has only one row, which is a special case of const type

Const

Trigger condition: only one row matches when querying using a primary key or a unique index.

SELECT * FROM tbl_name WHERE primary_key = 1;

SELECT * FROM tbl_name
WHERE primary_key_part1 = 1 AND primary_key_part2 = 2;

Eq_ref

Trigger condition: in a join query, a primary key or unique index is used and only one row is matched.

SELECT * FROM ref_table, other_table
WHERE ref_table.key_column = other_table.column;

SELECT * FROM ref_table, other_table
WHERE ref_table.key_column_part1 = other_table.column
AND ref_table.key_column_part2 = 1;

Ref

Trigger condition: using a non-unique index

SELECT * FROM ref_table WHERE key_column = expr;

SELECT * FROM ref_table, other_table
WHERE ref_table.key_column = other_table.column;

SELECT * FROM ref_table, other_table
WHERE ref_table.key_column_part1 = other_table.column
AND ref_table.key_column_part2 = 1;

Range

Trigger condition: a range query using a primary key, a single-column secondary index, or the last column of a multi-column secondary index.

SELECT * FROM tbl_name
WHERE key_column = 10;

SELECT * FROM tbl_name
WHERE key_column BETWEEN 10 AND 20;

SELECT * FROM tbl_name
WHERE key_column IN (10, 20, 30);

SELECT * FROM tbl_name
WHERE key_part1 = 10 AND key_part2 IN (10, 20, 30);

Index

The index join type is the same as ALL, except that only the index tree is scanned.

Trigger condition: only the index tree is scanned, which occurs in two ways:

1) The queried fields are part of the index, so the index covers the query.

2) A full index scan is used for sorting, for example sorting by the primary key.

All

Trigger condition: full table scan, no index

Optimize data access

Reduce the amount of data requested

Return only the necessary columns: it is best not to use the SELECT * statement.

Return only the necessary rows: use the LIMIT statement to restrict the data returned.

Caching repeatedly queried data: using caching can avoid querying in the database, especially when the data to be queried is often repeatedly queried, the query performance improvement caused by caching will be very obvious.

Reduce the number of rows scanned on the server side

The most efficient way is to use indexes to override queries.

Restructuring queries

Splitting up large queries

If a large query is executed at once, it may lock a lot of data at once, occupy the entire transaction log, deplete system resources, and block many small but important queries.

For example, a single large DELETE can be cut into smaller chunks:

DELETE FROM messages WHERE create < DATE_SUB(NOW(), INTERVAL 3 MONTH);

rows_affected = 0
do {
    rows_affected = do_query(
        "DELETE FROM messages
         WHERE create < DATE_SUB(NOW(), INTERVAL 3 MONTH)
         LIMIT 10000")
} while rows_affected > 0

Decompose big join query

Decomposing a large join query into a single table query for each table, and then associating it in the application, has the following benefits:

Make caching more efficient. For join queries, if one of the tables changes, the entire query cache cannot be used. For multiple queries after decomposition, even if one of the tables changes, the query cache for other tables can still be used.

Decomposed into multiple single-table queries, the cached results of these single-table queries are more likely to be used by other queries, thus reducing the number of queries with redundant records.

Reduce lock competition

Connecting at the application layer makes it easier to split the database, making it easier to achieve high performance and scalability.

The query itself may also be more efficient. For example, in the following case, using IN() instead of a join lets MySQL query IDs in order, which may be more efficient than a random join.

SELECT * FROM tag

JOIN tag_post ON tag_post.tag_id=tag.id

JOIN post ON tag_post.post_id=post.id

WHERE tag.tag='mysql';

SELECT * FROM tag WHERE tag = 'mysql';
SELECT * FROM tag_post WHERE tag_id = 1234;
SELECT * FROM post WHERE post.id IN (123, 456, 567, 9098, 8904);

Transactions

A transaction is a set of operations that satisfies the ACID properties. A transaction can be committed with COMMIT or rolled back with ROLLBACK.

ACID

The most basic properties of a transaction are the four ACID properties:

Atomicity

Consistency

Isolation

Durability

Atomicity

A transaction is regarded as an indivisible minimum unit: either all of its operations succeed, or they all fail and are rolled back.

Consistency

The database remains in a consistent state before and after the transaction executes; in a consistent state, all transactions read the same value for the same data.

Isolation

Changes made by one transaction are not visible to other transactions until it finally commits.

Durability

Once a transaction commits, its changes are saved to the database permanently. Even if the system crashes, the results of a committed transaction cannot be lost.

The relationship between ACID

The ACID properties of a transaction are simple to state but hard to understand, mainly because they are not on the same level:

The result of the transaction is correct only if consistency is satisfied.

Without concurrency, transactions execute serially and isolation holds trivially; then satisfying atomicity is enough to guarantee consistency. With concurrency, multiple transactions execute in parallel, and they must satisfy not only atomicity but also isolation for consistency to hold.

Durability exists to cope with database crashes.

Isolation level

Uncommitted read (READ UNCOMMITTED)

Changes in a transaction are visible to other transactions, even if they are not committed.

Read committed (READ COMMITTED)

A transaction can only read changes made by committed transactions. In other words, changes made by one transaction are not visible to other transactions until it commits.

Repeatable read (REPEATABLE READ)

Ensure that the result of reading the same data multiple times in the same transaction is the same.

Serializable (SERIALIZABLE)

Forces the transaction to execute serially.

This level requires a locking implementation, while the other isolation levels usually do not.
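A quick sketch of inspecting and setting the isolation level (the variable is transaction_isolation as of MySQL 8.0, tx_isolation before):

SELECT @@transaction_isolation;

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;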

Lock

Lock is a key feature that distinguishes database system from file system. The locking mechanism is used to manage concurrent access to shared resources.

Lock type

Shared lock (S Lock)

Allows a transaction to read a row of data.

Exclusive lock (X Lock)

Allows a transaction to delete or update a row of data.

Intention shared lock (IS Lock)

The transaction intends to acquire shared locks on certain rows in a table.

Intention exclusive lock (IX Lock)

The transaction intends to acquire exclusive locks on certain rows in a table.

MVCC

Multi-Version Concurrency Control (MVCC) is how MySQL's InnoDB storage engine implements its isolation levels, specifically READ COMMITTED and REPEATABLE READ. The READ UNCOMMITTED level always reads the latest version of a row and does not use MVCC. SERIALIZABLE requires locking every row read, which MVCC alone cannot achieve.

Basic concept

Version number

System version number: is an increasing number, each time you start a new transaction, the system version number will be automatically incremented.

Transaction version number: the system version number at the beginning of the transaction.

Hidden column

MVCC stores two hidden columns after each row of records to store two version numbers:

Create version number: indicates the system version number when a snapshot of a data row is created

Delete version number: if the deleted version number of the snapshot is greater than the current transaction version number, the snapshot is valid, otherwise it has been deleted.

Undo log

The snapshots used by MVCC are stored in the undo log, which links all snapshots of a data row (record) via rollback pointers.

Implementation process

The following implementation process applies to the REPEATABLE READ isolation level.

It is critical to understand that when a transaction starts, its version number must be greater than the creation version number of every existing data-row snapshot. A snapshot's creation version number is the system version number at the moment the snapshot was created, and the system version number increases with each new transaction; a newly created transaction therefore has a larger system version number than all earlier ones, that is, larger than the creation version number of every existing snapshot.

SELECT

When multiple transactions read the same data row, they must read the same snapshot: the most recent valid one. There is one exception: a transaction that is modifying the row can read its own changes, without needing to match what other transactions read.

Call the reading transaction T. The creation version number of the data-row snapshot that T reads must be less than or equal to T's version number, because if it were greater, the snapshot would reflect a later modification by another transaction and must not be read by T. In addition, the snapshot's delete version number must be undefined or greater than T's version number, because if it were less than or equal to T's, the snapshot would already be deleted and should not be read.

INSERT

Use the current system version number as the creation version number of the data row snapshot.

DELETE

Use the current system version number as the deleted version number of the data row snapshot.

UPDATE

Use the current system version number as the deleted version number of the data row snapshot before the update, and the current system version number as the creation version number of the updated data row snapshot. It can be understood as executing DELETE first and then INSERT.

Snapshot read and current read

At the REPEATABLE READ level, the MVCC mechanism makes reads non-blocking, but the data we read may be historical data rather than the current data of the database. This can cause problems for businesses that are particularly sensitive to data freshness.

This way of reading historical data is called a snapshot read, while reading the current version of the data is called a current read. In MVCC:

Snapshot read

MVCC SELECT operations read data from the snapshot and require no locking.

SELECT * FROM table ...;

Current read

Operations that modify the database (INSERT, UPDATE, DELETE) perform a current read: they must lock in order to read the latest data. As you can see, it is not that MVCC needs no locking at all; it merely spares SELECT from locking.

INSERT; UPDATE; DELETE

When performing a SELECT, you can also explicitly request locking. The first statement below takes an S lock; the second takes an X lock.

SELECT * FROM table WHERE ? LOCK IN SHARE MODE;
SELECT * FROM table WHERE ? FOR UPDATE;

In fact, a transaction's isolation level defines the behavior of current reads. To reduce time spent on lock handling (including waiting for other locks) and improve concurrency, MySQL introduced snapshot reads so that a plain SELECT needs no lock, while the isolation of current reads such as UPDATE and INSERT is achieved through locking.

Locking algorithm

Record Lock

Lock the index on a record, not the record itself.

Even if the table has no index defined, InnoDB automatically creates a hidden clustered index, so Record Locks can still be used.

Gap Lock

Locks the gap between index entries, but not the entries themselves. For example, while one transaction executes the following statement, other transactions cannot insert 15 into column t.c.

SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;

Next-Key Lock

It is a combination of Record Locks and Gap Locks that locks not only the indexes on a record, but also the gaps between indexes. For example, if an index contains the following values: 10, 11, 13, and 20, then you need to lock the following range:

(-∞, 10], (10, 11], (11, 13], (13, 20], (20, +∞)

In the InnoDB storage engine, the non-repeatable read problem for SELECT is solved by MVCC, the problem for UPDATE and DELETE is solved by Record Locks, and the phantom reads caused by INSERT are solved by Next-Key Locks (Record Lock + Gap Lock).

Lock problem

Dirty reading

A dirty read means that one transaction can read another transaction's uncommitted data.

For example:

T1 modifies a piece of data, and T2 then reads it. If T1 rolls back the modification, the data T2 read is dirty.

Non-repeatable read

A non-repeatable read means reading the same data multiple times within one transaction and getting different results.

For example:

T2 reads a piece of data, and T1 modifies it. If T2 reads this data again, the result read at this time is different from that read for the first time.


Phantom Problem (Phantom Read)

The so-called phantom problem occurs within a transaction when the same query produces different sets of rows at different times. For example, if a SELECT is executed twice, but returns a row the second time that was not returned the first time, the row is a "phantom" row.

The phantom problem means that, within the same transaction, executing the same SQL statement twice in a row may return different results; the second execution may return rows that did not exist before.

A phantom read is a special kind of non-repeatable read.

Lost updates

The update operation of one transaction is overwritten by the update operation of another transaction.

For example:

Transactions T1 and T2 both modify the same data. T1 modifies it first, T2 modifies it afterwards, and T2's change overwrites T1's.

This type of problem can be solved by adding an exclusive lock to the SELECT operation, but this may introduce performance problems, depending on the business scenario.

Splitting databases and tables (sharding)

Horizontal sharding

Horizontal sharding, also known as Sharding, splits records from the same table into multiple tables with the same structure.

When a table's data keeps growing, sharding is the inevitable choice: it distributes the data across different nodes of a cluster, thus easing the pressure on a single database.

Vertical splitting

Vertical splitting divides a table into multiple tables by column, usually according to how closely the columns are related; frequently used columns and rarely used columns can also be split into different tables.

At the database level, vertical splitting deploys tables to different databases according to how closely the tables are related. For example, the original e-commerce database can be vertically split into a commodity database, a user database, and so on.

Sharding strategy

Hash modulo: hash(key) % N

Range: can be an ID range or a time range

Mapping tables: use a separate database to store mapping relationships
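A minimal sketch of the mapping-table strategy, with hypothetical table and column names:

-- Routing table: maps each user to the shard holding its data
CREATE TABLE shard_map (
    user_id  BIGINT PRIMARY KEY,
    shard_id INT NOT NULL
);

-- Look up the shard first, then send the real query to that node
SELECT shard_id FROM shard_map WHERE user_id = 42;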

Problems in Sharding

Transaction problem

Solved with distributed transactions, such as the XA interface.

Joins

A join can be decomposed into multiple single-table queries and then joined in the application program.

Uniqueness

Use globally unique ID (GUID)

Specify an ID range for each shard

Distributed ID generators (such as Twitter's Snowflake algorithm)

Replication

Master-slave replication

It mainly involves three threads: the binlog thread, the I/O thread, and the SQL thread.

Binlog thread: responsible for writing data changes on the master server to the binary log (binlog).

I/O thread: responsible for reading the binary log from the master server and writing it to the relay log on the slave server.

SQL thread: responsible for reading the relay log, parsing the data changes made by the master server, and replaying them on the slave server (Replay).
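A quick sketch of inspecting replication state (SHOW SLAVE STATUS was renamed SHOW REPLICA STATUS in MySQL 8.0.22+):

-- On the master: current binlog file and position
SHOW MASTER STATUS;

-- On the slave: I/O and SQL thread state, replication lag, etc.
SHOW SLAVE STATUS\G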

Read-write splitting

The master server handles the write operation and the real-time read operation, while the slave server handles the read operation.

The reasons why read-write separation can improve performance are:

The master and slave servers are responsible for reading and writing respectively, which greatly alleviates the lock contention.

The slave servers can use MyISAM to improve query performance and save system overhead.

Increase redundancy and improve availability.

Read-write separation is often implemented by proxy, in which the proxy server receives the read-write request from the application layer and then decides which server to forward to.

JSON

The JSON data type is often used in real business, and there are two main requirements in the query process:

In the WHERE condition, filter the returned rows by a field inside the JSON column.

Return only some fields of the JSON column in the result set (to reduce memory footprint).

JSON_CONTAINS

JSON_CONTAINS(target, candidate[, path])

Returns 1 if the candidate value is found at the specified path within the JSON document target, otherwise 0.

If you only want to check whether data exists at the specified path, use JSON_CONTAINS_PATH() instead.

mysql> SET @j = '{"a": 1, "b": 2, "c": {"d": 4}}';
mysql> SET @j2 = '1';
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.a');   -- 1
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.b');   -- 0
mysql> SET @j2 = '{"d": 4}';
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.a');   -- 0
mysql> SELECT JSON_CONTAINS(@j, @j2, '$.c');   -- 1

JSON_CONTAINS_PATH

JSON_CONTAINS_PATH(json_doc, one_or_all, path[, path] ...)

Returns 1 if there is data in the specified path, otherwise 0

mysql> SET @j = '{"a": 1, "b": 2, "c": {"d": 4}}';
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.a', '$.e');   -- 1
mysql> SELECT JSON_CONTAINS_PATH(@j, 'all', '$.a', '$.e');   -- 0
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.c.d');        -- 1
mysql> SELECT JSON_CONTAINS_PATH(@j, 'one', '$.a.d');        -- 0

Actual use:

$conds = new Criteria();
$conds->andWhere('dept_code', 'in', $deptCodes);
if (!empty($aoiAreaId)) {
    $aoiAreaIdCond = new Criteria();
    $aoiAreaIdCond->orWhere("JSON_CONTAINS_PATH(new_aoi_area_ids, 'one', '$.\"$aoiAreaId\"')", '=', 1);
    $aoiAreaIdCond->orWhere("JSON_CONTAINS_PATH(old_aoi_area_ids, 'one', '$.\"$aoiAreaId\"')", '=', 1);
    $conds->andWhere($aoiAreaIdCond);
}

column->path, column->>path

Gets the value of the specified path

-> vs ->>

Whereas the -> operator simply extracts a value, the ->> operator additionally unquotes the extracted result.

mysql> SELECT * FROM jemp WHERE g > 2;
-- c: {"id": "3", "name": "Barney"}, g: 3
-- c: {"id": "4", "name": "Betty"},  g: 4

mysql> SELECT c->'$.name' AS name FROM jemp WHERE g > 2;
-- "Barney", "Betty" (still quoted)

mysql> SELECT JSON_UNQUOTE(c->'$.name') AS name FROM jemp WHERE g > 2;
-- Barney, Betty

mysql> SELECT c->>'$.name' AS name FROM jemp WHERE g > 2;
-- Barney, Betty

Actual use:

$retTask = AoiAreaTaskOrm::findRows(
    ['status',
     'extra_info->>"$.new_aoi_area_infos" as new_aoi_area_infos',
     'extra_info->>"$.old_aoi_area_infos" as old_aoi_area_infos'],
    $cond
);

Relational database design theory

Functional dependency

We write A -> B to mean that A functionally determines B; equivalently, B is functionally dependent on A.

If {A1, A2, ..., An} is a set of one or more attributes of a relation that functionally determines all the other attributes of the relation and is minimal, the set is called a key.

For A -> B, if there exists a proper subset A' of A such that A' -> B, then A -> B is a partial functional dependency; otherwise it is a full functional dependency.

If A -> B and B -> C, then A -> C is a transitive functional dependency.

Anomalies

The student-course relation below has the functional dependency {Sno, Cname} -> {Sname, Sdept, Mname, Grade}, so the key is {Sno, Cname}. In other words, once the student and course are determined, the other information is determined.

A relation that does not conform to the normal forms produces many anomalies, mainly the following four:

Redundant data: for example, Student-2 appears twice.

Modification anomaly: the information in one record is changed, but the same information in another record is not.

Deletion anomaly: deleting one piece of information loses other information as well. For example, deleting Course-1 requires deleting the first and third rows, so Student-1's information is lost.

Insertion anomaly: for example, a student's information cannot be inserted if the student has not yet chosen any course.

Normal forms

The purpose of normal-form theory is to eliminate the four anomalies above.

Higher normal forms depend on lower normal forms, and 1NF is the lowest normal form.

First normal form (1NF)

Attributes are inseparable.

Second normal form (2NF)

Every non-prime attribute is fully functionally dependent on the key.

It can be satisfied by decomposition.

Before decomposition

In the student-course relation above, {Sno, Cname} is the key, with the following functional dependencies:

Sno -> Sname, Sdept

Sdept -> Mname

Sno, Cname -> Grade

Grade is fully functionally dependent on the key; it carries no redundant data, since each student has one specific grade for each course.

Sname, Sdept, and Mname are all partially dependent on the key. When a student takes multiple courses, these values appear multiple times, causing a large amount of redundancy.

After decomposition

Relationship-1

There are the following functional dependencies:

Sno -> Sname, Sdept

Sdept -> Mname

Relationship-2

There are the following functional dependencies:

Sno, Cname -> Grade

Third normal form (3NF)

No non-prime attribute is transitively dependent on the key.

Relationship-1 above contains the following transitive functional dependency:

Sno -> Sdept -> Mname

The following decomposition can be performed:

Relationship-11

Relationship-12
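A hedged DDL sketch of the final 3NF decomposition, assuming the attribute names used above:

-- Relationship-11: student information
CREATE TABLE relation_11 (
    Sno   INT PRIMARY KEY,
    Sname VARCHAR(50),
    Sdept VARCHAR(50)
);

-- Relationship-12: department information (removes the Sno -> Sdept -> Mname transitivity)
CREATE TABLE relation_12 (
    Sdept VARCHAR(50) PRIMARY KEY,
    Mname VARCHAR(50)
);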

ER diagram

An Entity-Relationship diagram has three components: entities, attributes, and relationships.

It is used for the conceptual design of relational database systems.

Three kinds of relationships between entities

They include one-to-one, one-to-many, and many-to-many.

If A to B is a one-to-many relationship, draw a line segment with an arrow pointing to B.

If it is one-to-one, draw two line segments, each with an arrow.

If it is many-to-many, draw two line segments without arrows.

The figure below shows a one-to-many relationship between Course and Student.

Representing a relationship that occurs multiple times

When an entity appears in a relationship multiple times, multiple lines are needed to connect them.

The figure below shows the prerequisite relationship among courses. Two Course entities take part in the prerequisite relationship: the first is the prerequisite course and the second is the follow-up course, so two lines are needed to represent it.

Multiplicity of a relationship

Although a teacher can offer multiple courses and teach multiple students, for a specific student and a specific course there is only one teacher, so teacher, student, and course form a ternary relationship.

Representing a subclass

A triangle and two lines connect a class and its subclass. Attributes and relationships specific to the subclass are attached to the subclass, while those shared by the parent class and the subclass are attached to the parent class.

At this point, you should have a deeper understanding of these database knowledge points. Now go and try them out in practice!
