

If you don't read this 20,000-word set of database interview questions, you will absolutely regret it.

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

I. Basic concepts

1. Primary key, foreign key, superkey, candidate key

Superkey: in a relational schema, a set of attributes that uniquely identifies a tuple in a relation is called a superkey. A single attribute can serve as a superkey, and so can a combination of attributes. Superkeys include candidate keys and primary keys.

Candidate key: a minimal superkey, that is, a superkey with no redundant attributes.

Primary key: a column or combination of columns that uniquely and completely identifies a stored row in a database table. A table can have only one primary key, and a primary key value cannot be missing, that is, it cannot be NULL.

Foreign key: a column in one table that references the primary key of another table is called a foreign key of that table.

2. Why use an auto-increment column as the primary key?

If we define a primary key (PRIMARY KEY), InnoDB uses it as the clustered index.

If no primary key is explicitly defined, InnoDB selects the first UNIQUE index that contains no NULL values as the clustered index.

If no such unique index exists, InnoDB uses a built-in 6-byte ROWID as the implicit clustered index (this ROWID increases monotonically as rows are written; unlike Oracle's ROWID, it cannot be referenced and remains implicit).

The data records themselves are stored on the leaf nodes of the primary index (a B+Tree). This requires that the records within one leaf node (the size of a memory page or disk page) be stored in primary-key order, so each time a new record is inserted, MySQL places it into the appropriate node and position according to its primary key; when a page reaches its fill factor (InnoDB's default is 15/16), a new page (node) is allocated.

If the table uses an auto-increment primary key, each newly inserted record is appended after the current last position in the index, and when a page fills up, a new page is opened automatically.

If you use a non-sequential primary key (such as an ID-card number or a student number), each inserted primary-key value is approximately random, so every new record must be inserted somewhere into the middle of the existing index pages. MySQL has to move data to place the new record in the right position; the target page may even have been flushed to disk and evicted from the cache, requiring it to be read back from disk. This adds a lot of overhead, and the frequent moves and page splits produce a large amount of fragmentation, leaving the index structure insufficiently compact. Afterwards, OPTIMIZE TABLE has to be used to rebuild the table and optimize page fill.
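As a sketch of the point above (all table and column names here are hypothetical), an auto-increment surrogate key keeps clustered-index inserts append-only, while a naturally random key such as an ID-card number is better kept in a secondary index:

```sql
-- Hypothetical schema: the AUTO_INCREMENT key makes the clustered
-- index append-only; the random-valued id_card goes into a secondary
-- index, so its randomness does not fragment the clustered index.
CREATE TABLE user_good (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    id_card CHAR(18)        NOT NULL,
    name    VARCHAR(64)     NOT NULL,
    PRIMARY KEY (id),                -- sequential: new rows append to the last page
    UNIQUE KEY uk_id_card (id_card)  -- random values live in a secondary index instead
) ENGINE=InnoDB;
```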

3. What is the function of the trigger?

A trigger is a special stored procedure that is executed when an event fires. It can enforce constraints to maintain data integrity and consistency, and it can track operations inside the database to prevent unauthorized updates and changes. Triggers can cascade: for example, a trigger on one table performs a data operation on another table, which in turn fires that table's trigger.
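A minimal sketch of such a trigger (the orders and orders_log tables are hypothetical): deleting an order automatically writes an audit row, which is the kind of enforced consistency described above.

```sql
-- Hypothetical audit setup: every delete on orders is logged.
CREATE TABLE orders     (id INT PRIMARY KEY, amount DECIMAL(10,2));
CREATE TABLE orders_log (order_id INT, deleted_at DATETIME);

CREATE TRIGGER trg_orders_ad
AFTER DELETE ON orders
FOR EACH ROW
INSERT INTO orders_log (order_id, deleted_at) VALUES (OLD.id, NOW());
```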

4. What is a stored procedure? What does it use to call?

A stored procedure is a precompiled set of SQL statements. Its advantage is that it allows modular design: it is created once and can then be called many times from programs. When an operation requires executing multiple SQL statements, using a stored procedure is faster than issuing the individual SQL statements.

Call:

1) you can use a command object to call a stored procedure.

2) it can be called by external programs, such as java programs.
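A minimal sketch of both points, assuming a hypothetical users table; the procedure is created once and then invoked by name with CALL (an external Java program would use a CallableStatement the same way):

```sql
-- Hypothetical procedure: create once, CALL many times.
DELIMITER //
CREATE PROCEDURE get_user_count (OUT cnt INT)
BEGIN
    SELECT COUNT(*) INTO cnt FROM users;  -- assumes a users table exists
END //
DELIMITER ;

CALL get_user_count(@n);   -- called by name from a client or program
SELECT @n;                 -- read the OUT parameter
```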

5. What are the advantages and disadvantages of stored procedures?

Advantages:

1) Stored procedures are precompiled and execute efficiently.

2) Stored procedure code is stored directly in the database and is invoked by name, which reduces network traffic.

3) Security is high: executing a stored procedure requires the appropriate privileges.

4) Stored procedures can be reused, reducing the workload of database developers.

Disadvantages: poor portability

6. The difference between stored procedures and functions

A function must return exactly one value and can be used inside a SQL expression; a stored procedure need not return a value, can use OUT parameters, and is invoked with CALL.

7. What is a view? What is the cursor?

View:

A view is a virtual table that behaves like a physical table. A view can be queried and, within limits, inserted into and updated; it is usually a subset of the rows or columns of one or more tables. Changes made through a view affect the underlying base tables. Views make it easier to get data than writing multi-table queries each time.

Cursor:

A cursor treats a query result set as a unit to be processed row by row. A cursor can be positioned on a specific row of the result set and retrieve one or more rows starting from the current row; it can also modify the current row. Cursors are rarely needed, but they are essential when data must be processed one row at a time.
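Both concepts can be sketched as follows (table, column, and routine names are hypothetical); the view packages a join for reuse, and the cursor walks a result set one row at a time inside a stored routine:

```sql
-- View: a stored query that behaves like a table.
CREATE VIEW v_big_orders AS
SELECT o.id, c.name, o.amount
FROM   orders o JOIN customers c ON c.id = o.customer_id
WHERE  o.amount > 1000;

SELECT * FROM v_big_orders;   -- simpler than repeating the join

-- Cursor: row-by-row processing inside a stored routine.
DELIMITER //
CREATE PROCEDURE walk_orders ()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_id INT;
    DECLARE cur CURSOR FOR SELECT id FROM orders;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_id;
        IF done = 1 THEN LEAVE read_loop; END IF;
        -- process one row (v_id) here
    END LOOP;
    CLOSE cur;
END //
DELIMITER ;
```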

8. Advantages and disadvantages of views

Advantages:

1) Views restrict access to the database, because a view can selectively expose only part of the data.

2) users can get results from complex queries through simple queries.

3) Views maintain the independence of the data while still allowing data to be retrieved from multiple tables.

4) different views can be generated for the same data.

Disadvantages:

Performance: a query against a view must be translated into a query against the base tables, and if the view is defined by a complex multi-table query, that translation is costly. In addition, when a view is defined by a complex multi-table query, its data may not be modifiable through the view.

9. Differences among DROP, TRUNCATE and DELETE

The most basic:

DROP deletes the table itself.

TRUNCATE deletes all data from the table; the auto-increment id restarts from 1 on the next insert.

DELETE deletes data from the table, optionally restricted with a WHERE clause.

(1) A DELETE statement removes one row at a time, and each deletion is recorded in the log as part of a transaction so that it can be rolled back. TRUNCATE TABLE, by contrast, removes all data from the table at once without logging individual row deletions, and the deleted rows cannot be recovered. TRUNCATE does not fire the table's DELETE triggers, and it executes quickly.

(2) Space occupied by tables and indexes: after a table is TRUNCATEd, the space used by the table and its indexes is restored to the initial size, while DELETE does not shrink the space occupied by the table or its indexes. DROP frees all space occupied by the table.

(3) Speed: generally speaking, DROP > TRUNCATE > DELETE.

(4) Scope of application: TRUNCATE applies only to tables; DELETE can apply to tables and views.

(5) TRUNCATE and DELETE delete only data, while DROP deletes the entire table (structure and data).

(6) TRUNCATE, and DELETE without a WHERE clause, remove only the data, not the table structure (definition). DROP removes the table structure along with its dependent constraints, triggers, and indexes; stored procedures and functions that depend on the table are kept, but their status becomes invalid.

(7) DELETE is DML (Data Manipulation Language); the operation goes into the rollback segment and takes effect only when the transaction commits. If a corresponding trigger exists, it fires during execution.

(8) TRUNCATE and DROP are DDL (Data Definition Language); they take effect immediately, the original data is not placed in the rollback segment, and they cannot be rolled back.

(9) Use DROP and TRUNCATE with care when there is no backup. To delete some rows, use DELETE and constrain the affected range with a WHERE clause; the rollback segment should be large enough. To remove a table, use DROP. To keep the table but delete its data, use TRUNCATE if no transaction or trigger is involved; if the operation is tied to a transaction, or you want triggers to fire, use DELETE.

(10) TRUNCATE TABLE is fast and efficient because: TRUNCATE TABLE is functionally the same as a DELETE statement without a WHERE clause, in that both delete all rows in the table. However, TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction-log resources. DELETE removes one row at a time and writes a transaction-log entry for each deleted row; TRUNCATE TABLE deletes the data by deallocating the data pages used to store it, logging only the page deallocations.

(11) TRUNCATE TABLE deletes all rows in the table, but the table structure and its columns, constraints, and indexes remain unchanged. The counter used for new identity values is reset to the column's seed. To keep the identity counter, use DELETE instead. To remove the table definition along with its data, use DROP TABLE.

(12) For tables referenced by a FOREIGN KEY constraint, use a DELETE statement without a WHERE clause instead of TRUNCATE TABLE. Because TRUNCATE TABLE is not logged row by row, it cannot activate triggers.
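The three statements above, sketched side by side on a hypothetical table t:

```sql
-- Hypothetical table t demonstrating the three statements:
DELETE FROM t WHERE id > 100;  -- DML: row-by-row, logged, triggers fire, can roll back
TRUNCATE TABLE t;              -- DDL: deallocates pages, resets AUTO_INCREMENT, no rollback
DROP TABLE t;                  -- DDL: removes the data AND the table definition
```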

10. What is a temporary table and when will it be deleted?

Temporary tables can be deleted manually:

DROP TEMPORARY TABLE IF EXISTS temp_tb;

Temporary tables are visible only in the current connection, and when you close the connection, MySQL automatically deletes the table and frees up all space. Therefore, a temporary table with the same name can be created in different connections, and the operation belongs to the temporary table of this connection.

The syntax for creating a temporary table is similar to that for creating a table, except that the keyword TEMPORARY is added.

Such as:

CREATE TEMPORARY TABLE tmp_table (

name VARCHAR(10) NOT NULL,

time DATE NOT NULL

);

SELECT * FROM tmp_table;

11. What is the difference and advantage comparison between non-relational database and relational database?

Advantages of non-relational databases:

Performance: NoSQL stores key-value pairs, which can be thought of as the mapping between a primary key and a value in a table, and queries do not need to pass through a SQL parsing layer, so performance is very high.

Scalability: it is also very easy to scale horizontally because there is no coupling between data based on key-value pairs.

Advantages of relational databases:

Complex query: you can easily use SQL statements to do very complex data queries between one table and multiple tables.

Transaction support: data access requirements with high security performance can be realized.

Other:

1. For these two types of databases, each other's advantage is its own weakness, and vice versa.

2.NOSQL databases are slowly beginning to have some of the complex query capabilities of SQL databases, such as MongoDB.

3. Transaction-like behavior can also be approximated with system-level atomic operations such as optimistic locking, for example Redis's SET NX.

12. Database normal forms; how would you design data tables for a given scenario?

First normal form (ensure each column remains atomic): all field values are indivisible atomic values.

The first normal form is the most basic one. If all the field values in a database table are indivisible atomic values, the table satisfies first normal form.

How strictly to follow first normal form should be decided by the actual needs of the system. For example, some database systems need an "address" attribute, and "address" can be designed directly as a single field of a table. But if the system often accesses the "city" part of "address", then "address" should be split into province, city, detailed address, and other parts for storage; this makes it very convenient to operate on one part of the address. Only such a design satisfies the first normal form for that use.

User information stored this way follows the requirements of first normal form, which makes it very convenient to classify users by city and improves database performance.

Second normal form (ensure every column in the table depends on the entire primary key): a database table should store only one kind of entity; data about several kinds of entities cannot be stored in the same table.

The second normal form goes a step beyond the first. It requires that every column of a database table depend on the entire primary key, not just on part of it (this mainly concerns composite primary keys). In other words, a table should store only one kind of data; multiple kinds of data cannot be stored in the same table.

For example, to design an order information table, because there may be multiple items in the order, the order number and item number should be used as the joint primary key of the database table.

Third normal form (ensure each column depends directly on the primary key, not transitively): every column in a table must depend directly on the primary key, not indirectly through another non-key column.

The third paradigm needs to ensure that each column of data in the data table is directly related to the primary key, not indirectly.

For example, when designing an order data table, the customer number can be used as a foreign key to establish a corresponding relationship with the order table. You cannot add fields about other customer information (such as name, company, etc.) to the order table.
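The order example above can be sketched as a schema (column names are hypothetical); order_items depends on the whole composite key (2NF), and orders keeps only the customer foreign key rather than copying customer details (3NF):

```sql
-- 3NF: customer details live only in customers.
CREATE TABLE customers (
    id      INT PRIMARY KEY,
    name    VARCHAR(64),
    company VARCHAR(64)
);
CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT,                 -- FK, not customer name/company (3NF)
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
-- 2NF: item data depends on the whole composite key (order_id, item_id).
CREATE TABLE order_items (
    order_id INT,
    item_id  INT,
    quantity INT,
    PRIMARY KEY (order_id, item_id)  -- composite primary key
);
```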

BCNF: satisfies 3NF, and in addition no prime attribute depends partially or transitively on any candidate key.

If a relational schema is in second normal form and no attribute is transitively dependent on any candidate key, then R is in BCNF.

In general, BCNF has several equivalent formulations: the left side of every nontrivial functional dependency must contain a candidate key; every determinant must contain a candidate key.

BCNF checks both prime and non-prime attributes. When only non-prime attributes are checked, the condition reduces to third normal form. Any relation that satisfies BCNF necessarily satisfies third normal form.

Equivalently, if a relation is in third normal form and has only one candidate key, or every candidate key is a single attribute, then it is automatically in BCNF.

In general, a database design conforms to 3NF or BCNF.

Fourth normal form: requires removing multivalued dependencies, that is, independent many-to-many relationships stored in the same table.

Fifth normal form: the original relation can be losslessly reconstructed by joining its decomposed parts.

13. What are inner connections, outer connections, cross connections, Cartesian products, etc.?

Inner join: returns only the rows where the join condition matches in both tables. Outer join (LEFT, RIGHT, or, in some databases, FULL): also keeps unmatched rows from one or both tables, padding the missing side with NULL. Cross join: pairs every row of one table with every row of the other, producing the Cartesian product; an inner join can be seen as a cross join filtered by the join condition.
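The join types asked about in this question can be sketched as follows (tables a and b are hypothetical):

```sql
SELECT * FROM a INNER JOIN b ON a.id = b.a_id;  -- inner join: only matching rows
SELECT * FROM a LEFT  JOIN b ON a.id = b.a_id;  -- outer join: keeps unmatched rows from a
SELECT * FROM a CROSS JOIN b;                   -- cross join: Cartesian product, |a| x |b| rows
```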

A related topic is the four categories of SQL statements:

Data query language DQL

Data manipulation language DML

Data definition language DDL

Data control language DCL.

1. Data query language DQL

The basic structure of the data query language DQL is a query block composed of a SELECT clause, a FROM clause, and a WHERE clause:

SELECT <column list>

FROM <table or view>

WHERE <condition>

2. Data manipulation language DML

There are three main forms of data manipulation language DML:

1) insert: INSERT

2) Update: UPDATE

3) Delete: DELETE

3. Data definition language DDL

The data definition language DDL is used to create various objects in the database-tables, views, indexes, synonyms, clustering, etc.

CREATE TABLE/VIEW/INDEX/SYN/CLUSTER

Table view index synonym cluster

DDL operations are implicitly committed and cannot be rolled back!

4. Data control language DCL

The data control language DCL is used to grant or revoke privileges to access the database, to control the timing and effects of database transactions, to monitor the database, and so on. For example:

1) GRANT: authorization.

2) ROLLBACK [WORK] [TO SAVEPOINT]: roll back to a certain point. The ROLLBACK command returns the database to the state of the last commit. Its format is:

SQL> ROLLBACK;

3) COMMIT [WORK]: commit.

During insert, delete, and update operations, the changes become permanent only when the transaction is committed. Before the commit, only the user performing the operations can see the changes; others can see them only after the commit completes.

There are three kinds of commit: explicit commit, implicit commit, and automatic commit, described below.

(1) Explicit commit

A commit that is done directly with the COMMIT command is an explicit commit. Its format is:

SQL> COMMIT;

(2) Implicit commit

A commit completed indirectly by certain SQL commands is an implicit commit. These commands are:

ALTER, AUDIT, COMMENT, CONNECT, CREATE, DISCONNECT, DROP, EXIT, GRANT, NOAUDIT, QUIT, REVOKE, RENAME.

(3) Automatic commit

If AUTOCOMMIT is set to ON, the system commits automatically after each insert, update, or delete statement executes. This is called automatic commit. Its format is:

SQL> SET AUTOCOMMIT ON;
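The commit and rollback commands above can be sketched in one session (table t is hypothetical):

```sql
SET AUTOCOMMIT = 0;          -- turn off automatic commit
INSERT INTO t VALUES (1);    -- change is visible only to this session for now
SAVEPOINT sp1;
DELETE FROM t;
ROLLBACK TO SAVEPOINT sp1;   -- undo the DELETE, keep the INSERT
COMMIT;                      -- explicit commit makes the INSERT permanent
```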

Reference article:

https://www.cnblogs.com/study-s/p/5287529.html

16. The difference between LIKE '%' and '_'

Classification of wildcards:

% percent wildcard: matches any sequence of characters, of any length (including zero characters).

_ underscore wildcard: matches exactly one character, no more and no less.

Like operator:

% wildcard use: match records that start with "yves" (including the record "yves" itself), for example: SELECT * FROM products WHERE prod_name LIKE 'yves%';

_ wildcard use: match exactly one unknown character, for example: SELECT * FROM products WHERE prod_name LIKE 'yves_';

Pay attention to case. When doing fuzzy matching against text, MySQL may or may not be case-sensitive, depending on how it is configured (its collation). If matching is case-sensitive, a record like "YvesHe" cannot be matched by a pattern such as "yves__".

Pay attention to spaces. "% yves" (with a space before "yves") does not match records such as "heyves", because the space in the pattern must be matched literally.

Pay attention to NULL. The % wildcard can match any characters, but not NULL: SELECT * FROM products WHERE prod_name LIKE '%'; does not match rows whose prod_name is NULL.
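The wildcard rules above, sketched against the products example (the table is assumed from the text):

```sql
SELECT * FROM products WHERE prod_name LIKE 'yves%';  -- starts with yves (matches 'yves' itself)
SELECT * FROM products WHERE prod_name LIKE '_ves';   -- exactly one leading character, then 'ves'
SELECT * FROM products WHERE prod_name LIKE '%';      -- any value, but NOT rows where prod_name IS NULL
SELECT * FROM products WHERE prod_name IS NULL;       -- NULL must be tested explicitly
```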

Tips and suggestions:

As you can see, MySQL wildcards are useful. But this feature comes at a price: wildcard searches generally take longer to process than other searches discussed earlier. Here are some tips to remember when using wildcards.

Don't overuse wildcards. If other operators can achieve the same purpose, other operators should be used.

When you really need to use wildcards, don't use them at the beginning of the search pattern unless absolutely necessary. Put wildcards at the beginning of the search pattern, and the search is the slowest.

Pay close attention to the position of wildcards. If they are misplaced, the desired data may not be returned.

Reference blog post: https://blog.csdn.net/u011479200/article/details/78513632

17. The difference between COUNT(*), COUNT(1) and COUNT(column)

COUNT(*) counts the number of rows, including rows containing NULL values.

COUNT(column) counts the number of rows where that particular column has a value; NULL values are not counted.

There is also COUNT(1), which behaves the same as COUNT(*).

Performance issues:

1. SELECT COUNT(*) FROM tablename is the best choice in the unrestricted case.

2. Minimize queries such as SELECT COUNT(*) FROM tablename WHERE COL = 'value'.

3. Avoid SELECT COUNT(COL) FROM tablename WHERE COL2 = 'value' altogether.

If the table has no primary key, COUNT(1) is faster than COUNT(*).

If there is a primary key, COUNT(primary key) is faster than COUNT(*).

If the table has only one field, COUNT(*) is fastest.

COUNT(1), like COUNT(primary key), scans only the primary key; COUNT(*), like COUNT(non-primary-key column), scans the whole table, so the former is faster. (These are legacy rules of thumb; in current InnoDB versions the optimizer executes COUNT(*) and COUNT(1) identically.)
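The NULL-handling difference above can be sketched with a hypothetical table t whose column col holds the values 'a', NULL, 'b':

```sql
-- t(col) contains: 'a', NULL, 'b'
SELECT COUNT(*)   FROM t;   -- 3: counts rows, including the NULL row
SELECT COUNT(1)   FROM t;   -- 3: identical to COUNT(*)
SELECT COUNT(col) FROM t;   -- 2: NULL values are skipped
```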

18. Leftmost prefix principle

Multi-column index:

ALTER TABLE people ADD INDEX lname_fname_age (lname, fname, age);

To improve search efficiency we consider a multi-column index; because index files are stored in B-Tree order, a query that matches the index can locate its results without scanning unrelated records.

Note: when MySQL executes a query, it generally uses only one index per table. If we built separate single-column indexes on lname, fname, and age, MySQL would pick the most selective one (the one producing the smallest result set) for the query.

Leftmost prefix principle: as the name implies, it is the leftmost first. In the above example, we created a lname_fname_age multi-column index, which is equivalent to creating (lname) single-column index, (lname,fname) composite index and (lname,fname,age) composite index.
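The leftmost-prefix rule above can be sketched with queries against the lname_fname_age index (values are hypothetical):

```sql
-- With INDEX lname_fname_age (lname, fname, age):
SELECT * FROM people WHERE lname = 'He';                                  -- uses the index (leftmost column)
SELECT * FROM people WHERE lname = 'He' AND fname = 'Yves';               -- uses the index
SELECT * FROM people WHERE lname = 'He' AND fname = 'Yves' AND age = 30;  -- uses the index
SELECT * FROM people WHERE fname = 'Yves';                                -- cannot use it: skips lname
```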

II. Indexes

1. What is an index?

What is an index:

A database index is a sorted data structure in the database management system; a B-tree or its variant the B+ tree is usually used to implement it.

In addition to the data, the database system also maintains data structures that meet specific search algorithms, which refer to (point to) the data in some way, so that advanced search algorithms can be implemented on these data structures. This kind of data structure is the index.

2. What is the function of an index? What are its advantages and disadvantages?

The function of index:

Indexes help to quickly query and update the data in database tables.

There is a price to pay for setting an index on a table:

First, the storage space of the database is increased.

Second, it takes more time to insert and modify data (because the index changes accordingly).

3. What are the advantages and disadvantages of the index?

Creating an index can greatly improve the performance (benefits) of the system:

1. By creating a uniqueness index, you can ensure the uniqueness of each row of data in the database table.

2. It can greatly speed up data retrieval, which is the main reason for creating an index.

3. It can speed up joins between tables, which is particularly meaningful for enforcing referential integrity of the data.

4. When using grouping and sorting clauses for data retrieval, you can also significantly reduce the time of grouping and sorting in a query.

5. By using indexes, the query optimizer can be employed during query processing, improving system performance.

Adding an index also has many disadvantages (disadvantages):

1. Creating and maintaining an index takes time, and this time grows with the amount of data.

2. An index occupies physical space: besides the space taken by the data table itself, each index takes a certain amount of space, and a clustered index requires even more.

3. When the data in the table is added, deleted and modified, the index should also be maintained dynamically, which reduces the speed of data maintenance.

4. Which columns are suitable for indexing and which are not?

The index is built on top of some columns in the database table. When creating an index, you should consider on which columns you can create an index and on which columns you cannot.

In general, indexes should be created on these columns:

(1) you can speed up the search on the columns that often need to be searched.

(2) on the column as the primary key, force the uniqueness of the column and organize the arrangement structure of the data in the table.

(3) on the columns that are often used in joins, these columns are mainly foreign keys, which can speed up the join.

(4) create an index on a column that often needs to search by range, because the index is sorted and its specified range is continuous.

(5) create an index on columns that often need to be sorted, because the index is already sorted, so that the query can make use of the sorting of the index to speed up the sorting query time.

(6) create an index on the columns that are often used in the WHERE clause to speed up the judgment of conditions.

Indexes should not be created for some columns:

(1) Indexes should not be created for columns that are rarely used or referenced in queries.

This is because, since these columns are rarely used, indexing or no indexing does not improve query speed. On the contrary, due to the increase of the index, the maintenance speed of the system is reduced and the space requirement is increased.

(2) the index should not be added to those columns that have only a few data values.

This is because these columns have very few distinct values, such as the gender column of a personnel table: the rows matching a query form a large proportion of the rows in the table, so adding an index does not significantly speed up retrieval.

(3) columns defined as text, image, and bit data types should not be indexed.

This is because these columns either have a large amount of data or have very few values.

(4) when the modification performance is much greater than the retrieval performance, the index should not be created.

This is because modification performance and retrieval performance contradict each other. When the index is added, the retrieval performance is improved, but the modification performance is reduced. When the index is reduced, the modification performance is improved and the retrieval performance is reduced. Therefore, indexes should not be created when modification performance is much greater than retrieval performance.

5. What kind of fields are suitable for indexing?

Unique, non-empty, frequently queried fields

6. What is the difference between MySQL B+Tree indexes and Hash indexes?

The characteristics of Hash index and B+ tree index:

Because of the particularity of Hash index structure, its retrieval efficiency is very high, and the index retrieval can be located at once.

A B+ tree index must descend from the root node through the branch nodes before reaching the leaf page, requiring multiple I/O accesses.

Why not all use Hash indexes but use B+ tree indexes?

A Hash index can only serve "=", "IN" and "<=>" queries; range queries cannot use it, because the ordering of the hash values produced by the hash function is not guaranteed to match the ordering of the keys before hashing.

Hash indexes cannot be used to avoid sorting data, because the size of the hash value is not necessarily the same as the key value before the Hash operation.

Hash indexes cannot use partial index keys to query. For combined indexes, when calculating hash values, Hash indexes calculate hash values together after combining index keys, rather than calculating hash values separately, so when querying through one or more index keys of the combined index, the Hash index cannot be utilized.

A Hash index cannot always avoid table lookups. Because different index keys can produce the same hash value, even after obtaining the records matching a given hash value, the query must still go back to the table to fetch and verify the actual rows.

Hash indexes do not necessarily perform better than B+ tree indexes when they encounter a large number of equal hash values.

Add:

1. In MySQL, only the HEAP/MEMORY engine explicitly supports Hash indexes.

2. The commonly used InnoDB engine uses B-tree indexes by default and monitors index usage on tables in real time. If InnoDB decides that building a hash index would improve query efficiency, it automatically builds one in the in-memory "adaptive hash index" buffer (adaptive hash indexing is enabled by default in InnoDB). By observing the search pattern, it builds a hash index using a prefix of the index key. If most of a table fits in the buffer pool, building such a hash index speeds up equality lookups.

The obvious differences between B+ tree index and hash index are:

3. If it is an equivalent query, then the hash index has an absolute advantage, because it only takes one algorithm to find the corresponding key value; of course, this premise is that the key value is unique. If the key value is not unique, you need to find the location of the key first, and then scan back according to the linked list until the corresponding data is found.

4. If it is a range query and retrieval, the hash index is useless at this time, because the previously ordered key values may become discontinuous after the hash algorithm, and there is no way to use the index to complete the range query and retrieval.

Similarly, a hash index cannot be used for index-based sorting, nor for prefix fuzzy queries such as LIKE 'xxx%' (this kind of partial match is essentially a range query).

5. Hash indexing also does not support leftmost matching rules for multi-column federated indexes.

6. The key-lookup efficiency of a B+ tree index is fairly uniform and does not fluctuate the way a B-tree's can. With a large number of duplicate key values, hash index efficiency is also very low, due to the so-called hash collision problem.

7. In most scenarios, there will be query features such as range query, sorting, grouping and so on, which can be indexed by B + tree.

7. The difference between B-tree and B + tree

In a B-tree, every node stores both keys and data, and all the nodes together make up the tree; the leaf-node pointers are null, and the leaf nodes carry no additional index information.

In a B+ tree, the leaf nodes contain all of the keys together with pointers to the records holding those keys, and the leaf nodes themselves are linked sequentially in key order. All non-leaf nodes can be regarded as the index part, each containing only the largest (or smallest) key of its subtree's root. (In a B-tree, non-leaf nodes also contain actual data to be found.)

8. Why, in practice, is the B+ tree better suited than the B-tree for operating-system file indexes and database indexes?

1. The disk read/write cost of the B+ tree is lower.

The internal nodes of a B+ tree hold no pointers to the actual record data, so an internal node is smaller than a B-tree's. The more keys that fit in one disk block, the more keys can be read into memory in a single pass; relatively speaking, the number of I/O reads and writes is reduced.

2. The query efficiency of the B+ tree is more stable.

Because internal nodes do not point to the final record content but only index the keys in the leaf nodes, every key lookup must follow a path from the root node to a leaf node. All key lookups therefore have the same path length, giving every lookup the same query efficiency.

9. The difference between a clustered index and a nonclustered index?

Clustered index:

In a clustered-index table, the physical order of the records matches the index order, so queries are fast: once the first record matching an index value is found, the remaining matching records are stored contiguously on disk. The corresponding disadvantage is that modification is slow: to keep the physical order of the records consistent with the index order, data pages must be reorganized when records are inserted.

A clustered index is like looking up a character in the Xinhua dictionary by pinyin: the pinyin index is sorted from a to z, in the same order as the dictionary body, so logical order matches physical order. When you look up several characters with the same or adjacent pronunciations, you may only turn a few pages, or simply read on down the page, to find them all.

Non-aggregated index (nonclustered index):

A nonclustered index specifies the logical order of the records in the table, but the physical order of the records does not have to match the index order. Both kinds of index use a B+ tree structure, but the leaf level of a nonclustered index does not coincide with the data pages; instead, the leaves hold pointers to the records in the data pages. Because of this indirection, adding a nonclustered index does not cause the data to be rearranged.

A nonclustered index is like looking up a character in the Xinhua dictionary by its radical: the radical table follows its own order, while the dictionary body stays in pinyin order from a to z, so the logical order does not correspond to the physical order. Nonclustered indexes suit columns used for grouping, columns with many distinct values, and frequently updated columns, all of which are poor candidates for a clustered index.

Fundamental difference:

The fundamental difference between a clustered index and a nonclustered index is whether the physical order of the table records matches the index order.

III. Transactions

1. What is a transaction?

A transaction groups a series of database operations so that they are committed or rolled back as a unit; it is mainly used to guarantee the integrity and consistency of the data.

2. The four properties of a transaction (ACID): atomicity, consistency, isolation, durability?

Atomicity (Atomicity):

Atomicity means that all operations contained in a transaction either all succeed or are all rolled back: if the transaction succeeds, every operation must be fully applied to the database; if it fails, none of its operations may affect the database.

Consistency (Consistency):

Before and after a transaction, the integrity constraints of the database are not broken. For example, if A transfers money to B, it must not happen that A's account is debited but B never receives the money.

Isolation (Isolation):

Isolation means that when multiple users access the database concurrently, for example operating on the same table, the transaction the database opens for each user must not be disturbed by the operations of other transactions; concurrent transactions must be isolated from one another. For example, while A is withdrawing money from a bank card, B cannot transfer money to that card until A's withdrawal completes.

Durability:

Durability means that once a transaction is committed, its changes to the database are permanent; the committed changes are not lost even if the database system subsequently fails.
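Atomicity in particular is easy to see hands-on. A minimal sketch using Python's built-in sqlite3 (the `account` table and the transfer amounts are invented for illustration, not from the text): a transfer that fails mid-way is rolled back, leaving no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])

def transfer(conn, src, dst, amount):
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        # Simulate a constraint check that can fail mid-transaction.
        (bal,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                              (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("COMMIT")      # both updates become durable together
    except Exception:
        conn.execute("ROLLBACK")    # neither update is applied
        raise

transfer(conn, 1, 2, 60)            # succeeds: balances become 40 / 60
try:
    transfer(conn, 1, 2, 500)       # fails and is rolled back atomically
except ValueError:
    pass
rows = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(rows)                         # [(40,), (60,)]
```

The failed second transfer debits account 1 first, but the rollback restores it, so the committed state is exactly what the first transfer left.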

3. Concurrency problems of transactions? The transaction isolation levels, the problems each level allows, and MySQL's default level?

In theory, transactions should be completely isolated from each other to avoid the problems caused by concurrency. However, full isolation has a significant performance cost, because transactions would have to run serially. To improve performance, applications in practice run transactions at a lower isolation level; the level can be specified through the transaction's isolation attribute.

Concurrency of transactions

1. Dirty read: transaction A reads data updated by transaction B, and B then rolls back; the data A read is dirty data.

2. Non-repeatable read: transaction A reads the same data several times, and transaction B updates and commits that data between A's reads; as a result, A's successive reads of the same data return different results.

3. Phantom read: even when repeatable read guarantees that re-reading the same rows within a transaction returns the same values, the transaction can still see "phantoms": rows inserted or deleted by another committed transaction that change the result set of a repeated range query.

For example, transaction T1 changes a data item from "1" to "2" in every row of a table, while transaction T2 inserts a new row whose data item is still "1" and commits it. When the user running T1 then looks at the data just modified, a row still appears unmodified; that row was in fact added by T2, as if it were a hallucination. This is a phantom read.

Summary: non-repeatable reads and phantom reads are easily confused: a non-repeatable read concerns modification of existing rows, while a phantom read concerns insertion or deletion. Preventing non-repeatable reads only requires locking the rows that match the condition; preventing phantom reads requires locking the table (or the queried range).

Isolation level of the transaction

Read uncommitted: a SELECT in this transaction can read data another transaction has modified but not yet committed, i.e. a dirty read.

Read committed: a transaction sees only committed data, so dirty reads cannot happen; but if transaction B updates and commits data between two of transaction A's reads, A's two reads of the same data return different results, i.e. non-repeatable reads can still occur.

Repeatable read: within one transaction, SELECT results reflect the state at the point the transaction began, so repeated SELECTs return consistent results. Phantom reads, however, can still occur.

Serializable: the highest isolation level, at which none of these anomalies occur. Concurrent transactions behave as if they executed one after another in sequence.

Pay special attention to:

The default transaction isolation level for MySQL is REPEATABLE READ.

MySQL supports all 4 transaction isolation levels.

The isolation level of transactions is supported by the underlying database engine, not by the application or framework.

Oracle supports only two transaction isolation levels: READ COMMITTED and SERIALIZABLE.

Based on the SQL standard, different databases may differ somewhat in their concrete implementations.

Under MySQL's default REPEATABLE READ level, plain reads do not lock the rows they read.

At the READ UNCOMMITTED isolation level, writes lock only the corresponding rows.

At the REPEATABLE READ isolation level, writes lock the entire table if the condition does not hit an index; with an index, only the matching rows (and gaps) are locked.

At the SERIALIZABLE isolation level, both reads and writes lock the entire table.

The higher the isolation level, the better data integrity and consistency are guaranteed, but the greater the impact on concurrent performance; you cannot have both. For most applications it is reasonable to set the isolation level to Read Committed first: it avoids dirty reads and has good concurrent performance. Although it allows concurrency problems such as non-repeatable reads and phantom reads, in the individual places where those matter the application can take control with pessimistic or optimistic locks.
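The repeatable-read behavior can be demonstrated with two connections. A sketch using Python's sqlite3 in WAL mode, where readers get a stable snapshot much like REPEATABLE READ (MySQL's implementation differs in detail; the table and values here are invented for illustration):

```python
import os
import sqlite3
import tempfile

# Two connections need a shared on-disk database; WAL mode gives each
# read transaction a snapshot taken at its first read.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, isolation_level=None)
a.execute("PRAGMA journal_mode=WAL")
a.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance INT)")
a.execute("INSERT INTO account VALUES (1, 100)")

b = sqlite3.connect(path, isolation_level=None)

a.execute("BEGIN")  # transaction A: snapshot starts at the first read
first = a.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Transaction B updates and commits between A's two reads.
b.execute("UPDATE account SET balance = 200 WHERE id = 1")

second = a.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
a.execute("COMMIT")
after = a.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

print(first, second, after)  # 100 100 200
```

Inside A's transaction both reads agree (100, 100); only after A's commit does a new read see B's value. At a lower level such as READ COMMITTED, `second` would already be 200: a non-repeatable read.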

4. Transaction propagation behavior

1.PROPAGATION_REQUIRED: if there is no transaction currently, create a new transaction, and if there is a transaction, join the transaction. This setting is the most commonly used setting.

2.PROPAGATION_SUPPORTS: support the current transaction, join the transaction if there is a transaction, and execute as a non-transaction if no transaction exists.

3.PROPAGATION_MANDATORY: support the current transaction, join the transaction if there is a transaction, and throw an exception if no transaction exists.

4.PROPAGATION_REQUIRES_NEW: always create a new transaction; if a current transaction exists, it is suspended.

5.PROPAGATION_NOT_SUPPORTED: performs the operation in a non-transactional manner, suspending the current transaction if there is a current transaction.

6.PROPAGATION_NEVER: executes in a non-transactional manner, throwing an exception if a transaction currently exists.

7.PROPAGATION_NESTED: if a transaction currently exists, execute within a nested transaction; if not, behave like PROPAGATION_REQUIRED.

5. Nested transaction

What is a nested transaction?

A nested child transaction executes inside its parent transaction and is part of it. Before entering the child transaction, the parent establishes a rollback point called a savepoint, then executes the child; the child's work is part of the parent transaction. When the child finishes, the parent continues executing. The key is the savepoint. A few questions make this clear:

What happens if the subtransaction rolls back?

The parent transaction rolls back to the savepoint established before entering the child transaction and can then try another branch of logic. The parent's work done before the savepoint is unaffected, and the parent is certainly not rolled back automatically.

What happens if the parent transaction rolls back?

If the parent transaction rolls back, the child rolls back too. Why? Because the child transaction is not committed until the parent transaction ends; that is why we say the child is part of the parent. So:

What about the commit of the transaction?

Is the parent transaction committed first and then the child, or the child first and then the parent? The answer is the second: the child "commits" first, but since the child transaction is part of the parent, everything is actually made durable only by the parent's unified commit.
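The savepoint mechanism described above maps directly onto SQL. A runnable sketch with Python's sqlite3 (the `log` table is invented for illustration): a child's work is undone by rolling back to the savepoint, while the parent's work survives and is committed once, by the parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log(step TEXT)")

conn.execute("BEGIN")                        # parent transaction starts
conn.execute("INSERT INTO log VALUES ('parent-1')")

conn.execute("SAVEPOINT child")              # rollback point before the child
conn.execute("INSERT INTO log VALUES ('child')")
conn.execute("ROLLBACK TO SAVEPOINT child")  # child "fails": undo only its work

conn.execute("INSERT INTO log VALUES ('parent-2')")
conn.execute("COMMIT")                       # the parent commits everything kept

rows = [r[0] for r in conn.execute("SELECT step FROM log")]
print(rows)  # ['parent-1', 'parent-2']
```

Note that until the final COMMIT, none of the inserts, including the child's, were visible outside the parent transaction.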

Reference article: https://blog.csdn.net/liangxw1/article/details/51197560

IV. Storage engines

1. What are the differences among MySQL's three common storage engines (InnoDB, MyISAM, MEMORY)?

The main differences between the first two storage engines (InnoDB and MyISAM) are as follows:

1. InnoDB supports transactions; MyISAM does not. More broadly, InnoDB supports advanced relational-database features such as transactions and row-level locks, which MyISAM lacks.

2. MyISAM has better read performance and takes up less storage space, so the choice of storage engine depends on the specific application.

If your application must use transactions, you should without doubt choose InnoDB. Note, however, that InnoDB's row-level locks are conditional: when the WHERE condition does not use the primary key (or another index), the whole table is still locked, for example by a bare delete such as DELETE FROM mytable.

If your application emphasizes query performance, MyISAM is the better fit. MyISAM stores indexes separately from data, and its indexes are compressed, making better use of memory, so its query performance is clearly better than InnoDB's. Compressed indexes also save some disk space. MyISAM also offers full-text indexing, which can greatly speed up LIKE queries.

Some people say MyISAM can only be used for small applications, but this is just a prejudice. Large data volumes are addressed by upgrading the architecture, such as splitting tables and databases, rather than by relying on the storage engine alone.

In practice InnoDB is now the usual choice, mainly because MyISAM locks whole tables and serializes reads and writes; table-level locking makes its concurrency poor, so MyISAM is generally not chosen for read-write-intensive applications.

MEMORY storage engine

MEMORY is a special storage engine in MySQL. It keeps a table's data entirely in memory, which makes it very different from the previous two.

Each MEMORY table does correspond to one disk file, whose name matches the table name and whose type is .frm; that file stores only the table's structure. The data itself lives in memory, which allows very fast processing and high efficiency for the whole table. Note that the server needs enough memory to sustain the MEMORY tables in use; when a table is no longer needed, its memory can be released, or the table dropped.

MEMORY uses a hash index by default, which is faster than a B-tree index for exact lookups; a B-tree index can of course be specified when the index is created.

Note that MEMORY is rarely used, because the data lives in memory: a memory problem corrupts it, and a restart or shutdown loses all of it. MEMORY tables therefore tend to be short-lived and essentially disposable.

3. What are the applicable scenarios of MySQL's MyISAM and InnoDB storage engines at the transaction and lock level?

Transaction processing aspect

MyISAM: the emphasis is on performance. Each query is atomic, and it executes several times faster than InnoDB, but it provides no transaction support.

InnoDB: provides advanced database features such as transactions and foreign keys. Its tables are transaction-safe (ACID compliant), with commit, rollback, and crash-recovery capabilities.

Lock level

MyISAM: supports only table-level locks. When users operate on a MyISAM table, SELECT, UPDATE, DELETE, and INSERT statements automatically lock the whole table. If the locked table satisfies the conditions for concurrent inserts, new rows can still be inserted at the end of the table.

InnoDB: support for transactions and row-level locks is InnoDB's most important feature. Row locks greatly improve performance under multi-user concurrency. But InnoDB's row locks only take effect when the WHERE condition uses the primary key (or another index); a WHERE on a non-indexed column locks the whole table.

Other references for the storage engines MyISAM and InnoDB are as follows:

Detailed explanation of the difference between MyISAM and InnoDB in MySQL Storage engine

Summary carding of MyISAM and Innodb of MySQL Storage engine

V. Optimization

1. In what order are the different clauses of a query (WHERE, JOIN, LIMIT, GROUP BY, HAVING, etc.) executed?

1. A query is built mainly from six keywords, whose logical execution order is: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.

2. The table joins after FROM are parsed from right to left, while WHERE conditions are parsed bottom-up.

In other words, when writing SQL statements, try to put the table with the smallest amount of data at the far right of the FROM list (use the small table to drive the large one), and, since WHERE is parsed bottom-up, write the conditions that filter out the most rows last.
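The logical order above can be checked hands-on. A small sketch with Python's built-in sqlite3 (the `orders` table and data are invented for illustration); the comments mark the logical evaluation order of each clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders(cust TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("a", 10), ("a", 20), ("b", 5), ("b", 40), ("c", 1)])

rows = conn.execute("""
    SELECT cust, SUM(amount) AS total      -- 5. project the output columns
      FROM orders                          -- 1. pick the source table
     WHERE amount > 1                      -- 2. filter individual rows
  GROUP BY cust                            -- 3. form groups
    HAVING SUM(amount) >= 30               -- 4. filter whole groups
  ORDER BY total DESC                      -- 6. sort the result
     LIMIT 1                               -- 7. cut the result off
""").fetchall()
print(rows)  # [('b', 45)]
```

WHERE drops the ("c", 1) row before grouping; HAVING then drops whole groups; only afterwards do ORDER BY and LIMIT apply.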

Other reference resources:

http://www.cnblogs.com/huminxxl/p/3149097.html

2. Using EXPLAIN to optimize SQL and indexes?

For complex, inefficient SQL statements, we usually run EXPLAIN on the statement to analyze it: EXPLAIN prints how the statement would be executed, which makes analysis and optimization convenient.

Table: shows which table the data in this row is about

type: an important column that shows which access type is used. From best to worst, the types are const, eq_ref, ref, range, index, and ALL.

ALL: full table scan; MySQL traverses the whole table to find matching rows.

index: full index scan; MySQL traverses the entire index rather than the table.
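The same diagnostic idea can be tried in sqlite3, whose EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN but shows the same distinction between an index search and a full scan (table and index names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v INT)")
conn.execute("CREATE INDEX idx_v ON t(v)")

# Indexable predicate: the plan mentions idx_v (an index search).
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = 7").fetchall()
print(plan)

# Wrapping the column in an expression defeats the index: a full scan.
no_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v + 0 = 7").fetchall()
print(no_index)
```

Reading the plan before and after adding (or accidentally defeating) an index is exactly the optimization loop the EXPLAIN columns above support.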

Related system variables for the slow query log:

slow_query_log: whether the slow query log is enabled.

slow_query_log_file: where the slow query log is stored (the directory must be writable by the account running MySQL, and is generally set to MySQL's data directory).

long_query_time: queries taking longer than this many seconds are recorded.

VI. Database locks

1. What locks does MySQL have, what is the principle for detecting a deadlock, in which concrete scenarios do deadlocks occur, and how are they resolved?

MySQL has three levels of locks: page level, table level, and row level.

Table-level lock: low overhead, fast locking; no deadlock; large lock granularity, the highest probability of lock conflict and the lowest concurrency.

Row-level locks: expensive and slow to add locks; deadlocks occur; locking granularity is the smallest, the probability of lock conflicts is the lowest, and the degree of concurrency is the highest.

Page lock: the overhead and locking time are between table lock and row lock; deadlocks occur; lock granularity is between table lock and row lock, and the concurrency is average.

Under what circumstances will it cause a deadlock?

What is a deadlock?

Deadlock: two (or more) transactions each hold a lock the other needs and wait for the other to release it; neither can proceed, so both block indefinitely unless the database intervenes (InnoDB detects such cycles and rolls one of the transactions back).

Oracle supports SELECT ... FOR UPDATE NOWAIT, which reports an error immediately if the lock cannot be obtained instead of waiting; MySQL has no such NOWAIT option.

Another problem in MySQL is that during execution of a SELECT ... FOR UPDATE statement, all scanned rows are locked, which easily causes trouble. So when using pessimistic locks in MySQL, it is important to make sure the query uses an index, not a full table scan.

Optimistic lock (Optimistic Lock):

1. Optimistic locking, also called optimistic concurrency control, assumes that concurrent transactions do not affect each other, so each transaction processes the data it touches without taking locks. Before committing its update, each transaction checks whether any other transaction has modified the data since it was read. If another transaction has, the transaction currently committing rolls back.

2. The hallmark of an optimistic lock is to perform the business operation first and take the "lock" only at the last possible moment: "optimistically" assume the update will usually succeed, and check only at the final step when the data is actually updated.

The implementation of optimistic lock on the database is completely logical and does not require special support from the database.

3. The general practice is to add a version number, or timestamp, to the data that needs to be locked.

Examples of implementation are as follows:

Optimistic lock (add a version-number field to the table):

SELECT data AS old_data, version AS old_version FROM ... ;

Perform the business operation on old_data to obtain new_data and new_version.

UPDATE ... SET data = new_data, version = new_version WHERE version = old_version;

if (updatedRows > 0) {
    // the optimistic lock was acquired; the operation is complete
} else {
    // failed to acquire the optimistic lock; roll back and retry
}
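The version-number pattern above is short enough to run end to end. A sketch with Python's sqlite3 (the table `t`, the helper name, and the retry count are all illustrative): the UPDATE only succeeds if the version is still the one that was read, i.e. a compare-and-swap.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, data INT, version INT)")
conn.execute("INSERT INTO t VALUES (1, 10, 0)")

def optimistic_update(conn, row_id, compute, retries=3):
    """Read, compute, then compare-and-swap on the version column."""
    for _ in range(retries):
        old_data, old_version = conn.execute(
            "SELECT data, version FROM t WHERE id = ?", (row_id,)).fetchone()
        new_data = compute(old_data)
        cur = conn.execute(
            "UPDATE t SET data = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_data, row_id, old_version))
        if cur.rowcount > 0:   # nobody changed the row since we read it
            return new_data
        # else: another writer won the race; re-read and retry
    raise RuntimeError("gave up after repeated conflicts")

result = optimistic_update(conn, 1, lambda d: d + 5)
print(result)  # 15
```

Checking `rowcount` after the UPDATE is exactly the "if (updated rows > 0)" test in the pseudocode; a concurrent writer would bump the version first and force this transaction onto the retry path.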

Note:

When no conflict occurs, an optimistic lock costs less than a pessimistic lock, but when a conflict does occur the rollback cost is higher. It therefore suits scenarios where the probability of conflict is low, where it can improve the system's concurrent performance.

Optimistic locks also suit certain special scenarios where pessimistic locks cannot be used, such as when the application cannot keep a database connection open for the duration of the business operation.

Summary:

Pessimistic and optimistic locks are the two approaches databases use to keep concurrent access safe and prevent lost updates; for example, wrapping a SELECT ... FOR UPDATE in a transaction can prevent lost updates. In most scenarios the two behave similarly, with differences only in particular cases; the choice can generally be judged from the following aspects.

Response speed: if very fast response is required, optimistic locking is attractive, since it avoids waiting on locks.

Collision frequency: if conflicts are frequent, a pessimistic lock is recommended, since repeated optimistic retries would cost more.

Retry cost: if the retry cost is high, a pessimistic lock is recommended.

VII. Other

1. Master-slave replication of database

There are several ways of master-slave replication:

Synchronous replication:

So-called synchronous replication means that a change on the master must wait until slave-1, slave-2, ..., slave-n have all completed before returning. This is obviously undesirable, and it is not MySQL's default replication setting. For example, a user adding a record on a web front-end page would have to wait a long time.

Asynchronous replication:

Much like an AJAX request: the master only needs to complete its own database operation, and does not wait to learn whether the slaves have received the binary log and applied it. This is MySQL's default setting.

Semi-synchronous replication:

The master waits only until at least one of the slaves has acknowledged the operation before returning, regardless of the other slaves. This feature was originally contributed to MySQL by Google.

2. Analysis of 7 questions about database master-slave replication?

Question 1: the master performs writes, and the slaves passively replay the same operations to keep the data consistent. Can a slave actively perform writes of its own?

Suppose a slave could accept writes: the slave has no way to notify the master, so the master's and slave's data would diverge. Therefore slaves should not be written to; at least the replicated databases on a slave should not be written to. In fact, the concept of read/write splitting is already implicit here.

Question 2: master-slave replication can have N slaves, but these slaves cannot be written to. What are they for?

Implement data backup:

They provide something like high availability: once the master goes down, a slave can take over and be promoted to master.

Remote disaster recovery: for example, if the master in Beijing is destroyed by an earthquake, a slave in Shanghai can carry on.

It is mainly used to implement scale out, to share the load, and to distribute the reading tasks to the slaves.

[it is likely that a system has far more read operations than write operations, so write operations are sent to master and read operations are sent to slaves]

Question 3: master-slave replication gives us master, slave1, slave2, and so on. With so many MySQL instances, which one should, say, a Java web application connect to?

We can handle this in the application: route the insert/delete/update operations, which modify the database, over the connection to the master,

and route SELECTs over connections to the slaves. The application then also needs to decide which slave executes each SELECT, for example with a simple round-robin algorithm.
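The routing just described fits in a few lines. A toy sketch in Python (the `Router` class and the connection names are purely illustrative; real connections would be driver objects, not strings): writes go to the master, reads round-robin across the slaves.

```python
import itertools

class Router:
    """Toy read/write splitting router: writes to the master,
    reads round-robin over the slaves."""
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        if sql.lstrip().lower().startswith("select"):
            return next(self._slaves)     # read -> next slave in rotation
        return self.master                # insert/update/delete -> master

r = Router("master", ["slave-1", "slave-2"])
print(r.route("SELECT * FROM t"))     # slave-1
print(r.route("SELECT * FROM t"))     # slave-2
print(r.route("UPDATE t SET x = 1"))  # master
```

In practice this logic lives either in the application or, as the following paragraphs discuss, in a middle component such as a MySQL proxy so the application stays decoupled from the replication topology.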

With this approach the application performs the SQL routing itself and is tightly coupled to MySQL's master-slave replication architecture: once the master goes down or some slave dies, the application must be modified. Could the application instead be made largely independent of the replication architecture?

Yes: introduce a component for the application to talk to, a MySQL proxy, which performs the routing of SQL statements.

If the MySQL proxy is not responsible for choosing one slave among many, that job can be handed to yet another component, such as HAProxy.

This is MySQL's so-called read/write splitting (READ WRITE SPLIT).

Question 4: what if the MySQL proxy, or the master itself, fails?

The president usually gets a vice president in case something happens. Similarly, you can back up these key nodes.

Question 5: every event written to the master's binary log must be sent to the slaves. With N slaves, is it sent N times or only once? If only once, say to slave-1, what about slave-2, slave-3, and the rest?

Obviously, it is sent N times. Internally, the MySQL master maintains N threads, each responsible for sending the binary log to its corresponding slave. The master is thus responsible both for writes and for maintaining N threads, which can be a heavy burden. One remedy: make slave-1 the only slave of the master, and make slave-2, slave-3, ... slaves of slave-1, with slave-1 no longer serving SELECTs. Slave-1 thereby takes the master's replication-thread burden onto itself. This is the concept of multi-level replication.

Question 6: when a SELECT goes through the MySQL proxy, it may be answered by slave-2 this time and slave-3 next time, so each server's query cache cannot be exploited.

The answer is a shared cache such as memcached: slave-2, slave-3, and the rest cache their query results in memcached.

Question 7: as the application grows and reads multiply, we can add slaves; but what if the master can no longer keep up with the writes?

Scale up? A better server? There is no best, only better, and it is too expensive.

Scale out? The master-slave replication architecture alone can no longer satisfy this.

The answer is to split databases (vertical splitting) and split tables (horizontal splitting).

3. Solutions for MySQL in high-concurrency environments?

MySQL high-concurrency solutions: split databases and tables, distribute the load, and add a second-level cache.

Demand analysis: Internet services read and write large volumes of data every day under high concurrency.

The established solution is horizontal splitting of databases and tables, going from a single-point database to a distributed, multi-point deployment, thereby reducing the pressure on any single database.

Cluster solution: solves the problem that a single point of DB cannot be accessed due to DB downtime.

Read/write splitting strategy: greatly improves read speed and read concurrency in the application, but cannot relieve high write pressure.
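The horizontal splitting mentioned above usually boils down to a routing function from a shard key to a physical table or database. A minimal sketch (the `user_N` table names and modulo scheme are illustrative; production systems often use consistent hashing instead):

```python
def shard_for(user_id, n_shards=4):
    """Hash-based horizontal sharding sketch: route a row to one of
    n_shards physical tables by shard key."""
    return "user_%d" % (user_id % n_shards)

print(shard_for(7))  # user_3
print(shard_for(8))  # user_0
```

A simple modulo keeps routing cheap, but note that changing `n_shards` later remaps almost every key, which is one reason consistent-hashing schemes exist.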

4. How do transactions recover when the database crashes (redo log and undo log)?

Reprint: MySQL REDO log and UNDO log

Undo Log:

The undo log exists to implement transaction atomicity; in MySQL's InnoDB storage engine, the undo log is also used to implement multi-version concurrency control (MVCC).

Atomicity of transactions (Atomicity): all operations in a transaction are either completed in full or not at all; partial completion is not allowed. If an error occurs during execution, the transaction rolls back to the state before it began, as if it had never executed.

The principle of the undo log is simple: to satisfy atomicity, before modifying any data, first back the data up to a separate place (the place holding the backup is the undo log), and only then modify the data. If an error occurs, or the user executes a ROLLBACK statement, the system uses the backup in the undo log to restore the data to the state it was in before the transaction began.
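The "back up, then modify, then roll back on failure" idea fits in a toy sketch. This Python fragment (a dict standing in for data pages; everything here is pedagogical, not InnoDB's actual format) records old values in an undo log before each write and replays them in reverse on failure:

```python
# Toy undo logging: back up old values before modifying them,
# so a failed "transaction" can be rolled back.
store = {"A": 100, "B": 50}

def transfer(store, src, dst, amount):
    undo_log = []                    # (key, old_value) pairs, newest last
    try:
        undo_log.append((src, store[src]))   # undo record BEFORE the write
        store[src] -= amount
        if store[src] < 0:                   # constraint violated -> abort
            raise ValueError("insufficient funds")
        undo_log.append((dst, store[dst]))
        store[dst] += amount
    except Exception:
        for key, old in reversed(undo_log):  # roll back in reverse order
            store[key] = old
        raise

transfer(store, "A", "B", 30)
print(store)                  # {'A': 70, 'B': 80}
try:
    transfer(store, "A", "B", 1000)   # fails mid-way and is undone
except ValueError:
    pass
print(store)                  # unchanged: {'A': 70, 'B': 80}
```

The real undo log additionally has to be persisted to disk before the data is, which is exactly the ordering discussed next.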

Atomicity and persistence can be guaranteed at the same time because of the following characteristics:

Record the Undo log before updating the data.

To guarantee durability, the data must be written to disk before the transaction commits; as long as the transaction committed successfully, the data has necessarily been persisted.

The undo log must be persisted to disk before the data is. If the system crashes in between, the undo log is still complete and can be used to roll back the transaction.

If the system crashes before the data is persisted to disk, the data on disk remains as it was before the transaction began.

Flaw: the data and the undo log are both written to disk before each transaction commits, which causes a large amount of disk I/O, so performance is poor.

If data could be cached in memory for a while, I/O would drop and performance improve; but that sacrifices the transaction's durability. Hence another mechanism is introduced to provide durability: the redo log.

Redo Log:

Its principle is the inverse of the undo log: the redo log records a copy of the new data. Before the transaction commits, only the redo log needs to be persisted; the data itself need not be. If the system crashes with the data not yet persisted but the redo log persisted, the system can replay the redo log's contents to restore all data to its latest state.

VIII. Reference materials

20 common database interview questions explained - Penglei - OSChina

34 common database interview questions explained

A discussion of database index knowledge - cnblogs

Mysql | Fuzzy query using wildcards (like,%,_)

Detailed explanation of the difference between MyISAM and InnoDB in MySQL Storage engine

Summary carding of MyISAM and Innodb of MySQL Storage engine

https://blog.csdn.net/liangxw1/article/details/51197560
