2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article covers the index optimization questions that 90% of programmers run into in database interviews. It is detailed and should serve as a useful reference, so it is worth reading to the end.
The discussion of indexes is divided into the following points:
An overview of the index (what is an index, its advantages and disadvantages)
Basic use of indexes (creating indexes)
The basic principles of the index (interview focus)
Data structure of the index (B-tree, hash)
Principles for creating indexes (top priority, always asked in interviews, worth bookmarking)
How to delete data at the level of one million or more
I. Overview of the index
1) What is an index?
An index is a special file (in InnoDB, the indexes on a table are part of the tablespace) that contains reference pointers to all the records in the table. Put more plainly, an index is like a table of contents. Suppose someone tears the table of contents out of your Xinhua Dictionary: to find the idioms that begin with a given character, you would have to page through from the first page to the thousandth. Exhausting! Give the table of contents back and you can locate the entry quickly.
2) Advantages and disadvantages of indexes:
An index can greatly speed up data retrieval, which is the main reason for creating one. Using an index also lets the query optimizer pick more efficient execution plans, which improves system performance. Indexes have drawbacks too: they carry extra maintenance cost. Because an index is a structure stored separately from the data it describes, every insert, update, and delete also has to update the index, which costs additional IO and makes inserting, updating, and deleting slower.
II. Basic use of indexes
1) Creating an index (three ways)
The first way: define the index in the CREATE TABLE statement when the table is created.
The second way: use the ALTER TABLE command to increase the index:
ALTER TABLE is used to create normal, UNIQUE, or PRIMARY KEY indexes.
Where table_name is the name of the table to be indexed, column_list indicates which columns are indexed, and multiple columns are separated by commas.
The index name index_name is optional; if it is omitted, MySQL assigns a name based on the first indexed column. In addition, ALTER TABLE allows multiple changes to one table in a single statement, so several indexes can be created at the same time.
The third way: use the CREATE INDEX command.
CREATE INDEX can add a normal index or a UNIQUE index to a table. (It cannot create a PRIMARY KEY index.)
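The three ways can be sketched as follows. This is a minimal illustration in MySQL syntax, not the author's original listing: the table user_demo, its columns, and all index names are hypothetical.

```sql
-- Hypothetical table used only for illustration.
-- Way 1: declare the indexes inside CREATE TABLE.
CREATE TABLE user_demo (
    id         INT NOT NULL AUTO_INCREMENT,
    first_name VARCHAR(16) NOT NULL,
    last_name  VARCHAR(16) NOT NULL,
    id_card    VARCHAR(18) NOT NULL,
    city       VARCHAR(30) NOT NULL,
    email      VARCHAR(64) NOT NULL,
    PRIMARY KEY (id),                      -- primary key index
    UNIQUE KEY uk_id_card (id_card),       -- unique index
    KEY idx_name (first_name, last_name)   -- normal composite index
);

-- Way 2: ALTER TABLE. Several changes to the same table fit in one statement.
ALTER TABLE user_demo
    ADD INDEX idx_city (city),
    ADD INDEX idx_last_name (last_name);

-- Way 3: CREATE INDEX (normal or UNIQUE only, never PRIMARY KEY).
CREATE UNIQUE INDEX uk_email ON user_demo (email);

-- List the indexes that now exist on the table.
SHOW INDEX FROM user_demo;
```

SHOW INDEX is a convenient way to confirm which names MySQL actually assigned, especially when index_name was left out.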
III. Basic principles of the index (without the padding found in other articles)
An index is used to quickly find the records that have a specific value. Without an index, a query usually has to traverse the entire table.
The principle of indexing is simple: turn unordered data into an ordered structure that can be searched.
1. Sort the contents of the indexed columns.
2. Generate an inverted table from the sorted result.
3. Attach the chain of data addresses to the entries of the inverted table.
4. When querying, first find the entry in the inverted table, then follow its chain of data addresses to fetch the actual rows.
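The difference between traversing the whole table and searching an ordered structure can be observed with EXPLAIN. The sketch below assumes MySQL; the table person_demo and the index idx_name are hypothetical, and the exact plan output depends on the MySQL version and the data.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE person_demo (
    id   INT NOT NULL PRIMARY KEY,
    name VARCHAR(30) NOT NULL
);
INSERT INTO person_demo (id, name) VALUES (1, 'zhou'), (2, 'li'), (3, 'wang');

-- No index on name yet: the plan normally reports type = ALL (full table scan).
EXPLAIN SELECT id FROM person_demo WHERE name = 'zhou';

-- Build the sorted structure on name.
CREATE INDEX idx_name ON person_demo (name);

-- With the index in place, the plan normally reports type = ref and key = idx_name.
EXPLAIN SELECT id FROM person_demo WHERE name = 'zhou';
```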
IV. Data structure of the index (B-tree, hash)
1) B-tree index
MySQL reads and writes data through a storage engine, and most people use InnoDB. By implementation, MySQL has two main index types: the BTREE (B-tree) index and the HASH index. The B-tree index is the most frequently used index type in MySQL, and almost all storage engines support it. When people say "index" they usually mean the B-tree index (in InnoDB it is actually implemented as a B+ tree; because MySQL always prints BTREE when showing a table's indexes, it is called a B-tree index for short). Note that InnoDB does not let you create a HASH index explicitly; it only maintains an adaptive hash index internally, while explicit HASH indexes are available in the MEMORY engine.
Query method:
Primary key index (PI): its entries lead directly to the stored row data, so a query by primary key goes straight to the row.
Normal (secondary) index (SI): its entries hold the associated primary key id, which is then looked up in the primary key index above. Querying by primary key is therefore the fastest.
Properties of the B+ tree:
1.) A node with n subtrees contains n keys. The keys do not hold the data itself; they only index it.
2.) The leaf nodes together contain all the keys, along with pointers to the records that hold those keys, and the leaves are linked to one another in key order.
3.) All non-leaf nodes can be regarded as the index part; each contains only the largest (or smallest) key of each of its subtrees.
4.) In a B+ tree, data objects are inserted and deleted only at the leaf nodes.
5.) A B+ tree has two head pointers: one to the root of the tree and one to the leaf node with the smallest key.
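The primary key and secondary index query paths described above can be compared with EXPLAIN. This is only a sketch under assumptions: it uses MySQL with InnoDB, the table order_demo and index idx_user are hypothetical, and the plans shown depend on the data and version.

```sql
-- Hypothetical InnoDB table for illustration.
CREATE TABLE order_demo (
    id      INT NOT NULL PRIMARY KEY,   -- primary key index: leaves hold the rows
    user_id INT NOT NULL,
    amount  DECIMAL(10,2) NOT NULL,
    KEY idx_user (user_id)              -- secondary index: leaves hold user_id plus id
) ENGINE = InnoDB;

-- Query by primary key: one descent of the primary key tree reaches the row.
EXPLAIN SELECT amount FROM order_demo WHERE id = 42;

-- Query by secondary index: idx_user yields the matching ids, and each id is
-- then looked up in the primary key tree to fetch amount.
EXPLAIN SELECT amount FROM order_demo WHERE user_id = 7;

-- Range query: after the first key is located, the scan can follow the
-- linked leaf nodes in key order instead of descending from the root again.
EXPLAIN SELECT id FROM order_demo WHERE id BETWEEN 100 AND 200;
```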
2) Hash index
In short, a hash index is similar to the simple hash table taught in data structure courses. When a hash index is used in MySQL, a hash algorithm (common ones include direct addressing, the mid-square method, the folding method, the division-remainder method, and the random number method) converts the value of the indexed column into a fixed-length hash value, which is stored at the corresponding position in the hash table together with a pointer to the row. If a hash collision occurs (two different keys produce the same hash value), the entries are stored as a linked list under that hash key. Of course, this is only a simplified description.
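For a concrete case, the sketch below creates an explicit hash index. It assumes the MEMORY storage engine, since that is where MySQL accepts USING HASH as a real hash index; the table session_demo and index idx_token are hypothetical.

```sql
-- Hypothetical MEMORY table with an explicit hash index.
CREATE TABLE session_demo (
    token   CHAR(32) NOT NULL,
    user_id INT NOT NULL,
    INDEX idx_token (token) USING HASH
) ENGINE = MEMORY;

-- Equality lookup: the hash of the value leads straight to the bucket,
-- so the hash index can be used.
EXPLAIN SELECT user_id FROM session_demo WHERE token = 'abc';

-- Range condition: hash values carry no ordering, so the hash index
-- cannot narrow this search.
EXPLAIN SELECT user_id FROM session_demo WHERE token > 'abc';
```

This is the practical trade-off between the two structures: a hash index serves equality comparisons only, while a B-tree index also serves ranges and sorting.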
P.S. Readers who want to go deeper into data structures can follow me and look at the [data structure] topic; I will not go into detail here.
V. Principles for creating indexes (top priority)
Indexes are useful, but they should not be created without limit. It is best to follow these principles:
1) The leftmost prefix matching principle, a very important rule for composite indexes. MySQL keeps matching columns to the right until it meets a range condition (>, <, between, like), and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a, b, c, d) cannot use d, while an index built as (a, b, d, c) can use all four columns; the order of a, b, and d can be adjusted freely.
2) create indexes only for fields that are used as query criteria more frequently.
3) frequently updated fields are not suitable for creating indexes.
4) Columns that do not distinguish rows effectively are not suitable as index columns (for example gender: male, female, or unknown gives three values at most, so the selectivity is too low).
5) Extend an existing index where possible instead of creating a new one. For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.
6) data columns with foreign keys must be indexed.
7) Do not index columns that rarely appear in queries or that contain many duplicate values.
8) do not index columns of data types defined as text, image, and bit.
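Principles 1) and 5) can be tried out together. The sketch below assumes MySQL; the table t_demo and the index names are hypothetical, and the columns a, b, c, d follow the example in principle 1).

```sql
-- Hypothetical table with a composite index in the order (a, b, c, d).
CREATE TABLE t_demo (
    id INT NOT NULL PRIMARY KEY,
    a  INT NOT NULL,
    b  INT NOT NULL,
    c  INT NOT NULL,
    d  INT NOT NULL,
    KEY idx_abcd (a, b, c, d)
);

-- Matching stops at the range condition on c, so d does not narrow the search.
EXPLAIN SELECT id FROM t_demo WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- Principle 5: modify the existing index rather than adding an overlapping one.
-- With the order (a, b, d, c), the range column comes last.
ALTER TABLE t_demo
    DROP INDEX idx_abcd,
    ADD INDEX idx_abdc (a, b, d, c);

-- Same query again; compare key_len in the two EXPLAIN outputs to see how
-- much of the index is used in each case.
EXPLAIN SELECT id FROM t_demo WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;
```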
VI. How to delete a million or more rows
About indexes: an index carries extra maintenance cost. Because it is stored separately from the data, every insert, update, and delete also has to update the index, which costs additional IO and makes inserting, updating, and deleting slower. According to the official MySQL manual, the time needed to delete rows is proportional to the number of indexes on the table.
So when we want to delete millions of rows, we can first drop the indexes (this took a little over three minutes).
Then delete the unwanted rows (this took less than two minutes).
After the deletion, recreate the indexes. There is less data by then, so this is also quick (about ten minutes).
This is definitely much faster than deleting directly with the indexes in place, not to mention that if a direct deletion is interrupted, everything deleted so far is rolled back, which is even more painful.
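The procedure above might look like the following in MySQL. This is a sketch under assumptions: the table log_demo, its columns, and the index names are hypothetical, and deleting in batches with LIMIT is an optional variation added here, not something the procedure above requires.

```sql
-- Hypothetical table for illustration.
CREATE TABLE log_demo (
    id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    severity   INT NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_severity (severity),
    KEY idx_created_at (created_at)
);

-- 1. Drop the secondary indexes that the DELETE itself does not need.
--    idx_created_at is kept so the DELETE can still find its rows quickly.
ALTER TABLE log_demo DROP INDEX idx_severity;

-- 2. Delete the unwanted rows. Deleting in batches (an assumption of this
--    sketch) means an interruption rolls back only the current batch.
--    Repeat the statement until it reports 0 rows affected.
DELETE FROM log_demo WHERE created_at < '2024-01-01' LIMIT 10000;

-- 3. Recreate the dropped index once the table is smaller.
ALTER TABLE log_demo ADD INDEX idx_severity (severity);
```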
Commonly used database index optimization statements
The following table tb_test is used as the example:
create table tb_test (id int not null, age int not null, name varchar(30) not null, addr varchar(50) not null);
create unique index idx1_tb_test on tb_test (id);
create index idx2_tb_test on tb_test (name);
create index idx3_tb_test on tb_test (addr);
Index optimization recommendations
1. Calculations on an index column
For example, we want to find the age and name of the records in table tb_test whose id is greater than 100.
The recommended SQL statement is:
select age,name from tb_test where id > 1 * 100;
The SQL statement that is not recommended is:
select age,name from tb_test where id/100 > 1;
2. Concatenating index columns
For example, we want to find the id and age of the records in table tb_test where name is "zhou" and addr is "CQ".
The recommended SQL statement is:
select id,age from tb_test where name = 'zhou' and addr = 'CQ';
The SQL statement that is not recommended is:
select id,age from tb_test where concat(name, ' ', addr) = 'zhou CQ';
3. Using is null or is not null on an index column
For example, we want to find the age of the records in table tb_test where id is greater than or equal to 0.
The recommended SQL statement is:
select age from tb_test where id >= 0;
The SQL statement that is not recommended is:
select age from tb_test where id is not null;
4. Using or on an index column
For example, we want to find the age and name of the records in table tb_test where id equals 101 or 102.
The recommended SQL statement (using union) is:
select age,name from tb_test where id = 101 union select age,name from tb_test where id = 102;
The SQL statement that is not recommended (using or) is:
select age,name from tb_test where id = 101 or id = 102;
5. Avoid a wildcard as the first character when using like on an index column
For example, we want to find the id and age of the records in table tb_test whose name begins with "zho".
The recommended SQL statement is:
select id,age from tb_test where name like 'zho%';
The SQL statement that is not recommended is:
select id,age from tb_test where name like '%ho%';
6. Using composite indexes
If the index is a composite index, the first column of the index must appear in the condition for the system to use the index.
For example, suppose we create the following index on table tb_test:
create index idx4_tb_test on tb_test (id,name,addr);
The index idx4_tb_test is equivalent to having three indexes: index (id), index (id,name), and index (id,name,addr). If the where condition uses only name or only addr, this index is not used; it is used only when id is part of the condition.
In the SQL statements we write, using index columns incorrectly may keep the index from being used at all, and the resulting full table scans greatly degrade database performance. It is therefore worth learning how to use indexes correctly.
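Whether a given statement really uses an index can be checked with EXPLAIN rather than taken on faith. The sketch below assumes MySQL and uses a hypothetical copy of the example table, tb_test_demo, so that it runs on its own; the plans depend on the MySQL version and on the data in the table.

```sql
-- Hypothetical stand-alone copy of the example table and two of its indexes.
CREATE TABLE tb_test_demo (
    id   INT NOT NULL,
    age  INT NOT NULL,
    name VARCHAR(30) NOT NULL,
    addr VARCHAR(50) NOT NULL,
    UNIQUE KEY idx1_demo (id),
    KEY idx2_demo (name)
);

-- Plain comparison on the indexed column: the key column of the plan can show idx1_demo.
EXPLAIN SELECT age, name FROM tb_test_demo WHERE id > 100;

-- Expression on the indexed column: the index on id cannot be used for the search.
EXPLAIN SELECT age, name FROM tb_test_demo WHERE id / 100 > 1;

-- Prefix match: the index on name can be used as a range.
EXPLAIN SELECT id, age FROM tb_test_demo WHERE name LIKE 'zho%';

-- Leading wildcard: the index on name cannot narrow the search.
EXPLAIN SELECT id, age FROM tb_test_demo WHERE name LIKE '%ho%';
```

In the output, the type and key columns are the ones to look at: type = ALL with key = NULL indicates a full table scan.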