2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to index string fields in MySQL. It is a practical topic, so it is shared here as a reference.
Suppose you maintain a system that supports login by email address, and the user table is defined like this:
create table SUser(ID bigint unsigned primary key, email varchar(64), ...) engine=innodb;
Because users log in with their email address, the business code must contain a statement similar to this:
select f1, f2 from SUser where email='xxx';
If there is no index on the email field, then this statement can only do a full table scan.
1) Can I build an index on the email field?
Yes. MySQL also supports prefix indexes: you can define only the leading part of a string as the index.
2) What happens if the statement that creates the index does not specify a prefix length?
The index will contain the entire string.
3) Can you give me an example?
alter table SUser add index index1(email); or alter table SUser add index index2(email(6));
index1 contains the entire email string of each record.
index2 contains only the first 6 characters of each record's email (6 bytes for single-byte characters such as these).
4) What is the difference between these two definitions in terms of data structure and storage?
Because index2 stores only the first 6 characters of each email, it takes up less space.
5) What is the drawback of the email(6) index structure?
It may increase the number of rows that have to be scanned.
6) How is the following statement executed under the two index definitions?
select id,name,email from SUser where email='zhangssxyz@xxx.com';
With index1 (the index on the entire email string), the execution order is:
Find the record whose index value is 'zhangssxyz@xxx.com' in the index1 tree and get its primary key value, ID2.
Go back to the table, find the row whose primary key value is ID2, confirm that its email value is correct, and add this row to the result set.
Move on to the next record in the index1 tree, find that it no longer satisfies email='zhangssxyz@xxx.com', and end the loop.
In this process the primary key index is visited only once to fetch data, so the system considers that only one row was scanned.
With index2 (the email(6) index), the execution order is:
Find a record in the index2 tree whose index value is 'zhangs'. The first one found is ID1.
Look up the row whose primary key value is ID1 in the primary key index, find that its email is not 'zhangssxyz@xxx.com', and discard the row.
Take the next record after that position in index2, find that it is still 'zhangs', take out ID2, fetch the whole row from the primary key index and check it. This time the value matches, so the row is added to the result set.
Repeat the previous step until the value read from index2 is no longer 'zhangs', then end the loop.
In this process the primary key index is visited four times to fetch data (one visit for each index entry with the prefix 'zhangs'), that is, four rows are scanned.
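The two execution sequences above can be modelled with a short Python sketch. It is a toy model under stated assumptions, not how InnoDB is implemented: the secondary index is a sorted list of (indexed value, primary key) pairs, the primary key index is a dict, and the sample rows and the name count_row_lookups are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data: (primary key, email). Four emails share the prefix 'zhangs'.
ROWS = [
    (1, "zhangsan@xxx.com"),
    (2, "zhangssxyz@xxx.com"),
    (3, "zhangsy@xxx.com"),
    (4, "zhangsz@xxx.com"),
    (5, "lisi@xxx.com"),
]


def count_row_lookups(rows, target, prefix_len=None):
    """Return (matching primary keys, number of primary-key lookups).

    prefix_len=None models a full-string index such as index1;
    prefix_len=6 models a prefix index such as email(6).
    """
    def key(email):
        return email if prefix_len is None else email[:prefix_len]

    # Secondary index entries, sorted like the leaf level of a B+ tree.
    index = sorted((key(email), pk) for pk, email in rows)
    table = dict(rows)  # stand-in for the primary key index

    probe = key(target)
    pos = bisect_left(index, (probe,))  # first entry whose value >= probe
    matches, lookups = [], 0
    while pos < len(index) and index[pos][0] == probe:
        pk = index[pos][1]
        lookups += 1                 # go back to the table for the full row
        if table[pk] == target:      # re-check the complete email value
            matches.append(pk)
        pos += 1
    return matches, lookups


TARGET = "zhangssxyz@xxx.com"
full_index = count_row_lookups(ROWS, TARGET)
prefix_6 = count_row_lookups(ROWS, TARGET, prefix_len=6)
prefix_7 = count_row_lookups(ROWS, TARGET, prefix_len=7)
```

With this sample data the full index and the 7-character prefix each need a single row lookup, while the 6-character prefix needs one lookup for every email that starts with 'zhangs'.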
7) What conclusion can be drawn from the above comparison?
A prefix index may cause a query to read more rows.
8) Is the prefix index really useless, then?
No. If index2 is defined as email(7) instead of email(6), only one record matches the prefix 'zhangss', so ID2 is found directly and only one row is scanned.
9) What should you consider when using a prefix index?
Choosing the length well: with a well-chosen prefix length you save space without adding much query cost.
10) When I want to create a prefix index on a string, how do I decide how long the prefix should be?
Look at selectivity: count how many distinct values the column has, then see how many distinct values each candidate prefix length keeps.
11) How do I count the distinct values in the column?
select count(distinct email) as L from SUser;
12) What should I do once I have that count?
Try prefixes of different lengths in turn and compare their distinct counts with L.
select count(distinct left(email,4)) as L4, count(distinct left(email,5)) as L5, count(distinct left(email,6)) as L6, count(distinct left(email,7)) as L7 from SUser;
A prefix index usually loses some selectivity, so decide in advance on an acceptable loss, for example 5%. Then, among L4~L7, pick the first value that is not less than L * 95%. That prefix length keeps at least 95% of the column's distinct values.
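The selection rule above can be sketched in Python on an in-memory list of values. This is only an illustration of the arithmetic; on a real table you would run the count(distinct ...) statements. The function name choose_prefix_length, the max_loss parameter, and the sample emails are assumptions made for this example.

```python
def choose_prefix_length(values, lengths, max_loss=0.05):
    """Return the shortest length in `lengths` whose prefixes keep at least
    (1 - max_loss) of the distinct values, or None if no candidate qualifies."""
    total = len(set(values))  # plays the role of L
    for n in sorted(lengths):
        distinct = len({v[:n] for v in values})  # plays the role of L4, L5, ...
        if distinct >= total * (1 - max_loss):
            return n
    return None


# Hypothetical sample values.
emails = [
    "zhangsan@xxx.com",
    "zhangssxyz@xxx.com",
    "zhangsy@xxx.com",
    "zhangsz@xxx.com",
    "lisi@xxx.com",
]

best = choose_prefix_length(emails, range(4, 8))
too_short = choose_prefix_length(emails, [4, 5, 6])
```

For these five emails, lengths 4 to 6 collapse every 'zhangs...' address into one prefix, so 7 is the first length that qualifies, and restricting the candidates to 4 to 6 yields no acceptable length.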
13) How does a prefix index affect covering indexes?
Consider the following SQL statement:
select id,email from SUser where email='zhangssxyz@xxx.com';
Compare it with the statement from the previous example:
select id,name,email from SUser where email='zhangssxyz@xxx.com';
The first statement only needs to return the id and email fields.
With index1 (the index on the entire email string), looking up email already yields the ID, so there is no need to go back to the table. This is a covering index.
With index2 (the email(6) index), the query has to go back to the primary key index to check the value of the email field.
14) Why not just change the definition of index2 to email(18)?
Even if index2 then holds every email in full, the system cannot tell whether the prefix definition truncated the value, so it still goes back to the table to verify.
All in all: with a prefix index, the covering-index optimization is no longer available to the query.
15) For fields such as email addresses, a prefix index may work well. But what should we do when the prefix is not selective enough, as with ID card numbers?
Use a longer prefix.
But the longer the prefix, the more disk space the index takes and the fewer index values fit on one page, which hurts query efficiency.
16) If the business only ever needs equality lookups by ID card number, is there another way to handle it?
The first method is reverse-order storage. Since the leading digits of ID card numbers are much the same, store the number reversed and query it like this:
select field_list from t where id_card = reverse('input_id_card_string');
Before relying on this, use the count(distinct) method to verify that a prefix of the reversed value is selective enough.
The second method is a hash field. Create an integer field on the table to hold a check code of the ID card number, and create an index on this field.
alter table t add id_card_crc int unsigned, add index(id_card_crc);
Each time you insert a new record, use the crc32() function to compute the check code and fill in the new field. Check codes can collide, that is, two different ID card numbers may produce the same crc32() result, so the where clause of the query must also check that the value of id_card is exactly the same.
select field_list from t where id_card_crc=crc32('input_id_card_string') and id_card='input_id_card_string';
In this way, the indexed value is only 4 bytes long (int type), which is much smaller than the original.
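A quick way to see why reversing helps is to compare how many distinct prefixes the original and the reversed values produce. The Python sketch below does this on a handful of made-up 18-digit strings that share their leading digits; the values and the helper name distinct_prefixes are hypothetical, and a real check would use count(distinct ...) on the table.

```python
def distinct_prefixes(values, n):
    """Number of distinct n-character prefixes among the values."""
    return len({v[:n] for v in values})


# Made-up numbers with identical leading digits (not real ID card numbers).
id_cards = [
    "110101199001011234",
    "110101199001025678",
    "110101199002039012",
    "110101199003043456",
]

# What the table would store under reverse-order storage.
stored = [s[::-1] for s in id_cards]

forward_6 = distinct_prefixes(id_cards, 6)  # prefix of the original values
reverse_6 = distinct_prefixes(stored, 6)    # prefix of the reversed values


def find(stored_values, input_id_card):
    """Equality lookup: reverse the input the same way the stored value was reversed."""
    return input_id_card[::-1] in set(stored_values)
```

In this sample a 6-character prefix of the original values cannot tell any two rows apart, while the same prefix length on the reversed values separates all four.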
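The hash-field lookup, including the extra equality check that guards against collisions, can be sketched in Python as follows. This is an in-memory model only: zlib.crc32 stands in for the crc32() function used in the SQL, and the class HashIndexedTable, its methods, and the sample values are hypothetical names chosen for this illustration.

```python
import zlib
from collections import defaultdict


def crc(value: str) -> int:
    """32-bit check code of a string (stand-in for the crc32() call in the SQL)."""
    return zlib.crc32(value.encode("utf-8"))


class HashIndexedTable:
    """Toy table with an index on the check code instead of on id_card itself."""

    def __init__(self, hash_fn=crc):
        self.hash_fn = hash_fn
        self.rows = {}                       # primary key -> row
        self.crc_index = defaultdict(list)   # check code -> primary keys
        self._next_pk = 1

    def insert(self, id_card, name):
        pk = self._next_pk
        self._next_pk += 1
        self.rows[pk] = {"id_card": id_card, "name": name}
        self.crc_index[self.hash_fn(id_card)].append(pk)
        return pk

    def find(self, id_card):
        # Step 1: narrow down by the small integer check code.
        candidates = self.crc_index.get(self.hash_fn(id_card), [])
        # Step 2: compare the full value, because different inputs can collide.
        return [self.rows[pk] for pk in candidates
                if self.rows[pk]["id_card"] == id_card]


table = HashIndexedTable()
table.insert("110101199001011234", "a")
table.insert("110101199001025678", "b")
hit = table.find("110101199001025678")

# Force every value into the same bucket to show why step 2 is needed.
colliding = HashIndexedTable(hash_fn=lambda s: 0)
colliding.insert("110101199001011234", "a")
colliding.insert("110101199001025678", "b")
still_exact = colliding.find("110101199001011234")
```

The second table uses a deliberately bad hash so that both rows collide, and the full-value comparison still returns only the requested row.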
17) What are the similarities and differences between reverse-order storage and a hash field?
Similarity: neither supports range queries.
An index built on a field stored in reverse order is sorted by the reversed string, so it cannot be used to find all citizens whose ID numbers fall in [ID_X, ID_Y]. Likewise, the hash field method can only support equality queries.
Differences:
In terms of extra space, reverse-order storage does not consume additional storage on the primary key index, while the hash field method needs an extra field. Of course, a 4-byte prefix is unlikely to be enough for reverse-order storage, and with a somewhat longer prefix its cost is roughly the same as that of the extra hash field.
In terms of CPU consumption, reverse-order storage needs an extra call to the reverse function on every read and write, while the hash field method needs an extra call to the crc32() function. Judging only by the computational complexity of the two functions, reverse consumes less additional CPU.
In terms of query efficiency, the hash field method is more stable. Values computed by crc32 can collide, but the probability is very small, so the average number of rows scanned per query can be considered close to 1. Reverse-order storage still relies on a prefix index, so it can still increase the number of rows scanned.
Example: suppose you maintain a school's student information database. Student login names share one format, the student number followed by "@gmail.com". The student number has fifteen digits: the first three are the city number, the fourth to the sixth are the school number, the seventh to the tenth are the year of admission, and the last five are a sequential number.
18) When logging in, students must enter their login name and password, which are verified before they can continue to use the system. Considering only login verification, how would you design the index on the login name?
If a school takes in an estimated 20,000 new students per year, 50 years produce only 1 million records, so there is little space to save. Index the full field directly. This avoids the development risk and limitations of converting the value. If the data volume ever becomes large, switch to one of the two methods above.
In practice a direct full-field index is enough, because the data volume and query load of a single school's database will not be that large. Looking only at how the table could be optimized:
1. The suffix @gmail.com can be stored in a separate field or guaranteed by the business code.
2. The city number and school number are unlikely to change, so they can also be handled by the business code.
3. Then store only the year and the sequential number, and index that field in full.
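The third option, keeping only the year and sequential number, could look like the following Python sketch in the business code. It assumes the suffix and the six-digit city and school prefix are constants known to the application; the constant values and the function names login_to_key and key_to_login are made up for this example.

```python
SUFFIX = "@gmail.com"
CITY_SCHOOL = "010123"  # hypothetical: 3-digit city number + 3-digit school number


def login_to_key(login: str) -> int:
    """Strip the fixed parts of a login name and return year + sequence as an integer."""
    if not login.endswith(SUFFIX):
        raise ValueError("unexpected suffix")
    number = login[: -len(SUFFIX)]
    if len(number) != 15 or not (number.isascii() and number.isdigit()):
        raise ValueError("student number must be 15 digits")
    if number[:6] != CITY_SCHOOL:
        raise ValueError("unexpected city or school number")
    # Remaining 9 digits: 4-digit year of admission + 5-digit sequential number.
    return int(number[6:])


def key_to_login(key: int) -> str:
    """Rebuild the full login name from the stored integer."""
    return f"{CITY_SCHOOL}{key:09d}{SUFFIX}"


key = login_to_key("010123201800042@gmail.com")
login = key_to_login(key)
```

A nine-digit decimal value is always below 2^32, so the stored key fits in a 4-byte unsigned integer column that can be indexed in full.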
Thank you for reading! This is the end of the article on how MySQL indexes string fields. I hope it has been of some help. If you found the article useful, feel free to share it so more people can see it.