This article explains how to improve the performance of a back-end interface, walking through MySQL slow queries, ElasticSearch, and HBASE. The methods introduced are simple, fast, and practical.
1. What makes a MySQL query slow?
Most Internet application scenarios read far more than they write, and the business logic leans toward the write path; the requirement for reads is simple: be fast. So what does it take to produce a truly outstanding slow query?
1.1 Indexes
When the data volume is not very large, most slow queries can be fixed with indexes; conversely, most slow queries are caused by unreasonable indexes.
MySQL indexes are based on the B+ tree, which you have probably memorized for interviews, along with the follow-up questions about the leftmost prefix, the B+ tree, and trees of every other kind.
The leftmost prefix is really the usage rule for composite indexes. A reasonable composite index can improve query speed significantly. Why?
Because of index condition pushdown. If the query condition is covered by the composite index, for example a composite index (a, b), then after locating the records that satisfy a, MySQL checks whether b is satisfied inside the index itself, reducing the number of table returns (lookups back to the primary key). And if the queried columns all happen to be contained in the composite index, the index covers the query and no table return is needed at all. Most developers know these rules and apply them in development. The more interesting question is: why is a query still slow even with an index?
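To make these rules concrete, here is a minimal sketch against a hypothetical table t (all names are invented for illustration):

-- Hypothetical table; idx_a_b is the composite index (a, b) discussed above.
CREATE TABLE t (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a  INT,
  b  INT,
  c  VARCHAR(32),
  INDEX idx_a_b (a, b)
);

-- Index condition pushdown: the filter on b is checked inside idx_a_b,
-- cutting down the number of lookups back to the primary key.
-- EXPLAIN typically shows "Using index condition".
SELECT * FROM t WHERE a = 1 AND b > 10;

-- Covering index: a and b are both contained in idx_a_b, so no table
-- return happens at all. EXPLAIN typically shows "Using index".
SELECT a, b FROM t WHERE a = 1;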
1.1.1 What causes an index to fail
If a query is slow despite an index, it is mostly because the index has failed (is not being used), which you can analyze with EXPLAIN. Common causes of index failure are:
Using !=, or an expression or function, on the left side of a WHERE comparison
A LIKE pattern that starts with %
A string value written without quotes ''
A field with very low selectivity, such as gender
Not matching the leftmost prefix of a composite index
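Hedged sketches of each failure mode, reusing the hypothetical table t and assuming an added index idx_c on the string column c:

-- Assumed for illustration: ALTER TABLE t ADD INDEX idx_c (c);

-- Function or expression on the left side: idx_c cannot be used.
SELECT * FROM t WHERE LENGTH(c) = 6;

-- LIKE starting with %: the B+ tree's ordering cannot help.
SELECT * FROM t WHERE c LIKE '%abc';

-- Missing quotes: c is implicitly cast to a number, same effect as a function.
SELECT * FROM t WHERE c = 12345;     -- index fails
SELECT * FROM t WHERE c = '12345';   -- index works

-- Leftmost prefix not matched: idx_a_b (a, b) cannot serve a filter on b alone.
SELECT * FROM t WHERE b = 10;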
Mature as it is, MySQL has its own reasons for why each of these practices defeats the index.
1.1.2 Why these causes make the index fail
If you want MySQL to give a reason, it's the same B+ tree.
Function operations
When you use an expression or function on the left side of an equality in WHERE, for example column a is an indexed string and the query is WHERE LENGTH(a) = 6, the index tree is ordered by a, not by LENGTH(a). Trying to descend the tree with the value 6, MySQL is lost at the very first level.
Implicit conversion
Implicit type conversions and implicit character-encoding conversions cause the same problem.
Implicit type conversions generally do not occur with strongly typed frameworks such as jOOQ.
Implicit character-encoding conversions tend to appear in join queries where the joined columns have the same type but different character encodings: MySQL wraps one side in a conversion, which is exactly the function-on-column case above.
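A sketch of the character-set case, with two invented tables whose code columns differ only in charset:

-- u.code is utf8 and m.code is utf8mb4 (hypothetical tables). To compare
-- them, MySQL converts the utf8 side up, effectively running
-- CONVERT(u.code USING utf8mb4) = m.code, which defeats the index on u.code.
SELECT * FROM t_utf8mb4 m JOIN t_utf8 u ON u.code = m.code;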
Either way, the ordering that the B+ tree depends on is destroyed, so the index cannot be used.
Fields with very low selectivity, such as gender, are a different matter: there the index fails for a different reason.
1.1.3 Why not index a gender field
Why are fields with low selectivity not worth indexing? A blind guess says it is inefficient, and it is indeed inefficient, sometimes no better than no index at all.
A non-clustered index means table returns. Say the table has 100 rows, you index the gender field, the scan hits 51 males, and you then go back to the table 51 times to fetch the rows. You might as well have done a single full table scan.
So the InnoDB optimizer abandons the index in this scenario. As for how low selectivity has to be: roughly, once one value accounts for about 30% of the rows, the optimizer tends to give up the index on that field. Try it yourself if you are interested.
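A quick way to see this for yourself (hypothetical column and index names):

-- With only two distinct values, the optimizer will usually ignore idx_gender;
-- EXPLAIN tends to report type = ALL (a full table scan) anyway.
ALTER TABLE t ADD COLUMN gender TINYINT, ADD INDEX idx_gender (gender);
EXPLAIN SELECT * FROM t WHERE gender = 1;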
1.1.4 Are there simple, useful indexing rules?
As mentioned above, most slow queries come down to indexes. So how do you build and use indexes well? Here are some simple rules.
Index condition pushdown: the gender field alone is not worth indexing, but what if there is a query scenario for it? If it is a multi-condition query, you can build a composite index and let pushdown do the filtering.
Covering index: also a matter of composite indexes; when the information the query needs is already contained in the index, there is no table return.
Prefix index: for strings, you can index only the first N characters to avoid unnecessary overhead (see the sketch after this list). If you truly need keyword search, it is probably better handed to something more suitable, such as ES.
Do not apply function operations to indexed fields.
For tables that clearly write more than they read, or fields that are updated frequently, weigh the maintenance cost of the index.
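A minimal prefix-index sketch on the hypothetical column c, checking selectivity before choosing a length:

-- How selective are the first 8 characters? A ratio near 1.0 means the
-- prefix discriminates almost as well as the full value.
SELECT COUNT(DISTINCT LEFT(c, 8)) / COUNT(*) AS selectivity FROM t;

-- Index only the first 8 characters of c. Note a prefix index can never
-- cover a query, since the index does not hold the full value.
ALTER TABLE t ADD INDEX idx_c_prefix (c(8));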
1.1.5 How to tell that MySQL chose the wrong index
Sometimes you build an index that looks right at first glance, but things do not go according to plan, as in: "why is this query still slow when XXX is indexed?"
At that moment you should perhaps be confident: there cannot be a bug in my code; something must be wrong with MySQL. And indeed, something may be wrong with MySQL.
This commonly happens when you have many indexes and many query conditions: instead of the index you expect, the optimizer picks a low-selectivity one and scans far too many rows. There are basically two causes, each with a fix (both sketched after this list):
Inaccurate statistics: run ANALYZE TABLE x to re-collect them.
Optimizer misjudgment: force the choice with FORCE INDEX, or rewrite the statement to guide the optimizer, or add/remove indexes to steer around it.
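Sketches of both fixes, again on the hypothetical table t:

-- Fix 1: re-collect the statistics the optimizer bases its choice on.
ANALYZE TABLE t;

-- Fix 2: force the index you believe is right (use sparingly; it
-- hard-codes a choice the optimizer can no longer revisit).
SELECT * FROM t FORCE INDEX (idx_a_b) WHERE a = 1 AND b > 10;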
But in my shallow experience, the more likely explanation is that you have built some unnecessary indexes. Nobody really thinks MySQL is less smart than they are, right?
Beyond the index-related causes above, there are some less common causes that are harder to diagnose.
1.2 Waiting for MDL locks
The MDL (metadata lock) was introduced in MySQL 5.5: CRUD statements on a table automatically take an MDL read lock, while changes to the table structure take an MDL write lock. Read locks and write locks are mutually exclusive, as are write locks with each other.
When a statement holds or requests the MDL write lock, it blocks the MDL read locks behind it. Use the SHOW PROCESSLIST command to spot statements in the Waiting for table metadata lock state (see the sketch below).
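A sketch of the classic blocking chain, played out across three sessions on the hypothetical table t:

-- Session 1: a long transaction quietly holds the MDL read lock on t.
BEGIN;
SELECT * FROM t LIMIT 1;          -- MDL read lock held until COMMIT

-- Session 2: DDL needs the MDL write lock and queues behind session 1.
ALTER TABLE t ADD COLUMN d INT;

-- Session 3: even a plain SELECT now queues behind the blocked ALTER,
-- showing "Waiting for table metadata lock".
SELECT * FROM t LIMIT 1;

-- Diagnose from any session:
SHOW PROCESSLIST;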
1.3 Waiting for flush
Flush itself is fast. When it appears slow, it is mostly because the flush command is blocked by some other statement and in turn blocks the SELECTs behind it. In SHOW PROCESSLIST you will find them in the Waiting for table flush state.
1.4 Waiting for row locks
Some transaction holds a write lock on the row and has not committed (see the sketch below).
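A two-session sketch, plus the diagnostic view available from MySQL 5.7 onward:

-- Session 1: takes a row write lock and never commits.
BEGIN;
UPDATE t SET b = b + 1 WHERE id = 1;

-- Session 2: blocks on the same row until session 1 commits
-- or innodb_lock_wait_timeout expires.
UPDATE t SET b = b + 1 WHERE id = 1;

-- MySQL 5.7+: see who is blocking whom.
SELECT * FROM sys.innodb_lock_waits;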
1.5 Current reads
InnoDB's default isolation level is repeatable read. Imagine this scenario: transaction A opens a transaction (and a consistent snapshot), then transaction B performs a large number of updates and commits first. When A later reads the row, its snapshot read must apply undo log records one by one, all the way back to the value as it was before transaction B; a current read, by contrast, sees the latest committed version immediately (see the sketch below).
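A sketch of the scenario, contrasting the slow snapshot read with the instant current read:

-- Session A: open a consistent snapshot and hold it.
START TRANSACTION WITH CONSISTENT SNAPSHOT;

-- Session B: commit a large number of updates to the same row
-- (imagine this statement repeated many thousands of times).
UPDATE t SET b = b + 1 WHERE id = 1;

-- Back in session A: the snapshot read must walk the undo chain past
-- every version B created, so it can be surprisingly slow.
SELECT b FROM t WHERE id = 1;

-- The current read returns immediately: it reads the latest version,
-- at the cost of taking a lock (FOR SHARE in MySQL 8.0).
SELECT b FROM t WHERE id = 1 LOCK IN SHARE MODE;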
1.6 Large tables
On an unmodified MySQL, a table with hundreds of millions of rows is certainly a large table. Even with indexes and queries handled well, frequent aggregation can run into IO or CPU bottlenecks, and even simple queries slow down.
Each B+ tree node in InnoDB is a 16 KB page; at a tree height of 3, a table can theoretically hold roughly 20 million rows. innodb_buffer_pool caches table and index pages, and when the index data is large, the cache hit rate suffers. The buffer pool also evicts pages with an LRU algorithm, so with too much data, queries against old or cold data can crowd out the hot data.
So the common optimizations for large tables are sharding (splitting databases and tables) and read-write separation.
1.6.1 Splitting databases and tables
Scheme
Should you split databases or split tables? That requires case-by-case analysis.
If the bottleneck is disk or network IO, split databases and tables vertically.
If the bottleneck is CPU, that is, query efficiency is low, split tables horizontally.
Horizontal splitting splits the data itself: the original rows are spread across more database tables.
Vertical splitting divides databases by business domain and splits tables by fields.
Tools include Sharding-Sphere, TDDL, and Mycat. To get started, you estimate the number of databases and tables, define the sharding rule and pick the sharding key, then develop and migrate the data, and plan for later expansion.
Problems
In practice, writing is not the big problem; the main issues are unique ID generation, queries that do not carry the sharding key, and capacity expansion.
There are many ways to generate unique IDs: database auto-increment, Snowflake, number segments, a whole wave of GUID-style algorithms, and so on.
Queries without the sharding key are usually solved with a mapping table, and a covering index on the mapping table keeps the lookup fast; alternatively, combine with another kind of DB.
Capacity expansion depends on the sharding strategy: with range sharding it is trivial, while with mod (hash) sharding data has to be migrated. A mod-plus-range scheme, taking the mod first and then the range, can avoid a certain amount of data migration (sketched below).
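A sketch of the mod-then-range idea with invented numbers (2 clusters, 10-million-row range buckets):

-- Route an id: mod picks the cluster (even write spread),
-- range picks the table inside that cluster. Growth only ever appends
-- a new range bucket, so old rows never move.
SELECT 1234567 % 2               AS cluster_no,
       FLOOR(1234567 / 10000000) AS range_bucket;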
And of course, once you split databases you also face problems such as transactional consistency and cross-database joins.
1.6.2 Read-write separation
Why separate reads and writes?
Splitting tables addresses the CPU bottleneck of large tables, and splitting databases addresses the IO bottleneck; both relieve storage pressure. But query pressure is another matter.
If the QPS reaching the DB is still high, and reads far outnumber writes, consider read-write separation: spread the read load across replicas in a master-slave setup to avoid overloading a single machine, while also gaining high availability and load balancing.
Problems
The main problems are stale reads and the routing mechanism.
A stale read, that is, reading from a replica before it has caught up, is the master-slave delay problem.
The routing mechanism decides whether a statement goes to the master or a slave: you can route in code by statement type, or use middleware.
1.7 Summary
The above lists the common causes of slow queries in MySQL and how to handle them, and introduces the common approaches for big-data scenarios.
Sharding and read-write separation target big-data and high-concurrency scenarios, and they also improve the stability and scalability of the system. But not every problem is best solved this way.
2. How to evaluate ElasticSearch
It was mentioned earlier that ES can be used for keyword queries. Then let's talk about ES.
2.1 What can it do?
ES is a near-real-time distributed search engine built on Lucene. Use cases include full-text search, NoSQL JSON document storage, log monitoring, and data collection and analysis.
For those outside data engineering, the common uses are full-text search and logging. ES is often combined with Logstash and Kibana, together known as ELK. Let's look at how the logging case works.
Here is a retrieval operation from our log system: open Kibana and enter a query such as "xxx" on the Discover page.
The same operation can be performed in the Dev Tools console with:
GET yourIndex/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "match_phrase": {
      "log": "xxx"
    }
  }
}
What does it mean?
2.2 The structure of ES
Type has been deprecated since 7.0, so for now think of an index as a table.
You can view some basic information in the Dev Tools console with the following commands (or their curl equivalents).
GET /_cat/health?v&pretty: check cluster health
GET /_cat/shards?v: view shard status
GET yourindex/_mapping: the index's mapping structure
GET yourindex/_settings: the index's settings structure
GET /_cat/indices?v: view all index information on the current node
The key points are the mapping and the settings. The mapping can be understood as the table-structure definition in MySQL, while the settings control the number of shards and replicas.
In the mapping below, note the two field types: text is analyzed (split into terms), while keyword is not.
"*": {"mappings": {"doc": {"properties": {"appname": {"type": "text", "fields": {"keyword": {"type": "keyword" "ignore_above": 256}
2.3 Why are ES queries fast?
What does "analyzed" (word segmentation) mean? It becomes clear once you see how ES builds its index.
ES is built on the inverted index. What does that mean? A traditional (forward) index uses the document ID as the key and the content as the record; the inverted index turns that around.
The figure below is a schematic of the ES inverted index, which consists of the Term Index, the Term Dictionary, and the Posting List.
The "Ada" and "Sara" in the figure are called terms; they are the tokens produced by analysis. If you removed the Term Index from the figure, wouldn't it look a bit like MySQL? The Term Dictionary plays the role of a secondary index, but MySQL keeps its index on disk, so retrieving a term costs several random disk accesses.
ES adds another layer on top of the Term Dictionary: the Term Index, held in memory as an FST (finite state transducer) that stores term prefixes, so it can quickly locate a term's offset inside the Term Dictionary. The FST form and the Term Dictionary's block storage are also very economical with memory and disk space.
At this point you can see why it is fast: with the Term Index in memory, ES has built one more level of index on top of the index of terms, the Term Dictionary.
However, this is not to say that any query in ES is faster than MySQL. Retrieval is roughly divided into two categories.
2.3.1 Retrieval after word segmentation
The ES index stores the sorted results of analysis. Take the "Ada" in the figure: in MySQL, LIKE '%da%' scans the whole table, but ES can locate it quickly.
2.3.2 Exact retrieval
Here the difference is actually not large, because the Term Index's advantage disappears; it is only used to find the term's place in the Term Dictionary. MySQL may even be faster, since a covering index needs no table return.
2.4 When to use ES
So, as raised earlier: in which business query scenarios is ES appropriate? I think there are two.
2.4.1 Full-text search
Fuzzy matching on string columns by keyword is a disaster in MySQL, but a piece of cake for ES. A concrete scenario: fuzzy search of message content in a message table, i.e. chat-record queries (see the sketch below).
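A sketch of the MySQL side of that disaster, on a hypothetical message table:

-- A leading wildcard cannot use the B+ tree; EXPLAIN tends to show
-- type = ALL, i.e. a full table scan over every message ever written.
EXPLAIN SELECT * FROM message WHERE content LIKE '%keyword%';
-- The equivalent ES match_phrase query (see section 2.1) walks the
-- inverted index instead of scanning rows.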
Note, however, that if you need search-engine-style keyword queries rather than log-style phrase matching, you have to deal with Chinese word segmentation; ik is the most widely used analyzer. Installing the ik analyzer is not covered here.
What is the difference?
Word segmentation
In the log-style query above, typing "I am such a smart guy" in quotes gets you exact phrase matches only.
Remove the quotes and you get everything that matches any single character: "I", "can", "true"… which obviously returns far too much, and does not respect Chinese semantics either. The segmentation you actually expect is more like "I", "can", "really", "smart", with the query matched against those tokens.
The cause is that ES's default analysis is unfriendly to Chinese: it relies on the spaces that separate English words, and Chinese has none, so the text degenerates into single characters. This is also one reason the Chinese search experience in much foreign software is not great.
For this problem, you can use the following command in console to test the word segmentation effect of the current index.
POST yourindex/_analyze
{
  "field": "yourfield",
  "text": "I am such a smart guy"
}
2.4.2 Combined queries
If the data volume is large enough and the table has enough fields, it is unreasonable to throw all the field information into ES to build an index. And with MySQL alone you are back to the sharding and read-write separation described above. So why not combine them?
1. ES + MySQL
Put the id plus the fields that participate in queries into ES, analyzed properly; keep the complete record in MySQL and fetch it quickly by id (see the sketch below).
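A sketch of the second step, assuming ES has already returned the matching ids (all values invented):

-- Step 1 happens in ES: the full-text query returns ids, say 101, 205, 309.
-- Step 2 in MySQL is then a cheap primary-key point lookup:
SELECT * FROM message WHERE id IN (101, 205, 309);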
2. ES + HBASE
If you want to avoid sharding altogether, perhaps you can drop MySQL and pick a distributed database such as HBASE instead. With this kind of NoSQL store the capacity is massive, scaling out is easy, and lookups by rowkey are very fast.
Both ideas follow the classic scheme of separating the index layer from the data store.
Of course, the bigger the operation, the more can go wrong: with ES as the index layer you also have to think about data synchronization, timeliness, mapping design, high availability, and so on.
After all, compared with a simple log system, the log can wait, but the user cannot.
2.5 Summary
This section briefly explained why ES is fast and where that speed is useful. Now you can open the Kibana console and try it out.
If you want to use it from a Java project, Spring Boot has you covered: with the ES environment in place it is essentially out of the box; add the dependency and basic CRUD is fully supported.
3. HBASE
HBASE was mentioned above, so what is HBASE? Given the length of this article, only a brief sketch follows.
3.1 Storage structure
Relational databases such as MySQL store data row by row:

Name | Primary school    | Middle school    | University
Li   | XX Primary School | YY Middle School | NULL

HBASE stores by column (strictly, by column family). Stored by column, the table becomes:

Name | School
Li   | XX Primary School
Li   | YY Middle School
The figure below shows an actual HBASE table model. [Figure: HBASE table model structure]
The Row key is the primary key, sorted in lexicographic order. The TimeStamp is the version number. info and area are both column families, which cut the table vertically. name and age are columns (column qualifiers); each belongs to a column family and can be added dynamically. A Cell is a specific value.
3.2 OLTP and OLAP
Data processing falls into two categories: online transaction processing (OLTP) and online analytical processing (OLAP).
OLTP is the main workload of traditional relational databases: basic, day-to-day transaction processing.
OLAP is the main workload of data warehouse systems: it supports complex analysis, focuses on decision support, and provides intuitive, easy-to-understand query results.
Column-oriented storage suits OLAP, and row-oriented storage suits OLTP. But HBASE is not an OLAP system: it has no transactions, and its orientation is really per column family. Generally speaking, few people use HBASE for OLAP.
3.3 RowKey
Whether an HBASE table design is good or not depends largely on the RowKey design, because HBASE supports only three query modes:

1. Single-row lookup by Rowkey
2. Range scan by Rowkey
3. Full table scan
You can see that HBASE does not support complex queries.
3.4 Usage scenarios
HBASE is not suited to real-time ad-hoc queries. It fits write-intensive scenarios: it has fast write throughput, and single-row or small-range reads are fine, though only by rowkey. Its performance and reliability are high, and there is no single point of failure.
At this point, you should have a deeper understanding of how to improve the performance of a back-end interface. You might as well put it into practice.