This article, "NoSql application optimization methods in ElasticSearch", walks through how a search engine can serve as a NoSql store. The content is detailed, the steps are clear, and it should have some reference value; I hope you get something out of it.
Turning to NoSql applications: the process of fetching data from a search engine usually goes like this. First, matching the query terms against the inverted index yields a result set containing only document ids; then each id is looked up in the forward index to fetch the corresponding document fields; finally the results are returned. The advantages of this design are:
The inverted index can be kept as small as possible, which protects IO performance.
Document ids are assigned and maintained by the search engine itself, with no reliance on external mappings. This separates document id from document content, so the content can scale out with new fields, NoSql-style.
The original document content travels with the search results: a single query returns everything the front end needs for display (both ordering and content), eliminating the IO overhead of a second fetch from db.
In this way, a search engine really can take over part of the db's work and act as a second db (NoSql).
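To make the flow concrete, here is a minimal sketch (assuming the official elasticsearch Python client in its 8.x style, against a hypothetical products index): one search call returns ids, scores for ordering, and the stored document content, so no second trip to db is needed.

```python
from elasticsearch import Elasticsearch

# Hypothetical local cluster and index; all names here are illustrative.
es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="products",                      # hypothetical index name
    query={"match": {"title": "laptop"}},  # the match runs against the inverted index
    size=10,
)

# Each hit already carries the id, the score (for sorting) and the stored
# content (_source, from the forward index), so the front end can render
# results without an extra fetch from db.
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_score"], hit["_source"])
```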
This time, let's briefly walk through the typical NoSql application scenarios for search engines:
1. Business wide table
A business wide table is probably the most common NoSql application: several business tables stored independently in db are joined into one large intermediate table, so that complex fetch logic collapses into a single query. That looks very attractive, so why not simply store all these business fields in one db table in the first place? Roughly for the following reasons:
As a product grows from small to large, business lines split, and the corresponding db tables inevitably split with them to keep development and maintenance decoupled.
If a table holds a large amount of data, ddl operations become expensive (they lock the table), so new business requirements (new fields) have to be met by creating new tables to avoid the business interruption a table lock would cause.
Everything has two sides: splitting tables solves the problems above but brings new trouble. If one piece of business logic depends on several business tables at once, a single data exchange now requires multiple db operations (complex fetch logic), and sorting on a field across them requires join operations (more pressure on db).
To simplify the logic and reduce pressure on db, you can build a business wide table outside the business tables and store it in ES. Even with a large data volume (billions of rows), fields can still be added quickly without any table-locking operation, and these NoSql characteristics are a natural fit for fast-moving business development.
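As a minimal sketch of this flexibility (same assumptions as above: the 8.x Python client and a hypothetical orders_wide index), adding a new business field is a lightweight mapping update rather than a blocking ddl:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "orders_wide" is a hypothetical wide table joining order/user/item fields.
# Unlike an ALTER TABLE in db, a mapping update does not lock or rewrite
# the existing data.
es.indices.put_mapping(
    index="orders_wide",
    properties={"coupon_code": {"type": "keyword"}},  # the new business field
)

# New or updated documents can carry the field immediately; older documents
# simply lack it -- the NoSql-style schema flexibility described above.
es.index(index="orders_wide", id="1001", document={
    "order_id": "1001",
    "user_name": "alice",
    "coupon_code": "SPRING50",
})
```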
Tips: a search engine's response time is generally in the 0-100ms range, though ES occasionally shows second-level rt because of gc, so applications need to evaluate their own sensitivity to engine response time (rt).
2. Big data exchange / storage
Sometimes the result of an offline computation is very large (for example, a batch of potential customers selected by various consumption rules) and is needed by various online query services. With tens of millions of rows, importing directly into db may seriously affect online business, while a traditional big data store such as HBase is weak on secondary indexes. ES can absorb a fast import of tens of millions of offline rows and serve online queries as soon as the import completes, which makes it a good fit for this scenario.
Another typical big data storage scenario is log storage (ELK). Online business log output is generally staggering; it is a classic write-heavy, read-light workload that demands strong write performance together with strong search and matching capability. ES is a suitable carrier here as well.
Tips: in this scenario the application needs to control the write rate, to prevent the engine from becoming unresponsive for long stretches due to merges or garbage collection, and the cluster should be physically isolated from clusters serving online business.
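A sketch of a rate-controlled import, assuming the elasticsearch Python client's bulk helpers and a hypothetical potential_customers index (the throttle values are illustrative and would be tuned against real cluster metrics):

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def offline_rows():
    """Stand-in for an offline result set, e.g. computed potential customers."""
    for i in range(10_000_000):
        yield {
            "_index": "potential_customers",
            "_id": i,
            "_source": {"customer_id": i, "score": 0.5},
        }

# streaming_bulk sends fixed-size chunks and yields one result per document.
# Pausing periodically is a crude but effective way to cap the write rate so
# segment merges and gc on the engine side can keep up.
indexed = 0
for ok, _ in helpers.streaming_bulk(es, offline_rows(), chunk_size=1000):
    indexed += ok
    if indexed % 100_000 == 0:
        time.sleep(1)  # illustrative throttle
```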
3. Enhanced keyword matching
Although db (mysql) also has full-text indexing capability, expensive db resources are a poor fit for full-text search scenarios. To provide full-text search over millions of rows, a few vm are enough for a search engine to run with ample performance. In such scenarios, the search engine serves as a cheap storage resource with full-text retrieval capability.
Tips: when used as a storage resource, note that search engines provide "near real-time" query service: newly written data often becomes visible only after a few seconds or even minutes. Applications need to evaluate their sensitivity to data freshness; this scenario is not recommended for over-sensitive businesses.
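A small sketch of both points, full-text matching and near-real-time visibility, under the same client assumptions (the articles index is hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Write a document, then search for it. Because ES is "near real-time",
# the document may not be visible until the next refresh (~1s by default).
es.index(index="articles", document={"body": "elasticsearch full-text search"})

# Forcing a refresh makes it visible immediately -- acceptable in a test,
# but too expensive to do per write in production.
es.indices.refresh(index="articles")

resp = es.search(index="articles",
                 query={"match": {"body": "full-text search"}})
print(resp["hits"]["total"])
```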
4. External index
Take HBase as an example. It offers cheap and powerful big data storage, automatically splitting data files while keeping read and write performance stable. But to provide stable online query capability, HBase rowkey design must be very careful; rebuilding rowkeys over a large data set is a costly operation, and HBase natively lacks secondary indexes. To keep HBase queries both flexible and stable, the best approach is to build a secondary index externally: you get the powerful retrieval capability of the search engine on top of HBase's stable storage, and even if the external index has to be rebuilt, you only need to switch query traffic over once the new index is ready.
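A sketch of the query pattern (assuming the elasticsearch Python client plus the happybase HBase client; index, table, and field names are hypothetical): the flexible search runs against ES, which stores only the searchable fields plus the rowkey, and the stable point reads then go to HBase.

```python
import happybase  # assumed Thrift-based HBase client
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
hbase = happybase.Connection("hbase-thrift-host")  # hypothetical host

# Step 1: the flexible query hits the external secondary index in ES.
resp = es.search(index="user_profile_idx",
                 query={"match": {"city": "hangzhou"}},
                 size=100)
rowkeys = [hit["_source"]["rowkey"].encode() for hit in resp["hits"]["hits"]]

# Step 2: stable point reads fetch the full records from HBase by rowkey.
table = hbase.table("user_profile")
for key, data in table.rows(rowkeys):
    print(key, data)
```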
Tips: keeping the data on both sides consistent is the common problem in this scenario. Without a robust double-write mechanism, a reasonable compensation mechanism is the more appropriate way to guarantee consistency.
5. Online statistics
ES has indeed put a lot of effort into aggregate queries; from version 1.x through 5.x the aggregation features have improved continuously. Aggregations provide statistical query capability to a certain extent, aimed mainly at log analysis such as ELK, but the main limitations are:
Only top-n results are provided; you cannot page through aggregation buckets.
With a large data volume (100 million rows) and frequently changing data, queries often take seconds.
Therefore, for a business that is not very rt-sensitive and that cannot be served by online db queries, you can also use ES for "online" statistical queries, provided the defects above are understood. It is of course recommended (see the sketch after this list) to:
Keep the data update frequency as low as possible: frequent updates force ES to reopen its index searcher repeatedly, which raises io pressure and leads to query timeouts.
Keep the amount of data per index from growing too large (split indices).
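A sketch of such an "online" statistic under the same client assumptions (app_logs and error_code are hypothetical): a terms aggregation returning the top-n buckets, which also illustrates the no-paging limitation above.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Top-n statistic: the 10 most frequent error codes in a log index.
# Note the limitation from the text: you get the top buckets only,
# with no way to page through the rest.
resp = es.search(
    index="app_logs",
    size=0,  # skip the matching documents, keep only the aggregation
    aggs={"top_errors": {"terms": {"field": "error_code", "size": 10}}},
)

for bucket in resp["aggregations"]["top_errors"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```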
That covers the content of "NoSql application optimization methods in ElasticSearch". I hope what was shared here is helpful to you.