What it is like to use index tables and ES

2025-04-27 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article shares practical experience with index tables and ES (Elasticsearch) in a system whose database is sharded.

In e-commerce projects, the physical inventory system is extremely important: once an order is paid, it starts to reserve physical inventory. The inventory system usually needs to shard its database, because its workload is dominated by writes such as reserving, releasing, and cancelling inventory, and sharding spreads that write pressure across several databases. Reads still occur, though. For example, before reserving inventory you have to check whether any is available, and such queries do not always include the sharding key (the field used to route a request to a specific database); they often use looser conditions instead. The rows that match those conditions may be spread across different databases. To make such queries practical, an index table is built. It is stored in a separate database that is not sharded.

For example, suppose products are queried by price alone, a field that is not the sharding key.

Without an index table, you would have to query every database and merge the results, which is cumbersome. You could add an index on price in each database to speed up the query, but queries may also filter on other conditions, such as whether the product has been taken off the shelves, or its category. Indexing every possible query condition on every shard is too expensive to be reasonable. An index table changes the picture.

First look up the primary key IDs in the index table using the query conditions, then fetch the rows from the sharded databases by primary key ID. Whatever the query conditions are, the sharded databases are only ever queried by primary key ID. The index table can be designed in more than one way; there are roughly three designs.

1. Query field + database primary key

The index table stores the query fields together with the primary key ID of the corresponding database row. When a query arrives, the matching primary keys are found from the query conditions, and each primary key is then used to route to the shard that holds the complete business data. The advantage of this design is that the index table takes up little space and can last a long time before it outgrows its database. The drawback is that fetching the business data still requires routing to the shards. This design also does not work if you want to store the index table's data in the ES search engine, because the index table contains none of the business data that external systems need. For these reasons, the inventory system did not use this design.
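A minimal sketch of this first design, using in-memory SQLite databases as stand-ins for the shards and for the index database. The table names, the columns, and the `id % NUM_SHARDS` routing rule are assumptions made for illustration; the article does not describe the real schema or sharding rule.

```python
import sqlite3

NUM_SHARDS = 2  # assumption: two shards, routed by id % NUM_SHARDS


def shard_for(item_id: int) -> int:
    """Hypothetical routing rule: the primary key decides the shard."""
    return item_id % NUM_SHARDS


def build_demo():
    """Create in-memory shard databases plus a separate, unsharded index database."""
    shards = [sqlite3.connect(":memory:") for _ in range(NUM_SHARDS)]
    for db in shards:
        db.execute(
            "CREATE TABLE inventory "
            "(id INTEGER PRIMARY KEY, name TEXT, price INTEGER, qty INTEGER)"
        )
    index_db = sqlite3.connect(":memory:")
    # Design 1: query fields + primary key only, no display fields.
    index_db.execute(
        "CREATE TABLE inventory_index "
        "(id INTEGER PRIMARY KEY, price INTEGER, status TEXT)"
    )
    index_db.execute("CREATE INDEX idx_price ON inventory_index (price)")
    rows = [(1, "mug", 500, 10), (2, "lamp", 2500, 3),
            (3, "pen", 150, 80), (4, "desk", 9900, 1)]
    for item_id, name, price, qty in rows:
        shards[shard_for(item_id)].execute(
            "INSERT INTO inventory VALUES (?, ?, ?, ?)",
            (item_id, name, price, qty),
        )
        index_db.execute(
            "INSERT INTO inventory_index VALUES (?, ?, ?)",
            (item_id, price, "on_shelf"),
        )
    return shards, index_db


def find_by_max_price(shards, index_db, max_price):
    # Step 1: the index table turns a non-sharding-key condition into primary keys.
    ids = [r[0] for r in index_db.execute(
        "SELECT id FROM inventory_index WHERE price <= ? ORDER BY id",
        (max_price,),
    )]
    # Step 2: each primary key routes to exactly one shard for the full row.
    results = []
    for item_id in ids:
        row = shards[shard_for(item_id)].execute(
            "SELECT id, name, price, qty FROM inventory WHERE id = ?",
            (item_id,),
        ).fetchone()
        if row is not None:
            results.append(row)
    return results
```

Calling `find_by_max_price(shards, index_db, 1000)` after `build_demo()` touches the index database once and then only the shards that own the matching IDs, instead of scanning every shard.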

2. Query field + database primary key + business field to be displayed

In this design, a request queries the index table directly, with no need to route to a shard by primary key. It also combines well with ES: the index table's data can be copied to ES as is, and ES can then expose the query interface directly. This is the approach I used in the physical inventory project I worked on at VIPSHOP. The drawback is that the index table is relatively large, which becomes a problem as the amount of data keeps growing. Can it be optimized?

3. Index table split

In the second design, the index table may grow very quickly, so you can consider splitting it. One index table holds only the query conditions and primary keys, and the data to be displayed to external systems is stored in a separate table, called index_detail for example, which carries the primary key of the index table. When a query arrives, first look up the primary keys in the index table, then fetch the display data from index_detail by primary key. With this split, ES has to be fed from multiple tables, but that is acceptable.
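The split can be sketched as follows, again with in-memory SQLite. Only the name index_detail comes from the text; the `inventory_index` table, all column names, and the sample rows are hypothetical.

```python
import sqlite3


def build_index_db():
    db = sqlite3.connect(":memory:")
    # Narrow table: query conditions plus the primary key only.
    db.execute(
        "CREATE TABLE inventory_index "
        "(id INTEGER PRIMARY KEY, price INTEGER, status TEXT)"
    )
    # Wide table: display fields, keyed by the index table's primary key.
    db.execute(
        "CREATE TABLE index_detail "
        "(index_id INTEGER PRIMARY KEY, name TEXT, warehouse TEXT, qty INTEGER)"
    )
    db.executemany(
        "INSERT INTO inventory_index VALUES (?, ?, ?)",
        [(1, 500, "on_shelf"), (2, 2500, "off_shelf"), (3, 150, "on_shelf")],
    )
    db.executemany(
        "INSERT INTO index_detail VALUES (?, ?, ?, ?)",
        [(1, "mug", "WH-A", 10), (2, "lamp", "WH-B", 3), (3, "pen", "WH-A", 80)],
    )
    return db


def search(db, status, max_price):
    # Step 1: filter on the narrow table to get primary keys.
    ids = [r[0] for r in db.execute(
        "SELECT id FROM inventory_index "
        "WHERE status = ? AND price <= ? ORDER BY id",
        (status, max_price),
    )]
    if not ids:
        return []
    # Step 2: fetch the display fields for just those keys.
    placeholders = ",".join("?" * len(ids))
    return db.execute(
        "SELECT index_id, name, warehouse, qty FROM index_detail "
        f"WHERE index_id IN ({placeholders}) ORDER BY index_id",
        ids,
    ).fetchall()
```

Keeping the filter columns in a narrow table means the frequently scanned table stays small, while the wide rows are read only by primary key.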

How to write business data to an index table

Use MQ

Index data is generally built by a dedicated application, called the data-index domain here for example. This domain reads messages from the message queue and builds the index data from them. When business data changes, the producer sends an MQ message to the queue.

The message can be designed in two ways. In the first, the message carries only the data's primary key and an operation type (ADD/UPDATE/DELETE); the consumer takes the primary key, reads the complete data from the DB, and writes it to the index table. In the second, the message carries most of the required fields, and the consumer writes them directly into the index table. I have used both message designs in real projects.
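Both message designs can be handled by one consumer, as in the rough sketch below. Everything here is a stand-in chosen for illustration: `queue.Queue` plays the MQ, plain dicts play the business DB and the index table, and the message shape (`op`, `id`, optional `data`) is an assumption, not the format used in the original projects.

```python
import queue

# Stand-ins (assumptions): one dict plays the sharded business DB,
# another plays the index table.
business_db = {
    7: {"id": 7, "price": 1200, "status": "on_shelf", "name": "kettle"},
}
index_table = {}

# Hypothetical set of columns kept in the index table.
INDEX_FIELDS = ("id", "price", "status", "name")


def apply_message(msg):
    """Apply one change message to the index table."""
    op, item_id = msg["op"], msg["id"]
    if op == "DELETE":
        index_table.pop(item_id, None)
        return
    # Second design: the message already carries the fields.
    row = msg.get("data")
    if row is None:
        # First design: only key + operation type, so read the row back from the DB.
        row = business_db.get(item_id)
    if row is None:
        return  # the row is gone by the time the message is consumed; nothing to index
    index_table[item_id] = {field: row[field] for field in INDEX_FIELDS}


def drain(mq):
    """Consume every message currently in the queue."""
    while True:
        try:
            msg = mq.get_nowait()
        except queue.Empty:
            break
        apply_message(msg)
```

The key-only design keeps messages small and always indexes the latest row, at the cost of an extra DB read per message; the full-payload design avoids that read but makes the message format part of the contract between producer and consumer.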

Operate on the DB directly

This approach is rather crude: configure a data source for the index table's database in the business application, and use MyBatis or JDBC to update the index table whenever business data changes. It is generally not recommended, for two reasons. First, the logic for building index data becomes entangled with the CRUD operations on the business data. Second, data in another database should be changed either through an interface or by a separate domain. Using MQ to build the index data is the recommended approach.

How to get index table data into ES

Listen for data changes in database tables

For example, VIPSHOP developed a component called VDP, which uses a Storm job to listen for changes to the index table's data. When a change occurs, it is published to a queue. ES then obtains the data from the queue and stores it.

The advantage of this approach is that we do not need to write any code; the data is synchronized to ES automatically.

Use MQ

If your company has no component like VDP, you can synchronize the index table's data to ES by sending MQ messages.
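One way the consumer side of this might look is sketched below: a batch of change messages is converted into the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts. The function name `to_bulk_body`, the message shape, and the index name `inventory_index` are assumptions for illustration, and the sketch only builds the request body; it does not send anything to a cluster.

```python
import json


def to_bulk_body(messages, index_name="inventory_index"):
    """Turn change messages into a _bulk request body (newline-delimited JSON).

    Each message is assumed to look like
    {"op": "ADD" | "UPDATE" | "DELETE", "id": ..., "data": {...}}.
    """
    lines = []
    for msg in messages:
        meta = {"_index": index_name, "_id": str(msg["id"])}
        if msg["op"] == "DELETE":
            # A delete action has no source line.
            lines.append(json.dumps({"delete": meta}))
        else:
            # "index" creates the document or replaces it entirely,
            # so ADD and UPDATE can share one code path.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(msg["data"]))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

Using the index table's primary key as the document `_id` makes the synchronization idempotent: replaying the same message overwrites the same document rather than creating a duplicate.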

Let ES expose the CUD interface

Another option is to have ES expose a CUD (create/update/delete) interface that is called to synchronize the index table's data. This couples the caller to ES, so it is not recommended.

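When the index table is combined with ES, the overall interaction is as follows: a change to business data produces an MQ message, the index-building application consumes it and updates the index table, the change to the index table is synchronized to ES, and queries are then served from ES.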

About paging

With a sharded database, if you need to display data page by page and the data volume is very large, you can get each page of data from ES. Without ES, with only the index table, query the index table for the IDs on the requested page, and then fetch those rows from the databases by ID.
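The index-table-only variant can be sketched like this. The index table decides which IDs belong to the page, and the shards are then read by ID. The shard count, the routing rule, the sort order, and the use of plain dicts as shard tables are all assumptions made to keep the example short.

```python
import sqlite3
from collections import defaultdict

NUM_SHARDS = 3  # assumption: three shards, routed by id % NUM_SHARDS


def shard_for(item_id):
    return item_id % NUM_SHARDS


def build_demo():
    shards = [dict() for _ in range(NUM_SHARDS)]  # dict stand-in for each shard's table
    index_db = sqlite3.connect(":memory:")
    index_db.execute(
        "CREATE TABLE inventory_index (id INTEGER PRIMARY KEY, price INTEGER)"
    )
    for item_id in range(1, 11):
        price = item_id * 100
        shards[shard_for(item_id)][item_id] = {"id": item_id, "price": price}
        index_db.execute(
            "INSERT INTO inventory_index VALUES (?, ?)", (item_id, price)
        )
    return shards, index_db


def fetch_page(shards, index_db, page, page_size):
    # Step 1: the unsharded index table decides which IDs are on this page.
    offset = (page - 1) * page_size
    ids = [r[0] for r in index_db.execute(
        "SELECT id FROM inventory_index ORDER BY price DESC, id LIMIT ? OFFSET ?",
        (page_size, offset),
    )]
    # Step 2: group the IDs by shard so each shard is visited once.
    by_shard = defaultdict(list)
    for item_id in ids:
        by_shard[shard_for(item_id)].append(item_id)
    found = {}
    for shard_no, shard_ids in by_shard.items():
        for item_id in shard_ids:  # a real system would issue one batched query here
            row = shards[shard_no].get(item_id)
            if row is not None:
                found[item_id] = row
    # Step 3: restore the page order chosen by the index table.
    return [found[i] for i in ids if i in found]
```

Sorting and slicing happen in one place, so the shards never need to agree on a global order; they only answer lookups by primary key.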

Further thinking

1. ES does not handle paging over Group By results well, so when building the index table you can compute the Group By statistics in advance and store them in the index table.

2. In some back-office applications, tables may already hold hundreds of millions of rows, and the query SQL may be so complex that the database can no longer cope. In that case you can use ES, which is fast and supports some statistical operations.

3. There is also a pitfall when serving data from ES: you may read stale data. When data changes, it takes time to build the index data and synchronize it to ES. Suppose you take a product off the shelves and then click the query button again right away; you may still see the old data, because synchronization to ES is not instantaneous. I have not yet found a good way to solve this problem.
