
How Elasticsearch Handles Large-Volume Data Retrieval

2025-04-27 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article covers how Elasticsearch can be used to retrieve large volumes of data. It walks through a practical case in the form of interview questions and answers, and I hope it helps you solve this problem.

Introduction

What should you do when your project holds tens of millions of records, the data grows substantially every day, and it still has to be searched quickly? Order data is a typical example.

As an interviewer, you need to find someone who can solve this problem! As a candidate, how do you answer the interviewer's question?

One option is a search engine. Elasticsearch (ES) is a good open-source one, and it can serve as a kind of "database" for queries. Many well-known sites around the world use ES for full-text search, such as Wikipedia, Stack Overflow, and GitHub.

Elasticsearch is an open-source, scalable, distributed, highly available full-text search engine built on Lucene. Data is indexed as JSON documents through a RESTful API.

Imagine you are building a knowledge base system that holds a large number of articles, and you want to find articles by one or more keywords. With MySQL, LIKE queries alone cannot meet this need. Full-text search works by indexing the content of each article: ES splits the text into terms according to word meaning and indexes each term. For example, the sentence "I want to be an aspiring programmer" is split by ES into terms such as "I", "I want", "motivational", "one", "aspiring", and "programmer", and a search for any of these terms finds the sentence.

Elasticsearch aims to hide the complexity of distributed systems, so you can search without understanding the logic behind it. The following operations happen automatically under the hood:

Your documents are partitioned into different containers, called shards, which can live on one or more nodes.

The shards are distributed evenly across the nodes, so that indexing and search are load balanced.
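
The partitioning step can be pictured with a small sketch. It only illustrates the idea of hash-based routing: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash function ES uses internally, so treat it as an assumption rather than the actual implementation.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document ID to a shard number with a stable hash."""
    # crc32 is an illustrative stand-in, not the hash ES uses internally.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard they are routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Route 1000 hypothetical order IDs to 3 shards and show the spread.
shards = distribute([f"order-{i}" for i in range(1000)], 3)
for shard_no, docs in shards.items():
    print(shard_no, len(docs))
```

Because the same ID always hashes to the same shard, a lookup by ID only has to visit one shard, while a search can fan out to all of them.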

Because ES supports full-text indexing, it can also power many kinds of fuzzy search. Multi-dimensional aggregation over large amounts of data is another of its strengths. On Tmall, for example, typing "iph" into the product search box automatically loads products related to the iPhone, which is a typical search engine scenario.
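
As a rough illustration of the "iph" example, the sketch below finds all terms that start with a prefix by binary search over a sorted term list. The product names and the `prefix_search` helper are made up for the example, and a real search engine uses more elaborate structures; this only shows why a sorted term dictionary makes prefix lookups cheap.

```python
from bisect import bisect_left


def prefix_search(sorted_terms, prefix):
    """Return every term in a sorted list that starts with the prefix."""
    # Binary search jumps to the first term that is >= the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches


# Hypothetical product names, lower-cased and sorted once up front.
terms = sorted(["ipad", "iphone 14", "iphone 15 pro", "iphone case", "macbook"])
print(prefix_search(terms, "iph"))
```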

1. Interviewer: I see from your resume that you have used ES. In which scenarios did you use it?

Problem analysis: think of ES whenever there is a large amount of data that has to be searched, the traditional relational database has become slow, and joined queries across sharded databases and tables are slow as well.

A: In one scenario, the operations system needed an order analysis tool. At that time the order database already held far more than 100 million records and was growing by about one million per day.

Initially the system queried orders from MySQL only, with no other database. As the business grew, the system faced two main challenges:

As the data grew, single tables still reached tens of millions of rows even after MySQL was split into multiple databases and tables, and queries became slower and slower.

The queries involve many aggregate operations, such as counting abnormal orders, counting completed orders, and summing order amounts. MySQL is not good at large-scale computation of this kind in SQL.
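
To make the aggregation case concrete, here is a minimal sketch of what such a request body could look like in the ES query DSL, built as a Python dict. The index name `orders` and the field names `status`, `order_id`, and `amount` are assumptions for illustration; the article does not give the real schema.

```python
import json


def build_order_stats_query(status: str) -> dict:
    """Build a search body that counts and sums orders with a given status."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {"term": {"status": status}},  # hypothetical field name
        "aggs": {
            "order_count": {"value_count": {"field": "order_id"}},
            "total_amount": {"sum": {"field": "amount"}},
        },
    }


# The JSON below would be sent to the search endpoint of a
# hypothetical "orders" index, e.g. POST /orders/_search.
body = build_order_stats_query("abnormal")
print(json.dumps(body, indent=2))
```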

To address these two problems, I introduced Elasticsearch, which solved the slow queries. ES is the primary data source for queries and MySQL is the fallback. If the ES cluster becomes unavailable for any reason, the system automatically switches order queries to MySQL. For the operations system, queries are slower in that case but normal use is not blocked, and this fallback is rarely triggered.

The system architecture diagram looks like this: (try to show the interviewer a clear picture)

The key part is the red box: ES is the preferred source for order queries and MySQL is the backup source, with an automatic fallback switch between them.
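
The automatic switch can be sketched in a few lines. This is not the system's actual code: the two data sources are plain hypothetical functions, and an unreachable ES cluster is assumed to surface as a `ConnectionError`.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, object]
OrderSource = Callable[[str], List[Order]]


def query_orders(
    user_id: str, primary: OrderSource, fallback: OrderSource
) -> Tuple[List[Order], str]:
    """Query the primary source; switch to the fallback if it is unreachable."""
    try:
        return primary(user_id), "primary"
    except ConnectionError:
        # Primary is unavailable: degrade to the slower backup source.
        return fallback(user_id), "fallback"


def es_source_down(user_id: str) -> List[Order]:
    """Stand-in for an ES query against a cluster that is down."""
    raise ConnectionError("ES cluster unavailable")


def mysql_source(user_id: str) -> List[Order]:
    """Stand-in for the MySQL query used as the backup."""
    return [{"order_id": "A1001", "user_id": user_id, "amount": 99.0}]


orders, used = query_orders("u42", es_source_down, mysql_source)
print(used, orders)
```

Catching only connection failures keeps genuine query bugs visible instead of silently masking them behind the fallback.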

Interviewer: what is the result of using ES?

A: After switching to ES, query speed improved greatly.

With MySQL, the 99th-percentile query time was around 10 seconds. After ES was introduced into the architecture, query time dropped to the millisecond level.

I also kept the performance monitoring charts in my performance review report, laying a solid foundation for a promotion and a raise.

The interviewer kept nodding and approved of my approach.

Interviewer: How familiar are you with ES concepts? How do you understand indexes, documents, and inverted indexes?

Problem analysis: Some people who are new to Elasticsearch only use it, and know only that ES is fast and can hold a lot of data. When the interviewer asks about the inverted index they are lost, and it is embarrassing to claim you have used a search engine.

A: Let's go through Index, Document, and Type in ES and their counterparts in MySQL.

Index:

An index is equivalent to a database in MySQL; creating an index in ES is like creating a database. In an e-commerce system, for example, once an order index is created, the customer service system can quickly look up all the information about an order through it and handle customer complaints quickly.

Document:

ES is a document store. A document is equivalent to a row of data in MySQL, and many documents (many rows of data) make up an index.
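
As a small illustration of these two concepts, the sketch below represents one order as a JSON document and builds the REST request that would store it in an index. The index name `orders`, the ID, and the field names are invented for the example, and the helper only builds the request without sending it.

```python
import json


def build_index_request(index: str, doc_id: str, document: dict):
    """Return the HTTP method, path, and JSON body for indexing one document."""
    path = f"/{index}/_doc/{doc_id}"
    return "PUT", path, json.dumps(document)


# One order, comparable to one row in a MySQL table (hypothetical fields).
order = {"order_id": "A1001", "phone": "13800000000", "amount": 99.0}

method, path, payload = build_index_request("orders", "A1001", order)
print(method, path)
print(payload)
```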

Type:

A row in MySQL has many fields, such as the order number, the user's mobile phone number, and the order amount, and rows with the same fields are stored together in a table. A Type plays a comparable role in ES: it is a virtual, logical grouping of documents within an index, used to filter documents, and corresponds to a table. Note that mapping types were deprecated in Elasticsearch 6.x and removed in later major versions, so this concept applies to older releases.

If that is still unclear, here is the comparison as a table: ES vs. MySQL

Elasticsearch    Relational database (MySQL)
index            database
type             table
document         row
field            column

The most important concept: the inverted index

Tip: if you have used Elasticsearch without knowing what an inverted index is, you have not really understood it.

(Now give the interviewer an example to analyze the inverted index and show that you have really done your homework.)

Take three sentences:

Hello everyone

This article is based on inverted index

Which is hashmap like data structure

After they are stored in ES, the index structure looks like this:

Hello (1,1)

Everyone (1,2)

This (2,1)

Article (2,2)

Is (2,3) (3,2)

Based (2,4)

On (2,5)

Inverted (2,6)

Index (2,7)

Which (3,1)

Hashmap (3,3)

Like (3,4)

Data (3,5)

Structure (3,6)

Hello is the first word of the first sentence, so its entry is (1, 1). The entry for is, (2, 3) (3, 2), means that it is the third word of the second sentence and the second word of the third sentence. Once the text is split this way, the index records which sentence each keyword appears in and at which position, which makes searching very easy. This is the concept of the inverted index. When we search with Baidu or Google, this kind of data structure makes it much easier to find all the content we want. The inverted index enables fast full-text search, at the price of extra processing cost when documents are added.
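
The structure above can be reproduced with a short sketch. It splits on whitespace and lower-cases each word, which is a simplification of what a real analyzer does, and the function names are only illustrative.

```python
from collections import defaultdict


def build_inverted_index(sentences):
    """Map each word to a list of (sentence number, word position) pairs."""
    index = defaultdict(list)
    for sentence_no, sentence in enumerate(sentences, start=1):
        for word_no, word in enumerate(sentence.lower().split(), start=1):
            index[word].append((sentence_no, word_no))
    return dict(index)


def search(index, word):
    """Look up a word; a miss returns an empty list."""
    return index.get(word.lower(), [])


sentences = [
    "Hello everyone",
    "This article is based on inverted index",
    "Which is hashmap like data structure",
]
index = build_inverted_index(sentences)
print(search(index, "is"))     # positions of "is" in sentences 2 and 3
print(search(index, "Hello"))  # position of "hello" in sentence 1
```

A lookup is a single dictionary access regardless of how many sentences are stored, while every new sentence has to be split and merged into the index, which is the write-time cost mentioned above.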

Interviewer: All right, I can see you understand it. Time is limited, so let's leave it there.

This is the ideal result of the interview, leaving the interviewer speechless.

