How to implement distributed search with Elasticsearch

2025-07-02 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article explains how Elasticsearch implements distributed search. The content is short and easy to follow, and it walks through the main concepts one by one.

Elasticsearch overview

(compiled from online sources)

Elasticsearch is a Lucene-based, open source, distributed, RESTful search engine. It has the following characteristics:

1. Fast search

2. Easy to install

3. A completely schema-free search model

4. Data can be indexed simply by sending JSON over HTTP

5. Distributed, so it can be deployed as a search cluster

6. Search in near real time

7. Simple multi-tenancy

8. And so on.

Elasticsearch is often chosen over using Lucene directly, mainly because it can run as a search cluster.

Schema-free, document-oriented model

The search engine's data model is schema-free and document-oriented. As the NoSQL movement has shown, building applications on this kind of data model can be very efficient.

Elasticsearch's model is based on JSON, which in recent years has become a de facto standard for data representation. Semi-structured data is easy to express in JSON, and most programming languages have good support for parsing it.

Schema mapping (Schema Mapping)

Elasticsearch is schemaless: you can throw a JSON document at it and ES will index it automatically. If a value is a number or a date, ES detects this automatically and handles the field accordingly.
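The idea of automatic detection can be illustrated with a small sketch. The rules below are a deliberately simplified stand-in and not Elasticsearch's own detection logic; the function name, the type labels, and the sample document are all made up for illustration.

```python
import json
from datetime import datetime


def guess_field_type(value):
    """Guess a field type from a parsed JSON value (toy rules, not ES's own)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # strings that parse as ISO dates
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    return "unknown"


# Hypothetical document; field names and values are illustrative only.
doc = json.loads(
    '{"user": "kimchy", "age": 30, "score": 4.5, "posted": "2011-05-31T10:00:00"}'
)
detected = {field: guess_field_type(value) for field, value in doc.items()}
print(detected)
```

The point is only that the types come from the values in the document, so nothing has to be declared before indexing.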

However, search engines are complex, as we all know. Fields in an indexed document can be given boost values that affect scoring, and different analyzers can be used to control tokenization: some fields need to be split into words, for example, while others should not be. Elasticsearch gives you full control over these rules, which define how a JSON document is mapped into the search engine. Mappings can be set per index (Index) and per type (Type).
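As a rough sketch, a mapping of this kind could be written as the following request body. It assumes the legacy mapping syntax of the early, type-based Elasticsearch versions this article describes; later versions changed both the field types and the handling of types. The type name "article", the field names, the boost value, and the analyzer choice are all hypothetical.

```python
import json

# Hypothetical mapping for an index with one type called "article".
# Field names, the boost value and the analyzer choice are illustrative only.
mapping_body = {
    "mappings": {
        "article": {
            "properties": {
                # Full-text field, tokenized by the analyzer and boosted in scoring.
                "title": {"type": "string", "analyzer": "standard", "boost": 2.0},
                # Kept as a single token, so it is not split into words.
                "tag": {"type": "string", "index": "not_analyzed"},
                "posted": {"type": "date"},
            }
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

Check the exact field options against the documentation for the version in use before relying on them.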

Get data (GETting Some Data)

Each indexed document must have a unique identity (unique within its type). This is useful in many cases, for example when you want to update or delete an indexed document, or just look at one. Getting the data could not be easier: give ES the index, type, and id of the document and it returns the document as indexed (that is, the JSON document you supplied when indexing it).
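A minimal sketch of how index, type, and id combine into one address, assuming the type-based REST layout of the Elasticsearch versions described here and a node listening on localhost:9200. The helper name and the "twitter"/"tweet" names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import quote


def document_url(base_url, index, doc_type, doc_id):
    """Build the REST path that addresses one document by index, type and id."""
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Assumed local node and made-up index/type names.
url = document_url("http://localhost:9200", "twitter", "tweet", 1)
print("GET", url)
```

In that layout the same address is used with different HTTP verbs to fetch, index, or delete the document.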

Search (Search)

Handling a query takes one simple request, which hides the many complex distributed operations that ES performs behind it. You can use the common Lucene query syntax, or build the search request with the JSON-based Query DSL (DSL: domain-specific language), which is more flexible and better suited to complex queries.

Search does not end with the query: facets, highlighting, custom scripts, and more are supported.
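The two ways of expressing a search can be sketched as follows. This only builds the URL and the JSON body and does not contact a server; the base URL, index name, field names, and helper names are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed local node


def lucene_search_url(index, query_string):
    """URI search: the Lucene query string goes in the `q` parameter."""
    return f"{BASE_URL}/{index}/_search?" + urlencode({"q": query_string})


def dsl_search_body(field, text, highlight_field=None):
    """Query DSL: a JSON body with a `match` query and optional highlighting."""
    body = {"query": {"match": {field: text}}}
    if highlight_field is not None:
        body["highlight"] = {"fields": {highlight_field: {}}}
    return body


print(lucene_search_url("twitter", "user:kimchy"))
print(json.dumps(dsl_search_body("message", "distributed search", "message")))
```

The query-string form is convenient for quick lookups, while the JSON body leaves room for nested queries and extras such as highlighting.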

Multi-tenancy (Multi Tenancy)

If a single index works, why would you need more than one? In fact there are many reasons to use multiple indexes. For example, log indexes can be stored separately week by week, or different indexes can have different settings (for example, one stored in memory and another on the file system).

Once we have multiple indexes, we want to be able to search (or perform other operations) across them.
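A small sketch of the weekly-log case: one index per ISO week, then a single search path covering several of them. The naming scheme "logs-YEAR.wWEEK" and both helper names are assumptions; joining index names with commas in the path is the multi-index form of the search endpoint.

```python
from datetime import date, timedelta


def weekly_index(prefix, day):
    """Name the index that holds logs for the ISO week containing `day`."""
    year, week, _ = day.isocalendar()
    return f"{prefix}-{year}.w{week:02d}"


def multi_index_search_path(index_names):
    """Several indexes can be searched at once by joining names with commas."""
    return "/" + ",".join(index_names) + "/_search"


today = date(2025, 7, 2)
last_three_weeks = [weekly_index("logs", today - timedelta(weeks=n)) for n in range(3)]
print(multi_index_search_path(last_three_weeks))
```

Old weeks can then be dropped by deleting whole indexes instead of deleting individual documents.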

Settings (Settings)

Configurability is a double-edged sword. What we want is to start up and run right away with no configuration at all, yet be able to control almost every aspect of the application when necessary.

Elasticsearch has followed this idea from the start, so almost everything is configurable and pluggable, and each index has its own settings that override the master settings. For example, one index can be configured to use memory storage, 10 shards and 1 replica, while another index can be configured to use file system storage, 1 shard and 10 replicas. All index-level settings can be specified in YAML or JSON format when the index is created.
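The two per-index configurations from the example above might look like this as JSON request bodies. This is a sketch that covers only the shard and replica counts; the storage type is left out because its setting name depends on the version. The index names and the helper are hypothetical.

```python
import json


def index_settings(shards, replicas):
    """Request body for creating an index with its own shard/replica counts."""
    return {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}


# The two configurations described in the text (index names are made up).
per_index = {
    "fast_index": index_settings(shards=10, replicas=1),
    "safe_index": index_settings(shards=1, replicas=10),
}
for name, body in per_index.items():
    print("PUT /" + name, json.dumps(body))
```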

Distributed (Distributed)

One of Elasticsearch's most important features is its support for distribution. An index can be split into multiple shards, and each shard can have zero or more replicas. Each data node in the cluster can host one or more shards and also acts as a coordinator, dispatching operations to the appropriate shards. Rebalancing and routing happen automatically.
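Routing to "the appropriate shard" can be sketched as a hash of a routing value (by default the document id) taken modulo the number of primary shards, which is how the Elasticsearch documentation describes it. The sketch below uses `zlib.crc32` as a stand-in hash; Elasticsearch uses its own hash function, so the actual shard numbers will differ.

```python
import zlib


def shard_for(routing_key, number_of_shards):
    """Pick a shard: hash the routing key, then take it modulo the shard count."""
    # crc32 is a stand-in here; it is NOT the hash Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 5  # illustrative value
shards = {n: [] for n in range(NUMBER_OF_SHARDS)}
for doc_id in ("1", "2", "3", "42", "kimchy"):
    shards[shard_for(doc_id, NUMBER_OF_SHARDS)].append(doc_id)
print(shards)
```

Because the function is deterministic, any node can compute where a document lives without asking a central directory.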

Gateway

One day the whole cluster may go down (nobody can rule it out), or it may be shut down deliberately. In most cases we then need to restore the cluster to its last state and bring the service back up. Elasticsearch provides a module called the gateway for this purpose; think of it as a time machine for your search cluster.

Cluster state (including transaction logs) can be rebuilt from each node's local storage (the default) or from shared storage (such as NFS or Amazon S3). When shared storage is used, the cluster state is replicated to it asynchronously.

In addition, when shared storage is used for persistence, an index can be kept entirely in memory and still be recovered after the whole cluster has been shut down.

Cluster (cluster)

A cluster contains multiple nodes, one of which is the master node, chosen by election. The distinction between the master and the other nodes exists only inside the cluster. One of the ideas behind ES is decentralization, literally "no central node", and this describes the view from outside: seen externally, the ES cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the entire cluster.
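The "any node is equivalent" idea can be illustrated with a toy model in which every node can act as coordinator: it asks all nodes for their local hits and merges them. The class, the node names, and the documents are invented for illustration; this is not how Elasticsearch is implemented internally.

```python
class Node:
    """Toy node that holds some documents and can coordinate a search."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs      # documents stored locally on this node
        self.peers = []       # every node in the cluster, including itself

    def local_search(self, word):
        return [d for d in self.docs if word in d]

    def search(self, word):
        # Coordinating role: ask every node, then merge into one sorted result.
        hits = []
        for node in self.peers:
            hits.extend(node.local_search(word))
        return sorted(hits)


nodes = [Node("node-1", ["es is distributed", "lucene core"]),
         Node("node-2", ["distributed search", "json over http"]),
         Node("node-3", ["search cluster"])]
for n in nodes:
    n.peers = nodes

results = [n.search("search") for n in nodes]
print(results[0])
```

Whichever node receives the request, the client sees the same merged answer, which is why the cluster looks like a single system from outside.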

Index shards (shards)

ES can divide a complete index into multiple shards. The advantage is that a large index can be spread across multiple nodes, which together form a distributed search. The number of shards must be specified when the index is created and cannot be changed afterwards.
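One way to see why the shard count is fixed: if documents are placed by a hash of their id modulo the number of shards, changing that number sends most documents to a different shard. The sketch assumes this modulo-based placement and uses `zlib.crc32` as a stand-in for Elasticsearch's own hash; the document ids are made up.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Toy routing: stable hash of the id modulo the number of shards."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


doc_ids = [f"doc-{n}" for n in range(1000)]
# Compare where each document lives with 5 shards and with 6 shards.
moved = [d for d in doc_ids if shard_for(d, 5) != shard_for(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents would land on a different shard")
```

A lookup by id would then search the wrong shard for most documents unless everything were reindexed, so the number is chosen up front.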

Index replicas (replicas)

ES can keep multiple replicas of an index. Replicas serve two purposes. The first is fault tolerance: when a shard on a node is damaged or lost, it can be recovered from a replica. The second is query efficiency: ES automatically load-balances search requests across the copies.
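Both purposes can be shown with a toy model of one shard that has three copies. Plain round-robin is assumed here purely for illustration; Elasticsearch's actual copy selection is more involved. The class and node names are hypothetical.

```python
from itertools import cycle


class ShardGroup:
    """Toy shard with one primary copy and N replica copies on named nodes."""

    def __init__(self, copies):
        self.copies = list(copies)          # node names holding a copy
        self._next = cycle(self.copies)     # round-robin over the copies

    def pick_copy(self, failed=()):
        # Skip copies on failed nodes; any surviving copy can serve a search.
        for _ in range(len(self.copies)):
            node = next(self._next)
            if node not in failed:
                return node
        raise RuntimeError("all copies of this shard are unavailable")


group = ShardGroup(["node-1", "node-2", "node-3"])
served_by = [group.pick_copy() for _ in range(6)]                        # load balancing
after_failure = [group.pick_copy(failed={"node-2"}) for _ in range(4)]   # fault tolerance
print(served_by, after_failure)
```

Searches are spread over all copies while they are healthy, and they keep working from the remaining copies when one node is lost.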

Data redistribution (recovery)

When a node joins or leaves, ES redistributes the index shards according to the load on each machine. Data recovery also takes place when a failed node is restarted.
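A toy version of redistribution: recompute an even spread of shards whenever the set of live nodes changes. This is a simplification made for illustration. It ignores machine load and replicas and recomputes the layout from scratch, whereas a real allocator weighs several factors and tries to move as little data as possible. The function and node names are hypothetical.

```python
def allocate(shard_ids, nodes):
    """Spread shards over the live nodes as evenly as possible (round-robin)."""
    allocation = {node: [] for node in nodes}
    for position, shard_id in enumerate(sorted(shard_ids)):
        allocation[nodes[position % len(nodes)]].append(shard_id)
    return allocation


shard_ids = list(range(6))
two_nodes = allocate(shard_ids, ["node-1", "node-2"])
three_nodes = allocate(shard_ids, ["node-1", "node-2", "node-3"])   # a node joins
after_exit = allocate(shard_ids, ["node-1", "node-3"])              # node-2 exits
print(two_nodes, three_nodes, after_exit, sep="\n")
```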

Data source (river)

A river is a way to synchronize data from other storage systems (such as databases) into ES. It runs as a plug-in service inside ES that reads data from the river's source and indexes it. Official rivers exist for CouchDB, RabbitMQ, Twitter, and Wikipedia.
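The general idea of pulling rows from another store and preparing them for indexing can be sketched from the client side. This is not how a river plug-in works internally. It only turns source rows into the newline-delimited body expected by the bulk endpoint, assuming the type-based action format of the versions described here. The rows, index name, and type name are made up.

```python
import json


def bulk_body(rows, index, doc_type, id_field="id"):
    """Turn source rows into a bulk request body: an action line, then the document."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": row[id_field]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"   # the bulk format ends with a newline


# Rows as they might come back from a database query (made-up data).
rows = [{"id": 1, "title": "first post"}, {"id": 2, "title": "second post"}]
body = bulk_body(rows, index="blog", doc_type="post")
print(body)
```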

Transport

The transport is the way ES nodes interact with each other inside the cluster, or with clients. By default, TCP is used for this interaction. HTTP (with JSON) is supported as well, along with Thrift, servlet, memcached, ZeroMQ, and other transports that are integrated through plug-ins.
