2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
What is Elasticsearch? Many newcomers are not sure where to start, so this article gives a brief introduction. I hope it helps you get oriented.
Elasticsearch7.2 series of articles: https://www.xugj520.cn/category/ES/
You know, for search (and analysis)!
Brief introduction
Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats help collect, aggregate, and enrich your data and store it in Elasticsearch. Kibana enables you to interactively explore, visualize, and share insights into your data, and to manage and monitor the stack. Elasticsearch is where the indexing, search, and analysis magic happens.
Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly right along with it.
While not every problem is a search problem, Elasticsearch offers speed and flexibility to handle data in a wide variety of use cases:
Add a search box to an app or website
Store and analyze logs, metrics, and security event data
Use machine learning to automatically model the behavior of your data in real time
Automate business workflows using Elasticsearch as a storage engine
Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)
Store and process genetic data using Elasticsearch as a bioinformatics research tool
We are constantly surprised by the novel ways people use search. But whether your use case is similar to one of these, or you are using Elasticsearch to tackle a new problem, the way you work with your data, documents, and indices in Elasticsearch is the same.
Data in: documents and indices
Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node.
When a document is stored, it is indexed and fully searchable in near real time, within 1 second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
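As a minimal sketch, this is what storing a document looks like through the REST API (for example, from the Kibana Dev Console). The `products` index and its fields are hypothetical names used only for illustration:

```
PUT /products/_doc/1
{
  "name": "safety pin",
  "price": 0.05,
  "in_stock": true,
  "added": "2024-01-15"
}
```

The request can be sent to any node in the cluster, and the document becomes searchable within about a second.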
An index can be thought of as an optimized collection of documents, and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees. The ability to use the per-field data structures to assemble and return search results is what makes Elasticsearch so fast.
Elasticsearch also has the ability to be schema-less, which means that documents can be indexed without explicitly specifying how to handle each of the different fields that might occur in a document. When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index. This default behavior makes it easy to index and explore your data: just start indexing documents, and Elasticsearch will detect and map booleans, floating-point and integer values, dates, and strings to the appropriate Elasticsearch data types.
In the end, however, you know more about your data and how you want to use it than Elasticsearch. You can define rules to control dynamic mapping and explicitly define mapping to fully control how fields are stored and indexed.
Defining your own mapping enables you to:
Distinguish between full-text string fields and exact value string fields
Perform language-specific text analysis
Optimize fields for partial matching
Use a custom date format
Use data types such as geo_point and geo_shape that cannot be automatically detected
It is often useful to index the same field in different ways for different purposes. For example, you might want to index a string field as both a text field for full-text search and as a keyword field for sorting or aggregating your data. Or you might choose to use more than one language analyzer to process the contents of a string field that contains user input.
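A minimal mapping sketch for the multi-field case above, again using hypothetical index and field names (`products`, `name`, `location`); the `fields` clause is what indexes the same string both as `text` and as `keyword`:

```
PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      },
      "location": { "type": "geo_point" }
    }
  }
}
```

With this mapping, full-text queries run against `name` while sorting and aggregations use `name.raw`, and `location` gets the geo_point type that dynamic mapping could not have detected.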
The analysis chain that is applied to a full-text field during indexing is also used at search time. When you query a full-text field, the query text undergoes the same analysis before the terms are looked up in the index.
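You can see what an analysis chain produces with the `_analyze` API; this sketch runs the built-in standard analyzer over a sample string:

```
POST /_analyze
{
  "analyzer": "standard",
  "text": "The QUICK brown fox"
}
```

The response lists the tokens that would be stored in the inverted index; the standard analyzer splits on word boundaries and lowercases, so the terms come back as lowercase words.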
Information out: search and analyze
While you can use Elasticsearch as a document store and retrieve documents and their metadata, the real power comes from easy access to a full set of search capabilities built on the Apache Lucene search engine library.
Elasticsearch provides a simple, coherent REST API for managing your cluster and indexing and searching your data. For testing purposes, you can easily submit requests directly from the command line or through the Dev Console in Kibana. From your applications, you can use the Elasticsearch client for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python, or Ruby.
The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in an employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance: how well they match your search terms.
In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions.
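A sketch of a request that combines a full-text query with structured filters, using the employee example above (the `employees` index and the `bio` field are hypothetical; `gender`, `age`, and `hire_date` come from the text):

```
GET /employees/_search
{
  "query": {
    "bool": {
      "must": { "match": { "bio": "distributed systems" } },
      "filter": [
        { "term": { "gender": "female" } },
        { "range": { "age": { "gte": 30 } } }
      ]
    }
  },
  "sort": [ { "hire_date": "desc" } ]
}
```

The `match` clause is analyzed full-text search and contributes to relevance scoring, while the `filter` clauses are exact structured conditions that do not affect the score.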
Want to search geospatial or other numeric data? Elasticsearch indexes non-textual data in optimized data structures that support high-performance geographic and numeric queries.
You can access all of these search capabilities using Elasticsearch's comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
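A sketch of the SQL-style interface, reusing the hypothetical `products` index from earlier; `format=txt` asks for a plain-text tabular response:

```
POST /_sql?format=txt
{
  "query": "SELECT name, price FROM products WHERE price > 1 ORDER BY price DESC LIMIT 5"
}
```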
By analyzing your data with Elasticsearch aggregations, you can build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial "needle in a haystack", aggregations let you answer questions like:
How many needles are in the haystack?
What is the average length of the needles?
What is the median length of the needles, broken down by manufacturer?
How many needles were added to the haystack each day over the past six months?
You can also use aggregations to answer more subtle questions, such as:
What are your most popular needle manufacturers?
Are there any unusual or anomalous clumps of needles?
Because aggregations leverage the same data structures used for search, they are also very fast. This lets you analyze and visualize your data in real time. Your reports and dashboards update as your data changes, so you can take action based on the latest information.
What's more, aggregations run alongside search requests. You can search documents, filter results, and perform analytics on the same data, in a single request, at the same time. And because aggregations are calculated in the context of a particular search, you are not just displaying a count of all size 70 needles, but a count of the size 70 needles that match your users' search criteria, for example, all size 70 non-stick embroidery needles.
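Sticking with the needle metaphor, this sketch runs a search and an aggregation in one request; the `needles` index and its fields are hypothetical names for illustration:

```
GET /needles/_search
{
  "query": { "match": { "description": "non-stick embroidery" } },
  "aggs": {
    "by_manufacturer": {
      "terms": { "field": "manufacturer" },
      "aggs": {
        "avg_length": { "avg": { "field": "length_mm" } }
      }
    }
  }
}
```

Because the `aggs` section sits next to the `query`, the per-manufacturer counts and average lengths are computed only over the documents that match the search.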
But wait, there's more. Want to automate the analysis of your time-series data? You can use machine learning features to create accurate baselines of normal behavior in your data and identify anomalous patterns. With machine learning, you can detect:
Anomalies related to temporal deviations in values, counts, or frequencies
Statistical rarity
Unusual behaviors for a member of a population
And the best part? You can do this without having to specify algorithms, models, or other data-science-related configuration.
Scalability and resilience: clusters, nodes, and shards
Elasticsearch is built to be always available and to scale with your needs. It does this by being distributed by nature. You can add servers (nodes) to a cluster to increase capacity, and Elasticsearch automatically distributes your data and query load across all of the available nodes. There is no need to overhaul your application: Elasticsearch knows how to balance multi-node clusters to provide scale and high availability. The more nodes, the merrier.
How does this work? Under the covers, an Elasticsearch index is really just a logical grouping of one or more physical shards, where each shard is actually a self-contained index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch ensures redundancy, which both protects against hardware failures and increases query capacity as nodes are added to a cluster. As the cluster grows (or shrinks), Elasticsearch automatically migrates shards to rebalance the cluster.
There are two types of shards: primaries and replicas. Each document in an index belongs to one primary shard. A replica shard is a copy of a primary shard. Replicas provide redundant copies of your data to protect against hardware failure and increase capacity to serve read requests like searching or retrieving a document.
The number of primary shards in an index is fixed at the time the index is created, but the number of replica shards can be changed at any time without interrupting indexing or query operations.
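A sketch of both operations: creating an index with an explicit shard layout, then raising the replica count on the live index (the index name `my-index` is hypothetical):

```
PUT /my-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

PUT /my-index/_settings
{
  "number_of_replicas": 2
}
```

The first request fixes three primaries permanently; the second can be issued at any time, and Elasticsearch allocates the extra replica copies in the background.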
It depends...
There are a number of performance considerations and trade-offs with respect to shard size and the number of primary shards configured for an index. The more shards, the more overhead there is simply in maintaining those indices. The larger the shard size, the longer it takes to move shards around when Elasticsearch needs to rebalance a cluster.
Querying lots of small shards makes the processing per shard faster, but more queries mean more overhead, so querying a smaller number of larger shards might be faster. In short: it depends.
As a starting point:
Aim to keep the average shard size between a few GB and a few tens of GB. For use cases with time-based data, it is common to see shards in the 20GB to 40GB range.
Avoid the gazillion-shards problem. The number of shards a node can hold is proportional to the available heap space. As a general rule, the number of shards per GB of heap space should be less than 20. The best way to determine the optimal configuration for your use case is through testing with your own data and queries.
In case disaster strikes: for performance reasons, the nodes within a cluster need to be on the same network. Balancing shards in a cluster across nodes in different data centers simply takes too long. But high-availability architectures demand that you avoid putting all your eggs in one basket. In the event of a major outage in one location, servers in another location need to be able to take over. Seamlessly. The answer? Cross-cluster replication (CCR).
CCR provides a way to automatically synchronize indices from your primary cluster to a secondary remote cluster that can serve as a hot backup. If the primary cluster fails, the secondary cluster can take over. You can also use CCR to create secondary clusters to serve read requests geographically close to your users.
Cross-cluster replication is active-passive. The index on the primary cluster is the active leader index and handles all write requests. Indices replicated to secondary clusters are read-only followers.
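A sketch of setting up a follower via the CCR follow API, run against the secondary cluster; the index names and the remote-cluster alias `primary-cluster` are hypothetical, and the remote cluster must already be configured in the cluster settings:

```
PUT /follower-index/_ccr/follow
{
  "remote_cluster": "primary-cluster",
  "leader_index": "leader-index"
}
```

From this point on, writes go only to `leader-index` on the primary cluster, and `follower-index` tracks it as a read-only copy.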
Maintain and manage
As with any enterprise system, you need tools to secure, manage, and monitor your Elasticsearch clusters. Security, monitoring, and administrative features that are integrated into Elasticsearch enable you to use Kibana as a control center for managing a cluster. Features like data rollups and index lifecycle management help you intelligently manage your data over time.
After reading the above, have you grasped what Elasticsearch is? If you want to learn more, you are welcome to follow the industry information channel. Thank you for reading!