Basic concepts and characteristics of Elasticsearch

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the basic concepts and characteristics of Elasticsearch: what it is, where it is used, and the core terms you need to know when working with it.

I. Brief introduction

Lucene: put simply, it is a Java library (a jar package) that encapsulates the code for building and searching inverted indexes, including the various algorithms involved. When developing in Java, we can add the Lucene jar to a project and build on it.
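To make the term "inverted index" concrete, here is a minimal Python sketch. It is not Lucene's implementation: the whitespace tokenizer, the function names, and the sample documents are all assumptions made for illustration. It only shows the core idea of mapping each term to the documents that contain it.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene provides inverted indexing",
    3: "Solr is also built on Lucene",
}
index = build_inverted_index(docs)
print(search(index, "built lucene"))  # [1, 3]
```

A lookup touches only the entries for the query terms rather than scanning every document, which is why this structure suits full-text search.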

Elasticsearch is a Lucene-based search server. It is a distributed full-text search engine and data analysis engine. It supports full-text search, structured retrieval, data analysis, and near-real-time processing of massive data. It is easy to install and use, and it scales well: a cluster can grow to hundreds of servers and handle PB-level data.

II. Notable use cases of ES at home and abroad

1) GitHub: in early 2013, GitHub dropped Solr and adopted Elasticsearch for PB-level search. GitHub uses Elasticsearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code.

2) Wikipedia: launched a core search architecture based on Elasticsearch.

3) SoundCloud: "SoundCloud uses ElasticSearch to provide real-time and accurate music search services for 180 million users".

4) Baidu: Baidu currently uses Elasticsearch widely for text data analysis, collecting all kinds of metric data and user-defined data from all Baidu servers. Multi-dimensional analysis and display of this data helps locate and analyze instance-level and business-level exceptions. It currently covers more than 20 business lines within Baidu (including casio, Cloud Analysis, Network Alliance, Forecast, Library, Direct number, Wallet, risk Control, etc.). The largest single cluster has 100 machines and 200 ES nodes, and more than 30 TB of data is imported every day.

5) Taobao and other e-commerce websites, news websites, OA office systems, etc.

III. Basic concepts

1. Node (Node) and cluster (Cluster)

A cluster is a collection of one or more nodes (servers) that together hold all of the data and provide federated indexing and search across all nodes. A cluster is identified by a unique cluster name (the default is "elasticsearch"). The cluster name matters because a node joins a cluster by this name, and a node can belong to only one cluster.

2. Index (index)

An index is similar to a "database" in a relational database: it is where we store and index related data. The index name must be all lowercase, cannot begin with an underscore, and cannot contain commas.
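The naming rules above can be checked with a small helper. This is a sketch that covers only the three rules stated in this article. Elasticsearch itself enforces further restrictions that are not modeled here, and the function name `is_valid_index_name` is hypothetical.

```python
def is_valid_index_name(name):
    """Check only the three rules stated in the article.

    Elasticsearch applies additional restrictions that this sketch ignores.
    """
    if not name:
        return False
    if name != name.lower():   # must be all lowercase
        return False
    if name.startswith("_"):   # cannot begin with an underscore
        return False
    if "," in name:            # cannot contain commas
        return False
    return True


print(is_valid_index_name("blog"))    # True
print(is_valid_index_name("Blog"))    # False
print(is_valid_index_name("_blog"))   # False
print(is_valid_index_name("a,b"))     # False
```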

3. Type (type)

Within an index, we can define one or more types. A type is a logical category or partition of an index, and its semantics are entirely up to the developer. Typically, you define a type for documents that share a set of common fields. For example, suppose a developer runs a blog platform and stores all of its data in one index. In this index, we can define one type for user data, another for blog data, and another for comment data. We can think of a type as a table in a relational database.

4. Document (document)

A document is the basic unit of information that can be indexed, and it is expressed in JSON. For example, a single document can describe one product or one employee. We can think of a document as a row in a relational database table. Within an index / type, you can store any number of documents. Every document has a few essential metadata properties, namely _index, _type, and _id, which must be specified when operating on a particular document or class of documents.
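As an illustration, the sketch below builds a JSON document and the request path that addresses it by _index, _type, and _id. The index name `company`, the type `employee`, the field values, and the helper `document_path` are hypothetical. The `/{index}/{type}/{id}` path form applies to Elasticsearch versions that support mapping types, so treat it as an assumption if you run a newer release.

```python
import json


def document_path(index, doc_type, doc_id):
    """Build the path that identifies one document by _index, _type and _id."""
    return f"/{index}/{doc_type}/{doc_id}"


# A document is plain JSON; these fields are made up for the example.
employee = {
    "name": "Jane Doe",
    "age": 30,
    "interests": ["search", "music"],
}

path = document_path("company", "employee", 1)
body = json.dumps(employee)

# Sending `body` with an HTTP PUT to `path` would index the document.
print("PUT", path)
print(body)
```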

5. Mapping (Mapping)

A schema mapping (or simply a mapping) defines the structure of an index. Elasticsearch stores information about the fields in the mapping. A mapping is written and passed to Elasticsearch as a JSON object.
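A mapping might look like the following sketch, built here as a Python dictionary and printed as JSON. The field names are hypothetical. The layout shown is the typeless form used by recent Elasticsearch versions; older versions nest `properties` under a type name, so adjust it for your release.

```python
import json

# Hypothetical fields for a blog index; each field declares its data type.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "author": {"type": "keyword"},    # exact-value matching
            "published": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}

# This JSON object is what would be sent when creating the index.
print(json.dumps(mapping, indent=2))
```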

6. Field (field)

A field is the smallest unit of data in Elasticsearch. It is comparable to a column in a relational table, or to a key in a JSON object.

7. Shard (shard)

When there are a large number of documents, a single node may not be enough because of memory limits, limited disk capacity, insufficient processing power, or an inability to respond to client requests fast enough. In this case, the data can be divided into smaller parts called shards (each shard is a separate Apache Lucene index). Each shard can be placed on a different server, so the data is spread across the nodes of the cluster.

When the index you query is distributed across multiple shards, Elasticsearch sends the query to each relevant shard and merges the results. In addition, multiple shards can speed up indexing.
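The idea of spreading documents over shards and then querying every shard can be sketched as follows. This is a toy model, not Elasticsearch's routing code: CRC32 stands in for the real hash function, the shard count of 3 is an assumed value, and all function names are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value for this sketch


def shard_for(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard by hashing the document ID (CRC32 is only a stand-in)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def index_docs(docs, num_shards=NUM_PRIMARY_SHARDS):
    """Distribute documents over shards; each shard holds only its own part."""
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search_shard(shard, term):
    """Search one shard independently of the others."""
    term = term.lower()
    return [doc_id for doc_id, text in shard.items()
            if term in text.lower().split()]


def search_all(shards, term):
    """Send the query to every shard, then merge the partial results."""
    hits = []
    for shard in shards:
        hits.extend(search_shard(shard, term))
    return sorted(hits)


docs = {
    "a1": "Elasticsearch splits an index into shards",
    "b2": "Each shard is a Lucene index",
    "c3": "Replicas copy a shard",
}
shards = index_docs(docs)
print(search_all(shards, "shard"))  # ['b2', 'c3']
```

Because each shard is searched independently, the per-shard searches could run in parallel on different servers, and only the merge step needs to see all of the results.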

8. Replica (replica)

To improve query throughput or achieve high availability, shard replicas can be used. A replica is an exact copy of a shard, and each shard can have zero or more replicas. In other words, Elasticsearch can hold several identical copies of a shard, one of which is automatically chosen as the place where operations that change the index are applied. This special copy is called the primary shard, and the rest are called replica shards. When the primary shard is lost, for example because the server that holds it becomes unavailable, the cluster promotes a replica to be the new primary shard.
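The relationship between primaries and replicas can be sketched in a few lines. The names `number_of_shards` and `number_of_replicas` match the index settings of the same names. Everything else is a simplified model: the helper functions, the node names, and the rule of promoting the first surviving replica are assumptions, not Elasticsearch's actual allocation logic.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard has `number_of_replicas` copies in addition to itself."""
    return number_of_shards * (1 + number_of_replicas)


def promote_replica(copies, failed_node):
    """Drop the copies on the failed node; promote a replica if no primary is left.

    `copies` lists all copies of ONE shard as {"node": ..., "role": ...} dicts.
    """
    survivors = [dict(c) for c in copies if c["node"] != failed_node]
    if survivors and not any(c["role"] == "primary" for c in survivors):
        survivors[0]["role"] = "primary"  # simplified choice: first survivor
    return survivors


# Hypothetical layout: one shard with its primary on node-1, a replica on node-2.
copies = [
    {"node": "node-1", "role": "primary"},
    {"node": "node-2", "role": "replica"},
]

print(total_shard_copies(3, 1))           # 6
print(promote_replica(copies, "node-1"))  # [{'node': 'node-2', 'role': 'primary'}]
```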

IV. Correspondence between relational databases and Elasticsearch

Relational database    Elasticsearch
Database               Index (supports full-text search)
Table                  Type
Row                    Document (no fixed structure required; different documents can have different sets of fields)
Column                 Field
Schema                 Mapping

This concludes the introduction to the basic concepts and characteristics of Elasticsearch. Thank you for reading.
