[TOC]
CURL Operations

Introduction to curl
curl is an open-source command-line file transfer tool that works with URL syntax. With curl you can easily issue common GET/POST requests; simply think of it as a tool for accessing URLs from the command line. CentOS ships with curl in its default package set; if it is missing, install it with yum.
Commonly used curl options:

- -X: specify the HTTP request method (HEAD, GET, POST, PUT, DELETE)
- -d: specify the data to send
- -H: specify HTTP request header information

Creating an index library with curl (either PUT or POST works):

```
curl -XPUT http://<host>:9200/index_name/
```

For example:

```
curl -XPUT 'http://localhost:9200/bigdata'
```

CURL operations (1): index library creation and query
Create an index library:

```
curl -XPUT http://uplooking01:9200/bigdata
```

Returns: {"acknowledged": true}
View information about an index library:

```
curl -XGET http://uplooking01:9200/bigdata
```

This returns a JSON string; if you want a well-formatted result, add pretty:

```
curl -XGET http://uplooking01:9200/bigdata?pretty
```
Add several documents to the index library:

```
curl -XPOST http://uplooking01:9200/bigdata/product/1 -d '{"name": "hadoop", "author": "Doug Cutting", "version": "2.9.4"}'
curl -XPOST http://uplooking01:9200/bigdata/product/2 -d '{"name": "hive", "author": "facebook", "version": "2.1.0", "url": "http://hive.apache.org"}'
curl -XPUT http://uplooking01:9200/bigdata/product/3 -d '{"name": "hbase", "author": "apache", "version": "1.1.5", "url": "http://hbase.apache.org"}'
```
Query all the data in an index library:

```
curl -XGET http://uplooking01:9200/bigdata/product/_search
```

Better formatted:

```
curl -XGET http://uplooking01:9200/bigdata/product/_search?pretty
{
  "took": 11,
  "timed_out": false,          -> whether the query timed out
  "_shards": {                 -> shard information (shards are like partitions in Kafka: an index library is split into several of them)
    "total": 5,                -> 5 shards by default
    "successful": 5,           -> 5 succeeded
    "failed": 0                -> (total - successful) = failed
  },
  "hits": {                    -> the query result set
    "total": 2,                -> how many records matched
    "max_score": 1.0,          -> the highest score among the hits
    "hits": [                  -> the concrete result array
      {
        "_index": "bigdata",   -> index library the result lives in
        "_type": "product",    -> type within the index library
        "_id": "2",            -> document id of the result
        "_score": 1.0,         -> score of this hit
        "_source": {           -> the concrete document content
          "name": "hive",
          "author": "facebook",
          "version": "2.1.0",
          "url": "http://hive.apache.org"
        }
      },
      {
        "_index": "bigdata",
        "_type": "product",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "name": "hadoop",
          "author": "Doug Cutting",
          "version": "2.9.4"
        }
      }
    ]
  }
}
```

The difference between PUT and POST
PUT is idempotent, while POST is not, so PUT is better suited to updates and POST to creation.

PUT and DELETE operations are idempotent: no matter how many times the operation is repeated, the result is the same. For example, if you modify an article with PUT and then issue the identical request again, the state after each request is the same; DELETE behaves likewise.

POST is not idempotent. A familiar symptom is the repeated-POST-submission problem: issuing the same POST request several times creates several resources.

Note also that creation can use either POST or PUT. The difference is that POST acts on a collection resource (/articles), while PUT acts on a specific resource (/articles/123), as sketched below. Many resources use a database auto-increment primary key as their identifier; since the identifier of the newly created resource can then only be provided by the server, POST must be used in that case.
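To make the collection-vs-resource distinction concrete, here is a minimal sketch against a hypothetical REST API at /articles (the host, paths, and payloads are illustrative, not from the original):

```
# POST to the collection: the server assigns the identifier (not idempotent)
curl -XPOST http://example.com/articles -d '{"title": "ES notes"}'

# PUT to a specific resource: the client supplies the identifier (idempotent)
curl -XPUT http://example.com/articles/123 -d '{"title": "ES notes"}'
```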
Things to note when creating index libraries and index data in ES
1) Index library names must be all lowercase, must not begin with an underscore, and must not contain commas.

2) If you do not explicitly specify an ID for the indexed data, ES generates a random ID automatically; in that case you must use POST:

```
curl -XPOST http://localhost:9200/bigdata/product/ -d '{"author": "Doug Cutting"}'
```
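For illustration, the response to such a request has roughly the following shape; the generated _id value shown here is hypothetical:

```
{"_index": "bigdata", "_type": "product", "_id": "AWA184kojrSrzszxL-Zs", "_version": 1, "created": true}
```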
An analogy for idempotent operations in Java:

```java
StringBuilder sb = new StringBuilder();
sb.append("aaaa");   // what is the result? sb itself changes: repeating the call keeps mutating it (not idempotent)

String str = "aaaa";
str.substring(1);    // -> returns a new object; str itself is never modified (idempotent)
```

CURL operations (2): advanced query, update, delete and batch operations
Queries:

```
# return only selected fields from _source
curl -XGET 'http://uplooking01:9200/bigdata/product/_search?_source=name,author&pretty'
# return only the _source of a single document
curl -XGET 'http://uplooking01:9200/bigdata/product/1?_source&pretty'
# conditional query: documents whose name is hbase
curl -XGET 'http://uplooking01:9200/bigdata/product/_search?q=name:hbase&pretty'
# wildcard query: documents whose name begins with h
curl -XGET 'http://uplooking01:9200/bigdata/product/_search?q=name:h*&pretty'
```
Paging query:

```
curl -XGET 'http://uplooking01:9200/bank/acount/_search?pretty&from=0&size=5'
```
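Here from is the offset of the first hit and size is the page length, so page n (counting from 1) starts at from = (n - 1) * size. For example, the second page of 5 results:

```
curl -XGET 'http://uplooking01:9200/bank/acount/_search?pretty&from=5&size=5'
```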
Update:

Either POST or PUT works:

```
curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs -d '{"name": "sqoop", "author": "apache"}'
curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs -d '{"version": "1.4.6"}'
```

These, however, are full replacements: think of them as deleting the old document and re-creating a new document with the same id.

Partial update (POST must be used, with the fields wrapped in a doc object):

```
curl -XPOST http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs/_update -d '{"doc": {"name": "sqoop", "author": "apache"}}'
```

Description: ES can update a document with either PUT or POST; if a document with the specified ID already exists, the update is performed. Note: during an update, ES first marks the old document as deleted and then adds the new document. The old document does not disappear immediately, but it can no longer be accessed; ES cleans up documents marked as deleted in the background as more data is added.
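To see the difference between the two styles, one can fetch the document after each call (the expected outcomes are noted in the comments):

```
curl -XGET 'http://uplooking01:9200/bigdata/product/AWA184kojrSrzszxL-Zs?pretty'
# after the second full-replacement POST above, _source would contain only {"version": "1.4.6"};
# after the _update call, the fields in "doc" are merged into the existing _source instead.
```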
Delete:

An ordinary delete removes a document by its primary key:

```
curl -XDELETE http://localhost:9200/bigdata/product/3/
```

Description: if the document exists, the response has found: true and successful: 1, and the _version value is incremented by 1. If the document does not exist, found is false, but _version is still incremented by 1; this is part of ES's internal version management, a bit like an svn revision number, and it guarantees that the ordering of our different operations across multiple nodes is marked correctly. Note: a deleted document does not vanish immediately; it is only marked as deleted, and ES removes it in the background as you add more indexes later.
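Illustrative responses for the two cases just described (field values are examples, not captured output):

```
# document existed:
{"found": true, "_index": "bigdata", "_type": "product", "_id": "3", "_version": 2, "_shards": {"total": 2, "successful": 1, "failed": 0}}
# document did not exist: found is false, but _version is still incremented
{"found": false, "_index": "bigdata", "_type": "product", "_id": "3", "_version": 3, ...}
```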
Batch operations - bulk:

The bulk API lets us execute multiple requests at once. Format:

action: [index | create | update | delete]
metadata: _index, _type, _id
request body: _source (not required for delete operations)

```
{action: {metadata}}\n
{request body}\n
{action: {metadata}}\n
{request body}\n
```

For example:

```
{"index": {"_id": "1"}}
{"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "gender": "M", "address": "880 Holmes Lane", "employer": "Pyrami", "email": "amberduke@pyrami.com", "city": "Brogan", "state": "IL"}
{"index": {"_id": "6"}}
{"account_number": 6, "balance": 5686, "firstname": "Hattie", "lastname": "Bond", "age": 36, "gender": "M", "address": "671 Bristol Street", "employer": "Netagy", "email": "hattiebond@netagy.com", "city": "Dante", "state": "TN"}
```

The difference between create and index: if the data already exists, create fails and reports that the document already exists, while index still executes successfully.

How to use a file:

```
curl -XPOST http://uplooking01:9200/bank/acount/_bulk --data-binary @/home/uplooking/data/accounts.json
```
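All four action types can be mixed in a single bulk body. A minimal sketch, assuming the request is sent to http://uplooking01:9200/bank/acount/_bulk so that _index and _type can be omitted from the metadata (the ids and fields are illustrative):

```
{"index": {"_id": "1"}}
{"account_number": 1, "balance": 39225}
{"update": {"_id": "1"}}
{"doc": {"balance": 40000}}
{"delete": {"_id": "6"}}
```

Note that, as stated above, the delete action carries no request body line.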
Supplementary notes on bulk operations:

You can view information about each index library:

```
curl 'http://localhost:9200/_cat/indices?v'
```

In a bulk request, /_index or /_index/_type can be declared in the URL. On the maximum amount of data a bulk call should process at once: bulk loads the data to be processed into memory, so the amount is limited. The best batch size is not a fixed number; it depends on your hardware, your document size and complexity, and your indexing and search load. A batch of 1000-5000 documents is generally recommended; if your documents are very large, reduce the batch accordingly. A payload of 5-15 MB is recommended, and by default it cannot exceed 100 MB; you can change this value in the ES configuration file:

http.max_content_length: 100mb

CURL operations (3): ES version control
Ordinary relational databases use pessimistic concurrency control (PCC): a row of data is locked before it is read, ensuring that only the thread holding the lock can modify that row.

ES uses optimistic concurrency control (OCC): ES does not block access to particular data. If the underlying data changes between our read and our write, the update simply fails, and it is up to the program to decide how to handle the conflict: it can re-read the new data and retry the update, or report the situation back to the user.
How ES implements version control (using ES's internal version numbers)
1: First get the document to be modified, along with its version (_version) number:

```
curl -XGET http://localhost:9200/bigdata/product/1
```

2: Pass the version number when performing the update:

```
# full replacement
curl -XPUT 'http://localhost:9200/bigdata/product/1?version=1' -d '{"name": "hadoop", "version": 3}'
# partial update
curl -XPOST 'http://localhost:9200/bigdata/product/1/_update?version=3' -d '{"doc": {"name": "apachehadoop", "latest_version": 2.6}}'
```

3: If the version number passed does not match the current version of the document being updated, the update fails.
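For example, if the document is already at _version 2, re-sending the first update with version=1 fails with a version-conflict error (the response shown is illustrative):

```
curl -XPUT 'http://localhost:9200/bigdata/product/1?version=1' -d '{"name": "hadoop", "version": 3}'
# => {"error": {"type": "version_conflict_engine_exception", ...}, "status": 409}
```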
How ES implements version control (using external version numbers)

If your database already maintains a version number, or has a timestamp that can serve as one, you can append version_type=external to the ES request URL to reuse those numbers.

Note: the version number must be an integer greater than 0 and less than 9223372036854775807 (the maximum positive long value in Java).

When ES processes an external version number, it no longer checks whether _version equals the value specified in the request; instead it checks whether the current _version is smaller than the specified value, and if so, the request succeeds. Example:

```
curl -XPUT 'http://localhost:9200/bigdata/product/20?version=10&version_type=external' -d '{"name": "flink"}'
```

Note: the quotation marks around the URL cannot be omitted here, otherwise the request fails with an error.
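Following the rule above, after the version=10 write succeeds, a request with a lower or equal external version fails, while a higher one succeeds (a sketch):

```
# fails: 5 is not greater than the current external version 10
curl -XPUT 'http://localhost:9200/bigdata/product/20?version=5&version_type=external' -d '{"name": "flink"}'
# succeeds: 11 > 10
curl -XPUT 'http://localhost:9200/bigdata/product/20?version=11&version_type=external' -d '{"name": "flink"}'
```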
ES plug-ins

ES itself provides relatively few services; much of its power comes from the richness of its plug-in ecosystem. There are many ES plug-ins for management and for improving performance; below are a few commonly used ones.
BigDesk Plugin

```
# offline installation
bin/plugin install file:/home/uplooking/soft/bigdesk-master.zip
# uninstall
bin/plugin remove bigdesk
# online installation
bin/plugin install hlstudio/bigdesk
```

Access (web): http://uplooking01:9200/_plugin/bigdesk

Elasticsearch-Head Plugin

```
# offline installation
bin/plugin install file:/home/uplooking/soft/
# online installation
bin/plugin install mobz/elasticsearch-head
```

Access: http://uplooking01:9200/_plugin/head/

Elasticsearch Kibana
This is one of the components in ELK (ElasticSearch, Logstash, Kibana).
Configuration:

```
server.port: 5601
server.host: "uplooking01"
elasticsearch.url: "http://uplooking01:9200"
elasticsearch.username: "jack"
elasticsearch.password: "uplooking"
```

Access: http://uplooking01:5601

Common report types: line chart, pie chart, bar chart.

ES cluster installation

If all nodes are in the same LAN, you only need to modify one thing: keep cluster.name consistent across nodes, so that newly started machines are found through the discovery mechanism. To network nodes across different LANs, disable the discovery mechanism and specify the node hostnames directly. (With version 2.3.0, nodes are not discovered automatically, so to configure a cluster you need to disable this mechanism and configure the node hostnames manually.)

```
cluster.name: bigdata-08-28
node.name: hadoop
path.data: /home/uplooking/data/elasticsearch
path.logs: /home/uplooking/logs/elasticsearch
network.host: uplooking01
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["uplooking01", "uplooking02", "uplooking03"]
```
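Once the nodes are up, a quick way to confirm they joined the same cluster is the _cat API (the hostname follows the configuration above):

```
curl 'http://uplooking01:9200/_cat/nodes?v'
```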
Cluster status of ElasticSearch:

Green: all primary and replica shards are available
Yellow: all primary shards are available, but not all replica shards are
Red: not all primary shards are available

ElasticSearch core concepts

Cluster
Represents a cluster. There are multiple nodes in the cluster, among them one elected master node; the master-slave distinction only matters inside the cluster. One design idea of ES is decentralization, which, taken literally, means there is no central node. This applies to how the cluster looks from the outside: externally, the ES cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the entire ES cluster.

The master node is responsible for managing cluster state, including the state of shards and replicas, as well as the discovery and removal of nodes.

By default, ES automatically discovers nodes in the same network segment, so you only need to start multiple ES nodes within one segment for them to form a cluster automatically.
View the status of the cluster:

```
curl http://<host>:9200/_cluster/health?pretty
```
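Typical fields in the health output look like this (the values are illustrative, reusing the cluster name configured above):

```
{
  "cluster_name": "bigdata-08-28",
  "status": "green",
  "number_of_nodes": 3,
  "active_primary_shards": 5,
  "active_shards": 10,
  "unassigned_shards": 0
}
```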
Shards

Represents index shards. ES can split a complete index into multiple shards; the benefit is that a large index can be split up and distributed across different nodes to form a distributed search. The number of shards can only be specified before the index is created and cannot be changed afterwards.

You can specify it when creating an index library:

```
curl -XPUT 'localhost:9200/test1/' -d '{"settings": {"number_of_shards": 3}}'
```

By default an index library has 5 shards: index.number_of_shards: 5
Replicas

Represents index replicas. ES can create replicas for an index. One purpose of replicas is to improve fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. The other is to improve ES query efficiency: ES automatically load-balances search requests across replicas.

You can specify the count when creating an index library:

```
curl -XPUT 'localhost:9200/test2/' -d '{"settings": {"number_of_replicas": 2}}'
```

By default each shard has 1 replica: index.number_of_replicas: 1
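Shards and replicas can also be set together at creation time, and since only the replica count can be changed afterwards, the _settings endpoint can adjust it later (a sketch; the index name test3 is made up):

```
curl -XPUT 'localhost:9200/test3/' -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'
# the replica count (unlike the shard count) can be changed on an existing index:
curl -XPUT 'localhost:9200/test3/_settings' -d '{"number_of_replicas": 1}'
```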
Recovery

Represents data recovery, or data redistribution. When a node joins or leaves, ES redistributes index shards according to the load on each machine; data recovery also occurs when a failed node restarts.
Gateway
Represents ES's persistent storage of the index. By default, ES stores the index in memory first and persists it to disk when memory is full. When the ES cluster is shut down and restarted, the index data is read back from the gateway. ES supports many gateway types: the local file system (the default), distributed file systems, Hadoop's HDFS, and Amazon's S3 cloud storage service.
Discovery.zen
Represents ES's automatic node-discovery mechanism. ES is a P2P-based system: it first finds existing nodes via broadcast, after which the nodes communicate with each other via the multicast protocol; point-to-point (unicast) interaction is also supported.
If nodes in different network segments are to form an ES cluster, disable the automatic discovery mechanism:

```
discovery.zen.ping.multicast.enabled: false
```

And set the list of nodes a newly started node can discover:

```
discovery.zen.ping.unicast.hosts: ["master:9200", "slave01:9200"]
```

Transport
Represents how ES nodes talk to each other internally and how the cluster interacts with clients. By default the TCP protocol is used internally, while interaction with clients also supports the HTTP protocol (JSON format) as well as thrift, servlet, memcached, ZeroMQ and other transport protocols (integrated via plug-ins).
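The two channels use separate ports, which can be set in elasticsearch.yml; the values shown are the standard defaults (a sketch, not from the original):

```
transport.tcp.port: 9300   # internal node-to-node and native client communication (TCP)
http.port: 9200            # HTTP (JSON) clients
```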