How to Achieve Elasticsearch Cross-Cluster Data Migration


This article focuses on how to achieve Elasticsearch cross-cluster data migration. The methods introduced here are simple, fast and practical, so interested readers may wish to follow along and try them out.

Scheme comparison

elasticsearch-dump: a logical backup, similar to mysqldump; data is exported record by record and then imported. Requires network connectivity between the clusters, unless the data is first exported to a file and then imported from that file. Migration speed: slow. Suitable for small data volumes. Configuration complexity: medium.

reindex: an API provided by Elasticsearch that can migrate data from one cluster to another. Requires network connectivity between the clusters. Migration speed: fast. Suitable for large data volumes. Configuration complexity: simple.

snapshot: creates a data snapshot of the source cluster through the Snapshot API and then restores it in the target cluster. Does not require network connectivity between the clusters. Migration speed: fast. Suitable for large data volumes and for scenarios that can accept offline migration. Configuration complexity: complex.

logstash: reads data from one cluster and writes it to another. Requires network connectivity between the clusters. Migration speed: moderate. Suitable for near real-time data transfer. Configuration complexity: medium.
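Whichever scheme you choose, it is worth first confirming that the two clusters can be reached and noting their versions. A minimal check against the source and target clusters used throughout this article (192.168.1.171 is the source, 192.168.1.67 the target):

# the root endpoint returns the cluster name and Elasticsearch version
curl http://192.168.1.171:9200
curl http://192.168.1.67:9200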

Prepare source cluster data

Create a mapping:

PUT dumpindex
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}

Insert data:

POST _bulk
{"index": {"_index": "dumpindex"}}
{"name": "tom", "age": 18}
{"index": {"_index": "dumpindex"}}
{"name": "jack", "age": 19}
{"index": {"_index": "dumpindex"}}
{"name": "bob", "age": 20}

elasticsearch-dump

elasticsearch-dump is an open source Elasticsearch data migration tool; GitHub address: https://github.com/taskrabbit/elasticsearch-dump

Install elasticsearch-dump

Method 1

elasticsearch-dump is written in Node.js and can be installed directly with the npm package manager:

npm install elasticdump -g

Method 2

You can also run it from the prebuilt elasticsearch-dump Docker image. Mount a host directory into the container with the -v parameter:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  <elasticdump arguments>

For example:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=/tmp/dumpindex.json \   # the input path is inside the container; the corresponding host file is /root/elasticsearch-dump/dumpindex.json
  --output=http://192.168.1.67:9200/dumpindex \
  --type=data

JSON file import and export

Export Elasticsearch data to a JSON file

Export the data from Elasticsearch to a dumpindex_data.json file using the following command.

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output=/tmp/dumpindex_data.json \
  --type=data

View the contents of the file, which contains the data information of the index:

[root@elastic1 ~]# cat /root/elasticsearch-dump/dumpindex_data.json
{"_index": "dumpindex", "_type": "_doc", "_id": "q28kPngB8Nd5nYNvOgHd", "_score": 1, "_source": {"name": "tom", "age": 18}}
{"_index": "dumpindex", "_type": "_doc", "_id": "rG8kPngB8Nd5nYNvOgHd", "_score": 1, "_source": {"name": "jack", "age": 19}}
{"_index": "dumpindex", "_type": "_doc", "_id": "rW8kPngB8Nd5nYNvOgHd", "_score": 1, "_source": {"name": "bob", "age": 20}}

You also need to export the index mapping. If you only transfer the data above to the new Elasticsearch cluster, the new cluster will generate a mapping automatically from the data, which may not match the mapping of the source cluster:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output=/tmp/dumpindex_mapping.json \
  --type=mapping

View the contents of the exported mapping file:

[root@elastic1 ~]# cat /root/elasticsearch-dump/dumpindex_mapping.json
{"dumpindex": {"mappings": {"properties": {"age": {"type": "integer"}, "name": {"type": "text"}}}}}

Import JSON file data into Elasticsearch

First import the mapping information:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=/tmp/dumpindex_mapping.json \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=mapping

Then import the data:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=/tmp/dumpindex_data.json \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=data

View the mapping information of the index on the new cluster, which is the same as that of the source cluster:

GET dumpindex/_mapping
# output result
{"dumpindex": {"mappings": {"properties": {"age": {"type": "integer"}, "name": {"type": "text"}}}}}

View the data of the new cluster, which is also consistent with that of the source cluster:

GET dumpindex/_search
# output result
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "dumpindex", "_type": "_doc", "_id": "rW8kPngB8Nd5nYNvOgHd", "_score": 1.0, "_source": {"name": "bob", "age": 20}},
      {"_index": "dumpindex", "_type": "_doc", "_id": "rG8kPngB8Nd5nYNvOgHd", "_score": 1.0, "_source": {"name": "jack", "age": 19}},
      {"_index": "dumpindex", "_type": "_doc", "_id": "q28kPngB8Nd5nYNvOgHd", "_score": 1.0, "_source": {"name": "tom", "age": 18}}
    ]
  }
}

CSV file import and export

Export Elasticsearch data to a CSV file

Method 1: export through Kibana

Open the Kibana interface, create an Index Pattern for the index, and you can see it in Discover.

Then create a Save Search task:

After creating the task, choose to generate the CSV file:

You can download the generated CSV file in Reports:

View the exported CSV file:

❯ cat dumpindex.csv
"_id","_index","_score","_type",age,name
q28kPngB8Nd5nYNvOgHd,dumpindex,0,"_doc",18,tom
rG8kPngB8Nd5nYNvOgHd,dumpindex,0,"_doc",19,jack
rW8kPngB8Nd5nYNvOgHd,dumpindex,0,"_doc",20,bob

Method 2: export through elasticsearch-dump

Export to a CSV file through the elasticsearch-dump command:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output="csv:///tmp/dumpindex.csv"

View the exported CSV file:

[root@elastic1 ~]# cat /root/elasticsearch-dump/dumpindex.csv
name,age,@id,@index,@type
tom,18,q28kPngB8Nd5nYNvOgHd,dumpindex,_doc
jack,19,rG8kPngB8Nd5nYNvOgHd,dumpindex,_doc
bob,20,rW8kPngB8Nd5nYNvOgHd,dumpindex,_doc

Import CSV file data into Elasticsearch

Note that a CSV file exported by the elasticsearch-dump command can be imported back into Elasticsearch directly with the same command. A CSV file exported through Kibana, however, must first have the "_id", "_index", "_score" and "_type" fields in its header row renamed to custom fields (elasticsearch-dump renames them to start with @), because these field names are built into Elasticsearch.
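For a Kibana-exported file, the header could be rewritten first; a minimal sed sketch (the file name dumpindex_kibana.csv is hypothetical):

# rename the reserved header fields of a Kibana-exported CSV so it can be imported
# (dumpindex_kibana.csv is a hypothetical file name for the Kibana export)
sed -i '1s/"_id","_index","_score","_type"/"@id","@index","@score","@type"/' /root/elasticsearch-dump/dumpindex_kibana.csv

The command below imports the CSV file that was exported by elasticsearch-dump: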

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input "csv:///tmp/dumpindex.csv" \
  --output=http://192.168.1.67:9200/dumpindex \
  --csvSkipRows 1   # do not import the first row (the header) as data

Looking at the imported data, you can see that fields such as _id have become @id, and the real _id of each document has changed. For this reason, importing and exporting data through CSV is not recommended.

GET dumpindex/_search
# output result
{
  "took": 0,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "dumpindex", "_type": "_doc", "_id": "W9OmPngBcQzbxUdL_4fB", "_score": 1.0, "_source": {"name": "bob", "age": "20", "@id": "rW8kPngB8Nd5nYNvOgHd", "@index": "dumpindex", "@type": "_doc"}},
      {"_index": "dumpindex", "_type": "_doc", "_id": "XNOmPngBcQzbxUdL_4fB", "_score": 1.0, "_source": {"name": "jack", "age": "19", "@id": "rG8kPngB8Nd5nYNvOgHd", "@index": "dumpindex", "@type": "_doc"}}
    ]
  }
}

Import and export data between Elasticsearch clusters

The approach above exports data from the source Elasticsearch cluster to a file and then imports that file into the new cluster; it is suitable when there is no network connectivity between the two clusters. If the two clusters can reach each other over the network, you can migrate data directly between them in an even simpler way.

First export the mapping to the new cluster:

docker run --rm -ti elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=mapping

Then export the data to the new cluster:

docker run --rm -ti elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=data

Query-filtered import and export

You can filter the data to be migrated through query statements:

docker run --rm -ti elasticdump/elasticsearch-dump \
  --input=http://192.168.1.171:9200/dumpindex \
  --output=http://192.168.1.67:9200/dumpindex \
  --searchBody="{\"query\":{\"match\":{\"name\":\"tom\"}}}"

View the data for the new cluster:

GET dumpindex/_search
# output result
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "dumpindex", "_type": "_doc", "_id": "q28kPngB8Nd5nYNvOgHd", "_score": 1.0, "_source": {"name": "tom", "age": 18}}
    ]
  }
}

MultiElasticDump

multielasticdump is a wrapper around elasticdump that forks multiple child processes (by default, as many as the host has CPUs) to operate on multiple indices in parallel.

--input must be a URL and --output must be a file name, which means data can only be exported from Elasticsearch to files. By default the export includes each index's data, mapping, settings and template.

The following command uses a regular expression to match and export the indices matching dumpindex.* from Elasticsearch:

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  multielasticdump \
  --direction=dump \
  --match='dumpindex.*' \   # indices can be matched with a regular expression
  --input=http://192.168.1.171:9200 \
  --output=/tmp

View the exported file:

[root@elastic1 ~]# ll elasticsearch-dump/
total 32
-rw-r--r--. 1 root root  343 Mar 17 13:35 dumpindex2.json
-rw-r--r--. 1 root root  149 Mar 17 13:35 dumpindex2.mapping.json
-rw-r--r--. 1 root root  187 Mar 17 13:35 dumpindex2.settings.json
-rw-r--r--. 1 root root 1581 Mar 17 13:35 dumpindex2.template.json
-rw-r--r--. 1 root root  337 Mar 17 13:35 dumpindex.json
-rw-r--r--. 1 root root   92 Mar 17 13:35 dumpindex.mapping.json
-rw-r--r--. 1 root root  186 Mar 17 13:35 dumpindex.settings.json
-rw-r--r--. 1 root root 1581 Mar 17 13:35 dumpindex.template.json

The exported files can then be imported into Elasticsearch with elasticdump, as sketched below.
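For example, a sketch of loading the dumped dumpindex files into the target cluster, using the same flags as earlier in this article (file names taken from the listing above):

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=/tmp/dumpindex.mapping.json \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=mapping

docker run --rm -ti -v /root/elasticsearch-dump:/tmp elasticdump/elasticsearch-dump \
  --input=/tmp/dumpindex.json \
  --output=http://192.168.1.67:9200/dumpindex \
  --type=data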

User authentication

If Elasticsearch requires authentication with a username and password, you can specify them as follows:

--input=http://username:password@192.168.1.171:9200/my_index

Reindex

First, configure the whitelist in the target Elasticsearch cluster, edit the elasticsearch.yml file, and then restart the cluster:

reindex.remote.whitelist: 192.168.1.171:9200
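The whitelist also accepts several hosts as a comma-separated list; a sketch (the second address is a hypothetical extra source node):

# 192.168.1.172:9200 is a hypothetical additional source host
reindex.remote.whitelist: "192.168.1.171:9200, 192.168.1.172:9200"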

Execute the reindex command on the target cluster:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://192.168.1.171:9200"
      # if user authentication is required:
      # "username": "user",
      # "password": "pass"
    },
    "index": "kibana_sample_data_flights"
    # filtering with a query is also supported
  },
  "dest": {
    "index": "kibana_sample_data_flights"
  }
}
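A cross-cluster reindex of a large index can take a long time, so it can also be started asynchronously and tracked as a task; a sketch with the same source and destination (note that reindex from a remote cluster does not support slicing):

POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://192.168.1.171:9200"
    },
    "index": "kibana_sample_data_flights"
  },
  "dest": {
    "index": "kibana_sample_data_flights"
  }
}
# the response contains a task id, which can be inspected with the task APIs shown below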

List the running reindex tasks:

GET _tasks?actions=*reindex

Query the execution of specific reindex tasks:

GET _tasks/pMrJwVGSQcSgeTZdh71QRw:1413

Snapshot

Snapshot API is a set of API interfaces used by Elasticsearch to back up and restore data. Data can be migrated across clusters through Snapshot API. The principle is to create a data snapshot from the source Elasticsearch cluster and then restore it in the target Elasticsearch cluster.

Step 1: register Repository in the source cluster

Register Repository before creating a snapshot. A Repository can contain multiple snapshot files. There are mainly the following types of Repository:

fs: a shared file system; snapshot files are stored on the file system
url: a URL pointing to a file system; supported protocols: http, https, ftp, file, jar
s3: AWS S3 object storage; snapshots are stored in S3 (supported as a plugin)
hdfs: snapshots are stored in HDFS (supported as a plugin)
azure: snapshots are stored in Azure object storage (supported as a plugin)
gcs: snapshots are stored in Google Cloud object storage (supported as a plugin)

Here we choose the shared file system as the Repository, so we first deploy an NFS server to provide the shared file system.

Install NFS:

yum install -y nfs-utils
systemctl enable nfs.service --now

Create related directories:

mkdir /home/elasticsearch/snapshot
chmod 777 /home/elasticsearch/snapshot

Edit the NFS configuration file / etc/exports:

# rw: read-write permission, sync: write to disk synchronously
/home/elasticsearch/snapshot 192.168.1.0/24(rw,sync)

Restart the NFS service after the modification:

systemctl restart nfs

Mount the NFS file system on the Elasticsearch cluster hosts

Edit the / etc/fstab file and add the following:

192.168.1.65:/home/elasticsearch/snapshot /home/elasticsearch/snapshot/ nfs defaults 0 0

After editing, execute the mount command to mount:

[root@elastic1 ~]# mount -a
# view the mount point
[root@elastic1 ~]# df -hT
Filesystem                                 Type   Size  Used Avail Use% Mounted on
...
192.168.1.65:/home/elasticsearch/snapshot  nfs    142G   39M  142G   1% /home/elasticsearch/snapshot

Modify the Elasticsearch configuration file

Edit the elasticsearch.yml file and add the following:

path.repo: ["/home/elasticsearch/snapshot"]

After adding it, restart Elasticsearch, and then verify that the repo path has been registered with the following command:

GET _cluster/settings?include_defaults&filter_path=*.path.repo
# output result
{
  "defaults": {
    "path": {
      "repo": ["/home/elasticsearch/snapshot"]
    }
  }
}

Register the Repository

Register a Repository named my_fs_backup, whose location points to the dumpindex directory:

PUT /_snapshot/my_fs_backup
{
  "type": "fs",
  "settings": {
    "location": "/home/elasticsearch/snapshot/dumpindex",   # the relative path dumpindex can also be used
    "compress": true
  }
}

Check the Repository registration on each node:

POST _snapshot/my_fs_backup/_verify
# output result
{
  "nodes": {
    "MjS0guiLSMq3Oouh008uSg": {"name": "elastic3"},
    "V-UXoQMkQYWi5RvkjcO_yw": {"name": "elastic2"},
    "9NPH3gJoQAWfgEovS8ww4w": {"name": "elastic4"},
    "gdUSuXuhQ7GvPogi0RqvDw": {"name": "elastic1"}
  }
}

Step 2: create a snapshot on the source cluster

indices: the indices to include in the snapshot.

wait_for_completion: whether to wait for the snapshot to finish before responding. If true, the request returns only after the snapshot completes (the default is false: respond immediately without waiting for the snapshot to finish).

ignore_unavailable: when set to true, indices that do not exist are ignored when the snapshot is created.

include_global_state: when set to false, the cluster's global state is not included in the snapshot.

Take a snapshot of the dumpindex index with the following command:

PUT _snapshot/my_fs_backup/snapshot_1?wait_for_completion=true
{
  "indices": "dumpindex",
  "ignore_unavailable": true,
  "include_global_state": false
}
# output result
{
  "snapshot": {
    "snapshot": "snapshot_1",
    "uuid": "cTvmz15pQzedDE-fHbzsCQ",
    "version_id": 7110199,
    "version": "7.11.1",
    "indices": ["dumpindex"],
    "data_streams": [],
    "include_global_state": false,
    "state": "SUCCESS",
    "start_time": "2021-03-17T14:33:20.866Z",
    "start_time_in_millis": 1615991600866,
    "end_time": "2021-03-17T14:33:21.067Z",
    "end_time_in_millis": 1615991601067,
    "duration_in_millis": 201,
    "failures": [],
    "shards": {"total": 1, "failed": 0, "successful": 1}
  }
}

In the directory where you registered Repository earlier, you can see that the relevant snapshot files are generated:

[elasticsearch@elastic1 ~]$ ll /home/elasticsearch/snapshot/dumpindex/
total 16
-rw-rw-r--. 1 elasticsearch elasticsearch 439 Mar 17 22:18 index-0
-rw-rw-r--. 1 elasticsearch elasticsearch   8 Mar 17 22:18 index.latest
drwxrwxr-x. 3 elasticsearch elasticsearch  36 Mar 17 22:18 indices
-rw-rw-r--. 1 elasticsearch elasticsearch 193 Mar 17 22:18 meta-cTvmz15pQzedDE-fHbzsCQ.dat
-rw-rw-r--. 1 elasticsearch elasticsearch 252 Mar 17 22:18 snap-cTvmz15pQzedDE-fHbzsCQ.dat

Step 3: register the Repository on the target cluster

In the same way as on the source cluster, modify the elasticsearch.yml configuration file and register the Repository with the following command:

PUT _snapshot/my_fs_backup
{
  "type": "fs",
  "settings": {
    "location": "/home/elasticsearch/snapshot/dumpindex",
    "compress": true
  }
}

Copy the snapshot files generated by the source cluster to the Repository directory of the target cluster:

[elasticsearch@elastic1 ~]$ scp -r /home/elasticsearch/snapshot/dumpindex/* elasticsearch@192.168.1.67:/home/elasticsearch/snapshot/dumpindex/

Step 4: restore the snapshot into an index on the target cluster

View snapshot information on the target cluster:

GET _snapshot/my_fs_backup/snapshot_1
# output result
{
  "snapshots": [
    {
      "snapshot": "snapshot_1",
      "uuid": "cTvmz15pQzedDE-fHbzsCQ",
      "version_id": 7110199,
      "version": "7.11.1",
      "indices": ["dumpindex"],
      "data_streams": [],
      "include_global_state": false,
      "state": "SUCCESS",
      "start_time": "2021-03-17T14:33:20.866Z",
      "start_time_in_millis": 1615991600866,
      "end_time": "2021-03-17T14:33:21.067Z",
      "end_time_in_millis": 1615991601067,
      "duration_in_millis": 201,
      "failures": [],
      "shards": {"total": 1, "failed": 0, "successful": 1}
    }
  ]
}

Restore the snapshot into the dumpindex index on the target cluster:

POST _snapshot/my_fs_backup/snapshot_1/_restore
{
  "indices": "dumpindex"
}
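If an open index named dumpindex already exists on the target cluster, the restore will fail; a sketch of restoring the snapshot under a different index name using the restore API's rename options (the name restored_dumpindex is only an example):

POST _snapshot/my_fs_backup/snapshot_1/_restore
{
  "indices": "dumpindex",
  "rename_pattern": "dumpindex",
  "rename_replacement": "restored_dumpindex"
}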

If you look at the index data, you can see that it is consistent with the source cluster:

{"took": 3, "timed_out": false, "_ shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}, "hits": {"total": {"value": 3, "relation": "eq"}, "max_score": 1.0 "hits": [{"_ index": "dumpindex", "_ type": "_ doc", "_ id": "q28kPngB8Nd5nYNvOgHd", "_ score": 1.0, "_ source": {"name": "tom", "age": 18} {"_ index": "dumpindex", "_ type": "_ doc", "_ id": "rG8kPngB8Nd5nYNvOgHd", "_ score": 1.0, "_ source": {"name": "jack", "age": 19}}, {"_ index": "dumpindex" "_ type": "_ doc", "_ id": "rW8kPngB8Nd5nYNvOgHd", "_ score": 1.0, "_ source": {"name": "bob", "age": 20}}]} Logstash

Logstash supports reading data from one Elasticsearch cluster and writing to another Elasticsearch cluster:

Edit the config/logstash.conf file:

input {
  elasticsearch {
    hosts => ["http://192.168.1.171:9200"]
    index => "dumpindex"
    # if user authentication is required
    # user => "username"
    # password => "password"
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.1.67:9200"]
    index => "dumpindex"
  }
}
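As the output at the end of this section shows, this minimal configuration lets Elasticsearch assign new document ids, and Logstash adds @version and @timestamp fields. If the original _id values need to be preserved, a sketch using the input plugin's docinfo options (option names from the logstash-input-elasticsearch and logstash-output-elasticsearch plugins; verify them against your Logstash version):

input {
  elasticsearch {
    hosts => ["http://192.168.1.171:9200"]
    index => "dumpindex"
    docinfo => true                        # expose _index/_type/_id of each source document
    docinfo_target => "[@metadata][doc]"   # keep them under @metadata so they are not indexed
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.1.67:9200"]
    index => "dumpindex"
    document_id => "%{[@metadata][doc][_id]}"   # reuse the original document id
  }
}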

Start Logstash:

[elasticsearch@es1 logstash-7.11.1]$ bin/logstash -f config/logstash.conf

Looking at the dumpindex index data on the target cluster, you can see that it is consistent with the source cluster:

GET dumpindex/_search
# output result
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "dumpindex", "_type": "_doc", "_id": "jrfpQHgBERNPF_kwE-Jk", "_score": 1.0, "_source": {"@version": "1", "name": "tom", "@timestamp": "2021-03-17T15:58:39.423Z", "age": 18}},
      {"_index": "dumpindex", "_type": "_doc", "_id": "j7fpQHgBERNPF_kwE-Jk", "_score": 1.0, "_source": {"@version": "1", "name": "jack", "@timestamp": "2021-03-17T15:58:39.440Z", "age": 19}},
      {"_index": "dumpindex", "_type": "_doc", "_id": "kLfpQHgBERNPF_kwE-Jk", "_score": 1.0, "_source": {"@version": "1", "name": "bob", "@timestamp": "2021-03-17T15:58:39.440Z", "age": 20}}
    ]
  }
}

At this point, I believe you have a deeper understanding of how to achieve Elasticsearch cross-cluster data migration. Why not try it out in practice yourself?
