How to use enrich processor

2025-03-28 Update From: SLTechnology News&Howtos


This article introduces how to use the enrich processor in Elasticsearch. It explains the concept, then walks through a complete worked example step by step. I hope it helps answer common questions about the enrich processor.

Introduction to enrich processor

An ingest pipeline preprocesses incoming documents before they are indexed, modifying their content (for example, case conversion) through a series of rules defined in processors.

The enrich processor was introduced in Elasticsearch 7.5 to add data from an existing index (the source index) to incoming documents.

For example, you can use it in the following scenarios:

Identify the Web service or vendor based on a known IP address.

Add product information to the retail order according to the product ID.

Supplement the contact information according to the email address.

Add the zip code according to the user's coordinates.

Use enrich processor

There are several steps to using enrich processor:

1. Add enrich data: add documents (the enrich data) to one or more source indexes; these contain the data to be added to incoming documents later.

2. Create an enrich policy. The enrich policy should contain at least the following parameters:

The source index.

The field that incoming documents and the source index use for matching.

The fields to add to incoming documents.

3. Execute the enrich policy: after execution, the corresponding enrich index is created automatically. Unlike ordinary indexes, the enrich index is optimized for fast lookups.

4. Use the enrich processor in an ingest pipeline to query the enrich index.
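The four steps above can be sketched as a small local model in Python. This is only an illustration of the matching logic, not the Elasticsearch API; the function names (execute_policy, enrich_processor) and the dict-based "enrich index" are hypothetical stand-ins:

```python
# A minimal local model of the enrich workflow. Illustrative sketch only --
# the real feature runs inside Elasticsearch; all names here are hypothetical.

# Step 1: the "source index" documents (the enrich data).
source_index = [
    {"loc": "Guangdong", "company": "Tencent", "num": "A1001"},
    {"loc": "Shanghai", "company": "Bilibili", "num": "B1001"},
    {"loc": "Zhejiang", "company": "Alibaba", "num": "C1001"},
]

# Step 2: the enrich policy -- which field to match on, which fields to copy,
# and an optional query that filters the enrich data.
policy = {
    "match_field": "num",
    "enrich_fields": ["loc"],
    "query": lambda doc: doc["loc"] == "Shanghai",
}

# Step 3: "executing" the policy builds a fast lookup table (the enrich index).
def execute_policy(source, policy):
    index = {}
    for doc in source:
        if policy["query"](doc):
            entry = {f: doc[f] for f in policy["enrich_fields"]}
            entry[policy["match_field"]] = doc[policy["match_field"]]
            index[doc[policy["match_field"]]] = entry
    return index

enrich_index = execute_policy(source_index, policy)

# Step 4: the enrich processor matches each incoming document against the
# enrich index and, on a hit, writes the matched entry into target_field.
def enrich_processor(incoming, enrich_index, field="num", target_field="enrich_loc"):
    hit = enrich_index.get(incoming.get(field))
    if hit is not None:
        incoming[target_field] = hit
    return incoming

# A1001's loc is Guangdong, filtered out by the query -> no enrich_loc added.
print(enrich_processor({"num": "A1001", "company": "Tencent"}, enrich_index))
# B1001 matches -> enrich_loc contains loc and the match_field value.
print(enrich_processor({"num": "B1001", "company": "Bilibili"}, enrich_index))
```

Note how the query filter means only the Shanghai document contributes to the lookup table, mirroring the policy created later in this article.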

Background description

The source index is as follows:

loc | num | company
Guangdong | A1001 | Tencent
Shanghai | B1001 | Bilibili
Zhejiang | C1001 | Alibaba

The incoming documents are shown below. For each incoming document, we look up the corresponding loc value in the source index via the num field, and add it to a new enrich_loc field on the incoming document.

num | company
A1001 | Tencent
B1001 | Bilibili
C1001 | Alibaba

Step 1: add enrich data

Add documents to the location index in batches through the _bulk API, just like normal documents.

POST _bulk
{"index": {"_index": "location"}}
{"loc": "Guangdong", "company": "Tencent", "num": "A1001"}
{"index": {"_index": "location"}}
{"loc": "Shanghai", "company": "Bilibili", "num": "B1001"}
{"index": {"_index": "location"}}
{"loc": "Zhejiang", "company": "Alibaba", "num": "C1001"}

Step 2: create an enrich policy

Once an enrich policy is created, it cannot be updated or modified.

PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "location",
    "match_field": "num",
    "enrich_fields": ["loc"],
    "query": {
      "match": {
        "loc.keyword": "Shanghai"
      }
    }
  }
}

Here, indices names the source index (location); match_field is the field in the source index used to match incoming documents (the incoming documents also use num); enrich_fields lists the fields to add to incoming documents; and the optional query filters the enrich data, so only source documents whose loc.keyword is Shanghai will contribute fields to incoming documents.

Step 3: execute the enrich policy

Once you have created the enrich policy, you can execute it through the execute enrich policy API. When the policy is executed, the enrich index is created automatically.

Directly matching incoming documents against documents in the source index would be slow and resource-intensive, so the enrich processor uses an enrich index instead. An enrich index contains the enrich data from the source index, with some special properties that keep it simple and fast:

They are system indexes, which means they are managed internally by Elasticsearch and are intended for use only by the enrich processor.

Their names always start with .enrich-.

They are read-only, which means you cannot modify them directly.

They are force-merged for fast retrieval.

When data in the source index is added or modified, simply re-execute the enrich policy: the enrich index is rebuilt, which updates what the enrich processor sees.
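The snapshot behavior can be illustrated with a small self-contained Python sketch (hypothetical names, not the Elasticsearch API): the enrich index is a copy of the source data taken at execution time, so source changes stay invisible until the policy is executed again.

```python
# Illustrative sketch: the "enrich index" is a snapshot of the source data.
# build_snapshot stands in for executing the enrich policy; all names are
# hypothetical and nothing here talks to a real Elasticsearch cluster.

def build_snapshot(source, match_field="num", enrich_fields=("loc",)):
    # "Executing" the policy: copy the relevant fields into a lookup table.
    return {
        doc[match_field]: {f: doc[f] for f in (*enrich_fields, match_field)}
        for doc in source
    }

source = [{"loc": "Shanghai", "num": "B1001"}]
snapshot = build_snapshot(source)

# Modify the source data: the existing snapshot still holds the old value.
source[0]["loc"] = "Beijing"
print(snapshot["B1001"]["loc"])   # still "Shanghai"

# Re-execute the policy: the rebuilt snapshot sees the update.
snapshot = build_snapshot(source)
print(snapshot["B1001"]["loc"])   # now "Beijing"
```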

Execute enrich policy with the following command:

PUT /_enrich/policy/my-policy/_execute

View the automatically created enrich index:

GET _cat/indices/.enrich*

# returns the result
green open .enrich-my-policy-1616136526661 Vxal9lLBSlKS5lmzMpFfwQ 13 1 0 13.4kb 3.3kb

There seems to be a small bug here in enrich policy handling. Suppose you delete an enrich policy named my-policy-1: its enrich policy and its enrich index are both removed, as expected. But if another policy my-policy-2 existed with the same definition, my-policy-2's enrich index is also mistakenly deleted (while its enrich policy remains).

Step 4: use the enrich processor in an ingest pipeline

PUT _ingest/pipeline/loc-pipeline
{
  "processors": [
    {
      "enrich": {
        "policy_name": "my-policy",
        "field": "num",
        "target_field": "enrich_loc"
      }
    }
  ]
}

Here, policy_name references the enrich policy; field is the field in the incoming document used to match against the source index; and target_field is the field added to the incoming document, which will contain the match_field and enrich_fields values defined in the enrich policy.

Verification

Use the simulate API to debug the ingest pipeline. Since the loc.keyword of the matching source document is not Shanghai, this first document will not be enriched:

POST _ingest/pipeline/loc-pipeline/_simulate
{"docs": [{"_source": {"num": "A1001", "company": "Tencent"}}]}

# returns the result
{"docs": [{"doc": {"_index": "_index", "_type": "_doc", "_id": "_id", "_source": {"company": "Tencent", "num": "A1001"}, "_ingest": {"timestamp": "2021-03-19T06:56:45.754486259Z"}}}]}

The loc.keyword of this next document's match is Shanghai, so the fields specified in the enrich policy are added:

POST _ingest/pipeline/loc-pipeline/_simulate
{"docs": [{"_source": {"num": "B1001", "company": "Bilibili"}}]}

# returns the result
{"docs": [{"doc": {"_index": "_index", "_type": "_doc", "_id": "_id", "_source": {"company": "Bilibili", "enrich_loc": {"loc": "Shanghai", "num": "B1001"}, "num": "B1001"}, "_ingest": {"timestamp": "2021-03-19T06:56:29.393585306Z"}}}]}

After debugging with simulate succeeds, we specify the ingest pipeline when inserting documents:

# method 1: insert one by one
POST origin-location/_doc?pipeline=loc-pipeline
{"num": "A1001", "company": "Tencent"}

POST origin-location/_doc?pipeline=loc-pipeline
{"num": "B1001", "company": "Bilibili"}

# method 2: insert in batch
POST _bulk?pipeline=loc-pipeline
{"index": {"_index": "origin-location"}}
{"num": "A1001", "company": "Tencent"}
{"index": {"_index": "origin-location"}}
{"num": "B1001", "company": "Bilibili"}

View the results of the insertion:

GET origin-location/_search

# returns the result
{
  "took": 12,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {
        "_index": "origin-location",
        "_type": "_doc",
        "_id": "zXxLSXgBUc4opBV-QiOv",
        "_score": 1.0,
        "_source": {"num": "A1001", "company": "Tencent"}
      },
      {
        "_index": "origin-location",
        "_type": "_doc",
        "_id": "znxLSXgBUc4opBV-SCPk",
        "_score": 1.0,
        "_source": {"num": "B1001", "company": "Bilibili", "enrich_loc": {"loc": "Shanghai", "num": "B1001"}}
      }
    ]
  }
}

You can also set a default ingest pipeline on the index so that you do not have to specify the pipeline on every insert:

# specify the default ingest pipeline
PUT origin-location2
{
  "settings": {
    "default_pipeline": "loc-pipeline"
  }
}

# insert data
POST _bulk
{"index": {"_index": "origin-location2"}}
{"num": "A1001", "company": "Tencent"}
{"index": {"_index": "origin-location2"}}
{"num": "B1001", "company": "Bilibili"}

# view the result
GET origin-location2/_search

# output result
{
  "took": 8,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {
        "_index": "origin-location2",
        "_type": "_doc",
        "_id": "CXxPSXgBUc4opBV-oyTJ",
        "_score": 1.0,
        "_source": {"num": "A1001", "company": "Tencent"}
      },
      {
        "_index": "origin-location2",
        "_type": "_doc",
        "_id": "CnxPSXgBUc4opBV-oyTJ",
        "_score": 1.0,
        "_source": {"num": "B1001", "company": "Bilibili", "enrich_loc": {"loc": "Shanghai", "num": "B1001"}}
      }
    ]
  }
}

Alternatively, you can use an index template to match multiple indexes with wildcard patterns and specify the ingest pipeline they use:

# use an index template
PUT _template/my-template
{
  "index_patterns": ["origin-*"],
  "settings": {
    "default_pipeline": "loc-pipeline"
  }
}

# insert data
POST _bulk
{"index": {"_index": "origin-location3"}}
{"num": "A1001", "company": "Tencent"}
{"index": {"_index": "origin-location3"}}
{"num": "B1001", "company": "Bilibili"}

# view the result
GET origin-location3/_search

# output result
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {
        "_index": "origin-location3",
        "_type": "_doc",
        "_id": "XnxVSXgBUc4opBV-1yRp",
        "_score": 1.0,
        "_source": {"num": "A1001", "company": "Tencent"}
      },
      {
        "_index": "origin-location3",
        "_type": "_doc",
        "_id": "X3xVSXgBUc4opBV-1yRp",
        "_score": 1.0,
        "_source": {"num": "B1001", "company": "Bilibili", "enrich_loc": {"loc": "Shanghai", "num": "B1001"}}
      }
    ]
  }
}

This concludes the study of "how to use enrich processor". I hope it has resolved your doubts; pairing theory with practice is the best way to learn, so go and try it!
