This article explains how to prevent automatic word segmentation (analysis) when importing data from Hive into Elasticsearch with elasticsearch-hadoop. The content is simple and clear; follow the steps below.
Background
In our company's Elasticsearch use cases, word segmentation is not needed. By default the ES string type is analyzed automatically, so fields such as province and region get split into tokens, which breaks exact matching.
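You can see the default behavior with the _analyze API. A minimal sketch (assuming a local ES 1.x/2.x node, matching the elasticsearch-hadoop 2.2.0 used later; the sample text is illustrative):

curl -XGET 'localhost:9200/_analyze?analyzer=standard&text=guang+dong+province'
# The standard analyzer returns three separate tokens: "guang", "dong", "province".
# With "index": "not_analyzed" the whole value is stored as a single term,
# so exact-match queries on province/region work as expected.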
Usage
Create an Elasticsearch template (_template) with the following command (the # comments are explanations only and must be removed before running, since JSON does not allow comments):

curl -XPUT localhost:9200/_template/dmp_down_result -d '
{
    "template": "dmp_down_*",            # indexes whose names start with dmp_down_ will use this template
    "settings": {
        "number_of_shards": 14,          # number of primary shards
        "number_of_replicas": 1,         # number of replicas
        "index.refresh_interval": "30s"  # refresh interval (optional)
    },
    "aliases": {
        "dmp_down_result": {}            # alias
    },
    "mappings": {
        "dmp_es_result1": {              # type name; must match the type of the index being created
            "properties": {              # field mapping settings
                "user_id": {             # must correspond to a field in the Hive data
                    "type": "multi_field",
                    "fields": {
                        "user_id": {"type": "string", "index": "not_analyzed"}  # not_analyzed disables word segmentation; the default, analyzed, enables it
                    }
                },
                "phone": {
                    "type": "multi_field",
                    "fields": {
                        "imei": {"type": "string", "index": "not_analyzed"}
                    }
                },
                "address": {
                    "type": "multi_field",
                    "fields": {
                        "idfa": {"type": "string", "index": "not_analyzed"}
                    }
                }
            }
        }
    }
}'
After the template is created, you can check whether it has taken effect via http://localhost:9200/_template.
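For example, from the command line (?pretty is a standard ES query parameter for readable output):

curl -XGET 'localhost:9200/_template/dmp_down_result?pretty'
# Returns the template JSON if it was registered successfully.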
This completes the template creation.
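If you need to adjust the template later, delete it and re-create it (a sketch; DELETE on _template is a standard ES API):

curl -XDELETE 'localhost:9200/_template/dmp_down_result'
# Existing indexes keep their old settings; only indexes created afterwards
# pick up the re-created template.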
During import you may get an error like "maybe it contains illegal characters?". The real cause cannot be determined from the import itself; you can surface the specific error by manually creating the index/type, as shown below. For example:

{"error": {"root_cause": [{"type": "remote_transport_exception", "reason": "[dmp_es-16][10.8.1.16:9300][indices:admin/create]"}], "type": "illegal_state_exception", "reason": "index and alias names need to be unique, but alias [dmp_keyword_result] and index [dmp_keyword_result] have the same name"}, "status": 500}

This makes the problem much easier to see: the index name duplicates an alias, so simply pick a different index name.
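A minimal sketch of the manual check (the index name matches the failing one from the error above):

curl -XPUT 'localhost:9200/dmp_keyword_result?pretty'
# If a template defines an alias with the same name as the new index,
# ES rejects the creation with the illegal_state_exception shown above.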
Appendix: how to synchronize Hive data to ES with elasticsearch-hadoop:
Download the elasticsearch-hadoop jar, whose version needs to match the current Elasticsearch version
Upload the elasticsearch-hadoop jar to the cluster
Register it from the Hive command line, as shown in the session below: add jar /home/hdroot/elasticsearch-hadoop-2.2.0.jar
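A sketch of the Hive CLI session (list jars is a standard Hive command to confirm the registration):

hive> add jar /home/hdroot/elasticsearch-hadoop-2.2.0.jar;
hive> list jars;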
Table creation command:
CREATE EXTERNAL TABLE dmp_es_result2 (
    user_id string,
    imei string,
    idfa string,
    email string,
    type_id array<string>,   -- element type assumed; the original did not specify it
    province string,
    region string,
    dt string,
    terminal_brand string,
    system string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.resource' = '<index name>/<type name>',
    'es.index.auto.create' = 'true',
    'es.nodes' = 'localhost',
    'es.port' = '9200',
    'es.field.read.empty.as.null' = 'true');
es.resource: the index name/type name to synchronize to in ES
es.index.auto.create: whether to create the index automatically; if you do not want ES to generate document ids itself, you can specify es.mapping.id=<primary key field>
es.nodes: ES cluster node addresses; any node works, and multiple nodes are separated by commas, e.g. 192.168.1.1:9200,192.168.1.2:9200
es.port: ES cluster port; if es.nodes already specifies ports, this can be omitted
es.field.read.empty.as.null: whether to read empty fields as null; adding this parameter makes the import handle empty values more gracefully (the original is a little uncertain on this point)
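For reference, a sketch of the same kind of table pointing at a two-node cluster and using user_id as the document id (the table name, index/type names, and node addresses here are illustrative, not from the original):

CREATE EXTERNAL TABLE dmp_es_result3 (
    user_id string,
    province string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.resource' = 'dmp_down_result/dmp_es_result1',
    'es.index.auto.create' = 'true',
    'es.mapping.id' = 'user_id',   -- use the Hive user_id column as the ES document id
    'es.nodes' = '192.168.1.1:9200,192.168.1.2:9200',
    'es.field.read.empty.as.null' = 'true');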
Hive statement to import the data:
INSERT OVERWRITE TABLE dmp_es_result2 select user_id,imei,idfa,email,type_id,province,region,dt,terminal_brand,system from temp_zy_game_result01
If the statement succeeds, the data has been synchronized to ES, which you can verify as shown below.
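A quick verification sketch (_count is a standard ES API; the index name comes from the alias defined in the template above):

curl -XGET 'localhost:9200/dmp_down_result/_count?pretty'
# Compare the returned count with the number of rows selected in Hive.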
Thank you for reading. That covers how to avoid automatic word segmentation when importing data from Hive into Elasticsearch with elasticsearch-hadoop. You should now have a deeper understanding of the problem, though specific usage still needs to be verified in practice.