2025-02-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article presents a worked example of the open source big data indexing project hive-solr: its latest updates, an indexing test, and the JVM and Solr settings used for that test.
Latest updates:
(1) added support for SolrCloud clusters
(2) fixed a bug in the handling of empty columns and null values from Hive during deserialization
(3) optimized index building so that empty and null values are skipped.
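As an illustration of skipping null and empty values before a document is built, here is a minimal sketch. It is not taken from the hive-solr source; the function name `drop_empty_fields` and the rule that blank strings count as empty are assumptions made for this example.

```python
from typing import Any, Dict, Optional


def drop_empty_fields(row: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a copy of `row` without None values or blank strings.

    Hypothetical helper: the real project may define "empty" differently.
    """
    cleaned: Dict[str, Any] = {}
    for name, value in row.items():
        if value is None:
            continue  # null value: leave the field out of the document
        if isinstance(value, str) and value.strip() == "":
            continue  # blank string: treated as empty in this sketch
        cleaned[name] = value
    return cleaned
```

Numeric zero is kept on purpose, since it is a real value rather than a missing one.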
Test results:
Data volume: about 12 million records with 8 fields, one of which is a large text field and two of which are tokenized (word-segmented) fields. The data takes up about 20 GB before indexing.
Total indexing time: about 15 minutes
Index size after indexing: about 6 GB per shard, about 18 GB in total
Hive: the number of concurrent map tasks is capped at 30 to avoid affecting the HBase service. Note that after indexing from Hive you need to issue one manual commit so that the in-memory index is flushed to disk.
Batch processing: each map task submits documents in batches of 100,000. This is a batch submission, not a commit. Tune the value to your environment: if it is too large, SolrCloud is prone to losing data; if it is too small, indexing slows down.
SolrCloud cluster: version 5.1 on 3 machines, one shard each, no replicas, with 10 GB of memory for Jetty
CPU: 24 cores. Note that tokenizing the large text field consumes a lot of CPU.
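The batching and manual commit described in the test setup can be sketched as follows. This is a simplified model under stated assumptions, not the project's implementation: `BatchingIndexer`, `index_all`, `send_batch`, and `commit` are hypothetical names, and the two callables stand in for whatever Solr client the job actually uses.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


class BatchingIndexer:
    """Buffers documents and hands them to `send_batch` in fixed-size batches."""

    def __init__(self, send_batch: Callable[[List[Doc]], None],
                 batch_size: int = 100_000) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._buffer: List[Doc] = []
        self.batches_sent = 0

    def add(self, doc: Doc) -> None:
        self._buffer.append(doc)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; sending is not the same as committing."""
        if not self._buffer:
            return
        self._send_batch(self._buffer)
        self._buffer = []  # start a fresh list so the sent batch is untouched
        self.batches_sent += 1


def index_all(docs: Iterable[Doc],
              send_batch: Callable[[List[Doc]], None],
              commit: Callable[[], None],
              batch_size: int = 100_000) -> int:
    """Index every document in batches, then commit exactly once at the end."""
    indexer = BatchingIndexer(send_batch, batch_size)
    for doc in docs:
        indexer.add(doc)
    indexer.flush()  # send the final, possibly partial, batch
    commit()         # the single manual commit after indexing
    return indexer.batches_sent
```

In the real setup each map task would do its own batching and the commit would be issued once after the whole Hive job; the sketch folds both into one function to keep the example short.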
JVM parameter tuning for Solr:
(1) increase SurvivorRatio, which shrinks the survivor spaces.
(2) decrease NewRatio, which enlarges the young generation.
(3) increase the permanent generation limit, MaxPermSize, to 256m
(4) set MaxTenuringThreshold=0 to promote large objects to the old generation sooner, avoid copying them back and forth between the eden and survivor spaces, and rely more on young GCs (YGC).
All other parameters keep their defaults.
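To see why raising SurvivorRatio shrinks the survivor spaces and lowering NewRatio enlarges the young generation, the nominal HotSpot sizing rules can be worked through in a few lines. This is an approximation that ignores adaptive sizing and explicit settings such as -Xmn. The 10 GB figure is assumed here to be the heap size, and the ratio values in the usage note are illustrative only, since the article does not state the values actually used.

```python
from typing import Dict


def generation_sizes(heap_mb: float, new_ratio: int,
                     survivor_ratio: int) -> Dict[str, float]:
    """Nominal generation sizes in MB.

    NewRatio      = old generation / young generation
    SurvivorRatio = eden / one survivor space
    The young generation holds eden plus two survivor spaces.
    """
    young = heap_mb / (new_ratio + 1)
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    old = heap_mb - young
    return {"young": young, "eden": eden, "survivor": survivor, "old": old}
```

For example, `generation_sizes(10240, 1, 8)` gives a 5120 MB young generation split into a 4096 MB eden and two 512 MB survivor spaces. Raising `survivor_ratio` with the same heap and `new_ratio` makes each survivor space smaller, and lowering `new_ratio` makes the young generation larger.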
Solr server configuration:
(1) disable automatic commits
(2) set ramBufferSizeMB to 1000, about 1 GB
(3) set maxBufferedDocs to -1, which disables flushing by document count
(4) set mergeFactor to 100
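A sketch of how these four settings might look in solrconfig.xml for Solr 5.1 follows, wrapped in a short Python check that the fragment is well-formed and carries the intended values. The fragment is an assumption about the layout, not the author's actual file. In particular, setting autoCommit maxTime to -1 is one way to disable automatic commits (removing the autoCommit element is another), and `read_index_settings` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed fragment of solrconfig.xml; only the elements relevant here are shown.
SOLRCONFIG_FRAGMENT = """
<config>
  <indexConfig>
    <ramBufferSizeMB>1000</ramBufferSizeMB>
    <maxBufferedDocs>-1</maxBufferedDocs>
    <mergeFactor>100</mergeFactor>
  </indexConfig>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>-1</maxTime>
    </autoCommit>
  </updateHandler>
</config>
"""


def read_index_settings(xml_text: str) -> Dict[str, int]:
    """Parse the fragment and return the indexConfig values as integers."""
    root = ET.fromstring(xml_text.strip())
    index_config = root.find("indexConfig")
    if index_config is None:
        raise ValueError("no <indexConfig> element found")
    return {child.tag: int(child.text) for child in index_config}


def auto_commit_max_time(xml_text: str) -> int:
    """Return the autoCommit maxTime value; -1 is used here to mean disabled."""
    root = ET.fromstring(xml_text.strip())
    return int(root.findtext("updateHandler/autoCommit/maxTime"))
```

With automatic commits disabled, nothing becomes durable until an explicit commit is sent, so the indexing job has to issue that commit itself.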