2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
(1) Introduction to Hive + Solr
As the offline data warehouse of the Hadoop ecosystem, Hive makes it easy to analyze massive amounts of historical data offline with SQL and to act on the results, for example with statistical report queries.
As a high-performance search server, Solr provides fast and powerful full-text retrieval.
(2) Why does Hive need to be integrated with Solr?
Sometimes we need to store the results of Hive analysis in Solr to provide a full-text search service. For example, we once had a business requirement to analyze the search logs of our e-commerce website with Hive and store them in Solr for report queries. Because search keywords were involved, that field had to be queryable both with and without word segmentation; with segmented queries you can see a trend chart of the related products over a given period. At other times we need to load data from Solr into Hive and use SQL to perform joins and other analysis. The strengths and weaknesses of the two systems complement each other, which better fits our business needs. There are some open-source Hive-Solr integration projects on the Internet, but they are relatively old and cannot run on newer versions; the modified and patched version here runs on the latest release.
(3) How can Hive be integrated with Solr?
The so-called integration is really a matter of rewriting some components of Hadoop's MapReduce (MR) programming interface. As we all know, the MR programming interface is flexible and highly abstract: MR can load data not only from HDFS but from any non-HDFS system, provided that we customize the following components:
InputFormat
OutputFormat
RecordReader
RecordWriter
InputSplit
Although this is slightly cumbersome, it makes it possible to load data from almost anywhere, including MySQL, SQL Server, Oracle, MongoDB, Solr, Elasticsearch, Redis, and so on.
The above applies to customizing Hadoop's MR programming interface directly. In Hive, besides some of the components above, you also need to define a SerDe component and assemble them into a StorageHandler. In Hive, SerDe stands for Serializer and Deserializer, that is, serialization and deserialization; Hive uses the SerDe together with the file input/output formats to read and write the rows of a Hive table.
The read path:
HDFS files / any source -> InputFileFormat -> <key, value> -> Deserializer -> Row object
The write path:
Row object -> Serializer -> <key, value> -> OutputFileFormat -> HDFS files / any source
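The two paths above can be sketched conceptually. The snippet below is an illustration in Python only (the real components are Java classes implementing Hive's SerDe interfaces; the class names here are hypothetical stand-ins): a deserializer maps a source document onto the declared Hive columns, and a serializer does the reverse.

```python
class SolrDeserializer:
    """Read path: raw Solr document -> Hive row object."""

    def __init__(self, columns):
        # Column order as declared in the Hive table definition.
        self.columns = columns

    def deserialize(self, solr_doc):
        # Map each declared Hive column to the matching Solr field;
        # missing fields become NULL (None).
        return [solr_doc.get(col) for col in self.columns]


class SolrSerializer:
    """Write path: Hive row object -> Solr document for indexing."""

    def __init__(self, columns):
        self.columns = columns

    def serialize(self, row):
        # Pair each column name with the row value in declared order.
        return {col: val for col, val in zip(self.columns, row)}


# A round trip over one record:
columns = ["rowkey", "sname"]
doc = {"rowkey": "1", "sname": "phone"}
row = SolrDeserializer(columns).deserialize(doc)
assert row == ["1", "phone"]
assert SolrSerializer(columns).serialize(row) == doc
```

The key point the sketch captures is that the SerDe only translates between row objects and source documents; actually moving bytes to and from the source is the job of the input/output format components.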
(4) What can Hive do after integrating Solr?
(1) Read Solr data and use the SQL syntax supported by Hive to perform all kinds of aggregation, statistics, analysis, joins, and so on.
(2) Generate a Solr index: with a single SQL statement you can build an index for large-scale data by means of MR.
(5) How to install, deploy, and use it?
The source code is not pasted here; it has been uploaded to GitHub. Those who need it can run git clone https://github.com/qindongliang/hive-solr, modify a few pom files, and execute
mvn clean package
to build the jar package, then copy the jar into Hive's lib directory.
Examples are as follows:
(1) Hive reads Solr data
Create the table:
SQL code
-- drop the table if it exists
drop table if exists solr;
-- create an external table
create external table solr (
-- define the fields, which must match those in Solr
rowkey string,
sname string
)
-- define the storage handler
stored by "com.easy.hive.store.SolrStorageHandler"
-- configure the Solr properties
tblproperties ('solr.url' = 'http://192.168.1.28:8983/solr/a',
'solr.query' = '*:*',
'solr.cursor.batch.size' = '10000',
'solr.primary_key' = 'rowkey'
);
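Judging by the 'solr.cursor.batch.size' property, the storage handler presumably pages through the collection with Solr's cursorMark deep-paging API, fetching one batch of documents per round trip. The Python sketch below is illustrative only: `fetch` is a stand-in for the real HTTP call to Solr's /select handler, and the in-memory backend is fake. It does, however, follow Solr's actual termination rule for cursor paging: the scan is finished when the cursor returned equals the cursor that was sent.

```python
def scan_with_cursor(fetch, batch_size):
    """Page through all documents using Solr-style cursorMark paging.

    `fetch(cursor, rows)` stands in for an HTTP request to Solr; it
    must return (docs, next_cursor). Solr signals the end of the
    result set by echoing back the cursor it was given.
    """
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        docs, next_cursor = fetch(cursor, batch_size)
        for doc in docs:
            yield doc
        if next_cursor == cursor:  # cursor did not advance: done
            break
        cursor = next_cursor


def make_fake_solr(data):
    """Build a fake fetch function over an in-memory list, for demo only."""
    def fetch(cursor, rows):
        start = 0 if cursor == "*" else int(cursor)
        docs = data[start:start + rows]
        # Echo the cursor back unchanged when there is nothing left.
        next_cursor = str(start + len(docs)) if docs else cursor
        return docs, next_cursor
    return fetch


docs = list(scan_with_cursor(make_fake_solr(list(range(25))), 10))
assert docs == list(range(25))
```

In real Solr, cursor paging additionally requires a sort clause that includes the unique key field (which is likely why the table properties declare 'solr.primary_key').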
Run the bin/hive command to start Hive's command-line terminal, then execute:
-- query all data
select * from solr limit 5;
-- query a specified field
select rowkey from solr;
-- aggregate statistics over the Solr data by means of MR
select sname, count(*) as c from solr group by sname order by c desc;
(2) An example of using Hive to build an index for Solr
First, create the data source table:
SQL code
-- drop the table if it exists
drop table if exists index_source;
-- create the data table
CREATE TABLE index_source (id string, yname string, sname string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
-- load local data into the source table
load data local inpath '/ROOT/server/hive/test_solr' into table index_source;
Second, create the table associated with Solr:
-- drop the table if it exists
drop table if exists index_solr;
-- create the associated external Solr table
create external table index_solr (
id string,
yname string,
sname string
)
-- define the storage engine
stored by "com.easy.hive.store.SolrStorageHandler"
-- set the Solr service properties
tblproperties ('solr.url' = 'http://192.168.1.28:8983/solr/b',
'solr.query' = '*:*',
'solr.cursor.batch.size' = '10000',
'solr.primary_key' = 'id'
);
Finally, execute the following SQL commands to build the Solr index from the data in the source table:
SQL code
-- register the hive-solr jar, otherwise the job will not start properly when running in MR mode
add jar /ROOT/server/hive/lib/hive-solr.jar;
-- execute the insert command
INSERT OVERWRITE TABLE index_solr SELECT * FROM index_source;
After it succeeds, you can check the result in Solr's admin interface, or run the following query in Hive:
select * from index_solr limit 10;
(6) Can they be integrated with other frameworks?
Of course. As open-source, independent frameworks, they can be combined in various ways: Hive can also be integrated with Elasticsearch or MongoDB, and Solr can also be integrated with Spark or Pig. We just need to customize the relevant components; the approach is roughly the same as in this project.
(7) Base environment in which the tests passed
Apache Hadoop 2.7.1
Apache Hive 1.2.1
Apache Solr 5.1.0
(8) Thanks and references:
https://github.com/mongodb/mongo-hadoop/tree/master/hive/src/main/java/com/mongodb/hadoop/hive
https://github.com/lucidworks/hive-solr
https://github.com/chimpler/hive-solr
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe