Updated 2025-02-24. From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/02 Report--
This article shows in detail how to use Solr 7 to build a full-text index of a structured CSV file. Interested readers can use it as a reference.
1. This test generates a CSV file of about 1 GB. Each record consists of ten fields covering int, double, string, date, Chinese text, and English text, so that several data types can be tested. The Java code that generates the data is here:
https://github.com/fayson/cdhproject/blob/master/generatedata/src/main/java/com/cloudera/solr/GenerateSolrTestData.java
A total of 600,000 records are generated, about 1.1 GB in size. The ten fields are number, firstDouble, firstNo, secondDouble, secondNo, jarName, enText, cnText, firstTime, and secondTime.
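For readers who want a small data set without running the linked Java generator, the following is a minimal Python sketch that writes rows with the same ten field names. It is not the article's generator: the value ranges, the sample jar names and texts, the integer type of firstNo and secondNo, and the presence of a header row are all assumptions made for illustration.

```python
import csv
import io
import random
from datetime import datetime, timedelta, timezone

# Field names as listed in the article.
FIELDS = ["number", "firstDouble", "firstNo", "secondDouble", "secondNo",
          "jarName", "enText", "cnText", "firstTime", "secondTime"]

# Hypothetical sample values; the real generator's vocabulary is not shown.
JAR_NAMES = ["spark-core.jar", "hadoop-common.jar", "solr-core.jar"]
EN_TEXTS = ["Cloudera distribution of Hadoop", "Solr full text search"]
CN_TEXTS = ["全文检索查询测试", "结构化数据索引"]

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamp form that Solr accepts


def make_row(number, rng):
    """Build one record; all value ranges are assumed, not taken from the source."""
    start = datetime(2018, 1, 1, tzinfo=timezone.utc)
    first = start + timedelta(seconds=rng.randrange(90 * 24 * 3600))
    second = first + timedelta(seconds=rng.randrange(3600))
    return {
        "number": number,
        "firstDouble": round(rng.uniform(0, 1000), 2),
        "firstNo": rng.randrange(100000),
        "secondDouble": round(rng.uniform(0, 1000), 2),
        "secondNo": rng.randrange(100000),
        "jarName": rng.choice(JAR_NAMES),
        "enText": rng.choice(EN_TEXTS),
        "cnText": rng.choice(CN_TEXTS),
        "firstTime": first.strftime(TIME_FORMAT),
        "secondTime": second.strftime(TIME_FORMAT),
    }


def write_csv(out, count, seed=0):
    """Write a header line followed by `count` records to a text stream."""
    rng = random.Random(seed)
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for number in range(1, count + 1):
        writer.writerow(make_row(number, rng))


# Small demonstration: three records written to an in-memory buffer.
demo = io.StringIO()
write_csv(demo, 3)
print(demo.getvalue())
```

To produce a file instead of a buffer, open it with `open(path, "w", newline="", encoding="utf-8")` and pass the handle to `write_csv`.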
Build an index
On the Solr web page, select [Collections] on the left, then click [Add collection] to create a Collection.
The Collection is created successfully.
Import the prepared CSV file into Solr with post.jar, which ships with Solr. Its usage is printed by its help option (java -jar post.jar -h).
Following the help output, import the CSV file into Solr and build the full-text index with the following command:
java -Durl=http://localhost:8983/solr/test0723/update -Dtype=text/csv -Dc=test0723 -jar post.jar /tmp/solr/file/data.csv
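The post.jar tool sends the file to the collection's update handler over HTTP. As a rough illustration of that request, here is a Python sketch that builds an equivalent POST with the standard library. The function name and the `commit=true` parameter are choices made for this example, and reading the whole file into memory is only reasonable for small files; for the 1.1 GB file in this test, post.jar remains the simpler option.

```python
import urllib.request


def build_csv_update_request(base_url, collection, csv_bytes, commit=True):
    """Build (but do not send) a POST to the collection's update handler.

    base_url and collection mirror the values in the post.jar command;
    commit=true is an assumption so that documents become searchable at once.
    """
    url = f"{base_url}/solr/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        method="POST",
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )


# Example with a tiny inline payload instead of /tmp/solr/file/data.csv.
request = build_csv_update_request(
    "http://localhost:8983", "test0723", b"number,jarName\n1,spark-core.jar\n"
)
print(request.get_method(), request.full_url)
# To actually send it against a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```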
The csv file is imported successfully. The next step is to verify the query on Solr.
Perform query verification
1. Enter the query interface
2. Query based on a single field
number
jarName
Range query on a time field
3. Search by content in the English text
4. Search by content in the Chinese text
5. Search using a combination of fields
Records within a certain time range whose number is between 1 and 10000 and whose English text contains "Cloudera"
Records whose number is between 30000 and 40000, whose firstDouble is greater than 200, and whose secondDouble is less than 500
Records whose jarName begins with "spark" and whose Chinese text contains "查询" ("query")
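The combined queries above can also be issued against the /select handler directly. The sketch below builds the request URLs in Python; the helper name `select_url` is hypothetical, and the concrete time range in the first query is borrowed from the range example elsewhere in this article because the original only says "a certain time range". Whether `jarName:spark*` and the text searches match as intended depends on how the field types are defined in the collection's schema, which the article does not show.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base_url, collection, q, fq=()):
    """Build a /select URL with one main query and any number of filter queries."""
    params = [("q", q)] + [("fq", item) for item in fq]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


BASE = "http://localhost:8983"
COLLECTION = "test0723"

# number between 1 and 10000, English text contains Cloudera, within a time range.
query_a = select_url(BASE, COLLECTION, "enText:Cloudera", [
    "number:[1 TO 10000]",
    "firstTime:[2018-01-01T00:00:00Z TO 2018-01-31T23:59:59Z]",
])

# number 30000 to 40000, firstDouble > 200 (exclusive bound), secondDouble < 500.
query_b = select_url(BASE, COLLECTION, "*:*", [
    "number:[30000 TO 40000]",
    "firstDouble:{200 TO *]",
    "secondDouble:[* TO 500}",
])

# jarName starts with spark, Chinese text contains the word for "query".
query_c = select_url(BASE, COLLECTION, "jarName:spark*", ["cnText:查询"])

for url in (query_a, query_b, query_c):
    print(parse_qs(urlsplit(url).query))
```

Keeping the range conditions in fq rather than in q leaves the main query free for the text match.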
1. Unlike the dataimport method used in the previous document to import data and build an index, this document uses the post.jar that ships with Solr to import the CSV file and create the index. Query testing shows that this method works normally.
2. When querying on date fields, timestamps must be written in UTC in the form Solr recognizes, such as 2018-03-06T02:37:02Z.
3. For queries with multiple conditions, use fq; several filter conditions can be added as separate fq parameters. Range searches combine brackets with TO: square brackets [] include the endpoint and curly braces {} exclude it. For example, firstTime:[2018-01-01T00:00:00Z TO 2018-01-31T23:59:59Z] matches records whose firstTime falls between January 1 and January 31, 2018.
4. Solr's query page also offers many other parameters: sort orders results by a field, start and rows control paging (offset and page size), wt specifies the format of the response, and so on.
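The notes on UTC timestamps, range syntax, and paging parameters can be sketched together in Python. The helper names below are hypothetical and exist only for this illustration; the parameter names (fq, sort, start, rows, wt) are the ones mentioned above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def to_solr_utc(moment):
    """Convert a timezone-aware datetime to the UTC form used in Solr queries."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def range_filter(field, low, high, include_low=True, include_high=True):
    """Build a range clause: [ ] include an endpoint, { } exclude it."""
    opening = "[" if include_low else "{"
    closing = "]" if include_high else "}"
    return f"{field}:{opening}{low} TO {high}{closing}"


def page_params(q, fq, page, page_size, sort, wt="json"):
    """Encode query parameters for a zero-based page number."""
    return urlencode([
        ("q", q),
        ("fq", fq),
        ("sort", sort),
        ("start", page * page_size),  # offset of the first returned record
        ("rows", page_size),          # number of records per page
        ("wt", wt),                   # response format
    ])


# A local time in UTC+8 converted for use in a query.
local_time = datetime(2018, 3, 6, 10, 37, 2, tzinfo=timezone(timedelta(hours=8)))
print(to_solr_utc(local_time))

january = range_filter("firstTime", "2018-01-01T00:00:00Z", "2018-01-31T23:59:59Z")
print(january)
print(page_params("*:*", january, page=2, page_size=10, sort="number asc"))
```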
That concludes this walkthrough of using Solr 7 to build a full-text index of a structured CSV file. I hope it is of some help to you.