This article focuses on how to use Solr to index MySQL data. The method introduced here is simple, fast, and practical, so interested readers may wish to follow along.
A MySQL test database is used here.
1. First, create a table named solr_test in MySQL:
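The table definition is not shown on this page; a minimal sketch consistent with the fields used later (id and subject; anything beyond those is an assumption) might be:

CREATE TABLE solr_test (
  id INT NOT NULL AUTO_INCREMENT,   -- primary key, used by Solr as the uniqueKey
  subject VARCHAR(255),             -- text field to be indexed
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;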
2. Insert a few rows of test data:
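For example (the values are purely illustrative):

INSERT INTO solr_test (subject) VALUES
  ('solr test data 1'),
  ('solr test data 2'),
  ('solr test data 3');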
3. Open the solrconfig.xml file in the solrhome folder with Notepad: E:\solrhome\mycore\conf\solrconfig.xml
(For what the solrhome folder is, see: http://www.cnblogs.com/HD/p/3977799.html)
Add the following node, a requestHandler that wires the DataImportHandler to data-config.xml:
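A minimal sketch of the usual DIH requestHandler (the /dataimport path matches the URLs used later):

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- points the DataImportHandler at the file created in step 4 -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>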
4. Create a new data-config.xml file in the same directory as solrconfig.xml. Its content is as follows.
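A sketch consistent with the table above and the import URL used later (the JDBC URL, user, password, and the exact where clause are assumptions):

<dataConfig>
  <!-- connection to the MySQL test database; adjust url/user/password to your environment -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password="root"/>
  <document>
    <!-- the entity name solr_test matches the entity= parameter in the import URL below -->
    <entity name="solr_test"
            query="select id, subject from solr_test where id >= '${dataimporter.request.id}'">
      <field column="id" name="id"/>
      <field column="subject" name="subject"/>
    </entity>
  </document>
</dataConfig>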
Note: ${dataimporter.request.id} is used here; it is a request parameter, and later, when importing data, its value serves as the filter condition for reading data.
5. Copy the Solr jar packages solr-dataimporthandler-4.10.0.jar and solr-dataimporthandler-extras-4.10.0.jar from the Solr distribution into the WEB-INF\lib directory of the Solr webapp in Tomcat.
Also copy the MySQL JDBC driver jar: mysql-connector-java-5.1.7-bin.jar.
(Another way is to add a lib node to solrconfig.xml and put the jar packages under solrhome, so that you don't have to add them to WEB-INF\lib.)
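A sketch of that alternative (the dir value is resolved relative to the core's instance directory, here E:\solrhome\mycore, so ../lib points at a lib folder directly under solrhome; adjust to your layout):

<!-- in solrconfig.xml: load the DIH and JDBC jars from a lib folder instead of WEB-INF\lib -->
<lib dir="../lib" regex=".*\.jar"/>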
6. Open schema.xml in the solrhome folder with Notepad (same location as in point 3) and define the fields used by the index, here id and subject:
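A sketch of the field definitions (the field types assume the stock 4.x schema; a uniqueKey of id is usually already present):

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>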
7. Update the ZooKeeper cluster configuration (this step applies when running SolrCloud on CDH):
solrctl instancedir --update collection1 /opt/cloudera/parcels/CDH/lib/solr/solr_configs
8. Have collection1 reload the new configuration:
http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1
9. Open the Solr admin web UI:
Description:
Under Custom Parameters, fill in id=1; this is the parameter defined in point 4.
The Clean option controls whether unmatched data is deleted: a document that exists in the Solr index but is absent from the database select result will be removed.
You can also trigger the import directly with this address:
http://localhost:8899/solr/mycore/dataimport?command=full-import&clean=true&commit=true&wt=json&indent=true&entity=solr_test&verbose=false&optimize=false&debug=false&id=1
A result like the following will be returned:
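A DIH status response typically looks roughly like this (all counts are illustrative):

{
  "responseHeader": { "status": 0, "QTime": 12 },
  "command": "full-import",
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "1",
    "Total Rows Fetched": "3",
    "Total Documents Processed": "3",
    "": "Indexing completed. Added/Updated: 3 documents."
  }
}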
After the configuration, we only need to call this URL to import data and build the index. (It's as simple as that.)
10. Test a query:
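For example, a match-all query against the core (same host and port as the import URL above):

http://localhost:8899/solr/mycore/select?q=*:*&wt=json&indent=true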
Of course, dataimport also accepts a command parameter to reload data-config.xml:
http://localhost:8899/solr/#/mycore/dataimport/command=reload-config
If you add a row to the database but it has no index entry in Solr, it cannot be found. So in general, when using Solr to search database content, you first insert into the database, then index the new data in Solr, and rely on Solr's fuzzy-query and word-segmentation features to search it.
DIH incremental import from the MySQL database
You have now seen how to import all the MySQL data. A full import is very expensive when the data volume is large, so in general data should be imported incrementally. The following describes how to import MySQL data incrementally, and how to schedule the import to run periodically.
1) Changes to the database table
You have already created a User table; to perform incremental imports, you need to add a new field, updateTime, of type timestamp with a default value of CURRENT_TIMESTAMP, as sketched below.
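A sketch of the change (the table name user follows the configuration shown later; the ON UPDATE clause is an extra assumption so that updated rows also bump the timestamp):

ALTER TABLE user
  ADD COLUMN updateTime TIMESTAMP NOT NULL
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;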
With such a field, Solr can determine which data is new when it is imported incrementally.
This works because Solr itself keeps a value, last_index_time, which records when the last full-import or delta-import (incremental import) was performed; it is stored in the dataimport.properties file in the conf directory.
2) Setting the necessary attributes in data-config.xml
transformer: format conversion; for example, HTMLStripTransformer strips HTML tags while indexing.
query: queries the database table for the data to index.
deltaQuery: queries the primary-key IDs for the incremental index. Note: it may only return the ID field.
deltaImportQuery: queries the data imported by the incremental index.
deletedPkQuery: queries the primary-key IDs of deleted rows for the incremental index. Note: it may only return the ID field.
The explanations of "query", "deltaImportQuery" and "deltaQuery" are quoted from the official website, as follows:
The query gives the data needed to populate fields of the Solr document in full-import
The deltaImportQuery gives the data needed to populate fields when running a delta-import
The deltaQuery gives the primary keys of the current entity which have changes since the last index time
If you need to associate a child table query, you may need to use parentDeltaQuery
The parentDeltaQuery uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document which has that field.
For more instructions, see the DataImportHandler documentation.
For the User table, the configuration of the data-config.xml file is as follows:
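A sketch consistent with the attribute descriptions above (the dataSource block and any columns beyond id and updateTime are assumptions):

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/test"
              user="root"
              password="root"/>
  <document>
    <entity name="user" pk="id"
            query="select * from user"
            deltaQuery="select id from user where updateTime > '${dataimporter.last_index_time}'"
            deltaImportQuery="select * from user where id='${dih.delta.id}'">
      <field column="id" name="id"/>
      <field column="updateTime" name="updateTime"/>
    </entity>
  </document>
</dataConfig>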
The principle of incremental indexing: the SQL statement specified by deltaQuery first fetches from the database the IDs of all rows that need to be imported incrementally; then the SQL statement specified by deltaImportQuery fetches the full rows for those IDs, which are the data processed by this incremental import.
The core idea is to record the id to be indexed and the time of the last index through the built-in variables "${dih.delta.id}" and "${dataimporter.last_index_time}".
Note: the newly added updateTime field must also be configured as a field property in data-config.xml, as well as in the schema.xml file:
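A sketch of the two additions (the date field type assumes the stock 4.x schema; DIH converts the MySQL timestamp to a Solr date):

<!-- in data-config.xml, inside the entity -->
<field column="updateTime" name="updateTime"/>

<!-- in schema.xml -->
<field name="updateTime" type="date" indexed="true" stored="true"/>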
If the business involves delete operations, you can add an isDeleted field in the database to indicate whether a row has been deleted. When Solr updates the index, it can then remove the index entries of deleted records based on this field.
At this point, you need to add the following to data-config.xml:
Query= "select * from user where isDeleted=0" deltaImportQuery= "select * from user where id='$ {dih.delta.id}'" deltaQuery= "select id from user where updateTime >'${dataimporter.last_index_time} 'and isDeleted=0" deletedPkQuery= "select id from user where isDeleted=1"
This way, when Solr performs an incremental index, it deletes the index entries for the rows with isDeleted=1 in the database.
Test incremental import
If there is data in the User table, first empty the earlier test data (because the added updateTime column has no value for it), then add a User through a test program (here, MyBatis); the database assigns the current time to the field. In Solr, run a query to confirm the new row cannot be found yet, run an incremental import with dataimport?command=delta-import, and query again: the row just inserted into MySQL now shows up.
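Spelled out, the incremental-import call looks like this (host and port as before; clean=false and commit=true mirror the scheduler configuration shown later):

http://localhost:8899/solr/mycore/dataimport?command=delta-import&clean=false&commit=true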
Set up scheduled execution of the incremental import
You can use Windows scheduled tasks or Linux cron to hit the incremental-import URL periodically, and that works fine. But it is more convenient, and better integrated with Solr, to use its own scheduled incremental-import feature.
1. Download apache-solr-dataimportscheduler-1.0.jar and put it in the solr-webapp\webapp\WEB-INF\lib directory:
Download address: http://code.google.com/p/solr-dataimport-scheduler/downloads/list
You can also download from Baidu Cloud disk: http://pan.baidu.com/s/1dDw0MRn
Note: apache-solr-dataimportscheduler-1.0.jar has a bug; see: http://www.denghuafeng.com/post-242.html
2. Modify the web.xml file under the WEB-INF directory of solr:
Add a listener child element to the web-app element:
<listener>
  <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>
3. Create a new configuration file dataimport.properties:
Create a new conf directory under SOLR_HOME\solr (note: not the conf under SOLR_HOME\solr\collection1), then open apache-solr-dataimportscheduler-1.0.jar with an archive tool, copy out the dataimport.properties file inside, and modify it. The following is my final configuration for automatic, scheduled updates:
#################################################
#  dataimport scheduler properties              #
#################################################

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want syncronized
#  leave empty or comment it out if using single-core deployment
#  syncCores=game,resource
syncCores=collection1

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8983

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solr

#  URL params [mandatory]
#  remainder of URL, e.g.
#  http://localhost:8983/solr/collection1/dataimport?command=delta-import&clean=false&commit=true
params=/dataimport?command=delta-import&clean=false&commit=true

#  schedule interval for the incremental import
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=1

#  interval for rebuilding the index (full import), in minutes
#  default is 7200, i.e. 5 days
#  empty, 0, or commented out: the index will never be rebuilt
reBuildIndexInterval=2

#  parameters for rebuilding the index
reBuildIndexParams=/dataimport?command=full-import&clean=true&commit=true

#  start time for rebuilding the index
#  the time of the first real run = reBuildIndexBeginTime + reBuildIndexInterval*60*1000
#  two formats: 2012-04-11 03:10:00 or 03:10:00; the latter auto-completes the date part with the service start date
reBuildIndexBeginTime=03:10:00
Here, for testing, the incremental import runs every 1 minute, and the full-import index rebuild is disabled.
4. Test
Insert a row into the database and query for it in Solr Query: at first it cannot be found, but after Solr runs an incremental index it can be queried.
Generally speaking, to introduce Solr into your project, you need to consider the following points:
1. Data update frequency: how much data is added each day, and whether updates must be real-time or can be periodic
2. Total data volume: how long the data needs to be kept
3. Consistency requirements: how soon updated data is expected to be visible, and the maximum acceptable delay
4. Data characteristics: what the data sources include, and the average size of a single record
5. Business characteristics: what sorting and retrieval conditions are required
6. Resource reuse: the existing hardware configuration, and whether there is an upgrade plan
At this point, I believe you have a deeper understanding of how to use Solr to index MySQL data. You might as well try it out in practice.