Installation and testing of full-text search tool sphinx

2025-03-29 Update From: SLTechnology News&Howtos

This article walks through the installation and testing of the full-text search tool sphinx, drawing on practical experience accumulated in production use.

Introduction: recently, for reasons best left undescribed, we needed to filter certain words in an online database, so most of the statements in question were fuzzy queries of the form select c1 from tb1 where c1 like '%name%'. My first thought was a MySQL full-text index with word segmentation, but these are production databases and the tables hold tens of millions of rows: an online DDL (or a third-party schema-change tool) would block online DML while it runs, and maintaining a full-text index consumes resources of its own, so I dropped that idea. After discussing it with the developers and considering the actual business, we decided on a third-party full-text search tool instead, and chose sphinx.

Installation and testing of sphinx:

1. Install sphinx

yum install sphinx -y

2. Configuration file

#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source sbtest_sbtest1
{
    type     = mysql

    sql_host = localhost
    sql_user = root
    sql_pass = redhat
    sql_db   = sbtest
    sql_port = 3306    # optional, default is 3306

    sql_query = \
        SELECT userid AS id, \
               k, \
               c, \
               pad \
        FROM sbtest1 \
        WHERE c LIKE '679578678%'

    sql_attr_uint    = k     # data type int or timestamp
    sql_field_string = c     # data type text or string
    sql_attr_string  = pad   # data type string
}

index sbtest1
{
    source = sbtest_sbtest1
    path   = /var/lib/sphinx/sbtest1   # path to the index files
}

index testrt
{
    type         = rt
    rt_mem_limit = 128M
    path         = /var/lib/sphinx/testrt
    rt_field     = title
    rt_field     = content
    rt_attr_uint = gid
}

indexer
{
    mem_limit = 128M
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41   # SphinxQL service port, similar to port 3306 in mysql
    log             = /var/log/sphinx/searchd.log
    query_log       = /var/log/sphinx/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /var/run/sphinx/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads   # required for RT indexes to work
    binlog_path     = /var/lib/sphinx/
}

Note two points about the configuration file:

a. The statement after the sql_query parameter must return the document id as its first column. This id column must not be declared among the attribute definitions; otherwise sphinx will not recognize the attributes correctly.

b. At least one field must be declared with sql_field_string; otherwise sphinx concludes you have no full-text field to search.

3. Create an index

indexer sbtest1

4. Start the searchd service

service searchd start

5. View searchd service status

[root@TiDB-node2 data]# searchd --status
Sphinx 2.3.2-id64-beta (4409612)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
searchd status
--------------
uptime: 80403
connections: 11
maxed_out: 0
command_search: 7
command_excerpt: 0
command_update: 0
command_delete: 0
command_keywords: 0
command_persist: 0
command_status: 1
command_flushattrs: 0
agent_connect: 0
agent_retry: 0
queries: 7
dist_queries: 0
query_wall: 0.010
query_cpu: OFF
dist_wall: 0.000
dist_local: 0.000
dist_wait: 0.000
query_reads: OFF
query_readkb: OFF
query_readtime: OFF
avg_query_wall: 0.001
avg_query_cpu: OFF
avg_dist_wall: 0.000
avg_dist_local: 0.000
avg_dist_wait: 0.000
avg_query_reads: OFF
avg_query_readkb: OFF
avg_query_readtime: OFF
qcache_max_bytes: 16777216
qcache_thresh_msec: 3000
qcache_ttl_sec: 60
qcache_cached_queries: 0
qcache_used_bytes: 0
qcache_hits: 0

6. Test using: mysql client

[root@TiDB-node2 data]# mysql -uroot -predhat -h127.0.0.1 -P9306 -e "select id,k,c,pad from sbtest1 where match('03679578678')"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k        | c                                                                                                                       | pad                                                         |
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  1 | 15324329 | 64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603 | 78326593386-76411244360-77646817949-14319822046-41963083322 |
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+

7. Test using a python script

#!/usr/bin/env python
# coding: utf-8

import pymysql

# Connect to searchd's SphinxQL port (9306), not to MySQL itself.
con = pymysql.connect(host='127.0.0.1', port=9306, user="", passwd="", db="")

with con.cursor(pymysql.cursors.DictCursor) as cur:
    cur.execute("select * from sbtest1 where match('679578678%')")
    print(cur.fetchall())

[root@TiDB-node2 ~]# python sphinx.py
[{u'c': u'64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603', u'pad': u'78326593386-76411244360-77646817949-14319822046-41963083322', u'id': 1}]
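When the match string comes from user input rather than a hard-coded constant, characters such as parentheses, |, -, and @ are operators in sphinx's extended query syntax and should be backslash-escaped before being embedded in MATCH('...'). A minimal sketch of such a helper, with the character list modeled on the EscapeString() function shipped with the official sphinx client APIs (treat the exact list as an assumption to verify against your sphinx version):

```python
import re

# Characters that act as operators in sphinx's extended MATCH() syntax.
# List modeled on the EscapeString() helper of the official sphinx API;
# verify it against the sphinx version you run.
_SPHINX_SPECIAL = r'\()|-!@~"&/^$=<'

def escape_match(text):
    """Backslash-escape sphinx query operators in user-supplied text."""
    return re.sub(r'([%s])' % re.escape(_SPHINX_SPECIAL), r'\\\1', text)

# Plain words pass through untouched; operators get escaped.
print(escape_match('679578678'))        # 679578678
print(escape_match('foo@bar (test)'))   # foo\@bar \(test\)
```

The escaped string can then be interpolated into the match('...') clause of the script above without accidentally triggering query-syntax errors.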

8. Table structure:

root@mysqldb 18:06: [sbtest] > show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=25000000 DEFAULT CHARSET=utf8mb4

Table sbtest1 in the test environment holds 25 million rows, and the test statement is select * from sbtest1 where c like '679578678%'. With no index on column c, MySQL had to perform a full table scan, which took nearly 4 minutes; building the sphinx index over the same data took a little more than 70 seconds. Once the index is built, running the python script or the mysql-client query above returns in milliseconds. Performance improves enormously, and after the index is built sphinx no longer touches MySQL at all, so it does not affect MySQL's operation; sphinx can even be deployed independently on a separate server (CVM).
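On a table this large, the single big SELECT that indexer issues during the build can itself put noticeable load on MySQL. sphinx supports ranged fetches, which split the pull into many small chunked queries using the $start and $end macros. A hypothetical variant of the source section above (the step size is illustrative, not a recommendation):

```
source sbtest_sbtest1
{
    ...
    # first fetch the min/max document ids, then pull rows in chunks of 100000
    sql_query_range = SELECT MIN(id), MAX(id) FROM sbtest1
    sql_range_step  = 100000
    sql_query       = SELECT id, k, c, pad FROM sbtest1 \
                      WHERE id >= $start AND id <= $end
}
```

Each chunked query holds locks for a much shorter time, at the cost of a somewhat longer total build.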

9. Update the index file: rebuild the index and rotate it in seamlessly while searchd keeps serving queries

indexer --rotate sbtest1

