2025-03-29 Update From: SLTechnology News&Howtos
This article introduces the installation and testing of the full-text retrieval tool sphinx, with practical notes drawn from hands-on experience that may be useful as a reference alongside the many books and documents on the topic.
Introduction: recently, for reasons best left undescribed, we needed to filter certain words in an online database, so many of the production statements were fuzzy queries of the form select C1 from tb1 where C1 like '%name%'. My first thought was to add a MySQL full-text index for word segmentation. However, these are production databases, the tables hold tens of millions of rows, online DDL (even through third-party tools) would block online DML during the change, and maintaining a full-text index also consumes resources, so I set that idea aside. After discussing with the developers and considering the actual business, we decided to use a third-party full-text retrieval tool instead, and chose sphinx.
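To illustrate why a full-text engine helps here, the following is a toy Python sketch (not Sphinx itself, and the sample strings are made up in the style of the sysbench data used later): a wildcard LIKE must examine every row, while an inverted index built once up front maps each token directly to the rows that contain it, so a lookup no longer scans the table.

```python
from collections import defaultdict

# Toy data in the style of sysbench's c column: '-'-separated number tokens.
rows = {
    1: "64733237507-56788752464-03679578678-53343296505",
    2: "11111111111-22222222222-33333333333-44444444444",
}

# Build an inverted index over the tokens (a one-time cost, analogous to
# running `indexer` to build the Sphinx index).
inverted = defaultdict(set)
for row_id, text in rows.items():
    for token in text.split("-"):
        inverted[token].add(row_id)

# Answering "which rows contain this token" is now a dictionary lookup
# instead of a per-row substring scan.
print(inverted["03679578678"])  # {1}
```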
Installation and testing of sphinx:
1. Install sphinx
yum install sphinx -y
2. Configuration file
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#
source sbtest_sbtest1
{
    type     = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = redhat
    sql_db   = sbtest
    sql_port = 3306    # optional, default is 3306
    sql_query = \
        SELECT userid AS id, \
               k, \
               c, \
               pad \
        FROM sbtest1 \
        WHERE c LIKE '679578678%'
    sql_attr_uint    = k      # attribute type: int or timestamp
    sql_field_string = c      # full-text field, also stored as a string
    sql_attr_string  = pad    # attribute type: string
}

index sbtest1
{
    source = sbtest_sbtest1
    path   = /var/lib/sphinx/sbtest1    # path to the index files
}

index testrt
{
    type         = rt
    rt_mem_limit = 128M
    path         = /var/lib/sphinx/testrt
    rt_field     = title
    rt_field     = content
    rt_attr_uint = gid
}

indexer
{
    mem_limit = 128M
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41    # SphinxQL port, used like MySQL's 3306
    log             = /var/log/sphinx/searchd.log
    query_log       = /var/log/sphinx/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /var/run/sphinx/searchd.pid
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads    # required for RT indexes
    binlog_path     = /var/lib/sphinx/
}
Note two points about the configuration file:
a. The SQL statement given in sql_query must return the document id as its first column. This id column is not declared among the attribute definitions; if it is missing, sphinx will not recognize the documents correctly.
b. The definition must include at least one field of type sql_field_string, otherwise sphinx has no full-text field to search against.
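Point (a) above is easy to get wrong when adapting the sql_query to your own table. The following is a hypothetical sanity check, not a Sphinx tool, that roughly verifies the first selected column is named or aliased id:

```python
import re

def first_column_is_id(sql_query: str) -> bool:
    """Rough check that the first column returned by sql_query is (aliased
    to) `id`, as sphinx requires for plain SQL sources. Heuristic only:
    it does not handle subqueries or quoted identifiers."""
    m = re.search(r"select\s+(.+?)\s+from\s", sql_query,
                  re.IGNORECASE | re.DOTALL)
    if not m:
        return False
    first = m.group(1).split(",")[0].strip()
    # Accept "id", "userid AS id", or "userid id"
    parts = re.split(r"\s+as\s+|\s+", first, flags=re.IGNORECASE)
    return parts[-1].lower() == "id"

print(first_column_is_id(
    "SELECT userid AS id, k, c, pad FROM sbtest1 WHERE c LIKE '679578678%'"))  # True
```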
3. Create the index
indexer sbtest1
4. Start the searchd service
service searchd start
5. Check the searchd service status
[root@TiDB-node2 data]# searchd --status
Sphinx 2.3.2-id64-beta (4409612)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
searchd status
--------------
uptime: 80403
connections: 11
maxed_out: 0
command_search: 7
command_excerpt: 0
command_update: 0
command_delete: 0
command_keywords: 0
command_persist: 0
command_status: 1
command_flushattrs: 0
agent_connect: 0
agent_retry: 0
queries: 7
dist_queries: 0
query_wall: 0.010
query_cpu: OFF
dist_wall: 0.000
dist_local: 0.000
dist_wait: 0.000
query_reads: OFF
query_readkb: OFF
query_readtime: OFF
avg_query_wall: 0.001
avg_query_cpu: OFF
avg_dist_wall: 0.000
avg_dist_local: 0.000
avg_dist_wait: 0.000
avg_query_reads: OFF
avg_query_readkb: OFF
avg_query_readtime: OFF
qcache_max_bytes: 16777216
qcache_thresh_msec: 3000
qcache_ttl_sec: 60
qcache_cached_queries: 0
qcache_used_bytes: 0
qcache_hits: 0
6. Test using the mysql client
[root@TiDB-node2 data]# mysql -uroot -predhat -h127.0.0.1 -P9306 -e "select id,k,c,pad from sbtest1 where match('03679578678')"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
| id | k        | c                                                                                                                       | pad                                                         |
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
|  1 | 15324329 | 64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603 | 78326593386-76411244360-77646817949-14319822046-41963083322 |
+----+----------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+
7. Test using a Python script

#!/usr/bin/env python
# coding: utf-8
import pymysql

# Connect to searchd's SphinxQL listener (port 9306), not to MySQL itself;
# the user/password/db values are unused by searchd and left empty.
con = pymysql.connect(host='127.0.0.1', port=9306, user='', passwd='', db='')
with con.cursor(pymysql.cursors.DictCursor) as cur:
    cur.execute("select * from sbtest1 where match('679578678%')")
    print(cur.fetchall())
[root@TiDB-node2 ~]# python sphinx.py
[{'c': '64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603', 'pad': '78326593386-76411244360-77646817949-14319822046-41963083322', 'id': 1}]
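One caution when moving this test toward real use: the contents of match('...') are parsed as Sphinx's extended query syntax, so characters such as - or @ in user input change the meaning of the search. The helper below is a hypothetical sketch (not part of pymysql or the Sphinx client libraries) that backslash-escapes the commonly documented special characters so input is matched literally:

```python
# Characters with special meaning in Sphinx extended MATCH() syntax
# (assumed set, based on the documented escaping rules).
SPECIAL = set('\\()|-!@~"&/^$=')

def escape_match(query: str) -> str:
    """Backslash-escape Sphinx query-syntax characters in user input."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in query)

print(escape_match("foo-bar"))  # foo\-bar
```

The escaped string can then be embedded in the match('...') clause, while ordinary SQL values outside match() should still go through the driver's normal parameter binding.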
8. Table structure:
root@mysqldb 18:06: [sbtest] > show create table sbtest1\G
*************************** 1. row ***************************
       Table: sbtest1
Create Table: CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=25000000 DEFAULT CHARSET=utf8mb4
Table sbtest1 in the test environment holds 25 million rows, and the test SQL is select * from sbtest1 where c like '679578678%'. MySQL took nearly 4 minutes because it could not use an index and had to scan the whole table; building the sphinx index took a little over 70 seconds. Once the index is built, running the Python script above or querying through the mysql client returns in milliseconds. Performance improves enormously, and after the index is built sphinx no longer interacts with MySQL, so it does not affect MySQL's operation; sphinx can even be deployed independently on a separate CVM.
9. Update the index file (incremental rebuild):
indexer --rotate sbtest1
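In practice this rebuild step would typically run from cron or a small script. The following sketch (the dry_run flag and function name are our own additions, not part of sphinx) wraps the command so it can be inspected before execution; --rotate tells the running searchd to swap in the rebuilt index files without downtime:

```python
import subprocess

def rotate_index(index_name, dry_run=True):
    """Build the command for an incremental rebuild of one sphinx index.

    With dry_run=True (the default) the command is only returned for
    inspection; with dry_run=False it is actually executed.
    """
    cmd = ["indexer", "--rotate", index_name]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises if indexer exits non-zero
    return cmd

print(rotate_index("sbtest1"))  # ['indexer', '--rotate', 'sbtest1']
```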
That concludes the installation and testing of the full-text retrieval tool sphinx; hopefully it is of some help in practical use.