This article explains how to combine Sphinx with scws to implement full-text search for MySQL. The material sticks closely to the topic, so without further preamble, let's get into it; by the end you should be able to set the whole stack up yourself.
System environment
Hostname: SQL      IP: 172.169.18.128   Services: MySQL 5.6 (master)
Hostname: Sphinx   IP: 172.169.18.210   Services: MySQL 5.6 (slave), PHP 5.6, Apache 2.4
Sphinx version: 2.2.10
Sphinx PHP extension: 1.3.3
scws word-segmentation version: 1.2.3
I. Brief introduction
1.1. What is Sphinx
Reference address: http://www.sphinxsearch.org/sphinx-tutorial
Sphinx is a full-text search engine developed by the Russian developer Andrew Aksyonoff. It is intended to provide other applications with full-text search that is fast, uses little space, and returns highly relevant results. Sphinx integrates easily with SQL databases and scripting languages. It has built-in support for MySQL and PostgreSQL data sources, and can also read XML data in a specific format from standard input.
Sphinx builds indexes quickly: indexing 1 million records takes only 3-4 minutes, indexing 10 million records can be done in about 50 minutes, and an incremental index covering only the latest 100,000 records rebuilds in just a few seconds.
1.2. Sphinx has the following features:
(1) high-speed indexing (on modern CPUs, peak performance can reach 10 MB/second)
(2) high-performance search (on 2-4 GB of text data, the average response time per query is under 0.1 s)
(3) handles very large amounts of data (known to handle more than 100 GB of text, or 100 million documents, on a single-CPU system)
(4) provides a good relevance algorithm: a composite ranking method based on phrase proximity and BM25 statistics
(5) supports distributed search
(6) supports phrase search
(7) provides document summary generation
(8) can provide search as a MySQL storage engine
(9) supports multiple retrieval modes such as Boolean, phrase, and word-similarity search
(10) documents support multiple full-text retrieval fields (up to 32)
(11) documents support multiple additional attributes (for example, grouping information, timestamps, etc.)
(12) supports word breaking (tokenization)
1.3. Summary
Advantages: high efficiency and good scalability.
Disadvantage: it is not responsible for storing the data itself.
Sphinx indexes the data, loads the index in one pass, and then keeps it in memory, so search requests only need to hit the Sphinx server. Sphinx also performs well because it avoids the penalty of MySQL's random disk I/O.
2.1. What is SCWS
SCWS is the acronym of Simple Chinese Word Segmentation (a simple Chinese word-segmentation system).
Reference address: http://www.xunsearch.com/scws/index.php
It is a mechanical Chinese word-segmentation engine based on a word-frequency dictionary, which can correctly split a passage of Chinese text into words. Words are the smallest meaningful units in Chinese, but unlike in English they are not separated by spaces in writing, so segmenting words accurately and quickly has always been a hard problem in Chinese text processing.
2.2. Features
SCWS is written in pure C, depends on no external library functions, can be embedded in applications directly as a dynamic link library, and supports Chinese encodings such as GBK and UTF-8. A PHP extension module is also provided, so the word-segmentation function can be used quickly and easily from PHP.
The segmentation algorithm itself is not particularly innovative: it uses a self-collected word-frequency dictionary, supplemented by rules for proper nouns, person names, place names, numbers, dates, and so on, to achieve basic word segmentation. In small-scale tests the accuracy is between 90% and 95%, which is good enough for small search engines, keyword extraction, and similar use cases. The first prototype was released at the end of 2005.
II. Environmental preparation
1. Temporarily turn off the firewall
2. Temporarily disable SELinux (commands for steps 1 and 2 are sketched after this list)
3. System environment
CentOS 7.4, MySQL 5.6, PHP 5.6
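A minimal sketch of steps 1 and 2, assuming CentOS 7's firewalld and the standard SELinux tooling (both commands take effect immediately but do not persist across reboots):
# systemctl stop firewalld
# setenforce 0
# to disable SELinux permanently, set SELINUX=disabled in /etc/selinux/config and reboot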
III. Set up the Sphinx service
1. Install the dependency packages
# yum -y install make gcc gcc-c++ libtool autoconf automake mysql-devel
2. Install Sphinx
# yum install expat expat-devel
# wget -c http://sphinxsearch.com/files/sphinx-2.2.10-release.tar.gz
# tar -zxvf sphinx-2.2.10-release.tar.gz
# cd sphinx-2.2.10-release/
# ./configure --prefix=/usr/local/sphinx --with-mysql --with-libexpat --enable-id64
# make && make install
3. Install libsphinxclient, which the PHP extension depends on
# cd api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinx/libsphinxclient
# make && make install
After installation, check whether the three directories bin, etc, and var exist under /usr/local/sphinx. If they do, the installation is correct!
4. Install the Sphinx PHP extension. My PHP is 5.6, so sphinx-1.3.3.tgz is required; for PHP below 5.4 you can use sphinx-1.3.0.tgz; for PHP 7.0 you need sphinx-339e123.tar.gz (http://git.php.net/?p=pecl/search_engine/sphinx.git;a=snapshot;h=339e123acb0ce7beb2d9d4f9094d6f8bcf15fb54;sf=tgz)
# wget -c http://pecl.php.net/get/sphinx-1.3.3.tgz
# tar zxvf sphinx-1.3.3.tgz
# cd sphinx-1.3.3/
# phpize
# ./configure --with-sphinx=/usr/local/sphinx/libsphinxclient/ --with-php-config=/usr/bin/php-config
# make && make install
# on success, you will see:
Installing shared extensions: /usr/lib64/php/modules/
# modify the php configuration
# echo "[Sphinx]" >> /etc/php.ini
# echo "extension = sphinx.so" >> /etc/php.ini
# restart the httpd service
# systemctl restart httpd.service
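Before moving on, it may be worth confirming that the extension actually loads; a quick check, assuming the PHP CLI reads the same /etc/php.ini:
# php -m | grep -i sphinx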
5. Create test data for testing (db: jiangjj)
CREATE TABLE IF NOT EXISTS `items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `content` text NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='full-text search test table' AUTO_INCREMENT=11;
INSERT INTO `items` (`id`, `title`, `content`, `created`) VALUES
(1, 'linux mysql Cluster installation', 'MySQL Cluster is a highly practical, scalable, high-performance, highly redundant version of MySQL suited to distributed computing environments', '2016-09-07 00:00:15'),
(2, 'mysql Master-Slave replication', 'basic principles of mysql master-slave backup (replication): mysql supports one-way, asynchronous replication, in which one server acts as the master while one or more other servers act as slaves', '2016-09-06 00:00:00'),
(3, 'hello', 'can you search me?', '2016-09-05 00:00:00'),
(4, 'mysql', 'mysql is the best database?', '2016-09-03 00:00:00'),
(5, 'mysql index', 'as for the benefits of the MySQL index: if MySQL with a correctly designed and used index is a Lamborghini, then MySQL without one is a human-powered tricycle', '2016-09-01 00:00:00'),
(6, 'Cluster', 'as for the benefits of the MySQL index: if MySQL with a correctly designed and used index is a Lamborghini, then MySQL without one is a human-powered tricycle', '0000-00-00 00:00:00'),
(9, 'replication principle', 'redis also has replication', '0000-00-00 00:00:00'),
(10, 'redis cluster', 'cluster technology is an important means of building a high-performance website architecture. Imagine that while a website is under high-concurrency pressure, it also has to query matching rows from a huge dataset and respond quickly; the natural approach is to shard the data and distribute it across multiple server nodes according to certain rules, to reduce the pressure on any single node', '0000-00-00 00:00:00');
CREATE TABLE IF NOT EXISTS `sph_counter` (
  `counter_id` int(11) NOT NULL,
  `max_doc_id` int(11) NOT NULL,
  PRIMARY KEY (`counter_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='counter table for the incremental index';
6. Sphinx configuration: remember to modify the data-source settings to match your environment.
Configuration reference: http://www.cnblogs.com/yjf512/p/3598332.html
The configuration below uses the "main + delta" ("main index" + "incremental index") strategy, together with Sphinx's built-in unigram (one-character) segmentation.
# vim /usr/local/sphinx/etc/sphinx.conf
# main data source
source items {
    type = mysql
    sql_host = 172.169.18.128
    sql_user = jiangjj
    sql_pass = 123456
    sql_db = jiangjj
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type = OFF
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM items
    sql_query_range = SELECT MIN(id), MAX(id) FROM items \
        WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
    sql_range_step = 1000
    sql_query = SELECT id, title, content, UNIX_TIMESTAMP(created) AS created, 0 AS deleted \
        FROM items \
        WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
        AND id >= $start AND id <= $end
    sql_attr_timestamp = created
    sql_attr_bool = deleted
}
# incremental (delta) data source: everything newer than the main index's high-water mark
source items_delta : items {
    sql_query_pre = SET NAMES utf8
    sql_query_range = SELECT (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) + 1, MAX(id) FROM items
    sql_query = SELECT id, title, content, UNIX_TIMESTAMP(created) AS created, 0 AS deleted \
        FROM items \
        WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
        AND id >= $start AND id <= $end
    sql_query_post_index = REPLACE INTO sph_counter SELECT 2, MAX(id) FROM items
}
# main index (unigram segmentation via the ngram settings)
index items {
    source = items
    path = /usr/local/sphinx/var/data/items
    docinfo = extern
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    ngram_len = 1
    ngram_chars = U+3000..U+2FA1F
    preopen = 1
    min_infix_len = 1
}
# incremental index
index items_delta : items {
    source = items_delta
    path = /usr/local/sphinx/var/data/items_delta
}
# distributed index
index master {
    type = distributed
    local = items
    local = items_delta
}
indexer {
    mem_limit = 256M
}
searchd {
    listen = 9312
    listen = 9306:mysql41
    log = /usr/local/sphinx/var/log/searchd.log
    query_log = /usr/local/sphinx/var/log/query.log
    # compat_sphinxql_magics = 0
    attr_flush_period = 600
    mva_updates_pool = 16M
    read_timeout = 5
    max_children = 0
    dist_threads = 2
    pid_file = /usr/local/sphinx/var/log/searchd.pid
    # max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    workers = threads
    binlog_path = /usr/local/sphinx/var/data
}
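Before building the indexes, it can help to see the exact SQL that the data sources send to MySQL; indexer's --print-queries option prints each query as it runs, which catches data-source typos early (shown here against the main index as an illustration):
# /usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf items --print-queries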
7. Build the indexes with Sphinx
# cd /usr/local/sphinx/bin/
# build all indexes for the first time
# ./indexer -c /usr/local/sphinx/etc/sphinx.conf --all
# start sphinx
# ./searchd -c /usr/local/sphinx/etc/sphinx.conf
# view the process
# ps -ef | grep searchd
# view status
# ./searchd -c /usr/local/sphinx/etc/sphinx.conf --status
# stop sphinx
# ./searchd -c /usr/local/sphinx/etc/sphinx.conf --stop
8. Index updates and usage notes
The incremental index is updated every N minutes. An index merge is usually done once a night, when load is low, and the incremental index is then rebuilt. Of course, if the main index does not hold much data, you can simply rebuild the main index directly instead.
When searching through the API, both the main index and the incremental index are queried, which yields near-real-time results. The Sphinx configuration in this article puts the main index and the incremental index into the distributed index "master", so you only need to query "master" to get all matching data, including the newest rows.
Updating and merging the indexes can be driven by cron jobs:
8.1. Edit the crontab
# crontab -e
*/1 * * * * /usr/local/sphinx/shell/index_update.sh
0 3 * * * /usr/local/sphinx/shell/merge_index.sh
# view the entries
# crontab -l
8.2. The scripts are as follows
# update script
# vim /usr/local/sphinx/shell/index_update.sh
#!/bin/sh
/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate items_delta > /dev/null 2>&1
# merge script
[root@Sphinx shell]# vim merge_index.sh
#!/bin/bash
indexer=/usr/local/sphinx/bin/indexer
mysql=`which mysql`
# host=172.169.18.128
# mysql_user=jiangjj
# mysql_password=123456
QUERY="use jiangjj;select max_doc_id from sph_counter where counter_id = 2 limit 1;"
index_counter=$($mysql -h172.169.18.128 -ujiangjj -p123456 -sN -e "$QUERY")
# merge the "main + delta" indexes
$indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate --merge items items_delta --merge-dst-range deleted 0 0 >> /usr/local/sphinx/var/index_merge.log 2>&1
if [ "$?" -eq 0 ]; then
    ## update the sphinx counter
    if [ ! -z "$index_counter" ]; then
        $mysql -h172.169.18.128 -ujiangjj -p123456 -Djiangjj -e "REPLACE INTO sph_counter VALUES (1, '$index_counter')"
    fi
    ## rebuild the delta index to avoid overlap with the main index
    $indexer -c /usr/local/sphinx/etc/sphinx.conf --rotate items_delta >> /usr/local/sphinx/var/rebuild_deltaindex.log 2>&1
fi
# grant execute permission
# chmod u+x *.sh
Once the tests above pass, proceed to the next step.
IV. Set up the scws (Chinese word segmentation) service
Download address: http://www.xunsearch.com/scws/download.php
1. Download and install scws; mind the match between the extension version and your PHP version
# wget -c http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
# tar jxvf scws-1.2.3.tar.bz2
# cd scws-1.2.3/
# ./configure --prefix=/usr/local/scws
# make && make install
2. Install the PHP extension for scws
# cd ./phpext/
# phpize
# ./configure
# make && make install
# after compilation completes, modify the php configuration file
[root@Sphinx ~]# echo "[scws]" >> /etc/php.ini
[root@Sphinx ~]# echo "extension = scws.so" >> /etc/php.ini
[root@Sphinx ~]# echo "scws.default.charset = utf-8" >> /etc/php.ini
[root@Sphinx ~]# echo "scws.default.fpath = /usr/local/scws/etc/" >> /etc/php.ini
3. Install the dictionary
Download address: http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
# wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
# extract into /usr/local/scws/etc/ so the path matches scws.default.fpath above
# tar jxvf scws-dict-chs-utf8.tar.bz2 -C /usr/local/scws/etc/
# chown -R apache:apache /usr/local/scws/etc/dict.utf8.xdb
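To verify that scws and its dictionary are working, here is a minimal PHP sketch using the extension configured above (the sample sentence is arbitrary; the charset and dictionary path come from the scws.default.* settings in php.ini):
<?php
// create a segmenter (defaults come from scws.default.* in php.ini)
$so = scws_new();
$so->send_text("我是一个中国人,我会C语言,我也有很多T恤衣服");
// fetch segmented words in batches until none are left
while ($words = $so->get_result()) {
    foreach ($words as $w) {
        echo $w['word'], " ";
    }
}
$so->close();
?>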
V. Testing Sphinx + scws from PHP
1. The Sphinx source tree ships API bindings for several languages under api/; one of them is sphinxapi.php.
Create a new Search.php file and a front-end page index.php.
The code is omitted here.
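Since the original code is omitted, here is a minimal illustrative sketch of what Search.php might look like, combining the scws extension with the sphinx PECL extension installed earlier; the host, port, and index name follow this article's configuration, while the parameter names and output are assumptions for illustration only:
<?php
// Search.php - sketch: segment the query with scws, then search the "master" index
$keyword = isset($_GET['q']) ? $_GET['q'] : 'mysql';

// 1. split the raw query into words with scws
$so = scws_new();
$so->send_text($keyword);
$terms = array();
while ($words = $so->get_result()) {
    foreach ($words as $w) {
        $terms[] = $w['word'];
    }
}
$so->close();
$query = implode(' ', $terms);

// 2. query the distributed index "master" via the sphinx PECL extension
$s = new SphinxClient();
$s->setServer('127.0.0.1', 9312);
$s->setMatchMode(SPH_MATCH_ANY);
$s->setLimits(0, 10);
$result = $s->query($query, 'master');

// 3. Sphinx returns matching document ids; fetch the rows from MySQL by id
if ($result !== false && !empty($result['matches'])) {
    $ids = implode(',', array_keys($result['matches']));
    echo "matched ids: " . $ids;   // e.g. SELECT * FROM items WHERE id IN ($ids)
} else {
    echo "no matches";
}
?>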
Visit index.php in a browser to test.
2. SphinxQL test
To use SphinxQL, the corresponding listen port must be added to the searchd configuration (see the configuration above).
# mysql -h127.0.0.1 -P9306 -ujiangjj -p
mysql> show global variables;
mysql> desc items;
mysql> select * from master where match('mysql*') limit 10;
mysql> show meta;
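SphinxQL can also generate highlighted excerpts from raw text, which is handy when building result pages; a small illustrative call (CALL SNIPPETS takes the text, the index whose settings to use, and the query words; the sample text here is arbitrary):
mysql> CALL SNIPPETS('mysql is the best database', 'items', 'mysql');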
That covers using Sphinx together with scws to implement full-text search for MySQL. If anything is still unclear, walk through the configuration and scripts above again on a test machine.