
How to use elasticsearch to build your own search system

Updated 2025-03-30. From: SLTechnology News&Howtos (Shulou), Database section.


Shulou (Shulou.com), 05/31 report

This article explains how to use Elasticsearch to build your own search system, approaching the topic from a practitioner's point of view. I hope you get something useful out of it.

What is Elasticsearch?

Elasticsearch is an open source and highly scalable full-text search and analysis engine with near-real-time query performance.

The well-known Lucene search library is widely used in the search field, but working with it directly is complex and tedious, which keeps many developers away. Elasticsearch uses Lucene as its core for all indexing and search, and hides Lucene's complexity behind a simple RESTful API, which makes full-text search easy.

On top of Lucene, ES adds distributed features: clustering, sharding, replication, and so on.

Why use ES instead of MySQL for search

Our case study is a small product search system. Why not implement search with MySQL? The reasons are as follows:

MySQL uses the InnoDB engine by default, whose indexes are B+ trees, while ES is built on inverted indexes. ES supports word segmentation (tokenization) along many dimensions, so search requirements of different granularity can be controlled. (MySQL 8 also supports full-text search implemented with inverted indexes; if you are interested, look into how the two differ.)

Comparing MySQL's LIKE '%key%' fuzzy matching with ES search on 80,000 rows, the time taken was roughly 40:1, so ES clearly wins on speed. A pattern with a leading % cannot use a B+ tree index, so MySQL has to check every row.
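To make the difference concrete, here is a toy, in-memory Java sketch (not how Lucene actually stores or compresses its index). likeScan mimics LIKE '%key%' by checking every row, while termLookup reads the precomputed postings list of a term from an inverted index. The class name InvertedIndexDemo and the simple whitespace tokenizer are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy comparison of a LIKE '%key%' style scan with an inverted-index lookup (whitespace tokenizer assumed). */
final class InvertedIndexDemo {
    private final List<String> docs = new ArrayList<>();
    // term -> sorted ids of the documents containing it (the "postings list")
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexing: tokenize once and record the document id under every token. */
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
    }

    /** What WHERE title LIKE '%key%' amounts to: inspect every row on every query. */
    List<Integer> likeScan(String key) {
        List<Integer> hits = new ArrayList<>();
        String k = key.toLowerCase(Locale.ROOT);
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).toLowerCase(Locale.ROOT).contains(k)) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** Inverted-index lookup: one map access, then read the postings list. */
    List<Integer> termLookup(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}
```

The scan's cost grows with the number of rows on every query, while the lookup pays the tokenizing cost once at index time. The two are not strictly equivalent: a substring scan for "wine" would also match "winery", whereas the term lookup only matches the whole token.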

How large tech companies use ES

The most common use of ES is the ELK stack (Elasticsearch, Logstash, Kibana) for searching and analyzing logs.

The 58.com security department and JD.com's order center use ES for almost all storage and retrieval of the related information.

ES is also used for all kinds of search and analysis in ToB (business-facing) projects.

For consumer-facing (ToC) products, companies usually build their own search system on top of Lucene, because those products have more customized search requirements that must fit the company's marketing strategy, recommendation system, and so on.

Choosing an ES client

spring-boot-starter-data-elasticsearch

Most of the online course videos and small projects you will find recommend this Spring Boot-integrated ES client, but we say no!

The figure shows the dependencies of the latest version of this starter: it uses the high-level REST client 6.8.7, while ES 7.x had been out for a long time, so many new features cannot be used. Version lag is its biggest problem. Moreover, it is built on the high-level client anyway, and using the high-level client directly gives us more flexibility. Neither of the two companies I have worked at used this client.

elasticsearch-rest-high-level-client

This is the client officially recommended by Elastic, and it supports the latest ES versions. It is convenient to use, and as the official recommendation it keeps up with new features better than the Spring Data starter. Unlike TransportClient, it has no concurrency bottleneck. As the official first choice, it is a solid pick. (In later 7.x releases, Elastic deprecated it in favor of the newer Elasticsearch Java API Client.)

Build your own mini search system

Add the ES dependencies below. In addition, the project needs the Spring Boot web, Jackson, and Lombok dependencies.

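```xml
<properties>
    <es.version>7.3.2</es.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>${es.version}</version>
        <exclusions>
            <exclusion>
                <groupId>org.elasticsearch.client</groupId>
                <artifactId>elasticsearch-rest-client</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.elasticsearch</groupId>
                <artifactId>elasticsearch</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${es.version}</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-client</artifactId>
        <version>${es.version}</version>
    </dependency>
</dependencies>
```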

ES configuration file es-config.properties

```properties
es.host=localhost
es.port=9200
es.token=es-token
es.charset=UTF-8
es.scheme=http
es.client.connectTimeOut=5000
es.client.socketTimeout=15000
```

Wrapping RestHighLevelClient

```java
import org.apache.http.Header;
import org.apache.http.HttpHost;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.client.Node;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;

@Configuration
@PropertySource("classpath:es-config.properties")
public class RestHighLevelClientConfig {

    @Value("${es.host}")
    private String host;
    @Value("${es.port}")
    private int port;
    @Value("${es.scheme}")
    private String scheme;
    @Value("${es.token}")
    private String token;
    @Value("${es.charset}")
    private String charSet;
    @Value("${es.client.connectTimeOut}")
    private int connectTimeOut;
    @Value("${es.client.socketTimeout}")
    private int socketTimeout;

    @Bean
    public RestClientBuilder restClientBuilder() {
        RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost(host, port, scheme));
        Header[] defaultHeaders = new Header[]{
                new BasicHeader("Accept", "*/*"),
                new BasicHeader("Charset", charSet),
                // The token lets a security gateway verify it and decide whether to let the request through;
                // here it is only a placeholder configuration
                new BasicHeader("E_TOKEN", token)
        };
        restClientBuilder.setDefaultHeaders(defaultHeaders);
        restClientBuilder.setFailureListener(new RestClient.FailureListener() {
            @Override
            public void onFailure(Node node) {
                System.out.println("Failure listener: ES node failed: " + node);
            }
        });
        // Connect and socket timeouts in milliseconds, taken from es-config.properties
        restClientBuilder.setRequestConfigCallback(builder -> builder
                .setConnectTimeout(connectTimeOut)
                .setSocketTimeout(socketTimeout));
        return restClientBuilder;
    }

    @Bean
    public RestHighLevelClient restHighLevelClient(RestClientBuilder restClientBuilder) {
        return new RestHighLevelClient(restClientBuilder);
    }
}
```

Wrapping common ES operations (source code of the search system's service layer)

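```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;

@Slf4j
@Service
public class RestHighLevelClientService {

    @Autowired
    private RestHighLevelClient client;

    @Autowired
    private ObjectMapper mapper;

    /** Create an index; settings and mapping are JSON strings and are skipped when empty. */
    public CreateIndexResponse createIndex(String indexName, String settings, String mapping) throws IOException {
        CreateIndexRequest request = new CreateIndexRequest(indexName);
        if (settings != null && !settings.isEmpty()) {
            request.settings(settings, XContentType.JSON);
        }
        if (mapping != null && !mapping.isEmpty()) {
            request.mapping(mapping, XContentType.JSON);
        }
        return client.indices().create(request, RequestOptions.DEFAULT);
    }

    /** Check whether an index exists. */
    public boolean indexExists(String indexName) throws IOException {
        GetIndexRequest request = new GetIndexRequest(indexName);
        return client.indices().exists(request, RequestOptions.DEFAULT);
    }

    /** Search: full-text match on one field, range on another, exact term on a third. */
    public SearchResponse search(String field, String key, String rangeField, String from, String to,
                                 String termField, String termVal, String... indexNames) throws IOException {
        SearchRequest request = new SearchRequest(indexNames);
        SearchSourceBuilder builder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = new BoolQueryBuilder();
        boolQueryBuilder.must(new MatchQueryBuilder(field, key))
                .must(new RangeQueryBuilder(rangeField).from(from).to(to))
                .must(new TermQueryBuilder(termField, termVal));
        builder.query(boolQueryBuilder);
        request.source(builder);
        log.info("[search statement is: {}]", request.source().toString());
        return client.search(request, RequestOptions.DEFAULT);
    }

    /**
     * Bulk import.
     *
     * @param isAutoId true: let ES generate ids; false: use each object's "id" field
     * @param source   JSON array of documents
     */
    public BulkResponse importAll(String indexName, boolean isAutoId, String source) throws IOException {
        if (source == null || source.isEmpty()) {
            throw new IllegalArgumentException("import data is empty");
        }
        BulkRequest request = new BulkRequest();
        JsonNode jsonNode = mapper.readTree(source);
        if (jsonNode.isArray()) {
            for (JsonNode node : jsonNode) {
                // Serialize the object back to JSON; asText() returns "" for an object node
                String json = mapper.writeValueAsString(node);
                if (isAutoId) {
                    request.add(new IndexRequest(indexName).source(json, XContentType.JSON));
                } else {
                    request.add(new IndexRequest(indexName).id(node.get("id").asText()).source(json, XContentType.JSON));
                }
            }
        }
        return client.bulk(request, RequestOptions.DEFAULT);
    }
}
```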

When creating an index, settings specify how many primary shards and replica shards the index has. Mappings are like a table schema in a database: they specify the type of each field and, for text fields, whether the value is tokenized and with which analyzer. Here we use the IK Chinese analyzer, which comes from the IK analysis plugin and must be installed on the cluster.

```java
@Test
public void createIdx() throws IOException {
    // 2 primary shards, no replicas
    String settings = "{"
            + "\"number_of_shards\": \"2\","
            + "\"number_of_replicas\": \"0\""
            + "}";
    // ik_max_word (fine-grained) at index time, ik_smart (coarse) at search time;
    // tokenized fields also get a keyword sub-field for exact matching
    String mappings = "{"
            + "\"properties\": {"
            + "\"itemId\": {\"type\": \"keyword\", \"ignore_above\": 64},"
            + "\"urlId\": {\"type\": \"keyword\", \"ignore_above\": 64},"
            + "\"sellAddress\": {\"type\": \"text\", \"analyzer\": \"ik_max_word\", \"search_analyzer\": \"ik_smart\","
            + "  \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}},"
            + "\"courierFee\": {\"type\": \"text\"},"
            + "\"promotions\": {\"type\": \"text\", \"analyzer\": \"ik_max_word\", \"search_analyzer\": \"ik_smart\","
            + "  \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}},"
            + "\"originalPrice\": {\"type\": \"keyword\", \"ignore_above\": 64},"
            + "\"startTime\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd HH:mm:ss\"},"
            + "\"endTime\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd HH:mm:ss\"},"
            + "\"title\": {\"type\": \"text\", \"analyzer\": \"ik_max_word\", \"search_analyzer\": \"ik_smart\","
            + "  \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}},"
            + "\"serviceGuarantee\": {\"type\": \"text\", \"analyzer\": \"ik_max_word\", \"search_analyzer\": \"ik_smart\","
            + "  \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}},"
            + "\"venue\": {\"type\": \"text\", \"analyzer\": \"ik_max_word\", \"search_analyzer\": \"ik_smart\","
            + "  \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}}},"
            + "\"currentPrice\": {\"type\": \"keyword\", \"ignore_above\": 64}"
            + "}"
            + "}";
    clientService.createIndex("idx_item", settings, mappings);
}
```

Word segmentation tips:

Use the finest-grained segmentation at index time and the coarsest at search time; in the mapping above that is ik_max_word as the analyzer and ik_smart as the search_analyzer. For example, when "Java bosom friend" is indexed, the tokens include "Java", "bosom friend", and smaller pieces of "bosom friend". Fine-grained index tokens let the index satisfy more kinds of queries, while coarse segmentation at search time (here just "Java" and "bosom friend") keeps the matches closer to what the user is looking for.
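The effect of pairing a fine-grained index analyzer with a coarse search analyzer can be sketched with a toy dictionary segmenter. This is not the IK algorithm: ToySegmenter, its dictionary, and the space-free English input (standing in for unsegmented Chinese text) are all hypothetical. maxWord emits every dictionary word at every position, roughly the idea behind ik_max_word, and smart does a greedy forward longest match, roughly the idea behind ik_smart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy dictionary segmenter contrasting fine-grained ("max word") and coarse ("smart") splitting.
 * NOT the IK algorithm; it only mimics the idea behind ik_max_word / ik_smart.
 */
final class ToySegmenter {
    private final Set<String> dict;
    private final int maxLen;

    ToySegmenter(Set<String> dict) {
        this.dict = dict;
        this.maxLen = dict.stream().mapToInt(String::length).max().orElse(1);
    }

    /** Fine-grained: every dictionary word starting at every position (tokens may overlap). */
    List<String> maxWord(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + maxLen); j++) {
                String w = text.substring(i, j);
                if (dict.contains(w)) {
                    out.add(w);
                }
            }
        }
        return out;
    }

    /** Coarse: greedy forward longest match, no overlaps; unknown characters become one-char tokens. */
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = i + 1; // fall back to a single character if no dictionary word matches
            for (int j = Math.min(text.length(), i + maxLen); j > i; j--) {
                if (dict.contains(text.substring(i, j))) {
                    end = j;
                    break;
                }
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }
}
```

With the dictionary {java, bosom, friend, bosomfriend}, maxWord("javabosomfriend") yields java, bosom, bosomfriend, friend, while smart yields only java and bosomfriend. Every search-time token also exists at index time, so coarse queries still match, and a query for just "friend" can match as well.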

Each tokenized field also gets a keyword sub-field (for example title.keyword), which supports exact matching and makes it quick to locate specific documents when troubleshooting later.

As sample data, we imported 100,000 records from Taobao's Singles' Day promotion into ES. Each record looks like this:

```json
{
  "_id": "https://detail.tmall.com/item.htm?id=538528948719\u0026skuId=3216546934499",
  "seller's address": "Shanghai",
  "express fee": "freight: 0.00 yuan",
  "discount activity": "full 199 minus 10, full 299 minus 30, full 499 minus 60, cross-store",
  "commodity ID": "538528948719",
  "original price": "2290.00",
  "event start time": "2016-11-11 00:00:00",
  "event end time": "2016-11-11 23:59:59",
  "title": "[Tmall overseas direct camp] ReFa CARAT RAY Lifa double ball roller wave beauty instrument",
  "Service guarantee": "authentic guarantee; free freight insurance; quick refund; seven-day refund",
  "venue": "imported premium goods",
  "current price": "1950.00"
}
```

Import the data with the bulk import method wrapped above:

```java
@Test
public void importAll() throws IOException {
    // importAll expects a JSON array string; true lets ES generate document ids
    clientService.importAll("idx_item", true, itemService.getItemsJson());
}
```
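Under the hood, the client sends a BulkRequest to the _bulk endpoint as newline-delimited JSON: for each document, one action line followed by one document line, and the body ends with a newline. The plain-Java sketch below builds such bodies and splits them into batches; BulkBodySketch and the batch size are assumptions for illustration, not part of the client API. Splitting 100,000 documents into several bulk requests keeps each request reasonably sized (the right batch size depends on document size and should be tuned), and each BulkResponse is worth checking with hasFailures(), since a bulk call can partly fail without throwing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the newline-delimited body the _bulk endpoint receives, split into fixed-size batches.
 * Each source string is assumed to be one compact JSON object without line breaks,
 * and the index name is assumed to need no JSON escaping.
 */
final class BulkBodySketch {
    static List<String> bulkBodies(String index, List<String> sources, int batchSize) {
        List<String> bodies = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int inBatch = 0;
        for (String source : sources) {
            // Action/metadata line without _id, so Elasticsearch generates one (the isAutoId == true case)
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Document line; every line, including the last, ends with a newline
            body.append(source).append('\n');
            if (++inBatch == batchSize) {
                bodies.add(body.toString());
                body.setLength(0);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            bodies.add(body.toString()); // remaining partial batch
        }
        return bodies;
    }
}
```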

We call the wrapped search method to find wine products shipped from Wuhan with a price between 11 and 149, which mirrors setting filter criteria when searching for products on Taobao.

```java
@Test
public void search() throws IOException {
    // title matches "wine", currentPrice in ["11", "149"], sellAddress contains the term "Wuhan"
    SearchResponse search = clientService.search("title", "wine", "currentPrice", "11", "149",
            "sellAddress", "Wuhan", "idx_item");
    SearchHits hits = search.getHits();
    for (SearchHit hit : hits.getHits()) {
        // SearchHit.toString() prints the whole hit as JSON, including _id and _score
        System.out.println(hit);
    }
}
```
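One caveat about the price filter: currentPrice is mapped as keyword, so the range query compares strings, not numbers. The sketch below contrasts the two orderings in plain Java; KeywordRangeDemo is a hypothetical name, and for ASCII digits String.compareTo gives the same order as the byte order Lucene uses for keyword terms.

```java
/**
 * Why a range query on a keyword field is not a numeric range:
 * keyword values are compared as strings (lexicographic order).
 */
final class KeywordRangeDemo {
    /** Inclusive range test the way a keyword field sees it: string order. */
    static boolean inKeywordRange(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    /** Inclusive range test on the parsed numbers, which is what a price filter usually intends. */
    static boolean inNumericRange(String value, String from, String to) {
        double v = Double.parseDouble(value);
        return v >= Double.parseDouble(from) && v <= Double.parseDouble(to);
    }
}
```

With the range "11" to "149", "12000.00" passes the string comparison and "20.00" fails it, the opposite of what a price filter intends. Mapping the price fields as a numeric type, such as double or scaled_float with a scaling_factor of 100, would make the range numeric.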

The search returns hits like the one below. _score is the hit's relevance score, and results are sorted by it in descending order.
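Before the hit itself, a note on where a number like _score comes from. Elasticsearch 7.x scores full-text matches with BM25 by default. The sketch below computes the classic BM25 contribution of a single query term with the usual defaults k1 = 1.2 and b = 0.75; Bm25Sketch and its parameter names are illustrative. Lucene's real implementation differs in details (Lucene 8, used by Elasticsearch 7.x, omits the constant (k1 + 1) factor, and field lengths are stored approximately), and a bool query adds up the scores of its clauses, so this will not reproduce 10.995819 exactly.

```java
/** Classic BM25 score of one query term in one document, with k1 = 1.2 and b = 0.75. */
final class Bm25Sketch {
    static final double K1 = 1.2;
    static final double B = 0.75;

    /**
     * @param termFreq  occurrences of the term in this document's field
     * @param docLen    number of tokens in this document's field
     * @param avgDocLen average number of tokens in the field across the index
     * @param docCount  number of documents that have the field
     * @param docFreq   number of documents whose field contains the term
     */
    static double termScore(double termFreq, double docLen, double avgDocLen, long docCount, long docFreq) {
        // Rare terms get a higher inverse document frequency
        double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        // Length normalization: longer-than-average fields are penalized
        double norm = K1 * (1 - B + B * docLen / avgDocLen);
        // Term-frequency part saturates toward (K1 + 1) instead of growing without bound
        return idf * termFreq * (K1 + 1) / (termFreq + norm);
    }
}
```

Rarer terms (smaller docFreq), more occurrences, and shorter fields all raise the score.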

```json
{
  "_index": "idx_item",
  "_type": "_doc",
  "_id": "Rw3G7HEBDGgXwwHKFPCb",
  "_score": 10.995819,
  "_source": {
    "itemId": "525033055044",
    "urlId": "https://detail.tmall.com/item.htm?id=525033055044&skuId=def",
    "sellAddress": "Wuhan, Hubei",
    "courierFee": "Express: 0.00",
    "promotions": "109,299-30,499-60, cross-store",
    "originalPrice": "3768.00",
    "startTime": "2016-11-01 00:00:00",
    "endTime": "2016-11-11 23:59:59",
    "title": "Wine, high wine, original Spanish bottle, imported red wine, Monde dry red wine, 6 whole cases of wine set",
    "serviceGuarantee": "breakage package refund; authentic guarantee; commonweal baby; do not support 7-day refund; extremely quick refund",
    "venue": "main food venue",
    "currentPrice": "151.00"
  }
}
```

Thinking about extensibility

Extending product search weighting: we can offer stores various paid options that increase their weight and exposure, to suit their own marketing strategies. Likewise, we often notice that many of the top items in a Taobao search are ones we viewed before; weight is added to those products intelligently by recording user behavior, running models, and so on.
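One way to picture paid or behavior-based weighting is to multiply each hit's relevance score by a per-store weight and re-sort. The toy sketch below does this in memory; WeightedRanking, Hit, and the weight map are hypothetical. In Elasticsearch itself this kind of adjustment is usually expressed server-side with a function_score query (for example a weight or field_value_factor function), so that paging and sorting stay consistent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Toy re-ranking: final score = relevance score x per-store weight (hypothetical paid boost). */
final class WeightedRanking {
    static final class Hit {
        final String itemId;
        final String storeId;
        final double score;

        Hit(String itemId, String storeId, double score) {
            this.itemId = itemId;
            this.storeId = storeId;
            this.score = score;
        }
    }

    /** Item ids ordered by score * weight, highest first; stores without an entry keep weight 1.0. */
    static List<String> rank(List<Hit> hits, Map<String, Double> storeWeight) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> h.score * storeWeight.getOrDefault(h.storeId, 1.0)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.itemId);
        }
        return ids;
    }
}
```

With relevance scores of 10.0 for item a and 8.0 for item b, a weight of 1.5 on b's store lifts b to 12.0 and puts it first.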

Extending word segmentation: because some products have unusual names, we can define a custom extension dictionary for the analyzer to make search more accurate and user-friendly.

Highlighting: ES provides a highlight feature, which is how the search keywords are highlighted in the product listings we see on Taobao.
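Conceptually, highlighting wraps the matched terms in markup; Elasticsearch's default tags are <em> and </em>. The stdlib sketch below (HighlightSketch is a hypothetical name) does a case-insensitive substring replacement only to show the shape of the output. Elasticsearch's highlighters work on analyzed tokens, so with the IK analyzer they mark the segmented words; with the high-level client you would attach a HighlightBuilder through SearchSourceBuilder.highlighter(...) and read SearchHit.getHighlightFields() instead.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Wraps every case-insensitive occurrence of the query terms in tags, mimicking highlighter output. */
final class HighlightSketch {
    static String highlight(String text, List<String> terms, String preTag, String postTag) {
        if (terms.isEmpty()) {
            return text;
        }
        // Longest terms first, so "red wine" wins over "wine" when both are given; quote to match literally
        String alternation = terms.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(preTag + m.group() + postTag));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```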
