Analysis of opensearch Search usage cases 04/18 Update SLTechnology News&Howtos


2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--


[Background]

I believe many people have needed to build a search feature for a website or app. In the past this meant a long stretch of work with Lucene: doing Chinese word segmentation (for example, with the segmenter from the Chinese Academy of Sciences), rewriting the tokenizer, building the indexes yourself, implementing a real-time update pipeline, and, once the data grew large, working out how to partition it. I started using OpenSearch at the beginning of 2014, when I had to build circle search (searching circles, searching posts, searching posts within a particular circle, searching the members of a circle, and so on). It took me less than a week to go from getting started to using OpenSearch proficiently. Overall I was very satisfied: it fits the pace of an Internet startup, it is simple and convenient, and it let me stand up my own search interface quickly.

[Use process]

Take building circle search for Laiwang as an example (if you don't know what a circle is, think of it as something similar to Baidu Tieba): searching for Zhao Wei should return circles related to Zhao Wei. Below I briefly describe how to quickly build a search interface with OpenSearch.

1. Sign up for an OpenSearch account and activate the service.

2. Create an application and define its index structure. Each field can be given the following attributes:

Searchable marks the fields that need to be indexed for matching, such as the circle name, the pinyin of the circle name, and the circle's tags. I am not using aggregation at the moment, so set that attribute aside for now. Filterable: for example, the field checkin_type indicates whether a circle is private, so checkin_type must be marked filterable; a filter clause can then be written at query time to keep only the results that meet the condition. Displayable marks the fields that the search interface returns to the client.
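To make the three attributes concrete, here is a minimal in-memory sketch. The `Field` class, every field name except checkin_type, and the convention that checkin_type 0 means public are hypothetical; this is not OpenSearch's schema format, only an illustration of what searchable, filterable, and displayable each control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One index field and its (hypothetical) attribute checkboxes."""
    name: str
    searchable: bool = False
    filterable: bool = False
    displayable: bool = False


# Hypothetical schema for the circle search example.
SCHEMA = [
    Field("circle_id", displayable=True),
    Field("name", searchable=True, displayable=True),
    Field("name_pinyin", searchable=True),
    Field("tags", searchable=True, displayable=True),
    Field("checkin_type", filterable=True),  # assumed: 0 = public, 1 = private
]


def search(docs, term, filters):
    """Match `term` against searchable fields, keep docs that pass every
    equality filter, and return only the displayable fields."""
    searchable = [f.name for f in SCHEMA if f.searchable]
    filterable = {f.name for f in SCHEMA if f.filterable}
    displayable = [f.name for f in SCHEMA if f.displayable]
    for key in filters:
        if key not in filterable:
            raise ValueError(f"field {key!r} is not filterable")
    results = []
    for doc in docs:
        if not any(term in str(doc.get(name, "")) for name in searchable):
            continue
        if any(doc.get(key) != value for key, value in filters.items()):
            continue
        results.append({name: doc.get(name) for name in displayable})
    return results


docs = [
    {"circle_id": 1, "name": "Zhao Wei fans", "name_pinyin": "zhaowei",
     "tags": "star", "checkin_type": 0},
    {"circle_id": 2, "name": "Zhao Wei private club", "name_pinyin": "zhaowei",
     "tags": "star", "checkin_type": 1},
]
print(search(docs, "zhaowei", {"checkin_type": 0}))
```

Note that name_pinyin helps the match but is not returned, because it is searchable without being displayable.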

3. Import the data. OpenSearch provides three ways to import data; choose according to your application's needs. Importing from MySQL, for example, uses a graphical interface: all you need to do is map the MySQL columns to the fields of the index structure you just defined. You can also push data through HDFS, or through the SDK and HTTP APIs; the SDK and HTTP APIs are very flexible, and the help documentation has the details. (Note: MySQL and HDFS import are only supported on the intranet.)
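The column-to-field mapping that the import GUI asks for can be pictured as a simple rename table. In this sketch the MySQL column names, the `COLUMN_TO_FIELD` table, and the JSON payload shape are all hypothetical; the real push format for the SDK and HTTP APIs is defined in the help documentation.

```python
import json

# Hypothetical mapping from MySQL column names to index field names,
# playing the role of the mapping configured in the import GUI.
COLUMN_TO_FIELD = {
    "id": "circle_id",
    "title": "name",
    "title_py": "name_pinyin",
    "tag_list": "tags",
    "is_private": "checkin_type",
}


def map_row(row):
    """Rename the mapped columns of one row; unmapped columns are dropped."""
    return {field: row[col] for col, field in COLUMN_TO_FIELD.items() if col in row}


def to_push_payload(rows):
    """Serialize mapped rows as a JSON array of documents. The payload shape
    is illustrative only; the real push format is in the SDK/HTTP API docs."""
    return json.dumps([map_row(row) for row in rows], ensure_ascii=False)


row = {"id": 7, "title": "Zhao Wei fans", "title_py": "zhaoweifans",
       "is_private": 0, "created_at": "2014-01-01"}
print(map_row(row))
```

Columns with no mapping (created_at here) simply never reach the index.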

4. Build the index. Open the Data Import tab in the console, where there is an index rebuild option. Click Rebuild Now, and OpenSearch reads data from the database configured above and builds the index according to the configured field mapping. Once this process completes, the search interface is ready to use.

5. Test the search interface using the search test box at the top right of the application's home page.

As shown in the figure, the result is an HTTP interface whose search results are returned in JSON format. This is the simplest prototype of a search service built this way.
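A client for such an HTTP/JSON interface mostly builds a URL and unpacks a JSON body. The sketch below uses only the Python standard library; the endpoint (example.com), the parameter names `query`, `start`, and `hits`, and the response shape `{"status": "OK", "result": {"items": [...]}}` are assumptions for illustration, so check the API reference for the real ones.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, start=0, hits=10):
    """Build a GET URL for a search request. The parameter names are
    hypothetical placeholders; use the ones in the API documentation."""
    params = {"query": query, "start": start, "hits": hits}
    return f"{base_url}?{urlencode(params)}"


def extract_items(response_text):
    """Return the result list from a JSON body assumed to look like
    {"status": "OK", "result": {"items": [...]}}."""
    data = json.loads(response_text)
    if data.get("status") != "OK":
        raise RuntimeError(f"search failed: {data!r}")
    return data.get("result", {}).get("items", [])


url = build_search_url("http://search.example.com/search", "Zhao Wei")
print(url)  # http://search.example.com/search?query=Zhao+Wei&start=0&hits=10
```

Keeping URL building and response parsing as pure functions makes both easy to test without touching the network.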

[Tips]

Here are some of the requirements and problems you may encounter:

1. Sorting requirements. For example, you may need hits in field A to count more than hits in field B, so that documents that match well in field A (say, the title rather than the body) rank first. In that case you can customize the sort formula; the documentation lists many sort functions. For example, you can use the BM25 algorithm to compute a static score, text_relevance to compute how well a field matches the query, and fieldterm_proximity to measure how closely the matched terms appear together, and there are also functions that decay a score based on a time field.
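The idea behind such a formula, a weighted sum of per-field relevance multiplied by a time decay, can be sketched in a few lines. This is not how OpenSearch's text_relevance or its time-decay functions compute their values: the per-field score here is a crude fraction of matched query terms, and the weights (title 3, content 1) and the 30-day half-life are hypothetical.

```python
# Hypothetical weights: a title hit counts three times as much as a body hit.
FIELD_WEIGHTS = {"title": 3.0, "content": 1.0}
# Hypothetical freshness half-life: a document's score halves every 30 days.
HALF_LIFE_DAYS = 30.0


def field_match(query_terms, text):
    """Fraction of query terms that occur in the field's text.
    A crude stand-in for a real relevance function such as text_relevance."""
    if not query_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in query_terms) / len(query_terms)


def score(query_terms, doc):
    """Weighted sum of per-field matches, multiplied by an exponential time decay."""
    relevance = sum(weight * field_match(query_terms, doc.get(field, ""))
                    for field, weight in FIELD_WEIGHTS.items())
    decay = 0.5 ** (doc.get("age_days", 0) / HALF_LIFE_DAYS)
    return relevance * decay


def rank(query, docs):
    terms = query.lower().split()
    return [d["id"] for d in sorted(docs, key=lambda d: score(terms, d), reverse=True)]


docs = [
    {"id": "body-hit", "title": "weekend news", "content": "zhao wei new film", "age_days": 0},
    {"id": "title-hit", "title": "zhao wei fan club", "content": "welcome", "age_days": 0},
]
print(rank("Zhao Wei", docs))  # ['title-hit', 'body-hit']
```

Raising the title weight relative to the content weight is exactly the lever the sort formula gives you; the decay term then lets fresher documents overtake stale ones with similar relevance.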

2. Recall requirements. For example, a search for zhoujielun should find Jay Chou, and a search for stars should return all star-related documents (not necessarily ones containing the word "star"). Large search engines usually handle this with query refinement and query correction modules that analyze the query to expand recall. Here we can take a shortcut: put the terms we want a document to be recalled by into a new field in the index structure, and give that field a weight in the sort formula, so that documents are both recalled and ranked appropriately.
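The shortcut amounts to adding an extra field at index time and giving it a lower weight than the main field. In this sketch the field name `recall_terms`, the weights, and the exact-term matching are hypothetical choices made for illustration.

```python
# Hypothetical weights: a hit in the name counts more than a hit that only
# comes from the extra recall field.
NAME_WEIGHT = 2.0
RECALL_WEIGHT = 1.0


def build_doc(name, recall_terms):
    """Index-time step: store extra terms (pinyin, category words, ...) that
    should recall this document in a separate, hypothetical recall_terms field."""
    return {"name": name, "recall_terms": {t.lower() for t in recall_terms}}


def search(docs, query):
    """Score each doc by where the query hits, then sort by score (stable)."""
    q = query.lower()
    scored = []
    for doc in docs:
        s = 0.0
        if q in doc["name"].lower():
            s += NAME_WEIGHT
        if q in doc["recall_terms"]:
            s += RECALL_WEIGHT
        if s > 0:
            scored.append((s, doc["name"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


docs = [
    build_doc("Jay Chou", ["zhoujielun", "star", "singer"]),
    build_doc("Zhao Wei", ["zhaowei", "star", "actress"]),
    build_doc("Star Trek fans", []),
]
print(search(docs, "zhoujielun"))  # ['Jay Chou']
print(search(docs, "star"))        # ['Star Trek fans', 'Jay Chou', 'Zhao Wei']
```

The second query shows the trade-off: the recall field brings in Jay Chou and Zhao Wei, while a direct hit in the name still ranks first because of its higher weight.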

3. Searching for nearby things. The sort functions include a distance function that computes spherical distance, but it is O(n), so with a lot of data efficiency may suffer. Instead, we can add a field to the index structure and use the geohash algorithm (see http://en.wikipedia.org/wiki/Geohash) to convert each document's two-dimensional coordinates into strings that share a common prefix when locations are close (for example, fixed lengths of 5, 6, 7, and 8 characters), and store these geohash strings in the new field. At query time, the input coordinates are likewise converted into geohash strings of 5, 6, 7, and 8 characters. Among the documents matched by these strings, the distance function then computes exact distances for sorting, which makes searching for nearby things efficient.
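Here is a minimal sketch of that scheme: a standard geohash encoder, an inverted index from 5 to 8 character prefixes to document ids, and an exact great-circle (haversine) re-rank of the candidates. The haversine function stands in for OpenSearch's own distance function, and the function names, the "finest cell with enough results" lookup rule, and the sample coordinates are assumptions. Like the scheme in the text, it only looks in the query's own cell, so a point just across a cell boundary can be missed; a common refinement is to also query the eight neighboring cells.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"
PRECISIONS = (5, 6, 7, 8)  # geohash lengths stored per document, as in the text


def geohash_encode(lat, lon, precision):
    """Standard geohash: interleave longitude/latitude bisection bits,
    starting with longitude, and emit one base32 character per 5 bits."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (spherical) distance in km, assuming an Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def build_index(docs):
    """docs: {doc_id: (lat, lon)} -> {geohash prefix: set of doc_ids}."""
    index = {}
    for doc_id, (lat, lon) in docs.items():
        full = geohash_encode(lat, lon, max(PRECISIONS))
        for p in PRECISIONS:
            # A shorter geohash of a point is a prefix of its longer geohash.
            index.setdefault(full[:p], set()).add(doc_id)
    return index


def nearby(index, docs, lat, lon, min_results=1):
    """Use the finest cell holding at least min_results docs (falling back to
    the coarsest), then rank only those candidates by exact distance."""
    full = geohash_encode(lat, lon, max(PRECISIONS))
    candidates = set()
    for p in sorted(PRECISIONS, reverse=True):
        candidates = index.get(full[:p], set())
        if len(candidates) >= min_results:
            break
    return sorted(candidates, key=lambda d: haversine_km(lat, lon, *docs[d]))


places = {
    "a": (39.9087, 116.3975),
    "b": (39.9100, 116.4000),
    "c": (31.2304, 121.4737),
}
index = build_index(places)
print(nearby(index, places, 39.909, 116.398, min_results=2))  # ['a', 'b']
```

The expensive distance computation runs only on the handful of documents that share the query's cell, instead of on every document.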

4. In addition, our largest index holds about 50 million documents, and search results still come back within 10 ms at 500 QPS. (Of course, the displayable fields of each returned document should not be too large; otherwise network transfer will slow down the response.)
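Some back-of-the-envelope arithmetic shows why the size of the displayable fields matters at this load. The page size (20 documents), the per-document sizes (500 bytes versus 5 KB), and the 10 Mbit/s client link below are hypothetical numbers, not measurements from the source.

```python
def response_bytes(docs_per_page, bytes_per_doc):
    """Payload size of one response, ignoring headers and JSON overhead."""
    return docs_per_page * bytes_per_doc


def egress_mbit_per_s(qps, bytes_per_response):
    """Outbound bandwidth needed to serve `qps` responses per second."""
    return qps * bytes_per_response * 8 / 1_000_000


def transfer_ms(bytes_per_response, link_mbit_per_s):
    """Time to push one response over a link of the given speed."""
    return bytes_per_response * 8 * 1000 / (link_mbit_per_s * 1_000_000)


for bytes_per_doc in (500, 5_000):
    size = response_bytes(20, bytes_per_doc)
    print(bytes_per_doc, size, egress_mbit_per_s(500, size), transfer_ms(size, 10))
```

Under these assumptions, 500-byte documents need 40 Mbit/s of egress and about 8 ms per response on the client link, while 5 KB documents need 400 Mbit/s and about 80 ms, which would swamp a 10 ms search time.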

[Demand]

1. Some queries still produce bad cases, mainly in query analysis and in intersecting results. For example, when searching for "Jay Chou middle school photo", the intended importance is roughly Jay Chou >> middle school = photo. Even if no document matches every term, documents matching "Jay Chou photo", or documents about Jay Chou, should still be recalled. I believe this area will keep getting better.
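One way to express the desired behavior is weighted partial matching instead of a strict AND: recall any document whose matched terms carry enough total weight, then rank by that weight. The term weights and the threshold below are hypothetical, and the query is assumed to be already segmented into terms.

```python
# Hypothetical term weights from query analysis (integers to avoid float noise):
# "jay chou" matters far more than "middle school" or "photo".
TERM_WEIGHTS = {"jay chou": 3, "middle school": 1, "photo": 1}


def recall(docs, term_weights, min_weight):
    """Recall docs whose matched terms carry at least `min_weight` in total,
    instead of requiring every query term (strict AND); rank by matched weight."""
    hits = []
    for doc_id, terms in docs.items():
        weight = sum(w for term, w in term_weights.items() if term in terms)
        if weight >= min_weight:
            hits.append((weight, doc_id))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits]


docs = {
    "d1": {"jay chou", "photo"},
    "d2": {"jay chou", "concert"},
    "d3": {"middle school", "photo"},
}
print(recall(docs, TERM_WEIGHTS, min_weight=3))  # ['d1', 'd2']
```

With the threshold set to the total weight (5), the function behaves like a strict AND and returns nothing for these documents, which is the bad case described above.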

OpenSearch solution: a new feature under development will rewrite user queries along several dimensions, such as reducing the weight of low-importance terms and supporting user-defined dictionaries (synonyms, spelling corrections, stop words, domain-specific terms, etc.). This will further improve results for long-tail queries and reduce the rate of queries with no results.
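Dictionary-driven rewriting of this kind can be sketched as three passes over the segmented query: apply corrections, drop stop words, and expand synonyms into OR-groups. The dictionaries are hypothetical, and the boolean notation is generic, not OpenSearch's query syntax or the design of the feature described above.

```python
# Hypothetical user-defined dictionaries of the kinds listed above.
CORRECTIONS = {"jay chuo": "jay chou"}
SYNONYMS = {"pic": ["photo"], "photo": ["pic"]}
STOP_WORDS = {"the", "of"}


def rewrite(tokens):
    """Apply corrections, drop stop words, and expand each remaining token
    into a group of alternatives (the token plus its synonyms)."""
    groups = []
    for token in tokens:
        token = CORRECTIONS.get(token, token)
        if token in STOP_WORDS:
            continue
        groups.append([token] + SYNONYMS.get(token, []))
    return groups


def to_boolean_query(groups):
    """Render groups in a generic boolean notation: alternatives are ORed,
    groups are ANDed. This is not OpenSearch's query syntax."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


print(to_boolean_query(rewrite(["the", "jay chuo", "pic"])))
# (jay chou) AND (pic OR photo)
```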

2. It would be nice to have a related-search feature :). For example, based on the logs of frequent queries in a search application, it could suggest other queries that users who searched for a given query are also likely to search for.
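A simple way to mine such suggestions from logs is session co-occurrence: two queries are related when the same user session issues both. The `(session_id, query)` log format and the ranking by shared-session count are assumptions for this sketch, not the approach OpenSearch has announced.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_queries(log, top_n=3):
    """log: iterable of (session_id, query) pairs, a hypothetical log format.
    Two queries count as related each time they occur in the same session;
    each query's suggestions are its most frequent co-occurring queries."""
    sessions = defaultdict(set)
    for session_id, query in log:
        sessions[session_id].add(query)
    co_counts = defaultdict(Counter)
    for queries in sessions.values():
        for a, b in combinations(sorted(queries), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {q: [other for other, _ in counts.most_common(top_n)]
            for q, counts in co_counts.items()}


log = [
    ("s1", "zhao wei"), ("s1", "zhao wei movies"),
    ("s2", "zhao wei"), ("s2", "zhao wei movies"),
    ("s3", "zhao wei"), ("s3", "huang xiaoming"),
]
print(related_queries(log)["zhao wei"])  # ['zhao wei movies', 'huang xiaoming']
```

Counting sessions rather than raw query repetitions keeps one user who repeats a query many times from dominating the suggestions.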

OpenSearch answer: related search, drop-down query suggestions, and other features are already planned, and research on click feedback and personalized search has already begun. Please stay tuned.

