Introduction to the latest functions of OpenSearch

Updated 2025-04-25. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Abstract: Alibaba Cloud (Aliyun) OpenSearch is a hosted search service for structured data that provides simple, efficient, stable, low-cost, and scalable search solutions. Delivered as a platform service, OpenSearch lowers the threshold and cost of professional search technology, so that search is no longer a business bottleneck for customers and product search features can be built cheaply and iterated quickly. This article introduces two new OpenSearch features: e-commerce query semantic understanding and the search algorithm platform.

This article is compiled from a live talk and its slides. The talk covers three topics: the OpenSearch product profile; the new e-commerce query semantic understanding feature; and the new search algorithm platform with its popularity model.

1. OpenSearch product profile

Before it launched on the cloud, OpenSearch had long served as the search platform inside Alibaba Group. Besides Taobao and Tmall, other core Alibaba businesses such as Fliggy and Cainiao also run their search through OpenSearch. The search engine principles behind OpenSearch, Taobao, and Tmall are essentially the same; OpenSearch adds multi-tenant logic on top. Since moving to the cloud, OpenSearch has focused on giving external enterprise customers a one-stop intelligent content search service with high search quality. Its main advantages fall into four areas.

Native integration of Taobao's core algorithm capabilities: OpenSearch is built on the large-scale distributed search engine developed in-house by Alibaba, and it also integrates the industry-leading Taobao algorithm capabilities on top of that engine, including e-commerce query semantic understanding and ranking.
With the help of these algorithm capabilities, customers can quickly improve business metrics such as GMV conversion rate and CTR.

Second-level index updates with no development: OpenSearch connects seamlessly to RDS and PolarDB databases on the cloud. In the OpenSearch console, configuring a few key parameters, such as the database name, user name, and password, is enough to connect the database to OpenSearch and update data in real time. The whole process is simple and requires no development.

Zero O&M, no deployment: OpenSearch is a fully managed cloud service, so there is nothing to operate or deploy.

Easier service setup: developers can start using OpenSearch quickly without understanding the complex principles behind the service. The key steps are laid out as a visual, guided workflow, and a basic service takes only a few steps to build. If the business has frequent peaks and troughs, such as 618 or Double 11, capacity can be scaled out or up on OpenSearch at any time. The change takes effect promptly, with no need to apply for quotas in advance.

2. New feature: e-commerce query semantic understanding

First, we introduce OpenSearch's new e-commerce query semantic understanding feature. The motivation is that in e-commerce scenarios, search-led transactions often account for more than 60% of the total, so e-commerce places very strict demands on search accuracy. In real searches, however, the keywords users type are often highly colloquial, while the titles, detailed descriptions, categories, and labels of the goods on the site are more formal, standardized text. As a result, the query words and the product information often fail to match.
If the site does not implement semantic mapping, users often cannot find the products they want, and the platform's rate of zero-result or few-result searches becomes very high. Against this background, OpenSearch integrates the intelligent semantic mapping capability of Taobao search, under the name query semantic understanding. The feature is built specifically for the e-commerce field, so it is highly vertical.

Taobao search and OpenSearch solution

Here is a real case that shows what semantic query mapping does for an e-commerce business. In the example figure, the user enters "new NKIE sneakers" in the search box, where "NKIE" is a misspelling of "NIKE", yet the search results are the new NIKE sneakers the user actually wants. What does the system do along the way? First, it corrects the spelling errors in the user's keywords and normalizes the case. Next, it segments the query with a tokenizer and runs entity tag recognition on each semantic unit. The purpose of this step is to capture the core thing being searched for, so that the results are rich without drifting off topic. Finally, the system expands synonyms to increase the number of results recalled. The query the system finally executes is therefore not the "new NKIE sneakers" the user typed, but the rewritten query shown in the figure. This is what e-commerce query semantic understanding does in a real business.
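The steps above can be sketched in a few lines of Python. This is only a toy illustration, not OpenSearch's implementation or API: the function name `understand_query` and the three small dictionaries are hypothetical stand-ins for the real correction, tagging, and synonym models.

```python
import re

# Tiny hypothetical dictionaries; real systems learn these from large logs.
CORRECTIONS = {"nkie": "nike"}
ENTITY_TAGS = {"nike": "brand", "sneakers": "category", "new": "new product"}
SYNONYMS = {"sneakers": ["trainers", "running shoes"]}


def understand_query(raw_query):
    """Apply case normalization, correction, tagging and synonym expansion."""
    # Normalize case and split into word tokens (a very crude tokenizer).
    tokens = re.findall(r"\w+", raw_query.lower())
    # Rewrite tokens that appear in the known-misspelling table.
    corrected = [CORRECTIONS.get(t, t) for t in tokens]
    # Tag each token with an entity type, or "unknown" if it is not listed.
    tagged = [(t, ENTITY_TAGS.get(t, "unknown")) for t in corrected]
    # Expand each token with its synonyms to widen recall.
    expanded = {t: [t] + SYNONYMS.get(t, []) for t in corrected}
    return {"corrected": " ".join(corrected), "tagged": tagged, "expanded": expanded}
```

Calling `understand_query("new NKIE sneakers")` rewrites the query to use "nike" and attaches the synonym list to "sneakers", which mirrors the rewritten query the text describes.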

Core function highlights

The e-commerce query semantic understanding integrated into OpenSearch has three core highlights. E-commerce is emphasized because OpenSearch adds in-depth optimization for e-commerce scenarios on top of its original general-purpose query semantic understanding.

E-commerce spelling correction: OpenSearch checks the user's query string for spelling errors, decides whether the query needs correcting, and produces correction suggestions. For definite spelling errors, the original query string is rewritten directly and then used for retrieval; for merely possible errors, the original query string is still used for retrieval.

E-commerce named entity recognition (NER): OpenSearch segments the query and identifies the intent of each semantic unit. Each entity is marked with one of 36 type tags, such as brand, category, or new product. Tag importance falls roughly into three levels: high, medium, and low. Entities with low-importance tags are ignored at query time to widen recall; entities with high-importance tags directly affect the text relevance calculation and category prediction training. For example, for "Nike slim dress" the entity recognition result is "Nike / brand / medium", "slim / style element / low", "dress / category / high". If the default tag importance does not meet the user's expectations, the user can set it directly.

Flexible intervention: query semantic understanding for e-commerce is trained on years of Taobao search data and gives good query analysis results for e-commerce scenarios in most cases. However, each business has its own vertical query words that may not be covered or may be analyzed incorrectly, so OpenSearch also supports uploading and managing intervention entries through a visual interface, as well as uploading custom thesauri. Configuration is flexible and interventions take effect as soon as they are configured, which makes the feature simple and fast to adapt to a particular business.
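The effect of tag importance on retrieval can be illustrated with a short Python sketch. It assumes a hypothetical importance table holding only the three tags from the "Nike slim dress" example (the article does not list all 36), and it treats unlisted tags as "medium", which is an assumption made for this sketch rather than documented OpenSearch behavior.

```python
# Hypothetical default importance for the three tag types used in the example.
DEFAULT_IMPORTANCE = {"brand": "medium", "style element": "low", "category": "high"}


def build_retrieval_terms(tagged_terms, overrides=None):
    """Split tagged terms into terms kept for retrieval and terms dropped.

    tagged_terms: list of (term, entity_type) pairs from entity recognition.
    overrides: optional user-supplied importance per entity type.
    """
    importance = dict(DEFAULT_IMPORTANCE)
    importance.update(overrides or {})
    kept, dropped = [], []
    for term, entity_type in tagged_terms:
        # Unlisted tags default to "medium" (an assumption of this sketch).
        level = importance.get(entity_type, "medium")
        if level == "low":
            dropped.append(term)  # ignored at query time to widen recall
        else:
            kept.append(term)
    return kept, dropped
```

With the defaults, "slim" is dropped and the query is retrieved on "Nike" and "dress"; passing `overrides={"style element": "high"}` keeps all three terms, which corresponds to a user changing the default importance.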

E-commerce query semantic understanding customer case

Here is a real customer case that uses OpenSearch e-commerce query semantic understanding. The customer is a vertical e-commerce business in which search-led transactions account for more than 60% of all transactions, so search is the most important function on the site. At the same time, close to 60% of user searches on the site returned no results, which means a large share of the daily search page views converted to nothing, a huge waste. Against this background, the customer set up a project to optimize search results within two months. The biggest challenge was that the business is highly vertical: the query words entered by users within the community often differ greatly from the official product names. In addition, many of the goods come from overseas with original names in English, and once they are introduced into China, users sometimes refer to them by a direct Chinese transliteration. As a result, the hot search words on the site are often aliases, acronyms, and transliterations. Moreover, the customer's original self-built open-source search service had no intelligent semantic understanding of search keywords, and it even segmented some entity nouns incorrectly, which led to a high no-result rate.

To address these problems, the customer's engineers began investigating solutions. The customer required the search optimization to be finished within two months, yet only one engineer was available. The main way to fix result recall is to build an intelligent semantic understanding service for search keywords. Building that capability from scratch requires several algorithm experts proficient in NLP and at least hundreds of thousands of manually labeled training examples. At that stage the customer had neither the people nor the data, and no ready-made open-source plug-in was available on the market. Weighing efficiency against cost, the customer chose OpenSearch to rebuild and upgrade its search function. The customer uses OpenSearch's e-commerce word segmentation and e-commerce query semantic understanding, and also maintains custom patch thesauri on OpenSearch, which operations staff upload directly with no extra development work for engineers. The whole project was completed in under two months. The result recall rate of the services using OpenSearch reached about 90%, compared with only about 31% for the self-built service, a large advantage for OpenSearch. In addition, OpenSearch raised the search-led transaction conversion rate by 9% compared with the original self-built open-source service.

3. New feature: search algorithm platform and popularity model

Search algorithm platform

The search algorithm platform exists because, once a search business reaches a certain stage, user behavior data plays an important role in improving search ranking.
Because behavior logs are large and complex, machines usually have to learn from large volumes of historical data with statistical learning algorithms to produce an empirical model. The empirical model turns behavior data into reasonable scores, which are then used in ranking. Running this whole machine learning process efficiently in a business usually runs into two problems.

Complicated data collection and preprocessing: hundreds of millions of records must be collected, stored, and processed automatically every day. The raw data also has to be aggregated, analyzed, and processed to find the basic patterns of user behavior, which demands strong analytical skills and long accumulated experience from developers. For example, when the raw data turns out to have unbalanced positive and negative samples, constructing a training set with a balanced number of each also takes some algorithm experience.

Parameter tuning is like looking for a needle in a haystack: faced with complex algorithm parameters, developers often spend a great deal of time on trial and error. Even when a seemingly reliable parameter combination is found, it remains uncertain whether that combination is optimal.
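The sample-balancing problem mentioned above is often handled by downsampling the majority class. The Python sketch below shows that one common technique under simple assumptions; the article does not say which balancing method OpenSearch uses, and the function name and the `(features, label)` record shape are hypothetical.

```python
import random


def downsample_negatives(samples, ratio=1.0, seed=0):
    """Keep every positive and at most ratio * (number of positives) negatives.

    samples: list of (features, label) pairs, where label 1 might mean
    "clicked" and label 0 "shown but not clicked".
    """
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    keep = min(len(negatives), int(len(positives) * ratio))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    balanced = positives + rng.sample(negatives, keep)
    rng.shuffle(balanced)  # avoid all positives appearing first
    return balanced
```

For click logs where only a few percent of impressions are clicks, `ratio=1.0` yields a 1:1 training set. Other options, such as class weights or oversampling positives, trade off differently and are not shown here.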

To address these two troublesome problems, OpenSearch lets users complete the routine algorithm work directly on the platform, including data preprocessing, feature engineering, parameter tuning, model evaluation, model management, and model launch, and it integrates the relatively mature ranking models used inside Alibaba Group. Users can train a high-quality ranking model on the algorithm platform, apply it to OpenSearch ranking, put it online in the real business, and compare it through an A/B test to collect the overall experimental results. OpenSearch opens up this entire pipeline and presents it as a visual, guided workflow, so developers who have never worked with these algorithms can also use the platform directly. Because the platform focuses on training search algorithm models, it has built in models that Alibaba Group has refined over years of production experience, such as the relatively mature popularity model. Further models, such as category prediction, will be added to the OpenSearch platform over time. In short, users can complete model training and the surrounding upstream and downstream steps on the platform.

To show whether a model is good enough to go online, the platform produces a professional evaluation report whose metrics make the model's quality clear at a glance. To check whether users actually respond well to a model after launch, OpenSearch supports one-click creation of comparative A/B test experiments, so users can iterate on models quickly and improve results. To track a model's online effect, the platform also provides reports for business metric analysis.
Put simply, the popularity model measures how popular each product or document on the site is with users, that is, its static quality score. This score can be added to the ranking as one of its factors, so it directly affects the ranking result. OpenSearch trains the model on features from four dimensions: entity, time, behavior, and statistics.
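How a static popularity score can act as one ranking factor is sketched below in Python. The linear blend, the weight of 0.3, and the field names are all assumptions chosen for illustration; the article does not describe the actual form of the OpenSearch sort expression.

```python
def final_score(text_relevance, popularity, popularity_weight=0.3):
    """Blend query-dependent relevance with a static popularity score.

    Both inputs are assumed to be normalized to [0, 1]; the default weight
    is a made-up value, not one taken from OpenSearch.
    """
    return (1.0 - popularity_weight) * text_relevance + popularity_weight * popularity


def rank(docs, popularity_weight=0.3):
    """Sort docs (dicts with 'id', 'relevance', 'popularity') by blended score."""
    return sorted(
        docs,
        key=lambda d: final_score(d["relevance"], d["popularity"], popularity_weight),
        reverse=True,
    )
```

With a weight of 0 the order depends on text relevance alone; raising the weight lets a slightly less relevant but much more popular item move ahead, which is the effect the popularity model is meant to have on ranking.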

Model evaluation report

OpenSearch automatically produces a model evaluation report after each model is trained, and the report is used to judge the model's effectiveness. For example, the evaluation report for a classification model states whether the model as a whole is suitable for use in the business, and also includes detailed metrics such as the AUC value, ROC curve, confusion matrix, and feature weights. Users with algorithm experience can spot problems directly from these metrics and carry out the next round of model tuning.
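Two of the metrics named above, the confusion matrix and AUC, are simple enough to compute by hand. The following Python sketch shows the standard definitions on binary labels; it is meant to clarify what the report numbers mean, not to reproduce how OpenSearch computes them.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Return (tp, fp, fn, tn) for binary labels and predicted scores."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This is O(n^2), fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the model ranks positives no better than chance, and 1.0 means every positive scores above every negative, which is why the value is a useful first check before a model is deployed.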

How to use

To use the search algorithm platform, first upload behavior data through the OpenSearch SDK. You can then create and train a model directly on the algorithm platform. Once training is complete, view the model evaluation report. If the report recommends using the model, deploy it directly to your OpenSearch application and then reference it in the sort expression.

Search business effect evaluation

After the model is applied to the sort expression, the effect on the search business needs to be evaluated. The traditional approach combines offline manual evaluation with online traffic testing.

Offline manual evaluation usually samples representative query keywords to form a keyword set of suitable size. For this keyword set, the corresponding results are taken from the output of the ranking model and their relevance is labeled manually. A predefined evaluation formula such as DCG is then applied to the labeled data to measure numerically how close the model's results are to the labeled ideal results.

If manual evaluation indicates that search quality is improving, online traffic testing can begin. To truly verify the quality of a ranking model, it must be compared using the A/B test mechanism mentioned above. During a user's search, the test mechanism automatically assigns the user a bucket number according to a fixed strategy, ensuring that the traffic routed into different buckets is comparable. Users in different buckets then see the results of different ranking models, and their behavior under each model is recorded. Through analysis and comparison, this behavior data yields a set of metrics, from which the conclusion about which model is better is drawn. OpenSearch does not support the first step, offline manual evaluation, so users must sample queries and evaluate them manually themselves.
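The DCG formula used in offline evaluation can be written out in a few lines of Python. This sketch uses the linear-gain form, DCG = sum of rel_i / log2(i + 1) over positions i starting at 1, together with the normalized variant NDCG; the article only names DCG, so the choice of this particular variant is an assumption.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of graded relevance labels in ranked order."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Here `relevances` holds the manual labels for one query's results in the order the model returned them. An NDCG of 1.0 means the model's order matches the labeled ideal order, and lower values mean relevant results were ranked too low.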
Once manual evaluation is complete, OpenSearch supports end-to-end, one-click configuration of an online A/B test. It supports splitting traffic across multiple scenarios and multiple buckets, which fully covers the case where a single application serves several business scenarios. The experiment configuration and its online or offline status can be adjusted flexibly, and changes take effect as soon as they are made. Reports are produced daily to help customers make decisions. The whole process runs through a visual interface, so onboarding is fast and has no technical threshold, and product and operations staff can use it directly.

When the optimization is complete, OpenSearch also provides a professional system of search service quality metrics. For e-commerce scenarios it is divided into five groups: traffic metrics, click metrics, user analysis metrics, query analysis metrics, and transaction metrics. With these metrics, customers can see directly where each round of optimization has improved search results and where problems remain.
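The bucketed traffic split behind an A/B test is commonly implemented by hashing a stable user identifier. The Python sketch below shows that general technique; the function name, the use of SHA-256, and the key format are assumptions for illustration, since the article does not describe OpenSearch's actual bucketing strategy.

```python
import hashlib


def assign_bucket(user_id, experiment, num_buckets=2):
    """Deterministically map a user to a bucket for one experiment.

    Hashing the user id together with the experiment name keeps a user in
    the same bucket across requests, while different experiments split the
    same users independently of one another.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % num_buckets
```

For example, bucket 0 could be served by the current sort expression and bucket 1 by the new model. Because the assignment depends only on the user id and the experiment name, each user sees a consistent ranking for the whole experiment.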

This article is original content from the Yunqi Community and may not be reproduced without permission.

