What is the binary classification model used in an Elasticsearch intelligent recommendation system? 04/11 Update SLTechnology News&Howtos


2025-04-11 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains the binary classification model used in an Elasticsearch-based intelligent recommendation system. The method introduced here is simple, fast, and practical.

1. Overview of algorithm design

Guided by the overall approach, the algorithm design begins with an investigation of the data and identifies the issues that need attention during model development and deployment. Addressing these issues in the design helps produce a better-optimized model.

The overall algorithm scheme is shown in the following figure:

2. Sample construction

Sample construction is very important in any data mining scenario, and recommendation systems are no exception.

Purchases are generally infrequent, so using only purchases as positive samples would yield too few of them. To address this, every course a user has interacted with is treated as a positive sample, and all other courses as negative samples. These interactions include clicking, favoriting, sharing, adding to the shopping cart, and purchasing. Because different interactions indicate different degrees of preference for a course, this difference is reflected in later modeling through sample weights.
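A minimal Python sketch of the positive-sample construction described above. The weights in `BEHAVIOR_WEIGHTS` are hypothetical values chosen for illustration (the article does not give them), and taking the strongest behavior per user-course pair is only one way to collapse several behaviors into one sample weight; summing them is another.

```python
from collections import defaultdict

# Hypothetical weights: the article says behaviors differ in strength
# but does not give values.
BEHAVIOR_WEIGHTS = {
    "click": 1.0,
    "favorite": 2.0,
    "share": 2.0,
    "cart": 3.0,
    "purchase": 5.0,
}


def build_positive_samples(events):
    """Turn (user_id, course_id, behavior) events into weighted positives.

    Every user-course pair with at least one known behavior becomes one
    positive sample (label 1), weighted by its strongest behavior.
    """
    weights = defaultdict(float)
    for user_id, course_id, behavior in events:
        w = BEHAVIOR_WEIGHTS.get(behavior)
        if w is None:
            continue  # skip behaviors that have no weight
        key = (user_id, course_id)
        weights[key] = max(weights[key], w)
    # (user_id, course_id, label, sample_weight), sorted for stable output
    return [(u, c, 1, w) for (u, c), w in sorted(weights.items())]


events = [
    ("u1", "c1", "click"),
    ("u1", "c1", "purchase"),
    ("u1", "c2", "favorite"),
    ("u2", "c1", "share"),
]
for sample in build_positive_samples(events):
    print(sample)
```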

The courses a user has interacted with are only a small fraction of all courses, so treating every non-interacted course as a negative sample would create a severe imbalance between positive and negative samples. In practice, the negative samples are therefore downsampled. Based on past experience, the positive-to-negative ratio can be set to 1:80 after sampling.
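The negative downsampling step can be sketched as follows, assuming uniform random sampling of non-interacted courses for each user (the article does not say how negatives are drawn). `sample_negatives` and its parameters are illustrative names, not part of an existing system.

```python
import random


def sample_negatives(positives, all_courses, ratio=80, seed=0):
    """Draw up to `ratio` negatives per positive for each user.

    positives:   dict user_id -> set of course_ids the user interacted with
    all_courses: list of every course_id in the catalog
    Returns (user_id, course_id, 0) tuples; label 0 marks a negative.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    samples = []
    for user_id, pos in positives.items():
        candidates = [c for c in all_courses if c not in pos]
        k = min(len(candidates), ratio * len(pos))  # 1:ratio, capped by supply
        for course_id in rng.sample(candidates, k):
            samples.append((user_id, course_id, 0))
    return samples


catalog = [f"c{i}" for i in range(1000)]
positives = {"u1": {"c1", "c2"}, "u2": {"c3"}}
negatives = sample_negatives(positives, catalog)
print(len(negatives))  # 2 * 80 + 1 * 80 = 240
```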

3. Feature engineering

This section describes the approach to feature engineering; the final feature set should be adjusted to fit the data. The features below are for reference only. Details can be found in the "label / screening system".

Short-term user behavior features

The number of times the course was clicked, favorited, shared, added to the shopping cart, purchased, etc.

The number of clicks, favorites, shares, purchases, etc. on courses under different tags

Whether the user has recently clicked on or favorited the course

Course attribute features

Category to which the course belongs

Tags attached to the course

The ratio of course sales to the average sales of similar courses

The ratio of course price to the average price of similar courses

The number of recent clicks, shares, favorites, and shopping-cart additions

The number of words in the course title and the number of images, and their ratios to the averages of similar courses

Days on the shelf

User attribute features

Gender, age, occupation, region, days of app use

Long-term user preferences

User tag preferences obtained through matrix factorization

User price preference
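The article does not specify which matrix factorization method produces the tag preferences. As an illustration only, the sketch below fits a plain SGD matrix factorization (in the style of Funk SVD) to a hypothetical user × tag matrix whose observed entries are normalized interaction strengths; missing entries are `None`, and the dot product of a user vector and a tag vector is read as the user's preference for that tag.

```python
import random


def factorize(matrix, k=2, epochs=3000, lr=0.05, reg=0.02, seed=0):
    """Fit R ~ U V^T on the observed (non-None) entries of a user x tag matrix.

    Plain stochastic gradient descent on squared error with L2 regularization.
    Returns U (one k-vector per user) and V (one k-vector per tag).
    """
    rng = random.Random(seed)
    n_users, n_tags = len(matrix), len(matrix[0])
    U = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_tags)]
    observed = [(u, t, r) for u, row in enumerate(matrix)
                for t, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        for u, t, r in observed:
            err = r - sum(U[u][f] * V[t][f] for f in range(k))
            for f in range(k):
                uf, vf = U[u][f], V[t][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[t][f] += lr * (err * uf - reg * vf)
    return U, V


def tag_preferences(U, V, user):
    """Predicted preference of one user for every tag, observed or not."""
    return [sum(a * b for a, b in zip(U[user], v)) for v in V]


# Hypothetical normalized interaction strengths; None means no data.
R = [
    [1.0, 0.8, None],
    [0.9, None, 0.1],
    [None, 0.2, 1.0],
]
U, V = factorize(R)
print([round(p, 2) for p in tag_preferences(U, V, 0)])
```

Pure Python loops are used only to keep the sketch self-contained; the filled-in entries (the `None` cells) are what give a preference estimate for tags the user has not interacted with.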

4. Model training

Model training mainly considers PLM, LightGBM, FM, and similar models, each with its own advantages and disadvantages.

PLM (piecewise linear model) achieves nonlinearity by dividing the feature space into regions and fitting a linear model in each region. Training and prediction are fast, and it is suitable for large-scale sparse features.

LightGBM is a gradient-boosted decision tree (GBDT) model. It is suitable for dense features and achieves high accuracy, but its training and prediction are comparatively slow, so it is not well suited to large-scale sparse features.

FM (factorization machine) automatically models crosses between features and can discover useful feature interactions. Training and prediction are fast, but it is still essentially a linear model with pairwise interaction terms, so its accuracy is not guaranteed.
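To make the comparison above concrete, here is a scoring-only sketch of FM and PLM for one dense feature vector, assuming the parameters have already been trained. It assumes PLM refers to the large-scale piecewise linear model (LS-PLM, also called MLR), where a softmax gate softly assigns a sample to regions and each region has its own logistic regression. LightGBM is omitted because it would normally be used through its own library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fm_score(x, w0, w, V):
    """Factorization machine probability for one dense feature vector x.

    V[i] is the k-dimensional latent vector of feature i. The pairwise term
    sum_{i<j} <V[i], V[j]> x_i x_j is computed in O(k*n) using
    0.5 * sum_f [(sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2].
    """
    k = len(V[0])
    linear = w0 + dot(w, x)
    pair = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pair += 0.5 * (s * s - s2)
    return sigmoid(linear + pair)


def plm_score(x, U, W):
    """LS-PLM style probability: softmax-gated mix of per-region logistic models.

    U[j] scores how strongly x belongs to region j; W[j] is region j's
    logistic regression weight vector.
    """
    gates = [dot(u, x) for u in U]
    m = max(gates)  # subtract the max for numerical stability
    exps = [math.exp(g - m) for g in gates]
    total = sum(exps)
    return sum(e / total * sigmoid(dot(wj, x)) for e, wj in zip(exps, W))


x = [1.0, 0.0, 1.0]
V = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
# Only x[0] and x[2] are non-zero, so the pairwise term is <V[0], V[2]> = 1.5.
print(fm_score(x, w0=0.0, w=[0.0, 0.0, 0.0], V=V))
print(plm_score([1.0], U=[[0.0], [0.0]], W=[[10.0], [-10.0]]))
```

The O(k*n) rewrite of the pairwise term is what keeps FM fast on many features, while PLM's cost grows with the number of regions.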

For evaluation metrics, we focus mainly on accuracy while also taking coverage into account; accuracy is measured with metrics such as MAP@k and NDCG@k.

During the project, each model is tuned separately, and the features and models are iteratively optimized against the evaluation metrics to keep improving offline evaluation results.

5. Evaluation metrics

Evaluation metrics are selected from three perspectives: user satisfaction, prediction accuracy, and coverage.

User satisfaction is the most important measure of a recommendation system, but it cannot be computed offline; it is usually obtained through user surveys and online experiments. Alternatively, it can be estimated by analyzing user behavior logs, roughly as the percentage of recommended courses that users purchase and rate highly.
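A log-based satisfaction proxy along these lines could be computed as below. The rating cut-off `min_rating=4` is a hypothetical choice; the article only says "rate highly".

```python
def satisfaction_proxy(recommended, purchases, ratings, min_rating=4):
    """Share of recommended courses that the user bought and rated highly.

    recommended: list of course_ids shown to the user
    purchases:   set of course_ids the user bought
    ratings:     dict course_id -> rating given by the user
    min_rating:  hypothetical cut-off for a "high" rating
    """
    if not recommended:
        return 0.0
    hits = sum(1 for c in recommended
               if c in purchases and ratings.get(c, 0) >= min_rating)
    return hits / len(recommended)


# "a" is bought and rated 5, "b" is bought but rated 3: 1 of 4 counts.
print(satisfaction_proxy(["a", "b", "c", "d"], {"a", "b"}, {"a": 5, "b": 3}))
```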

Prediction accuracy is the most important offline metric for a recommendation system. It covers two tasks: rating prediction and Top-N recommendation. Rating prediction accuracy is measured with root mean square error (RMSE) and mean absolute error (MAE), while Top-N recommendation accuracy is measured with precision and recall.
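The four accuracy metrics fit in a few lines of Python. Precision and recall are computed here for one user's list; across users, hits are typically summed (or per-user values averaged) before reporting.

```python
import math


def rmse(actual, predicted):
    """Root mean square error of rating predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error of rating predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall(recommended, relevant):
    """Top-N precision and recall for one user's recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(rmse([3, 4, 5], [2.5, 4, 4]), mae([3, 4, 5], [2.5, 4, 4]))
print(precision_recall(["a", "b", "c"], ["a", "c", "d", "e"]))
```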

Coverage describes a recommendation system's ability to surface long-tail items and can be measured with two indicators. The first is information entropy, calculated as follows:

$$H = -\sum_{i=1}^{n} p(i) \log p(i)$$

where n is the number of items and p(i) is the popularity of item i divided by the total popularity of all items.

The second is the Gini index, calculated as follows:

$$G = \frac{1}{n-1} \sum_{j=1}^{n} (2j - n - 1)\, p(i_j)$$

Here i_j is the j-th item when all n items are sorted by popularity p(i) in ascending order.
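The two coverage indicators can be computed from item popularity as follows. How popularity is counted is an assumption here (for example, how often each item appears in users' recommendation lists); the article does not define it. Higher entropy and a lower Gini index both mean that recommendations are spread more evenly across items.

```python
import math


def popularity_distribution(popularity):
    """Normalize raw popularity counts into p(i), which sums to 1."""
    total = sum(popularity.values())
    return {item: count / total for item, count in popularity.items()}


def coverage_entropy(p):
    """H = -sum_i p(i) log p(i); items with p(i) = 0 contribute nothing."""
    return sum(-v * math.log(v) for v in p.values() if v > 0)


def coverage_gini(p):
    """G = 1/(n-1) * sum_j (2j - n - 1) p(i_j), with p(i_j) sorted ascending."""
    values = sorted(p.values())
    n = len(values)
    if n < 2:
        return 0.0
    return sum((2 * j - n - 1) * v for j, v in enumerate(values, start=1)) / (n - 1)


even = popularity_distribution({"a": 5, "b": 5, "c": 5, "d": 5})
skewed = popularity_distribution({"a": 20, "b": 0, "c": 0, "d": 0})
print(coverage_entropy(even), coverage_gini(even))      # maximal entropy, Gini 0
print(coverage_entropy(skewed), coverage_gini(skewed))  # entropy 0, Gini 1
```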

In addition, MAP@k and NDCG@k are also important metrics for model evaluation.
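MAP@k and NDCG@k can be sketched with binary relevance, assuming a course counts as relevant if the user interacted with it. AP@k has several normalization conventions; this sketch divides by min(number of relevant items, k), a common choice but not necessarily the one used in the project.

```python
import math


def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: precision@i summed at each relevant hit i <= k."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def map_at_k(all_recommended, all_relevant, k):
    """MAP@k: AP@k averaged over users (parallel lists, one entry per user)."""
    aps = [average_precision_at_k(r, t, k) for r, t in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps) if aps else 0.0


def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance: DCG divided by the ideal DCG."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


print(average_precision_at_k(["a", "b", "c"], ["a", "c"], 3))
print(ndcg_at_k(["a", "b", "c"], ["a", "c"], 3))
```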


6. Model application

In the practical application of the model, we need to pay attention to the running efficiency and update frequency of the model.

1) Prediction efficiency

Predicting every user-course combination requires n × m predictions, where n is the number of users and m is the number of courses, which is very resource-intensive. Therefore, for each user, only a subset of courses is selected, and the model predicts the user's preference for those courses alone. The following kinds of courses are currently being considered as prediction candidates:

Operations courses

Popular courses

Mainly for new users, to recommend popular courses to them.

Courses in the categories and under the tags the user has clicked frequently in recent days

The courses a user has clicked frequently in recent days reflect, to some extent, the user's current needs.

Recently frequent tags also generalize to some degree: for example, a brand tag extends to other courses of the same brand, and a function tag extends to other courses with the same function.

This helps surface newly listed courses.

Courses recommended by collaborative filtering

Running user-based and course-based collaborative filtering over tags instead of individual courses improves the novelty of the recommended courses.
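The candidate selection above amounts to merging several recall sources before scoring. The sketch assumes each source is a function returning course IDs for a user; the source names and example data are hypothetical. Only the merged candidates are then passed to the model, so the work per user depends on the number of candidates rather than on m.

```python
def build_candidates(user_id, sources, limit=None):
    """Merge candidate courses from several recall sources, without duplicates.

    sources: ordered list of (source_name, fetch_fn) pairs; fetch_fn(user_id)
    returns a list of course_ids. A course found by several sources is kept
    once, tagged with the first source that produced it.
    """
    seen = set()
    candidates = []
    for name, fetch in sources:
        for course_id in fetch(user_id):
            if course_id not in seen:
                seen.add(course_id)
                candidates.append((course_id, name))
    return candidates if limit is None else candidates[:limit]


# Hypothetical recall sources standing in for real queries.
sources = [
    ("operations", lambda u: ["c1", "c2"]),
    ("popular", lambda u: ["c2", "c3"]),
    ("recent_category_tag", lambda u: ["c4"]),
    ("collaborative_filtering", lambda u: ["c1", "c5"]),
]
print(build_candidates("u1", sources))
```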

2) Update frequency

The current algorithm is designed to be updated once a day. Every night, the day's new data, including new user behavior, new course information, and new user data, is brought into model training and prediction.

Because the model currently runs offline, a user's new behavior does not affect the recommendations right away; new recommendations are generated only after the model update completes the next day.

3) Scenario integration

After the model scores the items, it produces each user's preference score for each item. The following designs cover three different scenarios.

1. Recommending operations items

First, the IDs of the operations items are obtained from the operations table and joined with the user-item score table to get the user's preference for each item; the display order of the items in the app is then controlled by that preference.

2. Recommending items under a specified category

First, the IDs of the items under the specified category are obtained and joined with the user-item score table to get the user's preference for each item; the display order of the items in the app is then controlled by that preference.

3. Search recommendation

After Elasticsearch returns the search relevance of each product, the results are joined with the user-item score table, and the user's preference for each item is multiplied by its search relevance to get a combined score, which controls the display order of search results in the app.
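The three scenarios reduce to two ranking operations, sketched below under the assumption that the user-item score table is available as a dictionary keyed by `(user_id, item_id)`. For search, `hits` would hold each hit's `_id` and `_score` from an Elasticsearch search response; the function names are hypothetical.

```python
def rank_by_preference(item_ids, scores, user_id, default=0.0):
    """Order a fixed item set (operations items or one category) by preference.

    scores: dict (user_id, item_id) -> predicted preference; unscored items
    fall back to `default`. Highest preference comes first.
    """
    return sorted(item_ids,
                  key=lambda item: scores.get((user_id, item), default),
                  reverse=True)


def rerank_search(hits, scores, user_id, default=0.0):
    """Multiply search relevance by user preference and sort by the product.

    hits: list of (item_id, relevance) pairs, e.g. each hit's _id and _score.
    Returns (item_id, combined_score) pairs, best first.
    """
    combined = [(item, relevance * scores.get((user_id, item), default))
                for item, relevance in hits]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


scores = {("u1", "a"): 0.9, ("u1", "b"): 0.2, ("u1", "c"): 0.5}
print(rank_by_preference(["a", "b", "c"], scores, "u1"))
print(rerank_search([("a", 1.0), ("b", 5.0), ("c", 3.0)], scores, "u1"))
```

With `default=0.0`, items the model has not scored sink to the bottom of search results; a different default (for example, an average preference) may be preferable if that is undesirable.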
