How to do in-depth Analysis of FM+GBM sorting Model

2025-04-15 Update From: SLTechnology News&Howtos


How should one carry out an in-depth analysis of the FM+GBM ranking model? Many readers new to the topic find it hard to approach. This article summarizes the background, the model design, and the lessons learned; hopefully it helps you work through the problem.

Background

Short videos in the information feed are distributed mainly by algorithm, supplemented by manual curation; algorithmic distribution tailors the feed to each individual user. The whole distribution pipeline has three stages: trigger/recall, ranking, and re-ranking. The ranking layer connects the other two and is a critical link. While optimizing the ranking layer, in addition to drawing on cutting-edge industry experience and practices, we have also made some innovations of our own in the model.

Short-video ranking in the feed currently uses a Wide&Deep model whose target is CTR prediction. By introducing duration features, click + duration multi-objective optimization, and so on, we have achieved good returns:

● Adding average video playing time as a signal of users' real experience brought an increase in user consumption time;

● Multi-objective optimization of click + duration, implemented by weighting samples by consumption duration, improved both click-through rate and consumption time;

● Introducing sample data from multiple video delivery scenarios enabled multi-scenario sample fusion.

While optimizing the ranking model, we also investigated deep models such as DeepFM/DeepCN, but they showed no clear advantage in either offline or online metrics. Alongside continued Wide&Deep optimization, the more pressing need was to step outside the original framework and find new sources of gain.


GBM is introduced to ensemble sub-models and high-level features, and the result is better than any single model. From the perspective of computational learning theory, Wide&Deep is a high-variance model prone to overfitting (its training metric runs about 7% above its evaluation metric). GBM combines multiple sub-models and high-level features by boosting, letting them play complementary roles, and the ensemble as a whole is also more interpretable.

The above briefly covers the evolution of the feed's short-video ranking model. The FM+GBM model is a piece of pioneering work by our team, and it is introduced below.

Model

Factorization Machines (FM) is a widely used recommendation model invented by Steffen Rendle, who currently works at Google. FM optimizes and improves on the traditional LR model's handling of higher-order interaction features. LR adds combined features to the model via explicit feature crosses, with model complexity O(N^2) (N is the number of interacting features, likewise below): strong memorization but weak generalization. FM instead represents each feature as a latent vector and expresses feature interactions through the similarity (inner product) of those vectors, which cleverly improves the model's generalization; the complexity of FM is O(N*k) (k is the latent-vector dimension, a hyperparameter).

Take the second-order FM model as an example. The model is defined as:

ŷ(x) = w₀ + Σᵢ wᵢxᵢ + Σᵢ Σ_{j>i} ⟨vᵢ, vⱼ⟩ xᵢxⱼ

where w₀ is the bias, wᵢ are the linear weights, and vᵢ ∈ ℝᵏ is the latent vector of feature i.

FM is essentially a linear model: its terms affect the output as a linear combination. Considering more complex model combinations would make the computation very expensive. Although academic models such as tensor decomposition handle higher-order interaction features, in industry the tradeoff between effect and performance usually means considering only second-order interactions. On that basis, however, we can introduce a nonlinear model to optimize the FM model.
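As a concrete illustration (a sketch, not the production code), the second-order FM score can be computed in O(N·k) using Rendle's reformulation of the pairwise term; the numpy snippet below assumes dense feature vectors:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j.

    Uses the O(N*k) identity:
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i V[i, f] x_i)^2 - sum_i V[i, f]^2 x_i^2]
    """
    linear = w0 + w @ x
    xv = V.T @ x                  # (k,) per-factor sum of v_i * x_i
    x2v2 = (V ** 2).T @ (x ** 2)  # (k,) per-factor sum of v_i^2 * x_i^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise
```

This is why the second-order model stays affordable at industrial scale: the pairwise term never materializes the N^2 feature crosses.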

Among nonlinear models, tree models (CART/GBM/Random Forest) are widely used. We introduce GBM as the nonlinear model to combine with FM:
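As a rough sketch of the idea (the signal names and synthetic data here are hypothetical; only the overall shape, boosted trees on top of sub-model outputs, follows the text), scikit-learn's GradientBoostingClassifier can play the role of the GBM:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
n = 2000
wd_score = rng.uniform(size=n)   # Wide&Deep sub-model score (placeholder)
lr_score = rng.uniform(size=n)   # LR sub-model score (placeholder)
heat = rng.uniform(size=n)       # item popularity feature (placeholder)
X = np.column_stack([wd_score, lr_score, heat])

# Synthetic click labels with a nonlinear dependence on the signals
p = 1.0 / (1.0 + np.exp(-(3.0 * wd_score * lr_score + heat - 1.5)))
y = (rng.uniform(size=n) < p).astype(int)

# Boosted trees combine the sub-model scores nonlinearly into a CTR estimate
gbm = GradientBoostingClassifier(max_depth=6, n_estimators=100,
                                 learning_rate=0.1, random_state=0)
gbm.fit(X, y)
ctr_pred = gbm.predict_proba(X)[:, 1]  # estimated click probability
```

Tree depth 6 here is only an example value; depth tuning is discussed in the lessons-learned section.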

FM+GBM Phase I (pure GBM)

The first phase mainly established the full experimental framework and data flow, without introducing additional signals. The signals used by GBM included sub-model scores (wd/lr), click-rate/duration and user-experience features, and some simple matching features. The experimental framework is relatively simple: a GBMScorer is added to the fine-ranking stage, implementing two functions:

● The distribution server decides via traffic buckets whether to use GBM scoring for fine ranking; the scoring itself is executed by GBMScorer;

● Feature normalization and flow-back. After normalization, the extracted features are sent back to the distribution server and from there to the log server. Click logs are also written via the log servers. Click and impression logs are joined on reco_id+iid, then cleaned, filtered, and run through anti-cheat processing; the flowed-back features are extracted for model training.
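The click/impression join can be sketched as follows (the field names reco_id and iid come from the text; the record layout is hypothetical, and the cleaning and anti-cheat steps are omitted):

```python
# Impression logs carry the flowed-back features; click logs carry only ids.
show_logs = [
    {"reco_id": "r1", "iid": "v1", "features": {"wd": 0.7}},
    {"reco_id": "r1", "iid": "v2", "features": {"wd": 0.3}},
]
click_logs = [{"reco_id": "r1", "iid": "v1"}]

# Join on (reco_id, iid): an impression is labeled 1 if it was clicked.
clicked = {(c["reco_id"], c["iid"]) for c in click_logs}
samples = [
    {**s, "label": int((s["reco_id"], s["iid"]) in clicked)}
    for s in show_logs
]
```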

In the course of research and experiments, the following are some lessons learned:

● Sample and hyperparameter selection: to keep the model as smooth as possible, we randomly draw samples from a 7-day sliding window of data and split them proportionally into training/validation/test sets. Tree depth has a large effect on the result: depth 6 is clearly better than the other choices. During tuning, AUC and loss showed no obvious gap across the training/validation/test sets, which indicates the GBM model generalizes well.

● Offline evaluation metrics: AUC is a common offline metric for ranking models, but global AUC is too coarse; finer-grained AUC can be computed in line with the business. The industry computes QAUC at query granularity, i.e. the AUC of a single query, and then fuses the per-query AUCs by mean or weighted average, which is more reasonable than global AUC. We take a similar approach: compute AUC per delivery, then average, optionally weighting by clicks. Note that the granularity of AUC computation determines the granularity at which the dataset is partitioned: if AUC is computed per delivery, all samples of one delivery must fall into the same training/validation/test split. In addition, deliveries with zero clicks or all clicks must be discarded.
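To make the per-delivery metric concrete, here is a minimal pure-Python sketch (function and argument names are invented) that computes AUC per delivery, discards zero-click and all-click deliveries, and averages, optionally weighting by clicks:

```python
from collections import defaultdict

def _auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative); ties 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def delivery_auc(reco_ids, labels, scores, weight_by_clicks=False):
    """Average per-delivery AUC; all-click and zero-click deliveries
    are discarded, as the text requires."""
    groups = defaultdict(list)
    for rid, y, s in zip(reco_ids, labels, scores):
        groups[rid].append((y, s))
    aucs, weights = [], []
    for pairs in groups.values():
        ys = [y for y, _ in pairs]
        if len(set(ys)) < 2:
            continue  # zero-click or full-click delivery: AUC undefined
        ss = [s for _, s in pairs]
        aucs.append(_auc(ys, ss))
        weights.append(sum(ys))  # number of clicks in this delivery
    if weight_by_clicks:
        total = sum(weights)
        return sum(a * w for a, w in zip(aucs, weights)) / total
    return sum(aucs) / len(aucs)
```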

● Feature normalization: normalizing user-related features is particularly important. Analyzing the wd scores, we found that their distributions differ markedly across users: scores of the same user have small variance and a concentrated distribution, while the per-user mean scores vary widely across users. Without normalization, GBM training struggles to converge.
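One simple normalization consistent with this observation (a sketch, not necessarily the production scheme) is a per-user z-score of the wd score:

```python
import numpy as np
from collections import defaultdict

def normalize_per_user(user_ids, scores, eps=1e-6):
    """Z-score each user's scores so that different users' score
    distributions become comparable before GBM training."""
    by_user = defaultdict(list)
    for u, s in zip(user_ids, scores):
        by_user[u].append(s)
    stats = {u: (np.mean(v), np.std(v)) for u, v in by_user.items()}
    return np.array([(s - stats[u][0]) / (stats[u][1] + eps)
                     for u, s in zip(user_ids, scores)])
```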

GBM and fine-ranking scores also flow back with the features. After log alignment, the two models can be compared on offline metrics. Judging from global AUC, per-delivery AUC, and small-traffic experiments, the fine-grained AUC agrees more closely with the online experiments.

FM+GBM II

The experimental framework and data flow were built in Phase I, and new signals were considered in Phase II.

The signals currently used by GBM fall into two categories. The first is item-side signals, describing the item along various dimensions: popularity, duration, quality, and so on. These features help filter for quality content and raise the baseline of recommendation quality. The second is correlation features, characterizing the degree of association between a user and a video (association can be characterized by clicks or by duration; currently mainly by clicks), which improve the personalization of recommendations so that each user sees a tailored feed. Personalization is the core competitiveness of the information feed.

At present, the correlation features are computed from the match between long- and short-term user profiles and a video's primary/secondary categories and tags. This has at least two problems:

● The sparse bag-of-words feature representation cannot compute semantic-level matches; for example, a user tagged "soccer" and a video about Messi get a match score of 0 this way.

● The accuracy and coverage of the videos' structured information are currently low, which directly limits the effectiveness of such features.

The wd/lr models can solve these problems to some extent. The wd model in particular, via embedding, maps the structured information of users and videos along various dimensions into low-dimensional latent vectors, which alleviates the issue. However, these latent vectors lack flexibility and cannot be used independently of the wd model: computing the user-video match requires not only the user and video latent vectors but also other features, passed through a series of hidden-layer computations.

The mainstream industry practice is to use an FM model to embed all id features as latent vectors in one shared space, so that all vectors are comparable: not only users against videos and their various dimensions, but even users against users and videos against videos. Structurally, the FM model can be viewed as a neural network tailored to describe exactly this matching degree. We therefore introduce an FM model to factorize the click/impression data, obtain latent vectors for users, videos, and each dimension, and compute the user-video match from these vectors. These signals, together with the other sub-models and high-level features, feed into GBM for click-through-rate estimation.
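Once all ids share one latent space, any pair can be scored by an inner product. The toy vectors below are made up; only the mechanism, cross-dimension comparability, follows the text:

```python
import numpy as np

# Hypothetical FM embedding tables: user ids, video ids, and tag dimensions
# all live in the same latent space, so any pair is directly comparable.
user_vecs = {"u1": np.array([0.2, 0.8, -0.1])}
video_vecs = {"v9": np.array([0.1, 0.9, 0.0])}
tag_vecs = {"soccer": np.array([0.3, 0.7, 0.2]),
            "messi": np.array([0.25, 0.75, 0.1])}

def match(a, b):
    """Match score between any two latent vectors in the shared space."""
    return float(a @ b)

# User-video match, and a cross-dimension match that the sparse BoW
# representation would have scored as 0 (e.g. "soccer" vs. "messi").
uv = match(user_vecs["u1"], video_vecs["v9"])
soccer_messi = match(tag_vecs["soccer"], tag_vecs["messi"])
```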

This approach resembles the LR+GBDT model Facebook published at KDD'14, with one difference: LR+GBDT is essentially a linear model, while FM+GBM puts a tree model on top, which can capture highly nonlinear relationships between the signals and the target and is more interpretable. The overall algorithm framework is shown in the figure:

Because FM is retrained routinely, there is a time gap between refreshes of the user latent vectors and the video latent vectors, and latent vectors from different model versions are not comparable. We therefore designed a simple version-alignment mechanism: every latent vector keeps the data of its two most recent versions, and the FM online computation module implements alignment logic, computing the match with the newest version shared by both sides. Since the routine training window is 4-6 hours, keeping two versions is enough to align most latent vectors. With more frequent training, the number of retained versions can be increased to keep the models aligned.
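The alignment mechanism can be sketched as follows (a simplified illustration; the class and method names are invented):

```python
class VersionedVectors:
    """Keep the latest N model versions of each latent vector and align
    any pair on the newest version both sides share."""

    def __init__(self, keep=2):
        self.keep = keep
        self.store = {}  # key -> list of (version, vector), newest first

    def put(self, key, version, vec):
        entries = self.store.setdefault(key, [])
        entries.insert(0, (version, vec))
        del entries[self.keep:]  # retain only the newest `keep` versions

    def aligned_pair(self, key_a, key_b):
        """Return (vec_a, vec_b) from the newest common version, else None."""
        va = dict(self.store.get(key_a, []))
        vb = dict(self.store.get(key_b, []))
        common = sorted(set(va) & set(vb), reverse=True)
        if not common:
            return None
        v = common[0]
        return va[v], vb[v]
```

If the user vector has been refreshed to version 11 but the video vector is still at version 10, the pair is scored with both sides' version-10 vectors.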

Effect: Phase I + Phase II offline AUC increased by 10%, online CTR and per capita clicks increased by 6%.

After a period of iterative optimization, the ranking layer for the feed's short videos has formed an LR -> WD -> FM+GBM cascade. This funnel lets the ranking layer trade off performance against effect: the later the stage, the more complex the model, the more advanced the features, the more expensive the computation, and the fewer videos involved.
