How to analyze Learning to Rank

2025-07-01 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains how to analyze Learning to Rank (LTR): what it is, its three main types of methods, how training data is obtained, and which features are used. It is intended to help readers who want to solve this problem find a simple and feasible approach.

I. Learning to Rank (LTR)

Learning to Rank (LTR) is a supervised learning approach to ranking. LTR is widely used in text mining, for example to sort the documents returned in information retrieval (IR), to sort candidate products or users in recommendation systems, and to sort candidate translations in machine translation.

Traditional ranking methods in IR generally construct a relevance function and then sort by relevance score. Many factors affect relevance, such as tf, idf, and dl (document length), and many classical models accomplish this task, such as the vector space model (VSM), the Boolean model, and probabilistic models. However, traditional methods have difficulty integrating many kinds of information. The vector space model, for example, uses tf*idf as the weight in its relevance function and cannot easily use other information. If a model has many parameters, tuning them becomes very difficult and is prone to over-fitting.

So people naturally turned to machine learning to solve this problem, which led to Learning to Rank. Machine learning methods can easily combine many features, rest on a mature theoretical foundation, optimize their parameters iteratively, and come with established techniques for handling sparsity, over-fitting, and similar problems. The framework of a learning-to-rank system is shown in figure 2.1:

Figure 2.1 Framework of a learning-to-rank system

Given a labeled training set, an LTR method is selected and its loss function is determined. The parameters of the ranking model are then obtained by optimization, with the goal of minimizing the loss function; this is the learning process. In the prediction process, the results to be ranked are fed into the learned ranking model, which produces a relevance score for each result, and the final order is obtained by sorting on these scores.

Generally speaking, LTR methods fall into three types: the single-document method (Pointwise), the document-pair method (Pairwise), and the document-list method (Listwise).

1 Pointwise

Pointwise processes a single document at a time. After the document is transformed into a feature vector, the ranking problem is converted into a conventional classification or regression problem in machine learning. Take multi-class classification as an example: table 2-1 is a manually labeled part of the training set. Each document uses three features: the BM25 similarity between query and document, the cosine similarity between query and document, and the PageRank value of the page. The relevance between the query and document di is graded, with the label divided into five levels: {perfect, excellent, good, fair, bad}. This yields five labeled training examples, and we can then use any multi-class classification algorithm, such as maximum entropy or a support vector machine.

Pointwise works entirely at the level of individual documents and ignores the relative order between documents. It also assumes that relevance is query-independent: as long as (query, di) pairs have the same relevance, they are assigned the same level and belong to the same class. In practice, however, relevance is relative to the query. A common query has many relevant documents, and the label of one of its lower-ranked documents may still be higher than the label given to the few highly relevant documents of a rare query. This makes the training samples inconsistent, and documents predicted to be at the same label level have no relative order among themselves. Common pointwise methods include McRank. Once the model parameters are learned, the model can be used to judge relevance: for a new query and its documents, the scoring function of the model produces a value for each document, and the documents are sorted by this value.
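The pointwise idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats the five levels as the numbers 0 to 4 and fits a linear regression by stochastic gradient descent on squared error, rather than using McRank or a multi-class classifier. The feature values, the names `TRAIN` and `NEW_DOCS`, and the hyperparameters are all invented for the example.

```python
# Pointwise sketch: regress the graded label from the feature vector,
# then sort new documents by the predicted score.
# Assumption: grades are mapped to 0..4; all feature values are invented.
GRADES = {"bad": 0, "fair": 1, "good": 2, "excellent": 3, "perfect": 4}

# Each example: ([BM25, cosine similarity, PageRank], grade)
TRAIN = [
    ([0.9, 0.8, 0.7], GRADES["perfect"]),
    ([0.7, 0.6, 0.5], GRADES["excellent"]),
    ([0.5, 0.5, 0.3], GRADES["good"]),
    ([0.3, 0.2, 0.2], GRADES["fair"]),
    ([0.1, 0.1, 0.1], GRADES["bad"]),
]


def score(weights, bias, features):
    """Linear scoring function: bias + w . x."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train(examples, epochs=2000, lr=0.1):
    """Minimize squared error with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = score(weights, bias, features) - label
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights, bias


def rank(weights, bias, docs):
    """Sort document ids by predicted score, highest first."""
    return sorted(docs, key=lambda d: score(weights, bias, docs[d]), reverse=True)


weights, bias = train(TRAIN)
NEW_DOCS = {"d1": [0.2, 0.3, 0.1], "d2": [0.8, 0.9, 0.6], "d3": [0.5, 0.4, 0.4]}
ranking = rank(weights, bias, NEW_DOCS)
print(ranking)
```

Note that each training example is fitted on its own; nothing in the loss compares two documents of the same query, which is exactly the limitation described above.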

2 Pairwise

Pairwise is currently a popular method. Compared with pointwise, it focuses on the order relationship between documents. It reduces the ranking problem to a binary classification problem, for which many machine learning methods are available, such as boosting, SVM, and neural networks. Within the set of documents for the same query, any two documents with different labels form a training instance (di, dj). If di is more relevant than dj (di > dj), the instance is assigned +1, otherwise -1. This gives the training samples needed to train a binary classifier, as shown in figure 2.2. At test time, classifying all pairs yields a partial order over all documents, from which the ranking is obtained.
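A minimal sketch of how the pairwise training instances could be generated for one query is shown below. The function names and the sample labels are hypothetical. The test-time step uses a simple "count the wins" aggregation, which is an assumption made for the example and only one of several ways to turn pair decisions into an order.

```python
from itertools import combinations


def make_pairs(docs):
    """Build pairwise instances for ONE query.

    docs: list of (doc_id, label). Returns a list of ((di, dj), y)
    with y = +1 if di is labeled higher than dj, otherwise -1.
    Pairs with equal labels carry no preference and are skipped.
    """
    pairs = []
    for (di, li), (dj, lj) in combinations(docs, 2):
        if li == lj:
            continue
        pairs.append(((di, dj), 1 if li > lj else -1))
    return pairs


def rank_by_wins(doc_ids, prefer):
    """Order documents by how many pairwise comparisons they win.

    prefer(di, dj) stands in for a trained binary classifier and
    returns a positive value when di should rank above dj.
    """
    wins = {d: 0 for d in doc_ids}
    for di, dj in combinations(doc_ids, 2):
        if prefer(di, dj) > 0:
            wins[di] += 1
        else:
            wins[dj] += 1
    return sorted(doc_ids, key=lambda d: wins[d], reverse=True)


# Hypothetical labels for the documents of one query.
docs = [("d1", 3), ("d2", 1), ("d3", 3), ("d4", 0)]
pairs = make_pairs(docs)
print(pairs)
```

Because pairs are built only within one query, documents of different queries are never compared with each other.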

Figure 2.2 Schematic of the pairwise ranking method

Although pairwise improves on pointwise, this approach has obvious problems:

a. Only the relative order of the two documents is considered, regardless of where they appear in the search results list. Documents at the top of the list are more important, so a misjudgment near the top should be penalized significantly more than one near the bottom. A position factor is therefore needed: each document is weighted according to its position in the result list, and the higher the position, the greater the weight and the greater the penalty for an error.

b. The number of relevant documents varies greatly between queries. After conversion to document pairs, some queries may have only a dozen pairs while others have hundreds, which biases the evaluation of the learning system. Suppose query 1 corresponds to 500 document pairs and query 2 to 10, and that the system judges 480 pairs correctly for query 1 and 2 pairs correctly for query 2. Over all document pairs, the accuracy is (480 + 2) / (500 + 10) ≈ 95%. Per query, however, the accuracies are 96% and 20%, with an average of 58%, which is very different from the pair-level figure. As a result, the model is biased toward queries with large sets of relevant documents.
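The arithmetic in problem b can be checked with a short Python sketch. The numbers are the ones from the example above; the variable names are only illustrative.

```python
# (correctly judged pairs, total pairs) for each query in the example.
results = {"query 1": (480, 500), "query 2": (2, 10)}

# Accuracy over all document pairs: large queries dominate.
total_correct = sum(correct for correct, _ in results.values())
total_pairs = sum(total for _, total in results.values())
pair_level = total_correct / total_pairs

# Accuracy averaged per query: every query counts equally.
per_query = {q: correct / total for q, (correct, total) in results.items()}
query_level = sum(per_query.values()) / len(per_query)

print(f"pair-level accuracy:  {pair_level:.1%}")
print(f"query-level accuracy: {query_level:.1%}")
```

The gap between the two figures comes entirely from query 1 contributing fifty times as many pairs as query 2.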

Pairwise has many implementations, such as Ranking SVM, RankNet, FRank, and RankBoost.

3 Listwise

Listwise differs from the two methods above in that it takes the whole list of search results for a query as one training example. From the training samples, listwise learns an optimal scoring function F. For a new query, F scores each document, and the documents are sorted from high to low by score to give the final ranking.

As for how to train the optimal scoring function F, this article introduces a training method based on the probability distribution over permutations of the search results. As shown in figure 2-2, suppose that for query Q the search engine returns three documents A, B, and C. These three documents can be arranged in 6 permutations. A scoring function F scores the relevance of the three documents to give F(A), F(B), and F(C), and from these three values the probability of each of the six permutations can be calculated. Different scoring functions F give different probability distributions over the six permutations.
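The article does not state the formula that turns scores into permutation probabilities. The sketch below assumes the Plackett-Luce model with an exponential transform of the scores, which is the choice made in ListNet; the score values for A, B, and C are hypothetical.

```python
from itertools import permutations
from math import exp


def permutation_probability(perm, scores):
    """Plackett-Luce probability of one ordering (an assumed model).

    The document in each position is chosen with probability
    proportional to exp(score) among the documents not yet placed.
    """
    weights = [exp(scores[doc]) for doc in perm]
    prob = 1.0
    for i in range(len(weights)):
        prob *= weights[i] / sum(weights[i:])
    return prob


def permutation_distribution(scores):
    """Probability of every permutation of the scored documents."""
    return {perm: permutation_probability(perm, scores)
            for perm in permutations(scores)}


# Hypothetical values of F(A), F(B), F(C).
scores = {"A": 3.0, "B": 2.0, "C": 1.0}
dist = permutation_distribution(scores)
for perm, prob in sorted(dist.items(), key=lambda kv: kv[1], reverse=True):
    print("".join(perm), round(prob, 4))
```

Under this model the ordering that sorts documents by descending score is the most probable one, and changing F changes the whole distribution.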

Let g be the scoring function corresponding to the standard answer obtained by manual labeling; we do not know what it is. We try to find a scoring function f whose scores match those produced by manual labeling as closely as possible. Suppose there are two candidate scoring functions, f and h, whose computation is known and whose permutation probability distributions are shown in the figure. Measured by KL divergence (KL distance), f is closer to the hypothetical optimal function g than h is. Training is the process of searching the candidate functions for the one closest to g, and this scoring function is then used for scoring at prediction time.
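The comparison by KL divergence can be sketched as follows. The three distributions over the six permutations are invented for illustration, since the figure's actual values are not available here; `g` stands for the distribution implied by the manual labels.

```python
from math import log


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical probabilities of the six permutations of A, B, C.
g = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]  # target from manual labels
f = [0.38, 0.24, 0.16, 0.11, 0.07, 0.04]  # candidate scoring function f
h = [0.10, 0.10, 0.20, 0.20, 0.20, 0.20]  # candidate scoring function h

candidates = {"f": f, "h": h}
closest = min(candidates, key=lambda name: kl_divergence(g, candidates[name]))
print(closest, kl_divergence(g, f), kl_divergence(g, h))
```

In an actual training run the candidates are not enumerated; the parameters of one scoring function are adjusted so that this divergence, used as the loss, decreases.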

The listwise method is more direct: it focuses on the actual goal of the task and optimizes the document ranking itself, so its results are often the best. Common listwise methods include AdaRank, SoftRank, and LambdaMART.

II. Acquisition of LTR training data

1. Manual labeling. If a large amount of training data is needed, manual labeling is not realistic.

2. Click records. For a search engine, training data can be obtained from user click records. Among the search results returned for a query, the user clicks on some of the pages, and we assume that the pages the user clicks first are more relevant to the query. Although this assumption does not hold in many cases, practical experience shows that it is a feasible way to obtain training data.
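One way to turn click records into training data under the assumption above is sketched below. The function name and the sample session are hypothetical, and the sketch deliberately uses only click order; it says nothing about pages that were never clicked.

```python
def preferences_from_clicks(click_order):
    """Derive preference pairs from the click sequence of one query.

    click_order: document ids in the order the user clicked them.
    Returns (preferred, other) pairs, assuming that a page clicked
    earlier is more relevant than a page clicked later.
    """
    first_clicks = []
    for doc in click_order:
        if doc not in first_clicks:  # keep only the first click per page
            first_clicks.append(doc)
    return [(first_clicks[i], first_clicks[j])
            for i in range(len(first_clicks))
            for j in range(i + 1, len(first_clicks))]


# Hypothetical session: the user clicked d3, then d1, d3 again, then d5.
session = ["d3", "d1", "d3", "d5"]
pairs = preferences_from_clicks(session)
print(pairs)
```

The resulting pairs have the same shape as pairwise training instances, so they can be fed to a pairwise method without manual labels.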

III. LTR feature selection

When using LTR, we select a series of text signals and use machine learning to combine them into a ranking model that determines the final order of results. Each of these signals is called a "feature". For a web page, the document area from which a feature is extracted can be the body field, anchor field, title field, URL field, the whole document, and so on.

Document features fall into two types. The first is features of the document itself, such as PageRank value, content richness, spam score, number of slashes in the URL, URL length, inlink count, outlink count, and SiteRank. The second is query-document features, which describe the relevance of the document to the query: the tf and idf values in each field, and the relevance scores of the Boolean model, VSM, BM25, language models, and so on.
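The two types of features can be sketched as two small extraction functions. Everything here is illustrative: the document fields (`url`, `title`, `body`, `pagerank`), the whitespace tokenization, and the idf variant log(N / df) are assumptions, and a real system would compute many more features.

```python
from math import log
from urllib.parse import urlparse


def term_frequency(term, text):
    """Occurrences of term in whitespace-tokenized, lowercased text."""
    return text.lower().split().count(term.lower())


def inverse_document_frequency(term, texts):
    """log(N / df), one common idf variant; 0.0 if the term never occurs."""
    df = sum(1 for text in texts if term_frequency(term, text) > 0)
    return log(len(texts) / df) if df else 0.0


def document_features(doc):
    """Features of the document itself, independent of any query."""
    return {
        "pagerank": doc["pagerank"],
        "url_length": len(doc["url"]),
        "slash_count": urlparse(doc["url"]).path.count("/"),
    }


def query_document_features(query, doc, corpus, fields=("title", "body")):
    """Per-field tf and tf*idf of the query terms in the document."""
    features = {}
    terms = query.split()
    for field in fields:
        texts = [d[field] for d in corpus]
        features[field + "_tf"] = sum(term_frequency(t, doc[field]) for t in terms)
        features[field + "_tfidf"] = sum(
            term_frequency(t, doc[field]) * inverse_document_frequency(t, texts)
            for t in terms
        )
    return features


# Hypothetical two-document corpus.
corpus = [
    {"url": "http://example.com/a/b", "title": "learning to rank",
     "body": "rank documents with learning", "pagerank": 0.5},
    {"url": "http://example.com/c", "title": "cooking tips",
     "body": "how to cook rice", "pagerank": 0.2},
]
features = {**document_features(corpus[0]),
            **query_document_features("rank", corpus[0], corpus)}
print(features)
```

The merged dictionary is the feature vector for one (query, document) pair, which is the input that all three types of LTR methods work from.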

This concludes the overview of how to analyze Learning to Rank. I hope the above content is of some help to you.
