What is the concept of CF recommendation algorithm

2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article introduces the concept of the collaborative filtering (CF) recommendation algorithm: what it is, how the User-based and Item-based variants compare, and how the Item-based algorithm calculates similarities and predicted scores.

I. Description of the collaborative filtering algorithm

A recommendation system applies data analysis to find the items a user is most likely to like and recommends them to that user; such systems are now used on many e-commerce websites. Among current approaches, the more mature one is the collaborative filtering (CF) recommendation algorithm. The basic idea of CF is to recommend items to a user based on that user's previous preferences and on the choices of other users with similar interests.

As shown in figure 1, CF uses an m × n matrix to express users' preferences for items. A score usually indicates the preference: the higher the score, the more the user likes the item, and 0 indicates that the user has not bought it. Each row of the matrix represents a user, each column represents an item, and U_ij is the score given by user i to item j. CF consists of two processes: prediction and recommendation. The prediction process estimates the scores a user would give to items not yet purchased. The recommendation process then recommends the single item, or the Top-N items, that the user is most likely to like, based on those predictions.
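To make the matrix and the two processes concrete, here is a minimal sketch. The matrix values, the predicted scores, and the helper name `top_n` are all made up for illustration; the prediction step itself is not implemented here.

```python
# Hypothetical 3-user x 4-item score matrix; 0 means "not purchased".
ratings = [
    [5, 3, 0, 0],  # user 0
    [4, 0, 4, 1],  # user 1
    [0, 2, 5, 0],  # user 2
]

def top_n(user_row, predicted_row, n):
    """Recommendation step: rank only the items the user has not scored yet."""
    candidates = [j for j, score in enumerate(user_row) if score == 0]
    candidates.sort(key=lambda j: predicted_row[j], reverse=True)
    return candidates[:n]

# Output of some prediction step for user 0 (illustrative numbers only).
predicted_for_user0 = [5.0, 3.0, 4.2, 1.7]

print(top_n(ratings[0], predicted_for_user0, n=1))
```

Items 0 and 1 are skipped because user 0 already scored them, so only items 2 and 3 compete for the recommendation.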

II. Comparison between the User-based and Item-based algorithms

CF algorithms fall into two categories: memory-based and model-based. Both the User-based and the Item-based algorithms are memory-based; for the finer subdivisions, see the description on Wikipedia.

The basic idea of User-based CF is as follows. If user A likes item a, user B likes items a, b, and c, and user C likes items a and c, then user A is similar to users B and C, because they all like a. Since users who like a also like c, c is recommended to user A. The algorithm uses a nearest-neighbor search to find a set of neighbors for a user, that is, users whose preferences are similar to that user's, and then predicts the user's scores from the preferences of those neighbors.
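The A/B/C example can be sketched as follows. This is a simplified illustration, not a full nearest-neighbor implementation: the neighbor rule ("shares at least one liked item") and the function names `neighbors` and `recommend` are assumptions chosen to keep the example short.

```python
# Which items each user likes, taken from the example in the text.
likes = {
    "A": {"a"},
    "B": {"a", "b", "c"},
    "C": {"a", "c"},
}

def neighbors(user, likes):
    """Users who share at least one liked item with `user` (simplified rule)."""
    return [v for v in likes if v != user and likes[v] & likes[user]]

def recommend(user, likes):
    """Rank items the user has not liked yet by how many neighbors like them."""
    votes = {}
    for v in neighbors(user, likes):
        for item in likes[v] - likes[user]:
            votes[item] = votes.get(item, 0) + 1
    # Most votes first; ties broken alphabetically for a stable order.
    return sorted(votes, key=lambda item: (-votes[item], item))

print(recommend("A", likes))
```

Item c collects a vote from both B and C, while b only collects one from B, so c ranks first for user A.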

There are two major problems with the User-based algorithm:

1. Data sparsity. A large e-commerce recommendation system generally has a very large number of items, of which a user may have bought less than 1%. The items bought by different users overlap very little, so the algorithm may fail to find a user's neighbors, that is, users with similar preferences.

2. Scalability. The computational cost of the nearest-neighbor algorithm grows with the number of users and items, so it is not suitable for large amounts of data.

The basic idea of Item-based CF is to calculate the similarity between items in advance from the historical preference data of all users, and then recommend items similar to the ones a user likes. In the previous example, items a and c are very similar, because users who like a also like c. User A likes a, so c is recommended to user A.

Because the similarity between items is relatively stable, the similarities between different items can be calculated offline in advance and stored in a table. At recommendation time, the algorithm looks up the table and calculates the user's likely scores, which addresses both of the problems above.
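The offline table and the online lookup might look like the sketch below. It assumes binary "likes" data from the A/B/C example and uses cosine similarity on those binary vectors; the function names `build_similarity_table` and `similar_items` are hypothetical.

```python
from itertools import combinations
from math import sqrt

likes = {"A": {"a"}, "B": {"a", "b", "c"}, "C": {"a", "c"}}

def build_similarity_table(likes):
    """Offline step: cosine similarity between binary 'liked-by' item vectors."""
    liked_by = {}
    for user, items in likes.items():
        for item in items:
            liked_by.setdefault(item, set()).add(user)
    table = {}
    for i, j in combinations(sorted(liked_by), 2):
        common = len(liked_by[i] & liked_by[j])
        sim = common / sqrt(len(liked_by[i]) * len(liked_by[j]))
        table[(i, j)] = table[(j, i)] = sim  # similarity is symmetric
    return table

def similar_items(item, table, k=1):
    """Online step: a plain table lookup, no similarity is recomputed."""
    scored = [(sim, j) for (i, j), sim in table.items() if i == item]
    return [j for sim, j in sorted(scored, reverse=True)[:k]]

table = build_similarity_table(likes)
print(similar_items("a", table))
```

With this data, c is the item most similar to a, which matches the recommendation for user A described above.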

III. The detailed process of the Item-based algorithm

(1) Similarity calculation

The Item-based algorithm first calculates the similarity between items. There are several ways to calculate it:

1. Cosine-based similarity. The similarity between two items is the cosine of the angle between their score vectors. The formula is as follows:

$$\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert_2 \, \lVert \vec{j} \rVert_2}$$

The numerator is the inner product of the two vectors, that is, the numbers in the same position of the two vectors are multiplied and the products are summed.
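As a rough illustration of cosine-based similarity, the sketch below treats each item as a list of scores, one per user. The sample vectors and the zero-vector convention are assumptions made for this example.

```python
from math import sqrt

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(x, y))  # inner product
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an item nobody scored is similar to nothing
    return dot / (norm_x * norm_y)

# Scores from three users for two hypothetical items.
item_i = [5, 4, 0]
item_j = [3, 0, 2]
print(cosine_similarity(item_i, item_j))
```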

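2. Correlation-based similarity. The Pearson-r correlation between the two score vectors is calculated as follows:

$$\mathrm{sim}(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^2}}$$

where $R_{u,i}$ is the score given by user $u$ to item $i$, $\bar{R}_i$ is the average score of item $i$, and $U$ is the set of users who scored both items.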
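A minimal sketch of the correlation-based similarity follows. It assumes each item's scores are stored as a dict from user to score, and it takes the item averages over the co-scoring users only; that averaging choice is an assumption of this example.

```python
from math import sqrt

def pearson_similarity(scores_i, scores_j):
    """scores_i, scores_j: dicts mapping user -> score for items i and j."""
    users = sorted(set(scores_i) & set(scores_j))  # users who scored both items
    if len(users) < 2:
        return 0.0  # not enough overlap to measure a correlation
    mean_i = sum(scores_i[u] for u in users) / len(users)
    mean_j = sum(scores_j[u] for u in users) / len(users)
    num = sum((scores_i[u] - mean_i) * (scores_j[u] - mean_j) for u in users)
    den_i = sqrt(sum((scores_i[u] - mean_i) ** 2 for u in users))
    den_j = sqrt(sum((scores_j[u] - mean_j) ** 2 for u in users))
    if den_i == 0 or den_j == 0:
        return 0.0  # a constant score vector has no defined correlation
    return num / (den_i * den_j)

scores_i = {"u1": 1, "u2": 2, "u3": 3}
scores_j = {"u1": 2, "u2": 4, "u3": 6, "u4": 5}  # u4 did not score item i
print(pearson_similarity(scores_i, scores_j))
```

User u4 is ignored because there is no score for item i to pair with.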

3. Adjusted cosine similarity. Plain cosine similarity does not take the scoring habits of different users into account: some users tend to give high scores, while others tend to give low scores. This method removes that effect by subtracting each user's average score from that user's scores. The formula is as follows:

$$\mathrm{sim}(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$

where $\bar{R}_u$ is the average of the scores given by user $u$.
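The adjustment can be sketched as below. The nested-dict layout (user -> item -> score), the user names, and the sample scores are assumptions for illustration; each user's average is taken over all items that user scored.

```python
from math import sqrt

def adjusted_cosine(ratings, i, j):
    """ratings: dict user -> dict item -> score; i, j: item ids."""
    num = den_i = den_j = 0.0
    for user_scores in ratings.values():
        if i in user_scores and j in user_scores:
            # Subtract this user's own average to cancel the scoring habit.
            mean_u = sum(user_scores.values()) / len(user_scores)
            di = user_scores[i] - mean_u
            dj = user_scores[j] - mean_u
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    den = sqrt(den_i) * sqrt(den_j)
    return num / den if den else 0.0

ratings = {
    "generous": {"x": 5, "y": 4, "z": 3},  # tends to score high
    "strict": {"x": 3, "y": 2, "z": 1},    # tends to score low
}
print(adjusted_cosine(ratings, "x", "z"))
```

In this made-up data both users place x above their own average and z below it, so the adjusted similarity of x and z comes out close to -1 even though all raw scores are positive.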

(2) Calculation of the predicted value

Given the similarities between items calculated in the previous step, there are two ways to predict the score of an item that the user has not scored:

1. Weighted sum.

The prediction takes the items that user u has already scored, weights each score by the similarity between that item and item i, sums the weighted scores, and divides the result by the sum of the similarities. This gives the predicted score of user u for item i:

$$P_{u,i} = \frac{\sum_{N} s_{i,N} \, R_{u,N}}{\sum_{N} \lvert s_{i,N} \rvert}$$

where $s_{i,N}$ is the similarity between item $i$ and item $N$, and $R_{u,N}$ is the score of item $N$ by user $u$.
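The weighted sum can be written as a short function. The dict layout and the sample similarities are assumptions for this sketch; the similarities would normally come from one of the methods in the previous step.

```python
def predict_weighted_sum(user_scores, similarities):
    """Predict user u's score for item i.

    user_scores:  dict item N -> score that user u gave to N
    similarities: dict item N -> similarity between item i and item N
    """
    num = den = 0.0
    for item, score in user_scores.items():
        sim = similarities.get(item, 0.0)  # unknown similarity counts as 0
        num += sim * score
        den += abs(sim)
    return num / den if den else 0.0

# User u scored items b and c; item i is more similar to b than to c.
user_u = {"b": 4, "c": 2}
sim_to_i = {"b": 0.75, "c": 0.25}
print(predict_weighted_sum(user_u, sim_to_i))  # 3.5
```

The prediction lands closer to the score of b (4) than to the score of c (2) because b carries three times the weight.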

2. Regression.

This method is similar to the weighted sum above, but it does not use the raw scores of the similar items N directly. Similarity calculated with the cosine or Pearson method has a pitfall: two score vectors may be far apart in Euclidean distance and still have high similarity. Different users have different scoring habits; some tend to score high and some tend to score low. If two users both like the same item, the Euclidean distance between their scores may be large because of those habits, yet their similarity should be high. In that case, using the user's raw scores for similar items gives a poor prediction. Instead, a new value is estimated by linear regression and then used for prediction in the same way as above. The re-estimated value is calculated as follows:

$$R'_N = \alpha R_i + \beta + \varepsilon$$

Here item N is similar to item i, $\alpha$ and $\beta$ are obtained by linear regression on the score vectors $R_N$ and $R_i$ of items N and i, and $\varepsilon$ is the error of the regression model. The article does not explain how to carry out the linear regression, so other relevant literature must be consulted.
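Since the text leaves the regression step open, the following is only one possible reading: ordinary least squares fitted on the scores of items i and N from users who scored both. The function name `fit_line` and the sample data are hypothetical, and how the fitted values feed into the final prediction is not specified by the source.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ alpha * x + beta (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return 0.0, mean_y  # x is constant: fall back to the mean of y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    return alpha, beta

# Scores of items i and N from the same four users (made-up numbers).
scores_i = [1, 2, 3, 4]
scores_N = [3, 5, 7, 9]

alpha, beta = fit_line(scores_i, scores_N)
# Re-estimated scores for item N, without the error term.
estimated_N = [alpha * r + beta for r in scores_i]
print(alpha, beta, estimated_N)
```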
