
How to Use PyTorch Matrix Factorization for Anime Recommendations


This article shows you how to use matrix factorization in PyTorch for anime recommendations. The content is concise and easy to follow, and I hope the detailed walkthrough below gives you something useful to take away.

We encounter many recommendations every day: deciding what to watch on Netflix/YouTube, product recommendations on shopping sites, song recommendations on Spotify, friend recommendations on Instagram, job recommendations on LinkedIn. The list goes on! The purpose of a recommendation system is to predict users' "ratings" or "preferences" for items. These ratings are used to determine what users might like and to make sensible recommendations.

There are two main types of recommendation systems:

Content-based systems: these systems try to match users with items based on the item's content (genre, color, etc.) and the user's profile (likes, dislikes, demographics, etc.). For example, YouTube might recommend cooking videos to me because it knows I am a chef and/or I have watched many baking videos in the past, making use of the information it has about the video content and about my profile.

Collaborative filtering: these systems rely on the assumption that similar users like similar items. A measure of similarity between users and/or items is used to make recommendations.

Below we discuss a very popular collaborative filtering technique: matrix factorization.

Matrix factorization

A recommendation system has two entities: users and items (items in a broad sense; they can be actual products for sale, videos, articles, etc.). Suppose there are m users and n items. The goal of our recommendation system is to fill in an m×n matrix (called the utility matrix) that holds the rating (or preference) for each user-item pair. Initially this matrix is usually very sparse, because we have ratings for only a limited number of user-item pairs.

Here is an example. Suppose we have four users and five superheroes, and we try to predict what rating each user would give each superhero. This is what our rating matrix looks like at first:

(Figure: the initial 4×5 rating matrix for the superheroes)

Now our goal is to fill in this matrix by finding similarities between users and items. For example, we see that User3 and User4 gave Batman the same rating, so we can assume the two users are similar and will feel the same way about Spider-Man, and predict that User3 will give Spider-Man a rating of 4. In practice, however, it is not that simple, because multiple users interact with many different items.

In practice, the rating matrix is filled in by factorizing it into two tall and skinny matrices: an m×K user matrix U and an n×K item matrix V, chosen so that R ≈ U·Vᵀ. The predicted rating for a user-item pair is then the dot product of the corresponding user vector and item vector.

(Figure: the factorization, with numbers chosen at random for illustration)
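To make this concrete, here is a minimal numpy sketch of the idea, matching the 4×5 superhero example above (the factor count K and the random values are purely illustrative):

import numpy as np

K = 3                                  # number of latent factors (illustrative)
emb_user = np.random.random((4, K))    # one K-dimensional vector per user
emb_hero = np.random.random((5, K))    # one K-dimensional vector per superhero

# The full predicted rating matrix is the product of the two thin matrices.
predicted = emb_user @ emb_hero.T      # shape (4, 5)

# A single user-item prediction is just a dot product.
print(np.isclose(predicted[2, 0], np.dot(emb_user[2], emb_hero[0])))  # True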

PyTorch implementation

To implement matrix factorization, we create embedding matrices for users and items (PyTorch's embedding layer provides exactly this kind of lookup table) and use gradient descent to find the optimal factorization.
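The training code below is actually written with numpy, but the same model can be expressed directly with PyTorch's nn.Embedding; here is a minimal sketch of that formulation (the class name, K, and the hyperparameters are my own choices, not the article's):

import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, num_users, num_items, K=50):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, K)  # one K-dim vector per user
        self.item_emb = nn.Embedding(num_items, K)  # one K-dim vector per item

    def forward(self, user_ids, item_ids):
        # The predicted rating is the dot product of user and item embeddings.
        return (self.user_emb(user_ids) * self.item_emb(item_ids)).sum(dim=1)

# Training would minimize MSE with SGD, for example:
# model = MatrixFactorization(num_users, num_anime)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# loss = nn.functional.mse_loss(model(users, items), ratings)
# loss.backward(); optimizer.step()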

Data set

I used the anime recommendations dataset from Kaggle:

https://www.kaggle.com/CooperUnion/anime-recommendations-database

We have 69,600 users and 9,927 anime, with 6,337,241 ratings provided.

Goal

Given a set of users' ratings for anime, predict the rating for each user-anime pair.

Data exploration

We see many rows with a rating of -1, which indicates a missing rating; we can drop these rows.

import pandas as pd

anime_ratings_df = pd.read_csv("rating.csv")
anime_ratings_df.shape
print(anime_ratings_df.head())

# Drop the rows with a rating of -1 (missing ratings).
anime_ratings = anime_ratings_df[anime_ratings_df.rating != -1]

We can also look at the distribution of ratings and the number of ratings per user.

from collections import Counter

Counter(anime_ratings.rating)

Average number of ratings per user:

import numpy as np

np.mean(anime_ratings.groupby(['user_id']).count()['anime_id'])

Output: 91.05231321839081
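The preprocessing and training code below refers to train_df and valid_df, which the article never shows being created. A simple split consistent with the rest of the code (the use of scikit-learn and the split ratio are my assumptions) might be:

from sklearn.model_selection import train_test_split

# Hold out 20% of the filtered ratings for validation.
train_df, valid_df = train_test_split(anime_ratings, test_size=0.2, random_state=42)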

Data preprocessing

Because we will use embedding matrices for users and items, we need contiguous IDs to index into the embedding matrices and access each user/item embedding.

def encode_column(column):
    """Encodes a pandas column with contiguous IDs."""
    keys = column.unique()
    key_to_id = {key: idx for idx, key in enumerate(keys)}
    return key_to_id, np.array([key_to_id[x] for x in column]), len(keys)
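The next snippet calls encode_df, which the article never shows. A minimal sketch consistent with encode_column and with the five values unpacked below could be:

def encode_df(anime_df):
    """Encodes the rating data with contiguous user and anime IDs."""
    anime_ids, anime_df['anime_id'], num_anime = encode_column(anime_df['anime_id'])
    user_ids, anime_df['user_id'], num_users = encode_column(anime_df['user_id'])
    return anime_df, num_users, num_anime, user_ids, anime_ids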

anime_df, num_users, num_anime, user_ids, anime_ids = encode_df(train_df)
print("Number of users:", num_users)
print("Number of anime:", num_anime)
anime_df.head()

Training

Our goal is to find the best embedding vector for each user and each item. We can then predict the rating for any user-item pair by taking the dot product of the user embedding and the item embedding.

Cost function: our goal is to minimize the mean squared error over the rating matrix, MSE = (1/N) · Σ (Y_ui − Ŷ_ui)², where N is the number of non-blank elements in the rating matrix, Y_ui is the actual rating, and Ŷ_ui is the predicted rating (the dot product of the user and item embeddings).

def cost(df, emb_user, emb_anime):
    """Computes the mean squared error."""
    Y = create_sparse_matrix(df, emb_user.shape[0], emb_anime.shape[0])
    predicted = create_sparse_matrix(predict(df, emb_user, emb_anime),
                                     emb_user.shape[0], emb_anime.shape[0], 'prediction')
    return np.sum((Y - predicted).power(2)) / df.shape[0]

Prediction

def predict(df, emb_user, emb_anime):
    """Computes df["prediction"] without forming the dense matrix U @ V.T.

    Uses elementwise multiplication of the corresponding embeddings followed by
    a sum to get each prediction u_i . v_j. This avoids creating the dense
    matrix U @ V.T.
    """
    df['prediction'] = np.sum(np.multiply(emb_anime[df['anime_id']],
                                          emb_user[df['user_id']]), axis=1)
    return df

Initialization of user and item vectors

There are many ways to initialize the embedding weights, and there is no single right answer. For example, fastai uses something called a truncated normal initializer. In my implementation I just initialized the embeddings with uniform values in (0, 11/K) (random initialization worked well in my case!), where K is the number of factors in the embedding matrix. K is a hyperparameter, usually determined empirically: it should not be too small, because you want the embeddings to learn enough features, but you also don't want it too large, because the model then starts to overfit the training data and the computation time increases.

def create_embeddings(n, K):
    """
    Creates a random numpy matrix of shape (n, K) with uniform values in (0, 11/K).
    n: number of items/users
    K: number of factors in the embedding
    """
    return 11 * np.random.random((n, K)) / K

Create a sparse utility matrix: since our cost function requires the utility matrix, we need a function to create it.

from scipy import sparse

def create_sparse_matrix(df, rows, cols, column_name="rating"):
    """Returns a sparse utility matrix."""
    return sparse.csc_matrix((df[column_name].values,
                              (df['user_id'].values, df['anime_id'].values)),
                             shape=(rows, cols))

Gradient descent

The gradient descent update is: emb_user ← emb_user − learning_rate · grad_user, and likewise emb_anime ← emb_anime − learning_rate · grad_anime, where the gradients are those of the cost function with respect to each embedding matrix.

I used momentum in the implementation, which helps accelerate gradient descent in the relevant direction and dampens oscillations, thereby speeding up convergence. I also added regularization to make sure my model does not overfit the training data. The gradient descent updates in my code are therefore slightly more complicated than the equation above.

The regularized cost function adds an L2 penalty on the embedding weights, scaled by a regularization factor lambda, to the MSE above.

def gradient_descent(df, emb_user, emb_anime, iterations=2000, learning_rate=0.01, df_val=None):
    """
    Runs gradient descent with momentum (0.9) for the given number of iterations.
    emb_user: the trained user embeddings
    emb_anime: the trained anime embeddings
    """
    Y = create_sparse_matrix(df, emb_user.shape[0], emb_anime.shape[0])
    beta = 0.9
    grad_user, grad_anime = gradient(df, emb_user, emb_anime)
    v_user = grad_user
    v_anime = grad_anime
    for i in range(iterations):
        grad_user, grad_anime = gradient(df, emb_user, emb_anime)
        # Momentum: exponentially weighted moving average of the gradients.
        v_user = beta * v_user + (1 - beta) * grad_user
        v_anime = beta * v_anime + (1 - beta) * grad_anime
        emb_user = emb_user - learning_rate * v_user
        emb_anime = emb_anime - learning_rate * v_anime
        if not (i + 1) % 50:
            print("\niteration", i + 1, ":")
            print("train mse:", cost(df, emb_user, emb_anime))
            if df_val is not None:
                print("validation mse:", cost(df_val, emb_user, emb_anime))
    return emb_user, emb_anime
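Note that the loop above calls a gradient function that is not shown in the article. A minimal sketch consistent with the regularized cost described earlier, where lmbda (the regularization factor) and the exact gradient form are my assumptions, could be:

lmbda = 0.0002  # assumed regularization factor; tune on validation data

def gradient(df, emb_user, emb_anime):
    """Computes gradients of the regularized MSE w.r.t. both embedding matrices."""
    Y = create_sparse_matrix(df, emb_user.shape[0], emb_anime.shape[0])
    predicted = create_sparse_matrix(predict(df, emb_user, emb_anime),
                                     emb_user.shape[0], emb_anime.shape[0], 'prediction')
    delta = Y - predicted  # sparse matrix of residuals on rated pairs only
    grad_user = (-2 / df.shape[0]) * (delta @ emb_anime) + 2 * lmbda * emb_user
    grad_anime = (-2 / df.shape[0]) * (delta.T @ emb_user) + 2 * lmbda * emb_anime
    return grad_user, grad_anime

Training could then be kicked off along these lines (K and the hyperparameter values are assumptions):

K = 50
emb_user, emb_anime = gradient_descent(anime_df,
                                       create_embeddings(num_users, K),
                                       create_embeddings(num_anime, K),
                                       iterations=2000, learning_rate=0.01)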

Prediction on validation set

Because we cannot predict users and anime we never encountered in the training set (the cold start problem), we need to remove them from the unseen (validation) dataset.

def encode_new_data(valid_df, user_ids, anime_ids):
    """Encodes valid_df with the same encoding as train_df."""
    df_val_chosen = (valid_df['anime_id'].isin(anime_ids.keys())
                     & valid_df['user_id'].isin(user_ids.keys()))
    valid_df = valid_df[df_val_chosen]
    valid_df['anime_id'] = np.array([anime_ids[x] for x in valid_df['anime_id']])
    valid_df['user_id'] = np.array([user_ids[x] for x in valid_df['user_id']])
    return valid_df
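A plausible call, using the encodings returned by encode_df above (the variable names are assumptions), would be:

# Must run before evaluating cost() on the validation set.
valid_df = encode_new_data(valid_df, user_ids, anime_ids)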

Our model slightly overfits the training data, so we could increase the regularization factor (lambda) to help it generalize better.

train_mse = cost(train_df, emb_user, emb_anime)
val_mse = cost(valid_df, emb_user, emb_anime)
print(train_mse, val_mse)

Output: 6.025304207874527 11.735503902293352

Let's take a look at the predictions:

valid_df[70:80].head()

Given that these predictions are based only on similarities between user behaviors, an RMSE of about 3.4 (√11.74 ≈ 3.43) on a 1-10 rating scale is quite good. It shows how powerful matrix factorization can be, even though it is so simple.

Limitations of matrix factorization

Matrix factorization is a very simple and convenient method. However, it has its drawbacks, one of which we already encountered in our implementation:

Cold start problem

We cannot make predictions for items and users we never encountered in the training data, because we have no embeddings for them.

The cold start problem can be tackled in many ways, including recommending popular items, having users rate a few items up front, and using a content-based approach until we have enough data to use collaborative filtering.

It is difficult to include additional context about users/items

We used only user IDs and item IDs to create the embeddings; this implementation cannot make use of any other information about users and items. Hybrid models that combine content-based and collaborative filtering can be used to address this problem.

Ratings are not always available

It is hard to get explicit feedback from users. Most users rate something only when they really love it or absolutely hate it. In such cases, we usually have to devise a way to measure implicit feedback and use negative sampling techniques to build a reasonable training set.

The above is how to use PyTorch matrix factorization for anime recommendation. I hope you have picked up some new knowledge or skills from it.
