This article explains how to use TensorFlow and Keras for recommendation systems. The content is simple, clear, and easy to learn and understand.
Dataset: MovieLens 20m
For this analysis, we will use the famous MovieLens 20m dataset.
This dataset contains more than 20 million ratings from the movie recommendation service MovieLens. Below is an example of the dataframe.
The dataset covers about 138,000 users and more than 27,000 movies. After cleaning and filtering (we keep only positive ratings), we have:
136,000 users
20,000 movies
10 million interactions
99.64% sparsity
We can also see from the histogram below that most movies receive fewer than 5,000 ratings,
and most users rate fewer than 500 movies.
This is consistent with most recommendation-system problems: few users rate many movies, and few movies receive many ratings.
Training data set
We can build a click matrix from this data, in the format shown below: if user u interacted with item i, the cell at row u and column i contains 1; otherwise it contains 0.
We also define the click vector x_u as the u-th row vector of the click matrix.
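As a rough sketch (assuming a pandas dataframe of positive interactions with userId and movieId columns, as in MovieLens; the file name here is hypothetical), the click matrix can be built as a sparse matrix:

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# ratings: dataframe with columns userId, movieId (positive interactions only)
ratings = pd.read_csv("ratings_filtered.csv")

# Map raw ids to contiguous indices
user_index = {u: i for i, u in enumerate(ratings["userId"].unique())}
item_index = {m: j for j, m in enumerate(ratings["movieId"].unique())}

rows = ratings["userId"].map(user_index).to_numpy()
cols = ratings["movieId"].map(item_index).to_numpy()
data = np.ones(len(ratings), dtype=np.float32)

# Click matrix: 1 where user u interacted with item i, 0 elsewhere
click_matrix = csr_matrix((data, (rows, cols)),
                          shape=(len(user_index), len(item_index)))

# Click vector of user u: the u-th row of the click matrix
x_u = click_matrix[0].toarray().ravel()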
Training, validation and test datasets
To evaluate the quality of the models, we divide the dataset into three subsets: one for training, one for validation and one for testing. The first subset is used to train the model, the second to select the best model during training, and the last to compute the metrics.
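A minimal sketch of such a split, reusing the click_matrix built above and splitting by user; the 80/10/10 proportions are an assumption, not values stated in the article:

import numpy as np

rng = np.random.default_rng(42)
n_users = click_matrix.shape[0]          # click_matrix from the sketch above
perm = rng.permutation(n_users)

# Assumed proportions: 80% train, 10% validation, 10% test
n_train = int(0.8 * n_users)
n_val = int(0.1 * n_users)

X_train = click_matrix[perm[:n_train]]
X_val = click_matrix[perm[n_train:n_train + n_val]]
X_test = click_matrix[perm[n_train + n_val:]]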
Metrics: NDCG and Personalization
NDCG
As mentioned earlier, we will use two metrics to evaluate our models. The first is NDCG, which measures the quality and the ordering of our recommendations. We first need to define DCG: the higher the DCG, the better. DCG@p is defined as:
DCG@p = Σ_{i=1}^{p} 1(elem_i is a target item) / log₂(i + 1)
where 1(·) is the indicator function and elem_i is the i-th element of the recommendation list. To illustrate this abstract formula, here is a short example:
Target items: {A, B, C}
Recommendation 1: [C, A, D] - DCG@3 = 1.63
Recommendation 2: [D, B, A] - DCG@3 = 1.13
Note that these recommendations are ordered. We therefore have DCG@3 of recommendation 1 > DCG@3 of recommendation 2, because the first two items of recommendation 1 are target items, while in recommendation 2 those items appear at the end of the list.
NDCG, a close relative of DCG, projects the score into the interval [0, 1] so that it is comparable across models.
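A small sketch of these metrics, following the definition above; it reproduces the DCG@3 values of the example:

import numpy as np

def dcg_at_p(recommended, relevant, p):
    # DCG@p = sum over the first p recommendations of 1(elem_i is a target) / log2(i + 1)
    return sum(1.0 / np.log2(i + 1)
               for i, elem in enumerate(recommended[:p], start=1)
               if elem in relevant)

def ndcg_at_p(recommended, relevant, p):
    # Normalize by the ideal DCG (all top positions filled with target items)
    ideal = sum(1.0 / np.log2(i + 1)
                for i in range(1, min(p, len(relevant)) + 1))
    return dcg_at_p(recommended, relevant, p) / ideal

relevant = {"A", "B", "C"}
print(round(dcg_at_p(["C", "A", "D"], relevant, 3), 2))   # 1.63
print(round(dcg_at_p(["D", "B", "A"], relevant, 3), 2))   # 1.13
print(round(ndcg_at_p(["C", "A", "D"], relevant, 3), 2))  # normalized score in [0, 1]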
Personalization (Personalization Index)
The personalization index calculates the distance between each pair of recommendation lists and then takes the average. To compare different personalization indices, we normalize them (as with NDCG, we project the score into [0, 1]). To illustrate this metric, let's look at the following example:
Recommendation 1:
User 1: [A, B, C] / User 2: [D, E, F] → Personalization = 1
Recommendation 2:
User 1: [A, B, C] / User 2: [A, B, C] → Personalization = 0
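A sketch of the personalization index, assuming cosine distance between the users' binary recommendation vectors (the article does not state the exact distance used); it reproduces the two examples above:

import numpy as np
from itertools import combinations

def personalization(recommendations, n_items):
    # Encode each user's recommendation list as a binary vector over the catalogue
    vectors = np.zeros((len(recommendations), n_items))
    for u, items in enumerate(recommendations):
        vectors[u, items] = 1.0

    # Average cosine distance over every pair of users' recommendation lists
    distances = []
    for a, b in combinations(range(len(recommendations)), 2):
        cos_sim = vectors[a] @ vectors[b] / (np.linalg.norm(vectors[a]) * np.linalg.norm(vectors[b]))
        distances.append(1.0 - cos_sim)
    return float(np.mean(distances))

# Items A..F mapped to indices 0..5
print(personalization([[0, 1, 2], [3, 4, 5]], 6))  # 1.0 (disjoint lists)
print(personalization([[0, 1, 2], [0, 1, 2]], 6))  # 0.0 (identical lists)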
Collaborative and content-based filtering
Recommendation systems can be divided into two categories: collaborative filtering and content-based filtering.
Collaborative filtering
Collaborative filtering is a subfamily of RS based on user similarity. It predicts the interests of user u by analyzing the tastes of users who are close to u: items liked by similar users are recommended.
Content-based filtering
Content-based filtering is another type of RS, based on user preferences and content similarity. The idea is that if you like item i, you are more likely to like items similar to i than items different from it.
Content-based definition
As mentioned above, the content-based approach uses item descriptions to find the items closest to those the user has already seen. I implemented this method as thoroughly as possible, but the small number of features in the dataset is a limitation: the MovieLens dataset only provides the genres of the movies.
However, we developed a simple method, described in the following pseudo code:
reco = zero vector of size #items
for i in items of user u:
    for j in the k closest items to i:
        reco[j] = max(reco[j], 1 - dist(i, j))
output the recommendation vector reco
For dist(i, j), we use the cosine distance between the genre vectors.
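A minimal sketch of this pseudo code, assuming a binary genre matrix (#items × #genres) and sklearn's cosine distance; the per-query cost is exactly the O(#items × #features) discussed in the disadvantages below:

import numpy as np
from sklearn.metrics.pairwise import cosine_distances

def content_based_reco(user_items, genre_matrix, k=10):
    # genre_matrix: (#items, #genres) binary matrix of movie genres
    n_items = genre_matrix.shape[0]
    reco = np.zeros(n_items)
    # Distances from each item the user interacted with to every item
    dist = cosine_distances(genre_matrix[user_items], genre_matrix)
    for row, i in enumerate(user_items):
        closest = np.argsort(dist[row])[:k + 1]     # k closest items to i (plus i itself)
        for j in closest:
            if j == i:
                continue
            reco[j] = max(reco[j], 1.0 - dist[row, j])
    reco[user_items] = 0.0                           # do not re-recommend items already seen
    return np.argsort(-reco)                         # item indices sorted by score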
Result
NDCG@100: 0.011
Personalization: 0.958
The NDCG is very low because the number of features per sample is very limited.
Advantages
No cold start: one of the common problems in recommendation systems (RS) is the cold start. It occurs when a new item or a new user is added: since there is no previous activity to infer from, it is hard to make good recommendations. In our case, the number of interactions with an item does not affect the likelihood that it will be recommended, which means we do not have a cold-start problem for new items.
Simple to implement: as the pseudo code above shows, the algorithm is quite simple.
Disadvantages
The query time is O(#items × #features), where # denotes "number of". We must be careful about the size of the data: without preprocessing, every time the system is asked to recommend new content to a user, it must find the k items closest to each item the user has interacted with. Since there are #items to compare and each distance is computed over #features, the whole process costs O(#items × #features). With preprocessing we can cut down the query time, but we then need to store the k closest items for each item, i.e. k × #items entries in memory.
Only works if items have enough features: as the results show, this approach does not work well when items have too few features. For example, with a plot description for each movie, we would get better results.
Memory-based definition
Memory-based recommendation is a simple method that computes similarities between users or items. Unlike model-based approaches, memory-based recommendation has no parameters to optimize. It is a very simple algorithm that can be summarized in a few lines of pseudo code:
input: user u
find the k users closest to u using the dist function
aggregate the click vectors of the k nearest users into a new vector v_u
output v_u as the recommendation vector
In our example, we implement the algorithm as follows.
For the distance function, we use the Hamming distance.
For the aggregation, we combine the click vectors of the k nearest users into the recommendation vector.
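A rough sketch of this memory-based approach on a dense click matrix; the exact aggregation formula is not recoverable from the article, so a plain sum of the neighbours' click vectors is assumed here:

import numpy as np

def memory_based_reco(u, click_matrix, k=10):
    # click_matrix: dense binary (#users, #items) array; u: user index
    x_u = click_matrix[u]

    # Hamming distance = number of positions where two click vectors differ
    hamming = np.sum(click_matrix != x_u, axis=1)
    hamming[u] = click_matrix.shape[1] + 1        # exclude the user itself

    nearest = np.argsort(hamming)[:k]             # the k closest users

    # Assumed aggregation: sum the neighbours' click vectors, then mask seen items
    v_u = click_matrix[nearest].sum(axis=0).astype(float)
    v_u[x_u == 1] = 0.0
    return np.argsort(-v_u)                       # item indices sorted by score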
Result
NDCG@100: 0.173
Personalization: 0.715
Advantages
Simple to implement: as shown above, with a small amount of pseudocode, the algorithm is quite simple and easy to implement.
Explainable: this is an important property of some algorithms. It allows you to explain to users why specific content was recommended to them: we recommend movie A because you watched movie B.
Disadvantages
Complexity: the main problem with this approach is that it is hard to scale. Our best friends here are locality-sensitive hashing (LSH) and approximate nearest-neighbour search algorithms.
Query time is O(#users × #items): without preprocessing, the query time is high for each user, because computing the distance to one other user costs O(#items) and we need the distances to all other users before finding the k closest. With preprocessing we can cut down the query time, but we then need to store the k closest users for each user, i.e. k × #users entries in memory.
Non-negative matrix factorization definition
Non-negative matrix factorization (NMF) is a famous recommendation-system algorithm that rose to prominence during the Netflix Prize competition.
The idea of NMF is to decompose the click matrix into two low-rank rectangular matrices, one for the users and one for the items, whose rows are embeddings of a chosen dimension (the latent space). Multiplying the two matrices gives back a matrix whose values are close to the original click matrix where values exist, with all the gaps filled in with (hopefully) good predictions.
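A minimal sketch with sklearn's NMF on the click matrix built earlier; the latent dimension of 64 is an arbitrary choice, not a value from the article:

from sklearn.decomposition import NMF

d = 64                                   # latent-space size (assumed)
model = NMF(n_components=d, init="nndsvd", max_iter=200)

user_factors = model.fit_transform(click_matrix)   # (#users, d)
item_factors = model.components_                    # (d, #items)

# Predicted scores for user 0: a single vector-matrix product
scores_u = user_factors[0] @ item_factors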
Result
NDCG@100: 0.315
Personalization: 0.800
Advantages
Simple to implement: libraries such as Surprise or sklearn provide matrix factorization out of the box!
Potentially explainable: by clustering the latent factors and analyzing them (finding common actors, genres, etc.), it is technically possible to obtain explainable results.
Fast query time: to get recommendations for a user, we only need to multiply a vector by a matrix.
Disadvantages
Linear model: one of the main limitations of matrix factorization is that it is a linear model, so it cannot capture more complex relationships in the data. Even though it is linear, we see that it gives good results in terms of NDCG.
Neural matrix factorization definition
Neural matrix factorization (NeuMF) is a recent attempt to generalize the classical NMF described above; it was introduced in the Neural Collaborative Filtering paper. The model takes two integers (two indices), the item i and the user u, as inputs and outputs a number between 0 and 1, which indicates the probability that user u is interested in item i.
The architecture of the neural network can be divided into two parts: a matrix factorization part and a fully connected part. Their outputs are then concatenated and passed to a sigmoid layer.
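A hedged Keras sketch of a NeuMF-style architecture; the embedding sizes and layer widths are assumptions, not the values used in the article:

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_neumf(n_users, n_items, mf_dim=16, mlp_dims=(64, 32, 16)):
    user_in = layers.Input(shape=(1,), dtype="int32")
    item_in = layers.Input(shape=(1,), dtype="int32")

    # Matrix-factorization part: element-wise product of user and item embeddings
    mf_u = layers.Flatten()(layers.Embedding(n_users, mf_dim)(user_in))
    mf_i = layers.Flatten()(layers.Embedding(n_items, mf_dim)(item_in))
    mf_vec = layers.Multiply()([mf_u, mf_i])

    # Fully connected part: concatenated embeddings through dense layers
    mlp_u = layers.Flatten()(layers.Embedding(n_users, mlp_dims[0] // 2)(user_in))
    mlp_i = layers.Flatten()(layers.Embedding(n_items, mlp_dims[0] // 2)(item_in))
    mlp_vec = layers.Concatenate()([mlp_u, mlp_i])
    for units in mlp_dims:
        mlp_vec = layers.Dense(units, activation="relu")(mlp_vec)

    # Concatenate both parts and pass through a sigmoid layer
    out = layers.Dense(1, activation="sigmoid")(layers.Concatenate()([mf_vec, mlp_vec]))

    model = Model([user_in, item_in], out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model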
Result
NDCG@100: 0.173
Personalization: 0.017
Below is the NDCG@100 over the epochs on the validation set. Although I tried many different regularization parameters, overfitting was unavoidable.
Advantages
Neural network (nonlinear model): one of the main advantages of NeuMF is that it is a nonlinear model, so it can capture more complex patterns in the data. However, we can see that its NDCG is lower than that of the regular NMF.
Disadvantages
Overfits on large datasets: in the original paper, NeuMF outperforms the NMF model, but on smaller datasets. We can infer that this method tends to overfit on larger datasets.
The query time is O(#items): one issue with this approach is that, for a given user, we need a forward pass for every item. This can become a scalability problem as the number of items grows.
Restricted Boltzmann machine definition
A restricted Boltzmann machine (RBM) is a generative stochastic neural network with a very simple architecture (one input layer and one hidden layer) that can be used to learn a probability distribution over its inputs, in our case the click vectors.
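A rough sketch using sklearn's BernoulliRBM as a stand-in (the article does not say which implementation was used); it learns from binary click vectors and scores items through the visible-unit probabilities:

import numpy as np
from scipy.special import expit
from sklearn.neural_network import BernoulliRBM

# The RBM learns a distribution over binary click vectors; training relies on
# persistent contrastive divergence, a form of Gibbs sampling
rbm = BernoulliRBM(n_components=100, learning_rate=0.01, n_iter=20, batch_size=64)
rbm.fit(X_train)                     # X_train: binary click matrix of the training users

# Scoring a user: hidden activations, then the visible-unit probabilities
x_u = X_val[0].toarray()
hidden = rbm.transform(x_u)                                         # P(h = 1 | v)
scores = expit(hidden @ rbm.components_ + rbm.intercept_visible_)   # P(v = 1 | h)
recommendation = np.argsort(-scores.ravel())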
Result
NDCG@100: 0.155
Personalization: 0.959
The figure below shows the NDCG@100 increasing with the epochs on the validation set.
Advantages
Neural network (nonlinear model): because RBM is a neural network, it is a nonlinear model, so it can capture more complex patterns in data.
Potentially explainable: the RBM learns complex features of the data, represented by the hidden layer. With some analysis (for example on actors), it is technically possible to explain the results.
Disadvantages
Long training time: training this model relies on a method called Gibbs sampling, which requires drawing a large number of samples and is computationally intensive.
Deep collaborative definition
Deep collaborative filtering is a straightforward collaborative model designed to predict the items most useful to a user. The input is the user's click vector and the output is the recommendation vector. To train this model, I used 70% of each user's click vector as input and the remaining clicks as the target.
The architecture is simple: an input and an output of the same size (#items), and several hidden layers of the same size (1000 neurons).
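A hedged Keras sketch of such a model; the 70/30 click split follows the description above, while the number of hidden layers and the other hyperparameters are assumptions:

import numpy as np
from tensorflow.keras import layers, Model

def build_deep_collab(n_items, hidden_size=1000, n_hidden=3):
    inputs = layers.Input(shape=(n_items,))
    x = inputs
    for _ in range(n_hidden):
        x = layers.Dense(hidden_size, activation="relu")(x)
    outputs = layers.Dense(n_items, activation="sigmoid")(x)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Keep ~70% of each user's clicks as input, the remaining 30% as the target
def split_clicks(x, keep=0.7, seed=0):
    mask = np.random.default_rng(seed).random(x.shape) < keep
    return x * mask, x * (~mask)

X = X_train.toarray()                 # dense click matrix of the training users
X_in, X_out = split_clicks(X)
model = build_deep_collab(X.shape[1])
model.fit(X_in, X_out, epochs=10, batch_size=128)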
Result
NDCG@100: 0.353
Personalization: 0.087
Below, as usual, the NDCG@100 over the epochs on the validation set:
Advantages
Neural network (nonlinear model): deep collaborative filtering is a nonlinear model, so it can capture more complex patterns in the data.
Fast query time: the main advantage of this model is that a single forward pass gives the recommendations for a given user, so the query time is short. The number of parameters grows with the number of items, but even so it remains faster than NeuMF.
Disadvantages
Not explainable: with this deep neural network it is not feasible to interpret the results.
Autoencoder definition
Autoencoders (AE) were originally used to learn a representation (encoding) of the data. They are made of two parts:
an encoder, which reduces the dimensionality of the data;
a decoder, which converts the encoding back to its original form. Because of the dimensionality reduction, the neural network has to learn a low-dimensional representation of the input (the latent space) in order to be able to reconstruct it.
In an RS context, they can be used to predict new recommendations. To do this, both the input and the output are the click vector (the input and output of an AE are usually the same), and we apply dropout after the input layer. The model therefore has to reconstruct clicks that were dropped from its input, and in doing so learns to predict recommendations from a given click vector.
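A minimal Keras sketch of this denoising autoencoder; the latent size and dropout rate are assumptions:

from tensorflow.keras import layers, Model

def build_autoencoder(n_items, latent_dim=200, dropout_rate=0.5):
    inputs = layers.Input(shape=(n_items,))
    # Dropout right after the input: the model must reconstruct the clicks it no longer sees
    x = layers.Dropout(dropout_rate)(inputs)
    encoded = layers.Dense(latent_dim, activation="relu")(x)         # encoder
    outputs = layers.Dense(n_items, activation="sigmoid")(encoded)   # decoder
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Input and target are the same click vectors
X = X_train.toarray()                 # dense click matrix of the training users
ae = build_autoencoder(X.shape[1])
ae.fit(X, X, epochs=10, batch_size=128,
       validation_data=(X_val.toarray(), X_val.toarray()))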
Result
NDCG@100: 0.382
Personalization: 0.154
Below, as usual, the NDCG@100 over the epochs on the validation set. Although we tried many different regularization parameters, it quickly overfitted.
Advantages
Neural network (nonlinear model): this model is a nonlinear model, which means that it can capture more complex patterns in the data.
Fast query time: a forward pass is enough to get a recommendation from a given user. This means that the query time is very fast.
Disadvantages
Not explainable: with this deep neural network it is not feasible to interpret the results.
Variational autoencoder definition
A variational autoencoder (VAE) is an extension of the AE. Instead of a simple fully connected layer, it has a sampling layer: this layer uses the mean and variance produced by the last layer of the encoder to draw a Gaussian sample, which is fed to the decoder. As with the AE, we apply dropout on the first layer.
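A hedged Keras sketch of the sampling layer and of a small VAE around it, using the reparameterization trick mentioned in the disadvantages below; all layer sizes are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, Model

class Sampling(layers.Layer):
    # Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, 1)
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        # KL divergence between the approximate posterior and N(0, 1), added to the loss
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        self.add_loss(kl)
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def build_vae(n_items, latent_dim=200, dropout_rate=0.5):
    inputs = layers.Input(shape=(n_items,))
    x = layers.Dropout(dropout_rate)(inputs)      # dropout on the first layer, as with the AE
    h = layers.Dense(600, activation="tanh")(x)   # encoder (sizes assumed)
    z_mean = layers.Dense(latent_dim)(h)
    z_log_var = layers.Dense(latent_dim)(h)
    z = Sampling()([z_mean, z_log_var])           # Gaussian sample fed to the decoder
    outputs = layers.Dense(n_items, activation="sigmoid")(z)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model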
Result
NDCG@100: 0.403
Personalization: 0.117
Below, as usual, the NDCG@100 over the epochs on the validation set:
Advantages
Neural network (nonlinear model): VAE is a nonlinear model, so it can capture more complex patterns in the data.
Fast query time: a forward pass is enough to get a recommendation from a given user. Therefore, the query time is very fast.
Disadvantages
More complex implementation: the sampling layer makes it difficult to compute the gradient with backpropagation. The reparameterization trick solves this problem by using the equation z = ε × σ + μ, with ε ~ N(0, 1); we can then safely compute the gradient.
Not explainable: with this deep neural network it is not feasible to interpret the results.
Hybrid definition
Hybrid models offer the best of both worlds (memory-based and model-based methods), which makes them very popular in RS.
To implement the hybrid approach, I chose to use the VAE and then average its results with the memory-based results.
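A small sketch of this blending step; min-max normalizing each score vector before averaging is an assumption (the article only says that the results are averaged):

import numpy as np

def hybrid_scores(vae_scores, memory_scores):
    # Normalize each score vector to [0, 1] before averaging,
    # so that neither method dominates the blend
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    return (norm(vae_scores) + norm(memory_scores)) / 2.0

# Recommendation = items sorted by the blended score, e.g.:
# reco_u = np.argsort(-hybrid_scores(vae_scores_u, memory_scores_u))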
Result
NDCG@100: 0.334
Personalization: 0.561
Advantages
Partly a neural network (nonlinear model): thanks to its VAE part, it can capture more complex patterns in the data.
Explainable: thanks to its memory-based part, we get the interesting property of being able to explain to users why a particular item is recommended to them.
Disadvantages
The query time is O(#users × #items): the bottleneck in computation time is the memory-based part. As shown above, its query time without preprocessing is O(#users × #items).
Comparison
We can now compare all our models. The best model in terms of NDCG@100 is the VAE. For the personalization index, it is the RBM.