What is the basic composition of the recommendation system?

Updated 2025-03-26. Source: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report.

This article introduces the basic composition of a recommendation system: the problem it solves, its overall architecture, the classic collaborative filtering algorithm, and a simple Python implementation.

What problem does a recommendation system solve?

Recommendation systems were first proposed in the 1990s, but only in recent years have they truly entered the public eye and become widespread among major Internet companies.

With the development of the mobile Internet, more and more information spreads online, resulting in serious information overload.

The value of a recommendation system lies in finding, within this mass of information, what each user is actually interested in. Accurate recommendations address users' pain points, improve the user experience, and ultimately retain users.

A recommendation system is essentially an information filtering system. It is usually divided into three stages: recall, ranking (sorting), and re-ranking, each of which filters the candidates further. In the end, a few dozen items the user may be interested in are selected from the massive item library and recommended to the user.

[Figure: staged filtering process of a recommendation system]

Application scenarios of recommendation system

[Figure: recommendation features in Toutiao, JD.com, and NetEase Cloud Music]

Wherever there is a large amount of information, there is a recommendation system. The apps we use most every day all include recommendation features:

News: Jinri Toutiao, Tencent News, etc.

E-commerce: Taobao, JD.com, Pinduoduo, Amazon, etc.

Entertainment: Douyin, Kuaishou, iQIYI, etc.

Local services: Meituan, Dianping, Ctrip, etc.

Social: WeChat, Momo, Maimai, etc.

The application scenarios of recommendation systems usually fall into two categories:

Recommendation based on the user dimension: recommendations driven by the user's historical behavior and interests, such as the "Guess You Like" section on the Taobao home page or the recommendation feed on the Douyin home page.

Recommendation based on the item dimension: recommendations driven by the item the user is currently browsing. For example, the product detail page in the JD.com app recommends products related to the one being viewed.

Similarities and differences among search, recommendation, and advertising

Search and recommendation are the two most common application scenarios of AI algorithms, and they share much of their technology.

Advertising is included here because many engineers who have not worked on advertising do not know how it relates to search and recommendation, so a brief explanation follows:

Search: the user has a clear intent, and the results are related to the user's search terms.

Recommendation: the user has no explicit goal, and personalized recommendations rely on the user's historical behavior and profile data.

Advertising: uses search and recommendation technology to target ads accurately. It can be understood as an application scenario of search and recommendation, but the technical solution is more complex, involving intelligent budget control, ad bidding, and so on.

The overall architecture of the recommendation system

[Figure: overall architecture of the recommendation system]

The figure above shows the overall architecture of the recommendation system, which is divided into multiple layers from bottom to top. The main functions of each layer are as follows:

Data sources: the various data sources the recommendation algorithms depend on, including item data, user data, behavior logs, other available business data, and even data from outside the company.

Computing platform: responsible for cleaning and processing the heterogeneous data from the bottom layer, with both offline and real-time computation.

Data storage layer: stores the data processed by the computing platform, written to different storage systems as needed. For example, user features and user profile data can be stored in Redis, item data can be indexed in Elasticsearch (ES), and embedding vectors of users or items can be stored in Faiss.

Recall layer: includes a variety of recommendation strategies and algorithms, such as classic collaborative filtering, content-based recall, vector-based recall, and popular-item recommendations as a fallback. To cope with highly concurrent online traffic, recall results are usually precomputed, organized into an inverted index, and stored in a cache.

Fusion and filtering layer: triggers the multiple recall channels. Since each recall source returns its own candidate set, this layer fuses and filters the results.

Ranking (sorting) layer: uses machine learning or deep learning models with richer features to rank the candidates again, selecting a smaller and more accurate set of recommendations to return to the upper-level business.

From the data storage layer to the recall layer, and then to the fusion and filtering layer and the ranking layer, the candidate set shrinks layer by layer while the accuracy requirement rises. Computational complexity therefore increases layer by layer as well, which is the biggest challenge of a recommendation system.
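The layered funnel described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production design: the three recall sources, the `model_scores` table standing in for a ranking model, and all item names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set


def fuse_and_filter(recall_results: Iterable[List[str]], seen: Set[str]) -> List[str]:
    """Merge candidates from several recall sources, dropping duplicates and
    items the user has already consumed (order of first appearance is kept)."""
    fused: List[str] = []
    added: Set[str] = set()
    for candidates in recall_results:
        for item in candidates:
            if item not in added and item not in seen:
                added.add(item)
                fused.append(item)
    return fused


def rank(candidates: List[str], score: Callable[[str], float], top_n: int) -> List[str]:
    """Score every candidate (the expensive step) and keep the best top_n."""
    return sorted(candidates, key=score, reverse=True)[:top_n]


# Hypothetical recall sources: each returns its own candidate list.
cf_recall = ["i1", "i2", "i3"]
content_recall = ["i3", "i4"]
hot_recall = ["i5", "i1"]

# Hypothetical scores standing in for a real ranking model.
model_scores: Dict[str, float] = {"i1": 0.2, "i2": 0.9, "i3": 0.5, "i4": 0.7, "i5": 0.1}

candidates = fuse_and_filter([cf_recall, content_recall, hot_recall], seen={"i5"})
final = rank(candidates, score=lambda item: model_scores.get(item, 0.0), top_n=3)
print(candidates)  # ['i1', 'i2', 'i3', 'i4']
print(final)       # ['i2', 'i4', 'i3']
```

The cheap steps (recall, fusion) touch many items, while the costly scoring step only sees the fused candidates, which is the point of the funnel.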

For the recommendation engine itself, the core consists of two parts: features and algorithms.

[Figure: core functions and technical solutions of the recommendation engine]

Because of the data volume, feature computation usually relies on big-data offline and real-time processing frameworks such as Spark and Flink. The results are then saved in Redis or another storage system (such as HBase, MongoDB, or ES) for the recall and ranking modules to use.

The job of the recall algorithm is to retrieve a batch of candidates from massive data, as quickly and as accurately as possible.

This layer usually combines many strategies and algorithms to ensure diversity, and some algorithms run in near real time for better recommendation results.

The job of the ranking algorithm is to finely rank the candidate sets produced by the multiple recall channels. It uses item features, user features, and their cross features, then scores and sorts the candidates with complex machine learning or deep learning models. The computation is heavier, but the results are more accurate.

Illustrating the classical collaborative filtering algorithm

With the overall architecture and technical solution covered, let's go into the details of an algorithm. The one chosen here is the star algorithm of recommendation systems: collaborative filtering (CF).

Engineers may find AI algorithms obscure and the barrier to entry high. That is true of many deep learning algorithms, but collaborative filtering is simple and effective, and junior high school mathematics is enough to understand it.

What is collaborative filtering?

The core of the collaborative filtering algorithm is "finding similarity". Based on users' historical behavior (browsing, favoriting, commenting, etc.), it discovers users' preferences for items, measures and scores those preferences, and finally selects the recommendation set.

It has two branches:

① User-based collaborative filtering (User-CF), whose core is finding similar people.

For example, in the figure below, user A and user C have both purchased items a and b, so A and C can be considered similar because they like many of the same items. Item d, purchased by user A, can therefore be recommended to user C.

[Figure: example of user-based collaborative filtering]

② Item-based collaborative filtering (Item-CF), whose core is finding similar items.

For example, in the figure below, items a and b are both purchased by users A and B, so items a and b are considered similar because they co-occur frequently.

If user D then buys item a, item b, the item most similar to item a, can be recommended to user D.

[Figure: example of item-based collaborative filtering]

How to find similarities?

As mentioned earlier, the core of collaborative filtering is finding similarity: User-CF looks for similarity between users, and Item-CF looks for similarity between items. So how is the similarity between two users or two items measured?

For two points in a coordinate system, treated as vectors from the origin, the smaller the angle between them, the more similar they are. This is the cosine similarity learned in junior high school, and its formula is:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$$

For example, if A is (0, 3, 1) and B is (4, 3, 0), then the cosine similarity of the two points is 9 / (√10 × 5) ≈ 0.569. The closer the value is to 1, the more similar the two points are.
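As a quick check of the cosine calculation, here is a small sketch in plain Python. It uses the vectors (0, 3, 1) and (4, 3, 0), which reproduce the 0.569 quoted above; the function name `cosine_similarity` is only illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b (assumes both are non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity((0, 3, 1), (4, 3, 0)), 3))  # 0.569
```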

Besides cosine similarity, there are many other ways to measure similarity, such as Euclidean distance, the Pearson correlation coefficient, and the Jaccard similarity coefficient.
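Two of these alternatives are easy to sketch with the standard library. The example below is only illustrative: Euclidean distance works on numeric vectors (smaller means more similar), while the Jaccard coefficient works on sets such as "items a user likes" (larger means more similar). The sample inputs are hypothetical.

```python
import math
from typing import Sequence, Set


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(round(euclidean_distance((0, 3, 1), (4, 3, 0)), 3))          # 4.123
print(round(jaccard_similarity({"a", "b", "c"}, {"a", "b"}), 3))   # 0.667
```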

The algorithm flow of Item-CF

With similarity defined, let's take Item-CF as an example and walk through how the algorithm selects items to recommend.

Step 1: build the item co-occurrence matrix

Suppose there are five users, A, B, C, D, and E: user A likes items a, b, and c, user B likes items a and b, and so on.

Co-occurrence means that two items are liked by the same user. For example, items a and b are both liked by users A, B, and C, so the co-occurrence count of a and b is 3. Counting this way quickly yields the co-occurrence matrix.
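The counting step can be sketched as below. The likes of users A, B, and E follow the text; the likes of users C and D are not fully given in the text, so the values used here are hypothetical and chosen only so that items a and b co-occur for users A, B, and C.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Set

likes: Dict[str, Set[str]] = {
    "A": {"a", "b", "c"},
    "B": {"a", "b"},
    "C": {"a", "b", "d"},  # hypothetical: the text only says C likes a and b
    "D": {"a", "d"},       # hypothetical
    "E": {"b", "c"},
}


def build_cooccurrence(likes: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """cooc[i][j] = number of users who like both item i and item j."""
    cooc: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in likes.values():
        # Every ordered pair of distinct items liked by the same user co-occurs once.
        for i, j in permutations(sorted(items), 2):
            cooc[i][j] += 1
    return cooc


cooc = build_cooccurrence(likes)
print(cooc["a"]["b"])  # 3
```

Only pairs that actually appear together in some user's list are visited, so the matrix stays sparse.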

Step 2: calculate the similarity matrix of items

For the Item-CF algorithm, the cosine similarity mentioned above is generally not used to measure item similarity; a formula based on co-occurrence counts is used instead.

In that formula, N(u) denotes the number of users who like item u, N(v) denotes the number of users who like item v, and their intersection denotes the number of users who like both item u and item v. Clearly, the more people like two items at the same time, the more similar the two items are.
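A form consistent with these definitions, and the one commonly used for Item-CF, is shown below. It is an assumption that this matches the author's formula exactly. Here N(u) is read as the set of users who like item u, so |N(u)| is the count described above:

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| \, |N(v)|}}$$

The numerator is the co-occurrence count from step 1, and the denominator keeps very popular items from looking similar to everything.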

Based on the co-occurrence matrix from step 1 and the number of people who like each item, the item similarity matrix can be constructed.

Step 3: recommend items

Finally, items can be recommended based on the similarity matrix, using a weighted sum of similarities and interest values.

Here, P_uj denotes the degree of interest of user u in item j; the higher the value, the more the item is worth recommending. N(u) denotes the set of items that user u is interested in (note that this differs from its meaning in step 2), S(j, N) denotes the top N items most similar to item j, W_ij denotes the similarity between item i and item j, and R_ui denotes the interest of user u in item i.
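Written out from these definitions, the scoring formula takes the following form. This is a reconstruction from the symbol descriptions, offered as a sketch rather than a copy of the author's rendering:

$$P_{uj} = \sum_{i \in N(u) \cap S(j, N)} W_{ij} \, R_{ui}$$

In words: for a candidate item j, look at the items i that user u already likes and that are among the N items most similar to j, then add up similarity times interest.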

The formula is a little abstract and is easier to understand through an example. Suppose we want to recommend items to user E. We already know that user E likes item b and item c, with interest values of 0.6 and 0.4 respectively.

The recommendation scores are then calculated with the formula above.

Because user E already likes item b and item c, they are not recommended again. Finally, user E's interest in item a is compared with the interest in item d: item a scores higher than item d (0.682), so item a is recommended.
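The scoring step for user E can be sketched as follows. The interest values 0.6 and 0.4 come from the text, but the similarity matrix below is hypothetical, so the printed scores illustrate the mechanics and are not expected to match the article's numbers.

```python
from typing import Dict

# Hypothetical symmetric item-item similarities (not the article's figures).
sim: Dict[str, Dict[str, float]] = {
    "a": {"b": 0.75, "c": 0.35, "d": 0.71},
    "b": {"a": 0.75, "c": 0.71, "d": 0.35},
    "c": {"a": 0.35, "b": 0.71},
    "d": {"a": 0.71, "b": 0.35},
}


def interest_scores(
    user_items: Dict[str, float], sim: Dict[str, Dict[str, float]], top_n: int
) -> Dict[str, float]:
    """For each item j the user has not liked yet, sum W_ij * R_ui over the
    items i the user likes that are among the top_n items most similar to j."""
    scores: Dict[str, float] = {}
    for j, neighbours in sim.items():
        if j in user_items:
            continue  # already liked, so not recommended again
        nearest = sorted(neighbours, key=neighbours.get, reverse=True)[:top_n]
        scores[j] = sum(neighbours[i] * user_items[i] for i in nearest if i in user_items)
    return scores


user_e = {"b": 0.6, "c": 0.4}  # interest values quoted in the text
scores = interest_scores(user_e, sim, top_n=3)
best = max(scores, key=scores.get)
print({item: round(value, 2) for item, value in scores.items()})  # {'a': 0.59, 'd': 0.21}
print(best)  # a
```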

Build a recommendation system from 0 to 1

With the above theoretical basis, we can quickly implement a recommendation system with Python.

Select dataset

MovieLens is a classic dataset of movie ratings in the recommendation field, and several versions of different sizes are available on the official website. This walkthrough uses the ml-1m dataset (about 1 million rating records).

After downloading and decompressing it, the folder contains three files: ratings.dat, movies.dat, and users.dat, covering 6040 users, 3900 movies, and 1000209 rating records. The files share the same format: each line is one record, and the fields are separated by "::".

In ratings.dat, for example, each line contains four fields: UserID, MovieID, Rating, and Timestamp.
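Parsing one such line can be sketched like this. The `Rating` type and `parse_rating` function are hypothetical names introduced for illustration, and the sample line is only an example of the "::"-separated format.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Split a 'UserID::MovieID::Rating::Timestamp' line into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


print(parse_rating("1::1193::5::978300760"))
# Rating(user_id=1, movie_id=1193, rating=5, timestamp=978300760)
```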

A short script can count how the ratings are distributed across the different scores.
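One possible version of such a script is sketched below. It takes any iterable of lines so it can run on an in-memory sample; the function name and the sample lines are illustrative, and no real distribution figures are implied.

```python
from collections import Counter
from typing import Iterable


def rating_distribution(lines: Iterable[str]) -> Counter:
    """Count how many rating records carry each score."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.strip().split("::")
        if len(fields) == 4:  # skip blank or malformed lines
            counts[int(fields[2])] += 1
    return counts


# Illustrative sample in the ratings.dat format.
sample = ["1::1193::5::978300760", "1::661::3::978302109", "2::1193::5::978298413"]
dist = rating_distribution(sample)
for score in sorted(dist):
    print(score, dist[score])

# With the real file (assumed to be in the working directory):
#     with open("ratings.dat", encoding="latin-1") as f:
#         dist = rating_distribution(f)
```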

Read the original data

The program mainly uses the ratings.dat file from the dataset. It parses the file, extracts three fields (user_id, movie_id, and rating), and builds the data the algorithm depends on, saving it in the variable dataset.

It has the format dataset[user_id][movie_id] = rating.
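A minimal sketch of this loading step, assuming the "::"-separated format described earlier. The function name `load_dataset` is hypothetical, and the sample lines stand in for the real file.

```python
from typing import Dict, Iterable


def load_dataset(lines: Iterable[str]) -> Dict[int, Dict[int, int]]:
    """Build dataset[user_id][movie_id] = rating from ratings.dat-style lines."""
    dataset: Dict[int, Dict[int, int]] = {}
    for line in lines:
        fields = line.strip().split("::")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        user_id, movie_id, rating = (int(value) for value in fields[:3])
        dataset.setdefault(user_id, {})[movie_id] = rating
    return dataset


sample = ["1::10::5::0", "1::20::3::0", "2::10::4::0"]
dataset = load_dataset(sample)
print(dataset)  # {1: {10: 5, 20: 3}, 2: {10: 4}}

# With the real file (assumed to be in the working directory):
#     with open("ratings.dat", encoding="latin-1") as f:
#         dataset = load_dataset(f)
```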

Construct the similarity matrix of items

Based on the dataset built in the previous step, we can count how many times each movie was rated and build the movie co-occurrence matrix, then generate the similarity matrix from them.
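This step might look like the sketch below. It assumes the nested-dict layout dataset[user_id][movie_id] = rating, treats every rated movie as "liked" regardless of the score (a simplifying assumption), and normalizes co-occurrence counts by the square root of the product of the two rating counts. The function name and the tiny dataset are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations
from typing import Dict


def build_similarity(dataset: Dict[int, Dict[int, int]]) -> Dict[int, Dict[int, float]]:
    """Return sim[m1][m2] = cooc(m1, m2) / sqrt(count(m1) * count(m2))."""
    movie_count: Dict[int, int] = defaultdict(int)  # times each movie was rated
    cooc: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for movies in dataset.values():
        for movie in movies:
            movie_count[movie] += 1
        for m1, m2 in permutations(movies, 2):
            cooc[m1][m2] += 1
    return {
        m1: {m2: c / math.sqrt(movie_count[m1] * movie_count[m2]) for m2, c in related.items()}
        for m1, related in cooc.items()
    }


# Hypothetical miniature dataset.
dataset = {1: {10: 5, 20: 3}, 2: {10: 4, 20: 2, 30: 5}, 3: {10: 1}}
sim = build_similarity(dataset)
print(round(sim[10][20], 3))  # 0.816
```

Note that a user who rated k movies contributes k × (k − 1) pair updates, so very active users dominate the running time.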

Recommend items based on similarity matrix

Finally, we can recommend based on the similarity matrix. Given a user ID, first select the top 10 most similar movies for each movie the user has rated, then compute each candidate's final score as a weighted sum, and finally select the top 5 movies to recommend.
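The procedure just described can be sketched as follows, with k = 10 neighbours per rated movie and n = 5 final recommendations as defaults. The function name, the tiny dataset, and the similarity values are hypothetical; weighting each similarity by the user's rating is one reasonable reading of "weighted sum".

```python
from collections import defaultdict
from typing import Dict, List


def recommend(
    user_id: int,
    dataset: Dict[int, Dict[int, int]],
    sim: Dict[int, Dict[int, float]],
    k: int = 10,
    n: int = 5,
) -> List[int]:
    """Score unseen movies by summing similarity * rating over the k nearest
    neighbours of each movie the user rated, then return the n best movie IDs."""
    watched = dataset.get(user_id, {})
    scores: Dict[int, float] = defaultdict(float)
    for movie, rating in watched.items():
        neighbours = sim.get(movie, {})
        for other in sorted(neighbours, key=neighbours.get, reverse=True)[:k]:
            if other not in watched:  # never recommend what was already rated
                scores[other] += neighbours[other] * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical data: user 1 rated movies 10 and 20.
dataset = {1: {10: 5, 20: 3}}
sim = {
    10: {20: 0.8, 30: 0.6, 40: 0.1},
    20: {10: 0.8, 30: 0.2, 50: 0.5},
}
print(recommend(1, dataset, sim))  # [30, 50, 40]
```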

Call the recommendation system

Let's take UserId=1 as the example user and look at the program's output. Since the recommender returns a list of movie IDs, they are converted to movie titles to make the results easier to read.
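The ID-to-title conversion can be sketched like this, assuming movies.dat uses the same "::" separator with the movie ID first and the title second. The helper names are hypothetical, and the two sample lines only illustrate the format.

```python
from typing import Dict, Iterable, List


def load_titles(lines: Iterable[str]) -> Dict[int, str]:
    """Map movie ID to title from 'MovieID::Title::Genres' lines."""
    titles: Dict[int, str] = {}
    for line in lines:
        fields = line.strip().split("::")
        if len(fields) >= 2:
            titles[int(fields[0])] = fields[1]
    return titles


def to_titles(movie_ids: List[int], titles: Dict[int, str]) -> List[str]:
    """Translate recommended IDs to titles, marking IDs that have no title."""
    return [titles.get(movie_id, f"<unknown {movie_id}>") for movie_id in movie_ids]


sample = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
]
titles = load_titles(sample)
print(to_titles([2, 1, 7], titles))
# ['Jumanji (1995)', 'Toy Story (1995)', '<unknown 7>']

# With the real file (assumed to be in the working directory):
#     with open("movies.dat", encoding="latin-1") as f:
#         titles = load_titles(f)
```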

The first five movies recommended are:

Challenges of online recommendation systems

The introduction above should give a preliminary understanding of the basic composition of a recommendation system. When one is actually deployed in a real online environment, however, many algorithmic and engineering challenges arise that a few dozen lines of Python code cannot solve.

The main points are summarized as follows:

The example above uses a standardized dataset, whereas data in an online environment is not standardized. Large amounts of data must be collected, cleaned, and processed before a dataset usable by the model can be built.

Complex and tedious feature engineering. It is often said that data and features determine the upper limit of a model. In an online environment, usable features have to be selected from a business perspective, then cleaned, standardized, normalized, and discretized, and their effectiveness verified through experiments.

How can the complexity of the algorithm be reduced? The time and space complexity of the Item-CF algorithm described above is O(N × N), while online data is on the scale of tens of millions or even hundreds of millions. Without optimization, the computation may not finish even after several days, or the matrix may be too large to fit in memory.

How can real-time requirements be met? Users' interests change in real time with their latest behavior, so a model built only on historical data may not be accurate enough.

How to meet real-time requirements, and how to make recommendations for newly added items or users, are therefore both problems to be solved.

The tradeoff between algorithm effect and performance. A balance must be found between the algorithmic pursuit of diversity and accuracy and the engineering pursuit of performance.

Stability and effect tracking of the recommendation system. A complete data monitoring and application monitoring system is needed, together with an A/B testing platform for running grayscale experiments and comparing their results.

That concludes this introduction to the basic composition of a recommendation system. Combining theory with practice is the best way to learn, so go and try it yourself.
