E-commerce big data project: recommendation algorithms of the recommendation system (3)


Updated 2025-04-19. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

E-commerce big data project: recommendation system in practice (1), environment setup and analysis of logs, population, and commodities

https://blog.51cto.com/6989066/2325073

E-commerce big data project: recommendation algorithms of the recommendation system in practice

https://blog.51cto.com/6989066/2326209

E-commerce big data project: real-time analysis and offline analysis of the recommendation system

https://blog.51cto.com/6989066/2326214

(7) Algorithms commonly used in recommendation systems

Collaborative filtering algorithm

Collaborative filtering (CF) is a commonly used algorithm that is applied on many e-commerce websites. It comes in two forms: user-based CF (User CF) and item-based CF (Item CF).

(8) Apache Mahout and Spark MLlib

① Introduction to Apache Mahout

Apache Mahout is an open source project under the Apache Software Foundation (ASF). It provides implementations of classic machine learning algorithms that help developers create smart applications more easily and quickly. Three public releases were already available at the time of writing. Mahout includes many implementations, covering clustering, classification, recommendation engines, and frequent sub-item mining, and by using the Apache Hadoop library it can be scaled effectively to the cloud.

The main goal of Apache Mahout is to build scalable machine learning algorithms, where scalable means able to handle large datasets. Mahout's algorithms run on the Apache Hadoop platform and are implemented in the MapReduce model. However, Mahout does not strictly require that an algorithm be implemented on Hadoop; a single node or a non-Hadoop platform can also be used. The non-distributed algorithms in the Mahout core library also perform well.

Mahout mainly consists of the following five parts:

Frequent pattern mining: mining itemsets that occur frequently in the data.

Clustering: dividing data such as text and documents into locally related groups.

Classification: using already classified documents to train classifiers that classify unclassified documents.

Recommendation engine (collaborative filtering): collecting user behavior and finding out what the user might like.

Frequent sub-item mining: using an itemset (query records or shopping records) to identify items that often appear together.

② Introduction to Spark MLlib

Spark MLlib (Machine Learning library) is Spark's library of common machine learning algorithms, and it also includes the related tests and data generators. Spark is designed to support iterative jobs, which matches the characteristics of many machine learning algorithms.

Spark MLlib currently supports four common classes of machine learning problems: classification, regression, clustering, and collaborative filtering. It is built on RDDs and integrates seamlessly with Spark SQL, GraphX, and Spark Streaming. With RDDs as the foundation, these four sub-frameworks can work together to build a big data computing center.

The following figure is the core of the MLlib algorithm library:

Commodity recommendation based on users' interests

(1) User-based CF (User CF) and item-based CF (Item CF)

User-based CF (User CF)

The basic idea of user-based CF is quite simple: find neighboring users based on users' preferences for items, then recommend what those neighbors like to the current user. Computationally, each user's preferences for all items are treated as a vector, and the similarity between users is calculated from these vectors. After the K nearest neighbors are found, the current user's preference for items they have not yet rated is predicted from the neighbors' similarity weights and their preferences for those items, and the result is a sorted list of items used as the recommendation. Figure 2 shows an example. For user A, the historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.
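The procedure described above can be sketched in plain Python. This is only an illustrative sketch, not code from the project: the toy ratings, the choice of cosine similarity, and the function names `cosine` and `recommend_user_based` are all assumptions made for the example. The data is arranged so that user C is the only neighbor of user A, mirroring the example in the text.

```python
from math import sqrt

# Hypothetical toy preferences (user -> {item: rating}), invented for illustration.
ratings = {
    "A": {"item_a": 5.0, "item_c": 4.0},
    "B": {"item_b": 4.0},
    "C": {"item_a": 4.0, "item_c": 5.0, "item_d": 5.0},
}

def cosine(u, v):
    """Cosine similarity between two sparse preference vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend_user_based(ratings, user, k=1):
    """Rank items the user has not rated, using the K most similar users."""
    sims = [(cosine(ratings[user], prefs), other)
            for other, prefs in ratings.items() if other != user]
    # Keep only the K most similar users that share at least one item.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    scores, weights = {}, {}
    for sim, other in neighbors:
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # skip items the user has already rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbors' ratings.
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)

recommendations = recommend_user_based(ratings, "A", k=1)
print(recommendations)
```

With this toy data, user B shares no items with user A and is dropped, so the only candidate comes from user C.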

Item-based CF (Item CF)

The principle of item-based CF is similar to that of user-based CF, except that neighbors are computed between items rather than between users. Similar items are found based on users' preferences for items, and then items similar to those in a user's historical preferences are recommended to that user. Computationally, all users' preferences for an item are treated as a vector, and the similarity between items is calculated from these vectors. Once similar items are known, the current user's preference for items they have not yet rated is predicted from their historical preferences, and the result is a sorted list of items used as the recommendation. Figure 3 shows an example. According to the historical preferences of all users, users who like item A also like item C, so item A and item C are judged to be similar. User C likes item A, so it can be inferred that user C may also like item C.
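The item-based variant can be sketched the same way. Again this is an illustrative sketch under assumptions: the toy ratings, the use of cosine similarity, and the names `item_vectors` and `recommend_item_based` are invented for the example. Candidates are ranked by a similarity-weighted sum of the user's own ratings, which is one common simplification for top-N ranking rather than a normalized rating prediction.

```python
from math import sqrt

# Hypothetical toy preferences (user -> {item: rating}), invented for illustration.
ratings = {
    "A": {"item_a": 5.0, "item_c": 4.0},
    "B": {"item_a": 4.0, "item_b": 3.0, "item_c": 5.0},
    "C": {"item_a": 5.0},
}

def item_vectors(ratings):
    """Transpose user -> item ratings into item -> {user: rating} vectors."""
    items = {}
    for user, prefs in ratings.items():
        for item, rating in prefs.items():
            items.setdefault(item, {})[user] = rating
    return items

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend_item_based(ratings, user):
    """Rank unseen items by their similarity to items the user already rated."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        # Weight each item in the user's history by its similarity to the candidate.
        score = sum(cosine(vector, items[liked]) * rating
                    for liked, rating in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recommendations = recommend_item_based(ratings, "C")
print(recommendations)
```

Here users A and B both rate item_a and item_c, so those two items come out as the most similar pair, and item_c ranks first for user C, who has only rated item_a.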

Collaborative filtering recommendation based on ALS

(1) The basic principles of ALS

(2) ALS based on Spark MLlib

The basic process is:

a. Load the data into a rating RDD.

b. Train the ALS model on the rating RDD.

c. Use the ALS model to recommend items for users and print the results.

d. Evaluate the model using the mean squared error.
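To make steps a to d concrete without a Spark cluster, the following is a plain-Python stand-in, not the Spark MLlib API. It assumes the usual ALS formulation: with one factor matrix held fixed, each row of the other is re-solved as a small ridge regression, and the two sides alternate. The toy ratings, the helper names (`solve`, `update`, `train_als`, `predict`, `mse`, `recommend`), and the plain (unweighted) regularization are assumptions for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def update(fixed, observations, rank, lam):
    """Re-solve every factor vector on one side while the other side is fixed."""
    out = {}
    for key, obs in observations.items():
        # Normal equations of ridge regression: (lam * I + sum f f^T) x = sum r f
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in obs:
            f = fixed[other]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[key] = solve(a, b)
    return out

def train_als(ratings, rank=2, iterations=50, lam=0.1, seed=0):
    """Step b: alternate between solving user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)
        items = update(users, by_item, rank, lam)
    return users, items

def predict(users, items, u, i):
    """Predicted rating is the dot product of user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))

def mse(users, items, ratings):
    """Step d: mean squared error over the given ratings."""
    return sum((predict(users, items, u, i) - r) ** 2 for u, i, r in ratings) / len(ratings)

def recommend(users, items, ratings, u, n=3):
    """Step c: top-n unseen items for user u, ranked by predicted rating."""
    seen = {i for x, i, _ in ratings if x == u}
    scored = [(predict(users, items, u, i), i) for i in items if i not in seen]
    return [i for _, i in sorted(scored, reverse=True)[:n]]

# Step a: hypothetical (user, item, rating) triples standing in for the rating RDD.
ratings = [
    (1, 10, 5.0), (1, 11, 3.0), (1, 12, 1.0),
    (2, 10, 4.0), (2, 12, 1.0),
    (3, 10, 1.0), (3, 11, 2.0), (3, 12, 5.0),
    (4, 11, 1.0), (4, 12, 4.0),
]
users, items = train_als(ratings)
print("recommended for user 2:", recommend(users, items, ratings, 2))
print("training MSE:", mse(users, items, ratings))
```

The error here is measured on the same ratings used for training, which keeps the sketch short; a held-out set gives a more honest estimate.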

(3) ALS based on Apache Mahout

1. Split the ratings into a probe (prediction) set (10%) and a training set (90%).

bin/mahout splitDataset -i /input/ratingdata.txt -o /output/ALS/dataset

2. Use the parallel ALS algorithm to factorize the training set matrix. This generates two matrices in /output/ALS/out, U (the user feature matrix) and M (the item feature matrix), as well as the ratings.

bin/mahout parallelALS -i /output/ALS/dataset/trainingSet/ -o /output/ALS/out --numFeatures 20 --numIterations 5 --lambda 0.1

3. Evaluate the model on the probe set, using RMSE as the evaluation metric. The RMSE result is written to /output/ALS/rmse/rmse.txt

bin/mahout evaluateFactorization -i /output/ALS/dataset/probeSet/ -o /output/ALS/rmse --userFeatures /output/ALS/out/U --itemFeatures /output/ALS/out/M

4. Finally, generate the recommendations.

bin/mahout recommendfactorized -i /output/ALS/out/userRatings -o /output/ALS/recommendations --userFeatures /output/ALS/out/U --itemFeatures /output/ALS/out/M --numRecommendations 6 --maxRating 5
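What the split in step 1 and the RMSE evaluation in step 3 compute can be illustrated with a small Python sketch. It does not reproduce Mahout's implementation: the function names `split_ratings` and `rmse`, the exact-cut shuffle split, and the tiny feature vectors are assumptions for illustration. The predicted rating is taken to be the dot product of a user's row in U and an item's row in M.

```python
import random
from math import sqrt

def split_ratings(ratings, training_fraction=0.9, seed=42):
    """Shuffle the ratings and cut them into (training, probe) lists."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]

def rmse(probe, user_features, item_features):
    """Root mean squared error of dot-product predictions on held-out ratings."""
    errors = [
        sum(a * b for a, b in zip(user_features[u], item_features[i])) - r
        for u, i, r in probe
    ]
    return sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical ratings as (user, item, rating) triples.
all_ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(4) for i in range(5)]
training, probe = split_ratings(all_ratings)
print(len(training), "training ratings,", len(probe), "probe ratings")

# Hypothetical feature vectors standing in for rows of U and M.
U = {1: [1.0, 2.0]}
M = {10: [2.0, 1.0], 11: [0.5, 0.5]}
held_out = [(1, 10, 5.0), (1, 11, 1.5)]
print("RMSE:", rmse(held_out, U, M))
```

In the second part, the prediction for item 10 is off by 1 and the prediction for item 11 is exact, so the RMSE is the square root of 0.5.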
