E-commerce big data project - recommendation system in practice (1): environment construction and log, population, and commodity analysis
https://blog.51cto.com/6989066/2325073
E-commerce big data project - recommendation system in practice: recommendation algorithms
https://blog.51cto.com/6989066/2326209
E-commerce big data project - recommendation system in practice: real-time analysis and offline analysis
https://blog.51cto.com/6989066/2326214
(7) Algorithms commonly used in recommendation systems
Collaborative filtering algorithm
Collaborative filtering (Collaborative Filtering, CF) is a commonly used recommendation algorithm and is widely applied on e-commerce websites. The CF algorithm includes user-based CF (User-based CF) and item-based CF (Item-based CF).
(8) Apache Mahout and Spark MLlib
① Introduction to Apache Mahout
Apache Mahout is an open-source project under the Apache Software Foundation (ASF) that provides implementations of classic machine learning algorithms, helping developers create intelligent applications more easily and quickly. Three public releases are already available, and through the Apache Mahout library, Mahout can be effectively scaled out to the cloud. Mahout includes many implementations, covering clustering, classification, recommendation engines, and frequent itemset mining.
The main goal of Apache Mahout is to build scalable machine learning algorithms, where scalability refers to large datasets. Apache Mahout's algorithms run on the Apache Hadoop platform and are implemented in the MapReduce model. However, Apache Mahout does not strictly require that algorithms be implemented on Hadoop; a single node or a non-Hadoop platform can also be used, and the non-distributed algorithms in the Apache Mahout core library also perform well.
Mahout mainly consists of the following five parts:
Frequent pattern mining: mining itemsets that occur frequently in the data.
Clustering: dividing data such as text and documents into locally related groups.
Classification: using existing, categorized documents to train a classifier that classifies uncategorized documents.
Recommendation engine (collaborative filtering): learning from users' behaviour to find items the user might like.
Frequent sub-item mining: using itemsets (query records or shopping records) to identify items that often appear together.
② Introduction to Spark MLlib
Spark MLlib (Machine Learning lib) is Spark's library of common machine learning algorithms, together with related tests and data generators. Spark is designed to support iterative jobs, which fits the characteristics of many machine learning algorithms well.
Spark MLlib currently supports four common classes of machine learning problems: classification, regression, clustering, and collaborative filtering. Spark MLlib is built on RDDs and can be integrated seamlessly with Spark SQL, GraphX, and Spark Streaming. With the RDD as the cornerstone, these four sub-frameworks can work together to build a big data computing center.
(Figure: the core of the MLlib algorithm library.)
Commodity recommendation based on users' interests
(I) User-based CF (User CF) and item-based CF (Item CF)
User-based CF (User CF)
The basic idea of user-based CF is quite simple: based on users' preferences for items, find the neighbouring users of the current user, and then recommend to the current user what those neighbours like. In the computation, a user's preferences for all items are taken as a vector and used to calculate the similarity between users. After finding the K nearest neighbours, the items that the current user has not yet expressed a preference for are predicted from the neighbours' similarity weights and their preferences for those items, and a ranked list of items is produced as the recommendation. Figure 2 shows an example: for user A, only one neighbour, user C, is found based on the users' historical preferences, so item D, which user C likes, is recommended to user A.
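To make the computation above concrete, here is a minimal, self-contained sketch of user-based CF in Scala. The rating data, user names, and item names are invented for illustration; cosine similarity and a similarity-weighted top-N prediction stand in for whatever measure a production system would actually use.

object UserCfSketch {

  // user -> (item -> preference score)
  type Ratings = Map[String, Map[String, Double]]

  // Cosine similarity between two users' preference vectors.
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val common = a.keySet intersect b.keySet
    if (common.isEmpty) return 0.0
    val dot   = common.toSeq.map(i => a(i) * b(i)).sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0 || normB == 0) 0.0 else dot / (normA * normB)
  }

  // Score each item the user has not rated by the similarity-weighted
  // preferences of the K most similar users, and return the top N.
  def recommend(ratings: Ratings, user: String, k: Int = 2, topN: Int = 3): Seq[(String, Double)] = {
    val mine = ratings(user)
    val neighbours = (ratings - user).toSeq
      .map { case (u, prefs) => (u, cosine(mine, prefs)) }
      .sortBy(-_._2)
      .take(k)
    val candidates = neighbours.flatMap { case (u, sim) =>
      ratings(u).collect { case (item, pref) if !mine.contains(item) => (item, sim * pref) }
    }
    candidates.groupBy(_._1)
      .map { case (item, scores) => (item, scores.map(_._2).sum) }
      .toSeq.sortBy(-_._2).take(topN)
  }

  def main(args: Array[String]): Unit = {
    val ratings: Ratings = Map(
      "userA" -> Map("itemA" -> 5.0, "itemB" -> 3.0),
      "userB" -> Map("itemB" -> 4.0, "itemC" -> 5.0),
      "userC" -> Map("itemA" -> 4.0, "itemB" -> 3.0, "itemD" -> 5.0))
    // As in the Figure 2 narrative, userC is userA's closest neighbour,
    // so itemD surfaces at the top of the recommendation list.
    println(recommend(ratings, "userA"))
  }
}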
Item-based CF (Item CF)
The principle of item-based CF is similar to that of user-based CF, except that the neighbourhood is computed between items rather than between users: based on users' preferences for items, similar items are found, and then items similar to the ones the user liked in the past are recommended. From the computational point of view, all users' preferences for an item are taken as a vector and used to calculate the similarity between items. After obtaining the similar items, the items the current user has not yet expressed a preference for are predicted from the user's historical preferences, and a ranked list of items is produced as the recommendation. Figure 3 shows an example: based on all users' historical preferences, users who like item A also like item C, so item A is judged to be similar to item C; since user C likes item A, it can be inferred that user C may also like item C.
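Along the same lines, here is a minimal sketch of item-based CF in Scala. The data mirrors the Figure 3 narrative (users who like item A also like item C) and is purely illustrative; the item names and scores are assumptions, not data from the project.

object ItemCfSketch {

  // Cosine similarity between two items' user-preference vectors.
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    val common = a.keySet intersect b.keySet
    val dot = common.toSeq.map(u => a(u) * b(u)).sum
    val na  = math.sqrt(a.values.map(v => v * v).sum)
    val nb  = math.sqrt(b.values.map(v => v * v).sum)
    if (na == 0 || nb == 0) 0.0 else dot / (na * nb)
  }

  def main(args: Array[String]): Unit = {
    // user -> (item -> preference score)
    val ratings = Map(
      "userA" -> Map("itemA" -> 5.0, "itemC" -> 4.0),
      "userB" -> Map("itemA" -> 4.0, "itemC" -> 5.0, "itemD" -> 2.0),
      "userC" -> Map("itemA" -> 5.0))

    // Re-index the same data as item -> (user -> preference score).
    val byItem: Map[String, Map[String, Double]] = ratings
      .flatMap { case (u, prefs) => prefs.map { case (i, r) => (i, u, r) } }
      .groupBy(_._1)
      .map { case (i, rows) => i -> rows.map(t => t._2 -> t._3).toMap }

    // Rank the other items by similarity to itemA.
    val similarToItemA = (byItem - "itemA").toSeq
      .map { case (i, users) => (i, cosine(byItem("itemA"), users)) }
      .sortBy(-_._2)

    // userC has only rated itemA, so recommend the items most similar to itemA
    // that userC has not rated yet (itemC comes out first, as in Figure 3).
    val forUserC = similarToItemA.filterNot { case (i, _) => ratings("userC").contains(i) }
    println(forUserC.take(2))
  }
}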
Collaborative filtering recommendation based on ALS
(1) The basic principles of ALS
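A compact sketch of the standard ALS formulation may help here; the notation below is supplied for illustration rather than taken from the project. ALS approximates the user-item rating matrix R by the product of a low-rank user-feature matrix U and an item-feature matrix M, minimising the regularised squared error over the known ratings:

\min_{U,M} \sum_{(i,j)\,:\,r_{ij}\ \text{known}} \left( r_{ij} - u_i^{\top} m_j \right)^2 + \lambda \left( \sum_i \lVert u_i \rVert^2 + \sum_j \lVert m_j \rVert^2 \right)

The "alternating" part is that M is held fixed while each user vector u_i is solved as an ordinary least-squares problem, then U is held fixed while each item vector m_j is solved, and the two steps repeat for a fixed number of iterations. The length of the feature vectors and the regularisation weight lambda correspond to the --numFeatures and --lambda parameters of the Mahout commands below, and to the rank and lambda arguments of Spark MLlib's ALS.train.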
(2) ALS based on Spark MLlib
The basic process is as follows (a minimal Scala sketch of these four steps appears after the list):
a. Load the rating data into a rating RDD.
b. Train an ALS model with the rating RDD.
c. Use the ALS model to recommend items for users and print the results.
d. Evaluate the model with the mean squared error.
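Below is a minimal Scala sketch of these four steps using the RDD-based Spark MLlib ALS API. The input path, the comma-separated userId,itemId,rating file format, the local master, the rank/iteration/lambda values, and the choice of recommending 5 products to user 1 are all illustrative assumptions rather than values from the original project.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALS sketch").setMaster("local[*]"))

    // a. Load "userId,itemId,rating" lines into a rating RDD (the file format is an assumption).
    val ratings = sc.textFile("/input/ratingdata.txt").map { line =>
      val Array(user, item, rate) = line.split(',')
      Rating(user.toInt, item.toInt, rate.toDouble)
    }

    // b. Train an ALS model (rank = 20, 5 iterations, lambda = 0.1, mirroring the Mahout run below).
    val model = ALS.train(ratings, 20, 5, 0.1)

    // c. Recommend, for example, 5 products for user 1 and print the results.
    model.recommendProducts(1, 5).foreach(println)

    // d. Evaluate the model: mean squared error between observed and predicted ratings.
    val userProducts = ratings.map(r => (r.user, r.product))
    val predictions  = model.predict(userProducts).map(r => ((r.user, r.product), r.rating))
    val observed     = ratings.map(r => ((r.user, r.product), r.rating))
    val mse = observed.join(predictions).values
      .map { case (obs, pred) => val err = obs - pred; err * err }
      .mean()
    println(s"Mean Squared Error = $mse")

    sc.stop()
  }
}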
(3) ALS based on Apache Mahout
1. Split the rating data into a probe (prediction) set (10%) and a training set (90%).
bin/mahout splitDataset -i /input/ratingdata.txt -o /output/ALS/dataset
2. Using the parallel ALS algorithm, factorize the training set; this produces the two matrices U (the user-feature matrix) and M (the item-feature matrix) in /output/ALS/out, together with the per-user ratings used later in the recommendation step.
bin/mahout parallelALS -i /output/ALS/dataset/trainingSet/ -o /output/ALS/out --numFeatures 20 --numIterations 5 --lambda 0.1
3. Evaluate the model on the probe set; the evaluation metric is RMSE, and the result is written to /output/ALS/rmse/rmse.txt.
bin/mahout evaluateFactorization -i /output/ALS/dataset/probeSet/ -o /output/ALS/rmse --userFeatures /output/ALS/out/U --itemFeatures /output/ALS/out/M
4. Finally, generate the recommendations.
bin/mahout recommendfactorized -i /output/ALS/out/userRatings -o /output/ALS/recommendations --userFeatures /output/ALS/out/U --itemFeatures /output/ALS/out/M --numRecommendations 6 --maxRating 5