This article introduces how to implement a matrix factorization recommendation algorithm based on stochastic gradient descent in Python. The walkthrough is fairly detailed and should serve as a useful reference; interested readers are encouraged to follow along.
SVD is a common matrix decomposition method. Its principle is that a matrix M can be written as the product of three matrices A, B and C, where B is a diagonal matrix of singular values. Since B can be merged into either A or C, M can equally be obtained as the product of just two matrices, M1 and M2.
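As a quick illustration (not part of the original article's code), here is a minimal numpy sketch of that idea: decompose a small matrix, fold the singular values into the left factor, and check that the two-factor product reconstructs M.

import numpy as np

M = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])

# Full SVD: M = A @ B @ C, where B is the diagonal matrix of singular values
A, s, C = np.linalg.svd(M)
B = np.diag(s)

# Merge B into A, so that M is the product of just two matrices M1 and M2
M1 = A @ B
M2 = C
print(np.allclose(M, M1 @ M2))  # True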
The idea of matrix factorization recommendation builds on this. If the latent features of all users and all items are arranged as matrices M1 and M2 respectively, then their product yields the rating matrix M. We can therefore use the existing data (users' ratings of items) to fit, by stochastic gradient descent, the M1 and M2 that best explain the observed ratings, which amounts to learning the latent attributes of each user and each item. The score a user would give an item they have not yet rated can then be predicted as the inner product of the corresponding user and item feature vectors.
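In symbols, writing p_u for the feature vector of user u, q_i for that of item i, γ (gama in the code) for the learning rate and λ (lamda) for the regularization weight, each observed rating r_ui drives the updates that the code below performs:

e_ui = r_ui − p_u · q_i
p_u ← p_u + γ (e_ui · q_i − λ · p_u)
q_i ← q_i + γ (e_ui · p_u − λ · q_i)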
The data used in this article comes from the MovieLens dataset, which the code itself splits into a training set and a test set; because the full dataset is large, only a fraction of it is used.
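For reference, the MovieLens ratings file the code reads is a CSV with four columns, of which only the first three are used (the sample rows below are illustrative):

userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179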
The code is as follows:
# -*- coding: utf-8 -*-
"""
Created on Mon Oct  9 19:33:00 2017
@author: wjw
"""
import os
import numpy as np
import pandas as pd

def difference(left, right, on):  # find the rows of left that do not appear in right
    df = pd.merge(left, right, how='left', on=on)  # parameter on gives the key column names
    left_columns = left.columns
    col_y = df.columns[-1]  # name of the last column (it comes only from right)
    df = df[df[col_y].isnull()]  # keep the rows with no match in right
    df = df.iloc[:, 0:left_columns.size]  # keep only the columns of left
    df.columns = left_columns  # restore the original column names
    return df

def readfile(filepath):  # read the ratings file and build the training and test sets
    pwd = os.getcwd()  # remember the current working directory
    os.chdir(os.path.dirname(filepath))  # switch to the directory containing the file
    initialData = pd.read_csv(os.path.basename(filepath))
    os.chdir(pwd)  # switch back to the previous working directory
    predData = initialData.iloc[:, 0:3]  # drop the last column (the timestamp)
    newIndexData = predData.drop_duplicates()
    trainData = newIndexData.sample(axis=0, frac=0.1)  # 10% of the data as the training set
    testData = difference(newIndexData, trainData, ['userId', 'movieId']).sample(axis=0, frac=0.1)
    return trainData, testData

def getmodel(train):
    slowRate = 0.99  # decay factor for the learning rate
    preRmse = 10000000.0
    max_iter = 100
    features = 3
    lamda = 0.2  # regularization weight
    gama = 0.01  # learning rate; keeps the stochastic gradient updates from being too large
    # reset_index(drop=True) rebuilds the index and discards the old one
    user = pd.DataFrame(train.userId.drop_duplicates(), columns=['userId']).reset_index(drop=True)
    movie = pd.DataFrame(train.movieId.drop_duplicates(), columns=['movieId']).reset_index(drop=True)
    userNum = user.count().loc['userId']  # e.g. 671
    movieNum = movie.count().loc['movieId']
    userFeatures = np.random.rand(userNum, features)    # random initial feature vectors,
    movieFeatures = np.random.rand(movieNum, features)  # assuming 3 features per user and per movie
    userFeaturesFrame = user.join(pd.DataFrame(userFeatures, columns=['f1', 'f2', 'f3']))
    movieFeaturesFrame = movie.join(pd.DataFrame(movieFeatures, columns=['f1', 'f2', 'f3']))
    userFeaturesFrame = userFeaturesFrame.set_index('userId')
    movieFeaturesFrame = movieFeaturesFrame.set_index('movieId')  # index the features by id
    for i in range(max_iter):
        rmse = 0
        n = 0
        for index, row in user.iterrows():
            uId = row.userId
            u_m = train[train['userId'] == uId]  # the movies this user rated in train
            for index2, row2 in u_m.iterrows():
                u_mId = int(row2.movieId)
                realRating = row2.rating
                userFeature = userFeaturesFrame.loc[uId]      # current features of this user
                movieFeature = movieFeaturesFrame.loc[u_mId]  # current features of this movie
                eui = realRating - np.dot(userFeature, movieFeature)
                rmse += pow(eui, 2)
                n += 1
                userFeaturesFrame.loc[uId] = userFeature + gama * (eui * movieFeature - lamda * userFeature)
                movieFeaturesFrame.loc[u_mId] = movieFeature + gama * (eui * userFeature - lamda * movieFeature)
        nowRmse = np.sqrt(rmse * 1.0 / n)
        print('step:%d,rmse:%f' % (i + 1, nowRmse))
        # The original listing is truncated after "if nowRmse"; given that preRmse and
        # slowRate are defined above, a plausible completion is to stop once the RMSE
        # no longer improves and to decay the learning rate after every pass:
        if nowRmse >= preRmse:
            break
        preRmse = nowRmse
        gama *= slowRate
    return userFeaturesFrame, movieFeaturesFrame
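To round things off, here is a minimal usage sketch. The file path is hypothetical; note also that a test user or movie that did not land in the 10% training sample has no learned features, so a robust version would skip such rows.

if __name__ == '__main__':
    trainData, testData = readfile('E:/data/ratings.csv')  # hypothetical path to the MovieLens ratings
    userFeaturesFrame, movieFeaturesFrame = getmodel(trainData)
    # predict one held-out rating as the inner product of the learned feature vectors
    row = testData.iloc[0]
    try:
        pred = np.dot(userFeaturesFrame.loc[row.userId],
                      movieFeaturesFrame.loc[int(row.movieId)])
        print('real: %.1f, predicted: %.2f' % (row.rating, pred))
    except KeyError:
        print('user or movie not present in the training sample')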