How to implement similarity algorithms for a recommendation system in Spark

2025-03-29 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

How can Spark implement the similarity algorithms used in a recommendation system? This article walks through the analysis and a solution in detail, in the hope of giving readers who face this problem a simpler, workable approach.

Collaborative filtering is widely used in recommendation systems and comes in two main forms: user-based and item-based. Both start from a single person or a single item and use its attributes (for a person: gender, age, job, income, preferences, and so on) to find other people or items that are similar to it. In practice, the factors considered are considerably more complex.
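To make "find the people most similar to this person" concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object name `NearestNeighborsSketch`, the helpers `cosine` and `topK`, and the rating data are all hypothetical and chosen only for illustration; representing each user as a vector of item ratings is an assumption, since the article does not fix a data model.

```scala
object NearestNeighborsSketch {
  // Hypothetical helper: cosine similarity between two equal-length rating vectors.
  def cosine(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val norm = math.sqrt(x.map(a => a * a).sum) * math.sqrt(y.map(b => b * b).sum)
    if (norm == 0) Double.NaN else dot / norm
  }

  // Return the k users most similar to `target`, best match first.
  def topK(target: String, ratings: Map[String, Array[Double]], k: Int): Seq[(String, Double)] = {
    val t = ratings(target)
    ratings.toSeq
      .filter { case (user, _) => user != target }      // never recommend the user to themselves
      .map { case (user, v) => (user, cosine(t, v)) }
      .filterNot { case (_, sim) => sim.isNaN }         // drop users with an all-zero vector
      .sortBy { case (_, sim) => -sim }                 // highest similarity first
      .take(k)
  }

  def main(args: Array[String]): Unit = {
    // Made-up ratings: one row per user, one column per item.
    val ratings = Map(
      "alice" -> Array(5.0, 3.0, 0.0, 1.0),
      "bob"   -> Array(4.0, 3.0, 0.0, 1.0),
      "carol" -> Array(0.0, 1.0, 5.0, 4.0)
    )
    topK("alice", ratings, 2).foreach { case (user, sim) => println(f"$user%s $sim%.4f") }
  }
}
```

With this data, bob ranks ahead of carol for alice because their ratings point in nearly the same direction. Item-based filtering would work the same way with one vector per item instead of one per user.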

This article does not cover the underlying mathematics. It gives code for the commonly used similarity measures, and shows more than one way to implement the same measure.

Euclidean distance

    // Assumes Spark MLlib's vector types; the article does not show its imports.
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Convert both vectors to arrays and delegate to the array version.
    def euclidean2(v1: Vector, v2: Vector): Double = {
      require(v1.size == v2.size,
        s"SimilarityAlgorithms:Vector dimensions do not match: Dim(v1)=${v1.size} and Dim(v2)" +
          s"=${v2.size}.")

      val x = v1.toArray
      val y = v2.toArray

      euclidean(x, y)
    }

    // Square root of the sum of squared element-wise differences.
    def euclidean(x: Array[Double], y: Array[Double]): Double = {
      require(x.length == y.length,
        s"SimilarityAlgorithms:Array length do not match: Len(x)=${x.length} and Len(y)" +
          s"=${y.length}.")

      math.sqrt(x.zip(y).map(p => p._1 - p._2).map(d => d * d).sum)
    }

    // Alternative that relies on MLlib's built-in squared distance.
    def euclidean(v1: Vector, v2: Vector): Double = {
      val sqdist = Vectors.sqdist(v1, v2)
      math.sqrt(sqdist)
    }
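As a quick check of the array-based Euclidean distance, the following self-contained sketch runs it on a small input that can be verified by hand. The object name `EuclideanExample` and the `similarity` helper are assumptions added for illustration. Mapping a distance d to a score with 1 / (1 + d) is one common convention and is not something the article prescribes.

```scala
object EuclideanExample {
  // Same computation as the array version: sqrt of the sum of squared differences.
  def euclidean(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "arrays must have the same length")
    math.sqrt(x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum)
  }

  // Assumption: turn a distance in [0, inf) into a similarity in (0, 1].
  // Identical vectors score 1.0; the score shrinks as the distance grows.
  def similarity(x: Array[Double], y: Array[Double]): Double =
    1.0 / (1.0 + euclidean(x, y))

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0, 3.0)
    val b = Array(4.0, 6.0, 3.0)
    // Differences are (-3, -4, 0), so the distance is sqrt(9 + 16 + 0) = 5.0.
    println(euclidean(a, b))
    // 1 / (1 + 5.0)
    println(similarity(a, b))
  }
}
```

A distance is smaller for more similar inputs, while a similarity is larger, so some conversion of this kind is needed before Euclidean distance can be ranked alongside the other measures.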

Pearson correlation coefficient

    def pearsonCorrelationSimilarity(arr1: Array[Double], arr2: Array[Double]): Double = {
      require(arr1.length == arr2.length,
        s"SimilarityAlgorithms:Array length do not match: Len(x)=${arr1.length} and Len(y)" +
          s"=${arr2.length}.")

      val sum_vec1 = arr1.sum
      val sum_vec2 = arr2.sum

      val square_sum_vec1 = arr1.map(x => x * x).sum
      val square_sum_vec2 = arr2.map(x => x * x).sum

      val zipVec = arr1.zip(arr2)

      // Numerator: sum(x * y) - sum(x) * sum(y) / n
      val product = zipVec.map(x => x._1 * x._2).sum
      val numerator = product - (sum_vec1 * sum_vec2 / arr1.length)

      // Denominator: sqrt((sum(x^2) - sum(x)^2 / n) * (sum(y^2) - sum(y)^2 / n))
      val denominator = math.pow(
        (square_sum_vec1 - math.pow(sum_vec1, 2) / arr1.length) *
          (square_sum_vec2 - math.pow(sum_vec2, 2) / arr2.length), 0.5)

      // A constant input has zero variance, so the coefficient is undefined.
      if (denominator == 0) Double.NaN else numerator / (denominator * 1.0)
    }
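In keeping with the article's point that one measure can be implemented in several ways, here is a sketch of the Pearson coefficient in its mean-centered form. It is an alternative formulation, not the article's code: the object name `PearsonExample` and the function name `pearson` are hypothetical. Subtracting the means first avoids the difference of two large sums, which tends to be gentler on floating-point precision.

```scala
object PearsonExample {
  // Mean-centered form:
  //   r = sum((x - mx) * (y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length && x.nonEmpty, "arrays must be non-empty and equally long")
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val dx = x.map(_ - mx)
    val dy = y.map(_ - my)
    val numerator = dx.zip(dy).map { case (a, b) => a * b }.sum
    val denominator = math.sqrt(dx.map(a => a * a).sum * dy.map(b => b * b).sum)
    // Zero variance in either input makes the coefficient undefined.
    if (denominator == 0) Double.NaN else numerator / denominator
  }

  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 2.0, 3.0, 4.0, 5.0)
    println(pearson(x, Array(2.0, 4.0, 6.0, 8.0, 10.0))) // y rises with x: close to 1
    println(pearson(x, Array(10.0, 8.0, 6.0, 4.0, 2.0))) // y falls as x rises: close to -1
    println(pearson(x, Array(3.0, 3.0, 3.0, 3.0, 3.0)))  // constant y: NaN
  }
}
```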

Cosine similarity

    import org.jblas.DoubleMatrix

    /** Cosine similarity implemented with jblas. */
    def cosineSimilarity(v1: DoubleMatrix, v2: DoubleMatrix): Double = {
      require(v1.length == v2.length,
        s"SimilarityAlgorithms:Array length do not match: Len(v1)=${v1.length} and Len(v2)" +
          s"=${v2.length}.")
      v1.dot(v2) / (v1.norm2() * v2.norm2())
    }

    // Convert both vectors to arrays and delegate to the array version.
    def cosineSimilarity(v1: Vector, v2: Vector): Double = {
      require(v1.size == v2.size,
        s"SimilarityAlgorithms:Vector dimensions do not match: Dim(v1)=${v1.size} and Dim(v2)" +
          s"=${v2.size}.")

      val x = v1.toArray
      val y = v2.toArray

      cosineSimilarity(x, y)
    }

    def cosineSimilarity(x: Array[Double], y: Array[Double]): Double = {
      require(x.length == y.length,
        s"SimilarityAlgorithms:Array length do not match: Len(x)=${x.length} and Len(y)" +
          s"=${y.length}.")

      // Dot product divided by the product of the two L2 norms.
      val member = x.zip(y).map(d => d._1 * d._2).sum
      val temp1 = math.sqrt(x.map(math.pow(_, 2)).sum)
      val temp2 = math.sqrt(y.map(math.pow(_, 2)).sum)

      val denominator = temp1 * temp2
      // An all-zero vector has no direction, so the similarity is undefined.
      if (denominator == 0) Double.NaN else member / (denominator * 1.0)
    }
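The array-based cosine similarity can also be written as a single pass over both arrays, which avoids the intermediate collections that `zip` and `map` allocate. The sketch below is one such variant under assumed names (`CosineExample`, `cosine`); it is meant to illustrate the measure's behavior rather than to replace the article's code.

```scala
object CosineExample {
  // Single-pass cosine similarity: accumulate the dot product and both
  // squared norms in one loop instead of building temporary arrays.
  def cosine(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "arrays must have the same length")
    var dot = 0.0
    var normX = 0.0
    var normY = 0.0
    var i = 0
    while (i < x.length) {
      dot += x(i) * y(i)
      normX += x(i) * x(i)
      normY += y(i) * y(i)
      i += 1
    }
    val denominator = math.sqrt(normX) * math.sqrt(normY)
    if (denominator == 0) Double.NaN else dot / denominator
  }

  def main(args: Array[String]): Unit = {
    println(cosine(Array(1.0, 0.0), Array(0.0, 1.0)))              // orthogonal vectors: 0.0
    println(cosine(Array(1.0, 2.0, 3.0), Array(10.0, 20.0, 30.0))) // same direction: close to 1
    println(cosine(Array(0.0, 0.0), Array(1.0, 2.0)))              // zero vector: NaN
  }
}
```

The second call shows that cosine similarity depends only on direction: scaling a vector by 10 leaves the score essentially unchanged, whereas the Euclidean distance between the same two vectors would be large.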

Adjusted cosine similarity

    // jblas version: subtract the mean of all values in both vectors, then take the cosine.
    def adjustedCosineSimJblas(x: DoubleMatrix, y: DoubleMatrix): Double = {
      require(x.length == y.length,
        s"SimilarityAlgorithms:DoubleMatrix length do not match: Len(x)=${x.length} and Len(y)" +
          s"=${y.length}.")

      val avg = (x.sum() + y.sum()) / (x.length + y.length)
      val v1 = x.sub(avg)
      val v2 = y.sub(avg)
      v1.dot(v2) / (v1.norm2() * v2.norm2())
    }

    // Wrap both arrays in jblas matrices and delegate to the version above.
    def adjustedCosineSimJblas(x: Array[Double], y: Array[Double]): Double = {
      require(x.length == y.length,
        s"SimilarityAlgorithms:Array length do not match: Len(x)=${x.length} and Len(y)" +
          s"=${y.length}.")

      val v1 = new DoubleMatrix(x)
      val v2 = new DoubleMatrix(y)

      adjustedCosineSimJblas(v1, v2)
    }

    // Convert both vectors to arrays and delegate to the array version.
    def adjustedCosineSimilarity(v1: Vector, v2: Vector): Double = {
      require(v1.size == v2.size,
        s"SimilarityAlgorithms:Vector dimensions do not match: Dim(v1)=${v1.size} and Dim(v2)" +
          s"=${v2.size}.")
      val x = v1.toArray
      val y = v2.toArray

      adjustedCosineSimilarity(x, y)
    }

    def adjustedCosineSimilarity(x: Array[Double], y: Array[Double]): Double = {
      require(x.length == y.length,
        s"SimilarityAlgorithms:Array length do not match: Len(x)=${x.length} and Len(y)" +
          s"=${y.length}.")

      // Mean of all values in both arrays.
      val avg = (x.sum + y.sum) / (x.length + y.length)

      val member = x.map(_ - avg).zip(y.map(_ - avg)).map(d => d._1 * d._2).sum

      val temp1 = math.sqrt(x.map(num => math.pow(num - avg, 2)).sum)
      val temp2 = math.sqrt(y.map(num => math.pow(num - avg, 2)).sum)

      val denominator = temp1 * temp2
      if (denominator == 0) Double.NaN else member / (denominator * 1.0)
    }
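The effect of the mean adjustment is easiest to see by comparing it with plain cosine similarity on the same data. The sketch below follows the article's variant, which subtracts the mean of all values in both arrays; the object and function names are hypothetical, and the ratings are made up. Note that in the item-based collaborative filtering literature, "adjusted cosine" usually means subtracting each user's own mean rating, so treat this as a sketch of the variant shown here rather than of that definition.

```scala
object AdjustedCosineExample {
  // Plain cosine similarity: dot product over the product of L2 norms.
  def cosine(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "arrays must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val denominator = math.sqrt(x.map(a => a * a).sum) * math.sqrt(y.map(b => b * b).sum)
    if (denominator == 0) Double.NaN else dot / denominator
  }

  // The article's variant: center both arrays on their combined mean, then take the cosine.
  def adjustedCosine(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length && x.nonEmpty, "arrays must be non-empty and equally long")
    val avg = (x.sum + y.sum) / (x.length + y.length)
    cosine(x.map(_ - avg), y.map(_ - avg))
  }

  def main(args: Array[String]): Unit = {
    // Made-up ratings on a 1 to 5 scale: one rater likes everything, the other dislikes it.
    val likes = Array(5.0, 5.0, 4.0)
    val dislikes = Array(1.0, 1.0, 2.0)
    println(cosine(likes, dislikes))         // high (above 0.9), because all ratings are positive
    println(adjustedCosine(likes, dislikes)) // close to -1 once the mean of 3.0 is removed
  }
}
```

Because ratings are never negative, plain cosine similarity rates these opposite opinions as similar. Centering on the mean lets values below average count against values above it.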

If your own business logic has similar requirements, you can adapt or optimize the code above for your scenario. Many algorithm frameworks already wrap these similarity measures inside the algorithms they provide, but the underlying computation is the same, so knowing it helps you understand those frameworks better. For example, Spark MLlib's KMeans implementation computes Euclidean distance under the hood.

That covers how to implement similarity algorithms for a recommendation system in Spark. I hope the content above is of some help to you.
