K-Means is a distance-based clustering algorithm: it iteratively computes K cluster centers and assigns each point to the class of its nearest center.
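Formally, with clusters C_1, ..., C_K and cluster means mu_i, K-Means minimizes the within-cluster sum of squared distances:

\[ \min_{C_1,\dots,C_K} \; \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \]

This quantity is exactly the "Within Set Sum of Squared Errors" reported in the running result below.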
MLlib implements K-Means by executing the algorithm several times over the same data; each execution is called a run, and the cluster centers of the best run are returned. The initial cluster centers can be chosen at random or obtained via k-means|| initialization, and the algorithm ends when it reaches a set number of iterations or when all runs have converged.
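As a sketch of how these options look in the MLlib 1.6 API (the article itself uses the simpler KMeans.train() shortcut shown later; the name trainBest and its argument are illustrative), the run count, initialization mode, and convergence threshold can be set explicitly on a KMeans instance:

import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Sketch: explicit K-Means configuration instead of the KMeans.train() shortcut
def trainBest(points: RDD[Vector]): KMeansModel =
  new KMeans()
    .setK(2)                                        // number of clusters
    .setMaxIterations(20)                           // iteration cap per run
    .setRuns(5)                                     // independent runs; the lowest-cost model wins
    .setInitializationMode(KMeans.K_MEANS_PARALLEL) // k-means|| initialization (or KMeans.RANDOM)
    .setEpsilon(1e-4)                               // minimum center movement to keep iterating
    .run(points)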
To implement the K-Means algorithm with Spark, first modify the pom file to introduce the MLlib machine-learning dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
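If the project is built with sbt rather than Maven, the equivalent dependency line (assuming the same Spark 1.6.0 / Scala 2.10 versions as above) would be:

libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.6.0"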
Code:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object Kmeans {
  def main(args: Array[String]): Unit = {
    // Silence noisy logs
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.jetty.server").setLevel(Level.OFF)

    // Set the runtime environment
    val conf = new SparkConf()
      .setAppName("K-Means")
      .setMaster("spark://master:7077")
      .setJars(Seq("E:\\Intellij\\Projects\\SimpleGraphX\\SimpleGraphX.jar"))
    val sc = new SparkContext(conf)

    // Load the dataset
    val data = sc.textFile("hdfs://master:9000/kmeans_data.txt", 1)
    val parsedData = data.map(s => Vectors.dense(s.split(" ").map(_.toDouble)))

    // Cluster the data into two classes with 20 iterations to build the model
    val numClusters = 2
    val numIterations = 20
    val model = KMeans.train(parsedData, numClusters, numIterations)

    // The central points of the data model
    println("Cluster centres:")
    for (c <- model.clusterCenters) println("  " + c)

    // Cost of the clustering: within set sum of squared errors
    println("Within Set Sum of Squared Errors = " + model.computeCost(parsedData))

    // Report which cluster each input point belongs to
    data.map { line =>
      val linevectore = Vectors.dense(line.split(" ").map(_.toDouble))
      val prediction = model.predict(linevectore)
      "Vectors " + line + " is belong to cluster:" + prediction
    }.foreach(println)

    sc.stop()
  }
}
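The contents of kmeans_data.txt are not shown in the original article; judging from the points echoed in the running result below, the file holds one space-separated three-dimensional point per line, for example:

7.3 1.5 10.9
4.2 11.2 2.7
18.0 4.5 3.8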
Use the textFile() method to load the dataset and get an RDD, then use the KMeans.train() method to build a KMeans model from the RDD, the K value, and the number of iterations. Once you have the model, you can determine which class a set of data belongs to: generate a Vector with the Vectors.dense() method, then pass it to the model's predict() method, which returns the index of the cluster it belongs to.
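As a minimal usage sketch (assuming the model and imports from the code above are in scope; the point values here are made up for illustration), classifying a single new point looks like this:

// Hypothetical new point; predict returns the index of its nearest cluster center
val newPoint = Vectors.dense(9.0, 9.0, 9.0)
val cluster = model.predict(newPoint)
println("Vectors " + newPoint + " is belong to cluster:" + cluster)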
Running result:
Cluster centres:
[...]
[...]
Within Set Sum of Squared Errors = 943.2074999999998
Vectors 7.3 1.5 10.9 is belong to cluster:0
Vectors 4.2 11.2 2.7 is belong to cluster:0
Vectors 18.0 4.5 3.8 is belong to cluster:1
Vectors 0.0 0.0 5.0 is belong to cluster:0
Vectors 0.1 10.1 0.1 is belong to cluster:0
Vectors 1.2 5.2 13.5 is belong to cluster:0
Vectors 9.5 9.0 9.0 is belong to cluster:0
Vectors 9.1 9.1 9.1 is belong to cluster:0
Vectors 19.2 9.4 29.2 is belong to cluster:0
Vectors 5.8 3.0 18.0 is belong to cluster:0
Vectors 3.5 12.2 60.0 is belong to cluster:1
Vectors 3.6 7.9 8.1 is belong to cluster:0
Summary
That is all the content of this article's code example for implementing the K-Means algorithm with Spark; I hope it helps you. Interested readers can continue with related articles on this site, such as "Seven Common Hadoop and Spark Project Cases", "Spark Broadcast Variables and Accumulators Usage Code Examples", and "Introduction to Spark". If anything is lacking, feel free to leave a message; the editor will reply and make corrections in time. Thank you for supporting this site!