
How to use Spark MLlib clustering: KMeans


This article mainly explains how to use KMeans clustering in Spark MLlib. The content is straightforward and clearly organized; I hope it resolves any doubts you may have, so follow along and study it with the editor.

Clustering usage scenarios

Data clustering is a technique for static data analysis that is widely used in many fields, including machine learning, data mining, pattern recognition, image analysis, information retrieval, and bioinformatics.

The running code is as follows:

package spark.clustering

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkContext, SparkConf}

/**
 * Generally speaking, classification refers to supervised learning: the samples to be
 * classified are labeled and the categories are known in advance.
 * Clustering refers to unsupervised learning: the samples are unlabeled and are grouped
 * into K classes according to some measure of similarity.
 *
 * KMeans: the basic idea is to pick k centers at random when the algorithm starts, assign
 * each sample point to its nearest center, then recompute each cluster's center as the mean
 * of its points to obtain the new center positions. This is repeated until the samples in
 * each cluster satisfy a convergence threshold (or the iteration limit is reached).
 *
 * Created by eric on 16-7-21.
 */
object Kmeans {
  val conf = new SparkConf()          // create the Spark configuration
    .setMaster("local")               // run locally
    .setAppName("KMeans")             // set the application name
  val sc = new SparkContext(conf)

  def main(args: Array[String]) {
    val data = sc.textFile("./src/main/spark/clustering/kmeans.txt")
    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

    val numClusters = 2               // number of clusters
    val numIterations = 20            // maximum number of iterations
    val model = KMeans.train(parsedData, numClusters, numIterations)

    model.clusterCenters.foreach(println)  // cluster centers
    // [1.4000000000000001,2.0]
    // [3.6666666666666665,3.6666666666666665]
  }
}

kmeans.txt:

1 2
1 1
1 3
2 2
3 4
4 3
2 2
4 4
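After training, the model can also be evaluated and queried. As a minimal sketch (assuming it is placed inside main right after KMeans.train in the example above, so model and parsedData are in scope), MLlib's KMeansModel provides computeCost to measure clustering quality and predict to assign a point to its nearest center:

    // Minimal sketch: evaluate the trained model and classify a new point.
    // Assumes `model` and `parsedData` from the example above are in scope.
    val wssse = model.computeCost(parsedData)  // within-set sum of squared errors (lower = tighter clusters)
    println(s"Within Set Sum of Squared Errors = $wssse")

    val cluster = model.predict(Vectors.dense(1.0, 2.0))  // index of the nearest cluster center
    println(s"Point (1.0, 2.0) falls in cluster $cluster")

For the data file above, the point (1.0, 2.0) is much closer to the center [1.4, 2.0] than to [3.67, 3.67], so predict returns the index of that first cluster.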

The above is the full content of "How to use Spark MLlib clustering: KMeans". Thank you for reading! I hope the shared content has been helpful; if you want to learn more, you are welcome to follow the industry information channel.
