Updated 2025-04-11. From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 report.
This article explains how to use Mahout k-means by running the k-means example that ships with Mahout and inspecting its output.
Mahout is an open source project of the Apache Software Foundation (ASF).
It provides scalable implementations of classical machine learning algorithms, aiming to help developers build intelligent applications more easily and quickly.
Mahout includes implementations for clustering, classification, recommendation (collaborative filtering), and frequent itemset mining.
In addition, by using the Apache Hadoop library, Mahout can scale effectively to the cloud.
Run the k-means example that ships with Mahout to verify that Mahout is working properly.
Prepare the test data: download the data file.
Put the file, synthetic_control.data, in the $MAHOUT_HOME directory.
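Before uploading, it can help to sanity-check the data file locally. The sketch below is plain Python, not part of Mahout. It assumes the file holds one sample per line as whitespace-separated numbers, which is an assumption about the format rather than something stated in this article. The function names and the inline sample values are hypothetical.

```python
from typing import List


def parse_vectors(text: str) -> List[List[float]]:
    """Parse whitespace-separated numeric rows into vectors, skipping blank lines."""
    vectors = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            vectors.append([float(field) for field in fields])
    return vectors


def common_dimension(vectors: List[List[float]]) -> int:
    """Return the shared row length; raise ValueError if empty or ragged."""
    if not vectors:
        raise ValueError("no data rows found")
    sizes = {len(vector) for vector in vectors}
    if len(sizes) != 1:
        raise ValueError(f"rows have differing lengths: {sorted(sizes)}")
    return sizes.pop()


# Hypothetical sample standing in for a few lines of the real file.
sample = "28.78 34.46 31.33\n\n24.89 25.74 27.55\n"
rows = parse_vectors(sample)
print(len(rows), "rows of dimension", common_dimension(rows))
```

To check the real file, read its contents with `open("synthetic_control.data").read()` and pass the text to `parse_vectors`.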
[hdfs@cloudra ~]$ hadoop fs -mkdir testdata
[hdfs@cloudra root]$ hadoop fs -mkdir /output
[hdfs@cloudra ~]$ hadoop fs -put synthetic_control.data testdata
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
You have new mail in /var/spool/mail/root
/usr/java/default
export JAVA_HOME=/usr/java/jdk1.7.0_79
[hdfs@cloudra ~]$ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Or: hadoop jar mahout-distribution-0.7/mahout-examples-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
[root@localhost mahout-distribution-0.9]# hadoop fs -mkdir /user/root/testdata
16-11-23 05:28:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable
mkdir: `/user/root/testdata': No such file or directory
[root@localhost mahout-distribution-0.9]# hadoop fs -mkdir -p /user/root/testdata
16-11-23 05:28:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable
[root@localhost mahout-distribution-0.9]# ls
bin lib mahout-examples-0.9.jar NOTICE.txt
conf LICENSE.txt mahout-examples-0.9-job.jar README.txt
docs mahout-core-0.9.jar mahout-integration-0.9.jar
examples mahout-core-0.9-job.jar mahout-math-0.9.jar
[root@localhost mahout-distribution-0.9]# cd ..
[root@localhost soft]# cd ..
[root@localhost ~]# cd -
/root/soft
[root@localhost soft]# ls
data hadoop-2.6.0 jdk1.7.0_79 mahout-distribution-0.9
[root@localhost soft]# cd data
[root@localhost data]# ls
synthetic_control.data
[root@localhost data]# hadoop fs -put /user/root/testdata
16-11-23 05:29:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable
put: `/user/root/testdata': No such file or directory
[root@localhost data]# hadoop fs -put synthetic_control.data /user/root/testdata
16-11-23 05:29:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable
[root@localhost data]# ls
synthetic_control.data
[root@localhost data]# cd ..
[root@localhost soft]# ls
data hadoop-2.6.0 jdk1.7.0_79 mahout-distribution-0.9
[root@localhost soft]# cd mahout-distribution-0.9/
[root@localhost mahout-distribution-0.9]# ls
bin lib mahout-examples-0.9.jar NOTICE.txt
conf LICENSE.txt mahout-examples-0.9-job.jar README.txt
docs mahout-core-0.9.jar mahout-integration-0.9.jar
examples mahout-core-0.9-job.jar mahout-math-0.9.jar
[root@localhost mahout-distribution-0.9]# hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
16-11-23 05:30:30 INFO kmeans.Job: Running with default arguments
16-11-23 05:30:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... Using builtin-java classes where applicable
16-11-23 05:30:40 INFO kmeans.Job: Preparing Input
16-11-23 05:30:41 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/127.0.0.1:8032
16-11-23 05:30:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16-11-23 05:30:46 INFO input.FileInputFormat: Total input paths to process: 1
16-11-23 05:30:46 INFO mapreduce.JobSubmitter: number of splits:1
16-11-23 05:30:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479907436985_0002
16-11-23 05:30:49 INFO impl.YarnClientImpl: Submitted application application_1479907436985_0002
16-11-23 05:30:49 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1479907436985_0002/
16-11-23 05:30:49 INFO mapreduce.Job: Running job: job_1479907436985_0002
16-11-23 05:31:40 INFO mapreduce.Job: Job job_1479907436985_0002 running in uber mode: false
16-11-23 05:31:40 INFO mapreduce.Job: map 0% reduce 0%
mahout seqdumper converts a SequenceFile into readable text; its source file is org.apache.mahout.utils.SequenceFileDumper.java.
A vector file is converted into readable text by org.apache.mahout.utils.vectors.VectorDumper.java.
mahout clusterdump analyzes the final clustering output; its source file is org.apache.mahout.utils.clustering.ClusterDumper.java.
[root@localhost bin]# mahout seqdumper -s output/clusters-5/part-r-00000 -o /txt.data
mahout clusterdump --seqFileDir /user/root/output/clusters-10-final --pointsDir /user/root/output/clusteredPoints --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
Mahout covers three main areas: clustering, collaborative filtering (item and user recommendation), and classification (for example, Bayesian).
Clustering, also known as cluster analysis, is both a statistical method for studying how samples or indicators can be grouped and an important data mining technique.
Cluster analysis operates on patterns; a pattern is usually a vector of measurements, or equivalently a point in multi-dimensional space.
Cluster analysis is based on similarity: patterns in the same cluster are more similar to one another than to patterns in other clusters.
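The similarity idea can be made concrete with a small plain-Python sketch (not Mahout code). It uses Euclidean distance as the measure, where a smaller distance means greater similarity, and compares the average distance inside one cluster with the average distance between two clusters. The points and function names are hypothetical.

```python
import math
from itertools import combinations, product
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_intra_distance(cluster) -> float:
    """Average distance over all pairs inside one cluster (needs >= 2 points)."""
    pairs = list(combinations(cluster, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def mean_inter_distance(first, second) -> float:
    """Average distance over all pairs that span two non-empty clusters."""
    pairs = list(product(first, second))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D points forming two well-separated groups.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

intra = mean_intra_distance(cluster_a)
inter = mean_inter_distance(cluster_a, cluster_b)
print(f"intra-cluster: {intra:.3f}, inter-cluster: {inter:.3f}")
```

For a good clustering, the intra-cluster average should be clearly smaller than the inter-cluster average.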
Clustering has a wide range of uses.
In business, clustering can help market analysts identify distinct consumer groups in a customer database and summarize the consumption patterns or habits of each group.
Clustering algorithms: Canopy clustering, K-means clustering, Fuzzy K-means, expectation maximization (EM) clustering,
mean shift clustering, hierarchical clustering, Dirichlet process clustering,
and latent Dirichlet allocation (LDA) clustering.
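Of the algorithms listed, K-means is the one the example job runs. The following is a minimal single-machine sketch of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not Mahout's MapReduce implementation; the fixed initial centroids, the sample points, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; enough for nearest-centroid comparisons."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points: List[Point], labels: List[int], centroids: List[Point]) -> List[Point]:
    """Move each centroid to the mean of its members; keep it if it has none."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            moved.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            moved.append(old)
    return moved


def kmeans(points: List[Point], initial_centroids: List[Point], max_iterations: int = 10):
    """Iterate assignment and update until labels stop changing."""
    centroids = list(initial_centroids)
    labels = assign(points, centroids)
    for _ in range(max_iterations):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical data: two visibly separate groups, seeded with one point from each.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, [points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the initial centroids; here they are chosen by hand so the run is deterministic.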
Classification labels objects according to a given standard, and then distinguishes and groups them by label.
The categories are defined in advance and their number is fixed, for example sorting soybeans and mung beans by color and size.
Classification algorithms: logistic regression, Bayesian, support vector machines, perceptron and Winnow,
neural networks, random forests,
and restricted Boltzmann machines.
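To illustrate classification with labels fixed in advance, here is a minimal plain-Python sketch of the classic perceptron learning rule applied to the bean example. It is not Mahout's implementation. The two features (size in millimetres and a "greenness" score), their values, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: List[Sequence[float]], labels: List[int], max_epochs: int = 1000
) -> Tuple[List[float], float]:
    """Train on labels of +1 or -1; stop early once an epoch makes no mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:
                # Misclassified: nudge the boundary toward the correct side.
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the learned boundary."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Hypothetical training data: (size_mm, greenness); +1 = soybean, -1 = mung bean.
beans = [(7.0, 0.2), (8.0, 0.1), (3.0, 0.9), (4.0, 0.8)]
kinds = [1, 1, -1, -1]

weights, bias = train_perceptron(beans, kinds)
print(predict(weights, bias, (7.5, 0.15)))  # a large, pale bean
print(predict(weights, bias, (3.5, 0.85)))  # a small, green bean
```

Unlike clustering, the two categories here are known before training, and the model only learns where the boundary between them lies.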
Collaborative filtering
Recommendation systems (product recommendation, user recommendation).
Recommendation / collaborative filtering: non-distributed recommenders (Taste: user-based CF, item-based CF, Slope One) and distributed recommenders (item-based CF).
Vector similarity calculation: RowSimilarityJob (calculates similarity between columns) and VectorDistanceJob (calculates distances between vectors).
Non-MapReduce algorithms: hidden Markov models.
Collection method extensions: Collections, which extends Java's collection classes.
Association rule mining: parallel FP-Growth algorithm.
Regression: locally weighted linear regression.
Dimensionality reduction: stochastic singular value decomposition, principal component analysis, independent component analysis,
and Gaussian discriminant analysis.
Evolutionary algorithms: parallelization of the Watchmaker framework.
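The item-based collaborative filtering and vector similarity entries above rest on comparing vectors. The sketch below computes cosine similarity between the item columns of a small rating matrix in plain Python. It is a single-machine illustration, not Mahout's RowSimilarityJob, and the ratings and function names are hypothetical. Treating 0 as "not rated" is a simplification.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def item_similarities(matrix: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between the columns (items) of a user-item matrix."""
    columns = [list(column) for column in zip(*matrix)]
    count = len(columns)
    return [
        [cosine_similarity(columns[i], columns[j]) for j in range(count)]
        for i in range(count)
    ]


# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = [
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 1.0],
    [1.0, 0.0, 5.0],
]

similarities = item_similarities(ratings)
for row in similarities:
    print([round(value, 3) for value in row])
```

Items 0 and 1 are rated alike by the same users, so they come out far more similar to each other than either is to item 2. An item-based recommender would use such scores to suggest item 1 to a user who liked item 0.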