2025-03-31 Update From: SLTechnology News & Howtos
A Brief Introduction to Mahout
Mahout is an open source project under the Apache Software Foundation (ASF) that provides scalable implementations of classical machine learning algorithms, designed to help developers create intelligent applications more easily and quickly. The Apache Mahout project is in its third year and now has three public releases. Mahout includes implementations of clustering, classification, recommendation (collaborative filtering), and frequent itemset mining. In addition, Mahout scales out to the cloud by building on the Apache Hadoop library.
(Figure: the project logo, a mahout riding on an elephant's head)
Machine learning algorithms implemented in Mahout:

Classification: Logistic Regression, Bayesian, SVM (support vector machines), Perceptron, Neural Network, Random Forests, Restricted Boltzmann Machines
Clustering: Canopy Clustering, K-means Clustering, Fuzzy K-means, Expectation Maximization (EM) clustering, Mean Shift Clustering, Hierarchical Clustering, Dirichlet Process Clustering, Latent Dirichlet Allocation (LDA), Spectral Clustering
Association rule mining: Parallel FP-Growth
Regression: Locally Weighted Linear Regression
Dimensionality reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Independent Component Analysis (ICA), Gaussian Discriminative Analysis
Evolutionary algorithms: parallelized Watchmaker framework
Recommendation / collaborative filtering: non-distributed recommenders (Taste: UserCF, ItemCF, SlopeOne); distributed recommenders (ItemCF)
Vector similarity: RowSimilarityJob (pairwise similarity between the rows of a matrix), VectorDistanceJob (distances between vectors)
Non-MapReduce algorithms: Hidden Markov Models
Collections: extensions to Java's Collections classes
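The Taste recommenders listed above are user- and item-based collaborative filtering. The core idea of item-based CF (ItemCF), scoring items by how similar their rating columns are, can be sketched in plain Python. This is an illustration of the technique only, not Mahout's Taste API; the ratings matrix below is made up.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# toy data: rows = users, columns = items; 0 means "not rated"
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
]

def item_similarity(ratings, i, j):
    """Similarity between items i and j, comparing their rating columns."""
    col_i = [row[i] for row in ratings]
    col_j = [row[j] for row in ratings]
    return cosine(col_i, col_j)

# items 0 and 1 are rated alike by the same users, so they come out
# much more similar to each other than either is to item 2
sim_01 = item_similarity(ratings, 0, 1)
sim_02 = item_similarity(ratings, 0, 2)
```

An ItemCF recommender then suggests, for each user, the unrated items most similar to the items that user already rated highly.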
Method 1. Installation and configuration of Mahout
1. Download Mahout from:
http://archive.apache.org/dist/mahout/
2. Decompress
tar -zxvf mahout-distribution-0.9.tar.gz
3. Configure environment variables
3.1. Configure the Mahout environment variables
# set mahout environment
export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
3.2. Configure the Hadoop environment variables required by Mahout
# set hadoop environment
export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null
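Before running any Mahout command, it is worth sanity-checking that the variables exported above are set and point at real directories. The following small Python helper is a hypothetical convenience, not part of Mahout or Hadoop:

```python
import os

def check_env(env, required=("MAHOUT_HOME", "HADOOP_HOME")):
    """Return a list of problems found in the given environment mapping.

    An empty list means every required variable is set and points at an
    existing directory.
    """
    problems = []
    for name in required:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name}={path} is not a directory")
    return problems

# Example with a fake environment (no real installation needed):
problems = check_env({"MAHOUT_HOME": os.getcwd(),
                      "HADOOP_HOME": "/does/not/exist"})
# only HADOOP_HOME is reported, since the current directory exists
```

In practice you would call `check_env(os.environ)` after sourcing your profile.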
4. Verify that Mahout installed successfully
Execute the command mahout. If a list of available algorithms is printed, the installation succeeded.
5. Entry-level use of Mahout
5.1. Start Hadoop
5.2. Download test data
Download synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/
5.3. Upload the test data
hadoop fs -put synthetic_control.data /user/root/testdata
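synthetic_control.data is a plain-text file in which each line is one time series of whitespace-separated numbers. For intuition, here is a minimal Python sketch of how such a file is parsed into vectors before clustering; the Mahout job performs this conversion itself, so this is illustration only (the three-column sample below is made up, the real file has 60 values per line):

```python
def parse_vectors(text):
    """Parse whitespace-separated numeric rows into lists of floats."""
    vectors = []
    for line in text.strip().splitlines():
        if line.strip():
            vectors.append([float(x) for x in line.split()])
    return vectors

sample = """
28.7812 34.4632 31.3381
24.8923 25.7410 27.5532
"""
vectors = parse_vectors(sample)
# two vectors of three values each in this toy sample
```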
5.4. Run the k-means clustering algorithm in Mahout, executing the command:
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
The clustering takes roughly nine minutes to complete.
5.5. View the clustering results
Execute hadoop fs -ls /user/root/output to view the clustering results.
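The Mahout job above runs a distributed k-means over the uploaded vectors. For intuition, the core of plain (non-distributed) k-means can be sketched in a few lines of Python; this illustrates the algorithm itself, not Mahout's MapReduce implementation, and the sample points are made up:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])),
            )
        # update step: move each centroid to the mean of its cluster
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return centroids, assignments

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, labels = kmeans(points, 2)
# the two nearby pairs end up in the same cluster
```

In Mahout's version, the assignment and update steps become map and reduce phases over HDFS data, which is what makes the algorithm scale.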
Method 2. Installation and configuration of Mahout
Mahout is an application layer on top of Hadoop, so Hadoop must be installed first. There are many Hadoop installation guides online and the process is not complicated, so it is not repeated here; this section only explains how to install Mahout.
1. Download the binary package and decompress it.
Download from http://labs.renren.com/apache-mirror/mahout/0.7; I chose the binary package, which only needs to be decompressed.
hadoop@ubuntu:~$ tar -zxvf mahout-distribution-0.7.tar.gz
2. Configure environment variables: add the following lines to /etc/profile and /home/hadoop/.bashrc
# set java / hadoop / mahout environment
MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
PIG_HOME=/home/hadoop/pig-0.9.2
HBASE_HOME=/home/hadoop/hbase-0.94.3
HIVE_HOME=/home/hadoop/hive-0.9.0
HADOOP_HOME=/home/hadoop/hadoop-1.1.1
JAVA_HOME=/home/hadoop/jdk1.7.0
PATH=$JAVA_HOME/bin:$PIG_HOME/bin:$MAHOUT_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/conf:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$HBASE_HOME/lib:$MAHOUT_HOME/lib:$PIG_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/tools.jar
export MAHOUT_HOME
export PIG_HOME
export HBASE_HOME
export HADOOP_HOME
export JAVA_HOME
export HIVE_HOME
export PATH
export CLASSPATH
3. Start Hadoop; a pseudo-distributed setup is fine for testing.
4. Run mahout --help to check that Mahout is installed properly; a list of algorithms should be printed.
5. Prepare to use Mahout:
a. Download the file synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data and place it in the $MAHOUT_HOME directory.
b. Start Hadoop: $HADOOP_HOME/bin/start-all.sh
c. Create the test directory testdata and import the data into it (this example job expects the input directory to be named testdata):
hadoop@ubuntu:~/$ hadoop fs -mkdir testdata
hadoop@ubuntu:~/$ hadoop fs -put /home/hadoop/mahout-distribution-0.7/synthetic_control.data testdata
d. Run the k-means algorithm (this takes a few minutes):
hadoop@ubuntu:~/$ hadoop jar /home/hadoop/mahout-distribution-0.7/mahout-examples-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
e. View the result:
hadoop@ubuntu:~/$ hadoop fs -lsr output
If you see the following entries, the algorithm ran and your installation succeeded:
clusteredPoints clusters-0 clusters-1 clusters-10 clusters-2 clusters-3 clusters-4 clusters-5 clusters-6 clusters-7 clusters-8 clusters-9 data
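Note that the listing above is lexicographic, which is why clusters-10 appears between clusters-1 and clusters-2. When post-processing the output, the final iteration's directory should be picked by sorting the clusters-N names numerically, for example:

```python
def final_cluster_dir(names):
    """Pick the clusters-N directory with the highest iteration number."""
    cluster_dirs = [n for n in names if n.startswith("clusters-")]
    return max(cluster_dirs, key=lambda n: int(n.split("-")[1]))

listing = ["clusteredPoints", "clusters-0", "clusters-1", "clusters-10",
           "clusters-2", "clusters-3", "data"]
final = final_cluster_dir(listing)  # → "clusters-10"
```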