Deploy and install Mahout

A brief introduction to Mahout

Mahout is an open source project of the Apache Software Foundation (ASF) that provides scalable implementations of classic machine learning algorithms, with the goal of helping developers create intelligent applications more easily and quickly. The Apache Mahout project is in its third year and now has three public releases. Mahout includes implementations of clustering, classification, recommendation filtering, and frequent itemset mining. In addition, Mahout can scale effectively to the cloud by building on the Apache Hadoop library.

(The project logo: a mahout riding on the head of an elephant.)

Machine learning algorithms implemented in Mahout, grouped by category:

Classification: Logistic Regression, Bayesian, SVM (support vector machines), Perceptron, Neural Network, Random Forests, Restricted Boltzmann Machines

Clustering: Canopy Clustering, K-means, Fuzzy K-means, Expectation Maximization (EM), Mean Shift Clustering, Hierarchical Clustering, Dirichlet Process Clustering, Latent Dirichlet Allocation (LDA), Spectral Clustering

Association rule mining: Parallel FP-Growth

Regression: Locally Weighted Linear Regression

Dimensionality reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Independent Component Analysis (ICA), Gaussian Discriminative Analysis

Evolutionary algorithms: parallelized Watchmaker framework

Recommendation / collaborative filtering: non-distributed recommenders in Taste (UserCF, ItemCF, Slope One); distributed recommenders (ItemCF)

Vector similarity: RowSimilarityJob (pairwise similarities between the rows of a matrix), VectorDistanceJob (distances between vectors)

Non-MapReduce algorithms: Hidden Markov Models

Collections: extensions of the Java Collections classes

Method 1. Installation and configuration of Mahout

1. Download Mahout

http://archive.apache.org/dist/mahout/

2. Decompression

tar -zxvf mahout-distribution-0.9.tar.gz

3. Configure environment variables

3.1 Configure the Mahout environment variables

# set mahout environment
export MAHOUT_HOME=/home/yujianxin/mahout/mahout-distribution-0.9
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

3.2 Configure the Hadoop environment variables required by Mahout

# set hadoop environment
export HADOOP_HOME=/home/yujianxin/hadoop/hadoop-1.1.2
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null
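
After editing the profile, the new variables only take effect in a new shell unless the file is re-sourced. A minimal check, assuming the exports above were added to /etc/profile (adjust if another profile file is used):

source /etc/profile
echo $MAHOUT_HOME    # should print /home/yujianxin/mahout/mahout-distribution-0.9
which mahout         # should resolve to $MAHOUT_HOME/bin/mahout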

4. Verify that Mahout is installed successfully

Execute the command mahout. If a list of available algorithms is printed, the installation succeeded.
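
If the shell reports that the mahout command is not found, the PATH change above has not taken effect in the current session. In that case the launcher can be run by its absolute path; the standard drivers also accept --help. A minimal sketch, assuming the normal layout of the 0.9 distribution:

$MAHOUT_HOME/bin/mahout    # call the launcher directly, bypassing PATH
mahout kmeans --help       # print the options of a single driver, e.g. k-means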

5. Entry-level use of Mahout

5.1 Start Hadoop

5.2. Download test data

Download synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/
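
The file can also be fetched on the command line; a minimal sketch, assuming wget is available (curl -O works equally well):

wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data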

5.3. Upload test data

hadoop fs -put synthetic_control.data /user/root/testdata
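
The k-means example reads its input from testdata under the HDFS home directory of the current user, so it is worth confirming that the upload landed there:

hadoop fs -ls /user/root/testdata    # synthetic_control.data should be listed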

5.4 Run the k-means clustering algorithm in Mahout by executing the command:

mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It takes about 9 minutes to complete the clustering.

5.5 View clustering results

Execute hadoop fs -ls /user/root/output to view the clustering results.
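
The raw k-means output is stored as Hadoop SequenceFiles, so a plain fs -cat is not very readable. Mahout ships a clusterdump utility that renders the cluster centroids as text; a minimal sketch, assuming the run wrote the usual clusteredPoints directory and that the last iteration's directory is clusters-10-final (adjust the name to whatever hadoop fs -ls /user/root/output actually shows):

mahout clusterdump -i /user/root/output/clusters-10-final -p /user/root/output/clusteredPoints -o clusters.txt
head -n 20 clusters.txt    # clusterdump writes its report to the local file system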

Method 2. Installation and configuration of Mahout

Mahout is an application that runs on top of Hadoop, so Hadoop must be installed first. There are plenty of Hadoop installation guides online and the process is not complicated, so it is not covered here; the following explains how to install Mahout.
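
Since this method reuses an existing Hadoop installation, it is worth confirming that installation is usable before adding Mahout; a quick check using the standard Hadoop 1.x commands:

hadoop version     # prints the installed release, e.g. Hadoop 1.1.1
hadoop fs -ls /    # verifies that HDFS is up and reachable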

1: Download the binary package and decompress it.

Download from http://labs.renren.com/apache-mirror/mahout/0.7. I chose the binary package, which only needs to be decompressed.

hadoop@ubuntu:~$ tar -zxvf mahout-distribution-0.7.tar.gz

2: Configure environment variables: add the following lines to /etc/profile and /home/hadoop/.bashrc

# set java environment
MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
PIG_HOME=/home/hadoop/pig-0.9.2
HBASE_HOME=/home/hadoop/hbase-0.94.3
HIVE_HOME=/home/hadoop/hive-0.9.0
HADOOP_HOME=/home/hadoop/hadoop-1.1.1
JAVA_HOME=/home/hadoop/jdk1.7.0
PATH=$JAVA_HOME/bin:$PIG_HOME/bin:$MAHOUT_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/conf:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$HBASE_HOME/lib:$MAHOUT_HOME/lib:$PIG_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/tools.jar
export MAHOUT_HOME
export PIG_HOME
export HBASE_HOME
export HADOOP_HOME
export JAVA_HOME
export HIVE_HOME
export PATH
export CLASSPATH

3: Start Hadoop; a pseudo-distributed installation is fine for testing

4: mahout --help # check that Mahout is installed properly; a list of algorithms should be printed

5: Prepare to use Mahout

a. Download the file synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data and place it in the $MAHOUT_HOME directory.

b. Start Hadoop: $HADOOP_HOME/bin/start-all.sh
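
Before loading data it is worth confirming that the Hadoop daemons actually came up; a quick check, assuming the JDK's jps tool is on the PATH:

jps    # a pseudo-distributed Hadoop 1.x cluster should show NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker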

c. Create the test directory testdata and import the data into it (the directory name must be testdata)

hadoop@ubuntu:~$ hadoop fs -mkdir testdata
hadoop@ubuntu:~$ hadoop fs -put /home/hadoop/mahout-distribution-0.7/synthetic_control.data testdata

d. Run the k-means algorithm (this takes a few minutes)

hadoop@ubuntu:~$ hadoop jar /home/hadoop/mahout-distribution-0.7/mahout-examples-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
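
Run with no arguments, the Job class falls back on built-in defaults for the input, output and clustering parameters. The same example can also be run with explicit options; the option names below follow the standard Mahout 0.7 driver options and are a sketch to be checked against the driver's own usage message, not a verified command line:

hadoop jar /home/hadoop/mahout-distribution-0.7/mahout-examples-0.7-job.jar \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job \
  --input testdata --output output --numClusters 6 --t1 80 --t2 55 \
  --maxIter 10 --convergenceDelta 0.5 --overwrite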

e. View the result

hadoop@ubuntu:~$ hadoop fs -lsr output

If you see results like the following, the algorithm ran successfully and the installation is working.

clusteredPoints clusters-0 clusters-1 clusters-10 clusters-2 clusters-3 clusters-4 clusters-5 clusters-6 clusters-7 clusters-8 clusters-9 data
