Getting started with Mahout

Shulou (Shulou.com), 06/03. Updated 2025-01-17, SLTechnology News&Howtos.

I. A brief introduction to Mahout

Mahout is a data mining tool and a collection of distributed machine learning algorithms. It includes a distributed collaborative filtering implementation called Taste, as well as classification and clustering algorithms. Mahout's biggest advantage is that it is built on Hadoop: many algorithms that originally ran on a single machine are converted to MapReduce jobs, which greatly increases the amount of data they can handle and improves their performance.

Machine learning algorithms implemented in Mahout:

| Algorithm class | Algorithm name | Name / description |
|---|---|---|
| Classification | Logistic Regression | Logistic regression |
| | Bayesian | Bayes |
| | SVM | Support vector machine |
| | Perceptron | Perceptron algorithm |
| | Neural Network | Neural network |
| | Random Forests | Random forest |
| | Restricted Boltzmann Machines | Restricted Boltzmann machine |
| Clustering | Canopy Clustering | Canopy clustering |
| | K-means Clustering | K-means algorithm |
| | Fuzzy K-means | Fuzzy K-means |
| | Expectation Maximization | EM clustering (expectation maximization clustering) |
| | Mean Shift Clustering | Mean shift clustering |
| | Hierarchical Clustering | Hierarchical clustering |
| | Dirichlet Process Clustering | Dirichlet process clustering |
| | Latent Dirichlet Allocation | LDA clustering |
| | Spectral Clustering | Spectral clustering |
| Association rule mining | Parallel FP Growth Algorithm | Parallel FP Growth algorithm |
| Regression | Locally Weighted Linear Regression | Locally weighted linear regression |
| Dimensionality reduction | Singular Value Decomposition | Singular value decomposition |
| | Principal Components Analysis | Principal component analysis |
| | Independent Component Analysis | Independent component analysis |
| | Gaussian Discriminative Analysis | Gaussian discriminant analysis |
| Evolutionary algorithms | Parallelized Watchmaker framework | |
| Recommendation / collaborative filtering | Non-distributed recommenders | Taste (UserCF, ItemCF, SlopeOne) |
| | Distributed recommenders | ItemCF |
| Vector similarity calculation | RowSimilarityJob | Computes pairwise similarity between the rows of a matrix |
| | VectorDistanceJob | Computes the distance between vectors |
| Non-MapReduce algorithms | Hidden Markov Models | Hidden Markov model |
| Collections extensions | Collections | Extends the Java Collections classes |

II. Installation and configuration of Mahout

1. Download Mahout

http://archive.apache.org/dist/mahout/

2. Extract the archive

tar -zxvf mahout-distribution-0.9.tar.gz

3. Configure environment variables

Configure the Mahout environment variables:

# set mahout environment

export MAHOUT_HOME=/usr/local/mahout-distribution-0.9

export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf

export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

4. Verify that Mahout is installed successfully

Run the mahout command. If it prints a list of available algorithms, the installation succeeded.
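If the mahout command is not found, the environment variables are the most likely cause. The following is a minimal sketch of a check, assuming a POSIX shell. The function name check_mahout_env is hypothetical and is not part of Mahout; it only tests that MAHOUT_HOME is set and contains an executable bin/mahout launcher.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): checks that MAHOUT_HOME is set
# and that it contains an executable bin/mahout launcher script.
check_mahout_env() {
    if [ -z "${MAHOUT_HOME:-}" ]; then
        echo "MAHOUT_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$MAHOUT_HOME/bin/mahout" ]; then
        echo "no executable found at $MAHOUT_HOME/bin/mahout" >&2
        return 1
    fi
    echo "Mahout launcher found: $MAHOUT_HOME/bin/mahout"
}
```

After sourcing the function into the current shell, it can be chained with the real command, for example check_mahout_env && mahout.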

5. Entry-level use of Mahout

5.1. Start Hadoop

5.2. Download test data

Download the file synthetic_control.data from http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data and place it in the $MAHOUT_HOME directory.
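Before uploading, it can be worth checking that the download is complete and is not an HTML error page. The sketch below assumes the file is whitespace-separated numeric text; describe_data is a hypothetical helper name introduced here. It prints the number of non-empty rows followed by the smallest and largest number of fields per row.

```shell
#!/bin/sh
# Hypothetical helper: prints "<rows> <min_fields> <max_fields>" for a
# whitespace-separated data file; empty lines are ignored.
describe_data() {
    awk 'NF > 0 {
             rows++
             if (min == "" || NF < min) min = NF
             if (NF > max) max = NF
         }
         END { print rows + 0, min + 0, max + 0 }' "$1"
}
```

Usage: describe_data synthetic_control.data. The UCI description of this dataset lists 600 series of 60 points each, so a complete download should report the same minimum and maximum field count; treat those exact figures as an assumption to confirm against the dataset page.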

5.3. Upload test data

Create the test directory testdata in HDFS and upload the data into it. The directory must be named testdata, because the example job looks for its input there.

hadoop fs -mkdir -p /user/root/testdata

hadoop fs -put synthetic_control.data /user/root/testdata

5.4. Run the k-means clustering algorithm in Mahout with the command:

mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It takes about 5 minutes to complete the clustering.

5.5. View clustering results

Run hadoop fs -ls /user/root/output to view the clustering results.
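Steps 5.3 to 5.5 can be collected into one script. The following is only a sketch: DRY_RUN, DATA_FILE and HDFS_USER_DIR are hypothetical variables introduced here, not Hadoop or Mahout settings. By default the script prints the commands instead of running them, so it can be reviewed safely on a machine without a running cluster.

```shell
#!/bin/sh
# Sketch of steps 5.3 to 5.5 as one script.
# DRY_RUN, DATA_FILE and HDFS_USER_DIR are hypothetical names used only here.
# With DRY_RUN=1 (the default) each command is printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"
DATA_FILE="${DATA_FILE:-synthetic_control.data}"
HDFS_USER_DIR="${HDFS_USER_DIR:-/user/root}"   # HDFS directory used in this article

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_kmeans_example() {
    # 5.3: create the input directory and upload the data
    run hadoop fs -mkdir -p "$HDFS_USER_DIR/testdata"
    run hadoop fs -put "$DATA_FILE" "$HDFS_USER_DIR/testdata"
    # 5.4: run the bundled k-means example job
    run mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
    # 5.5: list the output directory
    run hadoop fs -ls "$HDFS_USER_DIR/output"
}

run_kmeans_example
```

To execute the commands for real, with Hadoop started and the data file in the current directory, set DRY_RUN=0 when invoking the script.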
