
How to install and configure hadoop


This article mainly shows you how to install and configure Hadoop. The content is easy to understand and clearly organized, and I hope it helps resolve your doubts. Let the editor lead you through studying "how to install and configure hadoop".

## 1. Introduction to Hadoop

What is Hadoop for? In what areas is it applied? What problems does it solve?

1. The three core components of Hadoop

1 > HDFS (distributed file system): solves the storage of massive data.

2 > YARN (job scheduling and cluster resource management): solves the scheduling of resources and tasks.

3 > MapReduce (distributed computing model): solves the computation over massive data.

2. Hadoop application fields

1 > Log analysis and processing

2 > ETL: extracting data into MySQL, MongoDB, Hive, and other data stores.

3 > Machine learning, such as the Apache Mahout project, is hot at the moment.

4 > Data mining, which in practice is data analysis, such as intelligent recommendation.

3. The problems Hadoop solves

For example, suppose the requirement is a statistical analysis of the positions and counts of certain special characters in a 100 MB file. A Java file stream can read the file, and some logic code can then compute the statistics. But if the requirement involves ten such 100 MB files, or ten thousand of them, the job can only be handled by a big data framework. This is where Hadoop comes in: it solves both the storage and the analysis of massive data.

## 2. Install Hadoop

1. Download Hadoop from the official archive: https://archive.apache.org/dist/hadoop/common/hadoop-2.10.1/

2. Extract Hadoop with tar -zxvf hadoop-2.10.1.tar.gz, as in the sketch below.
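A minimal command sketch of these two steps, assuming the archive is downloaded and unpacked under /usr/java/hadoop (the directory used by the configuration later in this article); adjust the path to your environment:

```bash
# Assumed install location; any directory works, but the later config paths assume this one
mkdir -p /usr/java/hadoop && cd /usr/java/hadoop

# Download Hadoop 2.10.1 from the Apache archive linked above
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz

# Unpack the tarball; this creates the hadoop-2.10.1 directory described below
tar -zxvf hadoop-2.10.1.tar.gz
```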

Here is a brief description of the hadoop directory structure

1 > bin directory: operation scripts for the Hadoop-related services (hdfs, mapred, yarn).

2 > etc directory: Hadoop's configuration files, many of which will be modified later.

3 > lib directory: Hadoop's native libraries, used to compress and decompress data.

4 > sbin directory: scripts that start or stop all of the Hadoop services.

5 > share directory: Hadoop's dependency jars, documentation, and official examples.

3. Configure hadoop environment variables

vi /etc/profile

# hadoop environment variables
export HADOOP_HOME=/usr/java/hadoop/hadoop-2.10.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
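After saving /etc/profile, the change only applies to new shells. A common way to make it take effect immediately in the current shell (a sketch, not from the original article) is:

```bash
# Reload the profile so HADOOP_HOME and the new PATH entries are visible now
source /etc/profile

# Quick sanity check on the variable
echo $HADOOP_HOME
```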

4. Test whether the installation is successful

hadoop version

If the version information is printed, the environment variables are configured correctly.

5. Modify the Hadoop configuration files

1 > modify the configuration file hadoop-env.sh

Open the configuration file and you will find a line that sets the Java environment variable, whose initial value is ${JAVA_HOME}. Change it to the concrete directory where the JDK is installed. To see the JDK installation directory, use the echo $JAVA_HOME command.

After modifying hadoop-env.sh, the line looks like the sketch below.
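A sketch of the relevant line in etc/hadoop/hadoop-env.sh; the JDK path below is only a placeholder and must be replaced with the directory reported by echo $JAVA_HOME on your machine:

```bash
# Before the change, hadoop-env.sh contains:
#   export JAVA_HOME=${JAVA_HOME}
# After the change (placeholder path; use your actual JDK directory):
export JAVA_HOME=/usr/java/jdk1.8.0_271
```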

2 > modify the configuration file core-site.xml

Two properties are set here: 1. the address of HDFS's NameNode; 2. the directory for temporary files generated while Hadoop runs.

The modified core-site.xml is as follows:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9090</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/java/hadoop/hadoop-2.10.1/tmp</value>
    </property>
</configuration>

3 > modify hdfs-site.xml file

There are three main properties: 1. dfs.namenode.name.dir, the storage path for files generated by the NameNode; 2. dfs.datanode.data.dir, the storage path for files generated by the DataNode; 3. dfs.replication, the number of copies of data that HDFS keeps.

The modified hdfs-site.xml is as follows:

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/java/hadoop/hadoop-2.10.1/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/java/hadoop/hadoop-2.10.1/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4 > modify the configuration file mapred-site.xml

There is no mapred-site.xml in the Hadoop configuration directory, only mapred-site.xml.template.

So we need to drop the .template suffix: mv mapred-site.xml.template mapred-site.xml.

In this configuration file we set only one property, which specifies that MapReduce runs on YARN, i.e. MapReduce uses the YARN framework.

The modified mapred-site.xml is as follows:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

5 > modify the configuration file yarn-site.xml

The main configuration is that the NodeManager obtains data by way of shuffle.

The modified yarn-site.xml is as follows:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

The above is all the content of "how to install and configure hadoop". Thank you for reading! I believe you now have a basic understanding of the topic, and I hope the content shared here helps you. If you want to learn more, welcome to follow the industry information channel!
