2025-03-01 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report--
This article shows how to install and configure Hadoop. The steps are straightforward and easy to follow; we hope they help clear up any doubts you have about installing and configuring Hadoop.
## 1. Introduction to Hadoop
What is Hadoop for? In which fields is it applied? What problems does it solve?
1. The three core components of Hadoop
1 > HDFS (a distributed file system), which solves massive-data storage.
2 > YARN (job scheduling and cluster resource management), which solves resource and task scheduling.
3 > MapReduce (a distributed computing model), which solves massive-data computation.
2. Hadoop application fields
1 > Log analysis and processing.
2 > ETL: extracting data into databases such as MySQL, MongoDB, and Hive.
3 > Machine learning, e.g. the Apache Mahout project, which is popular at the moment.
4 > Data mining, which is essentially data analysis, e.g. intelligent recommendation.
3. The problems Hadoop solves
For example: suppose a client asks for statistics on the positions and counts of certain special characters in a 100 MB file. A Java file stream can read the file, and some logic code can do the analysis. But if the client hands over ten 100 MB files, or ten thousand of them, only a big-data framework can cope. This is where Hadoop comes in: it solves both massive-data storage and massive-data analysis.
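As a single-machine baseline for the one-file case above, counting occurrences of a character is a one-liner in the shell (a sketch; `demo.txt` is a stand-in for the 100 MB input):

```shell
# Create a small demo file (stand-in for the real input)
printf 'abcabca' > demo.txt

# Print the byte offset of each occurrence of 'a'
grep -ob 'a' demo.txt

# Count the total occurrences of 'a' (here: 3)
grep -o 'a' demo.txt | wc -l
```

Once the input grows to thousands of such files spread over many machines, this per-file approach no longer scales, which is exactly the gap HDFS and MapReduce fill.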
## 2. Install Hadoop
1. Download Hadoop from the official archive: https://archive.apache.org/dist/hadoop/common/hadoop-2.10.1/
2. Extract Hadoop: tar -zxvf hadoop-2.10.1.tar.gz
A brief description of the Hadoop directory structure:
1 > bin: operation scripts for Hadoop-related services (hdfs, mapred, yarn).
2 > etc: Hadoop configuration files, which we will modify extensively later.
3 > lib: Hadoop native libraries, used for data compression and decompression.
4 > sbin: start and stop scripts for all Hadoop services.
5 > share: Hadoop's dependency jars, documentation, and official examples.
3. Configure the Hadoop environment variables

vi /etc/profile

# hadoop environment variables
export HADOOP_HOME=/usr/java/hadoop/hadoop-2.10.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
4. Test whether the installation succeeded

hadoop version

If the version information is printed, the environment variables are configured correctly.
5. Modify the Hadoop configuration files
1 > Modify hadoop-env.sh
Open the file and find the line that sets the Java environment variable; its initial value is ${JAVA_HOME}. Change it to the concrete directory where the JDK is installed. To find that directory, run echo $JAVA_HOME.
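The before/after of that edit looks roughly like this (the JDK path below is a hypothetical example; substitute the output of `echo $JAVA_HOME` on your machine):

```shell
# Before: hadoop-env.sh ships with an indirect reference
export JAVA_HOME=${JAVA_HOME}

# After: hard-code the concrete JDK path
# (/usr/java/jdk1.8.0_281 is an example path, not a requirement)
export JAVA_HOME=/usr/java/jdk1.8.0_281
```

Hard-coding the path matters because Hadoop's daemon scripts may start in environments where $JAVA_HOME is not exported.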
2 > Modify core-site.xml
Two properties are set here: 1. the address of HDFS's NameNode; 2. the temporary-file directory Hadoop uses at runtime.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9090</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/java/hadoop/hadoop-2.10.1/tmp</value>
  </property>
</configuration>
3 > Modify hdfs-site.xml
Three main properties: 1. the storage path for files the NameNode generates; 2. the storage path for files the DataNode generates; 3. dfs.replication, the number of copies HDFS keeps of each block.

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/java/hadoop/hadoop-2.10.1/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/java/hadoop/hadoop-2.10.1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
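Hadoop generally creates these storage directories itself when the NameNode is formatted and the daemons first start, but creating them up front makes ownership and permissions explicit. A sketch using the configured paths:

```shell
# Pre-create the NameNode/DataNode storage dirs and the tmp dir
# (paths match the values in core-site.xml and hdfs-site.xml above)
base=/usr/java/hadoop/hadoop-2.10.1
mkdir -p "$base/hdfs/name" "$base/hdfs/data" "$base/tmp"
```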
4 > Modify mapred-site.xml
The Hadoop directory does not contain mapred-site.xml, only mapred-site.xml.template, so we drop the .template suffix:

mv mapred-site.xml.template mapred-site.xml
In this file we set a single property, which specifies that MapReduce runs on YARN, i.e. MapReduce uses the YARN framework.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
5 > Modify yarn-site.xml
The main configuration here is that the NodeManager fetches data via shuffle:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

That is all of "how to install and configure Hadoop". Thank you for reading!