This article explains how to build a Hadoop cluster and how to operate HDFS from Python. I hope you find something useful in it.
Recently, our project required storing and retrieving data at the scale of hundreds of billions of records: about 10 TB of text data had to be parsed, stored, and made available for real-time retrieval. File storage became the primary problem; we tried several storage approaches, and none of them met the requirements. We finally used HDFS, the Hadoop distributed file system, and found its efficiency and manageability to be quite good, so we studied how to build and use it and recorded the process in this document.
Environment
Modify the hostname (on each machine, according to the environment above)

# modify the hostname
vi /etc/hostname
# make it effective with the hostname command; no restart is needed
hostname xxxx

Modify the hosts file

vi /etc/hosts
192.168.143.130 master
192.168.143.131 slave1
192.168.143.132 slave2
192.168.143.133 slave3
192.168.143.134 slave4

Configure password-free SSH login

ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
ssh-copy-id -i ~/.ssh/id_rsa.pub slave3
ssh-copy-id -i ~/.ssh/id_rsa.pub slave4

Install the JDK (on each machine)

apt-get install -y openjdk-8-jre-headless openjdk-8-jdk

Configure environment variables
Add the following at the end of the /etc/profile file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hadoop-3.3.0/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
Make the environment variables effective

source /etc/profile

Create directories (on each machine)
Before creating the directories, check the disk space on each machine with the df -h command to decide which disk will store the data, then create the following three directories and update the corresponding paths in the hdfs-site.xml configuration file below.
mkdir -p /home/hadoop/dfs/name
mkdir -p /home/hadoop/dfs/data
mkdir -p /home/hadoop/temp

Install and configure Hadoop
Download the Hadoop installation package
http://archive.apache.org/dist/hadoop/core/stable/hadoop-3.3.0.tar.gz
# after downloading, extract it and move it to the /usr directory
tar -xzvf hadoop-3.3.0.tar.gz
mv hadoop-3.3.0 /usr

Configure Hadoop
The configuration files are in the /usr/hadoop-3.3.0/etc/hadoop directory.
hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
hdfs-site.xml

Multiple data storage directories can be configured here, separated by commas.
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/dfs/data,/usr1/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
workers
This file lists the machines that store data as DataNodes. Using master as a storage node is not recommended: if its disk fills up, the whole cluster becomes unusable.
slave1
slave2
slave3
slave4
Now simply copy the /usr/hadoop-3.3.0 directory on master to the other machines.
scp -r /usr/hadoop-3.3.0 slave1:/usr
scp -r /usr/hadoop-3.3.0 slave2:/usr
scp -r /usr/hadoop-3.3.0 slave3:/usr
scp -r /usr/hadoop-3.3.0 slave4:/usr

Format the HDFS directory (on the master machine)

hdfs namenode -format

Start Hadoop
This only needs to be done on the master machine. After startup, you can use the jps command to check the process status on all machines.
cd /usr/hadoop-3.3.0/sbin
./start-all.sh

View process status
Run the jps command on master and on each slave.
Check to see if it is successful
Open the following web pages in a browser to see whether they can be accessed properly.
# Hadoop cluster information
http://192.168.143.130:8088/cluster
# HDFS address
http://192.168.143.130:9870/dfshealth.html
# DataNode address
http://192.168.143.130:9864/datanode.html
# NodeManager address
http://192.168.143.130:8042/node
# SecondaryNameNode address
http://192.168.143.130:9868/status.html
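If you would rather check these pages from a script, here is a small sketch of my own (not part of the original walkthrough) that probes the web UIs listed above using only the Python standard library; the IP is assumed to be your master node.

import urllib.request

# web UIs listed above; adjust the IP if your master node differs
urls = [
    "http://192.168.143.130:8088/cluster",
    "http://192.168.143.130:9870/dfshealth.html",
    "http://192.168.143.130:9864/datanode.html",
    "http://192.168.143.130:8042/node",
    "http://192.168.143.130:9868/status.html",
]

for url in urls:
    try:
        # an HTTP 200 means the daemon behind this page is up and serving
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(url, "->", resp.status)
    except OSError as exc:
        print(url, "-> unreachable:", exc)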
Test file upload (on master)

hdfs dfs -mkdir /test
hdfs dfs -put start-dfs.sh /test

HDFS operation commands
Create a folder
hdfs dfs -mkdir /myTask
Create multi-level directories
hdfs dfs -mkdir -p /myTask/input
Upload files
hdfs dfs -put /opt/wordcount.txt /myTask
List the files and folders in the root directory
hdfs dfs -ls /
View the contents of the wordcount.txt file in the myTask directory
hdfs dfs -cat /myTask/wordcount.txt
Delete a file or folder
hdfs dfs -rm -r /myTask/wordcount.txt
Download a file to the local machine
hdfs dfs -get /myTask/wordcount.txt /opt

Operating HDFS from Python
When Python operates on HDFS, if you want to upload or download files, you must configure the hosts file on the machine where the code runs, because HDFS records its NameNode and DataNodes by hostname after registration. Without this configuration, upload and download operations are still addressed by hostname and will fail, so the local machine needs the IP-to-hostname mappings of the HDFS cluster. For example, when I operate from my own machine, I must configure the following:
C:\Windows\System32\drivers\etc\hosts

192.168.143.130 master
192.168.143.131 slave1
192.168.143.132 slave2
192.168.143.133 slave3
192.168.143.134 slave4

Install the library

pip install hdfs

Operations
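Before running any upload or download, a quick sanity check I find useful (my own addition, not part of the original steps) is to verify that the cluster hostnames actually resolve on the local machine; if any of these lookups fail, the WebHDFS client will fail in the same way.

import socket

# hostnames of the HDFS cluster, matching the hosts file above
hosts = ["master", "slave1", "slave2", "slave3", "slave4"]

for name in hosts:
    try:
        ip = socket.gethostbyname(name)  # resolve the hostname locally
        print(f"{name} -> {ip}")
    except socket.gaierror:
        print(f"{name} cannot be resolved; check the hosts file")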
Connect
from hdfs.client import Client

client = Client("http://192.168.143.130:9870")
Create a directory
client.makedirs(hdfs_path)
Delete a file
client.delete(hdfs_path)
Upload a file

client.upload(hdfs_path, local_path)

Download a file

client.download(hdfs_path, local_path)
Get the list of files under the directory
client.list(hdfs_path)
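Putting the calls above together, here is a minimal end-to-end sketch under the same assumptions as before (the cluster address from the connect step; the HDFS and local paths are hypothetical examples):

from hdfs.client import Client

# connect to the NameNode's WebHDFS endpoint (port 9870 in Hadoop 3.x)
client = Client("http://192.168.143.130:9870")

# create a directory, upload a local file into it, then list the directory
client.makedirs("/myTask/input")                      # hypothetical HDFS path
client.upload("/myTask/input", "/opt/wordcount.txt")  # local file assumed to exist
print(client.list("/myTask/input"))

# download the file back to a local path, then delete the directory recursively
client.download("/myTask/input/wordcount.txt", "/opt/wordcount_copy.txt")
client.delete("/myTask", recursive=True)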
The advantages of an HDFS storage cluster are low hardware requirements, easy expansion, and high efficiency, which make it very suitable for storing massive numbers of files; it also provides a web management page and has very good third-party libraries. It is likewise a good choice as a file and image repository in web development.
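As a small illustration of that file-and-image-repository idea, the sketch below (my own example, with hypothetical paths and placeholder data) stores raw bytes in HDFS and reads them back with the same hdfs library.

from hdfs.client import Client

client = Client("http://192.168.143.130:9870")

# store raw bytes (for example an uploaded image) under a hypothetical path
data = b"\x89PNG...image bytes..."
client.write("/images/avatar.png", data, overwrite=True)

# read the bytes back when the file is requested
with client.read("/images/avatar.png") as reader:
    content = reader.read()
print(len(content))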
That is how to build a Hadoop cluster and operate HDFS from Python. If you have similar requirements, I hope the notes above serve as a useful reference.