
Example Analysis of the Hadoop Architecture


This article shares a sample analysis of the Hadoop architecture. The editor finds it very practical and offers it here as a reference; follow along and have a look.

The architecture of Hadoop

NameNode - the primary node (the master server)

SecondaryNameNode - an auxiliary NameNode

DataNode - stores the data

TaskTracker - receives tasks

JobTracker - splits the data (e.g. a file into blocks of roughly 100 MB) and distributes it across DataNode1, DataNode2, DataNode3

Master: the master node, equivalent to a project manager

Slave: a slave node, equivalent to a programmer (PG)
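To make the division of labor concrete, here is an illustrative sketch of how the processes are typically laid out in a small Hadoop 1.x cluster (the host names are invented for the example):

master : NameNode, SecondaryNameNode, JobTracker
slave1 : DataNode, TaskTracker
slave2 : DataNode, TaskTracker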

Hadoop can only run on Linux systems:

Install a JVM on the Linux system, and then run three processes on top of it:

SecondaryNameNode, JobTracker, and NameNode. All three are Java processes.

Among them, NameNode is the most important Java process: it is what makes a node the master node. Slave nodes do not run this process.

SecondaryNameNode is equivalent to NameNode's secretary, assisting NameNode with its work. JobTracker tracks the progress of tasks and hands them over to the slave nodes.

As you can see, an application usually has one master node and several slave nodes. A slave node also runs two Java processes, because Linux and a JVM are installed on the slave servers as well. One process is TaskTracker; the other is DataNode, the data node process, which handles data-related tasks.

Note that in a Hadoop cluster there is only one master node; all the remaining nodes are slave nodes.

NameNode: the daemon of Hadoop (note that it is a JVM process). It is responsible for recording how files are divided into data blocks and which data nodes those blocks are stored on, and it manages this metadata centrally in memory. There is only one NameNode in the entire cluster; once the NameNode server goes down, the whole system stops working.

DataNode: each slave server in the cluster runs a DataNode daemon, which is responsible for writing HDFS blocks to the local file system.

SecondaryNameNode: an auxiliary daemon used to monitor the status of HDFS, for example by saving snapshots of the NameNode.

JobTracker: connects user applications to Hadoop. There is only one JobTracker in each Hadoop cluster, and it typically runs on the master node.

TaskTracker: responsible for working together with the DataNode on its node.

hadoop namenode -format  [the directory generated after formatting is the HDFS directory]

Start hadoop
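As a sketch, assuming Hadoop's bin and sbin directories are on the PATH, the bundled scripts start everything:

start-all.sh                      # Hadoop 1.x style (deprecated in 2.x)
start-dfs.sh && start-yarn.sh     # Hadoop 2.x: start HDFS and YARN separately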

If you find 5 Java processes, the startup was successful:

Because we are using a single machine here, it acts as both the master node and a slave node.

With the jps command, you can see which Java processes have started.

jps

ResourceManager (in Hadoop 2.x, ResourceManager and NodeManager take over the roles of JobTracker and TaskTracker)

NameNode

NodeManager

DataNode

SecondaryNameNode [this process did not start here; I don't know why]

HDFS concepts and commands

HDFS is the abbreviation of Hadoop Distributed File System: the distributed file system of Hadoop.

HDFS is managed by Hadoop and differs from an ordinary file system: you cannot browse its files directly, and must operate on HDFS through Hadoop commands.

HDFS has a master/slave architecture. An HDFS cluster consists of a single name node, a master server that manages the file namespace and regulates client access to files.

There are also data nodes, usually one per machine in the cluster, which manage storage. HDFS exposes a file namespace and allows user data to be stored as files.

The internal mechanism splits a file into one or more blocks, which are stored on a set of data nodes.

The NameNode performs operations on the file namespace, such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to data nodes.

The DataNode is responsible for serving read and write requests from file system clients.

Data nodes also carry out block creation, deletion, and replication instructions from the name node.

Name nodes and data nodes are both software that runs on ordinary machines, which typically run Linux. HDFS is written in Java; any machine that supports Java can run a name node or a data node, and the highly portable Java language means HDFS can be deployed on a very wide range of machines.

A typical deployment has a dedicated machine that runs only the name node software, while each of the other machines in the cluster runs one data node instance.

The architecture does not preclude running multiple data nodes on one machine, but in actual deployments that is rarely the case.

Having only one name node in the cluster greatly simplifies the system. The name node is the arbiter and repository of all HDFS metadata. The system is designed so that the actual user data never flows through the name node.
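As an aside, you can inspect this file-to-block mapping yourself with the fsck tool (a sketch; /user/hadoop.txt is the file uploaded later in this article):

hadoop fsck /user/hadoop.txt -files -blocks -locations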

1. File manipulation commands under hadoop

1: list hdfs files

~ # hadoop dfs -ls /   or   ~ # hadoop fs -ls /   or   hdfs dfs -ls /

View the contents under /user:

hadoop fs -ls /user
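For illustration, the listing output looks something like this (the entries, sizes, and dates here are invented):

Found 2 items
-rw-r--r--   1 xiaofeng supergroup         15 2013-06-01 10:30 /user/hadoop.txt
drwxr-xr-x   - xiaofeng supergroup          0 2013-06-01 10:35 /user/hadooptest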

Another way to view these files is on the web: go to http://127.0.0.1:50070, the management UI of HDFS (the NameNode web interface).

Then click the corresponding file directory there.

Note: when viewing the hadoop file system, you need the command hadoop fs -ls, prefixed with hadoop fs, because a bare ls refers to the simulated Linux file system (Cygwin) instead.

2: upload files to hdfs

Here is a demonstration of uploading a credream.txt file from a Windows system to the HDFS distributed file system in hadoop.

First of all, you can access the local Windows C: drive via cd c:

Then cd .. takes you to the root of the file system.

ls shows the drive letters of the current file system:

cd c

ls

lists the files on the C: drive. The Cygwin prompt looks like this:

xiaofeng@xiaofeng-PC /cygdrive/c

The following command transfers the credream.txt file from the local C: drive to the /user directory in the hadoop HDFS file system.

Here is the command line:

hdfs dfs -put credream.txt /user/credream.txt

This uploads the file from the C: drive to the /user directory of hadoop's HDFS file system.

In hdfs dfs -put credream.txt, the source refers to the local C: drive; as you can see, cygdrive/c is the C: drive, so no other local file path needs to be specified.

And you can rename the file while uploading:

hdfs dfs -put credream.txt /user/hadoop.txt uploads the credream.txt file from the local C: drive to hadoop's HDFS system and renames it hadoop.txt.

echo hello credream > test.txt creates a test.txt file on the local C: drive, but on a Windows 7 system this can run into a permission problem.

You can see that it is not allowed on the C: drive, but the test.txt file can be created on the D: drive.

This uploads the test.txt file from the local D: drive to the /user/ directory of the hdfs file system without changing the file name; the command is sketched below.
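A sketch of that command, assuming the shell's current directory is /cygdrive/d, where test.txt was created:

hdfs dfs -put test.txt /user/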

Here is how to upload an entire directory to the /user/ directory of the hdfs file system:

mkdir hadooptest creates a hadooptest folder on the local D: drive

cp test.txt hadooptest copies the test.txt file into the hadooptest folder

hadoop fs -put hadooptest /user/

uploads the hadooptest folder to the /user/ directory of the hdfs file system.

Use hadoop fs -copyFromLocal test.txt /user/testa.txt

to make a copy of the local test.txt file, upload it to the hdfs file system, and rename it testa.txt.

Then view the files in the /user directory of the hdfs file system with hadoop fs -ls /user.

3: copy the files in hdfs to the local system

With the get command, you can manipulate both files and folders:

For a single file: save the file /wj/a.txt on hdfs locally and name it b.txt; see the sketch below.
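A sketch of that command, assuming /wj/a.txt already exists in hdfs:

hadoop fs -get /wj/a.txt b.txt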

You can also download the whole file into the current directory; pay attention to the trailing dot in the sketch below.
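Again as a sketch; the trailing . stands for the current local directory:

hadoop fs -get /wj/a.txt .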

Example 2 of the get command:

hadoop fs -get /user/testa.txt testb.txt downloads the testa.txt file under /user in hdfs to the local machine and renames it testb.txt.

hadoop fs -get /user/testa.txt . downloads the testa.txt file under /user in hdfs directly to the local current directory; the . means it goes straight to the local directory without renaming.

hadoop fs -get /user/hadooptest . downloads the hadooptest folder under /user in hdfs directly to the local current directory without renaming.

4: delete a file under hdfs

Deleting a local file works the same as the rm command under Linux.

Use the recursive form when deleting a folder.

Here are examples of deleting files and folders from the hdfs file system:

hadoop fs -rm /user/testa.txt deletes the testa.txt file under the /user folder.

hadoop fs -rm /user/*.txt deletes all .txt files in the /user folder.

hadoop fs -rm /user/hadooptest cannot delete the folder, because rm does not work recursively.

The command hadoop fs -rmr /user/hadooptest deletes the folder and the files in it recursively.
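Note that in Hadoop 2.x the rmr form is deprecated; the equivalent recursive delete is:

hadoop fs -rm -r /user/hadooptest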

Thank you for reading! This is the end of this article on "Example Analysis of the Hadoop Architecture". I hope the above content has been of some help and lets you learn more. If you found the article good, please share it for more people to see!
