Check whether the package is available by searching through a pipe: sudo apt-cache search ssh | grep ssh
Then install it: sudo apt-get install xxxxx
After installing ssh, generate a key pair by executing: ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
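A common follow-up (an assumption here, not stated above) is to authorize that key for password-free login, which Hadoop's start scripts rely on:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost // should now log in without prompting for a password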
Finally, edit the configuration in core-site.xml, hdfs-site.xml and mapred-site.xml, the three files under the /soft/hadoop/etc/hadoop directory.
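A minimal sketch of the key entries, in the same [file] property = value shorthand used later in these notes (the host name s1 is an assumption for illustration):
[core-site.xml] fs.defaultFS = hdfs://s1/
[hdfs-site.xml] dfs.replication = 3
[mapred-site.xml] mapreduce.framework.name = yarn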
View listening ports: netstat -lnpt or netstat -plut. View all ports: netstat -ano
To upload files, use hadoop fs -put xxxx/xxxx /xxxxx/xxx
Put a file onto the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put /home/ubuntu/hello.txt /user/ubuntu/data/
Download a file from the cluster: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -get /user/ubuntu/data/hello.txt bb.txt
Check the health of a file: hdfs --config /soft/hadoop/etc/hadoop_cluster fsck /user/ubuntu/data/hello.txt
Remote copy via scp: scp -r /xxx/x
Format the file system: hdfs --config /soft/hadoop/etc/hadoop_cluster namenode -format
touch creates an empty text file.
Log in from one virtual machine to another with ssh S2. Running ssh S2 ls ~ executes ls remotely and prints the listing in columns; piping it, ssh S2 ls ~ | xargs, displays the content horizontally on one line.
Check the cluster's contents by listing recursively: hadoop --config /soft/hadoop/etc/hadoop_cluster fs -lsr /
Putting files onto the cluster is hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put followed by the file to add and the destination path.
View processes: ssh S2 jps; ps -Af also views processes. To kill a process, append its process ID after kill -9.
su root switches to the root user.
HDFS concepts: namenode & datanode
namenode: image file (fsimage) + edit log, stored on local disk, plus datanode information but no block locations; block locations are reconstructed from datanode reports when the cluster starts
datanode: worker node, which stores and retrieves blocks and periodically sends its block list to the namenode
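One way to see what the namenode currently knows about its datanodes (a sketch, not from the original notes):
hdfs dfsadmin -report // prints cluster capacity and the status of each live datanode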
Switch to the root user with su, create a script under /usr/local/sbin, and write the commands you want to execute in it.
Modify the block size. The default is 128 MB.
It is set in [hdfs-site.xml]:
dfs.blocksize = 8m sets the block size to 8 MB
Test method: put a file > 8 MB and check the block size through the web UI (a command-line sketch follows).
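A sketch of that test, assuming the hadoop_cluster config directory used above:
dd if=/dev/zero of=/tmp/big.dat bs=1M count=10 // create a 10 MB test file
hadoop --config /soft/hadoop/etc/hadoop_cluster fs -put /tmp/big.dat /user/ubuntu/data/
hdfs fsck /user/ubuntu/data/big.dat -files -blocks // with dfs.blocksize = 8m this should report 2 blocks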
Hadoop: a reliable, scalable, distributed computing framework; open-source software.
Four modules:
1. common (hadoop-common-xxx.jar)
2. hdfs
3. mapreduce
4. yarn
Hadoop fully distributed:
1. hdfs --> NameNode, DataNode, SecondaryNameNode (secondary name node)
2. yarn --> ResourceManager (resource manager), NodeManager (node manager)
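A sketch for verifying the daemons after startup (treating S1 as the master and S2 as a worker, matching the host names used elsewhere in these notes):
start-dfs.sh && start-yarn.sh
ssh S1 jps // expect NameNode and ResourceManager
ssh S2 jps // expect DataNode and NodeManager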
Configure a static ip: go into /etc/network and edit with sudo nano interfaces:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
# iface eth0 inet dhcp (comment out dhcp)
iface eth0 inet static (set to static ip)
address 192.168.92.148 (client's ip)
netmask 255.255.255.0 (client's netmask)
gateway 192.168.92.2 (NAT gateway address)
dns-nameservers 192.168.92.2
Finally, restart the network card: sudo /etc/init.d/networking restart
Client shutdown commands:
1. sudo poweroff
2. sudo shutdown -h 0
3. sudo halt
-
Configure text mode
Go to /boot/grub to check, then enter cd /etc/default and execute gedit grub.
Under # GRUB_CMDLINE_LINUX_DEFAULT="quiet", write GRUB_CMDLINE_LINUX_DEFAULT="text".
Under # Uncomment to disable graphical terminal (grub-pc only), uncomment:
GRUB_TERMINAL=console // open (uncomment) this line
Execute sudo update-grub after modifying, and finally restart with sudo reboot.
-
Start the HDFS daemons:
hadoop-daemon.sh start namenode // execute on the name node server to start the name node
hadoop-daemons.sh start datanode // starts the data node on every host listed in the slaves file
hadoop-daemon.sh start secondarynamenode // start the secondary name node
hdfs getconf shows node configuration information; for example, hdfs getconf -namenodes shows that the name node runs on the S1 client.
-
Four modules:
1. common
hadoop-common-xxx.jar
core-site.xml
core-default.xml
2. hdfs
hdfs-site.xml
hdfs-default.xml
3. mapreduce
mapred-site.xml
mapred-default.xml
4. yarn
yarn-site.xml
yarn-default.xml
-
Commonly used ports:
1. namenode: rpc // 8020, webui // 50070
2. datanode: rpc // 50020, webui // 50075
3. 2nn (secondary namenode): webui // 50090
4. historyServer: webui // 19888
5. resourcemanager: webui // 8088
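To verify that a service is actually listening, the netstat command from earlier works over ssh (a sketch, with S1 as the namenode host):
ssh S1 'netstat -lnpt | grep 50070' // is the namenode web UI port open?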
-
dfs.hosts: decides which nodes may connect to the namenode (include list)
dfs.hosts.exclude: decides which nodes may not connect (exclude list)
The combinations (1 = listed in the file, 0 = not listed):
dfs.hosts dfs.hosts.exclude
0 0 // cannot connect
0 1 // cannot connect
1 0 // can connect
1 1 // will be decommissioned (retired)
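A sketch of decommissioning a node with these lists (the exclude file path is an assumption; dfs.hosts.exclude must point at it in hdfs-site.xml):
echo S2 >> /soft/hadoop/etc/dfs.exclude // mark S2 for decommission
hdfs dfsadmin -refreshNodes // re-read the lists; S2 begins decommissioning
hdfs dfsadmin -report // watch until S2 reports Decommissioned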
-
Safe mode
1. When the namenode starts, it merges the fsimage and edit log to form a new fsimage and generates a new edit log.
2. While in safe mode, clients can only read.
3. Check whether the namenode is in safe mode:
hdfs dfsadmin -safemode get // view the safe mode state
hdfs dfsadmin -safemode enter // enter safe mode
hdfs dfsadmin -safemode leave // leave safe mode
hdfs dfsadmin -safemode wait // wait for safe mode to exit
4. Save the namespace manually: hdfs dfsadmin -saveNamespace
5. Fetch the image file manually: hdfs dfsadmin -fetchImage
6. Save metadata (written under hadoop_home: hadoop/logs/): hdfs dfsadmin -metasave xxx.dsds
7. start-balancer.sh: starts the balancer to spread the cluster's data more evenly and improve overall performance (usually the balancer is started only after adding new nodes)
8. hadoop fs -count counts the directories, files and bytes under a path
Hadoop snapshots: a snapshot captures the current state of a directory and saves it. By default an ordinary directory cannot have snapshots created; you must first execute hdfs dfsadmin -allowSnapshot /user/ubuntu/data, where the argument is the path on which you want to allow snapshots. Once snapshots are allowed, execute hadoop fs -createSnapshot /user/ubuntu/data snap-1 to create one; snap-1 is the name of the snapshot. To view snapshots, go directly to hadoop fs -ls -R /user/ubuntu/data/.snapshot/. And a directory cannot have snapshots disallowed while it still has snapshots.
1. Create a snapshot: hadoop fs -createSnapshot <snapshotDir> [<snapshotName>]
2. Delete a snapshot: hadoop fs -deleteSnapshot <snapshotDir> <snapshotName>
3. Rename a snapshot: hadoop fs -renameSnapshot <snapshotDir> <oldName> <newName>
4. Allow snapshots on a directory: hdfs dfsadmin -allowSnapshot <snapshotDir>
5. Disallow snapshots on a directory: hdfs dfsadmin -disallowSnapshot <snapshotDir>
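Recovering a file from a snapshot is just a copy back out of the read-only .snapshot path; a sketch using the hello.txt from the earlier examples:
hadoop fs -cp /user/ubuntu/data/.snapshot/snap-1/hello.txt /user/ubuntu/data/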
-
Recycle bin
1. The default is 0 seconds, which means the recycle bin is disabled.
2. Set the trash retention time in [core-site.xml]: fs.trash.interval = 1 // minutes
3. Files deleted by shell commands go into the trash.
4. Each user has their own recycle bin (directory), i.e. /user/ubuntu/.Trash
5. Programmatic deletion does not go through the recycle bin; it deletes immediately. You can call the moveToTrash() method instead; it returns false when the recycle bin is disabled or the file is already in the trash.
Recycle bin: hadoop's recycle bin is closed by default. The time unit is minutes. It corresponds to the .Trash directory under the current user's folder; files are moved there on rm.
[core-site.xml]
fs.trash.interval = 30
Recycle bin: to recover files, just move them out of the .Trash directory: hadoop fs -mv /user/ubuntu/.Trash/xx/x/x data/
Empty the recycle bin: hadoop fs -expunge
Test deleting the recycle bin itself: hadoop fs -rm -r /user/ubuntu/.Trash
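A quick sketch of the trash in action (assuming fs.trash.interval > 0 as configured above):
hadoop fs -rm /user/ubuntu/data/hello.txt // moved into the trash, not deleted
hadoop fs -ls /user/ubuntu/.Trash/Current/user/ubuntu/data/ // the file sits here until the interval expires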
-
Quota: quotas
1. Directory (name) quota: hdfs dfsadmin -setQuota N /dir // N > 0; a quota of 1 means the directory must stay empty and cannot hold any entries
2. Space quota: hdfs dfsadmin -setSpaceQuota N /dir
hadoop fs == hdfs dfs // both are operating commands for the file system
hdfs dfsadmin -clrSpaceQuota /dir // clear the space quota
hdfs dfsadmin -clrQuota /dir // clear the directory quota
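A sketch of the quotas in use, on the data directory from the earlier examples:
hdfs dfsadmin -setQuota 3 /user/ubuntu/data // the tree may hold at most 3 names, counting the directory itself
hadoop fs -count -q /user/ubuntu/data // show the quota, remaining quota, and usage
hdfs dfsadmin -clrQuota /user/ubuntu/data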
oiv views the contents of the image file: -i is the input file, -o is the output file, and -p XML selects the XML processor.
Specific operation: hdfs oiv -i fsimage_000000000000000054 -o /a.xml -p XML
View an edits_xxx edit log file: hdfs oev -i xxx_edit -o xxx.xml -p XML
The image files live under /hadoop/dfs/name/current, with names like:
fsimage_0000000000000054
bg %<job> lets a stopped program continue running in the background.
Refresh nodes: hdfs dfsadmin -refreshNodes
-