2025-01-19 Update From: SLTechnology News&Howtos
Sources of enterprise big data
1. Inside the enterprise
Log files
Databases
User behavior data
2. Outside the enterprise
Web crawlers
Third-party purchase (Ant Data Bank) -- Guiyang
Big data = massive data + complex data types
Hadoop grew out of three Google papers:
MapReduce -> MapReduce, a distributed offline parallel computing framework
GFS -> HDFS, a distributed file storage system
Bigtable -> HBase, a distributed database
Four core modules of Hadoop
Common
Provides infrastructure for the other modules
HDFS
Distributed file storage system
MapReduce
Distributed offline parallel computing framework; the computation has two stages, map and reduce
YARN
Task management and resource scheduling; a new framework introduced with Hadoop 2.0 (resource scheduling and management were split out of MapReduce)
HDFS
HDFS has a master-slave structure: one NameNode server that stores metadata, and multiple DataNode servers that store the real data. When a file is uploaded it is split into multiple data blocks, each 128 MB by default.
NameNode (one)
Stores the metadata: file names, types, attributes, and the directory structure.
DataNode (multiple)
Stores the file data locally as data blocks (block) and performs block checksums; each block defaults to 128 MB, and this size can be customized.
Hadoop (hadoop.apache.org) provides both data storage and a data computing model; as a file system it plays the role that FAT32, NTFS, or ext4 play locally. Released versions include 0.2, 1.0, 2.0, 2.5, and 3.0.
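The default block size drives a simple calculation: a file occupies ceil(size / 128 MB) blocks, and the last block holds only the remainder. A minimal sketch of the arithmetic (the 300 MB file size is a made-up example):

```shell
# Hypothetical example: how a 300 MB file is split into 128 MB HDFS blocks.
BLOCK_MB=128
FILE_MB=300
# Integer ceiling division: (300 + 127) / 128 = 3 blocks
NUM_BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
# The final block stores only the remaining data: 300 - 2*128 = 44 MB
LAST_BLOCK_MB=$(( FILE_MB - (NUM_BLOCKS - 1) * BLOCK_MB ))
echo "blocks: $NUM_BLOCKS, last block: ${LAST_BLOCK_MB} MB"
```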
Yarn
ResourceManager
Task scheduling and resource management
NodeManager
Node management service
Task management and resource scheduling flow:
1. The client submits a job to the ResourceManager.
2. The ResourceManager picks a NodeManager and allocates a container on it to launch the MRAppMaster.
3. The MRAppMaster applies to the ResourceManager for resources.
4. Once the resources are granted, the MRAppMaster asks the NodeManagers to start the tasks in their containers.
5. The map and reduce tasks start.
6. While the MapReduce program runs, the tasks report their status to the MRAppMaster.
7. When the program finishes, the MRAppMaster reports back to the ResourceManager and unregisters.
Mapreduce
The computation is divided into stages:
Map: the input file is split among multiple tasks that process it in parallel
Shuffle: the map output is sorted and distributed to the reducers
Reduce: aggregates the results of the map stage and writes the output
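The three stages can be mimicked with an ordinary shell pipeline. This is a rough sketch, not real MapReduce: tr plays the mapper (emit one word per line), sort plays the shuffle (group equal keys together), and uniq -c plays the reducer (count each group). The sample text is made up:

```shell
# Map: emit one word per line
# Shuffle: sort groups identical words together
# Reduce: uniq -c counts each group
printf 'hadoop yarn hadoop hdfs\n' > /tmp/wc_demo.txt
tr ' ' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn > /tmp/wc_out.txt
cat /tmp/wc_out.txt
```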
System installation
1. Create a virtual machine; the selected disk should be no smaller than 50 GB.
2. Installation type: Desktop.
3. Keep the default partitioning.
Prepare the environment for distribution
1. Set the NIC IP to static
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
UUID=b17e5d93-cd31-4166-827f-18cf14586777
ONBOOT=yes                # start the NIC at boot
NM_CONTROLLED=yes
BOOTPROTO=static          # change from dhcp to static
HWADDR=00:0C:29:88:2A:28
IPADDR=192.168.30.100
PREFIX=24
GATEWAY=192.168.30.2
DNS1=192.168.30.2
DNS2=202.96.209.5
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
2. Set DNS (in the same file)
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DNS1=192.168.30.2
DNS2=202.96.209.5
Restart the network service:
service network restart
3. Set the hostname
vi /etc/sysconfig/network
hadoop-senior.beifeng.com
4. Turn off the firewall
Disable the SELinux security subsystem of Linux in the selinux file at /etc/sysconfig/selinux.
1. Set SELINUX=disabled:
[root@java15 ~]# vi /etc/sysconfig/selinux
SELINUX=disabled
2. Stop the firewall:
[root@java15 ~]# service iptables stop
3. Disable the firewall at boot:
[root@java15 ~]# chkconfig iptables off
4. Check whether the firewall is completely stopped:
[root@java15 ~]# service iptables status
iptables: Firewall is not running.
5. Add the hostname mapping:
[root@hadoop-senior ~]# vi /etc/hosts
192.168.30.100 hadoop-senior.xiaoping.com
(On Windows the hosts file is under C:\Windows\System32\drivers\etc)
6. Create an ordinary user for later use; all subsequent operations use this user:
useradd xiaoping
echo 123456 | passwd --stdin xiaoping
Install the JDK
su - root                  # switch to the root user
# mkdir /opt/modules/      # software installation directory
# mkdir /opt/softwares/    # software download directory (holds hadoop-2.5.0.tar.gz)
# chown beifeng:xiaoping /opt/modules/
# chown beifeng:xiaoping /opt/softwares/
# su xiaoping              # switch to the xiaoping user
Extract the JDK as the xiaoping user:
$ tar -zxvf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
Configure the JDK environment variables (/opt/modules/jdk1.7.0_67):
# vi /etc/profile
##JAVA_HOME
JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
Uninstall the JDK that ships with the system, as root:
# rpm -qa | grep -i java
# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch
# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
The per-user environment configuration is optional:
# vi ~/.bashrc
##JAVA_HOME
JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
Reload the files:
$ source /etc/profile
# source ~/.bashrc
Install hadoop:
# tar -zxvf hadoop-2.5.0-cdh6.3.6.tar.gz -C /opt/modules/
The configuration files of the hadoop installation live in:
/opt/modules/hadoop-2.5.0-cdh6.3.6/etc/hadoop
Configure the hadoop environment variables: hadoop-env.sh, mapred-env.sh, and yarn-env.sh all need JAVA_HOME set:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
Modify core-site.xml:
fs.defaultFS
hdfs://hadoop-senior.beifeng.com:8020
hadoop.tmp.dir
/opt/modules/hadoop-2.5.0-cdh6.3.6/data
Modify the slaves file to specify which servers are DataNodes, one hostname (or IP) per line, for example:
hadoop-senior.beifeng.com
hadoop-senior.beifeng1.com
hadoop-senior.beifeng2.com
Modify hdfs-site.xml; the number of replicas cannot be greater than the number of hosts (when a node fails, HDFS re-replicates its blocks to keep the replica count):
dfs.replication
1
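The property/value pairs above go inside <property> elements in the XML files. A sketch, written to /tmp purely for illustration (on the real cluster the files live in /opt/modules/hadoop-2.5.0-cdh6.3.6/etc/hadoop):

```shell
# Illustrative only: writes the two config files to /tmp instead of
# the real etc/hadoop directory.
cat > /tmp/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-senior.beifeng.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0-cdh6.3.6/data</value>
  </property>
</configuration>
EOF
cat > /tmp/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```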
Format the file system:
$ bin/hdfs namenode -format
Start the namenode and datanode services:
$ sbin/hadoop-daemon.sh start namenode    # starts the namenode
$ sbin/hadoop-daemon.sh start datanode    # starts the datanode
View the service processes:
[beifeng@hadoop-senior hadoop-2.5.0]$ jps
10031 Jps
9954 DataNode
9845 NameNode
$ bin/hdfs dfs -mkdir /input                    # create a folder on hdfs
$ bin/hdfs dfs -put /opt/modules/yy.txt /input  # upload a file
$ bin/hdfs dfs -cat /input/yy.txt               # view the file
Configure yarn
Resource management and task scheduling (resourcemanager, nodemanager).
Environment variables: yarn-env.sh and mapred-env.sh both need
export JAVA_HOME=/opt/modules/jdk1.7.0_67
yarn-site.xml:
yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.resourcemanager.hostname
hadoop-senior.beifeng.com
mapred-site.xml:
mapreduce.framework.name
yarn
Start yarn:
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
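The yarn-site.xml and mapred-site.xml settings above translate to XML the same way as the HDFS configs; a sketch, again written to /tmp for illustration only:

```shell
# Illustrative only: the yarn and mapred configs as full XML files.
cat > /tmp/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior.beifeng.com</value>
  </property>
</configuration>
EOF
cat > /tmp/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```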
Browser:
http://hadoop-senior.beifeng.com:8088/cluster
Use the official examples jar to count the words in a file:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/ /output
or, for the CDH build:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh6.3.6.jar wordcount /input/ /output
View the statistics:
$ bin/hdfs dfs -cat /output/par*
Configure the log server
yarn-site.xml:
yarn.log-aggregation-enable
true
yarn.log-aggregation.retain-seconds
86400
mapred-site.xml:
mapreduce.jobhistory.address
hadoop-senior.beifeng.com:10020
mapreduce.jobhistory.webapp.address
hadoop-senior.beifeng.com:19888
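In XML form, the log-server settings above look like this sketch (written to /tmp for illustration; on the cluster these property entries are added to the existing yarn-site.xml and mapred-site.xml):

```shell
# Illustrative only: the log-aggregation and jobhistory entries as XML.
cat > /tmp/yarn-site-history.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
EOF
cat > /tmp/mapred-site-history.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior.beifeng.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior.beifeng.com:19888</value>
  </property>
</configuration>
EOF
```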
Restart yarn:
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
Start the historyserver service:
$ sbin/mr-jobhistory-daemon.sh start historyserver
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/ /output1
Notes:
The delimiter of the source file is \t
input is the input path
output is the output path and must not exist beforehand
All of these paths are HDFS paths
Resolving the native-library warning:
Replace the native package: extract it into /opt/modules/hadoop-2.5.0/lib to replace the original native directory.
Common problems:
User-related problems:
Do not start the services as the root user.
After switching to the ordinary user, check on the virtual machine that the switch really happened.
The two folders under /opt must belong to the ordinary user.
When editing configuration files with Notepad, log in as the ordinary user.
Virtual machine environment problems: firewall, gateway, IP, hostname, the local hosts file; take a snapshot of the machine.
View log information (tail -n50 shows the last 50 lines):
/opt/modules/hadoop-2.5.0/logs
$ tail -n50 hadoop-beifeng-namenode-hadoop-senior.beifeng.com.log