2025-01-19 Update From: SLTechnology News&Howtos
Sources of enterprise big data
1. Inside the enterprise
Log files
Databases
User behavior data
2. Outside the enterprise
Web crawlers
Third-party purchase (Ant Data Bank) -- Guiyang
Big data = massive data + complex data types
Hadoop grew out of three Google papers:
MapReduce -> MapReduce, a distributed offline parallel computing framework
GFS -> HDFS, a distributed file storage system
Bigtable -> HBase, a distributed database
Four core modules of Hadoop
Common
Provides infrastructure for the other modules
HDFS
Distributed file storage system
MapReduce
Distributed offline parallel computing framework; the computation has two stages, map and reduce
YARN
Task management and resource scheduling; a new framework introduced with Hadoop 2.0 (resource scheduling and management were split out of MapReduce)
HDFS
HDFS has a master-slave structure: one NameNode server that stores metadata, and multiple DataNode servers that store the real data. When a file is uploaded it is split into multiple data blocks, each 128 MB by default.
NameNode (one)
Stores the metadata: file names, types, attributes, and the directory structure.
DataNode (multiple)
Stores the file data locally as data blocks (block) and performs block checksums; each block defaults to 128 MB, and this size can be customized.
Hadoop (hadoop.apache.org) provides both data storage and a data computing model; as a file system it plays the role that FAT32, NTFS, or ext4 play locally. Released versions include 0.2, 1.0, 2.0, 2.5, and 3.0.
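The default block size drives a simple calculation: a file occupies ceil(size / 128 MB) blocks, and the last block holds only the remainder. A minimal sketch of the arithmetic (the 300 MB file size is a made-up example):

```shell
# Hypothetical example: how a 300 MB file is split into 128 MB HDFS blocks.
BLOCK_MB=128
FILE_MB=300
# Integer ceiling division: (300 + 127) / 128 = 3 blocks
NUM_BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
# The final block stores only the remaining data: 300 - 2*128 = 44 MB
LAST_BLOCK_MB=$(( FILE_MB - (NUM_BLOCKS - 1) * BLOCK_MB ))
echo "blocks: $NUM_BLOCKS, last block: ${LAST_BLOCK_MB} MB"
```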
Yarn
ResourceManager
Task scheduling and resource management
NodeManager
Node management service
Task management and resource scheduling flow:
1. The client submits a job to the ResourceManager.
2. The ResourceManager picks a NodeManager and allocates a container on it to launch the MRAppMaster.
3. The MRAppMaster applies to the ResourceManager for resources.
4. Once the resources are granted, the MRAppMaster asks the NodeManagers to start the tasks in their containers.
5. The map and reduce tasks start.
6. While the MapReduce program runs, the tasks report their status to the MRAppMaster.
7. When the program finishes, the MRAppMaster reports back to the ResourceManager and unregisters.
Mapreduce
The computation is divided into stages:
Map: the input file is split among multiple tasks that process it in parallel
Shuffle: the map output is sorted and distributed to the reducers
Reduce: aggregates the results of the map stage and writes the output
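The three stages can be mimicked with an ordinary shell pipeline. This is a rough sketch, not real MapReduce: tr plays the mapper (emit one word per line), sort plays the shuffle (group equal keys together), and uniq -c plays the reducer (count each group). The sample text is made up:

```shell
# Map: emit one word per line
# Shuffle: sort groups identical words together
# Reduce: uniq -c counts each group
printf 'hadoop yarn hadoop hdfs\n' > /tmp/wc_demo.txt
tr ' ' '\n' < /tmp/wc_demo.txt | sort | uniq -c | sort -rn > /tmp/wc_out.txt
cat /tmp/wc_out.txt
```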
System installation
1. Create a virtual machine; the selected disk should be no smaller than 50 GB.
2. Installation type: Desktop.
3. Keep the default partitioning.
Prepare the environment for distribution
1. Set the NIC IP to static
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
UUID=b17e5d93-cd31-4166-827f-18cf14586777
ONBOOT=yes                # start the NIC at boot
NM_CONTROLLED=yes
BOOTPROTO=static          # change from dhcp to static
HWADDR=00:0C:29:88:2A:28
IPADDR=192.168.30.100
PREFIX=24
GATEWAY=192.168.30.2
DNS1=192.168.30.2
DNS2=202.96.209.5
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
2. Set DNS (in the same file)
vi /etc/sysconfig/network-scripts/ifcfg-eth0
DNS1=192.168.30.2
DNS2=202.96.209.5
Restart the network service:
service network restart
3. Set the hostname
vi /etc/sysconfig/network
hadoop-senior.beifeng.com
4. Turn off the firewall
Disable the SELinux security subsystem of Linux in the selinux file at /etc/sysconfig/selinux.
1. Set SELINUX=disabled:
[root@java15 ~]# vi /etc/sysconfig/selinux
SELINUX=disabled
2. Stop the firewall:
[root@java15 ~]# service iptables stop
3. Disable the firewall at boot:
[root@java15 ~]# chkconfig iptables off
4. Check whether the firewall is completely stopped:
[root@java15 ~]# service iptables status
iptables: Firewall is not running.
5. Add the hostname mapping:
[root@hadoop-senior ~]# vi /etc/hosts
192.168.30.100 hadoop-senior.xiaoping.com
(On Windows the hosts file is under C:\Windows\System32\drivers\etc)
6. Create an ordinary user for later use; all subsequent operations use this user:
useradd xiaoping
echo 123456 | passwd --stdin xiaoping
Install the JDK
su - root                  # switch to the root user
# mkdir /opt/modules/      # software installation directory
# mkdir /opt/softwares/    # software download directory (holds hadoop-2.5.0.tar.gz)
# chown beifeng:xiaoping /opt/modules/
# chown beifeng:xiaoping /opt/softwares/
# su xiaoping              # switch to the xiaoping user
Extract the JDK as the xiaoping user:
$ tar -zxvf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
Configure the JDK environment variables (/opt/modules/jdk1.7.0_67):
# vi /etc/profile
##JAVA_HOME
JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
Uninstall the JDK that ships with the system, as root:
# rpm -qa | grep -i java
# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch
# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64
# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
The per-user environment configuration is optional:
# vi ~/.bashrc
##JAVA_HOME
JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
Reload the files:
$ source /etc/profile
# source ~/.bashrc
Install hadoop:
# tar -zxvf hadoop-2.5.0-cdh6.3.6.tar.gz -C /opt/modules/
The configuration files of the hadoop installation live in:
/opt/modules/hadoop-2.5.0-cdh6.3.6/etc/hadoop
Configure the hadoop environment variables: hadoop-env.sh, mapred-env.sh, and yarn-env.sh all need JAVA_HOME set:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
Modify core-site.xml:
fs.defaultFS
hdfs://hadoop-senior.beifeng.com:8020
hadoop.tmp.dir
/opt/modules/hadoop-2.5.0-cdh6.3.6/data
Modify the slaves file to specify which servers are DataNodes, one hostname (or IP) per line, for example:
hadoop-senior.beifeng.com
hadoop-senior.beifeng1.com
hadoop-senior.beifeng2.com
Modify hdfs-site.xml; the number of replicas cannot be greater than the number of hosts (when a node fails, HDFS re-replicates its blocks to keep the replica count):
dfs.replication
1
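The property/value pairs above go inside <property> elements in the XML files. A sketch, written to /tmp purely for illustration (on the real cluster the files live in /opt/modules/hadoop-2.5.0-cdh6.3.6/etc/hadoop):

```shell
# Illustrative only: writes the two config files to /tmp instead of
# the real etc/hadoop directory.
cat > /tmp/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-senior.beifeng.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0-cdh6.3.6/data</value>
  </property>
</configuration>
EOF
cat > /tmp/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```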
Format the file system:
$ bin/hdfs namenode -format
Start the namenode and datanode services:
$ sbin/hadoop-daemon.sh start namenode    # starts the namenode
$ sbin/hadoop-daemon.sh start datanode    # starts the datanode
View the service processes:
[beifeng@hadoop-senior hadoop-2.5.0]$ jps
10031 Jps
9954 DataNode
9845 NameNode
$ bin/hdfs dfs -mkdir /input                    # create a folder on hdfs
$ bin/hdfs dfs -put /opt/modules/yy.txt /input  # upload a file
$ bin/hdfs dfs -cat /input/yy.txt               # view the file
Configure yarn
Resource management and task scheduling (resourcemanager, nodemanager).
Environment variables: yarn-env.sh and mapred-env.sh both need
export JAVA_HOME=/opt/modules/jdk1.7.0_67
yarn-site.xml:
yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.resourcemanager.hostname
hadoop-senior.beifeng.com
mapred-site.xml:
mapreduce.framework.name
yarn
Start yarn:
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
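The yarn-site.xml and mapred-site.xml settings above translate to XML the same way as the HDFS configs; a sketch, again written to /tmp for illustration only:

```shell
# Illustrative only: the yarn and mapred configs as full XML files.
cat > /tmp/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior.beifeng.com</value>
  </property>
</configuration>
EOF
cat > /tmp/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```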
Browser:
http://hadoop-senior.beifeng.com:8088/cluster
Use the official examples jar to count the words in a file:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/ /output
or, for the CDH build:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh6.3.6.jar wordcount /input/ /output
View the statistics:
$ bin/hdfs dfs -cat /output/par*
Configure the log server
yarn-site.xml:
yarn.log-aggregation-enable
true
yarn.log-aggregation.retain-seconds
86400
mapred-site.xml:
mapreduce.jobhistory.address
hadoop-senior.beifeng.com:10020
mapreduce.jobhistory.webapp.address
hadoop-senior.beifeng.com:19888
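In XML form, the log-server settings above look like this sketch (written to /tmp for illustration; on the cluster these property entries are added to the existing yarn-site.xml and mapred-site.xml):

```shell
# Illustrative only: the log-aggregation and jobhistory entries as XML.
cat > /tmp/yarn-site-history.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
EOF
cat > /tmp/mapred-site-history.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-senior.beifeng.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior.beifeng.com:19888</value>
  </property>
</configuration>
EOF
```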
Restart yarn:
$ sbin/yarn-daemon.sh stop resourcemanager
$ sbin/yarn-daemon.sh stop nodemanager
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
Start the historyserver service:
$ sbin/mr-jobhistory-daemon.sh start historyserver
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/ /output1
Notes:
The delimiter of the source file is \t
input is the input path
output is the output path and must not exist beforehand
All of these paths are HDFS paths
Resolving the native-library warning:
Replace the native package: extract it into /opt/modules/hadoop-2.5.0/lib to replace the original native directory.
Common problems:
User-related problems:
Do not start the services as the root user.
After switching to the ordinary user, check on the virtual machine that the switch really happened.
The two folders under /opt must belong to the ordinary user.
When editing configuration files with Notepad, log in as the ordinary user.
Virtual machine environment problems: firewall, gateway, IP, hostname, the local hosts file; take a snapshot of the machine.
View log information (tail -n50 shows the last 50 lines):
/opt/modules/hadoop-2.5.0/logs
$ tail -n50 hadoop-beifeng-namenode-hadoop-senior.beifeng.com.log