Big data with Hadoop

2025-04-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

I. Big data

Big data: the technology for handling massive amounts of data. Big data consists of very large data sets that can be merged and analyzed to reveal additional information and relationships within the data.

Big data refers to data sets that conventional software tools cannot capture, manage, and process within an acceptable period of time.

Big data technology refers to the ability to extract valuable information quickly from many types of data. Technologies suited to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.

Characteristics of big data:

Volume: sizes range from hundreds of TB to hundreds of PB or even EB

Variety: big data includes data in many different forms

Timeliness (velocity): the data must be processed within a given time limit

Accuracy (veracity): the results of processing must be sufficiently accurate

Value: big data holds a great deal of hidden value, and analyzing, mining, and using it can bring significant commercial value.

II. Hadoop

Hadoop is a software platform for analyzing and processing large amounts of data. It is open source software, developed in Java, that provides a distributed infrastructure.

Hadoop features: high reliability, high scalability, high efficiency, high fault tolerance, low cost

Common Hadoop components:

-HDFS (Hadoop distributed file system)

-MapReduce (distributed computing framework)

-ZooKeeper (distributed coordination service)

-HBase (distributed column-oriented database)

-Hive (Hadoop-based data warehouse)

-Sqoop (data synchronization tool)

-Pig (Hadoop-based data flow system)

-Mahout (data mining algorithm library)

-Flume (log collection tool)

Hadoop core components:

-HDFS: distributed file system

-YARN: cluster resource management system

-MapReduce: distributed computing framework

HDFS roles and concepts

-NameNode: the master node. It manages the HDFS namespace and the block mapping information, configures the replica policy, and handles all client requests.

-Secondary NameNode: periodically merges the fsimage and the edit log and pushes the result to the NameNode. In an emergency it can help restore the NameNode.

-DataNode: the data storage node. It stores the actual data and reports its storage information to the NameNode.

-Client: splits files into blocks, accesses HDFS, interacts with the NameNode to get file location information, and interacts with the DataNodes to read and write data.
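The division of labor above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names are hypothetical, the 128 MB block size is the Hadoop 2.x default for dfs.blocksize, and the round-robin replica placement is an assumption made for illustration (the real placement policy also considers racks and free space).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.x default for dfs.blocksize


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Client side: return the size in bytes of each block of a file."""
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=2):
    """Toy NameNode: map each block index to `replication` distinct DataNodes.

    Round-robin placement is an illustrative assumption, not the HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication cannot exceed the number of DataNodes")
    block_map = {}
    for block in range(num_blocks):
        start = block % len(datanodes)
        block_map[block] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return block_map


blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
block_map = place_replicas(len(blocks), ["node01", "node02", "node03"])
for index, nodes in block_map.items():
    print(index, blocks[index], nodes)
```

A 300 MB file becomes two full blocks plus a 44 MB remainder, and each block is recorded against two DataNodes, which is the kind of mapping the NameNode keeps and the client asks for before reading or writing.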

Hadoop has three deployment modes:

-stand-alone

-pseudo-distributed (all roles are installed on one machine)

-fully distributed (different roles are installed on different machines)

III. Stand-alone mode:

1. Get the software

http://hadoop.apache.org

Download: hadoop-2.7.6.tar.gz

Decompress: tar -xf hadoop-2.7.6.tar.gz

Installation: mv hadoop-2.7.6

2. Install the Java environment and the jps tool

yum -y install java-1.8.0-openjdk

yum -y install java-1.8.0-openjdk-devel

3. Set environment variables

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-11.b12.el7.x86_64/jre"

export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"

Count how many times each word appears (oo is the input directory; xx is the output directory, which must not already exist)

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount oo xx
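To make the wordcount job less of a black box, here is a minimal Python sketch of the map, shuffle, and reduce flow it follows. It is not Hadoop's Java implementation: the function names are hypothetical, everything runs in one process, and splitting words on whitespace is an assumption.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


result = word_count(["hadoop hdfs yarn", "hadoop mapreduce", "hdfs hadoop"])
print(sorted(result.items()))
```

In a real cluster the map calls run in parallel on the nodes holding the input blocks, and the shuffle moves each word's pairs to the reducer responsible for that word.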

IV. Fully distributed mode:

-Distributed file system: the physical storage resources managed by the file system are not necessarily attached directly to the local node; they are reached over the computer network. The design follows the client/server model. A distributed file system addresses data storage and management by extending a file system fixed at one location to any number of locations and file systems, so that many nodes form a file system network. The nodes can sit in different locations and communicate and transfer data with one another over the network.

Prerequisites for building the cluster:

-ALL: can ping each other by name (configure /etc/hosts)

-ALL: install java-1.8.0-openjdk-devel

-nn01: can log in to every cluster host, including itself, over ssh without a password and without being prompted to type yes

Passwordless ssh login: deploy an ssh key

To avoid the yes prompt: edit /etc/ssh/ssh_config

At about line 60, add: StrictHostKeyChecking no
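The first prerequisite (every host can reach the others by name) depends on /etc/hosts listing all cluster hostnames. A small check could look like the sketch below. The function name is hypothetical, and the IP addresses in the sample are made up because the source does not give the cluster's addresses.

```python
def missing_hosts(hosts_text, required):
    """Return the required hostnames that have no entry in /etc/hosts-style text."""
    known = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        known.update(fields[1:])  # the first field is the IP address
    return [name for name in required if name not in known]


# Hypothetical addresses; node03 is left out on purpose.
sample = """
127.0.0.1    localhost
192.168.1.10 nn01
192.168.1.11 node01
192.168.1.12 node02   # node03 is missing
"""
cluster = ["nn01", "node01", "node02", "node03"]
print(missing_hosts(sample, cluster))
```

On a real host the text would come from reading /etc/hosts, and an empty result means every cluster name has an entry.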

Configuration file format

Configuration reference: http://hadoop.apache.org

cd /usr/local/hadoop/etc/hadoop

1. Configure the environment variable file hadoop-env.sh (see III, 3)

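2. Core configuration file core-site.xml

vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop</value>
    </property>
</configuration>

Create /var/hadoop on all hosts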

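Fully distributed configuration file hdfs-site.xml

vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>nn01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>nn01:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>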
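Hadoop's *-site.xml files all share one layout: a configuration element containing property elements, each with a name and a value. As a hedged sketch, the files can be generated from a plain mapping with Python's standard library. The helper name render_site_xml is hypothetical, and the values are the ones used in this article.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Build a Hadoop *-site.xml document from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode")


core_site = render_site_xml({
    "fs.defaultFS": "hdfs://nn01:9000",
    "hadoop.tmp.dir": "/var/hadoop",
})
hdfs_site = render_site_xml({
    "dfs.namenode.secondary.http-address": "nn01:50090",
    "dfs.replication": 2,
})
print(core_site)
print(hdfs_site)
```

Generating the files from one mapping keeps every host's configuration identical, which matters because the same files have to be copied to all nodes.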

3. Configure slaves

vim slaves

node01

node02

node03

4. Synchronize configuration to all hosts
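The article does not say which tool to use for this step. One possible approach, sketched below, is to push the configuration directory from nn01 to each node with rsync over the passwordless ssh set up earlier. Using rsync is an assumption, and sync_commands is a hypothetical helper that only builds the commands without running them.

```python
import shlex

CONF_DIR = "/usr/local/hadoop/etc/hadoop"


def sync_commands(hosts, conf_dir=CONF_DIR):
    """Return one rsync command (as an argv list) per target host."""
    return [
        ["rsync", "-a", conf_dir + "/", f"{host}:{conf_dir}/"]
        for host in hosts
    ]


commands = sync_commands(["node01", "node02", "node03"])
for argv in commands:
    # To execute for real, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before anything is copied.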

5. Format the NameNode (run on nn01)

./bin/hdfs namenode -format

6. Start the cluster (run on nn01)

./sbin/start-dfs.sh

Use ./sbin/stop-dfs.sh to stop the cluster.

7. Verify the roles with jps (run on all hosts)

8. Verify that the cluster was set up successfully (run on nn01)

./bin/hdfs dfsadmin -report

Service startup logs are written to /usr/local/hadoop/logs
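The jps check in step 7 can be sketched as a comparison between the daemons each host should run and what jps prints (one "pid name" pair per line). The expected roles below are inferred from this article's configuration (NameNode and SecondaryNameNode on nn01, DataNode on the hosts listed in slaves). The function names and the PIDs in the sample are made up.

```python
EXPECTED = {
    "nn01": {"NameNode", "SecondaryNameNode"},
    "node01": {"DataNode"},
    "node02": {"DataNode"},
    "node03": {"DataNode"},
}


def running_daemons(jps_output):
    """Parse jps output ("<pid> <name>" per line) into a set of process names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            names.add(parts[1].strip())
    return names


def missing_roles(host, jps_output, expected=EXPECTED):
    """Return the daemons that should run on host but are absent from jps output."""
    return sorted(expected[host] - running_daemons(jps_output))


sample = "2101 NameNode\n2290 SecondaryNameNode\n2417 Jps\n"  # made-up PIDs
print(missing_roles("nn01", sample))
print(missing_roles("node01", "1888 Jps\n"))
```

If a role is missing, the logs under /usr/local/hadoop/logs on that host are the first place to look.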
