Hadoop primer

2025-02-23 Update From: SLTechnology News&Howtos

1 Hadoop big data platform: architecture and practice

Understand the principles of big data storage and processing.

Learn Hadoop development.

2 Prerequisite knowledge

Common Linux commands

Java programming basics

3 The origins and present of Hadoop

Big data volumes have reached the petabyte (PB) level.

MapReduce, GFS

Google published technical papers on these systems, covering parallel processing and node synchronization, but released no open source code.

Hadoop: its mascot is a baby elephant.

4 Features and benefits of Hadoop

Open source distributed storage + distributed computing platform.

HDFS: distributed file system

MapReduce: parallel computing framework with task scheduling.

Highly scalable, low cost, with a mature ecosystem.

Demand for Hadoop talent: developers and operations and maintenance engineers.

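5 Ecosystem and versions

HDFS, MapReduce, Hive

SQL -> Hive -> Hadoop: Hive translates SQL statements into Hadoop jobs.

HBase: non-relational database

ZooKeeper: the "animal keeper", a coordination service for the other components.

Version selection: 2.6 is available; 1.2 is the stable version.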

6 Installing Hadoop

1 Prepare a Linux environment

2 Install the JDK

3 Configure Hadoop's 4 configuration files

You can rent a cloud server; Alibaba Cloud (Aliyun) is a good choice.

7 The core file system of Hadoop: HDFS

HDFS architecture

The NameNode holds the metadata that clients read first; the DataNodes are the worker nodes that store the data.

Data management strategy: each block is stored as three replicas; data blocks are 64 MB; DataNodes send heartbeats and regular status reports; a Secondary NameNode synchronizes with the NameNode periodically. In short: several layers of backup, synchronized automatically.
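To make the block and replica numbers concrete, here is a minimal sketch of the arithmetic. It assumes the 64 MB block size and replication factor of 3 from the notes; the function name block_layout is hypothetical and not part of any Hadoop API.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB block size, as in the notes
REPLICATION = 3                # three replicas per block, as in the notes


def block_layout(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, stored block replicas, raw bytes stored) for one file."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # The last block only occupies the bytes it holds, so raw storage
    # is the file size times the replication factor.
    return blocks, blocks * replication, file_size_bytes * replication


blocks, replicas, raw = block_layout(200 * 1024 * 1024)  # a 200 MB file
print(blocks, replicas, raw // (1024 * 1024))  # 4 12 600
```

A 200 MB file is split into 4 blocks, stored as 12 block replicas, and occupies about 600 MB of raw disk across the cluster.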

Read and write process: the client can be any program. Writes use pipeline replication between DataNodes, after which the metadata is updated.

Features: data redundancy for hardware fault tolerance. Streaming data access: files cannot be modified in place, only deleted and added again. Suited to storing large files. Batch reads and writes with high throughput: write once, read many times. Interactive performance is poor.

Command-line operation: similar to shell commands.

8 MapReduce: divide a big task into small tasks and merge the results

Example: given a 100 GB website access log file, find the IP with the most visits.

The exchange step between map and reduce is very important.
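The log example can be simulated in plain Python to show where the exchange step sits. This is only a sketch, not Hadoop code: it assumes each log line starts with the client IP, and the function names (map_split, shuffle, reduce_partition, top_ip) are hypothetical.

```python
from collections import Counter, defaultdict


def map_split(lines):
    """Map task: count visits per IP within one input split."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:                  # skip blank lines
            counts[fields[0]] += 1  # assumption: the IP is the first field
    return counts


def shuffle(map_outputs, num_reducers):
    """Exchange step: route every IP to exactly one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for counts in map_outputs:
        for ip, n in counts.items():
            partitions[hash(ip) % num_reducers][ip].append(n)
    return partitions


def reduce_partition(partition):
    """Reduce task: sum the partial counts for each IP in one partition."""
    return {ip: sum(ns) for ip, ns in partition.items()}


def top_ip(log_lines, split_size=2, num_reducers=2):
    """Split the log, run map and reduce, and return the (ip, visits) maximum."""
    splits = [log_lines[i:i + split_size]
              for i in range(0, len(log_lines), split_size)]
    map_outputs = [map_split(s) for s in splits]
    totals = {}
    for partition in shuffle(map_outputs, num_reducers):
        totals.update(reduce_partition(partition))
    return max(totals.items(), key=lambda kv: kv[1])


log = [
    "10.0.0.1 GET /",
    "10.0.0.2 GET /a",
    "10.0.0.1 GET /b",
    "10.0.0.3 GET /",
    "10.0.0.1 GET /c",
]
print(top_ip(log))  # ('10.0.0.1', 3)
```

Because the exchange step sends all partial counts for a given IP to the same reducer, each reducer can produce a final total for its IPs without seeing the rest of the data.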

Execution process:

Basic concepts: job and task. One job is split into multiple tasks.

The JobTracker splits a job into map tasks and reduce tasks.

JobTracker: 1 schedules jobs; 2 assigns tasks and monitors the progress of task execution;

3 monitors the status of the TaskTrackers.

TaskTracker: executes tasks and reports task status.

Data flow: input data splits -> map tasks -> intermediate results -> reduce tasks -> output results.

The output results are stored in HDFS.

The JobTracker assigns the tasks and drives the execution process.

Fault-tolerance mechanisms: 1 Re-execution: a failed task is retried, by default up to 4 times, before it is given up. 2 Speculative execution: reduce tasks run only after the map side completes, so when one task is slow, a duplicate of it is started on another TaskTracker and whichever copy finishes first is used.
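The re-execution rule can be sketched as a small retry loop. This is an illustration under assumptions, not Hadoop's implementation: it treats the limit of 4 as the total number of attempts, and both run_with_retries and make_flaky_task are hypothetical helpers.

```python
MAX_ATTEMPTS = 4  # default limit from the notes, assumed to mean total attempts


def run_with_retries(task, max_attempts=MAX_ATTEMPTS):
    """Run task() until it succeeds or max_attempts is exhausted.

    Returns (result, attempts_used); raises RuntimeError after the last failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception as exc:  # any failure triggers a re-execution
            last_error = exc
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_task(failures_before_success):
    """Hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("simulated task failure")
        return "done"

    return task


print(run_with_retries(make_flaky_task(2)))  # ('done', 3)
```

A task that fails on every one of its 4 attempts raises RuntimeError, which corresponds to the job giving up on that task.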

9 Application case

WordCount, the classic example:

Count the frequency of each word in a file.

Map process: split the text into words.

Reduce process: merge the counts for each word.
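The two WordCount phases can be sketched in plain Python. This only simulates the idea in a single process; a real Hadoop job would be written against Hadoop's own MapReduce API, and the function names here (map_words, reduce_counts, word_count) are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map: split a line into words and emit a (word, 1) pair for each."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce: group the pairs by word and sum the ones."""
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def word_count(lines):
    """Run the map phase over every line, then reduce all emitted pairs."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line))
    return reduce_counts(pairs)


print(word_count(["hello hadoop", "hello world"]))
# {'hello': 2, 'hadoop': 1, 'world': 1}
```

The map step knows nothing about totals and the reduce step knows nothing about lines, which is what lets the two phases run on different machines.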
