
Hadoop-my understanding of hadoop

2025-04-04 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/03 Report --

Big data: volumes of data too large to store and process by conventional single-machine means

Structured data: row-oriented data; data that can be stored in a two-dimensional table

Unstructured data: data that cannot be represented with two-dimensional logic, such as Word documents, PowerPoint files, and images

Semi-structured data: data between structured and unstructured; it is self-describing, storing its structure together with the data itself. Examples: XML, JSON, HTML
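To make "self-describing" concrete, here is a minimal Python sketch (the record and field names are illustrative, not from the source): a JSON document carries its own field names alongside the values, unlike a rigid relational row.

```python
import json

# A JSON record carries its structure (the field names) with the data itself.
record = '{"name": "Alice", "age": 30, "tags": ["admin", "ops"]}'
doc = json.loads(record)

print(doc["name"])         # -> Alice
print(sorted(doc.keys()))  # the schema travels with the record
```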

Google's paper: MapReduce: Simplified Data Processing on Large Clusters

Map: splitting big data into small pieces of data that are processed on separate nodes

Reduce: folding the per-node results together into a final result

i1, i2 => o1; o1, i3 => o2; o2, i4 => o4

MapReduce: maps big data into key-value pairs
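The Map and Reduce steps above can be sketched in plain Python (Hadoop's real API is Java; this word-count example and its function names are illustrative only):

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: turn one chunk of raw text into (key, value) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce: fold all values that share a key into one result.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["big data big", "data big"]          # data split across nodes
pairs = [p for c in chunks for p in map_phase(c)]
print(reduce_phase(pairs))                     # -> {'big': 3, 'data': 2}
```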

Data collection, monitoring, analysis and processing

Hadoop: JobTracker, TaskTracker, NameNode, DataNode

Features of Hadoop:

(1) scales out (horizontally) rather than up

(2) data redundancy (replication)

(3) moves the program to the data, not the data to the program

(4) processes data sequentially, avoiding random access

(5) hides system-level details from programmers

(6) smooth scaling

How to cut big data into multiple small pieces that can be processed, and how to merge the processed results

How to choose which host, among those holding the different small pieces, a task should be moved to

How to fetch the segmented small pieces of data

How to synchronize the Map processes

How to transfer the Map results to Reduce

How to ensure the integrity of a task after a software or hardware failure
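The first of those questions (cutting the data and merging the results) can be sketched as follows; this is a toy single-process Python illustration, not how Hadoop actually partitions HDFS blocks, and all function names are made up:

```python
def split_into_chunks(data, num_workers):
    # Cut the data set into roughly equal slices, one per worker node.
    size = (len(data) + num_workers - 1) // num_workers
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    # Stand-in for the per-node Map work: here, just sum the slice.
    return sum(chunk)

def merge_results(partials):
    # Stand-in for merging the processed results (the Reduce side).
    return sum(partials)

data = list(range(1, 11))             # 1..10
chunks = split_into_chunks(data, 3)   # [[1..4], [5..8], [9, 10]]
partials = [process_chunk(c) for c in chunks]
print(merge_results(partials))        # -> 55
```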

MapReduce:

1. Programming framework: API

2. Running platform

3. Concrete implementation

Hadoop: HDFS --> MapReduce (API, Java)

HDFS:

HDFS: distributed data storage across the cluster

1) HDFS itself

2) data is saved into HDFS's underlying file systems

MapReduce: cluster-wide processing of large files

HBase, which runs on HDFS and is coordinated by ZooKeeper

Hadoop DataBase

Through ZooKeeper, HBase lets Hadoop store individual small files and supports random access.

NoSQL

Column: column-oriented storage

Storage of loose (sparse) data; column storage based on key-value pairs
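A minimal in-memory sketch of that idea in Python, assuming an HBase-style (row key, column) addressing scheme; the table, column names, and helper functions here are all hypothetical, not HBase's API:

```python
# Cells are keyed by (row, column) pairs; absent cells cost nothing,
# which is what makes sparse/loose data cheap to store.
table = {}

def put(row, column, value):
    table[(row, column)] = value

def get(row, column):
    return table.get((row, column))  # None if the cell was never written

put("user1", "info:name", "Alice")
put("user2", "info:email", "bob@example.com")  # user2 has no name cell

print(get("user1", "info:name"))   # -> Alice
print(get("user2", "info:name"))   # -> None (cell simply absent)
```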

Merges many small files into a large file

Bigtable: big table

ETL

Extraction, transformation, and loading of data
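The three ETL stages can be sketched as a small Python pipeline; the CSV source, field names, and in-memory "warehouse" are invented for illustration:

```python
import csv
import io

def extract(raw):
    # Extract: pull rows out of a raw CSV source.
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    # Transform: normalize text fields and cast types.
    return [{"name": r["name"].strip().title(), "age": int(r["age"])}
            for r in rows]

def load(rows, target):
    # Load: append the cleaned rows into the target store.
    target.extend(rows)

warehouse = []
raw = "name,age\n alice ,30\nBOB,25\n"
load(transform(extract(raw)), warehouse)
print(warehouse)  # -> [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```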

Log collection:

Flume

Scribe

Chukwa
