
Introduction to big data: introduction to Hadoop

2025-03-26 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

1.1 What is Hadoop?

1. Hadoop is an open source software platform under Apache. It can be reached from http://apache.org/ by following Projects -> Hadoop.

2. Hadoop is open source software for reliable, scalable, distributed computing.

3. What Hadoop provides: using a cluster of servers, it processes massive data sets in a distributed way according to the user's custom business logic.

4. In a broad sense, "Hadoop" usually refers to a wider concept: the Hadoop ecosystem.

1.2 Data analysis story

1.3 How big is the data?

Data volume units (each step is a factor of 1024):

1G = 1024M

1T = 1024G

1P = 1024T

1E = 1024P

1Z = 1024E

1Y = 1024Z

1N = 1024Y
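
As a small illustration of the factor-of-1024 ladder above, the following sketch converts a raw byte count into the largest fitting unit. The function name `human_readable` and the `UNITS` list are made up for this example, and the list stops at YB.

```python
# Hypothetical helper: walk up the unit ladder, dividing by 1024 at each step.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Return the byte count expressed in the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit available.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    print(human_readable(1024 ** 3))      # one gigabyte
    print(human_readable(5 * 1024 ** 5))  # five petabytes
```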

Hadoop runs on inexpensive commodity machines.

Moving away from IOE ("de-IOE"):

IBM: IBM minicomputers.

Oracle: Oracle database servers (RAC).

EMC: EMC shared storage devices.

1.4 Distributed

The whole application is formed by processes on different hosts cooperating with one another.

1. Distributed software systems

The software system is divided into multiple subsystems or modules, each running on a different machine.

The subsystems or modules cooperate through network communication to deliver the overall function.

2. Simulated development of a distributed application system

Requirements: the master node can send a computing task to the slave nodes and start the task on each slave node.

Program list:

AppMaster

AppSlave/APPSlaveThread

Task

The logical process of running the program:
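
One way such a flow could look is sketched below. This is an illustration under stated assumptions, not the original program: only the names AppMaster and AppSlave come from the program list, while the `TASKS` registry, the JSON message format, and `run_demo` are hypothetical. Slaves run as threads on 127.0.0.1 to stand in for separate hosts, and a task is sent by name plus arguments rather than as shipped code.

```python
import json
import socket
import threading

# Hypothetical task registry: the master names a task, the slave looks it up and runs it.
TASKS = {"sum": lambda numbers: sum(numbers)}


class AppSlave:
    """Listens on a local port and executes one task request."""

    def __init__(self):
        # Port 0 asks the OS for a free port; a real slave would use a known address.
        self.server = socket.create_server(("127.0.0.1", 0))
        self.port = self.server.getsockname()[1]

    def serve_once(self):
        with self.server:
            conn, _ = self.server.accept()
            with conn, conn.makefile("r") as reader:
                request = json.loads(reader.readline())
                result = TASKS[request["task"]](request["args"])
                conn.sendall((json.dumps({"result": result}) + "\n").encode())


class AppMaster:
    """Sends one chunk of work to each slave and collects the results."""

    def __init__(self, slave_ports):
        self.slave_ports = slave_ports

    def run(self, task_name, chunks):
        results = []
        for port, chunk in zip(self.slave_ports, chunks):
            address = ("127.0.0.1", port)
            with socket.create_connection(address) as conn, conn.makefile("r") as reader:
                message = json.dumps({"task": task_name, "args": chunk}) + "\n"
                conn.sendall(message.encode())
                results.append(json.loads(reader.readline())["result"])
        return results


def run_demo():
    """Start two slaves, dispatch a 'sum' task to each, and return the partial results."""
    slaves = [AppSlave() for _ in range(2)]
    threads = [threading.Thread(target=slave.serve_once) for slave in slaves]
    for thread in threads:
        thread.start()
    master = AppMaster([slave.port for slave in slaves])
    results = master.run("sum", [[1, 2, 3], [4, 5, 6]])
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The master here contacts the slaves one after another for simplicity; a fuller simulation would dispatch to all slaves concurrently and handle slaves that fail or time out.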

1.5 The position of Hadoop in big data and cloud computing

1. Cloud computing is the product of the convergence of traditional computing technologies and Internet technologies, such as distributed computing, parallel computing, grid computing, multi-core computing, network storage, virtualization, and load balancing. Through business models such as IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service), it delivers powerful computing capacity to end users.

2. At this stage, the two underlying technologies that support cloud computing are virtualization and big data technology.

1.6 How Hadoop came about

1. Hadoop originated from Nutch. Nutch was designed to be a large-scale, web-wide search engine covering web page crawling, indexing, and querying. As the number of crawled pages grew, it ran into a serious scalability problem: how to store and index billions of web pages.

2. Two papers published by Google in 2003 and 2004 offered a feasible solution to this problem.

-- The distributed file system GFS, which can handle the storage of massive numbers of web pages.

-- The distributed computing framework MapReduce, which can handle index computation over massive numbers of web pages.

3. The Nutch developers built the corresponding open source implementations, HDFS and MapReduce, and spun them off from Nutch into an independent project, Hadoop. In January 2008, Hadoop became a top-level Apache project and entered a period of rapid development.
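
To make the idea of MapReduce-style index computation more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps building an inverted index over a few pages. It only illustrates the programming model in plain Python and does not use the Hadoop API; the function names and the sample pages are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word on one page."""
    return [(word.lower(), doc_id) for word in text.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, doc_ids):
    """Reduce: collapse the page ids for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def build_index(pages):
    """Run the three phases in sequence; a real cluster would run maps and reduces in parallel."""
    pairs = []
    for doc_id, text in pages.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())


if __name__ == "__main__":
    sample_pages = {
        "p1": "Hadoop stores data",
        "p2": "Hadoop processes data",
    }
    for word, ids in sorted(build_index(sample_pages).items()):
        print(word, ids)
```

In a distributed setting each call to `map_phase` could run on the machine that holds the page, and each key group could be reduced on a different machine, which is what lets the approach scale to very large page collections.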

1.7 Current status of Hadoop
