2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1.1 What is Hadoop?
1. Hadoop is an open source software platform under Apache. It can be reached from http://apache.org/ by following Projects -> Hadoop.
2. Hadoop is open source software for reliable, scalable, distributed computing.
3. What Hadoop provides: using a cluster of servers, it performs distributed processing of massive data according to the user's custom business logic.
4. In a broad sense, Hadoop usually refers to a wider concept: the Hadoop ecosystem.
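To make point 3 more concrete, here is a minimal single-machine sketch of the idea, not Hadoop's actual API. It assumes the "custom business logic" is a word count: the data is split into chunks, the same user-defined function runs on each chunk, and the partial results are merged. The names `map_phase`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """User-defined business logic: count the words in one chunk of data."""
    return Counter(chunk.split())


def reduce_phase(left, right):
    """Merge two partial word counts into one."""
    return left + right


def word_count(chunks):
    # On a real cluster each chunk would be processed on a different server;
    # here the chunks are simply processed one after another.
    partials = [map_phase(chunk) for chunk in chunks]
    return reduce(reduce_phase, partials, Counter())
```

Calling `word_count(["a b a", "b c"])` counts `a` and `b` twice and `c` once. The framework's job is everything around these two functions: splitting the data, shipping the logic to the servers, and collecting the results.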
1.2 A data analysis story
1.3 How big is the data?
Data volume units (each step up is a factor of 1024):
1G = 1024M
1T = 1024G
1P = 1024T
1E = 1024P
1Z = 1024E
1Y = 1024Z
1N = 1024Y
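The factor-of-1024 ladder above can be checked with a few lines of code. This is an illustrative sketch only: the helper names `to_bytes` and `humanize` are hypothetical, and the unit list is assumed to start at bytes (B) and kilobytes (K) and to stop at Y.

```python
# Unit letters in ascending order; each one is 1024 times the previous one.
UNITS = ["B", "K", "M", "G", "T", "P", "E", "Z", "Y"]


def to_bytes(value, unit):
    """Convert a value in the given unit to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024
```

For example, `to_bytes(1, "T")` equals `1024 ** 4`, and `humanize(1536)` gives `"1.5K"`.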
Hadoop runs on cheap commodity machines.
De-IOE (moving away from IBM, Oracle and EMC):
IBM // IBM minicomputers
Oracle // Oracle database servers (RAC)
EMC // EMC shared storage devices
1.4 Distributed
The whole application is formed by processes distributed on different hosts working together.
1. Distributed software systems
The software system is divided into multiple subsystems or modules, each running on a different machine.
The subsystems or modules cooperate with each other through network communication to achieve the overall function.
2. Developing a simulated distributed application
Requirement: the master node sends computing tasks to the slave nodes and starts the tasks on each slave node.
Program list:
AppMaster
AppSlave/APPSlaveThread
Task
The logical process of running the program:
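A minimal sketch of such a flow, written in Python and not the author's actual AppMaster/AppSlave code. It assumes that the slaves run on localhost, that a task is identified by a name in a hypothetical `TASKS` registry, and that messages are single lines of JSON. The wire format and the functions `start_slave`, `app_master` and `run_demo` are all illustrative assumptions.

```python
import json
import socket
import socketserver
import threading

# Hypothetical task registry: the work a slave knows how to run.
TASKS = {
    "word_count": lambda text: len(text.split()),
    "sum": lambda numbers: sum(numbers),
}


class AppSlaveThread(socketserver.StreamRequestHandler):
    """Handles one connection from the master: read a task, run it, reply."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        result = TASKS[request["task"]](request["payload"])
        self.wfile.write((json.dumps({"result": result}) + "\n").encode())


class AppSlave(socketserver.ThreadingTCPServer):
    """A slave node: accepts connections and serves each in its own thread."""
    allow_reuse_address = True
    daemon_threads = True


def start_slave():
    """Start a slave on a free localhost port and return the server object."""
    slave = AppSlave(("127.0.0.1", 0), AppSlaveThread)
    threading.Thread(target=slave.serve_forever, daemon=True).start()
    return slave


def app_master(slave_addresses, task, payloads):
    """Send one payload to each slave and collect the results in order."""
    results = []
    for address, payload in zip(slave_addresses, payloads):
        message = json.dumps({"task": task, "payload": payload}) + "\n"
        with socket.create_connection(address) as conn:
            conn.sendall(message.encode())
            with conn.makefile("r") as reader:
                reply = reader.readline()
        results.append(json.loads(reply)["result"])
    return results


def run_demo():
    """Start two slaves, dispatch a word-count task to each, then stop them."""
    slaves = [start_slave() for _ in range(2)]
    try:
        addresses = [slave.server_address for slave in slaves]
        return app_master(addresses, "word_count", ["a b c", "hello hadoop"])
    finally:
        for slave in slaves:
            slave.shutdown()
            slave.server_close()
```

In this sketch the master contacts the slaves one after another. A fuller version would dispatch to all slaves in parallel and handle a slave that fails or never replies.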
1.5 The position of HADOOP in big data and cloud computing
1. Cloud computing is the product of merging traditional computing technologies (distributed computing, parallel computing, grid computing, multi-core computing, network storage, virtualization, load balancing and so on) with Internet technology. Through business models such as IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service), it delivers powerful computing capacity to end users.
2. At present, the two underlying technologies supporting cloud computing are "virtualization" and "big data technology".
1.6 Background of HADOOP
1. HADOOP originated from Nutch. The design goal of Nutch was to build a large-scale, whole-web search engine, including web page crawling, indexing, querying and other functions. As the number of crawled pages grew, it ran into a serious scalability problem: how to store and index billions of web pages.
2. Two papers published by Google in 2003 and 2004 provide a feasible solution to this problem.
-- distributed file system (GFS), which can be used to deal with the storage of massive web pages
-- distributed computing framework MAPREDUCE, which can be used to deal with the index calculation of massive web pages.
3. The Nutch developers completed the corresponding open source implementations, HDFS and MAPREDUCE, and spun them off from Nutch into an independent project, HADOOP. In January 2008, HADOOP became a top-level Apache project and entered a period of rapid development.
1.7 HADOOP status