Example Analysis of Hadoop1.0

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article analyzes Hadoop 1.0 by example: its background, how classic MapReduce executes jobs, and the limitations that early adopters ran into.

Background of Hadoop

Apache Hadoop is an open source software framework that can be installed on a cluster of commodity machines so that the machines can communicate with each other and work together to store and process large amounts of data in a highly distributed manner.

Initially, Hadoop consisted of two main components: the Hadoop Distributed File System (HDFS) and a distributed computing engine that supports implementing and running programs as MapReduce jobs.

MapReduce is a simple programming model popularized by Google that is useful for processing large data sets in a highly parallel and scalable manner. MapReduce is inspired by functional programming: users express their computation as map and reduce functions that treat data as key-value pairs. Hadoop provides a high-level API for implementing custom map and reduce functions in various languages.
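To make the programming model concrete, here is a minimal single-process sketch of a word count written as a map function and a reduce function over key-value pairs. It is plain Python, not the Hadoop API; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: collapse all values seen for one key into a single total."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(grouped.items()))


# Input records are (key, value) pairs; here the key is a line number.
lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

The user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands in for work that the framework would do across many machines.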

Hadoop also provides the software infrastructure to run a MapReduce job as a series of map and reduce tasks. Each map task calls the map function on a subset of the input data. Once these calls have completed, the reduce tasks call the reduce function on the intermediate data generated by the map function to produce the final output. Map and reduce tasks run independently of each other, which enables parallel and fault-tolerant computing.
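The split into tasks can be sketched as follows. This is an illustrative simulation in plain Python, not Hadoop code: the thread pool stands in for cluster machines, and `NUM_REDUCE_TASKS`, `partition`, `map_task`, and `reduce_task` are assumed names. Each map task handles one input split, and a stable hash of the key decides which reduce task receives each intermediate pair.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_REDUCE_TASKS = 2  # assumed value for this sketch


def partition(key):
    """Pick the reduce task for a key using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCE_TASKS


def map_task(split):
    """One map task: apply the map function to every record of its split
    and bucket the intermediate pairs by destination reduce task."""
    buckets = defaultdict(list)
    for line in split:
        for word in line.split():  # the map function: emit (word, 1)
            buckets[partition(word)].append((word, 1))
    return buckets


def reduce_task(pairs):
    """One reduce task: group pairs by key, then apply the reduce
    function (a sum) to each key's values."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return {word: sum(counts) for word, counts in grouped.items()}


splits = [["a b a", "c a"], ["b b c"], ["a c"]]

with ThreadPoolExecutor() as pool:
    # Map phase: the tasks share no state, so they can run in parallel.
    map_outputs = list(pool.map(map_task, splits))
    # Shuffle: each reduce task collects its bucket from every map task.
    reduce_inputs = [
        [pair for out in map_outputs for pair in out.get(r, [])]
        for r in range(NUM_REDUCE_TASKS)
    ]
    # Reduce phase: starts only after every map task has finished.
    reduce_outputs = list(pool.map(reduce_task, reduce_inputs))

final = {}
for part in reduce_outputs:
    final.update(part)
print(final)
```

Because a task depends only on its own input, a failed task can be rerun on its split or bucket without touching the others, which is the property that makes the model fault tolerant.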

Most importantly, the Hadoop infrastructure handles all the complex aspects of distributed processing: parallelization, scheduling, resource management, machine-to-machine communication, handling of software and hardware failures, and so on. Thanks to this clean abstraction, implementing distributed applications that process terabytes of data on hundreds (or even thousands) of machines has never been easier, even for developers with no previous experience of distributed systems.

The Golden Age of Hadoop

Although there are many open source implementations of the MapReduce model, Hadoop MapReduce quickly became very popular. Hadoop is also one of the most exciting open source projects in the world, offering several excellent features: a high-level API, near-linear scalability, an open source license, the ability to run on commodity hardware, and fault tolerance. It has been successfully deployed by hundreds (perhaps thousands) of companies and has become the de facto standard for large-scale distributed storage and processing.

Some early Hadoop adopters, such as Yahoo! and Facebook, built large clusters of 4000 nodes to meet growing and changing data processing needs. However, after building these clusters, they began to notice some limitations of the Hadoop MapReduce framework.

Limitations of Classical MapReduce

The most serious limitations of classic MapReduce relate to scalability, resource utilization, and support for workloads other than MapReduce. In the MapReduce framework, job execution is controlled by two types of processes:

1. A single master process, called the JobTracker, coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.

2. Many subordinate processes, called TaskTrackers, run the assigned tasks and periodically report progress to the JobTracker.
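The interaction between the two process types can be sketched as a small simulation. This is not Hadoop's implementation: the classes below only borrow the names JobTracker and TaskTracker, the methods `heartbeat`, `launch`, `next_task`, and `on_heartbeat` are hypothetical, and the sketch assumes every task finishes within one reporting interval.

```python
from collections import deque


class TaskTracker:
    """Worker process: runs assigned tasks and reports back periodically."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # how many tasks it can run at once
        self.running = []

    def heartbeat(self):
        """Report the tasks finished since the last heartbeat.
        Simplification: everything that was running is now finished."""
        finished, self.running = self.running, []
        return {"finished": finished, "free": self.capacity}

    def launch(self, task):
        self.running.append(task)


class JobTracker:
    """Single master process: tracks progress and hands out tasks."""

    def __init__(self, maps, reduces):
        self.total_maps = len(maps)
        self.pending_maps = deque(maps)
        self.pending_reduces = deque(reduces)
        self.done = []

    def next_task(self):
        if self.pending_maps:
            return self.pending_maps.popleft()
        maps_done = sum(1 for kind, _ in self.done if kind == "map")
        # Reduce tasks are released only after every map task has finished.
        if maps_done == self.total_maps and self.pending_reduces:
            return self.pending_reduces.popleft()
        return None

    def on_heartbeat(self, tracker, report):
        self.done.extend(report["finished"])
        for _ in range(report["free"]):
            task = self.next_task()
            if task is None:
                break
            tracker.launch(task)


maps = [("map", i) for i in range(5)]
reduces = [("reduce", i) for i in range(2)]
total_tasks = len(maps) + len(reduces)

job_tracker = JobTracker(maps, reduces)
trackers = [TaskTracker("tt-1", 2), TaskTracker("tt-2", 2)]

while len(job_tracker.done) < total_tasks:
    for tracker in trackers:
        job_tracker.on_heartbeat(tracker, tracker.heartbeat())

print(job_tracker.done)
```

Every scheduling decision and every progress report passes through the one `JobTracker` object, which hints at why a single master process can become a bottleneck as the number of nodes and tasks grows.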

Figure: Classic version of Apache Hadoop (MRv1)

Large Hadoop clusters exposed a scalability bottleneck caused by having a single JobTracker. According to Yahoo!, this design reaches its practical limits with a cluster of 5000 nodes and 40000 tasks running at the same time. Because of this limitation, smaller, less powerful clusters had to be created and maintained.

In addition, neither small nor large Hadoop clusters used their computing resources as efficiently as they could. In Hadoop MapReduce, the cluster administrator divides the computing resources on each slave node into a fixed number of map slots and reduce slots, which are not interchangeable. Once the number of map slots and reduce slots is set, a node can never run more map tasks than it has map slots, even if no reduce tasks are running. This hurts cluster utilization: when all map slots are taken and more are needed, the free reduce slots cannot be used for map tasks, and vice versa.
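A small worked example shows the cost of fixed slots. The `SlotNode` class and the slot counts below are assumptions made for illustration, not Hadoop configuration.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """A slave node whose resources are split into fixed slot types."""

    map_slots: int
    reduce_slots: int

    def schedule(self, pending_maps, pending_reduces):
        """Return how many map and reduce tasks can run right now.
        A map task may only occupy a map slot, and likewise for reduce."""
        running_maps = min(pending_maps, self.map_slots)
        running_reduces = min(pending_reduces, self.reduce_slots)
        return running_maps, running_reduces

    def utilization(self, pending_maps, pending_reduces):
        """Fraction of all slots on the node that are doing work."""
        running_maps, running_reduces = self.schedule(pending_maps, pending_reduces)
        return (running_maps + running_reduces) / (self.map_slots + self.reduce_slots)


node = SlotNode(map_slots=4, reduce_slots=2)

# Ten map tasks are waiting and no reduce tasks exist yet.
maps_only = node.schedule(pending_maps=10, pending_reduces=0)
maps_only_utilization = node.utilization(pending_maps=10, pending_reduces=0)

# The opposite case: only reduce tasks are waiting.
reduces_only = node.schedule(pending_maps=0, pending_reduces=10)

print(maps_only, round(maps_only_utilization, 2), reduces_only)
```

With ten map tasks waiting, only four run and the two reduce slots stay idle, so a third of the node's capacity is unused even though work is queued.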

Last but not least, classic Hadoop was designed to run only MapReduce jobs. With the advent of alternative programming models, such as the graph processing provided by Apache Giraph, there was a growing need to support programming models other than MapReduce that can run on the same cluster and share its resources in an efficient and fair manner.

In 2010, engineers at Yahoo! began work on an entirely new Hadoop architecture that addresses all of the above limitations and adds a variety of additional features.

Apache Hadoop 2.0 includes YARN, which separates resource management from the processing components. A YARN-based architecture is not tied to MapReduce.
