2025-01-16 Update From: SLTechnology News&Howtos
This article introduces the related concepts and system composition of Hadoop.
I: Related concepts of Hadoop
1. Hadoop is a MapReduce framework implemented in Java.
2. Extensions of Hadoop:
a. Hadoop Streaming: any command-line script can use the MapReduce framework through Streaming.
b. Hadoop Hive: Apache Hive stores massive data sets in a data warehouse. Users write SQL-like Hive query statements to find data, and the Hive engine transparently converts these queries into underlying MapReduce tasks for execution; advanced users can write user-defined functions (UDFs) in Java. Hive also supports standard ODBC and JDBC database drivers, and can be used to build business-intelligence applications that process and analyze data stored in Hadoop.
c. Pig: a procedural scripting language for exploring large-scale data sets. Its data-flow language is called Pig Latin.
d. HBase: a column-oriented distributed database built on HDFS that provides random, real-time access to very large data sets.
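To make the Streaming contract concrete, here is a hypothetical pure-Python sketch of a word-count mapper and reducer. It only illustrates the line-based, tab-separated key/value protocol; a real Streaming job would read `sys.stdin`, write to stdout, and be launched via the hadoop-streaming jar.

```python
# Sketch of the Hadoop Streaming contract: the mapper and reducer are
# plain programs that exchange lines of tab-separated key/value pairs.
# (Illustrative only; names and in-memory lists are our own simplification.)

def streaming_mapper(lines):
    """Emit a 'word\t1' line for every word, as a mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def streaming_reducer(sorted_pairs):
    """Sum counts per key; Streaming delivers the pairs sorted by key."""
    current, total = None, 0
    for pair in sorted_pairs:
        key, value = pair.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

mapped = sorted(streaming_mapper(["big data", "big cluster"]))
print(list(streaming_reducer(mapped)))  # → ['big\t2', 'cluster\t1', 'data\t1']
```

The sort between the two stages stands in for the shuffle-and-sort that the framework performs between Map and Reduce.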
3. Originally, the Hadoop system executed submitted tasks on the cluster in first-in-first-out (FIFO) order. To overcome this limitation, Hadoop later added more sophisticated task schedulers: the Fair Scheduler and the Capacity Scheduler.
4. Hadoop 2.x solves the scalability problem compared to 1.x.
II: Introduction to the MapReduce programming model
5. The MapReduce model has two independent steps, both of which are configurable and customized in the program:
Map: the initial data-reading and transformation step, in which each input record is processed in parallel.
Reduce: the data aggregation step, in which all records associated with a given key are processed on one compute node.
6. The core idea of MapReduce in the Hadoop system:
a. The input data is logically divided into blocks, and each logical block is processed by a separate Map task.
b. The Map output is divided into partitions, and each partition is sorted.
c. Each sorted partition is transferred to a Reduce task for processing.
7. A Map task can run on any node in the cluster, and multiple Map tasks can run in parallel across the cluster. The main function of a Map task is to convert input records into key-value pairs. The output of all Map tasks is partitioned, and the data in each partition is sorted. Each partition corresponds to one Reduce task.
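The flow described in points 6 and 7 can be sketched as a toy in-memory model: each logical block is mapped independently, the map output is partitioned by key hash (Hadoop's default HashPartitioner works the same way), each partition is sorted, and one reduce task consumes each partition. This is a conceptual simplification, not the real Hadoop runtime; the function and parameter names are our own.

```python
from collections import defaultdict

def run_mapreduce(blocks, map_fn, reduce_fn, num_reducers=2):
    """Toy model: map each block, hash-partition the output, sort each
    partition by key, then reduce each partition independently."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for block in blocks:                      # Map phase: one task per block
        for record in block:
            for key, value in map_fn(record):
                p = hash(key) % num_reducers  # each key maps to one partition
                partitions[p][key].append(value)
    results = {}
    for part in partitions:                   # Reduce phase: one task per partition
        for key in sorted(part):              # keys arrive sorted within a partition
            results[key] = reduce_fn(key, part[key])
    return results

counts = run_mapreduce(
    [["to be or"], ["not to be"]],            # two logical input blocks
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda k, vs: sum(vs),
)
print(sorted(counts.items()))  # → [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Because every occurrence of a key hashes to the same partition, all values for that key reach the same reduce task, which is what makes per-key aggregation correct.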
III: The composition of the Hadoop system
8. Hadoop 1.x daemons:
a. NameNode: maintains the metadata for all files stored on HDFS.
b. DataNode: stores the actual data blocks on local disk; these blocks make up the files saved on HDFS.
c. JobTracker: responsible for the entire execution of a job. It schedules each subtask to a compute node, monitors task progress and node health, and reschedules failed subtasks.
d. TaskTracker: runs on each data node; it starts and manages individual Map/Reduce tasks and communicates with the JobTracker.
9. Role division in a Hadoop 1.x cluster: master nodes run the NameNode, Secondary NameNode, and JobTracker; slave nodes run the DataNode and TaskTracker.
10. HDFS provides a unified file-system namespace; users access data on cluster nodes as if they were using an ordinary file system.
11. HDFS stores files as blocks, with three replicas of each block kept on data nodes by default.
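The storage cost of this scheme is easy to estimate. A minimal sketch, assuming the common defaults of a 128 MB block size (Hadoop 2.x) and a replication factor of three as stated above; the function name is our own:

```python
import math

def hdfs_storage(file_size_mb, block_size_mb=128, replication=3):
    """Back-of-the-envelope HDFS math: a file is split into fixed-size
    blocks, and every block is stored `replication` times on DataNodes."""
    blocks = math.ceil(file_size_mb / block_size_mb)   # last block may be partial
    raw_storage_mb = file_size_mb * replication        # raw cluster space consumed
    return blocks, raw_storage_mb

blocks, raw = hdfs_storage(1000)   # a 1000 MB file
print(blocks, raw)                 # → 8 3000  (8 blocks, ~3000 MB raw storage)
```

So a 1 GB file occupies roughly 3 GB of raw cluster capacity, which is the price paid for fault tolerance.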
12. Hadoop implements rack awareness through a separately configured network-topology file that maps compute-node domain names (DNS names) to racks.
13. Key files on the name node:
a. fsimage: holds the persisted HDFS metadata as of the most recent checkpoint.
b. edits: holds the metadata changes made since the most recent checkpoint.
c. fstime: holds the timestamp of the last checkpoint.
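The relationship between fsimage and edits can be illustrated with a toy model: replaying the edits log over the last snapshot reconstructs the current namespace, and a checkpoint merges the two into a new fsimage so the log can be truncated. This is a conceptual sketch only; the real fsimage and edits are binary files with many more operation types.

```python
# Toy model of name-node checkpointing: fsimage is the metadata snapshot
# at the last checkpoint, edits is the log of changes made since then.

def replay(fsimage, edits):
    """Apply each logged change, in order, on top of the snapshot."""
    namespace = dict(fsimage)            # start from the last checkpoint
    for op, path in edits:
        if op == "create":
            namespace[path] = "file"
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

fsimage = {"/a.txt": "file", "/b.txt": "file"}
edits = [("create", "/c.txt"), ("delete", "/a.txt")]
print(sorted(replay(fsimage, edits)))   # → ['/b.txt', '/c.txt']
```

Keeping a recent checkpoint matters because a long edits log makes name-node restarts slow: every logged change must be replayed before the namespace is available.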
14. TaskTracker: receives requests to run tasks such as Map, Reduce, and shuffle. After receiving a request from the JobTracker, the TaskTracker starts the task, initializing a new JVM for it.
15. JobTracker: starts and monitors MapReduce jobs.
IV: Hadoop 2.x (YARN)
16. Composition: a global ResourceManager, a NodeManager on each node, an ApplicationMaster for each application, the scheduler, and containers.
17. A share of CPU cores plus a share of memory forms a container. An application runs inside containers: the application's ApplicationMaster instance requests resources from the global ResourceManager, the scheduler allocates those resources (containers) through the NodeManager on each node, and each NodeManager reports the usage of its containers back to the global ResourceManager.
18. The relationship between cluster nodes and containers is that a node can run multiple containers, but a container can only run within one node.
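A toy allocator illustrates the container model in points 17 and 18: each node reports its free CPU cores and memory, and a container is a slice of both that must fit entirely on a single node. This is a hypothetical simplification of YARN's scheduler; the function and field names are our own.

```python
# Toy container placement: a container (cores, mem_mb) must fit on ONE node.

def allocate(nodes, cores, mem_mb):
    """Place one container on the first node with enough free capacity."""
    for name, free in nodes.items():
        if free["cores"] >= cores and free["mem_mb"] >= mem_mb:
            free["cores"] -= cores           # node's reported capacity shrinks
            free["mem_mb"] -= mem_mb
            return name                      # the container runs on this node
    return None                              # no single node can host it

nodes = {"node1": {"cores": 4, "mem_mb": 8192},
         "node2": {"cores": 2, "mem_mb": 4096}}
print(allocate(nodes, 2, 4096))   # → node1
print(allocate(nodes, 4, 4096))   # → None (no node has 4 free cores left)
```

Note the second request fails even though the cluster as a whole has 4 free cores: a container cannot span nodes, exactly as point 18 states.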
19. Hadoop 2.x removes the 1.x restriction that only the MapReduce framework could run on the cluster; 2.x systems can run additional processing frameworks as well.