2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the basic concepts of Hadoop. The approach introduced here is simple and practical, so if you are interested, read on.
Why did big data rise? What did people do before it? Answering these two questions from personal experience alone is not really authoritative, but since they came up, a short anecdote is worth telling. My first job in the big data industry was at a mobile SDK company, whose business was embedding an SDK into apps to earn revenue for developers through display ads, recommended APP downloads, and so on. My first project was to analyze one day of the SDK's apache logs and rank each APP's downloads by province. The task had previously belonged to the SDK development team; I no longer remember their exact approach, but the daily job needed more than 10 hours to run (my memory is fuzzy; this was mentioned in an article four years ago). Later I rewrote it as a simple MapReduce job on Hadoop, and it finished in about 10 minutes. That, from one angle, is the reason big data exists and why it rose so quickly.
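The job described above can be sketched as a toy, in-process map/reduce in plain Python. Note the comma-separated (app, province) log format below is a made-up stand-in, not the real SDK log schema from the original project:

```python
from collections import defaultdict

# Toy illustration of the MapReduce model: a map phase emits
# ((app, province), 1) pairs, and a reduce phase sums them per key.
# The log format here is hypothetical.

def map_phase(lines):
    """Emit ((app, province), 1) for every download record."""
    for line in lines:
        app, province = line.strip().split(",")
        yield (app, province), 1

def reduce_phase(pairs):
    """Sum the counts for each (app, province) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

log = ["app1,beijing", "app1,beijing", "app2,shanghai", "app1,shanghai"]
result = reduce_phase(map_phase(log))
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel across nodes and shuffles the intermediate pairs between them, which is what turned a 10-hour job into a 10-minute one.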
With regard to Hadoop, we must first introduce the concepts. Many beginners ask, right away, "What is a DataNode? Can a DataNode and a NodeManager run on the same node?" To this kind of question, my answer is: you are too green, read a book first. Many people start learning Hadoop from Cloudera Manager or Ambari, which I personally do not recommend. Before any hands-on work, read at least one book thoroughly; after that, the rest can be picked up along the way. Relying heavily on those tools from the start, with the concepts still unclear, will at the very least make you look unprofessional.
Versions and branches:
According to the official wiki (https://wiki.apache.org/hadoop/Roadmap), there are currently three mainstream branches of Hadoop: 1.x, 2.x, and 3.x.
Hadoop 1.X: evolved from Hadoop 0.20. I still remember that when I first started playing with Hadoop, the company used CDH4u3 and then upgraded to 1.0, but even when I left in 2014 it was still on 1.x, never upgraded to 2.x. Friends told me privately that 1.x was in fact the main version in use at the time; 2.x was usually chosen only when building a new cluster. The main reason was the risk involved in the upgrade; Dong Xicheng wrote a detailed walkthrough of the upgrade process: http://dongxicheng.org/mapreduce-nextgen/hadoop-upgrade-to-version-2/
Hadoop 2.X: besides API changes, the change in Hadoop 2.0 most visible to the outside world is the addition of YARN as the scheduling system for MapReduce. Computing resources changed from single slots to memory, CPU, and other resources, which can be configured differently per NodeManager. Hadoop 2.x also addressed some pain points of 1.x, such as the NameNode single point of failure, providing two HA solutions based on QJM and NFS.
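The per-NodeManager resource configuration mentioned above lives in yarn-site.xml. A minimal sketch follows; the property names are the standard YARN ones, but the values are purely illustrative and should be tuned to each machine:

```xml
<!-- yarn-site.xml on one NodeManager; values are illustrative -->
<configuration>
  <property>
    <!-- Memory (MB) this node offers to containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <!-- Virtual cores this node offers to containers -->
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
</configuration>
```

Because each NodeManager reads its own copy of this file, heterogeneous machines in one cluster can advertise different amounts of memory and CPU, which the single-slot model of 1.x could not express.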
Hadoop 3.X: not much material is available yet. The feature most worth looking forward to is the implementation of Erasure Coding (EC). One of EC's strengths is that it can reduce storage overhead from the previous 3x replication to about 1.5x while still guaranteeing that data is not lost. This powerful technique is widely used in the field of cloud storage.
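The idea behind erasure coding can be sketched with a single XOR parity block. This is far simpler than the Reed-Solomon codes HDFS actually uses (a typical RS(6,3) layout stores 6 data plus 3 parity blocks, giving the 1.5x figure above), but it shows the trade-off: less storage than full replication, at the cost of reconstruction work when a block is lost.

```python
# Toy erasure-coding idea: store k data blocks plus parity instead of
# 3 full copies, and reconstruct a lost block on demand.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

data_blocks = [b"hello-block-1", b"hello-block-2", b"hello-block-3"]

# One parity block for three data blocks: 4/3 of the original size,
# versus 3x for triple replication.
parity = data_blocks[0]
for block in data_blocks[1:]:
    parity = xor_blocks(parity, block)

# Lose the middle block, then rebuild it from the survivors plus parity:
# parity = b0 ^ b1 ^ b2, so b1 = parity ^ b0 ^ b2.
recovered = xor_blocks(xor_blocks(data_blocks[0], data_blocks[2]), parity)
```

A single XOR parity tolerates only one lost block; Reed-Solomon generalizes this so that any 3 of the 9 blocks in RS(6,3) can be lost, which is why HDFS EC can match 3-way replication's durability at half the storage.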
In the following series of posts, the version we use is from the Hadoop 2.X branch, specifically 2.6.4.
Concept introduction:
HDFS: Hadoop Distributed File System
NameNode: the HDFS master node. It does not store the actual data; it manages HDFS metadata, maintaining the mapping between file blocks and nodes and recording users' modifications to files.
DataNode: the HDFS worker node, which actually stores and serves the data.
SecondaryNameNode: an auxiliary node that helps the NameNode merge the fsimage and edits files. Its main job is checkpointing, which allows timely recovery if the NameNode goes down.
CheckPoint Node: the same role as the SecondaryNameNode; it was added because the name "Secondary" is easily misunderstood, though the usage differs slightly.
Backup Node: similar to the Secondary and CheckPoint nodes in providing the checkpoint function, but it also keeps the same namespace information as the NameNode.
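The fsimage/edits merge that the checkpointing roles above perform can be sketched as replaying an edit log onto a namespace snapshot. This is a deliberate simplification: the real fsimage and edits are binary on-disk formats, and the operations and layout below are invented for illustration.

```python
# Toy sketch of a NameNode-style checkpoint: a namespace snapshot
# (fsimage) plus a log of edits is merged into a new snapshot, so that
# recovery only needs to replay edits made after the last checkpoint.

def apply_edit(namespace, edit):
    """Apply one hypothetical edit record to the in-memory namespace."""
    op, path, payload = edit
    if op == "create":
        namespace[path] = payload          # e.g. a list of block ids
    elif op == "delete":
        namespace.pop(path, None)
    return namespace

def checkpoint(fsimage, edits):
    """Merge edits into fsimage, producing a new fsimage and an empty log."""
    merged = dict(fsimage)
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []

fsimage = {"/a.txt": ["blk_1"]}
edits = [("create", "/b.txt", ["blk_2"]), ("delete", "/a.txt", None)]
new_image, new_edits = checkpoint(fsimage, edits)
```

Offloading this merge to a separate node is the whole point of the SecondaryNameNode: the NameNode keeps serving requests while the checkpoint is built elsewhere, and after a crash it only has to replay the short post-checkpoint edit log.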
Yarn:
ResourceManager: the master node. It handles client requests, manages and schedules the NodeManagers and ApplicationMasters, and manages cluster resources.
NodeManager: the worker node. It manages the resources of a single node and processes commands from the RM and AM.
ApplicationMaster: splits the input data, requests resources for the application, assigns tasks, and monitors their execution.
WebAppProxyServer: as the name implies, a proxy for application web pages on YARN, mainly for security reasons.
JobHistoryServer: mainly responsible for serving the log information of completed tasks.
With these concepts in place, the answer to the opening question, whether a DataNode and a NodeManager can be deployed on the same node, is obvious: absolutely yes. Co-locating them works very well, provided you pay attention to the machine's configuration and divide the resources sensibly.
At this point, I believe you have a deeper understanding of the basic concepts of Hadoop. You might as well try them out in practice. For more related content, follow us and keep learning!