What does Hadoop need to know?

Updated 2025-10-27. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces what you need to know about Hadoop. Many people run into these questions when working on real projects, so read on to learn how Hadoop handles them.

How does it work?

Hadoop grew out of the ideas behind the Google File System, and it is a cross-platform framework written in Java. Its core components are Hadoop Common, which provides the libraries and basic tools that the other modules rely on; the Hadoop Distributed File System (HDFS), which is responsible for storage; Hadoop YARN, which manages computing resources; and Hadoop MapReduce, which is responsible for processing.

Hadoop breaks files into blocks and distributes them across the nodes in the cluster. It then ships packaged code to those nodes so that the data is processed in parallel, close to where it is stored. This means that data can be processed faster than with traditional architectures.
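The split-then-process-in-parallel idea can be illustrated with a small, single-machine sketch. This is not Hadoop's API: the function names are hypothetical, worker threads stand in for cluster nodes, and the input is split by lines rather than by HDFS blocks so that no word is cut in half.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break a list of lines into consecutive chunks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """'Map' step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """'Reduce' step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, chunk_size=2, workers=3):
    """Count words by mapping over chunks in parallel, then reducing."""
    chunks = split_into_chunks(lines, chunk_size)
    # Worker threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)
```

Calling `word_count(["big data big", "data node"])` returns a `Counter` with the combined counts. Each chunk is counted without knowledge of the others, which is what lets the work be spread over many machines.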

A typical Hadoop cluster has a master node and a number of slave, or worker, nodes. The master node consists of a JobTracker, a TaskTracker, a NameNode and a DataNode. A worker node usually acts as both a DataNode and a TaskTracker, although in special scenarios a cluster may have data-only worker nodes, with processing and computation done on other worker nodes.

In large Hadoop clusters, a dedicated NameNode server is usually used to manage the file system index for the HDFS nodes, and a secondary NameNode can take snapshots of the NameNode's in-memory structures. This helps prevent data loss and corruption in the file system.

Hadoop file system

The Hadoop Distributed File System is at the core of how Hadoop scales. The advantage of HDFS when dealing with big data is that it can store gigabyte- or terabyte-sized files across multiple machines. Because copies of the data exist on multiple machines, HDFS does not need RAID on each machine to guarantee reliability, although RAID can still be used to improve performance. For further protection, the primary NameNode server can fail over automatically to a backup in the event of a failure.
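To see why replicated copies can replace single-machine RAID for reliability, consider the following minimal sketch. It assumes a replication factor of 3 (the usual HDFS default, not stated in this article) and a simple round-robin placement; real HDFS placement is rack-aware and more involved, and the function names here are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Consecutive nodes (wrapping around) are always distinct here.
        placement[block_id] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed for node in holders)
    }
```

With three copies of each block, any two nodes can fail and every block remains readable from a surviving copy. A block is lost only if all three nodes holding it fail at once.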

HDFS can be mounted directly on Linux systems through a Filesystem in Userspace (FUSE) virtual file system. Access to files is handled through a Java API. HDFS is designed for portability across hardware platforms and operating systems.

Hadoop can also work with other file systems, including FTP, Amazon S3, and Microsoft Azure storage. However, each one requires a file-system-specific bridge to avoid a loss of performance.
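In practice, the storage system is usually selected by the scheme of the path URI. The sketch below only illustrates that lookup idea. The scheme names (`hdfs`, `ftp`, `s3a`, `wasb`) are the ones commonly used with Hadoop and are not taken from this article, while the table and the function are hypothetical and do not connect to anything.

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to the storage system behind it.
CONNECTORS = {
    "hdfs": "Hadoop Distributed File System",
    "ftp": "FTP server",
    "s3a": "Amazon S3",
    "wasb": "Microsoft Azure Blob Storage",
    "file": "local file system",
}


def describe_storage(uri):
    """Return (storage system, authority, path) for a storage URI."""
    parsed = urlparse(uri)
    # A bare path with no scheme is treated as a local file.
    scheme = parsed.scheme or "file"
    if scheme not in CONNECTORS:
        raise ValueError(f"no connector registered for scheme {scheme!r}")
    return CONNECTORS[scheme], parsed.netloc, parsed.path
```

For example, `describe_storage("s3a://my-bucket/data/logs")` resolves to the Amazon S3 entry with `my-bucket` as the authority, where the bucket name is made up for the example.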

Hadoop and the Cloud

As an alternative to traditional on-premises data centers, Hadoop is often deployed in the cloud. The advantage is that companies can deploy Hadoop faster and with lower setup costs. Most cloud providers offer some form of Hadoop deployment.

Microsoft provides Azure HDInsight, which lets users provision the number of nodes they need and bills for the computing power and storage they use. HDInsight is based on Hortonworks and makes it easy to move data between on-premises systems and the cloud for backup, or for development and testing.

Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) also support Hadoop. In addition, Amazon offers Elastic MapReduce, which automates the configuration of Hadoop clusters, runs and terminates jobs, and handles data transfer between EC2 and S3 storage.

Google provides a managed Spark and Hadoop service called Cloud Dataproc, which uses a series of shell scripts to create and manage Spark and Hadoop clusters. It supports third-party Hadoop distributions such as Cloudera, Hortonworks and MapR. Google Cloud Storage can also be used with Hadoop.

Current state of Hadoop

Hadoop's initial adoption was slow. Only 18% of respondents in a 2015 Gartner study said they planned to use it in the next two years. The reasons for the reluctance to adopt the technology included high costs relative to the expected benefits and a lack of the necessary skills.

There are still some high-profile users. Yahoo's search engine is powered by Hadoop, and the company has made the source code of the version it uses available to the public through the open source community. Facebook also uses Hadoop, and in 2012 the company announced that its cluster held 100 PB of data and was growing by about one PB a day.

Although initial adoption was slow, Hadoop is growing. A survey by Allied Market Research in early 2016 estimated that revenue in the Hadoop market would exceed $84 billion by 2021.

Because of the way Hadoop works, it can be seen as a return to the old days of batch processing. While it is useful for extracting insights from large amounts of historical data, it is less effective for real-time applications or continuously arriving data streams.

Characteristics

Hadoop has always been closely tied to big data. As Internet of Things devices spread and the amount of data collected increases, demand for Hadoop's processing capacity will also increase. Its ability to handle big data quickly means that Hadoop systems are playing an increasingly important role in day-to-day business decisions.

Organizations of all sizes are keen to use big data. Hadoop's open source nature and its ability to run on commodity hardware mean that its processing power is not limited to large companies, which helps make big data accessible to everyone.

For all of this to succeed, companies need to be able to take advantage of what Hadoop can offer. This means that the skills gap needs to be addressed, and there will still be a need for employees with a background in Java, Linux, file systems, and databases who can quickly acquire Hadoop skills. It also means that more and more organizations are turning to the cloud to get the advantages of Hadoop in a less complex way.
