

What Are the Basic Knowledge Points of Hadoop?

2025-03-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article shares the basic knowledge points of Hadoop. It is intended as a practical reference; follow along for an overview.

History of Hadoop

The story begins in 2002 with Apache Nutch, an open-source search engine implemented in Java. Nutch provides all the tools needed to run your own search engine, including full-text search and a web crawler.

Then, in 2003, Google published a technical paper on the Google File System (GFS), a distributed file system Google designed to store its massive volumes of search data.

In 2004, Doug Cutting, founder of Nutch, implemented a distributed file storage system called NDFS based on Google's GFS paper.

In 2004, Google published another technical paper, this time on MapReduce: a programming model for parallel analytical operations on large data sets (larger than 1 TB).

In 2005, Doug Cutting implemented MapReduce in the Nutch search engine based on that paper.

In 2006, Yahoo hired Doug Cutting. He renamed the upgraded NDFS and MapReduce implementation "Hadoop", and Yahoo set up an independent team for him to focus on researching and developing Hadoop.

I have to say that Google and Yahoo contributed a lot to Hadoop.

Hadoop core

The core of Hadoop is HDFS and MapReduce. Both are foundations rather than finished applications: many classic Hadoop sub-projects, such as HBase and Hive, are built on top of HDFS and MapReduce. To understand Hadoop, you must first know what HDFS and MapReduce are.

HDFS

HDFS (Hadoop Distributed File System) is a highly fault-tolerant file system suitable for deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications with very large data sets.

The design features of HDFS are:

1. Large data files. HDFS is well suited to storing terabyte-scale files or large collections of big data files; it brings little benefit if the files are only a few GB or smaller.

2. Block storage. HDFS splits a complete large file into blocks and stores them evenly across different machines. The benefit is that when reading a file, blocks can be fetched from multiple hosts in parallel, which is far more efficient than reading from a single host.

3. Streaming data access: write once, read many times. Unlike traditional file systems, HDFS does not support modifying a file's contents in place; once written, a file should not change, and new content can only be appended at the end.

4. Cheap hardware. HDFS runs on ordinary PCs, a design that lets companies support a big-data cluster with dozens of inexpensive machines.

5. Hardware failure. HDFS assumes that any machine can fail. To keep a block readable when its host goes down, it places copies of the same block on several other hosts; if one host fails, a replica can quickly be found elsewhere.
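The block-splitting and replication ideas in features 2 and 5 can be sketched in a few lines. This is a toy model, not HDFS's actual code: the 64 MB block size matches the classic default mentioned below, while the round-robin placement policy is a simplifying assumption purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: split a file into fixed-size blocks and assign each block
// to several hosts as replicas (placement policy is hypothetical).
public class BlockPlacement {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB, the classic HDFS default

    // Number of blocks needed to store a file of the given size (ceiling division).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Assign each block to `replicas` distinct hosts, round-robin over the cluster.
    static List<List<Integer>> placeBlocks(long fileSize, int hosts, int replicas) {
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blockCount(fileSize); b++) {
            List<Integer> copies = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                copies.add((int) ((b + r) % hosts)); // spread replicas across hosts
            }
            placement.add(copies);
        }
        return placement;
    }

    public static void main(String[] args) {
        long oneTerabyte = 1L << 40;
        System.out.println("Blocks for 1 TB: " + blockCount(oneTerabyte)); // 16384
        System.out.println("200 MB file on 5 hosts, 3 replicas: "
                + placeBlocks(200L * 1024 * 1024, 5, 3));
    }
}
```

A 200 MB file needs 4 blocks (3 full 64 MB blocks plus a partial one), and each block lands on 3 of the 5 hosts, so the loss of any single machine never makes a block unreadable.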

Key elements of HDFS:

Block: a file is split into blocks, typically 64 MB each.

NameNode: stores the directory tree, file metadata, and block information for the entire file system, kept on a single dedicated host. If that host fails, the NameNode becomes unavailable; Hadoop 2.x therefore supports an active-standby mode, in which a standby host takes over the NameNode role if the primary fails.

DataNode: runs on the cheap machines and stores the actual block files.
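The division of labor among the three elements above can be made concrete with a toy model. This is an illustration only, not Hadoop's real API: the class, method names, and host names are all hypothetical. The NameNode holds only metadata (file → blocks, block → replica hosts) and, when a DataNode fails, directs readers to a surviving replica.

```java
import java.util.*;

// Toy model (not Hadoop's API): a NameNode that tracks which blocks make up
// a file and which DataNodes hold each block's replicas.
public class TinyNameNode {
    private final Map<String, List<Long>> fileBlocks = new HashMap<>();   // file -> block IDs
    private final Map<Long, List<String>> blockHosts = new HashMap<>();   // block -> replica hosts

    void addFile(String path, List<Long> blocks) { fileBlocks.put(path, blocks); }
    void addReplicas(long blockId, List<String> hosts) { blockHosts.put(blockId, hosts); }

    // Pick a live replica for each block of the file, skipping failed hosts.
    List<String> readPlan(String path, Set<String> deadHosts) {
        List<String> plan = new ArrayList<>();
        for (long blockId : fileBlocks.get(path)) {
            for (String host : blockHosts.get(blockId)) {
                if (!deadHosts.contains(host)) { plan.add(host); break; }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        TinyNameNode nn = new TinyNameNode();
        nn.addFile("/data/log.txt", Arrays.asList(1L, 2L));
        nn.addReplicas(1L, Arrays.asList("hostA", "hostB", "hostC"));
        nn.addReplicas(2L, Arrays.asList("hostB", "hostC", "hostA"));
        // hostA fails; the read plan falls back to the other replicas.
        System.out.println(nn.readPlan("/data/log.txt", Set.of("hostA")));
    }
}
```

Note that the DataNodes, not the NameNode, hold the actual bytes; the NameNode only answers "which hosts have this block", which is why its metadata must survive (hence the active-standby arrangement described above).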

MapReduce

Put simply, MapReduce is a programming model that extracts and analyzes elements from massive source data and finally returns a result set. Distributed storage of files across disks is the first step; extracting and analyzing what we need from that massive data is what MapReduce does.

Take computing the maximum of a massive data set as an example: a bank has hundreds of millions of depositors and wants to find the largest deposit amount. With traditional single-machine computation, we would do this:

long[] moneys = ...;
long max = 0L;
for (int i = 0; i < moneys.length; i++) {
    if (moneys[i] > max) {
        max = moneys[i];
    }
}

This works fine when the array is small, but it runs into trouble in the face of massive data: a single machine cannot hold or scan it all efficiently.

MapReduce does it differently: the numbers are already stored in different blocks; each Map task takes several blocks and computes the maximum within them, then a Reduce operation takes the maximum over all the Map outputs and returns the global maximum to the user.

The basic principle of MapReduce is to split a large analysis into small pieces, analyze each piece separately, then aggregate the extracted results to obtain the final answer. How to partition the data and how to perform the Reduce step can of course be complex, but Hadoop provides the execution framework; we only need to write simple analysis logic to get the data we want.
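The Map-then-Reduce flow for the bank example above can be sketched in plain Java. This is an illustration of the idea, not real Hadoop code: each "map" computes a local maximum over one block of deposits, and a single "reduce" combines the per-block maxima into the global answer.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the MapReduce max computation (plain Java, not the Hadoop API).
public class MaxMapReduce {
    // Map phase: local maximum of one block of data.
    static long mapMax(long[] block) {
        long max = Long.MIN_VALUE;
        for (long v : block) if (v > max) max = v;
        return max;
    }

    // Reduce phase: global maximum over the per-block results.
    static long reduceMax(List<Long> localMaxima) {
        long max = Long.MIN_VALUE;
        for (long v : localMaxima) if (v > max) max = v;
        return max;
    }

    public static void main(String[] args) {
        // Deposits split into blocks, as HDFS would distribute them across hosts.
        long[][] blocks = {
            {120L, 5400L, 999L},
            {87000L, 43L},
            {12L, 86999L, 501L},
        };
        List<Long> locals = Arrays.stream(blocks).map(MaxMapReduce::mapMax).toList();
        System.out.println("Maximum deposit: " + reduceMax(locals)); // 87000
    }
}
```

Because each map runs on the host that already stores its block, the work is parallelized and no host ever needs to see the whole data set; the reduce only has to combine one number per block.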

Thank you for reading! That concludes this article on the basic knowledge points of Hadoop. I hope the content above has been helpful; if you found the article useful, please share it so more people can see it!



