2025-04-04 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/03 Report
Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. It is an open-source distributed computing platform written in Java, suited to the distributed storage and processing of big data. This article gives a brief history of Hadoop and the preparations to make before learning it.
In a narrow sense, Hadoop refers to the Hadoop software itself.
In a broad sense, Hadoop refers to the big data ecosystem, which includes many other projects.
The Origin of Hadoop
1. In 2001, Nutch appeared. Nutch's design goal was to build a large-scale whole-web search engine, including web crawling, indexing, and querying, but as the number of crawled pages grew, it ran into serious scalability problems.
2. From 2003 to 2004, Google published the GFS and MapReduce papers. Inspired by them, Doug Cutting and others implemented NDFS (the predecessor of HDFS) and a MapReduce mechanism, which greatly improved Nutch's performance.
GFS: Google File System, Google's distributed file system
MapReduce: Google's distributed parallel computing framework
3. In 2005, Hadoop was formally introduced into the Apache Foundation as part of Nutch, which was then a subproject of Lucene.
4. In 2006, Hadoop (HDFS + MapReduce) was spun off from Nutch as an independent project. Doug Cutting joined Yahoo and led Hadoop's development.
A Brief History of Hadoop
5. In 2006, the Apache Hadoop project officially launched, supporting the independent development of MapReduce and HDFS, and Yahoo built the first Hadoop cluster for development. In April, the first Apache Hadoop release was published. In November, Google published the Bigtable paper, which inspired the creation of HBase.
BigTable: a large distributed database
Evolutionary relationships:
GFS -> HDFS
Google MapReduce -> Hadoop MapReduce
BigTable -> HBase
6. In 2007, the first Hadoop user group meeting was held, and community contributions began to rise sharply. In the same year, Facebook began to use Hadoop, Baidu began using Hadoop for offline processing, and China Mobile began studying Hadoop.
7. In 2008, Hive and HBase appeared, and Hadoop became a top-level Apache project. In August, the first commercial Hadoop company, Cloudera, was founded. In the same year, Taobao began to use Hadoop.
8. From 2009 to 2012, Hadoop continued to develop.
In 2009, Cloudera launched the CDH platform (the first Hadoop distribution), built entirely from open-source software, and the first edition of Hadoop: The Definitive Guide (often called the Hadoop bible) was published. In 2010, HBase, Hive (from Facebook), and Pig left Hadoop to become top-level Apache projects, and the Hadoop community built many new components (Crunch, Sqoop, Flume, Oozie, etc.) to expand Hadoop's usage scenarios and usability. In 2011, ZooKeeper left Hadoop to become a top-level Apache project. In 2012, YARN became a Hadoop subproject.
9. In 2014, Spark became a top-level project of the Apache Foundation and gradually displaced MapReduce as the preferred execution engine on Hadoop clusters.
What can Hadoop do?
Big data storage: distributed storage
Log processing: Hadoop is well suited to log analysis
ETL: extracting data to and from mainstream databases such as Oracle, MySQL, DB2, and MongoDB
Machine learning: for example, the Apache Mahout project
Search engines: implemented with Hadoop + Lucene
Data mining: for example, the currently popular personalized advertising recommendation
Hadoop is designed for offline, large-scale data analysis; it is not suitable for online transaction processing that randomly reads and writes a few records at a time.
Preparation before learning Hadoop:
Prepare a computer (for learning): at least 8 GB of memory and a CPU with at least four cores (e.g., an Intel i5-class processor)
Supported platform: Linux (CentOS), the platform on which Hadoop is developed and run in production
Required software: take Linux as an example
Java 8 must be installed; the JDK released by Oracle is recommended.
ssh must be installed and sshd must be kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.
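As a sketch of the sshd requirement above: the Hadoop start scripts log in over ssh, so the usual single-node setup is passphraseless ssh to localhost. The key type and file paths below are common defaults, not anything mandated by the article, so adapt them to your environment.

```shell
# Generate a passphrase-less key pair (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorize the new public key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

# This should now succeed without prompting for a password
ssh localhost echo ok
```

If the last command still asks for a password, check the permissions on ~/.ssh and authorized_keys, which sshd is strict about.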
Install the required software: take Linux as an example
$ sudo yum install openssh-server openssh-clients
$ sudo yum install rsync
Download a Hadoop distribution and extract it to install.
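The download-and-extract step can be sketched as follows. The version number, mirror URL, and JDK path are illustrative assumptions; check hadoop.apache.org for the current release and adjust JAVA_HOME to your installation.

```shell
# Download and unpack a Hadoop release (version and URL are examples)
HADOOP_VERSION=3.3.6
wget "https://downloads.apache.org/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz"
cd "hadoop-${HADOOP_VERSION}"

# Hadoop needs to know where Java lives; adjust to your JDK path
export JAVA_HOME=/usr/lib/jvm/java

# If the setup is correct, this prints the Hadoop version information
bin/hadoop version
```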