Big Data from Scratch: An Introductory Hadoop Course

2025-04-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. Overview of the Hadoop ecosystem

Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation. Users can write distributed programs without knowing the low-level details of distribution, and can use the combined power of a cluster for high-speed computation and storage. It is reliable, efficient, and scalable.

The core of Hadoop consists of YARN, HDFS, and MapReduce. The common modules of the ecosystem are described in the sections below.

2. HDFS

HDFS derives from Google's GFS paper, published in October 2003; it is an open-source clone of GFS. HDFS is the foundation of data storage management in Hadoop. It is a highly fault-tolerant system that can detect and respond to hardware failures.

HDFS simplifies the file consistency model and uses streaming data access to give applications high-throughput access to their data, which makes it suitable for applications with large datasets. It follows a write-once, read-many model. Files are stored as blocks, and the blocks are kept on several physical machines of the cluster at the same time.
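To make the block layout concrete, here is a minimal shell sketch of the arithmetic. The 128 MB block size and the replication factor of 3 are assumed values (both are configurable), and the function names are hypothetical helpers, not HDFS commands.

```shell
#!/usr/bin/env bash
# Sketch only: block_count and replica_count are hypothetical helper names,
# not part of any Hadoop CLI.

# Number of blocks needed for a file of size_mb megabytes.
# Ceiling division, because the last block may be only partly filled.
block_count() {
  local size_mb=$1 block_mb=${2:-128}   # 128 MB is an assumed block size
  echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# Total number of block copies stored across the cluster for that file.
replica_count() {
  local size_mb=$1 block_mb=${2:-128} replication=${3:-3}   # 3 is an assumed replication factor
  echo $(( $(block_count "$size_mb" "$block_mb") * replication ))
}

block_count 300      # a 300 MB file with 128 MB blocks
replica_count 300    # the same file with three copies of every block
```

Under these assumptions a 300 MB file occupies 3 blocks, and the cluster holds 9 block copies in total, so losing one machine does not lose the file.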

3. MapReduce

MapReduce derives from Google's MapReduce paper. It is used for computation over large volumes of data: it hides the details of distributed execution from the programmer and abstracts the computation into two phases, map and reduce.
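The map/reduce split can be illustrated on a single machine with ordinary shell tools. This is only an analogy, not Hadoop's API, and the function names and sample input are hypothetical: `tr` plays the map step that emits one word per line, `sort` stands in for the shuffle that brings equal keys together, and `uniq -c` plays the reduce step that counts each group.

```shell
#!/usr/bin/env bash
# Word count as a single-machine analogy of map -> shuffle -> reduce.
# map_words, reduce_counts and wordcount are hypothetical names.

# map: split the input into one lowercase word per line, dropping empty lines
map_words() {
  tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# reduce: count each run of identical words and print "word<TAB>count"
reduce_counts() {
  uniq -c | awk '{ print $2 "\t" $1 }'
}

# shuffle: sorting places identical words next to each other for the reducer
wordcount() {
  map_words | LC_ALL=C sort | reduce_counts
}

printf 'Hadoop stores data\nHadoop processes data\n' | wordcount
```

For the sample input this prints `data 2`, `hadoop 2`, `processes 1`, and `stores 1`, one tab-separated pair per line. In a real cluster the map and reduce steps run in parallel on many machines, and the framework performs the shuffle between them.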

4. HBase (distributed column-oriented database)

HBase derives from Google's Bigtable paper. It is a column-oriented, scalable, highly reliable, high-performance distributed database with a dynamic schema, built on top of HDFS.

5. ZooKeeper

ZooKeeper solves data management problems in distributed environments, such as unified naming, state synchronization, cluster management, and configuration synchronization.

6. Hive

Open-sourced by Facebook, Hive defines a SQL-like query language, converts the SQL into MapReduce jobs, and runs them on Hadoop.

7. Flume

Flume is a log collection tool.

8. YARN (distributed resource manager)

YARN is the next generation of MapReduce. It was introduced mainly to fix two problems of the original Hadoop: poor scalability and the lack of support for multiple computing frameworks.


9. Spark

Spark provides a faster and more general data processing platform. Compared with Hadoop MapReduce, Spark lets your program keep its working data in memory.

10. Kafka

Kafka is a distributed message queue, mainly used to process active streaming data.

11. Hadoop pseudo-distributed deployment

At present there are three main free Hadoop distributions, all from vendors outside China.

1. The original Apache release

2. CDH (Cloudera's distribution); the vast majority of users in China choose this one

3. HDP (Hortonworks Data Platform)

Here we choose the CDH release, hadoop-2.6.0-cdh6.8.2.tar.gz. The environment is CentOS 7.1, and the JDK must be version 1.7.0_55 or later.
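The JDK requirement can be checked before installing anything. The following is a minimal sketch, assuming version strings in the legacy `1.x.y_zz` form and a `java` binary on the `PATH`; `jdk_at_least` is a hypothetical helper name, not a JDK or Hadoop tool.

```shell
#!/usr/bin/env bash
# Sketch only: jdk_at_least is a hypothetical helper.

# jdk_at_least VERSION MINIMUM
# Succeeds when VERSION >= MINIMUM. The underscore is turned into a dot so
# that every component can be compared numerically by sort.
jdk_at_least() {
  local lowest
  lowest=$(printf '%s\n%s\n' "$1" "$2" | tr '_' '.' \
    | sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n 1)
  # VERSION is acceptable when MINIMUM sorts first (or both are equal)
  [ "$lowest" = "$(printf '%s' "$2" | tr '_' '.')" ]
}

# Only run the check when a java binary is actually installed.
if command -v java >/dev/null 2>&1; then
  # java -version writes a line such as: java version "1.7.0_79" to stderr
  installed=$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')
  if jdk_at_least "$installed" "1.7.0_55"; then
    echo "JDK $installed is new enough"
  else
    echo "JDK $installed is older than 1.7.0_55"
  fi
fi
```

Comparing the components numerically matters here: a plain string comparison would rank `1.7.0_9` above `1.7.0_55`.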

[root@hadoop1 ~]# useradd hadoop

The default Java environment that ships with my system is as follows

Add the following environment variables
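As an illustration only, a typical set of variables might look like the sketch below, for example appended to the hadoop user's `~/.bash_profile`. The two install paths are hypothetical placeholders, not the author's actual values; substitute the directories where the JDK and the Hadoop tarball were actually unpacked.

```shell
# Hypothetical install locations; adjust both paths to your own system.
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/opt/hadoop

# Make the java, hadoop/hdfs and start/stop scripts available on the PATH.
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Putting `$HADOOP_HOME/sbin` on the `PATH` is a convenience so that the start and stop scripts shipped with Hadoop can be run without typing their full path.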

Grant the following permissions

Here, the various Hadoop services are managed and started by the hadoop user.

Check that the services have started
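One common way to check is the JDK's `jps` tool, which lists running Java processes as `PID Name` lines. The sketch below assumes the usual five daemons of a pseudo-distributed Hadoop 2.x setup with YARN; `missing_daemons` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: missing_daemons is a hypothetical helper that reads
# jps-style output ("PID Name" per line) from standard input.

# Print the name of every expected daemon that is NOT in the listing.
missing_daemons() {
  local listing daemon
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Exact match on the second field, so that "NameNode" is not
    # satisfied by a line that says "SecondaryNameNode".
    printf '%s\n' "$listing" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$daemon"
  done
}

# Run as the hadoop user; prints nothing when all five daemons are up.
if command -v jps >/dev/null 2>&1; then
  jps | missing_daemons
fi
```

An empty result means every expected daemon was found; any printed name points to a service whose log is worth reading.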
