2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
"Big data" may have sounded unfamiliar a few years ago, but by now the word Hadoop probably rings a bell: more and more people are developing with Hadoop or learning it. For a complete beginner, which part is hardest? Often just setting up the runtime environment is enough to cause a headache. If every Hadoop distribution integrated all the required components and installed them in one pass, as DKHadoop does, it would be a great help for beginners.
But enough digression; back to the topic. This article shares some basic knowledge of Hadoop, namely the Hadoop family of products, for readers who are new to it. Getting an overview of these products is a good first step toward learning Hadoop. Suggestions and corrections are welcome.
I. Definition of Hadoop
Hadoop is a large family: an open-source ecosystem built on the Java programming language. Its two core technologies are HDFS and MapReduce, which together let it store and process massive amounts of data in a distributed way.
II. Hadoop products
HDFS (distributed file system):
HDFS differs from ordinary file systems in several ways: a high degree of fault tolerance (processing can continue even when individual nodes fail), support for streaming data access, efficient access to very large data sets, strict data consistency, and lower deployment cost and higher deployment efficiency on commodity hardware. (The original article included a figure of the HDFS infrastructure here.)
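To make the storage model concrete: HDFS splits each file into fixed-size blocks (128 MB by default in recent versions) and replicates every block across machines (3 copies by default); both values are cluster-configurable, so treat the numbers below as illustrative assumptions, not fixed facts. A minimal sketch of the resulting footprint:

```python
import math

def hdfs_storage(file_bytes, block_size=128 * 1024 * 1024, replication=3):
    """Estimate block count and raw cluster storage for one file.

    Assumes the common defaults: 128 MB blocks and replication factor 3
    (dfs.blocksize and dfs.replication in a real cluster's config).
    """
    blocks = math.ceil(file_bytes / block_size)   # last block may be partial
    raw_bytes = file_bytes * replication          # every byte stored `replication` times
    return blocks, raw_bytes

# A 1 GB file occupies 8 blocks and 3 GB of raw cluster storage.
blocks, raw = hdfs_storage(1024 ** 3)
print(blocks, raw // 1024 ** 3)  # 8 3
```

The replication is what buys the fault tolerance described above: losing a node loses at most one copy of each block it held.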
MapReduce/Spark/Storm (parallel computing frameworks):
1. In terms of data processing, offline computation versus online computation:
MapReduce: typically used for offline, complex big-data computation.
Storm: used for online, real-time big-data computation; it processes data one record at a time as it arrives.
Spark: can serve both offline and near-real-time computation; Spark's streaming model processes the data that falls within a small time window, which makes it the more flexible of the three.
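The MapReduce model behind the first of these frameworks is simple to state: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. A minimal local simulation of the classic word count (a real job runs these phases distributed across a cluster; this sketch only illustrates the data flow):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values; for word count, just sum them.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big ideas", "big data tools"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 3, 'data': 2, 'ideas': 1, 'tools': 1}
```

Storm and Spark Streaming run conceptually similar transformations, but over an unbounded stream (per record, or per micro-batch) instead of a fixed input set.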
2. In terms of where data lives during computation, disk computing versus memory computing:
MapReduce: intermediate data is written to disk.
Spark and Storm: data is kept in memory, which is a key reason they are faster.
Pig/Hive (Hadoop programming):
Pig: a high-level data-flow language that performs very well on semi-structured data and can help shorten the development cycle.
Hive: a data analysis and query tool that shines when you analyze data with SQL-like queries; work that a hand-written ETL job needs a whole night for can often be done in minutes. That is its big advantage.
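Hive's query language (HiveQL) closely resembles standard SQL, with Hive compiling each statement into distributed jobs over data in HDFS. As a rough local analogy only (using Python's built-in SQLite instead of a real Hive warehouse, with a made-up page-views table), the kind of aggregation Hive handles looks like this:

```python
import sqlite3

# Hypothetical page-view data; in Hive, this table's files would live in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("home", 120), ("docs", 45), ("home", 80)])

# HiveQL accepts essentially the same GROUP BY statement; Hive would
# execute it as one or more distributed jobs behind the scenes.
rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('docs', 45), ('home', 200)]
```

The point is not SQLite itself but the programming model: analysts write declarative queries and the engine, whether SQLite locally or Hive on a cluster, plans the execution.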
HBase/Sqoop/Flume (data storage, import, and export):
HBase: a column-oriented database that runs on top of HDFS, integrates well with Pig and Hive, and can be used almost seamlessly through its Java API.
Sqoop: designed to make it easy to import data from traditional relational databases into Hadoop data sets (HDFS/Hive), and to export results back out.
Flume: designed to make it easy to move data, typically from log files, directly into a Hadoop data set (HDFS).
These data transfer tools save users a great deal of manual work, improve efficiency, and let teams focus on the business analysis itself.
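Conceptually, a Flume agent reads events from a source, buffers them in a channel, and a sink drains the channel in batches toward a destination such as HDFS. This is a toy in-memory sketch of that source/channel/sink flow, not the real Flume API (real agents are configured declaratively in properties files, and batch sizes here are made up):

```python
def flume_like_pipeline(source_events, batch_size=3):
    """Toy source -> channel -> sink flow, batching like a Flume channel.

    `source_events` stands in for lines tailed from a log file; the
    returned list of batches stands in for batched writes to an HDFS sink.
    """
    channel, sink_batches = [], []
    for event in source_events:
        channel.append(event)            # source puts each event on the channel
        if len(channel) >= batch_size:   # sink takes a full batch off the channel
            sink_batches.append(channel[:])
            channel.clear()
    if channel:                          # flush any final partial batch
        sink_batches.append(channel[:])
    return sink_batches

logs = [f"GET /page/{i}" for i in range(7)]
print([len(b) for b in flume_like_pipeline(logs)])  # [3, 3, 1]
```

Batching is the design point worth noticing: writing many small files to HDFS is inefficient, so log collectors accumulate events and deliver them in larger units.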
ZooKeeper/Oozie (system management and coordination):
ZooKeeper: a coordination service for distributed systems, used to manage the basic configuration of a distributed architecture. It provides simple interfaces that greatly ease configuration-management tasks.
Oozie: a workflow scheduler. It chains jobs into workflows so that each piece of work has a defined beginning and end. Together, these tools give us lightweight management of a distributed big-data computing stack.
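ZooKeeper stores shared configuration as a tree of small nodes ("znodes") addressed by slash-separated paths, much like a filesystem. The sketch below is a toy stand-in for that data model only, not the real client API (real clients such as Curator for Java or kazoo for Python offer create/get/watch operations against a live ensemble, plus ephemeral nodes and watches that this toy omits):

```python
class TinyZnodeTree:
    """Toy stand-in for ZooKeeper's hierarchical key-value data model."""

    def __init__(self):
        self._nodes = {"/": b""}  # the root znode always exists

    def create(self, path, data):
        # Like ZooKeeper, refuse to create a node whose parent is missing.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent znode {parent!r} does not exist")
        self._nodes[path] = data

    def get(self, path):
        return self._nodes[path]

    def children(self, path):
        # Direct children only: one more path segment under `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

tree = TinyZnodeTree()
tree.create("/config", b"")
tree.create("/config/db_host", b"10.0.0.5")
print(tree.children("/config"), tree.get("/config/db_host"))
# ['db_host'] b'10.0.0.5'
```

Because every service in the cluster reads the same small tree, changing one znode reconfigures all of them consistently; that is the coordination role described above.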
Ambari/Whirr (system deployment management):
Ambari: helps operators quickly deploy and provision an entire big-data analysis stack, and monitor the running system in real time.
Whirr: its main role is to make it easy to launch Hadoop services quickly on cloud infrastructure.
Mahout (machine learning):
Mahout is designed to help us build intelligent systems quickly: it provides ready-made implementations of common machine-learning algorithms, so this framework lets us integrate machine learning into applications without writing the algorithms from scratch.
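Mahout's classic use case is collaborative filtering: recommending items by comparing rating vectors. A minimal cosine-similarity sketch of that idea in pure Python (the movie names and ratings are made-up example data; Mahout itself runs such computations at scale on Hadoop):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length rating vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows: items; columns: ratings by the same four users (hypothetical data).
ratings = {
    "movie_a": [5, 4, 0, 1],
    "movie_b": [4, 5, 0, 2],
    "movie_c": [0, 1, 5, 4],
}

def most_similar(item):
    # The other item whose rating vector is closest to `item`'s.
    return max((other for other in ratings if other != item),
               key=lambda other: cosine(ratings[item], ratings[other]))

print(most_similar("movie_a"))  # movie_b
```

Users who liked movie_a get movie_b recommended because the two were rated alike; the big-data part is doing this over millions of items, which is where a framework like Mahout comes in.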
© 2024 shulou.com SLNews company. All rights reserved.