Saturday, 2019-2-16
A general introduction to the big data technology field (the role of each component)
1. Introduction to big data technologies
The big data technology ecosystem:
Hadoop: the veteran system for distributed storage and processing of massive data, well suited to offline data analysis
HBase: a Hadoop-based distributed database for massive data, handling both offline analysis and online business
Hive: a Hadoop-based data warehouse tool, easy to use and feature-rich, with a SQL-like query language
ZooKeeper: cluster coordination service
Sqoop: data import and export tool
Flume: data collection framework // often combined with Kafka (kafka + flume) in data pipelines, or used to collect large volumes of logs into HDFS; for log collection and analysis most enterprises use ELK
Storm: real-time stream computing framework, a leading framework in the stream processing field
Spark: in-memory distributed computing framework, one-stop all-in-one processing; a newcomer with rapid momentum (see the word-count sketch after this overview)
SparkCore // application development
SparkSQL // SQL operations, similar to Hive
SparkStreaming // similar to Storm
Machine learning:
Mahout: machine learning algorithm library based on MapReduce
MLlib: machine learning algorithm library based on Spark
As the figure shows, the Hadoop big data ecosystem is like a zoo, and the ZooKeeper component is like the keeper who manages all the animals. // The big data ecosystem contains many more components than the ones mentioned above; the figure shows only the basic ones.
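To make the Spark entry above concrete, here is a minimal sketch (not from the original article) of a SparkCore word count using the Java API. The input path "input.txt" and local master are assumptions for illustration only.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkWordCount {
    public static void main(String[] args) {
        // Assumed: local mode for demonstration; a real job would set a cluster master.
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("input.txt");                       // load text into an RDD
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())  // split lines into words
                    .mapToPair(word -> new Tuple2<>(word, 1))                       // (word, 1) pairs
                    .reduceByKey(Integer::sum);                                     // sum counts per word
            counts.collect().forEach(t -> System.out.println(t._1 + "\t" + t._2));  // print results
        }
    }
}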
2. How deeply each framework needs to be mastered
1. Understand the functions and applicable scenarios of the framework
2. Usage (installation and deployment, programming conventions, API)
3. Operating mechanism
4. Architectural principles
5. Source code
3. A basic introduction to Hadoop
(1) Hadoop is a technology platform for processing (computing and analyzing) massive data, built on distributed clusters.
(2) Hadoop's two major functions:
It provides storage services for massive data.
It provides a programming framework and runtime platform for analyzing massive data.
(3) Hadoop has three core components:
HDFS: the Hadoop Distributed File System, which stores massive data (a cluster service; see the client sketch after this list)
MapReduce: a distributed computing framework (a programming framework: you import its jar packages and write code against it) for computing and analyzing massive data (alternatives: Storm, Spark, etc.)
YARN: resource scheduling and management for the cluster (it can be understood as a distributed operating system that manages and allocates cluster hardware resources)
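As a minimal sketch of how client code talks to the HDFS component, the following Java snippet writes and checks a file. The NameNode address "hdfs://namenode:9000" and the path "/demo/hello.txt" are assumptions for illustration; a hadoop-client dependency on the classpath is assumed.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");            // assumed NameNode address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt");                 // assumed demo path
            try (FSDataOutputStream out = fs.create(file, true)) {   // create or overwrite the file
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("exists: " + fs.exists(file));        // verify the file is there
        }
    }
}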
(4) Using Hadoop:
Hadoop can be understood as a programming framework (analogous to Struts, Spring, Hibernate/MyBatis), with its own API encapsulation and programming conventions; users call these APIs to implement their data processing logic. From another point of view, Hadoop can be understood as software that provides services (analogous to database services such as Oracle/MySQL, the Solr index service, the Redis cache service, etc.); user programs implement specific functionality by requesting services from the Hadoop cluster as clients.
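To illustrate the "programming framework" view, here is a sketch of the classic MapReduce word count written against the Hadoop Java API; it is a minimal example, not the article's own code, and it assumes input and output HDFS paths are passed on the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in each input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));        // HDFS input path (assumed arg)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));      // HDFS output path (assumed arg)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}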
(5) The history of Hadoop
It started from three technical papers published by Google: GFS / MapReduce / BigTable
(Why did Google need such technology?)
Later, Doug Cutting created open-source Java "clones" of them, which became HDFS, MapReduce, and HBase,
and grew into the Apache top-level projects Hadoop and HBase.
After further evolution, YARN was added to Hadoop's components (MapReduce + YARN + HDFS).
Moreover, more and more tool components have grown up around Hadoop, forming a huge Hadoop ecosystem.
Why do we need Hadoop?
When the volume of data is large, a single machine cannot handle the processing, so a distributed cluster must be used; however, processing data on a distributed cluster greatly increases implementation complexity. For massive data processing, a general-purpose distributed data processing framework can greatly reduce the difficulty of application development and the amount of work involved.
The overall workflow of a Hadoop-based business application (see figure):
Flume data collection -> MapReduce cleaning -> save to HBase or HDFS -> Hive statistical analysis -> save to Hive tables -> Sqoop import/export -> MySQL database -> web display
Tip: when the data volume is very large, a Kafka message queue can be added after the Flume collection node to act as a buffer; in the data cleaning stage, in-memory or real-time streaming frameworks such as Spark, Storm, or Flink can be used (depending on the business scenario); the data is stored in HBase or HDFS within Hadoop; in the data analysis stage, computing tools such as Hive or Impala can be used; for web display, Kibana (the data visualization tool in ELK) or Grafana can be used.
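As a sketch of the "Hive statistical analysis" step in the workflow above, the following Java snippet queries Hive over JDBC. The HiveServer2 address "jdbc:hive2://hiveserver:10000" and the table name "access_log" are hypothetical; the hive-jdbc dependency is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveDailyPv {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");            // HiveServer2 JDBC driver
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "hive", "");  // assumed host and credentials
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT dt, COUNT(*) AS pv FROM access_log GROUP BY dt")) {  // hypothetical table
            while (rs.next()) {                                      // print daily page views
                System.out.println(rs.getString("dt") + "\t" + rs.getLong("pv"));
            }
        }
    }
}

The aggregated results produced here would then be exported to MySQL with Sqoop for web display, following the pipeline described above.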