2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
I. Foundations needed to learn big data
Java SE, EE (SSM)
About 90% of big data frameworks are written in Java.
MySQL
SQL on Hadoop
Linux
Big data frameworks are installed on the Linux operating system.
II. What you need to learn
The first aspect: big data offline analysis
Generally processes T+1 data (i.e., data processed the day after it is generated)
Hadoop 2.x: (Common, HDFS, MapReduce, YARN)
Setting up the environment and the approach to processing data
Hive:
Data warehouse for big data
Manipulate data by writing SQL, similar to SQL in a MySQL database
HBase:
NoSQL database built on HDFS
Column-oriented storage
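To make "column-oriented storage" a little more concrete, here is a minimal in-memory sketch of a table whose cells are addressed by row key and a "family:qualifier" column name. It is only a toy model: `ToyColumnStore`, `put`, `get`, and `scan_column` are hypothetical names chosen for illustration and are not the HBase client API.

```python
from collections import defaultdict


class ToyColumnStore:
    """Toy model of a column-oriented table (NOT the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (version, value), newest last
        self._rows = defaultdict(lambda: defaultdict(list))
        self._version = 0

    def put(self, row_key, column, value):
        """Write a new version of one cell; older versions are kept."""
        self._version += 1
        self._rows[row_key][column].append((self._version, value))

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it was never written."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[-1][1] if versions else None

    def scan_column(self, column):
        """Read a single column across all rows, in row-key order."""
        return [
            (row_key, cells[column][-1][1])
            for row_key, cells in sorted(self._rows.items())
            if column in cells
        ]


table = ToyColumnStore()
table.put("user1", "info:name", "Alice")
table.put("user1", "info:city", "Beijing")
table.put("user2", "info:name", "Bob")   # user2 has no "info:city" cell at all
print(table.scan_column("info:name"))
```

Rows only hold the cells that were actually written, so sparse data costs nothing for the missing columns, and reading one column does not require touching the others.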
Collaboration frameworks:
Sqoop (bridge: HDFS <=> RDBMS)
Flume: collects information from log files
Scheduling framework: Azkaban; also be aware of crontab (included with Linux), Zeus (Alibaba), Oozie (Cloudera)
Extended, cutting-edge frameworks:
Kylin, Impala, Elasticsearch (ES)
Note: I have written a detailed summary of the first aspect in another blog post (compiled from many online sources, so it can save you a lot of time).
The second aspect: big data real-time analysis
Mainly based on the Spark framework
Scala: OOP + FP
Spark Core: analogous to MapReduce
Spark SQL: analogous to Hive
Spark Streaming: real-time data processing
Kafka: message queue
Cutting-edge extension: Flink
Alibaba Blink
The third aspect: big data machine learning (extension)
Spark MLlib: machine learning library
PySpark programming: combining Python and Spark
Recommendation system
Python data analysis
Python machine learning
Big data frameworks divided by function
Massive data storage:
HDFS, Hive (in essence the data is still stored on HDFS), HBase, ES
Massive data analysis:
MapReduce, Spark, SQL
The original Hadoop framework
Data storage: HDFS (Hadoop Distributed File System)
Data analysis: MapReduce
The origin of Hadoop: three papers by Google
Although Google did not release the source code of these three products, it published detailed design papers for them.
These papers laid the foundation for the big data algorithms now popular around the world.
Google FS -> HDFS
MapReduce -> MapReduce
BigTable -> HBase
A task is decomposed, the pieces are processed at the same time on multiple computing nodes that each have modest processing power, and the results are then merged to complete the big data processing.
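The decompose, process in parallel, then merge idea can be sketched on a single machine. In this hedged example, worker threads stand in for the computing nodes; the function names (`split`, `process_chunk`, `distributed_sum_of_squares`) and the choice of summing squares are illustrative assumptions, not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut a list into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """The work a single 'node' performs on its own share of the data."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    # Each worker thread plays the role of one low-powered compute node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Merging the partial results completes the job.
    return sum(partial_results)


print(distributed_sum_of_squares(list(range(1, 101))))
```

No single worker ever sees the whole input, which is the property that lets a real cluster scale out by adding machines.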
Google: Android, search, big data frameworks, artificial intelligence frameworks
PageRank
Introduction to Hadoop
Most big data frameworks are Apache top-level projects.
http://apache.org/
Hadoop official website:
http://hadoop.apache.org/
Distributed system
As opposed to a centralized system
Multiple machines cooperate to complete the work.
Metadata: data that describes data
Architecture:
Master node (Master): the boss, the manager
Administration and management
Slave node (Slave): the subordinate, the managed
Does the work
Hadoop is also a distributed architecture
Common
HDFS:
Master node: NameNode
Determines which DataNode the data is stored on.
Slave node: DataNode
Stores the data
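The split of responsibilities between NameNode and DataNode can be illustrated with a toy model: the NameNode keeps only metadata (which DataNodes hold each block), while the DataNodes hold the block contents. Everything here is a simplifying assumption made for illustration: the class names, the round-robin placement, the replication factor of 2, and the tiny block size do not reflect the actual HDFS implementation or its placement policy.

```python
class ToyDataNode:
    """Slave node: stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # (filename, block index) -> data

    def store(self, block_id, data):
        self.blocks[block_id] = data


class ToyNameNode:
    """Master node: keeps metadata only, i.e. where each block lives."""

    def __init__(self, datanode_names, replication=2):
        self.datanode_names = list(datanode_names)
        self.replication = min(replication, len(self.datanode_names))
        self.block_locations = {}  # filename -> per-block list of DataNode names
        self._cursor = 0           # round-robin cursor (assumed placement policy)

    def allocate(self, filename, num_blocks):
        """Decide which DataNodes will hold each block of a new file."""
        n = len(self.datanode_names)
        placement = []
        for _ in range(num_blocks):
            targets = [self.datanode_names[(self._cursor + r) % n]
                       for r in range(self.replication)]
            placement.append(targets)
            self._cursor += 1
        self.block_locations[filename] = placement
        return placement

    def locate(self, filename):
        return self.block_locations[filename]


def write_file(namenode, datanodes, filename, data, block_size):
    """Split data into blocks, ask the NameNode where they go, store replicas."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = namenode.allocate(filename, len(blocks))
    for index, (block, targets) in enumerate(zip(blocks, placement)):
        for name in targets:
            datanodes[name].store((filename, index), block)


def read_file(namenode, datanodes, filename):
    """Look up block locations, then read each block from its first replica."""
    parts = []
    for index, targets in enumerate(namenode.locate(filename)):
        parts.append(datanodes[targets[0]].blocks[(filename, index)])
    return "".join(parts)
```

Note that the file contents never pass through `ToyNameNode`; it only answers the question of where the blocks are.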
MapReduce:
The idea of divide and conquer
A vast amount of data is divided into multiple parts, each part is processed separately, and finally all the results are merged.
Map task
Processes each part of the data separately
Reduce task
Merges the output of the map tasks
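The map and reduce roles described above can be sketched as a word count in plain Python. This is a single-process illustration, not Hadoop's Java MapReduce API; the function names are hypothetical, and the `shuffle` step (grouping map output by key before it reaches the reducers) is something Hadoop performs between the two phases but that the outline above does not spell out.

```python
from collections import defaultdict


def map_task(chunk):
    """Process one part of the input: emit a (word, 1) pair for each word."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all map tasks by key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Merge all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = [map_task(chunk) for chunk in chunks]  # one map task per part
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


print(word_count(["big data big", "data frameworks", "Big ideas"]))
```

Because each `map_task` call depends only on its own chunk, the map phase could run on separate machines; only the grouped intermediate results need to be brought together for the reduce phase.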
YARN:
Distributed cluster resource management framework; manages cluster resources (memory, CPU cores)
Schedules and allocates them sensibly to each program (e.g., MapReduce)
Master node: ResourceManager
Takes charge of the resources of the whole cluster
Slave node: NodeManager
Manages the resources of its own machine in the cluster
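As a rough illustration of this division of labor, the sketch below has a resource manager choose a node for each request while each node manager tracks its own free memory and CPU cores. The class names, the `allocate` signature, and the "most free memory wins" policy are all assumptions made for the example; real YARN schedulers are considerably more involved.

```python
class ToyNodeManager:
    """Slave: tracks the memory and CPU cores still free on one machine."""

    def __init__(self, name, memory_mb, vcores):
        self.name = name
        self.free_memory_mb = memory_mb
        self.free_vcores = vcores

    def can_fit(self, memory_mb, vcores):
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def launch(self, memory_mb, vcores):
        """Reserve resources on this machine for one container."""
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores


class ToyResourceManager:
    """Master: decides which node serves each resource request."""

    def __init__(self, node_managers):
        self.node_managers = list(node_managers)
        self.allocations = []  # (application name, node name) pairs

    def allocate(self, app, memory_mb, vcores):
        """Return the name of the node that got the container, or None."""
        candidates = [nm for nm in self.node_managers
                      if nm.can_fit(memory_mb, vcores)]
        if not candidates:
            return None  # cluster is full; a real scheduler would queue the request
        # Assumed policy: prefer the node with the most free memory.
        chosen = max(candidates, key=lambda nm: nm.free_memory_mb)
        chosen.launch(memory_mb, vcores)
        self.allocations.append((app, chosen.name))
        return chosen.name
```

A request that no node can satisfy returns `None` here, which keeps the example short; queuing and retrying are left out on purpose.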
Summary: installation and deployment of Hadoop
All the services are Java processes; that is, starting a service starts a JVM process that runs it.
HDFS: stores data and provides the data for analysis
NameNode/DataNode
YARN: provides the resources on which programs run
ResourceManager/NodeManager