2025-01-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Hadoop:
In production, Hadoop big data clusters run on the Linux platform.
RDBMS: tables
Fields, data types, constraints
Structured data
Relational databases play an important role in storing structured data.
But not all data can be structured.
Structured data: fits a fixed schema (tables, fields, data types)
Unstructured data: has no predefined schema
Semi-structured data: self-describing, with a flexible schema
Usually saved as XML or JSON
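A minimal sketch of what "semi-structured" means in practice, assuming two made-up JSON log records (the field names `user`, `action`, and `file` are hypothetical). Each record describes itself, and the second one carries a nested field the first one lacks, which a fixed table schema would not express directly:

```python
import json

# Hypothetical records: every line is valid JSON, but the fields differ
# from record to record.
raw_lines = [
    '{"user": "alice", "action": "login"}',
    '{"user": "bob", "action": "upload", "file": {"name": "a.log", "bytes": 2048}}',
]

records = [json.loads(line) for line in raw_lines]

# The set of top-level fields is discovered per record, not declared up front.
fields_per_record = [sorted(r.keys()) for r in records]

# Optional fields are read with a default instead of assuming a column exists.
upload_sizes = [r.get("file", {}).get("bytes", 0) for r in records]

print(fields_per_record)
print(upload_sizes)
```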
Google: PageRank, the page-ranking algorithm
Break it up into parts and process it in parallel
Cut a big problem into multiple small problems
OLAP: data mining
Machine learning: deep learning
Multi-node parallel processing
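The idea of cutting a big problem into small ones and processing them in parallel can be sketched on a single machine. This is only an illustration: threads stand in for cluster nodes, and the helper names `split` and `parallel_sum` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Cut a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def parallel_sum(data, workers=4):
    """Sum each chunk on its own worker, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

total = parallel_sum(list(range(1, 101)))
print(total)
```

The same shape (split, process each part independently, combine) is what a cluster framework applies across many machines.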
MapReduce:
Functional programming API
Runtime (execution) framework
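To make the functional style concrete, here is a word count written as map, shuffle, and reduce steps in plain Python. It is a single-process sketch of the programming model, not Hadoop's API; the function names are chosen for this example only.

```python
from collections import defaultdict
from functools import reduce

def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce step: fold the values of one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

def word_count(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in grouped.items())

counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Because the mapper looks at one line at a time and the reducer at one key at a time, a framework is free to run many of them in parallel on different nodes.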
HDFS + MapReduce = Hadoop
HDFS:
NameNode: NN node
DataNode: DN node
MapReduce:
JobTracker: JT node
TaskTracker: TT node
Hadoop is developed in Java, and mappers and reducers are usually written in Java as well.
Hadoop ecosystem:
A MapReduce job can run without a reducer, but not without a mapper.
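The "mapper required, reducer optional" rule can be sketched with a tiny job runner. Everything here (`run_job`, `error_lines`, `level_of`, `count`) is hypothetical and runs in one process; in Hadoop itself a map-only job is typically requested by setting the number of reduce tasks to zero.

```python
def run_job(records, mapper, reducer=None):
    """Run mapper over every record; group and reduce only if a reducer is given."""
    if mapper is None:
        raise ValueError("a job needs a mapper")
    mapped = [out for rec in records for out in mapper(rec)]
    if reducer is None:
        # Map-only job: the mapper output is the final output.
        return mapped
    groups = {}
    for key, value in mapped:
        groups.setdefault(key, []).append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

def error_lines(line):
    """Map-only use: keep just the lines that mention ERROR."""
    return [line] if "ERROR" in line else []

def level_of(line):
    """Emit (log level, 1) so a reducer can count lines per level."""
    return [(line.split()[0], 1)]

def count(key, values):
    return key, sum(values)

logs = ["INFO start", "ERROR disk full", "INFO stop"]
map_only_output = run_job(logs, error_lines)
per_level = run_job(logs, level_of, count)
print(map_only_output)
print(per_level)
```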
HDFS:
1. HDFS is designed to store large files; it is not suitable for large numbers of small files.
2. File system in user space
3. HDFS does not support modifying existing data; newer versions support appending
4. It does not support mounting and cannot be accessed through ordinary system calls. You can only use dedicated access interfaces, such as the dedicated command-line tools and API.
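Point 1 can be made concrete with some rough arithmetic. The sketch below uses a simplified model in which the NameNode tracks one metadata entry per file plus one per block; the 64 MiB block size and the file sizes are assumptions picked for illustration, and `namenode_objects` is a made-up helper.

```python
def namenode_objects(file_sizes, block_size):
    """Count metadata entries in a simplified model: one per file plus one per block."""
    files = len(file_sizes)
    # Integer ceiling division: a file occupies as many blocks as it needs.
    blocks = sum((size + block_size - 1) // block_size for size in file_sizes)
    return files + blocks

MiB = 1024 * 1024
BLOCK_SIZE = 64 * MiB  # assumed block size; configurable in a real cluster

# Roughly the same amount of data, stored two different ways.
one_large_file = namenode_objects([1024 * MiB], BLOCK_SIZE)
many_small_files = namenode_objects([100 * 1024] * 10240, BLOCK_SIZE)

print(one_large_file)
print(many_small_files)
```

Under these assumptions the small-file layout needs over a thousand times more metadata entries for about the same volume of data, which is the pressure the note above refers to.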
Scribe (Facebook)
Flume
Hadoop peripheral components
Hadoop cluster ecosystem
Hive: an intermediate component
Each technology is oriented toward specific scenarios.
Data modification can be done on top of HBase
HBase is a NoSQL store with a sparse storage format
Cloudera (CDH): a well-known Hadoop technology service provider, in a role similar to Red Hat's
Flow for importing relational database data into Hadoop:
RDBMS --> Sqoop --> HBase --> HDFS
Avro: data serialization
How to learn Hadoop
1. Install and configure HDFS
2. Install and configure MapReduce
3. HBase
4. Hive
5. Sqoop
6. Flume/Scribe/Chukwa
Usual number of nodes for HDFS: four
Local mode (debug mode)
Pseudo-distributed (using one node)
Fully distributed (more than 4 nodes)
Hadoop is a parallel processing system that keeps multiple copies of the data
MapReduce
Processing logic
Relational database:
Row-oriented database, tables
HBase:
Column-oriented database
Key-value pair
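The contrast between a row-oriented table and a sparse, column-oriented key-value layout can be sketched with plain Python structures. This is a toy model of the idea only, not HBase's API or on-disk format; the table contents and the names `row_store` and `cell_store` are made up.

```python
# Two hypothetical records; the second has no "city" value.
rows = [
    {"id": 1, "name": "alice", "city": "Beijing"},
    {"id": 2, "name": "bob"},
]

# Row-oriented: every record holds a slot for every column, even when empty.
row_store = [(r["id"], r.get("name"), r.get("city")) for r in rows]

# Column-oriented key-value: (row key, column) -> value, stored only for
# cells that actually exist, so missing values cost nothing.
cell_store = {}
for r in rows:
    for column, value in r.items():
        if column != "id":
            cell_store[(r["id"], column)] = value

# Reading one column touches only that column's cells.
names = {row_key: value for (row_key, column), value in cell_store.items()
         if column == "name"}

print(row_store)
print(cell_store)
print(names)
```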
Tools for collecting logs
Flume (ASF)
Chukwa (ASF)
Scribe (facebook)
Higher-level programming interfaces than raw Hadoop:
Hive: SQL
Pig
Crunch: Java API
Avro: serialization tool
Hadoop has a strong ecosystem.
Sqoop:
Brings data from relational databases (Oracle, MySQL, SQL Server, DB2) into HDFS so that it can be analyzed
ZooKeeper: management component
Ecosystem map
Hadoop core components:
MapReduce
HDFS
R language
R is a language and environment for statistical analysis and graphics. R is free, open-source software that is part of the GNU project. It is an excellent tool for statistical computing and statistical plotting.
A pseudo-distributed system runs five basic processes:
JobTracker
TaskTracker
NameNode
SecondaryNameNode
DataNode
Compatibility between the components of the Hadoop ecosystem is not very good, because the components come from various open-source projects.
Cloudera's CDH combined distribution is a branch of Hadoop, and one of the better-known ones
Various .xml configuration files
Address and port on which the Hadoop process listens
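As a sketch of how an address and port end up in one of these XML files, the snippet below parses a Hadoop-style `<configuration>` document and extracts the host and port. The sample content is an assumption for illustration: `fs.default.name` is the property name used by older Hadoop releases for the NameNode address, and `hdfs://localhost:9000` is only an example value. `read_properties` is a made-up helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed sample content in the style of core-site.xml.
CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

def read_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = read_properties(CORE_SITE)
namenode = urlparse(props["fs.default.name"])
host, port = namenode.hostname, namenode.port

print(host, port)
```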