Shulou (Shulou.com) 06/03 report, updated 2025-02-23. From: SLTechnology News&Howtos
1 Hadoop big data platform: architecture and practice
Goal: master the principles behind big data storage and processing technologies.
Goal: master Hadoop development.
2 Prerequisite knowledge for the course
Common Linux commands
Java programming basics
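3 Hadoop: past and present
Data volumes grew to the petabyte (PB) level.
Google's answer was MapReduce and GFS (the Google File System).
Google published technical papers describing these parallel, distributed techniques (including keeping nodes in sync) but released no open-source code.
Hadoop, the open-source implementation of these ideas, is named after a toy elephant, hence its baby-elephant logo.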
4 Features and benefits of Hadoop
An open-source platform for distributed storage and distributed computing.
HDFS: the distributed file system (storage).
MapReduce: distributed computing and task scheduling.
Highly scalable, low cost, with a mature ecosystem.
Strong demand for Hadoop talent, in both development and operations and maintenance.
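5 Ecosystem and versions
Main components: HDFS, MapReduce, Hive
Hive: SQL -> Hive -> Hadoop (Hive translates SQL-like queries into MapReduce jobs)
HBase: a non-relational (NoSQL) database
ZooKeeper: the "animal keeper" that coordinates the other components (a distributed coordination service)
Version selection: 2.6, or the stable 1.2 release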
6 Installing Hadoop
1 Prepare a Linux environment
2 Install the JDK
3 Configure Hadoop's four configuration files
You can rent a cloud virtual machine (CVM); Alibaba Cloud (Aliyun) is a good choice.
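Step 3 of the installation can be sketched as follows. This assumes the four files are hadoop-env.sh (where JAVA_HOME is set), core-site.xml, hdfs-site.xml and mapred-site.xml, as in a typical single-machine (pseudo-distributed) Hadoop 1.x setup; the notes do not list them. The script only prints the three XML files, and the host name, ports and replication value are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings for a single-machine (pseudo-distributed) Hadoop 1.x node.
# The host, ports and replication value are assumptions, not taken from the notes.
CONFIGS = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    # Only one DataNode exists on a single machine, so one replica instead of three.
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def build_config(properties):
    """Return Hadoop-style <configuration> XML text for a dict of name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in CONFIGS.items():
        print(f"--- {filename} ---")
        print(build_config(props))
```

Each file is a flat list of name/value property pairs, so generating them from one dictionary keeps the settings in a single place.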
7 The core of Hadoop: HDFS
HDFS architecture
NameNode: the management node, which stores the metadata. DataNode: the worker node, which stores the actual data blocks.
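Data management strategy: each data block is stored as three replicas; the block size is 64 MB (the Hadoop 1.x default); DataNodes send periodic heartbeats and status reports to the NameNode; and a Secondary NameNode periodically synchronizes (checkpoints) the NameNode's metadata. In short: multiple backups, kept in sync automatically.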
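A minimal sketch of this strategy, assuming a 64 MB block size and 3 replicas. The first function works out how a file is split into blocks; the second assigns each block to three different DataNodes and returns the block-to-DataNode map that the NameNode would keep as metadata. The round-robin placement and the DataNode names (dn1, ...) are hypothetical; real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE_MB = 64  # block size from the notes (Hadoop 1.x default)
REPLICATION = 3     # three copies of every block


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block that a file of file_size_mb occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left over
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement: map each block id to `replication` distinct DataNodes.

    The returned dict plays the role of the NameNode's block -> location metadata.
    Real HDFS uses a rack-aware policy instead.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + i) % len(datanodes)] for i in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200)
    print(blocks)  # [64, 64, 64, 8]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

For a 200 MB file this gives four blocks (64, 64, 64 and 8 MB), each stored on three different DataNodes, so losing any single node still leaves two copies of every block.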
Read and write process: any client program can read and write files. Writes are replicated through a pipeline of DataNodes, after which the metadata on the NameNode is updated.
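The write pipeline can be illustrated with a toy in-memory model. It assumes a simplified flow: the client sends a block only to the first DataNode, each DataNode stores it and forwards it to the next, acknowledgements travel back, and only then is the dictionary standing in for the NameNode's metadata updated. The class, function and block names are hypothetical, and real HDFS streams data in small packets rather than whole blocks.

```python
class DataNode:
    """Hypothetical in-memory stand-in for an HDFS DataNode."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def receive(self, block_id, data, downstream):
        """Store the block, forward it to the next node, and return the ack chain."""
        self.blocks[block_id] = data
        acks = [self.name]
        if downstream:
            # Pipeline replication: pass the block on to the next DataNode.
            acks += downstream[0].receive(block_id, data, downstream[1:])
        return acks  # acknowledgements travel back toward the client


def write_block(metadata, block_id, data, pipeline):
    """Client-side write: send the block to the first DataNode only.

    Once every node in the pipeline has acknowledged, record the block's
    locations in `metadata`, a dict standing in for the NameNode's records.
    """
    acks = pipeline[0].receive(block_id, data, pipeline[1:])
    if len(acks) == len(pipeline):
        metadata[block_id] = acks
    return acks


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    namenode_metadata = {}
    print(write_block(namenode_metadata, "blk_0", b"hello hdfs", nodes))
    print(namenode_metadata)
```

The client uploads each block only once; the DataNodes copy it among themselves, which keeps the client's bandwidth cost independent of the replication factor.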
Characteristics: data redundancy provides hardware fault tolerance. Data access is streaming: files cannot be modified in place, only deleted or added. HDFS is designed to store large files. It suits batch reads and writes with high throughput (write once, read many times), but its interactive (low-latency) performance is poor.
Command-line operation: similar to working with Linux shell commands.
8 MapReduce: divide a big task into small tasks, then merge their results
Example: given a 100 GB website access log file, find the IP address with the most visits.
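The log example can be sketched as a local simulation (plain Python, not Hadoop API code). It assumes each log line starts with the client IP, splits the lines into chunks standing in for input splits, counts IPs per chunk in the map step, and merges the partial counts in the reduce step. The log lines are hypothetical.

```python
from collections import Counter


def map_chunk(lines):
    """Map: count IP occurrences in one chunk of the log.

    Assumes the client IP is the first whitespace-separated field of each line.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


def reduce_counts(partial_counts):
    """Reduce: merge per-chunk counts into global totals."""
    totals = Counter()
    for counts in partial_counts:
        totals.update(counts)  # Counter.update adds counts instead of replacing them
    return totals


def most_visited_ip(log_lines, chunk_size=2):
    """Split the log into chunks (stand-ins for input splits), map, then reduce."""
    chunks = [log_lines[i:i + chunk_size] for i in range(0, len(log_lines), chunk_size)]
    totals = reduce_counts(map_chunk(chunk) for chunk in chunks)
    top = totals.most_common(1)
    return top[0] if top else None


if __name__ == "__main__":
    sample_log = [  # hypothetical log lines
        "10.0.0.1 GET /index.html",
        "10.0.0.2 GET /about.html",
        "10.0.0.1 GET /news.html",
        "10.0.0.3 GET /index.html",
        "10.0.0.1 GET /contact.html",
    ]
    print(most_visited_ip(sample_log))  # ('10.0.0.1', 3)
```

This split works because counts are additive: per-chunk totals can be computed independently on different machines and summed in any order.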
The exchange (shuffle) step between map and reduce is very important.
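A sketch of what the shuffle does, again as a local simulation: every (key, value) pair produced by the map tasks is routed to a reducer chosen by hashing the key modulo the number of reducers, which is the idea behind Hadoop's default hash partitioning. Using zlib.crc32 as the hash is an assumption made for reproducibility; Hadoop itself uses the key's Java hashCode().

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Choose a reducer: hash(key) mod number of reducers.

    crc32 is used because Python's built-in str hash changes between runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route each (key, value) pair from the map tasks to one reducer.

    Values are grouped by key, and keys are sorted within each reducer's input.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:  # one list of (key, value) pairs per map task
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(bucket.items())) for bucket in buckets]


if __name__ == "__main__":
    map_outputs = [
        [("10.0.0.1", 1), ("10.0.0.2", 1)],  # output of map task 1
        [("10.0.0.1", 1), ("10.0.0.3", 1)],  # output of map task 2
    ]
    for reducer_id, reducer_input in enumerate(shuffle(map_outputs, 2)):
        print(reducer_id, reducer_input)
```

Because the reducer is chosen from the key alone, all values for one key reach the same reducer, which is what lets that reducer compute a complete total for the key.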
Execution process:
Basic concepts: job and task. One job is split into multiple tasks.
Tasks are divided into map tasks and reduce tasks.
JobTracker: (1) schedules jobs; (2) assigns tasks and monitors their execution progress; (3) monitors the status of the TaskTrackers.
TaskTracker: executes tasks and reports their status.
Data flow: input data splits -> map tasks -> intermediate results -> reduce tasks -> output results.
The input and output are stored in HDFS.
The JobTracker schedules these tasks and oversees their execution.
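Fault-tolerance mechanisms: (1) Re-execution: a failed task is retried, by default up to 4 times, and abandoned after that. (2) Speculative execution: the reduce side must wait for every map task to finish, so if one task runs unusually slowly, the JobTracker starts a copy of it on another TaskTracker and uses whichever copy finishes first.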
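Both mechanisms can be sketched in Python. run_with_retries re-executes a failing task for up to 4 attempts before giving up; run_speculatively launches two copies of the same task on threads (standing in for TaskTrackers) and keeps whichever finishes first. The demo task functions are hypothetical, and threads only approximate separate machines.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

DEFAULT_MAX_ATTEMPTS = 4  # default number of attempts per task, as in the notes


def run_with_retries(task, max_attempts=DEFAULT_MAX_ATTEMPTS):
    """Re-execute a failing task up to max_attempts times, then give up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except Exception as err:  # remember the failure and try again
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def run_speculatively(task, copies=2):
    """Launch duplicate copies of a task and keep whichever result finishes first."""
    pool = ThreadPoolExecutor(max_workers=copies)
    futures = [pool.submit(task, copy_id) for copy_id in range(copies)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on slower copies; their results are ignored
    return next(iter(done)).result()


if __name__ == "__main__":
    def flaky(attempt):
        # hypothetical task that fails on its first two attempts
        if attempt < 3:
            raise OSError("simulated node failure")
        return f"succeeded on attempt {attempt}"

    def uneven(copy_id):
        # hypothetical task: copy 0 runs on a slow node, copy 1 on a fast one
        time.sleep(0.5 if copy_id == 0 else 0.05)
        return f"result from copy {copy_id}"

    print(run_with_retries(flaky))
    print(run_speculatively(uneven))
```

Retries handle tasks that fail outright; speculative execution handles tasks that succeed but too slowly, which would otherwise hold up the reduce side.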
9 Application example
WordCount, the classic example:
Count how often each word appears in a file.
Map phase: split the input into words.
Reduce phase: combine the counts for each word.
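WordCount can be simulated locally in plain Python to show the three stages; this is not the Hadoop Java API. Like the classic Hadoop example, it splits each line on whitespace and treats words case-sensitively.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split a line into words and emit (word, 1) for each one."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted 1s by word, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: add up the partial counts for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle_phase(pairs).items())


if __name__ == "__main__":
    print(word_count(["hello world", "hello hadoop"]))
    # {'hello': 2, 'world': 1, 'hadoop': 1}
```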
© 2024 shulou.com SLNews company. All rights reserved.