Official website
http://hadoop.apache.org/
Three components of Hadoop
HDFS: distributed storage system
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
MapReduce: distributed computing system
http://hadoop.apache.org/docs/r2.8.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
YARN: Hadoop's resource scheduling system
http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html
I recall a past project on laser track leveling for China Railway: the database for a 50 km section of track was 400 GB, so large that just finding space to copy it out was a problem. With a distributed database and computing platform, the same work can now be carried out very conveniently.
Mapper
The Mapper maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks that transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records; a given input pair may map to zero or more output pairs. The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job. Overall, Mapper implementations are passed to the job via the Job.setMapperClass(Class) method. The framework then calls map(WritableComparable, Writable, Context) for each key/value pair in the InputSplit for that task. Applications can override the cleanup(Context) method to perform any required cleanup. Output pairs are collected with calls to context.write(WritableComparable, Writable).
Applications can use counters to report their statistics.
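As a minimal sketch of the Mapper contract just described, here is a word-count mapper in the spirit of the tutorial linked above (the class and field names are mine):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Maps each input line to a set of (word, 1) intermediate pairs.
// The input types (Object, Text) differ from the output types
// (Text, IntWritable), which the text above notes is allowed.
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE); // zero or more output pairs per input record
        }
    }
}
```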
All intermediate values associated with a given output key are subsequently grouped by the framework and passed to the Reducer(s) to determine the final output. Users can control the grouping by specifying a Comparator via Job.setGroupingComparatorClass(Class). The Mapper outputs are sorted and then partitioned per Reducer; the total number of partitions is the same as the number of reduce tasks for the job. Users can control which keys (and hence which records) go to which Reducer by implementing a custom Partitioner. Users can also optionally specify a combiner via Job.setCombinerClass(Class) to perform local aggregation of the intermediate outputs, which helps cut down the amount of data transferred from the Mapper to the Reducer. The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format. Applications can control whether, and how, the intermediate outputs are compressed, and which CompressionCodec to use, via the configuration.
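A hedged sketch of wiring those hooks onto a Job follows. TokenizerMapper is the mapper above and IntSumReducer is the reducer shown in the next section; the compression property names are the Hadoop 2.x ones, and Snappy assumes the native library is available:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class JobWiring {
    static Job wire(Configuration conf) throws Exception {
        // Compress the intermediate (map-side) output, as described above.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "word count");
        job.setMapperClass(TokenizerMapper.class);
        // A sum-reducer can double as the combiner because addition is
        // associative; this trims the data shuffled to the reducers.
        job.setCombinerClass(IntSumReducer.class);
        job.setNumReduceTasks(4); // total partitions == number of reduce tasks
        return job;
    }
}
```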
Reducer
The Reducer reduces a set of intermediate values that share a key to a smaller set of values. The number of reduces for the job is set by the user via Job.setNumReduceTasks(int). Overall, Reducer implementations are passed to the job via the Job.setReducerClass(Class) method, and can override it to initialize themselves. The framework then calls reduce(WritableComparable, Iterable, Context) for each <key, (list of values)> pair in the grouped inputs. Applications can override the cleanup(Context) method to perform any required cleanup. The Reducer has three primary phases: shuffle, sort, and reduce.
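A minimal sketch of that contract, again following the word-count example used later in this article (names are mine):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduces the set of counts sharing one word to a single sum,
// i.e. a smaller set of values, as described above.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```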
Shuffle
The input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
Partitioner
The Partitioner partitions the key space: it controls the partitioning of the keys of the intermediate map outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. This therefore controls which reduce task an intermediate key (and hence its records) is sent to. HashPartitioner is the default Partitioner.
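For illustration, a custom Partitioner equivalent in spirit to the default HashPartitioner might look like this (a sketch; the class name is mine):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// The key's hash, taken modulo the number of reduce tasks, decides
// which reducer (partition) receives the record.
public class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Mask off the sign bit so the result is always non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```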
Counter
Counters are a facility for MapReduce applications to report their statistics; Mapper and Reducer implementations can use them to do so. Hadoop MapReduce also comes bundled with a library of generally useful mappers, reducers, and partitioners.
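A hedged sketch of reporting a custom statistic from inside a task (the group and counter names here are arbitrary labels of mine, not Hadoop built-ins):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Counts blank input lines as a side statistic while mapping;
// the totals appear in the job's final counter report.
public class CountingMapper extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().trim().isEmpty()) {
            context.getCounter("WordCount", "EMPTY_LINES").increment(1);
            return;
        }
        context.write(new Text(value.toString().trim()), new IntWritable(1));
    }
}
```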
In essence, MapReduce is about divide and conquer: a complex task is split into several simple tasks that are handled separately. Beyond that, there is the scheduling problem: deciding which tasks go to which Mapper is a key consideration. The fundamental principle of MapReduce is locality of processing: whichever machine holds a given portion of the data is responsible for processing that portion, and the point of doing so is to reduce the burden on network communication. Finally, the classic MapReduce diagram is worth studying as a supplement; after all, a chart is often more persuasive than words.
Take the 400 GB database above: if it is divided into 400 tasks, each processing about 1 GB of data, the work is in theory 400 times faster.
For details, please refer to the Google MapReduce paper.
https://wenku.baidu.com/view/1aa777fd04a1b0717fd5dd4a.html
How MapReduce works
Let's use an example to understand this.
Suppose the MapReduce program receives the following input data, and the job is to count the number of occurrences of each word:
Welcome to Hadoop Class
Hadoop is good
Hadoop is bad
The final output of the MapReduce task is:
bad 1
Class 1
good 1
Hadoop 3
is 2
to 1
Welcome 1
This data goes through the following phases:
Input Splits:
The input to a MapReduce job is divided into fixed-size pieces called input splits; each input split is consumed by a single map task.
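Split sizing is governed by the input format. As a hedged sketch (the bound is illustrative; actual splits also depend on the HDFS block layout):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizing {
    // Pin the split size so each map task consumes roughly 128 MB.
    static void boundSplits(Job job) {
        long targetBytes = 128L * 1024 * 1024;
        FileInputFormat.setMinInputSplitSize(job, targetBytes);
        FileInputFormat.setMaxInputSplitSize(job, targetBytes);
    }
}
```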
Mapping
This is the very first phase in the execution of a map-reduce program. In this phase, the data in each split is passed to a mapping function to produce output values. In our example, the job of the mapping phase is to count the number of occurrences of each word in the input split (input splits are described above) and prepare a list of the form <word, frequency>.
Shuffling
This phase consumes the output of the mapping phase. Its task is to consolidate the relevant records from the mapping-phase output. In our example, identical words are grouped together along with their respective frequencies.
Reducing
In this phase, output values from the shuffling phase are aggregated: the values are combined and a single output value is returned. In short, this phase summarizes the complete dataset.
In our example, it sums up the values from the shuffling phase, i.e., calculates the total number of occurrences of each word.
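Putting the phases together, a driver in the spirit of the official MapReduce tutorial might look like this (TokenizerMapper and IntSumReducer are the classes sketched earlier; input and output paths come from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class); // splits -> mapping
        job.setCombinerClass(IntSumReducer.class); // optional local aggregation
        job.setReducerClass(IntSumReducer.class);  // reducing after the shuffle
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Assuming the classes are packaged into a jar, this would be launched with something like hadoop jar wordcount.jar WordCount /input /output; note the output directory must not already exist.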
How does MapReduce organize its work?
Hadoop divides the job into tasks. There are two types of tasks:
Map tasks (splits and mapping)
Reduce tasks (shuffling and reducing)
As mentioned above, the complete execution process (execution of Map and Reduce tasks) is controlled by two types of entities:
JobTracker: acts like a master, responsible for the complete execution of the submitted job
Multiple TaskTrackers: act like slaves, each of them performing a part of the job
For every job submitted for execution in the system, there is one JobTracker, which resides on the NameNode, and multiple TaskTrackers, which reside on the DataNodes.
A job is divided into multiple tasks, which are then run on multiple data nodes in the cluster. It is the responsibility of the JobTracker to coordinate this activity by scheduling tasks to run on different data nodes. Execution of each individual task is then looked after by a TaskTracker, which resides on a data node and executes its part of the job. The TaskTracker's responsibility is to send progress reports to the JobTracker. In addition, the TaskTracker periodically sends a "heartbeat" signal to the JobTracker to notify it of the current state of the system. This lets the JobTracker track the overall progress of each job, and if a task fails, the JobTracker can reschedule it on a different TaskTracker.