What are the Hadoop override methods


This article introduces the Hadoop override methods. Many people have questions about them in day-to-day work, so the editor has consulted a variety of materials and organized them into the simple, easy-to-follow notes below. I hope they help answer your doubts about the Hadoop override methods. Please follow along and study!

1. Download (omitted)

2. Compile (omitted)

3. Configuration (pseudo-distributed, cluster)

4. HDFS

1. Web interface: http://namenode-name:50070/ (displays the datanode list and cluster statistics)

2. Shell commands & dfsadmin commands
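
The shell and dfsadmin commands have programmatic equivalents in the Java FileSystem API. Below is a minimal sketch, assuming a client whose core-site.xml points at the cluster; the /tmp/demo paths and local.txt file are just placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsShellSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();              // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);                   // roughly what "hdfs dfs" talks to

        fs.mkdirs(new Path("/tmp/demo"));                       // ~ hdfs dfs -mkdir -p /tmp/demo
        fs.copyFromLocalFile(new Path("local.txt"),             // ~ hdfs dfs -put local.txt /tmp/demo
                             new Path("/tmp/demo/local.txt"));
        for (FileStatus st : fs.listStatus(new Path("/tmp/demo"))) {   // ~ hdfs dfs -ls /tmp/demo
            System.out.println(st.getPath() + " " + st.getLen());
        }
        fs.close();
    }
}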

3. Checkpoint node & backup node

1. How the fsimage and edits files are merged

2. (presumably a feature of an earlier version) Manually restoring a failed cluster: import checkpoint

3. Backup node: a Backup Node keeps an in-memory copy of fsimage synchronized with the Namenode, and it also receives the stream of edits from the Namenode and persists it to disk. The Backup Node merges those edits with the fsimage in memory to create a metadata backup. The reason the Backup Node is efficient is that it never has to download fsimage and edits from the Namenode; it only has to persist the metadata already in its memory to disk and merge it.

4. Balancer: rebalances data that is unevenly distributed across racks and datanodes

5. Rack awareness: rack-aware block placement

6. Safemode: while data blocks are incomplete, or when safemode is entered manually, HDFS is read-only. Once the cluster's block check reaches the configured threshold, or safemode is left manually, the cluster resumes reading and writing.
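
For completeness, a minimal Java sketch of querying and toggling safemode, assuming Hadoop 2.x and that fs.defaultFS points at an HDFS cluster; it mirrors hdfs dfsadmin -safemode get/enter/leave.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class SafeModeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // only DistributedFileSystem exposes safemode control, so the cast assumes fs.defaultFS is HDFS
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        boolean inSafeMode = dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("In safemode: " + inSafeMode);       // while true, HDFS is read-only

        // dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);  // enter manually
        // dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);  // leave manually
        dfs.close();
    }
}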

7. fsck: file and block checking command

8. fetchdt: fetch a delegation token (security)

9. Recovery mode

10. Upgrade and rollback

11. File Permissions and Security

12. Scalability

13.

5. MapReduce

1. WordCount example (Mapper, Reducer, and job driver):

// imports shared by the three classes below (in practice each public class lives in its own .java file)
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    // override the map method
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            // emit (word, 1)
            context.write(word, one);
        }
    }
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable(0);

    // override the reduce method
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        result.set(sum);
        // the output value of reduce
        context.write(key, result);
    }
}

public class WordCountDemo {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDemo.class);
        // set the map and reduce classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setCombinerClass(MyReducer.class);
        // declare the types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

2. Job.setGroupingComparatorClass(Class): controls how the intermediate keys are grouped for a single call to reduce().
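
A minimal sketch of such a grouping comparator, assuming Text keys in the hypothetical composite form "word#suffix"; it groups all keys that share the part before '#' into one reduce() call.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// groups composite keys of the hypothetical form "word#suffix" by the part before '#'
public class PrefixGroupingComparator extends WritableComparator {
    protected PrefixGroupingComparator() {
        super(Text.class, true);   // true => create key instances for comparison
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        String left = ((Text) a).toString().split("#")[0];
        String right = ((Text) b).toString().split("#")[0];
        return left.compareTo(right);
    }
}

// wired into the job with: job.setGroupingComparatorClass(PrefixGroupingComparator.class);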

3. Job.setCombinerClass(Class): runs a map-side combine step (the WordCount driver above reuses MyReducer) to cut the data sent over the shuffle.

4. CompressionCodec: compress the intermediate map output and/or the final job output.
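
A sketch of wiring compression into a driver; the GzipCodec choice and the "compressed job" name are just examples. Intermediate (map output) compression goes through configuration keys, final output compression through FileOutputFormat.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // compress the intermediate map output that travels across the shuffle
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec", GzipCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed job");
        // compress the final job output files as well
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    }
}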

5. Number of maps: Configuration.set(MRJobConfig.NUM_MAPS, int) is only a hint; in practice the number is driven by dataSize / blockSize (one map per input split).

6. Number of reducers: Job.setNumReduceTasks(int).

A good rule of thumb is 0.95 or 1.75 multiplied by (number of nodes * maximum number of containers per node). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which does a much better job of load balancing.
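
A small sketch of that rule of thumb; numNodes and containersPerNode are hypothetical cluster figures you would replace with your own.

// rule of thumb from the note above: reduces = factor * nodes * max containers per node
public class ReduceCountSketch {
    static int suggestedReduces(int numNodes, int containersPerNode, double factor) {
        return (int) (factor * numNodes * containersPerNode);
    }

    public static void main(String[] args) {
        int numNodes = 10;            // hypothetical cluster size
        int containersPerNode = 8;    // hypothetical max reduce containers per node
        System.out.println(suggestedReduces(numNodes, containersPerNode, 0.95)); // 76  => one wave of reduces
        System.out.println(suggestedReduces(numNodes, containersPerNode, 1.75)); // 140 => two waves
        // then in the driver: job.setNumReduceTasks(76);
    }
}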

7. Reduce -> shuffle: the input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of every mapper's output via HTTP.

8. Reduce -> sort: in this stage the framework groups the Reducer inputs by key (since different mappers may have emitted the same key). The shuffle and sort phases occur simultaneously: map outputs are merged as they are fetched.

9. Reduce -> reduce: the reduce(WritableComparable, Iterable&lt;Writable&gt;, Context) method is called once for each &lt;key, (list of values)&gt; pair in the grouped inputs.

10. Secondary sort: sort values within a key by building a composite key and pairing it with a grouping comparator (see item 2 above).

11. Partitioner: decides which reducer each intermediate key/value pair is sent to.
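
A minimal custom Partitioner sketch for the Text/IntWritable pairs used in the WordCount example above; it simply reproduces hash partitioning explicitly.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// decides which reduce task receives each intermediate (key, value) pair
public class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // mask the sign bit so the result is always in [0, numPartitions)
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// wired into the job with: job.setPartitionerClass(WordPartitioner.class);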

12. Counter: Mapper and Reducer implementations can use Counters to report statistics.
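
A sketch of reporting custom counters from a mapper like the one above; the MyCounters enum is made up for illustration.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<Object, Text, Text, IntWritable> {
    // hypothetical counter names, reported in the job's counter summary
    enum MyCounters { TOTAL_WORDS, EMPTY_LINES }

    private final Text word = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.toString().trim().isEmpty()) {
            context.getCounter(MyCounters.EMPTY_LINES).increment(1);
            return;
        }
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            context.getCounter(MyCounters.TOTAL_WORDS).increment(1);
            context.write(word, one);
        }
    }
}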

13. Job configuration: speculative execution (setMapSpeculativeExecution(boolean) / setReduceSpeculativeExecution(boolean)), the maximum number of attempts per task (setMaxMapAttempts(int) / setMaxReduceAttempts(int)), and so on; arbitrary key/value pairs can be set and read with Configuration.set(String, String) / Configuration.get(String).
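
A sketch of those knobs in a driver; the property name my.custom.flag is hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConfSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("my.custom.flag", "on");                 // arbitrary key/value, read back with conf.get(...)

        Job job = Job.getInstance(conf, "conf demo");
        job.setMapSpeculativeExecution(false);            // disable speculative map tasks
        job.setReduceSpeculativeExecution(false);         // disable speculative reduce tasks
        job.setMaxMapAttempts(4);                         // retry each map at most 4 times
        job.setMaxReduceAttempts(4);                      // retry each reduce at most 4 times

        System.out.println(job.getConfiguration().get("my.custom.flag"));
    }
}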

14. Task executor & environment: the user can pass additional options to the child JVM via the mapreduce.{map|reduce}.java.opts configuration parameters, for example non-standard paths for the run-time linker to search for shared libraries via -Djava.library.path=. If a mapreduce.{map|reduce}.java.opts value contains the symbol @taskid@, it is interpolated with the task id of the MapReduce task.

15. Memory management: users/admins can specify the maximum virtual memory of the launched child task, and of any sub-process it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that the value set here is a per-process limit and should be specified in megabytes (MB). The value must also be greater than or equal to the -Xmx passed to the JVM, otherwise the VM might not start.
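
A sketch of setting the child-task memory limit and JVM heap from the driver configuration; the 2048 MB / -Xmx1638m figures are only an example of keeping the heap below the container limit.

import org.apache.hadoop.conf.Configuration;

public class TaskMemorySketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // per-process container limit, in MB
        conf.set("mapreduce.map.memory.mb", "2048");
        conf.set("mapreduce.reduce.memory.mb", "2048");
        // heap passed to the child JVM; must stay <= the container limit above
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx1638m");
        // @taskid@ in the opts string, if present, is replaced with the task id at runtime
        System.out.println(conf.get("mapreduce.map.java.opts"));
    }
}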

16. Map parameters (see http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)

17. Parameters ()

18. Job submission and monitoring:

1. Job provides facilities to submit jobs, track their progress, access component tasks' reports and logs, get the MapReduce cluster's status information, and so on.

2. The job submission process involves:

1. Checking the input and output specifications of the job.

2. Computing the InputSplit values for the job.

3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.

4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.

5. Submitting the job to the ResourceManager and optionally monitoring its status.

3. Job history

19. Job controller

1. Job.submit() || Job.waitForCompletion(boolean)
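
A sketch of the non-blocking path: submit() returns immediately, so the client can poll progress itself instead of blocking in waitForCompletion(true). It assumes a Job configured the same way as WordCountDemo above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitAndPoll {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "poll demo");
        // ... mapper/reducer/paths configured as in WordCountDemo ...
        job.submit();                              // returns immediately, unlike waitForCompletion(true)
        while (!job.isComplete()) {                // poll progress from the client
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.exit(job.isSuccessful() ? 0 : 1);
    }
}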

2. Multiple MapReduce jobs

1. Iterative MapReduce (the output of the previous MR job is used as the input of the next one; disadvantages: the overhead of creating job objects, local disk read/write I/O, and high network overhead)

2. MapReduce - JobControl: each job is wrapped together with its dependencies on other jobs, and a JobControl thread manages the state of every job.
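
A sketch of chaining two jobs with JobControl, assuming job1 and job2 are already configured Job instances (built the same way as WordCountDemo) and that job2 reads job1's output.

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class JobControlSketch {
    // job1 and job2 are assumed to be fully configured elsewhere
    public static void run(Job job1, Job job2) throws Exception {
        ControlledJob step1 = new ControlledJob(job1.getConfiguration());
        step1.setJob(job1);
        ControlledJob step2 = new ControlledJob(job2.getConfiguration());
        step2.setJob(job2);
        step2.addDependingJob(step1);              // step2 only starts after step1 succeeds

        JobControl control = new JobControl("two-step-pipeline");
        control.addJob(step1);
        control.addJob(step2);

        Thread t = new Thread(control);            // JobControl itself is a Runnable
        t.start();
        while (!control.allFinished()) {           // the JobControl thread drives the state transitions
            Thread.sleep(1000);
        }
        control.stop();
    }
}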

3. MapReduce - ChainMapper/ChainReducer: ChainMapper.addMapper() links multiple mapper stages into a single job, but it cannot be used to run multiple reduce stages in one job.
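
A sketch of the ChainMapper/ChainReducer wiring, reusing the MyMapper and MyReducer classes from the WordCount example above (the pattern is one or more map stages, one reduce stage, then optionally more map stages).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "chained word count");
        job.setJarByClass(ChainSketch.class);

        // first (and here only) map stage; further ChainMapper.addMapper calls would run back to back
        ChainMapper.addMapper(job, MyMapper.class,
                Object.class, Text.class, Text.class, IntWritable.class, new Configuration(false));

        // the single reduce stage; extra map stages could follow it via ChainReducer.addMapper
        ChainReducer.setReducer(job, MyReducer.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class, new Configuration(false));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}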

20. Job input & output

1. InputFormat TextInputFormat FileInputFormat

2. InputSplit FileSplit

3. RecordReader

4. OutputFormat OutputCommitter
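
To make item 20 concrete, here is a driver sketch that sets the input/output format classes explicitly (TextInputFormat and TextOutputFormat are the defaults, so this only spells out what is otherwise implicit).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class InputOutputSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "io formats");
        // the InputFormat decides how input is split (InputSplit) and read (RecordReader)
        job.setInputFormatClass(TextInputFormat.class);
        // the OutputFormat (with its OutputCommitter) decides how results are written and committed
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
    }
}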

At this point, the study of the Hadoop override methods is over. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it out! If you want to keep learning more related knowledge, please continue to follow the site; the editor will keep working to bring you more practical articles!
