
What are the basic interview questions for hadoop?

2025-04-01 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article presents the basic Hadoop interview questions. The content is simple, clearly organized, and easy to understand; I hope it helps resolve your doubts. Let's study this article, "What are the basic interview questions for hadoop?", together.

1 Describe the whole MapReduce (MR) process and the classes used along the way

Map input phase:

The InputFormat, set by job.setInputFormatClass(), splits the input dataset into input splits and provides a RecordReader implementation. TextInputFormat is used most often; its RecordReader takes the byte offset of each line as the Key and the text of the line as the Value, so the custom Mapper's input type is <LongWritable, Text>. The framework then calls the custom Mapper's map() method once per record, feeding the <LongWritable, Text> key-value pairs to it one by one.
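The record-reading step above can be sketched in plain Python (this is a conceptual illustration, not the Hadoop API; the function names here are made up for the example):

```python
def text_record_reader(text):
    """Yield (byte_offset, line) pairs, mimicking TextInputFormat's
    RecordReader, whose LongWritable key is the line's byte offset."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))

def word_count_map(offset, line):
    """A word-count-style map(): emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1

records = list(text_record_reader("hello world\nhello hadoop\n"))
# records: [(0, "hello world"), (12, "hello hadoop")]
pairs = [kv for off, line in records for kv in word_count_map(off, line)]
# pairs: [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]
```

Note how the second record's key is 12 (a byte offset), not 2 (a line number) — the same convention TextInputFormat follows.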

Map-side shuffle phase:

The process of taking the output of map and turning it into the input of reduce is the shuffle, and it is the focus of MapReduce tuning. Shuffle begins as soon as the map phase starts producing output. MapReduce jobs generally process massive data, so a map task cannot hold all of its output in memory; writing map output to disk is therefore a complex process, and since the output must also be sorted, the memory overhead is significant. Each map task opens a circular in-memory buffer dedicated to output; its default size is 100 MB, and the configuration sets a spill threshold on this buffer, 0.80 by default (both the size and the threshold are configurable). The map task also starts a daemon thread for output: when the buffer reaches 80% full, this thread writes the buffered contents to disk, a process called a spill, while the map task keeps writing new output into the remaining 20% of the buffer. Writing to disk and writing to the buffer proceed concurrently, but if the buffer fills up completely, the map task blocks on writes to memory until the spill to disk finishes.

Before writing to disk, the thread first divides the data into partitions (job.setPartitionerClass()) according to the reducer each record will ultimately be sent to. Within each partition, the daemon thread sorts by key (job.setSortComparatorClass(); if not set, it falls back to the compareTo() method of the Key class), and if a combiner function is defined (job.setCombinerClass()), the combiner runs on the sorted output. Each spill, that is, each write-to-disk operation, produces one spill file, so a map task may generate several spill files; when the map task's output is finished, it merges these spill files into a single partitioned, sorted output file, running the combiner again during the merge where applicable.
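The spill logic described above — partition by target reducer, sort within each partition, then apply an optional combiner — can be sketched as follows (an illustrative simulation, not the Hadoop implementation; `spill` and its hash partitioning are assumptions for the example):

```python
from collections import defaultdict

def spill(buffer, num_reducers, combiner=None):
    """Partition buffered (key, value) pairs the way a Partitioner would
    (hash(key) % num_reducers), sort each partition by key, and apply a
    combiner to values sharing a key before the spill file is written."""
    partitions = defaultdict(list)
    for key, value in buffer:
        partitions[hash(key) % num_reducers].append((key, value))
    spilled = {}
    for p, pairs in partitions.items():
        pairs.sort(key=lambda kv: kv[0])      # sort within the partition
        if combiner is not None:
            merged = []
            for key, value in pairs:
                if merged and merged[-1][0] == key:
                    # combine values that share a key (e.g. sum counts)
                    merged[-1] = (key, combiner(merged[-1][1], value))
                else:
                    merged.append((key, value))
            pairs = merged
        spilled[p] = pairs
    return spilled
```

For example, spilling `[("b", 1), ("a", 1), ("b", 1)]` with one reducer and a summing combiner yields a single sorted partition `[("a", 1), ("b", 2)]`.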

At the reduce stage, the map output files are fetched and merged. Each reduce task locates its partition in every map task's output and copies it over. During the copy phase, reduce starts several copy threads, 5 by default; programmers can change the number of copy threads in the configuration. This copy process is similar to the map-side write: there is again a memory buffer with a configurable threshold, and the buffer size is derived directly from the reduce task's available memory. After all map outputs have been copied, reduce performs sort and file-merge operations; once those complete, the reduce computation itself begins.

Classes used in the Reduce phase

In the Reduce phase, after all map outputs destined for this reduce task have been received, the framework sorts all the data using the Key comparison class set by job.setSortComparatorClass(). It then constructs a Value iterator for each Key. This is where grouping comes in, via the grouping comparator class set with job.setGroupingComparatorClass(): any two keys that this comparator considers equal belong to the same group, their Values are placed into one Value iterator, and that iterator's Key is the first Key among all keys in the group. Finally, the Reducer's reduce() method is invoked; its input is each Key together with its Value iterator. Note also that the input and output types must match those declared on the custom Reducer.
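The grouping behavior above can be sketched with itertools.groupby (again an illustrative simulation; `run_reduce` and `group_key` are names invented for this example, with `group_key` standing in for the GroupingComparator):

```python
from itertools import groupby

def run_reduce(sorted_pairs, reduce_fn, group_key=lambda k: k):
    """Group consecutive sorted (key, value) pairs whose keys compare
    equal under group_key; reduce_fn sees the first key of each group
    plus an iterator over all of the group's values."""
    results = []
    for _, group in groupby(sorted_pairs, key=lambda kv: group_key(kv[0])):
        group = list(group)
        first_key = group[0][0]           # iterator's key: first of group
        values = (v for _, v in group)    # the Value iterator
        results.append(reduce_fn(first_key, values))
    return results

# Word-count-style reduce: sum the values for each key.
out = run_reduce([("a", 1), ("a", 2), ("b", 3)],
                 lambda k, vals: (k, sum(vals)))
# out: [("a", 3), ("b", 3)]
```

Passing a coarser group_key (for instance, comparing only part of a composite key) is how secondary sort is implemented: keys sort fully, but group together on the partial key.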

After the data is reduced, the results are written out through the OutputFormat class.

2 What are the different Hadoop configuration files?

The main ones are core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, plus hadoop-env.sh and the slaves/workers file.

3 How do you write a Hive statement that adds a column?

alter table test_table add columns (d string);

4 What option does the sqoop import command need to load relational-database data into Hive?

--hive-import (added to a sqoop import command, e.g. sqoop import --connect <jdbc-url> --table <table> --hive-import)

5 How can files be transferred to HDFS?

flume, kettle, or a shell script (e.g. hdfs dfs -put <local-file> <hdfs-path>)

7 If one Hive partition's file is damaged, does it affect the other partitions?

No; the other partitions can still be queried.

8 How do you understand Hive?

Hive can be understood as a data warehouse built on top of Hadoop. It manages data stored on HDFS and provides an interpreter that translates Hive SQL into MR programs for querying (a personal understanding; answer freely).

9 How do you understand Sqoop?

Sqoop is a tool for moving data back and forth between relational databases and HDFS; under the hood it is implemented with MR.

The above is the full content of this article; thank you for reading!
