2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
Data skew in a MapReduce job usually means skew on the Reduce side. So how can there be data skew on the Map side?
Among the job's mapper tasks, one took a particularly long time.
The job is an ETL program in which two mappers read two kinds of input: one in lzo format, the other in txt format.
A map task normally processes one data block, so why did one of the maps take so long? Look at how each of the two inputs is divided into blocks.
Input 1 has 50 blocks
Input 2 has 11 blocks
Yet the job ran only 52 map tasks in total. With one map per block, it should have been 50 + 11 = 61.
The conclusions are as follows:
The job uses MultipleInputs with two inputs, and one of them was not split when it was read. Since one of the inputs is lzo, the most likely cause was that the file had no index.
So the first step is to check the directory that holds the lzo file and see whether the lzo.index file exists.
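The index check can be sketched as a small helper that takes a directory listing (for example, file names copied from an HDFS listing) and reports every .lzo file with no matching .lzo.index file next to it. This is only an illustration: the class `LzoIndexCheck`, the method `findUnindexed`, and the file names are hypothetical and are not part of hadoop-lzo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hadoop or hadoop-lzo.
class LzoIndexCheck {

    /** Returns the .lzo file names that have no "<name>.index" companion in the same listing. */
    static List<String> findUnindexed(List<String> fileNames) {
        Set<String> names = new HashSet<>(fileNames);
        List<String> missing = new ArrayList<>();
        for (String name : fileNames) {
            // An indexed LZO file "x.lzo" is accompanied by "x.lzo.index".
            if (name.endsWith(".lzo") && !names.contains(name + ".index")) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up listing: the second file has no index next to it.
        List<String> listing = Arrays.asList(
                "part-00000.lzo", "part-00000.lzo.index", "part-00001.lzo");
        System.out.println(findUnindexed(listing)); // prints [part-00001.lzo]
    }
}
```

An empty result only means the index files are present. As the next paragraph shows, the input format must also make use of them.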
Even when the lzo.index file exists, the data can still be read without being split. Checking the code showed that TextInputFormat was being used for the lzo input (the code shown here has already been changed to LzoTextInputFormat).
With TextInputFormat, the file is not split according to lzo.index, so each lzo file is processed whole by a single map. That produced fewer map tasks than expected, and one of them ran for a long time.
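The effect on the number of map tasks can be modeled with a little arithmetic. The sketch below is a simplified model, not Hadoop's actual split logic: it assumes a splittable file yields one split per block-sized chunk and an unsplittable file yields a single split, and it ignores details such as Hadoop's split slop factor and the exact LZO index boundaries. The class `SplitModel`, the 128 MB block size, and the 10-block file are all assumptions made for illustration.

```java
// Hypothetical, simplified model of how many map tasks one input file produces.
class SplitModel {

    /**
     * Splittable file: ceil(fileBytes / splitBytes) splits.
     * Unsplittable file: one split, regardless of size.
     */
    static long countSplits(long fileBytes, long splitBytes, boolean splittable) {
        if (fileBytes <= 0 || splitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // the whole file goes to a single map task
        }
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long blockBytes = 128L * 1024 * 1024; // assumed block size
        long fileBytes = 10 * blockBytes;     // made-up file spanning 10 blocks

        // Read as splittable (e.g. indexed lzo with an index-aware input format)
        System.out.println(countSplits(fileBytes, blockBytes, true));  // prints 10
        // Read as unsplittable (e.g. lzo handed to a plain text input format)
        System.out.println(countSplits(fileBytes, blockBytes, false)); // prints 1
    }
}
```

In the unsplittable case a single mapper has to read all ten blocks' worth of data, which is the Map-side skew described above.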
// Plain-text input: TextInputFormat splits it by block as usual.
if (commonPath != null && commonPath.length() != 0) {
    MultipleInputs.addInputPath(job, new Path(commonPath.toString()), TextInputFormat.class, MidHotelMapper.class);
} else {
    logger.error("Input path is empty:-->{}", conf.get(CommonConstant.COMMON_TASK_INPUT));
    System.exit(-1);
}
// LZO input: use LzoTextInputFormat (previously TextInputFormat) so the file is split according to lzo.index.
if (ctripPath != null && ctripPath.length() != 0) {
    MultipleInputs.addInputPath(job, new Path(ctripPath.toString()), LzoTextInputFormat.class, MidCtripHotelMapper.class);
} else {
    logger.error("Input path is empty:-->{}", conf.get(Constant.CTRIP_TASK_INPUT));
    System.exit(-1);
}
Before the change, the ETL task took about 16 minutes on average.
After the change, it took only 2 minutes.
I had been running this ETL with TextInputFormat for two years. Only when I finally had some spare time did I look into it and fix it.