Points to note about the distributed cache under YARN

Updated 2025-04-12. From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article describes a pitfall to watch for when using the distributed cache (DistributedCache) under YARN, based on a problem found after a cluster migration.

1. Problem

Our company's cluster was recently upgraded from Apache Hadoop 0.20.203 to CDH 4, moving us into the Hadoop 2.0 era. Although the new generation of Hadoop goes to considerable lengths to stay compatible in both architecture and API, some gaps remain. The following case involves the distributed cache: after being migrated to YARN, some MR jobs produced no output data and reported no errors.

Checking the data source and the code showed that the behavior of the distributed cache (DistributedCache) has changed slightly. The old code was roughly as follows:

(1) Add the distributed cache file in the main function:

    ...
    String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
    DistributedCache.addCacheFile(new Path(cacheFilePath).toUri(), job.getConfiguration());

(2) Read the cache file to build a data dictionary when the MR task is initialized:

    ...
    // Get the cached files for the current job
    Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    for (Path path : paths) {
        if (path.toString().contains("cmc_unitparameter")) {
            ...

(3) Result:

Both pieces of code work under MR1, but under MR2 the if condition is always false.

Comparing the path formats under MR1 and MR2 shows that under MRv2 the Path no longer contains the original path information:

MR1 Path: hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000
MR2 Path: /data4/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000006/part-m-00000
MR2 Path: /data17/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000002/part-m-00000
MR2 Path: /data23/yarn/local/usercache/root/appcache/application_1394073762364_1884/container_1394073762364_1884_01_000005/part-m-00000

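Given this difference, it is clear why the distributed cache appears to be "invalid" under MR2: the MR2 local path ends in the bare file name part-m-00000 under the container directory, so a check for "cmc_unitparameter" never matches.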
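To make the failure concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that applies the old substring check to one MR1-style and one MR2-style path taken from the listing above. The class and method names are hypothetical and exist only for illustration.

```java
// Illustrative only: CachePathCheck and matchesOldCheck are hypothetical names.
final class CachePathCheck {
    // The same substring test the old setup code applied to each cache path.
    static boolean matchesOldCheck(String path) {
        return path.contains("cmc_unitparameter");
    }

    public static void main(String[] args) {
        // Path shapes copied from the MR1/MR2 listing in the article.
        String mr1 = "hdfs://host:fs_port/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
        String mr2 = "/data4/yarn/local/usercache/root/appcache/application_1394073762364_1884/"
                + "container_1394073762364_1884_01_000006/part-m-00000";
        for (String p : new String[] {mr1, mr2}) {
            // Prints whether the old check would accept this path.
            System.out.println(matchesOldCheck(p) + "  " + p);
        }
    }
}
```

The MR2 path keeps only the file name, so any check that relies on a parent directory name in the original HDFS path silently stops matching, which is consistent with the job producing no data and no error.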

2. Solution

This problem is not hard to solve.

In fact, even under MR1 the code above was not quite proper, since it traverses the entire distributed cache every time. A better approach is a small trick: createSymlink.

(1) In the main function, add a symbolic link for each cache file, similar to the # anchor of an HTTP URL:

    ...
    String cacheFilePath = "/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000";
    Path inPath = new Path(cacheFilePath);
    // The name after # is the link to the file above. Link names for different
    // files must not be the same; apart from that, the name is your choice.
    String inPathLink = inPath.toUri().toString() + "#" + "DIYFileName";
    DistributedCache.addCacheFile(new URI(inPathLink), job.getConfiguration());
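The link name travels in the fragment part of the URI. The following sketch uses only java.net.URI to show how the string built in step (1) splits into a path and a fragment. The helper name withLink and its validation rules are assumptions made for illustration, not part of the Hadoop API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: CacheLinkUri and withLink are hypothetical names.
final class CacheLinkUri {
    // Appends "#linkName" to the cache file path, as in step (1).
    static URI withLink(String hdfsPath, String linkName) throws URISyntaxException {
        // Assumed sanity checks: a link name should be a single, non-empty path component.
        if (linkName.isEmpty() || linkName.contains("/") || linkName.contains("#")) {
            throw new IllegalArgumentException("invalid link name: " + linkName);
        }
        return new URI(hdfsPath + "#" + linkName);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = withLink("/dsap/rawdata/cmc_unitparameter/20140308/part-m-00000",
                "cmc_unitparameter");
        System.out.println(uri.getPath());     // the file to cache
        System.out.println(uri.getFragment()); // the link name seen by the task
    }
}
```

Choosing a link name that matches the data set (rather than the generic part-m-00000) is what keeps several cache files distinguishable inside the container directory.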

After the symbolic link is added, the last component of the path is your DIYFileName:

/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmcs_paracontrolvalues
/data4/yarn/local/usercache/root/appcache/application_1394073762364_1966/container_1394073762364_1966_01_000005/cmc_unitparameter

(2) Wherever you need the cache file, read it directly by the name you chose after the #:

    BufferedReader br = null;
    br = new BufferedReader(new InputStreamReader(new FileInputStream("DIYFileName")));

(3) The usage and code elsewhere are the same as under MR1.
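Reading the linked file by name, as in step (2), can be sketched as follows. This is a minimal example under stated assumptions: the tab-separated key/value layout and the class name CacheDictionaryLoader are hypothetical, since the article does not describe the file format. It uses java.nio.file.Path, which is unrelated to Hadoop's Path class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: CacheDictionaryLoader is a hypothetical helper.
final class CacheDictionaryLoader {
    // Reads "key<TAB>value" lines into a map (assumed format); other lines are skipped.
    static Map<String, String> load(Path file) throws IOException {
        Map<String, String> dict = new HashMap<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]);
                }
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // A relative path resolves against the current working directory,
        // which is where the article's listing shows the link being created.
        Map<String, String> dict = load(Paths.get("DIYFileName"));
        System.out.println(dict.size() + " entries loaded");
    }
}
```

In a real job this loading would typically happen once in the task's setup step, so the dictionary is built before any records are processed.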
