How to understand the log aggregation of Yarn

2025-04-18 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article explains how Yarn log aggregation works and how to configure it, and then covers the related history settings for Spark on Yarn.

1. Yarn log aggregation

After a task in Yarn finishes, its logs can generally no longer be viewed, because the Container that ran the task has been released. To keep the logs available, enable Yarn's log aggregation feature.

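First configure yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://localhost:19888/jobhistory/logs</value>
</property>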

Then start the ResourceManager (RM), the NodeManager (NM), and the history server:

sbin/start-yarn.sh (start-yarn.cmd on Windows)

sbin/mr-jobhistory-daemon.sh start historyserver

Submit a MapReduce job to Yarn and, once it has finished, check whether its logs are visible in the history server.

Tip: it is best to install on Linux. I could not get the history server to start under Windows.
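Once aggregation is enabled, the logs of a finished application can also be fetched from the command line with `yarn logs -applicationId`. The sketch below only builds that command line so it can be checked before running it on a cluster node. The helper name `yarn_logs_command` and the sample application ID are made up for illustration, and the shape check on the ID is deliberately loose.

```shell
#!/bin/sh
# Sketch: build the `yarn logs` command line for a finished application.
# Assumption: IDs look like application_<clusterTimestamp>_<sequence>.
# yarn_logs_command is a hypothetical helper, not part of Hadoop.

yarn_logs_command() {
  app_id="$1"
  case "$app_id" in
    application_[0-9]*_[0-9]*)
      # Loose shape check passed; print the command instead of running it.
      echo "yarn logs -applicationId $app_id"
      ;;
    *)
      echo "not a Yarn application ID: $app_id" >&2
      return 1
      ;;
  esac
}

# The ID below is invented; take a real one from the Yarn web UI on port 8088.
yarn_logs_command application_1700000000000_0001
```

Run the printed command on a machine that has the Hadoop client configured; it reads the aggregated files from the directory set in yarn.nodemanager.remote-app-log-dir.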

[yarn log parameters]

yarn.log-aggregation-enable (true): collect the local logs of each container after execution finishes.
yarn.log-aggregation.retain-seconds (2592000): how long the aggregated logs are kept, in seconds, before they are deleted. 2592000 seconds is 30 days.
yarn.log.server.url (http://hostname:19888/jobhistory/logs): address of the log server.
yarn.nodemanager.local-dirs (/hadoop/yarn/local): root directory for the local files of a running application, stored per user name and deleted after execution.
yarn.nodemanager.log-dirs (/hadoop/yarn/log): root directory for the local logs of a running application, stored per application and container and deleted after execution.
yarn.nodemanager.log.retain-seconds (604800): how long local logs are kept, in seconds. Only takes effect when log aggregation is not enabled.
yarn.nodemanager.remote-app-log-dir (/app-logs): HDFS directory where the aggregated logs are stored.
yarn.nodemanager.remote-app-log-dir-suffix (logs): the aggregated log directory is built as ${remote-app-log-dir}/${user}/${thisParam}.
yarn.nodemanager.delete.debug-delay-sec (6000): after an application finishes, wait this many seconds before deleting its local files and logs. 6000 seconds is 100 minutes; use 600 for a 10-minute delay.
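The retention values and the aggregated log path can be checked with a little arithmetic. The sketch below uses the example values from the parameter list; the user name `alice` and the two helper functions are made up for illustration, and the path layout follows the description in this article rather than any particular Hadoop release. Note that 6000 seconds works out to 100 minutes, so a 10-minute delay would be 600.

```shell
#!/bin/sh
# Sketch: sanity-check the example values from the parameter list.
# "alice" is a made-up user name; the helpers are hypothetical.
REMOTE_APP_LOG_DIR="/app-logs"
SUFFIX="logs"
APP_USER="alice"

seconds_to_days() {
  # Integer division: one day is 86400 seconds.
  echo $(( $1 / 86400 ))
}

aggregated_log_dir() {
  # ${remote-app-log-dir}/${user}/${suffix}, as described in the article.
  echo "$1/$2/$3"
}

echo "yarn.log-aggregation.retain-seconds 2592000 = $(seconds_to_days 2592000) days"
echo "yarn.nodemanager.log.retain-seconds 604800 = $(seconds_to_days 604800) days"
echo "yarn.nodemanager.delete.debug-delay-sec 6000 = $(( 6000 / 60 )) minutes"
echo "aggregated logs for $APP_USER: $(aggregated_log_dir "$REMOTE_APP_LOG_DIR" "$APP_USER" "$SUFFIX")"
```

On a cluster, listing the resulting directory with `hdfs dfs -ls` is a quick way to confirm that aggregation is actually writing files there.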

2. Spark On Yarn log

Spark has its own log aggregation feature. As with Yarn, the point is to keep logs viewable after the program has finished; otherwise they can only be seen while it runs, because the executors are released afterwards.

Configure Spark log aggregation, set in spark-defaults.conf:

spark.eventLog.enabled=true

spark.eventLog.dir=hdfs:///spark-history/logs

Then start the Spark history server. The history of finished Spark applications is available on its default port, 18080.
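A mistyped property name is an easy way to end up with an empty history page, so it can help to check the file before starting the server. The sketch below reads a spark-defaults.conf style file and reports whether event logging is configured. It assumes the documented property names `spark.eventLog.enabled` and `spark.eventLog.dir`; the helpers `get_conf` and `check_event_log` and the temporary demo file are made up for illustration.

```shell
#!/bin/sh
# Sketch: check that event logging is configured in a spark-defaults.conf style file.
# get_conf and check_event_log are hypothetical helpers, not part of Spark.

get_conf() {
  # Usage: get_conf FILE KEY
  # Escape the dots in KEY, then strip "KEY=" or "KEY <space>" from matching lines.
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  sed -n "s/^[[:space:]]*${key_re}[[:space:]]*[=[:space:]][[:space:]]*//p" "$1" | tail -n 1
}

check_event_log() {
  conf="$1"
  enabled=$(get_conf "$conf" spark.eventLog.enabled)
  dir=$(get_conf "$conf" spark.eventLog.dir)
  if [ "$enabled" = "true" ] && [ -n "$dir" ]; then
    echo "event log enabled, writing to $dir"
  else
    echo "event log not fully configured in $conf" >&2
    return 1
  fi
}

# Demo against a throwaway file holding the two settings from this article.
demo_conf=$(mktemp)
printf '%s\n' 'spark.eventLog.enabled=true' \
  'spark.eventLog.dir = hdfs:///spark-history/logs' > "$demo_conf"
check_event_log "$demo_conf"
rm -f "$demo_conf"
```

Point `check_event_log` at the real spark-defaults.conf to run the same check there. It only inspects the file; it does not verify that the HDFS directory exists.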

When a Spark job is submitted to Yarn, the History link for the job in the Yarn web UI (port 8088) goes by default to the Yarn history server on port 19888. It does not automatically jump to the Spark history server on port 18080.

To make the link jump to the Spark history server on port 18080 instead, add the following setting to spark-defaults.conf. Spark's documentation gives the value as host:port, without the http:// scheme:

spark.yarn.historyServer.address=spark-history:18080

Add the SPARK_HISTORY_OPTS parameter. The directory given in spark.history.fs.logDirectory must be the one that spark.eventLog.dir writes to, so adjust one of the two paths if they differ.

# vi spark-env.sh

#!/usr/bin/env bash

export SCALA_HOME=/root/learnproject/app/scala

export JAVA_HOME=/usr/java/jdk1.8.0_111

export HADOOP_CONF_DIR=/root/learnproject/app/hadoop/etc/hadoop

export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://mycluster/spark/historylog \
-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=20"

That completes the configuration.

That covers how to understand Yarn log aggregation. These settings come up often in day-to-day work, so they are worth knowing.
