How Does the Big Data Computing Framework Spark Implement Task Scheduling?

2025-04-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Many newcomers are unclear about how the big data computing framework Spark implements task scheduling. To help answer this question, the following article explains it in detail; readers who need it are welcome to learn from it, and I hope you gain something.

Spark provides several facilities for resource scheduling. Each Spark application (a SparkContext instance) runs in its own independent set of executor processes, and the cluster manager provides facilities for scheduling across applications. Within a single Spark application, multiple jobs (i.e., multiple Spark actions) submitted from different threads can run in parallel.

1 Resource scheduling between applications

On a cluster, each Spark application gets its own set of executor JVMs, which run tasks and store data only for that application. If multiple users need to share a cluster, different policies are available for managing resource allocation, depending on the cluster manager in use.

Static partitioning of resources is available on all cluster managers: each application is given a maximum amount of resources, which it holds for its entire lifetime. This is the approach used by standalone mode, YARN, and Mesos coarse-grained mode.

1.1 Control of resource usage

Depending on the cluster type, resource allocation is configured as follows:

Standalone mode: applications submitted to a standalone cluster run in FIFO order, and each application tries to use all available nodes. You can limit the number of cores an application uses by configuring spark.cores.max, or set a cluster-wide default with spark.deploy.defaultCores. In addition to the number of cores available to an application, you can set spark.executor.memory to control memory usage.
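As a minimal sketch (the master URL, application name, and cap values are illustrative assumptions, not recommendations), these standalone-mode limits could be set on the SparkConf before the context is created:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative caps: at most 4 cores cluster-wide and 2g per executor
// for this application; the names and values are hypothetical.
val conf = new SparkConf()
  .setMaster("spark://master:7077")   // assumed standalone master URL
  .setAppName("capped-app")
  .set("spark.cores.max", "4")        // total cores this application may use
  .set("spark.executor.memory", "2g") // memory per executor
val sc = new SparkContext(conf)
```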

Mesos: to use static partitioning on a Mesos cluster, set spark.mesos.coarse=true. You can then restrict each application's resource share by setting spark.cores.max, and control executor memory usage by setting spark.executor.memory.

YARN: by setting the --num-executors option, the Spark YARN client controls how many executors are allocated on the cluster (the corresponding configuration property is spark.executor.instances), while --executor-memory (configuration property spark.executor.memory) and --executor-cores (configuration property spark.executor.cores) control the resources allocated to each executor.
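For example (the jar name and resource sizes are placeholders), these flags might be passed to spark-submit as follows:

```shell
# Flag-to-property mapping:
#   --num-executors   -> spark.executor.instances
#   --executor-memory -> spark.executor.memory
#   --executor-cores  -> spark.executor.cores
spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  app.jar
```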

Note that in none of these modes can memory be shared between applications.

1.2 Dynamic resource allocation

Spark provides a mechanism to dynamically adjust an application's resources according to its workload. This means that resources your application no longer uses are returned to the cluster and requested again when needed, which is especially useful when multiple applications share a cluster.

This feature is disabled by default, but it is available on all coarse-grained cluster managers: standalone mode, YARN mode, and Mesos coarse-grained mode.

There are two requirements for using this feature. First, you must set spark.dynamicAllocation.enabled=true. Second, you must set up an external shuffle service on each worker node in the cluster and set spark.shuffle.service.enabled=true. The purpose of the external shuffle service is to allow executors to be removed without deleting the shuffle files they have written.

The shuffle service is set up as follows:

In standalone mode: set spark.shuffle.service.enabled=true

Mesos coarse-grained mode: run $SPARK_HOME/sbin/start-mesos-shuffle-service.sh on all worker nodes and set spark.shuffle.service.enabled=true

YARN: see the documentation on running Spark on YARN for details
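Putting the two requirements together, a minimal sketch of the configuration (the same two properties could equally be set as lines in spark-defaults.conf):

```scala
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
```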

1.3 Resource allocation strategy

Spark should relinquish executors when they are no longer in use and acquire them when they are needed. Since there is no definitive way to predict whether an executor about to be removed will be needed in the near future, or whether a newly added executor will actually sit idle, we need a set of heuristics to decide when to remove executors (possibly several) and when to request them (possibly several).

Request strategy

With dynamic resource allocation enabled, a Spark application requests additional executors whenever there are pending tasks waiting to be scheduled. This condition implies that the existing executors cannot run all of the tasks that have been submitted but not yet finished.

Spark requests executors in rounds. The first real request is triggered when tasks have been pending for longer than spark.dynamicAllocation.schedulerBacklogTimeout seconds; after that, a request is triggered every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds as long as the queue of pending tasks persists. The number of executors requested per round grows exponentially: the first round requests 1 executor, the second round requests 2, then 4, then 8, and so on.

There are two motivations for exponential growth. First, an application should request executors cautiously at the start, in case its demand can be met with only a few; this is similar to TCP slow start. Second, when the application really does need more executors, it should be able to ramp up its resource usage in a timely manner.
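The round-by-round request counts described above can be sketched as a simple doubling sequence (plain Scala, no Spark required; the four-round cutoff is arbitrary):

```scala
// Executors requested per round double each time: 1, 2, 4, 8, ...
val requestsPerRound = Iterator.iterate(1)(_ * 2).take(4).toList
// requestsPerRound: List(1, 2, 4, 8)
```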

Removal policy

An executor is removed when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds. Note that under most circumstances the removal condition and the request condition are mutually exclusive: an executor will not be removed while there are still tasks waiting to be scheduled.

Graceful decommissioning of executors

Without dynamic resource allocation, a Spark executor exits either because it fails or because its application exits. In both cases, all state associated with the executor is no longer needed and can be safely discarded. With dynamic allocation, however, an executor may be explicitly removed while the application is still running. If the application then wants to use state stored or written by that executor, it must recompute it. This is why a graceful decommissioning mechanism is needed: the executor's state must be preserved before the executor is removed.

This mechanism is particularly important for shuffles. During a shuffle, each executor writes its own map output to local disk, and then acts as a file server when other executors want to fetch those files. Some executors are stragglers, whose tasks run longer than their peers'; dynamic resource allocation may remove an executor before the shuffle completes, in which case the shuffle files it wrote locally (that is, its state) must be recomputed unnecessarily.

The way to preserve shuffle files is to use the external shuffle service, introduced in Spark 1.2. This service is a long-running process on each node of the cluster, independent of any application or executor. When the service is available, executors fetch shuffle files from it instead of from each other. This means that any shuffle file written by an executor can continue to be served beyond that executor's lifetime.

In addition to writing shuffle files to local disk, executors also cache data on disk or in memory. When an executor is removed, however, its in-memory cached data is lost. To address this, by default an executor holding cached data is never removed. This behavior can be configured through spark.dynamicAllocation.cachedExecutorIdleTimeout.
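To let executors with cached data be reclaimed as well, an idle timeout for them can be set explicitly (the one-hour value here is an arbitrary illustration, not a recommendation):

```scala
conf.set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "1h")
```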

2 Resource scheduling within an application

Overview

Within a given application (a SparkContext instance), multiple parallel jobs submitted from different threads can run at the same time. A "job" here means a Spark action and the computation it triggers. The Spark scheduler is thread-safe, which allows an application to serve multiple requests.

By default, the Spark scheduler runs jobs in FIFO order, with each job split into one or more stages (for example, map and reduce stages). The first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, then the third, and so on. If the job at the head of the queue does not need the whole cluster, subsequent jobs can start immediately; but if it is large, the startup delay of later jobs becomes noticeable.

Starting with Spark 0.8, fair scheduling between jobs can also be enabled through configuration. Spark then assigns task resources between jobs in round-robin fashion, so all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can obtain resources and start immediately, without waiting for the long job to finish. To enable this, set spark.scheduler.mode to FAIR:

val conf = new SparkConf().setMaster(...).setAppName(...)
conf.set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

Fair scheduler pools

The fair scheduler also supports grouping jobs into pools and configuring different options for each pool. This is useful, for example, for creating a high-priority pool for more important jobs, or for grouping each user's jobs together and giving users equal shares regardless of how many concurrent jobs they have, rather than giving every job an equal share.

Without any intervention, new jobs go into the default pool; a job's pool can be set via spark.scheduler.pool:

sc.setLocalProperty("spark.scheduler.pool", "pool1")

Once set, all jobs submitted from this thread (by calls such as RDD.save, count, or collect) will use this pool name. The setting is per-thread, which makes it easy to have one thread run all of a user's jobs. To clear the pool associated with a thread, call sc.setLocalProperty("spark.scheduler.pool", null)

Default pool behavior

By default, each pool gets an equal share of the cluster (and within the default pool, each job gets an equal share), but within each pool jobs run in FIFO order. For example, if you create one pool per user, each user gets an equal share of the cluster, and each user's queries run in order without later queries preempting the resources of earlier ones.

Configuring pool properties

Pool properties can be changed through a configuration file. Each pool supports three properties:

schedulingMode: either FIFO or FAIR; controls whether jobs within the pool queue up behind each other or share the pool's resources fairly.

weight: controls the pool's share of the cluster relative to other pools. By default, every pool has a weight of 1. If you give a pool a weight of 2, it gets twice as many resources as the other pools. Setting a very high weight, such as 1000, essentially implements priority between pools: a weight-1000 pool always gets to launch its tasks first whenever it has active jobs.

minShare: apart from an overall weight, each pool can be given a minimum share (expressed as a number of CPU cores). The fair scheduler always tries to meet the minimum shares of all active pools before redistributing the remaining resources by weight. The minShare property is therefore another way to ensure that a pool quickly gets a certain amount of resources (e.g., 10 cores) without giving it high priority overall. By default, minShare is 0.

Pool properties are configured through an XML file, whose path is registered by calling SparkConf.set:

conf.set("spark.scheduler.allocation.file", "/path/to/file")

The file contains one pool element per pool, and pools not configured in the XML file use the default settings (scheduling mode FIFO, weight 1, minShare 0). For example, a file declaring two pools, pool1 (FAIR, weight 1, minShare 2) and pool2 (FIFO, weight 2, minShare 3), would look like this:

<allocations>
  <pool name="pool1">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="pool2">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

Did you find the above content helpful? If you would like to learn more or read more related articles, please follow the industry information channel. Thank you for your support.
