2025-01-30 Update From: SLTechnology News&Howtos shulou
Shulou (Shulou.com) 06/01 Report
This article examines the task computing models of Spark and MapReduce. The content is concise and easy to follow, and I hope you take something away from the detailed introduction below.
Viewed from the outside, both Spark and MapReduce run as multiple processes: MapReduce consists of process-level instances such as MapTask and ReduceTask, while Spark consists of process-level instances such as the worker and the executor. The difference appears at the level of individual tasks. In MapReduce each task is still a separate process, whereas in Spark the unit of execution is a thread running inside an executor, i.e. a multithreaded model.
With multiple processes, it is easy to control the resources each task may use, and the failure of one process generally does not affect the others. However, starting and destroying a process takes considerable time, and the resources a process requested are released when it exits, so resources are requested and released over and over. This overhead is one of the most widely criticized aspects of MapReduce.
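The startup-cost difference between the two models can be felt even outside Hadoop and Spark. The sketch below (plain Python, no Spark involved) times creating no-op threads against launching no-op interpreter processes; the fresh interpreter stands in for the per-task JVM that MapReduce pays for on every task.

```python
import subprocess
import sys
import threading
import time

def time_thread_startup(n):
    """Create and join n no-op threads. Threads share the parent's
    address space, so startup and teardown are cheap."""
    t0 = time.perf_counter()
    threads = [threading.Thread(target=lambda: None) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

def time_process_startup(n):
    """Launch and wait for n no-op interpreter processes. Each one gets a
    fresh address space and runtime, analogous to launching a new JVM
    for every MapTask/ReduceTask."""
    t0 = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - t0

if __name__ == "__main__":
    print(f"5 threads:   {time_thread_startup(5):.4f}s")
    print(f"5 processes: {time_process_startup(5):.4f}s")
```

On a typical machine the process version is slower by orders of magnitude, which is exactly why a model that creates and destroys a process per task struggles with low-latency workloads.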
The MapReduce task-processing model has the following characteristics:
1. Each MapTask and ReduceTask runs in its own JVM process, which makes fine-grained control of the resources occupied by each task convenient (good resource controllability).
2. Each MapTask/ReduceTask goes through the cycle of request resources -> run task -> release resources. One point to emphasize: the resources occupied by a MapTask/ReduceTask must be released once it finishes running, and those released resources cannot be reused by the other tasks of the job.
3. JVM reuse can alleviate, to some extent, the performance overhead of every task dynamically requesting resources and releasing them immediately after running.
However, JVM reuse does not mean that multiple tasks can run in parallel inside one JVM process; it only sets the maximum number of tasks of the same job that may execute sequentially in one JVM. It is controlled by the configuration parameter mapred.job.reuse.jvm.num.tasks, whose default is 1.
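As a sketch of how that parameter is set, the fragment below shows a mapred-site.xml entry. Note this is the classic Hadoop 1.x (MRv1) knob; the value -1 is commonly used to mean "no limit on sequential reuse", while the default 1 disables reuse.

```xml
<!-- mapred-site.xml (Hadoop 1.x / MRv1) -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- Maximum number of tasks of the SAME job run sequentially in one JVM.
       1 = no reuse (default); -1 = unlimited reuse. -->
  <value>-1</value>
</property>
```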
Spark's multithreaded model is just the opposite of MapReduce's, which makes Spark better suited to running low-latency tasks. In Spark, the tasks on one node run as threads inside an executor process, which acts as a reusable resource pool with the following characteristics:
1. Each executor runs in a separate JVM process, and each task is a thread running inside an executor, so a thread-level task obviously starts much faster.
2. All tasks on the same node run inside one executor, which makes it easy to share memory. For example, if a file is shipped to each executor through a Spark broadcast variable, the tasks inside that executor do not each need their own copy of the file; they simply use the single copy held by the executor.
3. The resources occupied by an executor are not released as soon as some task finishes; they can be used by task after task, which avoids the overhead of requesting resources anew for every task.
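The shared-memory point in item 2 can be sketched without a Spark cluster. In the toy model below (plain Python, not Spark's actual API), a module-level dict stands in for a broadcast variable held by one executor process, and threads stand in for the tasks running inside it; every task reads the same in-memory object rather than a private copy.

```python
import threading

# Stand-in for a broadcast variable: one read-only lookup table held
# once per "executor" process and shared by all of its task threads.
BROADCAST_LOOKUP = {"a": 1, "b": 2, "c": 3}

def run_task(record, out, lock):
    """A 'task' thread reads the shared table; no per-task copy is made."""
    value = BROADCAST_LOOKUP[record]  # same object in every thread
    with lock:
        out.append((record, value, id(BROADCAST_LOOKUP)))

def run_executor(records):
    """Run all tasks for this node as threads inside one process,
    mimicking Spark's executor model."""
    out, lock = [], threading.Lock()
    threads = [threading.Thread(target=run_task, args=(r, out, lock))
               for r in records]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

results = run_executor(["a", "b", "c"])
# Every task saw the identical object (same id): one copy per executor,
# not one copy per task.
assert len({table_id for _, _, table_id in results}) == 1
```

In a multi-process model, by contrast, each task process would need the table materialized in its own address space.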
But the multithreaded model has a drawback: the tasks inside one executor on a node can easily contend with each other for resources, because the finest granularity of resource allocation is the executor, and there is no fine-grained control over the individual tasks running inside it. This makes Spark less stable when running jobs with large data volumes and limited resources; by comparison, MapReduce is more conducive to the smooth running of such big jobs.
The above is how to analyze the Spark and MapReduce task computing models. Have you picked up any knowledge or skills? If you want to learn more skills or enrich your knowledge, you are welcome to follow the industry information channel.