How to use GPU acceleration in Spark 3.0
Today I would like to talk to you about how to use GPU acceleration in Spark 3.0. Many people may not know much about it, so I have summarized the following in the hope that you can get something from this article.
Overview
RAPIDS Accelerator for Apache Spark uses GPUs to accelerate data processing, which is achieved through the RAPIDS libraries.
As data scientists shift from traditional data analytics to AI applications that address complex market needs, traditional CPU-based processing can no longer keep up in either speed or cost. The rapid growth of AI workloads requires a new framework that processes data quickly and cost-efficiently, and GPUs make that possible.
RAPIDS Accelerator for Apache Spark combines the RAPIDS cuDF library with the Spark distributed computing framework. The accelerator library also has a built-in accelerated shuffle based on UCX that can be configured to take advantage of GPU-to-GPU communication and RDMA capabilities.
Spark RAPIDS download v0.4.1
RAPIDS Spark Package
cuDF Package (CUDA 11.0)
cuDF Package (CUDA 10.2)
cuDF Package (CUDA 10.1)
RAPIDS Notebooks
cuML Notebooks
cuGraph Notebooks
CLX Notebooks
cuSpatial Notebooks
cuxfilter Notebooks
XGBoost Notebooks
Introduction
These notebooks provide examples of how to use RAPIDS. They are designed to be self-contained with the runtime version of the RAPIDS Docker container and the RAPIDS nightly Docker containers, and they can run on air-gapped systems. You can quickly pull a container, then install and use it by following the RAPIDS.ai Getting Started page.
Usage
To get the latest notebook repo updates, run ./update.sh or use the command:
git submodule update --init --remote --no-single-branch --depth 1
Download CUDA Installer for Linux Ubuntu 20.04 x86_64
The basic installation is as follows:
Basic installer installation instructions:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda-repo-ubuntu2004-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-1-local_11.1.0-455.23.05-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
The CUDA Toolkit contains open source project software; see the CUDA Toolkit documentation for details.
You can find checksums for installers and patches in Installer Checksums.
Performance & cost and benefit
RAPIDS Accelerator for Apache Spark benefits from GPU performance while reducing cost. An example: ETL on the Fannie Mae Mortgage dataset (~200 GB), as shown in our demo. Costs are based on the cloud T4 GPU instance market price and the V100 GPU price on Databricks Standard edition.
Easy to use
No code changes are required to run existing Apache Spark applications. Start Spark with the RAPIDS Accelerator for Apache Spark plugin jar and enable the plugin configuration, as follows:
spark.conf.set("spark.rapids.sql.enabled", "true")
The physical plan will then show operators running on the GPU.
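As a minimal sketch of the whole flow (assuming the plugin and cudf jars are already on the classpath and a GPU is available; the GPU operator names in the comments are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rapids-plan-sketch")
  .getOrCreate()

// Turn on RAPIDS SQL acceleration at runtime.
spark.conf.set("spark.rapids.sql.enabled", "true")

// An ordinary DataFrame query; no API changes are needed.
val df = spark.range(0, 1000000L).selectExpr("id", "id % 10 AS key")
val agg = df.filter("key > 5").groupBy("key").count()

// With the plugin active, the physical plan should show GPU operators
// (e.g. GpuFilter, GpuHashAggregate) in place of the CPU versions.
agg.explain()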
A unified AI framework for ETL + ML/DL
Single pipeline, from data preparation to model training:
Start using RAPIDS Accelerator for Apache Spark
Apache Spark 3.0+ lets users provide a plugin that replaces SQL and DataFrame operations. No API changes are required; supported SQL operations are replaced with GPU-accelerated versions. If an operation does not support GPU acceleration, the Spark CPU version is used instead.
⚠️ Note that the plugin does not accelerate operations performed directly on RDDs.
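If you want to see why particular operations stayed on the CPU, the plugin exposes an explain setting. A hedged sketch, continuing the session above (the config name spark.rapids.sql.explain and the NOT_ON_GPU value follow the plugin's documentation, but verify them for your version):

// Log which operators could not be placed on the GPU, making CPU
// fallbacks visible in the driver output.
spark.conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")

// By contrast, RDD transformations like this one are never candidates
// for GPU acceleration by the plugin; only SQL/DataFrame operations are.
val doubled = spark.sparkContext.parallelize(1 to 1000).map(_ * 2).sum()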
The accelerator library also provides an implementation of Spark's shuffle that optimizes GPU data transfers, keeping as much data on the GPU as possible and bypassing the CPU by doing GPU-to-GPU transfers over UCX.
This GPU-accelerated shuffle implementation does not require the plugin's SQL acceleration to be running. However, if accelerated SQL processing is not enabled, the shuffle implementation falls back to the default SortShuffleManager behavior.
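A hedged sketch of enabling the accelerated shuffle (the RapidsShuffleManager class name is Spark-version-specific, and both it and spark.rapids.shuffle.transport.enabled are assumptions to verify against the plugin docs for your release):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rapids-shuffle-sketch")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  // Class name assumed for Spark 3.0.1 + plugin v0.4.x; adjust to your version.
  .config("spark.shuffle.manager", "com.nvidia.spark.rapids.spark301.RapidsShuffleManager")
  // Assumed switch for the UCX-based transport; without it the shuffle
  // stays on the default path.
  .config("spark.rapids.shuffle.transport.enabled", "true")
  .getOrCreate()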
To enable GPU processing acceleration, you need the following (a launch sketch follows the list):
Apache Spark 3.0+
A Spark cluster configured with GPUs that meet the requirements of the cudf version.
One GPU per executor.
The following jars:
A cudf jar that corresponds to the version of CUDA available on your cluster.
RAPIDS Spark accelerator plugin jar.
The config spark.plugins set to com.nvidia.spark.SQLPlugin.
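Putting the list together, a minimal hedged launch sketch (the jar file names are assumptions for v0.4.1 with CUDA 11.0; substitute the jars you actually downloaded, which in practice are often passed to spark-submit with --jars instead):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rapids-launch-sketch")
  // Jar names are illustrative for v0.4.1/CUDA 11.0; use your downloaded jars.
  .config("spark.jars", "cudf-0.18.1-cuda11.jar,rapids-4-spark_2.12-0.4.1.jar")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.sql.enabled", "true")
  .getOrCreate()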
Overview of Spark GPU scheduling
Apache Spark 3.0 now supports GPU scheduling, depending on your cluster manager. You can ask Spark to request GPUs for executors and assign them to tasks. The exact configuration depends on the cluster manager; here are some examples:
Request GPUs for each executor:
--conf spark.executor.resource.gpu.amount=1
Specify the number of GPUs per task:
--conf spark.task.resource.gpu.amount=1
Specify a GPU discovery script (required on YARN and K8S):
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh
Review the documentation for your deployment to determine the exact methods and limitations.
Note that spark.task.resource.gpu.amount can be a decimal. If you want multiple tasks to run concurrently on an executor and be assigned to the same GPU, set it to a decimal less than 1, consistent with the spark.executor.cores setting. For example, spark.executor.cores=2 allows 2 tasks per executor; if you want those 2 tasks to run on the same GPU, set spark.task.resource.gpu.amount=0.5.
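As a minimal sketch of this arithmetic in configuration form (assuming a fresh session; these executor and task resource settings must be in place at application launch):

import org.apache.spark.sql.SparkSession

// 2 cores per executor and 1 GPU per executor: with 0.5 GPU per task,
// both concurrent task slots map onto the executor's single GPU
// (2 tasks x 0.5 GPU = 1 GPU).
val spark = SparkSession.builder()
  .appName("gpu-sharing-sketch")
  .config("spark.executor.cores", "2")
  .config("spark.executor.resource.gpu.amount", "1")
  .config("spark.task.resource.gpu.amount", "0.5")
  // Required on YARN and Kubernetes, as noted above.
  .config("spark.executor.resource.gpu.discoveryScript", "./getGpusResources.sh")
  .getOrCreate()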
After reading the above, do you have a better understanding of how Spark 3.0 uses GPU acceleration? I hope you got something out of this article.