How to Carry Out Distributed Deep Learning on Big Data

This article explains how to carry out distributed deep learning on big data. The content is concise and easy to follow, and I hope you will take something away from the detailed introduction below.

Why do you need distributed training?

On one hand, it can be a necessity: the dataset may be too large to load, or the model may be too complex for a single GPU to hold its parameters. On the other hand, distributed training can also be used to speed up training.

Distributed training strategy

Model parallelism: when the model is too large to fit on one device, different layers are placed on different nodes or GPUs. Its computational efficiency is low, so it is not commonly used.

Data parallelism: the data is split into multiple shards; each shard is processed by its own replica of the model, and the resulting gradients are combined to update the parameters. This is efficient and the most commonly used approach.

Distributed parallel mode

Synchronous training: all processes synchronize after computing their gradients, and the parameter update is applied uniformly (a sketch follows below).

Asynchronous training: each process computes its own gradients and exchanges parameters with the master node independently, which can lead to stale updates and convergence to suboptimal solutions.
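To make synchronous data parallelism concrete, here is a minimal sketch using PyTorch's DistributedDataParallel. The toy model, random data, and launch command are illustrative assumptions, not something described in the original article.

# Minimal synchronous data-parallel sketch with PyTorch DDP (illustrative only).
# Launch with, for example: torchrun --nproc_per_node=4 train_ddp.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="gloo")        # use "nccl" on GPU clusters
    rank = dist.get_rank()

    # Toy dataset and model stand in for real data and a real architecture.
    dataset = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)          # each process sees a different shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = DDP(torch.nn.Linear(20, 1))            # gradients are all-reduced during backward()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()                        # synchronous gradient averaging happens here
            optimizer.step()
        if rank == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()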

Distributed training architectures

Parameter Server (PS)

The cluster has one parameter server and multiple workers. The server waits for all workers to finish computing their gradients, aggregates them, updates the parameters on the server, and then broadcasts the new parameters to the workers.

The basic process is as follows (a toy simulation follows the steps):

Each worker loads its data, trains, and computes gradients

Each worker uploads its gradient to the server

Server aggregates gradients and updates parameters

Each worker pulls the latest parameters for the next round of training.
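The four steps above can be simulated in a single process. The sketch below uses NumPy and a toy linear-regression model purely for illustration; the number of workers, the data, and the learning rate are made-up values.

import numpy as np

# Toy simulation of the parameter-server cycle: workers compute gradients on
# their data shards, the server aggregates and updates, workers pull new params.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
true_w = np.arange(5.0)
y = X @ true_w + 0.1 * rng.normal(size=400)

num_workers, lr = 4, 0.1
server_params = np.zeros(5)                              # parameters live on the "server"
shards = np.array_split(np.arange(400), num_workers)     # each worker owns one shard

for step in range(200):
    grads = []
    for shard in shards:                                  # 1) worker trains on its shard
        w = server_params.copy()                          # 4) worker pulled the latest params
        err = X[shard] @ w - y[shard]
        grads.append(X[shard].T @ err / len(shard))       # 2) gradient is uploaded to the server
    server_params -= lr * np.mean(grads, axis=0)          # 3) server aggregates and updates

print("learned parameters:", np.round(server_params, 2))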

Ring All-Reduce

There is no server, only workers: all workers form a closed ring. Each worker receives the gradient from its upstream neighbor, adds its own, and passes the accumulated gradient to its downstream neighbor. At the end, every worker on the ring holds the same fully aggregated gradient, which is then used to update the parameters. This is more efficient than the PS architecture.

The algorithm consists of two main phases (a small simulation follows):

Scatter-reduce: the GPUs exchange and accumulate chunks of their gradients step by step; at the end, each GPU holds one chunk of the fully aggregated gradient.

All-gather: the GPUs then exchange these completed chunks step by step, until every GPU holds the complete aggregated gradient.
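The two phases can be illustrated with a small NumPy simulation. The number of workers and the gradient values are arbitrary, and a real implementation (for example NCCL) pipelines the communication, which this toy version ignores.

import numpy as np

# Toy ring all-reduce over N simulated workers; each worker's gradient is
# split into N chunks (here each chunk is a single number for simplicity).
N = 4
rng = np.random.default_rng(1)
grads = [rng.integers(0, 10, size=N).astype(float) for _ in range(N)]
chunks = [g.copy() for g in grads]            # chunks[worker][chunk_index]
expected = np.sum(grads, axis=0)              # what every worker should end up with

# Phase 1: scatter-reduce. After N-1 steps, worker j holds the fully
# aggregated value of chunk (j + 1) % N.
for step in range(N - 1):
    sends = [(j, (j - step) % N, chunks[j][(j - step) % N]) for j in range(N)]
    for j, c, val in sends:
        chunks[(j + 1) % N][c] += val         # pass the partial sum to the next worker

# Phase 2: all-gather. Each finished chunk circulates around the ring until
# every worker holds the complete aggregated gradient.
for step in range(N - 1):
    sends = [(j, (j + 1 - step) % N, chunks[j][(j + 1 - step) % N]) for j in range(N)]
    for j, c, val in sends:
        chunks[(j + 1) % N][c] = val          # overwrite with the finished chunk

assert all(np.allclose(c, expected) for c in chunks)
print("every worker now holds:", chunks[0])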

Distributed deep learning frameworks

Elephas https://github.com/maxpumperla/elephas

Elephas is an extension of Keras that lets you run distributed deep learning models at scale with Spark. In principle, Elephas works as follows:

Elephas uses Spark's RDD and DataFrame to implement a data-parallel algorithm on top of Keras. The Keras model is initialized on the Spark driver, serialized, and sent to the Spark workers together with the data and the broadcast model parameters. Each Spark worker deserializes the model, trains on its block of data, and sends the resulting gradients back to the optimizer on the driver. The driver applies the gradients synchronously or asynchronously and updates the master model.

Summary:

It is very simple to use: a single line such as spark_model = SparkModel(model, frequency='epoch', mode='asynchronous') is enough (see the sketch after this summary).

Some users report that the driver can run out of memory when holding the data, and that for the same number of epochs the accuracy can drop significantly.

Because of the PS-style architecture, the driver has relatively high memory requirements; the project officially recommends at least 1 GB of memory for the driver.
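A hedged sketch of the usage pattern mentioned above. The toy Keras model, the random data, and the Spark settings are placeholders, and the exact API should be checked against the Elephas README.

import numpy as np
from pyspark import SparkConf, SparkContext
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from elephas.spark_model import SparkModel
from elephas.utils.rdd_utils import to_simple_rdd

sc = SparkContext(conf=SparkConf().setAppName("elephas_demo").setMaster("local[4]"))

# Toy Keras model and data; a real job would use its own architecture and dataset.
model = Sequential([Dense(64, activation="relu", input_shape=(20,)),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(1000, 20)
y = (np.random.rand(1000) > 0.5).astype("float32")
rdd = to_simple_rdd(sc, x, y)                 # distribute (features, label) pairs as an RDD

# The one-liner from the summary: wrap the Keras model for distributed training.
spark_model = SparkModel(model, frequency="epoch", mode="asynchronous")
spark_model.fit(rdd, epochs=5, batch_size=32, verbose=0, validation_split=0.1)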

TensorFlowOnSpark https://github.com/yahoo/TensorFlowOnSpark

TensorFlowOnSpark was developed by Yahoo for large-scale distributed deep learning on Hadoop clusters in Yahoo's private cloud. According to the project, TensorFlowOnSpark has some important advantages:

Existing TensorFlow programs can be migrated easily, typically with fewer than 10 lines of code changed (see the sketch after this list).

Supports all TensorFlow functionality: synchronous/asynchronous training, model/data parallelism, inference, and TensorBoard.

Server-to-server direct communication achieves faster learning when available.

Allows datasets on HDFS and other sources to be pushed by Spark or pulled by TensorFlow.

Easily integrates with your existing Spark data processing pipelines.

Easily deployed in the cloud or on premises, and on CPUs or GPUs.
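To illustrate the migration pattern claimed in the first point, here is a hedged sketch of wrapping an ordinary TensorFlow training function with TFCluster; the cluster size, the argument object, and the body of the training function are placeholders and should be checked against the TensorFlowOnSpark examples.

from pyspark import SparkConf, SparkContext
from tensorflowonspark import TFCluster

def main_fun(args, ctx):
    # An ordinary TensorFlow training function; ctx describes this node's role
    # (chief/worker) and task index within the cluster.
    import tensorflow as tf
    print(f"node {ctx.job_name}:{ctx.task_index} starting, epochs={args['epochs']}")
    # ... build the model / tf.distribute strategy and train as usual ...

sc = SparkContext(conf=SparkConf().setAppName("tfos_demo"))
num_executors = 4                               # assumed cluster size
cluster = TFCluster.run(sc, main_fun, {"epochs": 10},
                        num_executors, num_ps=0,
                        tensorboard=False,
                        input_mode=TFCluster.InputMode.TENSORFLOW,
                        master_node="chief")
cluster.shutdown()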

Summary:

TensorFlowOnSpark still uses a parameter-server architecture for data parallelism.

I/O takes a lot of time: some users report that 10 epochs of training took about 8.5 hours on a Spark-on-YARN cluster with 2 nodes and 2 GPUs, of which more than 3 hours were spent on I/O.

Dist-keras https://github.com/cerndb/dist-keras

Dist-keras (DK) is a distributed deep learning framework built on Apache Spark and Keras. Its goal is to significantly reduce training time through distributed machine learning algorithms. The framework is designed so that developers can easily implement new distributed optimizers, letting them focus on research and model development.

It follows the data-parallel approach described in the Large Scale Distributed Deep Networks paper: replicas of the model are distributed across multiple trainers, and each replica is trained on a different partition of the dataset. After each gradient update, the gradients (or all network weights, depending on the implementation) are communicated to the parameter server. The parameter server handles the gradient updates from all workers and merges them into a single master model, which is returned to the user when training finishes.

Summary:

It is easy to use but offers limited functionality; its main contribution is a set of distributed optimizers.

It is a personal project that has since been archived and is no longer actively developed.

Horovod https://github.com/horovod/horovod

Horovod is another open-source deep learning tool from Uber. Its design draws on Facebook's "Training ImageNet in 1 Hour" work and Baidu's ring all-reduce, and it helps users carry out distributed training. The sketch below briefly illustrates how to use Horovod with PyTorch for more efficient distributed training.
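A hedged sketch of the Horovod-with-PyTorch pattern: the toy model, random data, batch size, and learning-rate scaling are illustrative choices, not prescribed by Horovod.

# Launch with, for example: horovodrun -np 4 python train_hvd.py
import torch
import horovod.torch as hvd
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

hvd.init()
device = torch.device(f"cuda:{hvd.local_rank()}" if torch.cuda.is_available() else "cpu")

# Toy dataset, sharded so that each process sees a different slice.
dataset = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

model = torch.nn.Linear(20, 1).to(device)
# A common convention is to scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# Wrap the optimizer so that gradients are averaged with ring all-reduce.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# Make sure every worker starts from the same initial state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

loss_fn = torch.nn.MSELoss()
for epoch in range(2):
    sampler.set_epoch(epoch)
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    if hvd.rank() == 0:
        print(f"epoch {epoch}: loss {loss.item():.4f}")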

Summary:

Developed by Uber, with an active community.

Supports multiple deep learning frameworks: TensorFlow, PyTorch, and Keras.

Horovod appears to support only synchronous data parallelism; model parallelism and asynchronous data parallelism are not supported.

Determined-ai https://determined.ai/product/

Determined AI is a distributed training platform for deep learning. Algorithmically, Determined builds on Horovod for distributed training. In addition, it uses more advanced hyperparameter tuning to find better models, and its intelligent GPU scheduling improves scheduling efficiency and utilization. It is easy to install and comes with a full-featured graphical interface.

Angel https://github.com/Angel-ML/angel

Angel is a high-performance distributed machine learning platform based on the Parameter Server concept. It has been tuned repeatedly on Tencent's massive internal data and offers broad applicability and stability; the higher the dimensionality of the model, the more pronounced its advantage. Jointly developed by Tencent and Peking University, Angel combines the high availability demanded by industry with academic innovation.

Angel's core design philosophy revolves around the model: it partitions high-dimensional, large models sensibly across multiple parameter-server nodes, and makes it easy to implement a variety of efficient machine learning algorithms through efficient model-update interfaces and functions as well as flexible synchronization protocols.

Angel is developed in Java and Scala, can be scheduled and run directly on community Yarn, and is built on a PS Service. It supports Spark on Angel and will support integration with graph computing and deep learning frameworks in the future.

Summary:

Developed in China; the community is active, but relatively few algorithms are supported at present.

Programming is done in Scala, without relying on Python deep learning frameworks.

Support for multiple parameter server models

BigDL

BigDL is Intel's open-source distributed deep learning library for Apache Spark. With BigDL, users can write deep learning applications as standard Spark programs that run directly on existing Spark or Hadoop clusters. Features:

Rich deep learning support. Modeled after Torch, BigDL provides comprehensive deep learning support, including numeric computing (via Tensor) and high-level neural networks. Users can also load pre-trained Caffe or Torch models into Spark programs with BigDL.

Extremely high performance. BigDL uses Intel MKL and multi-threaded programming in each Spark task, which makes it orders of magnitude faster than out-of-the-box Caffe, Torch, or TensorFlow on a single-node Xeon.

Scales out effectively. By leveraging Apache Spark and an efficient implementation of synchronous SGD and all-reduce communication on Spark, BigDL can scale out effectively and perform data analysis at "big data scale".

The above is how to carry out distributed deep learning on big data. Did you learn any new knowledge or skills? If you would like to learn more skills or enrich your knowledge, you are welcome to follow the industry information channel.
