

Benchmarking TensorFlow on Cloud CPUs and Cloud GPUs


This article shows you how TensorFlow performs on cloud CPUs compared with cloud GPUs. The content is concise and easy to follow, and I hope the detailed comparison below gives you something useful to take away.

I have been running some personal deep learning projects with Keras and TensorFlow, but training deep learning models on cloud services is not free. Since this is personal research, I pay close attention to expenses and try to keep costs down. CPU instances are much cheaper than GPU instances, and in practice I have found that model training is not dramatically slower on them, so I dug into the pricing of these two instance types to see whether CPUs might better fit my needs.

GPU instance prices on Google Compute Engine start at $0.745 per hour. A few months ago, Google introduced CPU instances with up to 64 vCPUs on the modern Intel Skylake CPU architecture. More importantly, they can also be used as preemptible instances on GCE: these run for at most 24 hours and can be terminated at any time, but cost only about 20% of the standard instance price. A preemptible n1-highcpu-64 instance with 64 vCPUs and 57.6 GB of RAM, plus the surcharge for Skylake CPUs, comes to US $0.509 per hour, roughly two-thirds of the cost of the GPU instance.

If a 64-vCPU instance trains a model at the same speed as the GPU (or even only slightly slower), then using CPUs is more cost-effective. That assumes the deep learning software and the GCE platform hardware both run at 100% efficiency; if they don't, it may be even more economical to scale down the number of vCPUs and cut the cost accordingly.
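A quick back-of-the-envelope check of that reasoning, using only the two hourly prices quoted above (a minimal Python sketch):

    gpu_price_per_hr = 0.745   # GPU instance price quoted above
    cpu_price_per_hr = 0.509   # preemptible n1-highcpu-64 price quoted above

    # Cost = price x time, so the CPU run is cheaper whenever
    #   cpu_price * t_cpu < gpu_price * t_gpu,  i.e.  t_cpu / t_gpu < gpu_price / cpu_price
    break_even_slowdown = gpu_price_per_hr / cpu_price_per_hr
    print("CPU training stays cheaper as long as it is less than "
          "%.2fx slower than the GPU" % break_even_slowdown)   # ~1.46x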

Because GPUs are the obvious answer for deep learning hardware, there are few benchmarks of deep learning libraries on CPUs. But thanks to Google's economies of scale, the existence of preemptible instances changes the cost picture dramatically, so training deep learning models on CPUs could end up more cost-effective than training on GPUs.

Setup

I already have a real-world deep learning use case, benchmark scripts, a Docker container environment, and results logging from my earlier TensorFlow and CNTK articles. A few minor adjustments via CLI parameters let the same scripts run on both the CPU and GPU instances. I also rebuilt the Docker container to support the latest version of TensorFlow (1.2.1) and created a CPU version of the container with the CPU-appropriate TensorFlow library installed.
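The benchmark scripts themselves are not reproduced here, but the shape of such a harness is simple: pick a model, time model.fit(), and log the wall-clock duration. Below is a minimal, hypothetical sketch (the flag names and synthetic data are mine, not the actual benchmark script):

    import argparse
    import time

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=128)
    args = parser.parse_args()

    # Synthetic data stands in for the real benchmark datasets.
    x = np.random.rand(10000, 100).astype("float32")
    y = np.random.randint(0, 2, size=(10000, 1))

    model = Sequential([Dense(64, activation="relu", input_shape=(100,)),
                        Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    start = time.time()
    model.fit(x, y, epochs=args.epochs, batch_size=args.batch_size, verbose=2)
    print("training wall time: %.1f s" % (time.time() - start))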

There is one CPU-specific TensorFlow behavior worth calling out: if you install TensorFlow with pip (as the official tutorials recommend) and start training a model, you will see warnings like the following in the console:
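The warning text itself did not survive in this copy of the article; on a pip-installed TensorFlow 1.x running on CPU, the warnings are roughly of this form (exact wording and instruction names vary by version and machine):

    W tensorflow/core/platform/cpu_feature_guard.cc] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    W tensorflow/core/platform/cpu_feature_guard.cc] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.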

To silence these warnings and benefit from the CPU instruction-set optimizations they refer to, TensorFlow needs to be compiled from source with those instructions enabled, so I created an additional container that does exactly that. When training the model in the new container, the warnings no longer appear, speed goes up, and training time goes down.

Therefore, we can use Google Compute Engine to test three main cases:

Tesla K80 GPU instance.

A 64 Skylake vCPU instance with TensorFlow installed via pip (also tested with 8, 16, and 32 vCPUs).

A 64 Skylake vCPU instance with TensorFlow compiled from source with CPU instruction-set optimizations (also tested with 8, 16, and 32 vCPUs).

Results

For each model architecture and software/hardware configuration, I ran the model training from the provided test scripts and calculated the total training time relative to the training time of the GPU instance. In all cases, the GPU should be the fastest training configuration, and instances with more processors should train faster than those with fewer.

Let's start with handwritten digit classification using the common multi-layer perceptron (MLP) architecture with dense, fully connected layers. The shorter the training time, the better. All configurations below the horizontal dotted line are better than the GPU; all configurations above the dotted line are worse.
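As a point of reference, a minimal Keras MLP of this kind looks roughly like the following (an illustrative sketch, not the exact benchmark model):

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout
    from keras.utils import to_categorical

    # Load MNIST digits and flatten the 28x28 images into 784-dim vectors.
    (x_train, y_train), _ = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
    y_train = to_categorical(y_train, 10)

    model = Sequential([
        Dense(512, activation="relu", input_shape=(784,)),
        Dropout(0.2),
        Dense(512, activation="relu"),
        Dropout(0.2),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=128, epochs=10, verbose=2)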

Here, the GPU is the fastest of all platform configurations, but there are some interesting patterns: performance is similar between 32 vCPUs and 64 vCPUs, and both are significantly faster than 8 or 16 vCPUs when the compiled TensorFlow library is used. Perhaps there is too much coordination overhead between the vCPUs, which cancels out the performance benefit of adding more of them, or perhaps those overheads differ with the CPU instructions TensorFlow was compiled with. In the end it's a black box, which is why I prefer black-box benchmarking of all the hardware configurations over theorizing about them.

Since the difference in training speed between the different vCPU counts is minimal, there is a clear advantage to scaling down the number of vCPUs. For each model architecture and configuration, I calculated the normalized training cost relative to the cost of the GPU instance's training run. Because GCE instance costs are prorated (unlike Amazon EC2), we can compute the cost of an experiment simply by multiplying the total number of seconds the experiment ran by the per-second cost of the instance. Ideally, we want to minimize that cost.
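In code, that normalization is just a couple of multiplications; a minimal sketch (the runtimes below are hypothetical, only the hourly prices come from the pricing discussion above):

    GPU_PRICE_PER_HR = 0.745
    CPU_PRICE_PER_HR = 0.509   # preemptible n1-highcpu-64, from the pricing section

    def normalized_cost(runtime_s, price_per_hr, gpu_runtime_s, gpu_price_per_hr=GPU_PRICE_PER_HR):
        # An experiment's cost is its runtime times the per-second instance price,
        # reported relative to the GPU run's cost.
        cost = runtime_s * (price_per_hr / 3600.0)
        gpu_cost = gpu_runtime_s * (gpu_price_per_hr / 3600.0)
        return cost / gpu_cost   # < 1.0 means cheaper than the GPU run

    # Made-up example runtimes, purely for illustration:
    print(normalized_cost(runtime_s=5400, price_per_hr=CPU_PRICE_PER_HR, gpu_runtime_s=3600))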

The lower the better; for this problem, the lower vCPU counts are clearly more cost-effective.

Now let's look at the same digit classification dataset, this time with a convolutional neural network (CNN):
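A simple CNN of the kind typically used for this dataset looks roughly like this (again an illustrative sketch, not the benchmark's exact model):

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    from keras.utils import to_categorical

    # Same digits as before, but kept as 28x28x1 images for the convolutional layers.
    (x_train, y_train), _ = mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
    y_train = to_categorical(y_train, 10)

    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(128, activation="relu"),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=128, epochs=10, verbose=2)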

The GPU is twice as fast as any CPU configuration on the CNN, but the cost structure is the same as before, except that 64 vCPUs are now less cost-effective than the GPU, and 32 vCPUs actually train faster than 64 vCPUs.

Let's look at CNNs more closely with the CIFAR-10 image classification dataset and a model that uses a deep covnet plus a multilayer perceptron, a typical image classification setup (similar to the VGG-16 architecture).
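The shape of such a model, sketched in Keras (a scaled-down illustration of the covnet-plus-MLP pattern, not the exact benchmark architecture):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

    model = Sequential([
        Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(32, 32, 3)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(64, (3, 3), padding="same", activation="relu"),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(512, activation="relu"),   # the MLP head on top of the conv blocks
        Dropout(0.5),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    # Training on keras.datasets.cifar10 proceeds the same way as in the MNIST sketches above.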

In this case, all the CPU configurations with the compiled TensorFlow library do relatively better than they did in the simple CNN case.

The fasttext algorithm, used here on the IMDb reviews dataset to determine whether a review is positive or negative, classifies text very quickly compared with other methods.
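A fasttext-style model in Keras is essentially an embedding layer averaged over the sequence and fed to a classifier; a minimal sketch (the hyperparameters here are illustrative):

    from keras.datasets import imdb
    from keras.preprocessing.sequence import pad_sequences
    from keras.models import Sequential
    from keras.layers import Embedding, GlobalAveragePooling1D, Dense

    max_features, maxlen = 20000, 400
    (x_train, y_train), _ = imdb.load_data(num_words=max_features)
    x_train = pad_sequences(x_train, maxlen=maxlen)

    model = Sequential([
        Embedding(max_features, 50, input_length=maxlen),
        GlobalAveragePooling1D(),          # average the word embeddings over the review
        Dense(1, activation="sigmoid"),    # positive vs. negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=32, epochs=5, verbose=2)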

In this case, the GPU is much faster than any CPU configuration, and the benefit of reducing the number of vCPUs is far less pronounced. As an alternative, though, the official fasttext implementation is designed for large numbers of CPUs and handles that parallelization much better.

The bidirectional long short-term memory (LSTM) architecture is well suited to text data such as the IMDb reviews, but after my previous benchmark article I noticed that TensorFlow uses an inefficient implementation of the LSTM on the GPU, so the difference here should be even more pronounced.

Wait, what? GPU training of bidirectional LSTMs takes twice as long as any CPU configuration? (To be fair, the benchmark uses the Keras LSTM default of implementation=0, which performs better on CPUs, while implementation=2 performs better on GPUs, but the gap should not have been this large.)
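For reference, a bidirectional LSTM classifier for the same IMDb data looks roughly like this in Keras (an illustrative sketch; implementation=0 was valid in the Keras versions of that era, while newer Keras only accepts 1 or 2):

    from keras.models import Sequential
    from keras.layers import Embedding, Bidirectional, LSTM, Dense

    # Reuses the same padded IMDb sequences as the fasttext sketch above.
    model = Sequential([
        Embedding(20000, 128, input_length=400),
        Bidirectional(LSTM(64, implementation=0)),  # implementation=2 is the GPU-friendly variant
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])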

Finally, the LSTM text generation of Nietzsche's writings follows a pattern similar to the other architectures, but without the drastic hit to the GPU.
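The model behind that benchmark is the classic character-level LSTM; sketched in Keras it looks roughly like this (sequence length and vocabulary size are placeholders):

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    maxlen, n_chars = 40, 57   # window length and character vocabulary size (placeholders)
    model = Sequential([
        LSTM(128, input_shape=(maxlen, n_chars)),
        Dense(n_chars, activation="softmax"),
    ])
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    # Training slides a window over the one-hot encoded source text; new text is then
    # generated by repeatedly sampling the next character from the softmax output.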

Conclusion

As it turns out, 64 vCPUs are not economical for deep learning: current software and hardware architectures cannot make full use of them, so 64 vCPUs end up performing about the same as 32 vCPUs, or even worse. Balancing training speed against cost, training with 16 vCPUs plus compiled TensorFlow seems to be the winner. The 30-40% speedup from the compiled TensorFlow library was a pleasant surprise; I'm surprised Google doesn't provide a precompiled version of TensorFlow with these optimizations.

The cost advantage shown here is only possible with preemptible instances; a regular high-CPU instance on Google Compute Engine costs about 5x as much, which eliminates the cost-effectiveness completely.

One of the main implicit assumptions of the cloud CPU training approach is that you don't need the trained model ASAP. In professional use cases that delay may be too costly, but in personal use cases, where you can simply leave a model training overnight, it is a very cost-effective choice.
