
High-Performance Computing Series, Part IV: FPGAs, GPUs and CPUs in High-Performance Computing

2025-10-25 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

1. Introduction to FPGAs, GPUs and CPUs in High Performance Computing

In recent years, GPUs, traditionally used for image processing, have increasingly been explored for high-performance computing with good results, reaching 5 TFLOPs in single-precision and 1 TFLOPs in double-precision floating-point operations. Today's best GPUs (such as NVIDIA's Tesla K20 and K40) perform very well compared with other multicore processors (such as the Intel Xeon Phi and other processors from IBM and Intel).

FPGAs have traditionally been used for fixed-point operations, but they can now also perform high-performance floating-point computation, with single-precision peaks above 1 TFLOPs. However, peak performance does not represent the sustained performance a device delivers on a given workload. For example, when computing a 2D FFT, Intel's 80-core teraflop processor sustains only 2.73% of its peak performance (20 GFLOPs). FPGAs run at lower clock frequencies and have lower peak performance, but hardware optimization can yield better operational efficiency for specific applications; that is, sustained performance can come closer to peak performance. FPGAs are also more power efficient than GPUs and CPUs.
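The distinction between peak and sustained performance can be made concrete with a small calculation. The following is a minimal sketch, not a benchmark: the two devices and their figures are hypothetical and chosen only for illustration, except that the first one reuses the 2.73% efficiency quoted above.

```python
def sustained_efficiency(sustained_gflops: float, peak_gflops: float) -> float:
    """Fraction of peak performance that is actually sustained (0.0 to 1.0)."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return sustained_gflops / peak_gflops


# HYPOTHETICAL devices: (peak GFLOPs, sustained GFLOPs on one workload).
# Only the 2.73% ratio of the first entry comes from the text.
devices = {
    "high-peak processor": (1000.0, 27.3),
    "lower-peak FPGA design": (300.0, 150.0),
}

for name, (peak, sustained) in devices.items():
    eff = sustained_efficiency(sustained, peak)
    print(f"{name}: peak {peak:.0f} GFLOPs, "
          f"sustained {sustained:.1f} GFLOPs, efficiency {eff:.2%}")
```

In this made-up case the device with the lower peak delivers more sustained throughput, which is the situation the paragraph describes for FPGAs on well-matched applications.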

A given application behaves differently on different platforms, and the results can be evaluated in terms of performance, power consumption, power efficiency, operational efficiency, cost, and other criteria. In this article, we analyze trends in peak performance and energy consumption over time, and compare the sustained performance of the three platforms on some scientific applications to find the best computing platform for a particular application.

2. Peak Compute Performance Trends

2.1 GPU

GPUs were originally designed for image processing and have shown strong advantages in that area. Over the past 10 years, GPUs have gradually been applied to general-purpose computing, commonly referred to as GPGPU. Thanks to their powerful parallel computing capabilities, their performance has long been comparable to that of multi-core CPUs in other computing and analysis tasks.

Looking at the evolution of GPUs over multiple generations, we find that peak performance does not simply increase linearly, for either single-precision or double-precision floating-point operations. Because there are so many different GPU architectures, their overall development cannot be described simply, so only the best-performing GPUs of each year are selected for analysis.

Performance improves by more than 1 TFLOPs per GPU generation. In some years performance improved even though the process technology did not, which indicates that the gains come not only from new manufacturing processes but also from architectural optimization. The performance gap between single precision and double precision has narrowed from about 10 times originally to about 4 times in the latest generation.

In terms of power consumption, GPU power efficiency (the ratio of peak performance to thermal design power, TDP) has also increased steadily, from 0.5 GFLOPs/W to GFLOPs/W for single precision and from 0.5 GFLOPs/W to 6 GFLOPs/W for double precision. GPUs therefore deliver higher performance together with better power efficiency.
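Power efficiency as defined here is simply peak performance divided by TDP. The sketch below illustrates that ratio; the peak and TDP values are hypothetical, picked so that the result lands on the 6 GFLOPs/W double-precision figure mentioned above, and do not describe any specific card.

```python
def power_efficiency(peak_gflops: float, tdp_watts: float) -> float:
    """Peak performance per watt of thermal design power, in GFLOPs/W."""
    if tdp_watts <= 0:
        raise ValueError("tdp_watts must be positive")
    return peak_gflops / tdp_watts


# HYPOTHETICAL card: both numbers are assumptions for illustration only.
ASSUMED_PEAK_DP_GFLOPS = 1500.0
ASSUMED_TDP_WATTS = 250.0

efficiency = power_efficiency(ASSUMED_PEAK_DP_GFLOPS, ASSUMED_TDP_WATTS)
print(f"{efficiency:.1f} GFLOPs/W")
```

Because TDP is a design limit rather than measured power draw, this ratio is best read as a rough basis for comparison and not as measured energy efficiency.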

GPU external memory bandwidth is also very high: the GeForce 6800 offers 35 Gbytes/s, while the K20, K20X and K40 offer 208, 250 and 288 Gbytes/s respectively.

2.2 Multicore CPU

The peak performance of general-purpose CPUs has also improved significantly in recent years. Figure 2 shows the peak performance of some of Intel's well-known CPUs.

Intel's recently introduced Xeon Phi 7120P processor reaches a peak of 2416 GFLOPs in single precision and 1208 GFLOPs in double precision. Intel improves processor performance by increasing the number of CPU cores. The power efficiency of these processors is relatively low compared with GPUs: the original 65 nm CPUs achieved 0.1 GFLOPs/W, while current 22 nm CPUs reach 9 GFLOPs/W in single precision and 4.5 GFLOPs/W in double precision.
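The peak figures for a many-core processor follow from a simple product: cores × clock frequency × floating-point operations per core per cycle. The sketch below applies that formula to the Xeon Phi 7120P. The core count (61), clock (1.238 GHz) and per-cycle throughput (a 512-bit fused multiply-add, giving 16 double-precision or 32 single-precision operations per cycle) are assumptions that do not appear in the article.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPs: every core issues at full rate every cycle."""
    return cores * clock_ghz * flops_per_cycle


# ASSUMED Xeon Phi 7120P parameters (not stated in the article).
CORES = 61
CLOCK_GHZ = 1.238
DP_FLOPS_PER_CYCLE = 16  # 8 doubles per 512-bit vector x 2 (multiply + add)
SP_FLOPS_PER_CYCLE = 32  # 16 floats per 512-bit vector x 2 (multiply + add)

dp = peak_gflops(CORES, CLOCK_GHZ, DP_FLOPS_PER_CYCLE)
sp = peak_gflops(CORES, CLOCK_GHZ, SP_FLOPS_PER_CYCLE)
print(f"double precision: {dp:.1f} GFLOPs")
print(f"single precision: {sp:.1f} GFLOPs")
```

Under these assumptions the results come out close to the 1208 and 2416 GFLOPs quoted above, and the formula makes it visible why adding cores raises peak performance directly.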

Among Intel's processors, a multi-core processor released in 2008 achieves a peak of 1 TFLOPs at a frequency of 3.16 GHz and a voltage of 1.07 V. The memory bandwidth of CPUs and multi-core CPUs is also very high. For example, the Xeon Phi 7120P has a bandwidth of 352 Gbytes/s, slightly higher than that of the most recent GPUs.

2.3 FPGA

The peak computing performance of an FPGA is determined by the multiplier and LUT resources it contains. Looking at some Xilinx products, we find that multiplier and LUT resources do not increase linearly (as shown in Figure 3). The XC7VX980T contains 3600 18×18 multipliers and 612000 LUTs, while the XC7V2000T contains 2160 multipliers and 1221600 LUTs.

To analyze the peak performance of FPGAs, we need to consider three cases: adders only, multipliers only, and multipliers combined with adders. Adders can be implemented with LUTs alone, while multipliers require different combinations of DSP blocks and LUTs (combination structures M0, M1 and M2). A multiply-add is simply a combination of these multipliers and adders.

For double precision, the best adder-only peak performance is 671 GFLOPs, on the XC7V2000T; the best multiplier-only peak performance drops to 168 GFLOPs; and the best performance with combined adders and multipliers is 302 GFLOPs, on the XC7VX980T.
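The three cases can be sketched as a resource-bound estimate: count how many operators fit on the device, then multiply by the clock rate. The device resource counts below are the ones quoted in the text. The per-operator costs and the clock frequency are hypothetical placeholders, and the model uses a single multiplier structure instead of the M0, M1 and M2 variants, so the printed numbers are not expected to reproduce the figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fpga:
    name: str
    multipliers: int  # hard multiplier (DSP) blocks
    luts: int


@dataclass(frozen=True)
class OperatorCost:
    luts: int      # LUTs per floating-point operator (must be > 0)
    dsps: int = 0  # multiplier blocks per operator (0 means LUT-only)


def max_units(fpga: Fpga, cost: OperatorCost) -> int:
    """Number of operators that fit, limited by the scarcer resource."""
    limit = fpga.luts // cost.luts
    if cost.dsps:
        limit = min(limit, fpga.multipliers // cost.dsps)
    return limit


def peak_gflops(units: int, flops_per_unit: int, fmax_mhz: float) -> float:
    """Peak rate if every operator completes one result per clock cycle."""
    return units * flops_per_unit * fmax_mhz / 1000.0


# Resource counts quoted in the text.
DEVICES = [
    Fpga("XC7VX980T", multipliers=3600, luts=612_000),
    Fpga("XC7V2000T", multipliers=2160, luts=1_221_600),
]

# HYPOTHETICAL costs and clock: placeholders, not vendor data.
ADDER = OperatorCost(luts=700)
MULTIPLIER = OperatorCost(luts=300, dsps=10)
PAIR = OperatorCost(ADDER.luts + MULTIPLIER.luts, ADDER.dsps + MULTIPLIER.dsps)
FMAX_MHZ = 400.0

# Each case maps to (cost of one unit, floating-point results per unit).
CASES = {
    "adders only": (ADDER, 1),
    "multipliers only": (MULTIPLIER, 1),
    "adders + multipliers": (PAIR, 2),
}


def best_device(cost: OperatorCost) -> Fpga:
    """Device that fits the most units of the given operator."""
    return max(DEVICES, key=lambda d: max_units(d, cost))


for label, (cost, flops) in CASES.items():
    dev = best_device(cost)
    rate = peak_gflops(max_units(dev, cost), flops, FMAX_MHZ)
    print(f"{label}: best on {dev.name}, about {rate:.0f} GFLOPs")
```

Even with placeholder costs, the model shows why the LUT-rich device wins the adder-only case while the multiplier-rich device wins once multipliers are involved.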

FPGAs have a power efficiency of more than 10 GFLOPs/W, which is generally higher than that of CPUs and GPUs, and it will continue to improve as the technology develops. For example, Altera's high-performance FPGA, which can reach 5 TFLOPs, uses Intel's 14 nm Tri-Gate process and has a power efficiency of 100 GFLOPs/W.

2.4 Trends in peak computing performance

To better understand the relative development of GPUs, CPUs and FPGAs, we compare the best-performing products of selected years in both single-precision and double-precision operations. The results are shown in Figure 6 and Figure 7.

In single-precision floating-point operations, GPU performance has always been far ahead. In 2011, FPGA and CPU performance both improved greatly; CPU performance continued to improve in 2013, whereas FPGA performance fell relative to GPU performance. Until 2013 FPGAs outperformed CPUs, but the arrival of many-core CPUs such as the Intel Xeon Phi in 2013 changed the situation.

For double-precision floating-point operations, GPUs performed poorly in the early years but have led the other two platforms since 2011. CPU performance also overtook FPGA performance after 2011, and by 2013 CPU and GPU performance were very close, differing by only about 5%.
