2025-04-06 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)11/24 Report--
The current state of the data center

With the "new infrastructure" initiative taking 5G, artificial intelligence, and the industrial Internet as new foundational fields, a large number of applications built on high-performance computing, such as machine learning, intelligent voice interaction, and autonomous driving, are emerging one after another. These applications have brought explosive data growth and pose great challenges to the processing capacity of the data center.
Computing, storage, and networking are the troika driving data center development. With the evolution of CPUs, GPUs, and FPGAs, computing power has improved enormously; with the adoption of solid-state drives (SSDs), data access latency has dropped sharply. The network, however, clearly lags behind: its high transmission latency has gradually become the bottleneck of data center performance.
In the data center, about 70 percent of traffic is east-west traffic (traffic between servers). This is typically the working data of high-performance distributed parallel computing, carried over TCP/IP. If the TCP/IP transfer rate between servers increases, the performance of the data center rises with it.
Let's walk through how data travels between servers over TCP/IP, to see "where the time goes" so we can "prescribe the right medicine".
When server A sends data to server B over TCP/IP in the data center, the process is as follows:
1. The CPU copies the data from A's application buffer (App Buffer) into the operating system (OS) buffer.
2. The CPU adds TCP and IP headers to the data in the OS buffer.
3. The data, now carrying TCP and IP headers, is transferred to the network interface card (NIC), which adds the Ethernet header.
4. The NIC sends the frame, which travels across the Ethernet network to server B's NIC.
5. Server B's NIC strips the Ethernet header and transfers the packet to the OS buffer.
6. The CPU strips the TCP and IP headers in the OS buffer.
7. The CPU copies the remaining payload from the OS buffer into the application buffer.
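The copy-heavy path above is exactly what an ordinary TCP socket exercises. A minimal Python sketch over loopback (the 64 KiB payload is illustrative, not the actual data-center workload):

```python
import socket
import threading

PAYLOAD = b"x" * 65536  # hypothetical 64 KiB application buffer

def server(listener, result):
    conn, _ = listener.accept()
    received = bytearray()
    while len(received) < len(PAYLOAD):
        # recv() copies data from the kernel's receive buffer into user space
        chunk = conn.recv(65536)
        if not chunk:
            break
        received += chunk
    conn.close()
    result.append(bytes(received))

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
result = []
t = threading.Thread(target=server, args=(listener, result))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
# sendall() copies the application buffer into the kernel socket buffer,
# where the OS adds TCP/IP headers before handing frames to the NIC
client.sendall(PAYLOAD)
client.close()
t.join()
listener.close()
assert result[0] == PAYLOAD
```

Every byte here crosses the user/kernel boundary twice (once on each host), and all header processing runs on the CPU; that is the overhead RDMA is designed to remove.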
As this walkthrough shows, the data is copied several times between buffers on each server, and TCP and IP headers must be added and removed by the operating system. These operations not only increase transmission latency but also consume substantial CPU resources, which cannot meet the needs of high-performance computing.
So how can we build a high-performance data center network with high throughput, ultra-low latency, and low CPU overhead? RDMA technology can do exactly that.
What is RDMA

RDMA (Remote Direct Memory Access) is a memory access technology that allows a server to read and write the memory of other servers directly and at high speed, without the time-consuming involvement of the operating system and CPU.
RDMA is not a new technology; it has long been used in high-performance computing (HPC). As data centers demand higher bandwidth and lower latency, RDMA is gradually being applied to performance-critical data center scenarios. For example, the Singles' Day transaction volume of a large online mall reached a record high of more than 500 billion in 2021, up nearly 10 percent over 2020. Behind such a huge transaction volume lies a huge amount of data processing; the mall used RDMA to support its high-performance network and keep Singles' Day shopping running smoothly.
Let's look at the trick RDMA uses to achieve such low network latency.
RDMA transfers application data directly from server memory to an intelligent NIC that implements the RDMA protocol in hardware; the NIC itself encapsulates the RDMA transport messages, freeing the operating system and CPU from the work.
This gives RDMA two major advantages:
Zero copy: data does not need to be copied into the operating system's kernel space or have its headers processed there, so transmission latency drops significantly.
Kernel bypass and protocol offload: the operating system kernel takes no part in the data path and no complex header logic runs there, which both reduces latency and saves a great deal of CPU.
There are currently three kinds of RDMA networks: InfiniBand, RoCE (RDMA over Converged Ethernet), and iWARP (RDMA over TCP, the Internet Wide Area RDMA Protocol). RDMA was originally exclusive to the InfiniBand architecture, which guarantees reliable transmission at the hardware level, while RoCE and iWARP are Ethernet-based RDMA technologies.
InfiniBand

InfiniBand is a network designed specifically for RDMA:
- It uses cut-through forwarding to reduce forwarding latency.
- A credit-based flow control mechanism guarantees lossless transmission.
- It requires dedicated InfiniBand NICs, switches, and routers, so its network construction cost is the highest of the three.
RoCE

The RoCE transport layer is the InfiniBand transport protocol:
- RoCE comes in two versions: RoCEv1 runs on the Ethernet link layer and can only be forwarded within an L2 domain, while RoCEv2 carries RDMA over UDP and can be routed across L3 networks.
- It requires RDMA-capable intelligent NICs but no proprietary switches or routers (the switches must support technologies such as ECN and PFC to keep the packet loss rate low), giving it the lowest network construction cost.
iWARP

The iWARP transport layer is the iWARP protocol:
- iWARP implements RDMA on top of TCP in the Ethernet TCP/IP stack and supports transmission across L2/L3 networks. In large-scale deployments, maintaining many TCP connections consumes significant CPU, so iWARP sees little use.
- iWARP requires only an RDMA-capable NIC, with no proprietary switches or routers; its network construction cost sits between InfiniBand's and RoCE's.
InfiniBand is technically advanced but expensive, so its application has largely been limited to HPC. The emergence of RoCE and iWARP lowered the cost of using RDMA and promoted the spread of RDMA technology.
Using these three kinds of RDMA networks in high-performance storage and computing data centers greatly reduces data transmission latency and leaves more CPU available for applications. Among them, InfiniBand delivers extreme performance, with transmission latency as low as 100 nanoseconds, an order of magnitude lower than that of Ethernet devices. RoCE and iWARP bring ultra-high cost-effectiveness: carrying RDMA over Ethernet exploits RDMA's high performance and low CPU usage while keeping network construction costs down. UDP-based RoCE performs better than TCP-based iWARP, and combined with the flow control techniques of lossless Ethernet it overcomes RDMA's sensitivity to packet loss; RoCE networks are now widely deployed in high-performance data centers across many industries.
Conclusion

With the development of 5G, artificial intelligence, the industrial Internet, and other new fields, RDMA technology will be applied more and more widely, and RDMA will become a major contributor to data center performance.
This article comes from the WeChat official account ZTE documents (ID: ztedoc).
© 2024 shulou.com SLNews company. All rights reserved.