2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
In 2015, UCloud was the first domestic cloud vendor to launch a K80 GPU cloud host. Since then, we have launched P40 and V100 GPU cloud hosts, customized physical machines, and GPU-based AI products such as UAI-Train and UAI-Inference to keep creating value for artificial intelligence users. Today we are going a step further and introducing dedicated GPU Availability Zones. Thanks to a streamlined architecture, GPU prices are 20% lower and bandwidth prices 64% lower than in ordinary Availability Zones, and 10G/25G physical networks and VPC private networks are supported. With dedicated performance, rich product interconnection, self-service purchasing, and monthly rental, users can avoid the heavy investment of maintaining their own GPU clusters for AI training.
At present, Fujian GPU Availability Zone A is open to all users and can be ordered directly in the console.
20% lower cost, with monthly payment supported
GPUs are expensive to use. The GPU cards themselves are costly, and power and cabinet costs account for almost 40% of the overall cost; this part can be reduced effectively. For this reason, UCloud selects data centers in China that have low power costs and still meet basic standards to host GPU Availability Zones. The zone launched this time is located in Fujian Province, in a provincial backbone IDC facility that conforms to the international Tier 3 data center standard and provides China Mobile lines.
UCloud's cloud computing core was originally designed for standard Availability Zones, supporting tens of thousands of servers and nearly 100 different cloud services. To improve the overall price/performance ratio, we spent a week customizing it for GPU Availability Zones and launched a new mini version of the core, internally codenamed "ant." The ant core cuts the cost of the cloud control plane by more than 50% while still supporting the complete set of physical cloud host and network products and providing stable service.
Thanks to lower amortized costs for power, cabinets, and the cloud computing core, the unit price of physical cloud hosts in GPU Availability Zones is 20% lower than in other standard UCloud Availability Zones. Taking the V100 physical cloud host as an example, the list price in the GPU Availability Zone is 5,000 yuan/month lower than in Beijing II Availability Zone E, which is also a clear price advantage within the industry. UCloud also offers more cost-effective GPU models to choose from.
The billing model of GPU Availability Zones is consistent with other Availability Zones: physical machines support monthly and annual payment and can be released at any time. Users do not have to make a large one-time investment and can freely grow or shrink their clusters to respond to changes in the market. In addition, the Fujian GPU Availability Zone provides single-line China Mobile networks whose bandwidth costs 64% less than in other Availability Zones.
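As a rough illustration of how the two discounts combine on a monthly bill, here is a minimal sketch. Only the 20% and 64% figures come from the article; the function name and the baseline prices in the usage line are hypothetical placeholders, not UCloud list prices.

```python
# Discounts quoted in the article for the GPU Availability Zone.
GPU_DISCOUNT_PERCENT = 20
BANDWIDTH_DISCOUNT_PERCENT = 64


def gpu_zone_monthly_cost(machines, baseline_machine_price, baseline_bandwidth_price):
    """Estimate the monthly bill (yuan) in the GPU Availability Zone.

    Both baseline prices are what the same resources would cost per month
    in a standard Availability Zone; they are inputs, not published figures.
    """
    machine_price = baseline_machine_price * (100 - GPU_DISCOUNT_PERCENT) / 100
    bandwidth_price = baseline_bandwidth_price * (100 - BANDWIDTH_DISCOUNT_PERCENT) / 100
    return machines * machine_price + bandwidth_price


# Hypothetical baseline: 4 machines at 25,000 yuan/month each,
# plus 1,000 yuan/month of bandwidth.
print(gpu_zone_monthly_cost(4, 25_000, 1_000))
```

The 25,000 yuan baseline is chosen only because a 20% discount on it equals 5,000 yuan; whether that matches an actual list price is an assumption.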
Up to 104 TFLOPS of single-precision floating-point performance on dedicated physical machines
GPU Availability Zones are backed by a mature physical cloud product system. Compute, storage, and network resources run without virtualization overhead, which matters for AI training scenarios where absolute performance is critical.
A single GPU physical machine delivers up to 104 TFLOPS of single-precision floating-point performance, roughly equivalent to 2,000 CPUs. Both 10G and 25G physical networks are available. The 25G network gives higher cluster computing efficiency and is recommended for clusters of 10 or more compute nodes. Overall performance is double that of the GPU cloud hosts offered in ordinary Availability Zones.
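The sizing guidance above can be written down as a small helper. This is a sketch under stated assumptions: the 104 TFLOPS peak and the 10-node threshold come from the article, while the function name and the choice of 10G as the default below the threshold are illustrative only.

```python
PEAK_TFLOPS_PER_MACHINE = 104  # single-precision peak per machine, from the article
RECOMMEND_25G_FROM_NODES = 10  # cluster size at which the article recommends 25G


def plan_cluster(node_count):
    """Return (recommended network, theoretical peak TFLOPS) for a cluster."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Assumption: below the threshold this sketch defaults to 10G; the article
    # only states that 25G is recommended at 10 nodes or more.
    network = "25G" if node_count >= RECOMMEND_25G_FROM_NODES else "10G"
    # Upper bound only: real training throughput is below the summed peak.
    peak_tflops = node_count * PEAK_TFLOPS_PER_MACHINE
    return network, peak_tflops


print(plan_cluster(12))
```

The summed peak is a theoretical ceiling; actual scaling depends on the workload and on how much time the cluster spends communicating over the network.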
Physical cloud host products are highly automated in back-end processes such as resource delivery and intake and system installation, and support multiple images and RAID modes. Once the user clicks in the console, installation runs automatically and completes within 30 minutes, eliminating the lengthy transportation, build-out, deployment, and debugging that traditional physical machines require.
Figure: physical cloud host installation
For the hardware failures that physical machines cannot fully avoid, the UCloud hardware operations team maintains a detailed list of firmware issues and promptly rolls out a fleet-wide firmware upgrade whenever a hidden risk is found. A complete hardware test runs automatically before a physical cloud host is delivered to a user and after the user returns it. In addition, the physical clusters are connected to the UCloud monitoring platform, which detects hardware problems such as disk failures and abnormal GPU card temperatures in advance and notifies the NOC team to handle them quickly (24/7).
The GPU physical cloud gateway consists of two clusters, A and B, that back each other up, and network traffic can be switched smoothly between them. With this architecture, traffic can be moved quickly from the primary gateway to the standby gateway when a failure occurs, minimizing the impact on users, and the network architecture can be upgraded smoothly through cluster switching. Using this approach, the physical cloud cluster in Beijing was upgraded online from a 10G gateway to a 25G gateway; apart from a network interruption during off-peak hours, users noticed nothing. Gateways in future GPU Availability Zones will be able to stay up to date in the same way.
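The A/B pattern described here can be modeled in a few lines. The following is a simplified sketch, not UCloud's implementation: the class and method names are hypothetical, and a real switchover involves routing changes and health checks that are omitted.

```python
from dataclasses import dataclass


@dataclass
class GatewayCluster:
    name: str
    speed_gbps: int
    healthy: bool = True


class GatewayPair:
    """Two gateway clusters: one carries traffic, the other stands by."""

    def __init__(self, a, b):
        self.active = a
        self.standby = b

    def switch(self):
        """Move traffic to the standby cluster and swap the two roles."""
        if not self.standby.healthy:
            raise RuntimeError("standby cluster is not healthy")
        self.active, self.standby = self.standby, self.active

    def handle_failure(self):
        """Mark the active cluster as failed and fail over to the standby."""
        self.active.healthy = False
        self.switch()

    def rolling_upgrade(self, new_speed_gbps):
        """Upgrade the idle cluster, switch traffic, then upgrade the other."""
        self.standby.speed_gbps = new_speed_gbps
        self.switch()
        self.standby.speed_gbps = new_speed_gbps


pair = GatewayPair(GatewayCluster("A", 10), GatewayCluster("B", 10))
pair.rolling_upgrade(25)
print(pair.active.name, pair.active.speed_gbps, pair.standby.speed_gbps)
```

Because only the idle cluster is modified at any moment, traffic is affected only at the single switch in the middle of the upgrade, which is consistent with the brief off-peak interruption mentioned above.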
Figure: physical cloud gateway backup clusters
Rich product connectivity
GPU Availability Zones support standard networking products, including Elastic IP (EIP), VPC, and NAT Gateway. Planned additions include the UDPN high-speed channel and interconnection with the Guangzhou Availability Zones.
In AI training scenarios, moving terabytes of training data to the cloud is a core user need. UCloud provides a shipping service for high-performance NAS enclosures that supports offline migration of up to 100 TB of data. After the transfer is complete, the data on the enclosure is erased with a low-level format to ensure data security.
Distributed training usually requires a large central storage node. GPU Availability Zones currently offer high-performance physical machines with SSD disks as storage nodes. UFS (distributed file storage) will be available later to give users better storage options.
If you have suggestions or questions about the Fujian GPU Availability Zone, please visit http://ucloudtml.mikecrm.com/aiTDtNg.