Cloud change 6: cloud trainers who make AI ubiquitous

2025-10-22 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

As the "Cloud change" series draws to a close, one thing has become clear: whatever the delivery model, IaaS, PaaS or SaaS, the industrial value that cloud services pursue is now inseparable from one characteristic, "AI as a service" (AIaaS).

In this wave, more and more enterprises are looking for ways to integrate AI into their own businesses and products, and countless developers are eager to unleash their creativity on the AI stage. But faced with the huge data sets that deep learning demands, what happens when a self-built data center or a personal computer cannot carry AI's "computing monster"?

Cloud service providers, which already serve as general-purpose infrastructure, have therefore been given a new role: AI trainer.

However fancy AIaaS gets, it cannot do without "training"

With the spread of cloud computing, a variety of AI capabilities now appear across industries in "as-a-service" form. Last year, RightScale's cloud research report pointed out that enterprises are particularly focused on machine learning within the AI technology stack. When asked which types of public cloud service they plan to use in the future, the vast majority of respondents chose machine learning; 12% said they were already using the service, and 46% said they were testing or planning to deploy machine learning services.

At present, AI reaches industry "as a service" mainly in three forms. The first is the chatbot, such as Apple Siri, Microsoft Cortana or Amazon Alexa, which gives a business direct access to an AI experience and frees up manpower once it is integrated. The second is the API: AI models developed by cloud service providers, such as NLP, image classification and video recognition, are exposed through application programming interfaces (APIs) that customers integrate into their own platforms instead of developing from scratch. Face recognition and speech translation are already widely used in this way. The third is the machine learning framework: developers access a machine learning framework in the cloud to build a model and then train it on their own existing data, which is more convenient and faster than building the algorithm and model themselves.

Clearly, all of these mainstream routes to making AI flourish still depend on one link: training.

The vast majority of cloud service providers offer a variety of AI models to help industries become intelligent. However, they cannot reach into every fine grain of each industry, so highly customized training on the customer's own data is necessary if AI is to match actual needs when it is put into practice.

Even when cloud service providers offer comparable platform models for enterprise customers to call, a good model still needs to be scalable and trainable. That is, it must be able to update itself at any time from real data and keep improving its performance in order to truly become a magic weapon for raising quality and efficiency.

From this point of view, AI training services for enterprises and individual developers have become a key capability that the public cloud can hardly bypass.

What does AI training mean to the public cloud?

Today, running deep learning training on the public cloud is an important trend in artificial intelligence, but few cloud service providers are capable of offering cloud training services to enterprises and individual developers.

For example, Amazon has launched AWS Deep Learning Containers, which make it convenient for customers to customize the AI training process. Google and Facebook have also launched training platforms suited to their own deep learning frameworks, TensorFlow and PyTorch respectively. In China, Huawei, Baidu, Alibaba, Tencent and others have likewise brought customized AI training services to the cloud and integrated them into their enterprise service solutions.

Deep learning can hardly do without the support of big data and large-scale training. The two work like wheels on a shared axle, pushing algorithms toward higher performance and higher precision and, in turn, shaping the progress of AI across society. Yet at present only a few public cloud vendors on the market offer such services. Why has cloud AI training remained "spring snow", a rarefied offering available from so few?

A large part of the reason is that training a customized neural network usually requires strong computing power, that is, a GPU cluster. AI computing is still an expensive resource today, and cloud training releases computing resources whenever no training is running, which allows elastic deployment. Service providers charge according to actual computing consumption, so individual developers and enterprises are spared the high cost of buying computing units or building their own data centers, which greatly reduces the cost of putting AI into practice.
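The trade-off between paying per use and buying hardware can be made concrete with a small break-even calculation. The sketch below is illustrative only: the function names and every price in it are hypothetical assumptions, not figures from any vendor or from this article.

```python
import math


def on_demand_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Pay-as-you-go: cost grows only with the GPU-hours actually consumed."""
    return gpu_hours * price_per_gpu_hour


def owned_cost(gpu_hours: float, purchase_price: float,
               running_cost_per_hour: float) -> float:
    """Self-built: purchase price up front, plus power/operations per hour used."""
    return purchase_price + gpu_hours * running_cost_per_hour


def break_even_hours(purchase_price: float, price_per_gpu_hour: float,
                     running_cost_per_hour: float) -> float:
    """GPU-hours beyond which owning is cheaper than renting (inf if never)."""
    saving_per_hour = price_per_gpu_hour - running_cost_per_hour
    if saving_per_hour <= 0:
        return math.inf
    return purchase_price / saving_per_hour


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PURCHASE_PRICE = 10_000.0   # assumed cost of one owned GPU unit
CLOUD_RATE = 2.5            # assumed cloud price per GPU-hour
RUNNING_COST = 0.5          # assumed power/ops cost per owned GPU-hour

pilot_hours = 200           # a short pilot project
print(on_demand_cost(pilot_hours, CLOUD_RATE))                      # 500.0
print(owned_cost(pilot_hours, PURCHASE_PRICE, RUNNING_COST))        # 10100.0
print(break_even_hours(PURCHASE_PRICE, CLOUD_RATE, RUNNING_COST))   # 5000.0
```

Under these assumed numbers a 200-hour pilot is far cheaper on demand, and buying only pays off past 5,000 GPU-hours of real use, which is why releasing idle resources matters for small teams.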

At present, however, users do not have many cloud training platforms to choose from. The main reason is that the market for GPU chips used in neural network training is almost entirely dominated by NVIDIA, which makes it very expensive for a cloud service provider to set up a training platform. Google and Huawei later launched their own large-scale computing units, which has provided some checks and balances in the market. On the whole, though, cloud chips for the training stage still struggle to meet wide-scale deployment needs.

Another factor is that the cloud giants, through their investment and innovation in AI, happen to have the dual ability to supply both basic computing power and application tools. Most enterprises that want AI still have to spend a great deal of time, energy and manpower getting familiar with a deep learning framework, labeling data, tuning parameters, designing fault tolerance, and so on. In a Vanson Bourne survey report on the state of enterprise artificial intelligence, 34% of enterprise IT decision makers said they did not have the right talent to support successful deployment of the technology, and 30% lacked the implementation budget.

For example, when small and medium-sized enterprises use the public cloud for very large-scale AI training, their basic aim is usually to try out and verify new ideas for bringing AI into their industry. Time cost is therefore critical, which calls for a more efficient and scalable deep learning framework and dedicated acceleration. As a result, only a few leading cloud technology giants that are both willing and powerful enough can step in to lower the learning threshold and risk cost of customized training for enterprises.

It is also worth noting that, whether the customer is an enterprise that needs a good financial report or a developer eager to embrace AI, the cloud platform faces a wide variety of training tasks, and the data it receives is likely to come in every imaginable form. Different programs and business models may correspond to different access patterns and storage structures. Storing, processing and analyzing any type of data and finally producing a trained model from it therefore requires the cloud platform to build and manage a data lake that can handle all kinds of structured and unstructured data and feed it all to the neural network. Clearly, when it comes to accumulating data on that scale and breadth, the leading players are better and more completely equipped.

In short, if intelligence is a building under construction, AI training is the forging of its raw materials. It urgently needs a flexible all-rounder who is "on call", crafts each special module on site and then withdraws, rather than someone who processes the material at its place of origin and then ships it to the construction site.

An "engineering team" with this kind of flexible capability obviously holds a key advantage in competing for the market. This is why almost all leading cloud vendors have now begun to offer their cloud training capabilities, even at a loss.

So what does it mean for public cloud vendors to cut into the training services chain and move upstream in AI technology? Should they be "connected to" through algorithm APIs and applications? Should they provide tools and computing platforms to be "integrated"? Or should they move on to "hard power" at the bottom layer, such as chips?

If a cloud's ambition is truly to become the container and infrastructure of the intelligent era and to build a comprehensive, multi-layered AI technology system, then AI training, which combines hardware computing power, software technology and ecosystem development, is a complex and long adventure. It is also a necessary investment and support if China's AI industry is to reach thousands of industries.

On the one hand, cloud service providers need to open up their own computing resources, and to avoid being constrained by others they must push the semiconductor industry to upgrade itself. China's weak spots in particular, such as cloud chips that undertake training tasks, computing units that accelerate and optimize specific deep learning frameworks, and the release of high-precision base models, are all necessary supports for AI training. Upgrading them in step with the development of the cloud service industry is the current trend.

In addition, combining distributed training in the cloud with model deployment on terminals is becoming the full-cycle pattern of AI development. Most proprietary models that enterprises train with public cloud computing power and solutions must be deployed and applied on devices and at the edge, and the cloud platform is often needed for coordination and overall planning throughout this "hard to soft to hard" process. This also makes it possible to build an industrial closed loop from training to application. On the other hand, having Chinese enterprises and developers, together with the key data and innovative applications of various industries, run in a domestic cloud environment is of strategic importance for industrial security at a time when regional sentiment and the wider environment are unstable.

This leads to a new question: what capabilities does a good cloud-based AI training platform need?

The cloud "magic hands" that let AI fly into ordinary people's homes

AI first entered public view with deep learning, as represented by AlphaGo. The role of cloud service providers is to keep "materializing" the elusive technology of the laboratory, turning tools into props, and to use a pair of "magic hands" that combine software and hardware, the virtual and the real, to present AI in striking fashion to industries and the general public.

By looking at these magic hands, we can work backward to the conditions needed to support "cloud training" as AI becomes inclusive:

1. Continuous upgrades in computing performance. Computing power is the basic guarantee of cloud training, and two basic questions are involved. The first is absolute scale, that is, raw hardware computing capability. During training, the data is distributed across many training machines and the results are then reassembled through feedback and flag variables to produce a complete trained model, which poses many challenges for the hardware stack, such as GPU drivers and compatibility between underlying libraries. The second is accuracy. By combining network optimization with hyperparameter tuning, a cloud platform can achieve excellent training results and high-performance models from a small amount of data. For small, medium-sized and micro developers, this has the practical significance of making the impossible possible.
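The idea of spreading data over many training machines and recombining their results can be sketched as synchronous data-parallel training with gradient averaging. This is one common scheme, assumed here for illustration; the article does not say which scheme any particular platform uses. The "workers" below are simulated in a single process, and all names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the toy model y = w * x


def gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def split_evenly(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Give each worker an equal-sized contiguous shard of the data."""
    if len(data) % n_workers != 0:
        raise ValueError("this sketch assumes the data divides evenly")
    size = len(data) // n_workers
    return [data[i * size:(i + 1) * size] for i in range(n_workers)]


def data_parallel_step(w: float, shards: List[List[Sample]], lr: float) -> float:
    """One synchronous step: local gradients, then averaging, then update."""
    # On a real cluster each gradient would be computed on a separate machine.
    local_grads = [gradient(w, shard) for shard in shards]
    # Averaging stands in for the cross-machine aggregation step.
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad


def train(data: List[Sample], n_workers: int,
          lr: float = 0.01, steps: int = 200) -> float:
    """Train the single weight w from 0.0 using simulated workers."""
    shards = split_evenly(data, n_workers)
    w = 0.0
    for _ in range(steps):
        w = data_parallel_step(w, shards, lr)
    return w


# Toy data generated from y = 3x, so training should recover w close to 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_parallel = train(data, n_workers=4)
w_single = train(data, n_workers=1)
print(w_parallel, w_single)
```

With equal-sized shards, the average of the per-worker gradients equals the full-batch gradient, so four simulated workers reach the same weight as one. The hard parts in practice are the ones the text points to: drivers, libraries and communication between machines.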

2. A friendly development environment. Put simply, this means lowering the training cost and learning threshold for developers. One way is to provide easy-to-use development tools and interactive interfaces. For example, the data sets used to train neural networks are often as large as 1 PB, and transmitting that much over a 1 Gbps network takes nearly four months, by which time it is far too late. Some cloud giants can load 1 PB of data into their data centers within 25 hours using new transfer tools, such as Google's Transfer Appliance. There are also automated and visual task management tools that free developers from a great deal of repetitive work, such as one-stop hosting of training tasks that automatically tracks each task's training status and provides log output, so that developers only need to monitor tasks in real time.

The second meaning of friendliness is the compatibility of the cloud platform. There are many deep learning frameworks, and developers need to complete specific training and inference tasks under different ones, so the inclusiveness of the cloud platform is very important. For example, the new AWS containers support different machine learning frameworks such as Google's TensorFlow, Apache's MXNet and Facebook's PyTorch. Huawei's newly released Atlas intelligent computing platform also aims to solve the problems of computing power and compatibility for Chinese enterprises and developers. This means providing targeted optimization and acceleration for each framework so that the training speed of specific models can reach a higher level, which also helps to allay the concerns of enterprise developers.
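The transfer-time figures quoted earlier in this point (1 PB over a 1 Gbps link versus 25 hours with a transfer tool) can be checked with simple arithmetic. The sketch below takes 1 PB as 10^15 bytes, and the 75% link efficiency is an assumption added here to represent protocol overhead and sharing; neither value comes from the article.

```python
PETABYTE_BITS = 8 * 10**15   # 1 PB taken as 10**15 bytes (decimal definition)
SECONDS_PER_DAY = 86_400


def transfer_days(size_bits: float, link_bps: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bits over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bits / (link_bps * efficiency) / SECONDS_PER_DAY


def effective_rate_gbps(size_bits: float, hours: float) -> float:
    """Sustained rate in Gbps implied by moving size_bits in the given hours."""
    return size_bits / (hours * 3600) / 1e9


ideal = transfer_days(PETABYTE_BITS, 1e9)              # full 1 Gbps line rate
realistic = transfer_days(PETABYTE_BITS, 1e9, 0.75)    # assumed 75% efficiency
appliance = effective_rate_gbps(PETABYTE_BITS, 25)     # 1 PB loaded in 25 hours

print(f"ideal: {ideal:.1f} days")                 # ideal: 92.6 days
print(f"at 75%: {realistic:.1f} days")            # at 75%: 123.5 days
print(f"25-hour load: {appliance:.1f} Gbps")      # 25-hour load: 88.9 Gbps
```

Even at full line rate the transfer takes about three months, and at the assumed efficiency it approaches the "nearly four months" quoted in the text. Loading the same data in 25 hours corresponds to a sustained rate of roughly 89 Gbps, far beyond an ordinary network link.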

3. Lower cost and higher efficiency in every scenario. Cost control is the core advantage of cloud training and is indispensable throughout the development process. This means the cloud platform needs reasonable scalability and flexibility, so that enterprises can easily get the AI resources they need and pay for them flexibly and fairly. If a pilot project fails, it can easily be shut down; if it succeeds, its resources can just as easily be scaled up.

In addition, once training on the original scenario data is complete, how to quickly extend the model to other business departments of the enterprise or industry, and to other software and hardware, is a difficult problem that troubles the AI development ecosystem. A development ecosystem in which data is connected centrally, and in which terminals and the cloud work together on complex tasks over a unified intelligent infrastructure, will release far more value in the future.

4. The security of training data in the cloud. Customized training means that enterprises and developers must upload their own critical and sensitive data to the cloud while tasks from multiple "tenants" run at the same time, so secure isolation of data between different training tasks becomes critical. Without it, not only can the accuracy and performance of the model suffer, but the data also faces the risk of leakage during migration, training and storage.

The cloud platform needs to ensure the compliance of its own data handling, so that an algorithm does not fail because of data policy restrictions in local regulations. At the same time, it needs to defend against potential network attacks and adopt digital encryption and other means to keep service invocation fully secure.

In general, cloud training strengthens AI along the dual tracks of software and hardware, and lets it truly adapt to the intelligent needs of thousands of industries in a low-threshold, practical way. At the same time, cloud services still have towering peaks to cross before they can realize the inclusive blueprint of AI that empowers countless industries and reaches every aspect of life. On this era's track, what is needed is not only gorgeous words in promotional messaging, but also the hard grind of sweat and tears.