
The financial industry enters the era of large models: storage and compute infrastructure becomes the key to success

2025-04-06 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Late last year, ChatGPT arrived, impressing users around the world with its powerful and accurate ability to understand and generate natural language.

Since then, industry after industry has joined the race to develop large models, setting off a new round of technological innovation. This is especially true in finance: how to build a new type of storage and compute infrastructure for the large-model era, and how to bring large-model capabilities into the financial domain, have become hot topics for financial institutions.

In which scenarios can financial large models show their strengths?

As a new AI infrastructure, large models have a wide range of applications in the financial industry.

In the front office, intelligent customer service is one of the most common AI applications in finance. Remember Jarvis, the AI butler in the Iron Man movies? A financial large model can greatly raise the professionalism and service capability of account managers while sharply cutting their operating costs, giving every customer a Jarvis-like professional account manager online 24 hours a day.

In the middle office, large AI models have the opportunity to change how financial institutions acquire knowledge, create content, hold meetings and communicate, and develop and test code, improving internal office efficiency, even reshaping the R&D and testing model, and raising institutions' internal operational efficiency across the board.

In the back office, large models will become a standard part of the intelligent technology base, greatly lowering the threshold for applying intelligent technology: with only a small amount of labeled data, intelligent technology can cover a wide range of scenarios.

In short, large AI models excel at content generation and creation, information summarization, knowledge understanding and Q&A, and natural conversational interaction, and thus have broad application prospects in the financial industry.

Ten-thousand-GPU clusters and trillions of parameters: large models have a "high threshold"

Rapid iteration of large models demands efficient compute and storage infrastructure.

On the one hand, computing power is the engine of large models. The capacity of language and vision models, and the corresponding demand for compute, are expanding rapidly. Behind every large financial model stands enormous computing power. Measured in PetaFLOP/s-days (PD), the total compute of a machine sustaining one quadrillion (10^15) floating-point operations per second for a full day, training a large model requires hundreds or even thousands of PD, which also means enormous computing cost.
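The PD unit can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only: it assumes an NVIDIA A100 at 312 TFLOPS (BF16 peak) sustaining 40% utilization, both hypothetical round numbers.

```python
# Back-of-envelope conversion: how many GPU-days does a training budget
# expressed in PD imply? Assumptions are hypothetical, for illustration:
# an A100 at 312 TFLOPS (BF16 peak) sustaining 40% utilization.
PEAK_TFLOPS = 312
UTILIZATION = 0.40
PFLOPS_PER_GPU = PEAK_TFLOPS * UTILIZATION / 1000  # PetaFLOP/s per GPU

def gpu_days(pd_budget: float) -> float:
    """GPU-days needed to spend `pd_budget` PD (1 PD = 1 PFLOP/s for one day)."""
    return pd_budget / PFLOPS_PER_GPU

# The article cites ~3,640 PD for one GPT-3 training run:
needed = gpu_days(3640)
print(f"~{needed:,.0f} GPU-days")                        # ~29,167 GPU-days
print(f"~{needed / 1000:.0f} days on a 1,000-GPU cluster")
```

Under these assumptions, a single GPT-3-scale run would occupy a 1,000-GPU cluster for about a month, which is why the article speaks of "huge computing cost".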

Computing power is the core element of large model development

For example, GPT-3, launched by OpenAI in 2020, required at least ten thousand A100 GPUs, with one full training run consuming about 3,640 PD of compute. Likewise, the "Yuan" Chinese language model launched by Inspur Information has nearly 250 billion parameters and consumed roughly 4,000 PD. Today, the compute equivalents of GPT-4 and PaLM-2 are reportedly dozens of times that of GPT-3, and Google is developing its next-generation multimodal large model, Gemini, reportedly trained with about five times the compute of GPT-4.

The rapid rise in AI compute consumption, set against limited IT budgets, leaves most financial institutions in a dilemma: they want to build large models but lack the resources, face heavy cost pressure, and are short of talent; yet if they do not build them, they can only watch the opportunity pass by.

Here, divide and conquer may be a feasible approach: split large models into general-purpose models and industry models. Financial institutions need not build their own general-purpose models; instead, they can build on a third party's general-purpose model and focus their effort on an industry large model. According to the "Research Report on the Standard System and Capability Architecture of Industry Large Models" issued by the China Academy of Information and Communications Technology (CAICT), general-purpose large models lack professional knowledge and industry data, and their construction and training costs are so high that commercial deployment is difficult. Industry large models emerged to better solve industry-specific problems: they can meet the needs of specific scenarios, deliver higher-quality services to the industry, and drive its intelligent transformation and upgrading.

Guo Lei, an AI server product expert at Inspur Information, put it this way: "Financial institutions can concentrate their resources on the industry large model: not 'digging a ditch one meter deep across a kilometer of ground', but 'digging a kilometer deep in one meter of ground'."

Four stages of large model training

Specifically, the first stage of large model training is unsupervised pre-training. The training cycle often lasts from tens of days to several months, demands thousands of GPUs computing simultaneously, involves enormous computation, and produces the base language model. Financial institutions can acquire this base capability through open-source models or third-party cooperation (such as Inspur Information's "Yuan" model). The second through fourth stages (supervised fine-tuning, reward model training, and reinforcement learning) require dozens or at most a few hundred GPUs at a time, and their compute consumption and training time are far below those of the first stage, so financial institutions can run these three stages themselves to build a large model with financial-industry strengths.
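The gap between stage one and the rest can be made concrete with illustrative numbers. The card counts and durations below are hypothetical values chosen inside the ranges described above, not measurements.

```python
# Illustrative (not measured) GPU-hour comparison across the four training
# stages. Card counts and durations are hypothetical values chosen inside
# the ranges described in the text.
stages = {
    "1. unsupervised pre-training": (2000, 60 * 24),  # 2,000 cards x 60 days
    "2. supervised fine-tuning":    (100,  5 * 24),   # 100 cards x 5 days
    "3. reward model training":     (50,   3 * 24),   # 50 cards x 3 days
    "4. reinforcement learning":    (200,  7 * 24),   # 200 cards x 7 days
}
total = sum(cards * hours for cards, hours in stages.values())
for name, (cards, hours) in stages.items():
    print(f"{name}: {cards * hours:>10,} GPU-hours ({cards * hours / total:.1%})")
```

Under these assumptions, stages two through four together account for less than 2% of total GPU-hours, which is why it is feasible for a financial institution to run only those stages itself.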

On the other hand, compute alone is far from enough for large models; they also depend on data scale and data quality.

The strength of large models lies in their ability to collect, extract, and analyze massive amounts of information, a capability humans can hardly match.

Evolution of parameter scale of large model

In recent years, the parameter counts of general-purpose large models have grown rapidly. OpenAI released the Gym reinforcement learning platform in 2016; GPT-1 arrived in 2018 with 117 million parameters, and after continuous iteration GPT-4 is reported to have reached 1.76 trillion parameters. Since Google released the Transformer architecture in 2017 (65 million parameters in the base model), BERT (2018, about 300 million parameters) and T5 (2019, 11 billion parameters) have followed, with parameter scales climbing steadily. Recently, Google released the generalist model PaLM-E, at 562 billion parameters the largest vision-language model to date.
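The scale of this growth is easy to quantify from the figures cited above. The arithmetic below uses the parameter counts as reported in the text, not independently verified.

```python
# Simple arithmetic on the parameter counts cited above (figures as
# reported in the text, not independently verified).
gpt1 = 117e6        # GPT-1, 2018
gpt4 = 1.76e12      # GPT-4 figure cited in the text
print(f"GPT-1 -> GPT-4: ~{gpt4 / gpt1:,.0f}x more parameters")

transformer = 65e6  # original base Transformer, 2017
palm_e = 562e9      # PaLM-E
print(f"Transformer -> PaLM-E: ~{palm_e / transformer:,.0f}x")
```

A roughly four-orders-of-magnitude jump in five to six years, which is what drives the infrastructure demands described in this article.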

In vertical industries, the dataset for a financial large model must add professional content such as financial research reports, equities, banking, and insurance on top of the general-purpose corpus. By adding large volumes of financial dialogue data during training and performing finance-specific pre-training and tuning, its performance in the financial vertical can be improved.

At the same time, multimodal and cross-modal data are becoming the norm, and the data types feeding financial large models are growing richer. Unsupervised data, i.e. raw data, may come as web pages, text, or speech; supervised data, i.e. labeled data, may come as JSON or query-answer pairs. In addition, to offer investors services such as real-time market sentiment and risk forecasting, financial institutions must efficiently handle financial data such as industry news, stock trades, and even social media comments. Traditional centralized storage struggles with the new requirements and characteristics of this huge, multimodal, real-time financial data; a flexible and scalable new distributed storage architecture is needed to support it.
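The supervised (labeled) format mentioned above is typically a record pairing a query with a reference answer, often serialized one JSON object per line (JSONL). A minimal hypothetical example for the financial domain; the field names "query", "answer", and "domain" are illustrative, not a standard schema.

```python
import json

# Hypothetical supervised (labeled) training record for a financial large
# model. Field names ("query", "answer", "domain") are illustrative only.
record = {
    "query": "Summarize the main risks flagged in this bank's Q3 report.",
    "answer": ("The report flags rising credit risk in real-estate lending "
               "and tighter net interest margins."),
    "domain": "banking",
}

# One record per line (JSONL); round-trip to confirm the record survives.
line = json.dumps(record, ensure_ascii=False)
restored = json.loads(line)
print(restored["domain"])  # -> banking
```

Unsupervised (raw) data, by contrast, would be plain text, web pages, or audio with no such query-answer labeling.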

It is clear, then, that as financial large models evolve, the architecture of the entire data center will change, and the full-stack solution, from AI servers to storage to networking, must meet the demands of the large-model era.

How can infrastructure store reliably, compute quickly, and transmit steadily?

Only when data can be stored reliably, compute can run fast, and the network can transmit steadily can digital infrastructure fully unlock the value of data as a production factor, drive the deployment of large-model applications, and foster the growth of new business forms.

To this end, Inspur Information, guided by its intelligent computing strategy, is driving product innovation across four fronts, compute, algorithms, data, and interconnect, to build a strong foundation for large models.

On the compute side, through hands-on innovation with hundred-billion-parameter large models, Inspur Information has built a leading large-model compute solution spanning compute cluster construction, compute scheduling and deployment, and algorithm and model development, assisting large model training and development. Its latest-generation converged-architecture AI training server, the NF5688G7, adopts GPUs based on the Hopper architecture, delivers nearly 7x the measured large-model performance of the previous-generation platform, and supports the latest liquid-cooling solution for a lower cluster energy consumption ratio and operating cost, with PUE below 1.15. Taking a 4,000-GPU computing center as an example, it can save 6.2 million kilowatt-hours of electricity and 1,700 tons of carbon emissions per year.
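The cited 4,000-GPU example can be broken down per card. The totals are the article's figures; the per-card and average-power breakdown below is illustrative arithmetic only.

```python
# Sanity check on the cited example: a 4,000-GPU computing center saving
# 6.2 million kWh per year. Totals are the article's figures; the
# per-card breakdown is illustrative arithmetic only.
cards = 4000
kwh_saved_per_year = 6_200_000
hours_per_year = 365 * 24  # 8,760

per_card_kwh = kwh_saved_per_year / cards          # kWh saved per card/year
avg_watts = per_card_kwh * 1000 / hours_per_year   # average power reduction
print(f"~{per_card_kwh:,.0f} kWh saved per card per year")   # ~1,550
print(f"~{avg_watts:.0f} W average reduction per card")      # ~177 W
```

That is, the claimed savings correspond to shaving roughly 177 W of continuous draw per GPU slot, a plausible order of magnitude for moving from air to liquid cooling.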

On the storage side, Inspur Information's generative AI storage solution uses a single set of AS13000 converged storage to support every stage of generative AI, offering all-flash, hybrid-flash, tape library, and optical media, and supporting file, object, big data, video, and block protocols. Matched to the five stages of AIGC data processing (data collection, preparation, training, inference, and archiving), Inspur Information supports end-to-end data flow from the same storage system, meeting the storage and processing needs of multimodal data such as text, audio, images, video, and code.

Inspur Information storage products

At the cluster's high-speed interconnect level, Inspur Information achieves full-line-rate networking across the whole cluster based on native RDMA and optimizes the network topology, effectively eliminating communication bottlenecks in hybrid computing and keeping the cluster in its best state throughout large model training.

At present, large state-owned banks, joint-stock banks, and some city commercial banks have launched or plan to launch R&D of financial large models, and AI compute and data infrastructure will see rapid growth. According to IDC, China's intelligent computing capacity will grow at a 52% compound annual rate over the next five years, and distributed storage will grow at twice the rate of the overall Chinese market. In the large-model era, financial institutions need to start from AI scenarios and architecture, combined with each bank's own data characteristics, to build a new generation of intelligent computing infrastructure.
