

Experts explore the opportunities of the AGI era together, as Tencent Cloud helps accelerate the large-scale application of large models.

2025-03-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

In 2023, nothing in the technology world has drawn more attention than large models. Since the advent of ChatGPT opened the era of large models and generative AI, domestic large models have quickly followed, moving from technology to product to business and reaching deep into vertical industries.

The outbreak of new technology gives rise to new application scenarios and product models, leveraging intelligent change across entire industries. Amid this sweeping trend, what opportunities and challenges will practitioners and entrepreneurs face, and how can they break through and usher in the AGI era?

Recently, the Tencent Cloud TVP AI Innovation Seminar, themed "Opportunities and Challenges in the Large Model Era," was held at Tengyun Building in Shanghai. Top AI leaders were invited to share and discuss hot topics around large models and jointly explore where the large model era is heading.

Large Models: Technology, Value, Ecology

Zhang Jiaxing, a lecturer at the Center for Cognitive Computing and Natural Language Research at IDEA Research Institute and a Tencent Cloud TVP expert, gave a talk titled "Large Models: Technology, Value, Ecology."

Recounting the birth of the GPT models, and drawing on more than ten years of research experience in deep learning, Zhang Jiaxing explained the trends behind the technology's development along four main lines: model structure, training technology, compute and systems, and data, focusing on several key milestones:

● Model structure innovation: the rise of deep learning drove innovation in model structure, in which the Transformer architecture played a key role. It broke through the 100-million-parameter bottleneck, unified the many attention mechanisms then being tried, and solved the difficult problem of task design.

● Training technology breakthrough: the landmark event was the 2018 BERT model. Zhang Jiaxing argued that model structure is the physical foundation, while training technology is what gives artificial intelligence its specific capabilities.

● Progress in compute and data: the underlying chips have continued to advance, with performance improving by more than 100x.
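As an illustration of the attention mechanism that the Transformer unified, here is a minimal NumPy sketch of scaled dot-product attention (the shapes and names are illustrative, not from the talk):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                     # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 8))   # 3 query positions, dimension 8
K = rng.standard_normal((5, 8))   # 5 key/value positions
V = rng.standard_normal((5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
# out has one row per query; each row of w is a probability distribution over keys
```

Production Transformers add multiple heads, learned projections, and masking on top of this core operation.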

Zhang Jiaxing pointed out that any major technological paradigm shift is a disappearance of types, a process of convergence toward unification, and the large model is exactly such a shift. After ChatGPT emerged, model structures converged; next they will quickly diverge again, the whole technical field will re-divide its labor, and a new production chain will form. This change marks the large model's emergence as a new industry.

Throughout this paradigm shift, the direction of the models developed by Zhang Jiaxing's team has also changed, from the initial Fengshenbang ("divine list") series to the Ziya (Jiang Ziya) series of expert models. Zhang Jiaxing explained that building a single full-capability large model poses real challenges, since different abilities may conflict with or be incompatible with one another. Splitting each ability into an independent model lets each one focus on developing that ability, and customizing targeted training strategies achieves the best performance for each.

Zhang Jiaxing believes that in the competitive landscape of the "war of a hundred models," exploring training techniques is extremely important. He stressed that training is itself a process of exploration: finding good generation methods during training and guiding the model's development through learning from human feedback.

On large model application products, Zhang Jiaxing proposed the idea of layer-by-layer encapsulation from expert model to client:

The first layer is integrated packaging: bundling code models with fine-tuning and application tools and efficient inference tools, covering a variety of usage scenarios.

The second layer is the integration of model and computing power: Zhang Jiaxing is working with Tencent Cloud to actively combine model and computing power in a single large model product, giving customers an "out of the box" experience.

The Paradigm and Thinking of Technological Innovation in the AGI Era

Li Jianzhong, chief technical expert of Boolan, chairman of the Global Machine Learning Technology Conference, and Tencent Cloud TVP expert, delivered a keynote speech titled "The Paradigm and Thinking of Technological Innovation in the AGI Era."

Li Jianzhong first traced the timeline of technological development from an industry perspective. He believes that connectivity and computing have each undergone a revolutionary shift from 1.0 to 2.0. The hundred years from 1840 to 1940 were the era of Connection 1.0: after the telegraph, the earliest connection technology, came the telephone, radio, and television. The first generation of computers appeared in 1946, followed by mainframes, minicomputers, microcomputers, and PCs. Then, with the advent of the Internet in 1995, Web 2.0, the mobile Internet, and cloud services emerged; this is the era of Connection 2.0, in which connection changed from one-way to two-way. With the appearance of the Transformer architecture in 2017 and the iteration of GPT came the era of Computing 2.0, which will continue; judging from the curve of past technological development, Li Jianzhong believes this era will last until around 2035.

Li Jianzhong also observed that technological development swings like a pendulum between connection and computation. On the relationship between the two, he argues that connection addresses relations of production, while computing addresses productivity. The logic of the connection model is to provide information for users to make decisions, the natural soil of advertising; the logic of the computing model is for users to provide data to the machine to help it make decisions, and its business model tends toward charging fees. Under computing logic, efficiency comes first and results are paramount.

Li Jianzhong proposed a "cube" model of paradigm shifts: the X axis represents human needs, such as information, entertainment, search, social networking, and business; the Y axis represents the technology platform, that is, Connection 1.0, Computing 1.0, Connection 2.0, and Computing 2.0; and the Z axis represents the interaction medium, such as text, images, audio, video, and 3D. He believes the intersection of demand and technology is the key to innovation, while emphasizing the impact of changes in medium on products and innovation. In the intelligent era, filling different quadrants represents different directions, such as combining large models with different fields, which provides new ideas for innovation and product development.

Based on this, Li Jianzhong summarized four core capabilities of large models:

● Content generation: the most mature and powerful part; it can generate all kinds of content.

● Knowledge abstraction: compressing human knowledge, bringing innovation to knowledge-intensive industries.

● Language interaction: the core of human-machine dialogue, with huge room for imagination.

● Logical reasoning: the capacity for logic, planning, and memory, enabling embodied intelligence.

With large models' core capabilities as the fulcrum, what innovation opportunities arise from combining them with different fields? Li Jianzhong proposed two main directions at the application layer: AI-Native and AI-Copilot. AI-Native products or services are built entirely around AI, with high risk and high return. AI-Copilot, by contrast, embeds AI capabilities into an existing commercial loop, extending and remaining compatible with existing infrastructure in a progressive way.

Likewise, in the software field, Li Jianzhong shared three paradigm shifts that large models bring to software development:

● Development paradigm: large models will change how code is written, from engineers writing code to AIGC generating code.

● Interaction paradigm: from graphical user interfaces (GUI) to natural language user interfaces (NUI), including NUI+GUI collaboration, transforming the structured-input intermediate steps of workflows, and removing barriers between isolated applications to achieve seamless integration of applications and services.

● Delivery paradigm: users co-create malleable software; this openness will greatly broaden the functional scope of software.

Li Jianzhong believes that within the next three to five years, the maturity of the AGI industry will reach a new height, bringing great opportunities for innovation.

Using ubiquitous hardware computing power and open software to unlock generative artificial intelligence

Dai Jinquan, Intel Fellow, global CTO of big data technology, and Tencent Cloud TVP expert, gave a talk titled "Using Ubiquitous Hardware Computing Power and Open Software to Unlock Generative Artificial Intelligence."

Dai Jinquan first shared the Intel team's work in generative AI. He noted that among the many factors affecting generative AI, computing power is a crucial supporting factor, and Intel has focused on improving the efficiency of end-to-end AI pipelines and optimizing AI acceleration.

By combining software and hardware, Intel has improved the speed of AI deep learning, even achieving what amounts to a free software AI accelerator. On accelerating generative AI computation, Dai Jinquan noted that the data center side is the key, strongly supporting large model training and very-large-scale inference.

With Intel's recently released Gaudi2 deep learning accelerator, the team worked with Hugging Face to optimize models. Intel has also added Intel AMX to its server processors, consisting of two parts: a 2D register file and matrix acceleration support. Dai Jinquan noted that the advantage is achieving hardware acceleration on general-purpose CPU servers, which makes sense in general computing scenarios.

Addressing the industry's need to secure user data stored in the cloud and privately deployed large models, Dai Jinquan shared that full-link privacy protection can be achieved through hardware protection and software security technology: data and models remain invisible to other users throughout computation and are processed only inside a hardware-protected environment, which ensures security while approaching the efficiency of plaintext computing.

To realize the vision of ubiquitous AI, Intel recently open-sourced an INT4-based large model inference library for Intel CPUs, which supports models with more than 10 billion parameters on Intel hardware. Dai Jinquan introduced and demonstrated its features:

● Supports multiple low-bit formats such as INT3, INT4, NF4, and INT8

● Easy to use and migrate: it can accelerate any large PyTorch-based model and enables efficient optimization

● Compatible with code commonly used in the community: existing applications can be migrated with one or two lines of API changes
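The low-bit formats listed above all share one basic idea: store weights at reduced precision and dequantize them on the fly. As a hedged, library-agnostic sketch (not the actual Intel API), symmetric per-tensor INT4 weight quantization can be illustrated in plain NumPy:

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    scale = np.abs(w).max() / 7.0                  # map the largest weight to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from 4-bit codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(42)
w = rng.standard_normal((64, 64)).astype(np.float32)   # a toy weight matrix
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()
# rounding error is bounded by half a quantization step (scale / 2)
```

Real INT4 inference libraries additionally pack two 4-bit codes per byte and quantize per-group or per-channel to reduce error; the sketch above shows only the core mapping.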

Finally, Dai Jinquan expressed his expectation for the seamless scaling of large model applications from PCs to GPUs to the cloud, a new application scenario worth exploring together.

Facing large models, how to build the most powerful computing cluster on the cloud

Qi Yuanhai, head of high-performance computing R&D at Tencent Cloud, gave a talk titled "Facing Large Models, How to Build the Most Powerful Computing Cluster on the Cloud."

Qi Yuanhai first introduced deep learning and distributed AI training. He noted that to cope with the huge corpus datasets and sharp growth in model parameters in large model training, distributed computing is needed, and he shared several distributed computing schemes used in current large model training:

● Data parallelism: the dataset is split and sent to each GPU; each GPU computes its own gradients, which are then globally synchronized to update the model parameters

● Model parallelism, pipeline parallelism: the model is split by layer, and different parts are assigned to different GPUs for computation, gradient calculation, and transfer

● Model parallelism, tensor parallelism: the model is split more finely, with the parameter weight matrices partitioned horizontally or vertically

In addition, there is expert parallelism, in which the model is composed of multiple expert subnetworks and inputs are routed to different experts for computation.

Qi Yuanhai noted that distributed computing makes full use of the computing resources of multiple GPUs, speeds up training, and solves the problem of insufficient memory on a single GPU. Different methods suit different scenarios and model structures, and choosing an appropriate parallel strategy improves training efficiency and performance.
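Of the schemes above, data parallelism is the simplest to illustrate. The toy NumPy sketch below (illustrative names, not any particular framework's API) shows why averaging per-shard gradients over equal-sized shards reproduces the single-device full-batch gradient:

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error 0.5 * mean((Xw - y)^2) for a linear model."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3))
y = X @ np.array([1.0, -2.0, 0.5])         # targets from a known linear model
w = np.zeros(3)                            # shared (replicated) parameters

# Each "GPU" holds one equal shard of the batch and computes a local gradient.
shards = np.array_split(np.arange(8), 4)   # 4 workers, 2 samples each
grads = [local_gradient(w, X[idx], y[idx]) for idx in shards]

g_allreduce = np.mean(grads, axis=0)       # all-reduce: average the local gradients
g_single = local_gradient(w, X, y)         # single-device reference gradient
# g_allreduce equals g_single, so every replica applies the same update
```

This equivalence is why data-parallel training needs only one collective (the gradient all-reduce) per step, while pipeline and tensor parallelism must also exchange activations, making them much more bandwidth-sensitive.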

Distributed training places high demands on network communication. The industry mostly uses 3D parallelism, combining the methods above, and in 3D-parallel scenarios throughput is especially sensitive to bandwidth. To keep the network from becoming the bottleneck of computation during training, inter-machine communication bandwidth needs to reach 1.6 Tbps.

To meet these challenges, Tencent Cloud launched its AI computing base, the high-performance computing cluster HCC, which can be widely used in AI model training scenarios such as large models, autonomous driving, business recommendation systems, and image recognition. It has the following advantages:

● High-performance GPUs: provide powerful computing power

● Low-latency RDMA network: node-to-node interconnect latency as low as 2 µs, with bandwidth of 1.6 Tbps to 3.2 Tbps

● GPUDirect RDMA: GPU compute data needs no detour through host memory and is transferred point-to-point directly between machines

● TACO training acceleration kit: improves AI training performance with one click

Tencent Cloud's first H800 computing cluster uses a multi-rail traffic architecture, which greatly reduces unnecessary data transmission and improves network performance, putting it in a leading position in the industry.

Beyond hardware, Tencent Cloud also provides TCCL, a self-developed collective communication library. Thanks to the self-developed switch architecture, TCCL achieves coordination between endpoints and the network, solves uneven traffic load, and can raise throughput by about 40% in dual-port environments. It also offers topology-aware affinity scheduling to minimize traffic detours, dynamically sensing the network and assigning tasks in the optimal order to avoid congestion of communication data.

Qi Yuanhai mentioned that Tencent Cloud's solutions all adopt a dual-connected network design, which offers higher availability than single-port training. For data storage, Tencent Cloud provides the Turbo CFS file storage solution and the COS solution, improving data access performance through multi-level acceleration.

Meanwhile, to improve users' compute utilization, Tencent Cloud launched the TACO Kit acceleration suite, which reduces back-and-forth data movement and speeds up parameter updates through unified management of host memory and GPU memory. There is also TACO Infer, which accelerates inference transparently and gives users a better experience.

Qi Yuanhai concluded that the Tencent Cloud HCC high-performance computing cluster solution helps users complete each training task quickly and continuously across data reading, training computation, and network exchange, providing end-to-end support for users' training on the cloud.

Discussion and Debate Session

After the talks, the host Shen Xin, a technical expert at the Low-Code/No-Code Promotion Center of the China Academy of Information and Communications Technology and a Tencent Cloud TVP expert, gave a summary. He noted that the core impact of large model development is a change in production relations. Take the question "will programmers disappear?": programmers can be compared to coachmen in the horse-drawn carriage era; people still drive today, but the coachman's role was eliminated by the car driver's. The software development industry will likewise be reshaped by AI, an evolving challenge for future programmers.

A lively discussion and debate followed. The host Shen Xin put forward four open topics and two debate topics; the guests discussed each topic in groups and exchanged many brilliant viewpoints in heated debate.

Topic 1: as large models develop, what kind of AI ecosystem will form in the future, and how will it affect the IT industry landscape?

Su Zhenwei, the speaker from the second group, founder and chief architect of Shengpai Network and Tencent Cloud TVP expert, proposed that AI will reshape the entire software industry's ecosystem and business models, including the current form of software applications, Internet operating models, and how users pay. As AI further boosts productivity, enterprises' staffing needs will change greatly in the future, and programmer headcount will shrink to some extent.

Su Zhenwei further concluded that AI will affect future business and work in three major ways: AI drives changes in production efficiency, affecting productivity and production relations; it changes how knowledge is acquired and used, improving efficiency; and AI will become part of assets, making issues such as data rights worth attention.

Topic 2: what are the differences and advantages of private deployment versus cloud deployment of AI computing power, and which scenarios suit each?

Ding Xuefeng, the speaker from the third group, a researcher at Meituan's financial services platform and Tencent Cloud TVP expert, compared private and cloud deployment of AI computing from the perspectives of cost, security, and flexibility.

● Cost: for small and medium-sized enterprises, cloud deployment better fits current needs to cut costs and improve efficiency, in terms of both hardware investment and maintenance

● Security: some industries, such as finance, have extremely high security and compliance requirements, for which private deployment is more suitable

● Flexibility: the public cloud can provide computing power on demand as well as one-stop solutions for mature scenarios. Users can choose the approach that fits their actual needs, and cloud deployment is the stronger recommendation in scenarios where security and compliance requirements are met.

Topic 3: how should enterprises measure the value of AI, how can cost structure and value be quantified, and what cases exist across different businesses?

Xu Wei, the speaker from the fourth group and Tencent Cloud TVP expert, proposed five evaluation dimensions: whether AI creates value for the enterprise, saves costs, improves productivity, improves customer satisfaction, and drives business growth. He added that different enterprises and industries face different challenges and goals, so evaluating AI's value requires weighing each one's specific circumstances and objectives.

On ToB versus ToC scenarios: in the ToB field, intelligent customer service, digital humans, AI knowledge bases, and enterprise training have already been adopted by many companies; in the ToC field, AI content generation is the mainstream application.

On AI's cost composition, Xu Wei believes it mainly includes computing costs, AI technology development and maintenance costs, and AI product operation and promotion costs.

Topic 4: amid the large model craze, what innovation opportunities exist for large companies and startups respectively?

Li Jianzhong, the speaker from the first group, chief technical expert of Boolan, chairman of the Global Machine Learning Technology Conference, and Tencent Cloud TVP expert, believes that from the perspective of data advantages, current AI innovation favors large or mature companies, but from an open source perspective it is friendlier to startups.

Using the product development model, Li Jianzhong explained that the AI-Native model better suits startups, which bring a fresh starting point and mindset to new things, and some startups' investment is no weaker than that of large companies.

Debate topic 1: will open source or closed source be the mainstream for future large models?

Li Jianzhong, the speaker from the first group, chief technical expert of Boolan, chairman of the Global Machine Learning Technology Conference, and Tencent Cloud TVP expert, took the open source side. He first defined "mainstream": whatever has the most users is the mainstream. He argued that compared with closed source, open source can achieve good standardization at the edge layer and the model layer, and can pool the efforts of the whole industry to optimize at a single point, bringing in more resources and investment.

Su Zhenwei, a speaker from the second group, founder and chief architect of Shengpai Network and Tencent Cloud TVP expert, took the closed source side and rebutted that definition of "mainstream." He held that the true mainstream is whatever can really drive change across the entire industry and form a lasting commercial cycle with a healthier ecosystem, citing the closed-source GPT-4 as an example. He stressed that a large model comprises both the model itself and its data sources, so open-sourcing algorithms and results does not amount to open-sourcing the large model, pointing to the various restrictions on Llama 2 as examples. Su Zhenwei believes some of today's so-called open source frameworks are used as marketing tools, which runs against the true spirit of open source.

Li Jianzhong of the open source side then made a pointed rebuttal, first challenging the other side's "open source as marketing" claim and emphasizing that open source is an ecological revolution. On the GPT-4 example, he argued that it originally grew out of Google's open source work, and that OpenAI is also preparing to open-source.

Su Zhenwei of the closed source side added that he does not deny open source's ecological revolution, but in reality many open source releases are commercial moves to seize market share under competitive pressure. He also noted that knowledge sharing does not equal open source.

Debate topic 2: which is more promising, the general large model track or the vertical large model track?

Ding Xuefeng, the speaker from the third group, a researcher at Meituan's financial services platform and Tencent Cloud TVP expert, is more optimistic about the general large model track. He believes that from a larger, longer historical perspective, the development of general large models is inevitable, and the limitations of vertical large models can be avoided at the application layer. As general large models keep expanding their learning scope, today's vertical fields will eventually be covered.

Xu Wei, the speaker from the fourth group and Tencent Cloud TVP expert, is more optimistic about the vertical large model track and argued from three angles: from the business model perspective, vertical large models have rich application scenarios, can actually be deployed, and have a validated business model; from the cost perspective, the computing cost of general large models is extremely high, while vertical models' costs are more controllable; and from the data perspective, training data is critical, and while general large models require massive data from highly restricted sources, vertical knowledge bases are more attainable.

Ding Xuefeng of the general large model side countered that the importance of general large models in today's AI field is self-evident: they provide the technical base that supports all kinds of applications. Moreover, developing basic, general-purpose capability is an inevitable requirement of technological self-reliance.

Xu Wei of the vertical large model side made the final addition. He believes that from the perspective of track ecology, the vertical large model track has more players and can better form a flourishing ecosystem, bringing higher commercial and social value.

Conclusion: the seminar's discussion and debate topics have no definitive answers, and large model development is still on the rise, bringing new influence to every practitioner, enterprise, and industry. The event concluded successfully, but Tencent Cloud TVP experts will continue exploring technology. With the original aspiration and vision of "using technology to influence the world," they will keep embracing the changes and trends of the large model era with an innovative heart, and meet future opportunities and challenges rationally and with awe.

TVP (Tencent Cloud Valuable Professional) is an honor Tencent Cloud grants to technical experts in the field of cloud computing. TVP strives to build a communication platform with industry technical experts and to promote effective exchange among Tencent Cloud, technical experts, and users, so as to build the cloud computing technology ecosystem and realize the vision of "using technology to influence the world."
