
GPT-4 model architecture disclosed: 1.8 trillion parameters, built as a mixture-of-experts model


CTOnews.com, July 13 news: foreign media outlet SemiAnalysis recently disclosed details of the GPT-4 model that OpenAI released in March this year, including the model architecture, training and inference infrastructure, parameter count, training dataset, token count, cost, and the Mixture of Experts design, among other specifics.

▲ Image source: SemiAnalysis

According to the report, GPT-4 contains a total of about 1.8 trillion parameters spread across 120 layers, whereas GPT-3 has only about 175 billion parameters. To keep costs reasonable, OpenAI built the model as a mixture of experts.

CTOnews.com note: a Mixture of Experts (MoE) is a type of neural network in which several expert sub-models are trained separately on different portions of the data, and a gating mechanism combines the experts' outputs into a single result for the task at hand.

▲ Image source: SemiAnalysis

The report says GPT-4 uses 16 experts, each with about 111 billion parameters, and that each forward pass is routed through two of these experts.
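To make the routing idea concrete, here is a minimal sketch of an MoE layer with top-2 routing. The 16-expert and top-2 figures come from the report; everything else (layer sizes, the gating network, the use of PyTorch) is illustrative and not a claim about GPT-4's actual implementation.

```python
# Minimal mixture-of-experts layer with top-2 routing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=2):
        super().__init__()
        # Each "expert" is an independent feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The gating network scores every expert for each token.
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, seq, d_model)
        scores = self.gate(x)                      # (batch, seq, n_experts)
        # Route each token to its top-k experts only; the rest stay idle,
        # which is how MoE keeps compute per token far below total parameters.
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Toy usage: 2 sequences of 8 tokens, each routed through 2 of 16 experts.
x = torch.randn(2, 8, 64)
print(MoELayer()(x).shape)  # torch.Size([2, 8, 64])
```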

In addition, the model has about 55 billion shared attention parameters and was trained on a dataset containing roughly 13 trillion tokens. These tokens are not all unique: repeated passes (epochs) over the data are counted as additional tokens.
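Those figures can be sanity-checked with simple arithmetic: 16 experts at ~111B parameters each, plus ~55B shared attention parameters, lands almost exactly on the 1.8 trillion total, while a forward pass that touches only two experts uses far fewer. A quick sketch (the clean split between expert and shared parameters is a simplification):

```python
# Sanity check of the parameter figures above; numbers are from the report.
n_experts   = 16
per_expert  = 111e9    # parameters per expert
shared_attn = 55e9     # shared attention parameters
active_k    = 2        # experts consulted per forward pass

total  = n_experts * per_expert + shared_attn   # ≈ 1.83e12, i.e. ~1.8 trillion
active = active_k * per_expert + shared_attn    # ≈ 2.77e11 actually used per token
print(f"total ≈ {total:.2e}, active per token ≈ {active:.2e}")
```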

The context length in GPT-4's pre-training stage was 8k; the 32k version is the result of fine-tuning the 8k model. Training costs were considerable, and inference is expensive as well: the report notes that even an 8x H100 node cannot serve a dense model of this parameter count at 33.33 tokens per second, which is part of why a dense design was ruled out. As for training, SemiAnalysis estimates the run was done on A100 GPUs, and at $1 per A100-hour the cost would reach about $63 million (about RMB 451 million).
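The 33.33 tokens-per-second claim is essentially a memory-bandwidth argument, which can be reproduced with rough numbers. This is a sketch only: it assumes 16-bit weights, that every weight is read once per generated token, and an HBM bandwidth of about 3.35 TB/s per H100, none of which is stated in the article itself.

```python
# Rough memory-bandwidth estimate behind the 33.33 tokens/s claim.
# Assumptions (not from the article): 16-bit weights, every weight read once
# per generated token, H100 HBM bandwidth ≈ 3.35 TB/s per GPU.
params      = 1.8e12      # dense parameter count
bytes_per_w = 2           # fp16 / bf16
target_tps  = 33.33       # tokens per second

needed_bw = params * bytes_per_w * target_tps   # bytes/s to stream the weights
node_bw   = 8 * 3.35e12                         # aggregate HBM bandwidth, 8x H100

print(f"needed ≈ {needed_bw / 1e12:.0f} TB/s, 8x H100 ≈ {node_bw / 1e12:.1f} TB/s")
# needed ≈ 120 TB/s vs ≈ 26.8 TB/s: a dense model falls short by over 4x,
# which is the economic case for the mixture-of-experts design.
```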

By contrast, the report estimates that pre-training the same model on cloud-based H100 GPUs today could be completed in less time and would bring the cost down to about $21.5 million (about RMB 154 million).
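Both estimates can be roughly reproduced from GPU-hours. The GPU counts, run lengths, and hourly rates below come from the wider SemiAnalysis report rather than this article, so treat them as assumptions:

```python
# Reproducing the two cost estimates from GPU-hours. GPU counts, durations,
# and hourly rates are from the broader SemiAnalysis report (assumptions here).
a100_cost = 25_000 * 95 * 24 * 1.0   # ~25k A100s, ~95 days, $1 per GPU-hour
h100_cost = 8_192 * 55 * 24 * 2.0    # ~8k H100s, ~55 days, $2 per GPU-hour
print(f"A100 run ≈ ${a100_cost / 1e6:.0f}M, H100 run ≈ ${h100_cost / 1e6:.1f}M")
# ≈ $57M and ≈ $21.6M, consistent with the $63M and $21.5M figures above.
```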
