
The biggest change in 40 years, Intel Meteor Lake Analysis

2025-03-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

At this year's Intel Innovation conference, Intel formally introduced its latest architecture, Meteor Lake, the first product family built on the Intel 4 process. In pursuit of its goal of delivering five process nodes in four years and keeping Moore's Law on track, Intel is now mass-producing Intel 7 processors, and the Intel 4 processors announced at the conference are ramping up in production.

Intel expects to bring the Intel 3 process into production in the second half of 2023, Intel 20A in the first half of 2024, and Intel 18A in the second half of 2024. The 12th and 13th generation Core processors launched successfully and introduced many new technologies, and Intel says it will keep bringing new technologies to the PC and to edge computing. Meteor Lake is an important milestone on that path: it carries Intel's first NPU for AI acceleration, which can run inference efficiently on the local PC, and it lays the technical groundwork for the future Arrow Lake and Lunar Lake.

Meteor Lake is an important turning point for Intel. Beyond the standalone NPU, it adds several other engines. Let's take a closer look at what Meteor Lake changes.

The Meteor Lake architecture

Meteor Lake consists of four separate modules connected through Foveros 3D packaging technology: a compute module, an SoC module, a graphics module and an IO module. The compute module uses the latest generation of efficiency cores (E-cores) and performance cores (P-cores), with architectural and functional enhancements, and is built on the new Intel 4 process, which brings a significant improvement in energy efficiency over the previous generation.

The SoC module integrates the NPU, the low-power island E-cores, Wi-Fi and Bluetooth, and supports 8K HDR, the AV1 codec, and the HDMI 2.1 and DP 2.1 standards. The NPU delivers efficient AI performance and is compatible with OpenVINO and other standardized programming interfaces.

The graphics module integrates Intel's Arc graphics architecture and provides up to twice the graphics performance of the previous generation.

Because Meteor Lake adds low-power E-cores on the SoC module alongside the existing E-cores and P-cores, it forms a three-tier performance hybrid architecture. This takes Intel's hybrid architecture a level beyond the 12th and 13th generation Core products.

In terms of AI support, Meteor Lake builds AI capability into all of its compute engines, including the new NPU, to make AI computing more energy efficient. The GPU offers parallelism and high throughput, which makes it well suited to AI functions inside media, 3D applications and rendering pipelines. The NPU is a dedicated, low-power AI engine for sustained AI workloads and AI offload. The CPU responds quickly and is well suited to lightweight, single-inference, low-latency AI tasks. With the GPU, NPU and CPU providing different tiers of AI compute, Meteor Lake can run AI locally, bringing AI from the cloud to client PCs and enterprise edge PCs.
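The division of labour described above can be sketched as a small routing rule. This is only an illustration of the roles the text assigns to each engine: the `AIWorkload` fields, the `pick_engine` function and the fallback choice are hypothetical and do not correspond to any Intel or OpenVINO API.

```python
from dataclasses import dataclass


@dataclass
class AIWorkload:
    """Hypothetical description of an AI task; not an Intel data structure."""
    name: str
    in_graphics_pipeline: bool  # part of media, 3D or rendering work
    sustained: bool             # runs continuously in the background
    latency_critical: bool      # one small inference that must answer quickly


def pick_engine(task: AIWorkload) -> str:
    """Return "GPU", "NPU" or "CPU" following the roles described in the text."""
    if task.in_graphics_pipeline:
        return "GPU"  # parallel, high-throughput work next to media/3D/rendering
    if task.sustained:
        return "NPU"  # dedicated low-power engine for sustained AI and offload
    if task.latency_critical:
        return "CPU"  # quick response for lightweight, single-inference tasks
    return "NPU"      # assumed default: offload the rest to save power


if __name__ == "__main__":
    for task in (
        AIWorkload("super-resolution in a render pass", True, False, False),
        AIWorkload("background noise suppression", False, True, False),
        AIWorkload("single keyword check", False, False, True),
    ):
        print(task.name, "->", pick_engine(task))
```

A real runtime would weigh measured latency and power rather than three boolean flags; the point here is only that each engine has a distinct niche.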

Let's look at each part of the overall Meteor Lake architecture in turn, starting with the SoC module, which has changed a great deal. It contains two buses: the NoC (network-on-chip) bus and the IO Fabric bus. The NoC bus offers high bandwidth and fast response, so the devices attached to it can access all of memory quickly and at low power. It also connects the compute module and the graphics module. The devices in the SoC module attached directly to the NoC bus include the low-power E-cores, the memory controller, the media block, the NPU, the IPU and so on.

The IO Fabric bus below it mainly connects PCIe, USB 3/2, SATA, Wi-Fi and Bluetooth, networking, sensors, audio and other devices, as well as the chip-level SSE security engine and the platform-level security module. The IO Fabric bus also links to the separate IO module, which integrates the USB 4, Thunderbolt 4 and PCIe controllers.

Each Meteor Lake module has its own power management unit, which works with the operating system and software above it to deliver modular, system-level power management. The low-power island in the SoC module emphasizes lower energy consumption without giving up performance, in order to extend battery life. Many peripheral functions are concentrated on the SoC module, and what is integrated there covers most of the needs of most users. The compute and graphics modules are brought in for high-performance computing and graphics work.

Then there is the question of scheduling across the different cores. Since the low-power E-cores in the SoC module and the P-cores and E-cores in the compute module all take part in the PC's computing work, task scheduling is an important topic for Meteor Lake. On top of the previous two-tier hybrid architecture of P-cores and E-cores, the SoC's low-power E-cores add a third tier, which undoubtedly makes scheduling more complex.

Intel's hardware thread scheduler has been adapted to the low-power E-cores in the SoC module, taking into account how different threads behave in everyday use. For this generation, Intel worked with Microsoft to classify common instructions into four classes.

Class 0 covers cases where the P-cores and E-cores execute roughly the same number of instructions per clock cycle (IPC). Class 1 covers cases where the P-core's IPC is higher than the E-core's, such as floating-point instructions. Class 2 covers cases where the P-core's IPC is much higher than the E-core's, such as AI computing. Class 3 covers cases where the E-core's IPC is higher than the P-core's.

For each instruction class, the Intel thread scheduler provides a feedback table that scores every core, where EE stands for energy efficiency and Perf stands for performance. The cores with the highest scores are recommended to the operating system first. In the example from the accompanying figure, if the operating system wants performance, the thread scheduler recommends P-Core N and the operating system places the relevant tasks on that core; if the operating system wants better energy efficiency, the thread scheduler recommends E-Core N. In this way the thread scheduler can dynamically recommend a suitable core to the operating system for each type of task.
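The feedback-table idea can be sketched in a few lines. The table layout, the core names and every score below are invented for illustration; the article does not give the real encoding or values, and the operating system's actual policy is more involved than a single `max`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreScore:
    core: str
    perf: int  # "Perf" column: higher means faster for this instruction class
    ee: int    # "EE" column: higher means more energy efficient


# Hypothetical feedback table for one instruction class (values are made up).
FEEDBACK_TABLE = [
    CoreScore("P-Core 0", perf=90, ee=40),
    CoreScore("P-Core 1", perf=95, ee=38),
    CoreScore("E-Core 0", perf=60, ee=70),
    CoreScore("LP E-Core 0", perf=35, ee=85),
]


def recommend(table, goal):
    """Pick the core with the best score for the goal: "perf" or "ee"."""
    if goal not in ("perf", "ee"):
        raise ValueError("goal must be 'perf' or 'ee'")
    return max(table, key=lambda row: getattr(row, goal)).core


if __name__ == "__main__":
    print("performance goal ->", recommend(FEEDBACK_TABLE, "perf"))
    print("efficiency goal  ->", recommend(FEEDBACK_TABLE, "ee"))
```

Because the hardware refreshes the scores as conditions change, the same query can return a different core a moment later, which is what makes the recommendation dynamic.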

Compared with earlier hardware thread scheduling, Meteor Lake improves the feedback it gives the operating system. When other parts of the chip draw power, the power available to the cores is reallocated dynamically, and the reported capability of the core complex as a whole and of each individual core is updated to match. By evaluating the internal power budget in this way, the hardware thread scheduler on Meteor Lake provides a more accurate feedback table to the operating system.

The overall characteristics of how the system is running, how the software is running, and the platform's hardware are all factored into the control logic, so the hardware thread scheduler better supports the three-tier performance hybrid architecture.

In graphics and media, Meteor Lake moves the media and display engines, previously located in the GPU, into the SoC module, while the IO module holds the physical display interfaces responsible for outputting the display signal.

The upgraded media engine supports video decoding up to 8K 60Hz 10-bit HDR and video encoding up to 8K 30Hz 10-bit HDR, and supports VP9, AVC, HEVC, AV1 and other common formats.

The display engine further optimizes power consumption and supports compression along the entire display path. When the display output does not match what the display setup requires, this compression allows the output to be delivered properly.

In addition, the display engine supports HDMI 2.1 and DP 2.1, as well as the full eDP 1.4 output specification, at resolutions up to 8K 60Hz HDR or four 4K 60Hz HDR outputs.
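A quick back-of-the-envelope calculation shows how the two output limits relate. It assumes 10 bits per colour channel, three channels per pixel, and no blanking or link-encoding overhead; those assumptions, and the helper name `raw_gbit_per_s`, are mine rather than figures from the article.

```python
import math


def raw_gbit_per_s(width, height, refresh_hz, bits_per_channel=10, channels=3):
    """Uncompressed pixel data rate in Gbit/s (no blanking, no link overhead)."""
    bits_per_second = width * height * refresh_hz * bits_per_channel * channels
    return bits_per_second / 1e9


one_8k60 = raw_gbit_per_s(7680, 4320, 60)
four_4k60 = 4 * raw_gbit_per_s(3840, 2160, 60)

if __name__ == "__main__":
    print(f"one 8K 60Hz stream  : {one_8k60:.2f} Gbit/s")
    print(f"four 4K 60Hz streams: {four_4k60:.2f} Gbit/s")
    print("same pixel rate:", math.isclose(one_8k60, four_4k60))
```

Under these assumptions both cases come to roughly 59.7 Gbit/s of raw pixel data, which is one way to read why the two limits are quoted side by side, and why compression in the display path matters at these resolutions.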

Next is Meteor Lake's graphics module. Compared with the previous generation of graphics products, it reaches a higher clock frequency at a lower voltage. The interconnect and caches have also been heavily optimized, and the core frequency has increased.

Meteor Lake's GPU has 8 GPU cores (Xe cores) with 128 vector engines, 2 geometry pipelines, 8 samplers, 4 texture mapping units, and 8 new hardware ray tracing units.

Meteor Lake's graphics module inherits some of the advanced features of Intel's discrete graphics products, is further optimized for DX12 Ultimate, and delivers better performance in games, productivity and scientific research.

In Blender, Meteor Lake's GPU delivers more than twice the performance of the CPU.

Among its other technologies, Meteor Lake brings Wi-Fi 7, which greatly improves data throughput and multiplexing performance. Meteor Lake also supports the Bluetooth 5.4 specification, including a new audio codec specification that can greatly reduce power consumption and latency while improving audio quality.

Intel's connection management software, ICPS, is well regarded in the industry; it moves to version 3.0 on Meteor Lake and continues to improve devices' wireless and wired network connectivity. Intel Unison, the company's multi-device interconnection software, works across ecosystems and device types: beyond Windows, it supports macOS, iOS, iPadOS and a wide range of Android devices, so PCs, phones, tablets and other smart devices can be connected through it. It is a mature software solution, and Intel will launch the second generation of Unison on Meteor Lake.

Another is Wi-Fi Sensing, which uses a notebook's existing Wi-Fi radio and antennas to detect human presence in software, with no additional hardware. It can, for example, wake the system when the user approaches, or automatically reduce power and lock the system when the user walks away. Intel also plans to use Wi-Fi for positioning and gesture recognition in the future.

Meteor Lake also has good support for Thunderbolt 4, enabling storage, display, and virtualization expansion through stronger throughput and bandwidth performance.

Intel 4 process and Foveros packaging

Next, let's look at the Intel 4 process used by Meteor Lake. Under its previously announced IDM 2.0 strategy, Intel plans to deliver five process nodes in four years, and Intel 4 is the second node in that plan.

The earlier Intel 7 process proved that Intel can keep improving node performance, with transistor optimizations focused on performance. The Intel 4 process used by Meteor Lake relies on EUV lithography to improve yield and shrink area, further raising energy efficiency and laying the foundation for Intel 3.

The Intel 3 process, currently in development, will bring a denser design library, higher transistor drive current and lower via resistance, and will make greater use of EUV lithography. After that, Intel 20A marks Intel's entry into the angstrom era and will adopt RibbonFET and PowerVia technologies. Intel 18A will build on Intel 20A, raising performance per watt by another 10% and aiming to establish Intel's leadership in process nodes.

The Intel 4 process used by Meteor Lake achieves 2x area scaling for its high-performance logic library, and introduces a number of other innovations.

Among them, EUV lithography simplifies and improves the interconnect design. EUV lithography machines are expensive, but they greatly simplify Intel's new manufacturing process. With EUV, Intel 4 uses 20% fewer masks and 5% fewer process steps. Intel 4 is also compatible with EMIB and Foveros packaging technologies.

In terms of packaging, Intel said at its Malaysia tour this year that, starting with Meteor Lake, Foveros packaging technology will come to client products to create more powerful notebook computers.

Although the 13th generation Core processors already integrate a wide range of functions into a single SoC, those functions keep becoming more diverse and complex, so designing and manufacturing such monolithic system-on-chip parts is increasingly difficult and expensive. Foveros packaging technology addresses this problem: it uses high-density, high-bandwidth, low-power interconnects to combine many modules, manufactured on a variety of processes, into a chip complex built on a large-scale disaggregated module architecture.

Before this, Intel first used Co-EMIB, an extension of Foveros packaging technology, in its Data Center GPU Max Series products. The newly launched Meteor Lake processor brings Foveros technology to client products for the first time.

The dramatic changes in the Meteor Lake architecture pose packaging challenges. The package provides a graphics module backed by large capacitors, an SoC module connected through Foveros chip-to-chip links at a 36-micron pitch, and a compute module built on the Intel 4 process, with the IO, power delivery and inter-chip routing for the compute module carried in metal layers.

The complexity of the overall Meteor Lake structure makes packaging more demanding. Assembly is divided into five steps. First, the wafers from Intel's fabs and external foundries are diced into individual chips. Second, each chip is tested to make sure its quality is good enough to enter the Foveros assembly stage, which is key to the reliability of a heterogeneous design. Third, wafer-level assembly follows, using chip attach, underfill and wafer molding together with manufacturing steps such as bumping, passivation, grinding and polishing. Fourth, the Meteor Lake Foveros complex is packaged and assembled onto the BGA substrate; the complex is compatible with existing package assembly tools and processes, so this needs only minor optimization. Finally, the packaged chip is tested, including stress testing, burn-in, class test and system-level platform test. Once testing is complete, the chip can be shipped for system assembly and production.

Compared with the packaging used for Raptor Lake, the advanced Foveros process has many advantages: the bump pitch is only 36 microns, the trace width is under 1 micron, bump density is nearly 8 times higher, trace length is under 2 mm, bandwidth reaches up to 160 GB/s per mm, and power consumption is below 0.3 pJ/bit. These improvements greatly reduce the overhead of partitioning the chip into modules linked by low-power interconnects. Smaller modules also improve wafer yield, and the most suitable silicon process can be chosen for each module, which helps optimize cost and performance, simplifies SKU creation and makes customization easier. All of this benefits Meteor Lake's yield and cost control.
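Two quick calculations help put these interconnect figures in perspective. The first multiplies the quoted bandwidth ceiling by the quoted energy ceiling to get an upper bound on link power per millimetre of die edge. The second asks what bump pitch an 8x density gain would imply, assuming bumps sit on a square grid so that density scales with the inverse square of the pitch; that grid assumption is mine and is not stated in the article.

```python
import math

BANDWIDTH_GBYTES_PER_S_PER_MM = 160  # "up to" figure quoted in the text
ENERGY_PJ_PER_BIT = 0.3              # "less than" figure quoted in the text
FOVEROS_PITCH_UM = 36                # bump pitch quoted in the text
DENSITY_GAIN = 8                     # "nearly 8 times" quoted in the text


def link_power_watts_per_mm(gbytes_per_s, pj_per_bit):
    """Power needed to move the given traffic across one mm of die edge."""
    bits_per_second = gbytes_per_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12


def implied_previous_pitch_um(pitch_um, density_gain):
    """Older pitch implied by a density gain, assuming density ~ 1 / pitch**2."""
    return pitch_um * math.sqrt(density_gain)


if __name__ == "__main__":
    watts = link_power_watts_per_mm(BANDWIDTH_GBYTES_PER_S_PER_MM, ENERGY_PJ_PER_BIT)
    pitch = implied_previous_pitch_um(FOVEROS_PITCH_UM, DENSITY_GAIN)
    print(f"link power bound: {watts:.3f} W per mm")
    print(f"implied earlier bump pitch: {pitch:.0f} um")
```

With the quoted ceilings the link power stays under about 0.38 W per millimetre at full bandwidth, and the density figure is consistent with an earlier bump pitch of roughly 100 microns under the square-grid assumption.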

Intel is currently building out wafer-level assembly capacity for Meteor Lake and the products that follow it, and these new facilities will also provide capacity for Foveros Direct at a 9-micron pitch and for future products.

Intel's packaging has evolved from FCBGA and FCLGA to EMIB and Foveros, with Foveros Direct to follow. These packaging advances will help Intel deliver on its commitment of five process nodes in four years and deliver 2.5D packaging in 2024-2025.

NPU opens a new chapter in AI

The Meteor Lake processor has a standalone NPU acceleration unit, which works with the CPU and GPU to form a three-tier AI acceleration architecture whose engines cooperate to deliver a powerful AI experience. The NPU's host interface and device management support Microsoft's new driver model for compute accelerators, so the NPU fits into that driver model while maintaining security.

The memory management unit provides isolation across a variety of scenarios, and the NPU supports power and workload scheduling, which enables fast transitions into and out of low-power states.

Meteor Lake's NPU is a multi-engine architecture equipped with two neural compute engines, which can work together on a single workload or work separately on different workloads. Each neural compute engine has two main compute components, the inference pipeline and the SHAVE DSP. The inference pipeline is the core driver of energy-efficient computing: by minimizing data movement and using fixed-function operations for common, compute-heavy tasks, it executes neural networks efficiently and at low power.

Most of the computation takes place in the inference pipeline, a fixed-function hardware pipeline that supports standard neural network operations. The pipeline consists of a multiply-accumulate (MAC) array, an activation function block and a data conversion block.
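The three stages named above can be mimicked in plain Python to show how data flows through them. This is a conceptual sketch only: the choice of ReLU as the activation, the int8 requantization with a single scale factor, and all function names are assumptions for illustration, not a description of the NPU's actual hardware.

```python
def mac_array(inputs, weights):
    """Multiply-accumulate: one wide accumulator per output channel."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]


def activation(accumulators):
    """Activation function block; ReLU is assumed here for simplicity."""
    return [max(0, a) for a in accumulators]


def convert(accumulators, scale):
    """Data conversion block: rescale and saturate back into the int8 range."""
    return [max(-128, min(127, round(a * scale))) for a in accumulators]


def inference_pipeline(inputs, weights, scale):
    """Run one layer through MAC array -> activation -> data conversion."""
    return convert(activation(mac_array(inputs, weights)), scale)


if __name__ == "__main__":
    x = [1, 2, 3, 4]                      # int8 input activations
    w = [[1, 0, -1, 2], [-3, 1, 0, -1]]   # two output channels of int8 weights
    print(inference_pipeline(x, w, scale=0.5))  # -> [3, 0]
```

Keeping the three stages back to back means intermediate accumulators never have to leave the pipeline, which is the kind of reduced data movement the text credits for the energy savings.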

The SHAVE DSP is a highly optimized VLIW DSP designed for AI. This Streaming Hybrid Architecture Vector Engine can be pipelined with the inference pipeline and the direct memory access (DMA) engine to achieve true heterogeneous computing in parallel on the NPU, maximizing performance. The DMA engine, in turn, orchestrates data movement for maximum energy efficiency and performance.

For a network model such as MobileNet, whose complexity is relatively low, running on the CPU is faster and more effective. The NPU is better suited to high-complexity, large-scale operations, because it has more processing power than the CPU for AI workloads.

As an image-generation network, Stable Diffusion places different compute-density demands on different stages of a generative AI workload. Generating an image from natural language involves three core stages: the text encoder, the UNet and the VAE. Their performance on the CPU, GPU and NPU differs, and so do the time, power and efficiency. When Meteor Lake's AI engines work together, the overall result is better: with the UNet for the positive prompt running on the GPU and the UNet for the negative prompt running on the NPU, the run time drops to 11.3 seconds, and because the GPU is involved, power consumption is 30W. Distributing different tasks across different engines in this way gives strong overall performance at low power.
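The reason the positive-prompt and negative-prompt UNet passes can run on different engines is that, in classifier-free guidance, the two noise predictions are computed independently from the same latent and only combined afterwards. The sketch below shows that structure with a toy stand-in for the UNet and two worker threads standing in for the GPU and NPU; `fake_unet`, `guided_step` and the numbers are all hypothetical and do not use any real device API.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_unet(latent, prompt_embedding):
    """Toy stand-in for a UNet noise prediction; real models are far larger."""
    return [0.5 * l + p for l, p in zip(latent, prompt_embedding)]


def guided_step(latent, positive, negative, guidance_scale=7.5):
    """One denoising step with the two UNet passes issued concurrently."""
    # Two worker threads stand in for two devices (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=2) as pool:
        cond_future = pool.submit(fake_unet, latent, positive)    # "GPU" pass
        uncond_future = pool.submit(fake_unet, latent, negative)  # "NPU" pass
        cond, uncond = cond_future.result(), uncond_future.result()
    # Classifier-free guidance: push the prediction away from the negative
    # prompt and towards the positive prompt.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    step = guided_step([2.0, 4.0], positive=[1.0, 0.0], negative=[0.0, 0.0],
                       guidance_scale=2.0)
    print(step)  # -> [3.0, 2.0]
```

Since neither pass waits on the other, the step time is bounded by the slower of the two engines rather than by their sum, which fits the shortened run time reported in the text.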

Intel is currently testing client-side AI applications with more than 100 industry partners to bring AI into everyday life. At the AI API layer, Intel has worked with Microsoft on interfaces such as WinML, ONNX RT and DirectML, alongside Intel's own OpenVINO. These APIs can make better use of the underlying CPU, GPU and NPU resources and help AI applications get more out of the available compute.

Besides the new NPU on Meteor Lake, the Intel GPU is also strong at AI acceleration. Using the DP4a instruction, the Intel GPU can perform 64 INT8 integer accumulate operations per cycle. This was covered in an earlier analysis of Intel's GPU architecture, so it is not repeated here.
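For readers unfamiliar with DP4a, the operation is a four-element dot product of signed 8-bit values added to a wider accumulator. The function below models only that arithmetic in Python; it is not Intel's instruction encoding, and it uses an unbounded Python integer where hardware would use a 32-bit accumulator.

```python
def dp4a(a, b, acc=0):
    """Return acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3] for int8 inputs."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dp4a takes exactly four values per operand")
    if any(not -128 <= v <= 127 for v in (*a, *b)):
        raise ValueError("operands must fit in signed 8 bits")
    return acc + sum(x * y for x, y in zip(a, b))


def int8_dot(xs, ws):
    """Dot product of two int8 vectors built from chained dp4a calls."""
    if len(xs) != len(ws) or len(xs) % 4:
        raise ValueError("vectors must have equal length, a multiple of 4")
    acc = 0
    for i in range(0, len(xs), 4):
        acc = dp4a(xs[i:i + 4], ws[i:i + 4], acc)
    return acc


if __name__ == "__main__":
    print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], acc=10))  # -> 80
    print(int8_dot([1] * 8, [2] * 8))                # -> 16
```

If the figure of 64 counts individual INT8 multiply-accumulates, it would correspond to sixteen such four-wide operations per cycle; the article does not spell out how the operations are counted, so treat that reading as an assumption.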

Through the three-tier AI compute network of CPU, GPU and NPU, Meteor Lake takes AI acceleration on client processors to a new level. With this compute behind them, local large language models and AIGC workloads no longer have to depend on cloud compute, which pushes AI further into edge computing.


© 2024 shulou.com SLNews company. All rights reserved.
