
TensorRT-LLM Makes Large Language Models Run 4x Faster on RTX-Powered Windows PCs


Shulou (Shulou.com) 11/24 Report --

Generative AI is one of the most important trends in the history of personal computing, driving advances in gaming, creation, video editing, everyday productivity, software development and more.

GeForce RTX and NVIDIA RTX GPUs are equipped with dedicated AI processors called Tensor Cores, which bring the power of generative AI natively to more than 100 million Windows PCs and workstations.

Today, TensorRT-LLM for Windows makes generative AI on the PC up to four times faster. TensorRT-LLM is an open-source library that accelerates inference performance for the latest large language models, such as Llama 2 and Code Llama; the data-center version of TensorRT-LLM was released last month.

NVIDIA has also released tools to help developers accelerate LLMs, including scripts that optimize custom models with TensorRT-LLM, TensorRT-optimized open-source models, and a developer reference project that demonstrates both the speed and quality of LLM responses.

TensorRT acceleration has now also been applied to Stable Diffusion in the popular WebUI distribution from Automatic1111. It doubles the speed of the generative AI diffusion model over the previous fastest implementation.

In addition, RTX Video Super Resolution (VSR) v1.5 is available as part of today's Game Ready driver release, and it will also be supported in the NVIDIA Studio driver due in early November.

TensorRT-LLM boosts productivity with LLMs

LLMs are improving productivity: they chat, summarize documents and web content, draft emails and blog posts, and sit at the heart of new workflows in which AI and other software automatically analyze data and generate large volumes of content.

TensorRT-LLM is the library NVIDIA uses to accelerate LLM inference, letting developers and end users benefit from LLMs that now run up to four times faster on Windows PCs with RTX.

At larger batch sizes, this acceleration significantly improves more sophisticated LLM experiences, such as writing and coding assistants that output several unique auto-complete results at once, accelerating performance and improving quality by giving users the best result to choose from.
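As a rough illustration of that kind of batched generation, the sketch below uses the high-level Python API found in recent TensorRT-LLM releases to request several completions in one batch; the checkpoint name, prompts and sampling settings are illustrative assumptions rather than part of NVIDIA's announcement.

```python
# Minimal sketch, assuming the high-level LLM API of recent TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

# Illustrative checkpoint; any supported Llama 2 variant could be used here.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Several prompts processed as one batch; larger batches benefit most from the speed-up.
prompts = [
    "Complete this Python function that parses a CSV line:",
    "Draft a two-sentence summary of a weekly status report:",
    "Suggest a subject line for an email about a schedule change:",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```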

TensorRT-LLM acceleration also helps when combining LLM capabilities with other technologies, such as retrieval-augmented generation (RAG), in which an LLM is paired with a vector library or vector database. RAG enables an LLM to deliver answers targeted to a specific dataset, such as a user's emails or a collection of website articles.

In a practical example, when the base Llama 2 model is asked which NVIDIA technologies are integrated into Alan Wake 2, it gives the unhelpful answer that "the game has not yet been released."

Instead, by using RAG to place GeForce news in a vector library and connecting it to the same Llama 2 model, you not only get the correct answer (NVIDIA DLSS 3.5, NVIDIA Reflex and full ray tracing) but also a faster response thanks to TensorRT-LLM acceleration. This combination of speed and capability gives users smarter solutions.
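To make the retrieval step concrete, here is a minimal, self-contained sketch of the RAG flow. The bag-of-words embedding, in-memory index and sample snippets are toy stand-ins for a real embedding model and vector database, not NVIDIA's reference project.

```python
import numpy as np

# Toy stand-in for a real embedding model (e.g. a sentence encoder).
def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Vector library": GeForce news snippets indexed by their embeddings.
docs = [
    "Alan Wake 2 launches with NVIDIA DLSS 3.5, Reflex and full ray tracing.",
    "TensorRT-LLM for Windows accelerates Llama 2 inference on RTX GPUs.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list:
    scores = index @ embed(query)  # cosine similarity, since all vectors are normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "Which NVIDIA technologies are integrated into Alan Wake 2?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The augmented prompt would then go to the TensorRT-LLM-accelerated Llama 2 model.
print(prompt)
```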

TensorRT-LLM will soon be available for download from the NVIDIA developer website.

TensorRT-optimized open-source models and the RAG demo that uses GeForce news as its example project are available for download from ngc.nvidia.com and github.com/NVIDIA.

Automatic acceleration for Diffusion

Diffusion models such as Stable Diffusion are used to imagine and create stunning, original works of art. Image generation is an iterative process that can take hundreds of cycles to reach the perfect output, and on an underpowered PC those iterations can add hours of waiting.

TensorRT accelerates AI models through neural-network layer fusion, precision calibration, automatic kernel selection and other techniques, significantly improving inference efficiency and speed. That makes it an indispensable tool for real-time applications and resource-intensive tasks.
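For developers curious what that looks like in code, the sketch below compiles a model that has already been exported to ONNX into a TensorRT engine with FP16 enabled; the "model.onnx" path is a placeholder, and layer fusion and kernel selection happen automatically inside the builder.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse a previously exported ONNX graph ("model.onnx" is a placeholder path).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # use reduced-precision kernels where supported

# Layer fusion and automatic kernel (tactic) selection happen inside this call.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```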

Now, TensorRT doubles the speed of Stable Diffusion image generation.

Compatible with the hugely popular Automatic1111 WebUI, TensorRT-accelerated Stable Diffusion helps users iterate faster and spend less time waiting on their PC to deliver the final image. On a GeForce RTX 4090, it runs seven times faster than the fastest implementation on Macs with Apple's M2 Ultra. The extension is available for download today.

The TensorRT demo built on the Stable Diffusion pipeline gives developers a reference implementation of how to prepare a diffusion model and deploy TensorRT acceleration. It is a starting point for developers interested in speeding up their own diffusion pipelines and bringing lightning-fast inference to their applications.
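The actual reference implementation lives in NVIDIA's repositories; purely as a sketch of the "prepare the model" step, and assuming Stable Diffusion 1.5 shapes, the UNet of a diffusers pipeline could be exported to ONNX and then compiled with a builder like the one above. The checkpoint name, shapes and file names here are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline

class UNetWrapper(torch.nn.Module):
    """Thin wrapper so the exported graph returns a plain tensor."""
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states, return_dict=False)[0]

# Assumed checkpoint; the demo may target a different model or precision.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet = UNetWrapper(pipe.unet.eval())

# Dummy inputs with SD 1.5 shapes: latents, timestep, text embeddings.
sample = torch.randn(2, 4, 64, 64)
timestep = torch.tensor([981], dtype=torch.float32)
text_emb = torch.randn(2, 77, 768)

with torch.no_grad():
    torch.onnx.export(
        unet, (sample, timestep, text_emb), "unet.onnx",
        input_names=["sample", "timestep", "encoder_hidden_states"],
        output_names=["latent"], opset_version=17,
    )
# unet.onnx can then be compiled into a TensorRT engine with the builder sketch above.
```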

A super video experience

AI is improving many everyday PC experiences for all users. Streaming video is one of the most popular activities on PC, with sources including YouTube, Twitch, Prime Video, Disney+ and more. Thanks to AI and RTX, its image quality is getting a thorough upgrade.

RTX Video Super Resolution (VSR) is a major breakthrough in AI pixel processing that improves the quality of streamed video content by reducing or eliminating the artifacts introduced by video compression. It also sharpens edges and details.

Now, RTX Video Super Resolution v1.5 further improves picture quality with an updated model, removes artifacts from content played at its native resolution, and adds support for GPUs based on the NVIDIA Turing architecture, including professional RTX cards and GeForce RTX 20 Series GPUs.

Retraining the VSR AI model helped it learn to accurately distinguish subtle detail from compression artifacts, so AI-enhanced images preserve detail more faithfully during upscaling: details are crisper and the overall picture looks sharper and cleaner. New in v1.5 is artifact removal for video played at the display's native resolution; the original release only enhanced video while it was being upscaled. Now, for example, 1080p video streamed to a 1080p display looks smoother, because heavy artifacts are noticeably reduced.

Starting today, all RTX users can get RTX Video Super Resolution v1.5 in the latest Game Ready driver, and the NVIDIA Studio driver due in early November will also support the technology.

RTX Video Super Resolution is one of the NVIDIA software products, tools, libraries and SDKs (those mentioned above, along with DLSS, Omniverse, AI Workbench and others) that have helped bring more than 400 AI-accelerated applications and games to consumers.

The era of AI is coming. RTX is adding momentum to every step of its development.
