
Igniting the future: TensorRT-LLM update accelerates AI inference performance and adds support for new models running on RTX-powered Windows PCs

2025-01-19 Update From: SLTechnology News&Howtos > IT Information


Shulou (Shulou.com) 11/24 report:

AI on Windows PCs marks a pivotal moment in the history of technology, one that will transform the experience of gamers, creators, streamers, office workers, students, and even everyday PC users.

AI brings unprecedented opportunities to boost productivity on the more than 100 million Windows PCs and workstations equipped with RTX GPUs. NVIDIA RTX technology makes it easier for developers to create AI applications, changing the way people use computers.

New optimizations, models, and resources unveiled at Microsoft Ignite will help developers deliver new end-user experiences sooner.

TensorRT-LLM is open-source software that improves AI inference performance. Its upcoming update will support more large language models, making demanding AI workloads easier to run on PCs and laptops with RTX GPUs with 8GB or more of VRAM.

TensorRT-LLM for Windows will soon be compatible with OpenAI's popular Chat API through a new wrapper interface. This will enable hundreds of developer projects and applications to run locally on RTX PCs rather than in the cloud, so users can keep private and proprietary data on their PCs.

Customizing generative AI and maintaining those projects takes time and effort. The process can be extremely complex and time-consuming, especially when collaborating and deploying across multiple environments and platforms.

AI Workbench is a unified, easy-to-use toolkit that lets developers quickly create, test, and customize pretrained generative AI models and LLMs on a PC or workstation. It gives developers a single platform to organize their AI projects and tune models for specific use cases.

This enables developers to collaborate and deploy seamlessly, quickly creating cost-effective, scalable generative AI models. Join the early access list to be among the first to learn about newly added features and receive updates.

To support AI developers, NVIDIA and Microsoft have released DirectML enhancements to accelerate Llama 2, one of the most popular foundational AI models. Beyond the new performance benchmarks, developers now have more options for cross-vendor deployment.

Portable AI

In October, NVIDIA released TensorRT-LLM for Windows, a library for accelerating large language model (LLM) inference.

The TensorRT-LLM v0.6.0 update, due at the end of this month, will bring up to a 5x improvement in inference performance and add support for more popular LLMs, including the new Mistral 7B and Nemotron-3 8B. These LLMs will run on all GeForce RTX 30 Series and 40 Series GPUs with 8GB or more of VRAM, so even the most portable Windows PC devices can run LLMs locally, quickly and accurately.

TensorRT-LLM v0.6.0 brings up to a 5x improvement in inference performance

The new release of TensorRT-LLM can be downloaded and installed from the NVIDIA/TensorRT-LLM GitHub repository, and the newly tuned models will be available on ngc.nvidia.com.
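The 8GB VRAM figure above tracks with simple arithmetic: a 7B-parameter model fits in that much memory once its weights are quantized. A back-of-envelope sketch (the parameter count and precisions below are illustrative assumptions, not figures from this article):

```python
# Rough estimate (illustrative, not NVIDIA's numbers) of why a
# 7B-parameter model such as Mistral 7B can fit in 8 GiB of VRAM
# once quantized.

def estimate_weight_vram_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed for model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

# Weights only; the KV cache and activations add more on top of this.
for bits in (16, 8, 4):
    gb = estimate_weight_vram_gb(7.3e9, bits)
    print(f"{bits:>2}-bit weights: ~{gb:.1f} GiB")
```

At 16-bit precision the weights alone exceed 8 GiB, which is why quantized builds are what make these models practical on portable RTX hardware.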

Chat with ease

Developers and enthusiasts around the world use OpenAI's Chat API for a wide range of applications, from summarizing web content, drafting documents and emails, to analyzing and visualizing data and creating presentations.

A major challenge with such cloud-based AI services is that they require users to upload their input data, making them impractical for private or proprietary data and for working with large datasets.

To meet this challenge, NVIDIA will soon enable TensorRT-LLM for Windows to offer an API similar to OpenAI's popular Chat API through a new wrapper interface, giving developers a familiar workflow whether their models and applications are designed to run locally on an RTX PC or in the cloud. With just one or two lines of code changed, hundreds of AI-driven developer projects and applications can benefit from fast, local AI. Users can keep their data on the PC without worrying about uploading it to the cloud.
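In practice, the "one or two lines" typically amount to pointing the client at a local base URL instead of OpenAI's cloud endpoint. A minimal stdlib sketch, assuming a hypothetical local wrapper listening on port 8000 that accepts the standard chat-completions payload (the host, port, and model name are illustrative assumptions):

```python
import json
import urllib.request

# Assumption: the local TensorRT-LLM wrapper exposes an OpenAI-style
# endpoint at this address; the port, path, and model name are
# placeholders, not values from NVIDIA's documentation.
LOCAL_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Construct a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(payload: dict) -> dict:
    """POST the payload to the local endpoint instead of the cloud API."""
    req = urllib.request.Request(
        f"{LOCAL_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("mistral-7b", "Summarize the key points of this page.")
# chat(payload)  # requires the local server to be running
```

The payload and response shapes stay the same as the cloud API; only the base URL changes, which is why existing projects need so few modifications.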

▲ The Continue.dev coding assistant, a Microsoft VS Code plug-in, powered by TensorRT-LLM (video: https://images.nvidia.cn/cn/youtube-replicates/-P17YXulhDc.mp4)

Best of all, many of these projects and applications are open source, so developers can easily adopt and extend their functionality, accelerating the adoption of generative AI on RTX-powered Windows PCs.

The wrapper interface works with all LLMs optimized for TensorRT-LLM (such as Llama 2, Mistral, and NV LLM) and is published on GitHub as a reference project, alongside other developer resources for using LLMs on RTX.

Model acceleration

Developers can now take advantage of cutting-edge AI models and deploy them through a cross-vendor API. NVIDIA and Microsoft have been working to give developers the ability to accelerate Llama on RTX through the DirectML API.

Building on the fastest inference performance for these models, announced in October, this new cross-vendor deployment option makes it easier than ever to bring AI to PCs.

Developers and enthusiasts can download the latest ONNX Runtime and follow Microsoft's installation instructions, then install the latest NVIDIA driver (to be released on November 21) for the best optimized experience.
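As a sketch of how this cross-vendor path looks from application code: ONNX Runtime selects an execution provider at session creation and can fall back to CPU when DirectML isn't available. The selection logic below is the author's illustration; the provider name strings are real ONNX Runtime identifiers, but the model path is a placeholder:

```python
# Provider preference used when creating an ONNX Runtime session.
# "DmlExecutionProvider" (DirectML) and "CPUExecutionProvider" are
# real ONNX Runtime provider names; the fallback policy here is an
# illustrative sketch, not prescribed by either vendor.

PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]

def pick_providers(available: list) -> list:
    """Return the preferred providers that are actually available, in order,
    falling back to CPU if none of the preferred ones are present."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]

# With onnxruntime installed, session creation would look like:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
#   session = ort.InferenceSession("llama2.onnx", providers=providers)

print(pick_providers(["CPUExecutionProvider"]))
```

Keeping the model in ONNX and choosing the provider at runtime is what makes the same application portable across GPU vendors.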

These new optimizations, models, and resources will accelerate the development and deployment of AI features and applications on the 100 million RTX PCs worldwide, joining more than 400 partners that have already released AI-powered applications and games accelerated by RTX GPUs.

As models become easier to use and developers bring more generative AI features to RTX-powered Windows PCs, RTX GPUs will become key to putting this powerful technology in users' hands.
