NVIDIA Chat with RTX - Large language models running locally

Introduction

NVIDIA's Chat with RTX is a demo application that personalizes a GPT large language model (LLM) by connecting it to your own content—documents, notes, videos, or other data. Leveraging retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration, you can query a customized chatbot and quickly get context-relevant answers. Because everything runs locally on a Windows RTX PC or workstation, results are both fast and private.

Official download page: https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/#sys-req

Device requirements

Platform: Windows
GPU:      NVIDIA GeForce™ RTX 30 or 40 Series GPU, or NVIDIA RTX™ Ampere or Ada Generation GPU, with at least 8 GB of VRAM
RAM:      16 GB or greater
OS:       Windows 11
Driver:   535.11 or later

Application scenarios

NVIDIA's Chat with RTX provides two core application scenarios:

  1. Chat with RTX supports multiple file formats, including text, PDF, DOC/DOCX, and XML. Point the app at the folder containing your files, and it will load them into its library within seconds. You can also provide the URL of a YouTube playlist, and the app will load the transcripts of the videos in the playlist, letting you query the content they cover.

  2. The Chat with RTX technical demo is built from the TensorRT-LLM RAG developer reference project available on GitHub. Developers can use this reference to build and deploy their own RAG-based applications, accelerated for RTX and powered by TensorRT-LLM.
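The retrieval step that both scenarios rely on can be illustrated with a minimal sketch. This is not the Chat with RTX or TensorRT-LLM code—it substitutes a toy bag-of-words similarity for a real embedding model and a real GPU-accelerated LLM, purely to show how RAG picks the most relevant local document and packs it into the prompt:

```python
# Minimal RAG retrieval sketch (toy stand-in for embeddings + an LLM).
import math
from collections import Counter

def vectorize(text):
    # Toy "embedding": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank local documents by similarity to the query; keep the top k.
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    # A real pipeline would now send this prompt to the LLM;
    # here we just return it.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "TensorRT-LLM accelerates LLM inference on NVIDIA RTX GPUs.",
    "The quarterly report covers revenue and expenses.",
]
print(build_prompt("How is inference accelerated on RTX?", docs))
```

In the real application, the vectorizer is a proper embedding model, the document store covers your folders and YouTube transcripts, and the final prompt is answered by a TensorRT-LLM-accelerated model on the local GPU.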

Demo