
Nvidia announces seven Vera Rubin chips in volume production

NVIDIA today announced that the NVIDIA Vera Rubin platform is opening the next frontier of agentic AI, with seven new chips now in full production.

They are: the NVIDIA Vera CPU, NVIDIA Rubin GPU, NVIDIA NVLink 6 switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU and NVIDIA Spectrum-6 Ethernet switch, as well as the new NVIDIA Groq 3 LPU.

Designed to operate together as one AI supercomputer, the chips power every phase of AI — from massive-scale pretraining, post-training and test-time scaling to real-time agentic inference.


“Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI,” said CEO Jensen Huang. “The agentic AI inflection point has arrived, with Vera Rubin kicking off the greatest infrastructure buildout in history.”

NVIDIA Vera Rubin NVL72 Rack


Integrating 72 Rubin GPUs and 36 Vera CPUs connected by NVLink 6, along with ConnectX-9 SuperNICs and BlueField-4 DPUs, Vera Rubin NVL72 delivers breakthrough efficiency — training large mixture-of-experts models with one-fourth the number of GPUs compared with the NVIDIA Blackwell platform and achieving up to 10x higher inference throughput per watt at one-tenth the cost per token.

Designed for hyperscale AI factories worldwide, NVL72 scales seamlessly with NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet to sustain high utilization across massive GPU clusters while reducing time to train and total cost of ownership.

NVIDIA Vera CPU Rack
Reinforcement learning and agentic AI workloads rely on large numbers of CPU-based environments to test and validate the results generated by models running on GPU systems.

The NVIDIA Vera CPU Rack delivers dense, liquid-cooled infrastructure built on NVIDIA MGX, integrating 256 Vera CPUs to provide scalable, energy-efficient capacity with world-class single-threaded performance, unlocking agentic AI at scale.

Integrated with Spectrum-X Ethernet networking, Vera CPU racks keep CPU environments tightly synchronized across the AI factory. Together with GPU compute racks, they provide the CPU foundation for large-scale agentic AI and reinforcement learning — with Vera delivering results twice as efficiently and 50% faster than traditional CPUs.
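The division of labour described above — GPU racks generating candidate outputs, fleets of CPU environments scoring them — can be sketched in a few lines. This is an illustrative outline only, not NVIDIA code; the model and checker below are stand-in stubs for what would run on the respective racks.

```python
# Sketch (assumed, simplified) of the GPU-generate / CPU-verify loop
# used in reinforcement learning for agentic AI.
import random

def gpu_model_propose(prompt: str) -> str:
    """Stand-in for a policy model running on the GPU compute racks."""
    return random.choice(["answer_a", "answer_b"])

def cpu_env_validate(candidate: str) -> float:
    """Stand-in for one CPU-based environment (e.g. a code runner or
    checker) that scores a model output. Many such environments run
    in parallel on the Vera CPU rack."""
    return 1.0 if candidate == "answer_a" else 0.0

def rl_step(prompts):
    """One RL step: GPUs generate rollouts, CPU environments verify
    them, and the mean reward feeds back into the policy update."""
    rewards = []
    for p in prompts:
        candidate = gpu_model_propose(p)      # GPU side: rollout
        reward = cpu_env_validate(candidate)  # CPU side: verification
        rewards.append(reward)
    return sum(rewards) / len(rewards)

print(rl_step(["task1", "task2", "task3"]))
```

The point of the sketch is the shape of the loop: verification throughput on the CPU side gates how fast the GPU side can learn, which is why the CPU rack is sized and networked as a first-class part of the AI factory.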

NVIDIA Groq 3 LPX Rack
The NVIDIA Groq 3 LPX rack is designed for the low-latency, large-context demands of agentic systems. LPX and Vera Rubin unite the extreme performance of both processors to deliver up to 35x higher inference throughput per megawatt and up to 10x more revenue opportunity for trillion-parameter models.

At scale, a fleet of LPUs functions as a single giant processor for fast, deterministic inference acceleration. The LPX rack, with 256 LPUs, features 128GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs boost decode by jointly computing every layer of the AI model for every output token.

Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers. Fully liquid-cooled and built on MGX infrastructure, LPX integrates seamlessly into next-generation Vera Rubin AI factories and will be available in the second half of this year.

NVIDIA BlueField-4 STX Storage Rack
The NVIDIA BlueField-4 STX rack-scale system is an AI-native storage infrastructure that extends GPU memory seamlessly across the POD. Powered by BlueField-4 — combining the NVIDIA Vera CPU and NVIDIA ConnectX-9 SuperNIC — STX delivers a high-bandwidth shared layer optimized for storing and retrieving the massive key-value cache data generated by large language models and agentic AI workflows.

NVIDIA DOCA Memos — a new DOCA framework that supercharges BlueField-4 storage — enables dedicated KV cache storage processing to boost inference throughput by up to 5x while significantly improving power efficiency compared with general-purpose storage architectures. The result is POD-wide context that delivers faster multi-turn interactions with AI agents, more scalable AI services and higher overall infrastructure utilization.
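Why a dedicated KV-cache tier helps can be seen from the shape of the workload. This is an illustrative sketch only — not the DOCA Memos API — showing why persisting a session's key-value cache means each new turn of a multi-turn agent interaction pays only for its new tokens rather than recomputing the whole history; `kv_store` is a stand-in for the shared STX storage tier.

```python
# Sketch (assumed, simplified) of KV-cache reuse across agent turns.
kv_store = {}  # stand-in for the shared STX context-memory tier

def compute_kv(tokens):
    """Placeholder for the expensive per-token key/value computation."""
    return [(t, t) for t in tokens]  # dummy (key, value) pairs

def prefill(session_id, history_tokens):
    """First turn: compute the full KV cache and persist it."""
    kv_store[session_id] = compute_kv(history_tokens)
    return kv_store[session_id]

def next_turn(session_id, new_tokens):
    """Later turns: fetch the cached prefix, compute only the delta."""
    cached = kv_store.get(session_id, [])
    cached.extend(compute_kv(new_tokens))  # only new tokens are computed
    kv_store[session_id] = cached
    return cached

prefill("agent-1", list(range(1000)))    # pays for 1,000 tokens once
kv = next_turn("agent-1", [1000, 1001])  # pays for just 2 new tokens
print(len(kv))  # 1002
```

Without a shared cache tier, every turn would recompute keys and values for the entire conversation history; with it, cost per turn scales with the new input alone, which is the source of the multi-turn throughput gains described above.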

“The NVIDIA BlueField-4 STX rack-scale context memory storage system will enable a critical performance boost needed to exponentially scale our agentic AI efforts,” said Timothée Lacroix, cofounder and chief technology officer of Mistral AI. “By delivering a new storage tier purpose-built for AI agents' memory, STX is ideally positioned to ensure that our models can maintain coherence and speed when reasoning across massive datasets.”

NVIDIA Spectrum-6 SPX Ethernet Rack
Spectrum-6 SPX Ethernet is engineered to accelerate east-west traffic across AI factories. Configurable with either Spectrum-X Ethernet or NVIDIA Quantum-X800 InfiniBand switches, it delivers low-latency, high-throughput rack-to-rack connectivity at scale.

Spectrum-X Ethernet Photonics with co-packaged optics achieves up to 5x greater optical power efficiency and 10x higher resiliency compared with traditional pluggable transceivers.

NVIDIA, along with over 200 data center infrastructure partners, announced the NVIDIA DSX platform for Vera Rubin. This includes DSX Max-Q to enable dynamic power provisioning across the entire AI factory, resulting in the deployment of 30% more AI infrastructure within a fixed-power data center. The new DSX Flex software enables AI factories to be grid-flexible assets, unlocking 100 gigawatts of stranded grid power.

NVIDIA also today released the Vera Rubin DSX AI Factory reference design, a blueprint for codesigned AI infrastructure that maximizes tokens per watt and overall goodput, improving system resiliency and accelerating time to first production.

By tightly integrating compute, networking, storage, power and cooling, the architecture increases energy efficiency and ensures AI factories can scale reliably under continuous, high-intensity workloads with maximum uptime.

Vera Rubin-based products will be available from partners starting in the second half of this year. These include leading cloud providers Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with NVIDIA Cloud Partners CoreWeave, Crusoe, Lambda, Nebius, Nscale and Together AI.

Global system manufacturers Cisco, Dell Technologies, HPE, Lenovo and Supermicro, as well as Aivres, ASUS, Foxconn, GIGABYTE, Inventec, Pegatron, Quanta Cloud Technology (QCT), Wistron and Wiwynn, are expected to deliver a wide range of servers based on Vera Rubin products.

AI labs and frontier model developers including Anthropic, Meta, Mistral AI and OpenAI are looking to use the NVIDIA Vera Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.


David Manners

David Manners has more than forty years' experience writing about the electronics industry, its major trends and leading players. As well as writing business, components and research news, he is the author of the site's most popular blog, Mannerisms, which features series of posts such as Fables, Markets, Shenanigans and Memory Lanes across a wide range of topics.
