NVIDIA GB200 NVL2

Bringing the new era of computing to every data center.

Unparalleled Single-Server Performance

The NVIDIA GB200 NVL2 platform brings the new era of computing to every data center, delivering unparalleled performance for mainstream large language model (LLM) inference, vector database search, and data processing through 2 Blackwell GPUs and 2 Grace GPUs. With its scale-out, single node NVIDIA MGX™ architecture, its design enables a wide variety of system designs and networking options to seamlessly integrate accelerated computing into existing data center infrastructure.

Highlights

Turbocharging Accelerated Computing

Llama 3 Inference

5Xvs. NVIDIA H100 Tensor Core GPU

Vector Database Search

9Xvs. H100

Data Processing

18Xvs. CPU

Llama3 LLM inference: Token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 2s, input sequence length = 2.048, output sequence length = 128 output, 8x NVIDIA HGX™ H100 air-cooled vs. GB200 NVL2 air-cooled single node, per-GPU performance comparison
Vector database search performance within RAG pipeline using memory shared by NVIDIA Grace CPU and Blackwell GPU. 1x x86, 1x H100 GPU, and 1x GPU from GB200 NVL2 node.
Data processing: A database join and aggregation workload with Snappy/Deflate compression derived from TPC-H Q4 query. Custom query implementations for x86, H100 single GU, and single GPU from GB200 NVL2 node: GB200 vs. Intel Xeon 8480+
Projected performance subject to change.

Real-Time Mainstream LLM Inference

GB200 NVL2 introduces massive coherent memory up to 1.3 terabytes (TB) shared between two Grace CPUs and two Blackwell GPUs. This shared memory is coupled with fifth-generation NVIDIA® NVLink™ and high-speed, chip-to-chip (C2C) connections to deliver 5X faster real-time LLM inference performance for mainstream language models such as Llama 3 70B.

Vector Database Search

GB200 NLV2 accelerates RAG vector search operation by up to 9X. The vector database of the Wikipedia dataset is over 200 gigabytes (GB), and access to the Grace CPU’s 960GB of memory and 900GB/s high-speed C2C link supercharges low-latency vector search.

Data Processing

Databases play critical roles in handling, processing, and analyzing large volumes of data for enterprises. GB200 NVL2 takes advantage of high-bandwidth memory performance, NVLink-C2C, and dedicated decompression engines in the NVIDIA Blackwell architecture to speed up key database queries by 18X compared to CPU.

Features

Technological Breakthroughs

Specifications

GB200 NVL2¹ Specs

Configuration	2x Grace CPUs, 2x Blackwell GPUs
FP4 Tensor Core²	40 PFLOPS
FP8/FP6 Tensor Core²	20 PFLOPS
INT8 Tensor Core²	20 POPS
FP16/BF16 Tensor Core²	10 PFLOPS
TF32 Tensor Core²	5 PFLOPS
FP32	180 TFLOPS
FP64/FP64 Tensor Core	90 TFLOPS
GPU Memory \| Bandwidth	Up to 384GB \| 16TB/s
CPU Core Count	144 Arm® Neoverse V2 cores
LPDDR5X Memory \| Bandwidth	Up to 960GB \| Up to 1,024GB/s
Interconnect	NVLink: 1.8TB/s NVLink-C2C: 2x 900GB/s PCIe Gen6: 2x 256GB/s
Server Options	Various NVIDIA GB200 NVL2 configuration options using NVIDIA MGX
¹ Preliminary specifications. May be subject to change. ² With sparsity.

NVIDIA GB200 NVL2

Unparalleled Single-Server Performance

Turbocharging Accelerated Computing

Llama 3 Inference

Vector Database Search

Data Processing

Real-Time Mainstream LLM Inference

Vector Database Search

Data Processing

Technological Breakthroughs

Blackwell Architecture

NVIDIA Grace CPU

NVLINK C2C

Key Value (KV) Caching

Fifth-Generation NVIDIA NVLink

NVIDIA Networking

GB200 NVL2¹ Specs

NVIDIA GB200 NVL72

Broadway Store

Valencia Store

Emeryville Store

Alameda Store