![](https://catalogone.com/wp-content/uploads/2024/06/gb-nvl72-kv-bm-l580-d-副本.jpg)
NVIDIA GB200 NVL72
Powering the new era of computing.
The NVIDIA GB200 NVL72 is an exascale computer in a single rack. With 36 GB200 Grace Blackwell Superchips interconnected by the largest NVIDIA® NVLink® domain ever offered, the NVLink Switch System provides 130 terabytes per second (TB/s) of low-latency GPU communication for AI and high-performance computing (HPC) workloads.
Unlocking Real-Time Trillion-Parameter Models
The GB200 NVL72 is a liquid-cooled, rack-scale solution that connects 36 Grace CPUs and 72 Blackwell GPUs. Its 72-GPU NVLink domain acts as a single massive GPU and delivers 30X faster real-time trillion-parameter LLM inference.
The GB200 Grace Blackwell Superchip is the key building block of the NVIDIA GB200 NVL72, connecting two high-performance NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace CPU over the NVIDIA® NVLink®-C2C interconnect.
Supercharging Next-Generation AI and Accelerated Computing
LLM inference and energy efficiency: token-to-token latency (TTL) = 50 milliseconds (ms) real time, first-token latency (FTL) = 5 seconds (s), 32,768 input / 1,024 output tokens; NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. Training: 1.8T-parameter MoE model, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB. Cluster size: 32,768.
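As a back-of-the-envelope reading of these latency targets (an assumption about how TTL and FTL combine, not a published NVIDIA figure), the end-to-end time for one 1,024-token response works out as:

```python
# Benchmark latency targets from the note above.
ftl_s = 5.0        # first-token latency: 5 s
ttl_s = 0.050      # token-to-token latency: 50 ms
output_tokens = 1024

# Assumed model: the first token arrives after FTL, and each of the
# remaining tokens takes one TTL.
total_s = ftl_s + (output_tokens - 1) * ttl_s
print(f"{total_s:.2f} s per 1,024-token response")  # ~56.15 s
```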
A database join-and-aggregation workload with Snappy/Deflate compression, derived from the TPC-H Q4 query. Custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72, compared against Intel Xeon 8480+.
Projected performance subject to change.
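For readers unfamiliar with the workload: TPC-H Q4 joins the ORDERS and LINEITEM tables and counts orders, by priority, that had at least one line item received after its committed date. A rough illustrative sketch in Python with pandas (hypothetical miniature data, not NVIDIA's implementation):

```python
import pandas as pd

# Hypothetical miniature tables standing in for TPC-H ORDERS and LINEITEM.
orders = pd.DataFrame({
    "o_orderkey": [1, 2, 3, 4],
    "o_orderdate": pd.to_datetime(
        ["1993-07-01", "1993-08-15", "1993-09-30", "1994-01-10"]),
    "o_orderpriority": ["1-URGENT", "2-HIGH", "1-URGENT", "3-MEDIUM"],
})
lineitem = pd.DataFrame({
    "l_orderkey":    [1, 1, 2, 3, 4],
    "l_commitdate":  pd.to_datetime(
        ["1993-07-10", "1993-07-20", "1993-08-20", "1993-10-05", "1994-01-15"]),
    "l_receiptdate": pd.to_datetime(
        ["1993-07-15", "1993-07-18", "1993-08-25", "1993-10-01", "1994-01-20"]),
})

# Orders placed in one quarter...
start, end = pd.Timestamp("1993-07-01"), pd.Timestamp("1993-10-01")
window = orders[(orders.o_orderdate >= start) & (orders.o_orderdate < end)]

# ...that have at least one late line item (received after commit date),
late_keys = lineitem.loc[
    lineitem.l_receiptdate > lineitem.l_commitdate, "l_orderkey"].unique()

# ...counted per order priority (the join + aggregation).
result = (window[window.o_orderkey.isin(late_keys)]
          .groupby("o_orderpriority").size()
          .rename("order_count").reset_index())
print(result)
```

The benchmarked implementations run this join-and-aggregation pattern over far larger, compressed data; the sketch only shows the query's shape.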
![](https://catalogone.com/wp-content/uploads/2024/06/hopper-grace-gtc24-blackwell-gpt-moe-1-8t-real-time-throughput.png)
Real-Time LLM Inference
![](https://catalogone.com/wp-content/uploads/2024/06/hopper-grace-gtc24-blackwell-gpt-moe-1-8t-model-training-speedup.png)
Massive-Scale Training
![](https://catalogone.com/wp-content/uploads/2024/06/hopper-grace-gtc24-blackwell-energy-efficiency-tech-blog.png)
Energy-Efficient Infrastructure
![](https://catalogone.com/wp-content/uploads/2024/06/hopper-grace-gtc24-blackwell-join-query-tech-blog.png)
Data Processing
Features
Technological Breakthroughs
![](https://catalogone.com/wp-content/uploads/2024/06/blackwell-ai-superchip-icon.png)
Blackwell Architecture
The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.
![](https://catalogone.com/wp-content/uploads/2024/06/m48-gpu-chip-text.png)
NVIDIA Grace CPU
The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2X the energy efficiency of today’s leading server processors.
![](https://catalogone.com/wp-content/uploads/2024/06/m48-nvswitch-1.png)
Fifth-Generation NVIDIA NVLink
Unlocking the full potential of exascale computing and trillion-parameter AI models requires swift, seamless communication between every GPU in a server cluster. The fifth generation of NVLink is a scale-up interconnect that unleashes accelerated performance for trillion- and multi-trillion-parameter AI models.
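The rack-level 130 TB/s figure quoted earlier follows directly from the per-GPU number: fifth-generation NVLink provides 1.8 TB/s of bandwidth per Blackwell GPU, so across the 72-GPU domain:

```python
# Fifth-generation NVLink: 1.8 TB/s per Blackwell GPU.
per_gpu_tbps = 1.8
gpus = 72

total_tbps = per_gpu_tbps * gpus
print(f"{total_tbps:.1f} TB/s")  # 129.6 TB/s, marketed as 130 TB/s
```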
![](https://catalogone.com/wp-content/uploads/2024/06/m48-networking-ethernet-2u-switch.png)
NVIDIA Networking
The data center’s network plays a crucial role in driving AI advancements and performance, serving as the backbone for distributed AI model training and generative AI performance. NVIDIA Quantum-X800 InfiniBand, NVIDIA Spectrum™-X800 Ethernet, and NVIDIA BlueField®-3 DPUs enable efficient scalability across hundreds and thousands of Blackwell GPUs for optimal application performance.
Specifications
GB200 NVL72 Specs¹
| | GB200 NVL72 | GB200 Grace Blackwell Superchip |
| --- | --- | --- |
| Configuration | 36 Grace CPUs : 72 Blackwell GPUs | 1 Grace CPU : 2 Blackwell GPUs |
| FP4 Tensor Core² | 1,440 PFLOPS | 40 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core | 180 PFLOPS | 5 PFLOPS |
| FP32 | 6,480 TFLOPS | 180 TFLOPS |
| FP64 | 3,240 TFLOPS | 90 TFLOPS |
| FP64 Tensor Core | 3,240 TFLOPS | 90 TFLOPS |
| GPU Memory \| Bandwidth | Up to 13.5 TB HBM3e \| 576 TB/s | Up to 384 GB HBM3e \| 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| CPU Memory \| Bandwidth | Up to 17 TB LPDDR5X \| Up to 18.4 TB/s | Up to 480 GB LPDDR5X \| Up to 512 GB/s |

1. Preliminary specifications. May be subject to change.
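As a quick consistency check on the table (using only numbers from the table itself): the rack-scale tensor-core figures are the per-superchip figures scaled by the 36 superchips in an NVL72 rack.

```python
# Per-superchip tensor-core throughput from the table (PFLOPS).
superchip_pflops = {"FP4": 40, "FP8/FP6": 20, "FP16/BF16": 10, "TF32": 5}

# 36 superchips per GB200 NVL72 rack.
rack_pflops = {fmt: p * 36 for fmt, p in superchip_pflops.items()}
print(rack_pflops)
# {'FP4': 1440, 'FP8/FP6': 720, 'FP16/BF16': 360, 'TF32': 180}

# CPU cores scale the same way: 72 Neoverse V2 cores x 36 = 2,592.
print(72 * 36)  # 2592
```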