NVIDIA HGX AI Supercomputer
The world’s leading AI computing platform.
Purpose-Built for AI and HPC
AI, complex simulations, and massive datasets require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform brings together the full power of NVIDIA GPUs, NVLink®, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights.
Unmatched End-to-End Accelerated Computing Platform
The NVIDIA HGX B200 and HGX B100 integrate NVIDIA Blackwell Tensor Core GPUs with high-speed interconnects to propel the data center into a new era of accelerated computing and generative AI. As a premier accelerated scale-up platform with up to 15X more inference performance than the previous generation, Blackwell-based HGX systems are designed for the most demanding generative AI, data analytics, and HPC workloads.
NVIDIA HGX includes advanced networking options—at speeds up to 400 gigabits per second (Gb/s)—using NVIDIA Quantum-2 InfiniBand and Spectrum™-X Ethernet for the highest AI performance. HGX also includes NVIDIA® BlueField®-3 data processing units (DPUs) to enable cloud networking, composable storage, zero-trust security, and GPU compute elasticity in hyperscale AI clouds.
Deep Learning Inference: Performance and Versatility
Real-Time Inference for the Next Generation of Large Language Models
Projected performance subject to change. Token-to-token latency (TTL) = 50 milliseconds (ms) real time, first token latency (FTL) = 5s, input sequence length = 32,768, output sequence length = 1,028, 8x eight-way NVIDIA HGX™ H100 GPUs air-cooled vs. 1x eight-way HGX B200 air-cooled, per GPU performance comparison.
HGX B200 achieves up to 15X higher inference performance over the previous NVIDIA Hopper™ generation for massive models such as GPT-MoE-1.8T. The second-generation Transformer Engine uses custom Blackwell Tensor Core technology combined with TensorRT™-LLM and NeMo™ Framework innovations to accelerate inference for large language models (LLMs) and Mixture-of-Experts (MoE) models.
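To give a sense of how this software stack is driven in practice, below is a minimal sketch of LLM inference using the TensorRT-LLM high-level Python API. The model ID, prompt, and sampling settings are illustrative placeholders, and the exact API surface can vary between TensorRT-LLM releases.

```python
# Minimal sketch: LLM inference with the TensorRT-LLM high-level API.
# Model ID and sampling settings are illustrative placeholders; the exact
# API surface may differ across TensorRT-LLM releases.
from tensorrt_llm import LLM, SamplingParams

prompts = ["Summarize the benefits of NVLink in one sentence."]
sampling = SamplingParams(max_tokens=128, temperature=0.7)

# Builds (or loads) a TensorRT engine for the model, then runs batched
# generation on the available GPUs.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # hypothetical model ID
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```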
Deep Learning Training: Performance and Scalability
Next-Level Training Performance
The second-generation Transformer Engine, featuring 8-bit floating point (FP8) and new precisions, enables a remarkable 3X faster training for large language models like GPT-MoE-1.8T. This breakthrough is complemented by fifth-generation NVLink with 1.8TB/s of GPU-to-GPU interconnect, InfiniBand networking, and NVIDIA Magnum IO™ software. Together, these ensure efficient scalability for enterprises and extensive GPU computing clusters.
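To make the FP8 claim concrete, here is a minimal sketch of FP8 training using NVIDIA Transformer Engine's PyTorch API, the library that exposes Transformer Engine features to frameworks. The layer sizes and recipe settings are illustrative, and defaults may change between releases.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Hybrid FP8 recipe: E4M3 for forward activations/weights, E5M2 for
# gradients, with delayed (history-based) scaling. Values are illustrative.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-3)
x = torch.randn(64, 4096, device="cuda")

# Matrix multiplies inside the context run on FP8 Tensor Cores; master
# weights and optimizer state remain in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

loss = y.float().sum()
loss.backward()
optimizer.step()
```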
Accelerating HGX With NVIDIA Networking
The data center is the new unit of computing, and networking plays an integral role in scaling application performance across it. Paired with NVIDIA Quantum InfiniBand, HGX delivers world-class performance and efficiency, which ensures the full utilization of computing resources.
For AI cloud data centers that deploy Ethernet, HGX is best used with the NVIDIA Spectrum-X networking platform, which powers the highest AI performance over Ethernet. It features NVIDIA Spectrum™-X switches and BlueField-3 DPUs to deliver consistent, predictable outcomes for thousands of simultaneous AI jobs at every scale through optimal resource utilization and performance isolation. Spectrum-X also enables advanced cloud multi-tenancy and zero-trust security. As a reference design, NVIDIA built Israel-1, a hyperscale generative AI supercomputer, with Dell PowerEdge XE9680 servers based on the NVIDIA HGX eight-GPU platform, BlueField-3 DPUs, and Spectrum-X switches.
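As an illustration of what scaling across the data center means at the software level, the sketch below runs a multi-node all-reduce with PyTorch's NCCL backend; NCCL selects InfiniBand or RoCE transports automatically when they are available. The hostnames, sizes, and launch flags are placeholders.

```python
import os
import torch
import torch.distributed as dist

# Minimal multi-node all-reduce sketch using the NCCL backend.
# Launch with torchrun (hostname/port are placeholders), e.g.:
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=head-node:29500 allreduce.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# 1 GiB of FP16 "gradients" per rank; all-reduce is bandwidth-bound, so it
# stresses NVLink within a node and the network fabric across nodes.
grads = torch.ones(512 * 1024 * 1024, dtype=torch.float16, device="cuda")
dist.all_reduce(grads, op=dist.ReduceOp.SUM)
torch.cuda.synchronize()
dist.destroy_process_group()
```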
Connecting HGX With NVIDIA Networking
| Workload | NVIDIA Quantum-2 InfiniBand Platform: Quantum-2 Switch, ConnectX-7 Adapter, BlueField-3 DPU | NVIDIA Spectrum-X Platform: Spectrum-4 Switch, BlueField-3 SuperNIC | NVIDIA Spectrum Ethernet Platform: Spectrum Switch, ConnectX Adapter, BlueField DPU |
|---|---|---|---|
| Deep Learning Training | Best | Better | Good |
| Scientific Simulation | Best | Better | Good |
| Data Analytics | Best | Better | Good |
| Deep Learning Inference | Best | Better | Good |
NVIDIA HGX Specifications
NVIDIA HGX is available in single baseboards with four or eight NVIDIA H200 or H100 GPUs, or with eight NVIDIA Blackwell GPUs. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.
| | HGX B200 | HGX B100 |
|---|---|---|
| GPUs | HGX B200 8-GPU | HGX B100 8-GPU |
| Form factor | 8x NVIDIA B200 SXM | 8x NVIDIA B100 SXM |
| FP4 Tensor Core | 144 PFLOPS | 112 PFLOPS |
| FP8/FP6 Tensor Core | 72 PFLOPS | 56 PFLOPS |
| INT8 Tensor Core | 72 POPS | 56 POPS |
| FP16/BF16 Tensor Core | 36 PFLOPS | 28 PFLOPS |
| TF32 Tensor Core | 18 PFLOPS | 14 PFLOPS |
| FP32 | 640 TFLOPS | 480 TFLOPS |
| FP64 | 320 TFLOPS | 240 TFLOPS |
| FP64 Tensor Core | 320 TFLOPS | 240 TFLOPS |
| Memory | Up to 1.5TB | Up to 1.5TB |
| NVIDIA NVLink | Fifth generation | Fifth generation |
| NVIDIA NVSwitch™ | Fourth generation | Fourth generation |
| NVSwitch GPU-to-GPU bandwidth | 1.8TB/s | 1.8TB/s |
| Total aggregate bandwidth | 14.4TB/s | 14.4TB/s |
HGX H200

| | 4-GPU | 8-GPU |
|---|---|---|
| GPUs | HGX H200 4-GPU | HGX H200 8-GPU |
| Form factor | 4x NVIDIA H200 SXM | 8x NVIDIA H200 SXM |
| FP8 Tensor Core | 16 PFLOPS | 32 PFLOPS |
| INT8 Tensor Core | 16 POPS | 32 POPS |
| FP16/BFLOAT16 Tensor Core | 8 PFLOPS | 16 PFLOPS |
| TF32 Tensor Core | 4 PFLOPS | 8 PFLOPS |
| FP32 | 270 TFLOPS | 540 TFLOPS |
| FP64 | 140 TFLOPS | 270 TFLOPS |
| FP64 Tensor Core | 270 TFLOPS | 540 TFLOPS |
| Memory | Up to 564GB | Up to 1.1TB |
| NVLink | Fourth generation | Fourth generation |
| NVSwitch | N/A | Third generation |
| NVSwitch GPU-to-GPU bandwidth | N/A | 900GB/s |
| Total aggregate bandwidth | 3.6TB/s | 7.2TB/s |
HGX H100

| | 4-GPU | 8-GPU |
|---|---|---|
| GPUs | HGX H100 4-GPU | HGX H100 8-GPU |
| Form factor | 4x NVIDIA H100 SXM | 8x NVIDIA H100 SXM |
| HPC and AI compute (FP64/TF32/FP16/FP8/INT8)* | 268TF/4PF/8PF/16PF/16 POPS | 535TF/8PF/16PF/32PF/32 POPS |
| FP8 Tensor Core | 16 PFLOPS | 32 PFLOPS |
| INT8 Tensor Core | 16 POPS | 32 POPS |
| FP16/BFLOAT16 Tensor Core | 8 PFLOPS | 16 PFLOPS |
| TF32 Tensor Core | 4 PFLOPS | 8 PFLOPS |
| FP32 | 270 TFLOPS | 540 TFLOPS |
| FP64 | 140 TFLOPS | 270 TFLOPS |
| FP64 Tensor Core | 270 TFLOPS | 540 TFLOPS |
| Memory | Up to 320GB | Up to 640GB |
| NVLink | Fourth generation | Fourth generation |
| NVSwitch | N/A | Third generation |
| NVLink Switch | N/A | N/A |
| NVSwitch GPU-to-GPU bandwidth | N/A | 900GB/s |
| Total aggregate bandwidth | 3.6TB/s | 7.2TB/s |
* With sparsity