NVIDIA H200 Tensor Core GPU

Supercharging AI and HPC workloads.

The GPU for Generative AI and HPC

The NVIDIA H200 Tensor Core GPU supercharges generative AI and high-performance computing (HPC) workloads with game-changing performance and memory capabilities. As the first GPU with HBM3e, the H200’s larger and faster memory fuels the acceleration of generative AI and large language models (LLMs) while advancing scientific computing for HPC workloads.

Highlights

Experience Next-Level Performance

Llama2 70B Inference: 1.9X Faster

GPT-3 175B Inference: 1.6X Faster

High-Performance Computing: 110X Faster

Benefits

Higher Performance With Larger, Faster Memory

Based on the NVIDIA Hopper™ architecture, the NVIDIA H200 is the first GPU to offer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s). That is nearly double the capacity of the NVIDIA H100 Tensor Core GPU, with 1.4X more memory bandwidth. The H200’s larger and faster memory accelerates generative AI and LLMs, while advancing scientific computing for HPC workloads with better energy efficiency and lower total cost of ownership.
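The "nearly double" and "1.4X" comparisons can be checked with simple arithmetic. The sketch below uses the H200 figures from the paragraph above; the H100 SXM baseline of 80 GB at 3.35 TB/s is an assumption, since this page does not state it.

```python
# H200 figures come from the text above. The H100 SXM baseline values
# are assumptions; this page does not state them.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBS = 4.8
H100_MEMORY_GB = 80          # assumed H100 SXM capacity
H100_BANDWIDTH_TBS = 3.35    # assumed H100 SXM bandwidth


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


capacity_ratio = ratio(H200_MEMORY_GB, H100_MEMORY_GB)
bandwidth_ratio = ratio(H200_BANDWIDTH_TBS, H100_BANDWIDTH_TBS)

print(f"Capacity:  {capacity_ratio:.2f}x")   # 1.76x
print(f"Bandwidth: {bandwidth_ratio:.2f}x")  # 1.43x
```

Under these assumed baselines, the capacity ratio comes out a little under 2 and the bandwidth ratio rounds to 1.4, which matches the wording above.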

H200 VS H100

Preliminary specifications. May be subject to change.
Llama2 13B: ISL 128, OSL 2K | Throughput | H100 SXM 1x GPU BS 64 | H200 SXM 1x GPU BS 128
GPT-3 175B: ISL 80, OSL 200 | x8 H100 SXM GPUs BS 64 | x8 H200 SXM GPUs BS 128
Llama2 70B: ISL 2K, OSL 128 | Throughput | H100 SXM 1x GPU BS 8 | H200 SXM 1x GPU BS 32

Unlock Insights With High-Performance LLM Inference

In the ever-evolving landscape of AI, businesses rely on LLMs to address a diverse range of inference needs. An AI inference accelerator must deliver the highest throughput at the lowest TCO when deployed at scale for a massive user base.

The H200 boosts inference speed by up to 2X compared to H100 GPUs when handling LLMs like Llama2.

Supercharge High-Performance Computing

Memory bandwidth is crucial for HPC applications because it enables faster data transfer and reduces complex processing bottlenecks. For memory-intensive HPC applications like simulations, scientific research, and artificial intelligence, the H200’s higher memory bandwidth ensures that data can be accessed and manipulated efficiently, leading to up to 110X faster time to results compared to CPUs.
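To make the role of bandwidth concrete: for a step that is limited purely by memory traffic, runtime cannot be shorter than the bytes moved divided by the memory bandwidth. The sketch below applies that bound to the H200's 4.8 TB/s. The 100 GB working set and the function name are hypothetical, and real applications rarely sustain peak bandwidth, so treat the result as a floor rather than a prediction. It does not reproduce the 110X figure, which compares against CPUs.

```python
def min_transfer_time_s(bytes_moved: float, bandwidth_tb_per_s: float) -> float:
    """Lower bound on runtime for a purely bandwidth-bound pass over memory."""
    if bandwidth_tb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return bytes_moved / (bandwidth_tb_per_s * 1e12)


H200_BANDWIDTH_TBS = 4.8      # from the text above
WORKING_SET_BYTES = 100e9     # hypothetical 100 GB working set

t = min_transfer_time_s(WORKING_SET_BYTES, H200_BANDWIDTH_TBS)
print(f"One full pass over 100 GB takes at least {t * 1e3:.1f} ms")  # 20.8 ms
```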

Preliminary specifications. May be subject to change.
HPC MILC: dataset NERSC Apex Medium | HGX H200 4-GPU | dual Sapphire Rapids 8480
HPC Apps: CP2K: dataset H2O-32-RI-dRPA-96points | GROMACS: dataset STMV | ICON: dataset r2b5 | MILC: dataset NERSC Apex Medium | Chroma: dataset HMC Medium | Quantum Espresso: dataset AUSURF112 | 1x H100 SXM | 1x H200 SXM

Preliminary specifications. May be subject to change. Llama2 70B: ISL 2K, OSL 128 | Throughput | H100 SXM 1x GPU BS 8 | H200 SXM 1x GPU BS 32

Reduce Energy and TCO

With the introduction of the H200, energy efficiency and TCO reach new levels. This cutting-edge technology offers unparalleled performance, all within the same power profile as the H100. AI factories and supercomputing systems that are not only faster but also more eco-friendly deliver an economic edge that propels the AI and scientific communities forward.

Unleashing AI Acceleration for Mainstream Enterprise Servers With H200 NVL

The NVIDIA H200 NVL is the ideal choice for customers with space constraints within the data center, delivering acceleration for every AI and HPC workload regardless of size. With a 1.5X memory increase and a 1.2X bandwidth increase over the previous generation, customers can fine-tune LLMs within a few hours and experience LLM inference 1.8X faster.

Enterprise-Ready: AI Software Streamlines Development and Deployment

NVIDIA AI Enterprise, together with NVIDIA H200, simplifies the building of an AI-ready platform, accelerating the development and deployment of production-ready generative AI, computer vision, speech AI, and more. Together with NIM inference microservices, deployments have enterprise-grade security, manageability, stability, and support. The result is faster, actionable insights and tangible business value sooner.

Specifications

NVIDIA H200 Tensor Core GPU

 
Specification | H200 SXM¹ | H200 NVL¹
FP64 | 34 TFLOPS | 34 TFLOPS
FP64 Tensor Core | 67 TFLOPS | 67 TFLOPS
FP32 | 67 TFLOPS | 67 TFLOPS
TF32 Tensor Core | 989 TFLOPS² | 989 TFLOPS²
BFLOAT16 Tensor Core | 1,979 TFLOPS² | 1,979 TFLOPS²
FP16 Tensor Core | 1,979 TFLOPS² | 1,979 TFLOPS²
FP8 Tensor Core | 3,958 TFLOPS² | 3,958 TFLOPS²
INT8 Tensor Core | 3,958 TFLOPS² | 3,958 TFLOPS²
GPU Memory | 141GB | 141GB
GPU Memory Bandwidth | 4.8TB/s | 4.8TB/s
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG
Confidential Computing | Supported | Supported
Max Thermal Design Power (TDP) | Up to 700W (configurable) | Up to 600W (configurable)
Multi-Instance GPUs | Up to 7 MIGs @16.5GB each | Up to 7 MIGs @16.5GB each
Form Factor | SXM | PCIe
Interconnect | NVIDIA NVLink®: 900GB/s; PCIe Gen5: 128GB/s | 2- or 4-way NVIDIA NVLink bridge: 900GB/s; PCIe Gen5: 128GB/s
Server Options | NVIDIA HGX™ H200 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs | NVIDIA MGX™ H200 NVL partner and NVIDIA-Certified Systems with up to 8 GPUs
NVIDIA AI Enterprise | Add-on | Included
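
Two quantities can be derived from the specifications above with simple arithmetic: the total memory exposed when all seven MIG instances are in use, and the ratio of peak compute to memory bandwidth. The sketch below uses only values listed in the specifications, which are the same for both form factors. The helper name `flop_per_byte` is hypothetical, and because the ² footnote on the Tensor Core figures is not defined in this section, the FP16 number is taken as listed.

```python
# Values copied from the specifications above.
MEMORY_GB = 141
BANDWIDTH_TBS = 4.8
MIG_INSTANCES = 7
MIG_SLICE_GB = 16.5
PEAK_TFLOPS = {"FP64": 34, "FP16 Tensor Core": 1979}

# Memory covered when every MIG instance is allocated.
mig_total_gb = MIG_INSTANCES * MIG_SLICE_GB


def flop_per_byte(peak_tflops: float, bandwidth_tbs: float) -> float:
    """Peak FLOPs available per byte of memory traffic at the listed rates."""
    return peak_tflops / bandwidth_tbs


print(f"MIG total: {mig_total_gb} GB of {MEMORY_GB} GB")  # 115.5 GB of 141 GB
for name, tflops in PEAK_TFLOPS.items():
    print(f"{name}: {flop_per_byte(tflops, BANDWIDTH_TBS):.1f} FLOP/byte")
```

A kernel that performs fewer operations per byte than this ratio is limited by memory bandwidth rather than compute at the listed peak rates, which is one way to read the emphasis on bandwidth elsewhere on this page.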