Published at 2023-10-25 | Last Update 2023-10-25
This post provides a concise reference for the performance of popular GPU models from NVIDIA and Huawei/HiSilicon, primarily intended for personal use.
The first letter in a GPU model name denotes its architecture:

- T for Turing;
- A for Ampere;
- V for Volta;
- H for Hopper (2022);
- L for Ada Lovelace.

| | T4 | A10 | A10G | A30 | V100 PCIe/SXM2 |
---|---|---|---|---|---|
Designed for | Data center workloads | (Desktop) Graphics-intensive workloads | Desktop | Desktop | Data center |
Year | 2018 | 2020 | 2017 | ||
Manufacturing | 12nm | 12nm | 12nm | ||
Architecture | Turing | Ampere | Ampere | Ampere | Volta |
Max Power | 70 watts | 150 watts | 165 watts | 250/300 watts | |
GPU Mem | 16GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB HBM2 | 16/32GB HBM2 |
GPU Mem BW | 400 GB/s | 600 GB/s | 933 GB/s | 900 GB/s | |
Interconnect | PCIe Gen3 32 GB/s | PCIe Gen4 64 GB/s | PCIe Gen4 64 GB/s, NVLINK 200 GB/s | PCIe Gen3 32 GB/s, NVLINK 300 GB/s | |
FP32 | 8.1 TFLOPS | 31.2 TFLOPS | 10.3 TFLOPS | 14/15.7 TFLOPS | |
BFLOAT16 TensorCore | 125 TFLOPS | 165 TFLOPS | |||
FP16 TensorCore | 125 TFLOPS | 165 TFLOPS | |||
INT8 TensorCore | 250 TOPS | 330 TOPS | |||
INT4 TensorCore | 661 TOPS |
Datasheets:
| | A800 (PCIe/SXM) | A100 (PCIe/SXM) | Huawei Ascend 910B | H800 (PCIe/SXM) | H100 (PCIe/SXM) |
---|---|---|---|---|---|
Year | 2022 | 2020 | 2023 | 2022 | 2022 |
Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm |
Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper |
Max Power | 300/400 watts | 300/400 watts | 400 watts | 350/700 watts | |
GPU Mem | 80GB HBM2e | 80GB HBM2e | 64GB HBM2e | 80GB HBM3 | 80GB HBM3 |
GPU Mem BW | 1935/2039 GB/s | 2/3.35 TB/s | |||
Interconnect | NVLINK 400GB/s | PCIe Gen4 64GB/s, NVLINK 600GB/s | HCCS 392GB/s | NVLINK 400GB/s | PCIe Gen5 128GB/s, NVLINK 900GB/s |
FP32 | 19.5 TFLOPS | 51/67 TFLOPS | |||
TF32 (TensorFloat) | 156/312 TFLOPS | 756/989 TFLOPS | |||
BFLOAT16 TensorCore | 156/312 TFLOPS | ||||
FP16 TensorCore | 312/624 TFLOPS | 320 TFLOPS | 1513/1979 TFLOPS | ||
FP8 TensorCore | Not supported | Not supported | 3026/3958 TFLOPS | ||
INT8 TensorCore | 624/1248 TOPS | 640 TOPS | 3026/3958 TOPS |
H100 vs. A100 in a nutshell: 3x the performance at 2x the price.
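
The "3x performance" figure can be sanity-checked against the table with a few lines of Python. This is only a rough sketch under stated assumptions: the numbers are copied from the table above, the larger value of each `x/y` pair is used for both cards, and `PRICE_RATIO` is simply the post's "2x price" figure rather than a real market quote. The dictionary keys and the `speedups` helper are names made up for this illustration.

```python
# Peak figures copied from the table above.
# Assumption: for each "x/y" entry the larger number is taken for both cards.
a100 = {
    "fp32_tflops": 19.5,
    "tf32_tflops": 312,
    "fp16_tflops": 624,
    "mem_bw_gbps": 2039,
}
h100 = {
    "fp32_tflops": 67,
    "tf32_tflops": 989,
    "fp16_tflops": 1979,
    "mem_bw_gbps": 3350,  # 3.35 TB/s expressed in GB/s
}

# Assumption: the "2x price" rule of thumb from the post, not a quoted price.
PRICE_RATIO = 2.0


def speedups(new, old):
    """Return the per-metric ratio new/old for every metric in `old`."""
    return {metric: new[metric] / old[metric] for metric in old}


ratios = speedups(h100, a100)
for metric, ratio in ratios.items():
    # Second number: speedup normalized by the assumed price ratio.
    print(f"{metric}: {ratio:.2f}x raw, {ratio / PRICE_RATIO:.2f}x per unit price")
```

With these inputs the tensor-core ratios (TF32, FP16) land a little above 3x, while memory bandwidth improves by only about 1.6x, so the "3x" summary mainly reflects peak compute rather than every metric.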
Datasheets: