Float FMA vs Integer DP4A & DPX Instructions #35

Open · wants to merge 7 commits into main
Conversation

@ashvardanian (Owner) commented on Feb 12, 2025

CUDA natively supports fused multiply-add (FMA) operations for every floating-point type, including f16 and bf16. It also provides DP4A instructions for 8-bit integer dot products with 32-bit accumulation, and the umul24 instruction for 24-bit integer multiplication. Starting with Hopper, Dynamic Programming eXtensions (DPX) were added for combinatorial problems; they can also be used to implement Algebraic Graph Theory algorithms as matrix multiplications over alternative semirings.
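
For context, here is a minimal sketch of what exercising those instruction families from device code can look like. This is not the benchmark kernel itself: the kernel name, constants, and loop are illustrative, the packed f16 math assumes cuda_fp16.h, and the DPX line assumes CUDA 12+ compiled for sm_90 or newer.

```cuda
#include <cuda_fp16.h>    // __half2, __hfma2
#include <cuda_runtime.h>

// Illustrative kernel touching the instruction families discussed above.
__global__ void throughput_probe(int iterations, float *f_out, int *i_out) {
    float f = threadIdx.x * 0.5f;
    __half2 h = __float2half2_rn(0.5f);
    int a = (int)threadIdx.x | 0x01010101, dot = 0, path = 0;
    unsigned u = threadIdx.x + 1u;
    for (int i = 0; i < iterations; ++i) {
        f = fmaf(f, 1.000001f, 0.5f);              // f32 fused multiply-add
        h = __hfma2(h, __float2half2_rn(1.0f), h); // packed f16 FMA, 2 lanes
        dot = __dp4a(a, a, dot);                   // 4x i8 dot product, i32 accumulator
        u = __umul24(u, 0x00FFFFFFu) + 1u;         // 24-bit integer multiply
#if __CUDA_ARCH__ >= 900
        path = __viaddmax_s32(path, a, dot);       // DPX: max(path + a, dot), Hopper+
#endif
    }
    // Write results back so the compiler can't optimize the loop away.
    f_out[threadIdx.x] = f + __low2float(h);
    i_out[threadIdx.x] = dot + path + (int)u;
}
```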

How do these instructions stack up, and how much throughput can we expect from recent state-of-the-art GPUs like the Nvidia H200? All numbers below are in tera-operations per second:

  • f64 FMA: 4.5 T
  • i64 FMA: 3.1 T
  • f32 FMA: 22 T
  • i32 FMA: 15.5 T ...so 32-bit ops should be preferred over their 64-bit counterparts
  • u8u32 DP4A: 39.3 T
  • u24u32 UMUL: 13.4 T ...not really better than i32 FMA
  • f16 FMA on Volta: 12.2 T
  • bf16 FMA on Ampere: 12.2 T
  • DPX for the Floyd-Warshall algorithm with u16 and u32 on Hopper: 11 T
  • DPX for the Needleman-Wunsch algorithm with i16 and i32 on Hopper: 11 T
  • DPX for the Smith-Waterman algorithm with i32 on Hopper: 27 T (see the sketch after this list)
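
The Smith-Waterman recurrence maps almost one-to-one onto a single DPX instruction, which is where the higher number comes from. Below is a hedged sketch of one cell update under that assumption; the function and parameter names are illustrative rather than taken from the benchmark code, and the intrinsic assumes CUDA 12+ targeting Hopper.

```cuda
#include <cuda_runtime.h>

// One Smith-Waterman cell: H[i][j] = max(0, diag + score, up - gap, left - gap).
__device__ int smith_waterman_cell(int diag, int up, int left, int score, int gap) {
#if __CUDA_ARCH__ >= 900
    // DPX: __vimax3_s32_relu(a, b, c) == max(a, b, c, 0) in a single instruction,
    // fusing the three-way max with the clamp at zero.
    return __vimax3_s32_relu(diag + score, up - gap, left - gap);
#else
    // Portable fallback for pre-Hopper architectures.
    return max(max(diag + score, max(up - gap, left - gap)), 0);
#endif
}
```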

Check the code and inline comments for more details!
