Paged attention implementation with prepare_model_inputs_hook #30

JRosenkranz · 2025-04-22T16:39:42Z

This PR adds the paged attention torch custom op for testing paged attention on CPU and AIU. It includes a prepare_model_inputs_hook that handles the paged attention metadata during generation.
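
For readers unfamiliar with the kind of metadata involved, here is a minimal, framework-free sketch of paged KV-cache bookkeeping. It is not the code in this PR: `PagedKVAllocator`, `prepare_inputs_sketch`, `BLOCK_SIZE`, and the returned keys (`slot_mapping`, `block_table`, `context_len`) are all hypothetical names chosen for illustration, and the real hook signature in fms may differ.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


@dataclass
class PagedKVAllocator:
    """Toy bookkeeping for a paged KV cache (illustrative only)."""
    num_blocks: int
    block_size: int = BLOCK_SIZE
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> physical block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored so far

    def __post_init__(self):
        # All physical blocks start out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_tokens(self, seq_id, num_new_tokens):
        """Reserve cache slots for new tokens and return their flat slot indices."""
        table = self.block_tables.setdefault(seq_id, [])
        start = self.seq_lens.get(seq_id, 0)
        slots = []
        for pos in range(start, start + num_new_tokens):
            logical_block, offset = divmod(pos, self.block_size)
            if logical_block == len(table):
                # The sequence has outgrown its blocks: grab another one.
                if not self.free_blocks:
                    raise RuntimeError("out of KV-cache blocks")
                table.append(self.free_blocks.pop(0))
            slots.append(table[logical_block] * self.block_size + offset)
        self.seq_lens[seq_id] = start + num_new_tokens
        return slots


def prepare_inputs_sketch(allocator, seq_id, num_new_tokens):
    """Hypothetical stand-in for a per-step hook: builds metadata for one sequence."""
    slot_mapping = allocator.append_tokens(seq_id, num_new_tokens)
    return {
        "slot_mapping": slot_mapping,                        # where new K/V entries are written
        "block_table": list(allocator.block_tables[seq_id]), # blocks attention should read
        "context_len": allocator.seq_lens[seq_id],           # total tokens cached so far
    }


alloc = PagedKVAllocator(num_blocks=8)
prefill = prepare_inputs_sketch(alloc, seq_id=0, num_new_tokens=5)  # 5-token prompt
decode = prepare_inputs_sketch(alloc, seq_id=0, num_new_tokens=1)   # one decode step
```

In this sketch the prefill step spans two blocks and the decode step reuses the second block, which is the per-step update a generation-time hook would need to perform before each forward pass.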

This PR depends on foundation-model-stack/foundation-model-stack#396

updated with paged attention implementation using the new custom_atte…

f57e2b8

…ntion_op param in fms Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>

JRosenkranz requested a review from ani300 April 22, 2025 16:39

JRosenkranz self-assigned this Apr 22, 2025

JRosenkranz marked this pull request as draft April 22, 2025 16:39
