[FMHA] Batch Prefill Support Improvements: Change KV Cache Layout & Large Page Size Support #3442

Jeff-Huang · 2025-12-16T14:02:49Z

Proposed changes

This PR introduces significant improvements to the Batch Prefill kernels in ck_tile: it enables codegen support, changes the KV cache layouts, and adds support for a page size of 1024.

Codegen Support:
- Enabled code generation for batch_prefill kernels.
- Updated fmha_batch_prefill.py to generate kernels for different page sizes, including 1024.

Pipeline & Problem Structure:
- Introduced the BlockFmhaBatchPrefillPipelineProblem structure for better management of batch prefill problems.
- Updated block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp to handle async operations and offsets correctly.
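To illustrate what generating kernels per page size can look like, here is a minimal sketch that enumerates one kernel variant per combination of data type, head dimension, and page size. The `KernelVariant` class, the naming scheme, and the dtype and head-dimension values are hypothetical and chosen only for illustration; the page sizes (1, 16, 1024) are the ones mentioned in this PR. The real generator in fmha_batch_prefill.py has its own types and naming.

```python
from dataclasses import dataclass
from itertools import product

# Page sizes mentioned in this PR.
PAGE_SIZES = (1, 16, 1024)


@dataclass(frozen=True)
class KernelVariant:
    """Hypothetical descriptor for one generated kernel instance."""
    dtype: str
    hdim: int
    page_size: int

    @property
    def name(self) -> str:
        # Illustrative naming scheme, not the one used by the real codegen.
        return f"fmha_batch_prefill_{self.dtype}_d{self.hdim}_ps{self.page_size}"


def enumerate_variants(dtypes, hdims, page_sizes=PAGE_SIZES):
    """Return one variant per (dtype, hdim, page_size) combination."""
    return [KernelVariant(d, h, p) for d, h, p in product(dtypes, hdims, page_sizes)]


# Illustrative inputs: two dtypes and one head dimension.
variants = enumerate_variants(["fp16", "bf16"], [128])
for v in variants:
    print(v.name)
```

Treating the page size as a compile-time parameter of each variant lets the kernel specialize its offset arithmetic, at the cost of generating more instances.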

KV Cache & Layout Enhancements:
- Added support for k cache layout: [num_blocks, num_kv_heads, head_size/8, block_size, 8].
- Added support for v cache layout: [num_blocks, num_kv_heads, block_size/8, head_size, 8].
- Added kv_last_page_lens to kernel arguments to support handling varying sequence lengths in paged attention.
- Refined kv_offset_array_transform to calculate k/v offset.
- Added support for large page size (1024).
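The two cache layouts above can be read as index maps from a logical element (block, head, token, dim) to a flat offset. The sketch below assumes that the K layout splits the head dimension into groups of 8 and that the V layout splits the token position within a block into groups of 8; this reading follows from the shapes listed above but is an assumption, as are the function and parameter names. It is not the kernel's implementation.

```python
VEC = 8  # innermost vector width taken from the layouts above


def k_offset(block, head, tok, dim, num_kv_heads, head_size, block_size):
    """Flat offset into a K cache shaped
    [num_blocks, num_kv_heads, head_size/8, block_size, 8]."""
    assert head_size % VEC == 0
    dim_hi, dim_lo = divmod(dim, VEC)  # assumed split of the head dimension
    idx = block
    idx = idx * num_kv_heads + head
    idx = idx * (head_size // VEC) + dim_hi
    idx = idx * block_size + tok
    idx = idx * VEC + dim_lo
    return idx


def v_offset(block, head, tok, dim, num_kv_heads, head_size, block_size):
    """Flat offset into a V cache shaped
    [num_blocks, num_kv_heads, block_size/8, head_size, 8]."""
    assert block_size % VEC == 0
    tok_hi, tok_lo = divmod(tok, VEC)  # assumed split of the in-block token index
    idx = block
    idx = idx * num_kv_heads + head
    idx = idx * (block_size // VEC) + tok_hi
    idx = idx * head_size + dim
    idx = idx * VEC + tok_lo
    return idx


# Small illustrative shape: 2 KV heads, head_size 16, block_size 16.
print(k_offset(0, 0, 0, 9, num_kv_heads=2, head_size=16, block_size=16))  # 129
print(v_offset(0, 0, 9, 3, num_kv_heads=2, head_size=16, block_size=16))  # 153
```

Under this reading, 8 consecutive head-dimension elements of one token are contiguous in K, while 8 consecutive tokens of one head-dimension element are contiguous in V.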

Bug Fixes:
- Fixed incorrect page ID calculation in kv_offset_array_transform specifically for gfx950.
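For readers unfamiliar with paged KV caches, the sketch below shows the general idea behind a page lookup: a logical token position is split into a logical page and an in-page offset, and a per-sequence table (the role kv_page_indices plays in this PR) maps the logical page to a physical page. It also shows how a last-page length (as in kv_last_page_lens) determines the sequence length. The function names and the per-sequence `page_table` list are hypothetical, and the sketch does not reproduce the gfx950 fix, whose details are not given here.

```python
def kv_seq_len(num_pages, last_page_len, page_size):
    """Number of tokens in a sequence that owns num_pages pages,
    where only the last page may be partially filled."""
    if num_pages == 0:
        return 0
    assert 1 <= last_page_len <= page_size
    return (num_pages - 1) * page_size + last_page_len


def locate_token(pos, page_table, page_size):
    """Map a logical token position to (physical_page_id, offset_in_page)."""
    logical_page, in_page = divmod(pos, page_size)
    return page_table[logical_page], in_page


def token_row(pos, page_table, page_size):
    """Row index of a token in a linear cache shaped
    [num_total_pages, page_block_size, hdim]."""
    page_id, in_page = locate_token(pos, page_table, page_size)
    return page_id * page_size + in_page


# One sequence owning physical pages 7, 2 and 5, with 3 tokens in its last page.
page_table = [7, 2, 5]
print(kv_seq_len(len(page_table), last_page_len=3, page_size=16))  # 35
print(locate_token(20, page_table, page_size=16))                  # (2, 4)
print(token_row(20, page_table, page_size=16))                     # 36
```

With a page size of 1024, a tile of keys or values is more likely to fit within a single page, which is consistent with the static assert mentioned in the commits that checks V offsets stay in the same page within a tile.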

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

- I have added tests relevant to the introduced functionality, and the unit tests are passing locally
- I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
- I have added inline documentation which helps the maintainers understand the motivation
- I have removed the stale documentation which is no longer relevant after this pull request
- (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
- I have run clang-format on all changed files
- Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered


afagaj · 2026-01-03T05:26:34Z

@Jeff-Huang This is a significant PR.

Please add one or more entries to CHANGELOG.md as appropriate.

Jeff-Huang · 2026-01-05T00:53:53Z

> @Jeff-Huang This is a significant PR.
>
> Please add one or more entries to CHANGELOG.md as appropriate.

Thanks for the reminder! I've updated the CHANGELOG.md with the relevant entries in the latest commit.

Jeff-Huang requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners December 16, 2025 14:02

Jeff-Huang changed the title ~~[FMHA] Batch Prefill Support Improvements: Codegen, SGLang Layout & Page Size Enhancements~~ [FMHA] Batch Prefill Support Improvements: SGLang/vLLM Layout & Large Page Size Support Dec 16, 2025

Jeff-Huang changed the title ~~[FMHA] Batch Prefill Support Improvements: SGLang/vLLM Layout & Large Page Size Support~~ [FMHA] Batch Prefill Support Improvements: SGLang/vLLM KV Cache Layout & Large Page Size Support Dec 16, 2025

Jeff-Huang force-pushed the ck_tile/batch_prefill_paged_size_16_rebase branch 4 times, most recently from 3c5f0b0 to 311bbbb, December 18, 2025 23:04

ltqin and others added 8 commits December 19, 2025 18:03

add page_block_size parameter

c7a9340

add is_sglang_layout to parameters

01ab82e

add kv_offset_array_transform to batch async for page size 16

967b841

add kv_last_page_lens to kernel

6a86b55

change kv layout to [num_total_pages, page_block_size, hdim]

c48f67e

format

3ff16af

- enable codegen of batch_prefill kernels

70ea729

- create new problem struct BlockFmhaBatchPrefillPipelineProblem for batch prefill kernels - generate different page sizes of batch prefill kernels (1, 16)

1. fix wrong calculation of page id in kv_offset_array_transform in g…

ad6e151

…fx950 2. support page size 1024

poyenc reviewed Dec 29, 2025

include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_problem.hpp (outdated, resolved)

poyenc reviewed Dec 29, 2025

example/ck_tile/01_fmha/codegen/ops/fmha_batch_prefill.py (outdated, resolved)

poyenc reviewed Dec 29, 2025

example/ck_tile/01_fmha/codegen/ops/fmha_batch_prefill.py (outdated, resolved)

poyenc reviewed Dec 29, 2025

include/ck_tile/ops/fmha/kernel/fmha_batch_prefill_kernel.hpp (outdated, resolved)

Jeff-Huang added 5 commits December 29, 2025 23:05

1. remove batch prefill pipeline with sk_pad=false

fa288b9

2. correct some comments 3. add static assert to make sure v offsets is in same page within a tile.

fix vgpr spill count

ddfe125

Merge branch 'develop' into ck_tile/batch_prefill_paged_size_16_rebase

8ac25aa

remove unnecessary t2s functions

93c0dcf

add fp8 support for receipt 200 and 600 in fmha_bath_prefill.py

58e79ad

poyenc assigned Jeff-Huang Dec 31, 2025

Jeff-Huang added 3 commits December 31, 2025 11:05

support linear kv cache layout

37190f2

Remove block_table_ptr from fwd_batch_prefill_args. Instead, reuse

db42e95

kv_page_indices as a pointer of the lookup table.

Merge branch 'develop' into ck_tile/batch_prefill_paged_size_16_rebase

1bcb9ae

poyenc reviewed Dec 31, 2025

include/ck_tile/ops/fmha/kernel/fmha_batch_prefill_kernel.hpp (resolved)

Jeff-Huang added 2 commits December 31, 2025 17:59

1. merge multiple transforms into single transform.

d98bd16

2. add static check to make sure vlayout is row-major.

Merge branch 'develop' into ck_tile/batch_prefill_paged_size_16_rebase

6193348

poyenc reviewed Jan 2, 2026

include/ck_tile/ops/fmha/kernel/fmha_batch_prefill_kernel.hpp (outdated, resolved)

move FmhaFwdCommonKargs::seqlen_k_ptr to VllmPageTableKargs.

2a9e6cf

poyenc previously approved these changes Jan 2, 2026


Jeff-Huang added 2 commits January 4, 2026 11:50

update changelog

961dbb6

Merge branch 'develop' into ck_tile/batch_prefill_paged_size_16_rebase

424b81e

Jeff-Huang dismissed poyenc’s stale review via 424b81e January 4, 2026 03:51

Jeff-Huang requested review from a team and ddembeckAMD as code owners January 4, 2026 03:51

Merge branch 'develop' into ck_tile/batch_prefill_paged_size_16_rebase

ed7d130

poyenc approved these changes Jan 5, 2026


Jeff-Huang merged commit cc75a1d into develop Jan 5, 2026
24 of 29 checks passed

Jeff-Huang deleted the ck_tile/batch_prefill_paged_size_16_rebase branch January 5, 2026 10:41

5 participants
