Feat: Support Linear block scale layout in FP4 quantization #3045

yibinl-nvidia · 2025-03-24T22:40:22Z

Adds support for a linear (row-major) block scale factor layout in the FP4 quantize kernel. This layout is used by the trtllm-gen MOE FP4 kernel.
New unit tests were added for the linear-layout FP4 quantize kernel. Note that an FP4 GEMM kernel for the linear layout is not supported yet; FP4 GEMM support should be added once that kernel is ready.
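A minimal pure-Python sketch of what a linear (row-major) block scale factor layout means for FP4 quantization. Several details are assumptions not stated in this PR: a block size of 16 elements per scale factor (as in NVFP4), scale factors kept as Python floats instead of FP8 E4M3, no global scale, and a simplified round-to-nearest onto the E2M1 grid. For contrast it also computes the 128x4-tiled offset described in the cuBLAS documentation for block-scaled FP4 GEMMs; whether that is the kernel's default layout is also an assumption. All function names are hypothetical and are not TensorRT-LLM APIs.

```python
import math
from typing import List, Tuple

# Magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
SF_VEC_SIZE = 16  # assumption: one scale factor per 16 consecutive elements along K


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value (simplified: ties go to the smaller magnitude)."""
    mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(mag, x)


def linear_sf_offset(m: int, k_block: int, num_k_blocks: int) -> int:
    """Linear layout: scale factors form a plain row-major [M, K / SF_VEC_SIZE] matrix."""
    return m * num_k_blocks + k_block


def swizzled_128x4_sf_offset(m: int, k_block: int, num_k_blocks: int) -> int:
    """128x4-tiled layout (per the cuBLAS block-scaling docs), shown only for contrast."""
    num_k_tiles = (num_k_blocks + 3) // 4  # K-blocks are padded to a multiple of 4
    m_tile, k_tile = m // 128, k_block // 4
    tile_base = (m_tile * num_k_tiles + k_tile) * 512  # each 128x4 tile holds 512 scale factors
    return tile_base + (m % 32) * 16 + ((m % 128) // 32) * 4 + (k_block % 4)


def quantize_fp4_linear(x: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Quantize a row-major [M, K] matrix; return (E2M1 values, linear-layout scale factors)."""
    rows, cols = len(x), len(x[0])
    assert cols % SF_VEC_SIZE == 0, "K must be a multiple of the block size"
    num_k_blocks = cols // SF_VEC_SIZE
    scales = [0.0] * (rows * num_k_blocks)
    q = [[0.0] * cols for _ in range(rows)]
    for m in range(rows):
        for kb in range(num_k_blocks):
            block = x[m][kb * SF_VEC_SIZE:(kb + 1) * SF_VEC_SIZE]
            amax = max(abs(v) for v in block)
            # Map the block's largest magnitude onto the largest E2M1 value.
            sf = amax / E2M1_MAX if amax > 0 else 1.0
            scales[linear_sf_offset(m, kb, num_k_blocks)] = sf
            for i, v in enumerate(block):
                q[m][kb * SF_VEC_SIZE + i] = round_to_e2m1(v / sf)
    return q, scales
```

For example, `quantize_fp4_linear([[float(i - 16) for i in range(32)], [0.5] * 32])` returns four scale factors stored in the order (row 0, block 0), (row 0, block 1), (row 1, block 0), (row 1, block 1). In the linear layout a consumer can find the scale for row `m` and K-block `kb` at `m * (K / 16) + kb` without knowing any tile geometry, whereas the tiled layout interleaves 128 rows by 4 K-blocks inside 512-entry tiles.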

yibinl-nvidia · 2025-03-25T04:57:09Z

Need to update internal_cutlass_kernel libs.

nv-guomingz · 2025-03-25T09:35:40Z

> Need to update internal_cutlass_kernel libs.

@yibinl-nvidia is there an MR for updating internal_cutlass_kernels?

yibinl-nvidia · 2025-03-25T16:26:30Z

> > Need to update internal_cutlass_kernel libs.
>
> @yibinl-nvidia is there an MR for updating internal_cutlass_kernels?

Yes, I will post an MR soon. I am still familiarizing myself with the internal kernel change workflow, and I need to check that the trtllm tests pass with the updated lib files.

Signed-off-by: Yibin Li <[email protected]>

yibinl-nvidia · 2025-03-26T21:16:15Z

/bot run

yibinl-nvidia · 2025-03-26T21:17:18Z

@mikeiovine could you re-approve this PR? It mirrors the internal MR, with minor changes to the internal_cutlass_kernel lib files. Thanks!

yibinl-nvidia · 2025-03-26T21:49:17Z

/bot kill

tensorrt-cicd · 2025-03-26T21:55:33Z

PR_Github #615 [ kill ] triggered by Bot

tensorrt-cicd · 2025-03-26T21:55:34Z

PR_Github #615 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit aa306bf

yibinl-nvidia · 2025-03-26T22:04:44Z

/bot run

tensorrt-cicd · 2025-03-26T22:13:30Z

PR_Github #618 [ run ] triggered by Bot

tensorrt-cicd · 2025-03-26T23:00:16Z

PR_Github #618 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #520 completed with status: 'FAILURE'

yibinl-nvidia self-assigned this Mar 25, 2025

mikeiovine self-requested a review March 25, 2025 16:15

yibinl-nvidia marked this pull request as draft March 25, 2025 16:25

yibinl-nvidia added 2 commits March 26, 2025 14:15

update FP4 quantize layout

37be99f

Signed-off-by: Yibin Li <[email protected]>

update internal_cutlass_kernels lib

aa306bf

Signed-off-by: Yibin Li <[email protected]>

yibinl-nvidia force-pushed the feat-fp4-layout branch from 5decd10 to aa306bf March 26, 2025 21:16

yibinl-nvidia marked this pull request as ready for review March 26, 2025 21:16
