Feat: Support Linear block scale layout in FP4 quantization #3045
base: main
yibinl-nvidia commented Mar 24, 2025
- Support a linear (row-major) block scale-factor layout in the FP4 quantization kernel. This layout is used by the trtllm-gen MoE FP4 kernel.
- Add new unit tests for the linear-layout FP4 quantization kernel. Note that an FP4 GEMM kernel for the linear layout is not supported yet; FP4 GEMM coverage should be added once that kernel is ready.
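To illustrate what "linear (row-major) block scale-factor layout" means, here is a minimal Python sketch. It is not the TensorRT-LLM kernel. It assumes a block size of 16 elements, computes each block's scale as `amax / 6.0` (6.0 is the largest FP4 E2M1 magnitude) and keeps that scale as a Python float, whereas real kernels encode scales in a low-precision format. All names (`FP4_E2M1_VALUES`, `round_to_fp4`, `linear_sf_index`, `quantize_fp4_linear`) are hypothetical. The key point is the indexing: in a linear layout, the scale for row `r`, block `b` is stored at `r * (K / 16) + b`, so each row's scales are contiguous rather than arranged in a tiled (swizzled) pattern.

```python
# Hypothetical sketch of FP4 (E2M1) block quantization with a linear (row-major)
# scale-factor layout. Assumptions (not from the PR): block size 16, and each
# scale = block amax / 6.0, kept as a Python float (real kernels quantize scales).

FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable magnitudes
FP4_MAX = 6.0
BLOCK_SIZE = 16  # assumed block size for this sketch


def round_to_fp4(x: float) -> float:
    """Round x to the nearest E2M1 magnitude, keeping the sign.

    Ties go to the smaller magnitude (min() keeps the first match in the
    ascending list), which differs from round-to-nearest-even in hardware.
    """
    mag = min(FP4_E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return -mag if x < 0 else mag


def linear_sf_index(row: int, block: int, num_k_blocks: int) -> int:
    """Row-major (linear) layout: all scale factors of one row are contiguous."""
    return row * num_k_blocks + block


def quantize_fp4_linear(matrix):
    """Quantize an M x K matrix; return (fp4_values, scales in linear order)."""
    m = len(matrix)
    k = len(matrix[0])
    if k % BLOCK_SIZE != 0:
        raise ValueError("K must be a multiple of BLOCK_SIZE in this sketch")
    num_k_blocks = k // BLOCK_SIZE
    scales = [0.0] * (m * num_k_blocks)
    quantized = [[0.0] * k for _ in range(m)]
    for r in range(m):
        for b in range(num_k_blocks):
            block = matrix[r][b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
            amax = max(abs(v) for v in block)
            # Map the block's largest magnitude onto the largest FP4 value.
            scale = amax / FP4_MAX if amax > 0 else 1.0
            scales[linear_sf_index(r, b, num_k_blocks)] = scale
            for i, v in enumerate(block):
                quantized[r][b * BLOCK_SIZE + i] = round_to_fp4(v / scale)
    return quantized, scales
```

For a 2 x 32 input, `quantize_fp4_linear` returns four scales ordered as row 0 block 0, row 0 block 1, row 1 block 0, row 1 block 1, which is exactly the row-major order the linear layout describes.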
Need to update the internal_cutlass_kernel libs.
Yes, I will post an MR soon. I am still familiarizing myself with the internal kernel change workflow and need to check that the trtllm tests pass with the updated lib files.
Signed-off-by: Yibin Li <[email protected]>
5decd10 to aa306bf
/bot run
@mikeiovine could you re-approve this PR? This is a mirror of the internal MR, with minor changes to the internal_cutlass_kernel lib files. Thanks!
/bot kill
PR_Github #615 [ kill ] triggered by Bot
PR_Github #615 [ kill ] completed with state
/bot run
PR_Github #618 [ run ] triggered by Bot
PR_Github #618 [ run ] completed with state