Feat: Support Linear block scale layout in FP4 quantization #3045

Open: wants to merge 2 commits into base branch main.
yibinl-nvidia (Collaborator)

  • Support a linear (row-major) block scale factor layout in the FP4 quantize kernel. This layout is used by the trtllm-gen MoE FP4 kernel.
  • Add new unit tests for the linear-layout FP4 quantize kernel. Note that an FP4 GEMM kernel with the linear layout is not supported yet; FP4 GEMM tests should be added once that kernel is ready.
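To illustrate what a linear (row-major) block scale factor layout means, here is a minimal pure-Python reference sketch. It is not the kernel from this PR, and several details are assumptions: 16-element blocks, FP4 E2M1 element values, fp32 scale factors (a real kernel may encode scales in FP8), and the helper names `linear_sf_index`, `round_to_e2m1` and `quantize_fp4_linear`, which are hypothetical. In this layout the scale factor covering element (row, col) sits at offset row * (num_cols / block_size) + col / block_size, so all scale factors for one row are contiguous.

```python
import math

# Illustrative assumptions (not taken from the PR): 16-element blocks,
# FP4 E2M1 elements, and fp32 scale factors. Helper names are hypothetical.
BLOCK_SIZE = 16

# Non-negative magnitudes representable in FP4 E2M1.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0


def linear_sf_index(row, col, num_cols, block_size=BLOCK_SIZE):
    """Offset of the scale factor covering element (row, col) in a
    row-major [num_rows, num_cols // block_size] scale-factor buffer."""
    sf_cols = num_cols // block_size
    return row * sf_cols + col // block_size


def round_to_e2m1(x):
    """Round x to the nearest E2M1 value; ties go to the smaller magnitude
    (a simplification of hardware round-to-nearest-even)."""
    mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(mag, x)


def quantize_fp4_linear(matrix, block_size=BLOCK_SIZE):
    """Block-quantize a 2-D list of floats; returns the FP4 values (as floats)
    and a flat list of per-block scales in linear (row-major) order."""
    num_rows, num_cols = len(matrix), len(matrix[0])
    assert num_cols % block_size == 0, "columns must be a multiple of block_size"
    q = [[0.0] * num_cols for _ in range(num_rows)]
    scales = [0.0] * (num_rows * (num_cols // block_size))
    for r in range(num_rows):
        for start in range(0, num_cols, block_size):
            block = matrix[r][start:start + block_size]
            amax = max(abs(v) for v in block)
            # Map the block's largest magnitude onto the E2M1 maximum (6.0).
            scale = amax / E2M1_MAX if amax > 0 else 1.0
            scales[linear_sf_index(r, start, num_cols, block_size)] = scale
            for i, v in enumerate(block):
                q[r][start + i] = round_to_e2m1(v / scale)
    return q, scales


if __name__ == "__main__":
    m = [[6.0] * 16 + [3.0] * 16, [0.0] * 16 + [-12.0] * 16]
    q, scales = quantize_fp4_linear(m)
    print(scales)  # [1.0, 0.5, 1.0, 2.0]: one scale per block, row by row
```

Because the offset is a plain row-major index, a consumer can find a row's scale factors from a single base offset; a tiled or swizzled scale layout would instead need a tile-aware index function.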

yibinl-nvidia self-assigned this Mar 25, 2025
yibinl-nvidia (Collaborator, Author)

Need to update internal_cutlass_kernel libs.

nv-guomingz (Collaborator)

> Need to update internal_cutlass_kernel libs.

@yibinl-nvidia is there an MR for updating internal_cutlass_kernels?

mikeiovine self-requested a review Mar 25, 2025 16:15
yibinl-nvidia marked this pull request as draft Mar 25, 2025 16:25
yibinl-nvidia (Collaborator, Author) commented Mar 25, 2025

> Need to update internal_cutlass_kernel libs.
> @yibinl-nvidia is there an MR for updating internal_cutlass_kernels?

Yes, I will post an MR soon. I am still getting familiar with the internal kernel change workflow, and I need to check that the trtllm tests pass with the updated lib files.

yibinl-nvidia (Collaborator, Author)

/bot run

yibinl-nvidia marked this pull request as ready for review Mar 26, 2025 21:16
yibinl-nvidia (Collaborator, Author)

@mikeiovine could you re-approve this PR? It mirrors the internal MR, with minor changes to the internal_cutlass_kernel lib files. Thanks!

yibinl-nvidia (Collaborator, Author)

/bot kill

tensorrt-cicd (Collaborator)

PR_Github #615 [ kill ] triggered by Bot

tensorrt-cicd (Collaborator)

PR_Github #615 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit aa306bf

yibinl-nvidia (Collaborator, Author)

/bot run

tensorrt-cicd (Collaborator)

PR_Github #618 [ run ] triggered by Bot

tensorrt-cicd (Collaborator)

PR_Github #618 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #520 completed with status: 'FAILURE'

Labels: None yet
Projects: None yet

3 participants