feat: Support cos_sin_cache in all cases. #3020

Open
yuxianq wants to merge 4 commits into main from user/yuxianq/cos-sin-cache
Conversation

@yuxianq (Collaborator) commented Mar 24, 2025

This PR contains the following updates:

  1. Handle fuse_pos_embd=True/False and create RotaryEmbedding inside the attention module, so that users don't need to handle it in the modeling files.
  2. Cache cos/sin for the unfused rope implementation. If flashinfer is available, use apply_rope_with_cos_sin_cache_inplace instead of apply_rope_inplace; otherwise, fall back to a pure PyTorch implementation, which now supports any rope type (see the sketch after this list).
  3. Use create_rope_const_params to create and cache cos_sin_cache for all rope types, including the DeepSeek yarn rope.
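
A minimal sketch of the idea in item 2, not the PR's actual code: a rotary-embedding module that builds the cos/sin cache once, calls flashinfer's apply_rope_with_cos_sin_cache_inplace when the package is available, and otherwise applies a pure PyTorch rotate-half rope from the same cache. The class and buffer names are illustrative, and the flashinfer call signature is assumed.

```python
import torch

try:
    # Optional fast path; the exact signature is an assumption based on flashinfer's docs.
    from flashinfer import apply_rope_with_cos_sin_cache_inplace
    HAS_FLASHINFER = True
except ImportError:
    HAS_FLASHINFER = False


class SimpleRotaryEmbedding(torch.nn.Module):
    """Hypothetical rope module: one cached cos/sin buffer, two execution paths."""

    def __init__(self, head_dim: int, max_positions: int = 8192, base: float = 10000.0):
        super().__init__()
        self.head_dim = head_dim
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        freqs = torch.outer(torch.arange(max_positions).float(), inv_freq)
        # [max_positions, head_dim]: first half cos, second half sin.
        self.register_buffer("cos_sin_cache",
                             torch.cat([freqs.cos(), freqs.sin()], dim=-1),
                             persistent=False)

    def forward(self, positions: torch.Tensor, q: torch.Tensor, k: torch.Tensor):
        # q, k: [num_tokens, num_heads * head_dim]; positions: [num_tokens]
        if HAS_FLASHINFER:
            apply_rope_with_cos_sin_cache_inplace(
                positions, q, k, self.head_dim, self.cos_sin_cache, is_neox=True)
            return q, k

        # Pure PyTorch fallback: neox-style rotate-half using the cached values.
        cos, sin = self.cos_sin_cache[positions].chunk(2, dim=-1)
        cos = torch.cat([cos, cos], dim=-1).unsqueeze(-2)  # broadcast over heads
        sin = torch.cat([sin, sin], dim=-1).unsqueeze(-2)

        def rotate_half(x):
            x1, x2 = x.chunk(2, dim=-1)
            return torch.cat([-x2, x1], dim=-1)

        q = q.view(*q.shape[:-1], -1, self.head_dim)
        k = k.view(*k.shape[:-1], -1, self.head_dim)
        q = (q * cos + rotate_half(q) * sin).flatten(-2)
        k = (k * cos + rotate_half(k) * sin).flatten(-2)
        return q, k
```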

@yuxianq yuxianq requested review from hlu1, BestJuly, QiJune and kaiyux March 24, 2025 09:44
@yuxianq (Collaborator, author) commented Mar 24, 2025

/bot run --add-multi-gpu-test

@niukuo (Collaborator) commented Mar 24, 2025

PR_Github #283 [ run ] triggered by Bot

@niukuo (Collaborator) commented Mar 24, 2025

PR_Github #283 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #272 completed with status: 'FAILURE'

@yuxianq (Collaborator, author) commented Mar 25, 2025

/bot run --add-multi-gpu-test

@niukuo (Collaborator) commented Mar 25, 2025

PR_Github #387 [ run ] triggered by Bot

@niukuo (Collaborator) commented Mar 25, 2025

PR_Github #387 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #345 completed with status: 'FAILURE'

@yuxianq (Collaborator, author) commented Mar 25, 2025

/bot run --add-multi-gpu-test

@niukuo (Collaborator) commented Mar 25, 2025

PR_Github #430 [ run ] triggered by Bot

@niukuo (Collaborator) commented Mar 25, 2025

PR_Github #430 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #369 completed with status: 'FAILURE'

@yuxianq yuxianq force-pushed the user/yuxianq/cos-sin-cache branch from 86c5593 to 95840f8 Compare March 26, 2025 03:57
Signed-off-by: Yuxian Qiu <[email protected]>
@yuxianq yuxianq force-pushed the user/yuxianq/cos-sin-cache branch from 95840f8 to 54d797b Compare March 26, 2025 04:00
@yuxianq (Collaborator, author) commented Mar 26, 2025

/bot run --add-multi-gpu-test

@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #510 [ run ] triggered by Bot

@BestJuly BestJuly removed their request for review March 26, 2025 05:20
@BestJuly (Collaborator) commented:
I think I was pinged by mistake; is the review request actually meant for @litaotju?

@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #510 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #437 completed with status: 'FAILURE'

@QiJune (Collaborator) commented Mar 26, 2025

@yuxianq Can we split this PR into several smaller PRs? For example, the first item could be a single PR.

  1. Handle fuse_pos_embd=True/False and create RotaryEmbedding inside the attention module, so that users don't need to handle it in the modeling files.

@yuxianq yuxianq requested a review from litaotju March 26, 2025 07:01
@yuxianq (Collaborator, author) commented Mar 26, 2025

Can we split this PR into several smaller PRs? For example, the first item could be a single PR.

@QiJune I will give it a try. Let me pass CI first to validate that these features work correctly.

Signed-off-by: Yuxian Qiu <[email protected]>
@yuxianq (Collaborator, author) commented Mar 26, 2025

/bot run --add-multi-gpu-test

@yuxianq yuxianq changed the title from "Support cos_sin_cache in all cases." to "feat: Support cos_sin_cache in all cases." Mar 26, 2025
@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #550 [ run ] triggered by Bot

@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #550 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #469 completed with status: 'FAILURE'

@yuxianq (Collaborator, author) commented Mar 26, 2025

/bot run --disable-fail-fast --add-multi-gpu-test

@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #584 [ run ] triggered by Bot

@niukuo (Collaborator) commented Mar 26, 2025

PR_Github #584 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #497 completed with status: 'FAILURE'

@@ -5,9 +5,8 @@
from tensorrt_llm.quantization import (quantize_and_export,
quantize_nemo_and_export)

mp.set_start_method("spawn", force=True)
Collaborator commented:

Just curious, is someone importing this file? It should be used as a CLI command only.
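
A hypothetical illustration of the reviewer's point: calling mp.set_start_method at import time affects anything that merely imports the module, so one option is to set it only in the CLI entry point. The main() body here is a placeholder, not the real quantization script.

```python
import multiprocessing as mp


def main():
    # Placeholder for the quantize_and_export / quantize_nemo_and_export CLI logic.
    pass


if __name__ == "__main__":
    # Only take effect when run as a command, keeping module import side-effect free.
    mp.set_start_method("spawn", force=True)
    main()
```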

@@ -327,57 +338,77 @@ def from_config(config) -> "RopeParams":
rope_params.beta_slow = rope_scaling.get("beta_slow", 1)
rope_params.mscale = rope_scaling.get("mscale", 1.0)
rope_params.mscale_all_dim = rope_scaling.get("mscale_all_dim", 0.0)
if config.model_type == "deepseek_v3":
Collaborator commented:

This looks somewhat ad hoc to me. Is it possible to avoid relying on the hard-coded model type string here, through a more general interface?
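
One hypothetical shape the suggestion could take: a small registry of model-specific rope overrides keyed by model type, so from_config stays generic. None of these names exist in TensorRT-LLM; they only sketch the idea.

```python
# Registry of model-specific rope-parameter tweaks (illustrative only).
_ROPE_OVERRIDES = {}


def register_rope_override(model_type: str):
    def wrap(fn):
        _ROPE_OVERRIDES[model_type] = fn
        return fn
    return wrap


@register_rope_override("deepseek_v3")
def _deepseek_v3_rope(rope_params, rope_scaling: dict):
    # DeepSeek-V3-specific adjustments (e.g. mscale handling) would live here,
    # next to the model, instead of inside RopeParams.from_config.
    return rope_params


def apply_rope_overrides(rope_params, config, rope_scaling: dict):
    override = _ROPE_OVERRIDES.get(getattr(config, "model_type", None))
    return override(rope_params, rope_scaling) if override else rope_params
```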

assert self.scale_type != RotaryScalingType.longrope, "Long RoPE is not yet supported."
if self.scale_type == RotaryScalingType.yarn:
rope_inv_freq = None
rope_cos_sin = RopeEmbeddingUtils.create_sinusoidal_positions_for_deepseek_attention_plugin(
Collaborator commented:

Maybe a nit: can we name it create_*_positions_for_yarn or something similar, so it isn't tied specifically to DeepSeek?

@@ -102,7 +102,17 @@ def __init__(
self.quant_config = config.get_quant_config()
self.attn_backend = config.attn_backend
self.pos_embd_params = pos_embd_params
self.rotary_emb = rotary_emb

self.support_rope = self.attn_backend == "TRTLLM"
Collaborator commented:

The custom op will do the rope fusion here, right? Maybe rename it to something like self.rope_fused_in_custom_op = True?

self.rotary_emb = rotary_emb

self.support_rope = self.attn_backend == "TRTLLM"
self.support_fused_qkv = self.attn_backend == "TRTLLM"
Collaborator commented:

"support" is a vague word.
Which of these does "support" mean?

  1. Configurable: both fused QKV and unfused QKV can run, or
  2. Requires fused QKV?
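
If the answer is the second option, one hypothetical renaming (an assumption, not the PR's actual code) could make the semantics explicit:

```python
class AttentionFlagsSketch:
    """Illustrative only: more explicit names for the two backend flags."""

    def __init__(self, attn_backend: str):
        # The TRTLLM custom op applies rope inside the kernel.
        self.rope_fused_in_attn_op = attn_backend == "TRTLLM"
        # The TRTLLM custom op expects a single fused QKV tensor as input.
        self.requires_fused_qkv = attn_backend == "TRTLLM"
```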

@@ -249,6 +249,8 @@ def __init__(
attn_backend=attn_backend,
load_format=pytorch_backend_config.load_format,
)
if not hasattr(self.model, 'extra_attrs'):
Collaborator commented:

When will this be true? Can we always attach extra_attrs in _load_model so that this check isn't needed?

gather_ids)
else:
return self._forward_step(inputs, gather_ids)
with model_extra_attrs(self.model.extra_attrs):
Collaborator commented:

Shall we wrap the whole forward function inside this context manager?
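
A hypothetical sketch of that suggestion, reusing the names visible in the diff; the surrounding method structure is assumed, not copied from the PR:

```python
def forward(self, inputs, gather_ids=None):
    # Enter the context once at the top so every branch below runs inside it.
    with model_extra_attrs(self.model.extra_attrs):
        # ... existing branching on gather_ids, unchanged ...
        return self._forward_step(inputs, gather_ids)
```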

@@ -122,7 +122,7 @@ def submit_sync(self, task: Callable[..., T], *args, **kwargs) -> List[T]:

def shutdown(self):
if self.mpi_pool is not None:
self.mpi_pool.shutdown(wait=False)
Collaborator commented:

@Superjomn do you remember if we have some reason to make this "wait=False"?

@@ -36,6 +36,7 @@ def test_llm_api(self, import_oot_code: bool):
llm = LLM(model=model_dir,
kv_cache_config=kv_cache_config,
max_num_tokens=2048)
del llm
Collaborator commented:

Won't this llm object be automatically destroyed when the function returns? It's just a local variable.

@@ -216,3 +216,4 @@ async def test():
1.0), f"Expected '{expected}' but get '{result}'"

asyncio.run(test())
del llm
Collaborator commented:

Same here: is there some reason the object is not deleted by Python when the function returns?

Signed-off-by: Yuxian Qiu <[email protected]>
@yuxianq (Collaborator, author) commented Apr 2, 2025

/bot run --disable-fail-fast --stage-list "A30-7"
