ollama: add reasoning model support (e.g. deepseek) #29689

Merged
15 commits merged into langchain-ai:master from the feat/ollama-reasoning-support branch on Mar 21, 2025

Conversation

BobMerkus
Contributor

Description

This PR adds reasoning model support for langchain-ollama by extracting reasoning token blocks, like those used in deepseek. It was inspired by ollama-deep-researcher, specifically the parsing of thinking blocks:

  # TODO: This is a hack to remove the <think> tags w/ Deepseek models 
  # It appears very challenging to prompt them out of the responses 
  while "<think>" in running_summary and "</think>" in running_summary:
      start = running_summary.find("<think>")
      end = running_summary.find("</think>") + len("</think>")
      running_summary = running_summary[:start] + running_summary[end:]

That snippet notes that it is very hard to prompt the reasoning block away, and we actually want the model to reason in order to improve output quality. This implementation extracts the thinking block instead, so the client can still expect a clean message to be returned by ChatOllama (and use the reasoning content separately when desired).
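
For illustration, this is roughly how a client would consume the result (a minimal sketch: it assumes a locally pulled deepseek-r1 model and the extract_reasoning flag discussed later in this thread, whose exact name and default may differ):

  from langchain_ollama import ChatOllama

  # extract_reasoning is the opt-in flag discussed later in this thread;
  # treat the exact name and default as an assumption rather than settled API.
  llm = ChatOllama(model="deepseek-r1:8b", extract_reasoning=True)

  response = llm.invoke("How many prime numbers are there below 20?")
  print(response.content)                                     # clean answer, no <think> block
  print(response.additional_kwargs.get("reasoning_content"))  # extracted reasoning tokens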

This implementation takes the same approach as ChatDeepseek, which places the reasoning content in chunk.additional_kwargs["reasoning_content"]:

  if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
      rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
          response.choices[0].message.reasoning_content  # type: ignore
      )

This should probably be handled upstream in ollama + ollama-python, but this seems like a reasonably effective solution in the meantime. Here is a standalone example of what is happening:

from collections.abc import AsyncIterator
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.runnables import RunnableConfig


async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model

    Yields:
        Filtered chunks from the model response
    """
    # check if the model is deepseek based; if not, pass chunks through unchanged
    if (llm.name and model_target not in llm.name) or (
        hasattr(llm, "model") and model_target not in llm.model
    ):
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Yield with a buffer; upon completing the <think></think> tags, move the
    # tokens to the reasoning content and start over
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        # accumulate streamed content so we know when a thinking block opens/closes
        buffer += chunk.content

        # Process buffer to re-route <think> content
        if "<think>" in buffer or "</think>" in buffer:
            if hasattr(chunk, "tool_calls") and chunk.tool_calls:
                raise NotImplementedError(
                    "tool calls during reasoning should be removed?"
                )
            # drop the chunks that carry the tag tokens themselves
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            # move reasoning tokens out of the visible content
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        # upon block completion, reset the buffer
        if "<think>" in buffer and "</think>" in buffer:
            buffer = ""
        yield chunk
Issue

Integrating reasoning models (e.g. deepseek-r1) into existing LangChain-based workflows is hard because of the thinking blocks included in the message contents. To avoid this, the ChatOllama integration could match ChatDeepseek and return the reasoning content inside message.additional_kwargs["reasoning_content"] instead.

Dependencies

None

@dosubot dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Feb 8, 2025

@BobMerkus BobMerkus force-pushed the feat/ollama-reasoning-support branch from be00618 to 052f76c on February 8, 2025, 18:34
Collaborator

@ccurme ccurme left a comment

Hi @BobMerkus, we can move forward with this but I don't think the current implementation is working correctly. I'm finding that reasoning_content is just "<think></think>". The reason is that _generate runs through streaming, so as tokens are generated in the stream, we don't know whether they're part of the thinking block or not.

@BobMerkus BobMerkus force-pushed the feat/ollama-reasoning-support branch from 517a870 to cc91ca9 on March 18, 2025, 23:11
@BobMerkus
Contributor Author

Hey @ccurme,

Good catch; the standalone example I supplied aggregates a buffer to account for this. The initial prompt inside the test did not actually produce any content inside the thinking block (because it was relatively simple), and I overlooked this while integrating into ChatOllama. I've updated the test and implementation, so I think it should be working correctly now. In streaming mode, we keep track of whether we are inside a 'thinking block'; if so, the tokens are added to additional_kwargs["reasoning_content"] instead. When the chunks from the stream are aggregated, the result should match invoke().
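
For illustration, the state tracking described here boils down to something like the following (a simplified sketch, not the code in this PR; route_token and its signature are made up for the example):

  def route_token(token: str, in_thinking_block: bool) -> tuple[str, str, bool]:
      """Return (content, reasoning_content, updated flag) for one streamed token."""
      if token == "<think>":
          return "", "", True        # entering the thinking block
      if token == "</think>":
          return "", "", False       # leaving the thinking block
      if in_thinking_block:
          return "", token, True     # routed to additional_kwargs["reasoning_content"]
      return token, "", False        # regular, user-visible content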

Collaborator

@ccurme ccurme left a comment

Thanks @BobMerkus. I updated the default for extract_reasoning to False. I also refactored this:

  • 7fac207 refactors the original streaming code to share a single iterator method
  • e8ff5a8 adds back in the extraction of reasoning content

The motivation is that extracting reasoning content is arguably a niche use-case (disabled by default, only applies to a subset of models), so I was hesitant to introduce complexity into the basic streaming loops. This separates it out a bit more.
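
For illustration only (not the actual commits; the helper names here are hypothetical), the separation amounts to a plain streaming loop plus an optional reasoning-extraction wrapper applied only when the feature is enabled:

  from collections.abc import Iterator


  def _iterate_over_stream(raw_chunks: Iterator[str]) -> Iterator[str]:
      # the single streaming loop shared by the stream/generate paths stays simple
      yield from raw_chunks


  def _extract_reasoning(chunks: Iterator[str]) -> Iterator[tuple[str, str]]:
      # optional wrapper: yields (content, reasoning_content) pairs,
      # only used when reasoning extraction is enabled
      in_think = False
      for chunk in chunks:
          if "<think>" in chunk:
              in_think = True
              continue
          if "</think>" in chunk:
              in_think = False
              continue
          yield ("", chunk) if in_think else (chunk, "")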

Would appreciate a review :)

@BobMerkus
Contributor Author

Yeah, it makes sense to separate this from the existing runtime. The implementation looks good to me, and thanks for fixing the linter errors! As a side note, some of the other integration tests seem to fail when running locally, although that is not related to the changes in this PR.

@dosubot dosubot bot added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) on Mar 21, 2025
@ccurme ccurme enabled auto-merge (squash) March 21, 2025 15:44
@ccurme ccurme merged commit 5700646 into langchain-ai:master Mar 21, 2025
20 checks passed