Improper handling of MultimodalWebSurfer content led to premature task termination #6064

ode126 · 2025-03-22T07:23:17Z

What happened?

Describe the bug

When using MultimodalWebSurfer in MagenticOneGroupChat, the task prematurely terminates after the first WebSurfer action. This is caused by improper handling of multimodal content (images + text) in the MagenticOneOrchestrator's progress assessment logic.

To Reproduce

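import asyncio

from autogen_agentchat.ui import Console
from autogen_ext.teams.magentic_one import MagenticOne

async def main():
    client = ...  # model client; its configuration is elided in the report
    autoctf = MagenticOne(client=client)
    task = "open bing.com search weather then click third result link"
    # run_stream() returns an async generator, so it is consumed (here via Console) rather than awaited directly
    await Console(autoctf.run_stream(task=task))

asyncio.run(main())

# The task terminates after WebSurfer's first action, before the third result link is clicked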

The issue occurs because:

WebSurfer returns a MultiModalMessage containing both text and an image
MagenticOneOrchestrator does not process this multimodal content properly when assessing task progress
This leads to an incorrect termination decision in _orchestrate_step

Expected behavior

WebSurfer should be able to perform as many actions as the task requires
The task should continue until the actual goal is achieved
The orchestrator should handle multimodal content correctly when assessing progress

Technical Details

The issue occurs in two key components:

MagenticOneOrchestrator._thread_to_context:

# Original problematic code
# For a MultiModalMessage, m.content (a list of text and images) is passed through unchanged
if isinstance(m, (TextMessage, MultiModalMessage, ToolCallSummaryMessage)):
    context.append(UserMessage(content=m.content, source=m.source))
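To make the content shapes concrete, here is a minimal sketch of the original branch. It assumes simplified stand-in dataclasses that borrow the names TextMessage, MultiModalMessage, UserMessage and Image but are not the real autogen classes; the message texts and the file name are made up for illustration. It only shows that a text message reaches the context as a str while a WebSurfer-style reply reaches it as a list mixing a str and an image object.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    """Hypothetical stand-in for a screenshot (not the real autogen_core.Image)."""
    name: str


@dataclass
class TextMessage:
    """Stand-in for a plain text chat message."""
    content: str
    source: str


@dataclass
class MultiModalMessage:
    """Stand-in with the same content shape as a WebSurfer reply."""
    content: List[Union[str, Image]]
    source: str


@dataclass
class UserMessage:
    """Stand-in for the LLM context message built by the orchestrator."""
    content: Union[str, List[Union[str, Image]]]
    source: str


def thread_to_context_original(thread):
    """Mirrors the original branch: message content is copied through unchanged."""
    context = []
    for m in thread:
        if isinstance(m, (TextMessage, MultiModalMessage)):
            context.append(UserMessage(content=m.content, source=m.source))
    return context


thread = [
    TextMessage(content="Please search bing.com for the weather.", source="MagenticOneOrchestrator"),
    MultiModalMessage(
        content=["I typed 'weather' into the search box.", Image("screenshot.png")],
        source="WebSurfer",
    ),
]

for msg in thread_to_context_original(thread):
    print(msg.source, type(msg.content).__name__)
# MagenticOneOrchestrator str
# WebSurfer list
```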

The progress assessment logic then mishandles this multimodal content and prematurely concludes that the task is complete.

Fix Implementation

The fix converts MultiModalMessage content into a single string before it is added to the orchestrator's context:

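# Fixed version
if isinstance(m, (TextMessage, MultiModalMessage, ToolCallSummaryMessage)):
    if isinstance(m, MultiModalMessage):
        # Keep only the first content item (WebSurfer's text description) as a string;
        # later items, such as the screenshot, are dropped from the context
        if isinstance(m.content, list) and len(m.content) > 0:
            content = m.content[0] if isinstance(m.content[0], str) else str(m.content[0])
        else:
            content = str(m.content)
    else:
        content = m.content
    context.append(UserMessage(content=content, source=m.source))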
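The conversion rule of the fix can be checked in isolation. This is a minimal sketch using hypothetical stand-in dataclasses (Image, MultiModalMessage) rather than the real autogen classes, and flatten_content is an illustrative helper name, not part of AutoGen; it applies the same "first item as a string" rule as the fixed branch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Image:
    """Hypothetical stand-in for a screenshot (not the real autogen_core.Image)."""
    name: str


@dataclass
class MultiModalMessage:
    """Stand-in with the same content shape as a WebSurfer reply."""
    content: List[Union[str, Image]]
    source: str


def flatten_content(m):
    """Applies the fix's rule: keep only the first content item, as a string."""
    if isinstance(m, MultiModalMessage):
        if isinstance(m.content, list) and len(m.content) > 0:
            first = m.content[0]
            return first if isinstance(first, str) else str(first)
        return str(m.content)
    # Non-multimodal messages keep their content unchanged
    return m.content


reply = MultiModalMessage(
    content=["I clicked the third search result.", Image("screenshot.png")],
    source="WebSurfer",
)
print(flatten_content(reply))  # -> I clicked the third search result.
```

Note the trade-off in this rule: the screenshot never reaches the progress assessment, and if the first item happens to be an image, str() yields the object's repr rather than readable text. Joining every str item would be an alternative that keeps all of WebSurfer's text; that variant is not what this report proposes.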

Additional context

This issue specifically affects tasks that require multiple WebSurfer actions
The issue affects standard MagenticOneGroupChat implementations

Environment

AutoGen Studio Version: 0.4.2
Python Version: 3.10
OS: Tested on both Linux and Windows

Which packages was the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

AutoGen library version.

Python dev (main branch)

Other library version.

No response

Model used

qwen-vl-max-latest

Model provider

Other (please specify below)

Other model provider

Qwen

Python version

3.10

.NET version

None

Operating system

None

ode126 added the needs-triage label Mar 22, 2025