fix: handle consecutive system messages with plain string content by giulio-leone · Pull Request #2021 · huggingface/smolagents

giulio-leone · 2026-03-01T00:12:24Z

Summary

Fixes #1972

get_clean_message_list crashes with AssertionError when consecutive messages with the same role have plain string content instead of the structured list format [{"type": "text", "text": "..."}].

Root Cause

When ChatMessage.from_dict() creates a message from {"role": "system", "content": "text"}, the content remains a plain string. The merging logic at line 376 asserts isinstance(message.content, list), which fails for string content.

Fix

Normalize string content to the structured list format [{"type": "text", "text": content}] early in the processing loop — right after role conversion and before image encoding. This ensures all downstream code (merging, output, flatten) can assume content is always a list of dicts.
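The early-normalization idea can be sketched in isolation. This is a simplified, self-contained stand-in and not the actual body of get_clean_message_list: the helper names `normalize_content` and `merge_same_role` are hypothetical, messages are plain dicts rather than ChatMessage objects, and role conversion and image encoding are omitted.

```python
from typing import Any


def normalize_content(content: Any) -> Any:
    """Wrap plain-string content in the structured list-of-dicts format."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content  # already structured: leave untouched


def merge_same_role(messages: list[dict]) -> list[dict]:
    """Hypothetical, simplified merge of consecutive same-role messages."""
    merged: list[dict] = []
    for message in messages:
        # Normalize first, so everything below can assume a list of dicts.
        content = normalize_content(message["content"])
        assert isinstance(content, list), "content must be a list after normalization"
        if merged and merged[-1]["role"] == message["role"]:
            parts = merged[-1]["content"]
            for part in content:
                if parts and part["type"] == "text" and parts[-1]["type"] == "text":
                    # Adjacent text parts are joined with a newline.
                    parts[-1]["text"] += "\n" + part["text"]
                else:
                    parts.append(dict(part))
        else:
            merged.append({"role": message["role"], "content": [dict(p) for p in content]})
    return merged


example = merge_same_role(
    [
        {"role": "system", "content": "a"},
        {"role": "system", "content": "b"},
        {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    ]
)
# example[0]["content"] == [{"type": "text", "text": "a\nb"}]
```

Because normalization happens before the merge branch, the two consecutive string-content system messages merge without tripping the list assertion.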

Changes

src/smolagents/models.py: Added early normalization of string content to list format in get_clean_message_list
tests/test_models.py: Added 2 regression tests (structured and flatten modes)

Tests

pytest tests/test_models.py -k 'clean_message' -v
# 9 passed, 113 deselected

Copilot

Pull request overview

Fixes a crash in get_clean_message_list when consecutive messages share the same role but use plain string content (instead of the structured list-of-dicts format), by normalizing string content early so merging logic is consistent.

Changes:

Normalize ChatMessage.content from str → [{"type": "text", "text": ...}] within get_clean_message_list before downstream processing/merging.
Add regression tests covering consecutive system messages with string content in both structured and flatten_messages_as_text modes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/smolagents/models.py` | Normalizes string message content early to prevent merge-time `AssertionError` for consecutive same-role messages. |
| `tests/test_models.py` | Adds regression tests for consecutive system string content in both structured and flatten modes. |


Copilot · 2026-03-01T00:16:21Z

src/smolagents/models.py

+                # Normalize it if needed before concatenating.
+                if isinstance(output_message_list[-1]["content"], list):
+                    output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"]
                output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
            else:
+                # In structured mode, output content is stored as a list of dicts.
+                # Normalize it if needed before merging.
+                if isinstance(output_message_list[-1]["content"], str):
+                    output_message_list[-1]["content"] = [
+                        {"type": "text", "text": output_message_list[-1]["content"]}
+                    ]


The extra normalization of output_message_list[-1]["content"] inside the merge branch appears unreachable given the invariants in this function: when flatten_messages_as_text=True the "else" branch always stores a string (so it can't be a list here), and when flatten_messages_as_text=False it always stores a list (so it can't be a string here). Keeping these branches adds complexity and can mask invariant violations; consider removing them or replacing them with assertions that the stored type matches the selected mode.

Suggested change

-                # Normalize it if needed before concatenating.
-                if isinstance(output_message_list[-1]["content"], list):
-                    output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"]
-                output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
-            else:
-                # In structured mode, output content is stored as a list of dicts.
-                # Normalize it if needed before merging.
-                if isinstance(output_message_list[-1]["content"], str):
-                    output_message_list[-1]["content"] = [
-                        {"type": "text", "text": output_message_list[-1]["content"]}
-                    ]
+                assert isinstance(
+                    output_message_list[-1]["content"], str
+                ), "Error: expected string content in flatten mode"
+                output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
+            else:
+                # In structured mode, output content is stored as a list of dicts.
+                assert isinstance(
+                    output_message_list[-1]["content"], list
+                ), "Error: expected list content in structured mode"

giulio-leone · Mar 1, 2026

The defensive normalization is intentional: while the invariants should hold under normal operation, the function receives data from LLM tool call results that may not always conform. Keeping the normalization as a safety net prevents runtime crashes from unexpected content types. Replacing it with assertions would be less resilient in production.
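The two positions in this thread can be contrasted in a small sketch. Both functions below are hypothetical, reduced versions of the merge branch (they take the last stored message as a plain dict and the new text as a string), written only to illustrate the trade-off; they are not the code in src/smolagents/models.py.

```python
def merge_with_assertions(last: dict, new_text: str, flatten: bool) -> None:
    """Reviewer's position: fail loudly if the stored type does not match the mode."""
    if flatten:
        assert isinstance(last["content"], str), "Error: expected string content in flatten mode"
        last["content"] += "\n" + new_text
    else:
        assert isinstance(last["content"], list), "Error: expected list content in structured mode"
        last["content"].append({"type": "text", "text": new_text})


def merge_with_normalization(last: dict, new_text: str, flatten: bool) -> None:
    """Author's position: coerce the stored content to the expected type, then merge."""
    if flatten:
        if isinstance(last["content"], list):
            # Note: only the first part's text is kept, as in the quoted diff.
            last["content"] = last["content"][0]["text"]
        last["content"] += "\n" + new_text
    else:
        if isinstance(last["content"], str):
            last["content"] = [{"type": "text", "text": last["content"]}]
        last["content"].append({"type": "text", "text": new_text})


# A state that violates the structured-mode invariant (string instead of list).
violating = {"role": "system", "content": "a"}
merge_with_normalization(violating, "b", flatten=False)  # silently repaired

try:
    merge_with_assertions({"role": "system", "content": "a"}, "b", flatten=False)
except AssertionError as err:
    print(err)  # the violation surfaces immediately
```

One point relevant to the resilience argument: Python drops `assert` statements when run with the `-O` flag, so assertions document and check invariants during development and testing but are not a production safeguard on their own.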


get_clean_message_list crashes with AssertionError when consecutive messages with the same role have plain string content instead of the structured list format [{"type": "text", "text": "..."}]. This commonly occurs when users pass system messages as dicts with string content, e.g. {"role": "system", "content": "instruction"}.

The fix normalizes string content to structured list format early in the processing loop, before any merging or output logic runs. This ensures uniform handling regardless of whether content was provided as a string or as a list of dicts.

Fixes huggingface#1972

Co-authored-by: Copilot <[email protected]>


Copilot AI review requested due to automatic review settings March 1, 2026 00:12

Copilot started reviewing on behalf of giulio-leone March 1, 2026 00:12

Copilot AI reviewed Mar 1, 2026


This was referenced Mar 1, 2026

fix: handle string content in consecutive same-role message merging #2019

Closed

fix: handle consecutive system messages with string content in get_clean_message_list #2008

Closed

giulio-leone force-pushed the fix/issue-1972-consecutive-system-messages branch from 15809ee to 52a07bc on March 1, 2026 05:37

giulio-leone and others added 2 commits March 2, 2026 00:12

fix(review): replace unreachable normalization with assertions in message merge

ef0126d

- Replace defensive normalization with explicit type assertions
- Assertions enforce invariants and surface violations clearly

Refs: huggingface#2021

giulio-leone force-pushed the fix/issue-1972-consecutive-system-messages branch from 52a07bc to ef0126d on March 1, 2026 23:12

giulio-leone wants to merge 2 commits into huggingface:main from giulio-leone:fix/issue-1972-consecutive-system-messages
