Skip to content

fix: handle consecutive system messages with plain string content#2021

Open
giulio-leone wants to merge 2 commits intohuggingface:mainfrom
giulio-leone:fix/issue-1972-consecutive-system-messages
Open

fix: handle consecutive system messages with plain string content#2021
giulio-leone wants to merge 2 commits intohuggingface:mainfrom
giulio-leone:fix/issue-1972-consecutive-system-messages

Conversation

@giulio-leone
Copy link

Summary

Fixes #1972

get_clean_message_list crashes with AssertionError when consecutive messages with the same role have plain string content instead of the structured list format [{"type": "text", "text": "..."}].

Root Cause

When ChatMessage.from_dict() creates a message from {"role": "system", "content": "text"}, the content remains a plain string. The merging logic at line 376 asserts isinstance(message.content, list), which fails for string content.

Fix

Normalize string content to the structured list format [{"type": "text", "text": content}] early in the processing loop — right after role conversion and before image encoding. This ensures all downstream code (merging, output, flatten) can assume content is always a list of dicts.

Changes

  • src/smolagents/models.py: Added early normalization of string content to list format in get_clean_message_list
  • tests/test_models.py: Added 2 regression tests (structured and flatten modes)

Tests

pytest tests/test_models.py -k 'clean_message' -v
# 9 passed, 113 deselected

Copilot AI review requested due to automatic review settings March 1, 2026 00:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash in get_clean_message_list when consecutive messages share the same role but use plain string content (instead of the structured list-of-dicts format), by normalizing string content early so merging logic is consistent.

Changes:

  • Normalize ChatMessage.content from str[{"type": "text", "text": ...}] within get_clean_message_list before downstream processing/merging.
  • Add regression tests covering consecutive system messages with string content in both structured and flatten_messages_as_text modes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/smolagents/models.py Normalizes string message content early to prevent merge-time AssertionError for consecutive same-role messages.
tests/test_models.py Adds regression tests for consecutive system string content in both structured and flatten modes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 383 to 393
# Normalize it if needed before concatenating.
if isinstance(output_message_list[-1]["content"], list):
output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"]
output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
else:
# In structured mode, output content is stored as a list of dicts.
# Normalize it if needed before merging.
if isinstance(output_message_list[-1]["content"], str):
output_message_list[-1]["content"] = [
{"type": "text", "text": output_message_list[-1]["content"]}
]
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extra normalization of output_message_list[-1]["content"] inside the merge branch appears unreachable given the invariants in this function: when flatten_messages_as_text=True the "else" branch always stores a string (so it can't be a list here), and when flatten_messages_as_text=False it always stores a list (so it can't be a string here). Keeping these branches adds complexity and can mask invariant violations; consider removing them or replacing them with assertions that the stored type matches the selected mode.

Suggested change
# Normalize it if needed before concatenating.
if isinstance(output_message_list[-1]["content"], list):
output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"]
output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
else:
# In structured mode, output content is stored as a list of dicts.
# Normalize it if needed before merging.
if isinstance(output_message_list[-1]["content"], str):
output_message_list[-1]["content"] = [
{"type": "text", "text": output_message_list[-1]["content"]}
]
assert isinstance(
output_message_list[-1]["content"], str
), "Error: expected string content in flatten mode"
output_message_list[-1]["content"] += "\n" + message.content[0]["text"]
else:
# In structured mode, output content is stored as a list of dicts.
assert isinstance(
output_message_list[-1]["content"], list
), "Error: expected list content in structured mode"

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defensive normalization is intentional — while the invariants should hold under normal operation, the function receives data from LLM tool call results that may not always conform. Keeping the normalization as a safety net prevents runtime crashes from unexpected content types. Replacing with assertions would be less resilient in production.

giulio-leone added a commit to giulio-leone/smolagents that referenced this pull request Mar 1, 2026
…sage merge

- Replace defensive normalization with explicit type assertions
- Assertions enforce invariants and surface violations clearly

Refs: huggingface#2021
@giulio-leone giulio-leone force-pushed the fix/issue-1972-consecutive-system-messages branch from 15809ee to 52a07bc Compare March 1, 2026 05:37
giulio-leone and others added 2 commits March 2, 2026 00:12
get_clean_message_list crashes with AssertionError when consecutive
messages with the same role have plain string content instead of the
structured list format [{"type": "text", "text": "..."}].

This commonly occurs when users pass system messages as dicts with
string content, e.g. {"role": "system", "content": "instruction"}.

The fix normalizes string content to structured list format early in
the processing loop, before any merging or output logic runs. This
ensures uniform handling regardless of whether content was provided
as a string or as a list of dicts.

Fixes huggingface#1972

Co-authored-by: Copilot <[email protected]>
…sage merge

- Replace defensive normalization with explicit type assertions
- Assertions enforce invariants and surface violations clearly

Refs: huggingface#2021
@giulio-leone giulio-leone force-pushed the fix/issue-1972-consecutive-system-messages branch from 52a07bc to ef0126d Compare March 1, 2026 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: LiteLLMModel crashes when multiple consecutive system messages are provided

2 participants