fix: handle consecutive system messages with plain string content#2021
fix: handle consecutive system messages with plain string content#2021giulio-leone wants to merge 2 commits intohuggingface:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a crash in get_clean_message_list when consecutive messages share the same role but use plain string content (instead of the structured list-of-dicts format), by normalizing string content early so merging logic is consistent.
Changes:
- Normalize
ChatMessage.contentfromstr→[{"type": "text", "text": ...}]withinget_clean_message_listbefore downstream processing/merging. - Add regression tests covering consecutive
systemmessages with string content in both structured andflatten_messages_as_textmodes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/smolagents/models.py |
Normalizes string message content early to prevent merge-time AssertionError for consecutive same-role messages. |
tests/test_models.py |
Adds regression tests for consecutive system string content in both structured and flatten modes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/smolagents/models.py
Outdated
| # Normalize it if needed before concatenating. | ||
| if isinstance(output_message_list[-1]["content"], list): | ||
| output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"] | ||
| output_message_list[-1]["content"] += "\n" + message.content[0]["text"] | ||
| else: | ||
| # In structured mode, output content is stored as a list of dicts. | ||
| # Normalize it if needed before merging. | ||
| if isinstance(output_message_list[-1]["content"], str): | ||
| output_message_list[-1]["content"] = [ | ||
| {"type": "text", "text": output_message_list[-1]["content"]} | ||
| ] |
There was a problem hiding this comment.
The extra normalization of output_message_list[-1]["content"] inside the merge branch appears unreachable given the invariants in this function: when flatten_messages_as_text=True the "else" branch always stores a string (so it can't be a list here), and when flatten_messages_as_text=False it always stores a list (so it can't be a string here). Keeping these branches adds complexity and can mask invariant violations; consider removing them or replacing them with assertions that the stored type matches the selected mode.
| # Normalize it if needed before concatenating. | |
| if isinstance(output_message_list[-1]["content"], list): | |
| output_message_list[-1]["content"] = output_message_list[-1]["content"][0]["text"] | |
| output_message_list[-1]["content"] += "\n" + message.content[0]["text"] | |
| else: | |
| # In structured mode, output content is stored as a list of dicts. | |
| # Normalize it if needed before merging. | |
| if isinstance(output_message_list[-1]["content"], str): | |
| output_message_list[-1]["content"] = [ | |
| {"type": "text", "text": output_message_list[-1]["content"]} | |
| ] | |
| assert isinstance( | |
| output_message_list[-1]["content"], str | |
| ), "Error: expected string content in flatten mode" | |
| output_message_list[-1]["content"] += "\n" + message.content[0]["text"] | |
| else: | |
| # In structured mode, output content is stored as a list of dicts. | |
| assert isinstance( | |
| output_message_list[-1]["content"], list | |
| ), "Error: expected list content in structured mode" |
There was a problem hiding this comment.
The defensive normalization is intentional — while the invariants should hold under normal operation, the function receives data from LLM tool call results that may not always conform. Keeping the normalization as a safety net prevents runtime crashes from unexpected content types. Replacing with assertions would be less resilient in production.
…sage merge - Replace defensive normalization with explicit type assertions - Assertions enforce invariants and surface violations clearly Refs: huggingface#2021
15809ee to
52a07bc
Compare
get_clean_message_list crashes with AssertionError when consecutive
messages with the same role have plain string content instead of the
structured list format [{"type": "text", "text": "..."}].
This commonly occurs when users pass system messages as dicts with
string content, e.g. {"role": "system", "content": "instruction"}.
The fix normalizes string content to structured list format early in
the processing loop, before any merging or output logic runs. This
ensures uniform handling regardless of whether content was provided
as a string or as a list of dicts.
Fixes huggingface#1972
Co-authored-by: Copilot <[email protected]>
…sage merge - Replace defensive normalization with explicit type assertions - Assertions enforce invariants and surface violations clearly Refs: huggingface#2021
52a07bc to
ef0126d
Compare
Summary
Fixes #1972
get_clean_message_listcrashes withAssertionErrorwhen consecutive messages with the same role have plain string content instead of the structured list format[{"type": "text", "text": "..."}].Root Cause
When
ChatMessage.from_dict()creates a message from{"role": "system", "content": "text"}, the content remains a plain string. The merging logic at line 376 assertsisinstance(message.content, list), which fails for string content.Fix
Normalize string content to the structured list format
[{"type": "text", "text": content}]early in the processing loop — right after role conversion and before image encoding. This ensures all downstream code (merging, output, flatten) can assume content is always a list of dicts.Changes
src/smolagents/models.py: Added early normalization of string content to list format inget_clean_message_listtests/test_models.py: Added 2 regression tests (structured and flatten modes)Tests