fix: prepend text message to content blocks in multimodal agent loop #1044
LupoGrigi0 wants to merge 1 commit into
When a user sends a message with image attachments via the upload API, the agent loop receives both `user_message` (text) and `user_content_blocks` (images). Previously, when content blocks were present, only the blocks were pushed to the session; the text message was silently dropped. The LLM received the images but not the user's question or context.

This fix prepends the text message as a `ContentBlock::Text` into the blocks vector before pushing to the session, so the LLM sees both the user's text and any attached images in a single turn. Both the non-streaming and streaming agent loop paths are fixed.

Before:
- User: "What color is this?" + [image of blue square]
- LLM receives: [image only, no text]
- Response: "I can't see the image directly"

After:
- User: "What color is this?" + [image of blue square]
- LLM receives: [text: "What color is this?", image: blue square]
- Response: "Blue"

Tested with Qwen 3.5 Plus and Gemini 2.5 Flash via OpenRouter. Images up to 1.3MB confirmed working through the full pipeline.

Signed-off-by: Cairn-2001 <[email protected]>
Clean, targeted fix for #1043. Inserting the text block at index 0 with the Same rebase-needed note: CI isn't registered on this branch. Rebase on latest

Clean fix. The prepend-text-to-blocks logic is correct. Cannot merge as-is due to conflicts with main (`agent_loop.rs` has drifted). Please rebase and we'll land it.

Fix landed in main via the new
Summary
Fixes #1043 — When image attachments are present, the agent loop drops the user's text message. The LLM receives images without any context about what the user asked.
Changes
File: `crates/openfang-runtime/src/agent_loop.rs` (both streaming and non-streaming paths)

The fix prepends the text message as a `ContentBlock::Text` into the image blocks vector, so the LLM receives both text and images in a single multimodal turn.

Before (broken)
After (fixed)
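A minimal sketch of the prepend step, not the code from `agent_loop.rs`. The `ContentBlock` enum shape, its field names, and the `build_user_blocks` helper are hypothetical stand-ins chosen for illustration; only the idea of inserting a `ContentBlock::Text` at index 0 of the blocks vector comes from this PR. Skipping an empty text message is an added assumption, not something the PR states.

```rust
/// Simplified stand-in for the runtime's content block type.
/// The variants and field names here are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    Image { media_type: String, data: String },
}

/// Hypothetical helper: combine the user's text and any attachment blocks
/// into the single list of blocks that is pushed to the session.
pub fn build_user_blocks(
    user_message: &str,
    user_content_blocks: Option<Vec<ContentBlock>>,
) -> Vec<ContentBlock> {
    match user_content_blocks {
        // Attachments present: keep them, and put the text first so the
        // model reads the question before the images.
        Some(mut blocks) if !blocks.is_empty() => {
            // Assumption: a blank message adds nothing, so it is skipped.
            if !user_message.trim().is_empty() {
                blocks.insert(
                    0,
                    ContentBlock::Text {
                        text: user_message.to_string(),
                    },
                );
            }
            blocks
        }
        // No attachments: the turn is just the text message.
        _ => vec![ContentBlock::Text {
            text: user_message.to_string(),
        }],
    }
}
```

With this shape, calling `build_user_blocks("What color is this?", Some(vec![image]))` yields a two-element vector with the text block first and the image second. Under the same assumptions, the streaming and non-streaming paths could share one such helper so the two code paths cannot drift apart.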
Testing
Non-streaming (`run_agent_loop`) and streaming (`run_agent_loop_streaming`) paths

Submitted by Cairn-2001 ([email protected]), OpenFang maintainer for HACS at smoothcurves.nexus