Skip to content

fix(sdk): support multimodal array inputs (e.g. image URLs) in prompt compilation#756

Open
ckr-rai73 wants to merge 5 commits intolangfuse:mainfrom
ckr-rai73:fix/12338-multimodal-prompts
Open

fix(sdk): support multimodal array inputs (e.g. image URLs) in prompt compilation#756
ckr-rai73 wants to merge 5 commits intolangfuse:mainfrom
ckr-rai73:fix/12338-multimodal-prompts

Conversation

@ckr-rai73
Copy link

@ckr-rai73 ckr-rai73 commented Mar 21, 2026

Fixes: #12338

Description:
Currently, there is a strict string versus object type mismatch in how the SDK handles chat prompt compilation. When a user passes multimodal content (like an array of image attachments) into a chat prompt, ChatPromptClient.compile() blindly passes that object directly into mustache.render. Because Mustache strictly expects a string template, this throws a TypeError: Invalid template! Template should be a "string".

This PR fixes this type mismatch by introducing strict type guards. String content is routed to the templating engine as usual, while object/array content safely bypasses Mustache and the Langchain variable transformer. This officially unblocks multimodal/vision use cases.

Under the hood optimizations:
While I was in promptClients.ts correcting the type routing, I noticed we could streamline the compilation pipeline to reduce memory allocation. I bundled in a few performance wins for standard string prompts:

  • Token Caching: Added a static mustacheTokenCache. We now parse templates into tokens once and reuse them across calls, skipping the Mustache.parse() step on repeat compiles.
  • Single-Pass Execution: Merged the placeholder resolution and variable rendering into a single for...of loop.
  • Safe Early Exit: compile() now returns early if no variables/placeholders are passed. Note: I made sure this early exit maps out the internal type: ChatMessageType.ChatMessage flag so we don't accidentally leak it to downstream APIs (like OpenAI) and trigger schema validation errors.

Performance Impact:
Local benchmarks running compile() 1,000 times on standard text prompts showed a solid improvement:

  • Before: ~3.95ms
  • After: ~2.93ms (~25% speedup)

Testing:

  • Added a regression test for multimodal array inputs in prompts.e2e.test.ts.
  • Ran the full pnpm test suite locally to ensure no regressions on complex nested JSON or Langchain formats.
  • Linted and formatted.

CC: @nimarb

Disclaimer: Experimental PR review

Greptile Summary

This PR fixes a real bug: passing non-string (multimodal/array) content through ChatPromptClient.compile() and getLangchainPrompt() would throw because mustache.render and String.replace both require string input. The type guards added to both methods correctly address that. The PR also bundles a vitest.workspace.ts cross-platform fix (fileURLToPath) and a bounded Mustache token cache.

Key issues found:

  • Accidental build artifact committedtsup.config.bundled_nv8l3rehalb.mjs is a generated tsup intermediary file that must not live in source control. Its embedded source map also leaks a Windows local filesystem path. This file should be deleted and the pattern added to .gitignore.
  • Regression in compile(): placeholder-injected messages lose Mustache variable substitution — The original code used a two-pass approach where every message (including those injected via a placeholder value) had mustache.render() applied in the second pass. The refactored single-pass loop only renders ChatMessage items that are directly in this.prompt; messages injected through a Placeholder are pushed to result verbatim, silently leaving {{variable}} patterns unrendered for any caller combining variables and placeholders.
  • getLangchainPrompt test is an empty no-op — The test at line 2233 creates a ChatPromptClient but never calls getLangchainPrompt() and makes no assertions, leaving the runtime-critical .replace() guard completely without coverage.

Confidence Score: 2/5

  • Not safe to merge — contains a silent behavioral regression, a committed build artifact with a leaked local path, and a critical test that makes no assertions.
  • Two concrete blockers exist before merge: (1) the single-pass refactor silently drops Mustache variable substitution from placeholder-injected messages, a regression from prior behavior that affects any caller using both variables and placeholders; (2) a generated build artifact (tsup.config.bundled_nv8l3rehalb.mjs) with an embedded Windows filesystem path was committed and must be removed. The empty getLangchainPrompt test is an additional concern since it provides false confidence for the fix most likely to cause a runtime throw.
  • packages/client/src/prompt/promptClients.ts (placeholder rendering regression), packages/client/tsup.config.bundled_nv8l3rehalb.mjs (must be deleted), tests/e2e/prompts.e2e.test.ts (empty test body at line 2233)

Important Files Changed

Filename Overview
packages/client/src/prompt/promptClients.ts Core logic change: adds multimodal type guards and a bounded Mustache token cache. The type guards in compile() and getLangchainPrompt() are correct, but the refactored single-pass loop introduces a regression — placeholder-injected messages no longer receive Mustache variable substitution, which was applied in the original two-pass approach.
packages/client/tsup.config.bundled_nv8l3rehalb.mjs Accidentally committed tsup build artifact with a randomised filename. The embedded source map leaks the developer's Windows home directory path. Must be deleted and added to .gitignore.
tests/e2e/prompts.e2e.test.ts New tests cover the early-exit path and the main-loop type guard, but the getLangchainPrompt test body is empty — it instantiates the client and returns without calling the method or asserting anything, leaving the most runtime-critical fix (preventing a .replace() crash) unguarded.
vitest.workspace.ts Cross-platform fix: replaces .pathname with fileURLToPath() for Vite alias resolution, which is correct for Windows compatibility.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["compile(variables, placeholders)"] --> B{"vars empty\nAND phs empty?"}
    B -- "Yes (early exit)" --> C["Map prompt items:\nstrip internal type field,\nreturn {role, content} as-is"]
    B -- "No" --> D["for item of this.prompt"]
    D --> E{"item.type?"}
    E -- "ChatMessage" --> F{"typeof content\n=== 'string'?"}
    F -- "No (multimodal)" --> G["Push {role, content} as-is\n✅ NEW TYPE GUARD"]
    F -- "Yes" --> H["Lookup tokens in\nmustacheTokenCache"]
    H --> I{"Cache hit?"}
    I -- "No" --> J{"Cache at\nMAX_CACHE_SIZE?"}
    J -- "Yes" --> K["Evict oldest entry\n(FIFO)"]
    J -- "No" --> L["mustache.parse(content)\nStore in cache"]
    K --> L
    I -- "Yes" --> M["writer.renderTokens(tokens,\nContext(vars))"]
    L --> M
    M --> N["Push {role, rendered_content}"]
    E -- "Placeholder" --> O{"placeholderValue\ntype?"}
    O -- "Array of ChatMessages" --> P["Push messages directly\n⚠️ NO variable substitution\n(regression vs original)"]
    O -- "Empty array" --> Q["Skip"]
    O -- "Other value" --> R["Push JSON.stringify(value)"]
    O -- "undefined" --> S["Push raw placeholder item"]
    E -- "Other" --> T["Push item as-is"]
    C --> U["Return result"]
    G --> U
    N --> U
    P --> U
    Q --> U
    R --> U
    S --> U
    T --> U
Loading

Last reviewed commit: "fix(sdk): implement ..."

Greptile also left 3 inline comments on this PR.

@vercel
Copy link

vercel bot commented Mar 21, 2026

@ckr-rai73 is attempting to deploy a commit to the langfuse Team on Vercel.

A member of the Team first needs to authorize it.

Comment on lines +2180 to +2188
const compiled = promptClient.compile();
expect(compiled[0].content).toEqual({
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test only exercises early-exit path, not the actual type guard

promptClient.compile() is called with no arguments, so Object.keys(vars).length === 0 && Object.keys(phs).length === 0 is true and the function returns early at line 336. The new typeof item.content !== "string" type guard added to the main loop (line 351-357) — which handles the case when variables are provided alongside multimodal content — is never exercised.

To cover the primary fix, you should also test the main-loop path:

// Also test when variables are provided (exercises the main loop type guard)
const compiledWithVars = promptClient.compile({ someVar: "value" });
expect(compiledWithVars[0].content).toEqual({
  attachments: [
    {
      type: "image_url",
      image_url: { url: "https://example.com/image.png" },
    },
  ],
});

Without this, the regression test does not guard against future regressions specifically in the main loop path.

Comment on lines +2155 to +2189
it("should handle multimodal array inputs without crashing mustache", () => {
const promptClient = new ChatPromptClient({
name: "multimodal-test",
type: "chat",
version: 1,
prompt: [
{
type: ChatMessageType.ChatMessage,
role: "user",
content: {
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
} as any,
},
],
config: {},
labels: [],
tags: [],
});

// After fix: Should return the multimodal content as-is without crashing
const compiled = promptClient.compile();
expect(compiled[0].content).toEqual({
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 No test coverage for the getLangchainPrompt() multimodal fix

The PR also adds type guards in getLangchainPrompt() (lines 458-461 and 491-495 in promptClients.ts) to prevent _transformToLangchainVariables() from being called on non-string content. Unlike the compile() path, _transformToLangchainVariables calls .replace() which will throw at runtime on a non-string value. This is arguably the higher-impact fix, and it has zero test coverage.

A test like the following should be added:

it("should handle multimodal content in getLangchainPrompt without crashing", () => {
  const promptClient = new ChatPromptClient({ /* same setup as above */ });
  const result = promptClient.getLangchainPrompt();
  expect(result[0].content).toEqual({ attachments: [...] });
});

@ckr-rai73
Copy link
Author

I have updated the PR with two additional test cases as suggested by the Greptile bot to ensure 100% path coverage:

Main Loop Coverage: Verified that multimodal objects are handled correctly when variables are present (bypassing the early-exit).

Langchain Safety: Verified that getLangchainPrompt safely handles multimodal data without attempting string transformations.

All tests passed locally via pnpm vitest run -t "multimodal".

@ckr-rai73
Copy link
Author

@greptileai I have implemented the missing test coverage for both the multimodal main loop and the Langchain safety paths as requested.

Comment on lines +2267 to +2305
it("should handle multimodal array inputs in the main loop (with variables)", () => {
const promptClient = new ChatPromptClient({
name: "multimodal-loop-test",
type: "chat",
version: 1,
prompt: [
{
role: "user",
content: {
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
} as any,
},
{
role: "user",
content: "Context: {{dummy}}",
},
],
config: {},
labels: [],
tags: [],
} as any);

const compiled = promptClient.compile({ dummy: "test-value" });

expect(compiled[0].content).toEqual({
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
});
expect(compiled[1].content).toEqual("Context: test-value");
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Duplicate test descriptions cause silent test shadowing

There are two pairs of tests with identical it() description strings inside the same describe block:

  • "should handle multimodal array inputs in the main loop (with variables)" appears at line 2191 and again at line 2267.
  • "should handle multimodal content in getLangchainPrompt without crashing" appears at line 2233 and again at line 2307.

In Vitest (and Jest), duplicate it names within the same suite cause unpredictable behaviour: the test runner may report both, only one, or silently de-duplicate them, making it impossible to tell which scenario is actually being exercised from the output. The second pair of tests uses as any-casted prompts without ChatMessageType on the items, which exercises a different (legacy-format) code path. Give each test a unique, descriptive name so both paths are clearly tracked in CI output.

* @public
*/
export class ChatPromptClient extends BasePromptClient {
private static readonly mustacheTokenCache = new Map<string, any[]>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unbounded static cache is a memory leak

mustacheTokenCache is a static Map with no maximum size and no eviction policy. Because it lives for the entire lifetime of the process, applications that process many distinct prompt templates (e.g. dynamically generated templates, per-tenant variations, or prompts whose text changes over time) will accumulate entries indefinitely, potentially causing out-of-memory failures in long-running servers.

A Map in JavaScript preserves insertion order, so implementing a simple bounded cache is straightforward: before adding a new entry, check if the map has reached its maximum size and evict the oldest key (the first entry returned by the iterator) if so. A cap of a few hundred entries is typically enough to warm the common case while preventing unbounded growth.

@vercel
Copy link

vercel bot commented Mar 21, 2026

Deployment failed with the following error:

The provided GitHub repository does not contain the requested branch or commit reference. Please ensure the repository is not empty.

@ckr-rai73
Copy link
Author

@greptile-apps Done. Hardened the cache with a 500-entry limit/FIFO eviction and fixed the shadowing in the test suite. Also added a new test case to exercise the main loop logic. Local vitest results are 4/4 passing. Ready for a re-scan.

Comment on lines +1 to +18
// tsup.config.ts
import { defineConfig } from "tsup";
var tsup_config_default = defineConfig({
entry: ["src/index.ts"],
format: ["cjs", "esm"],
dts: true,
splitting: false,
sourcemap: true,
clean: true,
outDir: "dist",
outExtension: ({ format }) => ({
js: format === "cjs" ? ".cjs" : ".mjs"
})
});
export {
tsup_config_default as default
};
//# sourceMappingURL=data:application/json;base64,ewogICJ2ZXJzaW9uIjogMywKICAic291cmNlcyI6IFsidHN1cC5jb25maWcudHMiXSwKICAic291cmNlc0NvbnRlbnQiOiBbImNvbnN0IF9faW5qZWN0ZWRfZmlsZW5hbWVfXyA9IFwiQzpcXFxcVXNlcnNcXFxcUFJBVkVFTiBSQUlcXFxcbGFuZ2Z1c2UtanNcXFxccGFja2FnZXNcXFxcY2xpZW50XFxcXHRzdXAuY29uZmlnLnRzXCI7Y29uc3QgX19pbmplY3RlZF9kaXJuYW1lX18gPSBcIkM6XFxcXFVzZXJzXFxcXFBSQVZFRU4gUkFJXFxcXGxhbmdmdXNlLWpzXFxcXHBhY2thZ2VzXFxcXGNsaWVudFwiO2NvbnN0IF9faW5qZWN0ZWRfaW1wb3J0X21ldGFfdXJsX18gPSBcImZpbGU6Ly8vQzovVXNlcnMvUFJBVkVFTiUyMFJBSS9sYW5nZnVzZS1qcy9wYWNrYWdlcy9jbGllbnQvdHN1cC5jb25maWcudHNcIjtpbXBvcnQgeyBkZWZpbmVDb25maWcgfSBmcm9tIFwidHN1cFwiO1xuXG5leHBvcnQgZGVmYXVsdCBkZWZpbmVDb25maWcoe1xuICBlbnRyeTogW1wic3JjL2luZGV4LnRzXCJdLFxuICBmb3JtYXQ6IFtcImNqc1wiLCBcImVzbVwiXSxcbiAgZHRzOiB0cnVlLFxuICBzcGxpdHRpbmc6IGZhbHNlLFxuICBzb3VyY2VtYXA6IHRydWUsXG4gIGNsZWFuOiB0cnVlLFxuICBvdXREaXI6IFwiZGlzdFwiLFxuICBvdXRFeHRlbnNpb246ICh7IGZvcm1hdCB9KSA9PiAoe1xuICAgIGpzOiBmb3JtYXQgPT09IFwiY2pzXCIgPyBcIi5janNcIiA6IFwiLm1qc1wiLFxuICB9KSxcbn0pO1xuIl0sCiAgIm1hcHBpbmdzIjogIjtBQUE0UyxTQUFTLG9CQUFvQjtBQUV6VSxJQUFPLHNCQUFRLGFBQWE7QUFBQSxFQUMxQixPQUFPLENBQUMsY0FBYztBQUFBLEVBQ3RCLFFBQVEsQ0FBQyxPQUFPLEtBQUs7QUFBQSxFQUNyQixLQUFLO0FBQUEsRUFDTCxXQUFXO0FBQUEsRUFDWCxXQUFXO0FBQUEsRUFDWCxPQUFPO0FBQUEsRUFDUCxRQUFRO0FBQUEsRUFDUixjQUFjLENBQUMsRUFBRSxPQUFPLE9BQU87QUFBQSxJQUM3QixJQUFJLFdBQVcsUUFBUSxTQUFTO0FBQUEsRUFDbEM7QUFDRixDQUFDOyIsCiAgIm5hbWVzIjogW10KfQo=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Accidental build artifact committed

This is a generated tsup intermediary file with a randomised suffix (bundled_nv8l3rehalb) that was unintentionally committed. It should never live in source control.

More critically, the embedded base64 source map decodes to a Windows absolute path that leaks the developer's local machine username and directory layout:

"C:\\Users\\PRAVEEN RAI\\langfuse-js\\packages\\client\\tsup.config.ts"

This file should be deleted from the branch and added to .gitignore (pattern tsup.config.bundled_*.mjs) to prevent re-occurrence.

Comment on lines 393 to +416
@@ -343,54 +400,27 @@ export class ChatPromptClient extends BasePromptClient {
typeof msg === "object" && "role" in msg && "content" in msg,
)
) {
messagesWithPlaceholdersReplaced.push(
...(placeholderValue as ChatMessage[]),
);
result.push(...(placeholderValue as ChatMessage[]));
} else if (
Array.isArray(placeholderValue) &&
placeholderValue.length === 0
) {
// Empty array provided - skip placeholder (don't include it)
// Skip empty placeholder array
} else if (placeholderValue !== undefined) {
// Non-standard placeholder value format, just stringfiy
messagesWithPlaceholdersReplaced.push(
JSON.stringify(placeholderValue),
);
// Stringify non-standard formats
result.push(JSON.stringify(placeholderValue));
} else {
// Keep unresolved placeholder in the output
messagesWithPlaceholdersReplaced.push(
item as { type: ChatMessageType.Placeholder } & typeof item,
);
// Keep unresolved placeholder
result.push(item);
}
} else if (
"role" in item &&
"content" in item &&
item.type === ChatMessageType.ChatMessage
) {
messagesWithPlaceholdersReplaced.push({
role: item.role,
content: item.content,
});
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Regression: placeholder-injected messages no longer receive Mustache variable substitution

In the original code, the compile pipeline was a two-pass process:

  1. First loop resolved placeholders → collected all messages (including injected ones) into messagesWithPlaceholdersReplaced.
  2. Second .map() pass applied mustache.render(item.content, variables) to every message in that array — including ones that had just been injected via a placeholder.

In the refactored code the two passes are merged into one loop. ChatMessage items from this.prompt are rendered correctly, but messages that arrive through a Placeholder value are pushed directly into result without any Mustache rendering:

result.push(...(placeholderValue as ChatMessage[]));
// ↑ No variable substitution applied to these messages

Any user who combines variables with placeholders where the injected messages contain {{variable}} patterns will silently receive unrendered templates instead of the expected output. For example:

// compile({ user: "Alice" }, { history: [{ role: "user", content: "Hello {{user}}" }] })
// Before: [{ role: "user", content: "Hello Alice" }]
// After:  [{ role: "user", content: "Hello {{user}}" }]   ← regression

The injected messages need to go through the same Mustache rendering step before being pushed to result.

Comment on lines +2233 to +2256
it("should handle multimodal content in getLangchainPrompt safely", () => {
const promptClient = new ChatPromptClient({
name: "multimodal-langchain-test",
type: "chat",
version: 1,
prompt: [
{
type: ChatMessageType.ChatMessage,
role: "user",
content: {
attachments: [
{
type: "image_url",
image_url: { url: "https://example.com/image.png" },
},
],
} as any,
},
],
config: {},
labels: [],
tags: [],
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Empty test body — getLangchainPrompt() is never called

The test creates a ChatPromptClient but then the function ends without ever invoking promptClient.getLangchainPrompt() or making any assertions. As written, this test will always pass regardless of regressions in getLangchainPrompt.

The fix added in getLangchainPrompt() (the typeof item.content === "string" guard at lines 506-508 in promptClients.ts) is the higher-impact change because _transformToLangchainVariables calls .replace() which throws at runtime on a non-string value — yet this guard has zero effective coverage.

The method must be called and its output verified:

it("should handle multimodal content in getLangchainPrompt safely", () => {
  const promptClient = new ChatPromptClient({ /* same setup */ });

  // This call would throw before the fix
  const result = promptClient.getLangchainPrompt();
  expect(result[0].content).toEqual({
    attachments: [
      {
        type: "image_url",
        image_url: { url: "https://example.com/image.png" },
      },
    ],
  });
});

@hassiebp hassiebp self-requested a review March 23, 2026 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant