@zouyonghe zouyonghe commented Jan 18, 2026

Summary

This PR adds support for Kitty graphics protocol V2, which includes explicit image ID parameters.
This improves compatibility with terminals like iTerm2 that have issues with the V1 protocol.

Changes

  1. Fix base64 chunk alignment (commit 1)

    • Ensure chunks are multiples of 3 bytes (63/510)
    • Prevents padding issues in iTerm2 and Screen
  2. Add image ID generation (commit 2)

    • Thread-safe atomic ID generation
    • Unique IDs across multiple images
  3. Implement V2 API (commit 3)

    • New BEGIN_KITTY_IMMEDIATE_IMAGE_V2 with 6 parameters
    • Runtime detection and automatic fallback to V1
    • Full backward compatibility maintained
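
As an illustration of the "thread-safe atomic ID generation" item, here is a minimal sketch using C11 atomics. The name `sketch_next_image_id` and the choice to skip 0 (treating it as "no ID") are assumptions made for this example; the PR's actual `chafa_kitty_next_image_id()` is not shown in this conversation and may differ.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical process-wide counter; not taken from the PR. */
static _Atomic uint32_t sketch_image_id_counter = 0;

/* Returns a new ID that is safe to request from several threads at once.
 * atomic_fetch_add() returns the previous value, so the first caller
 * gets 1. On wraparound the value 0 is skipped, on the assumption that
 * 0 means "no ID". */
uint32_t sketch_next_image_id(void)
{
    uint32_t id;

    do {
        id = atomic_fetch_add(&sketch_image_id_counter, 1) + 1;
    } while (id == 0);

    return id;
}
```

Two consecutive calls in a fresh process return 1 and then 2; concurrent callers never receive the same value until the 32-bit counter wraps.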

API Compatibility

  • ✅ V1 API unchanged (5 parameters)
  • ✅ New V2 API added (6 parameters)
  • ✅ Automatic runtime selection
  • ✅ No breaking changes

Testing

Tested on:

  • Kitty terminal (uses V2)
  • iTerm2 (uses V2, fixes black screen issue)
  • Terminals without V2 support (falls back to V1)

Technical Details

The implementation checks for CHAFA_TERM_SEQ_BEGIN_KITTY_IMMEDIATE_IMAGE_V2 availability at
runtime:

  • If available (Kitty, Ghostty): uses V2 with image ID
  • If not available: falls back to V1 without image ID

This approach ensures maximum compatibility without breaking existing functionality.
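
The selection logic described above could be sketched roughly as follows. Everything here is hypothetical: `SketchTermCaps` stands in for whatever capability lookup the library performs, and the two format strings are assumptions about what a V1-style (5 parameters) and V2-style (6 parameters, adding `i=`) begin sequence might look like. Only the Kitty control keys themselves (`a`, `i`, `f`, `s`, `v`, `c`, `r`, `m`) come from the Kitty graphics protocol.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the terminal capability lookup. */
typedef struct {
    bool have_v1;
    bool have_v2;
} SketchTermCaps;

/* Formats the "begin image" escape sequence into buf.
 * Prefers the V2 form (with image ID) and falls back to V1.
 * Returns the number of characters written, or -1 if neither
 * sequence is available or the buffer is too small. */
int sketch_begin_image(char *buf, size_t cap, const SketchTermCaps *caps,
                       int bpp, int width, int height,
                       int cols, int rows, uint32_t image_id)
{
    int n;

    if (caps->have_v2) {
        /* Assumed V2 layout: same as V1 plus i=<image_id>. */
        n = snprintf(buf, cap,
                     "\033_Ga=T,i=%u,f=%d,s=%d,v=%d,c=%d,r=%d,m=1\033\\",
                     (unsigned) image_id, bpp, width, height, cols, rows);
    } else if (caps->have_v1) {
        /* Assumed V1 layout: no image ID. */
        n = snprintf(buf, cap,
                     "\033_Ga=T,f=%d,s=%d,v=%d,c=%d,r=%d,m=1\033\\",
                     bpp, width, height, cols, rows);
    } else {
        return -1;
    }

    return (n < 0 || (size_t) n >= cap) ? -1 : n;
}
```

The caller does not need to know which variant was chosen; the image ID argument is simply ignored on the V1 path.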



@hpjansson
Owner

Thanks for the PR. Have you checked that this still works on the wezterm, ghostty or kitty terminals?

I will review it in depth soon.

@hpjansson
Owner

The BEGIN_KITTY_IMMEDIATE_IMAGE_V1 definition is public API, so it cannot be changed. Is the image ID needed to fix this issue? If not, we can look at creating a BEGIN_KITTY_IMMEDIATE_IMAGE_V2 seq definition with more parameters and some API to go with it. But I think it should have a clear use case from our point of view before adding it.

If you look at

    end = p + (ptenc->mode == CHAFA_PASSTHROUGH_SCREEN ? 64 : 512);

you will see the max chunk sizes are 64 (with passthrough) and 512 (without). Can't we just change these to 63 and 510, which are multiples of 3? That's a simpler fix, which should eliminate padding from chunks.
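
The arithmetic behind that suggestion can be checked with a small sketch. Base64 turns every 3 raw bytes into 4 characters and pads the last group with `=` when the input length is not a multiple of 3. The helper names below are invented for this example and are not part of Chafa.

```c
#include <stddef.h>

/* Number of base64 characters produced for n raw bytes, padding included. */
size_t sketch_b64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/* Number of '=' padding characters appended when encoding n raw bytes. */
size_t sketch_b64_padding(size_t n)
{
    return (3 - n % 3) % 3;
}
```

For the sizes in question: 64 raw bytes encode to 88 characters with 2 padding characters, and 512 bytes to 684 characters with 1. By contrast, 63 bytes encode to 84 characters and 510 bytes to 680, both with no padding.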

This commit fixes video playback issues in iTerm2 by properly implementing
image ID support in the Kitty graphics protocol while maintaining backward
compatibility with the existing V1 API.

Key changes:

1. Created V2 API (BEGIN_KITTY_IMMEDIATE_IMAGE_V2) with 6 parameters including
   image_id, while keeping V1 API unchanged at 5 parameters to preserve public
   API compatibility.

2. Added chafa_kitty_next_image_id() function with atomic operations for
   thread-safe unique image ID generation.

3. Implemented runtime API detection with graceful fallback - prefers V2 for
   better iTerm2 compatibility, falls back to V1 for older terminals.

4. Fixed chunk sizes to exact multiples of 3 (63 bytes for iTerm2/Screen,
   510 bytes for tmux) to avoid base64 padding issues.

5. Restored independent chunk encoding with encode_chunk() function for
   simpler implementation compared to streaming base64.

6. Updated capability detection to check for both V2 and V1 APIs.

Technical details:
- iTerm2 requires image ID (i=) parameter as it uses identifiers as dictionary
  keys in its _images storage
- Separates image_id (identifies image data) from placement_id (identifies
  display position)
- Thread-safe ID generation prevents collisions in multi-threaded environments
- Chunk sizes are multiples of 3 to ensure clean base64 encoding without
  padding characters in the middle of transmission

Fixes video playback in iTerm2 terminal.
@zouyonghe changed the title from "kitty: stream base64 across chunks; add image id to immediate transfers" to "Add Kitty Graphics Protocol V2 API with Image ID Support" on Jan 21, 2026
@zouyonghe
Author

Because of some compatibility issues, this PR is closed for now and will be reopened once the fixes and testing are complete.

@zouyonghe closed this on Jan 21, 2026
@zouyonghe
Author

  1. V2 API sends image ID: The modified code uses BEGIN_KITTY_IMMEDIATE_IMAGE_V2 which includes
    i=<image_id> parameter in the escape sequence
  2. Terminals send responses: When terminals receive V2 sequences with image IDs, they send back
    acknowledgment responses in the format \033_Gi=<image_id>;OK\033\
  3. chafa cannot consume responses: chafa is a non-interactive command-line tool that outputs
    image data and immediately exits. It doesn't read stdin or wait for terminal responses.
  4. Responses pollute output: The terminal sends responses after chafa exits. These responses are
    delivered to the shell, which doesn't know how to handle them, so they appear as visible text in
    the terminal.

Problem:
When using Kitty graphics protocol V2 API (with image IDs) to fix
iTerm2 compatibility, terminals send acknowledgment responses that
appear as visible text pollution in the shell after chafa exits:
- "Gi=1;OK" from BEGIN_KITTY_IMMEDIATE_IMAGE_V2 sequence
- "Gi=0;OK" from END_KITTY_IMAGE sequence

Root Cause:
The Kitty protocol sends responses by default when processing image
commands. Since chafa is a non-interactive tool that exits immediately
after rendering, these responses arrive after the program terminates
and are displayed as text in the user's shell.

Solution:
Add the q=2 parameter (quiet/suppress responses mode) to both:
1. BEGIN_KITTY_IMMEDIATE_IMAGE_V2 sequence
2. END_KITTY_IMAGE sequence

This suppresses terminal responses while maintaining image ID support
required for iTerm2 compatibility.

Testing:
Verified in iTerm2, Kitty, and Ghostty - images display correctly
with no terminal response text pollution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@zouyonghe
Author

Problem

When using Kitty graphics protocol V2 API (with image IDs) to fix iTerm2 compatibility, terminal
acknowledgment responses appear as visible text pollution in the shell:

  • Gi=1;OK from BEGIN sequence
  • Gi=0;OK from END sequence

Root Cause

chafa is a non-interactive CLI tool that exits immediately after rendering. Terminal responses
arrive after program termination and are displayed as text in the user's shell.

Solution

Add q=2 (quiet mode) parameter to suppress terminal responses in:

  1. BEGIN_KITTY_IMMEDIATE_IMAGE_V2 sequence
  2. END_KITTY_IMAGE sequence

This maintains image ID support (required for iTerm2) while preventing response pollution.
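
To make the effect of the `q` key concrete, here is a hedged sketch that formats an "end of image" sequence with an optional quiet level. The base sequence `\033_Gm=0\033\\` is an assumption about what `END_KITTY_IMAGE` expands to, and `sketch_end_image` is a hypothetical helper. The meaning of `q` follows the Kitty documentation: 1 suppresses OK responses, 2 also suppresses failure responses.

```c
#include <assert.h>
#include <stdio.h>

/* Formats the final (m=0) graphics command into buf.
 * quiet: 0 = terminal may respond (protocol default),
 *        1 = suppress OK responses,
 *        2 = suppress OK and failure responses.
 * Returns the number of characters written, or -1 if buf is too small. */
int sketch_end_image(char *buf, size_t cap, int quiet)
{
    int n;

    assert(quiet >= 0 && quiet <= 2);

    if (quiet == 0)
        n = snprintf(buf, cap, "\033_Gm=0\033\\");
    else
        n = snprintf(buf, cap, "\033_Gm=0,q=%d\033\\", quiet);

    return (n < 0 || (size_t) n >= cap) ? -1 : n;
}
```

With `quiet` set to 2 the command grows by the four characters `,q=2`, and a terminal that honors the key has nothing to write back after the program exits.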

Testing

✅ Verified in iTerm2, Kitty, and Ghostty
✅ Images display correctly with no text pollution
✅ iTerm2 compatibility maintained

@zouyonghe reopened this on Jan 21, 2026
@zouyonghe
Author

The updated code fixes the issue where iTerm2 couldn’t display images when using the Kitty graphics protocol. WezTerm, Ghostty, and Kitty all worked normally.

@hpjansson
Owner

Thanks for the update. The chunk size needs to be big also for CHAFA_PASSTHROUGH_NONE, so the if/else should stay the same. Just the numbers need to change. Or is iTerm unable to parse 510-byte chunks?

The _V2 macro handling needs a bit more work. ChafaImage will need an ID like ChafaPlacement has, that will have to be passed down and not generated on site.

If iTerm can't handle un-numbered images, isn't that technically an iTerm bug? Maybe it should just be fixed there?

Pass a stable ChafaImage-owned ID down to the kitty renderer for V2 transfers instead of generating IDs at render time.

Keep Screen on small base64 chunks but use larger (multiple-of-3) chunks for other passthrough modes (including CHAFA_PASSTHROUGH_NONE) to avoid mid-stream base64 padding issues.
@zouyonghe
Author

Thanks for the suggestions!

This updates Kitty graphics protocol output in two ways:

  • Use a stable, ChafaImage-owned ID for Kitty V2 (i=) transfers and pass it down to the renderer, instead of generating IDs at render time. This makes V2 image identification consistent (similar to ChafaPlacement IDs) and improves compatibility with iTerm2.
  • Adjust raw data chunk sizes to multiples of 3 to avoid intermediate base64 padding. Screen passthrough uses 63-byte chunks, while all other passthrough modes (including CHAFA_PASSTHROUGH_NONE) use 510-byte chunks.
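
The chunking rule in the second bullet might look like the following sketch. The enum and function names are hypothetical stand-ins (Chafa's own passthrough enum is not reproduced here). The second helper counts how many chunks of a payload would end in base64 padding, which with multiple-of-3 chunk sizes can only be the final one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the library's passthrough modes. */
typedef enum {
    SKETCH_PASSTHROUGH_NONE,
    SKETCH_PASSTHROUGH_SCREEN,
    SKETCH_PASSTHROUGH_TMUX
} SketchPassthrough;

/* Raw bytes per chunk: small for Screen, large for everything else. */
size_t sketch_chunk_size(SketchPassthrough mode)
{
    return mode == SKETCH_PASSTHROUGH_SCREEN ? 63 : 510;
}

/* Splits total_len raw bytes into chunks of at most chunk_size bytes and
 * counts the chunks whose length is not a multiple of 3, i.e. those whose
 * independent base64 encoding would end in '=' padding. */
size_t sketch_count_padded_chunks(size_t total_len, size_t chunk_size)
{
    size_t padded = 0;
    size_t offset = 0;

    assert(chunk_size > 0);

    while (offset < total_len) {
        size_t remaining = total_len - offset;
        size_t n = remaining < chunk_size ? remaining : chunk_size;

        if (n % 3 != 0)
            padded++;
        offset += n;
    }

    return padded;
}
```

For a 2000-byte payload, 512-byte chunks give four padded chunks, while 510-byte chunks leave only the last chunk padded.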

I believe that’s technically an iTerm2 bug.

If i= is omitted (unnumbered images), iTerm2 appears to mishandle multipart transmit+display: it stores identifier==0 as a “last image”, but later looks it up using key 0, so the image may not be found after transmission.

I’m in the process of reporting this to iTerm2 (and can follow up with a fix there). In the meantime, Chafa emitting a stable i= when V2 is available is a practical compatibility workaround for existing iTerm2 releases.

@zouyonghe
Author

It seems that iTerm has an upstream patch fixing this.
I’ll check whether it solves the issue without needing to submit a PR to chafa.

@zouyonghe
Author

When testing with the iTerm2 nightly build, I found that chafa's output via the Kitty graphics protocol causes terminal responses to be printed directly to the terminal.

This is likely due to kitty quiet mode not being enabled.
According to the implementation and discussion in this PR, it is recommended to set q=2 to enable quiet mode, which suppresses terminal responses to kitty image commands and prevents them from being displayed as plain text output.

This behavior aligns with the silent handling mechanism described in the PR, and enabling q=2 should resolve the issue.

@zouyonghe
Author

If you think that introducing a new V2 API is unnecessary, and that the issue can be resolved simply by adding the q=2 parameter, please close this PR after completing the corresponding changes.

@hpjansson
Owner

I think we do need more Kitty seqs eventually, but the "immediate" one is supposed to not need any ID handling by the client. The Kitty docs say this:

When specifying an image id, the terminal emulator will reply to the placement request with an acknowledgement code [...]

So although the specification is a bit ambiguous, I'm choosing to interpret the acknowledgement condition as being "if and only if" an image ID is specified. So iTerm2 seems to be doing the wrong thing here - other terminals don't print an acknowledgement.

However, I suppose we could add q=2 to the V1 seq if it doesn't cause issues elsewhere.

tl;dr: I think we should keep the block size change and add q=2 to the V1 seq. If I understand correctly, that's the minimum change required to make this work.

@zouyonghe
Author

zouyonghe commented Feb 2, 2026

The upstream change in iTerm now decodes each chunk from Base64 individually and accumulates the binary data, which allows each chunk to be independently encoded and to include padding.
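
The per-chunk decoding approach described here can be sketched in C as below. This is not iTerm2's code; it is an illustrative strict decoder written for this example, with hypothetical names. Each call decodes one self-contained chunk, which may end in `=` padding, and appends the resulting bytes to a shared output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Maps one base64 character to its 6-bit value, or -1 if it is invalid. */
static int sketch_b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes one chunk and appends the bytes to out, advancing *out_len.
 * Padding is accepted only in the last two positions of the chunk's
 * final group of four. Returns 0 on success, -1 on malformed input or
 * insufficient output space. */
int sketch_decode_chunk(const char *chunk, size_t len,
                        uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t i;

    if (len % 4 != 0)
        return -1;

    for (i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        int j;
        uint32_t triple;

        for (j = 0; j < 4; j++) {
            char c = chunk[i + j];

            if (c == '=') {
                if (i + 4 != len || j < 2)
                    return -1;
                v[j] = 0;
                pad++;
            } else {
                if (pad > 0)
                    return -1; /* data after padding */
                v[j] = sketch_b64_value(c);
                if (v[j] < 0)
                    return -1;
            }
        }

        if (*out_len + (size_t) (3 - pad) > out_cap)
            return -1;

        triple = ((uint32_t) v[0] << 18) | ((uint32_t) v[1] << 12)
               | ((uint32_t) v[2] << 6) | (uint32_t) v[3];

        out[(*out_len)++] = (uint8_t) ((triple >> 16) & 0xff);
        if (pad < 2)
            out[(*out_len)++] = (uint8_t) ((triple >> 8) & 0xff);
        if (pad < 1)
            out[(*out_len)++] = (uint8_t) (triple & 0xff);
    }

    return 0;
}
```

Decoding the chunks `TQ==` and `YW4=` in two separate calls accumulates the bytes `Man`. Passing the concatenation `TQ==YW4=` as a single stream is rejected by this strict decoder because of the padding in the middle, which is the situation that multiple-of-3 chunk sizes avoid on the sending side.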

The Kitty graphics protocol documentation states:

“If you are using the graphics protocol from a limited client, such as a shell script, it might be useful to avoid having to process responses from the terminal. For this, you can use the q key. Set it to 1 to suppress OK responses and to 2 to suppress failure responses.”

Therefore, whether q=1 or q=2 should be used requires further investigation.

I plan to close this PR and leave it to you to implement this change directly.

@zouyonghe closed this on Feb 2, 2026
@hpjansson
Owner

Yeah, the documentation seems unclear with respect to whether images uploaded without an ID should generate a response or not. Kitty does not seem to generate a response.

Thanks for following up on this in the iTerm tracker and here. Sometimes the research is the biggest task.
