font/shaper: add RTL shaping for Arabic and Hebrew #11079

Open
DiaaEddin wants to merge 9 commits into ghostty-org:main from DiaaEddin:bidi-support

DiaaEddin commented Feb 28, 2026

Arabic and Hebrew text currently renders as disconnected, reversed glyphs because shaping is forced to LTR. This adds proper bidirectional text shaping using the Unicode Bidirectional Algorithm (UAX #9) so RTL scripts render with correct joining forms and visual order.

Scope: shaping only. Cursor positioning, selection, terminal model, and protocol changes are intentionally out of scope and will be addressed in follow-up work.

Approach

Uses itijah v0.2.0 — a pure-Zig UAX #9 implementation I wrote for this integration. MIT licensed, passes the Unicode conformance test suite (BidiTest.txt, BidiCharacterTest.txt), zero C dependencies. Happy to have it mirrored to deps.files.ghostty.org.

itijah provides embedding level resolution and visual run derivation. The run iterator uses visual runs to emit text runs in display order with correct RTL flags, which the shapers (HarfBuzz/CoreText) then shape with the proper direction.
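The visual-run step can be pictured with a short sketch. This is a hypothetical Python illustration, not itijah's API (which the PR doesn't show): it assumes the embedding levels for a line have already been resolved, applies UAX #9 rule L2 (reverse every maximal sequence at or above each level, from the highest level down to the lowest odd level), and then groups the display order into runs flagged RTL when their level is odd. reorder_line and visual_runs are illustrative names.

```python
def reorder_line(levels: list[int]) -> list[int]:
    """Return logical indices in left-to-right display order (UAX #9 rule L2)."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    lowest_odd = min(levels) | 1  # lowest level on the line, rounded up to odd
    # From the highest level down to the lowest odd level, reverse every
    # maximal sequence of positions whose level is at least the current level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


def visual_runs(levels: list[int]) -> list[tuple[int, int, bool]]:
    """Split a line into (logical_start, logical_end, is_rtl) runs in display order."""
    order = reorder_line(levels)
    runs: list[tuple[int, int, bool]] = []
    i = 0
    while i < len(order):
        level = levels[order[i]]
        is_rtl = level % 2 == 1
        step = -1 if is_rtl else 1  # RTL runs appear with decreasing logical index
        j = i + 1
        while (j < len(order) and levels[order[j]] == level
               and order[j] == order[j - 1] + step):
            j += 1
        start, end = sorted((order[i], order[j - 1]))
        runs.append((start, end + 1, is_rtl))
        i = j
    return runs
```

With levels [0, 0, 0, 0, 1, 1, 1] (an LTR line ending in three Hebrew letters), visual_runs returns [(0, 4, False), (4, 7, True)]. With [1, 1, 2, 2, 1] (two digits at level 2 inside RTL text) it returns [(4, 5, True), (2, 4, False), (0, 2, True)], so the digits keep their left-to-right order while the surrounding runs are reversed.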

Changes

  • build.zig.zon — add itijah v0.2.0 dependency
  • SharedDeps.zig, GhosttyZig.zig — wire itijah module, share uucode tables
  • uucode_config.zig — add bidi_class, bidi_paired_bracket, joining_type, is_bidi_mirrored
  • run.zig — itijah-based visual run iteration with cached row layout resolution
  • harfbuzz.zig — RTL direction, cluster anchor tracking, sort for monotonic-x
  • coretext.zig — RTL embedding via CTTypesetter, cluster anchor tracking for mark re-anchoring, sort for monotonic-x
  • bidi_helpers.zig — shared Arabic combining mark detection
  • noop.zig, web_canvas.zig — bidi scratch state and RunIterator hook
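To make the "sort for monotonic-x" items concrete, here is a hypothetical sketch, assuming each shaped glyph is tagged with the terminal column it lands in. When cell columns are derived from the cluster values of an RTL run, they typically come out in descending order, and a stable sort restores left-to-right order without disturbing glyphs that share a column. ShapedCell and sort_monotonic_x are illustrative names, not Ghostty's types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapedCell:
    x: int             # terminal column the glyph is drawn in
    glyph_index: int   # font glyph id
    x_offset: float = 0.0


def sort_monotonic_x(cells: list[ShapedCell]) -> list[ShapedCell]:
    """Order shaped glyphs by column so x never decreases.

    sorted() is stable, so glyphs that share a column, such as a base letter
    and its tashkeel, keep the relative order the shaper produced.
    """
    return sorted(cells, key=lambda cell: cell.x)
```

Relying on a stable sort is the key design choice here: a base letter and its marks share a column, and an unstable sort could swap them.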

Known limitations

  • The run iterator walks visual positions, but cursor_x and selection are still in logical coordinates. On bidi lines, run splitting around cursor/selection boundaries can be incorrect. This will be addressed in the terminal-side follow-up.
  • Cursor positioning is wrong when typing RTL in an interactive shell (terminal model is still LTR). Pre-composed output (echo, cat) renders correctly.
  • Selection and copy follow logical order, not visual order.
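The first two limitations come down to a missing mapping between logical and display columns. As a hypothetical sketch of what the terminal-side follow-up would need, assuming one codepoint per cell and a display order such as the one UAX #9 rule L2 produces, inverting that order gives the display column for every logical column (logical_to_visual is an illustrative name):

```python
def logical_to_visual(visual_order: list[int]) -> list[int]:
    """Invert a display order into a logical-column -> display-column map.

    visual_order[d] is the logical column shown at display column d.
    """
    display_col = [0] * len(visual_order)
    for d, logical in enumerate(visual_order):
        display_col[logical] = d
    return display_col
```

For a display order of [4, 2, 3, 1, 0], logical_to_visual returns [4, 3, 1, 2, 0], so a cursor at logical column 0 would need to be drawn at display column 4.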

Testing

21 new tests across both backends covering Arabic and Hebrew RTL shaping, LTR/RTL direction splits, tashkeel positioning, tanween placement, digit handling in RTL context, Bengali cluster anchoring, and mixed direction runs.

Tested manually on macOS arm64 (CoreText backend). Would appreciate help verifying on Linux/x86_64 (HarfBuzz backend) and other targets.

Closes #1442
Related to #1740

AI disclosure: Claude Code (Anthropic, Claude Opus 4.6) and ChatGPT (OpenAI, GPT-5.4) for implementation assistance, code review, and test generation. Reviewed and validated manually.

DiaaEddin requested a review from a team as a code owner February 28, 2026 15:32
trag1c added the font label (Issue within the font stack, typically src/font) Mar 1, 2026
jcollie added this to the 1.4.0 milestone Mar 4, 2026

CoreText emits glyphs in non-monotonic order for RTL runs containing
combining marks (tashkeel). The shaper was anchoring cell offsets to
run_offset.x (cumulative advances) instead of position.x (CoreText's
absolute glyph position), causing adjacent base characters to visually
overlap when tashkeel like shadda+kasra appeared at word endings.

Use position.x for RTL anchor writes in both the reset and mark-fallback
paths. Add a re-anchor path for the case where CoreText emits a
combining mark before its base glyph within the same cluster, which
caused overlap in multi-word phrases like "الحيِّ الذي".
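A simplified model of the anchoring described above, as a hypothetical Python sketch rather than Ghostty's code: each cell's anchor is taken from the shaper's absolute glyph position (CoreText's position.x), and if a combining mark arrives before its base in the same cell, the base re-anchors the cell. The real shaper has separate reset and mark-fallback paths that this folds into one rule; Glyph, cell_x_offsets and the example positions are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    cell: int           # cluster, i.e. the terminal cell this glyph belongs to
    position_x: float   # absolute x position reported by the shaper
    is_mark: bool       # zero-width combining mark such as shadda or kasra


def cell_x_offsets(glyphs: list[Glyph], n_cells: int) -> list[float]:
    """Return each glyph's x offset relative to the anchor of its cell."""
    anchor: list[Optional[float]] = [None] * n_cells
    anchored_on_mark = [False] * n_cells
    for g in glyphs:
        first_in_cell = anchor[g.cell] is None
        base_after_mark = anchored_on_mark[g.cell] and not g.is_mark
        if first_in_cell or base_after_mark:
            # Anchor on the shaper's absolute position (not a running sum of
            # advances); a base that arrives after its mark replaces the anchor.
            anchor[g.cell] = g.position_x
            anchored_on_mark[g.cell] = g.is_mark
    return [g.position_x - anchor[g.cell] for g in glyphs]
```

With glyphs emitted as [mark of cell 1 at x=18, base of cell 1 at x=10, base of cell 0 at x=20], the offsets are [8.0, 0.0, 0.0]: both base glyphs sit exactly on their anchors and only the mark is shifted.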

Extract isArabicCombiningMark into a shared bidi_helpers module used by
both shapers. Add tashkeel overlap regression tests to both CoreText and
HarfBuzz shapers.
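The exact ranges in bidi_helpers aren't shown here. As a hypothetical illustration of an explicit-range check, the sketch below lists the combining marks (general category Mn) of the main Arabic block, U+0600 to U+06FF. It omits the Arabic Extended blocks, so it is not exhaustive and may not match Ghostty's helper; Hebrew points and Bengali marks fall outside it on purpose.

```python
# Inclusive codepoint ranges of Arabic-block combining marks (category Mn).
ARABIC_MARK_RANGES = (
    (0x0610, 0x061A),  # honorific signs and small high marks
    (0x064B, 0x065F),  # tashkeel (fathatan .. sukun) and related marks
    (0x0670, 0x0670),  # superscript alef
    (0x06D6, 0x06DC),  # small high Quranic annotation marks
    (0x06DF, 0x06E4),
    (0x06E7, 0x06E8),
    (0x06EA, 0x06ED),
)


def is_arabic_combining_mark(cp: int) -> bool:
    """True if cp is a combining mark in the main Arabic block."""
    return any(lo <= cp <= hi for lo, hi in ARABIC_MARK_RANGES)
```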

Also clean up the run iterator: cache row layout resolution across
calls, extract fontStyleForStyle/presentationForCell helpers, and
replace an unreachable with continue for neutral characters not found
in the current run font.
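One way to read "cache row layout resolution across calls" is memoizing the per-row bidi result so repeated calls for an unchanged row skip the resolution step. The hypothetical sketch below assumes the row can be reduced to a hashable key (for example a tuple of codepoints) and only remembers the most recent row; LastRowCache is an illustrative name, and Ghostty's actual caching strategy may differ.

```python
from typing import Callable, Hashable, Optional


class LastRowCache:
    """Reuse the resolved layout while the same row is requested again."""

    def __init__(self, resolve: Callable[[Hashable], object]) -> None:
        self._resolve = resolve              # expensive per-row computation
        self._key: Optional[Hashable] = None
        self._value: object = None
        self.misses = 0                      # how many times resolve() ran

    def get(self, row_key: Hashable) -> object:
        # Recompute only on the first call or when the row changed.
        if self.misses == 0 or row_key != self._key:
            self._key = row_key
            self._value = self._resolve(row_key)
            self.misses += 1
        return self._value
```

Calling get((1, 2, 3)) twice on a cache built with resolve=sum runs sum only once; requesting a different row triggers a fresh resolution.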

Bump itijah dependency to v0.1.8 and add .lazy = true to match
project convention. Extract duplicated font-resolution logic into
resolveFontInfo helper.

The existing tashkeel tests checked cluster assignments but not the
actual x_offset values that were broken. Add assertions that base
glyphs outside the tashkeel cluster have x_offset == 0, which catches
the position.x/run_offset.x divergence the previous commit fixed.

ramysami commented

Is there a build I can try for this?

DiaaEddin (Author) commented

> Is there a build I can try for this?

You can build it yourself on your target system. If you are on macOS arm64, I can share one with you, but I am not sure it would work (it needs signing, etc.).

ramysami commented Apr 17, 2026 via email

DiaaEddin (Author) commented

@ramysami here you go https://github.com/DiaaEddin/ghostty/releases

I'm on mac arm, I'd appreciate it if you throw me the build

ramysami commented May 1, 2026

Works! Thank you.

oronbz commented May 1, 2026

The build works great! Finally proper support! Hope to see it in soon, thank you so much <3

commandlinetips commented

Thank you for this, it helped. I have tested your version, and your fix seems better as a minimal change, but I encountered some letters that don't join correctly; for example, الأن does not render normally. I hope my fix helps you: https://github.com/commandlinetips/ghostty

DiaaNoon commented May 5, 2026

Could you share a screenshot of how it appears now, and a screenshot of the expected result?

commandlinetips commented

Sure, these two screenshots show the result:
(screenshot 1)
(screenshot 2)

DiaaEddin commented May 7, 2026

Interesting. I assume what you shared is the correct output from your fork; what I also need is the output that is wrong, and what exactly the issue is. Could you share two screenshots of exactly the same problematic text: one from this PR's build, with a note on why/how it is wrong, and one showing how the text should look?

I am testing a couple of things on my end as well, but your input would be very helpful to pinpoint the issue.

UPDATE:

I tested your fork on the coretext_harfbuzz backend. This is a font-level Arabic ligature issue, not a shaping bug.

Some Arabic fonts collapse multiple codepoints into a single glyph. For example, ل + ا (two codepoints, two terminal cells) shape into one لا glyph drawn at the width of a single cell, leaving the other cell empty. Longer sequences like الله collapse four codepoints into one glyph, leaving three cells empty. No shaper can fix this in a monospace grid; the only real workaround is to use a monospace-friendly Arabic font that keeps these letters as separate glyphs. This is out of scope for this PR.
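The cell arithmetic can be checked with a small hypothetical sketch, assuming one codepoint per terminal cell and that the shaper tags each output glyph with the index of the first codepoint it covers (similar to HarfBuzz cluster values, where a ligature takes the smallest cluster it merges). Cells that no glyph points at are the empty ones; empty_cells is an illustrative name.

```python
def empty_cells(glyph_clusters: list[int], n_cells: int) -> list[int]:
    """Return the cells that no shaped glyph is assigned to.

    glyph_clusters[i] is the cell (first codepoint index) of glyph i.
    A ligature merging N codepoints into one glyph leaves N - 1 cells empty.
    """
    covered = set(glyph_clusters)
    return [cell for cell in range(n_cells) if cell not in covered]
```

For ل + ا shaped into one لا glyph, empty_cells([0], 2) returns [1]; for الله collapsed into one glyph, empty_cells([0], 4) returns [1, 2, 3], matching the three empty cells described above.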

Could you test الله, الإستقلال, ... and a few tashkeel cases on your fork? Some of those render incorrectly there too.

commandlinetips commented

Thank you @DiaaEddin for testing and for the detailed explanation; that really helped me understand the ligature issue. Here are the screenshots you asked for:
(screenshot 3) This is on my machine with your repo.
(screenshot 4) This one is with my repo.

pluiedev (Member) reviewed

Hey, sorry for taking a while to review this. I think what really made the difference for me here was using a pure-Zig library instead of something like FriBidi: the logic is a lot more straightforward than any of my experiments a while back. I only caught one Zig nitpick, but that shouldn't be hard to fix.

Just one request: I'd like to see more comments documenting the RTL codepaths and any special-case handling we have to do there. The LTR exceptions are already fairly obvious to me, as someone who only knows LTR languages, so it'd be great to see the RTL parts elaborated on too. Other than that, this looks pretty great.

Review thread on src/font/shaper/bidi_helpers.zig (outdated), lines +1 to +5:
/// Returns true for Arabic combining marks used by the RTL shaper fallback.
///
/// Scoped to Arabic marks to avoid regressing other scripts with different
/// mark emission behavior (e.g. Chakma/Bengali). Uses explicit ranges because
/// script/general_category are not yet exposed as runtime uucode fields.

pluiedev (Member):

This comment doesn't make much sense to me as an admittedly ignorant LTR user. Could you elaborate on it a bit more without using obscure terminology like "emission behavior"?

DiaaEddin (Author):

Reworded. I replaced “mark emission behavior” with a more concrete explanation: this fallback handles Arabic RTL marks that can arrive before their base glyph, but broadening it to all zero-width marks regressed Bengali/Chakma positioning.

Review thread on src/font/shaper/harfbuzz.zig (outdated), lines +186 to +187:
try self.cluster_anchor_x.ensureUnusedCapacity(self.alloc, run.cells);
self.cluster_anchor_x.appendNTimesAssumeCapacity(anchor_sentinel, run.cells);

pluiedev (Member):

clearRetainingCapacity + ensureUnusedCapacity + appendNTimesAssumeCapacity is just clearRetainingCapacity + appendNTimes.

DiaaEddin (Author):

Fixed here and in the matching CoreText path. After clearRetainingCapacity(), appendNTimes(...) expresses the same operation directly.

- Bump itijah to v0.2.1 (LTR fast path, hasStrongRtl, N0 IRS fix).
- Rewrite bidi_helpers comment to drop "emission behavior" jargon
  and explain the Arabic mark fallback in plain terms.
- Expand RTL path comments in coretext/harfbuzz and simplify the
  cluster_anchor_x init to a single appendNTimes call.

DiaaEddin (Author) commented

Thanks for the review!

I pushed the two inline fixes and added more notes around the RTL bits in coretext.zig and harfbuzz.zig: cluster-to-cell mapping, the CoreText position.x anchor case, and the final sort before returning cells.

Also bumped itijah to 0.2.1; no Ghostty-facing API changes there.

Verified with zig build test.

DiaaEddin requested a review from pluiedev May 17, 2026 12:42

DiaaEddin (Author) commented

@commandlinetips which distro is this, by the way? Could you share the font name and distro?


Labels

font: Issue within the font stack (typically src/font)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

minimal RTL support for single lines

8 participants