font/shaper: add RTL shaping for Arabic and Hebrew #11079
CoreText emits glyphs in non-monotonic order for RTL runs containing combining marks (tashkeel). The shaper was anchoring cell offsets to run_offset.x (cumulative advances) instead of position.x (CoreText's absolute glyph position), causing adjacent base characters to visually overlap when tashkeel like shadda+kasra appeared at word endings. Use position.x for RTL anchor writes in both the reset and mark-fallback paths. Add a re-anchor path for the case where CoreText emits a combining mark before its base glyph within the same cluster, which caused overlap in multi-word phrases like "الحيِّ الذي".

Extract isArabicCombiningMark into a shared bidi_helpers module used by both shapers. Add tashkeel overlap regression tests to both CoreText and HarfBuzz shapers.

Also clean up the run iterator: cache row layout resolution across calls, extract fontStyleForStyle/presentationForCell helpers, and replace an unreachable with continue for neutral characters not found in the current run font.
Bump itijah dependency to v0.1.8 and add .lazy = true to match project convention. Extract duplicated font-resolution logic into resolveFontInfo helper.
The existing tashkeel tests checked cluster assignments but not the actual x_offset values that were broken. Add assertions that base glyphs outside the tashkeel cluster have x_offset == 0, which catches the position.x/run_offset.x divergence the previous commit fixed.
Is there a build I can try for this?
You can build it yourself on your target system. If you are on macOS arm I can share one with you, but I am not sure it would work (it needs signing, etc.).
I'm on Mac arm; I'd appreciate it if you could send me the build.
@ramysami here you go https://github.com/DiaaEddin/ghostty/releases
Works! Thank you.
The build works great! Finally proper support! Hope to see it in soon, thank you so much <3
Thank you for this, it helped. I have tested your version, and your fix appears better because it is a minimal change, but I encountered some letters that don't join correctly; for example, الأن does not render normally. I hope my fix helps you: https://github.com/commandlinetips/ghostty
Could you share a screenshot of how it appears now?
Interesting. I assume what you shared is the correct output from your fork; what I also need is the wrong output and the exact issue. I am testing a couple of things on my end as well, but your input would be very helpful to pinpoint the issue. UPDATE: tested your fork on the coretext_harfbuzz backend. This is a font-level Arabic ligature issue, not a shaping bug. Some Arabic fonts collapse multiple codepoints into a single glyph, e.g. Could you test
Thank you @DiaaEddin for testing and the detailed explanation, that really helped me understand the ligature issue. Here are the screenshots you asked for:
pluiedev left a comment
Hey, sorry for taking a while to review this. I think what really made the difference here for me was using a pure-Zig library instead of something like Fribidi — the logic here is a lot more straightforward than any of my experiments a while back. I only caught one Zig nitpick but that shouldn't be hard to fix.
Just one request — I'd like to see more comments documenting the RTL codepaths and any special-case handling we have to do there. The LTR exceptions are already fairly obvious to me who only knows LTR languages, so it'd be great to see the RTL parts elaborated upon too. Other than that, this looks pretty great
/// Returns true for Arabic combining marks used by the RTL shaper fallback.
///
/// Scoped to Arabic marks to avoid regressing other scripts with different
/// mark emission behavior (e.g. Chakma/Bengali). Uses explicit ranges because
/// script/general_category are not yet exposed as runtime uucode fields.
This comment doesn't make that much sense to me from an admittedly ignorant LTR user — could you elaborate on this a bit more without using obscure terminology like e.g. "emission behavior"?
Reworded. I replaced “mark emission behavior” with a more concrete explanation: this fallback handles Arabic RTL marks that can arrive before their base glyph, but broadening it to all zero-width marks regressed Bengali/Chakma positioning.
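For readers unfamiliar with the scripts involved, a range-based check of this kind can be sketched as follows. This is a minimal Python illustration, not the PR's code: the function name mirrors isArabicCombiningMark, but the ranges below are an assumption (the common tashkeel range U+064B to U+065F plus superscript alef U+0670), and the actual ranges in bidi_helpers may differ.

```python
# Illustrative only: these ranges are an assumption, not the ranges used in
# the PR's bidi_helpers module.
ARABIC_MARK_RANGES = (
    (0x064B, 0x065F),  # fathatan .. wavy hamza below (includes kasra, shadda)
    (0x0670, 0x0670),  # superscript alef
)


def is_arabic_combining_mark(cp: int) -> bool:
    """Return True if the codepoint falls in one of the explicit Arabic ranges."""
    return any(lo <= cp <= hi for lo, hi in ARABIC_MARK_RANGES)
```

With these ranges, shadda (U+0651) and kasra (U+0650) match, while a base letter such as alef (U+0627) and a Bengali mark such as virama (U+09CD) do not, which is the scoping the reworded comment describes.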
try self.cluster_anchor_x.ensureUnusedCapacity(self.alloc, run.cells);
self.cluster_anchor_x.appendNTimesAssumeCapacity(anchor_sentinel, run.cells);
clear + ensure unused + append N times assuming capacity is just clear + append N times
Fixed here and in the matching CoreText path. After clearRetainingCapacity(), appendNTimes(...) expresses the same operation directly.
- Bump itijah to v0.2.1 (LTR fast path, hasStrongRtl, N0 IRS fix).
- Rewrite bidi_helpers comment to drop "emission behavior" jargon and explain the Arabic mark fallback in plain terms.
- Expand RTL path comments in coretext/harfbuzz and simplify the cluster_anchor_x init to a single appendNTimes call.
Thanks for the review! I pushed the two inline fixes and added more notes around the RTL bits in coretext.zig and harfbuzz.zig: cluster-to-cell mapping, the CoreText position.x anchor case, and the final sort before returning cells. Also bumped itijah to 0.2.1; no Ghostty-facing API changes there. Verified with
@commandlinetips which distro is this, btw? Could you share the font name and distro?






Arabic and Hebrew text currently renders as disconnected, reversed glyphs because shaping is forced to LTR. This adds proper bidirectional text shaping using the Unicode Bidirectional Algorithm (UAX #9) so RTL scripts render with correct joining forms and visual order.
Scope: shaping only. Cursor positioning, selection, terminal model, and protocol changes are intentionally out of scope and will be addressed in follow-up work.
Approach
Uses itijah v0.2.0, a pure-Zig UAX #9 implementation I wrote for this integration. MIT licensed, passes the Unicode conformance test suite (BidiTest.txt, BidiCharacterTest.txt), zero C dependencies. Happy to have it mirrored to deps.files.ghostty.org.
itijah provides embedding level resolution and visual run derivation. The run iterator uses visual runs to emit text runs in display order with correct RTL flags, which the shapers (HarfBuzz/CoreText) then shape with the proper direction.
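As a rough illustration of what "visual run derivation" means, the sketch below applies rule L2 of UAX #9 to a list of already-resolved embedding levels and groups the result into runs. It is written in Python for brevity and is not itijah's API: the function names and the (logical_start, length, is_rtl) tuple shape are assumptions made for this example, and level resolution itself (the hard part that itijah handles) is taken as input.

```python
from itertools import groupby


def visual_order(levels):
    """Return logical indices in left-to-right display order (UAX #9 rule L2)."""
    order = list(range(len(levels)))
    odd = [lv for lv in levels if lv % 2 == 1]
    if not odd:
        return order  # pure LTR line: display order equals logical order
    # From the highest level down to the lowest odd level, reverse every
    # maximal sequence of positions whose level is at least the current one.
    for level in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return order


def visual_runs(levels):
    """Group the display order into (logical_start, length, is_rtl) runs."""
    runs = []
    for level, group in groupby(visual_order(levels), key=lambda i: levels[i]):
        indices = list(group)
        runs.append((min(indices), len(indices), level % 2 == 1))
    return runs
```

For example, levels `[0, 0, 1, 1, 1, 0]` (two LTR characters, three RTL characters, one LTR character) yield the runs `(0, 2, False)`, `(2, 3, True)`, `(5, 1, False)`. A shaper can then shape each run with the direction given by its flag.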
Changes
bidi_class, bidi_paired_bracket, joining_type, is_bidi_mirrored
Known limitations
cursor_x and selection are still in logical coordinates. On bidi lines, run splitting around cursor/selection boundaries can be incorrect. This will be addressed in the terminal-side follow-up.
echo, cat) renders correctly.
Testing
21 new tests across both backends covering Arabic and Hebrew RTL shaping, LTR/RTL direction splits, tashkeel positioning, tanween placement, digit handling in RTL context, Bengali cluster anchoring, and mixed direction runs.
Tested manually on macOS arm64 (CoreText backend). Would appreciate help verifying on Linux/x86_64 (HarfBuzz backend) and other targets.
Closes #1442
Related to #1740
AI disclosure: Claude Code (Anthropic, Claude Opus 4.6) and ChatGPT (OpenAI, GPT-5.4) for implementation assistance, code review, and test generation. Reviewed and validated manually.