Add IME language selection to InputOptions for CJK support #7914
rustbasic wants to merge 23 commits into emilk:main from
Preview available at https://egui-pr-preview.github.io/pr/7914-patch167 View snapshot changes at kitdiff |
|
Hello. In my previous comment, I mentioned that I suspected the root cause of the IME behavior differences on Windows lies in In addition, that PR addresses several issues not covered here. One of them involves a Korean IME case (Hanja confirmation) where simply setting the IME language to Korean does not help, as its behavior more closely resembles that of typical Japanese IMEs. I would appreciate your feedback on my PR (#7967). If you have time to review it, I’d be grateful for any comments or suggestions for improvement (similar to the concerns I raised in my previous comment). Separately, you mentioned that this PR also fixes another IME-related issue. Could you provide more details about that? From the changes, I can tell it relates to a Korean IME using the Cheonjiin layout on Android (web?), but I’m unclear on what the exact issue was and which specific IME is affected. As I understand it, multiple IMEs support this layout. Thank you. |
|
Regarding why language-specific IME selection was previously unnecessary on certain OSs: Until now, the reason we haven't needed to manually select IMEs for each language on specific operating systems is that users typically install an OS tailored to a specific language. During this process, the language-specific IME is installed either automatically or manually, allowing users to use it without much concern. However, since egui is a cross-platform framework that operates across all platforms, it must implement its own IME handling logic internally. Given the unique characteristics of CJK (Chinese, Japanese, and Korean) languages and the existence of various IMEs within each language, it is extremely difficult to handle everything with a single, unified function. To achieve a "one-function-fits-all" approach, we must first complete the handling logic for every language and IME, and then implement a system to automatically recognize them. We cannot skip these essential intermediate steps and jump straight to a perfect, universal solution. It would be ideal if a library like winit handled everything, but we can't just wait around for that. |
I don't quite understand this. What exactly is the “concern” being referred to? Are you suggesting that users typically use IMEs matching their system language? If so, that distinction isn't particularly meaningful. Whether the system language matches the IME language doesn't affect how IMEs function. Many users, myself included, run their systems in English while using IMEs for other languages. If an OS couldn't support multiple IMEs simultaneously, some would fail under an English system, which would be considered a bug. A mature platform must ensure that IMEs work reliably regardless of the system language. Fortunately,
If handling IMEs for multiple languages isn't inherently problematic on a single platform, it's not clear why expanding to multiple platforms would introduce language-specific issues. The real differences lie between platforms, which can be handled using mechanisms like
The reason we should adopt a “universal solution”, regardless of the language or IME a user is using, is precisely because we CAN'T and SHOULDN'T attempt to handle every language and IME individually.
Therefore, in The current IME handling logic already follows a “one-function-fits-all” approach. The “language selection” proposed in this PR breaks it, yet can't be pushed further: because, as explained above, handling every language and IME individually is infeasible. In my opinion, it only moves the codebase away from the “one-function-fits-all” goal rather than toward it.
If there are bugs upstream, the proper approach is to fix them there. Temporary workarounds in The main reason I haven't contributed fixes to |
|
On macOS Safari, the builtin Chinese IME (Shuangpin) is broken by default with this PR:
On macOS Safari, the builtin Japanese IME is broken by default with this PR:
On Windows, the builtin Chinese IME (Shuangpin) is broken by default with this PR:
Since these IMEs worked out of the box before this PR, they should also work out of the box after it. As I have pointed out here, most developers will not bother integrating this language selection, and most affected users are unlikely to discover it even if it is integrated into the app they are using. Note that I am not implying any prioritization of languages. The point is simply that the approach proposed by this PR is not very feasible.
|
I'm not home and can't verify this, so this is merely baseless speculation: does the IME you are addressing here, by any chance, handle the batchim situation by sending a backspace event to delete the first character? And the reason it is broken is because
|
Since I rely on translators or AI translation, it can be difficult to convey precise nuances. The "Cheonjiin keyboard" does not trigger a backspace event; instead, it returns The primary issue currently lies in the conflicts between CJK languages, so I would like to keep the Korean logic separate for the time being. These IME bugs have remained unresolved for years, and since I slightly mitigated them a year ago, they have stayed in the same state. Fixing one side often leads to issues on the other. It seems wiser to address the bugs by separating CJK first and then determine if they can be merged later. I may be speaking from my own limitations, as I can only test Korean on Windows, WASM, and Android. Perhaps Let's make this happen! |
I can definitely relate. I'm not very confident in my English writing either.
I’ll do my best to fix this Cheonjiin bug while trying not to introduce IME-specific logic.
It would be very helpful if you could reproduce the bug while logging the following 6 events (The more informative, the better): egui/crates/eframe/src/web/text_agent.rs Lines 109 to 117 in 49fad9a Once you have that, please paste the logs here. To access the console on Android Chrome, you can use: Thanks. |
|
For reference, the ** Below are the logs generated while typing the word "않았다" ** |
|
I noticed that I am mistakenly assigning Below are the logs generated while typing "않았다": |
It appears that Chrome dispatches My interpretation is that Chrome does not expect the text agent's content to be cleared while processing Cheonjiin. To avoid introducing negative values in ranges, it may attempt to recover the text state. (e.g. recovering I wonder what would happen in such cases if we always kept one character at the beginning of the text agent. If the former pattern holds, a possible workaround is to check whether the selection starts at A more robust approach may be to build another IME-handling implementation on top of the EditContext API, as an alternative to the text agent. A proper implementation there should eliminate this kind of inconsistency for many users (~70%). For the remaining ~30% of users, we can then try to find workarounds, which I expect to be more difficult to get right.
|
It should be handled as follows. In #7914, it is currently behaving like this: |
* Closes #7809
* Closes #7876
* Closes #7908
* Supersedes #7877
* Supersedes #7898
  * The author of the PR above replaced it with #7914, which additionally fixes another IME issue. I believe that fix deserves a separate PR.
* Reverts #4794
* [x] I have followed the instructions in the PR template

This approach is better than #7898 (#7914) because it correctly handles all three major IME types (Chinese, Japanese, and Korean) without requiring a predefined “IME mode”.

## Environments I have tested this PR in

<details><summary>macOS 15.7.3 (AArch64, Host of other virtual machines)</summary>

Run command: `cargo run -p egui_demo_app --release`

Tested IMEs:
- builtin Chinese IME (Shuangpin - Simplified)
- builtin Japanese IME (Romaji)
- builtin Korean IME (2-Set)

</details>

<details><summary>Windows 11 25H2 (AArch64, Virtual Machine)</summary>

Build command: `cargo build --release -p egui_demo_app --target=x86_64-pc-windows-gnu --features=glow --no-default-features`

(I cannot use `wgpu` due to [this bug](#4381), which prevents debugging inside the VM. Anyway, the rendering backend should be irrelevant here.)
Tested IMEs:
- builtin Chinese IME (Shuangpin)
- Sogou IME (Chinese Shuangpin)
- WeType IME (Chinese Shuangpin)
- builtin Japanese IME (Hiragana)
- builtin Korean IME (2 Beolsik)

</details>

<details><summary>Linux [Wayland + IBus] (AArch64, Virtual Machine)</summary>

Fedora KDE Plasma Desktop 43 [Wayland + IBus 1.5.33-rc2]

(Not working at the moment because of [another issue](#7485) that will be fixed by #7983. It is [a complicated story](#7973 (comment)).)

> [!NOTE]
>
> IBus is partially broken in this system. The Input Method Selector refuses to select IBus. As a workaround, I have to open System Settings -> Virtual Keyboard and select “IBus Wayland” to start an IBus instance that works in egui.
>
> The funny thing is: the Chinese Intelligent Pinyin IME is broken in native Apps like System Settings and KWrite, but works correctly in egui!
>
> <details><summary>Screencast: What</summary>
>
> </details>

Build command: `cross build --release -p egui_demo_app --target=aarch64-unknown-linux-gnu --features=wayland,wgpu --no-default-features`

(The Linux toolchain on my mac is somehow broken, so I used `cross` instead.)

Tested IMEs:
- Chinese Intelligent Pinyin IME (Shuangpin)
- Japanese Anthy IME (Hiragana)
- Korean Hangul IME

</details>

<details><summary>Linux [X11 + Fcitx5] (AArch64, Virtual Machine)</summary>

Debian 13 [Cinnamon 6.4.10 + X11 + Fcitx5 5.1.2]

Build command: `cross build --release -p egui_demo_app --target=aarch64-unknown-linux-gnu --features=x11,wgpu --no-default-features`

Tested IMEs:
- Chinese Shuangpin IME
- Chinese Rime IME with `luna-pinyin`
- Japanese Mozc IME (Hiragana)
- Korean Hangul IME

Unlike macOS and Linux + Wayland, key-release events for keys processed by the IME are still forwarded to `egui`. These appear to be harmless in practice. Unlike on Windows, however, they cannot be filtered reliably because there are no corresponding key-press events marked as “processed by IME”.
</details>

---

There are too many possible combinations to test (Operating Systems × [Desktop Environment](https://en.wikipedia.org/wiki/Desktop_environment)s × [Windowing System](https://en.wikipedia.org/wiki/Windowing_system)s × [IMF](https://wiki.archlinux.org/title/Input_method#Input_method_framework)s × [IME](https://en.wikipedia.org/wiki/Input_method)s × …), and I only have access to a limited subset. For example, Google Japanese Input refused to install on my Windows VM, and some paid Japanese IMEs are not accessible to me. Therefore, I would appreciate feedback from people other than me using all kinds of environments.

## Details

There are two possible approaches to removing keyboard events that have already been processed by an IME:

* Approach 1: Filter out events inside `egui` that appear to have been received during IME composition.
* Approach 2: Filter out such events in the platform backend (terminology [borrowed from imgui](https://github.com/ocornut/imgui/blob/master/docs/BACKENDS.md#using-standard-backends), e.g. the `egui-winit` crate or the code under `web/` in the `eframe` crate.).

Both approaches already exist in `egui`:

* #4794 uses the first approach, filtering these events in the `TextEdit`-related code.
* `eframe` uses the second approach in its web integration. See: <https://github.com/emilk/egui/blob/14afefa2521d1baaf4fd02105eec2d3727a7ac36/crates/eframe/src/web/events.rs#L173-L176>

Compared to the first approach, the second has a clear advantage: when events are passed from the platform backends into `egui`, they are simplified and lose information. In contrast, events in the platform backends are the original events, which allows them to be handled more flexibly.
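To make the information loss concrete, here is a minimal sketch. `RawKeyEvent`, `UiKeyEvent`, and `simplify` are hypothetical stand-ins invented for illustration, not the actual `winit` or `egui` types. The sketch only assumes that a backend event carries an IME-related detail that the simplified event does not.

```rust
/// Hypothetical raw event, as a platform backend might see it.
#[derive(Debug, Clone, PartialEq)]
pub struct RawKeyEvent {
    pub key: &'static str,
    pub pressed: bool,
    /// Assumed detail that is only known at the backend level.
    pub processed_by_ime: bool,
}

/// Hypothetical simplified event, as the UI layer receives it.
#[derive(Debug, Clone, PartialEq)]
pub struct UiKeyEvent {
    pub key: &'static str,
    pub pressed: bool,
}

/// Lossy conversion: the IME detail has no counterpart in `UiKeyEvent`.
pub fn simplify(raw: &RawKeyEvent) -> UiKeyEvent {
    UiKeyEvent {
        key: raw.key,
        pressed: raw.pressed,
    }
}
```

Two raw events that differ only in the IME detail become indistinguishable after `simplify`. A filter placed after the conversion (Approach 1) has to guess, while a filter placed before it (Approach 2) can read the detail directly.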
This is also why #7898 (#7914), which attempts to address the issue from within the `egui` crate, struggles to make all IMEs work correctly at the same time and requires manually selecting an “IME mode”: the events received by `egui` have already been reduced and therefore lack necessary information.

A more appropriate solution is to consistently follow the second approach, explicitly requiring platform backends not to forward events that have already been processed by the IME to `egui`. This is the method used in this PR.

Specifically, this PR works within the `egui-winit` crate, where the original `KeyboardInput` events can be accessed. At least for key press events, these can be used directly to determine whether the event has already been processed by the IME on Windows (by checking whether `logical_key` equals `winit::keyboard::NamedKey::Process`). This makes it straightforward to ensure that all IMEs work correctly at the same time.

This PR also reverts #4794, which took the first approach. It filters out some events that merely look like they were received during IME composition but actually are not. It also messes up the order of those events along the way. As a result, it caused several IME-related issues. One of the sections in the Demonstrations below will illustrate these problems.

## Demonstrations

<details><summary>Changes not included in this PR for displaying Unicode characters in demonstrations</summary>

Download `unifont-17.0.03.otf` from <https://unifoundry.com/pub/unifont/unifont-17.0.03/font-builds/>, and place it at `crates/egui_demo_app/src/unifont-17.0.03.otf`.
In `crates/egui_demo_app/src/wrap_app.rs`, add these lines at the beginning of `impl WrapApp`'s `pub fn new`:

```rust
{
    // Embed the font file into the binary at compile time.
    const MAIN_FONT: &[u8] = include_bytes!("./unifont-17.0.03.otf");

    let mut fonts = egui::FontDefinitions::default();
    fonts.font_data.insert(
        "main-font".to_owned(),
        std::sync::Arc::new(egui::FontData::from_static(MAIN_FONT)),
    );

    // Put the font first in the proportional family so that it takes priority.
    let proportional = fonts
        .families
        .entry(egui::FontFamily::Proportional)
        .or_default();
    proportional.insert(0, "main-font".to_owned());

    cc.egui_ctx.set_fonts(fonts);
}
```

(I took this from somewhere, but I forgot where it is. Sorry…)

</details>

[GNU Unifont](https://unifoundry.com/unifont/index.html) is licensed under [OFL-1.1](https://unifoundry.com/OFL-1.1.txt).

### This PR Fixes: Focus on a single-line `TextEdit` is lost after completing candidate selection with Japanese IME on Windows (#7809)

<details><summary>Screencast: ✅ Japanese IME now behaves correctly while Korean IME behaves as before</summary>
</details>

### This PR Fixes: Committing Japanese IME text with <kbd>Enter</kbd> inserts an unintended newline in multiline `TextEdit` on Windows (#7876)

<details><summary>Screencast: ✅ Japanese IME now behaves correctly while Korean IME behaves as before</summary>
</details>

### This PR Fixes: Backspacing deletes characters during composition in certain Chinese IMEs (e.g., Sogou) on Windows (#7908)

<details><summary>Screencast: ✅ Sogou IME now behaves correctly</summary>
</details>

### This PR Obsoletes #4794, because `egui` receives only IME events during composition from now on

On Windows, “incompatible” events are filtered in `egui-winit`, aligning the behavior with other systems.

<details><summary>Screencasts</summary>

Some Chinese IMEs on Windows:

The default Japanese IMEs on Windows:

</details>

The 2-set Korean IMEs handle arrow keys differently. It will be discussed in the next section.
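The Windows-side filtering mentioned in this section can be sketched roughly as follows. The types below are local stand-ins modeled loosely on `winit::keyboard` so that the snippet compiles on its own, and `is_processed_by_ime` and `drop_ime_processed` are hypothetical names. This is an illustration of the idea under those assumptions, not the code in this PR.

```rust
/// Local stand-ins modeled loosely on `winit::keyboard`; not the real types.
#[derive(Debug, Clone, PartialEq)]
pub enum NamedKey {
    Process,
    Enter,
    ArrowLeft,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Key {
    Named(NamedKey),
    Character(String),
}

/// Hypothetical reduced form of a backend keyboard event.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyboardInput {
    pub logical_key: Key,
    pub pressed: bool,
}

/// A key press whose logical key is `Process` was already handled by the IME.
pub fn is_processed_by_ime(event: &KeyboardInput) -> bool {
    event.pressed && event.logical_key == Key::Named(NamedKey::Process)
}

/// Keep only the events that should be forwarded to the UI layer.
pub fn drop_ime_processed(events: Vec<KeyboardInput>) -> Vec<KeyboardInput> {
    events
        .into_iter()
        .filter(|event| !is_processed_by_ime(event))
        .collect()
}
```

Only key presses are checked here, matching the description above that the `Process` marker is usable "at least for key press events". Key releases pass through unchanged in this sketch.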
### This PR Reverts #4794, because it introduced several bugs

Some of its bugs have already been worked around in the past, but those workarounds might also be problematic. For example, #4912 is a workaround for a bug (#4908) introduced by #4794, and that workaround is in fact the root cause of the macOS backspacing bug I have worked around with #7810. (The reversion of #4912 is out of the scope of this PR; I will do that in #7983.)

#### It Caused: Arrow keys are incorrectly blocked during typical Korean IME composition

When composing Korean text using 2-Set IMEs, users should still be able to move the cursor with arrow keys regardless of whether the composition has been committed.

##### Correct behavior

<details><summary>Screencasts</summary>

macOS TextEdit:

Windows Notepad:

With #4794 reverted, `egui` also behaves correctly (tested on Linux + Wayland, macOS, and Windows):

</details>

##### Incorrect behavior caused by #4794

`remove_ime_incompatible_events` removed arrow-key events in such cases. As a result, the first arrow key press only commits the composition, and users need to press the arrow key again to move the cursor:

<details><summary>Screencast</summary>
</details>

This is essentially the same issue described here: #7877 (comment)

#### It Caused: Backspacing leaves the last character in Korean IME pre-edit text not removed on macOS

<details><summary>Screencasts</summary>

Before this PR:

After this PR:

</details>

### Korean IMEs also use <kbd>Enter</kbd> to confirm Hanja selections, and will not work properly in the Korean “IME mode” proposed by #7898 (#7914)

<details><summary>Screencast: Korean IME using <kbd>Enter</kbd> and <kbd>Space</kbd> for confirmation (IBus Korean Hangul IME)</summary>

The screencast below demonstrates that some Korean IMEs handle Hanja selection in a way similar to Japanese IMEs: the <kbd>Up</kbd>/<kbd>Down</kbd> arrow keys are used to navigate candidates, and <kbd>Enter</kbd> confirms the selected candidate.
</details>

<details><summary>Screencasts: Another example</summary>

Using the built-in Korean IME on Windows, I type two lines: the first line in Hangul, and the second line as the same word converted to Hanja.

Correct behavior in Notepad (reference):

Behavior after applying this PR, which matches the Notepad behavior:

Behavior after applying #7914 with the “IME mode” set to Korean (which is also the behavior before this PR is applied):

On the second line, each time a Hanja character is confirmed, an unintended newline is inserted. This mirrors the Japanese IME issues that are supposed to be fixed by setting the “IME mode” to Japanese. (These Japanese IME issues are fixed in this PR as mentioned before.)

</details>



Add IME language selection to InputOptions for CJK support
Overview
This PR adds a new configuration option to select specific IME (Input Method Editor) processing modes, particularly for CJK (Chinese, Japanese, Korean) languages. This allows for more specialized handling of character composition depending on the selected language.
Changes
- Added `ImeLanguage` enum (None, Korean, Japanese, Chinese) with `serde` support.
- Added `ime_language` field to `InputOptions` struct.
- Updated `InputOptions::ui` to include a `ComboBox` for easy language selection in the settings panel.

Why this is needed
Proper CJK support often requires different composition logic depending on the language. By allowing users to explicitly select their IME mode, we can provide a more robust input experience for international users.
In addition to #7898, a bug in Korean input on mobile (Android) has been fixed.
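For readers skimming the thread, the option described under Changes has roughly the following shape. This is a minimal, self-contained sketch, not the code from this PR. The variant names and the `ime_language` field come from the description above. The `serde` derives and the other `InputOptions` fields are omitted, and the `None` default, the `ALL` constant, and the `label` helper are assumptions added for illustration.

```rust
/// IME processing mode, with the variants listed in the PR description.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum ImeLanguage {
    /// Assumed default for this sketch: no language-specific handling.
    #[default]
    None,
    Korean,
    Japanese,
    Chinese,
}

impl ImeLanguage {
    /// Hypothetical helper: every variant, e.g. for filling a combo box.
    pub const ALL: [ImeLanguage; 4] = [
        ImeLanguage::None,
        ImeLanguage::Korean,
        ImeLanguage::Japanese,
        ImeLanguage::Chinese,
    ];

    /// Hypothetical helper: human-readable label for a settings UI.
    pub fn label(self) -> &'static str {
        match self {
            ImeLanguage::None => "None",
            ImeLanguage::Korean => "Korean",
            ImeLanguage::Japanese => "Japanese",
            ImeLanguage::Chinese => "Chinese",
        }
    }
}

/// Reduced stand-in for `InputOptions`; the real struct has more fields.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct InputOptions {
    pub ime_language: ImeLanguage,
}
```

The reviewers' objection in the conversation above concerns this shape: an application or user has to pick the mode explicitly, and IMEs outside the selected mode may behave differently from before.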