fix(io): reconfigure Windows stdio to UTF-8 so piped output emits Unicode (#4294) #5191
Open
leavedrop wants to merge 1 commit into
…code aider --show-repo-map > file.md crashes on Windows with charmap codec errors because stdout defaults to the system ANSI codepage (cp1252, gbk, etc.) when redirected to a file or pager. rich's legacy Windows renderer then fails on characters aider routinely emits (vertical ellipsis U+22EE used by repo-map).

Add _ensure_utf8_stdio() called at the top of main() that reconfigures sys.stdout / sys.stderr to UTF-8 on Windows (no-op elsewhere). The call sits before InputOutput / rich Console construction so the renderer captures the new encoding. Test harnesses that replaced the streams with non-TextIOWrapper objects are tolerated via try/except.

Verified on Windows 11 / Python 3.12 / zh-CN GBK locale: piping output to a file now succeeds for chars that previously crashed.

Fixes Aider-AI#4294.
Fixes #4294.
aider --show-repo-map > .aider.map.md crashes on Windows with charmap codec can't encode character '⋮'. When stdout is redirected to a file or pager, Windows defaults the stream to the system ANSI codepage (cp1252 in en-US, gbk in zh-CN, etc.). rich's legacy Windows renderer then tries to write ⋮ (used by repo-map) through that codec and fails.
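The failure mode can be reproduced on any platform without aider or rich. The sketch below is only an illustration: it wraps an in-memory buffer in a text stream that uses cp1252 (standing in for whichever ANSI codepage the system would select) and writes the same character through it, then repeats the write with UTF-8.

```python
import io

ELLIPSIS = "\u22ee"  # vertical ellipsis, the character repo-map emits

# Stand-in for a redirected stdout that uses a legacy ANSI codepage.
legacy_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
try:
    legacy_stream.write(ELLIPSIS)
    legacy_stream.flush()
    legacy_error = None
except UnicodeEncodeError as exc:
    legacy_error = exc  # cp1252 has no mapping for U+22EE

# The same write through a UTF-8 stream succeeds.
utf8_buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
utf8_stream.write(ELLIPSIS)
utf8_stream.flush()
utf8_bytes = utf8_buffer.getvalue()

print(type(legacy_error).__name__)
print(utf8_bytes)
```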
This patch reconfigures sys.stdout and sys.stderr to UTF-8 at the top of main() on Windows. UTF-8 can encode every Unicode codepoint, so the same flow now writes successfully to a pipe or file. Non-Windows platforms are unaffected. Test harnesses that replaced the streams with non-TextIOWrapper objects (e.g. StringIO) are tolerated via try/except: reconfigure() raises AttributeError/ValueError there and we swallow it.
The call sits at the very top of main() (before InputOutput() / rich Console() are constructed at line 577) so the renderer picks up the new encoding when it captures stdout.
Verification (Windows 11, Python 3.12, zh-CN GBK locale)
Without the patch:
With the patch:
The original issue comment notes "setting PYTHONUTF8=1 prevents the crashes for me", confirming the encoding switch is the right fix.
This PR removes the need for users to set the env var.
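The effect of the env var workaround can be checked from a short script. The sketch below is an illustration only and is not part of the PR: it starts a child interpreter with PYTHONUTF8=1 and a piped stdout, then reads back the encoding the child reports for sys.stdout.

```python
import os
import subprocess
import sys

# Copy the current environment and enable Python's UTF-8 mode for the child.
env = dict(os.environ, PYTHONUTF8="1")
# PYTHONIOENCODING, if set, would override the stream encoding; drop it.
env.pop("PYTHONIOENCODING", None)

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    capture_output=True,  # stdout is a pipe, as in the redirected case
    text=True,
    env=env,
    check=True,
)
child_encoding = result.stdout.strip()
print(child_encoding)
```

With UTF-8 mode on, the child reports a UTF-8 stdout even though the stream is a pipe, which is the same end state the patch reaches without asking users to set the variable.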
Tests
Added tests/basic/test_main.py::TestEnsureUtf8Stdio:
- test_noop_on_non_windows: Linux/macOS streams untouched
- test_reconfigures_both_streams_on_windows: both stdout/stderr → utf-8
- test_safe_when_reconfigure_unavailable: AttributeError swallowed

All 3 new tests pass. Sanity-ran the existing TestMain::test_main_with_empty_dir_no_files_on_command, test_main_with_emptqy_dir_new_file, and test_cache_without_stream_no_warning to confirm no regression.