fix(io): reconfigure Windows stdio to UTF-8 so piped output emits Unicode (#4294) #5191

Open
leavedrop wants to merge 1 commit into Aider-AI:main from leavedrop:fix/4294-windows-stdio-utf8

Conversation

@leavedrop

Fixes #4294.

aider --show-repo-map > .aider.map.md crashes on Windows with
charmap codec can't encode character '⋮'. When stdout is
redirected to a file or pager, Python on Windows defaults the stream to the
system ANSI codepage (cp1252 in en-US, gbk in zh-CN, etc.). rich's
legacy Windows renderer then tries to write ⋮ (U+22EE, used by repo-map)
through that codec and fails.
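
The failure mode can be reproduced without aider. The sketch below is an illustration only, not code from this PR: the helper name try_write is hypothetical, and an in-memory buffer stands in for a redirected stdout.

```python
import io


def try_write(text, encoding):
    """Write text through a TextIOWrapper that uses the given codec.

    Returns the encoded bytes, or None if the codec cannot represent the text.
    """
    buffer = io.BytesIO()  # stands in for the redirected file or pipe
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return buffer.getvalue()


ELLIPSIS = "\u22ee"  # vertical ellipsis, the character from the report

print(try_write(ELLIPSIS, "cp1252"))  # None
print(try_write(ELLIPSIS, "utf-8"))   # b'\xe2\x8b\xae'
```

cp1252 has no mapping for U+22EE, so the write raises UnicodeEncodeError; UTF-8 encodes it as three bytes.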

This patch reconfigures sys.stdout and sys.stderr to UTF-8 at the
top of main() on Windows. UTF-8 can encode every Unicode codepoint,
so the same flow now writes successfully to a pipe or file. Non-Windows
platforms are unaffected. Test harnesses that replaced the streams
with non-TextIOWrapper objects (e.g. StringIO) are tolerated via
try/except — reconfigure() raises AttributeError/ValueError
there and we swallow it.

The call sits at the very top of main() (before InputOutput() /
rich Console() are constructed at line 577) so the renderer picks
up the new encoding when it captures stdout.

Verification (Windows 11, Python 3.12, zh-CN GBK locale)

Without the patch:

stdout encoding: gbk
UnicodeEncodeError: 'gbk' codec can't encode character '⋮'

With the patch:

stdout encoding: utf-8
vertical ellipsis works: ⋮

The original issue comment notes "setting PYTHONUTF8=1 prevents the
crashes for me" — confirming the encoding switch is the right fix.
This PR removes the need for users to set the env var.
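
For comparison, the effect of the PYTHONUTF8=1 workaround can be observed from a parent process. This is a hedged sketch, not part of the PR: child_stdout_encoding is a hypothetical helper, and it clears PYTHONIOENCODING because that variable would otherwise override the stdio encoding.

```python
import os
import subprocess
import sys


def child_stdout_encoding(utf8_mode):
    """Return sys.stdout.encoding as seen by a child whose stdout is a pipe."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # would override the stdio encoding
    if utf8_mode:
        env["PYTHONUTF8"] = "1"        # enable Python UTF-8 mode
    else:
        env.pop("PYTHONUTF8", None)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print("default:  ", child_stdout_encoding(False))
print("UTF-8 mode:", child_stdout_encoding(True))
```

With UTF-8 mode enabled the child reports utf-8 for its piped stdout; the default line depends on the platform and locale.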

Tests

Added tests/basic/test_main.py::TestEnsureUtf8Stdio:

  • test_noop_on_non_windows: Linux/macOS streams untouched
  • test_reconfigures_both_streams_on_windows: both stdout/stderr → utf-8
  • test_safe_when_reconfigure_unavailable: AttributeError swallowed

All 3 new tests pass. Sanity-ran the existing
TestMain::test_main_with_empty_dir_no_files_on_command,
test_main_with_empty_dir_new_file, and test_cache_without_stream_no_warning
to confirm no regression.
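
As a rough picture of what the three new tests check, the sketch below uses the test names listed above. The test bodies, and the stand-in _ensure_utf8_stdio defined here so the example is self-contained, are assumptions and not the code in tests/basic/test_main.py.

```python
import io
import sys
import unittest
from unittest import mock


def _ensure_utf8_stdio():
    """Stand-in for the helper under test (assumed shape, not the real patch)."""
    if sys.platform != "win32":
        return
    for stream in (sys.stdout, sys.stderr):
        try:
            stream.reconfigure(encoding="utf-8")
        except (AttributeError, ValueError):
            pass


def _patched(platform, out, err):
    """Patch sys.platform and both standard streams for one call."""
    stack = mock.patch.multiple(sys, platform=platform, stdout=out, stderr=err)
    return stack


class TestEnsureUtf8Stdio(unittest.TestCase):
    def test_noop_on_non_windows(self):
        out, err = mock.Mock(), mock.Mock()
        with _patched("linux", out, err):
            _ensure_utf8_stdio()
        out.reconfigure.assert_not_called()
        err.reconfigure.assert_not_called()

    def test_reconfigures_both_streams_on_windows(self):
        out, err = mock.Mock(), mock.Mock()
        with _patched("win32", out, err):
            _ensure_utf8_stdio()
        out.reconfigure.assert_called_once_with(encoding="utf-8")
        err.reconfigure.assert_called_once_with(encoding="utf-8")

    def test_safe_when_reconfigure_unavailable(self):
        # StringIO has no reconfigure(); the call must not raise.
        with _patched("win32", io.StringIO(), io.StringIO()):
            _ensure_utf8_stdio()
```

Saved to a file, the sketch can be run with python -m unittest. Patching sys.platform lets the Windows branch be exercised on any operating system.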

…code

aider --show-repo-map > file.md crashes on Windows with charmap codec
errors because stdout defaults to the system ANSI codepage (cp1252,
gbk, etc.) when redirected to a file or pager. rich's legacy Windows
renderer then fails on characters aider routinely emits (vertical
ellipsis U+22EE used by repo-map).

Add _ensure_utf8_stdio() called at the top of main() that reconfigures
sys.stdout / sys.stderr to UTF-8 on Windows (no-op elsewhere). The call
sits before InputOutput / rich Console construction so the renderer
captures the new encoding. Test harnesses that replaced the streams
with non-TextIOWrapper objects are tolerated via try/except.

Verified on Windows 11 / Python 3.12 / zh-CN GBK locale: piping output
to a file now succeeds for chars that previously crashed.

Fixes Aider-AI#4294.

Development

Successfully merging this pull request may close these issues.

charmap' codec can't encode character In powershell