Description
UTF-8 output does not work in the Windows console. Unicode in the form of UCS-2, which was extended and renamed to UTF-16 around the Windows 2000 timeframe, works just fine.
nob.h does produce some Unicode output: for example, `trace_temp_alloc` prints the → arrow. It may use more Unicode strings, but so far I have noticed none. In the future it might use the "cancer logger" with colors and Unicode stuff.
Thus I would like to open a discussion on how to support Unicode on Windows. Windows is completely capable of using Unicode, just not in the form of UTF-8; it has supported UTF-16 for more than two decades. This issue is about console output; file names might deserve a separate issue. There are multiple ways to add Unicode console output. I would call them the "C way" and the "Windows way".
The "C way":
- The C language has supported the `wprintf` function since the C95 or C99 version.
- That would mean wrapping all `printf` calls with some macro that would conditionally add an `L` prefix to all C-string literals and change `printf` to `wprintf`.
- This would solve only the "simple" cases of primitive `printf` calls with literals only. More advanced calls that construct the format string piece-wise or construct string arguments would also need to be changed to use `wchar_t` instead of `char` based strings.
- Basically, every string ever used would change from `char` to something like `nob_char_t`, where `nob_char_t` would be conditionally typedef'd to `char` or `wchar_t`.
- The regular old `printf` function also supports the `%ls` format specifier, meaning the format could remain a plain old `char*` and only the argument would be a `wchar_t*` based string.
- Basically, all nob code and all user code that uses nob must be `wchar_t` aware. This is a viral change (it spreads like a virus). This is the same thing we all did in the 90's with the `_T` or `_TEXT` macro to support both Windows 95 and Windows NT.
- I would call this a lot of work for little benefit.
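To make the "viral" nature of the C way concrete, here is a minimal sketch of the conditional-typedef approach. `NOB_WIDE`, `NOB_T`, `nob_printf` and `nob_greet` are hypothetical names invented for this illustration; only `nob_char_t` is mentioned above, and none of them exist in nob.h.

```c
#include <stdio.h>
#include <wchar.h>

/* NOB_WIDE, nob_char_t, NOB_T and nob_printf are hypothetical names
   used for illustration only; they are not part of nob.h. */
#ifdef NOB_WIDE
typedef wchar_t nob_char_t;
#  define NOB_T(s)   L##s      /* "text" becomes L"text" */
#  define nob_printf wprintf
#else
typedef char nob_char_t;
#  define NOB_T(s)   s
#  define nob_printf printf
#endif

/* Every function that touches a string has to switch to nob_char_t,
   which is what makes the change spread through all code. */
static int nob_greet(const nob_char_t *name)
{
    /* Even the format specifier differs: in standard C, %s in wprintf
       expects a char* argument, so a wide build needs %ls instead. */
#ifdef NOB_WIDE
    return nob_printf(NOB_T("hello, %ls\n"), name);
#else
    return nob_printf(NOB_T("hello, %s\n"), name);
#endif
}
```

A caller would write `nob_greet(NOB_T("nob"))`, and every literal, buffer and helper in user code would need the same treatment.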
The "Windows way":
- Console output is done via the WriteConsoleW function. This function writes directly to the console (if one is even available) and does not redirect to a file if the user specifies redirection when they execute a new command.
- The WriteFile function could be used instead of WriteConsoleW if redirection is detected. In that case I would suggest UTF-8 output instead of UTF-16.
- Redirection could be detected by a combination of the GetStdHandle and GetFileType functions. The result could be one of three variants: 1) The stdout does not exist. 2) The stdout exists and is the console. 3) The stdout exists and is redirected to a file (or a pipe).
- I would suggest keeping everything as UTF-8 and only changing from `printf` to `sprintf` for formatting, then using a different function for the final output to the screen (or to the redirected output).
- Only the new final output function would need some `#ifdef Windows` love. This would be an improvement over the first "C way" as this change is not "viral" - it does not need to affect the entire mindset of a nob user. They could still continue to use their UTF-8 strings everywhere and at the end, just before the final console output, we would translate to UTF-16 for them.
- There are the WideCharToMultiByte and MultiByteToWideChar functions exactly for this purpose.
- Another change that would be needed is in the directory walker. Currently it uses the ANSI variant and is thus unable to walk Unicode directories. It would need to change from FindFirstFileA to FindFirstFileW, with the found names immediately converted from UTF-16 to UTF-8, so the "viral infection" does not spread.
- I would call this not a lot of work for a big benefit.
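The final output function described above could look roughly like the sketch below. `nob_write_utf8` and `nob_printf_utf8` are hypothetical names, not part of nob.h. It uses `vsnprintf` as the bounded relative of `sprintf`, and it adds a `GetConsoleMode` check on top of `GetFileType` (my assumption, to tell a real console apart from other character devices). On non-Windows builds it simply falls back to `fwrite`.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#  include <windows.h>
#endif

/* Hypothetical helper: writes `len` bytes of UTF-8 to stdout.
   Returns 0 on success, -1 on failure. */
static int nob_write_utf8(const char *utf8, size_t len)
{
    if (len == 0) return 0;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0, written = 0;

    /* 1) stdout does not exist */
    if (out == NULL || out == INVALID_HANDLE_VALUE) return -1;

    if (GetFileType(out) == FILE_TYPE_CHAR && GetConsoleMode(out, &mode)) {
        /* 2) stdout is the console: translate UTF-8 to UTF-16 */
        int n = MultiByteToWideChar(CP_UTF8, 0, utf8, (int)len, NULL, 0);
        if (n <= 0) return -1;
        wchar_t *wide = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
        if (wide == NULL) return -1;
        MultiByteToWideChar(CP_UTF8, 0, utf8, (int)len, wide, n);
        BOOL ok = WriteConsoleW(out, wide, (DWORD)n, &written, NULL);
        free(wide);
        return ok ? 0 : -1;
    }

    /* 3) stdout is a file or a pipe: pass the UTF-8 bytes through */
    return WriteFile(out, utf8, (DWORD)len, &written, NULL) ? 0 : -1;
#else
    return fwrite(utf8, 1, len, stdout) == len ? 0 : -1;
#endif
}

/* Hypothetical printf replacement: format into a UTF-8 buffer first,
   then hand the bytes to the single platform-specific output function. */
static int nob_printf_utf8(const char *fmt, ...)
{
    char buf[1024];
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    if (n < 0) return -1;
    /* Output longer than the buffer is truncated in this sketch; a real
       version should avoid cutting a multi-byte UTF-8 sequence in half. */
    if ((size_t)n >= sizeof(buf)) n = (int)sizeof(buf) - 1;
    return nob_write_utf8(buf, (size_t)n);
}
```

Callers keep using plain UTF-8 `char*` strings, for example `nob_printf_utf8("a \xE2\x86\x92 b %d\n", 42)`, and the only `#ifdef` lives inside `nob_write_utf8`.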