Description
Hello,
It seems to me that the recent commit 19d725d "reject C1 control bytes" is misguided. When terminals work in UTF-8 mode, they don't interpret C1 control bytes; they interpret C1 control codepoints, and even the example in the commit message uses a UTF-8 encoded codepoint (U+009B encoded as 0xC2 0x9B) rather than a single byte. Yet the function filters raw bytes, even when they merely happen to appear as part of a UTF-8 sequence.
However, while 0xC2 0x97 is indeed the UTF-8 encoding of the U+0097 control character, 0xC4 0x97 is the UTF-8 encoding of the letter ė (U+0117): it is not a control character and no UTF-8 capable terminal interprets it as one, yet strchriscntrl() blocks it regardless. As a result, chfn -f now rejects my last name as invalid.
The function needs to decode the input according to the locale's character encoding and check whether each resulting wide character falls in the C1 control range (U+0080 to U+009F).
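To illustrate the idea, here is a minimal sketch using the standard mbrtowc() function. The name str_has_c1_cntrl() is hypothetical and this is not the project's actual code. It assumes the caller has already called setlocale(LC_ALL, ""), and that wchar_t values are Unicode codepoints (true on glibc, not guaranteed by the C standard). Treating undecodable input as rejected is also my assumption, not something the commit does.

```c
#include <assert.h>
#include <locale.h>   /* the caller is expected to call setlocale() first */
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: returns true if s contains a C1 control
 * character (U+0080..U+009F) after decoding it with the current
 * locale's encoding. Input that cannot be decoded is also reported
 * as true, i.e. rejected (an assumption made for this sketch).
 */
bool str_has_c1_cntrl(const char *s)
{
    mbstate_t st;
    size_t left;

    assert(s != NULL);
    memset(&st, 0, sizeof st);   /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return true;         /* invalid or truncated sequence */
        if (n == 0)
            break;               /* NUL; not expected, since strlen() stops before it */

        /* Check the decoded character, not the raw bytes. */
        if (wc >= 0x80 && wc <= 0x9F)
            return true;

        s += n;
        left -= n;
    }
    return false;
}
```

With a UTF-8 locale active, this should reject "\xC2\x9B" (U+009B) while accepting "\xC4\x97" (ė), because the check is applied to the decoded character rather than to the continuation byte 0x97.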
(If the user uses a UTF-8 locale but an administrator does not, I don't think that is solvable – it sounds like a self-inflicted problem – and best I can suggest is "use cat -v when reading untrusted files".)