Commit 7886cdc
Revert removal of multibyte cross-buffer code (re: 781f0a3)
ksh supports multibyte variable names, but they intermittently fail
in long scripts when fcfill() is called to read the next 64 KiB block
of the script into the buffer.
Reproducer:
$ for((i=0;i<100000;i++));do echo '日本語の変数名=OK';done >foo
$ ksh foo
foo[2622]: 日本語の変数名=OK: not found
foo[7865]: 日本語の変数名=OK: not found
foo[10486]: 日本語の変数名=OK: not found
foo[13108]: 日本語の変数名=OK: not found
foo[15729]: 日本語の変数名=OK: not found
foo[18351]: 日本語の変数名=OK: not found
foo[20972]: 日本語の変数名=OK: not found
foo[26215]: 日本語の変数名=OK: not found
foo[31458]: 日本語の変数名=OK: not found
foo[36701]: 日本語の変数名=OK: not found
foo[41944]: 日本語の変数名=OK: not found
foo[52429]: 日本語の変数名=OK: not found
foo[57672]: 日本語の変数名=OK: not found
foo[62915]: 日本語の変数名=OK: not found
foo[68158]: 日本語の変数名=OK: not found
foo[73401]: 日本語の変数名=OK: not found
foo[76022]: 日本語の変数名=OK: not found
foo[78644]: 日本語の変数名=OK: not found
foo[81265]: 日本語の変数名=OK: not found
foo[83887]: 日本語の変数名=OK: not found
foo[86508]: 日本語の変数名=OK: not found
foo[91751]: 日本語の変数名=OK: not found
foo[96994]: 日本語の変数名=OK: not found
To optimise performance, ksh reads scripts in 64 KiB buffer blocks.
This is handled by fcfill() in fcin.c, which calls the Sfio
sfreserve() function to read the next buffer.
The bug is triggered when a multibyte character is split between
the end of the current buffer and the beginning of the next. The two
buffers are never available at the same time, because fcfill()
overwrites the previous buffer. This is a problem for all multibyte
character handling, not just for multibyte variable names.
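The line numbers in the reproducer output can be checked with a little
arithmetic. The sketch below is illustrative only: it assumes the script is
UTF-8 encoded (so each of the 7 characters in the variable name takes 3 bytes
and every line is 25 bytes including the newline) and that each block is
exactly 65536 bytes. The function names are hypothetical and do not appear
in ksh.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE   65536L  /* assumed size of one read block (64 KiB) */
#define NAME_BYTES 21L     /* 7 characters of 3 bytes each, assuming UTF-8 */
#define LINE_BYTES 25L     /* name + "=OK" + newline */

/*
 * Return the 1-based line number in which the k-th buffer boundary
 * splits a multibyte character, or 0 if that boundary falls between
 * characters or beyond the end of the script.
 */
long split_line(long k, long total_lines)
{
    long pos, line, off;

    assert(k > 0 && total_lines > 0);
    pos = k * BUF_SIZE;       /* offset of the first byte of block k+1 */
    line = pos / LINE_BYTES;  /* 0-based line containing that byte */
    off = pos % LINE_BYTES;   /* offset of that byte within its line */
    if (line >= total_lines)
        return 0;
    /* A character is split only if the boundary lands inside the name
       and not on a 3-byte character boundary. */
    if (off < NAME_BYTES && off % 3 != 0)
        return line + 1;
    return 0;
}

/* Count the buffer boundaries in the script that split a character. */
int count_splits(long total_lines)
{
    int n = 0;
    long k;

    for (k = 1; k * BUF_SIZE < total_lines * LINE_BYTES; k++)
        if (split_line(k, total_lines))
            n++;
    return n;
}
```

Under these assumptions, split_line(1, 100000) gives 2622, split_line(2, 100000)
gives 0 (the boundary lands on the ASCII "O"), and split_line(3, 100000) gives
7865. count_splits(100000) gives 23, the number of errors in the output above.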
As of the referenced commit, _fcmbget() in fcin.c no longer handles
the case of a multibyte character split between buffers. The Red
Hat patch that removed that code is wrong; the code is necessary
for correct multibyte processing. (The patch misled me into
thinking that the whole of the removed code was "for testing
purposes with small buffers", but that was just the removed
MB_LEN_MAX redefine.)
This commit restores that code (minus that MB_LEN_MAX redefine).
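For readers unfamiliar with the technique, the following is a minimal sketch
of cross-buffer handling in general. It is not the restored _fcmbget() code:
it assumes UTF-8 input (ksh itself goes through the locale's multibyte
functions), it does not validate continuation bytes, and all function names
are hypothetical. The idea is to save the incomplete bytes at the end of the
old buffer before it is overwritten, then join them with the first bytes of
the new buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c,
   or 0 if c is a continuation byte or an invalid lead byte. */
size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Number of trailing bytes in buf[0..n) that begin a multibyte
   sequence which is not complete within the buffer. */
size_t incomplete_tail(const unsigned char *buf, size_t n)
{
    size_t back;

    for (back = 1; back <= 3 && back <= n; back++) {
        unsigned char c = buf[n - back];
        if ((c & 0xC0) == 0x80)
            continue;  /* continuation byte: keep looking for the lead */
        /* lead or ASCII byte found: incomplete if it needs more bytes */
        return utf8_len(c) > back ? back : 0;
    }
    return 0;
}

/* Join the saved tail of the old buffer with the head of the new one.
   Writes one complete character to out and returns its length,
   or 0 if the character cannot be completed. */
size_t stitch(const unsigned char *tail, size_t tail_len,
              const unsigned char *next, size_t next_len,
              unsigned char out[4])
{
    size_t need;

    assert(tail_len >= 1 && tail_len <= 3);
    need = utf8_len(tail[0]);
    if (need <= tail_len || need - tail_len > next_len)
        return 0;
    memcpy(out, tail, tail_len);
    memcpy(out + tail_len, next, need - tail_len);
    return need;
}
```

A caller would run incomplete_tail() on the current buffer before refilling,
copy that many bytes aside, and call stitch() once the next block is
available.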
After this commit, the behaviour changes:
$ ksh foo
foo: line 18351: : invalid variable name
This is the same behaviour as ksh 93u+ 2012-08-01. It is actually a
crash caused by a buffer overflow in lex.c, which tries to read before
the beginning of the current buffer. Dealing with that, and other
similar buffer overflow bugs in lex.c, is left for another commit.
Progresses: #86
Parent: 6571250
1 file changed, 47 insertions(+), 1 deletion(-)