Skip to content

Commit 7886cdc

Browse files
committed
Revert removal of multibyte cross-buffer code (re: 781f0a3)
ksh supports multibyte variable names, but they intermittently fail in long scripts, when fcfill() is called to read the next 64k part of the script into the buffer. Reproducer: $ for((i=0;i<100000;i++));do echo '日本語の変数名=OK';done >foo $ ksh foo foo[2622]: 日本語の変数名=OK: not found foo[7865]: 日本語の変数名=OK: not found foo[10486]: 日本語の変数名=OK: not found foo[13108]: 日本語の変数名=OK: not found foo[15729]: 日本語の変数名=OK: not found foo[18351]: 日本語の変数名=OK: not found foo[20972]: 日本語の変数名=OK: not found foo[26215]: 日本語の変数名=OK: not found foo[31458]: 日本語の変数名=OK: not found foo[36701]: 日本語の変数名=OK: not found foo[41944]: 日本語の変数名=OK: not found foo[52429]: 日本語の変数名=OK: not found foo[57672]: 日本語の変数名=OK: not found foo[62915]: 日本語の変数名=OK: not found foo[68158]: 日本語の変数名=OK: not found foo[73401]: 日本語の変数名=OK: not found foo[76022]: 日本語の変数名=OK: not found foo[78644]: 日本語の変数名=OK: not found foo[81265]: 日本語の変数名=OK: not found foo[83887]: 日本語の変数名=OK: not found foo[86508]: 日本語の変数名=OK: not found foo[91751]: 日本語の変数名=OK: not found foo[96994]: 日本語の変数名=OK: not found To optimise performance, ksh reads scripts in 64KiB buffer blocks. This is handled via fcfill() in fcin.c which calls the Sfio sfreserve() function to read the next buffer. The bug is triggered when a multibyte character is split between the end of the current buffer and the beginning of the next. Both buffers are not available at the same time, because fcfill() overwrites the previous buffer. Of course this is a problem for all multibyte character handling and not just multibyte variable names. As of the referenced commit, _fcmbget() in fcin.c no longer handles the case of a multibyte character split between buffers. The Red Hat patch that removed that code is wrong; the code is necessary for correct multibyte processing. (The patch misled me into thinking that the whole of the removed code was "for testing purposes with small buffers", but that was just the removed MB_LEN_MAX redefine.) This commit restores that code (minus that MB_LEN_MAX redefine). After this commit, the behaviour changes: $ ksh foo foo: line 18351: : invalid variable name This is the same behaviour as ksh 93u+ 2012-08-01. It's actually a crash due to a buffer overflow in lex.c as it tries to read before the beginning of the current buffer. Dealing with that, and other similar buffer overflow bugs in lex.c, will be for another commit. Progresses: #861
1 parent 6571250 commit 7886cdc

File tree

1 file changed

+47
-1
lines changed

1 file changed

+47
-1
lines changed

src/cmd/ksh93/sh/fcin.c

Lines changed: 47 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -145,12 +145,58 @@ extern void fcrestore(Fcin_t *fp)
145145
}
146146

147147
#if SHOPT_MULTIBYTE
148+
/* struct for part of a multibyte character that crosses buffer boundaries */
149+
struct Extra
150+
{
151+
unsigned char buff[2*MB_LEN_MAX];
152+
unsigned char *next;
153+
};
154+
148155
int _fcmbget(short *len)
149156
{
150-
int c;
157+
static struct Extra extra;
158+
int i, c, n;
159+
/*
160+
* Check if we need to piece together a split multibyte
161+
* character started at the end of the previous buffer.
162+
*/
163+
if(_Fcin.fcleft)
164+
{
165+
if((c = mbsize(extra.next)) < 0)
166+
c = 1;
167+
if((_Fcin.fcleft -= c) <=0)
168+
{
169+
_Fcin.fcptr = (unsigned char*)fcfirst() - _Fcin.fcleft;
170+
_Fcin.fcleft = 0;
171+
}
172+
*len = c;
173+
if(c==1)
174+
c = *extra.next++;
175+
else if(c==0)
176+
_Fcin.fcleft = 0;
177+
else
178+
c = mbchar(extra.next);
179+
return c;
180+
}
151181
switch(*len = mbsize(_Fcin.fcptr))
152182
{
153183
case -1:
184+
/*
185+
* Invalid multibyte character. Check if we're near the end of the buffer; if so,
186+
* the multibyte character is probably split between this buffer and the next.
187+
*/
188+
if(_Fcin._fcfile && (n=(_Fcin.fclast-_Fcin.fcptr)) < MB_LEN_MAX)
189+
{
190+
memcpy(extra.buff, _Fcin.fcptr, n);
191+
_Fcin.fcptr = _Fcin.fclast;
192+
/* fcgetc() will read the next buffer via fcfill() */
193+
for(i=n; i < MB_LEN_MAX+n; i++)
194+
if((extra.buff[i] = fcgetc())==0)
195+
break;
196+
_Fcin.fcleft = n;
197+
extra.next = extra.buff;
198+
return _fcmbget(len);
199+
}
154200
*len = 1;
155201
/* FALLTHROUGH */
156202
case 0:

0 commit comments

Comments
 (0)