You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Question: is bgp_writes_on() needed in the error path of bgp_connect_success()?
I'm looking at the error path in bgp_connect_success() when bgp_getsockname() fails.
The code calls bgp_notify_send() followed by bgp_writes_on() before returning BGP_FSM_FAILURE:
The same pattern exists in bgp_connect_success_w_delayopen() at line 2347.
My understanding (please correct me if I'm wrong)
As far as I can tell, bgp_notify_send() writes the NOTIFICATION synchronously: bgp_notify_send_internal() cleans the output buffer
(stream_fifo_clean(connection->obuf) at bgp_packet.c:981), pushes the
NOTIFICATION packet, and then calls bgp_write_notify() which pops it from obuf and does a direct write() to the socket (bgp_packet.c:748-759). If
that's correct, by the time bgp_notify_send() returns connection->obuf
should be empty, and the subsequent bgp_writes_on() would have nothing to
flush.
I also noticed that every other call site of bgp_writes_on() in the codebase
seems to follow the pattern of first enqueueing a packet in obuf and then
calling bgp_writes_on() to flush it. These two calls in the error paths
appear to be the only exceptions — unless I'm missing something.
Potential issue
If the bgp_writes_on() call is indeed unnecessary, it could also be
problematic: it schedules connection->t_write on the I/O pthread
(bgp_pth_io), and the subsequent bgp_stop() (triggered by the BGP_FSM_FAILURE return) cancels it with event_cancel_async(). Since the
cancellation targets a different thread, it may not complete immediately. In
theory, if a BGP_Start event arrives in that window (e.g., from an NHT
update), bgp_start() could hit assert(!connection->t_write) at bgp_fsm.c:2513.
We hit what looks like this scenario in our deployment, but I wanted to confirm
my reading of the code before proposing any change.
Context
These two bgp_writes_on() calls seem to have been introduced in commit 424ab01d0f ("bgpd: implement buffered reads") as part of the I/O model
refactoring. In the success path of bgp_connect_success() the calls were
replaced by bgp_reads_on() + bgp_open_send(), but in the error path bgp_writes_on() remained. It looks like it might have been an oversight,
but I'm not sure if there's a reason for it that I'm not seeing.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Question: is
bgp_writes_on()needed in the error path ofbgp_connect_success()?I'm looking at the error path in
bgp_connect_success()whenbgp_getsockname()fails.The code calls
bgp_notify_send()followed bybgp_writes_on()before returningBGP_FSM_FAILURE:The same pattern exists in
bgp_connect_success_w_delayopen()at line 2347.My understanding (please correct me if I'm wrong)
As far as I can tell,
bgp_notify_send()writes the NOTIFICATION synchronously:bgp_notify_send_internal()cleans the output buffer(
stream_fifo_clean(connection->obuf)atbgp_packet.c:981), pushes theNOTIFICATION packet, and then calls
bgp_write_notify()which pops it fromobufand does a directwrite()to the socket (bgp_packet.c:748-759). Ifthat's correct, by the time
bgp_notify_send()returnsconnection->obufshould be empty, and the subsequent
bgp_writes_on()would have nothing toflush.
I also noticed that every other call site of
bgp_writes_on()in the codebaseseems to follow the pattern of first enqueueing a packet in
obufand thencalling
bgp_writes_on()to flush it. These two calls in the error pathsappear to be the only exceptions — unless I'm missing something.
Potential issue
If the
bgp_writes_on()call is indeed unnecessary, it could also beproblematic: it schedules
connection->t_writeon the I/O pthread(
bgp_pth_io), and the subsequentbgp_stop()(triggered by theBGP_FSM_FAILUREreturn) cancels it withevent_cancel_async(). Since thecancellation targets a different thread, it may not complete immediately. In
theory, if a
BGP_Startevent arrives in that window (e.g., from an NHTupdate),
bgp_start()could hitassert(!connection->t_write)atbgp_fsm.c:2513.We hit what looks like this scenario in our deployment, but I wanted to confirm
my reading of the code before proposing any change.
Context
These two
bgp_writes_on()calls seem to have been introduced in commit424ab01d0f("bgpd: implement buffered reads") as part of the I/O modelrefactoring. In the success path of
bgp_connect_success()the calls werereplaced by
bgp_reads_on()+bgp_open_send(), but in the error pathbgp_writes_on()remained. It looks like it might have been an oversight,but I'm not sure if there's a reason for it that I'm not seeing.
Would this removal be safe?
Any insight would be appreciated. Thanks!
Beta Was this translation helpful? Give feedback.
All reactions