Is bgp_writes_on() needed in the error path of bgp_connect_success()? #21067

ecastifor · 2026-03-10T11:39:26Z

Question: is `bgp_writes_on()` needed in the error path of `bgp_connect_success()`?

I'm looking at the error path in bgp_connect_success() when bgp_getsockname() fails.
The code calls bgp_notify_send() followed by bgp_writes_on() before returning
BGP_FSM_FAILURE:

// bgp_fsm.c:2290-2297
bgp_notify_send(connection, BGP_NOTIFY_FSM_ERR,
                bgp_fsm_error_subcode(connection->status));
bgp_writes_on(connection);
return BGP_FSM_FAILURE;

The same pattern exists in bgp_connect_success_w_delayopen() at line 2347.

My understanding (please correct me if I'm wrong)

As far as I can tell, bgp_notify_send() writes the NOTIFICATION synchronously:
bgp_notify_send_internal() cleans the output buffer
(stream_fifo_clean(connection->obuf) at bgp_packet.c:981), pushes the
NOTIFICATION packet, and then calls bgp_write_notify() which pops it from
obuf and does a direct write() to the socket (bgp_packet.c:748-759). If
that's correct, by the time bgp_notify_send() returns connection->obuf
should be empty, and the subsequent bgp_writes_on() would have nothing to
flush.
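To make that reading concrete, here is a toy model of the sequence as I understand it. It is only a sketch: `struct notify_model`, `model_fifo_*`, `model_notify_send()` and `model_pending_writes()` are hypothetical stand-ins I made up for illustration, not FRR types or functions, and packets are reduced to integer ids.

```c
/* Toy model of the synchronous NOTIFICATION path described above.
 * Every name here is a hypothetical stand-in, not FRR code. */
#include <assert.h>
#include <stddef.h>

#define MODEL_FIFO_MAX 8

struct model_fifo {
	int pkts[MODEL_FIFO_MAX]; /* packet ids standing in for streams */
	size_t count;
};

struct notify_model {
	struct model_fifo obuf;   /* stands in for connection->obuf */
	int wire[MODEL_FIFO_MAX]; /* packets "written" to the socket */
	size_t wire_count;
};

/* Drop everything queued so far (the stream_fifo_clean() step). */
static void model_fifo_clean(struct model_fifo *f)
{
	f->count = 0;
}

static void model_fifo_push(struct model_fifo *f, int pkt)
{
	assert(f->count < MODEL_FIFO_MAX);
	f->pkts[f->count++] = pkt;
}

/* Pop the oldest packet; returns -1 when the queue is empty. */
static int model_fifo_pop(struct model_fifo *f)
{
	if (f->count == 0)
		return -1;
	int pkt = f->pkts[0];
	for (size_t i = 1; i < f->count; i++)
		f->pkts[i - 1] = f->pkts[i];
	f->count--;
	return pkt;
}

/* Clean the queue, push the NOTIFICATION, then pop it and write it
 * directly, without going through any scheduled write task. */
static void model_notify_send(struct notify_model *c, int notify_pkt)
{
	model_fifo_clean(&c->obuf);
	model_fifo_push(&c->obuf, notify_pkt);
	int pkt = model_fifo_pop(&c->obuf);
	if (pkt >= 0 && c->wire_count < MODEL_FIFO_MAX)
		c->wire[c->wire_count++] = pkt;
}

/* What a later "writes on" would find left to flush. */
static size_t model_pending_writes(const struct notify_model *c)
{
	return c->obuf.count;
}
```

In this model, whatever was queued before the call is discarded and the NOTIFICATION goes straight to the wire, so `model_pending_writes()` is zero afterwards. Whether the real code behaves the same way is exactly what I'd like confirmed.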

I also noticed that every other call site of bgp_writes_on() in the codebase
seems to follow the pattern of first enqueueing a packet in obuf and then
calling bgp_writes_on() to flush it. These two calls in the error paths
appear to be the only exceptions, unless I'm missing something.

Potential issue

If the bgp_writes_on() call is indeed unnecessary, it could also be
problematic: it schedules connection->t_write on the I/O pthread
(bgp_pth_io), and the subsequent bgp_stop() (triggered by the
BGP_FSM_FAILURE return) cancels it with event_cancel_async(). Since the
cancellation targets a different thread, it may not complete immediately. In
theory, if a BGP_Start event arrives in that window (e.g., from an NHT
update), bgp_start() could hit assert(!connection->t_write) at
bgp_fsm.c:2513.
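The window I have in mind can be sketched with a deterministic, single-threaded model. This only illustrates the hypothesis as stated above; it does not reproduce FRR's event library, and I have not verified that event_cancel_async() actually leaves such a window. All names (`struct cancel_model`, `model_writes_on()`, `model_cancel_async()`, `model_io_thread_runs()`, `model_start_precondition_holds()`) are hypothetical.

```c
/* Deterministic model of the suspected cancellation window.
 * Hypothetical names only; the real code involves two pthreads
 * and FRR's event library. */
#include <stdbool.h>
#include <stddef.h>

struct model_event {
	int unused;
};

struct cancel_model {
	struct model_event *t_write; /* non-NULL while a write task is scheduled */
	bool cancel_pending;         /* cancel requested but not yet processed */
};

static struct model_event model_write_task;

/* Schedules the write task (the bgp_writes_on() step). */
static void model_writes_on(struct cancel_model *c)
{
	c->t_write = &model_write_task;
}

/* ASSUMPTION: the cancel request is only recorded here and is
 * completed later by the I/O thread. */
static void model_cancel_async(struct cancel_model *c)
{
	c->cancel_pending = true;
}

/* The I/O thread gets around to handling the cancel request. */
static void model_io_thread_runs(struct cancel_model *c)
{
	if (c->cancel_pending) {
		c->t_write = NULL;
		c->cancel_pending = false;
	}
}

/* Returns false where the real start path would trip its assert. */
static bool model_start_precondition_holds(const struct cancel_model *c)
{
	return c->t_write == NULL;
}
```

If a start event is evaluated between `model_cancel_async()` and `model_io_thread_runs()`, the precondition fails. Without the preceding `model_writes_on()` call, `t_write` is never set and the ordering no longer matters, which is the motivation for the removal proposed below.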

We hit what looks like this scenario in our deployment, but I wanted to confirm
my reading of the code before proposing any change.

Context

These two bgp_writes_on() calls seem to have been introduced in commit
424ab01d0f ("bgpd: implement buffered reads") as part of the I/O model
refactoring. In the success path of bgp_connect_success() the calls were
replaced by bgp_reads_on() + bgp_open_send(), but in the error path
bgp_writes_on() remained. It looks like it might have been an oversight,
but I'm not sure if there's a reason for it that I'm not seeing.

Would this removal be safe?

--- a/bgpd/bgp_fsm.c
+++ b/bgpd/bgp_fsm.c
@@ -2293,7 +2293,6 @@ bgp_connect_success(struct peer_connection *connection)
 			     __func__, peer->host, connection->fd);
 		bgp_notify_send(connection, BGP_NOTIFY_FSM_ERR,
 				bgp_fsm_error_subcode(connection->status));
-		bgp_writes_on(connection);
 		return BGP_FSM_FAILURE;
 	}

@@ -2344,7 +2343,6 @@ bgp_connect_success_w_delayopen(struct peer_connection *connection)
 			     __func__, peer->host, connection->fd);
 		bgp_notify_send(connection, BGP_NOTIFY_FSM_ERR,
 				bgp_fsm_error_subcode(connection->status));
-		bgp_writes_on(connection);
 		return BGP_FSM_FAILURE;
 	}

Any insight would be appreciated. Thanks!

Replies: 0 comments
