Skip to content

bgpd: Fix BGP-LS initial TED sync and cleanup on peer deactivation#163

Open
suphawitwanl wants to merge 3 commits intoaitestfrom
aitest-21286-test-dot-greptile
Open

bgpd: Fix BGP-LS initial TED sync and cleanup on peer deactivation#163
suphawitwanl wants to merge 3 commits intoaitestfrom
aitest-21286-test-dot-greptile

Conversation

@suphawitwanl
Copy link

This PR fixes two issues in the BGP-LS code.

Initial sync: After registering with the LS database, BGP-LS never requests a sync, leaving the TED empty until the IGP sends unsolicited updates. Fix by calling ls_request_sync() after registration.

Peer deactivation: When the last BGP-LS peer is deactivated, self-originated routes are not withdrawn and the TED is not cleared, leaving stale state. Fix by adding bgp_ls_withdraw_ted() and calling it from peer_deactivate().

After registering with the LS database, no initial sync is requested,
so the TED remains empty until the IGP sends unsolicited updates.
Any topology changes that occurred before registration are
permanently missed and never originated as BGP-LS NLRIs.

Additionally, LS_MSG_EVENT_SYNC messages are not handled in the TED
processors, so any sync response from zebra is silently dropped.

Request a sync via ls_request_sync() immediately after registration,
following the same pattern used by other TED consumers such as pathd.
Handle LS_MSG_EVENT_SYNC alongside LS_MSG_EVENT_ADD in
bgp_ls_process_vertex(), bgp_ls_process_edge(), and
bgp_ls_process_subnet().

Signed-off-by: Carmine Scarpitta <[email protected]>
When the last BGP-LS peer is deactivated, locally originated BGP-LS
routes are not withdrawn from the RIB, leaving stale routes on peers.
The TED is also not cleared, so the next registration re-originates
on top of stale state.

Add bgp_ls_withdraw_ted() which removes all self-originated paths via
bgp_clear_route() and clears all TED entries. Call it in
peer_deactivate() when the last BGP-LS peer is being deactivated,
before unregistering from the LS database.

Signed-off-by: Carmine Scarpitta <[email protected]>
Add test_bgp_ls_peer_deactivate() to verify that deactivating the
last BGP-LS peer on r2 withdraws all locally originated routes on
r2 and clears all received routes on rr.

Add test_bgp_ls_peer_reactivate() to verify that reactivating the
peer triggers a fresh TED sync, re-originates all BGP-LS NLRIs on
r2, and re-advertises them to rr.

Signed-off-by: Carmine Scarpitta <[email protected]>
@greptile-apps
Copy link

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes two correctness bugs in the BGP-LS subsystem: (1) the TED was never populated on startup because ls_request_sync() was never called after registration, and (2) when the last BGP-LS peer was deactivated, self-originated routes and TED state were left dangling. Both fixes are targeted and well-scoped.

  • Initial sync (bgp_ls.c): ls_request_sync() is now called immediately after ls_register() succeeds in bgp_ls_register(); a sync failure logs a warning but does not block registration, which is the correct soft-failure behaviour since IGP updates will eventually populate the TED.
  • SYNC event handling (bgp_ls_ted.c): LS_MSG_EVENT_SYNC is added as a fall-through case alongside LS_MSG_EVENT_ADD/LS_MSG_EVENT_UPDATE in the vertex, edge, and subnet switch statements, and the reverse-edge origination condition is extended similarly — ensuring SYNC messages drive full route origination.
  • Peer deactivation (bgpd.c + bgp_ls_ted.c): bgp_ls_withdraw_ted() is called before bgp_ls_unregister() when the last peer is deactivated. The new function uses bgp_clear_route(peer_self, ...) (async work-queue) to withdraw self-originated RIB entries, then synchronously clears TED entries via bgp_ls_ted_clear(). The ordering — withdraw routes, clear TED, then unregister — is correct. bgp_ls_ted_clear() uses frr_each_safe with the existing ls_vertex_del_all / ls_edge_del_all / ls_subnet_del_all library functions; since ls_vertex_del transitively removes associated edges and subnets from their RB-trees before freeing, the subsequent edge/subnet loops safely handle any orphaned entries.
  • Tests (test_bgp_link_state.py): Two new topotests cover the deactivate/reactivate cycle end-to-end.

Confidence Score: 5/5

  • PR is safe to merge; both fixes are well-contained, the logic is sound, and the new topotests cover the affected paths end-to-end.
  • Both bug fixes are small and targeted with no risky side-effects. The TED-clear ordering is correct, bgp_clear_route is already used for similar cleanup elsewhere in bgpd, and ls_request_sync is the established pattern used by pathd and sharpd. The only finding is a minor Python file-handle style issue in tests.
  • No files require special attention.

Important Files Changed

Filename Overview
bgpd/bgp_ls.c Adds ls_request_sync() call after ls_register() in bgp_ls_register() to populate the TED on initial registration; sync failure is treated as a soft warning while registration still proceeds.
bgpd/bgp_ls_ted.c Adds LS_MSG_EVENT_SYNC handling in vertex/edge/subnet switch statements and reverse-edge origination logic; adds bgp_ls_ted_clear() static helper and the exported bgp_ls_withdraw_ted() function; bgp_ls_ted_clear correctly uses frr_each_safe and ordering is safe since vertex deletion transitively removes associated edges/subnets from the RB-trees.
bgpd/bgp_ls_ted.h Adds the bgp_ls_withdraw_ted() extern declaration with accurate documentation.
bgpd/bgpd.c Calls bgp_ls_withdraw_ted() before bgp_ls_unregister() in peer_deactivate() when the last BGP-LS peer is deactivated; ordering (withdraw → unregister) is correct.
tests/topotests/bgp_link_state/test_bgp_link_state.py Adds check_bgp_ls_empty() helper and two new topotests (test_bgp_ls_peer_deactivate, test_bgp_ls_peer_reactivate); minor: two open() calls lack with context managers.

Sequence Diagram

sequenceDiagram
    participant Operator
    participant bgpd
    participant TED
    participant RIB
    participant Zebra/IGP

    Note over Operator,Zebra/IGP: Initial registration (fixed)
    Operator->>bgpd: neighbor X activate (link-state)
    bgpd->>Zebra/IGP: ls_register()
    bgpd->>Zebra/IGP: ls_request_sync()
    Zebra/IGP-->>bgpd: LS_MSG_EVENT_SYNC messages
    bgpd->>TED: add vertex/edge/subnet
    bgpd->>RIB: originate BGP-LS NLRIs

    Note over Operator,Zebra/IGP: Peer deactivation (fixed)
    Operator->>bgpd: no neighbor X activate (last peer)
    bgpd->>RIB: bgp_clear_route(peer_self) — async withdraw
    bgpd->>TED: bgp_ls_ted_clear() — sync clear
    bgpd->>Zebra/IGP: ls_unregister()

    Note over Operator,Zebra/IGP: Peer reactivation
    Operator->>bgpd: neighbor X activate
    bgpd->>Zebra/IGP: ls_register()
    bgpd->>Zebra/IGP: ls_request_sync()
    Zebra/IGP-->>bgpd: LS_MSG_EVENT_SYNC messages
    bgpd->>TED: repopulate
    bgpd->>RIB: re-originate BGP-LS NLRIs
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/topotests/bgp_link_state/test_bgp_link_state.py
Line: 932

Comment:
**File handle not closed**

`open(reffile).read()` opens the file without a `with` statement, leaving the file handle unclosed until garbage-collected. Use a context manager to ensure the file is closed promptly — this pattern appears in both `test_bgp_ls_peer_reactivate` blocks.

```suggestion
    with open(reffile) as f:
        expected = json.loads(f.read())
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "tests: Verify BGP-LS routes withdrawn on..." | Re-trigger Greptile

# Reactivate the BGP-LS peer.
r2.vtysh_cmd(
"configure terminal\n"
"router bgp 65000\n"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 File handle not closed

open(reffile).read() opens the file without a with statement, leaving the file handle unclosed until garbage-collected. Use a context manager to ensure the file is closed promptly — this pattern appears in both test_bgp_ls_peer_reactivate blocks.

Suggested change
"router bgp 65000\n"
with open(reffile) as f:
expected = json.loads(f.read())
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/topotests/bgp_link_state/test_bgp_link_state.py
Line: 932

Comment:
**File handle not closed**

`open(reffile).read()` opens the file without a `with` statement, leaving the file handle unclosed until garbage-collected. Use a context manager to ensure the file is closed promptly — this pattern appears in both `test_bgp_ls_peer_reactivate` blocks.

```suggestion
    with open(reffile) as f:
        expected = json.loads(f.read())
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants