Is it platform specific
generic
Importance or Severity
Critical
Description of the bug
During warm-reboot:
- teamd_increase_retry-count.py raises the LAG retry count from 3 to 5 (a 150-second timeout instead of 90 seconds) and notifies the SONiC peers, which set their partner retry count to 5.
- lag_keepalive.py sends LACPDUs every second to all LAG peers. These LACPDUs reset the peer's partner retry count back to 3.
- 0016-block-retry-count-changes.patch blocks the peer from resetting the retry count back to 3 for 60 seconds after it receives a teamd_increase_retry-count.py packet.
The failure is in the scenario:
- The warm-reboot shutdown is still in progress and lag_keepalive sends LACPDUs more than 60 seconds after the last retry-count notification was sent to a peer, so the peer resets the partner retry count back to 3.
- Control plane downtime exceeds 90 seconds (as happens on some systems and is described in this issue).
- The LAGs flap.
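
The scenario above can be sketched as a small timing model. This is an illustration only, not the teamd implementation: `PeerModel` and `lag_flaps` are hypothetical names, the 30-second interval is inferred from the 90 s / 150 s figures above, and whether the block ends exactly at 60 seconds or just after is an assumption.

```python
from dataclasses import dataclass

LACP_INTERVAL_S = 30   # assumed: 90 s / 3 retries = 150 s / 5 retries
BLOCK_WINDOW_S = 60    # block duration described for 0016-block-retry-count-changes.patch
DEFAULT_RETRY = 3
EXTENDED_RETRY = 5


@dataclass
class PeerModel:
    """Toy model of how a peer tracks its partner's retry count."""
    partner_retry: int = DEFAULT_RETRY
    blocked_until: float = float("-inf")

    def on_retry_notification(self, now: float) -> None:
        # Retry-count notification: extend the count and start the block window.
        self.partner_retry = EXTENDED_RETRY
        self.blocked_until = now + BLOCK_WINDOW_S

    def on_plain_lacpdu(self, now: float) -> None:
        # A regular LACPDU resets the count only once the block window is over.
        if now >= self.blocked_until:
            self.partner_retry = DEFAULT_RETRY

    def timeout_s(self) -> int:
        return self.partner_retry * LACP_INTERVAL_S


def lag_flaps(last_notification: float, last_keepalive: float, downtime_s: float) -> bool:
    """Return True if the peer would time out the LAG.

    downtime_s is the time the peer goes without receiving any LACPDU.
    """
    peer = PeerModel()
    peer.on_retry_notification(last_notification)
    t = last_notification + 1
    while t <= last_keepalive:      # keepalive LACPDUs, one per second
        peer.on_plain_lacpdu(t)
        t += 1
    return downtime_s > peer.timeout_s()
```

For example, `lag_flaps(0, 45, 120)` is `False` because the keepalives stop inside the block window and the 150-second timeout holds, while `lag_flaps(0, 62, 120)` is `True` because a keepalive lands after the window and the timeout drops back to 90 seconds.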
Steps to Reproduce
Run a warm reboot with LAGs connected to SONiC peers.
Actual Behavior and Expected Behavior
Actual behavior - warm reboot test failure:
2026-04-12 12:34:25 : FAILED:<ip>:LAG flapped 1 times on 10.213.80.125 after warm boot
Expected behavior -
The extended retry count shouldn't be reverted, and the LAG timeout should stay at 150 seconds.
Relevant log output
First LACPDU:
2026 Apr 12 12:27:14.390114 sonic INFO lag_keepalive: ready to send LACPDU packets via dict_keys(['Ethernet88', 'Ethernet100', 'Ethernet24', 'Ethernet28', 'Ethernet96', 'Ethernet92', 'Ethernet36', 'Ethernet32'])
2026 Apr 12 12:27:14.404324 sonic INFO lag_keepalive: sent LACPDU packets via dict_keys(['Ethernet88', 'Ethernet100', 'Ethernet24', 'Ethernet28', 'Ethernet96', 'Ethernet92', 'Ethernet36', 'Ethernet32'])
First LAG retry-count notification:
Apr 12 12:27:52.853087 ARISTA05T1 DEBUG teamd#teamd_PortChannel1[24]: Ethernet1: LACPDU version changed from 1 to 241
Apr 12 12:27:53.072572 ARISTA05T1 DEBUG teamd#teamd_PortChannel1[24]: Ethernet1: ignoring resetting retry count to 3
Apr 12 12:27:53.072748 ARISTA05T1 DEBUG teamd#teamd_PortChannel1[24]: Ethernet1: LACPDU version changed from 241 to 1
Apr 12 12:27:53.073757 ARISTA05T1 DEBUG teamd#teamd_PortChannel1[24]: Ethernet1: ignoring resetting retry count to 3
Last LAG retry-count notification:
Apr 12 12:28:07.720858 ARISTA06T1 DEBUG teamd#teamd_PortChannel1[23]: Ethernet1: LACPDU version changed from 1 to 241
Apr 12 12:28:08.145569 ARISTA06T1 DEBUG teamd#teamd_PortChannel1[23]: Ethernet1: ignoring resetting retry count to 3
Apr 12 12:28:08.145569 ARISTA06T1 DEBUG teamd#teamd_PortChannel1[23]: Ethernet1: LACPDU version changed from 241 to 1
Last LACPDU:
2026 Apr 12 12:28:55.077712 sonic INFO lag_keepalive: sent LACPDU packets via dict_keys(['Ethernet88', 'Ethernet100', 'Ethernet24', 'Ethernet28', 'Ethernet96', 'Ethernet92', 'Ethernet36', 'Ethernet32'])
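
To compare these timestamps against the 60-second block window, the gaps can be computed with a short script. This is a rough sketch: `seconds_of_day` is a hypothetical helper, the "version changed from 1 to 241" lines are taken as a proxy for when the notification arrived, and the DUT and peer clocks are assumed to be in sync.

```python
import re


def seconds_of_day(line: str) -> float:
    """Extract the first HH:MM:SS.ffffff timestamp from a log line as seconds since midnight."""
    match = re.search(r"(\d{2}):(\d{2}):(\d{2}\.\d+)", line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Timestamp prefixes copied from the log excerpts above.
first_notification = "Apr 12 12:27:52.853087 ARISTA05T1"
last_notification = "Apr 12 12:28:07.720858 ARISTA06T1"
last_keepalive = "2026 Apr 12 12:28:55.077712 sonic"

BLOCK_WINDOW_S = 60  # block duration described for 0016-block-retry-count-changes.patch

for name, line in [("ARISTA05T1", first_notification), ("ARISTA06T1", last_notification)]:
    gap = seconds_of_day(last_keepalive) - seconds_of_day(line)
    status = "outside" if gap > BLOCK_WINDOW_S else "inside"
    print(f"{name}: last keepalive {gap:.3f} s after notification ({status} the block window)")
```

If the notification shown for ARISTA05T1 was the last one that peer received (the excerpts do not confirm this), the final keepalive LACPDUs reached it after its block window had expired.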
Output of show version, show techsupport
Attach files (if any)
No response