Skip to content

[featured]: Skip redundant CONFIG_DB writes in sync_feature_scope#368

Open
Bojun-Feng wants to merge 5 commits intosonic-net:masterfrom
Bojun-Feng:bug/featured-redundant-configdb-writes
Open

[featured]: Skip redundant CONFIG_DB writes in sync_feature_scope#368
Bojun-Feng wants to merge 5 commits intosonic-net:masterfrom
Bojun-Feng:bug/featured-redundant-configdb-writes

Conversation

@Bojun-Feng
Copy link
Copy Markdown

@Bojun-Feng Bojun-Feng commented Mar 31, 2026

Why I did it

Fix #369
Partially address sonic-net/sonic-buildimage#26350

sync_feature_scope() unconditionally writes to CONFIG_DB for every feature at startup and on feature state changes at runtime without checking whether the values have actually changed.

We should avoid redundant writes that does not update CONFIG_DB.

How I did it

  • Added a read-before-write check in sync_feature_scope(): read the current FEATURE|<name> entry from CONFIG_DB via get_entry(), and only call mod_entry() when has_per_asic_scope or has_global_scope actually differs from the intended value.
  • Guarded the get_entry() return with or {} to safely handle None returns, consistent with the defensive pattern already used in resync_feature_state() and sync_feature_delay_state() in the same file.
  • The conditional write applies to both the host CONFIG_DB and per-namespace CONFIG_DB on multi-ASIC platforms.
  • Added unit tests covering three cases: values unchanged (no write), values changed (write), and missing/empty entry (write).

How to verify it

I used the following script to test the update on a virtual switch in GNS3. We can iterate quickly since the modified Python file is at /usr/local/bin/featured and can be updated on the fly.

The virtual switch image is downloaded from official pipeline build on 2026/03/30 on commit ab081ee.

Bash Script & Setup

I had three files on the switch:

admin@sonic:~$ ls
new.py  old.py  test_featured.sh

old.py is copied from /usr/local/bin/featured on the switch. It is almost identical to the latest previous commit that modified the script, except due to compilation the first line changed from #!/usr/bin/env python3 to #!/usr/bin/python3.

new.py is almost identical to the script in PR, except the first line also changed from #!/usr/bin/env python3 to #!/usr/bin/python3.

The bash script test_featured.sh use either old.py or new.py to overwrite /usr/local/bin/featured, use systemctl to reload featured, then run config reload and see how many feature writes there are & how long did it take until the last write.

#!/bin/bash
# test_featured.sh - Test featured daemon CONFIG_DB write behavior
# Usage: ./test_featured.sh old|new
#
# Measures how many redundant FEATURE scope writes (has_per_asic_scope,
# has_global_scope) the featured daemon makes after config reload, and
# how long the CONFIG_DB mutation window stays open.

set -e

if [ -z "$1" ] || [[ "$1" != "old" && "$1" != "new" ]]; then
echo "Usage: $0 old|new"
exit 1
fi

VERSION="$1"
LOG="/tmp/monitor_${VERSION}.log"

# Clean up any leftover redis-cli MONITOR processes from previous runs
sudo pkill -f "redis-cli.*MONITOR" 2>/dev/null || true
sudo pkill -f "docker exec.*redis-cli.*MONITOR" 2>/dev/null || true
sleep 1
rm -f "$LOG"

# Deploy the selected version of featured
echo "==> Installing ${VERSION} version of featured..."
sudo cp ~/${VERSION}.py /usr/local/bin/featured
sudo systemctl restart featured
echo "==> Waiting for featured to settle..."
sleep 5

# Start capturing all CONFIG_DB operations
echo "==> Starting CONFIG_DB monitor..."
redis-cli -n 4 MONITOR < /dev/null > "$LOG" 2>&1 &
MONITOR_PID=$!
sleep 2

# Run config reload and measure wall-clock time
echo "==> Running config reload..."
RELOAD_START=$(date +%s.%N)
sudo config reload -y -f
RELOAD_END=$(date +%s.%N)
RELOAD_DURATION=$(python3 -c "print(f'{$RELOAD_END - $RELOAD_START:.2f}')")

# Wait for any background daemon writes to finish
echo "==> Waiting 90 seconds for background writes to complete..."
sleep 90

# Stop monitoring
kill $MONITOR_PID 2>/dev/null || true
sudo pkill -f "redis-cli.*MONITOR" 2>/dev/null || true
sleep 1

# --- Analysis ---

# Count: how many times featured wrote scope fields to FEATURE entries
SCOPE_WRITE_COUNT=$(grep --text -c 'HSET.*FEATURE.*\(has_per_asic_scope\|has_global_scope\)' "$LOG" || true)

# Timestamp of the first CONFIG_DB write (sonic-cfggen bulk load start)
FIRST_DB_WRITE=$(grep --text '"HSET"' "$LOG" | head -1 | awk '{print $1}')

# Timestamp of the last FEATURE scope field write by featured daemon
LAST_SCOPE_WRITE=$(grep --text 'HSET.*FEATURE.*\(has_per_asic_scope\|has_global_scope\)' "$LOG" | tail -1 | awk '{print $1}')

# Timestamp of the very last CONFIG_DB write of any kind
LAST_ANY_WRITE=$(grep --text '"HSET"' "$LOG" | tail -1 | awk '{print $1}')

# Calculate how long after config reload returned the scope writes continued
if [ -n "$LAST_SCOPE_WRITE" ] && [ -n "$FIRST_DB_WRITE" ]; then
SCOPE_WINDOW=$(python3 -c "print(f'{$LAST_SCOPE_WRITE - $RELOAD_END:.2f}')")
SCOPE_DURATION=$(python3 -c "print(f'{$LAST_SCOPE_WRITE - $FIRST_DB_WRITE:.2f}')")
else
SCOPE_WINDOW="N/A (no scope writes detected)"
SCOPE_DURATION="N/A"
fi

echo ""
echo "============================================================"
echo " Featured Daemon CONFIG_DB Write Test Results"
echo " Version: ${VERSION}"
echo "============================================================"
echo ""
echo " Config reload wall-clock duration: ${RELOAD_DURATION}s"
echo ""
echo " FEATURE scope writes (has_per_asic_scope / has_global_scope):"
echo " Total write count: ${SCOPE_WRITE_COUNT}"
echo " First CONFIG_DB write: ${FIRST_DB_WRITE:-N/A}"
echo " Last scope write by featured: ${LAST_SCOPE_WRITE:-none}"
echo " Last CONFIG_DB write (any): ${LAST_ANY_WRITE:-N/A}"
echo ""
echo " Seconds from first DB write to last scope write:"
echo " ${SCOPE_DURATION}"
echo ""
echo " Seconds from config reload RETURNING to last scope write:"
echo " ${SCOPE_WINDOW}"
echo ""
echo " Full log: ${LOG}"
echo "============================================================"
Full Logs Running the Script on VS
admin@sonic:~$ chmod +x test_featured.sh
# Run the old and new version alternatively, test each multiple times
# This output is caputred after running `sudo config save-y` and running the script multiple times previously
admin@sonic:~$ ./test_featured.sh old && \
sleep 20 && \
./test_featured.sh new && \
sleep 20 && \
./test_featured.sh old && \
sleep 20 && \
./test_featured.sh new

==> Installing old version of featured...
==> Waiting for featured to settle...
==> Starting CONFIG_DB monitor...
==> Running config reload...
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Running command: sudo systemctl stop aaastatsd.timer
Running command: sudo systemctl stop featured.timer
Running command: sudo systemctl stop hostcfgd.timer
Running command: sudo systemctl stop tacacs-config.timer
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
==> Waiting 90 seconds for background writes to complete...

============================================================
 Featured Daemon CONFIG_DB Write Test Results
 Version: old
============================================================

 Config reload wall-clock duration: 30.63s

 FEATURE scope writes (has_per_asic_scope / has_global_scope):
 Total write count: 84
 First CONFIG_DB write: 1774906714.590470
 Last scope write by featured: 1774906746.737287
 Last CONFIG_DB write (any): 1774906836.590132

 Seconds from first DB write to last scope write:
 32.15

 Seconds from config reload RETURNING to last scope write:
 -0.30

 Full log: /tmp/monitor_old.log
============================================================
==> Installing new version of featured...
==> Waiting for featured to settle...
==> Starting CONFIG_DB monitor...
==> Running config reload...
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Running command: sudo systemctl stop aaastatsd.timer
Running command: sudo systemctl stop featured.timer
Running command: sudo systemctl stop hostcfgd.timer
Running command: sudo systemctl stop tacacs-config.timer
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
==> Waiting 90 seconds for background writes to complete...

============================================================
 Featured Daemon CONFIG_DB Write Test Results
 Version: new
============================================================

 Config reload wall-clock duration: 30.62s

 FEATURE scope writes (has_per_asic_scope / has_global_scope):
 Total write count: 38
 First CONFIG_DB write: 1774906865.598661
 Last scope write by featured: 1774906886.589453
 Last CONFIG_DB write (any): 1774906987.320508

 Seconds from first DB write to last scope write:
 20.99

 Seconds from config reload RETURNING to last scope write:
 -11.33

 Full log: /tmp/monitor_new.log
============================================================
==> Installing old version of featured...
==> Waiting for featured to settle...
==> Starting CONFIG_DB monitor...
==> Running config reload...
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Running command: sudo systemctl stop aaastatsd.timer
Running command: sudo systemctl stop featured.timer
Running command: sudo systemctl stop hostcfgd.timer
Running command: sudo systemctl stop tacacs-config.timer
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
==> Waiting 90 seconds for background writes to complete...

============================================================
 Featured Daemon CONFIG_DB Write Test Results
 Version: old
============================================================

 Config reload wall-clock duration: 30.14s

 FEATURE scope writes (has_per_asic_scope / has_global_scope):
 Total write count: 84
 First CONFIG_DB write: 1774907016.330388
 Last scope write by featured: 1774907046.978066
 Last CONFIG_DB write (any): 1774907137.935647

 Seconds from first DB write to last scope write:
 30.65

 Seconds from config reload RETURNING to last scope write:
 -1.36

 Full log: /tmp/monitor_old.log
============================================================
==> Installing new version of featured...
==> Waiting for featured to settle...
==> Starting CONFIG_DB monitor...
==> Running config reload...
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Running command: sudo systemctl stop aaastatsd.timer
Running command: sudo systemctl stop featured.timer
Running command: sudo systemctl stop hostcfgd.timer
Running command: sudo systemctl stop tacacs-config.timer
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
==> Waiting 90 seconds for background writes to complete...

============================================================
 Featured Daemon CONFIG_DB Write Test Results
 Version: new
============================================================

 Config reload wall-clock duration: 30.02s

 FEATURE scope writes (has_per_asic_scope / has_global_scope):
 Total write count: 38
 First CONFIG_DB write: 1774907166.717689
 Last scope write by featured: 1774907187.528028
 Last CONFIG_DB write (any): 1774907288.738037

 Seconds from first DB write to last scope write:
 20.81

 Seconds from config reload RETURNING to last scope write:
 -11.11

 Full log: /tmp/monitor_new.log
============================================================

TLDR: The number of writes dropped from 84 to 38, shaving off 10 seconds (approximately 30%) from first DB write to the last scope write. Before the update, config reload returns ~2 seconds after the last scope write; after the update, config reload returns ~11 seconds after the last scope write.

Description for the changelog

Skip redundant CONFIG_DB writes in featured daemon's sync_feature_scope to speed up config reload.

* Add read-before-write check in sync_feature_scope to only write
  scope fields when values actually differ from CONFIG_DB
* Add unit tests covering unchanged, changed, and missing entry cases

Signed-off-by: Bojun Feng <bojundf@gmail.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

* Move tests from TestFeatureDaemon to TestFeatureHandler to avoid
  class-level mock.patch argument injection issues
* Improve assertions and add partial-change test case

Signed-off-by: Bojun Feng <bojundf@gmail.com>
@Bojun-Feng Bojun-Feng force-pushed the bug/featured-redundant-configdb-writes branch from e947a49 to 6d4557f Compare March 31, 2026 02:47
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

* Merge skip/write/missing-entry tests into one multi-scenario method
  following the test_feature_resync pattern
* Remove redundant partial-change and empty-dict sub-cases

Signed-off-by: Bojun Feng <bojundf@gmail.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Bojun-Feng Bojun-Feng marked this pull request as ready for review March 31, 2026 03:25
@tirupatihemanth tirupatihemanth requested a review from Copilot March 31, 2026 20:28
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the featured host daemon by avoiding redundant FEATURE|<name> scope-field writes to CONFIG_DB, reducing unnecessary Redis churn during startup/config reload and on runtime feature updates.

Changes:

  • Added a read-before-write check in FeatureHandler.sync_feature_scope() to only update scope fields when values differ.
  • Applied the conditional write to both host CONFIG_DB and per-namespace CONFIG_DB (multi-ASIC).
  • Added unit tests for unchanged/changed/missing-entry behavior and namespace propagation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
scripts/featured Adds conditional scope syncing to avoid redundant CONFIG_DB writes.
tests/featured/featured_test.py Adds unit tests validating conditional scope writes and namespace propagation.

Comment thread scripts/featured Outdated
Comment thread scripts/featured Outdated
Comment thread scripts/featured Outdated
Comment thread tests/featured/featured_test.py
Comment thread tests/featured/featured_test.py
@tirupatihemanth
Copy link
Copy Markdown

Hi @Bojun-Feng,
for the case if the host FEATURE entry already matches but a per-namespace FEATURE entry doesn't match with this implementation we won't update the per-namespace entries. Please check, thanks

Comment thread scripts/featured Outdated
Signed-off-by: Bojun-Feng <bojundf@gmail.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Bojun-Feng Bojun-Feng requested a review from dgsudharsan April 1, 2026 06:05
Comment thread scripts/featured Outdated
Copy link
Copy Markdown

@tirupatihemanth tirupatihemanth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: can you move the duplicated getting and update fields if things changed into a function as it is being implemented twice now?

Signed-off-by: Bojun-Feng <bojundf@gmail.com>
@mssonicbld
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Bojun-Feng
Copy link
Copy Markdown
Author

Hi @vmittal-msft @tirupatihemanth — just checking in on this PR. Seems like all checks are passing and it is approved for 202511 Branch. Please let me know if there's anything else needed from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[featured]: sync_feature_scope unconditionally writes scope fields to CONFIG_DB on every startup

6 participants