[auth] Hot-reload registry auth via nydusd config API #718

Open
Fricounet wants to merge 1 commit into containerd:main from DataDog:fricounet/hot-reload-nydusd-auth

Conversation

@Fricounet
Contributor

Overview

Please briefly describe the changes your pull request makes.

Hook together the token renewal subsystem with the nydusd config API so that nydusd credentials can be actively renewed.

Note: during my tests I found a quirk in nydusd's hot-reload API. Nydusd keeps a local cache of its last working token, so even after we update the token in the config, the new token is not picked up until the cached token gets a 401. This is not ideal, since proactively updating the cached token would avoid that 401 round trip, but it is not a blocker, so I think we can proceed with this change regardless.
I've opened #1893 to decide how to handle this on the nydusd side.
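
The quirk above can be pictured with a small Go model. This is only a sketch of the observed behaviour, not nydusd's code (nydusd is written in Rust); `tokenSource`, `UpdateConfig`, and `Fetch` are hypothetical names, and the sketch does not model what happens if the configured token is rejected as well.

```go
package main

import "fmt"

// tokenSource is a hypothetical model of the behaviour described above.
// It is NOT nydusd's implementation.
type tokenSource struct {
	configToken string // token from the most recent config update
	cachedToken string // last token that worked against the registry
}

// UpdateConfig models a hot-reload: only the configured token changes,
// the cached token is left untouched.
func (t *tokenSource) UpdateConfig(token string) { t.configToken = token }

// Fetch models one registry request. valid reports whether the registry
// accepts a token. It returns the token that ends up being used and the
// number of 401 responses received for the cached token.
func (t *tokenSource) Fetch(valid func(string) bool) (string, int) {
	if valid(t.cachedToken) {
		return t.cachedToken, 0
	}
	// The cached token was rejected: only now is the configured token used.
	t.cachedToken = t.configToken
	return t.cachedToken, 1
}

func main() {
	ts := &tokenSource{configToken: "old", cachedToken: "old"}
	ts.UpdateConfig("new")

	// While the old token is still accepted, the new one is ignored.
	tok, rejected := ts.Fetch(func(s string) bool { return s == "old" || s == "new" })
	fmt.Println(tok, rejected) // old 0

	// Once the old token expires, one 401 is paid before switching.
	tok, rejected = ts.Fetch(func(s string) bool { return s == "new" })
	fmt.Println(tok, rejected) // new 1
}
```

Proactively replacing the cached token on config update would remove the single rejected request in the second call.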

Related Issues

Please link to the relevant issue. For example: Fix #123 or Related #456.

Fixes #690

Change Details

Please describe your changes in detail:

Move renewal orchestration from pkg/auth to snapshot to avoid a circular dependency: the renewal logic now needs access to the daemon client, while the daemon package already indirectly imports the auth package.

pkg/auth/renewal.go becomes a pure credential cache with exported InitCredentialStore, RenewCredential, and EvictStaleCredentials. The reconciliation loop, formerly in pkg/auth/renewal.go and now in snapshot/renewal.go, walks all managers->daemons->rafs, renews credentials, and hot-reloads the daemons directly. It also updates each daemon's config file so that daemons can use fresh creds when they restart.
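
A minimal sketch of one reconciliation pass, under stated assumptions: the types, the `credentialStore` and `reloader` interfaces, and their method signatures are hypothetical stand-ins and do not reflect the real signatures of RenewCredential or EvictStaleCredentials. The sketch also assumes a daemon is only reloaded when a credential actually changed, which this PR does not confirm.

```go
package main

import "fmt"

// Hypothetical, simplified shapes for the manager -> daemon -> instance walk.
type instance struct{ ref string }

type daemon struct {
	id        string
	instances []instance
}

type manager struct{ daemons []*daemon }

// credentialStore is a hypothetical stand-in for the cache in pkg/auth.
type credentialStore interface {
	Renew(ref string) (changed bool, err error)
	EvictStale(active map[string]bool)
}

// reloader is a hypothetical stand-in for the daemon client and config writer.
type reloader interface {
	HotReload(d *daemon, ref string) error
	PersistConfig(d *daemon, ref string) error
}

// reconcile runs one pass: renew every referenced credential, push changed
// ones to the running daemon, persist them for restarts, then evict entries
// that no instance references anymore. Errors are collected, not fatal.
func reconcile(managers []*manager, store credentialStore, r reloader) []error {
	var errs []error
	active := make(map[string]bool)
	for _, m := range managers {
		for _, d := range m.daemons {
			for _, inst := range d.instances {
				active[inst.ref] = true
				changed, err := store.Renew(inst.ref)
				if err != nil {
					errs = append(errs, fmt.Errorf("renew %s: %w", inst.ref, err))
					continue
				}
				if !changed {
					continue // assumption: nothing to push if creds are unchanged
				}
				if err := r.HotReload(d, inst.ref); err != nil {
					errs = append(errs, fmt.Errorf("hot-reload %s: %w", d.id, err))
				}
				if err := r.PersistConfig(d, inst.ref); err != nil {
					errs = append(errs, fmt.Errorf("persist %s: %w", d.id, err))
				}
			}
		}
	}
	store.EvictStale(active)
	return errs
}

// In-memory fakes, used only to exercise the sketch.
type fakeStore struct {
	changed map[string]bool // refs for which renewal yields new credentials
	active  map[string]bool // set recorded by the last EvictStale call
}

func (s *fakeStore) Renew(ref string) (bool, error)    { return s.changed[ref], nil }
func (s *fakeStore) EvictStale(active map[string]bool) { s.active = active }

type fakeReloader struct{ reloaded, persisted []string }

func (f *fakeReloader) HotReload(d *daemon, ref string) error {
	f.reloaded = append(f.reloaded, d.id+" "+ref)
	return nil
}

func (f *fakeReloader) PersistConfig(d *daemon, ref string) error {
	f.persisted = append(f.persisted, d.id+" "+ref)
	return nil
}

func main() {
	managers := []*manager{{daemons: []*daemon{
		{id: "d1", instances: []instance{{ref: "registry.example/a:1"}, {ref: "registry.example/b:1"}}},
	}}}
	store := &fakeStore{changed: map[string]bool{"registry.example/a:1": true}}
	rl := &fakeReloader{}
	errs := reconcile(managers, store, rl)
	fmt.Println(len(errs), rl.reloaded, len(store.active))
}
```

Depending on interfaces rather than concrete daemon types is one way for the loop to live in snapshot without pkg/auth importing the daemon package.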

I'll work on #709 afterwards so that the auth config no longer needs to be written to disk.

Test Results

If you have any relevant screenshots or videos that can help illustrate your changes, please add them here.

Tested locally with a private ECR. When the renewal thread kicks in, it finds new creds and correctly calls nydusd to update them:

DEBU[2026-03-13T11:17:46.586355382+01:00] renewing credential entry                     ref="111111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-rc.7-jmx-zstd-nydus"
DEBU[2026-03-13T11:17:46.586524304+01:00] Trying to get credentials from docker         ref="111111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-rc.7-jmx-zstd-nydus"
DEBU[2026-03-13T11:17:46.586572659+01:00] Trying to get credentials from kubelet        ref="111111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-rc.7-jmx-zstd-nydus"
INFO[2026-03-13T11:17:53.138997235+01:00] Got credentials from provider kubelet         ref="111111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-rc.7-jmx-zstd-nydus"
DEBU[2026-03-13T11:17:53.139033536+01:00] adding credential entry to store              ref="111111111111.dkr.ecr.us-east-1.amazonaws.com/datadog-agent:7.75.0-rc.7-jmx-zstd-nydus"
[2026-03-13 11:17:53.140103 +01:00] DEBUG [/src/http_handler.rs:183] <--- Put Uri { string: "/api/v1/config?id=%2F" }
[2026-03-13 11:17:53.140501 +01:00] DEBUG [/src/http_handler.rs:188] ---> Put Status Code: NoContent, Elapse: Ok(423.008µs), Body Size: 0

Change Type

Please select the type of change your pull request relates to:

  • Bug Fix
  • Feature Addition
  • Documentation Update
  • Code Refactoring
  • Performance Improvement
  • Other (please describe)

Self-Checklist

Before submitting a pull request, please ensure you have completed the following:

  • I have run a code style check and addressed any warnings/errors.
  • I have added appropriate comments to my code (if applicable).
  • I have updated the documentation (if applicable).
  • I have written appropriate unit tests.

Hook together the token renewal subsystem with the nydusd config API.
Move renewal orchestration from pkg/auth to snapshot to avoid circular
dependencies as the renewal logic now needs access to the daemon client
while the daemon package already indirectly imports the auth package.

pkg/auth/renewal.go becomes a pure credential cache with exported
InitCredentialStore, RenewCredential, and EvictStaleCredentials.
The reconciliation loop, formerly in pkg/auth/renewal.go and now in
snapshot/renewal.go, walks all managers->daemons->rafs, renews
credentials, and hot-reloads daemons directly. It also updates each daemon's
config file so that daemons can use fresh creds when they restart.
@codecov
codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 66.23377% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 23.25%. Comparing base (fc330cc) to head (6a6cbca).
⚠️ Report is 14 commits behind head on main.

Files with missing lines    Patch %    Lines
snapshot/renewal.go         53.12%     14 Missing and 1 partial ⚠️
pkg/daemon/daemon.go        71.42%     3 Missing and 3 partials ⚠️
pkg/daemon/client.go        75.00%     1 Missing and 1 partial ⚠️
snapshot/snapshot.go        0.00%      2 Missing ⚠️
pkg/auth/renewal.go         92.85%     1 Missing ⚠️
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #718      +/-   ##
==========================================
+ Coverage   22.02%   23.25%   +1.22%     
==========================================
  Files         130      132       +2     
  Lines       11931    12109     +178     
==========================================
+ Hits         2628     2816     +188     
+ Misses       8960     8946      -14     
- Partials      343      347       +4     
Files with missing lines                    Coverage Δ
cmd/containerd-nydus-grpc/snapshotter.go    0.00% <ø> (ø)
pkg/auth/renewal.go                         96.36% <92.85%> (ø)
pkg/daemon/client.go                        35.03% <75.00%> (+8.85%) ⬆️
snapshot/snapshot.go                        5.48% <0.00%> (-0.02%) ⬇️
pkg/daemon/daemon.go                        8.11% <71.42%> (+8.11%) ⬆️
snapshot/renewal.go                         53.12% <53.12%> (ø)

... and 7 files with indirect coverage changes

Development

Successfully merging this pull request may close these issues.

Guidance on how to handle (expiring) tokens

1 participant