kicbase: fix deterministic machine-id breaking MAC addresses in multi-node Podman rootless clusters#22823
…-node Podman rootless clusters

/var/lib/dbus/machine-id was baked into the kicbase container image at build time. When fix_machine_id in the entrypoint ran systemd-machine-id-setup, it found that file and derived /etc/machine-id from it, producing the same machine ID in every container.

This breaks anything that depends on the machine ID being unique per node. The most visible symptom in multi-node minikube clusters using Podman rootless mode: a veth interface is placed into each Podman container, and systemd configures it according to MACAddressPolicy=persistent. That policy derives the MAC address from the machine ID (systemd-machine-id-setup reads from the D-Bus machine ID: https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html). With every container sharing the same machine ID, all nodes get identical MAC addresses on eth0, causing network failures.

Fix: Dockerfile only (entrypoint fix_machine_id is no longer needed)

Per https://systemd.io/CONTAINER_INTERFACE/, add a RUN step that:
- truncates /etc/machine-id to an empty file: the spec requires this file to be present but uninitialized so systemd can fill it on boot.
- deletes /var/lib/dbus/machine-id: removes the baked-in D-Bus ID that was the source of the deterministic (and shared) machine ID.

With these changes, systemd generates a fresh random machine ID on every container boot without any entrypoint assistance, making fix_machine_id in the entrypoint redundant. It has been removed.

Tested with Podman rootless using debian:bookworm-slim + systemd:
- Before: all runs produce aabbccddeeff00112233445566778899 (the baked-in D-Bus ID)
- After: each run produces a unique random ID

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Also I don't think I can sign the CLA on behalf of Copilot 🙈 |
|
/ok-to-build-image |
|
@RobinMcCorkell this code is shared with kind. Do you know whether this affects kind too? Do we have any users complaining about this in the issues?
Pull request overview
Fixes deterministic machine IDs in the kicbase image that caused identical persistent MAC addresses across nodes (notably in multi-node minikube clusters using Podman rootless), leading to networking failures.
Changes:
- Update the kicbase Dockerfile to ensure `/etc/machine-id` is present-but-empty and remove `/var/lib/dbus/machine-id` in the built image.
- Remove the now-redundant `fix_machine_id` logic and invocation from the kicbase entrypoint.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `deploy/kicbase/Dockerfile` | Ensures the image does not ship a pre-initialized machine-id and removes the baked-in D-Bus machine-id to avoid deterministic IDs across containers. |
| `deploy/kicbase/entrypoint` | Removes `fix_machine_id` since systemd will now initialize a fresh machine-id on boot from the corrected image state. |
|
Hi @RobinMcCorkell, we have updated your PR with the reference to newly built kicbase image. Pull the changes locally if you want to test with them or update your PR further. |
|
I believe this is the cause of #22606. It is also mentioned in #16962: "I tried the bridge cni in minikube and it wouldn't work regardless of IP forwarding being enabled or disabled". It's possible these kind bug reports also refer to this issue: kubernetes-sigs/kind#2996, kubernetes-sigs/kind#3412 |
|
PR needs rebase. |
Problem
`/var/lib/dbus/machine-id` is baked into the kicbase container image at build time. When the entrypoint's `fix_machine_id` runs `systemd-machine-id-setup`, it finds that file and derives `/etc/machine-id` from it, producing the same machine ID in every container.

This breaks anything that depends on the machine ID being unique per node. The most visible symptom is in multi-node minikube clusters using Podman rootless mode: a `veth` interface is placed into each Podman container, and systemd configures it according to `MACAddressPolicy=persistent`. That policy selects a MAC address based on the machine ID (via `systemd-machine-id-setup`, which reads from the D-Bus machine ID: https://www.freedesktop.org/software/systemd/man/latest/systemd-machine-id-setup.html). Because all containers share the same machine ID derived from the baked-in D-Bus ID, all nodes get identical MAC addresses on `eth0`, causing network failures.

Fix
Per https://systemd.io/CONTAINER_INTERFACE/, add a `RUN` step to the Dockerfile (before the final squash) that:

- Truncates `/etc/machine-id` to an empty file: the spec requires the file to be present but uninitialized so systemd can fill it on boot.
- Deletes `/var/lib/dbus/machine-id`: removes the baked-in D-Bus ID that was the source of the deterministic machine ID.

With these changes, systemd generates a fresh random machine ID on every container boot without any entrypoint assistance. This makes the existing `fix_machine_id` function in the entrypoint redundant, so it has been removed.

Testing
Tested against the real kicbase image (`gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719`) with Podman rootless.

Reproducing the bug
Both files contain the same baked-in value in the current image:
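One way to run this comparison, as a minimal sketch rather than the exact commands used for this PR: the `same_machine_id` helper and its root-directory argument are hypothetical names introduced here, and the check assumes the image's filesystem is readable under that root (pass an empty string to inspect the running container's own `/`).

```shell
#!/bin/sh
# Hypothetical helper (not part of minikube): succeeds when both
# machine-id files under ROOT exist and hold the same value.
same_machine_id() {
    root="${1:-}"
    etc_id="$root/etc/machine-id"
    dbus_id="$root/var/lib/dbus/machine-id"

    # Both files must be present for the comparison to mean anything.
    [ -f "$etc_id" ] || return 1
    [ -f "$dbus_id" ] || return 1

    # Compare the stored IDs as strings.
    [ "$(cat "$etc_id")" = "$(cat "$dbus_id")" ]
}
```

A zero exit status from `same_machine_id ""` inside a container started from the unpatched image would match the behaviour described above.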
Running `systemd-machine-id-setup` (as `fix_machine_id` does) produces the same ID every time:

Verifying the fix
Build a patched image applying the fix on top of the current kicbase:
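A patched image of this kind could be described roughly as follows. This is a sketch under assumptions: the `write_patch_containerfile` function is a hypothetical helper, and the exact `RUN` command in the PR's Dockerfile may be written differently; only the two effects (empty `/etc/machine-id`, no `/var/lib/dbus/machine-id`) come from the description above.

```shell
#!/bin/sh
# Hypothetical helper: write a Containerfile that layers the machine-id
# reset on top of a given base image.
write_patch_containerfile() {
    base_image="$1"
    out="$2"
    # The RUN line is an assumed spelling of the two steps in the Fix section.
    cat > "$out" <<EOF
FROM $base_image
# Keep /etc/machine-id present but empty and drop the baked-in D-Bus copy.
RUN : > /etc/machine-id && rm -f /var/lib/dbus/machine-id
EOF
}
```

For example, `write_patch_containerfile gcr.io/k8s-minikube/kicbase-builds:v0.0.50-1772266598-22719 Containerfile` followed by `podman build -t kicbase-patched -f Containerfile .`, where `kicbase-patched` is an arbitrary tag chosen for illustration.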
Confirm the image state:
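The expected state can be expressed as a small check, sketched here under the assumption that the image's filesystem is available under a root directory before systemd has booted. `image_state_ok` is a hypothetical name, not an existing minikube or systemd tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when ROOT matches the state the fix aims for:
# /etc/machine-id present but empty, /var/lib/dbus/machine-id absent.
image_state_ok() {
    root="${1:-}"
    etc_id="$root/etc/machine-id"
    dbus_id="$root/var/lib/dbus/machine-id"

    [ -f "$etc_id" ] || return 1      # file must exist
    [ ! -s "$etc_id" ] || return 1    # and must be empty (uninitialized)
    [ ! -e "$dbus_id" ] || return 1   # no D-Bus machine ID file
    [ ! -L "$dbus_id" ] || return 1   # and no dangling symlink either
    return 0
}
```

The check only makes sense before boot: once systemd has started in a container, `/etc/machine-id` is expected to be populated.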
Each container now gets a unique machine ID:
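Uniqueness across several containers can be checked mechanically. The sketch below assumes the machine IDs have already been collected one per line; `all_unique` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read machine IDs from stdin, one per line, and
# succeed only if no non-empty ID appears more than once.
all_unique() {
    # Drop blank lines, then let uniq -d report any repeated value.
    dupes=$(sed '/^$/d' | sort | uniq -d)
    [ -z "$dupes" ]
}
```

For instance, the output of `podman exec <node> cat /etc/machine-id` for each node container could be piped into `all_unique`; the node container names depend on the cluster and are not specified here.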