
Add service account based volume access restriction#6076

Merged
mergify[bot] merged 7 commits into ceph:devel from Rakshith-R:restrict-volume-by-sa
Mar 13, 2026

Conversation


@Rakshith-R Rakshith-R commented Feb 17, 2026

Describe what this PR does

Service Account Based Volume Access Restriction

Ceph-CSI supports optionally restricting RBD/CephFS volume access to specific Kubernetes
service accounts. When configured, only pods running with the allowed service account
can mount the volume. This feature uses RBD image/CephFS subvolume metadata to store the
restriction and the CSI podInfoOnMount mechanism to identify the pod's service
account during mount. Refer to https://kubernetes-csi.github.io/docs/pod-info.html#pod-info-on-mount-with-csi-driver-object.

How it works

  1. A user sets the .rbd.csi.ceph.com/serviceaccount metadata on an
    RBD image to specify the allowed service account name.
  2. During ControllerPublishVolume, Ceph-CSI reads this metadata and passes
    it to the node via publish context.
  3. During NodePublishVolume, Ceph-CSI compares the value against the
    pod's service account (provided via volume context by Kubelet).
  4. If the service account was set in metadata and does not match the pod's
    service account, the mount is rejected with a PermissionDenied error.

Prerequisites

The podInfoOnMount field must be set to true in the CSIDriver spec so that
Kubelet passes pod information (including service account name) in the volume
context during NodePublishVolume. Without this, the restriction cannot be
enforced and all mounts are allowed.
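For reference, a minimal CSIDriver object with this field enabled might look like the following (driver name and other fields are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: rbd.csi.ceph.com
spec:
  attachRequired: true
  # Required for this feature: Kubelet adds the pod's name, namespace,
  # UID and service account name to the volume context passed in
  # NodePublishVolume.
  podInfoOnMount: true
```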

Setting the restriction on an RBD image (RBD and NVMe-oF volumes)

Use the rbd image-meta set command to set the allowed service account:

rbd image-meta set <pool>/<image> .rbd.csi.ceph.com/serviceaccount <service-account-name>

For example, to restrict a volume to the my-app-sa service account:

rbd image-meta set mypool/csi-vol-abc123 .rbd.csi.ceph.com/serviceaccount my-app-sa

Removing the restriction

To remove the restriction and allow any service account to mount the volume:

rbd image-meta remove <pool>/<image> .rbd.csi.ceph.com/serviceaccount

After removing the metadata from the image, the Deployment should be scaled down
completely and then scaled back up for the removal of the restriction to take effect.

Setting the restriction on a CephFS subvolume (CephFS and NFS volumes)

Use the ceph fs subvolume metadata set command to set the allowed service account:

ceph fs subvolume metadata set <filesystem> <subvolume> --group_name=<group> \
  .cephfs.csi.ceph.com/serviceaccount <service-account-name>

For example, to restrict a volume to the my-app-sa service account:

ceph fs subvolume metadata set myfs csi-vol-abc123 --group_name=csi \
  .cephfs.csi.ceph.com/serviceaccount my-app-sa

Removing the restriction

To remove the restriction and allow any service account to mount the volume:

ceph fs subvolume metadata rm <filesystem> <subvolume> --group_name=<group> \
  .cephfs.csi.ceph.com/serviceaccount

After removing the metadata from the subvolume, the Deployment should be scaled down
completely and then scaled back up for the removal of the restriction to take effect.


Use Case: Ceph VolSync Plugin Replication Destination PVC Protection

A primary motivator for this feature is the custom
Ceph VolSync Plugin that
performs incremental data replication across clusters. In a disaster recovery
or migration workflow:

  1. A ReplicationDestination controller creates a PVC on the destination
    cluster to receive replicated data.
  2. A replication worker pod, running under a dedicated service account (e.g.
    volsync-worker-sa), incrementally syncs data from the source cluster into
    this destination PVC.
  3. The destination PVC must remain writable only by the replication worker
    until the replication is complete and a failover is triggered.

Without service account based restriction, any pod in the namespace with a
reference to the destination PVC could write to it, potentially corrupting the
replicated data or breaking the incremental sync state. By binding the
destination volume to the replication worker's service account, the volume is
protected from unintended writes throughout the replication lifecycle. On
failover, the restriction is removed so the application workload can mount
the volume.


Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes
    for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch from 47467b0 to 713d6e4 on February 17, 2026 12:40
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34

@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e-helm/k8s-1.34

@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34/cephfs

@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch from 713d6e4 to 7ea4af4 on February 17, 2026 16:39
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34

@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch from 7ea4af4 to c128ac9 on February 18, 2026 10:04
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34

@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34/rbd

@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34/cephfs


@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34


@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch from c85ac0e to 8a460ae on February 19, 2026 13:06
@Rakshith-R
Contributor Author

/test ci/centos/mini-e2e/k8s-1.34

@Madhu-1
Collaborator

Madhu-1 commented Feb 20, 2026

@Rakshith-R, what is the usecase this one is trying to cover? its good to have a design first to see how feasible this solution is from a storage admin point of view?

@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch 2 times, most recently from c03ed0c to 8f75d14 on February 20, 2026 06:54
@Rakshith-R
Contributor Author

Rakshith-R commented Feb 20, 2026

@Rakshith-R, what is the usecase this one is trying to cover? its good to have a design first to see how feasible this solution is from a storage admin point of view?

Added design proposal document.




@Rakshith-R Rakshith-R marked this pull request as ready for review February 20, 2026 06:54
@Rakshith-R Rakshith-R requested review from a team February 20, 2026 06:54
@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch from 8f75d14 to c5cfd44 on February 20, 2026 07:14
Member

@nixpanic nixpanic left a comment


Went through the design as a 1st step. Looks okayish to me, but I do have a few questions.

### Metadata Keys

Each driver type uses a driver-specific metadata key to store the allowed
service account name:
Member

what is the format of the value? Does it include the Kubernetes Namespace of the ServiceAccount? If not, would the Namespace add extra security?

Contributor Author

what is the format of the value? Does it include the Kubernetes Namespace of the ServiceAccount? If not, would the Namespace add extra security?

It is just the serviceAccount name.
The PVC and pod are already namespaced, and hence the NodePublish request is also namespaced.
For a NodePublish request for the same volume with the same serviceAccount to come from another namespace,
a user would need cluster-scoped permissions to create a duplicate PV and PVC, as well as a serviceAccount with the same name in that other namespace, which is very unlikely and is a separate security threat outside the scope here.

But I've mentioned validation of additional parameters (e.g. namespace) in the future enhancements section.
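As a purely hypothetical sketch of that future enhancement (not part of this PR), the stored metadata value could be qualified as `namespace/name`, with a bare name falling back to the current behaviour:

```go
package main

import (
	"fmt"
	"strings"
)

// saMatches is a hypothetical namespace-aware comparison: a stored
// value of the form "namespace/name" must match both fields, while a
// bare "name" keeps the name-only semantics described in this PR.
func saMatches(stored, podNamespace, podSA string) bool {
	if ns, name, qualified := strings.Cut(stored, "/"); qualified {
		return ns == podNamespace && name == podSA
	}
	return stored == podSA
}

func main() {
	fmt.Println(saMatches("prod/my-app-sa", "prod", "my-app-sa")) // matches
	fmt.Println(saMatches("prod/my-app-sa", "dev", "my-app-sa"))  // wrong namespace
	fmt.Println(saMatches("my-app-sa", "dev", "my-app-sa"))       // name-only fallback
}
```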

metadata from the Ceph backend. If present, it is included in the publish
context passed to the node.

1. **NodePublishVolume**: The node plugin compares the publish context value
Member

This isn't a very safe way. Both the value that is expected and the value that is checked are in the same CSI RPC call? It isn't really how security checks for access restrictions should work.

Why not read the metadata during NodePublishVolume?

Contributor Author

This isn't a very safe way. Both the value that is expected and the value that is checked are in the same CSI RPC call? It isn't really how security checks for access restrictions should work.

Why not read the metadata during NodePublishVolume?

Yes, the value that is expected and the value that is checked are in the same CSI RPC call,
but the sources of these values are two different places:

  • expected <- publish_context <- ControllerPublishVolume call <- image/subvolume metadata
  • actual checked <- volume_context <- pod's SA name

The CSI spec and the Kubernetes storage framework ensure the values are populated in this manner.

Reading the metadata on every NodePublishVolume (per pod, re-requested with backoff on errors) is resource intensive and can be avoided entirely by making use of the publish_context from the ControllerPublishVolume call (once per PVC per node).

Member

I understand that the flow of the Kubernetes CSI procedures makes this work. But it feels wrong to get the valid value from the publish context which is part of the same RPC as the details from podInfoOnMount.

NodePublishVolume is not in a critical path, doing a security check by getting the metadata is not very invasive. It would be more proper to validate it during the call.

Contributor Author

I understand that the flow of the Kubernetes CSI procedures makes this work. But it feels wrong to get the valid value from the publish context which is part of the same RPC as the details from podInfoOnMount.

the source of these values are from two different places.
expected <- publish_context <- controllerPublishVolume call <- image/subvolume metadata
actual checked <- volume_context <- pod's SA name

Kubelet and CSI Spec are trusted entities in this framework.
The entire framework relies on this fact.
Therefore, the content of the RPC is very reliable.

NodePublishVolume is not in a critical path, doing a security check by getting the metadata is not very invasive. It would be more proper to validate it during the call.

It is a critical path: every pod using our volume can come up only after a NodePublish.
Ceph-CSI's NodePublish ops are very lightweight and fast, with almost zero lookups.

Introducing a metadata lookup here would inherently slow the entire setup.

Combined with the fact established above that Kubelet and the CSI spec are reliable and trustworthy entities, we don't need to introduce resource-intensive ops in the NodePublish call.

Member

That's all fine, but it needs documenting in the design too.

@Rakshith-R Rakshith-R force-pushed the restrict-volume-by-sa branch 2 times, most recently from 3cfbc63 to b26cb70 on February 20, 2026 10:26
@Rakshith-R Rakshith-R requested a review from nixpanic February 20, 2026 10:31
@iPraveenParihar
Contributor

/retest ci/centos/mini-e2e/k8s-1.34

@Rakshith-R Rakshith-R requested a review from a team March 13, 2026 05:27
@nixpanic
Member

@Mergifyio rebase

Allow restricting RBD volume access to a specific Kubernetes
ServiceAccount using ".rbd.csi.ceph.com/serviceaccount" image
metadata.

During ControllerPublishVolume, the controller reads the
".rbd.csi.ceph.com/serviceaccount" metadata from the backing
RBD image and passes it to the node via publish context.

During NodePublishVolume, the node validates the Pod's
ServiceAccount (provided by Kubelet when `podInfoOnMount` is
enabled) against the allowed value, returning PermissionDenied
on mismatch.

Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting nvmeof volume access to a specific Kubernetes
ServiceAccount using ".rbd.csi.ceph.com/serviceaccount" image
metadata.

During ControllerPublishVolume, the controller reads the
".rbd.csi.ceph.com/serviceaccount" metadata from the backing
RBD image and passes it to the node via publish context.

During NodePublishVolume, the node validates the Pod's
ServiceAccount (provided by Kubelet when `podInfoOnMount` is
enabled) against the allowed value, returning PermissionDenied
on mismatch.

Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting CephFS volume access to a specific Kubernetes
ServiceAccount using ".cephfs.csi.ceph.com/serviceaccount"
subvolume metadata.

During ControllerPublishVolume, the controller reads the
".cephfs.csi.ceph.com/serviceaccount" metadata from the backing
CephFS subvolume and passes it to the node via publish context.

During NodePublishVolume, the node validates the Pod's
ServiceAccount(provided by Kubelet when `podInfoOnMount` is enabled)
against the allowed value, returning PermissionDenied on mismatch.

Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting nfs volume access to a specific Kubernetes
ServiceAccount using ".cephfs.csi.ceph.com/serviceaccount"
subvolume metadata.

During ControllerPublishVolume, the controller delegates
to the CephFS backend to read the ".cephfs.csi.ceph.com/serviceaccount"
metadata from the backing CephFS subvolume and passes it to the node
via publish context.

During NodePublishVolume, the node validates the Pod's ServiceAccount
(provided by Kubelet when `podInfoOnMount` is enabled) against
the allowed value, returning PermissionDenied on mismatch.

Signed-off-by: Rakshith R <rar@redhat.com>
…triction

Signed-off-by: Rakshith R <rar@redhat.com>
Signed-off-by: Rakshith R <rar@redhat.com>
@ceph-csi-bot ceph-csi-bot force-pushed the restrict-volume-by-sa branch from cb100ba to ee282df on March 13, 2026 07:59
@mergify
Contributor

mergify bot commented Mar 13, 2026

rebase

✅ Branch has been successfully rebased

@nixpanic nixpanic added the ok-to-test label (Label to trigger E2E tests) on Mar 13, 2026
@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.33

@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.33

@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.33

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.34

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.35

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.34

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.35

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.34

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.35

@ceph-csi-bot ceph-csi-bot removed the ok-to-test label (Label to trigger E2E tests) on Mar 13, 2026
@mergify mergify bot added the queued label Mar 13, 2026
@mergify mergify bot merged commit 9011880 into ceph:devel Mar 13, 2026
38 checks passed
@mergify
Contributor

mergify bot commented Mar 13, 2026

Merge Queue Status

  • Entered queue 2026-03-13 10:51 UTC · Rule: default
  • Checks passed · in-place
  • Merged 2026-03-13 10:51 UTC · at ee282df0efe269c89b05d7fb7cdd80baeeaf4fd3

This pull request spent 12 seconds in the queue, with no time running CI.


@mergify mergify bot removed the queued label Mar 13, 2026

Labels

dependency/k8s (depends on Kubernetes features) · enhancement (New feature or request)


5 participants