Add service account based volume access restriction #6076
mergify[bot] merged 7 commits into ceph:devel
Conversation
/test ci/centos/mini-e2e/k8s-1.34
/test ci/centos/mini-e2e-helm/k8s-1.34
/test ci/centos/mini-e2e/k8s-1.34/cephfs
/test ci/centos/mini-e2e/k8s-1.34
/test ci/centos/mini-e2e/k8s-1.34
/test ci/centos/mini-e2e/k8s-1.34/rbd
/test ci/centos/mini-e2e/k8s-1.34/cephfs
rbd alone takes a long time to run; cephfs takes even longer to run e2e.
We need to increase the e2e timeout for now, and then work on optimising the e2e testcases.
/test ci/centos/mini-e2e/k8s-1.34
nvmeof testcase passed here: #6097
/test ci/centos/mini-e2e/k8s-1.34
@Rakshith-R, what is the use case this one is trying to cover? It would be good to have a design first, to see how feasible this solution is from a storage admin's point of view.
Added the design proposal document.

Use Case: Ceph VolSync Plugin Replication Destination PVC Protection. A primary motivator for this feature is the custom Ceph VolSync Plugin. Without service account based restriction, any pod in the namespace with a reference to the destination PVC could write to it; the full use case is described in the PR description.
nixpanic
left a comment
Went through the design as a first step. Looks okayish to me, but I do have a few questions.
> ### Metadata Keys
>
> Each driver type uses a driver-specific metadata key to store the allowed
> service account name:
what is the format of the value? Does it include the Kubernetes Namespace of the ServiceAccount? If not, would the Namespace add extra security?
> what is the format of the value? Does it include the Kubernetes Namespace of the ServiceAccount? If not, would the Namespace add extra security?
It is just the ServiceAccount name.
The PVC and Pod are already namespaced, and hence the NodePublish request is also namespaced.
For a NodePublish request for the same volume and with the same ServiceAccount to come from another namespace, a user would need cluster-scoped permission to create a duplicate PV and PVC, as well as a ServiceAccount with the same name in that namespace, which is very unlikely and is a separate security threat outside the scope here.
But I've mentioned additional parameters (e.g. namespace) validation in the future enhancements section.
> metadata from the Ceph backend. If present, it is included in the publish
> context passed to the node.
>
> 1. **NodePublishVolume**: The node plugin compares the publish context value
This isn't a very safe way. Both the value that is expected and the value that is checked are in the same CSI RPC call? It isn't really how security checks for access restrictions should work.
Why not read the metadata during NodePublishVolume?
> This isn't a very safe way. Both the value that is expected and the value that is checked are in the same CSI RPC call? It isn't really how security checks for access restrictions should work.
> Why not read the metadata during NodePublishVolume?
Yes, the value that is expected and the value that is checked are in the same CSI RPC call,
but the sources of these values are two different places:
- expected <- publish_context <- ControllerPublishVolume call <- image/subvolume metadata
- actual checked <- volume_context <- pod's SA name

The CSI spec and the k8s storage framework ensure the values are populated in this manner.
Reading the metadata on every NodePublishVolume (per pod, re-requested with backoff on errors) is resource intensive and can be avoided entirely by making use of the publish_context from the ControllerPublishVolume call (once per PVC per node).
I understand that the flow of the Kubernetes CSI procedures makes this work. But it feels wrong to get the valid value from the publish context which is part of the same RPC as the details from podInfoOnMount.
NodePublishVolume is not in a critical path, doing a security check by getting the metadata is not very invasive. It would be more proper to validate it during the call.
> I understand that the flow of the Kubernetes CSI procedures makes this work. But it feels wrong to get the valid value from the publish context which is part of the same RPC as the details from podInfoOnMount.
The sources of these values are two different places:
- expected <- publish_context <- ControllerPublishVolume call <- image/subvolume metadata
- actual checked <- volume_context <- pod's SA name

Kubelet and the CSI spec are trusted entities in this framework; the entire framework relies on this fact. Therefore, the content of the RPC is very reliable.

> NodePublishVolume is not in a critical path, doing a security check by getting the metadata is not very invasive. It would be more proper to validate it during the call.

It is a critical path: every pod using our volumes can come up only after a NodePublish.
Ceph-CSI's NodePublish ops are very lightweight and fast, with almost zero lookups.
Introducing a metadata lookup here would inherently slow down the entire setup.
Combined with the fact established above, that Kubelet and the CSI spec are reliable and trustworthy entities, we don't need to introduce resource-intensive ops in the NodePublish call.
That's all fine, but it needs documenting in the design too.
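To make the two value sources in this thread concrete, here is a minimal, self-contained sketch of the node-side comparison. It is illustrative, not Ceph-CSI's actual code: the publish-context key is taken from the PR description, and `csi.storage.k8s.io/serviceAccount.name` is assumed to be the volume-context key that Kubelet fills when `podInfoOnMount` is enabled.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	// expected value: placed in the publish context by ControllerPublishVolume
	// after reading the image/subvolume metadata (key name from the PR).
	saPublishContextKey = ".rbd.csi.ceph.com/serviceaccount"
	// actual value: placed in the volume context by Kubelet (podInfoOnMount).
	podSAVolumeContextKey = "csi.storage.k8s.io/serviceAccount.name"
)

var errPermissionDenied = errors.New("PermissionDenied: volume is restricted to another service account")

// validateServiceAccount mirrors the NodePublishVolume check: an absent or
// empty publish-context key means the volume is unrestricted; otherwise the
// pod's service account from the volume context must match exactly.
func validateServiceAccount(publishCtx, volumeCtx map[string]string) error {
	allowed, ok := publishCtx[saPublishContextKey]
	if !ok || allowed == "" {
		return nil // no restriction stored on the image/subvolume
	}
	if volumeCtx[podSAVolumeContextKey] != allowed {
		return errPermissionDenied
	}
	return nil
}

func main() {
	restricted := map[string]string{saPublishContextKey: "my-app-sa"}
	fmt.Println(validateServiceAccount(restricted, map[string]string{podSAVolumeContextKey: "my-app-sa"})) // <nil>
	fmt.Println(validateServiceAccount(restricted, map[string]string{podSAVolumeContextKey: "other-sa"}))
}
```

Note that the expected and actual values arrive via different RPC fields, matching the flow described above; no backend lookup happens in this path.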
/retest ci/centos/mini-e2e/k8s-1.34
@Mergifyio rebase
Allow restricting RBD volume access to a specific Kubernetes ServiceAccount using ".rbd.csi.ceph.com/serviceaccount" image metadata. During ControllerPublishVolume, the controller reads the ".rbd.csi.ceph.com/serviceaccount" metadata from the backing RBD image and passes it to the node via publish context. During NodePublishVolume, the node validates the Pod's ServiceAccount (provided by Kubelet when `podInfoOnMount` is enabled) against the allowed value, returning PermissionDenied on mismatch. Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting nvmeof volume access to a specific Kubernetes ServiceAccount using ".rbd.csi.ceph.com/serviceaccount" image metadata. During ControllerPublishVolume, the controller reads the ".rbd.csi.ceph.com/serviceaccount" metadata from the backing RBD image and passes it to the node via publish context. During NodePublishVolume, the node validates the Pod's ServiceAccount (provided by Kubelet when `podInfoOnMount` is enabled) against the allowed value, returning PermissionDenied on mismatch. Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting CephFS volume access to a specific Kubernetes ServiceAccount using ".cephfs.csi.ceph.com/serviceaccount" subvolume metadata. During ControllerPublishVolume, the controller reads the ".cephfs.csi.ceph.com/serviceaccount" metadata from the backing CephFS subvolume and passes it to the node via publish context. During NodePublishVolume, the node validates the Pod's ServiceAccount (provided by Kubelet when `podInfoOnMount` is enabled) against the allowed value, returning PermissionDenied on mismatch. Signed-off-by: Rakshith R <rar@redhat.com>
Allow restricting nfs volume access to a specific Kubernetes ServiceAccount using ".cephfs.csi.ceph.com/serviceaccount" subvolume metadata. During ControllerPublishVolume, the controller delegates to the CephFS backend to read the ".cephfs.csi.ceph.com/serviceaccount" metadata from the backing CephFS subvolume and passes it to the node via publish context. During NodePublishVolume, the node validates the Pod's ServiceAccount (provided by Kubelet when `podInfoOnMount` is enabled) against the allowed value, returning PermissionDenied on mismatch. Signed-off-by: Rakshith R <rar@redhat.com>
…triction Signed-off-by: Rakshith R <rar@redhat.com>
Signed-off-by: Rakshith R <rar@redhat.com>
Signed-off-by: Rakshith R <rar@redhat.com>
✅ Branch has been successfully rebased
/test ci/centos/k8s-e2e-external-storage/1.33
/test ci/centos/upgrade-tests-cephfs
/test ci/centos/mini-e2e-helm/k8s-1.33
/test ci/centos/upgrade-tests-rbd
/test ci/centos/mini-e2e/k8s-1.33
/test ci/centos/k8s-e2e-external-storage/1.34
/test ci/centos/k8s-e2e-external-storage/1.35
/test ci/centos/mini-e2e-helm/k8s-1.34
/test ci/centos/mini-e2e-helm/k8s-1.35
/test ci/centos/mini-e2e/k8s-1.34
/test ci/centos/mini-e2e/k8s-1.35
Merge Queue Status
This pull request spent 12 seconds in the queue, with no time running CI.
Describe what this PR does
Service Account Based Volume Access Restriction
Ceph-CSI supports optionally restricting RBD/CephFS volume access to specific Kubernetes
service accounts. When configured, only pods running with the allowed service account
can mount the volume. This feature uses RBD image/CephFS subvolume metadata to store the
restriction, and the CSI `podInfoOnMount` mechanism to identify the pod's service account
during mount. Refer to https://kubernetes-csi.github.io/docs/pod-info.html#pod-info-on-mount-with-csi-driver-object.
How it works

- Set the `.rbd.csi.ceph.com/serviceaccount` metadata on an RBD image to specify the allowed service account name.
- During `ControllerPublishVolume`, Ceph-CSI reads this metadata and passes it to the node via publish context.
- During `NodePublishVolume`, Ceph-CSI compares the value against the pod's service account (provided via volume context by Kubelet).
- If the pod is not running with the allowed service account, the mount is rejected with a `PermissionDenied` error.

Prerequisites
The `podInfoOnMount` field must be set to `true` in the CSIDriver spec so that
Kubelet passes pod information (including the service account name) in the volume
context during `NodePublishVolume`. Without this, the restriction cannot be
enforced and all mounts are allowed.
Setting the restriction on an RBD image/nvmeof

Use the `rbd image-meta set` command to set the allowed service account. For example, to restrict a volume to the `my-app-sa` service account:

```
rbd image-meta set mypool/csi-vol-abc123 .rbd.csi.ceph.com/serviceaccount my-app-sa
```

Removing the restriction
To remove the restriction and allow any service account to mount the volume, remove the metadata from the image. The Deployment should then be scaled down completely and scaled back up for the removal of the restriction to take effect.
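The removal command itself is not quoted above; assuming the same pool and image names as the `set` example, the inverse `rbd image-meta remove` subcommand would look like:

```
rbd image-meta remove mypool/csi-vol-abc123 .rbd.csi.ceph.com/serviceaccount
```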
Setting the restriction on a CephFS subvolume/nfs export

Use the `ceph fs subvolume metadata set` command to set the allowed service account. For example, to restrict a volume to the `my-app-sa` service account:

```
ceph fs subvolume metadata set myfs csi-vol-abc123 --group_name=csi \
  .cephfs.csi.ceph.com/serviceaccount my-app-sa
```

Removing the restriction
To remove the restriction and allow any service account to mount the volume, remove the metadata from the subvolume. The Deployment should then be scaled down completely and scaled back up for the removal of the restriction to take effect.
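Likewise for CephFS, assuming the same filesystem and subvolume names as the `set` example, the `ceph fs subvolume metadata rm` subcommand would look like:

```
ceph fs subvolume metadata rm myfs csi-vol-abc123 --group_name=csi \
  .cephfs.csi.ceph.com/serviceaccount
```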
Use Case: Ceph VolSync Plugin Replication Destination PVC Protection

A primary motivator for this feature is the custom Ceph VolSync Plugin that
performs incremental data replication across clusters. In a disaster recovery
or migration workflow:

- The `ReplicationDestination` controller creates a PVC on the destination cluster to receive replicated data.
- A replication worker pod, running with a dedicated service account (e.g. `volsync-worker-sa`), incrementally syncs data from the source cluster into this destination PVC.
- No other workload should write to this PVC until the replication is complete and a failover is triggered.
Without service account based restriction, any pod in the namespace with a
reference to the destination PVC could write to it, potentially corrupting the
replicated data or breaking the incremental sync state. By binding the
destination volume to the replication worker's service account, the volume is
protected from unintended writes throughout the replication lifecycle. On
failover, the restriction is removed so the application workload can mount
the volume.
Checklist:

- Followed the guidelines in the developer guide on submitting a Pull Request.
- Pending release notes updated with breaking and/or notable changes for the next major release.
Available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

- `/retest ci/centos/<job-name>`: retest the `<job-name>` after unrelated failure (please report the failure too!)