Commit 8f75d14
doc: add design proposal for SA based volume access restriction
Signed-off-by: Rakshith R <rar@redhat.com>

# Service Account Based Volume Access Restriction

## Introduction

This proposal introduces an optional mechanism to restrict volume access based
on the Kubernetes service account of the pod mounting the volume. When
configured, only pods running with the specified service account are allowed to
mount the volume. All other mount attempts are rejected with a
`PermissionDenied` error.

The restriction is stored as metadata on the backend Ceph object (RBD image
metadata or CephFS subvolume metadata) and is enforced at mount time through
the CSI `podInfoOnMount` mechanism.

## Motivation

Ceph-CSI volumes are accessible to any pod that has a valid PVC reference and
the necessary RBAC to use the StorageClass. In multi-tenant and data pipeline
environments, this is insufficient. There are scenarios where a volume should
be exclusively accessible to a specific workload identity even when other pods
in the same namespace can reference the PVC.

### Use Case: Ceph VolSync Plugin Replication Destination PVC Protection

A primary motivator for this feature is the custom
[Ceph VolSync Plugin](https://github.com/RamenDR/ceph-volsync-plugin) that
performs incremental data replication across clusters. In a disaster recovery
or migration workflow:

1. A `ReplicationDestination` controller creates a PVC on the destination
   cluster to receive replicated data.
1. A replication worker pod, running under a dedicated service account (e.g.
   `volsync-worker-sa`), incrementally syncs data from the source cluster into
   this destination PVC.
1. The destination PVC must remain writable only by the replication worker
   until the replication is complete and a failover is triggered.

Without service account based restriction, any pod in the namespace with a
reference to the destination PVC could write to it, potentially corrupting the
replicated data or breaking the incremental sync state. By binding the
destination volume to the replication worker's service account, the volume is
protected from unintended writes throughout the replication lifecycle. On
failover, the restriction is removed so the application workload can mount
the volume.

### Other Potential Use Cases

- **Sensitive data volumes**: Restrict access to volumes containing regulated
  data to only the service account authorized to process them.
- **Custom use cases**: Similar scenarios where a workload identity needs
  exclusive access to a volume for data integrity or security reasons.

## Dependency

- The `podInfoOnMount` field must be set to `true` in the CSIDriver
  specification. This causes Kubelet to inject pod information (including the
  service account name) into the volume context during `NodePublishVolume`.
  Without this, the restriction cannot be enforced.
  Since this parameter is a mutable field in the CSIDriver spec, it will be
  enabled by default going forward (Ceph-CSI v3.17.0+).

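For reference, a CSIDriver object with `podInfoOnMount` enabled looks roughly
like the sketch below (shown for the RBD driver name; other field values are
illustrative and depend on your deployment):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: rbd.csi.ceph.com
spec:
  # Kubelet injects pod metadata (pod name, namespace, UID, and the
  # service account name) into the NodePublishVolume volume context.
  podInfoOnMount: true
  attachRequired: true   # illustrative; depends on the deployment
```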
## Design

### Metadata Keys

Each driver type uses a driver-specific metadata key to store the allowed
service account name:

| Driver  | Metadata Key                          | Storage                                        |
|---------|---------------------------------------|------------------------------------------------|
| RBD     | `.rbd.csi.ceph.com/serviceaccount`    | RBD image metadata                             |
| CephFS  | `.cephfs.csi.ceph.com/serviceaccount` | CephFS subvolume metadata                      |
| NVMe-oF | `.rbd.csi.ceph.com/serviceaccount`    | RBD image metadata (via RBD backend)           |
| NFS     | `.cephfs.csi.ceph.com/serviceaccount` | CephFS subvolume metadata (via CephFS backend) |

Only a single service account can be specified per volume.

### CSI Flow

The restriction is enforced across two CSI RPCs:

1. **ControllerPublishVolume**: The controller reads the service account
   metadata from the Ceph backend. If present, it is included in the publish
   context passed to the node.

1. **NodePublishVolume**: The node plugin compares the publish context value
   against the pod's service account (provided by Kubelet via
   `csi.storage.k8s.io/serviceAccount.name` in the volume context). A mismatch
   results in a `PermissionDenied` error. If no restriction is set, or if
   `podInfoOnMount` is not enabled, the mount is allowed (with a warning log
   in the latter case).

### Implementation

A shared validation function `ValidateServiceAccountRestriction` in
`internal/util/validate.go` is called at the beginning of `NodePublishVolume`
in all four drivers (RBD, CephFS, NFS, NVMe-oF), ensuring consistent
enforcement.

Each driver reads the restriction metadata in `ControllerPublishVolume` using
its backend:

- **RBD**: reads via `GetMetadata` in `internal/rbd/controllerserver.go`.
- **CephFS**: reads via `ListMetadata` in `internal/cephfs/controllerserver.go`.
- **NVMe-oF**: delegates to the RBD backend and propagates the publish context
  in `internal/nvmeof/controller/controllerserver.go`.
- **NFS**: delegates to the CephFS backend in
  `internal/nfs/controller/controllerserver.go`.

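The controller-side half can be sketched in the same spirit. This is a sketch
under assumptions: the backend metadata is modeled as a plain map rather than
a real RBD/CephFS lookup, and the publish-context key name is illustrative.

```go
package main

// Metadata key used by the RBD driver, per the table above.
const rbdServiceAccountMetaKey = ".rbd.csi.ceph.com/serviceaccount"

// buildPublishContext mimics the ControllerPublishVolume step: if the
// restriction metadata is present on the backend object (modeled here as a
// plain map), copy it into the publish context that Kubelet later hands to
// NodePublishVolume on the node.
func buildPublishContext(imageMeta map[string]string) map[string]string {
	publishCtx := map[string]string{}
	if sa, ok := imageMeta[rbdServiceAccountMetaKey]; ok && sa != "" {
		publishCtx["serviceAccount"] = sa // illustrative publish-context key
	}
	return publishCtx
}
```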
## Setting and Removing the Restriction

The restriction is managed through Ceph CLI commands. Refer to the
"Service Account Based Volume Access Restriction" sections in
[RBD deploy.md](../../rbd/deploy.md) and [CephFS deploy.md](../../cephfs/deploy.md)
for usage instructions and examples.

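As an illustration of the kind of commands involved (pool, image, filesystem,
subvolume, and group names below are placeholders; consult the linked deploy
docs for the authoritative instructions):

```bash
# RBD: set, inspect, and remove the restriction on an image
rbd image-meta set mypool/myimage .rbd.csi.ceph.com/serviceaccount volsync-worker-sa
rbd image-meta get mypool/myimage .rbd.csi.ceph.com/serviceaccount
rbd image-meta remove mypool/myimage .rbd.csi.ceph.com/serviceaccount

# CephFS: same idea on a subvolume
ceph fs subvolume metadata set myfs csi-vol-xyz .cephfs.csi.ceph.com/serviceaccount volsync-worker-sa --group_name csi
ceph fs subvolume metadata get myfs csi-vol-xyz .cephfs.csi.ceph.com/serviceaccount --group_name csi
ceph fs subvolume metadata rm myfs csi-vol-xyz .cephfs.csi.ceph.com/serviceaccount --group_name csi
```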
## Ceph VolSync Plugin Integration Example

1. The replication destination worker sets the service account restriction on
   the backing Ceph object (RBD image or CephFS subvolume) to the replication
   worker's service account (e.g. `volsync-worker-sa`) on first use.
1. Only the worker pod mounts the destination PVC successfully because its
   service account matches. Any other pod attempting to mount the same PVC is
   rejected with `PermissionDenied` during the `NodePublishVolume` call,
   protecting data integrity during incremental sync.
1. On replication destination deletion, the controller spins up a cleanup job
   that removes the service account restriction metadata, allowing the
   application workload to mount the volume.

## Limitations

- Only a single service account can be specified per volume.
- Enforced at CSI mount time only; does not prevent direct access to the
  underlying Ceph storage from outside Kubernetes.
- If `podInfoOnMount` is not enabled, the restriction is silently unenforced.
- Changing the restriction on an already-mounted volume does not affect
  existing mounts. The volume must be unmounted and remounted.
- Managed through Ceph CLI commands, not Kubernetes-native APIs.

## Future Enhancements

- Support restriction based on other pod attributes (e.g. name, namespace) in
  addition to service account.
- Provide more flexible key-value configuration (e.g. matching multiple
  expected key-value pairs from the volume context instead of a single
  service account name).
