OCPSTRAT-2527, OCPSTRAT-2540: Enhancement: etcd data re-encryption for key rotation in HyperShift#1969

Open
muraee wants to merge 2 commits into openshift:master from muraee:hypershift-etcd-reencryption

@muraee muraee commented Apr 9, 2026

Summary

  • Add enhancement proposal for etcd data re-encryption after encryption key rotation in HyperShift
  • Introduces a new HCCO controller that leverages KubeStorageVersionMigrator from library-go to create StorageVersionMigration CRs in the guest cluster, transparently re-encrypting all encrypted resources with the active key
  • Adds EtcdDataEncryptionUpToDate condition on HCP/HostedCluster for progress tracking
  • Guards against premature backup key removal
  • Supports all encryption types (Azure KMS, AWS KMS, IBM Cloud KMS, AESCBC)

Tracks: OCPSTRAT-2527, OCPSTRAT-2540
Related: ARO-21568, ARO-21456
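
To make the second summary bullet concrete, here is a minimal sketch of the kind of StorageVersionMigration objects such a controller might create, one per encrypted resource type. The migrator rewrites each stored object unchanged, and the kube-apiserver encrypts it with the current write key on the way back to etcd. The resource list, the naming scheme, and the annotation key are assumptions made for illustration; they are not taken from the proposal.

```python
import json

# Assumed resource list for illustration; the summary only says
# "all encrypted resources".
ENCRYPTED_RESOURCES = [("", "v1", "secrets"), ("", "v1", "configmaps")]


def build_migration(group: str, version: str, resource: str, key_id: str) -> dict:
    """Return a StorageVersionMigration manifest for one resource type."""
    # Hypothetical naming scheme: resource, optional group, short key id.
    name = f"reencrypt-{resource}{'.' + group if group else ''}-{key_id}"
    return {
        "apiVersion": "migration.k8s.io/v1alpha1",
        "kind": "StorageVersionMigration",
        "metadata": {
            "name": name,
            # Hypothetical annotation recording which key triggered the run.
            "annotations": {"example.hypershift.openshift.io/write-key": key_id},
        },
        "spec": {
            "resource": {"group": group, "version": version, "resource": resource}
        },
    }


migrations = [build_migration(g, v, r, "a1b2c3") for g, v, r in ENCRYPTED_RESOURCES]
print(json.dumps(migrations[0], indent=2))
```

Including a key identifier in the object name is one way to tell a migration for the current key apart from a leftover migration for an earlier key.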

Test plan

  • Unit tests for key fingerprint computation and controller reconciliation logic
  • Integration tests for StorageVersionMigration CR lifecycle
  • E2E tests for Azure KMS and AESCBC key rotation with re-encryption

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from csrwng and sjenning April 9, 2026 16:17
Contributor

openshift-ci Bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@muraee muraee force-pushed the hypershift-etcd-reencryption branch from ae1dfec to eabd02a Compare April 10, 2026 09:33
### Non-Goals

1. Management of the creation and renewal of encryption keys --
keys are managed externally (by the ARO RP or user).
Member

Maybe the ARO HCP-specific language should be dropped here, since this works on other platforms?

Author

Good point. The motivation for calling out ARO-HCP specifically is that it's the primary driver for this work (the S360 compliance requirement is what makes re-encryption mandatory rather than nice-to-have). That said, the solution itself is fully generic and platform-agnostic.

I can rephrase to lead with the generic value ("any customer relying on key rotation as a security control") and mention ARO-HCP as the motivating use case rather than making it sound ARO-specific. Happy to update this if you'd like.


AI-assisted response via Claude Code

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md Outdated
Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md Outdated
Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from eabd02a to 7b4c875 Compare April 10, 2026 14:39
Member

@ardaguclu ardaguclu left a comment

I focused specifically on the "Why KubeStorageVersionMigrator Instead of MigrationController" section, and it looks good to me. I left one comment, which expresses agreement rather than an objection.

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
muraee added a commit to muraee/hypershift that referenced this pull request Apr 13, 2026
Add a re-encryption controller in the HCCO that triggers
StorageVersionMigration after an encryption key rotation, ensuring
all existing etcd data is re-encrypted with the new active key.

Components:
- API: EtcdDataEncryptionUpToDate condition type and reasons
- CPO: key fingerprint computation and rekey-needed annotation
  on kas-secret-encryption-config secret
- HCCO: new reencryption controller using library-go's
  KubeStorageVersionMigrator to drive StorageVersionMigration CRs
- HyperShift Operator: condition bubble-up from HCP to HostedCluster

Ref: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969

Co-Authored-By: Claude Opus 4.6 <[email protected]>
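
The commit message mentions key fingerprint computation and a rekey-needed signal. A minimal sketch of how that could work, assuming a SHA-256 hash over a canonical JSON form of the active key spec; the hash choice, the helper names, and the example key fields are all hypothetical and not confirmed by the commit:

```python
import hashlib
import json
from typing import Optional


def key_fingerprint(key_spec: dict) -> str:
    """Hash a canonical JSON form of the key spec so field order does not matter."""
    canonical = json.dumps(key_spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rekey_needed(active_key_spec: dict, last_migrated: Optional[str]) -> bool:
    """True when no migration has been recorded for the current active key."""
    return key_fingerprint(active_key_spec) != last_migrated


# Hypothetical key specs; the field names are illustrative only.
old_key = {"type": "kms", "keyName": "etcd-key", "keyVersion": "1"}
new_key = {"type": "kms", "keyName": "etcd-key", "keyVersion": "2"}

recorded = key_fingerprint(old_key)      # stored after the last migration
print(rekey_needed(old_key, recorded))   # same key, nothing to do
print(rekey_needed(new_key, recorded))   # key changed, re-encryption needed
```

Comparing fingerprints rather than raw key material keeps secrets out of annotations and status fields while still detecting any change to the active key.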
muraee added a commit to muraee/hypershift that referenced this pull request Apr 14, 2026
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from 7b4c875 to 82ddb23 Compare April 14, 2026 10:12
muraee added a commit to muraee/hypershift that referenced this pull request Apr 14, 2026
muraee added a commit to muraee/hypershift that referenced this pull request Apr 15, 2026
@muraee muraee changed the title Enhancement: etcd data re-encryption for key rotation in HyperShift OCPSTRAT-2527, OCPSTRAT-2540: Enhancement: etcd data re-encryption for key rotation in HyperShift Apr 16, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 16, 2026

openshift-ci-robot commented Apr 16, 2026

@muraee: This pull request references OCPSTRAT-2527 which is a valid jira issue.

This pull request references OCPSTRAT-2540 which is a valid jira issue.

Contributor

csrwng commented Apr 16, 2026

Just a couple of thoughts on KMS in general:

  • We currently run the kube-storage-version-migrator in the data plane. I see no good reason to do so: migrating storage does not need workers, yet leaving the migrator where it is means that a cluster with no workers cannot migrate.
  • (Orthogonal but relevant) We're not encrypting resources consistently. The latest code seems to enable encryption for secrets when doing aescbc encryption, but for secrets, configmaps, routes, oauthaccesstokens, and oauthauthorizetokens when doing KMS encryption. Furthermore, routes, oauthaccesstokens, and oauthauthorizetokens are not encrypted in either case, because that would require the KMS sidecar on the openshift apiservers that serve those resources.
  • We should fix the API. Currently we only allow main and backup keys of the same type (all aescbc, or all KMS, etc.). It should be possible to have multiple keys of different types, given that upstream Kubernetes allows it.
  • We're not doing key rotation properly. When we introduce a new main key, it should first be added as a backup/read key to all instances of kube-apiserver. Only once that has rolled out can we make the new key the write key. Otherwise, the first kube-apiserver instance to receive the new write key could start encrypting with it, and the instances that have not yet been updated could start crashlooping because they cannot decode the secrets encrypted with the new key.
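
The rotation order described in the last bullet can be sketched as a sequence of EncryptionConfiguration documents, where the first provider in the list is the write key and the rest are read-only. This is a sketch under stated assumptions: it uses aescbc providers with placeholder secrets, the helper names are hypothetical, and the third phase (dropping the old key) is only safe after all stored data has been re-encrypted.

```python
def provider(key_name: str) -> dict:
    """An aescbc provider entry with a placeholder secret."""
    return {"aescbc": {"keys": [{"name": key_name, "secret": "<base64-key>"}]}}


def encryption_config(resources: list, key_names: list) -> dict:
    """Build an EncryptionConfiguration; the first key name is the write key."""
    providers = [provider(n) for n in key_names] + [{"identity": {}}]
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{"resources": resources, "providers": providers}],
    }


def rotation_phases(resources: list, old_key: str, new_key: str) -> list:
    """Each phase must roll out to every kube-apiserver before the next starts."""
    return [
        encryption_config(resources, [old_key, new_key]),  # 1: new key is read-only
        encryption_config(resources, [new_key, old_key]),  # 2: new key becomes write key
        encryption_config(resources, [new_key]),           # 3: only after re-encryption
    ]


def readable_keys(config: dict) -> list:
    """Names of all keys the apiserver could decrypt with, in provider order."""
    names = []
    for p in config["resources"][0]["providers"]:
        for key in p.get("aescbc", {}).get("keys", []):
            names.append(key["name"])
    return names


def write_key(config: dict) -> str:
    """The first listed key is the one used for new writes."""
    return readable_keys(config)[0]


phases = rotation_phases(["secrets"], "key-old", "key-new")
for number, config in enumerate(phases, start=1):
    print(number, "write:", write_key(config), "read:", readable_keys(config))
```

Because every instance can already read the new key at the end of phase 1, an instance that switches to writing with it in phase 2 cannot produce data that its peers are unable to decrypt.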

Contributor

csrwng commented Apr 16, 2026

Could we reflect in HostedCluster status which keys are actively being used and reject spec changes that could potentially result in data loss?
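
One way to picture this suggestion is a check that compares the keys recorded in status as still in use against the keys a proposed spec would keep. This is only a sketch: the function names and the idea of tracking keys by name are assumptions, and the real API shape is for the proposal to decide.

```python
def removed_keys_still_in_use(status_keys_in_use, proposed_spec_keys) -> list:
    """Keys that status says still protect data but the new spec would drop."""
    return sorted(set(status_keys_in_use) - set(proposed_spec_keys))


def validate_spec_change(status_keys_in_use, proposed_spec_keys) -> None:
    """Raise if the change would leave etcd data that no configured key can decrypt."""
    missing = removed_keys_still_in_use(status_keys_in_use, proposed_spec_keys)
    if missing:
        raise ValueError(
            f"refusing change: keys still needed to decrypt etcd data: {', '.join(missing)}"
        )


# Adding a key is fine; dropping one that is still in use is rejected.
validate_spec_change(["key-new"], ["key-new", "key-next"])
try:
    validate_spec_change(["key-old", "key-new"], ["key-new"])
except ValueError as err:
    print(err)
```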

read provider. However, there is no mechanism to re-encrypt existing
etcd data with the new key after rotation. This enhancement adds a
re-encryption controller in the Hosted Cluster Config Operator (HCCO)
that leverages the existing `kube-storage-version-migrator` in every
Member

@enxebre enxebre Apr 17, 2026

This would tie our ability to rotate keys to having data-plane compute, which we don't want.

Author

@muraee muraee Apr 22, 2026

Addressed: the kube-storage-version-migrator will now run in the control plane.

…ackupKey

- Deploy kube-storage-version-migrator in HCP namespace (control plane)
  instead of data plane, enabling re-encryption with zero worker nodes
- Disable data-plane operator via annotation removal in
  cluster-kube-storage-version-migrator-operator repo
- Add status.secretEncryption.activeKey field to HC/HCP with full key
  spec for rotation detection and EncryptionConfiguration resilience
- Deprecate backupKey spec fields in favor of status-based tracking
- Update workflow, architecture, risks, and support procedures

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from 5c2acd4 to 4401ea1 Compare April 21, 2026 14:09
Contributor

openshift-ci Bot commented Apr 21, 2026

@muraee: all tests passed!



6 participants