Skip to content

HYPERFLEET-837 - feat: Add resource recreation design for adapters#150

Open
mliptak0 wants to merge 3 commits into
openshift-hyperfleet:mainfrom
mliptak0:HYPERFLEET-837
Open

HYPERFLEET-837 - feat: Add resource recreation design for adapters#150
mliptak0 wants to merge 3 commits into
openshift-hyperfleet:mainfrom
mliptak0:HYPERFLEET-837

Conversation

@mliptak0

@mliptak0 mliptak0 commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Define a design doc for recreation flow design in adapters

Test Plan

  • Unit tests added/updated
  • make test-all passes
  • make lint passes
  • Helm chart changes validated with make test-helm (if applicable)
  • Deployed to a development cluster and verified (if Helm/config changes)
  • E2E tests passed (if cross-component or major changes)

@openshift-ci openshift-ci Bot requested review from crizzo71 and rafabene June 3, 2026 06:54
@openshift-ci

openshift-ci Bot commented Jun 3, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kuudori for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@mliptak0, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 54 minutes and 52 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 21b47fd5-7a0b-46af-b888-8469ba57c72b

📥 Commits

Reviewing files that changed from the base of the PR and between a437bb4 and 8683a09.

📒 Files selected for processing (1)
  • hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md
📝 Walkthrough

Walkthrough

Adds a design document that defines a CEL-driven delete-then-recreate lifecycle for immutable/replace-only resources. It specifies pre-discovery, framework-enforced ordering where lifecycle.delete.when short-circuits recreate, an async transport-agnostic delete with post-delete discovery branching (404 -> ApplyResource, pending/error -> return and record state), single pre-discovery pass for multi-resource CEL, status conditions keyed on metadata.deletionTimestamp, CEL evaluation context and examples, sequence diagrams for happy and pending flows, and listed edge cases and validation rules.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly identifies the feature (resource recreation design) and references the tracking issue, directly matching the documentation addition.
Description check ✅ Passed Description references the design document objective and includes the related issue tracker link, though most checklist items don't apply to documentation-only changes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Sec-02: Secrets In Log Output ✅ Passed PR adds documentation only (markdown design file); no log statements with secrets found. Check applies to executable code (slog, log, logr, zap, fmt.Print*), which is absent.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md`:
- Around line 89-95: The design must explicitly require the adapter to keep
status.Finalized=false while it returns Operation=Recreate with Reason="deletion
pending" during the post-delete discovery path, and only set
status.Finalized=true after a successful ApplyResource and subsequent post-apply
discovery; update any logic in the post-delete discovery and status reporting
paths (the code that sets Operation=Recreate / Reason="deletion pending", the
status.Finalized setter, and ApplyResource/post-apply discovery handlers) to
ensure Finalized is false during pending deletion and flipped to true only after
ApplyResource succeeds and the resource is confirmed present.
- Around line 46-52: The adapter recreation-flow design currently assigns CEL
decision logic to the executor while HyperFleet’s architecture expects Sentinel
to own those evaluations, so update the design to place
`lifecycle.recreate.when` decision-making in Sentinel or explicitly revise the
separation of concerns to allow adapter-side CEL evaluation. Also align the
design with the existing adapter implementation by adding
`lifecycle.recreate.when` support to the adapter config/schema and executor
flow, since the current code only handles `lifecycle.delete.when` and
`recreate_on_change`; reference the recreation flow section and the executor
decision logic in the design doc when updating.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4987018a-1fb7-4552-907d-dd61bb92be66

📥 Commits

Reviewing files that changed from the base of the PR and between 346e4d8 and c646d6d.

📒 Files selected for processing (1)
  • hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md

Comment thread hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md Outdated
@mliptak0 mliptak0 force-pushed the HYPERFLEET-837 branch 3 times, most recently from 8e62cc2 to 0d7f54d Compare June 3, 2026 07:27

@ciaranRoche ciaranRoche left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this, couple of things inline and also a few follow up questions

  1. Post Processing, since we evalute conditions against resources, how will that be handled during recreation. We have a gaurd for deletion but curious how we will handle it during recreation
  2. Multi resource recreation, since we can have multi resource, how do we handle ordering of recreation

expression: |
# Recreate when the generation has changed (indicating spec changed)
resources.?job.hasValue()
&& string(generation) != resources.?job.metadata.annotations["hyperfleet.io/generation"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of a potential gap we will need to solve here. What happens if delete is true and recreate is true?

For both to fire, they need deleted_time set and generation annotation mismatch, this will happen when deletion is set 🤔 so essentially both will be true. We should define this pattern :

  • Explicit evaluation ordering
  • Config author responsibility
  • Framework-level gaurd
  • Something else

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lifecycle.delete has priority in this case and if evaluates to true, then lifecycle.recreate should be ignored - not evaluated or set to false

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi-resource recreation, I updated the doc. From what I see, currently it's done sequentially, I think that can persist how it is


### Out of Scope

- Intermediate status conditions during recreation (future enhancement)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious as to why this was pushed out of scope? What is the reasoning behind it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out, this was a left-over comment. It should be now specified how status conditions will reflect the state

```

**Requirements when `lifecycle.recreate` is configured:**
- `discovery` block must exist — pre-discovery populates resource state so CEL expressions can compare current vs. desired state

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we reject configs if the block does not exist? We might want to be explicit here if we do to ensure we cover it and have a test case for it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited the doc to be explicit that if there is lifecycle.recreate, the pre-discovery must be present, otherwise lifecycle.recreate should be omitted

@mliptak0 mliptak0 force-pushed the HYPERFLEET-837 branch 3 times, most recently from e726c45 to 528baf1 Compare June 4, 2026 07:12

3. **Evaluates** `lifecycle.recreate.when` (a CEL expression). If true, enters the recreation flow; otherwise, uses the normal apply path (update in-place or create).

4. **In the recreation flow**: deletes the old resource, waits for it to be fully gone, then creates a new one. This flow is safe to retry — multiple attempts converge to the same end state.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true for maestro?
If we delete a bundle in maestro, ADAIK it doesn't wait synchronously for the remote resource to be deleted

Do we plan to add a synchronous wait here?
In maestro-cli there is a --wait flag that does polling every second for this.

My concerns:

  • This can be costly on the adapter
  • We need to take into account errors in polling and timeouts
  • What if different events are triggered from sentinel.... do multiple instances of adapter enter the recreate state and poll the maestro server?

@rh-amarin rh-amarin Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some logic contradiction between:

  • Waiting to be fully gone
  • Detecting pending deletion during recreation (explained later in the doc)

If we were to wait for deletion, we will not have to detect for pending deletion

  • Unless the wait ends in some error/timeout (?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is a misunderstanding, but there is no internal retry (inside adapter), since retries are handled already by sentinel and doing otherwise would complicate this flow a lot. Pushed new commit, now it should be more clear


### Status Reporting During Recreation

Post-processing always runs after the resources phase, **including when deletion is still pending**. There is no framework-level guard that skips post-processing during recreation. When the resource hasn't been fully removed yet, `resources.<name>` contains the **still-present, mid-deletion object** — typically with `metadata.deletionTimestamp` set. Config authors encode awareness of this state into their CEL conditions.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openshift-hyperfleet/hyperfleet-adapter#122 added support for when conditions to skip posting (but technically doesn't skip the post-processing)


### Pending Deletion Sequence

This is the critical case: deletion takes multiple reconciliation loops due to finalizers or async behavior. Sentinel sends the same event repeatedly (~10s intervals). Recreation must be safe to retry.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this sentence a bit confusing, here is why

It mentions "reconciliation loops", which we normally refer to the adapter reconciliation loop.

But in this case it seems to refer to an internal retry loop that attempts the deletion, to make it as a 'sync' operation in the resources phase of the adapter.

How many attempts are there? do attempts time-out?

My main concern is that every new sentinel event will create a new instance of the adapter that will enter this loop until deletion is detected

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if It's misleading. In this context, it should refer to adapter reconcillation loop, eg. retries are driven by sentinel, no internal retry mechanism

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the sequence diagram is a bit misleading.
The diagram already shows time advancing on lower flow lines.
What we are showing as "Attempt 1", "Attempt 2" should be IMO the actor triggering the event, in this case the Sentinel

If we want to make clear the retry attempt number, add it as a text in the arrow or a note for better highlighting it

The "return early" is also a bit confusing without enough context.

  • The adapter still reports back to the API (not shown in the diagram)
  • What does it mean "early", compared to what?
  • In one sentence it is followed with IMO a clearer explanation "returns with pending deletion", or maybe in-progress recreate operation.


**CEL accesses non-existent resource**: Use optional chaining (`resources.?job.hasValue() && ...`) to avoid crashes. CEL short-circuits evaluation, so type errors are prevented.

**Deletion in progress when recreation is triggered**: The framework detects if deletion is still pending (finalizers, async behavior) and defers creation. Retries naturally occur on the next Sentinel event (~10s), when the deletion may be complete.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does framework detect this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really detect if the deletion is pending, if recreate.when.expression is true, on every reconcilation loop it just retries recreation and idempotently continues where it left off, and doesn't create resource until old ones still exist.

Will rewrite this one so it's clear


**Deletion in progress when recreation is triggered**: The framework detects if deletion is still pending (finalizers, async behavior) and defers creation. Retries naturally occur on the next Sentinel event (~10s), when the deletion may be complete.

**Both `lifecycle.delete.when` and `lifecycle.recreate.when` evaluate to true**: This is expected when `deleted_time` is set — the existing resource still carries the old generation annotation, causing a generation mismatch. The framework short-circuits on the higher-priority rule. See **Conflict Resolution: Delete vs. Recreate** above.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO mentioning deleted_time is confusing here.

The problem is that both and update and a delete API operations update the generation field, so the recreate condition evaluates to true if it only takes the generation change into account.

There are two possible solutions:

  • Lifecycle delete skips creation (the proposed one)
  • Introduce a ! is_deleting into the recreate.when.expression

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a section called "Conflict Resolution: Delete vs. Recreate", talking about this scenario. !is_deleting doesn't have to be included in the CEL expression for recreate.when.expression, since it is used to determine which flow has priority and guards against triggering recreate while delete was already executed/scheduled. If deleted_time is set, the recreation is always skipped and will never execute.

However, I agree this section is a bit confusing, I will refactor that

1. Test recreation flow in staging environment before production
2. Validate that stuck deletions are prevented through proper RBAC, finalizers, and cleanup logic in their domain

**Trade-off:** Deletion is fire-and-forget; actual deletion may span multiple reconciliation loops. The framework cannot recover from stuck deletions — prevention through proper design in the staging environment is the only viable approach.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For stuck deletions we have some metrics in the API that can help operators identifying the issue.

https://github.com/openshift-hyperfleet/hyperfleet-api/blob/main/docs/metrics.md#deletion-alerts

I wonder if something similar could be applied for deletions that happen because of a recreation cycle, not a pure deletion one (since in this case the resources will not be marked with deleted_time )

@mliptak0 mliptak0 Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to dig into how we can make use of deleted_time property, but I worry that this would mix up semantics with the deletion-only flow.

However, during the update operation, we still can use our conditions (Reconciled/Available), reconciled being false until the whole operation is done. For that, we could have metrics which would tell which resources are Reconciled=false for too long, which would indicate that something's going on and that a change for new generation cannot be reconcilled or processed due to some reason. -> The same thing can be useful for deletion flow as well, since it's pretty much using the same conditions from the time it is soft-deleted until hard deletion happens

WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Deletion is just "another update", and we measure the time it takes to perform that update.

The only concern is if both timelines are equivalent.

  • We can alert for a resource not being deleted after 35min because it is taking very long time to delete remote things..
  • But in the case of update... what would be an appropriate time span?

@mliptak0 mliptak0 Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point..., since deletion-only is for us end of the lifecycle, but recreation must perform both delete and create (create can also take some time...).

To your question, I have no clue how much time we should give it since it can depend on what kind of resource is defined in adapter task configs - in this case I suggest that we make it configurable inside adapter (can be another task/spike) / introducing a field which hyperfleet-api can refer to, to determine threshold for making an alert. However I'm not sure if that would be possible using Gauges, but perhaps there is a direct way to create alerts,,, but that might tight us closely to specific monitoring tool, unless it would be generic or we would be okay with that

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md (1)

347-347: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Missing mandatory Alternatives Considered section.

Component documents must include an "Alternatives Considered" section documenting approaches that were evaluated and rejected. As per coding guidelines: "Trade-offs only make sense when compared to alternatives."

Expected alternatives for this design might include:

  • Blocking deletion with polling loop: Adapter waits synchronously for deletion to complete before returning (Why Rejected: adds complexity, doesn't scale, blocks reconciliation)
  • RecreateOnChange transport flag: Continue using transport-layer boolean flag (Why Rejected: K8s-only, not Maestro-compatible, not retry-safe)
  • Adapter-side state machine: Track deletion progress in adapter state (Why Rejected: stateful, conflicts with stateless Sentinel-driven retries)

Add an "## Alternatives Considered" section before the final summary.

As per coding guidelines: "This section is MANDATORY" and "Skipping Alternatives - 'This was obvious' is not acceptable."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md` at
line 347, Add a mandatory "## Alternatives Considered" section before the final
summary in the adapter-recreation-flow-design document that lists and rejects
evaluated approaches; include at minimum the three expected alternatives:
"Blocking deletion with polling loop" (explain why rejected: complexity,
blocking reconciliation, poor scale), "RecreateOnChange transport flag" (explain
why rejected: K8s-only, not Maestro-compatible, not retry-safe), and
"Adapter-side state machine" (explain why rejected: stateful, conflicts with
stateless Sentinel-driven retries); ensure each alternative has a short
rationale and trade-offs so the "Alternatives Considered" header and content
appear prior to the final summary.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md`:
- Around line 336-346: Update the "Responsibilities & Trade-offs" section to
follow the mandatory structure by splitting it into four subsections: "What We
Gain", "What We Lose / What Gets Harder", "Technical Debt Incurred", and
"Acceptable Because"; explicitly state quantified benefits under "What We Gain"
(e.g., transport-agnostic recreation, stateless retry-safe flow, declarative CEL
triggers such as lifecycle.recreate.when), list costs/risks under "What We Lose"
(e.g., fire-and-forget deletion spanning multiple reconciliation loops,
inability to recover stuck deletions, staging validation burden), describe
concrete remediation and monitoring plans under "Technical Debt Incurred" (e.g.,
reliance on Sentinel retry loop, lack of adapter-side timeout/backoff,
remediation steps), and justify the choice in "Acceptable Because" (MVP
prioritizes declarative model over complex state machines and replaces
RecreateOnChange); ensure each subsection is concise and maps to the existing
text and symbols so reviewers can verify the mapping.

---

Outside diff comments:
In `@hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md`:
- Line 347: Add a mandatory "## Alternatives Considered" section before the
final summary in the adapter-recreation-flow-design document that lists and
rejects evaluated approaches; include at minimum the three expected
alternatives: "Blocking deletion with polling loop" (explain why rejected:
complexity, blocking reconciliation, poor scale), "RecreateOnChange transport
flag" (explain why rejected: K8s-only, not Maestro-compatible, not retry-safe),
and "Adapter-side state machine" (explain why rejected: stateful, conflicts with
stateless Sentinel-driven retries); ensure each alternative has a short
rationale and trade-offs so the "Alternatives Considered" header and content
appear prior to the final summary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 28eee138-e600-47c7-9880-a5d66da7c135

📥 Commits

Reviewing files that changed from the base of the PR and between 528baf1 and a437bb4.

📒 Files selected for processing (1)
  • hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md

Comment on lines +336 to +346
## Responsibilities & Trade-offs

**What the framework provides:** Declarative recreation via CEL expressions. Works identically for K8s and Maestro. Safe to retry (delete is idempotent). Explicit delete-before-recreate evaluation ordering enforced by the framework.

**What config authors own:**
1. Test recreation flow in staging environment before production
2. Validate that stuck deletions are prevented through proper RBAC, finalizers, and cleanup logic in their domain

**Trade-off:** Deletion is fire-and-forget; actual deletion may span multiple reconciliation loops. The framework cannot recover from stuck deletions — prevention through proper design in the staging environment is the only viable approach.

**Why `lifecycle.recreate.when` replaces `RecreateOnChange`:** The previous transport-layer flag only worked for K8s (ignored by Maestro) and was not safe to retry across Sentinel's polling loop. The new CEL-based approach is transport-agnostic and stateless.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Missing mandatory Trade-offs section structure.

The current "Responsibilities & Trade-offs" section doesn't follow the required component document format. As per coding guidelines, every component document must include a Trade-offs section with these subsections:

  • What We Gain (quantified benefits)
  • What We Lose / What Gets Harder (costs, risks)
  • Technical Debt Incurred (with remediation plans)
  • Acceptable Because (why the trade-off makes sense)

Example for this design:

  • What We Gain: Transport-agnostic recreation, stateless retry-safe flow, declarative CEL triggers
  • What We Lose: Fire-and-forget deletion spans multiple loops, no stuck-deletion recovery, requires staging validation
  • Technical Debt: Relies on Sentinel retry loop for eventual consistency, no adapter-side timeout/backoff
  • Acceptable Because: MVP prioritizes declarative model over complex state machine

As per coding guidelines: "This section is MANDATORY. Every design involves trade-offs. Document them honestly."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hyperfleet/components/adapter/framework/adapter-recreation-flow-design.md`
around lines 336 - 346, Update the "Responsibilities & Trade-offs" section to
follow the mandatory structure by splitting it into four subsections: "What We
Gain", "What We Lose / What Gets Harder", "Technical Debt Incurred", and
"Acceptable Because"; explicitly state quantified benefits under "What We Gain"
(e.g., transport-agnostic recreation, stateless retry-safe flow, declarative CEL
triggers such as lifecycle.recreate.when), list costs/risks under "What We Lose"
(e.g., fire-and-forget deletion spanning multiple reconciliation loops,
inability to recover stuck deletions, staging validation burden), describe
concrete remediation and monitoring plans under "Technical Debt Incurred" (e.g.,
reliance on Sentinel retry loop, lack of adapter-side timeout/backoff,
remediation steps), and justify the choice in "Acceptable Because" (MVP
prioritizes declarative model over complex state machines and replaces
RecreateOnChange); ensure each subsection is concise and maps to the existing
text and symbols so reviewers can verify the mapping.

Source: Coding guidelines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants