fix: treat daily-memory no-op as success instead of hard failure #2784

Open: chubes4 wants to merge 1 commit into main from daily-memory-noop-2783

@chubes4 chubes4 commented Jun 24, 2026

Closes #2783

Problem

DailyMemoryTask logged a legitimate "MEMORY.md unchanged" outcome as an ERROR-level hard job failure, generating recurring false-positive noise that polluted error-rate metrics and the wake briefing.

Real log evidence from extrachill.com (and wire.extrachill.com), repeating ~3×/day across multiple agents, every day:

Task failed (job #5151): Daily memory completion policy was not satisfied. MEMORY.md unchanged.
{"job_id":5151,"task_type":"daily_memory_generation","error":"Daily memory completion policy was not satisfied. MEMORY.md unchanged."}

Root cause (investigated, not assumed)

Inspecting live failing jobs confirmed the mechanism:

  • The job that "failed" (e.g. job #5151, agent_id 17) runs for ~5s with empty token_usage and the error is the exact fallback string — meaning $response['error'] was empty.
  • That fallback only fires when the conversation loop returns completed=false with no genuine error — i.e. the completion policy never returned complete() within the turn budget.
  • The affected agent's MEMORY.md is 144 bytes (MAX is 8192), so the file is small and already healthy. The task didn't skip at the size-threshold guard only because there was some activity context that day. The model reviewed the day, found nothing memory-worthy, and never emitted an acceptable ===PERSISTENT=== / ===ARCHIVED=== partition. Nothing is written to disk on this path, so MEMORY.md is genuinely unchanged — a successful no-op, not a fault.

The conversation loop (datamachine_run_conversation) sets completed=false for both genuine faults (provider error, runtime exception, malformed result, budget_exceeded/interrupted/failed status) and this benign "ran out of turns without an acceptable changed split" case. The old code blindly failJob'd both.
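
The fallback path described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the actual DailyMemoryTask code: the function name legacy_failure_message, the trim() normalization, and the 'Provider timeout' sample error are assumptions. Only the completed / error keys and the fallback string come from the description and log evidence above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the old behavior: every completed=false result
 * became a failure message, with a fallback when no error was reported.
 */
function legacy_failure_message( array $response ): ?string {
	if ( ! empty( $response['completed'] ) ) {
		return null; // Completed runs never reach the failure branch.
	}

	$error = trim( (string) ( $response['error'] ?? '' ) );

	// An empty error yields the fallback string seen in the logs.
	return '' !== $error
		? $error
		: 'Daily memory completion policy was not satisfied. MEMORY.md unchanged.';
}

// A benign "ran out of turns" result and a genuine fault (illustrative
// error text) both end up as failures under this logic.
$benign = legacy_failure_message( array( 'completed' => false ) );
$fault  = legacy_failure_message(
	array(
		'completed' => false,
		'error'     => 'Provider timeout',
	)
);
```

Under this sketch, the benign case is indistinguishable from a fault at the call site, which matches the false-positive noise reported above.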

Fix

At the completed=false branch in DailyMemoryTask::executeTask(), distinguish the two by explicit error signal:

$genuine_failure = '' !== $response_error
    || ! empty( $response['error_code'] )
    || in_array( (string) ( $response['status'] ?? '' ), array( 'error', 'failed', 'interrupted' ), true );

  • Genuine failure (any of the above) → unchanged behavior: log at error and failJob.
  • No-op (no error signal; the model simply produced no acceptable changed split) → completeJob with skipped/no_change markers and log at info. Safe because replace_all() happens later in the method, so MEMORY.md is untouched at this point.
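
The distinction can be exercised in isolation with a minimal sketch like the one below. The predicate is the one from the PR; the wrapper function classify_incomplete_run, the way $response_error is derived (trim of the error key), and the sample response arrays are assumptions made for illustration and may not match the real method.

```php
<?php
declare(strict_types=1);

/**
 * Returns 'fail' for a genuine fault and 'noop' for a benign unchanged run.
 * Only meaningful for loop results where completed=false.
 * Hypothetical helper; the real logic lives inline in executeTask().
 */
function classify_incomplete_run( array $response ): string {
	// Assumption: the error string is normalized before the check.
	$response_error = trim( (string) ( $response['error'] ?? '' ) );

	$genuine_failure = '' !== $response_error
		|| ! empty( $response['error_code'] )
		|| in_array( (string) ( $response['status'] ?? '' ), array( 'error', 'failed', 'interrupted' ), true );

	return $genuine_failure ? 'fail' : 'noop';
}

// Illustrative response shapes (values are made up).
$cases = array(
	'no signal'    => array( 'completed' => false ),
	'error string' => array( 'completed' => false, 'error' => 'Provider timeout' ),
	'error code'   => array( 'completed' => false, 'error_code' => 'rate_limited' ),
	'bad status'   => array( 'completed' => false, 'status' => 'interrupted' ),
);

foreach ( $cases as $label => $response ) {
	echo $label . ': ' . classify_incomplete_run( $response ) . PHP_EOL;
}
```

In the task itself, the 'fail' outcome corresponds to the existing failJob path logged at error, and 'noop' to completeJob with the skipped/no_change markers logged at info.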

How legitimate no-op is distinguished from genuine failure

| Outcome | Signal | Behavior |
| --- | --- | --- |
| Provider/runtime error | non-empty error / error_code | failJob (error) |
| Hard loop failure / interruption | status ∈ {error, failed, interrupted} | failJob (error) |
| Empty model output | later empty($ai_output) guard | failJob (error), unchanged |
| Lossy/duplicative split | planMemoryCompaction conservation checks | failJob (error), unchanged |
| Policy unsatisfied, file untouched, no error | none of the above | completeJob no-op (info) ✅ |

All genuine-failure paths downstream (empty response, parse failure, conservation/expansion failures in planMemoryCompaction) are reached only when completed=true and remain loud — they represent a model that produced bad output, which the issue explicitly says must still fail.

Fork decisions / out of scope

  • Chose to gate on the loop's existing error signals rather than add a new "no-op" state to the Agents API completion-decision substrate. The substrate decision is intentionally binary (complete()/incomplete()); the no-op semantics are a Data-Machine-task concern (the file being untouched at this call site), so the distinction belongs in the task, not the generic loop. This keeps layer purity intact.
  • Did not change the prompt/completion-policy contract to let the model emit an explicit "no change" sentinel. That would be a larger behavioral change; the minimal, evidence-backed fix is to stop treating the existing untouched-file outcome as an error. A future enhancement could add a first-class "decline" signal, but it's not needed to resolve the noise.

Verification

  • php -l clean.
  • phpcs --standard=WordPress clean (exit 0, no warnings — covers the array-arrow/assignment-alignment gate).
  • phpcbf made no changes.

DailyMemoryTask logged a legitimate 'MEMORY.md unchanged' outcome as an
ERROR-level job failure, flooding error-rate metrics and the wake briefing.

The conversation loop sets completed=false both for genuine faults
(provider error, runtime exception, malformed result, interruption) and
for the common case where a small, already-healthy MEMORY.md produced no
acceptable PERSISTENT/ARCHIVED split because there was nothing
memory-worthy to fold in. The file is untouched at this point, so the
latter is a successful no-op, not a failure.

Distinguish the two by explicit error signal (non-empty error string,
error_code, or error/failed/interrupted status). Genuine faults still
failJob and log at error; a no-op completes the job and logs at info.

Closes #2783

homeboy-ci Bot commented Jun 24, 2026

Homeboy Results — data-machine

Lint

lint — failed

ℹ️ Auto-fix: homeboy lint data-machine --path /home/runner/work/data-machine/data-machine --changed-since 8a413e6 --fix (or homeboy refactor data-machine --path /home/runner/work/data-machine/data-machine --changed-since 8a413e6 --from lint --write)
ℹ️ Some issues may require manual fixes
ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 8a413e6

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28076247259

Test

test — passed

ℹ️ No impacted tests found for --changed-since 8a413e6
ℹ️ Run full suite if needed: homeboy test data-machine
Deep dive: homeboy test data-machine --changed-since 8a413e6

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28076247259

Audit

audit — passed

  • audit — 28 finding(s)
  • Total: 28 finding(s)

Deep dive: homeboy audit data-machine --changed-since 8a413e6

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/28076247259

Tooling versions
  • Homeboy CLI: homeboy 0.259.0+b3d82bf59679+451de638
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: 94ff2c48
  • Action: unknown@unknown


Development

Successfully merging this pull request may close these issues.

DailyMemoryTask: 'MEMORY.md unchanged' no-op is logged as a hard job failure
