fix(mediorum): stop terminal transcode retries #338

Open
RolfAris wants to merge 3 commits into OpenAudio:main from RolfAris:fix/mediorum-transcode-terminal-retries-upstream
RolfAris commented Jun 4, 2026

Problem

Some bad audio uploads are retried indefinitely by the missed-transcode backlog.

The retry loop is:

| Step | Before |
| --- | --- |
| select missed jobs | status IN ('new', 'error'), error_count <= 5 |
| queue job | write busy upload update |
| ffmpeg fails | write error upload update |
| next scan | same upload is eligible again |

This loop keeps producing full-row uploads/update CRUD ops for uploads that have already failed repeatedly and still have no 320 transcode result.

Change

Stop selecting terminal transcode failures once error_count >= 5 and there is still no 320 result.
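The selection rule above can be sketched as a predicate. This is only an illustration under assumptions, not the code in this PR: the `upload` struct, its field names, `maxTranscodeErrors`, `isTerminalFailure`, and `isMissedJobCandidate` are hypothetical, and the real selection is presumably done in a query rather than in Go. It models only the rule stated here and no other filter the backlog scan may apply.

```go
package main

import "fmt"

// maxTranscodeErrors is the cap described above: error_count >= 5 is terminal.
const maxTranscodeErrors = 5

// upload is a hypothetical, trimmed-down view of an upload row.
type upload struct {
	Status     string // e.g. "new", "error", "busy"
	ErrorCount int
	Has320     bool // whether a 320 transcode result exists
}

// isTerminalFailure reports whether an upload has reached the retry cap
// without ever producing a 320 result.
func isTerminalFailure(u upload) bool {
	return u.ErrorCount >= maxTranscodeErrors && !u.Has320
}

// isMissedJobCandidate reports whether the backlog scan should pick the
// upload up again under the rule described above.
func isMissedJobCandidate(u upload) bool {
	if u.Status != "new" && u.Status != "error" {
		return false
	}
	return !isTerminalFailure(u)
}

func main() {
	samples := []upload{
		{Status: "new", ErrorCount: 0},
		{Status: "error", ErrorCount: 4},
		{Status: "error", ErrorCount: 5},               // terminal: cap reached, no 320
		{Status: "error", ErrorCount: 7, Has320: true}, // not terminal: a 320 exists
	}
	for _, u := range samples {
		fmt.Printf("%+v -> candidate=%v\n", u, isMissedJobCandidate(u))
	}
}
```

The point of the sketch is that an upload with `error_count` at the cap and no 320 result drops out of every later scan, which breaks the retry loop.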

Also guard transcode() itself before writing the busy state, so an over-limit upload cannot produce another transient write if it reaches the worker through another path.

Audio-analysis backlog selection now skips the same terminal transcode failures unless a 320 result exists.

Evidence

The source cap is necessary but not sufficient by itself.

In the canary, val008 ran the source-side retry cap without the receiver-side suppression. Its uploads/update growth stayed essentially flat relative to its own baseline from before the change (pre):

| Node | Role | pre rows | source-only rows | pre bytes | source-only bytes |
| --- | --- | --- | --- | --- | --- |
| val008 | control: source-only | 4,196 | 4,109 | 35,694,662 | 35,699,538 |

That is a change of -2.07% in rows and +0.01% in bytes in the latest 1-hour sample. Across 20 clean hourly samples, the source-only node averaged -0.47% rows and -0.53% bytes, so the source cap alone did not materially reduce validator ops growth.
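The percentages for the latest sample follow directly from the val008 figures as a relative change against the pre baseline. A small check, where `pctChange` is a helper name introduced here for illustration:

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// val008, latest 1h sample: pre vs source-only.
	fmt.Printf("rows:  %+.2f%%\n", pctChange(4196, 4109))         // rows:  -2.07%
	fmt.Printf("bytes: %+.2f%%\n", pctChange(35694662, 35699538)) // bytes: +0.01%
}
```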

This PR still matters because upgraded nodes should stop producing local terminal retry churn. The material validator-wide reduction requires pairing this with receiver-side suppression of legacy remote retry ops.

Tests

go test ./pkg/mediorum/server -run 'TestTranscodeRetryLimit|TestFindMissedJobCandidates|TestFindMissedJobsMarksUploadsBusyBeforeEnqueue|TestFindMissedAudioAnalysisCandidates' -count=1
